[ClusterLabs] [External] : Re: Fence Agent tests

Robert Hayden robert.h.hayden at oracle.com
Sat Nov 5 14:47:59 EDT 2022


> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Valentin Vidic
> via Users
> Sent: Saturday, November 5, 2022 1:07 PM
> To: users at clusterlabs.org
> Cc: Valentin Vidić <vvidic at valentin-vidic.from.hr>
> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> 
> On Sat, Nov 05, 2022 at 05:20:47PM +0000, Robert Hayden wrote:
> > The OCI compute instances don't have a hardware watchdog, only the
> > software watchdog. So, when the network goes completely hung (e.g.
> > firewall-cmd --panic-on), all network traffic stops, which implies
> > that IO to the SBD device also stops.  I do not see the software
> > watchdog take any action in response to the network hang.
> 
> It seems like the watchdog is not working or is not configured with a
> correct timeout here. sbd will not refresh the watchdog if it fails to
> read from the disk, so the watchdog should eventually expire and reset
> the node.

That was my impression as well, so I may have something misconfigured.  My expectation was that the SBD daemon
has to write to /dev/watchdog within 20 seconds; if it cannot, the kernel watchdog should expire and the node should self-fence.
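
Before looking at timeouts, it can help to confirm that a software watchdog driver is loaded at all. The sketch below assumes the "software watchdog" here is the Linux softdog module (an assumption; the thread does not name the driver). The helper name has_softdog is hypothetical, and it only inspects lsmod-style text on stdin.

```shell
#!/bin/sh
# Succeeds if a line beginning with "softdog" appears in lsmod-style input.
# Assumption: the software watchdog is provided by the softdog module.
has_softdog() {
    grep -q '^softdog[[:space:]]'
}

# Usage on a cluster node (not executed here):
#   lsmod | has_softdog || echo "softdog module is not loaded"
# Recent sbd versions also offer "sbd query-watchdog" to list the
# watchdog devices that sbd can see.
```

If the module is missing, /dev/watchdog may not be backed by any driver, which would be consistent with no reset happening.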

Here is my setup:
root:dh2vgmprepap02:ablgmprep:/root:# grep ^SBD /etc/sysconfig/sbd
SBD_DEVICE=/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=no
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_OPTS=

root:dh2vgmprepap02:ablgmprep:/root:# sbd -d /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 dump
==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
Header version     : 2.1
UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 40
==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
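
The two outputs above show different watchdog timeouts: SBD_WATCHDOG_TIMEOUT=5 in /etc/sysconfig/sbd and "Timeout (watchdog) : 20" in the on-disk header. As far as the comments in the stock sysconfig file describe it, the on-disk value takes precedence when a device is configured, but that is worth verifying for the installed sbd version. A minimal sketch for pulling both values side by side follows; the function names are hypothetical.

```shell
#!/bin/sh
# Print the SBD_WATCHDOG_TIMEOUT value from a sysconfig-style file ($1).
sysconfig_wd_timeout() {
    sed -n 's/^SBD_WATCHDOG_TIMEOUT=//p' "$1"
}

# Print the "Timeout (watchdog)" value from `sbd -d <device> dump`
# output supplied on stdin.
header_wd_timeout() {
    awk -F: '/^Timeout \(watchdog\)/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage on a cluster node (not executed here; SBD_DEVICE as in the setup above):
#   cfg=$(sysconfig_wd_timeout /etc/sysconfig/sbd)
#   hdr=$(sbd -d "$SBD_DEVICE" dump | header_wd_timeout)
#   [ "$cfg" = "$hdr" ] || echo "mismatch: sysconfig=$cfg header=$hdr"
```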

root:dh2vgmprepap02:ablgmprep:/root:# pcs stonith sbd status  --full
SBD STATUS
<node name>: <installed> | <enabled> | <running>
dh2vgmprepap03: YES | YES | YES
dh2vgmprepap02: YES | YES | YES

Messages list on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
0       dh2vgmprepap03  clear
1       dh2vgmprepap02  clear


SBD header on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
Header version     : 2.1
UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 40
==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
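
The header values shown here (watchdog 20, msgwait 40) match the commonly documented rule of thumb that msgwait should be at least twice the watchdog timeout. A small sketch of that check over `sbd dump` output is below; msgwait_ok is a hypothetical helper name, and the factor of 2 is the rule of thumb rather than something sbd enforces.

```shell
#!/bin/sh
# Read `sbd -d <device> dump` output on stdin.
# Exit status 0 if msgwait >= 2 * watchdog timeout, 1 otherwise
# (including when the watchdog timeout line is missing).
msgwait_ok() {
    awk -F: '
        /^Timeout \(watchdog\)/ { wd = $2 + 0 }
        /^Timeout \(msgwait\)/  { mw = $2 + 0 }
        END { exit !(wd > 0 && mw >= 2 * wd) }
    '
}

# Usage on a cluster node (not executed here):
#   sbd -d "$SBD_DEVICE" dump | msgwait_ok || echo "msgwait is too short"
```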



