[ClusterLabs] [External] : Re: Fence Agent tests
Robert Hayden
robert.h.hayden at oracle.com
Sat Nov 5 14:47:59 EDT 2022
> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Valentin Vidic
> via Users
> Sent: Saturday, November 5, 2022 1:07 PM
> To: users at clusterlabs.org
> Cc: Valentin Vidić <vvidic at valentin-vidic.from.hr>
> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
>
> On Sat, Nov 05, 2022 at 05:20:47PM +0000, Robert Hayden wrote:
> > The OCI compute instances don't have a hardware watchdog, only the software watchdog.
> > So, when the network goes completely hung (e.g. firewall-cmd panic-on), all network
> > traffic stops which implies that IO to the SBD device also stops. I do not see the software
> > watchdog take any action in response to the network hang.
>
> It seems like the watchdog is not working or is not configured with a
> correct timeout here. sbd will not refresh the watchdog if it fails to
> read from the disk, so the watchdog should eventually expire and reset
> the node.
That was my impression as well, so I may have something misconfigured. My expectation was that the SBD daemon
should be writing to /dev/watchdog within 20 seconds, and that the kernel watchdog would self-fence the node if it did not.
Here is my setup:
root:dh2vgmprepap02:ablgmprep:/root:# grep ^SBD /etc/sysconfig/sbd
SBD_DEVICE=/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=no
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_OPTS=
root:dh2vgmprepap02:ablgmprep:/root:# sbd -d /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 dump
==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
Header version : 2.1
UUID : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 40
==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
root:dh2vgmprepap02:ablgmprep:/root:# pcs stonith sbd status --full
SBD STATUS
<node name>: <installed> | <enabled> | <running>
dh2vgmprepap03: YES | YES | YES
dh2vgmprepap02: YES | YES | YES
Messages list on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
0 dh2vgmprepap03 clear
1 dh2vgmprepap02 clear
SBD header on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
Header version : 2.1
UUID : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 40
==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
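
One thing that stands out when reading the two outputs side by side is that /etc/sysconfig/sbd has SBD_WATCHDOG_TIMEOUT=5 while the on-disk header reports a watchdog timeout of 20. As far as I understand the sbd documentation, the on-disk value takes precedence when a disk device is in use, but treat that as an assumption to verify. Below is a minimal sketch for cross-checking the two sources. It is not part of sbd or pcs; the function names are hypothetical, and the "msgwait should be at least twice the watchdog timeout" check reflects the commonly documented recommendation rather than anything enforced by the tools.

```python
import re

# Sample inputs copied from the command output shown above.
SAMPLE_SYSCONFIG = """\
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
"""

SAMPLE_DUMP = """\
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 40
"""


def parse_sbd_dump(text):
    """Collect 'Timeout (name) : N' pairs from `sbd -d <device> dump` output."""
    return {
        m.group(1): int(m.group(2))
        for m in re.finditer(r"Timeout \((\w+)\)\s*:\s*(\d+)", text)
    }


def parse_sysconfig(text):
    """Collect KEY=value pairs from /etc/sysconfig/sbd-style text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_timeouts(dump_text, sysconfig_text):
    """Return a list of observations about possibly inconsistent timeouts."""
    disk = parse_sbd_dump(dump_text)
    conf = parse_sysconfig(sysconfig_text)
    notes = []

    watchdog = disk.get("watchdog")
    msgwait = disk.get("msgwait")
    conf_wd = conf.get("SBD_WATCHDOG_TIMEOUT", "")

    # Flag a mismatch between the sysconfig value and the on-disk header.
    if watchdog is not None and conf_wd.isdigit() and int(conf_wd) != watchdog:
        notes.append(
            f"SBD_WATCHDOG_TIMEOUT={conf_wd} in sysconfig differs from "
            f"on-disk watchdog timeout {watchdog}"
        )

    # Assumed rule of thumb: msgwait >= 2 * watchdog timeout.
    if watchdog is not None and msgwait is not None and msgwait < 2 * watchdog:
        notes.append(
            f"msgwait {msgwait} is less than twice the watchdog timeout {watchdog}"
        )
    return notes


if __name__ == "__main__":
    for note in check_timeouts(SAMPLE_DUMP, SAMPLE_SYSCONFIG):
        print(note)
```

With the values from this node, the sketch reports only the 5 versus 20 mismatch; msgwait (40) is exactly twice the on-disk watchdog timeout (20), so the second check stays quiet. It says nothing about whether the software watchdog module is actually loaded and armed, which would need to be checked separately on the node.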
>
> --
> Valentin