[ClusterLabs] Antw: [EXT] Re: [External] : Re: Fence Agent tests

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Nov 7 02:13:22 EST 2022


Hi!

Maybe see "test-watchdog" in sbd's manual page ;-)

Regards,
Ulrich

>>> Robert Hayden <robert.h.hayden at oracle.com> wrote on 05.11.2022 at 19:47
in message
<SA2PR10MB44916EC50D93E8F6D8FA42EEC83A9 at SA2PR10MB4491.namprd10.prod.outlook.com>:

>> -----Original Message-----
>> From: Users <users-bounces at clusterlabs.org> On Behalf Of Valentin Vidic via Users
>> Sent: Saturday, November 5, 2022 1:07 PM
>> To: users at clusterlabs.org
>> Cc: Valentin Vidić <vvidic at valentin-vidic.from.hr>
>> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
>> 
>> On Sat, Nov 05, 2022 at 05:20:47PM +0000, Robert Hayden wrote:
>> > The OCI compute instances don't have a hardware watchdog, only the
>> > software watchdog. So, when the network hangs completely (e.g.
>> > firewall-cmd --panic-on), all network traffic stops, which implies
>> > that I/O to the SBD device also stops. I do not see the software
>> > watchdog take any action in response to the network hang.
>> 
>> It seems like the watchdog is not working or is not configured with a
>> correct timeout here. sbd will not refresh the watchdog if it fails to
>> read from the disk, so the watchdog should eventually expire and reset
>> the node.
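
To make the mechanism described above concrete, here is a minimal toy
simulation in Python. It is only a sketch of the idea (refresh the watchdog
only while the disk read succeeds), not sbd's actual code; the class and
function names are hypothetical, and a once-per-second loop is assumed.

```python
from dataclasses import dataclass


@dataclass
class SimulatedWatchdog:
    """Toy stand-in for a kernel watchdog (hypothetical, not a real API)."""
    timeout: int           # seconds the node survives without a refresh
    last_refresh: int = 0  # simulated second of the most recent refresh

    def refresh(self, now: int) -> None:
        self.last_refresh = now

    def expired(self, now: int) -> bool:
        return now - self.last_refresh >= self.timeout


def seconds_until_reset(timeout: int, disk_fails_at: int, horizon: int = 120):
    """Return the simulated second at which the node resets, or None.

    The loop runs once per simulated second and refreshes the watchdog
    only while the simulated disk read succeeds (before `disk_fails_at`).
    """
    wd = SimulatedWatchdog(timeout=timeout)
    for now in range(horizon):
        if wd.expired(now):
            return now  # the watchdog fires and the node would reset
        disk_ok = now < disk_fails_at
        if disk_ok:
            wd.refresh(now)
    return None  # watchdog never expired within the horizon


if __name__ == "__main__":
    # 20 s timeout, disk I/O stops at second 10
    print(seconds_until_reset(timeout=20, disk_fails_at=10))
```

In this toy model the last refresh happens at second 9, so a 20-second
timeout expires at second 29. If nothing like that happens on the real node,
the watchdog device itself is the first thing to check.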
> 
> That was my impression as well, so I may have something wrong. My
> expectation was that the SBD daemon must write to /dev/watchdog within
> 20 seconds, or else the kernel watchdog would self-fence the node.
> 
> Here is my setup:
> root:dh2vgmprepap02:ablgmprep:/root:# grep ^SBD /etc/sysconfig/sbd
> SBD_DEVICE=/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> SBD_PACEMAKER=yes
> SBD_STARTMODE=always
> SBD_DELAY_START=no
> SBD_WATCHDOG_DEV=/dev/watchdog
> SBD_WATCHDOG_TIMEOUT=5
> SBD_TIMEOUT_ACTION=flush,reboot
> SBD_MOVE_TO_ROOT_CGROUP=auto
> SBD_OPTS=
>
> root:dh2vgmprepap02:ablgmprep:/root:# sbd -d /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 dump
> ==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> Header version     : 2.1
> UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 20
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 40
> ==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
> 
> root:dh2vgmprepap02:ablgmprep:/root:# pcs stonith sbd status --full
> SBD STATUS
> <node name>: <installed> | <enabled> | <running>
> dh2vgmprepap03: YES | YES | YES
> dh2vgmprepap02: YES | YES | YES
>
> Messages list on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
> 0       dh2vgmprepap03  clear
> 1       dh2vgmprepap02  clear
>
>
> SBD header on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
> ==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> Header version     : 2.1
> UUID               : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 20
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 40
> ==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
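
The output above shows two different watchdog timeouts: SBD_WATCHDOG_TIMEOUT=5
in /etc/sysconfig/sbd and "Timeout (watchdog) : 20" in the on-disk header.
The following Python sketch cross-checks the two sources. The function names
are hypothetical, and the two checks are assumptions rather than sbd's own
logic: that a mismatch is worth flagging (as far as I recall, the on-disk
value takes precedence when a disk is used, but please verify that in the
sbd documentation), and the commonly cited rule of thumb that msgwait
should be at least twice the watchdog timeout.

```python
import re

# Sample input copied from the output quoted in this thread.
SYSCONFIG = """\
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
"""

DUMP = """\
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 40
"""


def parse_sysconfig(text: str) -> dict:
    """Collect KEY=VALUE pairs, skipping blanks, comments, and '> ' quoting."""
    settings = {}
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def parse_dump_timeouts(text: str) -> dict:
    """Extract 'Timeout (name) : N' lines from `sbd ... dump` output."""
    pattern = re.compile(r"Timeout \((\w+)\)\s*:\s*(\d+)")
    return {name: int(secs) for name, secs in pattern.findall(text)}


def review(sysconfig_text: str, dump_text: str) -> list:
    """Return notes about the timeouts (rules of thumb, not sbd's logic)."""
    notes = []
    cfg = parse_sysconfig(sysconfig_text)
    disk = parse_dump_timeouts(dump_text)

    cfg_wd = cfg.get("SBD_WATCHDOG_TIMEOUT", "")
    if cfg_wd.isdigit() and "watchdog" in disk and int(cfg_wd) != disk["watchdog"]:
        notes.append(
            f"sysconfig watchdog timeout {cfg_wd}s differs from "
            f"on-disk {disk['watchdog']}s"
        )
    if {"watchdog", "msgwait"} <= disk.keys():
        if disk["msgwait"] < 2 * disk["watchdog"]:
            notes.append(
                f"msgwait {disk['msgwait']}s is less than twice the "
                f"watchdog timeout {disk['watchdog']}s"
            )
    return notes


if __name__ == "__main__":
    for note in review(SYSCONFIG, DUMP):
        print(note)
```

For the values in this thread, the sketch flags only the 5 s versus 20 s
mismatch; msgwait (40) is exactly twice the on-disk watchdog timeout (20).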
> 
> 
>>
>> --
>> Valentin
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/




