[ClusterLabs] Antw: [EXT] Re: [External] : Re: Fence Agent tests
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Nov 7 02:13:22 EST 2022
Hi!
Maybe see "test-watchdog" in sbd's manual page ;-)
Regards,
Ulrich
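
For reference, a minimal sketch of how that check might be run. It assumes the installed sbd manual page documents both "query-watchdog" and "test-watchdog" (older builds may lack them). The run wrapper and the RUN_WATCHDOG_TEST guard are hypothetical names added here so that nothing executes by accident. A working watchdog is expected to reset the node during test-watchdog, so only try this on a node that is allowed to reboot.

```shell
#!/bin/sh
# Sketch only. WARNING: with a working watchdog, test-watchdog resets the node.
# RUN_WATCHDOG_TEST and run() are hypothetical helpers, not sbd settings.
WATCHDOG_DEV="${WATCHDOG_DEV:-/dev/watchdog}"

run() {
    # Always print the command; execute it only when RUN_WATCHDOG_TEST=yes.
    echo "+ $*"
    if [ "${RUN_WATCHDOG_TEST:-no}" = "yes" ]; then
        "$@"
    fi
}

# List the watchdog devices that sbd can see.
run sbd query-watchdog
# Test the chosen watchdog device; the node should reset if it works.
run sbd -w "$WATCHDOG_DEV" test-watchdog
```

Without RUN_WATCHDOG_TEST=yes the script only prints the two commands, which makes it safe to review before running it for real.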
>>> Robert Hayden <robert.h.hayden at oracle.com> wrote on 05.11.2022 at 19:47
in message
<SA2PR10MB44916EC50D93E8F6D8FA42EEC83A9 at SA2PR10MB4491.namprd10.prod.outlook.com>:
>> -----Original Message-----
>> From: Users <users-bounces at clusterlabs.org> On Behalf Of Valentin Vidic via Users
>> Sent: Saturday, November 5, 2022 1:07 PM
>> To: users at clusterlabs.org
>> Cc: Valentin Vidić <vvidic at valentin-vidic.from.hr>
>> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
>>
>> On Sat, Nov 05, 2022 at 05:20:47PM +0000, Robert Hayden wrote:
>> > The OCI compute instances don't have a hardware watchdog, only the
>> > software watchdog.
>> > So, when the network hangs completely (e.g. firewall-cmd --panic-on),
>> > all network traffic stops, which implies that I/O to the SBD device
>> > also stops. I do not see the software watchdog take any action in
>> > response to the network hang.
>>
>> It seems like the watchdog is not working or is not configured with a
>> correct timeout here. sbd will not refresh the watchdog if it fails to
>> read from the disk, so the watchdog should eventually expire and reset
>> the node.
>
> That was my impression as well, so I may have something wrong. My
> expectation was that the SBD daemon should be writing to /dev/watchdog
> within 20 seconds, and that the kernel watchdog would otherwise
> self-fence the node.
>
> Here is my setup:
> root:dh2vgmprepap02:ablgmprep:/root:# grep ^SBD /etc/sysconfig/sbd
> SBD_DEVICE=/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> SBD_PACEMAKER=yes
> SBD_STARTMODE=always
> SBD_DELAY_START=no
> SBD_WATCHDOG_DEV=/dev/watchdog
> SBD_WATCHDOG_TIMEOUT=5
> SBD_TIMEOUT_ACTION=flush,reboot
> SBD_MOVE_TO_ROOT_CGROUP=auto
> SBD_OPTS=
>
> root:dh2vgmprepap02:ablgmprep:/root:# sbd -d /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 dump
> ==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> Header version : 2.1
> UUID : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
> Number of slots : 255
> Sector size : 512
> Timeout (watchdog) : 20
> Timeout (allocate) : 2
> Timeout (loop) : 1
> Timeout (msgwait) : 40
> ==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
>
> root:dh2vgmprepap02:ablgmprep:/root:# pcs stonith sbd status --full
> SBD STATUS
> <node name>: <installed> | <enabled> | <running>
> dh2vgmprepap03: YES | YES | YES
> dh2vgmprepap02: YES | YES | YES
>
> Messages list on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
> 0 dh2vgmprepap03 clear
> 1 dh2vgmprepap02 clear
>
>
> SBD header on device '/dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1':
> ==Dumping header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1
> Header version : 2.1
> UUID : 04096cc5-1fb8-44da-9c4f-4b6034a0fe06
> Number of slots : 255
> Sector size : 512
> Timeout (watchdog) : 20
> Timeout (allocate) : 2
> Timeout (loop) : 1
> Timeout (msgwait) : 40
> ==Header on disk /dev/disk/by-id/scsi-360e59ebc0f414569bcc7a5e4a6d58ccb-part1 is dumped
>
>
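
The two outputs quoted above show different watchdog timeouts: SBD_WATCHDOG_TIMEOUT=5 in /etc/sysconfig/sbd and "Timeout (watchdog) : 20" in the on-disk header. The sketch below extracts the timeouts from header dump text and flags such differences. It rests on two assumptions that should be verified against the sbd documentation: that the on-disk value is the one that applies when a disk is used, and that msgwait is commonly recommended to be at least twice the watchdog timeout. check_sbd_timeouts is a hypothetical helper name, not part of sbd.

```shell
#!/bin/sh
# Sketch: read `sbd ... dump` text on stdin and compare the timeouts.
# $1 is the SBD_WATCHDOG_TIMEOUT value taken from /etc/sysconfig/sbd.
check_sbd_timeouts() {
    awk -v cfg="$1" -F':' '
        /Timeout \(watchdog\)/ { wd = $2 + 0 }   # on-disk watchdog timeout
        /Timeout \(msgwait\)/  { mw = $2 + 0 }   # on-disk msgwait timeout
        END {
            printf "watchdog=%d msgwait=%d sysconfig=%d\n", wd, mw, cfg
            # Assumed rule of thumb: msgwait >= 2 x watchdog.
            if (mw < 2 * wd) print "WARN: msgwait is less than 2 x watchdog"
            if (cfg != wd)   print "NOTE: sysconfig and on-disk watchdog timeouts differ"
        }'
}

# Sample input copied from the header dump in this thread.
check_sbd_timeouts 5 <<'EOF'
Timeout (watchdog) : 20
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 40
EOF
```

On a live node the same function could be fed directly from the dump command shown in the quoted setup, with the sysconfig value passed as the argument.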
>>
>> --
>> Valentin
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/