<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Fri, Feb 6, 2026 at 3:41 PM Klaus Wenninger <<a href="mailto:kwenning@redhat.com">kwenning@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 5, 2026 at 8:07 PM Anton Gavriliuk <<a href="mailto:Anton.Gavriliuk@hpe.ua" target="_blank">Anton.Gavriliuk@hpe.ua</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<div lang="EN-US">
<div>
<ul style="margin-top:0cm" type="disc">
<li style="margin-left:0cm">The other way round: pcs stonith create watchdog fence_watchdog<u></u><u></u></li></ul>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Yes, that works, thank you! After creation it automatically started on the 2<sup>nd</sup> node, memverge2<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Cluster Summary:<u></u><u></u></p>
<p class="MsoNormal"> * Stack: corosync (Pacemaker is running)<u></u><u></u></p>
<p class="MsoNormal"> * Current DC: memverge2 (28) (version 3.0.1-3.el10-b1a23a6) - partition with quorum<u></u><u></u></p>
<p class="MsoNormal"> * Last updated: Thu Feb 5 21:02:49 2026 on memverge<u></u><u></u></p>
<p class="MsoNormal"> * Last change: Thu Feb 5 21:01:00 2026 by root via root on memverge<u></u><u></u></p>
<p class="MsoNormal"> * 2 nodes configured<u></u><u></u></p>
<p class="MsoNormal"> * 23 resource instances configured<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Node List:<u></u><u></u></p>
<p class="MsoNormal"> * Node memverge (27): online, feature set 3.20.1<u></u><u></u></p>
<p class="MsoNormal"> * Node memverge2 (28): online, feature set 3.20.1<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Full List of Resources:<u></u><u></u></p>
<p class="MsoNormal"> * Resource Group: g-nfs:<u></u><u></u></p>
<p class="MsoNormal"> * pb_nfs (ocf:heartbeat:portblock): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * ip0_nfs (ocf:heartbeat:IPaddr2): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * fs_nfs_internal_info_HA (ocf:heartbeat:Filesystem): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * fs_nfsshare_exports_HA (ocf:heartbeat:Filesystem): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * nfsserver (ocf:heartbeat:nfsserver): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * expfs_nfsshare_exports_HA (ocf:heartbeat:exportfs): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * samba_service (systemd:smb): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * fs_sambashare_exports_HA (ocf:heartbeat:Filesystem): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * punb_nfs (ocf:heartbeat:portblock): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * Resource Group: g-iscsi:<u></u><u></u></p>
<p class="MsoNormal"> * pb_iscsi (ocf:heartbeat:portblock): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * ip0_iscsi (ocf:heartbeat:IPaddr2): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * ip1_iscsi (ocf:heartbeat:IPaddr2): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * iscsi_target (ocf:heartbeat:iSCSITarget): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * iscsi_lun_drbd3 (ocf:heartbeat:iSCSILogicalUnit): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * iscsi_lun_drbd4 (ocf:heartbeat:iSCSILogicalUnit): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * punb_iscsi (ocf:heartbeat:portblock): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * Clone Set: ha-nfs-clone [ha-nfs] (promotable):<u></u><u></u></p>
<p class="MsoNormal"> * ha-nfs (ocf:linbit:drbd): Promoted memverge<u></u><u></u></p>
<p class="MsoNormal"> * ha-nfs (ocf:linbit:drbd): Unpromoted memverge2<u></u><u></u></p>
<p class="MsoNormal"> * Clone Set: ha-iscsi-clone [ha-iscsi] (promotable):<u></u><u></u></p>
<p class="MsoNormal"> * ha-iscsi (ocf:linbit:drbd): Promoted memverge<u></u><u></u></p>
<p class="MsoNormal"> * ha-iscsi (ocf:linbit:drbd): Unpromoted memverge2<u></u><u></u></p>
<p class="MsoNormal"> * ipmi-fence-memverge (stonith:fence_ipmilan): Started memverge2<u></u><u></u></p>
<p class="MsoNormal"> * ipmi-fence-memverge2 (stonith:fence_ipmilan): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * watchdog (stonith:fence_watchdog): Started memverge2<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">But I assume I should create the same for the 1<sup>st</sup> node, memverge?</p></div></div></div></blockquote><div><br></div><div>Probably you will not need a 2nd instance. As with any other fencing resource, the node where it is started is just</div><div>where monitoring would usually run. And IIRC that monitoring isn't doing anything for watchdog anyway.</div></div></div></blockquote><div><br></div><div>Execution of a fencing action can usually happen on any node where you don't explicitly forbid it,</div><div>which is the reason why you should ban the device from nodes where you know it would fail for whatever reason.</div><div>Watchdog is of course a bit peculiar here, as the only action that is executed the same way as with other fence agents</div><div>is meta-data; everything else is handled within pacemaker.</div><div>My primary intent when I implemented the possibility to make the watchdog visible was that you</div><div>could have it in a topology and that you could disable watchdog-fencing for certain nodes, using</div><div>the usual mechanisms and high-level tooling.</div><div><br></div><div>Klaus</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><div>Klaus </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div lang="EN-US"><div><p class="MsoNormal"><u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Anton<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:11pt;font-family:Calibri,sans-serif">From:</span></b><span style="font-size:11pt;font-family:Calibri,sans-serif"> Klaus Wenninger <<a href="mailto:kwenning@redhat.com" target="_blank">kwenning@redhat.com</a>>
<br>
<b>Sent:</b> Thursday, February 5, 2026 4:16 PM<br>
<b>To:</b> Anton Gavriliuk <<a href="mailto:Anton.Gavriliuk@hpe.ua" target="_blank">Anton.Gavriliuk@hpe.ua</a>><br>
<b>Cc:</b> Andrei Borzenkov <<a href="mailto:arvidjaar@gmail.com" target="_blank">arvidjaar@gmail.com</a>>; Cluster Labs - All topics related to open-source clustering welcomed <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br>
<b>Subject:</b> Re: [ClusterLabs] Question about two level STONITH/fencing<u></u><u></u></span></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">On Thu, Feb 5, 2026 at 3:07<span style="font-family:Arial,sans-serif"> </span>PM Anton Gavriliuk <<a href="mailto:Anton.Gavriliuk@hpe.ua" target="_blank">Anton.Gavriliuk@hpe.ua</a>> wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<ul type="disc">
<li>
But sorry, again I forgot to mention that the fence resource has to be called 'watchdog'; otherwise pacemaker won't align it with the already<br>
existing internal hidden device (present if you have stonith-watchdog-timeout != 0).<u></u><u></u></li></ul>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">[root@memverge ~]# pcs stonith create watchdog-fencing watchdog<u></u><u></u></p>
<p class="MsoNormal">Error: Agent 'stonith:watchdog' is not installed or does not provide valid metadata: crm_resource: Metadata query for stonith:watchdog failed: No such device or address, Error performing
operation: No such object, use --force to override<u></u><u></u></p>
<p class="MsoNormal">Error: Errors have occurred, therefore pcs is unable to continue<u></u><u></u></p>
</div>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">The other way round: pcs stonith create watchdog fence_watchdog<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal">[root@memverge ~]#<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<ul type="disc">
<li>
Can you provide your CIB and corosync config so that we don't have to write back and forth so often?<u></u><u></u></li></ul>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">I attached them as files.<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Anton<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:11pt;font-family:Calibri,sans-serif">From:</span></b><span style="font-size:11pt;font-family:Calibri,sans-serif"> Klaus Wenninger <<a href="mailto:kwenning@redhat.com" target="_blank">kwenning@redhat.com</a>>
<br>
<b>Sent:</b> Thursday, February 5, 2026 3:42 PM<br>
<b>To:</b> Anton Gavriliuk <<a href="mailto:Anton.Gavriliuk@hpe.ua" target="_blank">Anton.Gavriliuk@hpe.ua</a>><br>
<b>Cc:</b> Andrei Borzenkov <<a href="mailto:arvidjaar@gmail.com" target="_blank">arvidjaar@gmail.com</a>>; Cluster Labs - All topics related to open-source clustering welcomed <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br>
<b>Subject:</b> Re: [ClusterLabs] Question about two level STONITH/fencing</span><u></u><u></u></p>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<p class="MsoNormal">On Thu, Feb 5, 2026 at 2:21<span style="font-family:Arial,sans-serif"> </span>PM Anton Gavriliuk <<a href="mailto:Anton.Gavriliuk@hpe.ua" target="_blank">Anton.Gavriliuk@hpe.ua</a>>
wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt">
<div>
<div>
<div>
<p class="MsoNormal">I tried,<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">[root@memverge ~]# pcs stonith create watchdog-fencing fence_watchdog<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">But after that, the running cluster hangs... I can't run "crm_mon -Rr": “error: Lost connection to controller”<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Perhaps this is because /dev/watchdog is already managed by pacemaker?<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">[root@memverge ~]# systemctl status sbd<u></u><u></u></p>
<p class="MsoNormal">● sbd.service - Shared-storage based fencing daemon<u></u><u></u></p>
<p class="MsoNormal"> Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; preset: disabled)<u></u><u></u></p>
<p class="MsoNormal"> Drop-In: /etc/systemd/system/sbd.service.d<u></u><u></u></p>
<p class="MsoNormal"> └─override.conf<u></u><u></u></p>
<p class="MsoNormal"> Active: active (running) since Tue 2026-02-03 16:09:00 EET; 1 day 22h ago<u></u><u></u></p>
<p class="MsoNormal">Invocation: 11a9ba526ef5403682980d67a886a7b9<u></u><u></u></p>
<p class="MsoNormal"> Docs: man:sbd(8)<u></u><u></u></p>
<p class="MsoNormal"> Main PID: 2473 (sbd)<u></u><u></u></p>
<p class="MsoNormal"> Tasks: 3 (limit: 3355442)<u></u><u></u></p>
<p class="MsoNormal"> Memory: 18.8M (peak: 19.5M)<u></u><u></u></p>
<p class="MsoNormal"> CPU: 2min 22.568s<u></u><u></u></p>
<p class="MsoNormal"> CGroup: /system.slice/sbd.service<u></u><u></u></p>
<p class="MsoNormal">
<span style="font-family:'MS Gothic'">├</span>─2473 "sbd: inquisitor"<u></u><u></u></p>
<p class="MsoNormal">
<span style="font-family:'MS Gothic'">├</span>─2487 "sbd: watcher: Pacemaker"<u></u><u></u></p>
<p class="MsoNormal"> └─2488 "sbd: watcher: Cluster"<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:09:00 memverge sbd[2473]: notice: inquisitor_child: Servant cluster is healthy (age: 0)<u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:09:00 memverge sbd[2473]: notice: watchdog_init: Using watchdog device '/dev/watchdog'<u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:09:00 memverge systemd[1]: Started sbd.service - Shared-storage based fencing daemon.<u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:09:04 memverge sbd[2473]: notice: inquisitor_child: Servant pcmk is healthy (age: 0)<u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:11:27 memverge systemd[1]: /etc/systemd/system/sbd.service.d/override.conf:1: Assignment outside of section. Ignoring.<u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:11:28 memverge systemd[1]: /etc/systemd/system/sbd.service.d/override.conf:1: Assignment outside of section. Ignoring.<u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:25:02 memverge sbd[2473]: warning: inquisitor_child: pcmk health check: UNHEALTHY<u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:25:02 memverge sbd[2473]: warning: inquisitor_child: Servant pcmk is outdated (age: 1246)<u></u><u></u></p>
<p class="MsoNormal">Feb 03 16:25:03 memverge sbd[2473]: notice: inquisitor_child: Servant pcmk is healthy (age: 0)<u></u><u></u></p>
<p class="MsoNormal">Feb 05 15:01:05 memverge systemd[1]: /etc/systemd/system/sbd.service.d/override.conf:1: Assignment outside of section. Ignoring.<u></u><u></u></p>
<p class="MsoNormal">[root@memverge ~]#<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Oh, now it opened:<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Cluster Summary:<u></u><u></u></p>
<p class="MsoNormal"> * Stack: corosync (Pacemaker is running)<u></u><u></u></p>
<p class="MsoNormal"> * Current DC: memverge (27) (version 3.0.1-3.el10-b1a23a6) - MIXED-VERSION partition with quorum<u></u><u></u></p>
<p class="MsoNormal"> * Last updated: Thu Feb 5 15:14:45 2026<u></u><u></u></p>
<p class="MsoNormal"> * Last change: Thu Feb 5 15:12:09 2026 by root via root on memverge<u></u><u></u></p>
<p class="MsoNormal"> * 2 nodes configured<u></u><u></u></p>
<p class="MsoNormal"> * 23 resource instances configured<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Node List:<u></u><u></u></p>
<p class="MsoNormal"> * Node memverge (27): online, feature set 3.20.1<u></u><u></u></p>
<p class="MsoNormal"> * Node memverge2 (28): online, feature set <3.15.1<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Full List of Resources:<u></u><u></u></p>
<p class="MsoNormal"> * Resource Group: g-nfs:<u></u><u></u></p>
<p class="MsoNormal"> * pb_nfs (ocf:heartbeat:portblock): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * ip0_nfs (ocf:heartbeat:IPaddr2): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * fs_nfs_internal_info_HA (ocf:heartbeat:Filesystem): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * fs_nfsshare_exports_HA (ocf:heartbeat:Filesystem): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * nfsserver (ocf:heartbeat:nfsserver): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * expfs_nfsshare_exports_HA (ocf:heartbeat:exportfs): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * samba_service (systemd:smb): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * fs_sambashare_exports_HA (ocf:heartbeat:Filesystem): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * punb_nfs (ocf:heartbeat:portblock): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * Resource Group: g-iscsi:<u></u><u></u></p>
<p class="MsoNormal"> * pb_iscsi (ocf:heartbeat:portblock): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * ip0_iscsi (ocf:heartbeat:IPaddr2): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * ip1_iscsi (ocf:heartbeat:IPaddr2): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * iscsi_target (ocf:heartbeat:iSCSITarget): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * iscsi_lun_drbd3 (ocf:heartbeat:iSCSILogicalUnit): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * iscsi_lun_drbd4 (ocf:heartbeat:iSCSILogicalUnit): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * punb_iscsi (ocf:heartbeat:portblock): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * Clone Set: ha-nfs-clone [ha-nfs] (promotable):<u></u><u></u></p>
<p class="MsoNormal"> * ha-nfs (ocf:linbit:drbd): Unpromoted memverge2<u></u><u></u></p>
<p class="MsoNormal"> * ha-nfs (ocf:linbit:drbd): Promoted memverge<u></u><u></u></p>
<p class="MsoNormal"> * Clone Set: ha-iscsi-clone [ha-iscsi] (promotable):<u></u><u></u></p>
<p class="MsoNormal"> * ha-iscsi (ocf:linbit:drbd): Unpromoted memverge2<u></u><u></u></p>
<p class="MsoNormal"> * ha-iscsi (ocf:linbit:drbd): Promoted memverge<u></u><u></u></p>
<p class="MsoNormal"> * ipmi-fence-memverge (stonith:fence_ipmilan): Started memverge2<u></u><u></u></p>
<p class="MsoNormal"> * ipmi-fence-memverge2 (stonith:fence_ipmilan): Started memverge<u></u><u></u></p>
<p class="MsoNormal"> * watchdog-fencing (stonith:fence_watchdog): Starting memverge2<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Failed Resource Actions:<u></u><u></u></p>
<p class="MsoNormal"> * ipmi-fence-memverge_monitor_30000 on memverge2 'Error occurred' (1): call=93, status='Error', exitreason='Lost connection to fencer' * ipmi-fence-memveF<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">And there are lots of entries like these in /var/log/messages:<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">Feb 5 15:13:10 memverge pacemaker-controld[755570]: notice: Fencer connection failed (will retry): Transport endpoint is not connected<u></u><u></u></p>
<p class="MsoNormal">[root@memverge ~]#<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">I’m new to pacemaker/corosync, so it is quite complicated for me
<span style="font-family:'Segoe UI Emoji',sans-serif">😊</span><u></u><u></u></p>
<p class="MsoNormal">Or maybe I should add fence_ipmilan as level 1 and not add sbd as level 2 at all, assuming the cluster automatically detects sbd (because have-watchdog=true) and falls back to it even without
an explicit level 2?<u></u><u></u></p>
</div>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">I'm not sure what we're seeing. The 'Fencer connection failed ...' messages would point to pacemaker-fenced having crashed (a segfault or something similar).<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">You might see traces of that elsewhere in the logs. It would also explain pacemaker behaving strangely in general if it is constantly trying to<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">restart pacemaker-fenced.<u></u><u></u></p>
</div>
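<div>
<p class="MsoNormal">To look for such traces, a rough sketch of a log scan (a hypothetical helper, not part of pacemaker) that counts lines containing the reconnect message quoted above or the pacemaker-fenced daemon name. The patterns are assumptions; adjust them to whatever your logs actually show around the time the fencer connection was lost.</p>
</div>
<pre>
```python
import sys
from collections import Counter

# Hypothetical helper, not part of pacemaker. The first pattern is copied
# from the log excerpt quoted above; the second is just the daemon name.
# Both are assumptions: adjust them to whatever your logs actually show.
PATTERNS = ("Fencer connection failed", "pacemaker-fenced")


def summarize(lines):
    """Count how many log lines contain each pattern."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_fencer.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for pattern, count in summarize(log).items():
            print(f"{count:6d}  {pattern}")
```
</pre>
<div>
<p class="MsoNormal">A high count for the first pattern with hardly any pacemaker-fenced lines would suggest looking in the journal or for core dumps instead.</p>
</div>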
<div>
<p class="MsoNormal">But sorry, again I forgot to mention that the fence resource has to be called 'watchdog'; otherwise pacemaker won't align it with the already<br>
existing internal hidden device (present if you have stonith-watchdog-timeout != 0).<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Using any other name is probably untested (I don't remember whether I tested that during development of the feature; it is definitely not a CI test case)<br>
and might cause problems in pacemaker-fenced. So this should probably be fixed, but if you use the correct<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">naming it should work.<u></u><u></u></p>
</div>
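<div>
<p class="MsoNormal">The naming rule above can be expressed as a small check. This is a sketch under assumptions: the function name is hypothetical, and it takes a hand-made mapping of stonith resource id to fence agent (for example filled in from the CIB) rather than parsing any real tool output.</p>
</div>
<pre>
```python
def misnamed_watchdog_devices(devices):
    """Return ids of fence_watchdog resources not named 'watchdog'.

    `devices` maps stonith resource id -> fence agent (hypothetical input,
    e.g. filled in by hand from the CIB). Per the rule above, pacemaker only
    aligns a fence_watchdog resource with its internal watchdog device
    (present when stonith-watchdog-timeout != 0) if it is called 'watchdog'.
    """
    return sorted(
        rid
        for rid, agent in devices.items()
        if agent == "fence_watchdog" and rid != "watchdog"
    )


example = {
    "watchdog-fencing": "fence_watchdog",  # name used earlier in this thread
    "ipmi-fence-memverge": "fence_ipmilan",
}
print(misnamed_watchdog_devices(example))  # ['watchdog-fencing']
```
</pre>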
<div>
<p class="MsoNormal">Can you provide your CIB and corosync config so that we don't have to write back and forth so often?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Regards,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Klaus <u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt">
<div>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<p class="MsoNormal">Anton<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:11pt;font-family:Calibri,sans-serif">From:</span></b><span style="font-size:11pt;font-family:Calibri,sans-serif"> Klaus Wenninger <<a href="mailto:kwenning@redhat.com" target="_blank">kwenning@redhat.com</a>>
<br>
<b>Sent:</b> Thursday, February 5, 2026 2:52 PM<br>
<b>To:</b> Anton Gavriliuk <<a href="mailto:Anton.Gavriliuk@hpe.ua" target="_blank">Anton.Gavriliuk@hpe.ua</a>><br>
<b>Cc:</b> Andrei Borzenkov <<a href="mailto:arvidjaar@gmail.com" target="_blank">arvidjaar@gmail.com</a>>; Cluster Labs - All topics related to open-source clustering welcomed <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br>
<b>Subject:</b> Re: [ClusterLabs] Question about two level STONITH/fencing</span><u></u><u></u></p>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<p class="MsoNormal">On Thu, Feb 5, 2026 at 12:56<span style="font-family:Arial,sans-serif"> </span>PM Anton Gavriliuk <<a href="mailto:Anton.Gavriliuk@hpe.ua" target="_blank">Anton.Gavriliuk@hpe.ua</a>>
wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt">
<p class="MsoNormal"><br>
Correct: in addition to the two cluster nodes, there is a dedicated 3rd physical server acting as the qdevice.<br>
<br>
I'm thinking about a two-level fencing topology: 1st level fence_ipmilan, 2nd level diskless sbd (hpwdt, /dev/watchdog).<br>
<br>
But I can't add sbd as the 2nd fencing level:<br>
<br>
[root@memverge2 ~]# pcs stonith level add 2 memverge watchdog<br>
Error: Stonith resource(s) 'watchdog' do not exist, use --force to override<br>
Error: Errors have occurred, therefore pcs is unable to continue<br>
[root@memverge2 ~]#<br>
<br>
So, back to the original question: what is the most correct way of implementing STONITH/fencing with fence_ipmilan + diskless sbd (hpwdt, /dev/watchdog)?<u></u><u></u></p>
</blockquote>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Sorry, then I had overlooked the qdevice (actually I thought I had checked for it, but ...).<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">To add the watchdog to a topology you first have to make it visible: just add it like any other fencing device, with fence_watchdog as the agent.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">There is a fence_watchdog script, but it only provides the meta-data. Pacemaker will recognize the device and handle the actual fencing internally.<u></u><u></u></p>
</div>
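<div>
<p class="MsoNormal">As an illustration of the advice above, a minimal sketch in Python that only prints (never runs) the pcs commands for a two-level topology per node: IPMI as level 1, the watchdog as level 2. The helper name topology_commands is hypothetical, and the IPMI device names are assumptions copied from the crm_mon output in this thread. The watchdog device is named 'watchdog', since elsewhere in this thread it is stated that pacemaker only aligns a fence_watchdog resource with its internal device under that name.</p>
</div>
<pre>
```python
import shlex

# Assumed device names, taken from the crm_mon output in this thread:
# node name -> IPMI fencing device that can fence that node.
IPMI_DEVICE = {
    "memverge": "ipmi-fence-memverge",
    "memverge2": "ipmi-fence-memverge2",
}


def topology_commands(ipmi_device):
    """Hypothetical helper: build (but do not run) the pcs argv lists."""
    # The watchdog device must be called exactly "watchdog" so pacemaker
    # maps it onto its internal watchdog device.
    cmds = [["pcs", "stonith", "create", "watchdog", "fence_watchdog"]]
    for node in sorted(ipmi_device):
        # Level 1: IPMI power fencing of this node.
        cmds.append(["pcs", "stonith", "level", "add", "1", node, ipmi_device[node]])
        # Level 2: watchdog self-fencing, tried if level 1 fails.
        cmds.append(["pcs", "stonith", "level", "add", "2", node, "watchdog"])
    return cmds


if __name__ == "__main__":
    for argv in topology_commands(IPMI_DEVICE):
        print(shlex.join(argv))
```
</pre>
<div>
<p class="MsoNormal">Fencing levels are tried in ascending order, so the watchdog level is only used when IPMI fencing of that node fails. Printing instead of executing keeps the sketch side-effect free; review the output before running it on the cluster.</p>
</div>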
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Regards,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Klaus<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 6pt;margin:5pt 0cm 5pt 4.8pt">
<p class="MsoNormal" style="margin-bottom:12pt"><br>
Anton<br>
<br>
<br>
-----Original Message-----<br>
From: Andrei Borzenkov <<a href="mailto:arvidjaar@gmail.com" target="_blank">arvidjaar@gmail.com</a>>
<br>
Sent: Thursday, February 5, 2026 1:17 PM<br>
To: Cluster Labs - All topics related to open-source clustering welcomed <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br>
Cc: Anton Gavriliuk <<a href="mailto:Anton.Gavriliuk@hpe.ua" target="_blank">Anton.Gavriliuk@hpe.ua</a>><br>
Subject: Re: [ClusterLabs] Question about two level STONITH/fencing<br>
<br>
On Thu, Feb 5, 2026 at 2:07<span style="font-family:Arial,sans-serif"> </span>PM Klaus Wenninger <<a href="mailto:kwenning@redhat.com" target="_blank">kwenning@redhat.com</a>> wrote:<br>
><br>
><br>
><br>
> On Wed, Feb 4, 2026 at 4:36<span style="font-family:Arial,sans-serif"> </span>PM Anton Gavriliuk via Users <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>> wrote:<br>
>><br>
>><br>
>><br>
>> Hello<br>
>><br>
>><br>
>><br>
>> There is a two-node (HPE DL345 Gen12 servers) shared-nothing, DRBD-based (synchronous, Protocol C) replicated, active/standby pacemaker storage metro-cluster. It is configured with qdevice, heuristics
(parallel fping) and fencing: fence_ipmilan and diskless sbd (hpwdt, /dev/watchdog). All cluster resources are configured to always run together on the same node.<br>
>><br>
>><br>
>><br>
>> The two storage cluster nodes and the qdevice run Rocky Linux 10.1<br>
>><br>
>> Pacemaker version 3.0.1<br>
>><br>
>> Corosync version 3.1.9<br>
>><br>
>> DRBD version 9.3.0<br>
>><br>
>><br>
>><br>
>> So, the question is: what is the most correct way of implementing STONITH/fencing with fence_ipmilan + diskless sbd (hpwdt, /dev/watchdog)?<br>
><br>
><br>
> The correct way of using diskless sbd with a two-node cluster is not <br>
> to use it ;-)<br>
><br>
> Diskless sbd (watchdog-fencing) requires 'real' quorum, and the quorum <br>
> provided by corosync in two-node mode would introduce split-brain, <br>
> which is the reason why sbd recognizes two-node operation and <br>
> replaces quorum from corosync with the information whether the peer node is currently in the cluster. This is fine for poison-pill fencing: a single shared disk then doesn't become a single point of failure as long as the peer is there. But for watchdog-fencing that doesn't help, because the peer going away would mean you have to commit suicide.<br>
><br>
> An alternative with a two-node cluster is to step away from the actual two-node design and go with qdevice for 'real' quorum.<br>
<br>
Hmm ... the original description does mention qdevice, although it is not quite clear where it is located (is there a third node?)<br>
<br>
> You'll need some kind of 3rd node but it doesn't have to be a full cluster node.<br>
><u></u><u></u></p>
</blockquote>
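<div>
<p class="MsoNormal">A toy model of the quorum arithmetic behind the explanation above, assuming one vote per node plus one for the qdevice. It is a simplification for illustration, not corosync's actual code, and it ignores details such as wait_for_all: in two_node mode both halves of a split stay quorate, which is why that quorum cannot back watchdog self-fencing, while with a qdevice only the side that receives the qdevice vote keeps quorum.</p>
</div>
<pre>
```python
# Toy model of votequorum arithmetic: one vote per node plus one for the
# qdevice. A simplification for illustration, not corosync's actual code.


def has_quorum(votes, expected_votes, two_node=False):
    if two_node:
        # two_node mode lowers the quorum to a single vote, so both halves
        # of a split still consider themselves quorate.
        needed = 1
    else:
        # Normal rule: a strict majority of the expected votes.
        needed = expected_votes // 2 + 1
    return votes >= needed


def quorate_partitions(partitions, expected_votes, two_node=False):
    """Names of the partitions that would keep quorum after a split."""
    return [
        name
        for name, votes in partitions.items()
        if has_quorum(votes, expected_votes, two_node)
    ]


# Split in two_node mode: both sides quorate, i.e. split-brain.
print(quorate_partitions({"memverge": 1, "memverge2": 1}, 2, two_node=True))  # ['memverge', 'memverge2']
# Split with a qdevice: its vote goes to one side only (here memverge), so
# only that side keeps quorum and the other would self-fence via watchdog.
print(quorate_partitions({"memverge": 2, "memverge2": 1}, 3))  # ['memverge']
```
</pre>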
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div></blockquote></div></div>
</blockquote></div></div>