<html><body><p><font size="2">Hi Strahil,<br><br>Here is the output of those commands.... I appreciate the help!</font><br><font size="2"><br></font><b><u><font size="2"># crm config show</font></u></b><font size="2"><br></font><font size="2" face="Courier New">node 1: ceha03 \</font><br><font size="2" face="Courier New"> attributes ethmonitor-ens192=1</font><br><font size="2" face="Courier New">node 2: ceha04 \</font><br><font size="2" face="Courier New"> attributes ethmonitor-ens192=1</font><br><font size="2" face="Courier New">(...)</font><br><font size="2" face="Courier New">primitive stonith_sbd stonith:fence_sbd \</font><br><font size="2" face="Courier New"> params devices="/dev/sde1" \</font><br><font size="2" face="Courier New"> meta is-managed=true<br>(...)</font><br><font size="2" face="Courier New">property cib-bootstrap-options: \</font><br><font size="2" face="Courier New"> have-watchdog=true \</font><br><font size="2" face="Courier New"> dc-version=2.0.2-1.el8-744a30d655 \</font><br><font size="2" face="Courier New"> cluster-infrastructure=corosync \</font><br><font size="2" face="Courier New"> cluster-name=ps_dom \</font><br><font size="2" face="Courier New"> stonith-enabled=true \</font><br><font size="2" face="Courier New"> no-quorum-policy=ignore \</font><br><font size="2" face="Courier New"> stop-all-resources=false \</font><br><font size="2" face="Courier New"> cluster-recheck-interval=60 \</font><br><font size="2" face="Courier New"> symmetric-cluster=true \</font><br><font size="2" face="Courier New"> stonith-watchdog-timeout=0</font><br><font size="2" face="Courier New">rsc_defaults rsc-options: \</font><br><font size="2" face="Courier New"> is-managed=false \</font><br><font size="2" face="Courier New"> resource-stickiness=0 \</font><br><font size="2" face="Courier New"> failure-timeout=1min</font><font size="2"><br></font><br><b><u><font size="2"># cat /etc/sysconfig/sbd</font></u></b><br><tt><font size="2">SBD_DEVICE="/dev/sde1"</font></tt><br><tt><font size="2">SBD_PACEMAKER=yes</font></tt><br><tt><font size="2">SBD_STARTMODE=always</font></tt><br><tt><font size="2">SBD_DELAY_START=no</font></tt><br><tt><font size="2">SBD_WATCHDOG_DEV=/dev/watchdog</font></tt><br><tt><font size="2">SBD_WATCHDOG_TIMEOUT=5</font></tt><br><tt><font size="2">SBD_TIMEOUT_ACTION=flush,reboot</font></tt><br><tt><font size="2">SBD_MOVE_TO_ROOT_CGROUP=auto</font></tt><br><tt><font size="2">SBD_OPTS=</font></tt><br><br><b><u><font size="2"># systemctl status sbd</font></u></b><br><font size="2" face="Courier New"> sbd.service - Shared-storage based fencing daemon</font><br><font size="2" face="Courier New"> Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor preset: disabled)</font><br><font size="2" face="Courier New"> Active: active (running) since Mon 2020-09-21 18:36:28 EDT; 15min ago</font><br><font size="2" face="Courier New"> Docs: man:sbd(8)</font><br><font size="2" face="Courier New"> Process: 12810 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid watch (code=exited, status=0/SUCCESS)</font><br><font size="2" face="Courier New"> Main PID: 12812 (sbd)</font><br><font size="2" face="Courier New"> Tasks: 4 (limit: 26213)</font><br><font size="2" face="Courier New"> Memory: 14.5M</font><br><font size="2" face="Courier New"> CGroup: /system.slice/sbd.service</font><br><font size="2" face="Courier New"> \u251c\u250012812 sbd: inquisitor</font><br><font size="2" face="Courier New"> \u251c\u250012814 sbd: watcher: /dev/sde1 - slot: 0 - uuid: 
# systemctl status sbd
  sbd.service - Shared-storage based fencing daemon
   Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-09-21 18:36:28 EDT; 15min ago
     Docs: man:sbd(8)
  Process: 12810 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid watch (code=exited, status=0/SUCCESS)
 Main PID: 12812 (sbd)
    Tasks: 4 (limit: 26213)
   Memory: 14.5M
   CGroup: /system.slice/sbd.service
           ├─12812 sbd: inquisitor
           ├─12814 sbd: watcher: /dev/sde1 - slot: 0 - uuid: 94d67f15-e301-4fa9-89ae-e3ce2e82c9e7
           ├─12815 sbd: watcher: Pacemaker
           └─12816 sbd: watcher: Cluster

Sep 21 18:36:27 ceha03.canlab.ibm.com systemd[1]: Starting Shared-storage based fencing daemon...
Sep 21 18:36:27 ceha03.canlab.ibm.com sbd[12810]: notice: main: Doing flush + writing 'b' to sysrq on timeout
Sep 21 18:36:27 ceha03.canlab.ibm.com sbd[12815]: pcmk: notice: servant_pcmk: Monitoring Pacemaker health
Sep 21 18:36:27 ceha03.canlab.ibm.com sbd[12816]: cluster: notice: servant_cluster: Monitoring unknown cluster health
Sep 21 18:36:27 ceha03.canlab.ibm.com sbd[12814]: /dev/sde1: notice: servant_md: Monitoring slot 0 on disk /dev/sde1
Sep 21 18:36:28 ceha03.canlab.ibm.com sbd[12812]: notice: watchdog_init: Using watchdog device '/dev/watchdog'
Sep 21 18:36:28 ceha03.canlab.ibm.com sbd[12816]: cluster: notice: sbd_get_two_node: Corosync is in 2Node-mode
Sep 21 18:36:28 ceha03.canlab.ibm.com sbd[12812]: notice: inquisitor_child: Servant cluster is healthy (age: 0)
Sep 21 18:36:28 ceha03.canlab.ibm.com systemd[1]: Started Shared-storage based fencing daemon.

# sbd -d /dev/disk/by-id/scsi-<long_uuid> dump
[root@ceha03 by-id]# sbd -d /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1
Header version     : 2.1
UUID               : 94d67f15-e301-4fa9-89ae-e3ce2e82c9e7
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10
==Header on disk /dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1 is dumped
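(For context on the header values above, a rule-of-thumb sketch rather than anything from this thread: msgwait is conventionally twice the watchdog timeout, and Pacemaker's stonith-timeout property should exceed msgwait so that a poison-pill write has time to take effect. Here that works out to:

Timeout (watchdog) = 5 s
Timeout (msgwait)  = 2 * 5 s = 10 s
stonith-timeout    > 10 s     # Pacemaker's default of 60 s already satisfies this
)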
Thanks,

Phil Stedman
Db2 High Availability Development and Support
Email: pmstedma@us.ibm.com

From: Strahil Nikolov <hunter86_bg@yahoo.com>
To: "users@clusterlabs.org" <users@clusterlabs.org>
Date: 09/21/2020 01:41 PM
Subject: [EXTERNAL] Re: [ClusterLabs] SBD fencing not working on my two-node cluster
Sent by: "Users" <users-bounces@clusterlabs.org>

Can you provide (replace sensitive data):

crm configure show
cat /etc/sysconfig/sbd
systemctl status sbd
sbd -d /dev/disk/by-id/scsi-<long_uuid> dump

P.S.: It is very bad practice to use "/dev/sdXYZ", as these names are not persistent. Always use persistent names like those inside "/dev/disk/by-XYZ/ZZZZ". Also, SBD needs at most a 10 MB block device, and yours seems unnecessarily big.

Most probably /dev/sde1 is your problem.
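To make that advice concrete, an illustrative sketch only (the by-id path is the one from the dump output above; applying the change via "crm configure edit" is an assumption about workflow, not a step from this thread):

# in /etc/sysconfig/sbd, reference the stable by-id path instead of /dev/sde1:
SBD_DEVICE="/dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1"

# and repoint the existing primitive (e.g. via "crm configure edit stonith_sbd") to:
primitive stonith_sbd stonith:fence_sbd \
        params devices="/dev/disk/by-id/scsi-36000c292840d37bd13eb6be46d3af4ab-part1"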
Best Regards,
Strahil Nikolov


On Monday, September 21, 2020 at 23:19:47 GMT+3, Philippe M Stedman <pmstedma@us.ibm.com> wrote:


Hi,

I have been following the instructions on the following page to try and configure SBD fencing on my two-node cluster:
https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-storage-protect.html

I am able to get through all the steps successfully. I am using the following device (/dev/sde1) as my shared disk:

Disk /dev/sde: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 43987868-1C0B-41CE-8AF8-C522AB259655

Device     Start      End  Sectors Size Type
/dev/sde1     48 41942991 41942944  20G IBM General Parallel Fs

Since I don't have a hardware watchdog at my disposal, I am using the software watchdog (softdog) instead. Having said this, I am able to get through all the steps: I create the fence agent resource successfully, and it shows as Started in the crm status output:

stonith_sbd     (stonith:fence_sbd):    Started ceha04

The problem is when I run crm node fence ceha04 to test out fencing a host in my cluster. From the crm status output, I see that the reboot action has failed; furthermore, in the system logs, I see the following messages:

Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Requesting fencing (reboot) of node ceha04
Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Client pacemaker-controld.24146.5ff1ac0c wants to fence (reboot) 'ceha04' with device '(any)'
Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Requesting peer fencing (reboot) of ceha04
Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: notice: Couldn't find anyone to fence (reboot) ceha04 with any device
Sep 21 14:12:33 ceha04 pacemaker-fenced[24142]: error: Operation reboot of ceha04 by <no-one> for pacemaker-controld.24146@ceha04.1bad3987: No such device
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Stonith operation 3/1:4317:0:ec560474-96ea-4984-b801-400d11b5b3ae: No such device (-19)
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Stonith operation 3 for ceha04 failed (No such device): aborting transition.
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: warning: No devices found in cluster to fence ceha04, giving up
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Transition 4317 aborted: Stonith failed
Sep 21 14:12:33 ceha04 pacemaker-controld[24146]: notice: Peer ceha04 was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.24146: No such device

I don't know why Pacemaker isn't able to discover my fencing resource. Why isn't it able to find anyone to fence the host from the cluster?
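(One way to narrow this down, a diagnostic sketch using Pacemaker's stonith_admin rather than commands from the original message:

# stonith_admin --list-registered     # devices the fencer has registered
# stonith_admin --list ceha04         # devices the fencer believes can fence ceha04

If the first shows stonith_sbd but the second comes back empty, the fencer does not consider the device eligible to target ceha04.)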
Any help is greatly appreciated. I can provide more details as required.

Thanks,

Phil Stedman
Db2 High Availability Development and Support
Email: pmstedma@us.ibm.com

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/