[ClusterLabs] Re: Trying to Understand crm-fence-peer.sh
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Wed Jan 16 10:07:36 EST 2019
Hi!
I guess we need more logs, especially some events from storage2 from before
the fencing was triggered.
Regards,
Ulrich
>>> "Bryan K. Walton" <bwalton+1546953805 at leepfrog.com> schrieb am 16.01.2019
um
16:03 in Nachricht
<20190116150321.3j2f2upz67ethxox at mygeeto.inside.leepfrog.com>:
> I posted this question on the drbd-user list but didn't receive a
> response.
>
> I'm using DRBD 8.4 with Pacemaker in a two-node cluster, with a
> single primary and fabric fencing.
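>
> For reference, the fencing-related part of the DRBD configuration
> follows the usual 8.4 pattern, roughly like this (a sketch; the
> fencing policy line is an assumption and could equally be
> resource-only):
>
>   resource r0 {
>     disk {
>       # tell DRBD to call the fence-peer handler (assumed policy)
>       fencing resource-and-stonith;
>     }
>     handlers {
>       # places a Pacemaker constraint against the peer
>       fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>       # removes that constraint after a successful resync
>       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
>   }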
>
> Almost all of my STONITH testing has worked as I would expect. I get
> the expected results when I use iptables to sever the replication
> link, when I force a kernel panic, and when I trigger an unclean
> shutdown/reboot with the sysrq trigger (rough commands below). The
> fact that my iptables test results in a fenced node would seem to
> suggest that crm-fence-peer.sh is working as expected.
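>
> Roughly, those tests looked like this (a sketch; port 7789 is an
> assumption, substitute whatever port r0 actually replicates on):
>
>   # sever the replication link (run on the primary)
>   iptables -A INPUT  -p tcp --dport 7789 -j DROP
>   iptables -A OUTPUT -p tcp --dport 7789 -j DROP
>
>   # force a kernel panic
>   echo c > /proc/sysrq-trigger
>
>   # force an unclean reboot, bypassing a clean shutdown
>   echo b > /proc/sysrq-trigger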
>
> However, if I issue a simple reboot command on my current primary
> node (storage1), Pacemaker successfully fails over to the secondary
> node, but the logs on storage2 show the following:
>
> Jan 11 08:49:53 storage2 kernel: drbd r0: helper command: /sbin/drbdadm
> fence-peer r0
> Jan 11 08:49:53 storage2 crm-fence-peer.sh[15594]:
> DRBD_CONF=/etc/drbd.conf DRBD_DONT_WARN_ON_VERSION_MISMATCH=1
> DRBD_MINOR=1 DRBD_PEER=storage1 DRBD_PEERS=storage1
> DRBD_PEER_ADDRESS=192.168.0.2 DRBD_PEER_AF=ipv4 DRBD_RESOURCE=r0
> UP_TO_DATE_NODES='' /usr/lib/drbd/crm-fence-peer.sh
> Jan 11 08:49:53 storage2 crm-fence-peer.sh[15594]: INFO peer is
> reachable, my disk is UpToDate: placed constraint
> 'drbd-fence-by-handler-r0-StorageClusterClone'
> Jan 11 08:49:53 storage2 kernel: drbd r0: helper command: /sbin/drbdadm
> fence-peer r0 exit code 4 (0x400)
> Jan 11 08:49:53 storage2 kernel: drbd r0: fence-peer helper returned 4
> (peer was fenced)
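>
> For what it's worth, the constraint named in that log line is visible
> in the CIB, so the handler's effect can be checked directly:
>
>   cibadmin --query --xpath \
>     "//rsc_location[@id='drbd-fence-by-handler-r0-StorageClusterClone']"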
>
> Exit code 4 would seem to suggest that storage1 should be fenced,
> but the switch ports connected to storage1 are still enabled.
>
> Am I misreading the logs here? Since this is a clean reboot, maybe
> fencing isn't supposed to happen in this situation? But the logs seem
> to suggest otherwise.
>
> Thanks!
> Bryan Walton
>
> --
> Bryan K. Walton 319-337-3877
> Linux Systems Administrator Leepfrog Technologies, Inc
>
> ----- End forwarded message -----
>