[ClusterLabs] Question about fencing

digimer lists at alteeve.ca
Wed Apr 17 17:32:42 EDT 2019


Fencing requires some mechanism, outside the nodes themselves, that can 
terminate the nodes. Typically, IPMI (iLO, iRMC, RSA, DRAC, etc) is used 
for this. Alternatively, switched PDUs are common. If you don't have 
these but do have a watchdog timer on your nodes, SBD (storage-based 
death) can work.
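
For example, if your nodes have a BMC, you can often sanity-check IPMI
access with ipmitool before involving the cluster at all (the address
and credentials below are placeholders for your environment):

# ipmitool -I lanplus -H 192.168.122.201 -U admin -P secret chassis power status
Chassis Power is on

If each node can query its peer's BMC this way, an IPMI fence agent
will almost certainly work too.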

You can use 'fence_<device> <options> -o status' at the command line to 
figure out what will work with your hardware. Once you can call 
'fence_foo ... -o status' and get the status of each node, translating 
that into a pacemaker configuration is pretty simple. That's when you 
enable stonith.
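
As a rough sketch, assuming IPMI hardware (the agent, addresses and
credentials here are placeholders, adjust for your environment):

# fence_ipmilan -a 192.168.122.201 -l admin -p secret -o status
Status: ON

Once that works against each node, the pacemaker side is one stonith
resource per node, something like:

# pcs stonith create fence_one fence_ipmilan pcmk_host_list="one" \
    ipaddr="192.168.122.201" login="admin" passwd="secret"
# pcs stonith create fence_two fence_ipmilan pcmk_host_list="two" \
    ipaddr="192.168.122.202" login="admin" passwd="secret"
# pcs property set stonith-enabled=true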

Once stonith is set up and working in pacemaker (ie: you can crash a 
node and the peer reboots it), then you will go to DRBD and set 
'fencing resource-and-stonith;' (tells DRBD to block on communication 
failure with the peer and request a fence), and then set up the 
'fence-peer /path/to/crm-fence-peer.sh' handler and its unfence 
counterpart (I am going from memory, check the man page to verify 
syntax).
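
From memory as well, the drbd.conf side looks roughly like this with
DRBD 8.4 (the resource name is illustrative; DRBD 9 renames the unfence
handler, so verify against drbd.conf(5) for your version):

resource r0 {
    disk {
        fencing resource-and-stonith;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}

The fence-peer script adds a constraint to the CIB that blocks
promotion of the stale peer; the unfence script removes it once the
resync completes.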

With all this done, if either pacemaker/corosync or DRBD loses contact 
with the peer, it will block and fence. Only after the peer has been 
confirmed terminated will IO resume. This way, split-brains become 
effectively impossible.

digimer

On 2019-04-17 5:17 p.m., JCA wrote:
> Here is what I did:
>
> # pcs stonith create disk_fencing fence_scsi pcmk_host_list="one two" 
> pcmk_monitor_action="metadata" pcmk_reboot_action="off" 
> devices="/dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb" meta 
> provides="unfencing"
>
> where ata-VBOX-... corresponds to the device holding the partition 
> that is shared between both nodes in my cluster. The command 
> completes without any errors (that I can see), and after that I have
>
> # pcs status
> Cluster name: ClusterOne
> Stack: corosync
> Current DC: one (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with 
> quorum
> Last updated: Wed Apr 17 14:35:25 2019
> Last change: Wed Apr 17 14:11:14 2019 by root via cibadmin on one
>
> 2 nodes configured
> 5 resources configured
>
> Online: [ one two ]
>
> Full list of resources:
>
>  MyCluster      (ocf::myapp:myapp-script):      Stopped
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Stopped: [ one two ]
>  DrbdFS         (ocf::heartbeat:Filesystem):    Stopped
>  disk_fencing   (stonith:fence_scsi):           Stopped
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> Things stay that way indefinitely, until I set stonith-enabled to 
> false - at which point all the resources above get started immediately.
>
> Obviously, I am missing something big here. But, what is it?
>
>
> On Wed, Apr 17, 2019 at 2:59 PM Adam Budziński 
> <budzinski.adam at gmail.com> wrote:
>
>     You did not configure any fencing device.
>
>     On Wed, 17.04.2019 at 22:51, JCA <1.41421 at gmail.com> wrote:
>
>         I am trying to get fencing working, as described in the
>         "Clusters from Scratch" guide, and I am stymied at the get-go :-(
>
>         The document mentions a property named stonith-enabled. When I
>         was trying to get my first cluster going, I noticed that my
>         resources would start only when this property is set to false,
>         by means of
>
>             # pcs property set stonith-enabled=false
>
>         Otherwise, all the resources remain stopped.
>
>         I created a fencing resource for the partition that I am
>         sharing across the nodes, by means of DRBD. This works
>         fine - but I still have the same problem as above - i.e. when
>         stonith-enabled is set to true, all the resources get stopped,
>         and remain in that state.
>
>         I am very confused here. Can anybody point me in the right
>         direction out of this conundrum?
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/