[ClusterLabs] Question about fencing

JCA 1.41421 at gmail.com
Wed Apr 17 17:46:09 EDT 2019


Thanks. This implies that I officially do not understand what it is that
fencing can do for me, in my simple cluster. Back to the drawing board.

On Wed, Apr 17, 2019 at 3:33 PM digimer <lists at alteeve.ca> wrote:

> Fencing requires some mechanism, outside the nodes themselves, that can
> terminate the nodes. Typically, IPMI (iLO, iRMC, RSA, DRAC, etc) is used
> for this. Alternatively, switched PDUs are common. If you don't have these
> but do have a watchdog timer on your nodes, SBD (storage-based death) can
> work.
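>
> If you are not sure what you have to work with, a quick check (output
> varies with your distro and which fence-agents packages are installed):
>
> # pcs stonith list     # fence agents pacemaker can use on this node
> # ls /dev/watchdog*    # if a watchdog device shows up, SBD is an option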
>
> You can use 'fence_<device> <options> -o status' at the command line to
> figure out what will work with your hardware. Once you can call
> 'fence_foo ... -o status' and get the status of each node, translating
> that into a pacemaker configuration is pretty simple. That's when you
> enable stonith.
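>
> For example, with IPMI the check and the eventual resource look roughly
> like this; the address, credentials and node name are placeholders for
> whatever your BMCs actually use, and 'pcs stonith describe fence_ipmilan'
> lists the exact parameter names:
>
> # fence_ipmilan --ip=192.168.122.11 --username=admin --password=secret -o status
> # pcs stonith create fence_one fence_ipmilan ipaddr=192.168.122.11 \
>     login=admin passwd=secret pcmk_host_list=one op monitor interval=60s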
>
> Once stonith is set up and working in pacemaker (i.e. you can crash a node
> and the peer reboots it), then you go to DRBD and set 'fencing
> resource-and-stonith;' (this tells DRBD to block on communication failure
> with the peer and request a fence), and then set up the 'fence-handler
> /path/to/crm-fence-peer.sh' and 'unfence-handler
> /path/to/crm-unfence-handler.sh' (I am going from memory, so check the man
> page to verify the exact syntax).
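>
> In the resource file that comes out to something like the sketch below
> (DRBD 8.4-style syntax; drbd.conf(5) calls the options 'fence-peer' and
> 'after-resync-target', the scripts normally ship with drbd-utils under
> /usr/lib/drbd/, and 'r0' is just a placeholder resource name, so adjust
> names and paths to whatever your version actually uses):
>
> resource r0 {
>     disk {
>         fencing resource-and-stonith;
>     }
>     handlers {
>         fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
> }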
>
> With all this done, if either pacemaker/corosync or DRBD loses contact with
> the peer, it will block and fence. Only after the peer has been confirmed
> terminated will I/O resume. This way, split-brain becomes effectively
> impossible.
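>
> To convince yourself the stonith part works before wiring up DRBD, you can
> trigger a fence by hand; the node name here is just an example from your
> cluster, and the crash test needs sysrq enabled:
>
> # pcs stonith fence two          # ask pacemaker to fence node "two"
> # echo c > /proc/sysrq-trigger   # or run this on "two" itself to crash it,
>                                  #  then watch "one" fence and recover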
>
> digimer
> On 2019-04-17 5:17 p.m., JCA wrote:
>
> Here is what I did:
>
> # pcs stonith create disk_fencing fence_scsi pcmk_host_list="one two"
> pcmk_monitor_action="metadata" pcmk_reboot_action="off"
> devices="/dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb" meta
> provides="unfencing"
>
> where ata-VBOX-... corresponds to the device where I have the partition
> that is shared between both nodes in my cluster. The command completes
> without any errors (that I can see) and after that I have
>
> # pcs status
> Cluster name: ClusterOne
> Stack: corosync
> Current DC: one (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with
> quorum
> Last updated: Wed Apr 17 14:35:25 2019
> Last change: Wed Apr 17 14:11:14 2019 by root via cibadmin on one
>
> 2 nodes configured
> 5 resources configured
>
> Online: [ one two ]
>
> Full list of resources:
>
>  MyCluster (ocf::myapp:myapp-script): Stopped
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Stopped: [ one two ]
>  DrbdFS (ocf::heartbeat:Filesystem): Stopped
>  disk_fencing  (stonith:fence_scsi): Stopped
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> Things stay that way indefinitely, until I set stonith-enabled to false -
> at which point all the resources above get started immediately.
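>
> (Side note: fence_scsi depends on the device accepting SCSI-3 persistent
> reservations; sg_persist, from sg3_utils, can show whether this disk does,
> using the same device path as in the stonith resource.)
>
> # sg_persist --in --read-keys --device=/dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb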
>
> Obviously, I am missing something big here. But what is it?
>
>
> On Wed, Apr 17, 2019 at 2:59 PM Adam Budziński <budzinski.adam at gmail.com>
> wrote:
>
>> You did not configure any fencing device.
>>
>> On Wed, 17.04.2019 at 22:51, JCA <1.41421 at gmail.com> wrote:
>>
>>> I am trying to get fencing working, as described in the "Clusters from
>>> Scratch" guide, and I am stymied at the get-go :-(
>>>
>>> The document mentions a property named stonith-enabled. When I was
>>> trying to get my first cluster going, I noticed that my resources would
>>> start only when this property is set to false, by means of
>>>
>>>     # pcs property set stonith-enabled=false
>>>
>>> Otherwise, all the resources remain stopped.
>>>
>>> I created a fencing resource for the partition that I am sharing across
>>> the nodes, by means of DRBD. This works fine - but I still have the
>>> same problem as above - i.e. when stonith-enabled is set to true, all the
>>> resources get stopped, and remain in that state.
>>>
>>> I am very confused here. Can anybody point me in the right direction out
>>> of this conundrum?
>>>
>>>
>>>
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>