[ClusterLabs] Question about fencing

Wed Apr 17 18:30:40 EDT 2019

Thanks. I most assuredly will, but first I have to run some experiments, to
get a feeling for it.

On Wed, Apr 17, 2019 at 3:56 PM digimer <lists at alteeve.ca> wrote:

> Happy to help you understand, just keep asking questions. :)
>
> The point can be explained this way:
>
> * If two nodes can work without coordination, you don't need a cluster,
> just run your services everywhere. If that is not the case, then you
> require coordination. Fencing ensures that a node that has entered an
> unknown state can be forced into a known state (off). In this way, no
> action will be taken by a node unless the peer can be informed, or the peer
> is gone.
>
> The method by which a node is forced into a known state depends on the
> hardware (or infrastructure) you have in your particular setup. So perhaps
> explain what your nodes are built on, and we can assist with more specific
> details.
>
> digimer
> On 2019-04-17 5:46 p.m., JCA wrote:
>
> Thanks. This implies that I officially do not understand what it is that
> fencing can do for me, in my simple cluster. Back to the drawing board.
>
> On Wed, Apr 17, 2019 at 3:33 PM digimer <lists at alteeve.ca> wrote:
>
>> Fencing requires some mechanism, outside the nodes themselves, that can
>> terminate the nodes. Typically, IPMI (iLO, iRMC, RSA, DRAC, etc) is used
>> for this. Alternatively, switched PDUs are common. If you don't have these
>> but do have a watchdog timer on your nodes, SBD (storage-based death) can
>> work.
>>
>> You can use 'fence_<device> <options> -o status' at the command line to
>> figure out what will work with your hardware. Once you can call
>> 'fence_foo ... -o status' and get the status of each node, translating
>> that into a pacemaker configuration is pretty simple. That's when you
>> enable stonith.
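
A minimal sketch of that status check, assuming IPMI-style hardware. The agent (fence_ipmilan), the FENCE_USER and FENCE_PASS variables, their defaults, and the check_node helper are placeholders chosen for illustration; none of them come from this thread. The -a, -l, -p and -o options are the agent's address, login, password and action options; by fence-agent convention, a status query exits 0 when the node is on and 2 when it is off.

```shell
# Sketch only. fence_ipmilan and the credentials below are placeholders,
# not values from this thread.
FENCE_AGENT=${FENCE_AGENT:-fence_ipmilan}
FENCE_USER=${FENCE_USER:-admin}
FENCE_PASS=${FENCE_PASS:-changeme}

# check_node NAME ADDRESS
# Ask the fence device at ADDRESS for the power status of node NAME.
check_node() {
    rc=0
    "$FENCE_AGENT" -a "$2" -l "$FENCE_USER" -p "$FENCE_PASS" -o status \
        >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "$1: fence device answered, power is on" ;;
        2) echo "$1: fence device answered, power is off" ;;
        *) echo "$1: status query failed (exit code $rc)"; return 1 ;;
    esac
}
```

It would be called once per node, for example 'check_node one 192.0.2.11' (a documentation address, not a real one). When every node answers, the same values go into 'pcs stonith create'; the exact parameter names depend on the fence-agents version, and 'pcs stonith describe fence_ipmilan' lists them.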
>>
>> Once stonith is set up and working in pacemaker (ie: you can crash a node
>> and the peer reboots it), then you will go to DRBD and set 'fencing:
>> resource-and-stonith;' (tells DRBD to block on communication failure with
>> the peer and request a fence), and then set up the 'fence-handler
>> /path/to/crm-fence-peer.sh' and 'unfence-handler
>> /path/to/crm-unfence-handler.sh' (I am going from memory, check the man
>> page to verify syntax).
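
For reference, a hedged sketch of how those settings tend to look in a DRBD 8.4 resource file. The resource name r0 and the /usr/lib/drbd paths are assumptions, not taken from this thread. In 8.4 the handler is named fence-peer, and unfencing is usually hooked to after-resync-target; DRBD 9 moves 'fencing' into the net section and uses different script names, so drbd.conf(5) for the installed version is the authority here.

```
# Sketch for DRBD 8.4; "r0" and the script paths are assumptions.
resource r0 {
  disk {
    # Block IO on loss of the peer and call the fence-peer handler.
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```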
>>
>> With all this done, if either pacemaker/corosync or DRBD loses contact
>> with the peer, it will block and fence. Only after the peer has been
>> confirmed terminated will IO resume. This way, split-brain becomes
>> effectively impossible.
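
The block-then-fence rule described here can be written down as a small decision table. This is a conceptual sketch only: next_action and the state names are made up for illustration and are not pacemaker or DRBD identifiers.

```shell
# Conceptual sketch only: the state names are illustrative and are not
# pacemaker or DRBD identifiers.
# next_action PEER_STATE
next_action() {
    case "$1" in
        reachable)       echo "coordinate with peer and continue" ;;
        fence-confirmed) echo "resume IO without the peer" ;;
        *)               echo "block IO and request a fence" ;;
    esac
}
```

Any state other than "peer reachable" or "fence confirmed" falls into the blocking branch, which is the point: an unknown peer state never leads to independent action.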
>>
>> digimer
>> On 2019-04-17 5:17 p.m., JCA wrote:
>>
>> Here is what I did:
>>
>> # pcs stonith create disk_fencing fence_scsi pcmk_host_list="one two"
>> pcmk_monitor_action="metadata" pcmk_reboot_action="off"
>> devices="/dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb" meta
>> provides="unfencing"
>>
>> where ata-VBOX-... corresponds to the device where I have the partition
>> that is shared between both nodes in my cluster. The command completes
>> without any errors (that I can see) and after that I have
>>
>> # pcs status
>> Cluster name: ClusterOne
>> Stack: corosync
>> Current DC: one (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with
>> quorum
>> Last updated: Wed Apr 17 14:35:25 2019
>> Last change: Wed Apr 17 14:11:14 2019 by root via cibadmin on one
>>
>> 2 nodes configured
>> 5 resources configured
>>
>> Online: [ one two ]
>>
>> Full list of resources:
>>
>>  MyCluster (ocf::myapp:myapp-script): Stopped
>>  Master/Slave Set: DrbdDataClone [DrbdData]
>>      Stopped: [ one two ]
>>  DrbdFS (ocf::heartbeat:Filesystem): Stopped
>>  disk_fencing  (stonith:fence_scsi): Stopped
>>
>> Daemon Status:
>>   corosync: active/enabled
>>   pacemaker: active/enabled
>>   pcsd: active/enabled
>>
>> Things stay that way indefinitely, until I set stonith-enabled to false -
>> at which point all the resources above get started immediately.
>>
>> Obviously, I am missing something big here. But, what is it?
>>
>>
>> On Wed, Apr 17, 2019 at 2:59 PM Adam Budziński <budzinski.adam at gmail.com>
>> wrote:
>>
>>> You did not configure any fencing device.
>>>
>>> On Wed, Apr 17, 2019 at 22:51, JCA <1.41421 at gmail.com> wrote:
>>>
>>>> I am trying to get fencing working, as described in the "Clusters from
>>>> Scratch" guide, and I am stymied at the get-go :-(
>>>>
>>>> The document mentions a property named stonith-enabled. When I was
>>>> trying to get my first cluster going, I noticed that my resources would
>>>> start only when this property is set to false, by means of
>>>>
>>>>     # pcs property set stonith-enabled=false
>>>>
>>>> Otherwise, all the resources remain stopped.
>>>>
>>>> I created a fencing resource for the partition that I am sharing across
>>>> the nodes, by means of DRBD. This works fine - but I still have the
>>>> same problem as above - i.e. when stonith-enabled is set to true, all the
>>>> resources get stopped, and remain in that state.
>>>>
>>>> I am very confused here. Can anybody point me in the right direction
>>>> out of this conundrum?
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/