[ClusterLabs] What triggers fencing?

Mon Jul 9 09:56:47 EDT 2018

On 07/09/2018 03:49 PM, Digimer wrote:
> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>> Hi,
>>>
>>> Any ideas what triggers fencing script or stonith?
>>>
>>> Given the setup below:
>>> 1. I have two nodes
>>> 2. Configured fencing on both nodes
>>> 3. Configured delay=15 and delay=30 on fence1 (for Node1) and
>>> fence2 (for Node2) respectively
>>>
>>> *What does it mean to configure a delay in stonith? Does it wait
>>> 15 seconds before it fences the node?
>> Given that a 2-node cluster has no real quorum that would let one
>> partition fence the rest of the nodes, the different delays are meant
>> to prevent a fencing race.
>> Without different delays, both nodes could fence each other at the
>> same time, leaving both of them down.
> Not true; the faster node will kill the slower node first. It is
> possible, through misconfiguration, for both to die, but it's rare
> and easily avoided with a 'delay="15"' set on the fence config for the
> node you want to win.
What exactly is not true? Aren't we saying the same thing?
Of course, one of the delays can be 0 (what matters most is that
they are different).
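To make the race concrete, here is a toy model of a two-node fencing race in Python. It is a sketch under stated assumptions, not Pacemaker's actual scheduling: both nodes lose contact at t=0; `delays[node]` is the delay on the fence device that fences `node` (assuming "fence1 (for Node1)" means the device that fences Node1); the surviving peer waits that long before sending the command; and the hypothetical `op_time` is how long a sent command takes to power the target off. A node that is already off can no longer send its own command.

```python
from typing import Dict, Set


def fencing_race(delays: Dict[str, float], op_time: float) -> Set[str]:
    """Return the survivors of a simulated two-node fencing race (toy model)."""
    a, b = sorted(delays)  # exactly two nodes are assumed
    peer = {a: b, b: a}
    killed_at: Dict[str, float] = {}
    # Process fence commands in the order they would be sent.
    for target in sorted(delays, key=lambda n: delays[n]):
        issuer = peer[target]
        send_time = delays[target]
        # An issuer that is already powered off never sends its command.
        if killed_at.get(issuer, float("inf")) > send_time:
            killed_at[target] = send_time + op_time
    return set(delays) - set(killed_at)


print(fencing_race({"node1": 0, "node2": 0}, op_time=5))    # both fenced
print(fencing_race({"node1": 15, "node2": 0}, op_time=5))   # node1 survives
print(fencing_race({"node1": 15, "node2": 30}, op_time=5))  # node2 survives
print(fencing_race({"node1": 3, "node2": 0}, op_time=5))    # both fenced
```

In this model the node whose fence device has the larger delay survives, so with delay=15 and delay=30 as in the original setup, Node2 would win. The last call suggests why "different" alone may not be enough: if the gap between the delays is shorter than the time the fence action takes, both commands still go out.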

>
> Don't use a delay on the other node; use it only on the node you want
> to survive in such a case.
>
>>> *Given Node1 is active and Node2 goes down, does that mean fence1 will
>>> execute first and shut down Node1 even though it is Node2 that went down?
>> If Node2 managed to sign off properly, it will not.
>> If the network connection is down, so that Node2 can't inform Node1 that
>> it is going down and has finally stopped all resources, Node2 will be
>> fenced by Node1.
>>
>> Regards,
>> Klaus
> Fencing occurs in two cases:
>
> 1. The node stops responding (meaning it's in an unknown state, so it is
> fenced to force it into a known state).
> 2. A resource / service fails to stop. In this case, the service is
> in an unknown state, so the node is fenced to force the service into a
> known state so that it can be safely recovered on the peer.
>
> Graceful withdrawal of the node from the cluster and graceful stopping
> of services will not lead to a fence (because in both cases, the node /
> service is in a known state: off).
>
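The two fencing triggers and the graceful-exit exception can be summarized as a small decision function. This is a hypothetical sketch, not Pacemaker code: `NodeState`, its fields, and `needs_fencing` are illustrative names, and it assumes fencing is enabled in the cluster.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeState:
    """Hypothetical summary of what the surviving peer knows about a node."""
    name: str
    responding: bool        # still visible in cluster membership
    left_gracefully: bool   # signed off after stopping all its resources
    failed_stops: List[str] = field(default_factory=list)  # resources whose stop failed


def needs_fencing(node: NodeState) -> bool:
    # Graceful withdrawal: node and services are in a known state (off).
    if node.left_gracefully:
        return False
    # Case 1: the node stopped responding, so its state is unknown.
    if not node.responding:
        return True
    # Case 2: a resource failed to stop, so that service's state is unknown.
    return bool(node.failed_stops)


# Node2 vanishes without signing off (e.g. the network link is down).
print(needs_fencing(NodeState("node2", responding=False, left_gracefully=False)))
```

This mirrors the Node2 example earlier in the thread: the same "not responding" observation leads to a fence only when the node did not leave gracefully first.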