[ClusterLabs] What triggers fencing?

Mon Jul 9 15:45:48 UTC 2018

On 07/09/2018 05:33 PM, Digimer wrote:
> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
>> On 07/09/2018 03:49 PM, Digimer wrote:
>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>>>> Hi,
>>>>>
>>>>> Any idea what triggers a fencing script or stonith?
>>>>>
>>>>> Given the setup below:
>>>>> 1. I have two nodes
>>>>> 2. Configured fencing on both nodes
>>>>> 3. Configured delay=15 and delay=30 on fence1 (for Node1) and
>>>>> fence2 (for Node2), respectively
>>>>>
>>>>> *What does it mean to configure a delay in stonith? Does it wait for
>>>>> 15 seconds before it fences the node?
>>>> Given that a 2-node cluster has no real quorum that would let one
>>>> partition fence the rest of the nodes, the different delays are meant
>>>> to prevent a fencing race.
>>>> Without different delays, the race would lead to both nodes fencing
>>>> each other at the same time, with both finally down.
>>> Not true; the faster node will kill the slower node first. It is
>>> possible that, through misconfiguration, both could die, but it's rare
>>> and easily avoided with a 'delay="15"' set on the fence config for the
>>> node you want to win.
>> What exactly is not true? Aren't we saying the same thing?
>> Of course, one of the delays can be 0 (what matters most is that
>> they are different).
> Perhaps I misunderstood your message. It seemed to me that the
> implication was that fencing in 2-node without a delay always ends up
> with both nodes being down, which isn't the case. It can happen if the
> fence methods are not set up right (i.e., the node isn't set to power off
> immediately on an ACPI power-button event).
Yes, a misunderstanding, I guess.

I should have been more explicit: because of the time between
the fencing command being sent to the fencing device and the
actual fencing taking place (dependent, as you say, on how it
is configured in detail, but measurable in all cases), there is
a certain probability that when both nodes start fencing at
roughly the same time, we end up with both nodes down.

Everybody has to find their own tradeoff between how reliably
fence races are prevented and how much fencing delay to accept, I guess.
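A toy model can make the race concrete. This is a minimal Python sketch, not Pacemaker's logic, and it rests on assumptions: each node waits for the delay set on the fence device that targets its peer, powering a node off takes a fixed `latency` after the command is sent, and a command already handed to the fence device still completes even if its sender is killed in the meantime. The function and parameter names are hypothetical.

```python
import math


def fence_race(delay_fencing_node1: float, delay_fencing_node2: float,
               latency: float) -> set:
    """Return the nodes still up after both nodes lose sight of each other.

    Hypothetical toy model, not Pacemaker code:
    - delay_fencing_nodeX is the delay set on the fence device that targets
      nodeX, i.e. how long the peer waits before firing at nodeX;
    - latency is the time from sending the command to nodeX being powered off;
    - a command, once sent, completes even if its sender dies meanwhile.
    """
    peer = {"node1": "node2", "node2": "node1"}
    # When each node would send its fence command at its peer.
    fire_at = {"node1": delay_fencing_node2, "node2": delay_fencing_node1}
    killed_at = {"node1": math.inf, "node2": math.inf}

    # Handle the commands in the order their delays expire.
    for node in sorted(fire_at, key=fire_at.get):
        if fire_at[node] < killed_at[node]:  # still alive when its delay expires
            target = peer[node]
            killed_at[target] = min(killed_at[target], fire_at[node] + latency)

    return {n for n, t in killed_at.items() if t == math.inf}


if __name__ == "__main__":
    latency = 2.0  # assumed fencing latency in seconds
    print(sorted(fence_race(0, 0, latency)))    # no delays: []
    print(sorted(fence_race(15, 0, latency)))   # delay only for node1: ['node1']
    print(sorted(fence_race(15, 30, latency)))  # delays from the question: ['node2']
    print(sorted(fence_race(1, 0, latency)))    # delays differ by < latency: []
```

Under these assumptions, the double fence is avoided only when the delays differ by more than the fencing latency. The 15/30 setup from the question lets Node2 win, but only after an extra 15 seconds; 0 on fence1 and 15 on fence2 would pick the same winner without that wait, which matches the point about avoidable fence delays.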

>
> If the delay is set on both nodes, and the values differ, it will work
> fine. The reason not to do this is that if you want 0, you may as well
> not set anything at all (0 is the default), and any other value causes
> avoidable fence delays.
>
>>> Don't use a delay on the other node; set it only for the node you
>>> want to survive in such a case.
>>>
>>>>> *Given that Node1 is active and Node2 goes down, does that mean
>>>>> fence1 will execute first and shut down Node1, even though it is
>>>>> Node2 that went down?
>>>> If Node2 managed to sign off properly, it will not.
>>>> If the network connection is down, so that Node2 can't inform Node1
>>>> that it is going down and has finally stopped all resources, it will
>>>> be fenced by Node1.
>>>>
>>>> Regards,
>>>> Klaus
>>> Fencing occurs in two cases:
>>>
>>> 1. The node stops responding (meaning it's in an unknown state, so it is
>>> fenced to force it into a known state).
>>> 2. A resource / service fails to stop. In this case, the service is
>>> in an unknown state, so the node is fenced to force the service into a
>>> known state so that it can be safely recovered on the peer.
>>>
>>> Graceful withdrawal of the node from the cluster, and graceful stopping
>>> of services will not lead to a fence (because in both cases, the node /
>>> service are in a known state - off).
>>>
>
>
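The two fencing triggers Digimer lists above can be summarized as a small decision function. This is a hypothetical Python sketch of that rule only; the `PeerState` names and the `stop_failed` flag are made up for illustration and are not Pacemaker's internal API.

```python
from enum import Enum


class PeerState(Enum):
    """Hypothetical view one node has of its peer."""
    LEFT_GRACEFULLY = "left"  # peer signed off and stopped its resources
    RESPONDING = "up"         # peer is a healthy cluster member
    UNRESPONSIVE = "lost"     # peer stopped responding; its state is unknown


def should_fence(state: PeerState, stop_failed: bool) -> bool:
    """Return True if the peer must be fenced to force a known state."""
    # Case 1: the node stopped responding, so its state is unknown.
    if state is PeerState.UNRESPONSIVE:
        return True
    # Case 2: a resource failed to stop, so the service state is unknown and
    # it cannot safely be recovered on the peer until the node is fenced.
    if stop_failed:
        return True
    # Graceful departure or clean stops leave node and services in a known state.
    return False


if __name__ == "__main__":
    print(should_fence(PeerState.LEFT_GRACEFULLY, stop_failed=False))  # False
    print(should_fence(PeerState.UNRESPONSIVE, stop_failed=False))     # True
    print(should_fence(PeerState.RESPONDING, stop_failed=True))        # True
```

This also covers the earlier question about Node2 going down: a node that signs off gracefully maps to `LEFT_GRACEFULLY` and is not fenced, while one that disappears because of a network failure maps to `UNRESPONSIVE`.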