[ClusterLabs] Fencing on 2-node cluster

Thu Jun 21 01:35:13 EDT 2018

On 06/21/2018 06:02 AM, Andrei Borzenkov wrote:
> 21.06.2018 01:12, Casey & Gina wrote:
>> Please forgive me, I inadvertently had stonith-enabled=false when
>> I thought I had it set to true.  The fencing/rebooting is now working.
>> However, in light of what you brought up earlier, how do I set a
>> different delay for one of the two hosts in case of a
>> communications failure?
>>
> One possibility is to set pcmk_delay_max. This will cause a random delay
> before executing the stonith action. It leaves some probability that the
> two delays will not be far enough apart. In that case you need to create
> two stonith resources, one for each node, and set delays explicitly for
> each of them.

It is actually all there already in the thread but let me
give a quick summary:

On a cluster with 3 or more nodes, when the cluster splits into 2
partitions, at most one of them is quorate.
Thus there won't be a race between the 2 partitions each trying to
fence the other (you need quorum for that).
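
A minimal sketch of the majority rule behind this, for illustration only.
It is not Corosync's votequorum code; the function name has_quorum and
the assumption of one vote per node are hypothetical.

```python
def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """Majority rule: quorate only with more than half of all votes."""
    return 2 * partition_votes > total_votes


# 3-node cluster split 2/1: only the larger partition is allowed to fence.
print(has_quorum(2, 3), has_quorum(1, 3))
# 2-node cluster split 1/1: by plain majority neither side is quorate.
print(has_quorum(1, 2))
```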

On clusters with just 2 nodes we need a 'trick', enabled with
two_node in Corosync, to be able to raise availability above that
of a single node. With it, both partitions (just one node each)
are quorate, provided the nodes have seen each other once. That
condition is needed so that after e.g. a simultaneous power outage
that leaves the network down, the two nodes don't each come up and
start all services, creating a split-brain situation.
With both partitions quorate, they would immediately start fencing
each other, and with bad luck this race would lead to both
nodes going down.
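
The two_node behaviour described above can be modelled roughly as
follows. This is a simplified sketch, not Corosync's implementation; the
function and parameter names are made up for illustration. The "seen each
other once" condition corresponds, as far as I know, to Corosync's
wait_for_all option, which two_node enables by default.

```python
def two_node_quorate(nodes_visible: int, seen_peer_since_start: bool) -> bool:
    """Simplified two_node rule (assumption: exactly 2 nodes, 1 vote each).

    A node that currently sees its peer is quorate. A node that is alone
    stays quorate only if it has seen the peer at least once since start.
    """
    if nodes_visible == 2:
        return True
    return seen_peer_since_start


# Network split after normal operation: each lone node is still quorate,
# so both are allowed to fence, which is where the race comes from.
print(two_node_quorate(1, True))
# Node booted after a power outage with the network down: not quorate.
print(two_node_quorate(1, False))
```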

To prevent this there are basically 2 approaches.

- If one node somehow has a primary role, you would rather have
  this node survive the race. So you create 2 fencing-resources:
  one fencing the primary-node and one fencing the secondary-node.
  If you make the fencing-resource that fences the primary-node
  trigger with a delay a little longer than the other, a fence-race
  triggered by a network outage (synchronous trigger) will lead to
  the secondary node being fenced before it has had time to fence
  the primary at all.

- If you don't have a node with a distinct primary role you
  can go with a single fencing-resource but with a random delay
  configured, so that the chances that both kill each other
  simultaneously are nearly zero.
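
To make the two approaches concrete, here is a toy model of the fence
race. It is a sketch under stated assumptions, not how Pacemaker
schedules fencing: both nodes start at t=0, a fence request takes a fixed
exec_time to power the target off once issued, and a request that has
already been issued still completes after its sender dies. All names are
hypothetical.

```python
import random


def fence_race(delay_fencing_a: float, delay_fencing_b: float,
               exec_time: float) -> set:
    """Return the survivors when nodes "a" and "b" fence each other.

    delay_fencing_a: delay on the device that fences node "a"
    delay_fencing_b: delay on the device that fences node "b"
    exec_time: assumed time for an issued fence request to take effect
    """
    # b is off at delay_fencing_b + exec_time. b issues its own request
    # only after waiting delay_fencing_a, so if b is off by then, a lives.
    if delay_fencing_b + exec_time <= delay_fencing_a:
        return {"a"}
    if delay_fencing_a + exec_time <= delay_fencing_b:
        return {"b"}
    return set()  # both requests were issued in time: both nodes go down


def mutual_kill_rate(max_delay: float, exec_time: float,
                     trials: int, seed: int) -> float:
    """Estimate how often both nodes die with independent random delays."""
    rng = random.Random(seed)
    both_down = 0
    for _ in range(trials):
        survivors = fence_race(rng.uniform(0.0, max_delay),
                               rng.uniform(0.0, max_delay), exec_time)
        if not survivors:
            both_down += 1
    return both_down / trials


# Approach 1: the longer delay sits on the device that fences "a",
# so "a" (the preferred node) survives.
print(fence_race(delay_fencing_a=10.0, delay_fencing_b=0.0, exec_time=2.0))
# No delays at all: both nodes go down.
print(fence_race(0.0, 0.0, exec_time=2.0))
# Approach 2: random delays; the residual risk is estimated by sampling.
print(mutual_kill_rate(max_delay=30.0, exec_time=2.0, trials=10000, seed=1))
```

In this model the node whose fencing device carries the longer delay
survives, provided the difference exceeds exec_time. With random delays
the residual risk depends on how large the maximum delay is compared to
exec_time, so it shrinks as the delay range grows.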

To make a fencing-resource trigger with a delay you can
either use the delay mechanisms built into the fence-agents
or you can use the 2 Pacemaker attributes pcmk_delay_max and
pcmk_delay_base to configure the delay.

pcmk_delay_base defines a minimum delay while pcmk_delay_max
defines a maximum delay time. The actual delay is chosen
randomly somewhere within this range.
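
A rough sketch of how such a delay could be picked from the two values.
This illustrates the description above and is not Pacemaker's code; the
function name is hypothetical, and treating a maximum that is not larger
than the base as a fixed delay is an assumption of this sketch.

```python
import random


def pick_fence_delay(delay_base: float, delay_max: float,
                     rng: random.Random) -> float:
    """Pick a delay between delay_base and delay_max (in seconds)."""
    if delay_max <= delay_base:
        # Assumption: no usable random range, so the delay is just the base.
        return delay_base
    # Static part plus a random part, kept within the maximum.
    return delay_base + rng.uniform(0.0, delay_max - delay_base)


rng = random.Random(0)
# Random delay somewhere between 5 and 15 seconds.
print(pick_fence_delay(5.0, 15.0, rng))
# Only a base delay: always exactly 5 seconds.
print(pick_fence_delay(5.0, 0.0, rng))
```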

Klaus