[ClusterLabs] Two-node Pacemaker cluster with "fence_aws" fence agent

Fri Sep 4 17:24:00 EDT 2020

On 2020-09-04 5:15 p.m., Philippe M Stedman wrote:
> Hi ClusterLabs development,
> 
> I am in the process of deploying a two-node cluster on AWS and using the
> fence_aws fence agent for fencing. I was reading through the following
> article about common pitfalls in configuring two-node Pacemaker clusters:
> https://www.thegeekdiary.com/most-common-two-node-pacemaker-cluster-issues-and-their-workarounds/
> 
> and the only concern I have is regarding the fencing device. If I read
> this correctly, there is no need to configure delayed fencing if the
> fence device can guarantee serialized access. My question here is: does
> the fence_aws agent guarantee serialized access? In the event of a loss
> of communication between the two cluster nodes, can I guarantee that one
> host will win the race to fence the other, and that I won't end up in a
> situation where both hosts get fenced?
> 
> Do I need to implement delayed fencing with the fence_aws agent or not?
> I appreciate any feedback.
> 
> Thanks,
> 
> *Phil Stedman*
> Db2 High Availability Development and Support
> Email: pmstedma at us.ibm.com

It would depend on AWS, and I don't believe it's a good idea to design a
solution that depends on a third party's behaviour.

There's another aspect of fence delays to consider as well: they also
help ensure that the best node survives, not just that one of them does.
So say your DB is running on node 1, you want to preferentially fence
node 2. If, later, your DB moves to node 2, then you want to reconfigure
your stonith devices to preferentially fence node 1.

The delay parameter tells the agent to wait N seconds before fencing the
associated node. So if your DB is on node 1, you would set the stonith
device configuration that terminates node 1 to have, say, 'delay="15"'.
This way, node 2 looks up how to fence node 1, sees the delay, and
sleeps. Node 1 looks up how to fence node 2, sees no delay, and fences
immediately. Node 2 is dead before its sleep ends. So in a comms break
where both nodes are otherwise OK, node 1, the service host, is the one
that lives.
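
To make the timing concrete, here is a rough Python sketch of that race.
It is a toy model, not Pacemaker or fence_aws code: the function names
(delays_for, fence_race) and the node names are made up for illustration,
and it assumes both nodes start fencing at the same instant and that a
fence call completes the moment its delay expires.

```python
def delays_for(service_node, nodes, delay=15):
    """Map each node to the delay on the stonith device that fences it.

    Hypothetical helper: the node hosting the service gets the delay,
    so its peer has to wait before shooting it. Every other node gets 0.
    """
    return {n: (delay if n == service_node else 0) for n in nodes}


def fence_race(delays):
    """Return the survivor of a two-node fence race, or None on a tie.

    `delays` maps a node name to the delay (seconds) configured on the
    device that fences *that* node.
    """
    (a, delay_a), (b, delay_b) = delays.items()
    if delay_a == delay_b:
        # Neither node is favoured; the outcome is left to chance.
        return None
    # The node protected by the longer delay is shot later, so it gets
    # to fence its peer first and survives.
    return a if delay_a > delay_b else b


nodes = ["node1", "node2"]

# DB on node1: the device that fences node1 carries the delay.
print(fence_race(delays_for("node1", nodes)))  # node1

# DB moved to node2: the delay moves with it.
print(fence_race(delays_for("node2", nodes)))  # node2
```

With equal delays on both devices the sketch returns None, which is the
case where nothing favours either node and you are back to depending on
how the fence device itself behaves.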

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould