[ClusterLabs] Two-node Pacemaker cluster with "fence_aws" fence agent

Mon Sep 7 05:26:01 EDT 2020

On 9/4/20 11:24 PM, Digimer wrote:
> On 2020-09-04 5:15 p.m., Philippe M Stedman wrote:
>> Hi ClusterLabs development,
>>
>> I am in the process of deploying a two-node cluster on AWS and using the
>> fence_aws fence agent for fencing. I was reading through the following
>> article about common pitfalls in configuring two-node Pacemaker clusters:
>> https://www.thegeekdiary.com/most-common-two-node-pacemaker-cluster-issues-and-their-workarounds/
>>
>> and the only concern I have is regarding the fencing device. If I read
>> this correctly, there is no need to configure delayed fencing if the
>> fence device can guarantee serialized access. My question here is: does
>> the fence_aws agent guarantee serialized access? In the event of a loss
>> of communication between the two cluster nodes, can I guarantee that one
>> host will win the race to fence the other and I won't end up in a
>> situation where both hosts get fenced?
>>
>> Do I need to implement delayed fencing with the fence_aws agent or not?
>> I appreciate any feedback.
>>
>> Thanks,
>>
>> *Phil Stedman*
>> Db2 High Availability Development and Support
>> Email: pmstedma at us.ibm.com
> It would depend on AWS, and I don't believe it's a good idea to design a
> solution that depends on a third party's behaviour.
>
> There's another aspect of fence delays to consider as well: they also
> help ensure that the best node survives, not just that one of them does.
> So say your DB is running on node 1, you want to preferentially fence
> node 2. If, later, your DB moves to node 2, then you want to reconfigure
> your stonith devices to preferentially fence node 1.
>
> The delay parameter tells the agent to wait N seconds before fencing the
> associated node. So if your DB is on node 1, you would set the stonith
> device configuration that terminates node 1 to have, say, 'delay="15"'.
> This way, node 2 looks up how to fence node 1, sees the delay, and
> sleeps. Node 1 looks up how to fence node 2, sees no delay, and fences
> immediately. Node 2 is dead before the sleep exits, ensuring that in a
> comms break where both nodes are otherwise OK, node 1, the service
> host, survives.
>
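The timing argument above can be sketched as a toy model in Python. This is not fence_aws or Pacemaker code, and it rests on assumptions: both nodes start their fencing attempt at the moment of the comms break, each sleeps for the delay configured on the device that targets its peer, and a fencing action takes a fixed 5 seconds (`fence_duration`, a made-up figure) to complete. `StonithDevice`, `configure_delays` and `race` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StonithDevice:
    """Toy stand-in for a stonith resource: the node it kills and its delay."""
    target: str
    delay: float = 0.0  # seconds the agent sleeps before acting


def configure_delays(nodes, service_host, delay=15.0):
    """Put the delay on the device that fences the service host, so that
    requests to kill the service host are the ones that have to wait."""
    return {n: StonithDevice(n, delay if n == service_host else 0.0) for n in nodes}


def race(devices, fence_duration=5.0):
    """Both nodes lose contact at t=0 and try to fence each other.

    Each node sleeps for the delay of the device that targets its peer,
    then issues its fence request, which takes `fence_duration` seconds
    (assumed > 0) to complete. A node that is already dead when its sleep
    ends never sends its request. Returns the sorted list of survivors.
    """
    a, b = devices  # the two node names (dict keys)
    peer = {a: b, b: a}
    issue_at = {n: devices[peer[n]].delay for n in (a, b)}
    # When n would die from its peer's request. The earliest issuer always
    # gets its request out, so this is safe to compute for both nodes.
    killed_at = {n: issue_at[peer[n]] + fence_duration for n in (a, b)}
    sent = {n: issue_at[n] < killed_at[n] for n in (a, b)}
    return sorted(n for n in (a, b) if not sent[peer[n]])


nodes = ["node1", "node2"]
print(race(configure_delays(nodes, "node1")))             # DB on node 1
print(race(configure_delays(nodes, "node2")))             # DB moved to node 2
print(race(configure_delays(nodes, "node1", delay=0.0)))  # no delay at all
```

In this model node 1 survives while it hosts the DB, and calling `configure_delays` again after the DB moves flips the outcome to node 2. With no delay, or with a delay shorter than the assumed fencing time, both requests go out and no node survives, which is the outcome to fear if the cloud API executes both stop requests.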
Just as a note to the above, I wanted to mention two approaches
to automatically give some preference to the 'better' node
in these fencing races:

- priority-fencing-delay - introduced by Yan Gao earlier this year
    Optionally derive the priority of a node from the
    resource-priorities of the resources it is running.
    In a fencing-race the node with the highest priority
    has a certain advantage over the others as fencing requests
    for that node are executed with an additional delay.

- fence_heuristics_ping
    Not really a fencing agent by itself!
    Put it on the same fencing level as the actual fencing agent for
    your node, so that actual fencing depends on the node's own
    connectivity, as determined using ping heuristics.

    Btw. still waiting for feedback on the basic idea, and for
    contributions picking up the idea and taking into account
    other aspects that might make a node the 'better' node ;-)

Klaus