[ClusterLabs] 8 node cluster

Antony Stone Antony.Stone at ha.open.source.it
Tue Sep 7 14:08:23 EDT 2021


On Tuesday 07 September 2021 at 19:37:33, M N S H SNGHL wrote:

> I am looking for some suggestions here. I have created an 8 node HA cluster
> on my SuSE hosts.

An even number of nodes is never a good idea.

> 1) The resources should work fine even if 7 nodes go down, which means
> surviving node should still be running the resources.

> I did set "last_man_standing (and last_man_standing_window) option, with
> ATB .. but it didn't really work or didn't dynamically reduce the expected
> votes.

What do the log files (especially on that "last man") tell you happened as you 
gradually reduced the number of nodes online?
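
For reference, here is a rough Python model of how last_man_standing is meant to behave. It is a sketch under stated assumptions, not corosync's actual logic: the function name `simulate_lms` is made up for illustration, every node is assumed to carry one vote, and each step in `failures` is assumed to be separated from the next by more than last_man_standing_window, so expected votes can be recalculated in between.

```python
def simulate_lms(total_nodes, failures, auto_tie_breaker=False):
    """Simplified model of votequorum's last_man_standing (illustrative only).

    `failures` lists how many nodes are lost at each step. Steps are assumed
    to be further apart than last_man_standing_window.
    Returns one (alive, expected_votes, quorate) tuple per step, where
    expected_votes is the value in force when the step was evaluated.
    """
    expected = total_nodes
    alive = total_nodes
    history = []
    for lost in failures:
        alive -= lost
        quorum = expected // 2 + 1          # strict majority of expected votes
        quorate = alive >= quorum
        # Assumption: with auto_tie_breaker an exact 50% partition stays
        # quorate, as if it were the half holding the tie-breaker node.
        if not quorate and auto_tie_breaker and alive > 0 and alive * 2 == expected:
            quorate = True
        history.append((alive, expected, quorate))
        if not quorate:
            break                           # no recalculation once quorum is lost
        expected = alive                    # recalculated after the window
    return history


if __name__ == "__main__":
    # Nodes drop out one at a time, with ATB enabled.
    for step in simulate_lms(8, [1] * 7, auto_tie_breaker=True):
        print(step)
    # Seven nodes vanish at once: the survivor is not quorate.
    print(simulate_lms(8, [7], auto_tie_breaker=True))
```

In this model the last node stays quorate only if the other seven leave gradually; losing seven at once drops the survivor straight below the majority of 8, which may be what the logs will show.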

> 2) Another requirement is - If all nodes in the cluster go down, and just
> one (anyone) comes back up, it should pick up the resources and should run
> them.

So, how should this one node realise that it is the only node awake and should 
be running the resources, and that there aren't {1..7} other nodes somewhere 
else on the network, all in the same situation, thinking "I can't connect to 
anyone else, but I'm alive, so I'll take on the resources"?

> I tried setting ignore-quorum-policy to ignore, and which worked most of
> the time... (yet to find the case where it doesn't work).. but I am
> suspecting, wouldn't this setting cause split-brain in some cases?

I think you're taking the wrong approach to HA.  Some number of nodes (plural) 
need to be in communication with each other in order for them to decide 
whether they have quorum or not, and can decide to be in charge of the 
resources.

Two basic rules of HA:

1. One node on its own has no clue whatever else is going on with the rest of 
the cluster, and therefore cannot decide to take charge

2. Quorum (unless you override it and really know what you're doing) requires 
>50% of nodes to be in agreement, and an even number of nodes can split into 
50:50, where neither half (literally) is >50%, so everything stops.  If both 
halves carried on regardless, each running the resources, that would be 
"split brain".

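The two rules above can be checked with a few lines of Python. This is only a sketch: it assumes one vote per node, and the function names are invented for illustration. The `ignore_quorum` flag stands in for telling the cluster to ignore quorum (in Pacemaker the cluster property is called no-quorum-policy).

```python
def has_quorum(partition_size, total_nodes):
    """True if a partition holds strictly more than half of all votes."""
    return partition_size * 2 > total_nodes


def partitions_running_resources(partition_sizes, ignore_quorum=False):
    """Return the sizes of the partitions that would run the resources.

    Each entry in `partition_sizes` is a group of nodes that can talk to
    each other but not to the other groups.
    """
    total = sum(partition_sizes)
    if ignore_quorum:
        # Every partition carries on regardless of what the others do.
        return list(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, total)]


if __name__ == "__main__":
    print(partitions_running_resources([4, 4]))        # 8 nodes, even split
    print(partitions_running_resources([4, 3]))        # 7 nodes
    print(partitions_running_resources([4, 4], True))  # quorum ignored
```

With 8 nodes split 4:4 nobody runs the resources; with 7 nodes split 4:3 exactly one side does; with quorum ignored both halves of the 4:4 split run them at once.
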
I have two questions:

 - why do you feel you need as many as 8 nodes when the resources will only be 
running on one node?

 - why do you specifically want 8 nodes instead of 7 or 9?


Antony.

-- 
The Royal Society for the Prevention of Cruelty to Animals was formed in 1824.
The National Society for the Prevention of Cruelty to Children was not formed 
until 1884.
That says something about the British.

                                                   Please reply to the list;
                                                         please *don't* CC me.

