[ClusterLabs] Sub‑clusters / super‑clusters - working :)

Antony Stone Antony.Stone at ha.open.source.it
Thu Aug 5 08:44:39 EDT 2021


On Thursday 05 August 2021 at 10:51:37, Antony Stone wrote:

> On Thursday 05 August 2021 at 07:48:37, Ulrich Windl wrote:
> > 
> > Have you ever tried to find out why this happens? (Talking about logs)
> 
> Not in detail, no, but just in case there's a chance of getting this
> working as suggested simply using location constraints, I shall look
> further.

I now have a working solution - thank you to everyone who has helped.

The answer to the problem above was simple: with a 6-node cluster, 3 votes is not 
a quorum - quorum needs a strict majority, i.e. at least 4 votes out of 6.

I added a 7th node (in "city C"), so that any 4 of the 7 nodes now form a quorum, 
and adjusted the location constraints to ensure that cluster A resources run only 
in city A, cluster B resources run only in city B, and the "anywhere" resource 
runs in either city A or city B (but never in city C).

I've even added a colocation constraint to ensure that the "anywhere" resource 
runs on the same machine as whichever city's local resources are running (which 
wasn't a strict requirement, but is very useful).

For anyone interested in the detail of how to do this (without needing booth), 
here is my cluster.conf file, as in "crm configure load replace cluster.conf":

--------
node tom attributes site=cityA
node dick attributes site=cityA
node harry attributes site=cityA

node fred attributes site=cityB
node george attributes site=cityB
node ron attributes site=cityB

primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 \
        meta migration-threshold=3 failure-timeout=60 \
        op monitor interval=5 timeout=20 on-fail=restart
primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 \
        meta migration-threshold=3 failure-timeout=60 \
        op monitor interval=5 timeout=20 on-fail=restart
primitive Asterisk asterisk \
        meta migration-threshold=3 failure-timeout=60 \
        op monitor interval=5 timeout=20 on-fail=restart

group GroupA A-float meta resource-stickiness=100
group GroupB B-float meta resource-stickiness=100
group Anywhere Asterisk meta resource-stickiness=100

location pref_A GroupA rule -inf: site ne cityA
location pref_B GroupB rule -inf: site ne cityB
location no_pref Anywhere rule -inf: site ne cityA and site ne cityB

colocation Ast 100: Anywhere [ GroupA GroupB ]

property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop \
        start-failure-is-fatal=false cluster-recheck-interval=60s
--------
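
(The 7th node in city C is not shown above - I haven't given its name here, but 
it is simply one more node line with its own site value, something like this, 
with a placeholder name:

node quorum-c attributes site=cityC

Since every location rule above scores -inf for a node whose site is neither 
cityA nor cityB, no resources can ever run on it; it only contributes a quorum 
vote.)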

Of course, the group definitions are not needed for single resources, but in 
practice I shall be using multiple resources which do need groups, so I wanted to 
be sure I was creating something which would work that way too - see the sketch 
just below.
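
For example, if I add a web server to the city A group, it would look something 
like this (the "A-web" primitive is just a dummy name for illustration, not part 
of my real config):

primitive A-web apache \
        meta migration-threshold=3 failure-timeout=60 \
        op monitor interval=5 timeout=20 on-fail=restart
group GroupA A-float A-web meta resource-stickiness=100

The location and colocation constraints stay exactly the same, because they refer 
to the groups, not to their member resources.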

I have tested it by the following (a crmsh shortcut for repeating similar checks 
is sketched after the list):

 - bringing up one node at a time: as soon as any 4 nodes are running, all 
possible resources are running

 - bringing up 5 or more nodes: all resources run

 - taking down one node at a time, up to a maximum of three nodes offline: as 
long as at least one node in a given city is still running, that city's resources 
keep running

 - turning off all three nodes in a city simultaneously (using "halt", so that 
corosync dies nicely): that city's resources stop running, while the other city 
keeps working, as does the "anywhere" resource

 - causing a network failure at one city (so it simply disappears without 
stopping corosync neatly): the other city keeps running its resources (plus the 
"anywhere" resource), while the isolated city stops its own

For me, this is the solution I wanted, and in fact it's even slightly better 
than the previous two isolated 3-node clusters I had, because I can now have 
resources running on a single active node in cityA (provided it can see at 
least 3 other nodes in cityB or cityC), which wasn't possible before.


Once again, thanks to everyone who has helped me to achieve this result :)


Antony.

-- 
"The future is already here.   It's just not evenly distributed yet."

 - William Gibson

                                                   Please reply to the list;
                                                         please *don't* CC me.

