[ClusterLabs] Sub-clusters / super-clusters - working :)
Antony Stone
Antony.Stone at ha.open.source.it
Thu Aug 5 08:44:39 EDT 2021
On Thursday 05 August 2021 at 10:51:37, Antony Stone wrote:
> On Thursday 05 August 2021 at 07:48:37, Ulrich Windl wrote:
> >
> > Have you ever tried to find out why this happens? (Talking about logs)
>
> Not in detail, no, but just in case there's a chance of getting this
> working as suggested simply using location constraints, I shall look
> further.
I now have a working solution - thank you to everyone who has helped.
The answer to the problem above was simple - with a 6-node cluster, 3 votes is
not quorum.
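(In case it helps anyone else: as I understand votequorum, quorum is floor(N/2)+1
votes. With 6 nodes that is 4, so a 3-3 split between the cities leaves neither
side quorate and no-quorum-policy=stop shuts everything down; with 7 nodes quorum
is still 4, so either city's 3 nodes plus the city C node can carry on.)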
I added a 7th node (in "city C") and adjusted the location constraints to
ensure that cluster A resources run in city A, cluster B resources run in city
B, and the "anywhere" resource runs in either city A or city B.
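For completeness, the corosync side of the extra node is just one more nodelist
entry plus votequorum; a minimal sketch (node name, address and nodeid here are
placeholders, not my real ones):
--------
nodelist {
    # ... the six existing nodes in city A and city B ...
    node {
        ring0_addr: nodeC.example.net
        nodeid: 7
    }
}
quorum {
    provider: corosync_votequorum
}
--------
Depending on how the location rules below evaluate on a node with no "site"
attribute at all, it may also be worth giving the city C node an explicit
site=cityC attribute, so that the -inf rules definitely keep every resource
off it.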
I've even added a colocation constraint to ensure that the "anywhere" resource
runs on the same machine as the local resources in whichever of city A or
city B it lands (which wasn't a strict requirement, but is very useful).
For anyone interested in the detail of how to do this (without needing booth),
here is my cluster.conf file, as in "crm configure load replace cluster.conf":
--------
node tom attributes site=cityA
node dick attributes site=cityA
node harry attributes site=cityA
node fred attributes site=cityB
node george attributes site=cityB
node ron attributes site=cityB
primitive A-float IPaddr2 \
    params ip=192.168.32.250 cidr_netmask=24 \
    meta migration-threshold=3 failure-timeout=60 \
    op monitor interval=5 timeout=20 on-fail=restart
primitive B-float IPaddr2 \
    params ip=192.168.42.250 cidr_netmask=24 \
    meta migration-threshold=3 failure-timeout=60 \
    op monitor interval=5 timeout=20 on-fail=restart
primitive Asterisk asterisk \
    meta migration-threshold=3 failure-timeout=60 \
    op monitor interval=5 timeout=20 on-fail=restart
group GroupA A-float meta resource-stickiness=100
group GroupB B-float meta resource-stickiness=100
group Anywhere Asterisk meta resource-stickiness=100
location pref_A GroupA rule -inf: site ne cityA
location pref_B GroupB rule -inf: site ne cityB
location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
colocation Ast 100: Anywhere [ GroupA GroupB ]
property cib-bootstrap-options: \
    stonith-enabled=no \
    no-quorum-policy=stop \
    start-failure-is-fatal=false \
    cluster-recheck-interval=60s
--------
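If you want to try something similar, the load-and-check sequence I would expect
to work is roughly (crmsh/pacemaker commands, file name as above):
--------
crm configure load replace cluster.conf   # load the configuration shown above
crm configure verify                      # let crmsh sanity-check it
crm_verify -LV                            # ask pacemaker to verify the live CIB
crm_mon -1                                # one-shot view of where things ended up
--------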
Of course, the group definitions are not needed for single resources, but I
shall in practice be using multiple resources which do need groups, so I
wanted to ensure I was creating something which would work with that.
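For example (the second resource here is purely illustrative - a Dummy agent -
not something I actually run), a multi-resource group just grows in place:
--------
primitive A-example Dummy meta migration-threshold=3 failure-timeout=60
group GroupA A-float A-example meta resource-stickiness=100
--------
Group members start in the order listed and stop in reverse, which is usually
what you want for an address plus the services bound to it.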
I have tested it by doing the following (a rough sketch of the commands involved
appears after the list):
- bringing up one node at a time: as soon as any 4 nodes are running, all
possible resources are running
- bringing up 5 or more nodes: all resources run
- taking down one node at a time to a maximum of three nodes offline: if at
least one node in a given city is running, the resources at that city are
running
- turning off (using "halt", so that corosync dies nicely) all three nodes in
a city simultaneously: that city's resources stop running, the other city
continues working, as well as the "anywhere" resource
- causing a network failure at one city (so it simply disappears without
stopping corosync neatly): the other city continues its resources (plus the
"anywhere" resource), the isolated city stops
For me, this is the solution I wanted, and in fact it's even slightly better
than the previous two isolated 3-node clusters I had, because I can now have
resources running on a single active node in cityA (provided it can see at
least 3 other nodes in cityB or cityC), which wasn't possible before.
Once again, thanks to everyone who has helped me to achieve this result :)
Antony.
--
"The future is already here. It's just not evenly distributed yet."
- William Gibson
Please reply to the list;
please *don't* CC me.