[ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters - working :)

Thu Aug 5 09:44:18 EDT 2021

Hi!

Nice to hear. What could be "interesting" is how stable corosync
communication is over a WAN-type link.
If it's not that stable, the cluster could try to fence nodes rather
frequently. OK, you disabled fencing; maybe it works without.
Did you tune the parameters?

Regards,
Ulrich

>>> Antony Stone <Antony.Stone at ha.open.source.it> wrote on 05.08.2021 at
14:44 in
message <202108051444.39919.Antony.Stone at ha.open.source.it>:
> On Thursday 05 August 2021 at 10:51:37, Antony Stone wrote:
> 
>> On Thursday 05 August 2021 at 07:48:37, Ulrich Windl wrote:
>> > 
>> > Have you ever tried to find out why this happens? (Talking about logs)
>> 
>> Not in detail, no, but just in case there's a chance of getting this
>> working as suggested simply using location constraints, I shall look
>> further.
> 
> I now have a working solution ‑ thank you to everyone who has helped.
> 
> The answer to the problem above was simple - with a 6-node cluster, 3 votes
> is not quorum.
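
The arithmetic behind that is a strict-majority rule. A minimal sketch in
Python, assuming one vote per node and plain majority quorum with no special
quorum options; the function names are illustrative only:

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the votes present form a strict majority of all votes."""
    return votes_present >= quorum_threshold(total_votes)


if __name__ == "__main__":
    for total in (6, 7):
        # Both a 6-node and a 7-node cluster need 4 votes.
        print(total, "nodes need", quorum_threshold(total), "votes")
    print("3 of 6 quorate:", has_quorum(3, 6))  # False
    print("4 of 7 quorate:", has_quorum(4, 7))  # True
```

With 6 nodes an even 3/3 split leaves neither half quorate; adding a 7th vote
keeps the threshold at 4 but makes an even split impossible.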
> 
> I added a 7th node (in "city C") and adjusted the location constraints to
> ensure that cluster A resources run in city A, cluster B resources run in
> city B, and the "anywhere" resource runs in either city A or city B.
> 
> I've even added a colocation constraint to ensure that the "anywhere"
> resource runs on the same machine in either city A or city B as is running
> the local resources there (which wasn't a strict requirement, but is very
> useful).
> 
> For anyone interested in the detail of how to do this (without needing
> booth), here is my cluster.conf file, as in "crm configure load replace
> cluster.conf":
> 
> --------
> node tom attribute site=cityA
> node dick attribute site=cityA
> node harry attribute site=cityA
> 
> node fred attribute site=cityB
> node george attribute site=cityB
> node ron attribute site=cityB
> 
> primitive A-float IPaddr2 \
>         params ip=192.168.32.250 cidr_netmask=24 \
>         meta migration-threshold=3 failure-timeout=60 \
>         op monitor interval=5 timeout=20 on-fail=restart
> primitive B-float IPaddr2 \
>         params ip=192.168.42.250 cidr_netmask=24 \
>         meta migration-threshold=3 failure-timeout=60 \
>         op monitor interval=5 timeout=20 on-fail=restart
> primitive Asterisk asterisk \
>         meta migration-threshold=3 failure-timeout=60 \
>         op monitor interval=5 timeout=20 on-fail=restart
> 
> group GroupA A-float4 resource-stickiness=100
> group GroupB B-float4 resource-stickiness=100
> group Anywhere Asterisk resource-stickiness=100
> 
> location pref_A GroupA rule -inf: site ne cityA
> location pref_B GroupB rule -inf: site ne cityB
> location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
> 
> colocation Ast 100: Anywhere [ cityA cityB ]
> 
> property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop \
>         start-failure-is-fatal=false cluster-recheck-interval=60s
> --------
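
For readers who want to see which nodes the three location rules leave
eligible, here is a small Python model of them. It is only a sketch: it treats
each "-inf: site ne X" rule as a plain string comparison on the node's site
attribute, ignores stickiness and the colocation constraint, and assumes a
seventh node named node7 with site=cityC, since the posted file does not show
how that node was defined.

```python
# Node -> value of its "site" attribute. node7/cityC is an assumption;
# the 7th node does not appear in the posted configuration.
NODES = {
    "tom": "cityA", "dick": "cityA", "harry": "cityA",
    "fred": "cityB", "george": "cityB", "ron": "cityB",
    "node7": "cityC",
}

# Each rule returns True when the node gets a -inf score (is banned).
BANNED = {
    "GroupA": lambda site: site != "cityA",
    "GroupB": lambda site: site != "cityB",
    "Anywhere": lambda site: site != "cityA" and site != "cityB",
}


def allowed_nodes(resource: str) -> list[str]:
    """Nodes on which the given resource is not banned by its location rule."""
    banned = BANNED[resource]
    return sorted(name for name, site in NODES.items() if not banned(site))


if __name__ == "__main__":
    for resource in BANNED:
        print(resource, "->", allowed_nodes(resource))
```

Under these assumptions node7 is eligible for nothing, so it contributes only
its vote.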
> 
> Of course, the group definitions are not needed for single resources, but I
> shall in practice be using multiple resources which do need groups, so I
> wanted to ensure I was creating something which would work with that.
> 
> I have tested it by:
> 
>  - bringing up one node at a time: as soon as any 4 nodes are running, all
> possible resources are running
> 
>  - bringing up 5 or more nodes: all resources run
> 
>  - taking down one node at a time to a maximum of three nodes offline: if at
> least one node in a given city is running, the resources at that city are
> running
> 
>  - turning off (using "halt", so that corosync dies nicely) all three nodes
> in a city simultaneously: that city's resources stop running, the other city
> continues working, as well as the "anywhere" resource
> 
>  - causing a network failure at one city (so it simply disappears without
> stopping corosync neatly): the other city continues its resources (plus the
> "anywhere" resource), the isolated city stops
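
The outcomes in the list above follow from two conditions: the partition must
hold a majority of the 7 votes (otherwise no-quorum-policy=stop applies), and a
city's resources need at least one online node in that city. A rough Python
model, assuming one vote per node and a hypothetical seventh node called node7
in city C; it is not how Pacemaker itself computes placement:

```python
# Node -> site. node7/cityC is an assumed name for the 7th node.
SITE_OF = {
    "tom": "cityA", "dick": "cityA", "harry": "cityA",
    "fred": "cityB", "george": "cityB", "ron": "cityB",
    "node7": "cityC",
}
GROUP_FOR_CITY = {"cityA": "GroupA", "cityB": "GroupB"}


def running(online: set[str]) -> set[str]:
    """Resource groups that can run in a partition made of the online nodes."""
    quorum = len(SITE_OF) // 2 + 1  # 4 of 7 votes
    if len(online) < quorum:
        return set()  # no quorum: everything in this partition stops
    cities = {SITE_OF[node] for node in online} & set(GROUP_FOR_CITY)
    groups = {GROUP_FOR_CITY[city] for city in cities}
    if cities:
        groups.add("Anywhere")  # may run in either city A or city B
    return groups


if __name__ == "__main__":
    scenarios = {
        "city A isolated": {"tom", "dick", "harry"},
        "city A halted or cut off": {"fred", "george", "ron", "node7"},
        "one city A node plus city B": {"tom", "fred", "george", "ron"},
    }
    for label, online in scenarios.items():
        print(label, "->", sorted(running(online)))
```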
> 
> For me, this is the solution I wanted, and in fact it's even slightly better
> than the previous two isolated 3-node clusters I had, because I can now have
> resources running on a single active node in cityA (provided it can see at
> least 3 other nodes in cityB or cityC), which wasn't possible before.
> 
> 
> Once again, thanks to everyone who has helped me to achieve this result :)
> 
> 
> Antony.
> 
> ‑‑ 
> "The future is already here.   It's just not evenly distributed yet."
> 
>  ‑ William Gibson
> 
>                                                    Please reply to the list;
>                                                      please *don't* CC me.