[ClusterLabs] Sub‑clusters / super‑clusters - working :)
Antony Stone
Antony.Stone at ha.open.source.it
Fri Aug 6 08:41:59 EDT 2021
On Friday 06 August 2021 at 14:14:09, Andrei Borzenkov wrote:
> On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote:
> >
> > For anyone interested in the detail of how to do this (without needing
> > booth), here is my cluster.conf file, as in "crm configure load replace
> > cluster.conf":
> >
> > --------
> > node tom attribute site=cityA
> > node dick attribute site=cityA
> > node harry attribute site=cityA
> >
> > node fred attribute site=cityB
> > node george attribute site=cityB
> > node ron attribute site=cityB
> >
> > primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 meta
> > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > on-fail=restart
> > primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 meta
> > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > on-fail=restart
> > primitive Asterisk asterisk meta migration-threshold=3 failure-timeout=60
> > op monitor interval=5 timeout=20 on-fail=restart
> >
> > group GroupA A-float resource-stickiness=100
> > group GroupB B-float resource-stickiness=100
> > group Anywhere Asterisk resource-stickiness=100
> >
> > location pref_A GroupA rule -inf: site ne cityA
> > location pref_B GroupB rule -inf: site ne cityB
> > location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
> >
> > colocation Ast 100: Anywhere [ cityA cityB ]
>
> You define a resource set, but there are no resources cityA or cityB,
> at least you do not show them. So it is not quite clear what this
> colocation does.
Apologies - I had used different names in my test setup, and converted them to
cityA etc for the sake of continuity in this discussion.

That should be:

colocation Ast 100: Anywhere [ GroupA GroupB ]
> > property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop
>
> If connectivity between (any two) sites is lost you may end up with
> one of A or B going out of quorum.
Agreed.
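As a rough illustration of the quorum point, here is a minimal Python sketch of a simple majority rule, assuming one vote per node. The function name and the vote counts are hypothetical and are not taken from the configuration above.

```python
def has_quorum(partition_votes, expected_votes):
    """Majority rule: a partition needs more than half of the expected votes."""
    return partition_votes >= expected_votes // 2 + 1

# Hypothetical vote counts, one vote per node.
print(has_quorum(4, 7))  # True
print(has_quorum(3, 7))  # False
print(has_quorum(3, 6))  # False: an even split leaves neither half quorate
```

With no-quorum-policy=stop, a partition for which this returns False would stop its resources.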
> While this will stop active resources and restart them on another site,
No. Resources do not start on the "wrong" site because of:

location pref_A GroupA rule -inf: site ne cityA
location pref_B GroupB rule -inf: site ne cityB

The resources in GroupA either run in cityA or they do not run at all.
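To make the effect of those two rules concrete, here is a small Python sketch. It is a simplified model only, not Pacemaker's actual score calculation, and the helper name eligible_nodes is made up for illustration. It treats "rule -inf: site ne X" as banning every node whose site attribute is not X.

```python
# Simplified model of the two -inf location rules; not Pacemaker's real scoring.
NODE_SITE = {
    "tom": "cityA", "dick": "cityA", "harry": "cityA",
    "fred": "cityB", "george": "cityB", "ron": "cityB",
}

# "rule -inf: site ne X" bans every node whose site attribute differs from X.
REQUIRED_SITE = {"GroupA": "cityA", "GroupB": "cityB"}

def eligible_nodes(group, online):
    """Return the online nodes on which `group` is allowed to run."""
    wanted = REQUIRED_SITE[group]
    return sorted(n for n in online if NODE_SITE[n] == wanted)

# All of cityA is down: GroupA has nowhere to go, so it stays stopped.
print(eligible_nodes("GroupA", {"fred", "george", "ron"}))  # []
print(eligible_nodes("GroupB", {"fred", "george", "ron"}))  # ['fred', 'george', 'ron']
```

An empty list corresponds to the "do not run at all" case.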
> there is no coordination between stopping and starting so for some time
> resources will be active on both sites. It is up to you to evaluate whether
> this matters.
Any resource which tried to start at the wrong site would simply fail, because
the IP addresses involved do not work at the "other" site.
> If this matters your solution does not protect against it.
>
> If this does not matter, the usual response is - why do you need a
> cluster in the first place? Why not simply always run asterisk on both
> sites all the time?
Because Asterisk at cityA is bound to a floating IP address, which is held on
one of the three machines in cityA. I can't run Asterisk on all three
machines there because only one of them has the IP address.

Asterisk _does_ normally run on both sites all the time, but only on one
machine at each site.
> > start-failure-is-fatal=false cluster-recheck-interval=60s
> > --------
> >
> > Of course, the group definitions are not needed for single resources, but
> > I shall in practice be using multiple resources which do need groups, so
> > I wanted to ensure I was creating something which would work with that.
>
> > I have tested it by:
> ...
> > - causing a network failure at one city (so it simply disappears without
> > stopping corosync neatly): the other city continues its resources (plus
> > the "anywhere" resource), the isolated city stops
>
> If the site is completely isolated it probably does not matter whether
> anything is active there. It is partial connectivity loss where it
> becomes interesting.
Agreed, however my testing shows that resources which I want running in cityA
are either running there or they're not (they never move to cityB or cityC),
similarly for cityB, and the resources I want just a single instance of are
doing just that, and on the same machine at cityA or cityB as the local
resources are running on.
Thanks for the feedback,
Antony.
--
"Measuring average network latency is about as useful as measuring the mean
temperature of patients in a hospital."
- Stéphane Bortzmeyer
Please reply to the list;
please *don't* CC me.