[ClusterLabs] Sub‑clusters / super‑clusters - working :)

Fri Aug 6 09:12:57 EDT 2021

On Fri, Aug 6, 2021 at 3:42 PM Antony Stone
<Antony.Stone at ha.open.source.it> wrote:
>
> On Friday 06 August 2021 at 14:14:09, Andrei Borzenkov wrote:
>
> > On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote:
> > >
> > > For anyone interested in the detail of how to do this (without needing
> > > booth), here is my cluster.conf file, as in "crm configure load replace
> > > cluster.conf":
> > >
> > > --------
> > > node tom attribute site=cityA
> > > node dick attribute site=cityA
> > > node harry attribute site=cityA
> > >
> > > node fred attribute site=cityB
> > > node george attribute site=cityB
> > > node ron attribute site=cityB
> > >
> > > primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 meta
> > > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > > on-fail=restart
> > > primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 meta
> > > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > > on-fail=restart
> > > primitive Asterisk asterisk meta migration-threshold=3 failure-timeout=60
> > > op monitor interval=5 timeout=20 on-fail=restart
> > >
> > > group GroupA A-float resource-stickiness=100
> > > group GroupB B-float resource-stickiness=100
> > > group Anywhere Asterisk resource-stickiness=100
> > >
> > > location pref_A GroupA rule -inf: site ne cityA
> > > location pref_B GroupB rule -inf: site ne cityB
> > > location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
> > >
> > > colocation Ast 100: Anywhere [ cityA cityB ]
> >
> > You define a resource set, but there are no resources cityA or cityB,
> > at least you do not show them. So it is not quite clear what this
> > colocation does.
>
> Apologies - I had used different names in my test setup, and converted them to
> cityA etc for the sake of continuity in this discussion.
>
> That should be:
>
>         colocation Ast 100: Anywhere [ GroupA GroupB ]
>
> > > property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop
> >
> > If connectivity between (any two) sites is lost you may end up with
> > one of A or B going out of quorum.
>
> Agreed.
>
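
To make the quorum concern concrete, here is a rough sketch of the
majority arithmetic. It assumes one vote per node and a strict-majority
threshold of floor(expected_votes / 2) + 1, which is the usual default;
the seven-node layout with a tie-breaker at a third site is hypothetical,
since the configuration shown lists only six nodes.

```python
def quorum_threshold(expected_votes):
    """Votes needed for a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def has_quorum(partition_votes, expected_votes):
    """True when a partition holds at least a strict majority."""
    return partition_votes >= quorum_threshold(expected_votes)


# Six nodes as in the configuration shown: a clean A/B split
# leaves three votes on each side, below the threshold of four.
print("3 of 6:", has_quorum(3, 6))

# Hypothetical seven-node layout: 3 + 3 + one tie-breaker node.
# A site cut off from both of the others holds only three votes...
print("3 of 7:", has_quorum(3, 7))
# ...while the other site plus the tie-breaker holds four.
print("4 of 7:", has_quorum(4, 7))
```

With no-quorum-policy=stop, any partition for which has_quorum() is
false stops its resources.
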
> > While this will stop active resources and restart them on another site,
>
> No.  Resources do not start on the "wrong" site because of:
>
>         location pref_A GroupA rule -inf: site ne cityA
>         location pref_B GroupB rule -inf: site ne cityB
>
> The resources in GroupA either run in cityA or they do not run at all.
>

Where did I say anything about group A or B? You have a single resource
that can migrate between sites:

location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
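
As a rough illustration of that point, the sketch below evaluates the
three -inf location rules against the node attributes from the quoted
configuration. It is a simplified model, not Pacemaker's rule engine:
the helper names are made up for this example, and it looks only at the
site attribute, ignoring scores, stickiness and the colocation
constraint.

```python
# Node -> site attribute, copied from the node definitions quoted above.
NODE_SITE = {
    "tom": "cityA", "dick": "cityA", "harry": "cityA",
    "fred": "cityB", "george": "cityB", "ron": "cityB",
}

# Each rule returns True when the -inf score applies, i.e. the
# resource is banned from a node with that site value.
BAN_RULES = {
    "GroupA": lambda site: site != "cityA",
    "GroupB": lambda site: site != "cityB",
    "Anywhere": lambda site: site != "cityA" and site != "cityB",
}


def allowed_nodes(resource):
    """Return the sorted list of nodes the resource is not banned from."""
    banned = BAN_RULES[resource]
    return sorted(n for n, site in NODE_SITE.items() if not banned(site))


for name in BAN_RULES:
    print(name, allowed_nodes(name))
```

In this model GroupA and GroupB are each confined to their own three
nodes, while Anywhere is not banned from any of the six, so nothing in
the location rules ties it to one site.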

> > there is no coordination between stopping and starting so for some time
> > resources will be active on both sites. It is up to you to evaluate whether
> > this matters.
>
> Any resource which tried to start at the wrong site would simply fail, because
> the IP addresses involved do not work at the "other" site.
>
> > If this matters your solution does not protect against it.
> >
> > If this does not matter, the usual response is - why do you need a
> > cluster in the first place? Why not simply always run asterisk on both
> > sites all the time?
>
> Because Asterisk at cityA is bound to a floating IP address, which is held on
> one of the three machines in cityA.  I can't run Asterisk on all three
> machines there because only one of them has the IP address.
>

I have no idea what "Asterisk in cityA" means because I see only one
resource named Asterisk, which is not restricted to a single site
according to your configuration.

> Asterisk _does_ normally run on both sites all the time, but only on one
> machine at each site.
>

The only resource that allegedly can migrate between sites in the
configuration you have shown so far is Asterisk. Now you say this
resource never migrates between sites. I'm not sure how helpful this
will be to anyone reading the archives, because I have completely lost
track of what you are trying to achieve.

> > > start-failure-is-fatal=false cluster-recheck-interval=60s
> > > --------
> > >
> > > Of course, the group definitions are not needed for single resources, but
> > > I shall in practice be using multiple resources which do need groups, so
> > > I wanted to ensure I was creating something which would work with that.
> >
> > > I have tested it by:
> > ...
> > >  - causing a network failure at one city (so it simply disappears without
> > > stopping corosync neatly): the other city continues its resources (plus
> > > the "anywhere" resource), the isolated city stops
> >
> > If the site is completely isolated it probably does not matter whether
> > anything is active there. It is partial connectivity loss where it
> > becomes interesting.
>
> Agreed, however my testing shows that resources which I want running in cityA
> are either running there or they're not (they never move to cityB or cityC),
> similarly for cityB, and the resources I want just a single instance of are
> doing just that, and on the same machine at cityA or cityB as the local
> resources are running on.
>
>
> Thanks for the feedback,
>
>
> Antony.
>
> --
> "Measuring average network latency is about as useful as measuring the mean
> temperature of patients in a hospital."
>
>  - Stéphane Bortzmeyer
>
>                                                    Please reply to the list;
>                                                          please *don't* CC me.