[ClusterLabs] Sub-clusters / super-clusters?

Tue Aug 3 04:40:28 EDT 2021

On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote:

> Here is the example I had promised:
>
> pcs node attribute server1 city=LA
> pcs node attribute server2 city=NY
>
> # Don't run on any node that is not in LA
> pcs constraint location DummyRes1 rule score=-INFINITY city ne LA
> 
> # Don't run on any node that is not in NY
> pcs constraint location DummyRes2 rule score=-INFINITY city ne NY
>
> The idea is that if you add a node and forget to specify the attribute
> named 'city', DummyRes1 & DummyRes2 won't be started on it.
> 
> Resources that do not have a constraint based on the city will run
> everywhere, unless you specify a colocation constraint between the
> resources.

Excellent - thanks.  I happen to use crmsh rather than pcs, but I've adapted 
the above and got it working.
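
For reference, a crmsh version of the quoted pcs commands might look roughly 
like the following. This is only a sketch and not necessarily the exact 
configuration used here: the constraint IDs (DummyRes1-not-LA, 
DummyRes2-not-NY) are made up for illustration, and the node, resource and 
attribute names are simply those from the quoted example.

```shell
# Tag each node with the city it lives in (node attribute "city").
crm node attribute server1 set city LA
crm node attribute server2 set city NY

# Don't run DummyRes1 on any node that is not in LA.
# "DummyRes1-not-LA" is a hypothetical constraint ID.
crm configure location DummyRes1-not-LA DummyRes1 rule -inf: city ne LA

# Don't run DummyRes2 on any node that is not in NY.
# "DummyRes2-not-NY" is a hypothetical constraint ID.
crm configure location DummyRes2-not-NY DummyRes2 rule -inf: city ne NY
```

These commands change a live cluster configuration, so they are best tried on 
a test cluster first.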

Unfortunately, there is a problem.

My current setup is:

One 3-machine cluster in city A running a bunch of resources between them, the 
most important of which for this discussion is Asterisk telephony.

One 3-machine cluster in city B doing exactly the same thing.

The two clusters have no knowledge of each other.

I have high-availability routing between my clusters and my upstream telephony 
provider, such that a call can be handled by Cluster A or Cluster B, and if 
one is unavailable, the call gets routed to the other.

Thus, a total failure of Cluster A means I still get phone calls, via Cluster 
B.

To implement the above "one resource which can run anywhere, but only a single 
instance", I joined together clusters A and B, and placed the corresponding 
location constraints on the resources I want only at A and the ones I want 
only at B.  I then added the resource with no location constraint, and it runs 
anywhere, just once.

So far, so good.

The problem is:

With the two independent clusters, if two machines in city A fail, then 
Cluster A fails completely (no quorum), and Cluster B continues working.  That 
means I still get phone calls.
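
The quorum arithmetic behind the independent-cluster case can be sketched as 
below. It assumes the usual majority rule (floor(n/2) + 1 votes needed) with 
one vote per node and no special quorum options; the function names are 
invented for this illustration and are not part of any cluster tool.

```shell
#!/bin/sh
# Votes needed for quorum: floor(total / 2) + 1 (assumes one vote per node).
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) if the surviving nodes still hold quorum.
# Arguments: total number of nodes, number of nodes still alive.
has_quorum() {
    total=$1
    alive=$2
    [ "$alive" -ge "$(quorum_needed "$total")" ]
}

# Cluster A: 3 nodes, 2 failed, 1 left.
if has_quorum 3 1; then echo "A: quorate"; else echo "A: no quorum"; fi

# Cluster B: 3 nodes, none failed.
if has_quorum 3 3; then echo "B: quorate"; else echo "B: no quorum"; fi
```

Under those assumptions this prints "A: no quorum" and "B: quorate", which 
matches the behaviour described above for two separate clusters.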

With the new setup, if two machines in city A fail, then _both_ clusters stop 
working and I have no functional resources anywhere.

So, my question now is:

How can I have a 3-machine Cluster A running local resources, and a 3-machine 
Cluster B running local resources, plus one resource running on either Cluster 
A or Cluster B, but without a failure of one cluster causing _everything_ to 
stop?

Thanks,

Antony.

-- 
One tequila, two tequila, three tequila, floor.

                                                   Please reply to the list;
                                                         please *don't* CC me.