[ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

Wed Aug 4 16:06:39 EDT 2021

There is no safe way to do what you are trying to do.

If the resource is on cluster A and contact is lost between clusters A 
and B due to a network failure, how does cluster B know if the resource 
is still running on cluster A or not?

It has no way of knowing if cluster A is even up and running.

In that situation it cannot safely start the resource.

If the network is down and both clusters come up at the same time, 
without being able to contact each other, neither knows if the other is 
running the resource, so neither can safely start it.
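
To make the argument concrete, here is a small Python sketch. It is purely 
illustrative: the names World, observe_from_b and b_may_start are made up 
for this example and are not part of Pacemaker or Corosync. It assumes 
cluster B can learn cluster A's state only over the inter-site link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    """Ground truth, which cluster B cannot see directly (hypothetical model)."""
    a_running_resource: bool  # is cluster A actually running the resource?
    link_up: bool             # can cluster B reach cluster A?


def observe_from_b(world: World) -> str:
    """Return what cluster B can observe about the resource on cluster A."""
    if world.link_up:
        return "running" if world.a_running_resource else "stopped"
    # With the link down, B gets no information at all about A.
    return "unknown"


def b_may_start(observation: str) -> bool:
    """Conservative rule: start only if A is known not to run the resource."""
    return observation == "stopped"


# Two very different situations...
a_is_dead = World(a_running_resource=False, link_up=False)
link_failure = World(a_running_resource=True, link_up=False)

# ...that look identical from cluster B, so B must refuse in both.
for world in (a_is_dead, link_failure):
    seen = observe_from_b(world)
    print(world, "->", seen, "-> may start:", b_may_start(seen))
```

Both cases give B the same observation, so any rule that lets B start the 
resource when A is really dead would also start it during a plain link 
failure, while A is still running it.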

On 8/4/21 3:27 PM, Antony Stone wrote:
> On Wednesday 04 August 2021 at 20:57:49, Strahil Nikolov wrote:
>
>> That's why you need a qdisk at a 3rd location, so you will have 7 votes in
>> total. When 3 nodes in cityA die, all resources will be started on the
>> remaining 3 nodes.
> I think I have not explained this properly.
>
> I have three nodes in city A which run resources which have to run in city A.
> They are based on IP addresses which are only valid on the network in city A.
>
> I have three nodes in city B which run resources which have to run in city B.
> They are based on IP addresses which are only valid on the network in city B.
>
> I have redundant routing between my upstream provider, and cities A and B, so
> that I only _need_ resources to be running in one of the two cities for
> everything to work as required.  City A can go completely offline and not run
> its resources, and everything I need continues to work via city B.
>
> I now have an additional requirement to run a single resource at either city A
> or city B but not both.
>
> As soon as I connect the clusters at city A and city B, and apply the location
> constraints and weighting rules you have suggested:
>
> 1. everything works, including the single resource at either city A or city B,
> so long as both clusters are operational.
>
> 2. as soon as one cluster fails (all three of its nodes become
> unavailable), then the other cluster stops running all its resources as well.
> This is even with quorum=2.
>
> This means I have lost the redundancy between my two clusters, which is based
> on the expectation that only one cluster will fail at a time.  If the failure
> of one automatically _causes_ the failure of the other, I have no high
> availability any more.
>
> What I require is for cluster A to continue running its own resources, plus
> the single resource which can run anywhere, in the event that cluster B fails.
>
> In other words, I need the exact same outcome as I have at present if cluster
> B fails (its resources stop, cluster A is unaffected), except that cluster A
> continues to run the single resource which I need just a single instance of.
>
> It is impossible for the nodes at city A to run the resources which should be
> running at city B, partly because some of them are identical ("Asterisk" as a
> resource, for example, is already running at city A), and partly because some
> of them are bound to the networking arrangements (I cannot set a floating IP
> address which belongs in city A on a machine which exists in city B - it just
> doesn't work).
>
> Therefore if adding a seventh node at a third location would try to start
> _all_ resources in city A if city B goes down, it is not a working solution.
> If city B goes down then I simply do not want its resources to be running
> anywhere, just the same as I have now with the two independent clusters.
>
>
> Thanks,
>
>
> Antony.
>