[ClusterLabs] DC marks itself as OFFLINE, continues orchestrating the other nodes

Ken Gaillot kgaillot at redhat.com
Thu Sep 8 11:11:46 EDT 2022


On Thu, 2022-09-08 at 15:01 +0200, Lars Ellenberg wrote:
> Scenario:
> three nodes, no fencing (I know)
> break network, isolating nodes
> unbreak network, see how cluster partitions rejoin and resume service

I'm guessing the CIB changed during the break, with more changes in one
of the other partitions than in mqhavm24's ...

> 
> 
> Funny outcome:
> /usr/sbin/crm_mon  -x pe-input-689.bz2
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22be2ae) -
> partition with quorum
>   * Last updated: Thu Sep  8 14:39:54 2022
>   * Last change:  Thu Aug 11 12:33:02 2022 by root via crm_resource
> on mqhavm24
>   * 3 nodes configured
>   * 16 resource instances configured (2 DISABLED)
> 
> Node List:
>   * Online: [ mqhavm34 mqhavm37 ]
>   * OFFLINE: [ mqhavm24 ]
> 
> 
> Note how the current DC considers itself OFFLINE!
> 
> It accepted an apparently outdated CIB replacement from one of the
> non-DCs
> from a previous membership while already authoritative itself,
> overwriting its own "join" status in the CIB.

Reconciling CIB differences in different partitions is inherently
lossy. Basically we gotta pick one side to win, and the current
algorithm just looks at the number of changes. (An "admin epoch" can
also be bumped manually to override that.)
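To make the "pick one side to win" rule concrete, here is a minimal Python
sketch. It assumes the version counters Pacemaker keeps as attributes on the
<cib> element (admin_epoch, epoch, num_updates) are compared in that order,
so a manually bumped admin_epoch dominates everything else. The class and
function names are made up for illustration; this is not Pacemaker code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CibVersion:
    """Hypothetical model of the version counters on the <cib> element."""
    admin_epoch: int  # only ever bumped manually by an administrator
    epoch: int        # bumped on configuration changes
    num_updates: int  # incremented on other updates

    def key(self) -> tuple:
        # Assumed priority order: admin_epoch first, then epoch,
        # then num_updates (plain lexicographic tuple comparison).
        return (self.admin_epoch, self.epoch, self.num_updates)


def pick_winner(a: CibVersion, b: CibVersion) -> CibVersion:
    """Return the version that would win reconciliation; ties keep `a`."""
    return b if b.key() > a.key() else a


# Usage: a peer partition that saw one more config change wins,
# even though the DC's copy has more status updates.
dc_side = CibVersion(admin_epoch=0, epoch=120, num_updates=40)
peer_side = CibVersion(admin_epoch=0, epoch=121, num_updates=3)
winner = pick_winner(dc_side, peer_side)
```

Nothing in this comparison looks at *whose* node_state entries are being
overwritten, which is how a DC can end up accepting a CIB that marks itself
OFFLINE.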

> 
> I have full crm_reports and some context knowledge about the setup.
> 
> For now I'd like to know: has anyone seen this before,
> is that a known bug in corner cases/races during re-join,
> has it even been fixed meanwhile?

No, yes, no

It does seem we could handle the specific case of the local node's
state being overwritten a little better. We can't just override the
join state if the other nodes think it is different, but we could
release DC and restart the join process. How did it handle the
situation in this case?
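The "release DC and restart the join process" idea could look roughly like
the Python sketch below. Everything here is hypothetical: the function name,
the Action values, and the assumption that the incoming CIB records each
node's join state as a string such as "member" (as in the join attribute of
a node_state entry). It only illustrates the decision, not how the
controller would actually implement it.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep current DC role"
    RELEASE_AND_REJOIN = "release DC role and restart join"


def on_cib_replaced(local_node: str, is_dc: bool, join_state: dict) -> Action:
    """Hypothetical check run after a peer's CIB has replaced ours.

    join_state maps node name -> join state recorded in the new CIB.
    """
    if is_dc and join_state.get(local_node) != "member":
        # The winning CIB no longer lists us as joined. We can't simply
        # overwrite the peers' view, so step down and redo the join.
        return Action.RELEASE_AND_REJOIN
    return Action.KEEP


# Usage: the scenario from this thread, where the DC's own entry says "down".
action = on_cib_replaced(
    "mqhavm24", True,
    {"mqhavm24": "down", "mqhavm34": "member", "mqhavm37": "member"},
)
```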

> 
> Thanks,
>     Lars
-- 
Ken Gaillot <kgaillot at redhat.com>
