[ClusterLabs] DC marks itself as OFFLINE, continues orchestrating the other nodes

Lars Ellenberg lars.ellenberg at linbit.com
Thu Sep 8 09:01:39 EDT 2022


Scenario:
three nodes, no fencing (I know)
break network, isolating nodes
unbreak network, see how cluster partitions rejoin and resume service


Funny outcome:
/usr/sbin/crm_mon  -x pe-input-689.bz2
Cluster Summary:
  * Stack: corosync
  * Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22be2ae) - partition with quorum
  * Last updated: Thu Sep  8 14:39:54 2022
  * Last change:  Thu Aug 11 12:33:02 2022 by root via crm_resource on mqhavm24
  * 3 nodes configured
  * 16 resource instances configured (2 DISABLED)

Node List:
  * Online: [ mqhavm34 mqhavm37 ]
  * OFFLINE: [ mqhavm24 ]


Note how the current DC considers itself OFFLINE!

It accepted an apparently outdated cib replacement from one of the non-DCs
from a previous membership while already authoritative itself,
overwriting its own "join" status in the cib.
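
To check other saved pe-input files for the same inconsistency, something
like the following Python sketch might help. It assumes the file is a
bzip2-compressed CIB whose root element carries a dc-uuid attribute and whose
status section holds node_state entries with id, uname, join and crmd
attributes, as I would expect from Pacemaker 1.1. The function names are made
up for illustration and are not part of any Pacemaker tooling.

```python
import bz2
import xml.etree.ElementTree as ET


def load_pe_input(path: str) -> str:
    """Read a bzip2-compressed pe-input file and return its XML text."""
    with bz2.open(path, "rt") as fh:
        return fh.read()


def dc_join_state(cib_xml: str):
    """Return (uname, join, crmd) of the node named by dc-uuid, or None.

    Assumption: the root element has a dc-uuid attribute that matches the
    id attribute of exactly one node_state element.
    """
    root = ET.fromstring(cib_xml)
    dc_id = root.get("dc-uuid")
    if dc_id is None:
        return None
    for state in root.iter("node_state"):
        if state.get("id") == dc_id:
            return state.get("uname"), state.get("join"), state.get("crmd")
    return None
```

Calling dc_join_state(load_pe_input("pe-input-689.bz2")) and seeing a join
value other than "member" for the DC would be consistent with the crm_mon
output above, assuming the scheduler only treats a node as online when its
join state is "member".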

I have full crm_reports and some context knowledge about the setup.

For now I'd like to know: has anyone seen this before,
is this a known bug in corner cases/races during re-join,
and has it perhaps been fixed in the meantime?

Thanks,
    Lars


