[ClusterLabs] DC marks itself as OFFLINE, continues orchestrating the other nodes
lars.ellenberg at linbit.com
Thu Sep 8 09:01:39 EDT 2022
three nodes, no fencing (I know)
break network, isolating nodes
unbreak network, see how cluster partitions rejoin and resume service
/usr/sbin/crm_mon -x pe-input-689.bz2
* Stack: corosync
* Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22be2ae) - partition with quorum
* Last updated: Thu Sep 8 14:39:54 2022
* Last change: Thu Aug 11 12:33:02 2022 by root via crm_resource on mqhavm24
* 3 nodes configured
* 16 resource instances configured (2 DISABLED)
* Online: [ mqhavm34 mqhavm37 ]
* OFFLINE: [ mqhavm24 ]
Note how the current DC considers itself OFFLINE!
It accepted an apparently outdated cib replacement from one of the non-DCs,
from a previous membership, while it was already authoritative itself,
overwriting its own "join" status in the cib.
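
For anyone who wants to check their own saved status output or pe-inputs for the same symptom, here is a minimal sketch. It only parses the plain-text summary format shown above ("Current DC:" and "OFFLINE:" lines). The function name dc_listed_offline and the SAMPLE text are my own illustration and not part of Pacemaker; other crm_mon versions may format these lines differently.

```python
import re


def dc_listed_offline(crm_mon_text: str) -> bool:
    """Return True if the node named as Current DC also appears in the OFFLINE list.

    Assumes the text layout shown in this post; this is an illustrative
    helper, not a Pacemaker API.
    """
    dc = None
    offline = set()
    for raw in crm_mon_text.splitlines():
        # Drop the leading "* " bullet and surrounding whitespace.
        line = raw.strip().lstrip("*").strip()
        m = re.match(r"Current DC:\s+(\S+)", line)
        if m:
            dc = m.group(1)
            continue
        m = re.match(r"OFFLINE:\s*\[(.*)\]", line)
        if m:
            # Node names are whitespace-separated inside the brackets.
            offline.update(m.group(1).split())
    return dc is not None and dc in offline


# Status lines copied from the output above.
SAMPLE = """\
* Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22be2ae) - partition with quorum
* Online: [ mqhavm34 mqhavm37 ]
* OFFLINE: [ mqhavm24 ]
"""

if __name__ == "__main__":
    print(dc_listed_offline(SAMPLE))
```

The text could come from a saved run of the crm_mon -x command shown above; the check itself does not call any cluster tools.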
I have full crm_reports and some context knowledge about the setup.
For now I'd like to know: has anyone seen this before?
Is this a known bug in corner cases or races during re-join,
and has it perhaps been fixed in the meantime?