[ClusterLabs] Antw: [EXT] DC marks itself as OFFLINE, continues orchestrating the other nodes
Ulrich.Windl at rz.uni-regensburg.de
Thu Sep 8 09:05:47 EDT 2022
>>> Lars Ellenberg <lars.ellenberg at linbit.com> wrote on 08.09.2022 at 15:01 in
message <Yxnns8D0NDTWKjDU at grappa.linbit>:
> three nodes, no fencing (I know)
> break network, isolating nodes
> unbreak network, see how cluster partitions rejoin and resume service
> Funny outcome:
> /usr/sbin/crm_mon -x pe-input-689.bz2
> Cluster Summary:
> * Stack: corosync
> * Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22be2ae) - with quorum
> * Last updated: Thu Sep 8 14:39:54 2022
> * Last change: Thu Aug 11 12:33:02 2022 by root via crm_resource on
> * 3 nodes configured
> * 16 resource instances configured (2 DISABLED)
> Node List:
> * Online: [ mqhavm34 mqhavm37 ]
> * OFFLINE: [ mqhavm24 ]
> Note how the current DC considers itself as OFFLINE!
> It accepted an apparently outdated cib replacement from one of the non-DCs
> from a previous membership while already authoritative itself,
> overwriting its own "join" status in the cib.
> I have full crm_reports and some context knowledge about the setup.
> For now I'd like to know: has anyone seen this before,
> is that a known bug in corner cases/races during re-join,
> has it even been fixed meanwhile?
I think the order of events is important here. Could you provide some logs?
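
To narrow down when the inconsistency first appears, one could run crm_mon against each saved pe-input file and check whether the reported DC also shows up in the OFFLINE list. The following is only a minimal Python sketch, assuming the text layout of the crm_mon output quoted above; the function name dc_listed_offline and the SAMPLE string are made up for illustration and are not part of Pacemaker.

```python
import re


def dc_listed_offline(crm_mon_text):
    """Return the DC name if the crm_mon text lists it as OFFLINE, else None.

    Assumes the plain-text layout quoted in this thread
    ("Current DC: <node> ..." and "OFFLINE: [ <nodes> ]").
    """
    dc_match = re.search(r"Current DC:\s*(\S+)", crm_mon_text)
    if dc_match is None:
        return None
    dc = dc_match.group(1)

    # Collect every node named in any "OFFLINE: [ ... ]" list.
    offline = set()
    for m in re.finditer(r"OFFLINE:\s*\[([^\]]*)\]", crm_mon_text):
        offline.update(m.group(1).split())

    return dc if dc in offline else None


# Sample text copied from the output quoted above (hypothetical variable name).
SAMPLE = """\
Cluster Summary:
  * Stack: corosync
  * Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22be2ae) - with quorum
Node List:
  * Online: [ mqhavm34 mqhavm37 ]
  * OFFLINE: [ mqhavm24 ]
"""

if __name__ == "__main__":
    print(dc_listed_offline(SAMPLE))
```

Feeding it the output of `crm_mon -x` for consecutive pe-input files would show the first transition in which the DC is listed as OFFLINE, which is where the logs are most interesting.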