[ClusterLabs] Antw: [EXT] DC marks itself as OFFLINE, continues orchestrating the other nodes

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Sep 8 09:05:47 EDT 2022


>>> Lars Ellenberg <lars.ellenberg at linbit.com> wrote on 08.09.2022 at 15:01
in message <Yxnns8D0NDTWKjDU at grappa.linbit>:

> Scenario:
> three nodes, no fencing (I know)
> break network, isolating nodes
> unbreak network, see how cluster partitions rejoin and resume service
> 
> 
> Funny outcome:
> /usr/sbin/crm_mon -x pe-input-689.bz2
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22be2ae) - partition with quorum
>   * Last updated: Thu Sep  8 14:39:54 2022
>   * Last change:  Thu Aug 11 12:33:02 2022 by root via crm_resource on mqhavm24
>   * 3 nodes configured
>   * 16 resource instances configured (2 DISABLED)
> 
> Node List:
>   * Online: [ mqhavm34 mqhavm37 ]
>   * OFFLINE: [ mqhavm24 ]
> 
> 
> Note how the current DC considers itself as OFFLINE!
> 
> It accepted an apparently outdated cib replacement from one of the non-DCs
> from a previous membership while already authoritative itself,
> overwriting its own "join" status in the cib.
> 
> I have full crm_reports and some context knowledge about the setup.
> 
> For now I'd like to know: has anyone seen this before?
> Is this a known bug in corner cases or races during re-join,
> and has it perhaps been fixed in the meantime?

I think the order of events is important here. Could you provide some logs?
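
To find the first saved transition in which this state shows up, a small
check over the crm_mon text output for each pe-input file might help. The
following is only a sketch: the function name dc_listed_offline and the
SAMPLE string are made up for illustration, and it assumes the plain-text
layout shown in the quoted output above ("Current DC: <node>" and
"OFFLINE: [ ... ]" lines). It does not talk to the cluster itself.

```python
import re


def dc_listed_offline(crm_mon_text: str) -> bool:
    """Return True if the node reported as Current DC is also listed as OFFLINE."""
    dc_match = re.search(r"Current DC:\s*(\S+)", crm_mon_text)
    if dc_match is None:
        return False  # no DC line found, nothing to compare
    dc = dc_match.group(1)

    # Collect every node name that appears in an "OFFLINE: [ ... ]" list.
    offline = set()
    for m in re.finditer(r"OFFLINE:\s*\[([^\]]*)\]", crm_mon_text):
        offline.update(m.group(1).split())

    return dc in offline


# Abbreviated copy of the output quoted above (assumed layout).
SAMPLE = """\
Cluster Summary:
  * Stack: corosync
  * Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22be2ae) - partition with quorum
Node List:
  * Online: [ mqhavm34 mqhavm37 ]
  * OFFLINE: [ mqhavm24 ]
"""

if __name__ == "__main__":
    print(dc_listed_offline(SAMPLE))
```

Feeding it the output of "crm_mon -x <file>" for consecutive pe-input files
would show at which transition the DC first appears in the OFFLINE list,
which could then be matched against the logs.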

> 
> Thanks,
>     Lars




