[ClusterLabs] Re: [EXT] I_DC_TIMEOUT and node fenced when it joins the cluster

Tue Apr 19 03:20:31 EDT 2022

Hi!

It seems your network connection is unreliable and you don't have a "second
independent ring".
You may increase the timeout (as suggested), but that doesn't really fix your
networking.

Regards,
Ulrich

>>> vitaly <vitaly at unitc.com> wrote on 15.04.2022 at 14:26 in message
<1442265456.65535.1650025606161 at webmail6b.networksolutionsemail.com>:
> Hello everybody,
> I occasionally see the following behavior on a two-node cluster:
> 1. Both nodes of the cluster are rebooted abruptly (using "reboot").
> 2. Both nodes start to come up. Node d18-3-left (2) comes up first:
> Apr 13 23:56:09 d18-3-left corosync[11465]:   [MAIN  ] Corosync Cluster Engine ('2.4.4'): started and ready to provide service.
> 
> 3. The second node, d18-3-right (1), joins the cluster:
> 
> Apr 13 23:56:58 d18-3-left corosync[11466]:   [TOTEM ] A new membership (172.16.1.1:60) was formed. Members joined: 1
> Apr 13 23:56:58 d18-3-left corosync[11466]:   [QUORUM] This node is within the primary component and will provide service.
> Apr 13 23:56:58 d18-3-left corosync[11466]:   [QUORUM] Members[2]: 1 2
> Apr 13 23:56:58 d18-3-left corosync[11466]:   [MAIN  ] Completed service synchronization, ready to provide service.
> Apr 13 23:56:58 d18-3-left pacemakerd[11717]:   notice: Quorum acquired
> Apr 13 23:56:58 d18-3-left crmd[11763]:   notice: Quorum acquired
> 
> 4. Two seconds later, node d18-3-left logs I_DC_TIMEOUT and starts fencing
> the newly joined node:
> 
> Apr 13 23:57:00 d18-3-left crmd[11763]:  warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
> After that we get:
> Apr 13 23:57:00 d18-3-left crmd[11763]:   notice: State transition S_ELECTION -> S_INTEGRATION
> Apr 13 23:57:00 d18-3-left crmd[11763]:  warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
> 
> and then it fences the node:
> Apr 13 23:57:01 d18-3-left pengine[11762]:  warning: Scheduling Node d18-3-right.lab.archivas.com for STONITH
> Apr 13 23:57:01 d18-3-left pengine[11762]:   notice:  * Fence (reboot) d18-3-right.lab.archivas.com 'node is unclean'
> 
> 5. After this, the fenced node comes up again and joins the cluster
> without any issues.
> 
> Any idea what is going on here?
> Thanks,
> _Vitaly