[ClusterLabs] I_DC_TIMEOUT and node fenced when it joins the cluster

vitaly vitaly at unitc.com
Fri Apr 15 08:26:46 EDT 2022


Hello everybody,
I occasionally see the following behavior on a two-node cluster:
1. Both nodes of the cluster are rebooted abruptly (using "reboot").
2. Both nodes start to come up. Node d18-3-left (2) comes up first:
Apr 13 23:56:09 d18-3-left corosync[11465]:   [MAIN  ] Corosync Cluster Engine ('2.4.4'): started and ready to provide service.

3. The second node, d18-3-right (1), joins the cluster:

Apr 13 23:56:58 d18-3-left corosync[11466]:   [TOTEM ] A new membership (172.16.1.1:60) was formed. Members joined: 1
Apr 13 23:56:58 d18-3-left corosync[11466]:   [QUORUM] This node is within the primary component and will provide service.
Apr 13 23:56:58 d18-3-left corosync[11466]:   [QUORUM] Members[2]: 1 2
Apr 13 23:56:58 d18-3-left corosync[11466]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 13 23:56:58 d18-3-left pacemakerd[11717]:   notice: Quorum acquired
Apr 13 23:56:58 d18-3-left crmd[11763]:   notice: Quorum acquired

4. Two seconds later, node d18-3-left logs I_DC_TIMEOUT and starts fencing the newly joined node:

Apr 13 23:57:00 d18-3-left crmd[11763]:  warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped

After that we get:
Apr 13 23:57:00 d18-3-left crmd[11763]:   notice: State transition S_ELECTION -> S_INTEGRATION
Apr 13 23:57:00 d18-3-left crmd[11763]:  warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check

and then the node is fenced:
Apr 13 23:57:01 d18-3-left pengine[11762]:  warning: Scheduling Node d18-3-right.lab.archivas.com for STONITH
Apr 13 23:57:01 d18-3-left pengine[11762]:   notice:  * Fence (reboot) d18-3-right.lab.archivas.com 'node is unclean'

5. After this, the fenced node comes up again and joins the cluster without any issues.
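To make the timing in the steps above explicit, here is a rough Python sketch that turns the quoted syslog lines into a relative timeline. It only uses the log lines already shown in this report; the regular expression and the helper name `timeline` are my own, and since syslog omits the year, the year 2022 (the date of this report) is assumed when parsing timestamps.

```python
import re
from datetime import datetime

# Log lines copied from the report above (only the events of interest).
LOG = """\
Apr 13 23:56:09 d18-3-left corosync[11465]:   [MAIN  ] Corosync Cluster Engine ('2.4.4'): started and ready to provide service.
Apr 13 23:56:58 d18-3-left corosync[11466]:   [TOTEM ] A new membership (172.16.1.1:60) was formed. Members joined: 1
Apr 13 23:56:58 d18-3-left crmd[11763]:   notice: Quorum acquired
Apr 13 23:57:00 d18-3-left crmd[11763]:  warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
Apr 13 23:57:01 d18-3-left pengine[11762]:  warning: Scheduling Node d18-3-right.lab.archivas.com for STONITH
"""

# Syslog prefix: "Mon DD HH:MM:SS host daemon[pid]: message"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) (\w+)\[\d+\]:\s+(.*)$")

# Assumption: syslog timestamps carry no year, so use the year of the report.
ASSUMED_YEAR = 2022


def timeline(text):
    """Return (seconds since first event, daemon, message) tuples."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a syslog-style line
        stamp, _host, daemon, msg = m.groups()
        ts = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((ts, daemon, msg))
    if not events:
        return []
    t0 = events[0][0]
    return [(int((ts - t0).total_seconds()), daemon, msg) for ts, daemon, msg in events]


if __name__ == "__main__":
    for offset, daemon, msg in timeline(LOG):
        print(f"+{offset:3d}s {daemon:9s} {msg[:60]}")
```

With these lines, I_DC_TIMEOUT fires 51 seconds after corosync started on d18-3-left and only 2 seconds after quorum was acquired with the second node.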

Any idea what is going on here?
Thanks,
_Vitaly

