[ClusterLabs] I_DC_TIMEOUT and node fenced when it joins the cluster

Strahil Nikolov hunter86_bg at yahoo.com
Sat Apr 16 01:04:35 EDT 2022


Set the corosync token to 10000 milliseconds, adjust the consensus as described in corosync.conf(5) (man 5 corosync.conf), and give it a try.
Don't forget to sync the corosync settings to all nodes in the cluster.
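
As a rough sketch of the consensus adjustment: corosync.conf(5) states that consensus must be at least 1.2 * token, and that corosync calculates 1.2 * token itself when consensus is not set. The helper names below (min_consensus_ms, render_totem_fragment) are hypothetical, and the rendered fragment shows only these two keys; the rest of an existing totem section has to stay as it is.

```python
# Sketch only: derive a consensus value from a chosen token timeout.
# Assumption (from corosync.conf(5)): consensus must be >= 1.2 * token.


def min_consensus_ms(token_ms: int) -> int:
    """Return the smallest consensus (ms) allowed for the given token (ms)."""
    # 1.2 * token == token * 6 / 5; integer ceiling avoids float rounding.
    return (token_ms * 6 + 4) // 5


def render_totem_fragment(token_ms: int) -> str:
    """Render only the two totem keys discussed here (hypothetical helper)."""
    consensus_ms = min_consensus_ms(token_ms)
    return (
        "totem {\n"
        f"    token: {token_ms}\n"
        f"    consensus: {consensus_ms}\n"
        "}\n"
    )


if __name__ == "__main__":
    # 10000 ms is the token value suggested above.
    print(render_totem_fragment(10000))
```

With a 10000 ms token the minimum consensus works out to 12000 ms. Whatever values are chosen, corosync.conf has to be identical on both nodes.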
Best Regards,
Strahil Nikolov

On Fri, Apr 15, 2022 at 15:27, vitaly <vitaly at unitc.com> wrote:

Hello Everybody.
I am occasionally seeing the following behavior on a two-node cluster.
1. Both nodes of the cluster are abruptly rebooted (using "reboot").
2. Both nodes start to come up. Node d18-3-left (2) comes up first:
Apr 13 23:56:09 d18-3-left corosync[11465]:  [MAIN  ] Corosync Cluster Engine ('2.4.4'): started and ready to provide service.

3. The second node, d18-3-right (1), joins the cluster:

Apr 13 23:56:58 d18-3-left corosync[11466]:  [TOTEM ] A new membership (172.16.1.1:60) was formed. Members joined: 1
Apr 13 23:56:58 d18-3-left corosync[11466]:  [QUORUM] This node is within the primary component and will provide service.
Apr 13 23:56:58 d18-3-left corosync[11466]:  [QUORUM] Members[2]: 1 2
Apr 13 23:56:58 d18-3-left corosync[11466]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr 13 23:56:58 d18-3-left pacemakerd[11717]:  notice: Quorum acquired
Apr 13 23:56:58 d18-3-left crmd[11763]:  notice: Quorum acquired

4. Two seconds later, node d18-3-left reports I_DC_TIMEOUT and starts fencing the newly joined node.

Apr 13 23:57:00 d18-3-left crmd[11763]:  warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
After that we get:
Apr 13 23:57:00 d18-3-left crmd[11763]:  notice: State transition S_ELECTION -> S_INTEGRATION
Apr 13 23:57:00 d18-3-left crmd[11763]:  warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check

and the node is fenced:
Apr 13 23:57:01 d18-3-left pengine[11762]:  warning: Scheduling Node d18-3-right.lab.archivas.com for STONITH
Apr 13 23:57:01 d18-3-left pengine[11762]:  notice:  * Fence (reboot) d18-3-right.lab.archivas.com 'node is unclean'

5. After this, the node that was fenced comes up again and rejoins the cluster without any issues.
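
To make the timing in the steps above easier to read, here is a minimal sketch that turns the syslog timestamps into offsets from the moment corosync started on d18-3-left. The four log lines are copied from this report; the function names are hypothetical, the year 2022 is an assumption (syslog timestamps omit it), and month parsing assumes an English or C locale.

```python
from datetime import datetime

# Log lines copied from the report (all from d18-3-left's point of view).
LOG_LINES = [
    "Apr 13 23:56:09 d18-3-left corosync[11465]:  [MAIN  ] Corosync Cluster Engine ('2.4.4'): started and ready to provide service.",
    "Apr 13 23:56:58 d18-3-left corosync[11466]:  [TOTEM ] A new membership (172.16.1.1:60) was formed. Members joined: 1",
    "Apr 13 23:57:00 d18-3-left crmd[11763]:  warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped",
    "Apr 13 23:57:01 d18-3-left pengine[11762]:  warning: Scheduling Node d18-3-right.lab.archivas.com for STONITH",
]


def parse_timestamp(line: str, year: int = 2022) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp; the year is assumed."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_since_start(lines):
    """Return each line's offset in seconds from the first line."""
    start = parse_timestamp(lines[0])
    return [int((parse_timestamp(line) - start).total_seconds()) for line in lines]


if __name__ == "__main__":
    for offset, line in zip(seconds_since_start(LOG_LINES), LOG_LINES):
        # Print the offset and the message part after the "process[pid]:" prefix.
        print(f"+{offset:>3}s  {line.split(': ', 1)[1].strip()}")
```

The offsets come out as 0, 49, 51 and 52 seconds: the peer joins 49 seconds after corosync starts on d18-3-left, I_DC_TIMEOUT is logged 2 seconds after the join, and fencing is scheduled 1 second after that.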

Any idea on what is going on here?
Thanks,
_Vitaly