[ClusterLabs] Antw: [EXT] I_DC_TIMEOUT and node fenced when it joins the cluster
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Apr 19 03:20:31 EDT 2022
Hi!
It seems your network connection is unreliable and you don't have a "second
independent ring".
You may increase the timeout (as suggested), but that doesn't really fix your
networking.
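To make the timing in the quoted log concrete, here is a minimal Python sketch that measures the gaps between the key events. It uses four lines copied from the log below, with wrapped lines rejoined. Syslog timestamps carry no year, so 2022 is an assumption taken from the date of this thread. The helper names `parse_ts` and `first_ts` are hypothetical and not part of any cluster tooling.

```python
from datetime import datetime

# Four lines copied from the quoted log (wrapped lines rejoined,
# hostnames written with plain ASCII hyphens).
LOG = [
    "Apr 13 23:56:09 d18-3-left corosync[11465]: [MAIN ] Corosync Cluster Engine ('2.4.4'): started and ready to provide service.",
    "Apr 13 23:56:58 d18-3-left corosync[11466]: [TOTEM ] A new membership (172.16.1.1:60) was formed. Members joined: 1",
    "Apr 13 23:57:00 d18-3-left crmd[11763]: warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped",
    "Apr 13 23:57:01 d18-3-left pengine[11762]: warning: Scheduling Node d18-3-right.lab.archivas.com for STONITH",
]


def parse_ts(line, year=2022):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp (year is assumed)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def first_ts(lines, needle):
    """Return the timestamp of the first line that contains `needle`."""
    return next(parse_ts(line) for line in lines if needle in line)


started = first_ts(LOG, "started and ready")
joined = first_ts(LOG, "Members joined")
timeout = first_ts(LOG, "I_DC_TIMEOUT")
fenced = first_ts(LOG, "for STONITH")

# Gaps in seconds between the events of interest.
join_to_timeout = (timeout - joined).total_seconds()
start_to_timeout = (timeout - started).total_seconds()
timeout_to_fence = (fenced - timeout).total_seconds()

print(f"join -> I_DC_TIMEOUT:           {join_to_timeout:.0f} s")
print(f"corosync start -> I_DC_TIMEOUT: {start_to_timeout:.0f} s")
print(f"I_DC_TIMEOUT -> STONITH:        {timeout_to_fence:.0f} s")
```

With the log as posted, the timeout fires 2 seconds after the second node joins and 51 seconds after corosync starts on d18-3-left, and fencing is scheduled 1 second after that.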
Regards,
Ulrich
>>> vitaly <vitaly at unitc.com> wrote on 15.04.2022 at 14:26 in message
<1442265456.65535.1650025606161 at webmail6b.networksolutionsemail.com>:
> Hello Everybody.
> I occasionally see the following behavior on a two-node cluster.
> 1. Abruptly rebooting both nodes of the cluster (using "reboot")
> 2. Both nodes start to come up. Node d18-3-left (2) comes up first
> Apr 13 23:56:09 d18-3-left corosync[11465]: [MAIN ] Corosync Cluster Engine ('2.4.4'): started and ready to provide service.
>
> 3. Second node d18-3-right (1) joins the cluster
>
> Apr 13 23:56:58 d18-3-left corosync[11466]: [TOTEM ] A new membership (172.16.1.1:60) was formed. Members joined: 1
> Apr 13 23:56:58 d18-3-left corosync[11466]: [QUORUM] This node is within the primary component and will provide service.
> Apr 13 23:56:58 d18-3-left corosync[11466]: [QUORUM] Members[2]: 1 2
> Apr 13 23:56:58 d18-3-left corosync[11466]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 13 23:56:58 d18-3-left pacemakerd[11717]: notice: Quorum acquired
> Apr 13 23:56:58 d18-3-left crmd[11763]: notice: Quorum acquired
>
> 4. Two seconds later, node d18-3-left logs I_DC_TIMEOUT and starts fencing the newly joined node.
>
> Apr 13 23:57:00 d18-3-left crmd[11763]: warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
> After that we get:
> Apr 13 23:57:00 d18-3-left crmd[11763]: notice: State transition S_ELECTION -> S_INTEGRATION
> Apr 13 23:57:00 d18-3-left crmd[11763]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
>
> and the node is fenced:
> Apr 13 23:57:01 d18-3-left pengine[11762]: warning: Scheduling Node d18-3-right.lab.archivas.com for STONITH
> Apr 13 23:57:01 d18-3-left pengine[11762]: notice: * Fence (reboot) d18-3-right.lab.archivas.com 'node is unclean'
>
> 5. After this the node that was fenced comes up again and joins the cluster
> without any issues.
>
> Any idea on what is going on here?
> Thanks,
> _Vitaly