2-Node Cluster, 2 Corosync Rings, Why Failover?

Eric Robinson eric.robinson at psmnv.com
Sat Oct 8 23:25:56 EDT 2016

In a 2-node cluster where each node has two NICs connected to disjoint networks, and thus 2 corosync rings, why would loss of communication on one ring cause cluster failover?

We have the following setup...

               LAN A
       /                    \
NODE_A                        NODE_B
       \                    /
               LAN B

Everything on LAN A is physically separate from LAN B: different switches, cables, power, etc. For some reason, when either LAN A or LAN B suffers a failure, the cluster fails over. What would cause that?

This happened yesterday at 2:05 pm Pacific time. I have the corosync and pacemaker logs from both nodes for that timeframe, but they run to more than 20,000 lines. I can see where the failover happens (everything was running normally, then the logs went nuts), but I don't understand why. Can someone tell me what clues I should be looking for?
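One way to cut 20,000+ lines down to something readable is a keyword filter restricted to the minutes around the failover. The following is a minimal Python sketch, assuming each log line starts with a syslog-style timestamp such as "Oct 07 14:05:12". The keyword list is a hypothetical starting point, not an authoritative list of corosync or pacemaker messages; adjust it to what the logs actually contain.

```python
import re
import sys
from collections import Counter

# Hypothetical keywords (assumption): substrings that tend to show up around
# ring, membership, quorum, or fencing events. Matching is case-insensitive.
DEFAULT_PATTERNS = ("FAULTY", "TOTEM", "quorum", "fenc", "stonith", "membership")

# Syslog-style timestamp at the start of a line, e.g. "Oct 07 14:05:12".
TS_RE = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})")


def in_window(line, start, end):
    """True if the line's HH:MM is within [start, end] (zero-padded "HH:MM").

    Lines without a recognizable timestamp are kept so multi-line
    messages are not silently dropped. Windows crossing midnight are
    not supported by this simple string comparison.
    """
    m = TS_RE.match(line)
    if not m:
        return True
    hhmm = f"{m.group(3)}:{m.group(4)}"
    return start <= hhmm <= end


def scan(lines, patterns=DEFAULT_PATTERNS, start="00:00", end="23:59"):
    """Return (matching_lines, per_pattern_counts) for lines in the window."""
    hits = []
    counts = Counter()
    for line in lines:
        if not in_window(line, start, end):
            continue
        lowered = line.lower()
        matched = [p for p in patterns if p.lower() in lowered]
        if matched:
            hits.append(line.rstrip("\n"))
            counts.update(matched)
    return hits, counts


def main(argv):
    if len(argv) < 2:
        print("usage: scan_logs.py LOGFILE [START_HHMM END_HHMM]")
        return 1
    start, end = (argv[2], argv[3]) if len(argv) >= 4 else ("00:00", "23:59")
    # errors="replace" keeps the scan going past any non-UTF-8 bytes.
    with open(argv[1], errors="replace") as f:
        hits, counts = scan(f, start=start, end=end)
    for line in hits:
        print(line)
    print("--- matches per pattern ---")
    for pattern, n in counts.most_common():
        print(f"{pattern}: {n}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, `python scan_logs.py corosync.log 14:00 14:10` run against each node's logs would show the filtered lines. If the nodes log in UTC rather than Pacific time, the window has to be shifted accordingly. Comparing the earliest hits on the two nodes side by side is usually more telling than reading either node's log alone.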

Eric Robinson

