[ClusterLabs] Re: 2-Node Cluster, 2 Corosync Rings, Why Failover?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Oct 10 08:19:16 CEST 2016


>>> Eric Robinson <eric.robinson at psmnv.com> wrote on 09.10.2016 at 05:25 in
message
<DM5PR03MB2729771F3E4E94C4C49BF159FAD80 at DM5PR03MB2729.namprd03.prod.outlook.com>

> In a 2-node cluster where each node has two NICs connected to disjoint 
> networks, and thus 2 corosync rings, why would loss of communication on one 
> ring cause cluster failover?
> 
> We have the following setup...
> 
> 
>                    LAN A
>        /====SWITCH====SWITCH====\
>       /                          \
> NODE_A                            NODE_B
>       \                          /
>        \====SWITCH====SWITCH====/
>                    LAN B
> 
> 
> Everything on LAN A is physically separate from LAN B, different switches, 
> cables, power, etc. For some reason, when either LAN A or LAN B suffers a 
> failure, the cluster fails over. What would cause that?

Without logs, that's hard to say. We are running a similar configuration (SLES11) without such problems.
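
For comparison, a minimal totem section with both rings defined usually looks roughly like the sketch below (all addresses and ports are placeholders, not your actual networks). One thing worth checking is rrp_mode: it defaults to "none", and without "passive" (or "active") the second interface gives you no redundancy at all.

  totem {
      version: 2
      # rrp_mode defaults to "none"; it must be "passive" (or "active")
      # for the second ring to be used
      rrp_mode: passive
      interface {
          ringnumber: 0
          # LAN A network address (placeholder)
          bindnetaddr: 192.168.1.0
          mcastaddr: 239.255.1.1
          mcastport: 5405
      }
      interface {
          ringnumber: 1
          # LAN B network address (placeholder)
          bindnetaddr: 192.168.2.0
          mcastaddr: 239.255.2.1
          mcastport: 5407
      }
  }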

> 
> This happened yesterday at 2:05 pm Pacific time. I have the corosync and 
> pacemaker logs from both nodes during that timeframe, but they are 20,000+ 
> lines. I can see the failover happening (because everything was going along 
> normally, then the logs went nuts) but I don't understand why. Can someone 
> tell me what clues I should be looking for?
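
Rather than reading all 20,000+ lines, I would first grep out the totem/membership and fencing related messages from around 14:05 on both nodes, roughly like the sketch below (the log path and keywords are only a guess at a typical corosync/pacemaker installation, so adjust them to yours). The main clue is whether corosync merely marked one ring as faulty, or whether it actually formed a new membership without the peer node; normally only the latter should make Pacemaker fence the peer and move resources.

  # hypothetical log path and keywords -- adjust to your installation
  grep -E 'TOTEM|FAULTY|ringid|membership|quorum|fence|stonith' \
      /var/log/cluster/corosync.log | less
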
> 
> --
> Eric Robinson
