[ClusterLabs] Re: After reboot each node thinks the other is offline.

Tue Aug 1 02:13:58 EDT 2017

>>> "Stephen Carville (HA List)" <62d2a7ca at opayq.com> wrote on 31.07.2017 at
20:17 in message <d08c264a-6a84-b32b-049c-82d5ea929f3a at opayq.com>:
> I am experimenting with pacemaker for high availability for some load
> balancers.  I was able to successfully get two CentOS (6.9) machines
> (scahadev01da and scahadev01db) to form a cluster and the shared IP was
> assigned to scahadev01da.  I simulated a failure by halting the primary
> and the secondary eventually noticed, bringing up the shared IP on its
> eth0.  So far, so good.
> 
> A problem arises when the primary comes back up and, for some reason,
> each node thinks the other is offline.  This leads to both nodes adding

If a node thinks its peer has gone offline unexpectedly, it will fence the peer, and then the peer really is offline, so the IP cannot run there. Since both of your nodes are up and both are running the IP, I guess you have no fencing configured, right?
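
A minimal sketch for checking this on each node, assuming the pcs 0.9 command set that ships with CentOS 6 (subcommand names differ in later pcs releases). The function name show_fencing_config is made up for illustration, and the exact wording of the pcs output may vary between versions:

```shell
#!/bin/sh
# Print the fencing-related settings seen by the local cluster node.
# Assumption: pcs 0.9.x, where "pcs property show --all" lists every
# cluster property (including defaults) and "pcs stonith show" lists
# the configured fence devices.
show_fencing_config() {
    echo "== stonith-enabled property =="
    pcs property show --all | grep stonith-enabled
    echo "== configured fence devices =="
    pcs stonith show
}
```

Run show_fencing_config as root on both nodes. If stonith-enabled is false, or no fence device is listed, neither node can fence the other, which would match the duplicate IP you describe.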

Regards,
Ulrich

> the duplicate IP to their own eth0.  I probably do not need to tell you
> the mischief that can cause if these were production servers.
> 
> I tried restarting cman, pcsd and pacemaker on both machines with no
> effect on the situation.
> 
> I've found several mentions of it in the search engines but I've been
> unable to find how to fix it.  Any help is appreciated.
> 
> Both nodes have quorum disabled in /etc/sysconfig/cman
> 
> CMAN_QUORUM_TIMEOUT=0
> 
> #------------------------------------------------
> Node 1
> 
> scahadev01da# sudo pcs status
> Cluster name: scahadev01d
> Stack: cman
> Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
> Last updated: Mon Jul 31 10:43:54 2017		Last change: Mon Jul 31 10:30:46 2017 by root via cibadmin on scahadev01da
> 
> 2 nodes and 1 resource configured
> 
> Online: [ scahadev01da ]
> OFFLINE: [ scahadev01db ]
> 
> Full list of resources:
> 
>  VirtualIP	(ocf::heartbeat:IPaddr2):	Started scahadev01da
> 
> Daemon Status:
>   cman: active/enabled
>   corosync: active/disabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> #------------------------------------------------
> Node 2
> 
> scahadev01db ~]$ sudo pcs status
> Cluster name: scahadev01d
> Stack: cman
> Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
> Last updated: Mon Jul 31 10:43:47 2017		Last change: Sat Jul 29 13:45:15 2017 by root via cibadmin on scahadev01da
> 
> 2 nodes and 1 resource configured
> 
> Online: [ scahadev01db ]
> OFFLINE: [ scahadev01da ]
> 
> Full list of resources:
> 
>  VirtualIP	(ocf::heartbeat:IPaddr2):	Started scahadev01db
> 
> Daemon Status:
>   cman: active/enabled
>   corosync: active/disabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> --
> Stephen Carville
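
The two status listings above show the symptom directly: each node reports itself online and its peer offline. As a rough sketch, assuming the output of pcs status from each node has been saved to a file, the comparison could be scripted as follows. The helper names nodes_in_state and mutual_offline are hypothetical, and the parsing relies on the "Online: [ ... ]" and "OFFLINE: [ ... ]" line format seen in the listings:

```shell
#!/bin/sh
# Print the node names from the "Online:" or "OFFLINE:" line of saved
# `pcs status` output read on stdin. $1 is the label to look for.
nodes_in_state() {
    sed -n "s/^$1: \[ \(.*\) ]\$/\1/p"
}

# Succeed (exit status 0) when the two saved status files disagree in
# the way shown above: whatever one node reports as online, the other
# reports as offline, and vice versa.
# $1 and $2 are paths to the saved output from each node.
mutual_offline() {
    a_on=$(nodes_in_state Online < "$1")
    a_off=$(nodes_in_state OFFLINE < "$1")
    b_on=$(nodes_in_state Online < "$2")
    b_off=$(nodes_in_state OFFLINE < "$2")
    [ -n "$a_on" ] && [ "$a_on" = "$b_off" ] && [ "$b_on" = "$a_off" ]
}
```

For example, after running "pcs status > /tmp/status_da" on one node and "pcs status > /tmp/status_db" on the other and copying both files to one machine, "mutual_offline /tmp/status_da /tmp/status_db && echo split" prints "split" for output like the above. The file paths are placeholders.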