[ClusterLabs] After reboot each node thinks the other is offline.

Stephen Carville (HA List) 62d2a7ca at opayq.com
Mon Jul 31 18:17:14 UTC 2017


I am experimenting with pacemaker for high availability for some load
balancers.  I was able to successfully get two CentOS (6.9) machines
(scahadev01da and scahadev01db) to form a cluster and the shared IP was
assigned to scahadev01da.  I simulated a failure by halting the primary,
and the secondary eventually noticed and brought up the shared IP on its
eth0.  So far, so good.

A problem arises when the primary comes back up and, for some reason,
each node thinks the other is offline.  As a result, both nodes add the
shared IP to their own eth0, so the same address is active on two
machines at once.  I probably do not need to tell you the mischief that
could cause if these were production servers.

I tried restarting cman, pcsd and pacemaker on both machines with no
effect on the situation.

I've found several mentions of this problem through search engines, but
I've been unable to find out how to fix it.  Any help is appreciated.

Both nodes have quorum disabled in /etc/sysconfig/cman:

CMAN_QUORUM_TIMEOUT=0

#------------------------------------------------
Node 1

scahadev01da# sudo pcs status
Cluster name: scahadev01d
Stack: cman
Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
Last updated: Mon Jul 31 10:43:54 2017		Last change: Mon Jul 31 10:30:46 2017 by root via cibadmin on scahadev01da

2 nodes and 1 resource configured

Online: [ scahadev01da ]
OFFLINE: [ scahadev01db ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started scahadev01da

Daemon Status:
  cman: active/enabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled

#------------------------------------------------
Node 2

scahadev01db ~]$ sudo pcs status
Cluster name: scahadev01d
Stack: cman
Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
Last updated: Mon Jul 31 10:43:47 2017		Last change: Sat Jul 29 13:45:15 2017 by root via cibadmin on scahadev01da

2 nodes and 1 resource configured

Online: [ scahadev01db ]
OFFLINE: [ scahadev01da ]

Full list of resources:

 VirtualIP	(ocf::heartbeat:IPaddr2):	Started scahadev01db

Daemon Status:
  cman: active/enabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled
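
#------------------------------------------------
The symptom in the two outputs above (each node lists itself as Online
and the other as OFFLINE, and VirtualIP is Started on both) can be
checked mechanically.  What follows is only a rough Python sketch,
assuming the `pcs status` text has been captured from each node.  The
function names are hypothetical and are not part of pcs, and the sample
strings are trimmed copies of the relevant lines above.

```python
import re

def parse_pcs_status(text):
    """Pull node lists and resource placement out of `pcs status` text."""
    status = {"online": set(), "offline": set(), "resources": {}}
    for line in text.splitlines():
        line = line.strip()
        # Lines such as "Online: [ scahadev01da ]" or "OFFLINE: [ ... ]"
        m = re.match(r"^(Online|OFFLINE):\s*\[(.*)\]$", line)
        if m:
            key = "online" if m.group(1) == "Online" else "offline"
            status[key].update(m.group(2).split())
            continue
        # Lines such as "VirtualIP (ocf::heartbeat:IPaddr2): Started node"
        m = re.match(r"^(\S+)\s+\((\S+)\):\s+Started\s+(\S+)$", line)
        if m:
            status["resources"][m.group(1)] = m.group(3)
    return status

def looks_like_split_brain(status_a, status_b):
    """True when each view marks offline a node the other view has online."""
    a_denies_b = status_b["online"] & status_a["offline"]
    b_denies_a = status_a["online"] & status_b["offline"]
    return bool(a_denies_b and b_denies_a)

def doubly_started(status_a, status_b):
    """Resources that the two views report as Started on different nodes."""
    return sorted(
        name for name, node in status_a["resources"].items()
        if name in status_b["resources"]
        and status_b["resources"][name] != node
    )

# Trimmed copies of the status lines from each node.
NODE1 = """\
Online: [ scahadev01da ]
OFFLINE: [ scahadev01db ]
 VirtualIP\t(ocf::heartbeat:IPaddr2):\tStarted scahadev01da
"""
NODE2 = """\
Online: [ scahadev01db ]
OFFLINE: [ scahadev01da ]
 VirtualIP\t(ocf::heartbeat:IPaddr2):\tStarted scahadev01db
"""

view_a = parse_pcs_status(NODE1)
view_b = parse_pcs_status(NODE2)
print("split brain suspected:", looks_like_split_brain(view_a, view_b))
print("started on both sides:", doubly_started(view_a, view_b))
```

This only compares the two views; it says nothing about why the nodes
cannot see each other.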

--
Stephen Carville



