[ClusterLabs] Re: After reboot each node thinks the other is offline.

Tue Aug 1 04:05:00 EDT 2017

On 07/31/2017 11:13 PM, Ulrich Windl [Masked] wrote:

>> I am experimenting with pacemaker for high availability for some load
>> balancers.  I was able to successfully get two CentOS (6.9) machines
>> (scahadev01da and scahadev01db) to form a cluster and the shared IP was
>> assigned to scahadev01da.  I simulated a failure by halting the primary
>> and the secondary eventually noticed, bringing up the shared IP on its
>> eth0.  So far, so good.
>>
>> A problem arises when the primary comes back up and, for some reason,
>> each node thinks the other is offline.  This leads to both nodes adding
> 
> If a node thinks the other is unexpectedly offline, it will fence it, and then it will be offline! Thus the IP can't run there. I guess you have no fencing configured, right?

No. I didn't realize it was necessary unless there was shared storage
involved.  I guess it is time to go back to the drawing board.  Can
clustering even be done reliably on CentOS 6?  I have no objection to
moving to 7 but I was hoping I could get this up quicker than building
out a bunch of new balancers.

On a related note: I tried rebooting both nodes and each node still
thinks the other is offline.  For future reference, is there a way to
clear that?
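
A minimal Python sketch for spotting this symptom from two saved
`pcs status` outputs, such as the ones quoted further down in this
message. It is an illustration only, not part of pcs or pacemaker: the
function names `parse_membership` and `mutually_offline` are
hypothetical, and the parsing assumes the `Online: [ ... ]` and
`OFFLINE: [ ... ]` line format seen in that output. It only detects the
disagreement; it does not clear it or replace fencing.

```python
import re


def parse_membership(status_text):
    """Return (online, offline) sets of node names from `pcs status` text."""
    online, offline = set(), set()
    for line in status_text.splitlines():
        # Drop any mail-quote prefix such as ">> " before matching.
        line = line.lstrip("> ").strip()
        match = re.match(r"(Online|OFFLINE):\s*\[(.*)\]", line)
        if match:
            names = set(match.group(2).split())
            if match.group(1) == "Online":
                online.update(names)
            else:
                offline.update(names)
    return online, offline


def mutually_offline(status_a, status_b):
    """True if each side reports nodes online that the other lists as offline."""
    online_a, _offline_a = parse_membership(status_a)
    online_b, _offline_b = parse_membership(status_b)
    offline_a, offline_b = _offline_a, _offline_b
    return (
        bool(online_a)
        and bool(online_b)
        and online_a <= offline_b
        and online_b <= offline_a
    )
```

Feeding it the status text captured on each node returns True when both
nodes claim to be the only one online, which is the state described
above.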

> Regards,
> Ulrich
> 
>> the duplicate IP to its own eth0.  I probably do not need to tell you
>> the mischief that can cause if these were production servers.
>>
>> I tried restarting cman, pcsd and pacemaker on both machines with no
>> effect on the situation.
>>
>> I've found several mentions of this problem via search engines, but
>> I've been unable to find how to fix it.  Any help is appreciated.
>>
>> Both nodes have quorum disabled in /etc/sysconfig/cman
>>
>> CMAN_QUORUM_TIMEOUT=0
>>
>> #------------------------------------------------
>> Node 1
>>
>> scahadev01da# sudo pcs status
>> Cluster name: scahadev01d
>> Stack: cman
>> Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
>> Last updated: Mon Jul 31 10:43:54 2017		Last change: Mon Jul 31 10:30:46 2017 by root via cibadmin on scahadev01da
>>
>> 2 nodes and 1 resource configured
>>
>> Online: [ scahadev01da ]
>> OFFLINE: [ scahadev01db ]
>>
>> Full list of resources:
>>
>>  VirtualIP	(ocf::heartbeat:IPaddr2):	Started scahadev01da
>>
>> Daemon Status:
>>   cman: active/enabled
>>   corosync: active/disabled
>>   pacemaker: active/enabled
>>   pcsd: active/enabled
>>
>> #------------------------------------------------
>> Node 2
>>
>> scahadev01db ~]$ sudo pcs status
>> Cluster name: scahadev01d
>> Stack: cman
>> Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
>> Last updated: Mon Jul 31 10:43:47 2017		Last change: Sat Jul 29 13:45:15 2017 by root via cibadmin on scahadev01da
>>
>> 2 nodes and 1 resource configured
>>
>> Online: [ scahadev01db ]
>> OFFLINE: [ scahadev01da ]
>>
>> Full list of resources:
>>
>>  VirtualIP	(ocf::heartbeat:IPaddr2):	Started scahadev01db
>>
>> Daemon Status:
>>   cman: active/enabled
>>   corosync: active/disabled
>>   pacemaker: active/enabled
>>   pcsd: active/enabled
>>
>> --
>> Stephen Carville
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org 