[ClusterLabs] two node cluster. each node shows other node offline.

Mon May 22 15:23:51 CEST 2017

Hello all,

I have two nginx nodes running nginx/1.11.10 (nginx-plus-r12-p2),
Corosync Cluster Engine 2.3.5, and Pacemaker 1.1.14 on
Ubuntu 16.04.1 LTS.

This cluster is intended to replace our old nginx cluster, which runs on
Ubuntu 14.04 with older versions of corosync/pacemaker.

On initial setup of the cluster everything works wonderfully: I can put
a node on standby and failover works as expected. However, if I reboot one
of the nodes, the cluster gets into a split-brain situation where each node
thinks the other node is offline. I've tried numerous things to correct it,
but I cannot get both nodes to show as online.

crm status from nginx1:

root@prod-nginx1:~# crm status
Online: [ prod-nginx1 ]
OFFLINE: [ prod-nginx2 ]

Full list of resources:

 ClusterIP (ocf::heartbeat:IPaddr2): Started prod-nginx1
 ClusterIPRestricted (ocf::heartbeat:IPaddr2): Started prod-nginx1
 Nginx (ocf::heartbeat:nginx): Started prod-nginx1

and crm status from nginx2:

root@prod-nginx2:~# crm status
Online: [ prod-nginx2 ]
OFFLINE: [ prod-nginx1 ]

Full list of resources:

 ClusterIP (ocf::heartbeat:IPaddr2): Started prod-nginx2
 ClusterIPRestricted (ocf::heartbeat:IPaddr2): Started prod-nginx2
 Nginx (ocf::heartbeat:nginx): Started prod-nginx2
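
To make the symptom in the two outputs above explicit, here is a minimal sketch that parses both `crm status` views and flags the split. The function names and the two constants are hypothetical helpers written for this illustration; the parsing only covers the line shapes shown above, not the full `crm status` format.

```python
import re

# The two views pasted above, one per node (abbreviated to the relevant lines).
NGINX1_VIEW = """\
Online: [ prod-nginx1 ]
OFFLINE: [ prod-nginx2 ]
 ClusterIP (ocf::heartbeat:IPaddr2): Started prod-nginx1
 ClusterIPRestricted (ocf::heartbeat:IPaddr2): Started prod-nginx1
 Nginx (ocf::heartbeat:nginx): Started prod-nginx1
"""
NGINX2_VIEW = """\
Online: [ prod-nginx2 ]
OFFLINE: [ prod-nginx1 ]
 ClusterIP (ocf::heartbeat:IPaddr2): Started prod-nginx2
 ClusterIPRestricted (ocf::heartbeat:IPaddr2): Started prod-nginx2
 Nginx (ocf::heartbeat:nginx): Started prod-nginx2
"""


def parse_crm_status(text):
    """Return (online nodes, offline nodes, {resource: node}) for one view."""
    online, offline, started = set(), set(), {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"(Online|OFFLINE): \[(.*)\]", line)
        if m:
            nodes = set(m.group(2).split())
            (online if m.group(1) == "Online" else offline).update(nodes)
            continue
        m = re.match(r"(\S+)\s+\(\S+\):\s+Started\s+(\S+)", line)
        if m:
            started[m.group(1)] = m.group(2)
    return online, offline, started


def split_brain_report(view_a, view_b):
    """Return (mutually_offline, resources started on different nodes)."""
    on_a, off_a, res_a = parse_crm_status(view_a)
    on_b, off_b, res_b = parse_crm_status(view_b)
    # Each side considers itself online while the other side lists it offline.
    mutual = bool(on_a & off_b) and bool(on_b & off_a)
    # Resources that both views report as started, but on different nodes.
    doubled = sorted(r for r in res_a if r in res_b and res_a[r] != res_b[r])
    return mutual, doubled


if __name__ == "__main__":
    print(split_brain_report(NGINX1_VIEW, NGINX2_VIEW))
```

For the outputs above this reports that the nodes see each other as offline and that all three resources are started on both nodes at once.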

I've tried forcing the nodes back online and restarting both pacemaker and
corosync on both servers, but nothing seems to work. I do not have this
issue with corosync/pacemaker on Ubuntu 14.04.

Here is the current corosync.conf, which works on Ubuntu 14.04:

totem {
        version: 2
        secauth: on
        cluster_name: pacemaker1
        transport: udpu
        token: 1000
        token_retransmits_before_loss_const: 10
}

nodelist {
        node {
                ring0_addr: 10.10.16.100
                nodeid: 101
        }
        node {
                ring0_addr: 10.10.16.101
                nodeid: 102
        }
}

quorum {
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 1
        last_man_standing: 1
        auto_tie_breaker: 0
}

logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to no. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: no
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: off
        # Log messages with time stamps. When in doubt, set to on
        # (unless you are only logging to syslog, where double
        # timestamps can be annoying).
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
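
As a simplified illustration of the quorum arithmetic behind `two_node: 1` in the config above: votequorum normally requires a majority of the expected votes, but in two-node mode a single vote is enough. The sketch below models only that rule. The function names are hypothetical, and the model deliberately ignores `wait_for_all`, `last_man_standing`, and `auto_tie_breaker`, so it is not a description of what corosync actually does at startup.

```python
def quorum_threshold(expected_votes, two_node=False):
    """Votes needed for quorum in this simplified model."""
    if two_node and expected_votes == 2:
        # Two-node mode: one vote is sufficient.
        return 1
    # Ordinary majority rule.
    return expected_votes // 2 + 1


def is_quorate(visible_votes, expected_votes, two_node=False):
    """True if a partition seeing `visible_votes` votes would be quorate."""
    return visible_votes >= quorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # A lone node in a two-node cluster, with and without two_node.
    print(is_quorate(1, 2, two_node=True))
    print(is_quorate(1, 2, two_node=False))
```

Under this model, two nodes that cannot see each other are each quorate on their own, which is consistent with both of them starting every resource as shown in the status outputs.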