[ClusterLabs] Merging partitioned two_node cluster?

Nickle, Richard rnickle at holycross.edu
Mon May 4 23:39:54 EDT 2020


I have a two-node cluster managing a VIP for an SMTP service.  This could
be active/active, since it doesn't matter which node accepts the SMTP
connection, but I wanted a VIP in place so that there is a well-known
address.

This service has been running for quite a while with no problems.  All of a
sudden the cluster partitioned, and now I can't work out a good way to get
the two partitions to merge again.  Right now one node (mail2) holds the
resource and starts the VIP, but doesn't see the other node.  The other node
(mail3) has no resources configured and doesn't see mail2 either.

At this point, I am perfectly willing to create another node and make an
odd-numbered cluster, the arguments for this being fairly persuasive.  But
I'm not sure why they are blocking.

Surely there must be some manual way to get a partitioned cluster to
merge?  Some trick?  Several weeks ago I had a scenario in which an
odd-numbered cluster configured in a similar way split into a 3-node and a
2-node partition.  I was unable to work out how to get them to merge until,
all of a sudden, they seemed to fix themselves after a 'pcs cluster node
remove' followed by a 'pcs cluster node add', which had failed many times
before.  I have tried that here, but with no success so far.

I ruled out some common causes I've seen in discussions and threads, such as
having my host name mapped to localhost in /etc/hosts.

Corosync 2.4.3, Pacemaker 1.1.18, pcs 0.9.164 (Ubuntu 18.04).

Output from pcs status on each node, mail2 first and then mail3:

Cluster name: mail
Stack: corosync
Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon May  4 23:28:53 2020
Last change: Mon May  4 21:50:04 2020 by hacluster via crmd on mail2

2 nodes configured
1 resource configured

Online: [ mail2 ]
OFFLINE: [ mail3 ]

Full list of resources:

 mail_vip (ocf::heartbeat:IPaddr2): Started mail2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Cluster name: mail
Stack: corosync
Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon May  4 22:13:10 2020
Last change: Mon May  4 22:10:34 2020 by root via cibadmin on mail3

2 nodes configured
0 resources configured

Online: [ mail3 ]
OFFLINE: [ mail2 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
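
To make the symptom explicit, here is a minimal Python sketch that compares
the two status outputs above. The helper names (summarize, looks_split) and
the abbreviated status strings are made up for illustration; this is not
part of pcs, just a way of stating what I am seeing: both sides claim
quorum, and each lists the other's node as offline.

```python
import re

# Abbreviated copies of the two `pcs status` outputs shown above.
STATUS_MAIL2 = """\
Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
Online: [ mail2 ]
OFFLINE: [ mail3 ]
"""

STATUS_MAIL3 = """\
Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
Online: [ mail3 ]
OFFLINE: [ mail2 ]
"""


def summarize(status_text):
    """Pull the DC, the quorum claim, and the node lists out of status text."""
    dc = re.search(r"^Current DC: (\S+)", status_text, re.M)
    online = re.search(r"^Online: \[ (.*?) \]", status_text, re.M)
    offline = re.search(r"^OFFLINE: \[ (.*?) \]", status_text, re.M)
    return {
        "dc": dc.group(1) if dc else None,
        "quorate": "partition with quorum" in status_text,
        "online": set(online.group(1).split()) if online else set(),
        "offline": set(offline.group(1).split()) if offline else set(),
    }


def looks_split(a, b):
    """True when both sides claim quorum and each sees the other as offline."""
    return (
        a["quorate"]
        and b["quorate"]
        and a["online"] <= b["offline"]
        and b["online"] <= a["offline"]
    )


print(looks_split(summarize(STATUS_MAIL2), summarize(STATUS_MAIL3)))  # prints True
```

Each node is its own DC and considers itself quorate, which is why I
describe this as two separate one-node clusters rather than one degraded
cluster.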

/etc/corosync/corosync.conf:

totem {
    version: 2
    cluster_name: mail
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.80.128
        mcastport: 5405
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
}

quorum {
    provider: corosync_votequorum
    wait_for_all: 0
    two_node: 1
}

nodelist {
    node {
        ring0_addr: mail2
        name: mail2
        nodeid: 1
    }

    node {
        ring0_addr: mail3
        name: mail3
        nodeid: 2
    }
}
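
As far as I understand votequorum(5), two_node: 1 sets the quorum threshold
to 1 instead of a strict majority, and wait_for_all: 0 overrides the
wait_for_all behaviour that two_node would otherwise enable. If that reading
is right, it would explain why both partitions above report "partition with
quorum". A small Python sketch of that arithmetic follows; the function
names are hypothetical and this only models the rule as I read it, not
corosync's actual code.

```python
def quorum_threshold(expected_votes, two_node=False):
    """Votes a partition needs to be quorate (my reading of votequorum(5))."""
    if two_node and expected_votes == 2:
        # two_node: 1 lets a single surviving node keep quorum.
        return 1
    # Ordinary case: strict majority of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes, two_node=False):
    """True when the partition holds at least the threshold number of votes."""
    return votes_present >= quorum_threshold(expected_votes, two_node)


# This cluster: each one-node partition is quorate on its own.
print(is_quorate(1, 2, two_node=True))       # prints True
# The earlier five-node split: only the 3-node side has a majority.
print(is_quorate(3, 5), is_quorate(2, 5))    # prints True False
```

So in the two-node case quorum alone cannot stop both halves from running
independently, whereas in the 3/2 split only the larger partition should
have kept quorum.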

Thanks!

Rick