[ClusterLabs] Merging partitioned two_node cluster?
Nickle, Richard
rnickle at holycross.edu
Mon May 4 23:39:54 EDT 2020
I have a two-node cluster managing a VIP for an SMTP service. It could be
active/active, since it doesn't matter which node accepts the SMTP
connection, but I wanted a VIP in place so that there is a well-known
address.
This service has been running for quite a while with no problems. All of a
sudden it partitioned, and now I can't work out a good way to get the two
partitions to merge back into one cluster. Right now one partition (mail2)
takes the resource and starts the VIP, but doesn't see the other node. The
other node (mail3) has no resource configured and can't seem to see mail2
either.
At this point, I am perfectly willing to create another node and make an
odd-numbered cluster, the arguments for this being fairly persuasive. But
I'm not sure why they are blocking.
Surely there must be some manual way to get a partitioned cluster to
merge? Some trick? Several weeks ago I also had an odd-numbered cluster,
configured in a similar way, split into a 3-node and a 2-node partition. I
was unable to work out how to get them to merge, until all of a sudden they
seemed to fix themselves after a 'pcs node remove/pcs node add' which had
failed many times before. I have tried that here, but with no success so
far.
I ruled out some common causes I've seen in discussions and threads, such as
having my host name mapped to localhost in /etc/hosts, etc.
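For reference, a minimal sketch of how that check can be scripted. The function name loopback_addresses is made up for illustration; it only uses the Python standard library and flags anything in 127.0.0.0/8 or ::1, which would also catch a 127.0.1.1 entry for the host's own name.

```python
import ipaddress
import socket


def loopback_addresses(hostname):
    """Return the loopback addresses that `hostname` resolves to, if any.

    Hypothetical helper: resolution goes through the system resolver,
    so entries in /etc/hosts are taken into account.
    """
    infos = socket.getaddrinfo(hostname, None)
    # sockaddr[0] is the address string; drop any IPv6 scope suffix ("%eth0").
    addrs = {info[4][0].split("%")[0] for info in infos}
    return sorted(a for a in addrs if ipaddress.ip_address(a).is_loopback)
```

Running loopback_addresses("mail2") and loopback_addresses("mail3") on each node should return an empty list if the node names resolve to real interface addresses. A socket.gaierror means the name does not resolve at all.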
Corosync 2.4.3, Pacemaker 1.1.18, pcs 0.9.164 (Ubuntu 18.04).
Output from pcs status on each node (mail2 first, then mail3):
Cluster name: mail
Stack: corosync
Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon May 4 23:28:53 2020
Last change: Mon May 4 21:50:04 2020 by hacluster via crmd on mail2
2 nodes configured
1 resource configured
Online: [ mail2 ]
OFFLINE: [ mail3 ]
Full list of resources:
mail_vip (ocf::heartbeat:IPaddr2): Started mail2
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Cluster name: mail
Stack: corosync
Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon May 4 22:13:10 2020
Last change: Mon May 4 22:10:34 2020 by root via cibadmin on mail3
2 nodes configured
0 resources configured
Online: [ mail3 ]
OFFLINE: [ mail2 ]
No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
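The two outputs above disagree in a specific way: each node is its own DC, each claims quorum, and each lists the other as offline. A rough sketch of that comparison follows, assuming the status text has the same layout as shown here. The names parse_status and looks_split are made up for illustration, and the sample strings are excerpts of the output above.

```python
import re

MAIL2_VIEW = """\
Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
Online: [ mail2 ]
OFFLINE: [ mail3 ]
"""

MAIL3_VIEW = """\
Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
Online: [ mail3 ]
OFFLINE: [ mail2 ]
"""


def parse_status(text):
    """Pull the DC, quorum claim, and node lists out of `pcs status` text."""
    def nodes(label):
        m = re.search(rf"^{label}: \[ (.*?) \]", text, re.MULTILINE)
        return set(m.group(1).split()) if m else set()

    dc = re.search(r"^Current DC: (\S+)", text, re.MULTILINE)
    return {
        "dc": dc.group(1) if dc else None,
        "quorate": "partition with quorum" in text,
        "online": nodes("Online"),
        "offline": nodes("OFFLINE"),
    }


def looks_split(a, b):
    """True when both views claim quorum and each sees the other side as offline."""
    return (
        bool(a["online"]) and bool(b["online"])
        and a["quorate"] and b["quorate"]
        and a["online"] <= b["offline"]
        and b["online"] <= a["offline"]
    )


print(looks_split(parse_status(MAIL2_VIEW), parse_status(MAIL3_VIEW)))
```

This only restates what the outputs show; it says nothing about why the nodes stopped seeing each other.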
/etc/corosync/corosync.conf:
totem {
    version: 2
    cluster_name: mail
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.80.128
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
}
quorum {
    provider: corosync_votequorum
    wait_for_all: 0
    two_node: 1
}
nodelist {
    node {
        ring0_addr: mail2
        name: mail2
        nodeid: 1
    }
    node {
        ring0_addr: mail3
        name: mail3
        nodeid: 2
    }
}
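As I understand votequorum, two_node: 1 lowers the quorum threshold to 1 vote, and wait_for_all: 0 overrides the wait_for_all that two_node would otherwise turn on. That would explain why both partitions report "partition with quorum". A minimal sketch of the arithmetic, with function names invented for illustration; rejecting two_node for anything other than 2 expected votes is a simplification in this sketch, not a statement about how corosync handles it.

```python
def quorum_threshold(expected_votes, two_node=False):
    """Votes needed for quorum under a simplified votequorum model."""
    if two_node:
        if expected_votes != 2:
            # Simplification for this sketch only.
            raise ValueError("two_node only makes sense with 2 expected votes")
        return 1
    # Ordinary majority: more than half of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(visible_votes, expected_votes, two_node=False):
    """True if a partition seeing `visible_votes` votes has quorum."""
    return visible_votes >= quorum_threshold(expected_votes, two_node)
```

With these definitions, is_quorate(1, 2, two_node=True) is true for each isolated node here, while in the earlier 3/2 split only the 3-node side passes: is_quorate(3, 5) is true and is_quorate(2, 5) is false.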
Thanks!
Rick