[ClusterLabs] Merging partitioned two_node cluster?
Jan Friesse
jfriesse at redhat.com
Tue May 5 03:26:03 EDT 2020
> On May 5, 2020 6:39:54 AM GMT+03:00, "Nickle, Richard" <rnickle at holycross.edu> wrote:
>> I have a two node cluster managing a VIP. The service is an SMTP service.
>> This could be active/active, it doesn't matter which node accepts the SMTP
>> connection, but I wanted to make sure that a VIP was in place so that there
>> was a well-known address.
>>
>> This service has been running for quite a while with no problems. All of a
>> sudden, it partitioned, and now I can't work out a good way to get them to
>> merge the clusters back again. Right now one partition takes the resource
>> and starts the VIP, but doesn't see the other node. The other node doesn't
>> create a resource, and can't seem to see the other node.
>>
>> At this point, I am perfectly willing to create another node and make an
>> odd-numbered cluster, the arguments for this being fairly persuasive. But
>> I'm not sure why they are blocking.
>>
>> Surely there must be some manual way to get a partitioned cluster to
>> merge? Some trick? I also had a scenario several weeks ago where an
>> odd-numbered cluster configured in a similar way partitioned into a 3 and 2
>> node cluster, and I was unable to work out how to get them to merge, until
>> all of a sudden they seemed to fix themselves after doing a 'pcs node
>> remove/pcs node add' which had failed many times before. I have tried that
>> here but with no success so far.
>>
>> I ruled out some common cases I've seen in discussions and threads, such as
>> having my host name defined in the hosts file as localhost, etc.
>>
>> Corosync 2.4.3, Pacemaker 1.1.18, pcs 0.9.164 (Ubuntu 18.04).
>>
>> Output from pcs status for both nodes:
>>
>> Cluster name: mail
>> Stack: corosync
>> Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
>> Last updated: Mon May 4 23:28:53 2020
>> Last change: Mon May 4 21:50:04 2020 by hacluster via crmd on mail2
>>
>> 2 nodes configured
>> 1 resource configured
>>
>> Online: [ mail2 ]
>> OFFLINE: [ mail3 ]
>>
>> Full list of resources:
>>
>> mail_vip (ocf::heartbeat:IPaddr2): Started mail2
>>
>> Daemon Status:
>> corosync: active/enabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>>
>> Cluster name: mail
>> Stack: corosync
>> Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
>> Last updated: Mon May 4 22:13:10 2020
>> Last change: Mon May 4 22:10:34 2020 by root via cibadmin on mail3
>>
>> 2 nodes configured
>> 0 resources configured
>>
>> Online: [ mail3 ]
>> OFFLINE: [ mail2 ]
>>
>> No resources
>>
>> Daemon Status:
>> corosync: active/enabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>>
>> /etc/corosync/corosync.conf:
>>
>> totem {
>>     version: 2
>>     cluster_name: mail
>>     clear_node_high_bit: yes
>>     crypto_cipher: none
>>     crypto_hash: none
>>
>>     interface {
>>         ringnumber: 0
>>         bindnetaddr: 192.168.80.128
>>         mcastport: 5405
>>     }
>> }
>>
>> logging {
>>     fileline: off
>>     to_stderr: no
>>     to_logfile: no
>>     to_syslog: yes
>>     syslog_facility: daemon
>>     debug: off
>>     timestamp: on
>> }
>>
>> quorum {
>>     provider: corosync_votequorum
>>     wait_for_all: 0
>>     two_node: 1
>> }
>>
>> nodelist {
>>     node {
>>         ring0_addr: mail2
>>         name: mail2
>>         nodeid: 1
>>     }
>>
>>     node {
>>         ring0_addr: mail3
>>         name: mail3
>>         nodeid: 2
>>     }
>> }
>>
>> Thanks!
>>
>> Rick
>
> Ah Rick, All
>
> Just ignore the previous one - I guess I'm too sleepy.
Honestly, I think your advice was good. The current config uses the default
transport, which for 2.4.3 means multicast, so trying unicast (udpu) may
solve the problem.
If not, I would take a look at the classic things like the firewall, ...
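
For reference, a rough sketch of what the udpu change could look like in the
totem section of the posted corosync.conf. Only the transport line is new;
everything else is copied from the config quoted above. This is an assumption
about how the suggestion would be applied, not a tested fix. The same change
would have to go into the config on both nodes, and corosync restarted on each.

```
totem {
    version: 2
    cluster_name: mail
    # Assumed addition: switch from the default (multicast) to UDP unicast.
    # With udpu, peers are taken from the existing nodelist section.
    transport: udpu
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.80.128
        # Still used as the UDP port, so it must be open between the nodes.
        mcastport: 5405
    }
}
```
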
Regards,
Honza
>
>
> Best Regards,
> Strahil Nikolov