[ClusterLabs] Merging partitioned two_node cluster?

Tue May 5 01:50:17 EDT 2020

On 05.05.2020 06:39, Nickle, Richard wrote:
> I have a two-node cluster managing a VIP.  The service is an SMTP service.
> This could be active/active; it doesn't matter which node accepts the SMTP
> connection, but I wanted to make sure that a VIP was in place so that there
> was a well-known address.
> 
> This service has been running for quite a while with no problems.  All of a
> sudden, it partitioned, and now I can't work out a good way to get the two
> partitions to merge back together.  Right now one partition takes the
> resource and starts the VIP, but doesn't see the other node.  The other node
> doesn't create a resource, and can't seem to see its peer either.
> 
> At this point, I am perfectly willing to create another node and make an
> odd-numbered cluster, the arguments for this being fairly persuasive.  But
> I'm not sure why they are blocking.
> 
> Surely there must be some manual way to get a partitioned cluster to
> merge? 

It does so automatically if the nodes can communicate with each other. You
seem to have network connectivity issues, which you need to investigate
and resolve.

> Some trick?  I also had a scenario several weeks ago where an
> odd-numbered cluster configured in a similar way split into a 3-node and a
> 2-node partition, and I was unable to work out how to get them to merge,
> until all of a sudden they seemed to fix themselves after doing a 'pcs node
> remove/pcs node add', which had failed many times before.  I have tried that
> here, but with no success so far.
> 
> I ruled out some common cases I've seen in discussions and threads, such as
> having my host name mapped to localhost in the hosts file, etc.
> 
> Corosync 2.4.3, pcs 0.9.164, Pacemaker 1.1.18 (Ubuntu 18.04).
> 
> Output from pcs status on both nodes (mail2 first, then mail3):
> 
> Cluster name: mail
> Stack: corosync
> Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Mon May  4 23:28:53 2020
> Last change: Mon May  4 21:50:04 2020 by hacluster via crmd on mail2
> 
> 2 nodes configured
> 1 resource configured
> 
> Online: [ mail2 ]
> OFFLINE: [ mail3 ]
> 
> Full list of resources:
> 
>  mail_vip (ocf::heartbeat:IPaddr2): Started mail2
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> Cluster name: mail
> Stack: corosync
> Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Mon May  4 22:13:10 2020
> Last change: Mon May  4 22:10:34 2020 by root via cibadmin on mail3
> 
> 2 nodes configured
> 0 resources configured
> 
> Online: [ mail3 ]
> OFFLINE: [ mail2 ]
> 
> No resources
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> /etc/corosync/corosync.conf:
> 
> totem {
>     version: 2
>     cluster_name: mail
>     clear_node_high_bit: yes
>     crypto_cipher: none
>     crypto_hash: none
> 
>     interface {
>         ringnumber: 0
>         bindnetaddr: 192.168.80.128
>         mcastport: 5405
>     }
> }
> 

Is the interconnect attached to LAN switches, or is it a direct cable
between the two hosts?

> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: no
>     to_syslog: yes
>     syslog_facility: daemon
>     debug: off
>     timestamp: on
> }
> 
> quorum {
>     provider: corosync_votequorum
>     wait_for_all: 0
>     two_node: 1
> }
> 
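
Note also that both partitions report "partition with quorum". That follows from the quorum section above: per votequorum(5), two_node: 1 sets quorum artificially to 1 vote and normally enables wait_for_all as well, but the explicit wait_for_all: 0 overrides that, so a node that starts without ever seeing its peer becomes quorate on its own. The Python sketch below is only a simplified model of that arithmetic (one vote per node assumed); the function names are hypothetical and it is not corosync code.

```python
"""Simplified model of the votequorum rules behind "partition with quorum".

Hypothetical helpers illustrating votequorum(5); not part of corosync.
One vote per node is assumed.
"""


def quorum_threshold(expected_votes: int, two_node: bool) -> int:
    """Votes a partition needs in order to be quorate."""
    if two_node and expected_votes == 2:
        # two_node: 1 artificially sets quorum to 1 vote,
        # so either node alone is quorate.
        return 1
    # Otherwise quorum is a simple majority: half the votes + 1.
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int, two_node: bool,
               wait_for_all: bool = False, seen_all_nodes: bool = True) -> bool:
    """Whether a partition holding `partition_votes` votes has quorum."""
    if wait_for_all and not seen_all_nodes:
        # wait_for_all: 1 withholds quorum after startup until every
        # node has been visible at least once at the same time.
        return False
    return partition_votes >= quorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # "mail" cluster as configured (two_node: 1, wait_for_all: 0), node
    # restarted without seeing its peer: each half is quorate alone.
    print(is_quorate(1, 2, two_node=True, wait_for_all=False, seen_all_nodes=False))  # True
    # Same restart with wait_for_all left at its two_node default of 1.
    print(is_quorate(1, 2, two_node=True, wait_for_all=True, seen_all_nodes=False))   # False
    # A five-node cluster split 3/2: only the three-node side is quorate.
    print(is_quorate(3, 5, two_node=False), is_quorate(2, 5, two_node=False))         # True False
```

In other words, with this configuration quorum alone will not stop both nodes from running the VIP during a split; that is normally the job of fencing.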
> nodelist {
>     node {
>         ring0_addr: mail2
>         name: mail2
>         nodeid: 1
>     }
> 
>     node {
>         ring0_addr: mail3
>         name: mail3
>         nodeid: 2
>     }
> }
> 
> Thanks!
> 
> Rick