[ClusterLabs] Merging partitioned two_node cluster?
Nickle, Richard
rnickle at holycross.edu
Tue May 5 09:44:32 EDT 2020
Thanks Honza and Andrei (and Strahil? I might have missed a message in the
thread...)
I'm running this in a VM cluster, so the nodes are on a VLAN with
switched routing.
I tried enabling the 'transport: udpu' unicast option, but with mixed
results: corosync sometimes faults and doesn't come up, and even that
isn't consistent. I can't experiment with it right now because this is
production, so I will try udpu in a test environment instead.
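For reference, this is roughly the totem change I was testing (just a
sketch of my working config, not verified; it assumes the existing
nodelist entries for mail2/mail3 resolve to the interconnect
addresses):

totem {
    version: 2
    cluster_name: mail
    transport: udpu            # unicast UDP instead of multicast
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.80.128
        mcastport: 5405        # with udpu this is still the UDP port used
    }
}

The nodelist and quorum sections would stay as they are in the config
quoted further down in this thread.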
Is it possible for me to rule in/out multicast? I tried using iperf to do
this:
rnickle at mail3:~$ iperf -s -u -B 239.192.226.65 -i 1
------------------------------------------------------------
Server listening on UDP port 5001
Binding to local address 239.192.226.65
Joining multicast group 239.192.226.65
Receiving 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
rnickle at mail2:~$ iperf -c 239.192.226.65 -u -T 32 -t 3 -i 1
------------------------------------------------------------
Client connecting to 239.192.226.65, UDP port 5001
Sending 1470 byte datagrams, IPG target: 11215.21 us (kalman adjust)
Setting multicast TTL to 32
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.133.83.146 port 46033 connected with 239.192.226.65 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 131 KBytes 1.07 Mbits/sec
[ 3] 1.0- 2.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 2.0- 3.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 0.0- 3.0 sec 386 KBytes 1.05 Mbits/sec
[ 3] Sent 269 datagrams
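If iperf is inconclusive, I may also try omping on both nodes at the
same time, which as far as I know is the usual way to check
corosync-style multicast between cluster members (untested sketch,
using our multicast group):

rnickle at mail2:~$ omping -m 239.192.226.65 mail2 mail3
rnickle at mail3:~$ omping -m 239.192.226.65 mail2 mail3

Run simultaneously, each side should report unicast and multicast
responses from the other; multicast loss with clean unicast would
point at the switch/VLAN rather than at the hosts.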
Thanks,
Rick
On Tue, May 5, 2020 at 1:54 AM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 05.05.2020 06:39, Nickle, Richard wrote:
> > I have a two node cluster managing a VIP. The service is an SMTP
> > service. This could be active/active, it doesn't matter which node
> > accepts the SMTP connection, but I wanted to make sure that a VIP
> > was in place so that there was a well-known address.
> >
> > This service has been running for quite a while with no problems.
> > All of a sudden, it partitioned, and now I can't work out a good way
> > to get them to merge back into one cluster. Right now one partition
> > takes the resource and starts the VIP, but doesn't see the other
> > node. The other node doesn't create a resource, and can't seem to
> > see the other node either.
> >
> > At this point, I am perfectly willing to create another node and
> > make an odd-numbered cluster, the arguments for this being fairly
> > persuasive. But I'm not sure why they are blocking.
> >
> > Surely there must be some manual way to get a partitioned cluster to
> > merge?
>
> It happens automatically if the nodes can communicate with each other.
> You seem to have some network connectivity issues which you need to
> investigate and resolve.
>
> > Some trick? I also had a scenario several weeks ago where an
> > odd-numbered cluster configured in a similar way partitioned into a
> > 3-node and a 2-node partition, and I was unable to work out how to
> > get them to merge, until all of a sudden they seemed to fix
> > themselves after doing a 'pcs node remove/pcs node add' which had
> > failed many times before. I have tried that here but with no success
> > so far.
> >
> > I ruled out some common cases I've seen in discussions and threads,
> > such as having my host name defined in /etc/hosts as localhost, etc.
> >
> > Corosync 2.4.3, Pacemaker 1.1.18, pcs 0.9.164 (Ubuntu 18.04).
> >
> > Output from pcs status for both nodes:
> >
> > Cluster name: mail
> > Stack: corosync
> > Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
> > Last updated: Mon May 4 23:28:53 2020
> > Last change: Mon May 4 21:50:04 2020 by hacluster via crmd on mail2
> >
> > 2 nodes configured
> > 1 resource configured
> >
> > Online: [ mail2 ]
> > OFFLINE: [ mail3 ]
> >
> > Full list of resources:
> >
> > mail_vip (ocf::heartbeat:IPaddr2): Started mail2
> >
> > Daemon Status:
> > corosync: active/enabled
> > pacemaker: active/enabled
> > pcsd: active/enabled
> >
> > Cluster name: mail
> > Stack: corosync
> > Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
> > Last updated: Mon May 4 22:13:10 2020
> > Last change: Mon May 4 22:10:34 2020 by root via cibadmin on mail3
> >
> > 2 nodes configured
> > 0 resources configured
> >
> > Online: [ mail3 ]
> > OFFLINE: [ mail2 ]
> >
> > No resources
> >
> > Daemon Status:
> > corosync: active/enabled
> > pacemaker: active/enabled
> > pcsd: active/enabled
> >
> > /etc/corosync/corosync.conf:
> >
> > totem {
> > version: 2
> > cluster_name: mail
> > clear_node_high_bit: yes
> > crypto_cipher: none
> > crypto_hash: none
> >
> > interface {
> > ringnumber: 0
> > bindnetaddr: 192.168.80.128
> > mcastport: 5405
> > }
> > }
> >
>
> Is the interconnect attached to LAN switches, or is it a direct cable
> between the two hosts?
>
> > logging {
> > fileline: off
> > to_stderr: no
> > to_logfile: no
> > to_syslog: yes
> > syslog_facility: daemon
> > debug: off
> > timestamp: on
> > }
> >
> > quorum {
> > provider: corosync_votequorum
> > wait_for_all: 0
> > two_node: 1
> > }
> >
> > nodelist {
> > node {
> > ring0_addr: mail2
> > name: mail2
> > nodeid: 1
> > }
> >
> > node {
> > ring0_addr: mail3
> > name: mail3
> > nodeid: 2
> > }
> > }
> >
> > Thanks!
> >
> > Rick