[ClusterLabs] Merging partitioned two_node cluster?
Nickle, Richard
rnickle at holycross.edu
Tue May 5 09:44:32 EDT 2020
Thanks Honza and Andrei (and Strahil? I might have missed a message in the
thread...)
I'm running this in a VM cluster, so the nodes are on a VLAN with
switched routing.
I tried enabling the 'transport: udpu' unicast option, but with mixed
results: corosync sometimes faults and doesn't come up, and even that
isn't consistent. I can't experiment with it right now because this is
production, so I will try udpu in a test environment instead.
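For reference, this is roughly the totem change I was testing (just a
sketch of my working config, not verified; it assumes the existing
nodelist entries for mail2/mail3 resolve to the interconnect
addresses):

totem {
    version: 2
    cluster_name: mail
    transport: udpu            # unicast UDP instead of multicast
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.80.128
        mcastport: 5405        # with udpu this is still the UDP port used
    }
}

The nodelist and quorum sections would stay as they are in the config
quoted further down in this thread.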
Is it possible for me to rule in/out multicast? I tried using iperf to do
this:
rnickle at mail3:~$ iperf -s -u -B 239.192.226.65 -i 1
------------------------------------------------------------
Server listening on UDP port 5001
Binding to local address 239.192.226.65
Joining multicast group 239.192.226.65
Receiving 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
rnickle at mail2:~$ iperf -c 239.192.226.65 -u -T 32 -t 3 -i 1
------------------------------------------------------------
Client connecting to 239.192.226.65, UDP port 5001
Sending 1470 byte datagrams, IPG target: 11215.21 us (kalman adjust)
Setting multicast TTL to 32
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 192.133.83.146 port 46033 connected with 239.192.226.65 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 131 KBytes 1.07 Mbits/sec
[ 3] 1.0- 2.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 2.0- 3.0 sec 128 KBytes 1.05 Mbits/sec
[ 3] 0.0- 3.0 sec 386 KBytes 1.05 Mbits/sec
[ 3] Sent 269 datagrams
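If iperf is inconclusive, I may also try omping on both nodes at the
same time, which as far as I know is the usual way to check
corosync-style multicast between cluster members (untested sketch,
using our multicast group):

rnickle at mail2:~$ omping -m 239.192.226.65 mail2 mail3
rnickle at mail3:~$ omping -m 239.192.226.65 mail2 mail3

Run simultaneously, each side should report unicast and multicast
responses from the other; multicast loss with clean unicast would
point at the switch/VLAN rather than at the hosts.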
Thanks,
Rick
On Tue, May 5, 2020 at 1:54 AM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 05.05.2020 06:39, Nickle, Richard wrote:
> > I have a two node cluster managing a VIP. The service is an SMTP
> > service. This could be active/active, it doesn't matter which node
> > accepts the SMTP connection, but I wanted to make sure that a VIP
> > was in place so that there was a well-known address.
> >
> > This service has been running for quite a while with no problems.
> > All of a sudden, it partitioned, and now I can't work out a good way
> > to get them to merge back into one cluster. Right now one partition
> > takes the resource and starts the VIP, but doesn't see the other
> > node. The other node doesn't create a resource, and can't seem to
> > see the other node either.
> >
> > At this point, I am perfectly willing to create another node and
> > make an odd-numbered cluster, the arguments for this being fairly
> > persuasive. But I'm not sure why they are blocking.
> >
> > Surely there must be some manual way to get a partitioned cluster to
> > merge?
>
> It happens automatically if the nodes can communicate with each other.
> You seem to have some network connectivity issues which you need to
> investigate and resolve.
>
> > Some trick? I also had a scenario several weeks ago where an
> > odd-numbered cluster configured in a similar way partitioned into a
> > 3-node and a 2-node partition, and I was unable to work out how to
> > get them to merge, until all of a sudden they seemed to fix
> > themselves after doing a 'pcs node remove/pcs node add' which had
> > failed many times before. I have tried that here but with no success
> > so far.
> >
> > I ruled out some common cases I've seen in discussions and threads,
> > such as having my host name defined in /etc/hosts as localhost, etc.
> >
> > Corosync 2.4.3, Pacemaker 1.1.18, pcs 0.9.164 (Ubuntu 18.04).
> >
> > Output from pcs status for both nodes:
> >
> > Cluster name: mail
> > Stack: corosync
> > Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
> > Last updated: Mon May 4 23:28:53 2020
> > Last change: Mon May 4 21:50:04 2020 by hacluster via crmd on mail2
> >
> > 2 nodes configured
> > 1 resource configured
> >
> > Online: [ mail2 ]
> > OFFLINE: [ mail3 ]
> >
> > Full list of resources:
> >
> > mail_vip (ocf::heartbeat:IPaddr2): Started mail2
> >
> > Daemon Status:
> > corosync: active/enabled
> > pacemaker: active/enabled
> > pcsd: active/enabled
> >
> > Cluster name: mail
> > Stack: corosync
> > Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
> > Last updated: Mon May 4 22:13:10 2020
> > Last change: Mon May 4 22:10:34 2020 by root via cibadmin on mail3
> >
> > 2 nodes configured
> > 0 resources configured
> >
> > Online: [ mail3 ]
> > OFFLINE: [ mail2 ]
> >
> > No resources
> >
> > Daemon Status:
> > corosync: active/enabled
> > pacemaker: active/enabled
> > pcsd: active/enabled
> >
> > /etc/corosync/corosync.conf:
> >
> > totem {
> > version: 2
> > cluster_name: mail
> > clear_node_high_bit: yes
> > crypto_cipher: none
> > crypto_hash: none
> >
> > interface {
> > ringnumber: 0
> > bindnetaddr: 192.168.80.128
> > mcastport: 5405
> > }
> > }
> >
>
> Is the interconnect attached to LAN switches, or is it a direct cable
> between the two hosts?
>
> > logging {
> > fileline: off
> > to_stderr: no
> > to_logfile: no
> > to_syslog: yes
> > syslog_facility: daemon
> > debug: off
> > timestamp: on
> > }
> >
> > quorum {
> > provider: corosync_votequorum
> > wait_for_all: 0
> > two_node: 1
> > }
> >
> > nodelist {
> > node {
> > ring0_addr: mail2
> > name: mail2
> > nodeid: 1
> > }
> >
> > node {
> > ring0_addr: mail3
> > name: mail3
> > nodeid: 2
> > }
> > }
> >
> > Thanks!
> >
> > Rick