[ClusterLabs] Network split during corosync startup results in split brain

Thomas Meagher thomas.meagher at hds.com
Mon Jul 20 11:29:04 EDT 2015


Hello,

Our team has been using corosync + pacemaker successfully for the last year or two, but last week we ran into an issue I wanted to get some more insight on.  We have a two-node cluster using the WaitForAll votequorum parameter, so all nodes must have been seen at least once before resources are started.  We have two layers of fencing configured, IPMI and SBD (storage-based death, using shared storage).  We have done extensive testing of our fencing in the past and it works great, but in this case fencing was never called.

One of our QA testers managed to pull the network cable at a very particular time during startup, and it seems to have resulted in corosync telling pacemaker that all nodes had been seen and that the cluster was in a normal state with one node up.  No fencing was ever triggered, all resources were started normally, and the other node was NOT marked unclean.  This resulted in a split-brain scenario: our master database (pgsql replication) was still running as master on the other node, and it had now also been started and promoted on this node.  Luckily this is all in a test environment, so there was no production impact.  Below are the test specifics and some relevant logs.
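For context, the quorum section of our corosync.conf looks roughly like the following (a sketch rather than a verbatim copy; the option names are the standard votequorum ones and match the 2Node/WaitForAll flags shown by corosync-quorumtool further down):

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    # wait_for_all defaults to on when two_node is set; shown explicitly here
    wait_for_all: 1
}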

Procedure:
1. Allow both nodes to come up fully.
2. Reboot current master node.
3. As node is booting up again (during corosync startup), pull interconnect cable.
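(For anyone who wants to approximate step 3 without physically pulling a cable, something along these lines should be close; eth3 is the interconnect in our setup, the 3-second delay matches the window seen in the logs below, and downing the link in software may not behave identically to a pulled cable.)

# on the node that was just rebooted, once corosync has started
sleep 3                     # roughly the window observed in the logs below
ip link set eth3 down       # simulate the interconnect cable being pulled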


Expected Behavior:
1. The node either a) fails to start any resources, or b) fences the other node and promotes to master.


Actual behavior:
1. The node promotes itself to master without fencing its peer, resulting in both nodes running the database as master.


module-2 is rebooted at 12:57:42 and comes back up at around 12:59.
When corosync starts up, both nodes are visible and all vote counts are normal.

Jul 15 12:59:00 module-2 corosync[2906]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jul 15 12:59:00 module-2 corosync[2906]: [TOTEM ] A new membership (10.1.1.2:56) was formed. Members joined: 2
Jul 15 12:59:00 module-2 corosync[2906]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Jul 15 12:59:00 module-2 corosync[2906]: [QUORUM] Members[1]: 2
Jul 15 12:59:00 module-2 corosync[2906]: [MAIN  ] Completed service synchronization, ready to provide service.
Jul 15 12:59:06 module-2 pacemakerd[4076]: notice: cluster_connect_quorum: Quorum acquired


3 seconds later, the interconnect network cable is pulled.

Jul 15 12:59:09 module-2 kernel: e1000e: eth3 NIC Link is Down


The cluster stack notices this immediately: crmd declares our peer, the DC (module-1), dead.

Jul 15 12:59:10 module-2 crmd[4107]: notice: peer_update_callback: Our peer on the DC (module-1) is dead


Almost immediately afterwards, corosync completes service synchronization, still reports the membership as quorate with only this node present, and declares itself ready to provide service.

Jul 15 12:59:10 module-2 corosync[2906]: [QUORUM] Members[1]: 2
Jul 15 12:59:10 module-2 corosync[2906]: [MAIN  ] Completed service synchronization, ready to provide service.


Pacemaker starts resources normally, including Postgres.

Jul 15 12:59:13 module-2 pengine[4106]: notice: LogActions: Start   fence_sbd        (module-2)
Jul 15 12:59:13 module-2 pengine[4106]: notice: LogActions: Start   ipmi-1        (module-2)
Jul 15 12:59:13 module-2 pengine[4106]: notice: LogActions: Start   SlaveIP        (module-2)
Jul 15 12:59:13 module-2 pengine[4106]: notice: LogActions: Start   postgres:0        (module-2)
Jul 15 12:59:13 module-2 pengine[4106]: notice: LogActions: Start   ethmonitor:0        (module-2)
Jul 15 12:59:13 module-2 pengine[4106]: notice: LogActions: Start   tomcat-instance:0        (module-2 - blocked)
Jul 15 12:59:13 module-2 pengine[4106]: notice: LogActions: Start   ClusterMonitor:0        (module-2 - blocked)


corosync-quorumtool shows only one vote out of the two expected, yet the node reports quorate, and the WaitForAll flag is set.  With wait_for_all, pacemaker should not be able to start ANY resources until all nodes have been seen at least once.

module-2 ~ # corosync-quorumtool
Quorum information
------------------
Date:             Wed Jul 15 18:15:34 2015
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          2
Ring ID:          64
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         2          1 module-2 (local)
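

As an extra data point, the runtime votequorum state can also be dumped straight from cmap (command quoted from memory; exact key names such as runtime.votequorum.wait_for_all_status may vary between corosync versions):

module-2 ~ # corosync-cmapctl | grep -i votequorum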


Package versions:

-bash-4.3# rpm -qa | grep corosync
corosynclib-2.3.4-1.fc22.x86_64
corosync-2.3.4-1.fc22.x86_64

-bash-4.3# rpm -qa | grep pacemaker
pacemaker-cluster-libs-1.1.12-2.fc22.x86_64
pacemaker-libs-1.1.12-2.fc22.x86_64
pacemaker-cli-1.1.12-2.fc22.x86_64
pacemaker-1.1.12-2.fc22.x86_64


