[Pacemaker] Sporadic problems of rejoin after split brain situation

Lars Ellenberg lars.ellenberg at linbit.com
Tue Mar 20 17:46:38 EDT 2012


On Sun, Mar 18, 2012 at 11:53:11PM +0000, Roman Sidler wrote:
> Hi all
> No, we have no crossover cable between the nodes.
> The 2 nodes are linked by a switched network, and this works really fine
> except for the mentioned case.
> 
> It's rather easy to reproduce.
> 1. Activate a 2-node cluster
> 2. disconnect the network connection (e.g. by disconnecting the network adapter on a VM)
> 3. wait until both nodes are active and act as DC
> 4. reconnect nodes
> 5. the new DC is elected
> 
> When step 4 encounters an unsteady network, sometimes step 5 will never be
> reached and both nodes stay DC. They are still sending and receiving heartbeat
> status messages.
> 
> The situation may be simulated by some repeated disconnects/reconnects.
> 
> Versions:
> pacemaker 1.1.6 (and 1.0.5)
> heartbeat 3.0.7 (and 3.0.0)

There is no heartbeat 3.0.7 (yet), so I guess you meant 2.0.7.
And no doubt that has a few problems.

You are supposed to use the latest heartbeat (currently 3.0.5):
http://hg.linux-ha.org/heartbeat-STABLE_3_0/

"Unsteady network": I suppose that means at least packet loss.
There have been long standing bugs with retransmit and packet loss
in pacemaker which I only fixed in 3.0.5.

There is one more problem in that area that I'm aware of, which is much less
likely to trigger; but if it does, you'll know: heartbeat starts
spinning with high CPU load and does not even notice that a peer node is dead.

It was introduced in 2005 or so. It is a glib mainloop callback
priority inversion involving timeout-based events; it only triggers
sometimes, and may even need special config parameters in ha.cf to trigger
at all (plus massive packet loss ...). It is already fixed as well,
but since I have had only one complaint that remotely matched these
symptoms, the fix is still not in the repos.
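
In case you want to see what that class of problem looks like, here is a
minimal sketch in C against GLib (my own illustration under assumed
conditions, not heartbeat's actual code): a source installed at a higher
priority than the timeout sources is always ready, so the timeout that is
supposed to notice the dead peer never gets dispatched, and the loop spins.

    /* build with: gcc demo.c $(pkg-config --cflags --libs glib-2.0) */
    #include <glib.h>

    /* stands in for "there is always another packet/retransmit to handle" */
    static gboolean busy_source(gpointer data)
    {
        return TRUE;   /* stay installed; becomes ready again immediately */
    }

    /* the timeout-based deadtime check that should declare the peer dead */
    static gboolean deadtime_expired(gpointer data)
    {
        g_print("peer declared dead\n");   /* never reached while starved */
        return FALSE;
    }

    int main(void)
    {
        GMainLoop *loop = g_main_loop_new(NULL, FALSE);

        /* deadtime check at default priority */
        g_timeout_add(2000, deadtime_expired, NULL);

        /* always-ready source at higher priority than the timeout:
         * glib dispatches only the highest-priority ready sources per
         * iteration, so the timeout above is starved and the loop
         * spins at 100% CPU */
        g_idle_add_full(G_PRIORITY_HIGH, busy_source, NULL, NULL);

        g_main_loop_run(loop);
        return 0;
    }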

Hopefully we find some time to clean that up
and release it with heartbeat 3.0.6.

As we finally noticed that some things still packaged with heartbeat
actually belong in glue, we seem to have to cut a release anyway,
"soon".

BTW, I've seen similar behaviour with corosync as well.
In fact, for me, this exact scenario (node returning after having been
declared dead) typically does NOT work with pacemaker on corosync,
while it always worked with pacemaker on heartbeat...

Still, Andrew is right: for new clusters, corosync is the way to go.

Not so long ago, I posted some corosync vs heartbeat summary
from my current perspective:
http://permalink.gmane.org/gmane.linux.highavailability.user/36903

> *** pacemaker 1.0.5 ***

That's not the most recent, either.
IIRC, there have been some fixes in pacemaker as well,
in the area of rejoining after a partition.

But really: you need to fix your "unsteady" network,
and probably should implement stonith.
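
For a two-node cluster like yours, a minimal fencing setup could look
roughly like this (illustrative crm shell configuration only, using the
node names from your 1.0.5 log; the external/ipmi agent, addresses and
credentials are assumptions, substitute whatever fencing hardware you
actually have):

    # illustrative only -- agent choice, IPs and credentials are made up
    primitive st-lab13 stonith:external/ipmi \
            params hostname=lab13 ipaddr=192.168.1.113 userid=admin passwd=secret interface=lan
    primitive st-lab19 stonith:external/ipmi \
            params hostname=lab19 ipaddr=192.168.1.119 userid=admin passwd=secret interface=lan
    # a node must never run the device that is meant to shoot itself
    location l-st-lab13 st-lab13 -inf: lab13
    location l-st-lab19 st-lab19 -inf: lab19
    property stonith-enabled=true

With working fencing, one side of the split gets shot and rejoins cleanly,
instead of both nodes lingering as DC after the reconnect.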

> Mar  8 07:05:50 LAB19 heartbeat: [2979]: CRIT: Cluster node lab13 returning 
> after partition.
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: For information on cluster 
> partitions, See URL: http://linux-ha.org/SplitBrain
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: WARN: Deadtime value may be too small.
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: See FAQ for information on tuning 
> deadtime.
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: URL: http://linux-
> ha.org/FAQ#heavy_load
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: Link lab13:eth0 up.
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: WARN: Late heartbeat: Node lab13: 
> interval 244130 ms
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: Status update for node lab13: 
> status active
> Mar  8 07:05:50 LAB19 crmd: [3083]: notice: crmd_ha_status_callback: Status 
> update: Node lab13 now has status [active] (DC=true)
> Mar  8 07:05:50 LAB19 crmd: [3083]: info: crm_update_peer_proc: lab13.ais is now 
> online
> Mar  8 07:05:50 LAB19 cib: [3079]: WARN: cib_peer_callback: Discarding 
> cib_apply_diff message (14a33) from lab13: not in our membership
> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: all clients are now paused
> Mar  8 07:05:51 LAB19 ccm: [3078]: info: Break tie for 2 nodes cluster
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: no mbr_track info
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: instance=15, 
> nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: crmd_ccm_msg_callback: Quorum 
> (re)attained after event=NEW MEMBERSHIP (id=15)
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: ccm_event_detail: NEW MEMBERSHIP: 
> trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: ccm_event_detail: CURRENT: lab19 
> [nodeid=1, born=15]
> Mar  8 07:05:51 LAB19 crmd: [3083]: info: populate_cib_nodes_ha: Requesting the 
> list of configured nodes
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: no mbr_track info
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: instance=15, nodes=1, 
> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar  8 07:05:51 LAB19 cib: [3079]: info: cib_ccm_msg_callback: Processing CCM 
> event=NEW MEMBERSHIP (id=15)
> Mar  8 07:05:51 LAB19 heartbeat: [2979]: info: all clients are now resumed
> Mar  8 07:05:51 LAB19 ccm: [3078]: info: Break tie for 2 nodes cluster
> Mar  8 07:05:51 LAB19 ccm: [3078]: info: Break tie for 2 nodes cluster
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: no mbr_track info
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: instance=16, nodes=1, 
> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar  8 07:05:51 LAB19 cib: [3079]: info: cib_ccm_msg_callback: Processing CCM 
> event=NEW MEMBERSHIP (id=16)
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: no mbr_track info
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: instance=17, nodes=1, 
> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar  8 07:05:51 LAB19 cib: [3079]: info: cib_ccm_msg_callback: Processing CCM 
> event=NEW MEMBERSHIP (id=17)
> Mar  8 07:05:52 LAB19 crmd: [3083]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar  8 07:05:52 LAB19 crmd: [3083]: info: mem_handle_event: no mbr_track info
> Mar  8 07:05:52 LAB19 crmd: [3083]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar  8 07:05:52 LAB19 crmd: [3083]: info: mem_handle_event: instance=16, 
> nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> :
> 
> *** pacemaker 1.1.6 ***
> Mar 16 01:35:28 oan1 heartbeat: [7556]: CRIT: Cluster node oan3 returning after 
> partition.
> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: For information on cluster 
> partitions, See URL: http://linux-ha.org/wiki/Split_Brain
> Mar 16 01:35:28 oan1 heartbeat: [7556]: WARN: Deadtime value may be too small.
> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: See FAQ for information on tuning 
> deadtime.
> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: URL: http://linux-
> ha.org/wiki/FAQ#Heavy_Load
> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: Link oan3:eth0 up.
> Mar 16 01:35:28 oan1 heartbeat: [7556]: WARN: Late heartbeat: Node oan3: 
> interval 73300 ms
> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: Status update for node oan3: 
> status active
> Mar 16 01:35:28 oan1 crmd: [7601]: notice: crmd_ha_status_callback: Status 
> update: Node oan3 now has status [active]
> Mar 16 01:35:28 oan1 crmd: [7601]: info: crm_update_peer_proc: oan3.ais is now 
> online
> Mar 16 01:35:28 oan1 crmd: [7601]: WARN: match_down_event: No match for shutdown 
> action on a9dae67b-b98f-498b-ad1b-950fc73c2ab2
> Mar 16 01:35:28 oan1 crmd: [7601]: info: te_update_diff: Stonith/shutdown of 
> a9dae67b-b98f-498b-ad1b-950fc73c2ab2 not matched
> Mar 16 01:35:28 oan1 crmd: [7601]: info: abort_transition_graph: 
> te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, 
> id=a9dae67b-b98f-498b-ad1b-950fc73c2ab2, magic=NA, cib=0.210.50) : Node failure
> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: State transition 
> S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: All 1 cluster 
> nodes are eligible to run resources.
> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_pe_invoke: Query 204: Requesting the 
> current CIB: S_POLICY_ENGINE
> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_pe_invoke_callback: Invoking the PE: 
> query=204, ref=pe_calc-dc-1331858128-123, seq=13, quorate=1
> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave   
> rsc_clustercontrol	(Started oan1)
> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave   rscIpaddr2	
> (Started oan1)
> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave   rsc_asterisk	
> (Started oan1)
> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave   rsc_tomcat5	
> (Started oan1)
> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave   
> rsc_ntp_refresh	(Started oan1)
> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: State transition 
> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE 
> origin=handle_response ]
> Mar 16 01:35:28 oan1 crmd: [7601]: info: unpack_graph: Unpacked transition 16: 0 
> actions in 0 synapses
> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_te_invoke: Processing graph 16 
> (ref=pe_calc-dc-1331858128-123) derived from /var/lib/pengine/pe-input-201.bz2
> Mar 16 01:35:28 oan1 pengine: [17673]: notice: process_pe_message: Transition 
> 16: PEngine Input stored in: /var/lib/pengine/pe-input-201.bz2
> Mar 16 01:35:28 oan1 crmd: [7601]: info: run_graph: 
> ====================================================
> Mar 16 01:35:28 oan1 crmd: [7601]: notice: run_graph: Transition 16 (Complete=0, 
> Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-
> 201.bz2): Complete
> Mar 16 01:35:28 oan1 crmd: [7601]: info: te_graph_trigger: Transition 16 is now 
> complete
> Mar 16 01:35:28 oan1 crmd: [7601]: info: notify_crmd: Transition 16 status: done 
> - <null>
> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: State transition 
> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
> origin=notify_crmd ]
> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: Starting PEngine 
> Recheck Timer
> Mar 16 01:35:30 oan1 cib: [7597]: info: cib_process_diff: Diff 0.210.56 -> 
> 0.210.57 not applied to 0.210.50: current "num_updates" is less than required
> Mar 16 01:35:30 oan1 cib: [7597]: WARN: cib_server_process_diff: Not requesting 
> full refresh in R/W mode
> Mar 16 01:35:30 oan1 ccm: [7596]: info: Break tie for 2 nodes cluster
> Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: no mbr_track info
> Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: instance=14, nodes=1, 
> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar 16 01:35:30 oan1 cib: [7597]: info: cib_ccm_msg_callback: Processing CCM 
> event=NEW MEMBERSHIP (id=14)
> Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: no mbr_track info
> Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: instance=14, nodes=1, 
> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar 16 01:35:30 oan1 crmd: [7601]: info: crmd_ccm_msg_callback: Quorum 
> (re)attained after event=NEW MEMBERSHIP (id=14)
> Mar 16 01:35:30 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP: 
> trans=14, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
> Mar 16 01:35:30 oan1 crmd: [7601]: info: ccm_event_detail: 	CURRENT: oan1 
> [nodeid=0, born=14]
> Mar 16 01:35:30 oan1 crmd: [7601]: info: populate_cib_nodes_ha: Requesting the 
> list of configured nodes
> Mar 16 01:35:31 oan1 ccm: [7596]: info: Break tie for 2 nodes cluster
> Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: no mbr_track info
> Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: instance=15, nodes=1, 
> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: no mbr_track info
> Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: instance=15, nodes=1, 
> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
> Mar 16 01:35:31 oan1 cib: [7597]: info: cib_ccm_msg_callback: Processing CCM 
> event=NEW MEMBERSHIP (id=15)
> Mar 16 01:35:31 oan1 crmd: [7601]: info: crmd_ccm_msg_callback: Quorum 
> (re)attained after event=NEW MEMBERSHIP (id=15)
> Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP: 
> trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
> Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail: 	CURRENT: oan1 
> [nodeid=0, born=15]
> Mar 16 01:35:31 oan1 cib: [7597]: info: cib_process_request: Operation complete: 
> op cib_modify for section nodes (origin=local/crmd/205, version=0.210.51): ok 
> (rc=0)
> Mar 16 01:35:31 oan1 crmd: [7601]: info: populate_cib_nodes_ha: Requesting the 
> list of configured nodes

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



