[Pacemaker] Sporadic problems of rejoin after split brain situation

Andrew Beekhof andrew at beekhof.net
Tue Mar 20 18:02:49 EDT 2012


On Wed, Mar 21, 2012 at 8:46 AM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
> On Sun, Mar 18, 2012 at 11:53:11PM +0000, Roman Sidler wrote:
>> Hi all
>> No, we have no crossover cable between the nodes.
>> The 2 nodes are linked by a switched network, and this works really fine
>> except in the case mentioned.
>>
>> It's rather easy to reproduce.
>> 1. Activate a 2-node cluster
>> 2. disconnect the network connection (e.g. by disconnecting the VM's network adapter)
>> 3. wait until both nodes are active and act as DC
>> 4. reconnect nodes
>> 5. the new DC is elected
>>
>> When step 4 encounters an unsteady network, step 5 is sometimes never
>> reached and both nodes stay DC. They are still sending and receiving
>> heartbeat status messages.
>>
>> The situation can be reproduced by repeatedly disconnecting and
>> reconnecting the link, as in the sketch below.
>>
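>> A rough, untested sketch of how such flapping could be scripted on the
>> VM host (the interface name vnet0 is just a placeholder for whatever
>> device backs the guest's adapter):
>>
>>   # flap the link with short random pauses (bash)
>>   while true; do
>>       ip link set vnet0 down
>>       sleep $((RANDOM % 5 + 1))
>>       ip link set vnet0 up
>>       sleep $((RANDOM % 10 + 1))
>>   done
>>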
>> Versions:
>> pacemaker 1.1.6 (and 1.0.5)
>> heartbeat 3.0.7 (and 3.0.0)
>
> There is no heartbeat 3.0.7 (yet), so I guess you meant 2.0.7.
> And no doubt that has a few problems.
>
> You are supposed to use the latest heartbeat (now: 3.0.5)
> http://hg.linux-ha.org/heartbeat-STABLE_3_0/
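>
> To double-check which versions are actually installed, the package
> manager is the quickest way, e.g.:
>
>   rpm -q heartbeat pacemaker       # on RPM-based systems
>   dpkg -l heartbeat pacemaker      # on Debian/Ubuntu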
>
> "Unsteady network": I suppose that means at least packet loss.
> There have been long-standing bugs with retransmit and packet loss
> in heartbeat, which I only fixed in 3.0.5.
>
> There is one more problem in that area I'm aware of, which is much less
> likely to trigger, but if it does, you'll know, as heartbeat will start
> spinning with high cpu load, and not even notice that a peer node is dead.
>
> It was introduced in 2005 or so and is a glib mainloop callback
> priority inversion involving timeout-based events. It only triggers
> sometimes, may even need special config parameters in ha.cf to trigger
> at all (and needs massive packet loss ...). It is fixed as well, but
> as I have only had one complaint that remotely matched these symptoms,
> the fix is still not in the repos.
>
> Hopefully we find some time to clean that up
> and release it with heartbeat 3.0.6.
>
> As we finally noticed that some things still packaged with heartbeat
> actually belong in glue, we will have to cut a release anyway, "soon".
>
> BTW, I've seen similar behaviour with corosync as well.
> In fact, for me, this exact scenario (node returning after having been
> declared dead) typically does NOT work with pacemaker on corosync,
> while it always worked with pacemaker on heartbeat...

I believe things are much improved with more recent releases.
Hence the recommendation of 1.4.x :-)

>
> Still, Andrew is right: for new clusters, corosync is the way to go.
>
> Not so long ago, I posted some corosync vs heartbeat summary
> from my current perspective:
> http://permalink.gmane.org/gmane.linux.highavailability.user/36903
>
>> *** pacemaker 1.0.5 ***
>
> That's not the most recent, either.
> iirc, there have been some fixes in pacemaker as well,
> in the area of rejoining after partition.
>
> But really: you need to fix your "unsteady" network,
> and probably should implement stonith.
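>
> Just as an illustration (untested here; the fencing device type,
> addresses and credentials are placeholders, the node names are taken
> from your logs), a minimal stonith setup via the crm shell could look
> roughly like this, with one fencing resource per peer and each kept
> off the node it is supposed to shoot:
>
>   crm configure primitive st-lab13 stonith:external/ipmi \
>       params hostname=lab13 ipaddr=10.0.0.13 userid=admin passwd=secret \
>       op monitor interval=60s
>   crm configure primitive st-lab19 stonith:external/ipmi \
>       params hostname=lab19 ipaddr=10.0.0.19 userid=admin passwd=secret \
>       op monitor interval=60s
>   crm configure location l-st-lab13 st-lab13 -inf: lab13
>   crm configure location l-st-lab19 st-lab19 -inf: lab19
>   crm configure property stonith-enabled=true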
>
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: CRIT: Cluster node lab13 returning
>> after partition.
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: For information on cluster
>> partitions, See URL: http://linux-ha.org/SplitBrain
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: WARN: Deadtime value may be too small.
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: See FAQ for information on tuning
>> deadtime.
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: URL: http://linux-
>> ha.org/FAQ#heavy_load
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: Link lab13:eth0 up.
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: WARN: Late heartbeat: Node lab13:
>> interval 244130 ms
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: Status update for node lab13:
>> status active
>> Mar  8 07:05:50 LAB19 crmd: [3083]: notice: crmd_ha_status_callback: Status
>> update: Node lab13 now has status [active] (DC=true)
>> Mar  8 07:05:50 LAB19 crmd: [3083]: info: crm_update_peer_proc: lab13.ais is now
>> online
>> Mar  8 07:05:50 LAB19 cib: [3079]: WARN: cib_peer_callback: Discarding
>> cib_apply_diff message (14a33) from lab13: not in our membership
>> Mar  8 07:05:50 LAB19 heartbeat: [2979]: info: all clients are now paused
>> Mar  8 07:05:51 LAB19 ccm: [3078]: info: Break tie for 2 nodes cluster
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: no mbr_track info
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: mem_handle_event: instance=15,
>> nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: crmd_ccm_msg_callback: Quorum
>> (re)attained after event=NEW MEMBERSHIP (id=15)
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: ccm_event_detail: NEW MEMBERSHIP:
>> trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: ccm_event_detail: CURRENT: lab19
>> [nodeid=1, born=15]
>> Mar  8 07:05:51 LAB19 crmd: [3083]: info: populate_cib_nodes_ha: Requesting the
>> list of configured nodes
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: no mbr_track info
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: instance=15, nodes=1,
>> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: cib_ccm_msg_callback: Processing CCM
>> event=NEW MEMBERSHIP (id=15)
>> Mar  8 07:05:51 LAB19 heartbeat: [2979]: info: all clients are now resumed
>> Mar  8 07:05:51 LAB19 ccm: [3078]: info: Break tie for 2 nodes cluster
>> Mar  8 07:05:51 LAB19 ccm: [3078]: info: Break tie for 2 nodes cluster
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: no mbr_track info
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: instance=16, nodes=1,
>> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: cib_ccm_msg_callback: Processing CCM
>> event=NEW MEMBERSHIP (id=16)
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: no mbr_track info
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: mem_handle_event: instance=17, nodes=1,
>> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar  8 07:05:51 LAB19 cib: [3079]: info: cib_ccm_msg_callback: Processing CCM
>> event=NEW MEMBERSHIP (id=17)
>> Mar  8 07:05:52 LAB19 crmd: [3083]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar  8 07:05:52 LAB19 crmd: [3083]: info: mem_handle_event: no mbr_track info
>> Mar  8 07:05:52 LAB19 crmd: [3083]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar  8 07:05:52 LAB19 crmd: [3083]: info: mem_handle_event: instance=16,
>> nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> :
>>
>> *** pacemaker 1.1.6 ***
>> Mar 16 01:35:28 oan1 heartbeat: [7556]: CRIT: Cluster node oan3 returning after
>> partition.
>> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: For information on cluster
>> partitions, See URL: http://linux-ha.org/wiki/Split_Brain
>> Mar 16 01:35:28 oan1 heartbeat: [7556]: WARN: Deadtime value may be too small.
>> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: See FAQ for information on tuning
>> deadtime.
>> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: URL: http://linux-
>> ha.org/wiki/FAQ#Heavy_Load
>> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: Link oan3:eth0 up.
>> Mar 16 01:35:28 oan1 heartbeat: [7556]: WARN: Late heartbeat: Node oan3:
>> interval 73300 ms
>> Mar 16 01:35:28 oan1 heartbeat: [7556]: info: Status update for node oan3:
>> status active
>> Mar 16 01:35:28 oan1 crmd: [7601]: notice: crmd_ha_status_callback: Status
>> update: Node oan3 now has status [active]
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: crm_update_peer_proc: oan3.ais is now
>> online
>> Mar 16 01:35:28 oan1 crmd: [7601]: WARN: match_down_event: No match for shutdown
>> action on a9dae67b-b98f-498b-ad1b-950fc73c2ab2
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: te_update_diff: Stonith/shutdown of
>> a9dae67b-b98f-498b-ad1b-950fc73c2ab2 not matched
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: abort_transition_graph:
>> te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state,
>> id=a9dae67b-b98f-498b-ad1b-950fc73c2ab2, magic=NA, cib=0.210.50) : Node failure
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: State transition
>> S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
>> origin=abort_transition_graph ]
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: All 1 cluster
>> nodes are eligible to run resources.
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_pe_invoke: Query 204: Requesting the
>> current CIB: S_POLICY_ENGINE
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_pe_invoke_callback: Invoking the PE:
>> query=204, ref=pe_calc-dc-1331858128-123, seq=13, quorate=1
>> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave
>> rsc_clustercontrol    (Started oan1)
>> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave   rscIpaddr2
>> (Started oan1)
>> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave   rsc_asterisk
>> (Started oan1)
>> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave   rsc_tomcat5
>> (Started oan1)
>> Mar 16 01:35:28 oan1 pengine: [17673]: notice: LogActions: Leave
>> rsc_ntp_refresh       (Started oan1)
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: State transition
>> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>> origin=handle_response ]
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: unpack_graph: Unpacked transition 16: 0
>> actions in 0 synapses
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_te_invoke: Processing graph 16
>> (ref=pe_calc-dc-1331858128-123) derived from /var/lib/pengine/pe-input-201.bz2
>> Mar 16 01:35:28 oan1 pengine: [17673]: notice: process_pe_message: Transition
>> 16: PEngine Input stored in: /var/lib/pengine/pe-input-201.bz2
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: run_graph:
>> ====================================================
>> Mar 16 01:35:28 oan1 crmd: [7601]: notice: run_graph: Transition 16 (Complete=0,
>> Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-
>> 201.bz2): Complete
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: te_graph_trigger: Transition 16 is now
>> complete
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: notify_crmd: Transition 16 status: done
>> - <null>
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: State transition
>> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
>> origin=notify_crmd ]
>> Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: Starting PEngine
>> Recheck Timer
>> Mar 16 01:35:30 oan1 cib: [7597]: info: cib_process_diff: Diff 0.210.56 ->
>> 0.210.57 not applied to 0.210.50: current "num_updates" is less than required
>> Mar 16 01:35:30 oan1 cib: [7597]: WARN: cib_server_process_diff: Not requesting
>> full refresh in R/W mode
>> Mar 16 01:35:30 oan1 ccm: [7596]: info: Break tie for 2 nodes cluster
>> Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: no mbr_track info
>> Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: instance=14, nodes=1,
>> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar 16 01:35:30 oan1 cib: [7597]: info: cib_ccm_msg_callback: Processing CCM
>> event=NEW MEMBERSHIP (id=14)
>> Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: no mbr_track info
>> Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: instance=14, nodes=1,
>> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar 16 01:35:30 oan1 crmd: [7601]: info: crmd_ccm_msg_callback: Quorum
>> (re)attained after event=NEW MEMBERSHIP (id=14)
>> Mar 16 01:35:30 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP:
>> trans=14, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
>> Mar 16 01:35:30 oan1 crmd: [7601]: info: ccm_event_detail:    CURRENT: oan1
>> [nodeid=0, born=14]
>> Mar 16 01:35:30 oan1 crmd: [7601]: info: populate_cib_nodes_ha: Requesting the
>> list of configured nodes
>> Mar 16 01:35:31 oan1 ccm: [7596]: info: Break tie for 2 nodes cluster
>> Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: no mbr_track info
>> Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: instance=15, nodes=1,
>> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: Got an event
>> OC_EV_MS_INVALID from ccm
>> Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: no mbr_track info
>> Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: Got an event
>> OC_EV_MS_NEW_MEMBERSHIP from ccm
>> Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: instance=15, nodes=1,
>> new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
>> Mar 16 01:35:31 oan1 cib: [7597]: info: cib_ccm_msg_callback: Processing CCM
>> event=NEW MEMBERSHIP (id=15)
>> Mar 16 01:35:31 oan1 crmd: [7601]: info: crmd_ccm_msg_callback: Quorum
>> (re)attained after event=NEW MEMBERSHIP (id=15)
>> Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP:
>> trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
>> Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail:    CURRENT: oan1
>> [nodeid=0, born=15]
>> Mar 16 01:35:31 oan1 cib: [7597]: info: cib_process_request: Operation complete:
>> op cib_modify for section nodes (origin=local/crmd/205, version=0.210.51): ok
>> (rc=0)
>> Mar 16 01:35:31 oan1 crmd: [7601]: info: populate_cib_nodes_ha: Requesting the
>> list of configured nodes
>>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



