<div dir="ltr">Hi All,<div><br></div><div><span class="" style="font-size:12.8px">We are using Pacemaker</span><span style="font-size:12.8px"> 1.1.6 on Ubuntu Precise. We have two nodes running KVM as hypervisor & have one VM each on each node (vm-0/vm-1) that are paired in a HA cluster using heartbeat/pacemaker. We have redudant network links between the VM's & also have setup our own STONITH resource that interacts with the underlying node hypervisor to reboot VM's.</span><br></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">We ran into an issue where the underlying node crashed badly, but the networking & the VM were running ok for some more period of time, but with some kind of slowness. So initially heartbeat on the healthy VM detected that the peer VM died & started recovery action with a STONITH, but suddenly when peer VM heartbeats started to arrive, it tried to abort stonith recovery & also unset the current VM as DC. But the STONITH thread was progressing independently & managed to reboot the peer VM which was later recognized by further Link Down events. But this VM didnt takeover DC after this & remained with a single VM without DC for a long period till we identified/rebooted the crashed node.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Below are the full logs. Any pointers/insights on what fix should be done within heartbeat/pacemaker is greatly appreciated. Thanks.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">(from logs)</span></div><div><span style="font-size:12.8px">VM-1 recognizes that VM-0 went away with Link Down events & starts recovery action</span></div><div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 heartbeat: [2378]: WARN: node vm-0: is dead</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 heartbeat: [2378]: info: Link vm-0:eth0 dead.</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 heartbeat: [2378]: info: Link vm-0:eth2 dead.</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [dead]</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NOT_PRIMARY from ccm</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 cib: [2474]: info: mem_handle_event: instance=54, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NOT PRIMARY (id=54)</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 crmd: [2478]: info: crm_update_peer_proc: vm-0.ais is now offline</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NOT_PRIMARY from ccm</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 crmd: [2478]: info: mem_handle_event: instance=54, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:40 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum lost after event=NOT PRIMARY (id=54)</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: info: ccm_state_sent_memlistreq: directly callccm_compute_and_send_final_memlist()</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: quorum plugin: majority</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: quorum plugin: twonodes</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: info: Break tie for 2 nodes cluster</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: quorum plugin: majority</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 crmd: [2478]: info: mem_handle_event: instance=55, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=55)</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=55, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 cib: [2474]: info: mem_handle_event: instance=55, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=55]</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=55)</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 crmd: [2478]: info: ccm_event_detail: #011LOST:    vm-0 [nodeid=0, born=52]</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 cib: [2474]: info: crm_update_peer: Node vm-0: id=0 state=lost (new) addr=(null) votes=-1 born=52 seen=54 proc=00000000000000000000000000000302</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: quorum plugin: twonodes</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100</span></div><div style=""><span style="font-size:12.8px">Nov  9 16:58:41 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</span></div></div><div style=""><span style="font-size:12.8px"><br></span></div><div style=""><span style="font-size:12.8px"><br></span></div><div style="">as part of the recovery it unsets the peer VM (VM-0) as DC & does an election with the single member that it has & moves itself as the DC with starting pengine</div><div style=""><div>Nov  9 16:58:41 vm-1 ccm: [2473]: info: Break tie for 2 nodes cluster</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: ais_status_callback: status: vm-0 is now lost (was member)</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: crm_update_peer: Node vm-0: id=0 state=lost (new) addr=(null) votes=-1 born=52 seen=54 proc=00000000000000000000000000000200</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: WARN: check_dead_member: Our DC node (vm-0) left the cluster</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: update_dc: Unset DC vm-0</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: do_te_control: Registering TE UUID: 4717ec7c-684e-4718-998f-fb1938efed3b</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: set_graph_functions: Setting custom graph functions</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses</div><div>Nov  9 16:58:41 vm-1 crmd: [2478]: info: start_subsystem: Starting sub-system "pengine"</div><div>Nov  9 16:58:41 vm-1 pengine: [25018]: info: Invoked: /usr/lib/heartbeat/pengine</div><div>Nov  9 16:58:45 vm-1 crmd: [2478]: info: do_dc_takeover: Taking over DC status for this partition</div><div>Nov  9 16:58:45 vm-1 cib: [2474]: info: cib_process_readwrite: We are now in R/W mode</div><div>Nov  9 16:58:45 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/28, version=1.30.149): ok (rc=0)</div><div>Nov  9 16:58:45 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/29, version=1.30.150): ok (rc=0)</div><div>Nov  9 16:58:45 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/31, version=1.30.151): ok (rc=0)</div><div>Nov  9 16:58:45 vm-1 crmd: [2478]: info: join_make_offer: Making join offers based on membership 55</div><div>Nov  9 16:58:45 vm-1 crmd: [2478]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks</div><div>Nov  9 16:58:45 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/33, version=1.30.152): ok (rc=0)</div><div>Nov  9 16:58:45 vm-1 crmd: [2478]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms</div><div>Nov  9 16:58:45 vm-1 crmd: [2478]: info: config_query_callback: Checking for expired actions every 900000ms</div><div>Nov  9 16:58:45 vm-1 crmd: [2478]: info: update_dc: Set DC to vm-1 (3.0.5)</div><div>Nov  9 16:58:46 vm-1 crmd: [2478]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]</div><div>Nov  9 16:58:46 vm-1 crmd: [2478]: info: do_state_transition: All 1 cluster nodes responded to the join offer.</div><div>Nov  9 16:58:46 vm-1 crmd: [2478]: info: do_dc_join_finalize: join-1: Syncing the CIB from vm-1 to the rest of the cluster</div><div>Nov  9 16:58:46 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/36, version=1.30.152): ok (rc=0)</div><div><div>Nov  9 16:58:46 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/37, version=1.30.153): ok (rc=0)</div><div>Nov  9 16:58:46 vm-1 lrmd: [2475]: info: stonith_api_device_metadata: looking up external/vmstonith/heartbeat metadata</div><div>Nov  9 16:58:46 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith getinfo-devdescr</div><div>Nov  9 16:58:46 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith getinfo-devid</div><div>Nov  9 16:58:46 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith getinfo-xml</div></div><div><br></div><div><br></div><div>while this is happening a late heartbeat arrived from peer VM which triggers aborting the current shutdown action</div><div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: CRIT: Cluster node vm-0 returning after partition.</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [active]</div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: info: For information on cluster partitions, See URL: <a href="http://linux-ha.org/wiki/Split_Brain">http://linux-ha.org/wiki/Split_Brain</a></div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: crm_update_peer_proc: vm-0.ais is now online</div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: WARN: Deadtime value may be too small.</div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: info: See FAQ for information on tuning deadtime.</div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: info: URL: <a href="http://linux-ha.org/wiki/FAQ#Heavy_Load">http://linux-ha.org/wiki/FAQ#Heavy_Load</a></div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: info: Link vm-0:eth0 up.</div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: WARN: Late heartbeat: Node vm-0: interval 14000 ms</div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: info: Status update for node vm-0: status active</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_dc_join_ack: join-1: Updating node state to member for vm-1</div><div>Nov  9 16:58:47 vm-1 heartbeat: [2378]: info: Link vm-0:eth2 up.</div><div>Nov  9 16:58:47 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-1']/lrm (origin=local/crmd/39, version=1.30.155): ok (rc=0)</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-1']/lrm": ok (rc=0)</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: populate_cib_nodes_ha: Requesting the list of configured nodes</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: crm_update_quorum: Updating quorum status to true (call=43)</div><div>Nov  9 16:58:47 vm-1 attrd: [2477]: notice: attrd_local_callback: Sending full refresh (origin=crmd)</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: abort_transition_graph: do_te_invoke:167 - Triggered transition abort (complete=1) : Peer Cancelled</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: WARN: crmd_ha_msg_callback: Ignoring HA message (op=vote) from vm-0: not in our membership list (size=1)</div><div>Nov  9 16:58:47 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/41, version=1.30.157): ok (rc=0)</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: WARN: match_down_event: No match for shutdown action on 00000281-0281-0000-2b91-000000000000</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: te_update_diff: Stonith/shutdown of 00000281-0281-0000-2b91-000000000000 not matched</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=00000281-0281-0000-2b91-000000000000, magic=NA, cib=1.30.158) : Node failure</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_pe_invoke: Query 45: Requesting the current CIB: S_POLICY_ENGINE</div><div>Nov  9 16:58:47 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/43, version=1.30.159): ok (rc=0)</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_pe_invoke_callback: Invoking the PE: query=45, ref=pe_calc-dc-1447106327-10, seq=55, quorate=1</div></div><div><br></div><div><br></div><div>But pengine is disconnected with this at this point & goes ahead asking for a reboot of peer VM</div><div><div>Nov  9 16:58:47 vm-1 pengine: [25018]: notice: unpack_config: On loss of CCM Quorum: Ignore</div><div>Nov  9 16:58:47 vm-1 attrd: [2477]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)</div><div>Nov  9 16:58:47 vm-1 pengine: [25018]: WARN: pe_fence_node: Node vm-0 will be fenced because it is un-expectedly down</div><div>Nov  9 16:58:47 vm-1 pengine: [25018]: WARN: determine_online_status: Node vm-0 is unclean</div><div>Nov  9 16:58:47 vm-1 pengine: [25018]: WARN: custom_action: Action VM-FENCE:0_stop_0 on vm-0 is unrunnable (offline)</div><div>Nov  9 16:58:47 vm-1 pengine: [25018]: WARN: custom_action: Marking node vm-0 unclean</div><div>Nov  9 16:58:47 vm-1 pengine: [25018]: WARN: stage6: Scheduling Node vm-0 for STONITH</div><div>Nov  9 16:58:47 vm-1 pengine: [25018]: notice: LogActions: Stop    VM-FENCE:0#011(vm-0)</div><div>Nov  9 16:58:47 vm-1 pengine: [25018]: notice: LogActions: Leave   VM-FENCE:1#011(Started vm-1)</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: unpack_graph: Unpacked transition 0: 33 actions in 33 synapses</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: do_te_invoke: Processing graph 0 (ref=pe_calc-dc-1447106327-10) derived from /var/lib/pengine/pe-warn-1.bz2</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: te_pseudo_action: Pseudo action 14 fired and confirmed</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: te_pseudo_action: Pseudo action 9 fired and confirmed</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: te_pseudo_action: Pseudo action 15 fired and confirmed</div><div>Nov  9 16:58:47 vm-1 crmd: [2478]: info: te_fence_node: Executing reboot fencing operation (51) on vm-0 (timeout=90000)</div><div>Nov  9 16:58:47 vm-1 stonith-ng: [2476]: info: initiate_remote_stonith_op: Initiating remote operation reboot for vm-0: c3bda7ff-b6ec-433e-86a1-e570dbaacc8b</div><div>Nov  9 16:58:47 vm-1 stonith-ng: [2476]: info: can_fence_host_with_device: VM-FENCE:1 can fence vm-0: none</div><div>Nov  9 16:58:48 vm-1 ccm: [2473]: debug: quorum plugin: majority</div><div>Nov  9 16:58:48 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100</div><div>Nov  9 16:58:48 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</div><div>Nov  9 16:58:48 vm-1 stonith-ng: [2476]: info: process_remote_stonith_query: Ignoring reply from vm-0, hosts are not permitted to commit suicide</div><div>Nov  9 16:58:48 vm-1 ccm: [2473]: debug: quorum plugin: twonodes</div><div>Nov  9 16:58:48 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100</div><div>Nov  9 16:58:48 vm-1 stonith-ng: [2476]: info: call_remote_stonith: Requesting that vm-1 perform op reboot vm-0</div><div>Nov  9 16:58:48 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</div><div>Nov  9 16:58:48 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</div><div>Nov  9 16:58:48 vm-1 ccm: [2473]: info: Break tie for 2 nodes cluster</div><div>Nov  9 16:58:48 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</div><div>Nov  9 16:58:48 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info</div><div>Nov  9 16:58:48 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info</div><div>Nov  9 16:58:48 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</div><div>Nov  9 16:58:48 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</div><div>Nov  9 16:58:48 vm-1 crmd: [2478]: info: mem_handle_event: instance=56, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3</div><div>Nov  9 16:58:48 vm-1 cib: [2474]: info: mem_handle_event: instance=56, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3</div><div>Nov  9 16:58:48 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=56)</div><div>Nov  9 16:58:48 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=56)</div><div>Nov  9 16:58:48 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=56, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3</div><div>Nov  9 16:58:48 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=56]</div><div>Nov  9 16:58:48 vm-1 crmd: [2478]: info: populate_cib_nodes_ha: Requesting the list of configured nodes</div><div>Nov  9 16:58:48 vm-1 stonith-ng: [2476]: info: stonith_fence: Exec <stonith_command t="stonith-ng" st_async_id="c3bda7ff-b6ec-433e-86a1-e570dbaacc8b" st_op="st_fence" st_callid="0" st_callopt="0" st_remote_op="c3bda7ff-b6ec-433e-86a1-e570dbaacc8b" st_target="vm-0" st_device_action="reboot" st_timeout="81000" dest="vm-1" oseq="2" from_id="stonith-ng" to_id="stonith-ng" client_gen="2" src="vm-1" seq="1015" hg="5487db2a" ts="56411718" ld="0.02 0.04 0.05 4/369 25102" ttl="3" auth="1 06b58c47479e1fb80d0979b4b8f367b60190789f" /></div><div>Nov  9 16:58:48 vm-1 stonith-ng: [2476]: info: can_fence_host_with_device: VM-FENCE:1 can fence vm-0: none</div><div>Nov  9 16:58:48 vm-1 stonith-ng: [2476]: info: stonith_fence: Found 1 matching devices for 'vm-0'</div><div>Nov  9 16:58:48 vm-1 stonith-ng: [2476]: info: stonith_command: Processed st_fence from vm-1: rc=-1</div><div>Nov  9 16:58:48 vm-1 stonith-ng: [2476]: info: make_args: reboot-ing node 'vm-0' as 'port=vm-0'</div><div>Nov  9 16:58:48 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith getconfignames</div><div>Nov  9 16:58:48 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith reset vm-0</div><div>Nov  9 16:58:48 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/47, version=1.30.164): ok (rc=0)</div></div><div><br></div><div><br></div><div>A membership event arrives in</div><div><div>Nov  9 16:58:49 vm-1 ccm: [2473]: debug: quorum plugin: majority</div><div>Nov  9 16:58:49 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200</div><div>Nov  9 16:58:49 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: mem_handle_event: instance=57, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=57)</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: crm_update_peer: Node vm-0: id=0 state=member (new) addr=(null) votes=-1 born=57 seen=57 proc=00000000000000000000000000000302</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: mem_handle_event: instance=57, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=57)</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=57, nodes=2, new=1, lost=0 n_idx=0, new_idx=2, old_idx=4</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=1]</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-0 [nodeid=0, born=57]</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: ccm_event_detail: #011NEW:     vm-0 [nodeid=0, born=57]</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: ais_status_callback: status: vm-0 is now member (was lost)</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: WARN: match_down_event: No match for shutdown action on vm-0</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: crm_update_peer: Node vm-0: id=0 state=member (new) addr=(null) votes=-1 born=57 seen=57 proc=00000000000000000000000000000202</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: populate_cib_nodes_ha: Requesting the list of configured nodes</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-0']/lrm (origin=local/crmd/49, version=1.30.166): ok (rc=0)</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-0']/transient_attributes (origin=local/crmd/50, version=1.30.167): ok (rc=0)</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-0']/lrm": ok (rc=0)</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: abort_transition_graph: te_update_diff:178 - Triggered transition abort (complete=0, tag=transient_attributes, id=00000281-0281-0000-2b91-000000000000, magic=NA, cib=1.30.167) : Transient attribute: removal</div><div>Nov  9 16:58:49 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-0']/transient_attributes": ok (rc=0)</div><div>Nov  9 16:58:49 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/51, version=1.30.168): ok (rc=0)</div></div><div><br></div><div><br></div><div>as part of this, this VM unsets itself as DC and considers the peer VM as the DC. But it didnt update_dc() with peer VM VM-0) as the DC at this point</div><div><div>Nov  9 16:58:50 vm-1 crmd: [2478]: WARN: crmd_ha_msg_filter: Another DC detected: vm-0 (op=noop)</div><div>Nov  9 16:58:50 vm-1 crmd: [2478]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]</div><div>Nov  9 16:58:50 vm-1 crmd: [2478]: info: update_dc: Unset DC vm-1</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: info: do_state_transition: State transition S_ELECTION -> S_RELEASE_DC [ input=I_RELEASE_DC cause=C_FSA_INTERNAL origin=do_election_count_vote ]</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: info: do_dc_release: DC role released</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: info: stop_subsystem: Sent -TERM to pengine: [25018]</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: WARN: destroy_action: Cancelling timer for action 3 (src=57)</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: info: do_te_control: Transitioner is now inactive</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: info: do_state_transition: State transition S_RELEASE_DC -> S_PENDING [ input=I_RELEASE_SUCCESS cause=C_FSA_INTERNAL origin=do_dc_release ]</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: info: crmdManagedChildDied: Process pengine:[25018] exited (signal=0, exitcode=0)</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: info: pe_msg_dispatch: Received HUP from pengine:[25018]</div><div>Nov  9 16:58:51 vm-1 crmd: [2478]: info: pe_connection_destroy: Connection to the Policy Engine released</div><div>Nov  9 16:58:51 vm-1 cib: [2474]: info: cib_process_readwrite: We are now in R/O mode</div></div><div><br></div><div><br></div><div>The peer VM STONITH that was triggered manages to reboot the peer VM & at this point the membership is changed<br></div><div><div>Nov  9 16:58:59 vm-1 stonith-ng: [2476]: info: log_operation: Operation 'reboot' [25103] (call 0 from (null)) for host 'vm-0' with device 'VM-FENCE:1' returned: 0</div><div>Nov  9 16:58:59 vm-1 stonith-ng: [2476]: info: log_operation: VM-FENCE:1: Performing: stonith -t external/vmstonith -T reset vm-0</div><div>Nov  9 16:58:59 vm-1 stonith-ng: [2476]: info: log_operation: VM-FENCE:1: success: vm-0 0</div><div>Nov  9 16:58:59 vm-1 stonith-ng: [2476]: info: stonith_device_execute: Nothing to do for VM-FENCE:1</div><div>Nov  9 16:58:59 vm-1 stonith-ng: [2476]: info: process_remote_stonith_exec: ExecResult <st-reply st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_notify" st_remote_op="c3bda7ff-b6ec-433e-86a1-e570dbaacc8b" st_callid="0" st_callopt="0" st_rc="0" st_output="Performing: stonith -t external/vmstonith -T reset vm-0#012success: vm-0 0#012" from_id="stonith-ng" to_id="stonith-ng" client_gen="2" src="vm-1" seq="102d" hg="5487db2a" ts="56411723" ld="0.02 0.04 0.05 1/370 25223" ttl="3" auth="1 e388a45fa82902e4d2cc13372bd016c90878a052" /></div><div>Nov  9 16:58:59 vm-1 stonith-ng: [2476]: info: remote_op_done: Notifing clients of c3bda7ff-b6ec-433e-86a1-e570dbaacc8b (reboot of vm-0 from 8a97a051-0052-43fe-b06e-7dd766053c55 by vm-1): 2, rc=0</div><div>Nov  9 16:58:59 vm-1 stonith-ng: [2476]: info: stonith_notify_client: Sending st_fence-notification to client 2478/886556f2-863a-46c8-9c59-33f84f5d8fcf</div><div>Nov  9 16:58:59 vm-1 crmd: [2478]: info: tengine_stonith_callback: StonithOp <st-reply st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="reboot" st_remote_op="c3bda7ff-b6ec-433e-86a1-e570dbaacc8b" st_callid="0" st_callopt="0" st_rc="0" st_output="Performing: stonith -t external/vmstonith -T reset vm-0#012success: vm-0 0#012" from_id="stonith-ng" to_id="stonith-ng" client_gen="2" src="vm-1" seq="102d" hg="5487db2a" ts="56411723" ld="0.02 0.04 0.05 1/370 25223" ttl="3" auth="1 e388a45fa82902e4d2cc13372bd016c90878a052" state="2" st_target="vm-0" /></div><div>Nov  9 16:58:59 vm-1 crmd: [2478]: info: tengine_stonith_callback: Stonith operation 2/51:0:0:4717ec7c-684e-4718-998f-fb1938efed3b: OK (0)</div><div>Nov  9 16:58:59 vm-1 crmd: [2478]: info: tengine_stonith_notify: Peer vm-0 was terminated (reboot) by vm-1 for vm-1 (ref=c3bda7ff-b6ec-433e-86a1-e570dbaacc8b): OK</div><div>Nov  9 16:59:01 vm-1 heartbeat: [2378]: WARN: node vm-0: is dead</div><div>Nov  9 16:59:01 vm-1 heartbeat: [2378]: info: Link vm-0:eth0 dead.</div><div>Nov  9 16:59:01 vm-1 heartbeat: [2378]: info: Link vm-0:eth2 dead.</div><div>Nov  9 16:59:01 vm-1 ccm: [2473]: debug: quorum plugin: majority</div><div>Nov  9 16:59:01 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100</div><div>Nov  9 16:59:01 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</div><div>Nov  9 16:59:01 vm-1 ccm: [2473]: debug: quorum plugin: twonodes</div><div>Nov  9 16:59:01 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100</div><div>Nov  9 16:59:01 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</div><div>Nov  9 16:59:01 vm-1 ccm: [2473]: info: Break tie for 2 nodes cluster</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [dead]</div><div>Nov  9 16:59:01 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</div><div>Nov  9 16:59:01 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info</div><div>Nov  9 16:59:01 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</div><div>Nov  9 16:59:01 vm-1 cib: [2474]: info: mem_handle_event: instance=58, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3</div><div>Nov  9 16:59:01 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=58)</div><div>Nov  9 16:59:01 vm-1 cib: [2474]: info: crm_update_peer: Node vm-0: id=0 state=lost (new) addr=(null) votes=-1 born=57 seen=57 proc=00000000000000000000000000000302</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: crm_update_peer_proc: vm-0.ais is now offline</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: mem_handle_event: instance=58, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=58)</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=58, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=58]</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: ccm_event_detail: #011LOST:    vm-0 [nodeid=0, born=57]</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: ais_status_callback: status: vm-0 is now lost (was member)</div><div>Nov  9 16:59:01 vm-1 crmd: [2478]: info: crm_update_peer: Node vm-0: id=0 state=lost (new) addr=(null) votes=-1 born=57 seen=57 proc=00000000000000000000000000000200</div><div>Nov  9 17:00:08 vm-1 stonith-ng: [2476]: info: stonith_command: Processed st_execute from lrmd: rc=-1</div><div>Nov  9 17:00:08 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith getconfignames</div><div>Nov  9 17:00:08 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith status</div><div>Nov  9 17:00:09 vm-1 stonith: [26002]: info: external/vmstonith device OK.</div><div>Nov  9 17:00:09 vm-1 stonith-ng: [2476]: info: log_operation: VM-FENCE:1: Performing: stonith -t external/vmstonith -S</div><div>Nov  9 17:00:09 vm-1 stonith-ng: [2476]: info: log_operation: VM-FENCE:1: success:  0</div><div>Nov  9 17:00:09 vm-1 stonith-ng: [2476]: info: stonith_device_execute: Nothing to do for VM-FENCE:1</div></div><div><br></div><div><br></div><div>but by this time vm-0 is not set as the DC. So DC takeover doesnt happen & VM-1 still remains as a member not doing any action.</div><div><br></div><div>When we managed to reboot the underlying node from crash, heartbeat on the peer VM was restarted</div><div><div>Nov  9 17:29:02 vm-1 heartbeat: [2378]: info: Heartbeat restart on node vm-0</div><div>Nov  9 17:29:02 vm-1 heartbeat: [2378]: info: Link vm-0:eth0 up.</div><div>Nov  9 17:29:02 vm-1 heartbeat: [2378]: info: Status update for node vm-0: status init</div><div>Nov  9 17:29:02 vm-1 heartbeat: [2378]: info: Link vm-0:eth2 up.</div><div>Nov  9 17:29:02 vm-1 heartbeat: [2378]: info: Status update for node vm-0: status up</div><div>Nov  9 17:29:02 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [init]</div><div>Nov  9 17:29:02 vm-1 crmd: [2478]: info: crm_update_peer_proc: vm-0.ais is now online</div><div>Nov  9 17:29:02 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [up]</div><div>Nov  9 17:29:03 vm-1 heartbeat: [2378]: debug: get_delnodelist: delnodelist=</div><div>Nov  9 17:29:03 vm-1 heartbeat: [2378]: info: all clients are now paused</div><div>Nov  9 17:29:03 vm-1 heartbeat: [2378]: info: Status update for node vm-0: status active</div><div>Nov  9 17:29:03 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [active]</div><div>Nov  9 17:29:04 vm-1 cib: [2474]: info: cib_client_status_callback: Status update: Client vm-0/cib now has status [join]</div><div>Nov  9 17:29:05 vm-1 heartbeat: [2378]: info: all clients are now resumed</div><div>Nov  9 17:29:11 vm-1 heartbeat: [2378]: WARN: 1 lost packet(s) for [vm-0] [19:21]</div><div>Nov  9 17:29:11 vm-1 heartbeat: [2378]: info: No pkts missing from vm-0!</div><div>Nov  9 17:29:11 vm-1 ccm: [2473]: debug: quorum plugin: majority</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: notice: crmd_client_status_callback: Status update: Client vm-0/crmd now has status [online] (DC=false)</div><div>Nov  9 17:29:11 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200</div><div>Nov  9 17:29:11 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200</div><div>Nov  9 17:29:11 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</div><div>Nov  9 17:29:11 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info</div><div>Nov  9 17:29:11 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</div><div>Nov  9 17:29:11 vm-1 cib: [2474]: info: mem_handle_event: instance=59, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4</div><div>Nov  9 17:29:11 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=59)</div><div>Nov  9 17:29:11 vm-1 cib: [2474]: info: crm_update_peer: Node vm-0: id=0 state=member (new) addr=(null) votes=-1 born=59 seen=59 proc=00000000000000000000000000000302</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: mem_handle_event: instance=59, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=59)</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=59, nodes=2, new=1, lost=0 n_idx=0, new_idx=2, old_idx=4</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=1]</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-0 [nodeid=0, born=59]</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: ccm_event_detail: #011NEW:     vm-0 [nodeid=0, born=59]</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: ais_status_callback: status: vm-0 is now member (was lost)</div><div>Nov  9 17:29:11 vm-1 crmd: [2478]: info: crm_update_peer: Node vm-0: id=0 state=member (new) addr=(null) votes=-1 born=59 seen=59 proc=00000000000000000000000000000202</div><div>Nov  9 17:29:12 vm-1 heartbeat: [2378]: WARN: 1 lost packet(s) for [vm-0] [25:27]</div><div>Nov  9 17:29:12 vm-1 heartbeat: [2378]: info: No pkts missing from vm-0!</div><div><br></div></div><div><br></div><div>Only after this point an election was done & this VM (VM-1) took over as the DC & started resources</div><div><div>Nov  9 17:29:25 vm-1 crmd: [2478]: info: do_election_count_vote: Election 2 (owner: 00000281-0281-0000-2b91-000000000000) pass: vote from vm-0 (Uptime)</div><div>Nov  9 17:29:25 vm-1 crmd: [2478]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]</div><div>Nov  9 17:29:26 vm-1 crmd: [2478]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]</div><div>Nov  9 17:29:26 vm-1 crmd: [2478]: info: do_te_control: Registering TE UUID: cae16cf4-bcc7-4027-8a84-33537c7696a3</div><div>Nov  9 17:29:26 vm-1 crmd: [2478]: info: set_graph_functions: Setting custom graph functions</div><div>Nov  9 17:29:26 vm-1 crmd: [2478]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses</div><div>Nov  9 17:29:26 vm-1 crmd: [2478]: info: start_subsystem: Starting sub-system "pengine"</div><div>Nov  9 17:29:26 vm-1 pengine: [12152]: info: Invoked: /usr/lib/heartbeat/pengine</div><div>Nov  9 17:29:30 vm-1 crmd: [2478]: info: do_dc_takeover: Taking over DC status for this partition</div><div>Nov  9 17:29:30 vm-1 cib: [2474]: info: cib_process_readwrite: We are now in R/W mode</div><div>Nov  9 17:29:30 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/55, version=1.30.172): ok (rc=0)</div><div>Nov  9 17:29:30 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/56, version=1.30.173): ok (rc=0)</div><div>Nov  9 17:29:30 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/58, version=1.30.174): ok (rc=0)</div><div>Nov  9 17:29:30 vm-1 crmd: [2478]: info: join_make_offer: Making join offers based on membership 59</div><div>Nov  9 17:29:30 vm-1 crmd: [2478]: info: do_dc_join_offer_all: join-2: Waiting on 2 outstanding join acks</div><div>Nov  9 17:29:30 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/60, version=1.30.175): ok (rc=0)</div><div>Nov  9 17:29:30 vm-1 crmd: [2478]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms</div><div>Nov  9 17:29:30 vm-1 crmd: [2478]: info: config_query_callback: Checking for expired actions every 900000ms</div><div>Nov  9 17:29:30 vm-1 crmd: [2478]: info: update_dc: Set DC to vm-1 (3.0.5)</div><div>Nov  9 17:29:31 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_sync_one for section 'all' (origin=vm-0/vm-0/(null), version=1.30.175): ok (rc=0)</div><div>Nov  9 17:29:31 vm-1 crmd: [2478]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]</div><div>Nov  9 17:29:31 vm-1 crmd: [2478]: info: do_state_transition: All 2 cluster nodes responded to the join offer.</div><div>Nov  9 17:29:31 vm-1 crmd: [2478]: info: do_dc_join_finalize: join-2: Syncing the CIB from vm-1 to the rest of the cluster</div><div>Nov  9 17:29:31 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/63, version=1.30.175): ok (rc=0)</div><div>Nov  9 17:29:31 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/64, version=1.30.176): ok (rc=0)</div><div>Nov  9 17:29:31 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/65, version=1.30.177): ok (rc=0)</div><div>Nov  9 17:29:32 vm-1 lrmd: [2475]: info: stonith_api_device_metadata: looking up external/vmstonith/heartbeat metadata</div><div>Nov  9 17:29:32 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith getinfo-devdescr</div><div>Nov  9 17:29:32 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith getinfo-devid</div><div>Nov  9 17:29:32 vm-1 vmstonith: Invoked as:  /usr/lib/stonith/plugins/external/vmstonith getinfo-xml</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: do_dc_join_ack: join-2: Updating node state to member for vm-0</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: do_dc_join_ack: join-2: Updating node state to member for vm-1</div><div>Nov  9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-0']/transient_attributes (origin=vm-0/crmd/8, version=1.30.178): ok (rc=0)</div><div>Nov  9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-0']/lrm (origin=local/crmd/66, version=1.30.179): ok (rc=0)</div><div>Nov  9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-1']/lrm (origin=local/crmd/68, version=1.30.181): ok (rc=0)</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-0']/lrm": ok (rc=0)</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-1']/lrm": ok (rc=0)</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: populate_cib_nodes_ha: Requesting the list of configured nodes</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: crm_update_quorum: Updating quorum status to true (call=72)</div><div>Nov  9 17:29:32 vm-1 attrd: [2477]: notice: attrd_local_callback: Sending full refresh (origin=crmd)</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: do_pe_invoke: Query 73: Requesting the current CIB: S_POLICY_ENGINE</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: te_update_diff: Detected LRM refresh - 11 resources updated: Skipping all resource events</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: abort_transition_graph: te_update_diff:251 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=1.30.182) : LRM Refresh</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: do_pe_invoke: Query 74: Requesting the current CIB: S_POLICY_ENGINE</div><div>Nov  9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/70, version=1.30.183): ok (rc=0)</div><div>Nov  9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/72, version=1.30.185): ok (rc=0)</div><div>Nov  9 17:29:32 vm-1 crmd: [2478]: info: do_pe_invoke_callback: Invoking the PE: query=74, ref=pe_calc-dc-1447108172-26, seq=59, quorate=1</div><div>Nov  9 17:29:32 vm-1 attrd: [2477]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)</div></div><div><br></div><div>Thanks.</div><div><br></div><div>--Shyam</div><div><br></div></div><div style=""><br></div></div>