Folks,<div><br></div><div>We keep running into a long election cycle: when both nodes of a 2-node cluster are rebooted simultaneously, they spend a long time looping through elections.</div><div><br></div>
<div>On one node, pacemaker loops like this:</div><div>Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking over DC status for this partition</div><div><div>Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/O mode</div>
<div>Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_slave_all for section 'all' (origin=local/crmd/222, version=1.1.1): ok (rc=0)</div><div>Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/W mode</div>
<div>Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/223, version=1.1.1): ok (rc=0)</div><div>Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/224, version=1.1.1): ok (rc=0)</div>
<div>Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)</div><div>Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks</div>
<div>Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)</div><div>Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback: Checking for expired actions every 900000ms</div>
<div>Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 50 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)</div><div>Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to vsa-0000009c-vc-1 (3.0.1)</div>
<div>Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]</div><div>Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC vsa-0000009c-vc-1</div>
<div>Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 51 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)</div><div>Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input I_JOIN_REQUEST from route_message() received in state S_ELECTION</div>
<div>Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]</div><div>Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem: Starting sub-system "pengine"</div>
<div>Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem: Client pengine already running as pid 1234</div><div>Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking over DC status for this partition</div>
<div>Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/O mode</div><div>Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_slave_all for section 'all' (origin=local/crmd/231, version=1.1.1): ok (rc=0)</div>
<div>Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/W mode</div><div>Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/232, version=1.1.1): ok (rc=0)</div>
<div>Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/233, version=1.1.1): ok (rc=0)</div><div>Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/235, version=1.1.1): ok (rc=0)</div>
<div>Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all: join-26: Waiting on 2 outstanding join acks</div><div>Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/237, version=1.1.1): ok (rc=0)</div>
<div>Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback: Checking for expired actions every 900000ms</div><div>Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 52 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)</div>
<div>Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to vsa-0000009c-vc-1 (3.0.1)</div><div>Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]</div>
<div>Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC vsa-0000009c-vc-1</div><div>Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 53 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)</div>
<div>Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input I_JOIN_REQUEST from route_message() received in state S_ELECTION</div><div>Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]</div>
<div>Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem: Starting sub-system "pengine"</div><div>Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem: Client pengine already running as pid 1234</div>
</div><div><br></div><div>and the other node loops like this:</div><div><div>Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!</div><div>Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING</div>
<div>Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]</div><div>Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION</div>
<div>Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]</div><div>Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role released</div>
<div>Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control: Transitioner is now inactive</div><div>Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!</div>
<div>Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING</div><div>Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]</div>
<div>Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION</div><div>Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]</div>
<div>Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role released</div><div>Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control: Transitioner is now inactive</div></div><div><br></div>
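<div>To put a number on the loop, here is a minimal sketch that counts crmd state transitions per node from syslog lines like the ones above. The function name <code>count_transitions</code> and the regular expression are my own and assume the exact line layout shown in these excerpts; nothing here is part of pacemaker itself.</div>

```python
import re
from collections import Counter

# Assumed line layout (taken from the excerpts above):
#   "<Mon> <day> <time> <host> crmd: [pid]: info: do_state_transition:
#    State transition <OLD> -> <NEW> [ ... ]"
TRANSITION = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+crmd:.*"
    r"State transition (?P<old>\S+) -> (?P<new>\S+)"
)


def count_transitions(lines):
    """Count (host, old_state, new_state) triples in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            counts[(match["host"], match["old"], match["new"])] += 1
    return counts
```

<div>Feeding it the log file, for example <code>count_transitions(open(path))</code> with <code>path</code> pointing at wherever syslog writes on your system, gives one counter entry per node and transition, which makes it easy to see how many times each side bounced between S_ELECTION and S_INTEGRATION or S_PENDING.</div>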
<div>This goes on for several minutes before the loop finally breaks.</div><div><br></div><div>Any pointers on what could be causing this?</div><div><br></div><div>Thanks.</div><div><br></div><div>--Shyam</div>