[Pacemaker] Need help with resolving very long election cycle

Shyam shyam.kaushik at gmail.com
Fri Jan 27 11:21:45 UTC 2012


Folks,

We are constantly running into a long election cycle where in a 2-node
cluster when both of them are simultaneously rebooted, they take a long
time running through election loop.

On one node pacemaker loops like:
Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover:
Taking over DC status for this partition
Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite:
We are now in R/O mode
Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_slave_all for section 'all'
(origin=local/crmd/222, version=1.1.1): ok (rc=0)
Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite:
We are now in R/W mode
Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_master for section 'all' (origin=local/crmd/223,
version=1.1.1): ok (rc=0)
Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/224,
version=1.1.1): ok (rc=0)
Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/226, version=1.1.1): ok (rc=0)
Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all:
join-25: Waiting on 2 outstanding join acks
Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/228, version=1.1.1): ok (rc=0)
Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:
config_query_callback: Checking for expired actions every 900000ms
Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:
do_election_count_vote: Election 50 (owner:
00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
(Age)
Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to
vsa-0000009c-vc-1 (3.0.1)
Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition:
State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC
vsa-0000009c-vc-1
Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info:
do_election_count_vote: Election 51 (owner:
00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
(Age)
Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input
I_JOIN_REQUEST from route_message() received in state S_ELECTION
Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:
Starting sub-system "pengine"
Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:
Client pengine already running as pid 1234
Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover:
Taking over DC status for this partition
Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite:
We are now in R/O mode
Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_slave_all for section 'all'
(origin=local/crmd/231, version=1.1.1): ok (rc=0)
Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite:
We are now in R/W mode
Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_master for section 'all' (origin=local/crmd/232,
version=1.1.1): ok (rc=0)
Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/233,
version=1.1.1): ok (rc=0)
Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/235, version=1.1.1): ok (rc=0)
Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all:
join-26: Waiting on 2 outstanding join acks
Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/237, version=1.1.1): ok (rc=0)
Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:
config_query_callback: Checking for expired actions every 900000ms
Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:
do_election_count_vote: Election 52 (owner:
00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
(Age)
Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to
vsa-0000009c-vc-1 (3.0.1)
Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition:
State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC
vsa-0000009c-vc-1
Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info:
do_election_count_vote: Election 53 (owner:
00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
(Age)
Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input
I_JOIN_REQUEST from route_message() received in state S_ELECTION
Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:
Starting sub-system "pengine"
Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:
Client pengine already running as pid 1234

&  other node with
Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped:
Election Trigger (I_DC_TIMEOUT) just popped!
Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition:
State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
I_JOIN_OFFER from route_message() received in state S_ELECTION
Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition:
State transition S_ELECTION -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC
role released
Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:
Transitioner is now inactive
Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped:
Election Trigger (I_DC_TIMEOUT) just popped!
Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition:
State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
I_JOIN_OFFER from route_message() received in state S_ELECTION
Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition:
State transition S_ELECTION -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC
role released
Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:
Transitioner is now inactive

This takes several minutes & finally breaks.

Any pointers on what can be causing this?

Thanks.

--Shyam
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20120127/c28d3832/attachment-0003.html>


More information about the Pacemaker mailing list