[Pacemaker] Need help with resolving very long election cycle

Andrew Beekhof andrew at beekhof.net
Thu Feb 2 04:58:55 EST 2012


They both think they should be the DC.
But the log fragments don't extend back far enough to say why.
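
To see where the disagreement starts, it can help to pull every crmd state transition out of the complete log on each node rather than a short fragment. The following is only a minimal sketch, assuming syslog-style lines in the same format as the ones quoted below; the names `unwrap` and `transitions` and the regular expressions are illustrative helpers, not part of Pacemaker.

```python
import re

# Start of a syslog record, e.g. "Jan 26 22:03:20 <host> ..."
RECORD_START = re.compile(r"^\w{3}\s+\d+\s[\d:]{8}\s")

# A crmd state-transition record, e.g.
# "... crmd: [1134]: info: do_state_transition: State transition
#  S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=... ]"
TRANSITION = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+(?P<node>\S+)\s+crmd:.*?"
    r"State transition\s+(?P<old>S_\w+)\s*->\s*(?P<new>S_\w+)"
    r".*?input=(?P<input>I_\w+)"
)


def unwrap(lines):
    """Rejoin records that a mail client wrapped across several lines.

    Leading "> " quote markers are stripped. Any non-empty line that does
    not begin with a syslog timestamp is treated as a continuation of the
    previous record (an assumption that holds for the excerpts quoted here).
    """
    record = None
    for line in lines:
        line = line.rstrip("\n").lstrip("> ")
        if RECORD_START.match(line):
            if record is not None:
                yield record
            record = line
        elif record is not None and line:
            record += " " + line.strip()
    if record is not None:
        yield record


def transitions(lines):
    """Return (timestamp, node, old_state, new_state, input) tuples."""
    found = []
    for record in unwrap(lines):
        m = TRANSITION.search(record)
        if m:
            found.append((m["ts"], m["node"], m["old"], m["new"], m["input"]))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python transitions.py < full-log-from-one-node
    for row in transitions(sys.stdin):
        print(" ".join(row))
```

Run it against the full log from each node (wherever syslog writes crmd output on your system), starting from before the reboot, and compare the two sequences side by side.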

On Fri, Jan 27, 2012 at 10:21 PM, Shyam <shyam.kaushik at gmail.com> wrote:
> Folks,
>
> We keep running into a long election cycle: when both nodes of a 2-node
> cluster are rebooted simultaneously, they spend a long time running through
> an election loop.
>
> On one node, Pacemaker loops like this:
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking
> over DC status for this partition
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite:
> We are now in R/O mode
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_slave_all for section 'all'
> (origin=local/crmd/222, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite:
> We are now in R/W mode
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_master for section 'all' (origin=local/crmd/223,
> version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section cib (origin=local/crmd/224,
> version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/226, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all:
> join-25: Waiting on 2 outstanding join acks
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/228, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback:
> Checking for expired actions every 900000ms
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 50 (owner:
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
> (Age)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to
> vsa-0000009c-vc-1 (3.0.1)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition:
> State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC
> vsa-0000009c-vc-1
> Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 51 (owner:
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
> (Age)
> Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input
> I_JOIN_REQUEST from route_message() received in state S_ELECTION
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition:
> State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:
> Starting sub-system "pengine"
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:
> Client pengine already running as pid 1234
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking
> over DC status for this partition
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite:
> We are now in R/O mode
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_slave_all for section 'all'
> (origin=local/crmd/231, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite:
> We are now in R/W mode
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_master for section 'all' (origin=local/crmd/232,
> version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section cib (origin=local/crmd/233,
> version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/235, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all:
> join-26: Waiting on 2 outstanding join acks
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/237, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback:
> Checking for expired actions every 900000ms
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 52 (owner:
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
> (Age)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to
> vsa-0000009c-vc-1 (3.0.1)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition:
> State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC
> vsa-0000009c-vc-1
> Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 53 (owner:
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
> (Age)
> Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input
> I_JOIN_REQUEST from route_message() received in state S_ELECTION
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition:
> State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:
> Starting sub-system "pengine"
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:
> Client pengine already running as pid 1234
>
> and on the other node like this:
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped:
> Election Trigger (I_DC_TIMEOUT) just popped!
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition:
> State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
> cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
> I_JOIN_OFFER from route_message() received in state S_ELECTION
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition:
> State transition S_ELECTION -> S_PENDING [ input=I_PENDING
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role
> released
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:
> Transitioner is now inactive
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped:
> Election Trigger (I_DC_TIMEOUT) just popped!
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition:
> State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
> cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
> I_JOIN_OFFER from route_message() received in state S_ELECTION
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition:
> State transition S_ELECTION -> S_PENDING [ input=I_PENDING
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role
> released
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:
> Transitioner is now inactive
>
> This goes on for several minutes and then finally breaks.
>
> Any pointers on what can be causing this?
>
> Thanks.
>
> --Shyam
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



