[Pacemaker] Need help with resolving very long election cycle

Andreas Kurz andreas at hastexo.com
Wed Feb 1 17:29:08 EST 2012


On 01/27/2012 12:21 PM, Shyam wrote:
> Folks,
> 
> We are constantly running into a long election cycle where in a 2-node
> cluster when both of them are simultaneously rebooted, they take a long
> time running through election loop.

why do you want to reboot them simultaneously? ... stop them one after
another and this will work fine.

If you want to avoid time consuming resource movement use cluster
property stop-all-resources prior to the serialized shutdown.

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> On one node pacemaker loops like:
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover:
> Taking over DC status for this partition
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_readwrite: We are now in R/O mode
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_slave_all for section
> 'all' (origin=local/crmd/222, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_readwrite: We are now in R/W mode
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_master for section 'all'
> (origin=local/crmd/223, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_modify for section cib
> (origin=local/crmd/224, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_modify for section
> crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_modify for section
> crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:
> config_query_callback: Checking for expired actions every 900000ms
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 50 (owner:
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
> (Age)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC
> to vsa-0000009c-vc-1 (3.0.1)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_state_transition: State transition S_INTEGRATION -> S_ELECTION [
> input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset
> DC vsa-0000009c-vc-1
> Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 51 (owner:
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
> (Age)
> Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input
> I_JOIN_REQUEST from route_message() received in state S_ELECTION
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_state_transition: State transition S_ELECTION -> S_INTEGRATION [
> input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:
> Starting sub-system "pengine"
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:
> Client pengine already running as pid 1234
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover:
> Taking over DC status for this partition
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_readwrite: We are now in R/O mode
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_slave_all for section
> 'all' (origin=local/crmd/231, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_readwrite: We are now in R/W mode
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_master for section 'all'
> (origin=local/crmd/232, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_modify for section cib
> (origin=local/crmd/233, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_modify for section
> crm_config (origin=local/crmd/235, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_dc_join_offer_all: join-26: Waiting on 2 outstanding join acks
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:
> cib_process_request: Operation complete: op cib_modify for section
> crm_config (origin=local/crmd/237, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:
> config_query_callback: Checking for expired actions every 900000ms
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 52 (owner:
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
> (Age)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC
> to vsa-0000009c-vc-1 (3.0.1)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_state_transition: State transition S_INTEGRATION -> S_ELECTION [
> input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset
> DC vsa-0000009c-vc-1
> Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 53 (owner:
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0
> (Age)
> Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input
> I_JOIN_REQUEST from route_message() received in state S_ELECTION
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info:
> do_state_transition: State transition S_ELECTION -> S_INTEGRATION [
> input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:
> Starting sub-system "pengine"
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:
> Client pengine already running as pid 1234
> 
> &  other node with
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped:
> Election Trigger (I_DC_TIMEOUT) just popped!
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info:
> do_state_transition: State transition S_PENDING -> S_ELECTION [
> input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
> I_JOIN_OFFER from route_message() received in state S_ELECTION
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info:
> do_state_transition: State transition S_ELECTION -> S_PENDING [
> input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC
> role released
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:
> Transitioner is now inactive
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped:
> Election Trigger (I_DC_TIMEOUT) just popped!
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info:
> do_state_transition: State transition S_PENDING -> S_ELECTION [
> input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input
> I_JOIN_OFFER from route_message() received in state S_ELECTION
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info:
> do_state_transition: State transition S_ELECTION -> S_PENDING [
> input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC
> role released
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:
> Transitioner is now inactive
> 
> This takes several minutes & finally breaks.
> 
> Any pointers on what can be causing this?
> 
> Thanks.
> 
> --Shyam
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 222 bytes
Desc: OpenPGP digital signature
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20120201/69578a34/attachment-0002.sig>


More information about the Pacemaker mailing list