Hi Andreas,<div><br></div><div>Thanks for your reply.</div><div><br></div><div>We are using Pacemaker in a VM environment and were primarily checking how it behaves when the two nodes hosting the clustered VMs reboot. The elections apparently took a very long time.</div>
<div><br></div><div>I realized that we had dc-deadtime set to 5 seconds. After raising it to 10 seconds, the long election cycle problem disappeared.</div><div><br></div><div>--Shyam<br><br><div class="gmail_quote">On Thu, Feb 2, 2012 at 3:59 AM, Andreas Kurz <span dir="ltr"><<a href="mailto:andreas@hastexo.com">andreas@hastexo.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 01/27/2012 12:21 PM, Shyam wrote:<br>
> Folks,<br>
><br>
> We are constantly running into a long election cycle: in a 2-node<br>
> cluster, when both nodes are rebooted simultaneously, they take a long<br>
> time running through the election loop.<br>
<br>
</div>Why do you want to reboot them simultaneously? Stop them one after<br>
another and this will work fine.<br>
<br>
If you want to avoid time-consuming resource movement, use the cluster<br>
property stop-all-resources prior to the serialized shutdown.<br>
<br>
Regards,<br>
Andreas<br>
<br>
--<br>
Need help with Pacemaker?<br>
<a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>
<div><div></div><div class="h5"><br>
><br>
> On one node pacemaker loops like:<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover:<br>
> Taking over DC status for this partition<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_readwrite: We are now in R/O mode<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_slave_all for section<br>
> 'all' (origin=local/crmd/222, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_readwrite: We are now in R/W mode<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_master for section 'all'<br>
> (origin=local/crmd/223, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_modify for section cib<br>
> (origin=local/crmd/224, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_modify for section<br>
> crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_modify for section<br>
> crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> config_query_callback: Checking for expired actions every 900000ms<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_election_count_vote: Election 50 (owner:<br>
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0<br>
> (Age)<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC<br>
> to vsa-0000009c-vc-1 (3.0.1)<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_state_transition: State transition S_INTEGRATION -> S_ELECTION [<br>
> input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset<br>
> DC vsa-0000009c-vc-1<br>
> Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_election_count_vote: Election 51 (owner:<br>
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0<br>
> (Age)<br>
> Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input<br>
> I_JOIN_REQUEST from route_message() received in state S_ELECTION<br>
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_state_transition: State transition S_ELECTION -> S_INTEGRATION [<br>
> input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]<br>
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:<br>
> Starting sub-system "pengine"<br>
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:<br>
> Client pengine already running as pid 1234<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover:<br>
> Taking over DC status for this partition<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_readwrite: We are now in R/O mode<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_slave_all for section<br>
> 'all' (origin=local/crmd/231, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_readwrite: We are now in R/W mode<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_master for section 'all'<br>
> (origin=local/crmd/232, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_modify for section cib<br>
> (origin=local/crmd/233, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_modify for section<br>
> crm_config (origin=local/crmd/235, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_dc_join_offer_all: join-26: Waiting on 2 outstanding join acks<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> cib_process_request: Operation complete: op cib_modify for section<br>
> crm_config (origin=local/crmd/237, version=1.1.1): ok (rc=0)<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> config_query_callback: Checking for expired actions every 900000ms<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_election_count_vote: Election 52 (owner:<br>
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0<br>
> (Age)<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC<br>
> to vsa-0000009c-vc-1 (3.0.1)<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_state_transition: State transition S_INTEGRATION -> S_ELECTION [<br>
> input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset<br>
> DC vsa-0000009c-vc-1<br>
> Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_election_count_vote: Election 53 (owner:<br>
> 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0<br>
> (Age)<br>
> Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input<br>
> I_JOIN_REQUEST from route_message() received in state S_ELECTION<br>
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> do_state_transition: State transition S_ELECTION -> S_INTEGRATION [<br>
> input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]<br>
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:<br>
> Starting sub-system "pengine"<br>
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:<br>
> Client pengine already running as pid 1234<br>
><br>
> and the other node loops with:<br>
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped:<br>
> Election Trigger (I_DC_TIMEOUT) just popped!<br>
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input<br>
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING<br>
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> do_state_transition: State transition S_PENDING -> S_ELECTION [<br>
> input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]<br>
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input<br>
> I_JOIN_OFFER from route_message() received in state S_ELECTION<br>
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> do_state_transition: State transition S_ELECTION -> S_PENDING [<br>
> input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC<br>
> role released<br>
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:<br>
> Transitioner is now inactive<br>
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped:<br>
> Election Trigger (I_DC_TIMEOUT) just popped!<br>
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input<br>
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING<br>
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> do_state_transition: State transition S_PENDING -> S_ELECTION [<br>
> input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]<br>
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input<br>
> I_JOIN_OFFER from route_message() received in state S_ELECTION<br>
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> do_state_transition: State transition S_ELECTION -> S_PENDING [<br>
> input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC<br>
> role released<br>
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:<br>
> Transitioner is now inactive<br>
><br>
> This takes several minutes and finally breaks.<br>
><br>
> Any pointers on what can be causing this?<br>
><br>
> Thanks.<br>
><br>
> --Shyam<br>
><br>
><br>
</div></div>> _______________________________________________<br>
> Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
><br>
> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
<br>
<br>
<br></blockquote></div><br></div>