Hi Andreas,<div><br></div><div>Yes, this is only for testing. The test did not involve two VMs running on the same host: we have two physical servers, each running one VM, and the VMs run pacemaker/heartbeat. We reboot both physical servers (to simulate a power failure) and then watch the two VMs negotiate.</div>
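<div><br></div><div>A minimal sketch, assuming Python is at hand and the crmd log is saved with one message per line (the wrapped lines in the quoted log below rejoined). It measures how long a node sits in S_PENDING before the Election Trigger (I_DC_TIMEOUT) pops again. The function names are made up for illustration; the sample lines are copied from the vsa-0000009c-vc-0 log quoted further down.</div>

```python
import re

# Four lines copied from the vsa-0000009c-vc-0 log quoted in this thread,
# with the mail client's line wrapping undone.
SAMPLE_LOG = """\
Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
"""

TIME_OF_DAY = re.compile(r"\b(\d\d):(\d\d):(\d\d)\b")
TIMEOUT_MARKER = "Election Trigger (I_DC_TIMEOUT) just popped"
PENDING_MARKER = "-> S_PENDING"


def seconds_of_day(line):
    """Return the HH:MM:SS syslog timestamp as seconds since midnight, or None."""
    match = TIME_OF_DAY.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def timeout_count(log_text):
    """Count how many times the election trigger popped."""
    return sum(1 for line in log_text.splitlines() if TIMEOUT_MARKER in line)


def pending_to_timeout_gaps(log_text):
    """Seconds between each return to S_PENDING and the next I_DC_TIMEOUT pop.

    Simplification: logs that cross midnight are not handled.
    """
    gaps = []
    pending_since = None
    for line in log_text.splitlines():
        stamp = seconds_of_day(line)
        if stamp is None:
            continue
        if PENDING_MARKER in line:
            pending_since = stamp
        elif TIMEOUT_MARKER in line and pending_since is not None:
            gaps.append(stamp - pending_since)
            pending_since = None
    return gaps


if __name__ == "__main__":
    print("timeouts:", timeout_count(SAMPLE_LOG))
    print("gaps (s):", pending_to_timeout_gaps(SAMPLE_LOG))
```

<div>On the quoted excerpt the gap comes out as 5 seconds, which is at least consistent with the 5 sec dc-deadtime mentioned in the earlier mail. It does not by itself show why the election keeps restarting.</div>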
<div><br></div><div>--Shyam<br><br><div class="gmail_quote">On Thu, Feb 2, 2012 at 3:38 PM, Andreas Kurz <span dir="ltr"><<a href="mailto:andreas@hastexo.com">andreas@hastexo.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 02/02/2012 04:45 AM, Shyam wrote:<br>
> Hi Andreas,<br>
><br>
> Thanks for your reply.<br>
><br>
> We are using Pacemaker in a VM environment and were primarily checking how it<br>
> behaves when the two nodes hosting the clustered VMs reboot. It apparently<br>
> took a very long time to get through the elections.<br>
<br>
</div>Ok, but this is only for testing? For a production system the VMs<br>
running a cluster should not run on the same host as this would be a SPOF.<br>
<div class="im"><br>
><br>
> I realized that we were running with dc-deadtime set to 5 sec. After raising it<br>
> to 10 sec, the long election cycle problem disappeared.<br>
<br>
</div>... interesting<br>
<div class="im"><br>
Regards,<br>
Andreas<br>
<br>
--<br>
Need help with Pacemaker?<br>
<a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>
<br>
><br>
</div><div class="im">> --Shyam<br>
><br>
> On Thu, Feb 2, 2012 at 3:59 AM, Andreas Kurz <<a href="mailto:andreas@hastexo.com">andreas@hastexo.com</a>> wrote:<br>
</div><div><div></div><div class="h5">
><br>
> On 01/27/2012 12:21 PM, Shyam wrote:<br>
> > Folks,<br>
> ><br>
> > We are constantly running into a long election cycle: in a 2-node<br>
> > cluster, when both nodes are rebooted simultaneously, they take a long<br>
> > time running through the election loop.<br>
><br>
> Why do you want to reboot them simultaneously? ... Stop them one after<br>
> another and this will work fine.<br>
><br>
> If you want to avoid time-consuming resource movement, set the cluster<br>
> property stop-all-resources prior to the serialized shutdown.<br>
><br>
> Regards,<br>
> Andreas<br>
><br>
> --<br>
> Need help with Pacemaker?<br>
> <a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>
><br>
> ><br>
> > On one node pacemaker loops like:<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover:<br>
> > Taking over DC status for this partition<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_readwrite: We are now in R/O mode<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_slave_all for section<br>
> > 'all' (origin=local/crmd/222, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_readwrite: We are now in R/W mode<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_master for section<br>
> 'all'<br>
> > (origin=local/crmd/223, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_modify for section cib<br>
> > (origin=local/crmd/224, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_modify for section<br>
> > crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_modify for section<br>
> > crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > config_query_callback: Checking for expired actions every 900000ms<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_election_count_vote: Election 50 (owner:<br>
> > 00000156-0156-0000-2b91-000000000000) pass: vote from<br>
> vsa-0000009c-vc-0<br>
> > (Age)<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc:<br>
> Set DC<br>
> > to vsa-0000009c-vc-1 (3.0.1)<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_state_transition: State transition S_INTEGRATION -> S_ELECTION [<br>
> > input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset<br>
> > DC vsa-0000009c-vc-1<br>
> > Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_election_count_vote: Election 51 (owner:<br>
> > 00000156-0156-0000-2b91-000000000000) pass: vote from<br>
> vsa-0000009c-vc-0<br>
> > (Age)<br>
> > Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA:<br>
> Input<br>
> > I_JOIN_REQUEST from route_message() received in state S_ELECTION<br>
> > Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_state_transition: State transition S_ELECTION -> S_INTEGRATION [<br>
> > input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]<br>
> > Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:<br>
> > Starting sub-system "pengine"<br>
> > Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:<br>
> > Client pengine already running as pid 1234<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover:<br>
> > Taking over DC status for this partition<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_readwrite: We are now in R/O mode<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_slave_all for section<br>
> > 'all' (origin=local/crmd/231, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_readwrite: We are now in R/W mode<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_master for section<br>
> 'all'<br>
> > (origin=local/crmd/232, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_modify for section cib<br>
> > (origin=local/crmd/233, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_modify for section<br>
> > crm_config (origin=local/crmd/235, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_dc_join_offer_all: join-26: Waiting on 2 outstanding join acks<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info:<br>
> > cib_process_request: Operation complete: op cib_modify for section<br>
> > crm_config (origin=local/crmd/237, version=1.1.1): ok (rc=0)<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > config_query_callback: Checking for expired actions every 900000ms<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_election_count_vote: Election 52 (owner:<br>
> > 00000156-0156-0000-2b91-000000000000) pass: vote from<br>
> vsa-0000009c-vc-0<br>
> > (Age)<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc:<br>
> Set DC<br>
> > to vsa-0000009c-vc-1 (3.0.1)<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_state_transition: State transition S_INTEGRATION -> S_ELECTION [<br>
> > input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset<br>
> > DC vsa-0000009c-vc-1<br>
> > Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_election_count_vote: Election 53 (owner:<br>
> > 00000156-0156-0000-2b91-000000000000) pass: vote from<br>
> vsa-0000009c-vc-0<br>
> > (Age)<br>
> > Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA:<br>
> Input<br>
> > I_JOIN_REQUEST from route_message() received in state S_ELECTION<br>
> > Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info:<br>
> > do_state_transition: State transition S_ELECTION -> S_INTEGRATION [<br>
> > input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]<br>
> > Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem:<br>
> > Starting sub-system "pengine"<br>
> > Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem:<br>
> > Client pengine already running as pid 1234<br>
> ><br>
> > and the other node loops like:<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> crm_timer_popped:<br>
> > Election Trigger (I_DC_TIMEOUT) just popped!<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA:<br>
> Input<br>
> > I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING<br>
> > Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> > do_state_transition: State transition S_PENDING -> S_ELECTION [<br>
> > input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]<br>
> > Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA:<br>
> Input<br>
> > I_JOIN_OFFER from route_message() received in state S_ELECTION<br>
> > Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> > do_state_transition: State transition S_ELECTION -> S_PENDING [<br>
> > input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
> > Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> do_dc_release: DC<br>
> > role released<br>
> > Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:<br>
> > Transitioner is now inactive<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> crm_timer_popped:<br>
> > Election Trigger (I_DC_TIMEOUT) just popped!<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA:<br>
> Input<br>
> > I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING<br>
> > Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> > do_state_transition: State transition S_PENDING -> S_ELECTION [<br>
> > input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]<br>
> > Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA:<br>
> Input<br>
> > I_JOIN_OFFER from route_message() received in state S_ELECTION<br>
> > Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> > do_state_transition: State transition S_ELECTION -> S_PENDING [<br>
> > input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
> > Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info:<br>
> do_dc_release: DC<br>
> > role released<br>
> > Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control:<br>
> > Transitioner is now inactive<br>
> ><br>
> > This takes several minutes & finally breaks.<br>
> ><br>
> > Any pointers on what can be causing this?<br>
> ><br>
> > Thanks.<br>
> ><br>
> > --Shyam<br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
</div></div>
<div class="im">> > <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
> ><br>
> > Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
> > Getting started:<br>
> <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
> > Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
><br>
><br>
><br>
</div><div><div></div><div class="h5">
><br>
><br>
><br>
><br>
<br>
<br>
</div></div><br>
<br></blockquote></div><br></div>