[Pacemaker] never ending election

David Riccitelli david at interact.it
Tue Aug 5 11:17:55 UTC 2008

Here is the second log, the one from node #1:

This line refers to the moment I manually killed the heartbeat on node  
Aug  1 12:16:13 rmefp-srv01x heartbeat: [28575]: WARN: node rmefp-srv02x: is dead
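
When comparing the logs from the two nodes, it can help to merge them into a single timeline. What follows is only a minimal Python sketch, assuming the syslog-style format of the lines quoted in this thread. The names `parse_line` and `merge_logs` and the default year 2008 are illustrative assumptions, not part of heartbeat or Pacemaker.

```python
import re
from datetime import datetime

# Pattern modelled on the log lines quoted in this thread, e.g.
# "Aug  1 12:16:13 rmefp-srv01x heartbeat: [28575]: WARN: node rmefp-srv02x: is dead"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<daemon>\w+): \[(?P<pid>\d+)\]: "
    r"(?P<level>\w+):\s+(?P<msg>.*)$"
)


def parse_line(line, year=2008):
    """Split one log line into fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Syslog timestamps carry no year, so one is assumed (hypothetical default).
    # Collapse the padded day ("Aug  1") to a single space before parsing.
    # Note: %b relies on English month names (the default C locale).
    stamp = " ".join(fields["stamp"].split())
    fields["time"] = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    fields["pid"] = int(fields["pid"])
    return fields


def merge_logs(*logs, year=2008):
    """Merge several iterables of log lines into one list sorted by time."""
    events = []
    for log in logs:
        for line in log:
            event = parse_line(line, year)
            if event is not None:
                events.append(event)
    return sorted(events, key=lambda event: event["time"])
```

Calling `merge_logs(open("node1.log"), open("node2.log"))` (file names hypothetical) would give one time-ordered list in which the `host` field shows which node logged each event.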

Best regards,
David Riccitelli

David Riccitelli

e-mail: david at interact.it
skype: ziodave
phone: +39.0658318336

  roma - tel.+39.0658318301 fax.+39.0658318303 P.I. 04856801008

On 04/Aug/08, at 17:19, Andrew Beekhof wrote:

> On Mon, Aug 4, 2008 at 16:53, David Riccitelli <david at interact.it> wrote:
>> The logs for the second node are located here:
>> https://share.acrobat.com/adc/document.do?docid=144a8a57-4c6a-46d9-bfc4-cad7dd31fc02
>> I don't have the one for the first node at the moment.
> Unfortunately I need both, since both nodes think they should win the
> election and I need to figure out who's right (and thus where
> the bug is).
>> The log starts with this line:
>> Aug  1 11:50:09 rmefp-srv02x heartbeat: [20780]: WARN: node rmefp-srv01x: is dead
>> which is when I removed the two network cables from the first node.
>> I believe the last meaningful line is this:
>> Aug  1 12:19:51 rmefp-srv02x crmd: [20793]: info: do_election_check: Still waiting on 2 non-votes (2 total)
>> because the following line was logged when I forced a restart of the
>> heartbeat service (on the second node):
>> Aug  1 12:19:55 rmefp-srv02x heartbeat: [6404]: info: No log entry found in ha.cf -- use logd
>> Best regards,
>> David Riccitelli
>> On 04/Aug/08, at 13:11, Andrew Beekhof wrote:
>> Hard to say what's going on based on this log fragment.
>> Can you put the full logs from both nodes somewhere?
>> On Sun, Aug 3, 2008 at 11:18, David Riccitelli <david at interact.it> wrote:
>> Hello there,
>> Can somebody help me with this problem?
>> I have 2 identical nodes, node #1 and node #2. Both nodes run CentOS 5
>> with the current versions of heartbeat (2.1.3) and pacemaker (0.6.5).
>> Each node has 2 network ports bonded together (mode 1). Bonding is
>> configured and working fine.
>> The nodes have one resource configured, and I must say everything works
>> fine. All the tests I'm running show perfect failovers, except one:
>> 1. node #1 has the resource, node #2 is waiting,
>> 2. I remove both network cables from node #1,
>> 3. node #2 no longer sees node #1 and believes it is dead,
>> 4. node #2 brings up the resource,
>> 5. I reconnect node #1 to the network; I expect the nodes to see each
>>    other and one of the two to release the resource,
>> 6. node #1 and node #2 see each other and start counting election votes,
>>    but the election never finishes and the resource stays active on both
>>    nodes at the same time.
>> Logs (same on both nodes - this pattern repeats forever, until heartbeat
>> is manually stopped on one of the nodes):
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at clusterlabs.org
>> http://list.clusterlabs.org/mailman/listinfo/pacemaker
