[Pacemaker] Multiple thread after rebooting server: the node doesn't go online

Giovanni Di Milia gdimilia at cfa.harvard.edu
Thu Nov 19 15:40:23 EST 2009


On Nov 19, 2009, at 3:03 PM, Andrew Beekhof wrote:
>
>> Another problem has appeared:
>> after the reboot of one server I often get a cluster partition, and both
>> servers elect themselves DC.
>> Even if the partition doesn't appear just after the reboot of one server
>> (i.e. serverA), if I try to restart corosync on the other server (i.e.
>> serverB), the partition appears.
>> Then if I also restart corosync on the first server (serverA), everything
>> works fine again.
>> But if I restart corosync on the second server (serverB), nothing changes
>> and the partition appears again.
>> It seems to me that there is still something wrong with the first run of
>> corosync just after the server reboot.
>
> I've found that it starts a bit too early by default.
> Various systems seem to like messing with the network stack (xen is
> one but there are others) which confuses corosync.

I wrote a shell script that "manually starts" corosync 5 minutes after
the server starts, and in this case the problem appears every time!
It's driving me crazy, because I can see that my script starts a while
after the server is up and I'm pretty sure everything is running!
On the other hand, if I start corosync manually just after the server
is up, everything works fine!
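
A fixed delay only guesses at when the network is ready. A minimal sketch of the alternative, assuming the "starts too early" theory applies, is to poll until the node's static address is actually configured and only then start corosync. The helper names `retry_until` and `addr_is_up` are hypothetical, as are the `BIND_ADDR` placeholder and the init script path in the commented example. The address check assumes the iproute2 `ip` tool is installed. This is not the script described above, and it may not address the cause of the partition.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds
# or the attempt budget is exhausted.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical check: succeeds once the given IPv4 address is configured
# on any interface. Assumes iproute2's `ip` is available.
addr_is_up() {
    ip -o -4 addr show | grep -qF " $1/"
}

# Example invocation (commented out so the file can be sourced safely).
# BIND_ADDR is a placeholder for the node's static address, and the
# init script path is an assumption about the distribution:
#
#   BIND_ADDR="..."
#   retry_until 60 5 addr_is_up "$BIND_ADDR" && /etc/init.d/corosync start
```

Polling a concrete condition also leaves a record of how long the wait took, which may help tell a network-readiness problem apart from something else going wrong on the first start after boot.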


> You're not getting addresses from a dhcp server are you?
> That's another common cause, since there can be a significant delay in
> obtaining the address - which again messes with corosync.

Absolutely not!
I have two servers with static public IP addresses.
I also added the two servers to the /etc/hosts file: in general I
followed all the guidelines I found in the documentation.


>> I didn't configure any fencing method, because I think that my
>> configuration is really simple and I don't need it.
>
> Do you need your data though?


Do you mean it's better to configure a fencing method anyway?

Thank you very much for your help!
Giovanni