[Pacemaker] Multiple thread after rebooting server: the node doesn't go online

Thu Nov 19 15:03:04 EST 2009

On Tue, Nov 17, 2009 at 10:31 PM, Giovanni Di Milia
<gdimilia at cfa.harvard.edu> wrote:
> Another problem has appeared:
> after the reboot of one server I often have a cluster partition and both
> servers elect themselves DC.
> Even if the partition doesn't appear just after the reboot of one server
> (i.e. serverA), if I try to restart corosync on the other server (i.e.
> serverB), the partition appears.
> Then if I also restart corosync on the first server (serverA) everything
> works fine again.
> But if I restart corosync on the second server (serverB) nothing changes and
> the partition appears again.
> It seems to me that there is still something wrong with the first run of
> corosync just after the server reboot.

I've found that corosync starts a bit too early in the boot sequence by
default. Various systems like to mess with the network stack during boot
(Xen is one, but there are others), which confuses corosync.

You're not getting addresses from a DHCP server, are you?
That's another common cause, since there can be a significant delay in
obtaining the address, which again confuses corosync.
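
If timing does turn out to be the problem, one possible workaround is to
hold corosync back until the node's cluster address is actually usable.
The following is only a rough sketch of that idea, not something corosync
or Pacemaker ships: the function names, the timeout values and the way it
would be hooked into the boot sequence are all assumptions for illustration.
It polls until a UDP socket can be bound to the given IPv4 address, which
only succeeds once that address is assigned to a local interface.

```python
import socket
import time


def can_bind(addr, port=0):
    """Return True if a UDP socket can currently be bound to addr.

    Binding fails with OSError while the address is not yet assigned
    to any local interface (for example, before DHCP has completed).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((addr, port))
        except OSError:
            return False
    return True


def wait_for_address(addr, timeout=60.0, interval=1.0, probe=can_bind):
    """Poll until addr is usable or timeout seconds have elapsed.

    Hypothetical helper: returns True as soon as probe(addr) succeeds,
    False if the deadline passes first. The probe is a parameter so
    that a different readiness check can be substituted.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe(addr):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper around the corosync init script could call
wait_for_address() with the node's own cluster IP and start corosync only
if it returns True. Note that this checks only that the address is present;
it says nothing about whether the other node is reachable.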

> I didn't configure any fencing method, because I think that my configuration
> is really simple and I don't need it.

Do you need your data though?