[Pacemaker] Problem with CRMD restart

Sat Nov 20 03:56:36 EST 2010

On Fri, Nov 19, 2010 at 10:11 AM, JiaQiang Xu <xjqkilling at gmail.com> wrote:
> Hi,
>
> I'm using pacemaker 1.0.9 and corosync 1.2.7.
> Recently I found a problem with CRMD restart.
>
> If CRMD crashes or is manually killed, corosync currently tries to restart it
> up to 100 times (done in lib/ais/plugin.c). But what if CRMD becomes so buggy
> (or is affected by some environmental factor) that it still cannot be
> restarted successfully after 100 attempts?

This has only ever happened during development when I broke something.
No user has ever hit this.
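
For readers who want a picture of what a capped respawn looks like, here is
a minimal sketch. It is not the code in lib/ais/plugin.c; only the limit of
100 comes from this thread. The names supervise, max_respawns and run are
hypothetical and chosen for illustration.

```python
import subprocess

MAX_RESPAWNS = 100  # the limit mentioned in the thread; the rest is illustrative


def supervise(command, max_respawns=MAX_RESPAWNS, run=subprocess.call):
    """Run `command`, restarting it after every non-zero exit.

    `run` is any callable that starts the child and returns its exit code
    (subprocess.call by default). Returns True once the child exits
    cleanly, or False when the child has failed again after the last
    permitted respawn.
    """
    respawns = 0
    while True:
        if run(command) == 0:
            return True          # clean exit, nothing more to do
        if respawns == max_respawns:
            return False         # give up; the caller should tell the peers
        respawns += 1
```

A False return corresponds to the case asked about above: the supervisor
has given up, and the only thing left for it to do is report that to the
rest of the cluster.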

> I read through the code and found that in this situation the ais plugin
> will send out a notification message to other nodes in the cluster. But now
> the nodes won't do anything more than updating peer information upon
> receiving this notification.
>
> Is this a bug?

No, there is nothing else that needs to be done.
Other parts of pacemaker look at that peer data and will shoot the
node if necessary.
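
As a toy model of that split of responsibilities (the notification only
updates peer data, and a separate check decides whether to fence), consider
the sketch below. It is an assumption-laden illustration, not Pacemaker's
actual data structures or policy: the Peer fields, on_notification and
needs_fencing are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    # Hypothetical record; the field names are illustrative, not Pacemaker's.
    name: str
    crmd_running: bool = True         # last reported state of the peer's crmd
    may_hold_resources: bool = False  # could it still be running resources?


def on_notification(peers, name, crmd_running):
    """Handling a notification only records the new state, nothing more."""
    peers[name].crmd_running = crmd_running


def needs_fencing(peer):
    """Toy rule: shoot a peer that has no crmd but may still hold resources."""
    return (not peer.crmd_running) and peer.may_hold_resources


peers = {
    "node1": Peer("node1", may_hold_resources=True),
    "node2": Peer("node2"),
}
on_notification(peers, "node1", crmd_running=False)
to_shoot = sorted(p.name for p in peers.values() if needs_fencing(p))
```

The point of the design is that the notification handler stays trivial;
the decision is made later by whatever reads the peer data.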

> Or do we just not plan to deal with it?
>
> Thanks.
> --Jiaqiang