[Pacemaker] Node doesn't rejoin automatically after reboot

Bob Haxo bhaxo at sgi.com
Thu Jan 13 14:15:55 EST 2011


So, Tom ... how do you get the failed node online?

I've re-installed with the same image that is running on three other
nodes, but it still fails.  This node was quite happy for the past 3
months.  As I'm testing installs, this and other nodes have been
installed a significant number of times without this sort of failure.
I'd whack the whole HA cluster ... except that I don't want to run into
this failure again without a better solution than "reinstall the
system" ;-)

I'm looking at the information returned with corosync debug enabled.
After startup, everything looks fine to me until hitting this apparent
local ipc delivery failure:

Jan 13 10:09:10 corosync [TOTEM ] Delivering 2 to 3
Jan 13 10:09:10 corosync [TOTEM ] Delivering MCAST message with seq 3 to pending delivery queue
Jan 13 10:09:10 corosync [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jan 13 10:09:10 corosync [pcmk  ] Msg[6486] (dest=local:crmd, from=r1lead1:crmd.11229, remote=true, size=181): <create_request_adv origin="post_cache_update" t="crmd" version="3.0.2" subt="request" ref
Jan 13 10:09:10 corosync [TOTEM ] mcasted message added to pending queue
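
In case it helps anyone searching a long debug log for the same symptom,
here is a minimal Python sketch that pulls out these route_ais_message
warnings.  The regular expression and the function name find_ipc_failures
are my own and are modeled only on the one warning line quoted above;
other corosync versions or syslog setups may format the line differently,
so treat the pattern as an assumption to adjust.

```python
import re

# Assumed layout, taken from the single warning line in the excerpt above:
# "<Mon> <day> <HH:MM:SS> corosync [pcmk  ] WARN: route_ais_message:
#  Sending message to <dest> failed: <reason> (rc=<n>)"
IPC_FAILURE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+corosync\s+\[pcmk\s*\]\s+"
    r"WARN: route_ais_message: Sending message to (?P<dest>\S+) failed: "
    r"(?P<reason>.+?) \(rc=(?P<rc>-?\d+)\)"
)


def find_ipc_failures(lines):
    """Return (timestamp, destination, reason, rc) for each matching line."""
    failures = []
    for line in lines:
        match = IPC_FAILURE.match(line)
        if match:
            failures.append(
                (
                    match.group("ts"),
                    match.group("dest"),
                    match.group("reason"),
                    int(match.group("rc")),
                )
            )
    return failures


if __name__ == "__main__":
    sample = [
        "Jan 13 10:09:10 corosync [TOTEM ] Delivering 2 to 3",
        "Jan 13 10:09:10 corosync [pcmk  ] WARN: route_ais_message: "
        "Sending message to local.crmd failed: ipc delivery failed (rc=-2)",
    ]
    for failure in find_ipc_failures(sample):
        print(failure)
```

To run it against a real log, pass an open file object instead of the
sample list; the function only iterates over lines, so it does not care
where they come from.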

I guess I'll have to renew my acquaintance with IPC.

Bob Haxo



On Thu, 2011-01-13 at 19:17 +0100, Tom Tux wrote:
> I don't know. I still have this issue (and it seems that I'm not the
> only one...). I'll check whether any pacemaker updates are available
> through the zypper update channel (sles11-sp1).
> 
> Regards,
> Tom
> 
> 
> 2011/1/13 Bob Haxo <bhaxo at sgi.com>:
> > Tom, others,
> >
> > Please, what was the solution to this issue?
> >
> > Thanks,
> > Bob Haxo
> >
> > On Mon, 2010-09-06 at 09:50 +0200, Tom Tux wrote:
> >
> > Yes, corosync is running after the reboot. It comes up with the
> > regular init procedure (runlevel 3 in my case).
> >
> > 2010/9/6 Andrew Beekhof <andrew at beekhof.net>:
> >> On Mon, Sep 6, 2010 at 7:57 AM, Tom Tux <tomtux80 at gmail.com> wrote:
> >>> No, I don't have any such failure messages. In my case, the
> >>> "Connection to our AIS plugin" was established.
> >>>
> >>> /dev/shm is also not full.
> >>
> >> Is corosync running?
> >>
> >>> Kind regards,
> >>> Tom
> >>>
> >>> 2010/9/3 Michael Smith <msmith at cbnco.com>:
> >>>> Tom Tux wrote:
> >>>>
> >>>>> If I remove one cluster node (node01) for maintenance
> >>>>> (/etc/init.d/openais stop) and reboot it, the node does not rejoin
> >>>>> the cluster automatically. After the reboot, I have the following
> >>>>> error and warning messages in the log:
> >>>>>
> >>>>> Sep  3 07:34:15 node01 mgmtd: [9202]: info: login to cib failed: live
> >>>>
> >>>> Do you have messages like this, too?
> >>>>
> >>>> Aug 30 15:48:10 xen-test1 corosync[5851]:  [IPC   ] Invalid IPC credentials.
> >>>> Aug 30 15:48:10 xen-test1 cib: [5858]: info: init_ais_connection: Connection to our AIS plugin (9) failed: unknown (100)
> >>>>
> >>>> Aug 30 15:48:10 xen-test1 cib: [5858]: CRIT: cib_init: Cannot sign in to the cluster... terminating
> >>>>
> >>>>
> >>>>
> >>>> http://news.gmane.org/find-root.php?message_id=%3c4C7C0EC7.2050708%40cbnco.com%3e
> >>>>
> >>>> Mike
> >>>>
> >>>> _______________________________________________
> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs:
> >>>>
> >>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >>>>
> >>>
> >>
> >




