[Pacemaker] node can't join cluster after reboot

Vladimir Elisseev vovan at vovan.nl
Thu Nov 1 02:08:50 EDT 2012


Yes, hb_report is there, thanks!

On Thu, 2012-11-01 at 11:40 +1100, Andrew Beekhof wrote:
> On Tue, Oct 30, 2012 at 4:35 PM, Vladimir Elisseev <vovan at vovan.nl> wrote:
> > Thanks for trying to help! Currently I can't provide crm_report from the
> > failed node, as I've decided to restore the complete node from backup.
> > The versions I use are corosync-1.3.0 and pacemaker-1.0.10. Actually the
> > problem occurred after updating quite a few system packages, but all the
> > cluster-related software was untouched. I've found exactly the same
> > issue described in the mailing list earlier:
> > http://www.gossamer-threads.com/lists/linuxha/pacemaker/77881?do=post_view_threaded#77881
> > At least the symptoms are exactly the same, as are the pasted log files.
> > I've also tried enabling debug logging and saw that crm tries to connect
> > to the cib sockets (/var/run/crm_*) too early (IMO) and fails because cib
> > hasn't started yet.
> > I'm planning to repeat the update of this system, but I'll do this
> > more carefully in order to understand which particular package leads to
> > this behavior. BTW, how can I create crm_report? I can't find this
> > binary anywhere on the system.
> 
> It's included in subsequent 1.0.x releases.
> You should have hb_report available though.
> 
> > Let me know what kind of input you'll
> > need if I'm able to reproduce this problem.
> >
> > Regards,
> > Vlad.
> >
> >
> > On Tue, 2012-10-30 at 16:00 +1100, Andrew Beekhof wrote:
> >> On Sun, Oct 28, 2012 at 9:05 PM, Vladimir Elisseev <vovan at vovan.nl> wrote:
> >> > Hello,
> >> >
> >> > I have a problem: after a reboot, one cluster node can no longer join
> >> > the cluster. From the log file I can't work out what is actually going
> >> > on. I can only see that cib and crm are both respawned frequently. I'd
> >> > appreciate any help. Below is the relevant part of the log file:
> >>
> >> I appreciate that you're trying to keep it brief, but problems often
> >> originate much earlier than people suspect.
> >> Can you instead attach a crm_report tarball? That will have everything
> >> (from both nodes) that we need to be able to help.
> >>
> >> What version is this btw?
> >>
> >> >
> >> > Oct 28 10:52:22 srv2 cib: [10646]: info: cib_server_process_diff: Requesting re-sync from peer
> >> > Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_diff_notify: Local-only Change (client:crmd, call: 4770): -1.-1.-1 (Application of an update diff failed, requesting a full refresh)
> >> > Oct 28 10:52:22 srv2 cib: [10653]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.qJTUAV (digest: /var/lib/heartbeat/crm/cib.XwOKXQ)
> >> > Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_server_process_diff: Not applying diff 0.1298.5 -> 0.1299.1 (sync in progress)
> >> > Oct 28 10:52:22 srv2 cib: [10646]: info: cib_replace_notify: Local-only Replace: -1.-1.-1 from srv1
> >> > Oct 28 10:52:22 corosync [pcmk]:  ] info: pcmk_ipc_exit: Client cib (conn=0x1837340, async-conn=0x1837340) left
> >> > Oct 28 10:52:22 corosync [pcmk]:  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=10646, core=true)
> >> > Oct 28 10:52:22 corosync [pcmk]:  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> >> > Oct 28 10:52:22 corosync [pcmk]:  ] info: spawn_child: Forked child 10656 for process cib
> >> > Oct 28 10:52:22 srv2 cib: [10656]: info: Invoked: /usr/lib64/heartbeat/cib
> >> >
> >> >
> >> > Regards,
> >> > Vlad.
> >> >
> >> >
> >> > _______________________________________________
> >> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> >
> >> > Project Home: http://www.clusterlabs.org
> >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> > Bugs: http://bugs.clusterlabs.org
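
For anyone hitting the same respawn loop: before a full hb_report is available, it can help to count how often each child process dies. Below is a minimal sketch in Python, assuming log lines shaped like the pcmk_wait_dispatch line in the excerpt quoted above. The function name find_child_crashes and the regular expression are made up for illustration and are not part of Pacemaker or corosync.

```python
import re
from collections import Counter

# Pattern derived only from the single "Child process ... terminated" line
# in the quoted excerpt; other versions may word this message differently.
CRASH_RE = re.compile(
    r"pcmk_wait_dispatch: Child process (?P<child>\S+) terminated "
    r"with signal (?P<signal>\d+) \(pid=(?P<pid>\d+), core=(?P<core>\w+)\)"
)


def find_child_crashes(lines):
    """Return (child, signal, pid, core_dumped) for each crash line found."""
    crashes = []
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            crashes.append(
                (m["child"], int(m["signal"]), int(m["pid"]), m["core"] == "true")
            )
    return crashes


# Two lines copied from the excerpt in this thread.
sample = [
    "Oct 28 10:52:22 corosync [pcmk]:  ] ERROR: pcmk_wait_dispatch: "
    "Child process cib terminated with signal 6 (pid=10646, core=true)",
    "Oct 28 10:52:22 corosync [pcmk]:  ] notice: pcmk_wait_dispatch: "
    "Respawning failed child process: cib",
]

crashes = find_child_crashes(sample)
counts = Counter(child for child, *_ in crashes)
for child, n in counts.items():
    print(f"{child}: {n} crash(es)")
```

To run it against a real log, pass an open file object instead of the sample list. On Linux, signal 6 is SIGABRT, and core=true suggests a core file was written, which would be worth including alongside the report.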