[Pacemaker] node can't join cluster after reboot

Vladimir Elisseev vovan at vovan.nl
Sat Nov 3 08:26:24 EDT 2012


I've been able to reproduce the problem and have attached crm_report
tarballs from both nodes. I don't yet know which particular package
triggers the problem, but below is the list of what has been updated.
Hopefully this helps.

Regards,
Vlad.

Sat Nov  3 12:15:40 2012 <<< sys-apps/busybox-1.20.2
Sat Nov  3 12:15:42 2012 >>> sys-apps/busybox-1.20.2
Sat Nov  3 12:15:50 2012 <<< sys-fs/dosfstools-3.0.9
Sat Nov  3 12:15:52 2012 >>> sys-fs/dosfstools-3.0.12
Sat Nov  3 12:16:00 2012 <<< dev-lang/nasm-2.10.01
Sat Nov  3 12:16:02 2012 >>> dev-lang/nasm-2.10.05
Sat Nov  3 12:16:11 2012 <<< dev-libs/libgamin-0.1.10-r2
Sat Nov  3 12:16:13 2012 >>> dev-libs/libgamin-0.1.10-r3
Sat Nov  3 12:16:40 2012 <<< media-fonts/droid-113-r1
Sat Nov  3 12:16:46 2012 >>> media-fonts/droid-113-r2
Sat Nov  3 12:16:54 2012 <<< media-libs/libpng-1.5.10
Sat Nov  3 12:16:56 2012 >>> media-libs/libpng-1.5.13-r1
Sat Nov  3 12:17:04 2012 <<< app-arch/unzip-6.0-r1
Sat Nov  3 12:17:05 2012 >>> app-arch/unzip-6.0-r3
Sat Nov  3 12:17:12 2012 <<< app-arch/rpm2targz-9.0.0.4g
Sat Nov  3 12:17:14 2012 >>> app-arch/rpm2targz-9.0.0.5g
Sat Nov  3 12:17:22 2012 <<< app-arch/pbzip2-1.1.5
Sat Nov  3 12:17:24 2012 >>> app-arch/pbzip2-1.1.8
Sat Nov  3 12:17:34 2012 <<< app-arch/zip-3.0
Sat Nov  3 12:17:35 2012 >>> app-arch/zip-3.0-r1
Sat Nov  3 12:17:43 2012 <<< sys-process/htop-1.0.1
Sat Nov  3 12:17:45 2012 >>> sys-process/htop-1.0.1-r1
Sat Nov  3 12:17:55 2012 <<< media-libs/tiff-4.0.2
Sat Nov  3 12:17:57 2012 >>> media-libs/tiff-4.0.2-r1
Sat Nov  3 12:18:04 2012 <<< net-ftp/tftp-hpa-5.1
Sat Nov  3 12:18:06 2012 >>> net-ftp/tftp-hpa-5.2
Sat Nov  3 12:18:18 2012 <<< media-video/ffmpeg-0.10.3
Sat Nov  3 12:18:20 2012 >>> media-video/ffmpeg-0.10.3
Sat Nov  3 12:18:35 2012 <<< sys-devel/gettext-0.18.1.1-r1
Sat Nov  3 12:18:37 2012 >>> sys-devel/gettext-0.18.1.1-r3
Sat Nov  3 12:18:44 2012 <<< app-admin/logrotate-3.8.1
Sat Nov  3 12:18:46 2012 >>> app-admin/logrotate-3.8.2
Sat Nov  3 12:18:54 2012 <<< media-libs/libwebp-0.1.3
Sat Nov  3 12:18:55 2012 >>> media-libs/libwebp-0.2.0
Sat Nov  3 12:19:03 2012 <<< dev-perl/Convert-ASN1-0.220.0
Sat Nov  3 12:19:05 2012 >>> dev-perl/Convert-ASN1-0.260.0
Sat Nov  3 12:19:13 2012 <<< dev-perl/net-server-0.97
Sat Nov  3 12:19:15 2012 >>> dev-perl/net-server-2.6.0
Sat Nov  3 12:19:24 2012 <<< dev-perl/Config-IniFiles-2.710.0
Sat Nov  3 12:19:26 2012 >>> dev-perl/Config-IniFiles-2.760.0
Sat Nov  3 12:19:33 2012 <<< dev-perl/HTTP-Date-6.0.0
Sat Nov  3 12:19:35 2012 >>> dev-perl/HTTP-Date-6.20.0
Sat Nov  3 12:19:44 2012 <<< sys-boot/syslinux-4.06_pre11
Sat Nov  3 12:19:46 2012 >>> sys-boot/syslinux-4.06
Sat Nov  3 12:20:05 2012 <<< dev-libs/glib-2.30.3
Sat Nov  3 12:20:08 2012 >>> dev-libs/glib-2.32.4-r1
Sat Nov  3 12:20:16 2012 <<< dev-util/pkgconfig-0.27
Sat Nov  3 12:20:18 2012 >>> dev-util/pkgconfig-0.27.1
Sat Nov  3 12:20:28 2012 <<< net-analyzer/jnettop-0.13.0-r1
Sat Nov  3 12:20:29 2012 >>> net-analyzer/jnettop-0.13.0-r1
Sat Nov  3 12:20:41 2012 <<< x11-libs/pango-1.29.4
Sat Nov  3 12:20:43 2012 >>> x11-libs/pango-1.30.1
Sat Nov  3 12:20:53 2012 <<< net-analyzer/rrdtool-1.4.5-r1
Sat Nov  3 12:20:56 2012 >>> net-analyzer/rrdtool-1.4.7-r1
Sat Nov  3 12:21:03 2012 <<< app-shells/gentoo-bashcomp-20101217
Sat Nov  3 12:21:05 2012 >>> app-shells/gentoo-bashcomp-20101217-r1
Sat Nov  3 12:21:12 2012 <<< dev-perl/MIME-tools-5.502.0
Sat Nov  3 12:21:14 2012 >>> dev-perl/MIME-tools-5.503.0
Sat Nov  3 12:21:24 2012 <<< dev-perl/Convert-TNEF-0.170.0
Sat Nov  3 12:21:26 2012 >>> dev-perl/Convert-TNEF-0.180.0
Sat Nov  3 12:21:35 2012 <<< net-misc/curl-7.25.0-r1
Sat Nov  3 12:21:36 2012 >>> net-misc/curl-7.26.0
Sat Nov  3 12:21:51 2012 <<< mail-mta/postfix-2.9.3
Sat Nov  3 12:21:53 2012 >>> mail-mta/postfix-2.9.4
Sat Nov  3 12:22:01 2012 <<< dev-perl/Net-SSLeay-1.360.0
Sat Nov  3 12:22:03 2012 >>> dev-perl/Net-SSLeay-1.480.0-r1
Sat Nov  3 12:22:12 2012 <<< sys-auth/nss_ldap-264-r1
Sat Nov  3 12:22:14 2012 >>> sys-auth/nss_ldap-265-r1
Sat Nov  3 12:22:25 2012 <<< net-mail/fetchmail-6.3.21
Sat Nov  3 12:22:27 2012 >>> net-mail/fetchmail-6.3.22
Sat Nov  3 12:22:37 2012 <<< net-misc/dhcp-4.2.4_p1
Sat Nov  3 12:22:39 2012 >>> net-misc/dhcp-4.2.4_p2
Sat Nov  3 12:22:48 2012 <<< net-analyzer/tcpdump-3.9.8-r1
Sat Nov  3 12:22:50 2012 >>> net-analyzer/tcpdump-4.3.0
Sat Nov  3 12:23:07 2012 <<< dev-util/cmake-2.8.8-r3
Sat Nov  3 12:23:09 2012 >>> dev-util/cmake-2.8.9
Sat Nov  3 12:23:21 2012 <<< dev-vcs/subversion-1.6.17-r7
Sat Nov  3 12:23:24 2012 >>> dev-vcs/subversion-1.6.17-r7
Sat Nov  3 12:27:56 2012 <<< media-gfx/imagemagick-6.7.8.7
Sat Nov  3 12:27:58 2012 >>> media-gfx/imagemagick-6.7.8.7
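
Some entries in the list above are rebuilds of the same version, while others are real version changes. To narrow down the candidates, a small script along these lines could separate the two. This is only a sketch: it assumes each "<<<" line names the removed version and the following ">>>" line names the installed one, as the list above suggests. The function name changed_packages and the SAMPLE text are made up for illustration.

```python
import re

# One log line: "<timestamp> <<< category/package-version" (or ">>>").
LINE = re.compile(r"^(?P<stamp>.+?)\s+(?P<op><<<|>>>)\s+(?P<atom>\S+)\s*$")


def changed_packages(log_text):
    """Return (old_atom, new_atom) pairs whose version actually changed.

    Assumption: a "<<<" line (removed version) is followed by the
    matching ">>>" line (installed version) for the same package.
    """
    changed = []
    pending = None  # atom from the most recent "<<<" line
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip blank or unrelated lines
        if m.group("op") == "<<<":
            pending = m.group("atom")
        elif pending is not None:
            if pending != m.group("atom"):
                changed.append((pending, m.group("atom")))
            pending = None
    return changed


# Four lines copied from the list above, used as sample input.
SAMPLE = """\
Sat Nov  3 12:15:40 2012 <<< sys-apps/busybox-1.20.2
Sat Nov  3 12:15:42 2012 >>> sys-apps/busybox-1.20.2
Sat Nov  3 12:20:05 2012 <<< dev-libs/glib-2.30.3
Sat Nov  3 12:20:08 2012 >>> dev-libs/glib-2.32.4-r1
"""

if __name__ == "__main__":
    for old, new in changed_packages(SAMPLE):
        print(f"{old} -> {new}")
```

A same-version rebuild can still matter, for example if the package was rebuilt against an updated library, so the filtered list is a starting point for bisecting and does not rule anything out.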
 


On Thu, 2012-11-01 at 07:08 +0100, Vladimir Elisseev wrote:
> Yes, hb_report is there, thanks!
> 
> On Thu, 2012-11-01 at 11:40 +1100, Andrew Beekhof wrote:
> > On Tue, Oct 30, 2012 at 4:35 PM, Vladimir Elisseev <vovan at vovan.nl> wrote:
> > > Thanks for trying to help! Currently I can't provide crm_report from the
> > > failed node, as I've decided to restore the complete node from backup.
> > > The versions I use are corosync-1.3.0 and pacemaker-1.0.10. Actually the
> > > problem occurred after updating quite a few system packages, but all the
> > > cluster-related software was untouched. I've found exactly the same
> > > issue described in the mailing list earlier:
> > > http://www.gossamer-threads.com/lists/linuxha/pacemaker/77881?do=post_view_threaded#77881
> > > At least the symptoms are exactly the same, as are the pasted log files.
> > > I've tried enabling debug logging as well and saw that crm tries to
> > > connect to the cib sockets (/var/run/crm_*) too early (IMO) and fails
> > > because cib hasn't been started yet.
> > > I'm planning to repeat the update of this system, but I'll do it
> > > more carefully in order to understand which particular package leads to
> > > this behavior. BTW, how can I create crm_report? I can't find this
> > > binary anywhere on the system.
> > 
> > It's included in subsequent 1.0.x releases.
> > You should have hb_report available though.
> > 
> > > Let me know what kind of input you'll
> > > need if I'm able to reproduce this problem.
> > >
> > > Regards,
> > > Vlad.
> > >
> > >
> > > On Tue, 2012-10-30 at 16:00 +1100, Andrew Beekhof wrote:
> > >> On Sun, Oct 28, 2012 at 9:05 PM, Vladimir Elisseev <vovan at vovan.nl> wrote:
> > >> > Hello,
> > >> >
> > > >> > I'm having a problem: after a reboot, one cluster node can't join the
> > > >> > cluster anymore. From the log file I can't understand what is actually
> > > >> > going on. I can only see that cib and crm are both respawned frequently.
> > > >> > I'd appreciate any help. Below is the relevant part of the log file:
> > >>
> > >> I appreciate that you're trying to keep it brief, but problems often
> > >> originate much earlier than people suspect.
> > > >> Can you instead attach a crm_report tarball? That will have everything
> > > >> (from both nodes) that we need to be able to help.
> > >>
> > >> What version is this btw?
> > >>
> > >> >
> > >> > Oct 28 10:52:22 srv2 cib: [10646]: info: cib_server_process_diff: Requesting re-sync from peer
> > >> > Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_diff_notify: Local-only Change (client:crmd, call: 4770): -1.-1.-1 (Application of an update diff failed, requesting a full refresh)
> > >> > Oct 28 10:52:22 srv2 cib: [10653]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.qJTUAV (digest: /var/lib/heartbeat/crm/cib.XwOKXQ)
> > >> > Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_server_process_diff: Not applying diff 0.1298.5 -> 0.1299.1 (sync in progress)
> > >> > Oct 28 10:52:22 srv2 cib: [10646]: info: cib_replace_notify: Local-only Replace: -1.-1.-1 from srv1
> > >> > Oct 28 10:52:22 corosync [pcmk]:  ] info: pcmk_ipc_exit: Client cib (conn=0x1837340, async-conn=0x1837340) left
> > >> > Oct 28 10:52:22 corosync [pcmk]:  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=10646, core=true)
> > >> > Oct 28 10:52:22 corosync [pcmk]:  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> > >> > Oct 28 10:52:22 corosync [pcmk]:  ] info: spawn_child: Forked child 10656 for process cib
> > >> > Oct 28 10:52:22 srv2 cib: [10656]: info: Invoked: /usr/lib64/heartbeat/cib
> > >> >
> > >> >
> > >> > Regards,
> > >> > Vlad.
> > >> >
> > >> >
> > >> > _______________________________________________
> > >> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >> >
> > >> > Project Home: http://www.clusterlabs.org
> > >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > >> > Bugs: http://bugs.clusterlabs.org
> > >>
> > >> _______________________________________________
> > >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >>
> > >> Project Home: http://www.clusterlabs.org
> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > >> Bugs: http://bugs.clusterlabs.org
> > >
> > >
> > >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
> > 
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: srv1.tar.bz2
Type: application/x-bzip-compressed-tar
Size: 69084 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20121103/a90705f9/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: srv2.tar.bz2
Type: application/x-bzip-compressed-tar
Size: 69280 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20121103/a90705f9/attachment-0007.bin>

