[Pacemaker] node can't join cluster after reboot

Andrew Beekhof andrew at beekhof.net
Wed Oct 31 20:40:46 EDT 2012


On Tue, Oct 30, 2012 at 4:35 PM, Vladimir Elisseev <vovan at vovan.nl> wrote:
> Thanks for trying to help! Currently I can't provide crm_report from the
> failed node, as I've decided to restore the complete node from backup.
> The versions I use are corosync-1.3.0 and pacemaker-1.0.10. Actually the
> problem occurred after updating quite a few system packages, but all the
> cluster related software was untouched. I've found exactly the same
> issue described in the mailing list earlier:
> http://www.gossamer-threads.com/lists/linuxha/pacemaker/77881?do=post_view_threaded#77881
> At least the symptoms are exactly the same, as are the pasted log files.
> I've tried enabling debug logging as well and saw that crm tries to
> connect to the cib sockets (/var/run/crm_*) too early (IMO) and fails
> because cib hasn't started yet.
> I'm planning to repeat the update of this system, but I'll do it
> more carefully in order to understand which particular package leads to
> this behavior. BTW, how can I create a crm_report? I can't find this
> binary anywhere on the system.

It's included in subsequent 1.0.x releases.
You should have hb_report available though.
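
In the meantime, a rough way to see which child processes are dying, and
with which signal, is to scan the log for the pcmk_wait_dispatch lines
like the one in the excerpt quoted further down. The following is only a
sketch under assumptions: the function name count_child_crashes is made
up for illustration, and the pattern is derived solely from the single
"Child process ... terminated with signal" line in this thread, so other
Pacemaker versions may word the message differently.

```python
import re
from collections import Counter

# Pattern taken from the one ERROR line quoted in this thread; treat it as
# an assumption about the log format, not a documented interface.
TERMINATED = re.compile(
    r"pcmk_wait_dispatch: Child process (?P<child>\S+) "
    r"terminated with signal (?P<signal>\d+)"
)


def count_child_crashes(lines):
    """Count crash messages per (child name, signal number).

    `lines` is any iterable of log lines, e.g. an open file object.
    (Hypothetical helper, not part of Pacemaker.)
    """
    crashes = Counter()
    for line in lines:
        match = TERMINATED.search(line)
        if match:
            key = (match.group("child"), int(match.group("signal")))
            crashes[key] += 1
    return crashes


# Example use (the log path depends on your syslog configuration):
#
#     with open("/path/to/your/logfile", errors="replace") as log:
#         for (child, sig), n in count_child_crashes(log).most_common():
#             print(f"{child}: signal {sig}, {n} time(s)")
```

For the excerpt in this thread that would report cib with signal 6, which
is SIGABRT on Linux. It only summarizes what is already in the log; it is
no substitute for the full report.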

> Let me know what kind of input you'll
> need if I manage to reproduce this problem.
>
> Regards,
> Vlad.
>
>
> On Tue, 2012-10-30 at 16:00 +1100, Andrew Beekhof wrote:
>> On Sun, Oct 28, 2012 at 9:05 PM, Vladimir Elisseev <vovan at vovan.nl> wrote:
>> > Hello,
>> >
>> > I'm having a problem: after a reboot, one cluster node can't join the
>> > cluster anymore. From the log file I can't understand what is actually
>> > going on. I can only see that cib and crm are both respawned frequently.
>> > I'd appreciate any help. Below is the relevant part of the log file:
>>
>> I appreciate that you're trying to keep it brief, but problems often
>> originate much earlier than people suspect.
>> Can you instead attach a crm_report tarball? That will have everything
>> (from both nodes) that we need to be able to help.
>>
>> What version is this btw?
>>
>> >
>> > Oct 28 10:52:22 srv2 cib: [10646]: info: cib_server_process_diff: Requesting re-sync from peer
>> > Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_diff_notify: Local-only Change (client:crmd, call: 4770): -1.-1.-1 (Application of an update diff failed, requesting a full refresh)
>> > Oct 28 10:52:22 srv2 cib: [10653]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.qJTUAV (digest: /var/lib/heartbeat/crm/cib.XwOKXQ)
>> > Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_server_process_diff: Not applying diff 0.1298.5 -> 0.1299.1 (sync in progress)
>> > Oct 28 10:52:22 srv2 cib: [10646]: info: cib_replace_notify: Local-only Replace: -1.-1.-1 from srv1
>> > Oct 28 10:52:22 corosync [pcmk]:  ] info: pcmk_ipc_exit: Client cib (conn=0x1837340, async-conn=0x1837340) left
>> > Oct 28 10:52:22 corosync [pcmk]:  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=10646, core=true)
>> > Oct 28 10:52:22 corosync [pcmk]:  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
>> > Oct 28 10:52:22 corosync [pcmk]:  ] info: spawn_child: Forked child 10656 for process cib
>> > Oct 28 10:52:22 srv2 cib: [10656]: info: Invoked: /usr/lib64/heartbeat/cib
>> >
>> >
>> > Regards,
>> > Vlad.
>> >
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org