[ClusterLabs] Pacemaker fails to start after few starts

Thu Jun 11 03:49:40 EDT 2015

Hi Ken,

I remember you wanted to be informed about this issue.
Unfortunately I cannot work on it right now, so it is postponed for some
time.
Anyway, as soon as I resume my work on it, I will let you know the result.
Thank you for the help and all the advice =)

P.S.: I noticed that "users at clusterlabs.org" wasn't in the later
conversation, so I am adding it to keep this info alive forever on the web
=)

Thank you,
Kostya

On Sat, May 2, 2015 at 1:32 AM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 04/28/2015 04:56 AM, Kostiantyn Ponomarenko wrote:
> >> Are you configuring and compiling the software separately on each node,
> >> or compiling once and installing those binaries/packages on all nodes?
> >
> > I compiled it once and am now just using the binaries.
> >
> >> If compiling once, are your nodes similar enough that the one
> >> ./configure is valid for all of them?
> >
> > Yes, they are all the same - same Debian version.
> >
> >> What specific versions of libqb, corosync and pacemaker are you using?
> >
> > libqb 0.17.1
> > corosync 2.3.4
> > pacemaker 1.1.12
>
> Those should be good. FYI pacemaker 1.1.13 should be out soon; I doubt
> it would fix your issue, but if you're compiling anyway you may want to
> grab it when it comes out.
>
> >> Are you sure you're replacing all of these and not using one from
> >> an older install?
> >
> > Yes, I am pretty sure, but I will give it a try - I will build all the
> > packages one more time on a clean system.
>
> If your destination systems ever had the Debian-provided packages
> installed, make sure you purge them, in case there is some file hanging
> around somewhere. Maybe compare the file list of the Debian-provided
> packages against what you're installing to make sure you have everything.
>
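The file-list comparison suggested above could be sketched roughly as follows. This is only an illustration: missing_from_install is a hypothetical helper name, and the two input files are assumed to be plain lists of absolute paths, one per line, produced however is convenient (for example with "dpkg -L" for an installed Debian package and "find" over a staging directory for the local build).

```shell
# Hypothetical helper (not part of Pacemaker or Debian tooling):
# print every path listed in file $1 that is absent from file $2.
missing_from_install() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    # comm requires sorted input, so sort both lists first.
    sort -u "$1" > "$tmp_a"
    sort -u "$2" > "$tmp_b"
    # -23 suppresses lines unique to the second list and lines common
    # to both, leaving only the paths unique to the first list.
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example usage (file names and staging path are assumptions):
#   dpkg -L pacemaker > debian-files.txt
#   (cd /tmp/stage && find . -type f | sed 's|^\.||') > local-files.txt
#   missing_from_install debian-files.txt local-files.txt
```

Any path printed exists in the Debian-provided package but not in the local install and may be worth a closer look. Running it with the arguments swapped shows files that only the local build installs.
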
> This is an odd problem and I'm curious whether you find a solution. The
> big mystery to me is that you saw both these messages:
>
> Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]:   notice:
> crm_add_logfile: Additional logging available in
> /var/log/pacemaker.log
>
> Apr 22 10:49:08 daemon.err<27> pacemakerd[114133]:    error:
> mcp_read_config: Couldn't create logfile: /var/log/pacemaker.log
>
> If you look at lib/common/logging.c in the source code,
> crm_add_logfile() will always return TRUE if it gets as far as the
> "Additional logging" message. But if you look at mcp/corosync.c,
> mcp_read_config() will print the "Couldn't create logfile" message only
> if crm_add_logfile() returned FALSE.
>
> I don't see a code path that could print both, so I suspect some sort
> of memory corruption: most likely a corrupted library or binary, though
> there could also be a memory overflow in the code somewhere that for
> some reason is triggered only in your setup.
>
> You might try turning on debug logging by setting PCMK_debug=yes in
> /etc/default/pacemaker. Normally the additional logs would go into
> /var/log/pacemaker.log rather than syslog, but since the issue is that
> you don't get pacemaker.log, you can make it go to syslog instead with
> PCMK_logpriority=info (noisy) or debug (noisier).
>
> It's very verbose though and mainly useful for tracing through the code.
> If you turn it on make sure it doesn't impact your disk I/O too much.
>
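For reference, the debug-logging settings discussed above might look like this in /etc/default/pacemaker. This is a sketch, assuming PCMK_logpriority is the variable that controls which priorities are sent to syslog, as in the sample sysconfig file shipped with Pacemaker 1.1. The exact values are a matter of taste.

```shell
# /etc/default/pacemaker (fragment; values are illustrative)

# Enable debug logging for the Pacemaker daemons.
PCMK_debug=yes

# Send messages up to this priority to syslog.
# "info" is noisy, "debug" is noisier.
PCMK_logpriority=debug
```

Pacemaker has to be restarted for the change to take effect, and the setting is best reverted once the traces are collected, given how verbose the output is.
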