<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Apr 3, 2018 at 10:18 PM, Jehan-Guillaume de Rorthais <span dir="ltr"><<a href="mailto:jgdr@dalibo.com" target="_blank">jgdr@dalibo.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, 3 Apr 2018 09:58:50 +1000<br>

<span class="">Andrew Beekhof <<a href="mailto:abeekhof@redhat.com">abeekhof@redhat.com</a>> wrote:<br>

> On Fri, Mar 30, 2018 at 8:36 PM, Jehan-Guillaume de Rorthais <<br>

> <a href="mailto:jgdr@dalibo.com">jgdr@dalibo.com</a>> wrote:<br>

> > On Thu, 29 Mar 2018 09:32:41 +1100<br>

> > Andrew Beekhof <<a href="mailto:abeekhof@redhat.com">abeekhof@redhat.com</a>> wrote:<br>

> > > On Thu, Mar 29, 2018 at 8:07 AM, Jehan-Guillaume de Rorthais <<br>

> > > <a href="mailto:jgdr@dalibo.com">jgdr@dalibo.com</a>> wrote:<br>

</span>[...]<br>

<span class="">> > > Though by now there is surely a decent library for logging to files with<br>

> > > sub-second timestamps - if we could incorporate that into libqb and have<br>

> > > corosync use it too...<br>

> ><br>

> > In my opinion, this is neither the role of libqb<br>

><br>

><br>

> libqb has the logging library that pacemaker and corosync use.<br>

> it is absolutely where this change should happen<br>

<br>

</span>I meant that this could be handled 100% by some external dedicated daemon, eg.<br>

syslog or journalctl.<br>

<br>

I was thinking about code simplification.<br>

<br>

[...]<br>

<span class=""><br>

> > > then we could consider 1 log per daemon.<br>

> > > In which case, the outcome of the PREFIX-SUFFIX discussion above could<br>

> > > instead be used for /var/log/pacemaker/SUFFIX<br>

> ><br>

> > I think the best would be to have one log for Corosync, one log for<br>

> > Pacemaker (and all its sub-process/childs).<br>

> ><br>

> > Another good path toward understandable log file would be to hide what<br>

> > process is speaking. Experienced user will still know that "LOG: setting<br>

> > failcount to 3" comes from CRMd and "DEBUG1: failcount setted to 3" comes<br>

> > from attrd.<br>

> ><br>

> > However, this would probably be a mess...because again, the cause might be<br>

> > logged AFTER the effects/reaction :/<br>

><br>

> why?  i've never seen that be the case<br>

<br>

</span>Please find in attachment a demonstration of such behavior I found last week.<br>

Note that this comes from a Sles 12 SP1 using Pacemaker 1.1.13...People there<br>

were not able to upgrade the servers before we built the PoC together.<br>

<br>

First column is the order in the log file. Second column is how I would expect<br>

the messages to appear in the log.<br>

<br>

Eg. I would expect L.11<br>

<br>

  "pengine: notice: process_pe_message: Calculated Transition 29: [...]"<br>

<br>

Before CRMd begin to process it at L.6-10.<br>

<br>

Another exemple, I would expect LRMd L.35:<br>

<br>

  "lrmd:  notice: log_finished:  finished - rsc:pgsqld action:notify"<br>

<br>

Before the CRMd receive the result L.26...<br></blockquote><div><br></div><div><br></div><div>No, none of these are out of order.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Maybe this is something fixed in 1.1.18 or 2.0.0, I just couldn't find commit<br>

messages related to this when searching through them quickly.<br>

<span class=""><br>

> > Maybe the solution is to log only messages from CRMD, where all the<br>

> > orchestration comes from. Everything else might go to some debug level if<br>

> > needed.<br>

><br>

> sorry, that is a terrible idea<br>

<br>

</span>I was throwing random ideas as I'm not familiar with internal architecture.<br>

Maybe it should be pacemakerd to gather messages from all other messages and<br>

spit them to stderr so they are captured by journald or redirected to a file...<br>

<br>

Regards,</blockquote></div><br></div></div>