[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.1.2: friendlier failed action display

Ken Gaillot kgaillot at redhat.com
Fri Oct 29 11:08:36 EDT 2021


On Fri, 2021-10-29 at 08:24 +0200, Ulrich Windl wrote:
> > > > Ken Gaillot <kgaillot at redhat.com> wrote on 28.10.2021 at
> > > > 17:28 in message
> <f94a60ebb33d4a25c137c19485638e13ab00090d.camel at redhat.com>:
> > Hi all,
> > 
> > I hope to release the first release candidate for Pacemaker 2.1.2
> > next
> > week.
> > 
> > One of the most noticeable changes will be in failed action
> > displays in
> > crm_mon. (This change will *not* show up if Pacemaker is built with
> > the
> > --enable-compat-2.0 option.)
> > 
> > An example from one of our regression tests, using the familiar
> > display:
> > 
> >  * ClusterIP:0_monitor_30000 on fc16-builder 'not running' (7):
> > call=11, status='complete', last-rc-change='Wed Feb 22 11:04:34
> > 2012',
> > queued=0ms, exec=20ms
> > 
> > 
> > As of 2.1.2, that display will still be available if crm_mon is
> > given
> > the --show-detail option, but by default it will now look like:
> > 
> >  * ClusterIP:0 30s-interval monitor on fc16-builder returned 'not
> > running' at Wed Feb 22 11:04:34 2012 after 20ms
> 
> Hi!
> 
> That's ambiguous: is it 20ms after 11:04:34, or is 11:04:34 20ms after
> the result was reported (I think that is what the old format says)?
> Probably too verbose: "... started at ... returned ... after 20ms"
> (it won't make a big difference for 20ms, but it might for 3000ms)

The intent is "the action completed at this time, after executing for
this much time". So, the action started at the given time minus the
given duration.
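
To make that arithmetic concrete, here is a minimal sketch (not Pacemaker code). The helper name is made up, and it assumes the timestamp is in the ctime-style layout shown in the examples and that the default C/English locale is in effect:

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamp in the display, e.g. "Wed Feb 22 11:04:34 2012"
DISPLAY_FORMAT = "%a %b %d %H:%M:%S %Y"


def action_start(completed_at: str, exec_ms: int) -> datetime:
    """Hypothetical helper: completion time minus execution time."""
    completed = datetime.strptime(completed_at, DISPLAY_FORMAT)
    return completed - timedelta(milliseconds=exec_ms)


# First example from this thread: completed at 11:04:34 after running 20ms
print(action_start("Wed Feb 22 11:04:34 2012", 20))
```

Since the displayed completion time only has one-second resolution, the computed start time is an estimate to within a second, not an exact value.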

> Also, as clusters are faster than they were 15 years ago, what about
> "donating" fractional digits to the seconds?

The detail output (the only output before) always shows milliseconds.
The new default output shows a readable version with the full available
precision, e.g. "335ms" or "1m30.937s".
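
As a rough illustration of that kind of formatting, the sketch below turns a millisecond count into a compact string. It is not the Pacemaker implementation: the function name is hypothetical, and how anything other than the two examples above is rendered (hours, whole seconds, zero) is my own guess.

```python
def readable_duration(total_ms: int) -> str:
    """Hypothetical formatter: 335 -> '335ms', 90937 -> '1m30.937s'."""
    if total_ms < 0:
        raise ValueError("duration must not be negative")
    if total_ms < 1000:
        # Sub-second durations are shown in plain milliseconds
        return f"{total_ms}ms"

    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)

    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if millis:
        # Keep the full millisecond precision as a decimal fraction
        parts.append(f"{seconds}.{millis:03d}s")
    elif seconds:
        parts.append(f"{seconds}s")
    return "".join(parts)


print(readable_duration(335))
print(readable_duration(90937))
```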

> I wouldn't mind a more ISO-like date format, too (I don't care much
> about the day of the week). Maybe like "YYYY-MM-DD HH:MM:SS (%:z)"?

I used the same time format that's used in other messages, for
consistency.
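
For anyone who wants an ISO-like form in their own scripts, the displayed timestamp can be converted after the fact. A minimal sketch, assuming the ctime-style layout seen in the examples and the default C/English locale; the function name is made up. The display carries no time zone, so none appears in the result:

```python
from datetime import datetime


def to_iso_like(display_time: str) -> str:
    """Hypothetical helper: ctime-style timestamp -> 'YYYY-MM-DD HH:MM:SS'."""
    # strptime treats a run of spaces like a single one, so "Aug  8" parses
    parsed = datetime.strptime(display_time, "%a %b %d %H:%M:%S %Y")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(to_iso_like("Thu Aug  8 20:20:39 2013"))  # 2013-08-08 20:20:39
```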

Somewhat related, if a newer libqb is used, the pacemaker log will
automatically use high-precision timestamps, so you can check for a
more specific time in the logs.

> > Here's another before-and-after example:
> > 
> >  * rsc1_monitor_10000 on sles11-1 'not installed' (5): call=26,
> > status='Not installed', last-rc-change='Thu Aug  8 20:20:39 2013',
> > queued=0ms, exec=0ms
> > 
> >  * rsc1 10s-interval monitor on sles11-1 could not be executed (Not
> > installed) at Thu Aug  8 20:20:39 2013
> 
> What about "... be executed (reason: Not installed) ..."?
> Yes, we _know_ it's the reason given in parens, but ...

I tried to balance readability with brevity; the display can get really
crowded pretty quickly. Keep in mind there can be an exit reason in
addition to that brief status, in which case it will look something
like:

 * rsc1 10s-interval monitor on sles11-1 could not be executed (Not
installed: No such file or directory) at Thu Aug  8 20:20:39 2013

The original example didn't have an exit reason because it's from one
of our older regression tests, but the new code would add an exit
reason when it can't execute something.
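
For scripts that still see the old-style operation key (for example rsc1_monitor_10000), the relationship to the new wording can be sketched as below. This is only an illustration, not the code crm_mon uses. It assumes the key has the form <resource>_<action>_<interval in ms>. The function names are hypothetical, and the handling of a zero or sub-second interval is my guess, since the examples here only cover whole seconds.

```python
from typing import Tuple


def split_op_key(key: str) -> Tuple[str, str, int]:
    """Split '<resource>_<action>_<interval ms>' from the right,
    since the resource name itself may contain underscores."""
    resource, action, interval_ms = key.rsplit("_", 2)
    return resource, action, int(interval_ms)


def describe_op(key: str) -> str:
    """Hypothetical helper: 'rsc1_monitor_10000' -> 'rsc1 10s-interval monitor'."""
    resource, action, interval_ms = split_op_key(key)
    if interval_ms == 0:
        # Guess: a non-recurring action has no interval worth mentioning
        return f"{resource} {action}"
    if interval_ms % 1000 == 0:
        interval = f"{interval_ms // 1000}s"
    else:
        interval = f"{interval_ms}ms"
    return f"{resource} {interval}-interval {action}"


print(describe_op("ClusterIP:0_monitor_30000"))
print(describe_op("rsc1_monitor_10000"))
```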

> > Combined with exit reasons now displayed for internal errors, this
> > should hopefully make it easier to quickly see what's wrong (or at
> > least a decent pointer in the right direction).
> 
> Also: What about option -J, --as-json? ;-)

Actually we designed the output model so that it can easily be extended
to new formats. :) Pull requests welcome ;)

> And is the structure of the XML output formally described somewhere?

Sort of ... there's a RelaxNG schema in xml/api in the source, usually
installed as /usr/share/pacemaker/api. It's not human-friendly but it's
definitive.
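
To check what is actually installed on a given machine, something like the following sketch lists the schema files. It assumes the /usr/share/pacemaker/api location mentioned above (your distribution may differ) and the conventional .rng extension for RelaxNG files; the function name is made up.

```python
from pathlib import Path
from typing import List


def list_api_schemas(api_dir: str = "/usr/share/pacemaker/api") -> List[str]:
    """Hypothetical helper: sorted names of the RelaxNG (.rng) files under api_dir."""
    root = Path(api_dir)
    if not root.is_dir():
        # Directory not present, e.g. Pacemaker is not installed here
        return []
    return sorted(path.name for path in root.rglob("*.rng"))


for name in list_api_schemas():
    print(name)
```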

> Regards,
> Ulrich

-- 
Ken Gaillot <kgaillot at redhat.com>
