[Pacemaker] OCF exit code 8 triggers WARN message

Wed Oct 5 16:35:52 EDT 2011

On Fri, Sep 16, 2011 at 05:33:48PM +0200, Lars Ellenberg wrote:
> On Fri, Sep 16, 2011 at 05:02:52PM +0200, Dejan Muhamedagic wrote:
> > Hi Thilo,
> > 
> > On Fri, Sep 16, 2011 at 04:41:59PM +0200, Thilo Uttendorfer wrote:
> > > Hi,
> > > 
> > > I experience a lot of "WARN" log entries in several pacemaker cluster setups:
> > > 
> > > Sep 16 11:53:21 server01 lrmd: [23946]: WARN: Managed res1:0:monitor process 26489 exited with return code 8.
> > > 
> > > That's because multi-state resources like DRBD have some special return 
> > > codes. "8" means OCF_RUNNING_MASTER, which should not trigger a warning. The 
> > > following patch in cluster-glue solved this issue:
> > > 
> > > -------------
> > > diff -u  lib/clplumbing/proctrack.c lib/clplumbing/proctrack.c.patched
> > > 
> > > --- lib/clplumbing/proctrack.c  2011-09-16 15:48:25.000000000 +0200
> > > +++ lib/clplumbing/proctrack.c.patched  2011-09-16 15:51:43.000000000 +0200
> > > @@ -271,7 +271,7 @@
> > >  
> > >         if (doreport) {
> > >                 if (deathbyexit) {
> > > -                       cl_log((exitcode == 0 ? LOG_INFO : LOG_WARNING)
> > > +                       cl_log(((exitcode == 0 || exitcode == 8) ? LOG_INFO : LOG_WARNING)
> > >                         ,       "Managed %s process %d exited with return code %d."
> > >                         ,       type, pid, exitcode);
> > >                 }else if (deathbysig) {
> > > -------------
> > 
> > I did consider this before but was worried that a process
> > other than an OCF RA instance could exit with such a code. Code
> > 7 (not running) also belongs to this category. Anyway, we should
> > probably add this patch.
> 
> Hm...
> As lrmd is not the sole user of that proctrack interface,
> and not everything lrmd does is a monitor operation,

True, but I'd say that exit codes 7 and 8 are a rarity. And even
if occasionally we log a message with info severity instead of
warning, that wouldn't be such a big deal, IMO.

> can we add another loglevel flag there, e.g. PT_LOG_OCF_MONITOR,
> and base "degradation" of log level for "expected exit codes" on that?

Nothing against it, of course, if it doesn't complicate things
too much :)
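
Something along these lines, perhaps. This is only an untested sketch
of the idea, not a patch against proctrack.c: the helper name
exit_loglevel() and the flag value are made up for illustration, and
PT_LOG_OCF_MONITOR does not exist in cluster-glue today. The exit code
values 0, 7 and 8 are the usual OCF ones.

```c
#include <syslog.h>

/* OCF resource agent exit codes relevant to monitor operations. */
enum {
	OCF_SUCCESS        = 0,
	OCF_NOT_RUNNING    = 7,
	OCF_RUNNING_MASTER = 8
};

/* Hypothetical flag: the caller sets it only for OCF monitor operations. */
#define PT_LOG_OCF_MONITOR 0x1

/*
 * Pick the syslog priority for a "process exited" message.
 * Exit codes 7 and 8 are downgraded to LOG_INFO only when the caller
 * has said that this process is an OCF monitor operation.
 */
int
exit_loglevel(int exitcode, unsigned int flags)
{
	if (exitcode == OCF_SUCCESS) {
		return LOG_INFO;
	}
	if ((flags & PT_LOG_OCF_MONITOR)
	&&  (exitcode == OCF_NOT_RUNNING || exitcode == OCF_RUNNING_MASTER)) {
		return LOG_INFO;
	}
	return LOG_WARNING;
}
```

That way other proctrack users, and lrmd operations that are not
monitors, would keep getting a warning for exit codes 7 and 8.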

Cheers,

Dejan

P.S. Moving to a more proper list (lest we lose this again).

> 
> -- 
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com