[Pacemaker] Designated reaction of Pacemaker to monitor-op returning rc=7 (OCF_NOT_RUNNING)

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Aug 26 04:38:31 EDT 2010


Hi,

On Wed, Aug 25, 2010 at 08:56:08PM +0200, Cnut Jansen wrote:
> Am 25.08.2010 16:00, schrieb Dejan Muhamedagic:
> > Hi,
> > 
> > On Tue, Aug 24, 2010 at 05:19:23PM +0200, Cnut Jansen wrote:
> >> Hi,
> >>
> >> just (for now) a short question to make sure I didn't miss anything:
> >> What's the designated reaction of Pacemaker when a resource agent,
> >> called to monitor a resource that is supposed to be running (and
> >> should therefore return 0, OCF_SUCCESS), returns 7 (OCF_NOT_RUNNING)?
> >> Should Pacemaker's very next call be to stop the resource, or should
> >> it be yet another (or even several) monitor operations?
> > 
> > It should be stop, followed by start, either on the same node or
> > on another depending on the migration-threshold setting and
> > failcount.
> 
> Ok, that's what I expected.
> So there are no circumstances, unknown to me so far, in which it is by
> design that Pacemaker, after getting rc=7 from the RA (which it
> obviously understood correctly, since crm_mon shows "FAILED" next to
> the resource), calls the RA several more times for monitoring (while
> letting the rest of the cluster hang) before finally calling the
> desired stop, instead of immediately calling the RA to stop the
> resource and continuing with the pending transactions and migrations?

Yes, that sounds quite unusual.
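
For comparison, the expected path described above (the monitor reports
rc=7, then a stop, then a start on the same or another node depending on
migration-threshold and the failcount) can be modelled roughly as below.
This is a minimal Python sketch, not Pacemaker's actual policy engine: the
Resource class, react_to_monitor() and the "first eligible node" placement
are hypothetical simplifications, and a threshold of 0 is assumed to mean
that failures never force a move. Note that no extra monitor calls appear
between the failed monitor and the stop.

```python
from dataclasses import dataclass, field

OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


@dataclass
class Resource:
    # Hypothetical model of one primitive resource; not a Pacemaker API.
    name: str
    node: str
    migration_threshold: int = 0  # assumption: 0 = failures never force a move
    failcount: dict = field(default_factory=dict)  # node name -> failures seen


def react_to_monitor(res, rc, nodes):
    """Return the actions this sketch schedules after one monitor result."""
    if rc == OCF_SUCCESS:
        return []  # resource is running as expected: nothing to do
    # Simplification: any other rc (including 7, OCF_NOT_RUNNING) is a failure.
    failed_on = res.node
    res.failcount[failed_on] = res.failcount.get(failed_on, 0) + 1
    actions = [("stop", failed_on)]  # recovery starts with a stop, not more monitors
    target = failed_on
    if res.migration_threshold > 0 and res.failcount[failed_on] >= res.migration_threshold:
        # This node reached the threshold: pick a node still below it.
        eligible = [n for n in nodes
                    if res.failcount.get(n, 0) < res.migration_threshold]
        if not eligible:
            return actions  # nowhere left to run: resource stays stopped
        target = eligible[0]
    actions.append(("start", target))
    res.node = target
    return actions


if __name__ == "__main__":
    db = Resource("db", "node1", migration_threshold=2)
    cluster = ["node1", "node2"]
    print(react_to_monitor(db, OCF_NOT_RUNNING, cluster))
    # → [('stop', 'node1'), ('start', 'node1')]
    print(react_to_monitor(db, OCF_NOT_RUNNING, cluster))
    # → [('stop', 'node1'), ('start', 'node2')]
```

With migration-threshold=2, the first failure restarts the resource in
place; the second reaches the threshold and moves it to node2.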

> I'll first try to reproduce that on my cluster at home too, reduce the
> configuration to a minimal reproducible setup, and then may give a more
> detailed description of this issue.
> 
> 
> >> Or are there various designated reactions to this case, depending on
> >> various conditions or something?
> > 
> > This is the default. You can change it by setting the "on-fail"
> > attribute for the monitor (or any other) operation.
> 
> The allowed values are [ignore, block, restart, stop, fence], the default
> is restart, and there's no value or option like
> on-fail="repeat-op[-N-times]", right?

Right.
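
The on-fail values listed above can be summarised as a simple lookup. In
this hypothetical Python sketch, on_fail_reaction() is an illustrative
helper rather than part of Pacemaker, the reaction texts paraphrase the
usual meaning of each value, restart is the default, and a value such as
"repeat-op" is rejected because no retry option exists.

```python
# Values named in this thread; restart is the default.
ON_FAIL_REACTIONS = {
    "ignore": "pretend the operation did not fail",
    "block": "perform no further operations on the resource",
    "restart": "stop the resource and start it again, possibly on another node",
    "stop": "stop the resource and leave it stopped",
    "fence": "fence the node on which the resource failed",
}


def on_fail_reaction(on_fail="restart"):
    """Look up the reaction for an on-fail value (hypothetical helper)."""
    try:
        return ON_FAIL_REACTIONS[on_fail]
    except KeyError:
        # There is no retry/repeat value: unknown names are rejected.
        raise ValueError(f"unsupported on-fail value: {on_fail!r}") from None


if __name__ == "__main__":
    print(on_fail_reaction())  # default: restart
    print(on_fail_reaction("block"))
```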

> (btw., just FYI: migration thresholds are currently completely banned from

Why? Anything wrong with them?

> my configurations, so that is a separate issue; I might also have yet
> another issue / possible bug regarding zombie (monitor) operations, with
> symptoms like those of an off-by-one error)

Please file a bugzilla if you find a bug.

Thanks,

Dejan

> > 
> > Thanks,
> > 
> > Dejan
> > 
> >> Cnut Jansen
> >>
> >>
> >> _______________________________________________
> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>
> >> Project Home: http://www.clusterlabs.org
> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> > 