[Pacemaker] Designated reaction of Pacemaker to monitor-op returning rc=7 (OCF_NOT_RUNNING)

Wed Aug 25 14:56:08 EDT 2010

On 25.08.2010 16:00, Dejan Muhamedagic wrote:
> Hi,
> 
> On Tue, Aug 24, 2010 at 05:19:23PM +0200, Cnut Jansen wrote:
>> Hi,
>>
>> just (for now) a short question to make sure I didn't miss anything:
>> What's the designated reaction of Pacemaker when a resource agent,
>> called to monitor a resource that is supposed to be running (so the
>> expected return is 0, OCF_SUCCESS), returns 7 (OCF_NOT_RUNNING)?
>> Should Pacemaker's very next call be to stop the resource, or should
>> it be yet another monitor call (or even several)?
> 
> It should be stop, followed by start, either on the same node or
> on another depending on the migration-threshold setting and
> failcount.

Ok, that's what I expected.
So there are no circumstances, so far unknown to me, in which Pacemaker
by design behaves as follows: after getting rc=7 from the RA (which it
evidently understood correctly, since it appends "FAILED" to the
resource in crm_mon), it calls the RA several more times for monitoring,
while the rest of the cluster hangs, before finally calling the desired
stop, instead of immediately calling the RA to stop the resource and
continuing with the pending transactions and migrations.

I'll first try to reproduce that on my cluster at home too, reduce the
configuration to the minimum needed to reproduce it, and then may give a
more detailed description of this issue.

>> Or are there various designated reactions to this case, depending on
>> various conditions or something?
> 
> This is the default. You can change it by setting the "on-fail"
> attribute for the monitor (or any other) operation.

Allowed values are [ignore, block, restart, stop, fence], default is
restart, and there's no value, option or whatever like
on-fail="repeat-op[-N-times]" or something, right?
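
To make the expectation concrete, here is a toy model of the behaviour
described above. It is only a sketch, not Pacemaker code: the names
OnFail and planned_actions, and the labels "block-resource" and
"fence-node", are made up for illustration, and it covers only the five
on-fail values listed in this mail. Node placement (same node or
another) is deliberately left out.

```python
from enum import Enum

OCF_SUCCESS = 0      # OCF return code: resource is running
OCF_NOT_RUNNING = 7  # OCF return code: resource is not running


class OnFail(Enum):
    # The five on-fail values listed in this mail (hypothetical enum).
    IGNORE = "ignore"
    BLOCK = "block"
    RESTART = "restart"
    STOP = "stop"
    FENCE = "fence"


def planned_actions(monitor_rc, expected_rc=OCF_SUCCESS, on_fail=OnFail.RESTART):
    """Return the recovery steps this toy model schedules after one monitor result."""
    if monitor_rc == expected_rc:
        return []  # monitor matched the expectation, nothing to recover
    recovery = {
        OnFail.IGNORE: [],                  # treat the failure as if it did not happen
        OnFail.BLOCK: ["block-resource"],   # no further operations on the resource
        OnFail.RESTART: ["stop", "start"],  # the default discussed in this thread
        OnFail.STOP: ["stop"],              # stop and leave stopped
        OnFail.FENCE: ["fence-node"],       # fence the node the failure occurred on
    }
    return recovery[on_fail]


print(planned_actions(OCF_NOT_RUNNING))                       # ['stop', 'start']
print(planned_actions(OCF_NOT_RUNNING, on_fail=OnFail.STOP))  # ['stop']
```

Note that no branch of this model schedules another monitor call after
the failed one, which is the point of my question.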

(Btw., just FYI: migration-thresholds are currently not used at all in
my configurations, so this is a separate issue. I might also have yet
another issue / possible bug regarding zombie (monitor) operations,
with symptoms resembling an off-by-one error.)

> 
> Thanks,
> 
> Dejan
> 
>> Cnut Jansen
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker