[Pacemaker] monitor on-fail=ignore not restarting when resource reported as stopped

Patrick Hemmer pacemaker at feystorm.net
Fri Dec 6 16:06:09 EST 2013


 


------------------------------------------------------------------------
*From: *Lars Marowsky-Bree <lmb at suse.com>
*Sent: * 2013-12-06 13:44:53 E
*To: *The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
*Subject: *Re: [Pacemaker] monitor on-fail=ignore not restarting when
resource reported as stopped

> On 2013-12-06T11:21:02, Patrick Hemmer <pacemaker at feystorm.net> wrote:
>
>>> So where is the problem? If the script returns "ERROR" then pacemaker has to
>>> act accordingly.
>> If the script returns "ERROR" the `on-fail=ignore` should make it do
>> nothing. Amazon's API failed, we need to just retry again later.
>> If the script returns "STOPPED", this isn't an error. The script queried
>> the resource, found it was stopped, and reported it as stopped.
>> Pacemaker should act accordingly and start it back up.
> For a resource that pacemaker expects to be started, it's an error if it
> is found to be stopped. Pacemaker can't tell if it is really cleanly
> stopped, or died, or ...
Oh, and I'll quote the OCF spec on this one:

1     generic or unspecified error (current practice)
    The "monitor" operation shall return this for a crashed, hung or
    otherwise non-functional resource.

7     program is not running
    Note: This is not the error code to be returned by a successful
    "stop" operation. A successful "stop" operation shall return 0.
    The "monitor" action shall return this value only for a
    _cleanly_ stopped resource. If in doubt, it should return 1.

So the OCF spec very clearly states that OCF_ERR_GENERIC (1) means the
resource has failed, and OCF_NOT_RUNNING (7) means it shut down cleanly.
So yes, pacemaker can tell whether it stopped cleanly.
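
To make the distinction concrete, here is a minimal sketch of how a monitor action could map its probe result onto those two codes. The exit code values come from the spec text quoted above; the function name and the "running"/"stopped" state labels are hypothetical, made up for illustration, and not part of any real resource agent.

```python
# Exit codes as defined in the OCF spec quoted above.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_exit_code(state):
    """Map a probe result to an OCF monitor exit code.

    `state` is a hypothetical label produced by the agent's own probe:
    "running", "stopped" (confirmed cleanly stopped), or anything else
    (API failure, crash, hang, unknown).
    """
    if state == "running":
        return OCF_SUCCESS
    if state == "stopped":
        # Only for a resource the probe positively found cleanly stopped.
        return OCF_NOT_RUNNING
    # "If in doubt, it should return 1."
    return OCF_ERR_GENERIC


if __name__ == "__main__":
    for state in ("running", "stopped", "api_error"):
        print(state, monitor_exit_code(state))
```

In this sketch a failed API query falls through to 1 rather than 7, since the agent could not confirm what state the resource is in.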

>
> If you want Pacemaker to recover failed resources, do not set
> on-fail="ignore". I still don't quite get why you set that when you
> obviously don't want the associated behaviour?
>
>
> Regards,
>     Lars
>
