[Pacemaker] OCF Resource agent monitor activity failed due to temporary error

Kulovits Christian - OS ITSC Christian.Kulovits at austrian.com
Thu Apr 19 10:46:58 EDT 2012


>You want pacemaker to ignore monitor errors on all unknown return values
>and go on with monitoring until a resource "heals" itself?

Definitely not. I do not want pacemaker to ignore all unknown return values.
I always thought that pacemaker was a tool for HA.

>.... please rethink ... it is a resource agent's job to reliably tell
>pacemaker the definite resource state -- and "uhm, hm, don't know now
>please try later" can mean anything -- and how to find that out is very
>specific to the resource. IMHO it makes no sense at all to
>let the cluster manager do this work.

I do not want the cluster manager to do this work. Instead, there should be a way for an RA to have its monitor operation retried in the next interval.

In this specific case a whole application became unavailable only because the external command used to check the resource state was temporarily unavailable. The resource itself was available until pacemaker restarted it. Retrying the command until it succeeds is an option, but only until the specified timeout expires, and the RA has no way to avoid that. I think it would be a nice feature to let the RA return a value for on-fail. If the RA could return on-fail=block (don't perform any further operations on the resource) and pacemaker then set the resource to unmanaged, the resource would stay highly available.

>There may be cases where a "degraded" resource state would be a nice
>feature, and this is already a topic here on the list ... from time to time.

There may be sufficient reasons to ignore topics on the list .... from time to time. But our goal is HA, and there is no reason not to talk about it, is there?

Christian



-----Original Message-----
From: Andreas Kurz [mailto:andreas at hastexo.com]
Sent: Thursday, 19 April 2012 14:36
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent monitor activity failed due to temporary error

On 04/19/2012 01:59 PM, Kulovits Christian - OS ITSC wrote:
> Hi Andreas,
> This is exactly what I want pacemaker to do when my RA is not able to determine the resource's state, but without running into the timeout and a restart.
> It is the method for displaying the resource's state that is unavailable, not the resource itself. This typical approach has to be coded in every RA instead of once in pacemaker.

You want pacemaker to ignore monitor errors on all unknown return values
and go on with monitoring until a resource "heals" itself?

.... please rethink ... it is a resource agent's job to reliably tell
pacemaker the definite resource state -- and "uhm, hm, don't know now
please try later" can mean anything -- and how to find that out is very
specific to the resource. IMHO it makes no sense at all to
let the cluster manager do this work.

There may be cases where a "degraded" resource state would be a nice
feature, and this is already a topic here on the list ... from time to time.

Regards,
Andreas

> Christian
>
> -----Original Message-----
> From: Andreas Kurz [mailto:andreas at hastexo.com]
> Sent: Thursday, 19 April 2012 13:51
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] OCF Resource agent monitor activity failed due to temporary error
>
> Hi Christian,
>
> On 04/19/2012 01:38 PM, Kulovits Christian - OS ITSC wrote:
>> Hi, Andreas
>>
>> What if the RA gets a response from an external command of the form "display currently unavailable, try later"? The RA has three possible states available: "Running", "Not Running", and "Failed". But in this situation it would have to say "don't know". If I set "on-fail=ignore", this error is ignored in the same way as a "not running" response, and the resource will never be restarted.
>> Christian
>
> A typical approach is to wait a little and retry the monitor
> command until it delivers a valid status (running/not running)
> or the RA monitor operation times out and the script is killed,
> which triggers resource recovery.
>
> Regards,
> Andreas
>
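The retry approach described in the quoted reply above (wait a little, re-run the status check, stop once a definite answer arrives or time runs out) can be sketched roughly as follows. This is only an illustration, not code from any existing resource agent: `monitor_with_retry`, `check_status`, `budget_s` and `retry_delay_s` are hypothetical names, and the sketch assumes the agent keeps its own time budget below the monitor timeout configured in pacemaker, so that it returns an error by itself instead of being killed. Only the three exit codes are the standard OCF values.

```python
import time

# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_with_retry(check_status, budget_s=20.0, retry_delay_s=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Retry a status check until it is definite or the time budget is spent.

    check_status is a hypothetical callable that returns "running",
    "not running", or None when the external command cannot tell right now.
    budget_s is an assumed value; it should stay below the monitor
    operation timeout so the agent returns before it gets killed.
    """
    deadline = clock() + budget_s
    while True:
        state = check_status()
        if state == "running":
            return OCF_SUCCESS
        if state == "not running":
            return OCF_NOT_RUNNING
        # Unknown state: retry only if another attempt still fits the budget.
        if clock() + retry_delay_s >= deadline:
            return OCF_ERR_GENERIC
        sleep(retry_delay_s)
```

A short outage of the status command is absorbed inside a single monitor call this way. An outage longer than the budget still ends in a failed monitor, which is exactly the case this thread is arguing about.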



