[Pacemaker] Resource Agent timeout

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Jun 21 05:28:17 EDT 2011


On Tue, Jun 21, 2011 at 07:37:22AM +0200, Kulovits Christian - OS ITSC wrote:
> Hi Dejan,
> We run Sybase at our shop, and starting the Sybase server can take anywhere from 5 to 45 minutes. I found a resource agent on the web that needs three timeout parameters passed to it: one for start, one for stop and one for monitor.

I guess you know that you have to make sure the resource agent
is implemented correctly. There's also ocf-tester to help
with testing.
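
As a rough illustration of such a test run: ocf-tester takes a name
for the test resource with -n, agent parameters as -o name=value, and
the path to the agent. The path, the resource name and the parameter
below are placeholders, not those of the actual Sybase agent.

```shell
#!/bin/sh
# Sketch: run ocf-tester against an agent before handing it to the cluster.
# RA_PATH, the resource name and the parameter are hypothetical placeholders.
RA_PATH="${RA_PATH:-/path/to/resource/agent}"

test_agent() {
    # Bail out early if ocf-tester is not in PATH.
    if ! command -v ocf-tester >/dev/null 2>&1; then
        echo "ocf-tester not found in PATH" >&2
        return 2
    fi
    # -n names the test resource; each -o passes one agent parameter.
    ocf-tester -n test_sybase -o server_name=EXAMPLE "$RA_PATH"
}
```

Calling test_agent on a node with the agent installed should walk it
through its actions and report which ones misbehave.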

> The cluster configuration itself already sets similar timeouts for the start, stop and monitor operations in the definition of the resource primitive.
> Back to the Sybase server. I tried to change this RA so that it drops the redundant timeout parameters, runs the start until the resource's start timeout has elapsed, then sets the resource itself to unmanaged with
> crm_resource --meta -t primitive -r $OCF_RESOURCE_INSTANCE -p is-managed -v false
> and returns rc=0 so that the Sybase server, which is still starting, keeps running. But the code that runs after the SIGTERM has only 5 seconds to live.
> 
> The reason for doing this is that once the Sybase startup has timed out, the cluster itself stops the Sybase resource. This terminates the startup process, and we have to run the long-lasting startup all over again. Another way would be to have the meta data for the resource primitive passed to the resource agent, but so far I have found no way to get at it.
> Another way is to set the timeout to a very, very high value, but I don't think that is a very good idea.

Why not? That's actually the only thing you can do. Note that
a shorter timeout helps only if the resource may hang.
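
On getting the timeout into the agent: the lrmd log quoted below
lists CRM_meta_timeout=[5000] among the operation's parameters, and
such parameters reach an OCF agent as environment variables with an
OCF_RESKEY_ prefix, in milliseconds. So with a high start timeout set
on the operation, the agent should not need timeout parameters of its
own. A minimal sketch, assuming that variable is set; the function
names, the 10-second margin and the 20-second fallback are
hypothetical choices, not taken from any shipped agent.

```shell
#!/bin/sh
# Sketch: derive a wait budget from the operation timeout that the
# cluster passes to the agent (milliseconds).

# Seconds the action may spend waiting, leaving a margin before the
# cluster's own timeout fires. Margin and fallback are assumptions.
wait_budget_seconds() {
    timeout_ms="${OCF_RESKEY_CRM_meta_timeout:-20000}"
    margin_s="${1:-10}"
    budget=$(( timeout_ms / 1000 - margin_s ))
    if [ "$budget" -lt 1 ]; then
        budget=1
    fi
    echo "$budget"
}

# Run a probe command once per second until it succeeds or the budget
# runs out. Returns 0 on success, 1 if the budget is exhausted.
wait_until_up() {
    probe="$1"
    remaining="$(wait_budget_seconds "${2:-10}")"
    while [ "$remaining" -gt 0 ]; do
        if $probe; then
            return 0
        fi
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 1
}
```

With a 45-minute start timeout (2700000 ms) this leaves 2690 seconds
of polling; the probe would be whatever check the agent already uses
to tell that Sybase is up.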

Thanks,

Dejan

> 
> Regards, Christian
> 
> -----Original Message-----
> From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm] 
> Sent: Monday, 20 June 2011 16:18
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Resource Agent timeout
> 
> Hi,
> 
> On Mon, Jun 20, 2011 at 03:15:23PM +0200, Kulovits Christian - OS ITSC wrote:
> > Andreas,
> > do you mean the cluster-wide default timeout? What I am asking is whether the fixed 5-second delay between the SIGTERM, sent once the resource timeout is exceeded, and the SIGKILL that follows can be changed.
> 
> No, it's not configurable. 
> 
> What's the use case? You should prevent actions from timing out in the first place.
> 
> Thanks,
> 
> Dejan
> 
> > Regards,
> > Christian
> > 
> > -----Original Message-----
> > From: Andreas Kurz [mailto:andreas.kurz at linbit.com]
> > Sent: Monday, 20 June 2011 15:08
> > To: pacemaker at oss.clusterlabs.org
> > Subject: Re: [Pacemaker] Resource Agent timeout
> > 
> > On 2011-06-20 14:28, Kulovits Christian - OS ITSC wrote:
> > > Hello List,
> > > 
> > > When a resource agent times out, a SIGTERM is issued once the timeout
> > > value has been exceeded. If the resource agent does not terminate
> > > within the next 5 seconds, a SIGKILL is issued. Is there a way to
> > > set this limit, maybe to 30 seconds or so? 5 seconds may often be
> > > insufficient for a proper cleanup.
> > > 
> > 
> > The default action timeout is 20s so you already "tuned" it ... you can set a global "default-action-timeout" or specify a timeout for each operation per resource.
> > 
> > Regards,
> > Andreas
> > 
> > > 
> > > Jun 20 10:51:04 mars lrmd: [2178]: info: RA output: (res_TimeoutRA_Killroy:stop:stderr) + sleep 10
> > > Jun 20 10:51:08 mars lrmd: [2178]: WARN: res_TimeoutRA_Killroy:stop process (PID 24359) timed out (try 1).  Killing with signal SIGTERM (15).
> > > Jun 20 10:51:08 mars lrmd: [2178]: info: RA output: (res_TimeoutRA_Killroy:stop:stderr) Terminated
> > > Jun 20 10:51:08 mars lrmd: [2178]: info: RA output: (res_TimeoutRA_Killroy:stop:stderr) ++ ha_debug 'DEBUG: Resource (res_TimeoutRA_Killroy): Timeout during stop of res_TimeoutRA_Killroy'
> > > ++ sleep 10
> > > Jun 20 10:51:13 mars lrmd: [2178]: WARN: res_TimeoutRA_Killroy:stop process (PID 24359) timed out (try 2).  Killing with signal SIGKILL (9).
> > > Jun 20 10:51:13 mars lrmd: [2178]: WARN: operation stop[94] on ocf::TimeoutRA::res_TimeoutRA_Killroy for client 2181, its parameters: CRM_meta_timeout=[5000] crm_feature_set=[3.0.1] CRM_meta_name=[start] : pid [24359] timed out
> > > Jun 20 10:51:13 mars crmd: [2181]: ERROR: process_lrm_event: LRM operation res_TimeoutRA_Killroy_stop_0 (94) Timed Out (timeout=5000ms)
> > > 
> > > With best regards,
> > > Christian Kulovits
> > > 
> > > 
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org 
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > 
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> > 
> 



