[Pacemaker] long time to start

Andrew Beekhof andrew at beekhof.net
Mon Apr 19 09:02:25 EDT 2010


On Mon, Apr 19, 2010 at 2:29 PM, Schaefer, Diane E
<diane.schaefer at unisys.com> wrote:
>>> Hi,
>>>
>>> I have a resource (my own OCF) that can sometimes take 10 minutes to
>>> start after a failure, due to log records that need to be sync'd. I
>>> noticed that while the start action is being performed, if other
>>> resources in my cluster report "not running", no restart will be
>>> attempted until my long-running start returns. Meanwhile, crm_mon
>>> reports the resources as "started" even though they are not running,
>>> and may not be for many minutes.
>
>> Does your RA return from the start action immediately or after the
>> sync is complete and the service is truly started?
>> It _must_ only do the latter.
>> Doing the former would explain what you're seeing.
>
> Actually this RA waits for the sync to complete.  If it takes longer than
> the allotted time-out, Pacemaker SIGTERM/SIGKILLs it.  The issue is if it
> can never complete in the allotted time frame,

Then make the timeout longer?
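
For illustration only, a minimal sketch of what a longer start timeout
could look like in crm shell syntax. The resource name "my_resource", the
agent "ocf:custom:my_agent", and all the values are hypothetical
placeholders, not taken from the configuration discussed here; pick a
start timeout that comfortably exceeds the worst-case sync time.

```
# Hypothetical names and values, shown only to illustrate the idea.
# The start timeout (20 minutes) is set above the 10-minute worst case.
primitive my_resource ocf:custom:my_agent \
    op start interval="0" timeout="1200s" \
    op monitor interval="30s" timeout="60s"
```

With a start timeout larger than the longest expected sync, the start
action should be allowed to finish instead of being killed part way
through.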

> my cluster is basically not
> servicing any other resources that may have failed until this original
> resource can resolve itself or a failover occurs.
>
>
>
> Diane Schaefer



