[Pacemaker] long time to start

Lars Ellenberg lars.ellenberg at linbit.com
Mon Apr 19 15:39:00 EDT 2010


On Fri, Apr 16, 2010 at 02:28:26PM -0500, Schaefer, Diane E wrote:
> Hi,
>   I have a resource that can sometimes take 10 minutes to start after
>   a failure, because log records need to be synced (it uses my own
>   OCF resource agent).
>
>   I noticed that while the start action is being performed, if other
>   resources in my cluster report "not running", no restart is
>   attempted until the long-running start of my resource returns.
>
>   Meanwhile, crm_mon reports those resources as "started"
>   even though they are not running, and may not be for many minutes.
>   Is the lrm process single-threaded?

You are saying that while your RA starts (with a long start timeout),
and the start action is not yet complete,
other _independent_ resources are not yet started,
but crm_mon thinks they are running already,
even though "something" (what?) reports "not running" for those?

I think you lost me ;)

Please show the output of "crm configure show".

Can you reproduce this easily?
Can you reproduce this with just a few "Dummy" resources?
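
Something along these lines might serve as a starting point for such a
test case. It is only a sketch under assumptions: the resource names
(slow, d1, d2) and the file name repro.crm are made up, the timings are
arbitrary, and it assumes the ocf:heartbeat:Delay and ocf:pacemaker:Dummy
agents are installed. Delay stands in for the slow-starting RA; the two
Dummy resources are deliberately left without any constraints, so they
are independent of it.

```shell
#!/bin/sh
# Write a small crm shell configuration for a reproduction attempt.
# All names below are hypothetical; nothing here touches the cluster.
cat > repro.crm <<'EOF'
primitive slow ocf:heartbeat:Delay \
    params startdelay=600 stopdelay=5 mondelay=1 \
    op start timeout=700s \
    op monitor interval=30s
primitive d1 ocf:pacemaker:Dummy \
    op monitor interval=10s
primitive d2 ocf:pacemaker:Dummy \
    op monitor interval=10s
EOF
echo "wrote repro.crm"
```

The file could then be loaded with "crm configure load update repro.crm".
While "slow" is still in its start action, make d1 or d2 fail its
monitor and watch in crm_mon whether it is restarted before the start
of "slow" returns.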

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



