[Pacemaker] start/stop operations fail to happen in parallel on resources

Thu Apr 19 09:05:21 EDT 2012

Hi,

On Thu, Apr 19, 2012 at 2:22 PM, Parshvi <parshvi.17 at gmail.com> wrote:
> Observations:
> max-children=30
> total no. of resources=18
>
> 1) At the default max-children value of 4, the following logs were observed,
> leading to monitor op timeouts for some resources (18 rscs in total):
>  a. “max_child_count (4) reached, postponing execution of operation monitor”
>  b. “WARN: perform_ra_op: the operation operation monitor[18] on
> ocf::IPaddr2::ClusterIP for client 3754, stayed in operation list for
> 14100 ms (longer than 10000 ms)”
>  c. SOLUTION: the max-children of lrmd was raised to 30.
>  d. ISSUES STILL OBSERVED: while 2-3 resources are stuck in their start
> operation, if a rsc is issued an explicit start command
> `crm resource start rcs1`, the start op on this rsc is delayed until one of
> those resources exits its start operation.
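
To picture the queueing described in (a) to (c), here is a minimal sketch in Python. It is a toy model, not lrmd's actual scheduler: it assumes all operations are submitted at the same time, run for a fixed duration, and are served in FIFO order by at most max_children concurrent children. The function name `simulate` and the 5000 ms duration are made up for illustration and are not taken from the logs.

```python
import heapq

def simulate(durations_ms, max_children):
    """Return how long (ms) each operation waits before it starts.

    Toy model only: every operation is submitted at t=0, at most
    max_children run at once, and the rest wait in FIFO order.
    """
    running = []  # min-heap holding the finish times of running operations
    delays = []
    for duration in durations_ms:
        if len(running) < max_children:
            start = 0  # a free slot exists, so the operation starts at once
        else:
            # All slots are busy: wait until the earliest running op finishes.
            start = heapq.heappop(running)
        heapq.heappush(running, start + duration)
        delays.append(start)
    return delays

# 18 hypothetical monitor ops of 5000 ms each.
print(max(simulate([5000] * 18, max_children=4)))   # 20000
print(max(simulate([5000] * 18, max_children=30)))  # 0
```

With 4 slots the last operations wait far longer than a 10000 ms threshold, while with 30 slots nothing waits. Under this model, raising max-children alone would not produce the delay described in (d), so that part is not explained by the sketch.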

Which version of Pacemaker are you running?

> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Dan Frincu
CCNA, RHCE