[ClusterLabs] What is the mechanism for pacemaker to recovery resources

Thu May 10 20:56:42 UTC 2018

On Thu, 2018-05-10 at 22:02 +0800, lkxjtu wrote:
> 
> Great! These two parameters (batch-limit & node-action-limit) solve
> my problem. Thank you very much!
> 
> By the way, is there any way to know the number of parallel actions
> per node and across the cluster?

If you set PCMK_debug=crmd (or pacemaker-controld in the soon-to-be-
released 2.0.0), then the detail log on each node will have messages
like:

debug: Current load is 0.570000 across 1 core(s)

and

debug: Host rhel7-1 supports a maximum of 2 jobs and throttle mode 0000.  New job limit is 2

Of course your logs will grow faster with debug turned on ...

Otherwise there's no simple way to know. It might be nice to have a
command-line option to query the current values.
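
In the meantime, the debug messages can be scraped from the detail log.
The following is only a minimal sketch, assuming the two message formats
shown above appear unchanged on a single line each; the function name and
regular expressions are illustrative, not part of Pacemaker. Note that
these messages report the load and the job *limit*, not the number of
actions currently running.

```python
import re

# Patterns for the two debug messages quoted above. They use search(), so
# any timestamp/host/process prefix in a real log line is ignored.
LOAD_RE = re.compile(
    r"Current load is (?P<load>[\d.]+) across (?P<cores>\d+) core\(s\)"
)
LIMIT_RE = re.compile(
    r"Host (?P<host>\S+) supports a maximum of (?P<max_jobs>\d+) jobs "
    r"and throttle mode (?P<mode>[0-9a-fA-F]+)\.\s+"
    r"New job limit is (?P<limit>\d+)"
)


def parse_throttle_lines(lines):
    """Return (loads, limits) extracted from an iterable of log lines.

    loads  -- list of (load, cores) tuples, in log order
    limits -- dict mapping host name to the most recent job limit seen
    """
    loads = []
    limits = {}
    for line in lines:
        m = LOAD_RE.search(line)
        if m:
            loads.append((float(m.group("load")), int(m.group("cores"))))
            continue
        m = LIMIT_RE.search(line)
        if m:
            limits[m.group("host")] = int(m.group("limit"))
    return loads, limits


# The sample lines are the ones from this message.
sample = [
    "debug: Current load is 0.570000 across 1 core(s)",
    "debug: Host rhel7-1 supports a maximum of 2 jobs and throttle mode "
    "0000.  New job limit is 2",
]
loads, limits = parse_throttle_lines(sample)
print(loads, limits)
```

To use it on a real node, pass an open file object for the detail log
instead of the sample list.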

> At 2018-05-10 20:56:27, "lkxjtu" <lkxjtu at 163.com> wrote:
> On Tue, 2018-05-08 at 23:52 +0800, lkxjtu wrote:
> > I have a three-node cluster with about 50 resources. When I reboot
> > all three nodes at the same time and watch the resources with "crm
> > status", I see that Pacemaker starts 3-5 resources at a time, from
> > top to bottom, rather than starting them all at once. Is there a
> > parameter that controls this?
> >
> > That seems acceptable. But if one resource cannot start because of
> > an exception, recovery of the resources after it becomes very slow.
> > I don't know the principles Pacemaker follows when recovering
> > resources, in particular ordering and priority. Do you have any
> > suggestions? Thank you very much!
>
> There are a few things affecting start-up order.
>
> First (obviously) is your constraints. If you have any ordering
> constraints, they will enforce the configured order.
>
> Second is internal constraints. Pacemaker has certain built-in
> constraints for safety. This includes obvious logical requirements
> such as starting a resource before promoting it. Pacemaker will do a
> probe (one-time monitor) of each resource on each node to find its
> initial state; everything is ordered after those probes. A clone
> won't be promoted until all pending starts complete.
>
> Last is throttling. By default Pacemaker computes a maximum number of
> jobs that can be executed at once across the entire cluster, and for
> each node. The number is based on observed CPU load on the nodes (and
> thus depends partly on the number of CPU cores). Usually it is best
> to allow Pacemaker to calculate the throttling, but you can force
> particular values by setting:
>
> - node-action-limit: a cluster-wide property specifying the maximum
>   number of actions that can be executed at once on any one node.
> - PCMK_node_action_limit: an environment variable specifying the same
>   thing, but which can be configured differently per node.
> - batch-limit: a cluster-wide property specifying the maximum number
>   of actions that can be executed at once across the entire cluster.
>
> The purpose of throttling is to keep Pacemaker from overloading the
> nodes such that actions might start timing out, causing unnecessary
> recovery.
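
For reference, the two cluster-wide properties quoted above can be set
with crm_attribute. The following is a rough sketch that only builds and
prints the command lines rather than running them; the helper name and
the values 10 and 4 are purely illustrative assumptions, not
recommendations. PCMK_node_action_limit is an environment variable, so
it is not set this way; it would normally go in the Pacemaker
environment file on each node instead.

```python
def crm_attribute_cmd(name, value):
    """Build the argv for setting a cluster-wide property.

    Uses the crm_config attribute type, which is where cluster
    properties such as batch-limit live.
    """
    return [
        "crm_attribute",
        "--type", "crm_config",
        "--name", name,
        "--update", str(value),
    ]


# Illustrative values only; pick limits that suit your own hardware.
settings = (("batch-limit", 10), ("node-action-limit", 4))

for name, value in settings:
    # Print instead of executing, so this can be reviewed first.
    print(" ".join(crm_attribute_cmd(name, value)))
```

Each printed line can be run as root on any one cluster node, since the
properties are cluster-wide.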
-- 
Ken Gaillot <kgaillot at redhat.com>