[ClusterLabs] What is the mechanism for pacemaker to recovery resources
Ken Gaillot
kgaillot at redhat.com
Wed May 9 10:19:46 EDT 2018
On Tue, 2018-05-08 at 23:52 +0800, lkxjtu wrote:
> I have a three-node cluster with about 50 resources. When I reboot
> all three nodes at the same time and watch the resources with
> "crm status", I see that Pacemaker starts 3-5 resources at a time,
> from top to bottom, rather than starting them all at once. Is there
> a parameter that controls this?
> That seems acceptable. But if one resource cannot start because of
> an error, recovery of the resources after it becomes very slow. I
> don't understand how Pacemaker recovers resources, in particular
> the order and priority. Do you have any suggestions? Thank you very
> much!
There are a few things affecting start-up order.
First (obviously) are your constraints. If you have configured any
ordering constraints, Pacemaker will enforce that order.
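
For illustration, an explicit ordering constraint could be added with
the crm shell along these lines. This is only a sketch: "db" and "app"
are hypothetical resource names and "o-db-before-app" is a made-up
constraint ID, so substitute your own.

```shell
# Hypothetical example: always start "db" before "app".
# "db", "app" and the constraint ID are placeholders, not names from this thread.
crm configure order o-db-before-app Mandatory: db app

# Review the resulting configuration, including constraints.
crm configure show
```

Resources with no ordering between them remain free to start in
parallel, subject to the other limits described here.
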
Second is internal constraints. Pacemaker has certain built-in
constraints for safety. This includes obvious logical requirements such
as starting a resource before promoting it. Pacemaker will do a probe
(one-time monitor) of each resource on each node to find its initial
state; everything is ordered after those probes. A clone won't be
promoted until all pending starts complete.
Last is throttling. By default Pacemaker computes a maximum number of
jobs that can be executed at once across the entire cluster, and for
each node. The number is based on observed CPU load on the nodes (and
thus depends partly on the number of CPU cores). Usually it is best to
allow Pacemaker to calculate the throttling, but you can force
particular values by setting:
- node-action-limit: a cluster-wide property specifying the maximum
number of actions that can be executed at once on any one node.
- PCMK_node_action_limit: an environment variable that specifies the
same limit but can be set to a different value on each node.
- batch-limit: a cluster-wide property specifying the maximum number of
actions that can be executed at once across the entire cluster.
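
As a rough sketch, the two cluster properties could be set with
crm_attribute, which ships with Pacemaker. The numbers are arbitrary
illustrations, not recommendations, and the environment file path is an
assumption that varies by distribution.

```shell
# Illustrative values only, not recommendations.
crm_attribute --type crm_config --name batch-limit --update 20
crm_attribute --type crm_config --name node-action-limit --update 4

# Check what is currently set.
crm_attribute --type crm_config --name batch-limit --query

# Per-node override: add a line like the one below to the Pacemaker
# environment file on that node (assumed here to be
# /etc/sysconfig/pacemaker; some distributions use
# /etc/default/pacemaker), then restart Pacemaker on that node.
# PCMK_node_action_limit=2
```

Raising the limits lets more actions run in parallel, at the cost of
more load on the nodes while they run.
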
The purpose of throttling is to keep Pacemaker from overloading the
nodes such that actions might start timing out, causing unnecessary
recovery.
--
Ken Gaillot <kgaillot at redhat.com>