[ClusterLabs] Antw: Re: Antw: Pacemaker 1.1.16 - Release Candidate 1

Ken Gaillot kgaillot at redhat.com
Tue Nov 8 12:16:11 EST 2016

On 11/08/2016 03:02 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 07.11.2016 at 16:15 in message
>>>>>> * Pacemaker's existing "node health" feature allows resources to move
>>>>>> off nodes that become unhealthy. Now, when using
>>>>>> node-health-strategy=progressive, a new cluster property
>>>>>> node-health-base will be used as the initial health score of newly
>>>>>> joined nodes (defaulting to 0, which is the previous behavior). This
>>>>>> allows cloned and multistate resource instances to start on a node even
>>>>>> if it has some "yellow" health attributes.
>>>>> So the node health is more or less a "node score"? I don't understand the
>>>>> last sentence. Maybe give an example?
>>>> Yes, node health is a score that's added when deciding where to place a
>>>> resource. It does get complicated ...
>>>> Node health monitoring is optional, and off by default.
>>>> Node health attributes are set to red, yellow or green (outside
>>>> pacemaker itself -- either by a resource agent, or some external
>>>> process). As an example, let's say we have three node health attributes
>>>> for CPU usage, CPU temperature, and SMART error count.
>>>> With a progressive strategy, red and yellow are assigned some negative
>>>> score, and green is 0. In our example, let's say yellow gets a -10 score.
>>>> If any of our attributes are yellow, resources will avoid the node
>>>> (unless they have higher positive scores from something like stickiness
>>>> or a location constraint).
>>> I understood so far.
>>>> Normally, this is what you want, but if your resources are cloned on all
>>>> nodes, maybe you don't care if some attributes are yellow. In that case,
>>>> you can set node-health-base=20, so even if two attributes are yellow,
>>>> it won't prevent resources from running (20 + -10 + -10 = 0).
>>> I don't understand that: "node-health-base" is a global setting, but what you
>>> want is an exception for some specific (clone) resource.
>>> To me the more obvious solution would be to provide an exception rule for
>>> the resource, not a global setting for the node.
>> The main advantage of node-health-base over other approaches -- such as
>> defining a constant #health-base attribute for all nodes, or defining
>> positive location constraints for each resource on each node -- is that
>> node-health-base applies to all resources and nodes, present and future.
>> If someone adds a node to the cluster, it will automatically get
>> node-health-base when it joins, whereas any other approach requires
>> additional configuration changes (which leaves a window where the value
>> is not applied).
> So node-health-base is a default value for the node until it is explicitly
> set? Are you trying to handle the problem "all nodes are to be assumed bad
> until proven to be good"? Are we maybe fighting a completely different
> problem (with some RAs)?

node-health-base is a sort of default health value, but node-health is
never explicitly set -- it's the sum of node-health-base and the
adjustments for each health attribute.

node-health-base could be used for the "assumed bad" approach: you could
set node-health-base to a negative value, and set green to a positive
value (rather than 0, which is its default). Then, each green attribute
would eat away at the deficit.
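
A minimal sketch of that arithmetic in Python, for illustration only. The
function name, the attribute names, and the score values (-30 base, +10 green,
-10 yellow, -100 red) are made up for this example; they are not Pacemaker
defaults or Pacemaker code.

```python
# Illustrative only: node health is the base plus one adjustment per attribute.
def node_health(base, attribute_colors, color_scores):
    """Return base + the score of each health attribute's current color."""
    return base + sum(color_scores[color] for color in attribute_colors.values())

# "Assumed bad" setup: negative base, positive green (hypothetical values).
color_scores = {"green": 10, "yellow": -10, "red": -100}
base = -30

all_green = {"#health-cpu": "green", "#health-temp": "green", "#health-smart": "green"}
one_yellow = {"#health-cpu": "green", "#health-temp": "green", "#health-smart": "yellow"}

healthy = node_health(base, all_green, color_scores)    # -30 + 10 + 10 + 10
degraded = node_health(base, one_yellow, color_scores)  # -30 + 10 + 10 - 10
print(healthy, degraded)
```

With these numbers, a node only climbs back to 0 once all three attributes
report green; a single yellow leaves it at -20.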

>> It also simplifies the configuration the more nodes/resources you have,
>> and is less prone to accidental configuration mistakes.
>> The idea is straightforward: instead of each node starting with a health
>> score of 0 (which means any negative health attribute will push all
>> resources away), start each node with a positive health score, so that
>> health has to drop below a certain point before affecting resources.
> I don't see the difference between "starting at 0, subtracting a small
> score" and "starting at some positive value, subtracting a large score". Are
> you saying that any negative score will move all resources away? I thought
> that only happens with -INFINITY.

Pacemaker always combines scores from all sources and uses the final
value to decide resource placement.

So, if a node's health is -50 but a resource has a location preference
of +100 for that node, then the resource could still be placed there.

You are right, only -INFINITY is mandatory, but if there are no positive
scores from other sources, any negative score will have the same effect
of keeping resources off the node. So, starting from a positive number
is a big difference in effect.
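
To make the combination concrete, here is a simplified Python model. It is a
sketch, not Pacemaker's implementation: the function names are invented, and
it assumes only the documented score rules (INFINITY is 1,000,000, anything
plus -INFINITY is -INFINITY) plus the rule of thumb above that a negative
total keeps a resource off the node.

```python
INFINITY = 1_000_000  # Pacemaker's documented numeric value for INFINITY

def combine(*scores):
    """Add scores from all sources; any -INFINITY wins, results are clamped."""
    if any(s <= -INFINITY for s in scores):
        return -INFINITY
    return max(-INFINITY, min(INFINITY, sum(scores)))

def can_place(*scores):
    """Simplification: a resource may run on the node if the total is not negative."""
    return combine(*scores) >= 0

print(combine(-50, 100))         # health -50, location preference +100
print(can_place(-10))            # one yellow attribute, base 0, nothing else
print(can_place(20, -10, -10))   # base 20, two yellow attributes
print(can_place(-INFINITY, INFINITY))
```

The second and third calls show the difference in effect: the same yellow
penalty blocks placement when the node starts at 0, but not when it starts
at +20.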

The user is responsible for choosing meaningful values. For example, if
node-health-base is +10 but yellow is -15, then any yellow attribute
will still push resources away. Of course, that could still be
meaningful when combined with other scores -- someone might do that if
they want a location preference of +5 to counteract a single yellow
attribute. Or maybe instead of node-health-base, someone sets a positive
stickiness, so existing resources can stay on a yellow node, but new
resources won't be placed there. It can be as simple or complicated as
you want to get :)
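
The numbers in that last paragraph work out as follows (plain Python
arithmetic; the stickiness value of 20 and the variable names are assumptions
picked for the example, the other values are the ones mentioned above):

```python
# Values from the paragraph above.
node_health_base = 10
yellow = -15
location_preference = 5

one_yellow = node_health_base + yellow               # still negative
with_preference = one_yellow + location_preference   # preference cancels the deficit

# Alternative: no node-health-base (0), but a positive stickiness.
# Stickiness only counts for a resource already running on the node.
stickiness = 20  # hypothetical value
existing_resource = 0 + yellow + stickiness
new_resource = 0 + yellow

print(one_yellow, with_preference, existing_resource, new_resource)
```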
