[ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1

Mon Nov 7 10:15:01 EST 2016

On 11/07/2016 01:41 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 04.11.2016 at 22:37 in message
> <27c2ca20-c52c-8fb4-a60f-5ae12f7ffc64 at redhat.com>:
>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 03.11.2016 at 17:08 in
>>>> * The new ocf:pacemaker:attribute resource agent sets a node attribute
>>>> according to whether the resource is running or stopped. This may be
>>>> useful in combination with attribute-based rules to model dependencies
>>>> that simple constraints can't handle.
>>>
>>> I don't quite understand this: Isn't the state of a resource in the CIB
>>> status section anyway? If not, why not add it? So it would be readily
>>> available for anyone (rules, constraints, etc.).
>>
>> This (hopefully) lets you model more complicated relationships.
>>
>> For example, someone recently asked whether they could make an ordering
>> constraint apply only at "start-up" -- the first time resource A starts,
>> it does some initialization that B needs, but once that's done, B can be
>> independent of A.
> 
> Is "at start-up" before start of the resource, after start of the resource, or parallel to the start of the resource ;-)
> Probably a "hook" in the corresponding RA is the better approach, unless you can really model all of the above.
> 
>>
>> For that case, you could group A with an ocf:pacemaker:attribute
>> resource. The important part is that the attribute is not set if A has
>> never run on a node. So, you can make a rule that B can run only where
>> the attribute is set, regardless of the value -- even if A is later
>> stopped, the attribute will still be set.
> 
> If a resource is not running on a node, it is "stopped", isn't it?

Sure, but what I mean is, if resource A has *never* run on a node, then
the corresponding node attribute will be *unset*. But if A has ever
started and/or stopped on a node, the attribute will be set to one value
or the other. So, a rule can be used to check whether the attribute is
set, to determine whether A has *ever* been run on the node, regardless
of whether it is currently running.
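
As a rough illustration of the set/unset distinction, here is a minimal
Python sketch. It only models the behavior described above and is not how
Pacemaker or the agent is implemented; the attribute name "a-state" and the
values "1" and "0" are assumptions chosen for the example.

```python
# Hypothetical attribute name and values, chosen for this example only.
ATTR = "a-state"
ACTIVE, INACTIVE = "1", "0"


def agent_start(node_attrs):
    """Model the attribute resource starting on a node."""
    node_attrs[ATTR] = ACTIVE


def agent_stop(node_attrs):
    """Model the attribute resource stopping on a node."""
    node_attrs[ATTR] = INACTIVE


def b_may_run(node_attrs):
    """Model a rule that only checks whether the attribute is set at all."""
    return ATTR in node_attrs


node1 = {}  # A has never run here: attribute unset
node2 = {}

agent_start(node2)  # A starts on node2 (and does its initialization)
agent_stop(node2)   # A later stops; the attribute stays set

print(b_may_run(node1))  # False
print(b_may_run(node2))  # True
```

The point of the sketch is that the rule tests for the presence of the
attribute, not its value, so a later stop of A does not take B's eligibility
away again.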

>> Another possible use would be for a cron job that needs to know whether a
>> particular resource is running, and an attribute query is quicker and
>> easier than something like parsing crm_mon output or probing the service.
> 
> crm_mon reads parts of the CIB; crm_attribute does too, I guess. So, apart from missing options or an inefficient implementation, why should one be faster than the other?
> 
>>
>> It's all theoretical at this point, and I'm not entirely sure those
>> examples would be useful :) but I wanted to make the agent available for
>> people to experiment with.
> 
> A good product manager should resist the temptation to provide every feature the customers ask for, to avoid bloatware. That protects customers from their own bad decisions. In most cases there is a better, more universal solution to the specific problem.

Sure, but this is a resource agent -- it adds no overhead to anyone not
using it, and since we don't have any examples or walk-throughs using
it, users would have to investigate and experiment to see whether it's
of any use in their environment.

Hopefully, this will turn out to be a general-purpose tool of value in
multiple problem scenarios.

>>>> * Pacemaker's existing "node health" feature allows resources to move
>>>> off nodes that become unhealthy. Now, when using
>>>> node-health-strategy=progressive, a new cluster property
>>>> node-health-base will be used as the initial health score of newly
>>>> joined nodes (defaulting to 0, which is the previous behavior). This
>>>> allows cloned and multistate resource instances to start on a node even
>>>> if it has some "yellow" health attributes.
>>>
>>> So the node health is more or less a "node score"? I don't understand the
>>> last sentence. Maybe give an example?
>>
>> Yes, node health is a score that's added when deciding where to place a
>> resource. It does get complicated ...
>>
>> Node health monitoring is optional, and off by default.
>>
>> Node health attributes are set to red, yellow or green (outside
>> pacemaker itself -- either by a resource agent, or some external
>> process). As an example, let's say we have three node health attributes
>> for CPU usage, CPU temperature, and SMART error count.
>>
>> With a progressive strategy, red and yellow are assigned some negative
>> score, and green is 0. In our example, let's say yellow gets a -10 score.
>>
>> If any of our attributes are yellow, resources will avoid the node
>> (unless they have higher positive scores from something like stickiness
>> or a location constraint).
>>
> 
> I understood so far.
> 
>> Normally, this is what you want, but if your resources are cloned on all
>> nodes, maybe you don't care if some attributes are yellow. In that case,
>> you can set node-health-base=20, so even if two attributes are yellow,
>> it won't prevent resources from running (20 + -10 + -10 = 0).
> 
> I don't understand that: "node-health-base" is a global setting, but what you want is an exception for some specific (clone) resource.
> To me the more obvious solution would be to provide an exception rule for the resource, not a global setting for the node.

The main advantage of node-health-base over other approaches -- such as
defining a constant #health-base attribute for all nodes, or defining
positive location constraints for each resource on each node -- is that
node-health-base applies to all resources and nodes, present and future.
If someone adds a node to the cluster, it will automatically get
node-health-base when it joins, whereas any other approach requires
additional configuration changes (which leaves a window where the value
is not applied).

It also simplifies the configuration, increasingly so as the number of
nodes and resources grows, and is less prone to accidental configuration
mistakes.

The idea is straightforward: instead of each node starting with a health
score of 0 (which means any negative health attribute will push all
resources away), start each node with a positive health score, so that
health has to drop below a certain point before affecting resources.
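
To make the arithmetic concrete, here is a minimal Python sketch of the
progressive calculation as described in this thread. It is a simplification,
not Pacemaker's actual code: it ignores stickiness and location constraints,
the attribute names are hypothetical, and the red score is left out because
no value for it was given in the example.

```python
# Scores from the example in this thread: green is 0, yellow is -10.
# Red would be handled the same way, with its own negative score.
HEALTH_SCORES = {"green": 0, "yellow": -10}


def node_health(attributes, base=0):
    """Add the base score to the score of every health attribute on a node."""
    return base + sum(HEALTH_SCORES[color] for color in attributes.values())


# Hypothetical attribute names for the three checks in the example.
node = {"cpu-usage": "yellow", "cpu-temp": "yellow", "smart-errors": "green"}

print(node_health(node))           # -20: resources are pushed away
print(node_health(node, base=20))  # 0: two yellows no longer count against the node
```

With a base of 20, a third yellow attribute would bring the total to -10, so
the base effectively sets how much degradation is tolerated before health
starts to affect placement.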

> [...saving independent bits from being retransmitted...]
> 
> Ulrich