[ClusterLabs] Antw: Re: Antw: Pacemaker 1.1.16 - Release Candidate 1

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Nov 8 09:02:54 UTC 2016


>>> Ken Gaillot <kgaillot at redhat.com> wrote on 07.11.2016 at 16:15 in message
<d2e382c2-fa09-1fd7-0dbd-bf305f2252d7 at redhat.com>:
> On 11/07/2016 01:41 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 04.11.2016 at 22:37 in message
>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ffc64 at redhat.com>:
>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 03.11.2016 at 17:08 in
>>>>> * The new ocf:pacemaker:attribute resource agent sets a node attribute
>>>>> according to whether the resource is running or stopped. This may be
>>>>> useful in combination with attribute-based rules to model dependencies
>>>>> that simple constraints can't handle.
>>>>
>>>> I don't quite understand this: Isn't the state of a resource in the CIB
>>>> status section anyway? If not, why not add it? Then it would be readily
>>>> available to anyone (rules, constraints, etc.).
>>>
>>> This (hopefully) lets you model more complicated relationships.
>>>
>>> For example, someone recently asked whether they could make an ordering
>>> constraint apply only at "start-up" -- the first time resource A starts,
>>> it does some initialization that B needs, but once that's done, B can be
>>> independent of A.
>> 
>> Is "at start-up" before the start of the resource, after the start of the
>> resource, or in parallel with the start of the resource ;-)
>> Probably a "hook" in the corresponding RA is the better approach, unless you
>> can really model all of the above.
>> 
>>>
>>> For that case, you could group A with an ocf:pacemaker:attribute
>>> resource. The important part is that the attribute is not set if A has
>>> never run on a node. So, you can make a rule that B can run only where
>>> the attribute is set, regardless of the value -- even if A is later
>>> stopped, the attribute will still be set.
>> 
>> If a resource is not running on a node, it is "stopped", isn't it?
> 
> Sure, but what I mean is, if resource A has *never* run on a node, then
> the corresponding node attribute will be *unset*. But if A has ever
> started and/or stopped on a node, the attribute will be set to one value
> or the other. So, a rule can be used to check whether the attribute is
> set, to determine whether A has *ever* been run on the node, regardless
> of whether it is currently running.

What I wanted to say is this: To be usable, the cluster framework should be clever enough to set the proper attribute to "not running" even if a resource has never run. It's easier to implement that once in the framework than to implement it in every tool.
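
A minimal sketch of the set/unset distinction described above, as a toy model rather than anything from Pacemaker itself. The class, the attribute name "A-state", and the values "1" and "0" are assumptions made up for illustration; the thread only says that starting and stopping set the attribute to "one value or the other".

```python
from typing import Dict, Optional

# Assumed values for illustration only; the thread does not say which
# values are written, just that start and stop each set some value.
ACTIVE_VALUE = "1"
INACTIVE_VALUE = "0"


class NodeAttributes:
    """Toy model of one node's attributes (hypothetical, not Pacemaker code)."""

    def __init__(self) -> None:
        self._attrs: Dict[str, str] = {}

    def resource_started(self, name: str) -> None:
        self._attrs[name] = ACTIVE_VALUE

    def resource_stopped(self, name: str) -> None:
        self._attrs[name] = INACTIVE_VALUE

    def value(self, name: str) -> Optional[str]:
        # None means the attribute was never set on this node.
        return self._attrs.get(name)

    def has_ever_run(self, name: str) -> bool:
        # Mirrors a rule that only checks whether the attribute is set.
        return name in self._attrs


node1 = NodeAttributes()  # A has never run here
node2 = NodeAttributes()  # A ran here once and was stopped later
node2.resource_started("A-state")
node2.resource_stopped("A-state")

print(node1.value("A-state"), node1.has_ever_run("A-state"))  # None False
print(node2.value("A-state"), node2.has_ever_run("A-state"))  # 0 True
```

In this model a tool that only reads the value has to treat "unset" as a third state next to the two explicit values, which is the case I would rather see handled once in the framework.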

> 
>>> Another possible use would be for a cron that needs to know whether a
>>> particular resource is running, and an attribute query is quicker and
>>> easier than something like parsing crm_mon output or probing the service.
>> 
>> crm_mon reads parts of the CIB; crm_attribute does too, I guess. So apart
>> from missing options or an inefficient implementation, why should one be
>> faster than the other?
>> 
>>>
>>> It's all theoretical at this point, and I'm not entirely sure those
>>> examples would be useful :) but I wanted to make the agent available for
>>> people to experiment with.
>> 
>> A good product manager should resist the temptation to provide every feature
>> customers ask for, to avoid bloatware. That protects customers from their
>> own bad decisions. In most cases there is a better, more universal solution
>> to the specific problem.
> 
> Sure, but this is a resource agent -- it adds no overhead to anyone not
> using it, and since we don't have any examples or walk-throughs using
> it, users would have to investigate and experiment to see whether it's
> of any use in their environment.
> 
> Hopefully, this will turn out to be a general-purpose tool of value to
> multiple problem scenarios.
> 
>>>>> * Pacemaker's existing "node health" feature allows resources to move
>>>>> off nodes that become unhealthy. Now, when using
>>>>> node-health-strategy=progressive, a new cluster property
>>>>> node-health-base will be used as the initial health score of newly
>>>>> joined nodes (defaulting to 0, which is the previous behavior). This
>>>>> allows cloned and multistate resource instances to start on a node even
>>>>> if it has some "yellow" health attributes.
>>>>
>>>> So the node health is more or less a "node score"? I don't understand the
>>>> last sentence. Maybe give an example?
>>>
>>> Yes, node health is a score that's added when deciding where to place a
>>> resource. It does get complicated ...
>>>
>>> Node health monitoring is optional, and off by default.
>>>
>>> Node health attributes are set to red, yellow or green (outside
>>> pacemaker itself -- either by a resource agent, or some external
>>> process). As an example, let's say we have three node health attributes
>>> for CPU usage, CPU temperature, and SMART error count.
>>>
>>> With a progressive strategy, red and yellow are assigned some negative
>>> score, and green is 0. In our example, let's say yellow gets a -10 score.
>>>
>>> If any of our attributes are yellow, resources will avoid the node
>>> (unless they have higher positive scores from something like stickiness
>>> or a location constraint).
>>>
>> 
>> I understood so far.
>> 
>>> Normally, this is what you want, but if your resources are cloned on all
>>> nodes, maybe you don't care if some attributes are yellow. In that case,
>>> you can set node-health-base=20, so even if two attributes are yellow,
>>> it won't prevent resources from running (20 + -10 + -10 = 0).
>> 
>> I don't understand that: "node-health-base" is a global setting, but what you
>> want is an exception for some specific (clone) resource.
>> To me the more obvious solution would be to provide an exception rule for
>> the resource, not a global setting for the node.
> 
> The main advantage of node-health-base over other approaches -- such as
> defining a constant #health-base attribute for all nodes, or defining
> positive location constraints for each resource on each node -- is that
> node-health-base applies to all resources and nodes, present and future.
> If someone adds a node to the cluster, it will automatically get
> node-health-base when it joins, whereas any other approach requires
> additional configuration changes (which leaves a window where the value
> is not applied).

So node-health-base is a default value for the node until it is explicitly set? Are you trying to handle the problem "all nodes are to be assumed bad until proven good"? Are we maybe fighting a completely different problem (with some RAs)?

> 
> It also simplifies the configuration the more nodes/resources you have,
> and is less prone to accidental configuration mistakes.
> 
> The idea is straightforward: instead of each node starting with a health
> score of 0 (which means any negative health attribute will push all
> resources away), start each node with a positive health score, so that
> health has to drop below a certain point before affecting resources.

I don't see the difference between "starting at 0, subtracting a small score" and "starting at some positive value, subtracting a large score". Are you saying that any negative score will move all resources away? I thought that only happens at -INFINITY.
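
For reference, the arithmetic from the example above (20 + -10 + -10 = 0) can be sketched as follows. This is only a model of the sum as described in the thread, not Pacemaker's actual scoring code; the function and variable names are made up, and red is left out because no value was given for it.

```python
from typing import Iterable

# Scores taken from the example in the thread: yellow = -10, green = 0.
# Red is left out because the thread gives no value for it.
HEALTH_SCORES = {"green": 0, "yellow": -10}


def node_health_score(base: int, attribute_colors: Iterable[str]) -> int:
    """Add the score of every health attribute to the base score."""
    return base + sum(HEALTH_SCORES[color] for color in attribute_colors)


# Three attributes (CPU usage, CPU temperature, SMART errors), two of them yellow.
colors = ["yellow", "yellow", "green"]

print(node_health_score(0, colors))   # -20
print(node_health_score(20, colors))  # 0
```

With a base of 0 the node ends up at -20; with node-health-base=20 it ends up at 0, which according to the example no longer keeps resources off the node.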


> 
>> [...saving independent bits from being retransmitted...]
>> 
>> Ulrich
> 
> 






