[Pacemaker] System Health

Fri May 8 07:26:16 EDT 2009

On Thu, May 7, 2009 at 10:24 PM, Mark Hamzy <hamzy at us.ibm.com> wrote:
> andrew at beekhof.net wrote on 05/07/2009 17:06:23 PM:
>> On Wed, May 6, 2009 at 11:32 PM, Mark Hamzy <hamzy at us.ibm.com> wrote:
>> >
>
>> This is where the disconnect is.
>> You seem convinced that everyone will want to sum them up the same way
>> you do, for every resource in the cluster.
>> I'm not so sure.
>
> Which is why I wrote the following:
>
>> > How about a PE option for pacemaker to automatically calculate health?
>> > Admins could then enable these calculations if they do not want to go
>> > through the effort to investigate all of the health variables and how
>> > they might affect system operations?
>
> Is putting code into pacemaker, so that system health is automatically
> calculated (when requested), something that the community wants?

Oh I don't doubt there is a non-trivial number of people who want the
calculation done automatically.
Or even that many would want it calculated just as you describe.

My job though, is to also consider those for which the simplistic
model doesn't fit.
There must be a logical progression between the two usage styles.

Groups is a classic example of what happens when this isn't the case.
Everything you learnt about groups goes out the window once you have a
non-linear start order.

So what I think we need is the scores:
 - node-health-score-red (defaults to -INFINITY),
 - node-health-score-yellow (defaults to 0),
 - node-health-score-green (defaults to 0),

These would work like the INFINITY and -INFINITY aliases (see char2score()).
So "red", "yellow" and "green" would be allowed anywhere a score is
expected and used in the same way.
This could go into 1.0.x.
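
To make the alias idea concrete, here is a rough sketch of the lookup in
Python rather than the C of char2score(). It is not the real function:
parse_score() is a hypothetical name, and the value 1000000 for INFINITY
is an assumption made for the example. The defaults are the ones listed
above; an admin-supplied override stands in for setting
node-health-score-yellow and friends.

```python
# Illustration only: not Pacemaker's char2score().
INFINITY = 1_000_000  # assumed magnitude of the INFINITY score

# Defaults proposed in this mail.
DEFAULT_HEALTH_SCORES = {"red": -INFINITY, "yellow": 0, "green": 0}


def parse_score(text, health_scores=None):
    """Turn a score string into an integer, accepting colour aliases."""
    scores = dict(DEFAULT_HEALTH_SCORES)
    if health_scores:
        # e.g. {"yellow": -50} for node-health-score-yellow=-50
        scores.update(health_scores)

    token = text.strip()
    if token in ("INFINITY", "+INFINITY"):
        return INFINITY
    if token == "-INFINITY":
        return -INFINITY
    if token in scores:
        return scores[token]

    # Plain integers are clamped to the [-INFINITY, INFINITY] range.
    return max(-INFINITY, min(INFINITY, int(token)))
```

With the defaults, parse_score("red") gives -INFINITY, while
parse_score("yellow", {"yellow": -50}) gives -50.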

Then I'd add
 - node-base-score
Which cuts out half of the rsc_location constraints and seems like a
generically useful concept.
(One would probably look up this value and set node-weight during
unpack_nodes() )
I'm still debating whether this is suitable for 1.0.
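
As a sketch of how a base score might combine with the health scores
(Python for illustration, not the policy engine code): node_weight() and
add_scores() are hypothetical names, INFINITY = 1000000 is an assumption,
and the rule that -INFINITY wins over everything else is my assumption
about how scores should merge here.

```python
# Illustration only: hypothetical helpers, not the policy engine's code.
INFINITY = 1_000_000  # assumed magnitude of the INFINITY score


def add_scores(a, b):
    """Add two scores; -INFINITY dominates, then +INFINITY, else clamp."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def node_weight(base_score, health_scores):
    """Start from node-base-score and fold in each health score."""
    weight = base_score
    for score in health_scores:
        weight = add_scores(weight, score)
    return weight
```

So a node with a base score of 100 and all-green health stays at 100,
while a single red attribute (at the default -INFINITY) takes it to
-INFINITY regardless of the base.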

So the constraints now look like:

<rsc_location id="other_health" rsc="X">
  <rule id="health_location_1" score-attribute="#health-ipmi"/>
  <rule id="health_location_2" score-attribute="#health-smart"/>
</rsc_location>

Which one would only have to specify once for 1.2 when pattern
matching gets added (that's definitely not a change for a stable
series).

I don't consider this a huge configuration overhead, BUT, I wouldn't
object to adding
 - node-health-strategy = (none, custom, migrate-on-red, ...)

* "none" would be the default
* "custom" would require the admin to configure the rsc_location
constraints manually
* "migrate-on-red" would be the one you're proposing and implicitly
create the relevant rsc_location constraints behind the scenes.
  I'd not call it "automatic" in order to allow us to add other
"default" strategies in the future.

To me this seems like a good compromise between ease-of-use and
flexibility, with a sane progression from "none", to default
strategies, to "custom".
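
For illustration, the strategy handling could be sketched like this
(Python, not the eventual implementation). implicit_health_constraints()
and the generated ids are hypothetical, and I am assuming that
"migrate-on-red" simply expands to one rsc_location per resource with
one score-attribute rule per health attribute, in the same shape as the
constraint shown earlier.

```python
# Illustration only: hypothetical expansion of node-health-strategy.
import xml.etree.ElementTree as ET


def implicit_health_constraints(strategy, resources, health_attrs):
    """Return the rsc_location XML strings a strategy implies."""
    if strategy in ("none", "custom"):
        # "none": health is ignored. "custom": the admin writes the
        # constraints by hand, so nothing is generated.
        return []
    if strategy != "migrate-on-red":
        raise ValueError("unknown node-health-strategy: %s" % strategy)

    constraints = []
    for rsc in resources:
        loc = ET.Element(
            "rsc_location", {"id": "health_%s" % rsc, "rsc": rsc}
        )
        for index, attr in enumerate(health_attrs, start=1):
            ET.SubElement(
                loc,
                "rule",
                {
                    "id": "health_%s_%d" % (rsc, index),
                    "score-attribute": attr,
                },
            )
        constraints.append(ET.tostring(loc, encoding="unicode"))
    return constraints
```

Calling it with strategy "migrate-on-red", resources ["X"] and
attributes ["#health-ipmi", "#health-smart"] yields a single
rsc_location for X with two rules, which is what the admin would
otherwise type in under "custom".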