[ClusterLabs] Re: [EXT] Coming in 2.1.3: node health monitoring improvements

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Apr 13 02:22:57 EDT 2022


>>> Ken Gaillot <kgaillot at redhat.com> wrote on 12.04.2022 at 17:22 in message
<33f4147d0f6a3e46581aaa46a4eca81dfa59ce15.camel at redhat.com>:
> Hi all,
> 
> I'm hoping to have the first release candidate for 2.1.3 ready next
> week.
> 
> Pacemaker has long had a feature to monitor node health (CPU usage,
> SMART drive errors, etc.) and move resources off degraded nodes:
> 
> https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#tracking-node-health

Great, I wanted to ask a question on it anyway:
Is the node health attribute stored in the CIB, or is it transient (i.e.,
reset when the node is restarted)?

Some comments on the docs:

The "yellow" state could also mean the node is becoming healthy again (coming
from red), right?

The "Node Health Strategy" section could benefit from a better explanation.
For example: "Assign the value of ..." Assign to whom/what?
It's very hard to find out what "progressive" really does.

I think a configuration example with a sample scenario (node health changes)
would be very helpful.
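To illustrate the kind of sample scenario meant here, below is a small Python model of how the strategies read in the 2.1 docs: each "#health*" node attribute (red, yellow, green, or an integer) is mapped to a score, the scores of one node are summed with Pacemaker's saturating score arithmetic, and the sum acts like a location score for that node. This is only a reading of the documentation, not Pacemaker's code; the function names, the attribute names "#health-cpu"/"#health-smart", and the numeric option values are hypothetical, and INFINITY strings as attribute values are not handled.

```python
# Hypothetical model of the node health strategies, written from the
# "Tracking Node Health" section of Pacemaker Explained; NOT Pacemaker code.
INFINITY = 1_000_000  # the value Pacemaker uses for INFINITY scores


def add_scores(a: int, b: int) -> int:
    """Saturating score addition: -INFINITY wins over everything, then +INFINITY."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def color_scores(strategy, red, yellow, green):
    """Score assigned to each color value under the given strategy."""
    if strategy == "migrate-on-red":
        return {"red": -INFINITY, "yellow": 0, "green": 0}
    if strategy == "only-green":
        return {"red": -INFINITY, "yellow": -INFINITY, "green": 0}
    if strategy == "progressive":
        # values of the node-health-red/-yellow/-green cluster options
        return {"red": red, "yellow": yellow, "green": green}
    raise ValueError(f"unsupported strategy: {strategy}")


def node_health_score(attrs, strategy, red=-INFINITY, yellow=0, green=0, base=0):
    """Sum one node's #health* attributes into a single location score.

    Returns None for "none" (not tracked) and "custom" (the administrator's
    own rules are expected to reference the attributes instead).
    """
    if strategy in ("none", "custom"):
        return None
    colors = color_scores(strategy, red, yellow, green)
    # Only "progressive" adds the node-health-base score
    total = base if strategy == "progressive" else 0
    for name, value in attrs.items():
        if not name.startswith("#health"):
            continue  # ordinary node attribute, ignored here
        v = value.strip().lower()
        # Color names map via the strategy; anything else is taken as an integer
        total = add_scores(total, colors[v] if v in colors else int(v))
    return total


# Sample scenario: node1's CPU agent reports "yellow", its SMART agent "green".
attrs = {"#health-cpu": "yellow", "#health-smart": "green", "site": "A"}
print(node_health_score(attrs, "migrate-on-red"))  # 0: resources may stay
print(node_health_score(attrs, "only-green"))      # -1000000: resources move off
print(node_health_score(attrs, "progressive", yellow=-50, base=100))  # 50

# Later the CPU agent reports "red"; node-health-red is -INFINITY in this model
attrs["#health-cpu"] = "red"
print(node_health_score(attrs, "progressive", yellow=-50, base=100))  # -1000000
```

In this reading, "Assign the value of ..." means: the value of the node-health-red/-yellow/-green option is what a red/yellow/green attribute contributes to the node's summed health score, and "progressive" is the only strategy that also adds node-health-base.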

> 
> The 2.1.3 release will add a couple of features to make this more
> useful.
> 
> First, you can now exempt particular resources from health-related
> bans, using the new "allow-unhealthy-nodes" resource meta-attribute.

If that's a resource attribute, then the name is poorly chosen (IMHO).
In times like these I'd almost suggest naming it
"immune-against-node-health=red" or so (OK, just a joke).


> 
> This is particularly helpful for the health monitoring agents
> themselves. Without the new option, health agents get moved off

Especially if the health state can improve again.

> degraded nodes, which means the cluster can't detect if the degraded
> condition goes away. Users had to manually clear the health attributes
> to allow resources to move back to the node. Now, you can set
> allow-unhealthy-nodes=true on your health agent resources, so they can
> continue detecting changes in the health status.
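A minimal sketch of the effect described above, assuming (as a simplification) that a node's health score behaves like a location score that is simply skipped for resources with allow-unhealthy-nodes=true. The helper allowed_nodes, the node names, and the health scores are hypothetical, only the literal "true" spelling is recognized, and this is not how the scheduler is actually implemented.

```python
# Hypothetical simplification: NOT how the Pacemaker scheduler is written.
INFINITY = 1_000_000


def allowed_nodes(resource_meta, health_scores):
    """Nodes a resource may still run on, given per-node health scores.

    A resource with allow-unhealthy-nodes=true ignores the health score;
    every other resource is banned from nodes scored -INFINITY.
    """
    exempt = resource_meta.get("allow-unhealthy-nodes", "false").lower() == "true"
    return sorted(node for node, score in health_scores.items()
                  if exempt or score > -INFINITY)


health = {"node1": -INFINITY, "node2": 0}  # node1 went red under migrate-on-red
print(allowed_nodes({}, health))  # ['node2']: an ordinary resource leaves node1
print(allowed_nodes({"allow-unhealthy-nodes": "true"}, health))  # ['node1', 'node2']
```

Under this model, a health agent resource with the meta-attribute set keeps running on node1 and can report when the condition clears, whereas without it the agent would be banned along with everything else.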
> 
> Second, crm_mon will indicate when a node's health is yellow or red,
> like:
> 
>     * Node List:
>         * Node node1: online (health is RED)

For compatibility I'd prefer a new option to display those, or at least a new
item; maybe like:
----
Node Health:
  * Node h16: green
  ...
----

or

---
Node Attributes:
  * Node h16: green
---

> 
> Previously, you would see that the node is not running any resources,
> but not know why, unless you thought to check every node health
> attribute.

It's definitely a bad thing for any artificial intelligence not to be able
to explain itself ;-)

Regards,
Ulrich


> -- 
> Ken Gaillot <kgaillot at redhat.com>
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 




