[ClusterLabs] Antw: [EXT] Coming in 2.1.3: node health monitoring improvements
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Wed Apr 13 02:22:57 EDT 2022
>>> Ken Gaillot <kgaillot at redhat.com> wrote on 12.04.2022 at 17:22 in
message
<33f4147d0f6a3e46581aaa46a4eca81dfa59ce15.camel at redhat.com>:
> Hi all,
>
> I'm hoping to have the first release candidate for 2.1.3 ready next
> week.
>
> Pacemaker has long had a feature to monitor node health (CPU usage,
> SMART drive errors, etc.) and move resources off degraded nodes:
>
> https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#tracking-node-health
Great, I wanted to ask a question on it anyway:
Is the node health attribute stored in the CIB, or is it transient (i.e.,
reset when the node is restarted)?
Some comments on the docs:
"yellow" state: this could also mean that a node is becoming healthy again
(coming from red), right?
The "Node Health Strategy" section could benefit from a better explanation.
E.g.: "Assign the value of ..." Assign to whom or what?
It's very hard to find out what "progressive" really does.
I think a configuration example with a sample scenario (node health changes)
would be very helpful.
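To illustrate the kind of sample scenario I mean, here is a rough Python
sketch of how I read the strategy table in the docs. It is only a model, not
Pacemaker code: the function name, the attribute names and the scores
(yellow = -10, base = 15) are made up for the example, and it assumes that
the scores of all health attributes of a node are simply added up.
-math.inf stands in for -INFINITY.

```python
import math

NEG_INF = -math.inf  # stand-in for Pacemaker's -INFINITY score


def node_health_score(strategy, attrs, green=0, yellow=0, red=NEG_INF, base=0):
    """Hypothetical model: combine one node's health attributes into a score.

    attrs maps attribute names to "green", "yellow" or "red".
    green/yellow/red/base play the role of the node-health-* cluster
    options and are only used by the "progressive" strategy here.
    """
    if strategy in ("none", "custom"):
        # Assumption: the cluster itself applies no health score in these modes.
        return 0
    if strategy == "migrate-on-red":
        table, base = {"green": 0, "yellow": 0, "red": NEG_INF}, 0
    elif strategy == "only-green":
        table, base = {"green": 0, "yellow": NEG_INF, "red": NEG_INF}, 0
    elif strategy == "progressive":
        table = {"green": green, "yellow": yellow, "red": red}
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return base + sum(table[value] for value in attrs.values())


# Sample scenario: the node degrades step by step (illustrative values only).
one_yellow = {"#health-cpu": "green", "#health-smart": "yellow"}
two_yellow = {"#health-cpu": "yellow", "#health-smart": "yellow"}
one_red = {"#health-cpu": "green", "#health-smart": "red"}

for attrs in (one_yellow, two_yellow, one_red):
    print(node_health_score("progressive", attrs, yellow=-10, base=15))
# prints 5, then -5, then -inf
```

With these made-up numbers a single yellow attribute still leaves a positive
score, while two yellow attributes push it below zero. That is the kind of
walk-through I am missing in the documentation.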
>
> The 2.1.3 release will add a couple of features to make this more
> useful.
>
> First, you can now exempt particular resources from health-related
> bans, using the new "allow-unhealthy-nodes" resource meta-attribute.
If that's a resource attribute, then the name is poorly chosen (IMHO).
In times like these I'd almost suggest naming it
"immune-against-node-health=red" or so (OK, just a joke).
>
> This is particularly helpful for the health monitoring agents
> themselves. Without the new option, health agents get moved off
Specifically if the health state can improve again.
> degraded nodes, which means the cluster can't detect if the degraded
> condition goes away. Users had to manually clear the health attributes
> to allow resources to move back to the node. Now, you can set
> allow-unhealthy-nodes=true on your health agent resources, so they can
> continue detecting changes in the health status.
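As a toy model of that exemption, here is a small Python sketch. Again it is
only an illustration: the function name is hypothetical, and the rule that a
resource needs a non-negative health score to stay on a node is my
simplification, not something stated in the announcement.

```python
import math


def may_run_on_node(health_score, allow_unhealthy_nodes=False):
    """Hypothetical model: exempt resources ignore the node's health score."""
    return allow_unhealthy_nodes or health_score >= 0


red_node = -math.inf  # stand-in for a node whose health score is -INFINITY

print(may_run_on_node(red_node))                              # False
print(may_run_on_node(red_node, allow_unhealthy_nodes=True))  # True
```

In this model the health agent stays on the red node and can report when the
state improves, while ordinary resources are still kept away.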
>
> Second, crm_mon will indicate when a node's health is yellow or red,
> like:
>
> * Node List:
> * Node node1: online (health is RED)
For compatibility I'd prefer a new option to display those, or at least a new
item; maybe like:
----
Node Health:
* Node: h16: green
...
----
or
----
Node Attributes:
* Node h16: green
----
>
> Previously, you would see that the node is not running any resources,
> but not know why, unless you thought to check every node health
> attribute.
It's definitely a bad thing for any artificial intelligence not to be able
to explain itself ;-)
Regards,
Ulrich
> --
> Ken Gaillot <kgaillot at redhat.com>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/