[ClusterLabs] Coming in 2.1.3: node health monitoring improvements

Ken Gaillot kgaillot at redhat.com
Tue Apr 12 11:22:07 EDT 2022


Hi all,

I'm hoping to have the first release candidate for 2.1.3 ready next
week.

Pacemaker has long had a feature to monitor node health (CPU usage,
SMART drive errors, etc.) and move resources off degraded nodes:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#tracking-node-health

The 2.1.3 release will add a couple of features to make this more
useful.

First, you can now exempt particular resources from health-related
bans, using the new "allow-unhealthy-nodes" resource meta-attribute.

This is particularly helpful for the health monitoring agents
themselves. Until now, health agents were moved off degraded nodes,
so the cluster couldn't detect when the degraded condition went away,
and users had to manually clear the health attributes to allow
resources to move back to the node. Now, you can set
allow-unhealthy-nodes=true on your health agent resources, so they can
continue detecting changes in the health status.
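
As a rough sketch, the meta-attribute could be set from the command
line with crm_resource along these lines. The resource ID "health-cpu"
is only a placeholder for illustration; substitute the ID of your own
health agent resource.

```shell
# Sketch: let a resource keep running on nodes with degraded health.
# $1 is the resource ID (the ID used in the example below is hypothetical).
allow_on_unhealthy_nodes() {
    crm_resource --resource "$1" --meta \
        --set-parameter allow-unhealthy-nodes --parameter-value true
}

# Example (placeholder resource ID):
#   allow_on_unhealthy_nodes health-cpu
```

Higher-level shells have their own syntax for resource meta-attributes,
so this is one possible route rather than the only one.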

Second, crm_mon will indicate when a node's health is yellow or red,
like:

    * Node List:
        * Node node1: online (health is RED)

Previously, you would see that the node is not running any resources,
but not know why, unless you thought to check every node health
attribute.
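
For scripting, the new annotation can be picked out of one-shot crm_mon
output. This is a minimal sketch, assuming the text format matches the
sample above and that yellow is reported the same way as red (only RED
appears in the sample, so the match is case-insensitive).

```shell
# Print only the node lines that carry a health warning.
# Reads crm_mon text output on stdin.
unhealthy_nodes() {
    grep -iE 'Node .*\(health is (red|yellow)\)'
}

# Example:
#   crm_mon -1 | unhealthy_nodes
```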
-- 
Ken Gaillot <kgaillot at redhat.com>


