[Pacemaker] New System Health feature

Andrew Beekhof beekhof at gmail.com
Fri Apr 24 03:52:22 EDT 2009


On Thu, Apr 23, 2009 at 17:49, Mark Hamzy <hamzy at us.ibm.com> wrote:
> Hello,
>
> I am working on a feature to add system health metrics to HA. With this
> information, HA could move resources away from nodes whose hardware might
> have problems. Discussion of the initial proposal started briefly on the
> linux-HA mailing list, but it has been moved to the pacemaker mailing list.
>
> The following is a short description of what we want this new feature to do.
>
> Feature Name: Health monitoring support
> Purpose: Allow pacemaker to schedule resources in a way that's sensitive to
> a variety of server-related health metrics
>
> Description:
> Add support in pacemaker for a class of attributes which would be specially
> treated. Under this proposal, all attributes defined for a node whose name
> matches the regular expression /^#health-.*$/ would be automatically added
> into the score for each resource being considered for scheduling on that
> node.
>
> The purpose of this is to allow multiple independent health monitors to each
> set their own health status and have that taken into account when scheduling
> resources. For example, IBM might define one called #health-ibmserver.
> Someone using smarttools (disk health monitors) might define one called
> #health-smarttools. Someone else using IPMI might define one called
> #health-ipmi. This means that this feature is not specific to any vendor,
> and various health monitor providers can develop health metrics for their
> hardware and not have to coordinate with each other in their development
> process.
>
> Typical usage of these variables is expected to be something like this:
>
> Health   Attribute value   Meaning
> green    1000              server is happy, capable of running any resource
> yellow   0                 server is marginal; it is desirable to schedule
>                            resources somewhere else if you can
> red      -INFINITY         server is unreliable (but still up) and should
>                            not be used
>
> Note that all of the values given would be configuration-specific. These
> attributes would be set via attrd_updater.

Agreed.
What I'm not yet clear on, though, is why you can't just use these
attributes with the existing rsc_location constraints.

(And even if there is a need to expose it differently to users, it
should definitely be using the rsc_location logic internally)
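
To make the scoring question concrete, here is a minimal Python sketch of the
behaviour described in the quoted proposal: every node attribute whose name
matches /^#health-.*$/ is summed into a single score for that node. The
function names, the 1000000 cap used for INFINITY, and the rule that
-INFINITY wins any sum are illustrative assumptions, not existing Pacemaker
code.

```python
import re

# Assumption: scores are capped at +/-1000000, which stands in for INFINITY.
INFINITY = 1_000_000

# Pattern taken from the proposal: attributes named "#health-<something>".
HEALTH_ATTR = re.compile(r"^#health-.*$")


def parse_score(value):
    """Convert an attribute value such as "1000" or "-INFINITY" to an int."""
    text = str(value).strip()
    if text in ("INFINITY", "+INFINITY"):
        return INFINITY
    if text == "-INFINITY":
        return -INFINITY
    return max(-INFINITY, min(INFINITY, int(text)))


def add_scores(a, b):
    """Add two scores; -INFINITY dominates, then +INFINITY (assumed rule)."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def node_health_score(attributes):
    """Sum the values of all #health-* attributes defined for one node."""
    total = 0
    for name, value in attributes.items():
        if HEALTH_ATTR.match(name):
            total = add_scores(total, parse_score(value))
    return total
```

With this sketch, a node carrying one green and one yellow attribute scores
1000, while a single red (-INFINITY) attribute rules the node out no matter
what the other monitors report.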

> Should the translation of health scores (colors) into specific values be
> done outside the core system?

I think some PE options would be a good idea.
health-score-red=..., health-score-yellow=..., ...
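
As a rough illustration of how such options might be used, the Python sketch
below maps a color reported by a health agent to a score string, letting
cluster-level options override built-in defaults. The option names follow the
health-score-<color> pattern suggested above; the default values come from
the table in the proposal, and the function itself is hypothetical.

```python
# Defaults taken from the proposal's table; a real cluster would configure these.
DEFAULT_HEALTH_SCORES = {
    "health-score-green": "1000",
    "health-score-yellow": "0",
    "health-score-red": "-INFINITY",
}


def color_to_score(color, options=None):
    """Translate a health color into a score string.

    `options` is a hypothetical dict of cluster options that may override
    the defaults, e.g. {"health-score-yellow": "-10"}.
    """
    merged = dict(DEFAULT_HEALTH_SCORES)
    merged.update(options or {})
    key = "health-score-" + color.strip().lower()
    if key not in merged:
        raise ValueError("unknown health color: %r" % color)
    return merged[key]
```

Keeping the translation in options like these means the agents only ever
report a color, and each site decides what that color is worth.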

> There should be an API for health monitoring agents.

More information?

> This would be similar to cluster-wide default set by symmetric-cluster true
> (0) or false (-INFINITY).

You lost me here.

> Special Note:
> IBM is already in the process of developing such a health monitoring tool
> for IBM X (Intel-class) servers.
>
> So, what do you all think of this proposed functionality? Does it sound
> reasonable? Comments are appreciated.
>
> Mark