[Pacemaker] Enable remote monitoring

Lars Marowsky-Bree lmb at suse.com
Tue Nov 6 07:59:26 EST 2012

On 2012-11-06T19:30:20, "Gao,Yan" <ygao at suse.com> wrote:

Hi Yan,

thanks for proposing this.

Let me try to add some detail:

The proposal has essentially four parts.

First, like Yan said, a new resource agent class so that we can wrap
around the Icinga/nagios plugins, provide meta-data, etc. This is quite
separate from the other components, and fairly straightforward. It also
means that someone could configure these as an (unmanaged?) primitive in
case they just want to gather monitor data and make stuff depend on it.

This is hopefully not very controversial; after all, it's why we have
agent classes. ;-)
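
To make the first part a bit more concrete, here is a minimal Python
sketch of the core job such a wrapper class has: running a plugin and
translating its exit status into an OCF-style monitor result. The
Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and
the OCF codes used are standard; the mapping policy itself and the
function names are assumptions for illustration, not a description of
how the class would actually be implemented.

```python
import subprocess

# Standard Nagios plugin exit statuses.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3

# OCF resource agent return codes used below.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

# Hypothetical policy: WARNING still counts as healthy; CRITICAL, UNKNOWN
# and any unexpected code are reported as a generic failure.
_MAPPING = {
    NAGIOS_OK: OCF_SUCCESS,
    NAGIOS_WARNING: OCF_SUCCESS,
    NAGIOS_CRITICAL: OCF_ERR_GENERIC,
}


def nagios_to_ocf(status: int) -> int:
    """Map a Nagios plugin exit status to an OCF monitor result."""
    return _MAPPING.get(status, OCF_ERR_GENERIC)


def run_plugin(argv: list[str], timeout: float = 30.0) -> int:
    """Run a plugin (hypothetical wrapper) and return the mapped OCF code."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or hung plugin: we cannot vouch for the service.
        return OCF_ERR_GENERIC
    return nagios_to_ocf(proc.returncode)
```

Whether WARNING should count as healthy is exactly the kind of policy
the meta-data for such a class would need to expose.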

Second, the ability to specify a different class/(provider/)type for a
monitor op. This neatly allows us to pull in those probes for the
"monitor a VM" use case with hopefully minimal impact (on the PE or the
schema, where only optional attributes would be added), and it should
also be straightforward for admins to configure. (Clearly, the
shell/hawk would need to be taught about this so that it is easy there
too.) It may have applications beyond this, though.

And no, I'm not proposing that we allow overriding the
class/provider/type tuple for start/stop ;-)
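
A rough Python sketch of the resolution rule described above: an op
runs with the primitive's agent unless it carries its own
class/provider/type, and only monitor ops may do so. The `override`
field, the `nagios` class name and the helper names are hypothetical;
the real change would live in the schema and the PE/lrmd, not in code
like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    rclass: str              # e.g. "ocf", or the proposed "nagios" class
    provider: Optional[str]  # only meaningful for classes with providers
    rtype: str


@dataclass
class Op:
    name: str                         # "start", "stop", "monitor", ...
    interval_s: int = 0
    override: Optional[Agent] = None  # hypothetical per-op class/provider/type


def agent_for(primitive: Agent, op: Op) -> Agent:
    """Pick the agent that executes `op`; only monitors may override it."""
    if op.override is None:
        return primitive
    if op.name != "monitor":
        raise ValueError(f"class/provider/type override not allowed for {op.name!r}")
    return op.override
```

For example, a VirtualDomain primitive would still be started and
stopped by `ocf:heartbeat:VirtualDomain`, while an extra monitor op
could run `check_http` against the guest.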

Third, since the "start" of the base container may return before the
guest is fully booted (to stick with the VM resource), we may need an
additional timeout here. We *could* abuse start-delay (which might
finally give it some legitimate use), but looping until we get the first
successful monitor result also looks attractive.

The one downside here is that, unless we modify the PE or make the
update to the CIB special somehow, the tools can't show that those
ops/services aren't yet reporting healthy. But I think this trade-off is
acceptable, and the mechanism might be useful in other scenarios too.
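
The difference between the two options is roughly this: start-delay
waits a fixed time before the first monitor, whereas looping retries
the monitor until it first succeeds, bounded by the additional timeout.
A minimal Python sketch of the looping variant follows; the function
name and the polling interval are assumptions for illustration.

```python
import time
from typing import Callable


def wait_for_first_success(monitor: Callable[[], bool],
                           timeout_s: float,
                           poll_s: float = 1.0) -> bool:
    """Re-run `monitor` until it first succeeds or `timeout_s` elapses.

    Failures before the deadline are treated as "guest still booting",
    not as resource failures. Returns False only if the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if monitor():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))
```

Unlike a fixed start-delay, this returns as soon as the guest answers,
so a fast boot is not penalised by a pessimistic delay.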

Fourth, since this means we'll have multiple monitors doing different
things, this draws attention to a deficiency in Pacemaker where
monitors with identical intervals clash. This really should be fixed
eventually: it affects master/slave resources too, and requires rather
arcane knowledge on the admin's side.
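
For background: Pacemaker identifies a recurring operation by resource,
action and interval (keys of the form `rsc_monitor_10000`), which is
why two monitors on one resource with the same interval collide today.
The Python sketch below shows the check an admin currently has to do in
their head; the helper names and op labels are hypothetical.

```python
from collections import defaultdict


def op_key(rsc: str, action: str, interval_ms: int) -> str:
    """Pacemaker-style operation key: resource, action and interval."""
    return f"{rsc}_{action}_{interval_ms}"


def clashing_monitors(rsc: str,
                      monitors: list[tuple[str, int]]) -> dict[str, list[str]]:
    """Group monitor ops, given as (label, interval_ms), that share a key."""
    by_key: dict[str, list[str]] = defaultdict(list)
    for label, interval_ms in monitors:
        by_key[op_key(rsc, "monitor", interval_ms)].append(label)
    # Only keys claimed by more than one op are actual clashes.
    return {key: labels for key, labels in by_key.items() if len(labels) > 1}
```

Fixing the deficiency would mean making the key unique per op, so that
this check becomes unnecessary.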

I think all four of these pillars have merit on their own, and combined
they would cover the use case we have in mind quite neatly. We'd be
contributing the first three, and would really appreciate it if
"someone" could look at number 4 ;-)


Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
