[Pacemaker] Enable remote monitoring

Thu Nov 8 00:15:50 EST 2012

On Tue, Nov 6, 2012 at 11:59 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:
> On 2012-11-06T19:30:20, "Gao,Yan" <ygao at suse.com> wrote:
>
> Hi Yan,
>
> thanks for proposing this.
>
> Let me try to add -
>
> The proposal has essentially three parts.
>
> First, like Yan said, a new resource agent class so that we can wrap
> around the Icinga/nagios plugins, provide meta-data, etc. This is quite
> separate from the other components, and fairly straightforward. It also
> means that someone could configure these as an (unmanaged?) primitive in
> case they just want to gather monitor data and make stuff depend on it.
>
> This is hopefully not very controversial; after all, it's why we have
> agent classes. ;-)
>
>
> Second, the ability to specify a different class/(provider/)type for a
> monitor op. This neatly allows us to pull in those probes for the
> "monitor a VM" use case, with hopefully minimal impact (on the PE or the
> schema, where only optional attributes would be added), and also be
> straightforward to configure for admins. (Clearly, the shell/hawk would
> need to be taught about this so that it is easy too.) It may have
> applications beyond this though.
>
> And no, I'm not proposing that we allow overriding the
> class/provider/type tuple for start/stop ;-)

Did you consider having the VirtualDomain do the nagios redirect for
monitor operations?
If so, what was the drawback?

>
>
> Third, since the "start" of the base container may return before the
> guest is fully booted (to stick with the VM resource), we may need an
> additional timeout here. We *could* abuse start-delay (which might
> finally give it some legitimate use), but looping until we got the first
> success also appears attractive.

My concern there is that there needs to be a finite termination point
for the "it's still bad" looping.
No better ideas yet though.
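
For illustration only, here is a minimal sketch of what "loop until the first
success, but with a finite termination point" could look like. This is not
Pacemaker code; the function name, the probe callable, and the timeout and
interval parameters are all hypothetical, and a real implementation would
live in the lrmd/PE rather than in a script.

```python
import time
from typing import Callable


def wait_for_first_success(
    probe: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it first succeeds or timeout_s has elapsed.

    probe is a hypothetical stand-in for one run of the remote monitor
    (e.g. a nagios-style check). Returns True on the first success and
    False once the deadline passes without one.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Finite termination point: stop looping and report failure.
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

The deadline bounds the "still bad" phase: once it passes, the caller can
treat the result like any other failed monitor instead of waiting forever.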

>
> The one downside here is that, unless we modify the PE or make the
> update to the CIB special somehow, the tools can't show that those
> ops/services aren't yet reporting healthy. But I think this trade-off is
> acceptable. And might be useful in other scenarios too.

You do or don't want to show them as unhealthy?  I'm not parsing this well.

>
> Fourth, since this means we'll have multiple monitors doing different
> things, this draws attention to the deficiency in Pacemaker where
> monitor intervals clash, and something that really should be fixed
> eventually - it affects the slave/master resources too, and is a bit
> arcane knowledge to need on the admin's side.

True. I wouldn't mind getting that fixed.  Doing it in a backwards
compatible manner might be tricky though.

>
> I think all of these four pillars have merit on their own, and combined
> would provide the use case that we wish to cover quite neatly. We'd be
> contributing the first three, and would really appreciate if "someone"
> could look at number 4 ;-)
>
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde