[Pacemaker] Enable remote monitoring

Lars Marowsky-Bree lmb at suse.com
Thu Dec 6 05:24:05 EST 2012


On 2012-12-06T20:04:20, Andrew Beekhof <andrew at beekhof.net> wrote:

> >> Does that make sense though?
> >> You've not achieved anything a restart wouldn't have done.
> >> The choice to move the VM should be up to the VM.
> > If the fail-count of a nagios resource reaches its own
> > migration-threshold, the colocated VM should migrate with it anyway,
> > shouldn't it?
> 
> But moving a nagios resource makes no sense.

Exactly; we would want to move the container/parent.

> Because it's running inside the guest, which would have already moved
> if it was the right thing to do.

No, that's not a given. The VM might be "healthy" (as in, the kernel is
running), but a service being monitored within it may lack sufficient
CPU, I/O, or network resources, or may even have connectivity problems,
on a given host, to the point where trying to restart it on another
hypervisor makes sense.

But migration-threshold on the nagios primitive, combined with a
mandatory colocation constraint, will take care of that already, if an
admin wants to configure it that way.
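
To illustrate the placement logic described above, here is a toy model in
Python. It is only a sketch and not Pacemaker's policy engine: the function
names and node names are made up, it assumes migration-threshold is greater
than zero, and it assumes the colocation is written so that the VM must
follow the nagios primitive.

```python
from typing import Dict, List, Optional


def allowed_nodes(nodes: List[str],
                  fail_counts: Dict[str, int],
                  migration_threshold: int) -> List[str]:
    """Nodes where the nagios primitive may still run.

    Toy rule: a node is excluded once the primitive's fail-count on it
    has reached migration-threshold (assumed > 0 here).
    """
    return [n for n in nodes if fail_counts.get(n, 0) < migration_threshold]


def place_vm_with_probe(nodes: List[str],
                        probe_fail_counts: Dict[str, int],
                        migration_threshold: int) -> Optional[str]:
    """Pick one node for both the nagios primitive and the VM.

    Mandatory colocation is modelled as "both on the same node or not
    at all", so the VM simply lands wherever the primitive is allowed.
    Returns None when no node is left.
    """
    candidates = allowed_nodes(nodes, probe_fail_counts, migration_threshold)
    return candidates[0] if candidates else None
```

With hypervisors ["hv1", "hv2"], a threshold of 3 and a fail-count of 3 on
hv1, the model places both resources on hv2; that is, the VM moves because
the nagios primitive is no longer allowed on hv1.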

I agree that, for the most part, people will not do that but keep
restarting VMs.

> > I like the concept of "failure-delegate". If we introduce it, it sounds
> > more like a resource's meta/op attribute to me, rather than into order
> > constraint or group. What do you think?
> Yes. It would be a resource meta attribute.

Hmmm. OK, I think I see where this is going.

We already have on-fail settings. How would these play together?

Would it even make sense to have on-fail="restart-container"? (Or a
nicer wording.)

Hmmm. That might work. We would allow a "container" to be specified as a
meta attribute.

If set, on-fail would default to "restart-container" for most actions. But
admins could still override it; say, they might want to set
monitor on-fail="ignore" to just get notified. And when we move forward
to whiteboxes, we could have start/monitor/promote/demote
on-fail="restart" (like now) and stop on-fail="restart-container".

That appears reasonably neat?
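
The defaulting rule proposed above could be sketched roughly like this in
Python. This is only an illustration of the idea: effective_on_fail is a
hypothetical helper, "container" and "restart-container" are the proposed
names from this mail, and treating "restart" as the fallback for every
action is a simplification.

```python
from typing import Dict, Optional

# Assumed stand-in for the current default; real defaults may differ per
# action, which this sketch deliberately ignores.
FALLBACK_ON_FAIL = "restart"


def effective_on_fail(meta: Dict[str, str],
                      explicit: Optional[str] = None) -> str:
    """Resolve the on-fail policy for one operation of a resource.

    meta     -- the resource's meta attributes (hypothetical "container" key)
    explicit -- on-fail value the admin set on the operation, if any
    """
    if explicit is not None:
        # An explicit per-operation setting always wins.
        return explicit
    if meta.get("container"):
        # Proposed default once a container is named for the resource.
        return "restart-container"
    return FALLBACK_ON_FAIL
```

For a resource with meta {"container": "vm1"}, this yields
"restart-container" unless the admin sets, say, on-fail="ignore" on the
monitor operation; without a container it falls back to "restart".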



Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




