[Pacemaker] Enable remote monitoring
Lars Marowsky-Bree
lmb at suse.com
Wed Dec 12 04:51:43 EST 2012
On 2012-12-11T12:53:39, David Vossel <dvossel at redhat.com> wrote:
Excellent progress!
Just one aspect caught my eye:
> > - on-fail defaults "restart-container" for most actions,
> >
> > except for stop op (Not sure what it means if a stop fails. A nagios
> > daemon cannot be terminated? Should it always return success?),
>
> A nagios "stop" action should always return success. The nagios agent doesn't even need a stop function; the lrmd can know to treat a "stop" as a (no-op for stop) + (cancel all recurring actions). In this case, if the nagios agent doesn't stop successfully, it is because of an lrmd failure, which should result in a fencing action, I'd imagine.
That's something that, IMHO, shouldn't be handled by the container
abstraction, but - like you say - by the LRM/class code.
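To make the "no-op + cancel all recurring actions" idea concrete, here is a
minimal Python sketch. It is not lrmd code: the class, its methods and the
resource name are hypothetical, and it only models a resource class whose
"stop" cannot fail because no agent is ever invoked.

```python
# Hypothetical sketch only: this is not the lrmd API.
OCF_SUCCESS = 0  # conventional OCF "success" return code


class NagiosClassHandler:
    """Toy model of a resource class whose "stop" cannot fail."""

    def __init__(self):
        # resource id -> set of recurring action names (e.g. "monitor")
        self.recurring = {}

    def schedule_recurring(self, rsc_id, action):
        """Remember that a recurring action is active for this resource."""
        self.recurring.setdefault(rsc_id, set()).add(action)

    def stop(self, rsc_id):
        """Stop = cancel all recurring actions, then report success.

        No agent is called, so there is nothing here that can fail.
        """
        self.recurring.pop(rsc_id, None)
        return OCF_SUCCESS


handler = NagiosClassHandler()
handler.schedule_recurring("nagios-check", "monitor")
rc = handler.stop("nagios-check")
```

With the decision kept in the class code like this, the container
abstraction never needs to know whether "stop" is a real operation.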
I think on-fail="restart-container" makes sense even for stop. If
"stop" can't technically fail for a given class, even better. But it
could mean that we actually need to stop some monitoring daemon or
whatever.
The other logic might be to set it to "ignore", which would also work
for me (even if a bit less obviously).
But really, I'd not want to make "oh let's just skip stop for contained
resources" the rule here ;-)
> > - Failures of resources count against container's
> What happens if someone wants to clear the container's failcount? Do we need to add some logic to go in and clear all the child resources' failures as well to make this happen correctly?
That appears to make sense.
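As a rough illustration of that cascading clear, a minimal Python sketch
follows. It is not Pacemaker's failcount implementation: the function, the
dictionaries and the resource names are all hypothetical.

```python
# Hypothetical sketch only: not Pacemaker's failcount handling.
def clear_container_failcount(failcounts, children, container):
    """Reset the container's failcount and those of its contained resources.

    failcounts: dict mapping resource id -> failure count
    children:   dict mapping container id -> list of contained resource ids
    """
    for rsc in [container, *children.get(container, [])]:
        failcounts[rsc] = 0
    return failcounts


failcounts = {"vm1": 3, "vm1-nagios": 2, "other": 1}
children = {"vm1": ["vm1-nagios"]}
clear_container_failcount(failcounts, children, "vm1")
```

Resources outside the container ("other" above) keep their counts.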
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde