[Pacemaker] Enable remote monitoring

Andrew Beekhof andrew at beekhof.net
Fri Nov 9 16:30:20 EST 2012


On Sat, Nov 10, 2012 at 4:54 AM, Lars Marowsky-Bree <lmb at suse.com> wrote:
> On 2012-11-09T11:46:59, David Vossel <dvossel at redhat.com> wrote:
>
>> What if we made something similar to the concept of an "un-managed" resource, in that it is only ever monitored, but otherwise treated it like a normal resource? Meaning start/stop could still execute, but start is really just the first "monitor" operation, and stop just means the recurring "monitor" is cancelled.
>>
>> Having "start" redirect to "monitor" in Pacemaker would take care of that timeout problem you all were talking about with the first failure: just set the start operation to some larger timeout. Basically, start would just verify that monitor passed once, then you could move on to the normal monitor timeouts/intervals. Stop would always return success and cancel whatever recurring monitors are running.
>
> That's exactly the kind of abstraction a resource agent class can
> provide though for the nagios agents - no need to have that special
> knowledge in the PE. The LRM can hide this, which is partly its
> purpose.
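
For illustration only, here is a minimal Python sketch of the "monitor-only" semantics described above. It is not how Pacemaker or the LRM implements anything. The class name, the probe callable, and the timeout/poll parameters are all hypothetical; only the numeric return codes follow the usual OCF convention (0 = success, 1 = generic error, 7 = not running).

```python
import time
from typing import Callable

# Return codes following the OCF resource agent convention.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


class MonitorOnlyResource:
    """Hypothetical wrapper around a check that only knows how to monitor."""

    def __init__(self, probe: Callable[[], int]):
        self.probe = probe              # runs one check, returns an OCF-style code
        self.recurring_active = False   # stands in for the recurring monitor op

    def start(self, timeout_s: float = 60.0, poll_s: float = 1.0) -> int:
        """'start' is just the first monitor that passes within a larger timeout."""
        deadline = time.monotonic() + timeout_s
        while True:
            if self.probe() == OCF_SUCCESS:
                self.recurring_active = True
                return OCF_SUCCESS
            if time.monotonic() >= deadline:
                return OCF_ERR_GENERIC
            time.sleep(poll_s)

    def monitor(self) -> int:
        """Recurring monitor: pass the probe result straight through."""
        return self.probe()

    def stop(self) -> int:
        """'stop' only cancels the recurring monitor and always succeeds."""
        self.recurring_active = False
        return OCF_SUCCESS
```

The point of the sketch is that the caller still sees ordinary start/monitor/stop results, so the redirection can live in the agent class rather than in the PE.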
>
>> Now that I think about it, I'm not even sure we need the new container Andrew and I talked about at all if we introduce "monitor-only" resources.
>
> Yes. We'd still need it.
>
>> At this point we could just have a group where the first member launches the VM, and all the members after that are the monitor-only resources that start/stop like normal resources as far as the PE is concerned. If any of the group members fail, I guess we'd need the whole group to be recovered in the right order.
>
> That's the point - the "right order" for a container is not quite the
> same as the right order for a regular group. Basically, the group
> semantics would recover from the failed resource onward, never the VM
> resource (container).
>
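
A small Python sketch of the difference in recovery scope described above, under the assumption that the first member of the list is the VM and the rest are monitor-only checks. The function and resource names are hypothetical and this is only one reading of the two behaviours, not PE code.

```python
from typing import List


def group_recovery_set(members: List[str], failed: str) -> List[str]:
    """Regular group reading: recover the failed member and everything after it."""
    idx = members.index(failed)  # raises ValueError for an unknown member
    return members[idx:]


def container_recovery_set(members: List[str], failed: str) -> List[str]:
    """Container reading: any member failure recovers the container (members[0])
    and therefore everything that runs inside it."""
    if failed not in members:
        raise ValueError(f"unknown member: {failed}")
    return list(members)
```

With members `["vm", "check-http", "check-ssh"]` and a failure of `check-http`, the first function leaves `vm` alone while the second includes it.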
> If you look at my proposal, I actually made the "container=" a group
> attribute

I think I'd rather it be a whole different tag than piggyback off the group tag.

> - because we need to map monitor failures to the container, as
> well as ignore any stop failures (service is down clean as long as the
> container is eventually stopped).
>
> I think the shell might render this differently, even if we express it
> as a group + meta-attribute(s) in the XML (which seems to be the way to
> go). "container ..." is easier on the eyes ;-)
>
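
To make the two rules in that paragraph concrete, here is a hedged Python sketch of how failures of an inner (monitor-only) service might be mapped. The event type and the action labels are invented for illustration and do not correspond to any Pacemaker data structure.

```python
from dataclasses import dataclass


@dataclass
class OpResult:
    resource: str     # inner service, e.g. a nagios check (hypothetical name)
    operation: str    # "monitor" or "stop"
    succeeded: bool


def react(result: OpResult, container: str, container_stopped: bool) -> str:
    """Return a hypothetical action label for an operation result."""
    if result.succeeded:
        return "none"
    if result.operation == "monitor":
        # A failed monitor inside the container is mapped to the container.
        return f"recover {container}"
    if result.operation == "stop":
        # Stop failures of the inner service are ignored: the service counts
        # as cleanly down once the container itself has stopped.
        return "none" if container_stopped else f"await stop of {container}"
    return f"recover {result.resource}"
```

The inner service's own stop result never escalates here; only the container's state decides whether the service is considered down.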
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde