[Pacemaker] Enable remote monitoring

Fri Nov 9 11:46:59 EST 2012

----- Original Message -----
> From: "Lars Marowsky-Bree" <lmb at suse.com>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Friday, November 9, 2012 5:25:41 AM
> Subject: Re: [Pacemaker] Enable remote monitoring
> 
> On 2012-11-09T11:04:15, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> > So I was just explaining the problem and context to David... his
> > comment was "aren't these just unmanaged resources and some
> > constraints?".
> 
> They can even be managed - the start would be a "while ! monitor ; sleep 1 ; done"
> fake, and similar for stop. And then you could see the
> services wink in and out too.
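To make the quoted suggestion concrete, here is a minimal sketch of the "fake" managed resource, modeled in Python rather than as a real resource agent. It assumes `monitor` is a hypothetical callable returning OCF-style codes (0 = running, 7 = not running), and it reads "similar for stop" as "wait until monitor reports the service gone"; both are illustrative assumptions, not an existing agent.

```python
import time
from typing import Callable

# OCF-style return codes used by resource agents.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def fake_start(monitor: Callable[[], int], interval: float = 1.0) -> int:
    """Mirror `while ! monitor ; sleep 1 ; done`: block until the
    externally managed service reports running. In a real cluster the
    start operation's timeout is what bounds this loop."""
    while monitor() != OCF_SUCCESS:
        time.sleep(interval)
    return OCF_SUCCESS


def fake_stop(monitor: Callable[[], int], interval: float = 1.0) -> int:
    """Assumed stop counterpart: wait until monitor reports the service
    as not running, without actually stopping anything."""
    while monitor() != OCF_NOT_RUNNING:
        time.sleep(interval)
    return OCF_SUCCESS
```

Nothing here starts or stops the service; the cluster only observes it coming and going, which is the "wink in and out" effect described above.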

This got me thinking... we really shouldn't have to do weird things like that, but I see how it would be useful.

It seems like the concept of an un-managed resource almost fits what we are trying to do, but not quite. Un-managed resources come with some baggage: constraints involving them are kind of messy, and I don't like that the policy engine doesn't treat them the same as everything else.

What if we made something similar to an "un-managed" resource, in that it is only ever monitored, but treated it like a normal resource? That is, start/stop would still execute, but start is really just the first "monitor" operation, and stop just cancels the recurring "monitor".

Having "start" redirect to "monitor" in Pacemaker would take care of the timeout problem you were all talking about with the first failure: just give the start operation a larger timeout. Start would simply verify that monitor passed once, after which the normal monitor timeouts/intervals take over. Stop would always return success and cancel whatever recurring monitors are running.

Maybe we could call this resource primitive option "monitor-only" or something similar.
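A rough model of the proposed "monitor-only" semantics may help pin them down. This is a hypothetical sketch, not Pacemaker code: the class name, the `monitor` callable, `poll_interval`, and the `recurring_active` flag (standing in for scheduling or cancelling the recurring monitor) are all illustrative assumptions. Start succeeds on the first passing monitor within its (larger) timeout; stop always succeeds and cancels the recurring monitor.

```python
import time
from typing import Callable

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


class MonitorOnlyResource:
    """Hypothetical model of a "monitor-only" primitive."""

    def __init__(self, monitor: Callable[[], int], poll_interval: float = 1.0):
        self.monitor = monitor
        self.poll_interval = poll_interval
        # Stand-in for "a recurring monitor operation is scheduled".
        self.recurring_active = False

    def start(self, timeout: float) -> int:
        """Start == the first successful monitor, bounded by the
        start operation's (larger) timeout."""
        deadline = time.monotonic() + timeout
        while True:
            if self.monitor() == OCF_SUCCESS:
                # Hand over to the normal recurring monitor interval/timeout.
                self.recurring_active = True
                return OCF_SUCCESS
            if time.monotonic() >= deadline:
                return OCF_ERR_GENERIC
            time.sleep(self.poll_interval)

    def stop(self) -> int:
        """Stop never touches the service: it only cancels the
        recurring monitor and always reports success."""
        self.recurring_active = False
        return OCF_SUCCESS
```

The design point is that the policy engine sees ordinary start/stop transitions, so constraints and ordering work as they do for any other primitive, while the only real action ever taken against the service is monitoring.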

Now that I think about it, I'm not even sure we need the new container Andrew and I talked about at all if we introduce "monitor-only" resources. We could just have a group where the first member launches the VM, and all the members after it are monitor-only resources that the PE starts/stops like normal resources. If any of the group members fail, I guess we'd need the whole group to be recovered in the right order.
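The recovery idea in the last sentence can be sketched as a simple plan: whichever member fails (the VM or a monitor-only service inside it), stop the whole group in reverse order, then start it again in order. This is a hypothetical illustration of the proposal, not a description of how the policy engine currently handles groups; the function and member names are made up.

```python
from typing import List, Tuple


def recover_group(members: List[str], failed: str) -> List[Tuple[str, str]]:
    """Hypothetical whole-group recovery plan for an ordered group whose
    first member launches the VM and whose remaining members are
    monitor-only resources. Any failure recovers the entire group."""
    if failed not in members:
        raise ValueError(f"{failed!r} is not a member of the group")
    # Tear down in reverse order so dependents go first...
    stops = [("stop", m) for m in reversed(members)]
    # ...then bring everything back in the group's declared order.
    starts = [("start", m) for m in members]
    return stops + starts
```

For example, a failure of a service monitored inside the VM yields stop actions for every member ending with the VM, followed by starts beginning with the VM, so the monitor-only members are only "started" (first successful monitor) once their host is back.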

Anyway, sorry if I missed something obvious here and got this conversation off track. I'm fairly new to the project and plead ignorance :)

-- Vossel