[ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

Nicolas Huillard nicolas at huillard.net
Fri Apr 13 05:53:42 EDT 2018


On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote:
> On 13/04/18 11:07 +0200, Nicolas Huillard wrote:
> > One of my resources is a pppd process, which is started with the
> > heartbeat/anything RA. That RA just spawns the pppd process with the
> > correct parameters and returns OCF_SUCCESS if the process started.
> > The problem is that the service provided by pppd only becomes
> > available after some time (a few seconds to 30 s), i.e. once it has
> > successfully negotiated a connection. At that point, the interface
> > it creates is UP.
> > 
> > The issue here is that other resources that depend on this
> > connection are started by Pacemaker just after it starts pppd, thus
> > before the interface is UP. This creates various problems.
> > 
> > I figured that fixing this would require adding a monitor call
> > inside the start operation, and waiting for a successful monitor
> > before returning OCF_SUCCESS, within the start timeout.
> > 
> > Is this a correct approach?
> > Is there some other standard way to fix this, like a "wait for
> > condition" Resource Agent?
> 
> You could try using the monitor_hook parameter to check the status, 

The issue here is that the monitor will at first return a "fail", which
Pacemaker considers fatal unless the start-failure-is-fatal property is
set to false, and that may come with side effects.
That's what I do now, with a ping RA inserted before the service, which
may fail if the interface is not UP. It works, but it triggers some
"fail" events which are not real failures, just "not started yet".
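
For illustration, the "wait inside start" idea from my original question
could look roughly like the sketch below. It is only a sketch, not code
from heartbeat/anything: wait_until_ready and ppp_is_up are hypothetical
helpers, and the check assumes iproute2's "ip" command and an interface
named ppp0. The point is that start only reports success once the
readiness check passes, so Pacemaker never sees a failed monitor.

```sh
#!/bin/sh
# Sketch only: poll a readiness check from inside "start" until it
# passes or a deadline expires. Not part of heartbeat/anything.

# Return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# wait_until_ready TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND about once per second and returns
# OCF_SUCCESS as soon as it exits 0, or OCF_ERR_GENERIC after roughly
# TIMEOUT_SECONDS failed attempts (the check's own runtime is not counted).
wait_until_ready() {
    timeout_s=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$timeout_s" ]; do
        if "$@" >/dev/null 2>&1; then
            return "$OCF_SUCCESS"
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return "$OCF_ERR_GENERIC"
}

# Hypothetical readiness check: the interface exists and carries the UP
# flag. Assumes iproute2 and defaults to an interface named ppp0.
ppp_is_up() {
    ip link show dev "${1:-ppp0}" 2>/dev/null | grep -q '[<,]UP[,>]'
}
```

In the start action this would be called right after spawning pppd, for
example "wait_until_ready 30 ppp_is_up || return $OCF_ERR_GENERIC". The
wait has to stay below the start timeout configured for the resource,
otherwise Pacemaker aborts the start before the loop gives up.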

> or
> use the Delay agent between the anything resource and the other
> resources.

I'll try this, hoping a sensible delay can be derived from the logs.

Thanks,

-- 
Nicolas Huillard