[Pacemaker] unbound resource agent

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Mar 14 12:52:21 EDT 2012


On Wed, Mar 14, 2012 at 02:48:11PM +0100, Benjamin Kiessling wrote:
> Hi,
> 
> On 2012.03.14 14:24:10 +0100, Dejan Muhamedagic wrote:
> > > dnsCache_start_0 (node=router1, call=56, rc=-2, status=Timed Out): unknown exec error
> > > dnsCache_monitor_1000 (node=router2, call=24, rc=1, status=complete): unknown error

This one (the monitor on router2) exited with a generic error
(rc=1). I didn't notice that before. The RA should've logged the
reason.

> > > dnsCache_start_0 (node=router2, call=81, rc=-2, status=Timed Out): unknown exec error
> > 
> > These operations timed out, i.e. didn't finish in the given time
> > frame which is by default 20 seconds.
> 
> It says the return code is -2 which isn't a return code specified in the
> OCF standard. unbound usually starts fast and I can't see anything in
> the logs indicating an error during initialization.

Negative exit codes are special and cannot be produced by a
script. Hmm, I've always thought that "Timed Out" in that
message above is unequivocal.
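
To make the distinction concrete, here is a rough sketch of how
one might read the rc values in the output above. The function
name describe_rc is made up for illustration; the non-negative
codes are the usual OCF ones, and anything negative is treated as
coming from the cluster rather than from the RA:

```shell
#!/bin/sh
# describe_rc is a hypothetical helper, not part of Pacemaker or
# the resource-agents package.
describe_rc() {
    rc=$1
    if [ "$rc" -lt 0 ]; then
        # A script cannot exit with a negative status, so the
        # value was set by the cluster (e.g. on a timeout).
        echo "not from the RA (set by the cluster, e.g. on timeout)"
        return 0
    fi
    case "$rc" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "other RA exit code" ;;
    esac
}
```

So "describe_rc -2" reports a value that did not come from the
script, while "describe_rc 1" maps to the generic error seen for
the monitor operation.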

> > > primitive dnsCache ocf:heartbeat:unbound \
> > > 	op monitor interval="1s" timeout="10s" start-delay="10s" \
> > 
> > This is very aggressive and the timeout is too short.
> 
> I'd like to keep the monitor interval as short as possible. As the check
> simply resolves localhost I see no reason not to run it as often as
> possible.

Of course, what you spend the CPU cycles on is entirely up to
you. It's just a fairly atypical interval.

> The timeout should be plenty as there should be no
> considerable delay in resolving localhost. I can reset the interval to
> the defaults but I don't see this being the cause for this weird
> behavior (correct me if I'm wrong).

Well, let's just say that 10s may occasionally not be enough,
for instance when the node is under load. If you're willing to
risk an unnecessary failover, keep it as short as you like. There
are a few texts on timeouts:

http://www.advogato.org/person/lmb/diary/108.html
http://www.linux-ha.org/wiki/File:Linuxtag-09-ha-paper.pdf
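
For comparison, a more relaxed definition could look roughly like
the sketch below, assuming the crm shell is used. The numbers are
illustrative assumptions only, not values recommended anywhere in
this thread; pick them from what the service needs in the worst
case:

```shell
# Illustrative values only; tune them to the service.
crm configure primitive dnsCache ocf:heartbeat:unbound \
	op start interval="0" timeout="60s" \
	op stop interval="0" timeout="60s" \
	op monitor interval="10s" timeout="30s"
```

Setting the start timeout explicitly avoids relying on the 20
second default mentioned above.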

> > start-delay is of no benefit if the start action makes sure that
> > the instance is operational (which it should do).
> 
> The resource agent executes the monitor function in the start routine
> until it returns successfully. 

That's good.
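
For readers of the archive, that pattern can be sketched as
follows. This is not Ben's RA: the demo_* functions and
DEMO_READY_FILE are hypothetical stand-ins, and the "daemon" is
just a background job that creates a file after a second:

```shell
#!/bin/sh
# Sketch only; demo_* and DEMO_READY_FILE are hypothetical names.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

DEMO_READY_FILE="${DEMO_READY_FILE:-/tmp/demo_ra_ready.$$}"

# Stand-in for a real health check such as resolving localhost.
demo_monitor() {
    if [ -e "$DEMO_READY_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

demo_start() {
    # Start is idempotent: nothing to do if already running.
    demo_monitor && return $OCF_SUCCESS

    # Stand-in for launching the daemon; it becomes "ready"
    # about one second later.
    ( sleep 1; : > "$DEMO_READY_FILE" ) &

    # Poll until monitor succeeds. There is no deadline here:
    # the cluster enforces the start timeout from outside.
    until demo_monitor; do
        sleep 1
    done
    return $OCF_SUCCESS
}
```

Because start only returns once monitor succeeds, the first
recurring monitor can run right away and start-delay buys
nothing.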

> > There's an excellent RA implementation guide at linux-ha.org (NB:
> > didn't take a look at your RA).
> 
> I read it and decided to use the named resource agent as a base as it is
> conceptually very similar to what a resource agent for unbound or any
> other recursive dns server should do (and because I am lazy).

Cool.

Thanks,

Dejan

> Regards,
> Ben


