[Pacemaker] Trouble with ordering

Serge Dubrouski sergeyfd at gmail.com
Mon Oct 3 01:47:26 UTC 2011

On Sun, Oct 2, 2011 at 12:31 AM, Gerald Vogt <vogt at spamcop.net> wrote:

> On 02.10.11 03:18, Serge Dubrouski wrote:
> >     1. You expect rndc and host to be in $PATH. At the same time, the
> >     path to named can be configured. To be consistent, the same should
> >     apply to rndc and host, as they are bind utils.
> >
> >     On our CentOS servers we run the latest version of bind, compiled
> >     from source and installed in a custom path which is added in
> >     /etc/profile. For some reason /etc/profile doesn't seem to apply to
> >     the ocf scripts, so the script doesn't find rndc or host unless I
> >     extend PATH manually at the beginning of the script.
> >
> >
> > We had some discussion around this and finally decided to leave it up
> > to the sysadmin to make sure that both tools are available in PATH. One
> > can always create a couple of symlinks to cover it.
> But isn't it inconsistent that you can set the named path as a parameter
> but not rndc or host? named, rndc, and host all come out of a bind
> installation and they all run on the same host...
> >     2. In the stop function you call "rndc stop" to stop the daemon.
> >     However, if the daemon hangs, rndc will hang. Thus pacemaker runs
> >     into a timeout and kills the ocf script, leading to a failed stop.
> >
> >
> > You didn't read the code carefully again. Yes it does exactly what you
> > want or at least it's supposed to:
> >
> >     if ! $RNDC stop >/dev/null; then
> The problem is that your script never gets beyond this line. rndc tries
> to contact named, which is hanging. I don't know exactly what timeout
> rndc has, but at least on our CentOS installation it doesn't time out
> within 60s.
> 60s is currently the timeout we have set in the "primitive" declaration.
> Thus after 60s pacemaker assumes your script is hanging and kills your
> script with TERM.
> As I wrote before: you should be able to test this easily by sending a
> STOP signal to the named process. At least in this situation I see that
> the "rndc stop" doesn't return before those 60s.

Indeed you are right. Thanks for catching that. Attached is a patch that
fixes this issue. It also makes the rndc and host commands configurable.
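
For illustration only, here is a minimal sketch of how a stop action could
bound the time it spends in "rndc stop". It is not the content of the
attached patch; run_with_deadline is a hypothetical helper name, and the
exit code 124 for "had to be killed" is just a convention I picked.

```shell
# Hypothetical helper (an assumption, not part of the named agent):
# run a command, but kill it if it has not finished after N seconds.
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_deadline() {
    deadline=$1; shift
    "$@" &                      # start the command in the background
    cmd_pid=$!
    waited=0
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            # Deadline reached: kill the command and reap it.
            kill -KILL "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$cmd_pid"             # collect the real exit status
}
```

In the stop function this could be used along the lines of
"run_with_deadline 10 $RNDC stop >/dev/null", falling through to the
existing kill logic on any non-zero status. The 10 second value is
arbitrary. To check the behaviour, send a STOP signal to named as Gerald
suggested and confirm that the stop action still completes within the
timeout configured for the primitive.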

Please take a look at the patch, and if it looks all right I'll ask the
Pacemaker team to push it into git.

Thanks again.
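
As a rough sketch of what "configurable" can look like in an OCF script
(the parameter names OCF_RESKEY_rndc and OCF_RESKEY_host and the helper
resolve_tool are assumptions for illustration, and may differ from what
the patch actually uses):

```shell
# Assumed parameter names; default to the bare command name so that
# existing setups relying on PATH keep working.
: "${OCF_RESKEY_rndc=rndc}"
: "${OCF_RESKEY_host=host}"

# Hypothetical helper: accept either a bare name (looked up in PATH)
# or an absolute path, and print the resolved location.
# Returns non-zero if the tool cannot be found or is not executable.
resolve_tool() {
    command -v "$1" 2>/dev/null
}
```

A validate step could then call "resolve_tool "$OCF_RESKEY_rndc"" and
report a clear error when it fails, so a bind installation in a custom
prefix only needs the two parameters set instead of an edited PATH.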

> >         kill `cat ${OCF_RESKEY_named_pidfile}`
> >     fi
> >
> >     if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
> >       # Allow 2/3 of the action timeout for the orderly shutdown
> >       # (The origin unit is ms, hence the conversion)
> >       timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
> >     else
> >       timeout=20
> >     fi
> >
> >     while named_status ; do
> >         if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then
> >             break
> >         else
> >             sleep 1
> >             timeout=$((timeout+1))
> >         fi
> >     done
> >
> >     # If still up
> >     if named_status 2>&1; then
> >         ocf_log err "named is still up! Killing";
> >         kill -9 `cat ${OCF_RESKEY_named_pidfile}`
> >     fi
> >
> >
> >     I think the ocf script should have its own timeout and abort the rndc
> >     call if it takes too long and then try to kill the server.
> >
> >
> > See above.
> >
> >
> >
> >     To test send a STOP signal to named and wait...
> Gerald

Serge Dubrouski.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: named.patch
Type: text/x-patch
Size: 4231 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20111002/1cc66069/attachment-0004.bin>
