[Pacemaker] Trouble with ordering

Sun Oct 2 02:31:22 EDT 2011

On 02.10.11 03:18, Serge Dubrouski wrote:
>     1. You expect rndc and host to be in $PATH. At the same time the path to
>     named can be configured. To be consistent, I think the same should apply
>     to rndc and host, as they are bind utilities.
> 
>     On our CentOS servers we run the latest version of bind, compiled from
>     source and installed in a custom path which is added in /etc/profile.
>     For some reason /etc/profile doesn't seem to apply to the ocf scripts,
>     so the script doesn't find rndc or host unless I extend PATH manually
>     at the beginning of the script.
> 
> 
> We had some discussion around this and finally decided to leave it up
> to the sysadmin to make sure that both tools are available in PATH. One
> can always create a couple of symlinks to cover it.

But isn't it inconsistent that you can set the path to named as a
parameter, but not the paths to rndc or host? named, rndc, and host all
come from the same bind installation and they all run on the same host...
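
What I have in mind is roughly the following. This is only a sketch: the
parameter names OCF_RESKEY_rndc and OCF_RESKEY_host and the helpers
have_tool and validate_tools are made up for illustration and are not part
of the current script. The defaults keep today's behaviour (look the tools
up in PATH), so nobody who relies on PATH or symlinks would notice a change.

```shell
# Hypothetical parameters (not in the current agent): default to a PATH lookup.
: "${OCF_RESKEY_rndc:=rndc}"
: "${OCF_RESKEY_host:=host}"

RNDC=$OCF_RESKEY_rndc
HOST=$OCF_RESKEY_host

# Hypothetical helper: succeeds if the tool is an executable path or is in PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical validation step: report a missing tool instead of failing later.
# 5 is the standard OCF return code OCF_ERR_INSTALLED.
validate_tools() {
    for tool in "$RNDC" "$HOST"; do
        if ! have_tool "$tool"; then
            echo "required tool not found: $tool" >&2
            return 5
        fi
    done
    return 0
}
```

With that in place a custom build could be configured per resource, for
example rndc=/opt/bind/sbin/rndc (the path is just an example), without
touching PATH on the node.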

>     2. In the stop function you call "rndc stop" to stop the daemon.
>     However, if the daemon hangs, rndc will hang. Thus pacemaker runs into a
>     timeout and kills the ocf script, leading to a failed stop.
> 
> 
> You didn't read the code carefully again. Yes it does exactly what you
> want or at least it's supposed to:
> 
>     if ! $RNDC stop >/dev/null; then

The problem is that your script never gets beyond this line. rndc tries to
contact named, which is hanging. I don't know exactly what timeout rndc
has, but at least on our CentOS installation it doesn't time out
within 60s.

60s is currently the timeout we have set in the "primitive" declaration.
Thus after 60s pacemaker assumes your script is hanging and kills your
script with TERM.
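
One way to keep the script in control would be to put a deadline on the
rndc call itself. The following is a minimal sketch, not a patch against
your script: run_with_deadline is a made-up helper name, the 10 second
limit is an arbitrary example, and returning 124 on expiry is simply
borrowed from the convention of timeout(1). It polls once per second, so
the deadline is only accurate to about a second.

```shell
# Hypothetical helper: run a command, but give up after a number of seconds.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if the deadline expired.
run_with_deadline() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    elapsed=0
    # Poll once per second while the command is still around.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            # Deadline reached: terminate the command and reap it.
            kill -TERM "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Command finished on its own: pass its exit status through.
    wait "$cmd_pid"
}

# Possible use in the stop function (example limit of 10 seconds):
#   if ! run_with_deadline 10 $RNDC stop >/dev/null; then
#       kill `cat ${OCF_RESKEY_named_pidfile}`
#   fi
```

The point is that a hanging rndc then counts as a failed "rndc stop", so
the script falls through to the kill and the wait loop instead of sitting
there until pacemaker gives up on it.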

As I wrote before: you should be able to test this easily by sending a
STOP signal to the named process. At least in this situation I see that
the "rndc stop" doesn't return before those 60s.
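
In case it helps, this is roughly how I reproduce it. It is a sketch only:
freeze_named and thaw_named are made-up helper names, and the pidfile path
in the comments is an assumption, so use whatever your named_pidfile
parameter points to.

```shell
# Hypothetical helpers: suspend and resume a process by PID.
freeze_named() {
    kill -STOP "$1"
}

thaw_named() {
    kill -CONT "$1"
}

# Intended use on a test node (pidfile path is an assumption, adjust it):
#   pid=`cat /var/run/named/named.pid`
#   freeze_named "$pid"
#   time rndc stop      # here: does not return within 60s
#   thaw_named "$pid"   # let named continue afterwards
```

While named is suspended the process still exists, so the pidfile still
points at a live PID; it just never answers rndc.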

>         kill `cat ${OCF_RESKEY_named_pidfile}`
>     fi
> 
>     if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
>       # Allow 2/3 of the action timeout for the orderly shutdown
>       # (The origin unit is ms, hence the conversion)
>       timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
>     else
>       timeout=20
>     fi
> 
>     while named_status ; do
>         if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then
>             break
>         else
>             sleep 1
>             timeout=$((timeout+1))
>         fi
>     done
> 
>     *#If still up*
> *    if named_status 2>&1; then*
> *        ocf_log err "named is still up! Killing";*
> *        kill -9 `cat ${OCF_RESKEY_named_pidfile}`*
> *    fi*
> 
> 
>     I think the ocf script should have its own timeout and abort the rndc
>     call if it takes too long and then try to kill the server.
> 
> 
> See above.
>  
> 
> 
>     To test send a STOP signal to named and wait...

Gerald