[Pacemaker] Trouble with ordering

Serge Dubrouski sergeyfd at gmail.com
Sat Oct 1 21:18:09 EDT 2011


On Sat, Oct 1, 2011 at 2:49 PM, Gerald Vogt <vogt at spamcop.net> wrote:

> On 01.10.11 04:53, Serge Dubrouski wrote:
> >     Technically, I don't want the cluster to control the service in the
> >     meaning of starting and stopping. The cluster controls the IP addresses
> >     and moves them between nodes. The DNS service resource is supposed to
> >     provide a check that the DNS service is working on the node and migrate
> >     the service, and most importantly the IP address, if it becomes
> >     unresponsive.
> >
> >     I didn't look at the concept of clones yet. Maybe I took a completely
> >     wrong approach to what I am trying to do.
> >
> >
> > I think that clones are a really good solution for this situation. You can
> > configure BIND as a clone service with different configuration though.
> > One node will be master, another slave. You can also have a floating VIP
> > tied to any of the nodes but colocated with the running BIND. If BIND
> > dies for some reason, pacemaker will move your IP to the surviving node.
> > You can also add sending of additional alarms.
>
> Thanks a lot! Just learned a couple of things.
>

I'm glad it helped.


>
> I have removed my own script. Installed yours and set it up. Configured
> a clone.
>
> primitive bind ocf:heartbeat:named ...
> clone bind-clone bind
>
> Then bind is kept running on all nodes and is only shut down if it fails.
> If necessary, named is restarted. Great.
>
> Then I colocate my ip resources with the clone:
>
> colocation ns1-ip-bind inf: nsi1-ip bind-clone
> colocation ns2-ip-bind inf: nsi2-ip bind-clone
>
> Thus the service IP addresses only run on nodes where bind is active. If
> bind fails on a node the ip address is moved.
>
> Two notes (regarding the latest version on github):
>
> 1. You expect rndc and host to be in $PATH. At the same time, the path to
> named can be configured. For consistency, I think the same should apply to
> rndc and host, as they are BIND utilities.
>
> On our CentOS servers we run the latest version of bind, compiled from
> source and installed in a custom path which is added in /etc/profile.
> For some reason /etc/profile doesn't seem to apply to the OCF scripts, so
> the script doesn't find rndc or host unless I extend PATH manually
> at the beginning of the script.
>

We had some discussion around this and finally decided to leave it up to
the sysadmin to make sure that both tools are available in PATH. One
can always create a couple of symlinks to cover it.
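A minimal sketch of the two workarounds discussed here, assuming a hypothetical install prefix `/usr/local/bind` (overridable through the likewise hypothetical `BIND_PREFIX` variable). `/etc/profile` is read by login shells, and Pacemaker presumably does not start agents that way, which would explain Gerald's observation. The sketch therefore extends `PATH` explicitly and then fails early if `rndc` or `host` still cannot be found. `check_bind_tools` is an illustrative name, not a function of the named agent. The symlink route amounts to something like `ln -s /usr/local/bind/sbin/rndc /usr/sbin/rndc` for each tool.

```shell
#!/bin/bash
# Hypothetical install prefix for a source-built BIND; adjust to your layout.
BIND_PREFIX=${BIND_PREFIX:-/usr/local/bind}

# Assumption: the agent is not run through a login shell, so /etc/profile
# is not applied. Put the custom directories on PATH explicitly.
PATH="$BIND_PREFIX/sbin:$BIND_PREFIX/bin:$PATH"
export PATH

# Hypothetical helper: report every required BIND utility missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_bind_tools() {
    local tool missing=0
    for tool in rndc host; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "required tool '$tool' not found in PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In a real OCF agent, a missing binary would normally be reported with `$OCF_ERR_INSTALLED` rather than a plain 1.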


>
> 2. In the stop function you call "rndc stop" to stop the daemon.
> However, if the daemon hangs, rndc will hang. Thus pacemaker runs into a
> timeout and kills the ocf script, leading to a failed stop.
>

You didn't read the code carefully again. Yes, it does exactly what you want,
or at least it's supposed to:

    # Ask named to shut down cleanly; fall back to SIGTERM if rndc fails
    if ! $RNDC stop >/dev/null; then
        kill "$(cat "${OCF_RESKEY_named_pidfile}")"
    fi

    if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
      # Allow 2/3 of the action timeout for the orderly shutdown
      # (the original unit is ms, hence the conversion;
      # e.g. 20000 ms / 1500 gives 13 s)
      timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
    else
      timeout=20
    fi

    # Poll once per second while named is still running
    while named_status ; do
        if [ "$timeout" -ge "${OCF_RESKEY_named_stop_timeout}" ]; then
            break
        else
            sleep 1
            # Plain +1: $((timeout++)) would assign the old value back
            timeout=$((timeout + 1))
        fi
    done

    # If still up, force it down
    if named_status 2>&1; then
        ocf_log err "named is still up! Killing"
        kill -9 "$(cat "${OCF_RESKEY_named_pidfile}")"
    fi
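A note on the counter in wait loops like this one: in bash, `x=$((x++))` does not advance `x`. The post-increment yields the old value, and that old value is then assigned back, so a loop that relies on such a counter to break out never does and runs until Pacemaker's own action timeout fires. The snippet below only demonstrates the arithmetic; the variable names are illustrative.

```shell
#!/bin/bash
# Broken: n++ yields the old value (5), which is then assigned back to n,
# discarding the increment.
n=5
n=$((n++))
broken=$n        # still 5

# Working alternatives:
n=5
n=$((n + 1))
plus_one=$n      # 6

n=5
((n++))          # arithmetic command, no reassignment
arith_cmd=$n     # 6

echo "broken=$broken plus_one=$plus_one arith_cmd=$arith_cmd"
# prints: broken=5 plus_one=6 arith_cmd=6
```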


> I think the ocf script should have its own timeout and abort the rndc
> call if it takes too long and then try to kill the server.
>

See above.
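A minimal sketch of the agent-side timeout Gerald proposes, assuming GNU coreutils `timeout` is available on the nodes (the agent excerpt above does not use it). `stop_with_fallback` and its arguments are hypothetical names; in an agent, the polite command would be `rndc stop` and the PID would come from the pidfile. Bounding the polite request matters because, per Gerald's report, a hung named can make `rndc stop` hang as well. Also, SIGTERM sent to a SIGSTOPped process stays pending until the process is continued, so only SIGKILL reliably ends it.

```shell
#!/bin/bash
# stop_with_fallback PID STOP_LIMIT GRACE CMD [ARGS...]
#
# Hypothetical helper, not part of the named agent. Runs CMD (e.g. "rndc stop")
# as the polite stop request but gives up on it after STOP_LIMIT seconds via
# GNU coreutils `timeout`. If CMD fails or times out, PID receives SIGTERM.
# PID then gets GRACE seconds to exit before SIGKILL, which also ends a
# process stopped with SIGSTOP. Returns 0 if PID is gone afterwards.
stop_with_fallback() {
    local pid=$1 stop_limit=$2 grace=$3
    shift 3

    # A hung daemon can make its control client hang too; bound it.
    if ! timeout "$stop_limit" "$@" >/dev/null 2>&1; then
        kill -TERM "$pid" 2>/dev/null
    fi

    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            kill -KILL "$pid" 2>/dev/null
            sleep 1    # brief pause so the killed process can be reaped
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done

    ! kill -0 "$pid" 2>/dev/null
}
```

Usage in an agent might look like `stop_with_fallback "$(cat "$OCF_RESKEY_named_pidfile")" 5 10 rndc stop`, with STOP_LIMIT + GRACE + 1 chosen to fit inside the configured stop timeout.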


>
> To test, send a STOP signal to named and wait...
>
>
> But otherwise, great script.
>
> Thanks!
>
> Gerald



-- 
Serge Dubrouski.


More information about the Pacemaker mailing list