<br><br><div class="gmail_quote">On Sat, Oct 1, 2011 at 2:49 PM, Gerald Vogt <span dir="ltr"><<a href="mailto:vogt@spamcop.net">vogt@spamcop.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im">On 01.10.11 04:53, Serge Dubrouski wrote:<br>

>     Technically, I don't want the cluster to control the service in the<br>

>     meaning of starting and stopping. The cluster controls the IP addresses<br>

>     and moves them between nodes. The dns service resource is supposed to<br>

>     provide a check that the dns service is working on the node and migrate<br>

>     the service and most important the IP address if it becomes<br>

>     unresponsive.<br>

><br>

>     I didn't look at the concept of clones, yet. Maybe I took a completely<br>

>     wrong approach to what I am trying to do.<br>

><br>

><br>

> I think that clones are a really good solution for this situation. You can<br>

> configure BIND as a clone service with a different configuration on each node.<br>

> One node will be master, another slave. You can also have a floating VIP<br>

> tied to any of the nodes but colocated with the running BIND. If BIND<br>

> dies for some reason, pacemaker will move your IP to the surviving node.<br>

> You can also add sending of additional alarms.<br>

<br>

</div>Thanks a lot! Just learned a couple of things.<br></blockquote><div><br></div><div>I'm glad it helped.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<br>

I have removed my own script. Installed yours and set it up. Configured<br>

a clone.<br>

<br>

primitive bind ocf:heartbeat:named ...<br>

clone bind-clone bind<br>

<br>

Then bind is kept running on all nodes and is only shut down if it fails.<br>

If necessary named is restarted. Great.<br>

<br>

Then I colocate my ip resources with the clone:<br>

<br>

colocation ns1-ip-bind inf: nsi1-ip bind-clone<br>

colocation ns2-ip-bind inf: nsi2-ip bind-clone<br>

<br>

Thus the service IP addresses only run on nodes where bind is active. If<br>

bind fails on a node the ip address is moved.<br>

<br>

Two notes (regarding the latest version on github):<br>

<br>

1. You expect rndc and host to be in $PATH. At the same time the path to<br>

named can be configured. For consistency, I think the same should apply to<br>

rndc and host as they are bind utils.<br>

<br>

On our CentOS servers we run the latest version of bind, compiled from<br>

source and installed in a custom path which is added in /etc/profile.<br>

For some reason /etc/profile doesn't seem to apply to the ocf scripts,<br>

so the script doesn't find rndc or host unless I extend PATH manually<br>

at the beginning of the script.<br></blockquote><div><br></div><div>We had some discussion around this and finally decided to leave it up to the sysadmin to make sure that both tools are available in PATH. One can always create a couple of symlinks to cover it.</div>
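<div><br></div><div>For anyone who prefers not to touch PATH or create symlinks, here is a minimal sketch of how a wrapper could look up the tools. It is not part of the agent: find_tool is a hypothetical helper, and both the override argument and the extra directories are assumptions for illustration. /etc/profile is normally only read by login shells, which would explain why it has no effect on the ocf scripts.</div>

```shell
#!/bin/sh
# Hypothetical helper, not part of the named resource agent.
# Usage: find_tool NAME OVERRIDE [DIR...]
# Prints the path of the first usable candidate and returns 0,
# or prints nothing and returns 1 if none is found.
find_tool() {
    name=$1
    override=$2
    shift 2

    # 1. An explicitly configured path wins if it is executable.
    if [ -n "$override" ] && [ -x "$override" ]; then
        printf '%s\n' "$override"
        return 0
    fi

    # 2. Otherwise use whatever the current PATH provides.
    if found=$(command -v "$name" 2>/dev/null); then
        printf '%s\n' "$found"
        return 0
    fi

    # 3. Finally, look in the extra directories passed by the caller.
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done

    return 1
}
```

<div>A call could then look like <code>RNDC=$(find_tool rndc "" /usr/local/bind/sbin)</code>, where the directory is only an example of a custom install prefix.</div>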

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

2. In the stop function you call "rndc stop" to stop the daemon.<br>

However, if the daemon hangs, rndc will hang. Thus pacemaker runs into a<br>

timeout and kills the ocf script, leading to a failed stop.<br></blockquote><div><br></div><div>You didn't read the code carefully again. Yes, it does exactly what you want, or at least it's supposed to:</div><div>

<br></div><div><div>    if ! $RNDC stop >/dev/null; then</div><div>        kill `cat ${OCF_RESKEY_named_pidfile}`</div><div>    fi</div></div><div><br></div><div><div>    if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then</div>

<div>      # Allow 2/3 of the action timeout for the orderly shutdown</div><div>      # (The origin unit is ms, hence the conversion)</div><div>      timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))</div><div>    else</div><div>

      timeout=20</div><div>    fi</div><div><br></div><div>    while named_status ; do</div><div>        if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then</div><div>            break</div><div>        else</div><div>

            sleep 1</div><div>            timeout=$((timeout + 1))</div><div>        fi</div><div>    done</div><div><br></div><div>    <b><font class="Apple-style-span" color="#FF0000">#If still up</font></b></div><div><b><font class="Apple-style-span" color="#FF0000">    if named_status 2>&1; then</font></b></div>

<div><b><font class="Apple-style-span" color="#FF0000">        ocf_log err "named is still up! Killing";</font></b></div><div><b><font class="Apple-style-span" color="#FF0000">        kill -9 `cat ${OCF_RESKEY_named_pidfile}`</font></b></div>

<div><b><font class="Apple-style-span" color="#FF0000">    fi</font></b></div></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

I think the ocf script should have its own timeout and abort the rndc<br>

call if it takes too long and then try to kill the server.<br></blockquote><div><br></div><div>See above.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<br>

To test, send a STOP signal to named and wait...<br>

<br>

<br>

But otherwise, great script.<br>

<div><div></div><div class="h5"><br>

Thanks!<br>

<br>

Gerald<br>

<br>


_______________________________________________<br>

Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

<br>

Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

Bugs: <a href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target="_blank">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a><br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Serge Dubrouski.<br>
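<br><div>P.S. On note 2 above: if rndc itself blocks on a hung named, one option would be to bound the rndc call with its own timeout before falling back to kill. The following is only a sketch of that idea, not what the agent currently does. run_with_timeout is a hypothetical helper name, and returning 124 on timeout is just a convention borrowed from the coreutils timeout command.</div>

```shell
#!/bin/sh
# Hypothetical helper, not part of the named resource agent.
# Usage: run_with_timeout SECONDS CMD [ARGS...]
# Runs CMD in the background and polls once per second.
# Returns CMD's exit status, or 124 if CMD had to be killed.
run_with_timeout() {
    limit=$1
    shift

    "$@" &
    cmd_pid=$!
    waited=0

    # kill -0 only tests whether the process still exists.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # The command overran its budget: kill it and reap it.
            kill -9 "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # The command finished on its own; hand back its exit status.
    wait "$cmd_pid"
}
```

<div>In the stop function this could be used as <code>run_with_timeout 10 $RNDC stop >/dev/null</code> in place of the bare rndc call, so that a blocked rndc is abandoned after roughly 10 seconds (an arbitrary figure) and the existing kill logic still gets its turn.</div>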