<br><br><div class="gmail_quote">On Sat, Oct 1, 2011 at 2:49 PM, Gerald Vogt <span dir="ltr"><<a href="mailto:vogt@spamcop.net">vogt@spamcop.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On 01.10.11 04:53, Serge Dubrouski wrote:<br>
> Technically, I don't want the cluster to control the service in the<br>
> meaning of starting and stopping. The cluster controls the IP addresses<br>
> and moves them between nodes. The dns service resource is supposed to<br>
> provide a check that the dns service is working on the node and migrate<br>
> the service and most important the IP address if it becomes<br>
> unresponsive.<br>
><br>
> I didn't look at the concept of clones, yet. Maybe I took a completely<br>
> wrong approach to what I am trying to do.<br>
><br>
><br>
> I think that clones are a really good solution for this situation. You can<br>
> configure BIND as a clone service with different configuration though.<br>
> One node will be master, another slave. You can also have a floating VIP<br>
> tied to any of the nodes but colocated with the running BIND. If BIND<br>
> dies for some reason, pacemaker will move your IP to the surviving node.<br>
> You can also add sending of additional alarms.<br>
<br>
</div>Thanks a lot! Just learned a couple of things.<br></blockquote><div><br></div><div>I'm glad it helped.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
I have removed my own script. Installed yours and set it up. Configured<br>
a clone.<br>
<br>
primitive bind ocf:heartbeat:named ...<br>
clone bind-clone bind<br>
<br>
Then bind is kept running on all nodes and is only shut down if it fails.<br>
If necessary, named is restarted. Great.<br>
<br>
Then I colocate my ip resources with the clone:<br>
<br>
colocation ns1-ip-bind inf: nsi1-ip bind-clone<br>
colocation ns2-ip-bind inf: nsi2-ip bind-clone<br>
<br>
Thus the service IP addresses only run on nodes where bind is active. If<br>
bind fails on a node the ip address is moved.<br>
<br>
Two notes (regarding the latest version on github):<br>
<br>
1. You expect rndc and host to be in $PATH, while the path to<br>
named can be configured. For consistency, I think the same should apply to<br>
rndc and host, as they are bind utils.<br>
<br>
On our CentOS servers we run the latest version of bind, compiled from<br>
source and installed in a custom path which is added in /etc/profile.<br>
For some reason /etc/profile doesn't seem to apply to the ocf scripts<br>
thus the script doesn't find rndc or host unless I extend PATH manually<br>
at the beginning of the script.<br></blockquote><div><br></div><div>We had some discussion around this and finally decided to leave it up to the sysadmin to make sure that both tools are available in PATH. One can always create a couple of symlinks to cover it.</div>
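<div><br></div><div>As a rough sketch of the symlink approach: the agent is started by the cluster daemons rather than from a login shell, so /etc/profile is never read. The helper name link_bind_tools and the example paths mentioned below are made up for illustration and are not part of the agent; adjust them to wherever your build installs rndc and host.</div>
<pre>
```shell
#!/bin/sh
# Hypothetical helper (not part of the named agent): link the BIND
# utilities from a custom prefix into a directory that is already in
# the PATH seen by the resource agent.
# Usage: link_bind_tools SRC_DIR DEST_DIR
link_bind_tools() {
    src_dir=$1
    dest_dir=$2
    for tool in rndc host; do
        if [ -x "$src_dir/$tool" ]; then
            # -s: symbolic link, -f: replace an existing link
            ln -sf "$src_dir/$tool" "$dest_dir/$tool"
        else
            echo "missing or not executable: $src_dir/$tool" >&2
            return 1
        fi
    done
}
```
</pre>
<div>For example, link_bind_tools /opt/bind/bin /usr/local/bin, assuming BIND was installed under /opt/bind and /usr/local/bin is in the PATH the agent sees.</div>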
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
2. In the stop function you call "rndc stop" to stop the daemon.<br>
However, if the daemon hangs, rndc will hang. Thus pacemaker runs into a<br>
timeout and kills the ocf script, leading to a failed stop.<br></blockquote><div><br></div><div>You didn't read the code carefully again. Yes, it does exactly what you want, or at least it's supposed to:</div><div>
<br></div><div><div> if ! $RNDC stop >/dev/null; then</div><div> kill `cat ${OCF_RESKEY_named_pidfile}`</div><div> fi</div></div><div><br></div><div><div> if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then</div>
<div> # Allow 2/3 of the action timeout for the orderly shutdown</div><div> # (The origin unit is ms, hence the conversion)</div><div> timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))</div><div> else</div><div>
timeout=20</div><div> fi</div><div><br></div><div> while named_status ; do</div><div> if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then</div><div> break</div><div> else</div><div>
sleep 1</div><div> timeout=$((timeout + 1))</div><div> fi</div><div> done</div><div><br></div><div> <b><font class="Apple-style-span" color="#FF0000">#If still up</font></b></div><div><b><font class="Apple-style-span" color="#FF0000"> if named_status 2>&1; then</font></b></div>
<div><b><font class="Apple-style-span" color="#FF0000"> ocf_log err "named is still up! Killing";</font></b></div><div><b><font class="Apple-style-span" color="#FF0000"> kill -9 `cat ${OCF_RESKEY_named_pidfile}`</font></b></div>
<div><b><font class="Apple-style-span" color="#FF0000"> fi</font></b></div></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
I think the ocf script should have its own timeout and abort the rndc<br>
call if it takes too long and then try to kill the server.<br></blockquote><div><br></div><div>See above.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
To test send a STOP signal to named and wait...<br>
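<br>
A rough sketch of the kind of watchdog I have in mind around the rndc call. run_with_timeout is a hypothetical helper, not something the agent currently has, and the 10 second limit in the usage comment is only an assumed value:<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: run a command, but give up after LIMIT seconds.
# Returns the command's exit status, or 124 if it had to be killed.
# Usage: run_with_timeout LIMIT COMMAND [ARGS...]
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    waited=0
    # kill -0 sends no signal; it only checks that the process exists
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill -9 "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished on its own; pass its exit status through
    wait "$cmd_pid"
}

# In the stop function this might be used roughly like:
#   run_with_timeout 10 $RNDC stop || kill `cat ${OCF_RESKEY_named_pidfile}`
```
</pre>
To reproduce the hang, something like kill -STOP `cat ${OCF_RESKEY_named_pidfile}` on the node should do, with the pid file path filled in by hand.<br>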
<br>
<br>
But otherwise, great script.<br>
<div><div></div><div class="h5"><br>
Thanks!<br>
<br>
Gerald<br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Serge Dubrouski.<br>