[Pacemaker] Trouble with ordering

Sat Oct 1 16:49:31 EDT 2011

On 01.10.11 04:53, Serge Dubrouski wrote:
>     Technically, I don't want the cluster to control the service in the
>     sense of starting and stopping it. The cluster controls the IP
>     addresses and moves them between nodes. The DNS service resource is
>     supposed to check that the DNS service is working on the node and,
>     if it becomes unresponsive, migrate the service and, most
>     importantly, the IP address.
> 
>     I haven't looked at the concept of clones yet. Maybe I have taken a
>     completely wrong approach to what I am trying to do.
> 
> 
> I think clones are a really good solution for this situation. You can
> still configure BIND as a clone service with different configurations:
> one node will be master, the other slave. You can also have a floating
> VIP tied to any of the nodes but colocated with the running BIND. If
> BIND dies for some reason, Pacemaker will move your IP to the surviving
> node. You can also add sending of additional alarms.

Thanks a lot! Just learned a couple of things.

I removed my own script, installed yours, set it up and configured a
clone:

primitive bind ocf:heartbeat:named ...
clone bind-clone bind

With that, bind is kept running on all nodes and is only shut down if it
fails; if necessary, named is restarted. Great.

Then I colocate my IP resources with the clone:

colocation ns1-ip-bind inf: nsi1-ip bind-clone
colocation ns2-ip-bind inf: nsi2-ip bind-clone

Thus the service IP addresses only run on nodes where bind is active. If
bind fails on a node, the IP address is moved to another node.

Two notes (regarding the latest version on GitHub):

1. You expect rndc and host to be in $PATH, while the path to named can
be configured. For consistency, I think the same should apply to rndc
and host, as they are BIND utilities.

On our CentOS servers we run the latest version of BIND, compiled from
source and installed in a custom path that is added to PATH in
/etc/profile. For some reason, /etc/profile doesn't seem to apply to the
OCF scripts, so the script doesn't find rndc or host unless I extend
PATH manually at the beginning of the script.

2. In the stop function you call "rndc stop" to stop the daemon.
However, if the daemon hangs, rndc hangs too. Pacemaker then runs into
its timeout and kills the OCF script, which results in a failed stop.

I think the OCF script should enforce its own timeout: abort the rndc
call if it takes too long and then try to kill the server.

To test this, send a STOP signal to named and wait...
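Roughly what I have in mind for both notes, as a minimal sketch under some assumptions: the parameter names OCF_RESKEY_rndc and OCF_RESKEY_host and the helper functions resolve_util, is_running, wait_gone and stop_named are hypothetical and not part of your agent; GNU coreutils `timeout` is assumed to be installed; the /proc zombie check is Linux-specific and is simply skipped where /proc is missing. The /etc/profile problem is most likely because resource agents are not started from a login shell, so an explicit parameter seems more robust than relying on PATH.

```shell
#!/bin/sh
# Sketch only. OCF_RESKEY_rndc / OCF_RESKEY_host and all functions below
# are HYPOTHETICAL names, not taken from the named agent.
# Assumes GNU coreutils `timeout` is installed.

# Note 1: configurable utilities, defaulting to a plain $PATH lookup.
: "${OCF_RESKEY_rndc:=rndc}"
: "${OCF_RESKEY_host:=host}"

# resolve_util NAME_OR_PATH: print the executable's path or return 1.
resolve_util() {
    case "$1" in
        /*) [ -f "$1" ] && [ -x "$1" ] && printf '%s\n' "$1" ;;
        *)  command -v "$1" 2>/dev/null ;;
    esac
}

# Note 2: never let a hung named (or a hung rndc) block the stop action.

# is_running PID: true if PID exists and is not already a zombie.
is_running() {
    kill -0 "$1" 2>/dev/null || return 1
    if [ -r "/proc/$1/stat" ]; then
        # The state letter follows "(comm) " in /proc/PID/stat.
        state=$(sed 's/^.*) //' "/proc/$1/stat" 2>/dev/null | cut -d' ' -f1)
        [ "$state" != Z ] || return 1
    fi
    return 0
}

# wait_gone PID SECONDS: poll once a second; true once PID has exited.
wait_gone() {
    n=0
    while [ "$n" -lt "$2" ]; do
        is_running "$1" || return 0
        sleep 1
        n=$((n + 1))
    done
    ! is_running "$1"
}

# stop_named PID GRACE CMD [ARG...]
#   1. run CMD (e.g. "rndc stop"), killed by `timeout` after GRACE seconds;
#   2. if PID survives, send SIGTERM and wait GRACE seconds again;
#   3. as a last resort send SIGKILL, which also ends a SIGSTOPped process.
stop_named() {
    target=$1 grace=$2
    shift 2
    timeout "$grace" "$@" >/dev/null 2>&1
    wait_gone "$target" "$grace" && return 0
    kill -TERM "$target" 2>/dev/null
    wait_gone "$target" "$grace" && return 0
    kill -KILL "$target" 2>/dev/null
    wait_gone "$target" 2
}
```

Inside the agent, a validate step could then do `RNDC=$(resolve_util "$OCF_RESKEY_rndc") || exit $OCF_ERR_INSTALLED`, and the stop action could call `stop_named "$pid" 10 "$RNDC" stop`, with `$pid` read from named's PID file. The worst case is roughly three grace periods plus two seconds, so the grace period has to stay well below the stop operation's timeout in the Pacemaker configuration; otherwise Pacemaker still kills the agent first.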

But otherwise, great script.

Thanks!

Gerald