[Pacemaker] IPaddr2 cloned address doesn't survive node standby

>> primitive p_ip_service_ns ocf:heartbeat:IPaddr2 \
>>    params ip="" cidr_netmask="24" nic="eth0" \
>>      clusterip_hash="sourceip-sourceport"
> netmask should be 32 if that's supposed to be a single IP load balanced.

I've been wondering about that, but I think 24 is correct. The address
is recognized as "secondary" by Linux, as can be seen in this "ip addr"
output:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet brd scope global eth0
    inet brd scope global secondary eth0

Setting it this way has been working fine for a long time now. *shrug*

> Don't you need colocation also between the clones so that bind can only start on a node that has already started an ip instance?

I thought since clones are started on all nodes anyway that a simple
"order" directive would suffice. But I've added a colocation constraint
as well, to be sure. Thanks for the hint.

> For the number of restarts it's likely because of the interleaving settings.  True for both would likely help that but wouldn't work in your case - more here: http://www.hastexo.com/resources/hints-and-kinks/interleaving-pacemaker-clones

Yes, there doesn't seem to be a way to interleave these cloned resources
in a way that avoids restarting Bind on such cluster state changes.

> When you put dns01 in standby does dns02 have both instances of the IP there?
> If not it should be (you are just load balancing a single IP correct?).  You need clone-node-max=2 for the ip clone.

clone-node-max was always set to "2", yes.

> If so one just doesn't move back to dns01 when you bring it out of standby?  I would look at resource stickiness=0 for the ip clone resource only so the cluster will redistribute when the node comes out of standby (I think that would work).  Clones have a default stickiness of 1 if you don't have a default set for the cluster.

Bingo, the resource stickiness was the problem! I've set it to 0 and now
the IP resource gets started again when the node comes back online.
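
For the archives, a minimal sketch of the per-clone variant suggested
above. This assumes the stickiness is set as a meta attribute on the IP
clone only; setting it through rsc_defaults instead would apply it to
every resource in the cluster:

clone cl_ip_service_ns p_ip_service_ns \
        meta globally-unique="true" clone-max="2" clone-node-max="2" \
        interleave="false" resource-stickiness="0" target-role="Started"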

Thanks a lot, I would not have thought of that. As stated above,
shouldn't cloned resources be (re-)started on all nodes by definition?

> And/or you can write location constraints for the clone instances of ip to prefer one node over the other causing them to fail back if the node returns i.e. location ip0_prefers_dns01 cl_ip_service_ns:0 200: dns01 and location ip1_prefers_dns02 cl_ip_service_ns:1 200: dns02

That doesn't seem necessary, now with resource-stickiness="0".

Thanks again!


PS: Here's the complete configuration for the archives, in case someone
might be interested in the future:

node dns01
node dns02
primitive p_bind9 lsb:bind9 \
        op monitor interval="10s" timeout="15s" \
        op start interval="0" timeout="15s" \
        op stop interval="0" timeout="15s" \
        meta target-role="Started"
primitive p_ip_service_ns ocf:heartbeat:IPaddr2 \
        params ip="" cidr_netmask="24" nic="eth0" \
        clusterip_hash="sourceip-sourceport" \
        op monitor interval="10s" \
        op start interval="0" timeout="20s" \
        op stop interval="0" timeout="20s"
clone cl_bind9 p_bind9 \
        meta globally-unique="false" clone-max="2" clone-node-max="1" \
        interleave="false" target-role="Started"
clone cl_ip_service_ns p_ip_service_ns \
        meta globally-unique="true" clone-max="2" clone-node-max="2" \
        interleave="false" target-role="Started"
colocation co_ip_before_bind9 inf: cl_ip_service_ns cl_bind9
order o_ip_before_bind9 inf: cl_ip_service_ns cl_bind9
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="no" \
rsc_defaults $id="rsc-options" \

