[Pacemaker] Preventing auto-fail-back

Mon May 16 18:33:49 EDT 2011

For the life of me, I cannot prevent auto-failback from occurring in a master-slave setup I have in virtual machines. I have a very simple configuration:

node $id="4fe75075-333c-4614-8a8a-87149c7c9fbb" ha2 \
        attributes standby="off"
node $id="70718968-41b5-4aee-ace1-431b5b65fd52" ha1 \
        attributes standby="off"
primitive FAILOVER-IP ocf:heartbeat:IPaddr \
        params ip="192.168.1.79" \
        op monitor interval="10s"
primitive PGPOOL lsb:pgpool2 \
        op monitor interval="10s"
group PGPOOL-AND-IP FAILOVER-IP PGPOOL
colocation IP-WITH-PGPOOL inf: FAILOVER-IP PGPOOL
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"

No matter what I do with resource stickiness, I cannot prevent failback. I usually don't see failback when I restart the current master. But when I disable network connectivity to the master, everything fails over fine; then, when I re-enable the network adapter, everything jumps back to the original "failed" node. I've been watching the scores with "watch ptest -Ls", and they seem to indicate that failback should not occur.

I'm also seeing resources bounce more often than necessary (about 3 times each) when a node is added, and resources seem to bounce when a node returns to the cluster even when it isn't necessary for them to do so. I also had an order directive in my configuration at one time, and often the second resource would start, then stop, then allow the first resource to start, then start itself. Quite weird.

Any nods in the right direction would be greatly appreciated. I've scoured Google and read the official documentation to no avail. I suppose I should mention I am using heartbeat as well. My LSB resource implements start/stop/status properly without error.

I've also been testing this with a floating IP + Postgres, with the same issues. One thing I notice is that my "group" resources have no score. Is this normal? There doesn't seem to be any way to assign a stickiness to a group, and the default stickiness has no effect.
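
On the group stickiness question: crm shell syntax accepts a meta attribute list on a group definition, so a stickiness can be written on the group itself. A minimal sketch, assuming the group from the configuration above and reusing the value 1000 from rsc_defaults purely for illustration. Whether this changes the failback behaviour described here has not been verified.

```
# Illustrative only: the value 1000 is an assumption copied from rsc_defaults
group PGPOOL-AND-IP FAILOVER-IP PGPOOL \
        meta resource-stickiness="1000"
```

A group already implies colocation and ordering of its members, so the separate IP-WITH-PGPOOL colocation should be redundant alongside it.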

Thanks!

Daniel Bozeman