[Pacemaker] Preventing auto-fail-back

Dan Frincu df.cluster at gmail.com
Wed May 18 05:02:44 EDT 2011


Hi,

On Wed, May 18, 2011 at 11:30 AM, Max Williams <Max.Williams at betfair.com> wrote:

> Hi Daniel,
>
> You might want to set “on-fail=standby” for the resource group or
> individual resources. This will put the host into standby when a failure
> occurs, thus preventing failback:
>

This is not the optimal solution.
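
For reference, a minimal sketch of what Max describes, using the FAILOVER-IP
primitive from the config further down (note that on-fail is an attribute of
an operation, not of a resource or group):

  primitive FAILOVER-IP ocf:heartbeat:IPaddr \
         params ip="192.168.1.79" \
         op monitor interval="10s" on-fail="standby"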


>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-operations.html#s-resource-failure
>
>
>
> Another option is to set resource stickiness which will stop resources
> moving back after a failure:
>
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html
>

That is already set globally in his config.
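
For completeness, stickiness can also be attached to a single resource as a
meta attribute; a sketch against his PGPOOL primitive:

  primitive PGPOOL lsb:pgpool2 \
         meta resource-stickiness="1000" \
         op monitor interval="10s"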


>
>
> Also note if you are using a two node cluster you will also need the
> property “no-quorum-policy=ignore” set.
>

He already has that set as well.
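
For anyone following along, that property can be set from the crm shell with:

  crm configure property no-quorum-policy=ignore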


>
>
> Hope that helps!
>
> Cheers,
>
> Max
>
>
>
> *From:* Daniel Bozeman [mailto:daniel.bozeman at americanroamer.com]
> *Sent:* 17 May 2011 19:09
> *To:* pacemaker at oss.clusterlabs.org
> *Subject:* Re: [Pacemaker] Preventing auto-fail-back
>
>
>
> To be more specific:
>
>
>
> I've tried following the example on pages 25/26 of this document to the
> letter: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>

Well, not quite; that's why there are errors in your config.


>
>
> And it does work as advertised. When I stop corosync, the resource goes to
> the other node. I start corosync and it remains there as it should.
>
>
>
> However, if I simply unplug the ethernet connection, let the resource
> migrate, then plug it back in, it will fail back to the original node. Is
> this the intended behavior? It seems a bad NIC could wreak havoc on such a
> setup.
>
>
>
> Thanks!
>
>
>
> Daniel
>
>
>
> On May 16, 2011, at 5:33 PM, Daniel Bozeman wrote:
>
>
>
> For the life of me, I cannot prevent auto-failback from occurring in a
> master-slave setup I have in virtual machines. I have a very simple
> configuration:
>
> node $id="4fe75075-333c-4614-8a8a-87149c7c9fbb" ha2 \
>        attributes standby="off"
> node $id="70718968-41b5-4aee-ace1-431b5b65fd52" ha1 \
>        attributes standby="off"
> primitive FAILOVER-IP ocf:heartbeat:IPaddr \
>        params ip="192.168.1.79" \
>        op monitor interval="10s"
> primitive PGPOOL lsb:pgpool2 \
>        op monitor interval="10s"
> group PGPOOL-AND-IP FAILOVER-IP PGPOOL
> colocation IP-WITH-PGPOOL inf: FAILOVER-IP PGPOOL
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>

Change this to cluster-infrastructure="openais".


>        cluster-infrastructure="Heartbeat" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore"
>

You're missing expected-quorum-votes here; it should be
expected-quorum-votes="2". It is usually added automatically when the nodes
are added to and seen by the cluster, so I assume its absence is related to
cluster-infrastructure="Heartbeat".
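
Putting the two fixes together, the property section should end up looking
something like this (a sketch; the dc-version string stays whatever your
build reports):

property $id="cib-bootstrap-options" \
       dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
       cluster-infrastructure="openais" \
       expected-quorum-votes="2" \
       stonith-enabled="false" \
       no-quorum-policy="ignore"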

Regards,
Dan


> rsc_defaults $id="rsc-options" \
>        resource-stickiness="1000"
>
> No matter what I do with resource stickiness, I cannot prevent fail-back. I
> usually don't have a problem with failback when I restart the current
> master, but when I disable network connectivity to the master, everything
> fails over fine. Then I enable the network adapter and everything jumps back
> to the original "failed" node. I've been running "watch ptest -Ls", and the
> scores seem to indicate that failback should not occur. I'm also seeing
> resources bounce more times than necessary when a node is added (~3 times
> each), and resources seem to bounce when a node returns to the cluster even
> if it isn't necessary for them to do so. I also had an order directive in my
> configuration at one time, and often the second resource would start, then
> stop, then allow the first resource to start, then start itself. Quite
> weird. Any pointers in the right direction would be greatly appreciated.
> I've scoured Google and read the official documentation to no avail. I
> should also mention that I am using heartbeat. My LSB resource implements
> start/stop/status properly without error.
>
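
As an aside, one way to keep an eye on the allocation scores while testing
(assuming the ptest from the pacemaker package that you already mention is on
the path):

  watch -n1 'ptest -Ls | grep -E "FAILOVER-IP|PGPOOL"'
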
> I've been testing this with a floating IP + Postgres as well, with the same
> issues. One thing I notice is that my "group" resources have no score. Is
> this normal? There doesn't seem to be any way to assign a stickiness to a
> group, and default stickiness has no effect.
>
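
On the group score question: a group does accept meta attributes, so
stickiness can be set on the group itself rather than on its members. A
sketch, untested against this exact setup:

  group PGPOOL-AND-IP FAILOVER-IP PGPOOL \
         meta resource-stickiness="1000"
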
> Thanks!
>
> Daniel Bozeman
>
>
>
> Daniel Bozeman
> American Roamer
> Systems Administrator
> daniel.bozeman at americanroamer.com
>
>
>


-- 
Dan Frincu
CCNA, RHCE