[ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

Nikhil Utane nikhil.subscribed at gmail.com
Mon Oct 17 14:46:05 UTC 2016


This is driving me insane.

This is how the resources were started. Redund_CU1_WB30 was the DC, which I
rebooted.
 cu_4 (ocf::redundancy:RedundancyRA): Started Redund_CU1_WB30
 cu_2 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30
 cu_3 (ocf::redundancy:RedundancyRA): Started Redun_CU4_Wb30

Since the standby node was not up, I was expecting resource cu_4 to wait
until it could be scheduled.
Instead, it rearranged everything as below.
 cu_4 (ocf::redundancy:RedundancyRA): Started Redun_CU4_Wb30
 cu_2 (ocf::redundancy:RedundancyRA): Stopped
 cu_3 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30

There is not much information in the logs on the new DC. They just show
what it decided to do, but nothing to suggest why it did it that way.

notice: Start   cu_4 (Redun_CU4_Wb30)
notice: Stop    cu_2 (Redund_CU5_WB30)
notice: Move    cu_3 (Started Redun_CU4_Wb30 -> Redund_CU5_WB30)
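Reading the three log lines against the before/after listings: `Start` brings up a resource that was not running, `Stop` takes one down, and `Move` relocates a running resource. The sketch below derives the same three actions from the two snapshots. It is a hypothetical helper (`transitions()` is not part of any Pacemaker tool), and it assumes that a resource whose node has left the cluster counts as stopped.

```python
from typing import Dict, List, Optional, Set

# resource -> node it runs on, or None if the resource is stopped
Placement = Dict[str, Optional[str]]


def transitions(before: Placement, after: Placement, down_nodes: Set[str]) -> List[str]:
    """Summarize the actions needed to get from `before` to `after`."""
    actions: List[str] = []
    for rsc in sorted(set(before) | set(after)):
        old = before.get(rsc)
        if old in down_nodes:
            old = None  # assumption: a resource on a departed node counts as stopped
        new = after.get(rsc)
        if old == new:
            continue  # unchanged: still in place, or still stopped
        if old is None:
            actions.append(f"Start {rsc} ({new})")
        elif new is None:
            actions.append(f"Stop {rsc} ({old})")
        else:
            actions.append(f"Move {rsc} (Started {old} -> {new})")
    return actions


if __name__ == "__main__":
    before = {"cu_4": "Redund_CU1_WB30", "cu_2": "Redund_CU5_WB30", "cu_3": "Redun_CU4_Wb30"}
    after = {"cu_4": "Redun_CU4_Wb30", "cu_2": None, "cu_3": "Redund_CU5_WB30"}
    for line in transitions(before, after, {"Redund_CU1_WB30"}):
        print(line)
```

The output is the same three actions as the DC log, sorted by resource name: the `Move` of cu_3 is what freed Redun_CU4_Wb30 for cu_4.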

I have the default stickiness set to 100, which is higher than any score
that I have configured.
I have migration-threshold set to 1. Should I bump that up instead?

-Thanks
Nikhil

On Sat, Oct 15, 2016 at 12:36 AM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 10/14/2016 06:56 AM, Nikhil Utane wrote:
> > Hi,
> >
> > Thank you for the responses so far.
> > I added the reverse colocation as well. However, I am seeing another issue
> > with resource movement that I am analyzing.
> >
> > Thinking further on this, why doesn't "a not with b" imply "b not with
> > a"? Wouldn't placing "b with a" violate "a not with b"?
> >
> > Can someone confirm whether colocation has to be configured both ways?
>
> The anti-colocation should only be defined one-way. Otherwise, you get a
> dependency loop (as seen in logs you showed elsewhere).
>
> The one-way constraint is enough to keep the resources apart. However,
> the question is whether the cluster might move resources around
> unnecessarily.
>
> For example, "A not with B" means that the cluster will place B first,
> then place A somewhere else. So, if B's node fails, can the cluster
> decide that A's node is now the best place for B, and move A to a free
> node, rather than simply start B on the free node?
>
> The cluster does take dependencies into account when placing a resource,
> so I would hope that wouldn't happen. But I'm not sure. Having some
> stickiness might help, so that A has some preference against moving.
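Ken's question can be made concrete with a toy model (not Pacemaker's allocator): resources `A` and `B` with "A with B -INFINITY", hypothetical nodes `n1`..`n3`, all location scores 0, B placed first, and ties broken by node name. With `consider_dependent=False`, B lands on A's node after n1 fails and A is pushed to the free node, which mirrors what Nikhil saw. With `consider_dependent=True`, B's score on A's node is reduced by A's stickiness; that rule is an assumption standing in for "the cluster does take dependencies into account", not a description of the real scoring.

```python
from typing import Dict, List, Optional


def place(nodes: List[str], current: Dict[str, str], stickiness: int,
          consider_dependent: bool) -> Dict[str, Optional[str]]:
    """Toy placement for 'A with B -INFINITY': place B first, then A elsewhere."""

    def b_score(node: str) -> int:
        score = 0  # every node is enabled with score 0, as in the thread
        if current.get("B") == node:
            score += stickiness  # B prefers to stay where it already runs
        if consider_dependent and current.get("A") == node:
            # Assumption: B is penalized by A's stickiness on A's node,
            # because putting B there would force A to move.
            score -= stickiness
        return score

    ordered = sorted(nodes)  # toy tie-break: max() keeps the first best node
    b_node = max(ordered, key=b_score)

    candidates = [n for n in ordered if n != b_node]  # -INFINITY: never with B
    if not candidates:
        return {"B": b_node, "A": None}  # nowhere left for A: it stays stopped

    def a_score(node: str) -> int:
        return stickiness if current.get("A") == node else 0

    return {"B": b_node, "A": max(candidates, key=a_score)}


if __name__ == "__main__":
    before = {"B": "n1", "A": "n2"}
    survivors = ["n2", "n3"]  # n1 has failed
    print(place(survivors, before, 100, consider_dependent=False))  # {'B': 'n2', 'A': 'n3'}
    print(place(survivors, before, 100, consider_dependent=True))   # {'B': 'n3', 'A': 'n2'}
```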
>
> > -Thanks
> > Nikhil
> >
> > On Fri, Oct 14, 2016 at 1:09 PM, Vladislav Bogdanov
> > <bubble at hoster-ok.com> wrote:
> >
> >     On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl
> >     <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >     >>>> Nikhil Utane <nikhil.subscribed at gmail.com> wrote on 13.10.2016 at
> >     >16:43 in message
> >     ><CAGNWmJUbPucnBGXroHkHSbQ0LXovwsLFPkUPg1R8gJqRFqM9Dg at mail.gmail.com>:
> >     >> Ulrich,
> >     >>
> >     >> I have only 4 resources (not 5; it is the nodes that number 5), so I
> >     >> only need 6 constraints, right?
> >     >>
> >     >>      [,1] [,2] [,3] [,4] [,5] [,6]
> >     >> [1,] "A"  "A"  "A"  "B"  "B"  "C"
> >     >> [2,] "B"  "C"  "D"  "C"  "D"  "D"
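Assuming one `-INFINITY` constraint per unordered pair of resources (the one-way form Ken Gaillot recommends in this thread), the count is the number of 2-element subsets, which matches both pair tables in this thread: 6 for 4 resources and 10 for Ulrich's 5.

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{4}{2} = \frac{4 \cdot 3}{2} = 6, \qquad
\binom{5}{2} = \frac{5 \cdot 4}{2} = 10
```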
> >     >
> >     >Sorry for my confusion. As Andrei Borzenkov said in
> >     ><CAA91j0W+epAHFLg9u6VX_X8LgFkf9Rp55g3nocY4oZNA9BbZ+g at mail.gmail.com>
> >     >you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
> >     >wonder whether an easier solution would be to use "utilization": if
> >     >every node has one token to give, and every resource needs one token,
> >     >no two resources will run on one node. That sounds like an easier
> >     >solution to me.
> >     >
> >     >Regards,
> >     >Ulrich
> >     >
> >     >
> >     >>
> >     >> I understand that if I configure a constraint of R1 with R2 with score
> >     >> -INFINITY, then the same applies to R2 with R1 with score -INFINITY
> >     >> (I don't have to configure it explicitly).
> >     >> I am not having a problem with multiple resources getting scheduled on
> >     >> the same node. Rather, one working resource is unnecessarily getting
> >     >> relocated.
> >     >>
> >     >> -Thanks
> >     >> Nikhil
> >     >>
> >     >>
> >     >> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl
> >     >> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >     >>
> >     >>> Hi!
> >     >>>
> >     >>> Don't you need 10 constraints, excluding every possible pair of your
> >     >>> 5 resources (named A-E here), like in this table (produced with R):
> >     >>>
> >     >>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> >     >>> [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"
> >     >>> [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"
> >     >>>
> >     >>> Ulrich
> >     >>>
> >     >>> >>> Nikhil Utane <nikhil.subscribed at gmail.com> wrote on 13.10.2016
> >     >>> at 15:59 in message
> >     >>> <CAGNWmJW0CWMr3bvR3L9xZCAcJUzyczQbZEzUzpaJxi+Pn7Oj_A at mail.gmail.com>:
> >     >>> > Hi,
> >     >>> >
> >     >>> > I have 5 nodes and 4 resources configured.
> >     >>> > I have configured constraints such that no two resources can be
> >     >>> > co-located.
> >     >>> > I brought down a node (which happened to be the DC). I was expecting
> >     >>> > the resource on the failed node to be migrated to the 5th, waiting
> >     >>> > node (which is not running any resource).
> >     >>> > However, what happened was that the failed node's resource was
> >     >>> > started on another active node (after stopping its existing
> >     >>> > resource), and that node's resource was moved to the waiting node.
> >     >>> >
> >     >>> > What could I be doing wrong?
> >     >>> >
> >     >>> > <nvpair id="cib-bootstrap-options-have-watchdog" value="true" name="have-watchdog"/>
> >     >>> > <nvpair id="cib-bootstrap-options-dc-version" value="1.1.14-5a6cdd1" name="dc-version"/>
> >     >>> > <nvpair id="cib-bootstrap-options-cluster-infrastructure" value="corosync" name="cluster-infrastructure"/>
> >     >>> > <nvpair id="cib-bootstrap-options-stonith-enabled" value="false" name="stonith-enabled"/>
> >     >>> > <nvpair id="cib-bootstrap-options-no-quorum-policy" value="ignore" name="no-quorum-policy"/>
> >     >>> > <nvpair id="cib-bootstrap-options-default-action-timeout" value="240" name="default-action-timeout"/>
> >     >>> > <nvpair id="cib-bootstrap-options-symmetric-cluster" value="false" name="symmetric-cluster"/>
> >     >>> >
> >     >>> > # pcs constraint
> >     >>> > Location Constraints:
> >     >>> >   Resource: cu_2
> >     >>> >     Enabled on: Redun_CU4_Wb30 (score:0)
> >     >>> >     Enabled on: Redund_CU2_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU3_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU5_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU1_WB30 (score:0)
> >     >>> >   Resource: cu_3
> >     >>> >     Enabled on: Redun_CU4_Wb30 (score:0)
> >     >>> >     Enabled on: Redund_CU2_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU3_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU5_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU1_WB30 (score:0)
> >     >>> >   Resource: cu_4
> >     >>> >     Enabled on: Redun_CU4_Wb30 (score:0)
> >     >>> >     Enabled on: Redund_CU2_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU3_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU5_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU1_WB30 (score:0)
> >     >>> >   Resource: cu_5
> >     >>> >     Enabled on: Redun_CU4_Wb30 (score:0)
> >     >>> >     Enabled on: Redund_CU2_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU3_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU5_WB30 (score:0)
> >     >>> >     Enabled on: Redund_CU1_WB30 (score:0)
> >     >>> > Ordering Constraints:
> >     >>> > Colocation Constraints:
> >     >>> >   cu_3 with cu_2 (score:-INFINITY)
> >     >>> >   cu_4 with cu_2 (score:-INFINITY)
> >     >>> >   cu_4 with cu_3 (score:-INFINITY)
> >     >>> >   cu_5 with cu_2 (score:-INFINITY)
> >     >>> >   cu_5 with cu_3 (score:-INFINITY)
> >     >>> >   cu_5 with cu_4 (score:-INFINITY)
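The six one-way constraints listed above are exactly one per unordered pair of resources. A minimal sketch that produces them as pcs commands, assuming the `pcs constraint colocation add <source> with <target> [score]` form from the pcs man page synopsis. The helper name `anti_colocation_commands` is hypothetical, and how a negative score such as `-INFINITY` has to be passed on the command line may differ between pcs versions, so check yours before running the output.

```python
from itertools import combinations
from typing import List


def anti_colocation_commands(resources: List[str]) -> List[str]:
    """Emit one -INFINITY colocation per unordered pair of resources."""
    cmds: List[str] = []
    # combinations() yields each unordered pair exactly once, so no pair is
    # constrained in both directions (which would create a dependency loop).
    for target, source in combinations(resources, 2):
        # Assumption: positional score as in the pcs man page; verify how your
        # pcs version parses an argument that starts with '-'.
        cmds.append(f"pcs constraint colocation add {source} with {target} -INFINITY")
    return cmds


if __name__ == "__main__":
    for cmd in anti_colocation_commands(["cu_2", "cu_3", "cu_4", "cu_5"]):
        print(cmd)
```

For cu_2..cu_5 this yields the same six pairs as the `pcs constraint` listing, only in a different order.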
> >     >>> >
> >     >>> > -Thanks
> >     >>> > Nikhil
> >     >>>
> >     >>> _______________________________________________
> >     >>> Users mailing list: Users at clusterlabs.org
> >     >>> http://clusterlabs.org/mailman/listinfo/users
> >     >>>
> >     >>> Project Home: http://www.clusterlabs.org
> >     >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >     >>> Bugs: http://bugs.clusterlabs.org
> >     >>> Bugs: http://bugs.clusterlabs.org
> >     >>>
> >
> >     Hi,
> >
> >     Use of utilization (balanced strategy) has one caveat: resources are
> >     not moved just because the utilization of one node is lower, when
> >     nodes have the same allocation score for the resource.
> >     So, after the simultaneous outage of two nodes in a 5-node cluster,
> >     it may turn out that one node runs two resources while the two
> >     recovered nodes run nothing.
> >
> >     The original 'utilization' strategy only limits resource placement;
> >     it is not considered when choosing a node for a resource.
> >
> >     Vladislav
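Ulrich's one-token idea and Vladislav's last point can both be seen in a toy allocator (not Pacemaker code): utilization acts purely as a filter that removes nodes without spare capacity, and something else has to pick among the nodes that remain. `allocate()`, the node names and the per-node capacities are hypothetical; in a real cluster this is configured with utilization attributes on nodes and resources plus the `placement-strategy` cluster property (`default`, `utilization`, `minimal` or `balanced`).

```python
from typing import Dict, List, Optional


def allocate(resources: List[str], capacity: Dict[str, int],
             need: int = 1) -> Dict[str, Optional[str]]:
    """Toy 'one token per node' placement: capacity filters, it does not choose."""
    remaining = dict(capacity)
    placement: Dict[str, Optional[str]] = {}
    for rsc in resources:
        # Utilization as a pure filter: drop nodes without enough capacity left.
        eligible = [n for n in sorted(remaining) if remaining[n] >= need]
        if not eligible:
            placement[rsc] = None  # no capacity anywhere: resource stays stopped
            continue
        # Toy choice among eligible nodes: first by name. In a real cluster this
        # would come from allocation scores, not from utilization.
        node = eligible[0]
        remaining[node] -= need
        placement[rsc] = node
    return placement


if __name__ == "__main__":
    rscs = ["cu_2", "cu_3", "cu_4", "cu_5"]
    print(allocate(rscs, {f"n{i}": 1 for i in range(1, 6)}))
    print(allocate(rscs, {"n1": 1, "n2": 1, "n3": 1}))
```

With five single-token nodes the four resources land on four distinct nodes; with only three nodes left, the fourth resource stays stopped instead of doubling up on a node.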