[ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

Tue Oct 18 14:25:30 UTC 2016

On 10/17/2016 11:29 PM, Nikhil Utane wrote:
> Thanks Ken.
> I will give it a shot.
> 
> http://oss.clusterlabs.org/pipermail/pacemaker/2011-August/011271.html
> On this thread, if I interpret it correctly, his problem was solved when
> he swapped the anti-colocation constraints
> 
> From (mapping to my example)
> cu_2 with cu_4 (score:-INFINITY)
> cu_3 with cu_4 (score:-INFINITY)
> cu_2 with cu_3 (score:-INFINITY)
> 
> To
> cu_2 with cu_4 (score:-INFINITY)
> cu_4 with cu_3 (score:-INFINITY)
> cu_3 with cu_2 (score:-INFINITY)
> 
> Do you think that would make any difference? The way you explained it,
> it sounds to me like it might.

It would create a dependency loop:

cu_2 must be placed before cu_3
cu_3 must be placed before cu_4
cu_4 must be placed before cu_2
(loop)

The cluster tries to detect and break such loops, but I wouldn't rely on
that resulting in a particular behavior.
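
As a rough illustration of that ordering, here is a small Python sketch.
It is not Pacemaker's allocation code, only a plain topological sort
over the "placed before" relation, and the function name
placement_order is made up for this example. Each pair is read as
"dependent with primary", so the primary has to be placed first.

```python
def placement_order(colocations):
    """Return an order in which every primary precedes its dependents.

    colocations: list of (dependent, primary) pairs, read as
    "dependent with primary". Returns None if the pairs form a loop.
    """
    successors = {}   # primary -> set of resources placed after it
    indegree = {}     # resource -> number of resources it must wait for
    for dependent, primary in colocations:
        for rsc in (primary, dependent):
            successors.setdefault(rsc, set())
            indegree.setdefault(rsc, 0)
        if dependent not in successors[primary]:
            successors[primary].add(dependent)
            indegree[dependent] += 1

    # Kahn's algorithm: repeatedly take a resource that waits for nothing.
    ready = sorted(rsc for rsc, n in indegree.items() if n == 0)
    order = []
    while ready:
        rsc = ready.pop(0)
        order.append(rsc)
        for nxt in sorted(successors[rsc]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # If some resources never became ready, they are part of a loop.
    return order if len(order) == len(indegree) else None


original = [("cu_2", "cu_4"), ("cu_3", "cu_4"), ("cu_2", "cu_3")]
swapped = [("cu_2", "cu_4"), ("cu_4", "cu_3"), ("cu_3", "cu_2")]

print(placement_order(original))  # ['cu_4', 'cu_3', 'cu_2']
print(placement_order(swapped))   # None (dependency loop)
```

With the original constraints the sketch yields cu_4, cu_3, cu_2. With
the swapped set no resource can go first, which is the loop. What the
cluster does after breaking such a loop is not modeled here.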

> -Regards
> Nikhil
> 
> On Mon, Oct 17, 2016 at 11:36 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>     On 10/17/2016 09:55 AM, Nikhil Utane wrote:
>     > I see these prints.
>     >
>     > pengine:     info: rsc_merge_weights:cu_4: Rolling back scores from cu_3
>     > pengine:    debug: native_assign_node:Assigning Redun_CU4_Wb30 to cu_4
>     > pengine:     info: rsc_merge_weights:cu_3: Rolling back scores from cu_2
>     > pengine:    debug: native_assign_node:Assigning Redund_CU5_WB30 to cu_3
>     >
>     > Looks like rolling back the scores is causing the new decision to
>     > relocate the resources.
>     > Am I using the scores incorrectly?
> 
>     No, I think this is expected.
> 
>     Your anti-colocation constraints place cu_2 and cu_3 relative to cu_4,
>     so that means the cluster will place cu_4 first if possible, before
>     deciding where the others should go. Similarly, cu_2 has a constraint
>     relative to cu_3, so cu_3 gets placed next, and cu_2 is the one left
>     out.
> 
>     The anti-colocation scores of -INFINITY outweigh the stickiness of 100.
>     I'm not sure whether setting stickiness to INFINITY would change
>     anything; hopefully, it would stop cu_3 from moving, but cu_2 would
>     still be stopped.
> 
>     I don't see a good way around this. The cluster has to place some
>     resource first, in order to know not to place some other resource on the
>     same node. I don't think there's a way to make them "equal", because
>     then none of them could be placed to begin with -- unless you went with
>     utilization attributes, as someone else suggested, with
>     placement-strategy=balanced:
> 
>     http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140521708557280
> 
>     >
>     > [root at Redund_CU5_WB30 root]# pcs constraint
>     > Location Constraints:
>     >   Resource: cu_2
>     >     Enabled on: Redun_CU4_Wb30 (score:0)
>     >     Enabled on: Redund_CU5_WB30 (score:0)
>     >     Enabled on: Redund_CU3_WB30 (score:0)
>     >     Enabled on: Redund_CU1_WB30 (score:0)
>     >   Resource: cu_3
>     >     Enabled on: Redun_CU4_Wb30 (score:0)
>     >     Enabled on: Redund_CU5_WB30 (score:0)
>     >     Enabled on: Redund_CU3_WB30 (score:0)
>     >     Enabled on: Redund_CU1_WB30 (score:0)
>     >   Resource: cu_4
>     >     Enabled on: Redun_CU4_Wb30 (score:0)
>     >     Enabled on: Redund_CU5_WB30 (score:0)
>     >     Enabled on: Redund_CU3_WB30 (score:0)
>     >     Enabled on: Redund_CU1_WB30 (score:0)
>     > Ordering Constraints:
>     > Colocation Constraints:
>     >   cu_2 with cu_4 (score:-INFINITY)
>     >   cu_3 with cu_4 (score:-INFINITY)
>     >   cu_2 with cu_3 (score:-INFINITY)
>     >
>     >
>     > On Mon, Oct 17, 2016 at 8:16 PM, Nikhil Utane
>     > <nikhil.subscribed at gmail.com> wrote:
>     >
>     >     This is driving me insane.
>     >
>     >     This is how the resources were started. Redund_CU1_WB30 was the DC,
>     >     which I rebooted.
>     >      cu_4(ocf::redundancy:RedundancyRA):Started Redund_CU1_WB30
>     >      cu_2(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30
>     >      cu_3(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30
>     >
>     >     Since the standby node was not up, I was expecting resource cu_4 to
>     >     be waiting to be scheduled.
>     >     But then it re-arranged everything as below.
>     >      cu_4(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30
>     >      cu_2(ocf::redundancy:RedundancyRA):Stopped
>     >      cu_3(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30
>     >
>     >     There is not much information available in the logs on the new DC. It
>     >     just shows what it has decided to do but nothing to suggest why it
>     >     did it that way.
>     >
>     >     notice: Start   cu_4(Redun_CU4_Wb30)
>     >     notice: Stop    cu_2(Redund_CU5_WB30)
>     >     notice: Move    cu_3(Started Redun_CU4_Wb30 -> Redund_CU5_WB30)
>     >
>     >     I have default stickiness set to 100 which is higher than any score
>     >     that I have configured.
>     >     I have migration_threshold set to 1. Should I bump that up instead?
>     >
>     >     -Thanks
>     >     Nikhil
>     >
>     >     On Sat, Oct 15, 2016 at 12:36 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>     >
>     >         On 10/14/2016 06:56 AM, Nikhil Utane wrote:
>     >         > Hi,
>     >         >
>     >         > Thank you for the responses so far.
>     >         > I added reverse colocation as well. However, I am seeing some other
>     >         > issue in resource movement that I am analyzing.
>     >         >
>     >         > Thinking further on this, why doesn't "a not with b" imply
>     >         > "b not with a"?
>     >         > Wouldn't putting "b with a" violate "a not with b"?
>     >         >
>     >         > Can someone confirm that colocation is required to be configured
>     >         > both ways?
>     >
>     >         The anti-colocation should only be defined one-way. Otherwise, you
>     >         get a dependency loop (as seen in logs you showed elsewhere).
>     >
>     >         The one-way constraint is enough to keep the resources apart.
>     >         However, the question is whether the cluster might move resources
>     >         around unnecessarily.
>     >
>     >         For example, "A not with B" means that the cluster will place B
>     >         first, then place A somewhere else. So, if B's node fails, can the
>     >         cluster decide that A's node is now the best place for B, and move
>     >         A to a free node, rather than simply start B on the free node?
>     >
>     >         The cluster does take dependencies into account when placing a
>     >         resource, so I would hope that wouldn't happen. But I'm not sure.
>     >         Having some stickiness might help, so that A has some preference
>     >         against moving.
>     >
>     >         > -Thanks
>     >         > Nikhil
>     >         >
>     >         > On Fri, Oct 14, 2016 at 1:09 PM, Vladislav Bogdanov
>     >         > <bubble at hoster-ok.com> wrote:
>     >         >
>     >         >     On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl
>     >         >     <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>     >         >     >>>> Nikhil Utane <nikhil.subscribed at gmail.com> wrote on
>     >         >     >13.10.2016 at 16:43 in message
>     >         >     ><CAGNWmJUbPucnBGXroHkHSbQ0LXovwsLFPkUPg1R8gJqRFqM9Dg at mail.gmail.com>:
>     >         >     >> Ulrich,
>     >         >     >>
>     >         >     >> I have 4 resources only (not 5, nodes are 5). So then I only need 6
>     >         >     >> constraints, right?
>     >         >     >>
>     >         >     >>      [,1] [,2] [,3] [,4] [,5] [,6]
>     >         >     >> [1,] "A"  "A"  "A"  "B"  "B"  "C"
>     >         >     >> [2,] "B"  "C"  "D"  "C"  "D"  "D"
>     >         >     >
>     >         >     >Sorry for my confusion. As Andrei Borzenkov said in
>     >         >     ><CAA91j0W+epAHFLg9u6VX_X8LgFkf9Rp55g3nocY4oZNA9BbZ+g at mail.gmail.com>
>     >         >     >you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
>     >         >     >wonder whether an easier solution would be using "utilization": If
>     >         >     >every node has one token to give, and every resource needs one token, no
>     >         >     >two resources will run on one node. Sounds like an easier solution to
>     >         >     >me.
>     >         >     >
>     >         >     >Regards,
>     >         >     >Ulrich
>     >         >     >
>     >         >     >
>     >         >     >>
>     >         >     >> I understand that if I configure a constraint of R1 with R2 with a
>     >         >     >> score of -infinity, then the same applies for R2 with R1 (I don't
>     >         >     >> have to configure it explicitly).
>     >         >     >> I am not having a problem of multiple resources getting scheduled on
>     >         >     >> the same node. Rather, one working resource is unnecessarily getting
>     >         >     >> relocated.
>     >         >     >>
>     >         >     >> -Thanks
>     >         >     >> Nikhil
>     >         >     >>
>     >         >     >>
>     >         >     >> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl
>     >         >     >> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>     >         >     >>
>     >         >     >>> Hi!
>     >         >     >>>
>     >         >     >>> Don't you need 10 constraints, excluding every possible pair of your 5
>     >         >     >>> resources (named A-E here), like in this table (produced with R):
>     >         >     >>>
>     >         >     >>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>     >         >     >>> [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"
>     >         >     >>> [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"
>     >         >     >>>
>     >         >     >>> Ulrich
>     >         >     >>>
>     >         >     >>> >>> Nikhil Utane <nikhil.subscribed at gmail.com> wrote on 13.10.2016
>     >         >     >>> at 15:59 in message
>     >         >     >>> <CAGNWmJW0CWMr3bvR3L9xZCAcJUzyczQbZEzUzpaJxi+Pn7Oj_A at mail.gmail.com>:
>     >         >     >>> > Hi,
>     >         >     >>> >
>     >         >     >>> > I have 5 nodes and 4 resources configured.
>     >         >     >>> > I have configured constraints such that no two resources can be
>     >         >     >>> > co-located.
>     >         >     >>> > I brought down a node (which happened to be DC). I was expecting the
>     >         >     >>> > resource on the failed node would be migrated to the 5th waiting node
>     >         >     >>> > (that is not running any resource).
>     >         >     >>> > However what happened was the failed node resource was started on
>     >         >     >>> > another active node (after stopping its existing resource) and that
>     >         >     >>> > node's resource was moved to the waiting node.
>     >         >     >>> >
>     >         >     >>> > What could I be doing wrong?
>     >         >     >>> >
>     >         >     >>> > <nvpair id="cib-bootstrap-options-have-watchdog" value="true" name="have-watchdog"/>
>     >         >     >>> > <nvpair id="cib-bootstrap-options-dc-version" value="1.1.14-5a6cdd1" name="dc-version"/>
>     >         >     >>> > <nvpair id="cib-bootstrap-options-cluster-infrastructure" value="corosync" name="cluster-infrastructure"/>
>     >         >     >>> > <nvpair id="cib-bootstrap-options-stonith-enabled" value="false" name="stonith-enabled"/>
>     >         >     >>> > <nvpair id="cib-bootstrap-options-no-quorum-policy" value="ignore" name="no-quorum-policy"/>
>     >         >     >>> > <nvpair id="cib-bootstrap-options-default-action-timeout" value="240" name="default-action-timeout"/>
>     >         >     >>> > <nvpair id="cib-bootstrap-options-symmetric-cluster" value="false" name="symmetric-cluster"/>
>     >         >     >>> >
>     >         >     >>> > # pcs constraint
>     >         >     >>> > Location Constraints:
>     >         >     >>> >   Resource: cu_2
>     >         >     >>> >     Enabled on: Redun_CU4_Wb30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU2_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU3_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU5_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU1_WB30 (score:0)
>     >         >     >>> >   Resource: cu_3
>     >         >     >>> >     Enabled on: Redun_CU4_Wb30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU2_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU3_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU5_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU1_WB30 (score:0)
>     >         >     >>> >   Resource: cu_4
>     >         >     >>> >     Enabled on: Redun_CU4_Wb30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU2_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU3_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU5_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU1_WB30 (score:0)
>     >         >     >>> >   Resource: cu_5
>     >         >     >>> >     Enabled on: Redun_CU4_Wb30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU2_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU3_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU5_WB30 (score:0)
>     >         >     >>> >     Enabled on: Redund_CU1_WB30 (score:0)
>     >         >     >>> > Ordering Constraints:
>     >         >     >>> > Colocation Constraints:
>     >         >     >>> >   cu_3 with cu_2 (score:-INFINITY)
>     >         >     >>> >   cu_4 with cu_2 (score:-INFINITY)
>     >         >     >>> >   cu_4 with cu_3 (score:-INFINITY)
>     >         >     >>> >   cu_5 with cu_2 (score:-INFINITY)
>     >         >     >>> >   cu_5 with cu_3 (score:-INFINITY)
>     >         >     >>> >   cu_5 with cu_4 (score:-INFINITY)
>     >         >     >>> >
>     >         >     >>> > -Thanks
>     >         >     >>> > Nikhil
>     >         >     >>>
>     >         >     >>>
>     >         >     >>>
>     >         >
>     >         >     Hi,
>     >         >
>     >         >     use of utilization (balanced strategy) has one caveat: resources are
>     >         >     not moved just because the utilization of one node is lower, when
>     >         >     nodes have the same allocation score for the resource.
>     >         >     So, after the simultaneous outage of two nodes in a 5-node cluster,
>     >         >     it may appear that one node runs two resources and two recovered
>     >         >     nodes run nothing.
>     >         >
>     >         >     Original 'utilization' strategy only limits resource placement, it
>     >         >     is not considered when choosing a node for a resource.
>     >         >
>     >         >     Vladislav
> 
>
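
For the constraint counts discussed in the quoted thread (6 pairs for 4
resources, 10 for 5), a short Python sketch can enumerate one one-way
anti-colocation pair per unordered pair of resources. The helper name
one_way_pairs is made up for this example, and the printed lines only
mimic the "pcs constraint" listing format shown above; they are not
commands.

```python
from itertools import combinations


def one_way_pairs(resources):
    """Return one (dependent, primary) pair per unordered pair of resources."""
    return [(later, earlier) for earlier, later in combinations(resources, 2)]


pairs = one_way_pairs(["cu_2", "cu_3", "cu_4", "cu_5"])
for dependent, primary in pairs:
    print(f"{dependent} with {primary} (score:-INFINITY)")
print(len(pairs))  # 6, i.e. n*(n-1)/2 for n = 4
```

Because each unordered pair appears only once, the resulting "placed
before" relation follows the list order and cannot form a loop.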