[ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover
Nikhil Utane
nikhil.subscribed at gmail.com
Tue Oct 18 00:29:26 EDT 2016
Thanks Ken.
I will give it a shot.
http://oss.clusterlabs.org/pipermail/pacemaker/2011-August/011271.html
On this thread, if I interpret it correctly, his problem was solved when he
swapped the anti-colocation constraints from (mapping to my example)
cu_2 with cu_4 (score:-INFINITY)
cu_3 with cu_4 (score:-INFINITY)
cu_2 with cu_3 (score:-INFINITY)
to
cu_2 with cu_4 (score:-INFINITY)
cu_4 with cu_3 (score:-INFINITY)
cu_3 with cu_2 (score:-INFINITY)
Do you think that would make any difference? From the way you explained it,
it sounds to me like it might.
-Regards
Nikhil
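A minimal Python sketch (3.9+, for `graphlib`) of the ordering rule Ken describes in the quoted reply, where "X with Y" makes the cluster place Y before X. It models only that ordering, not Pacemaker's scheduler. `placement_order` is a hypothetical helper, and each constraint is written as a `(dependent, primary)` tuple.

```python
from graphlib import CycleError, TopologicalSorter


def placement_order(constraints):
    """Order resources so each primary comes before its dependents.

    Hypothetical model: ("X", "Y") stands for the pcs colocation
    constraint "X with Y (score:-INFINITY)", so Y must be placed first.
    Returns None when no such order exists (a dependency loop).
    """
    graph = {}  # resource -> set of resources that must be placed before it
    for dependent, primary in constraints:
        graph.setdefault(dependent, set()).add(primary)
        graph.setdefault(primary, set())
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


original = [("cu_2", "cu_4"), ("cu_3", "cu_4"), ("cu_2", "cu_3")]
swapped = [("cu_2", "cu_4"), ("cu_4", "cu_3"), ("cu_3", "cu_2")]

print(placement_order(original))  # ['cu_4', 'cu_3', 'cu_2']
print(placement_order(swapped))   # None
```

Under this model the original set gives the order cu_4, cu_3, cu_2, so cu_2 is the first to be left out when nodes run short. The swapped set has no valid order at all: each resource must be placed before the next one in a cycle, which resembles the dependency loop Ken warns about for two-way constraints. How Pacemaker actually resolves such a cycle is not shown here.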
On Mon, Oct 17, 2016 at 11:36 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> On 10/17/2016 09:55 AM, Nikhil Utane wrote:
> > I see these log messages:
> >
> > pengine: info: rsc_merge_weights:cu_4: Rolling back scores from cu_3
> > pengine: debug: native_assign_node:Assigning Redun_CU4_Wb30 to cu_4
> > pengine: info: rsc_merge_weights:cu_3: Rolling back scores from cu_2
> > pengine: debug: native_assign_node:Assigning Redund_CU5_WB30 to cu_3
> >
> > It looks like rolling back the scores is what leads to the new decision
> > to relocate the resources.
> > Am I using the scores incorrectly?
>
> No, I think this is expected.
>
> Your anti-colocation constraints place cu_2 and cu_3 relative to cu_4,
> so that means the cluster will place cu_4 first if possible, before
> deciding where the others should go. Similarly, cu_2 has a constraint
> relative to cu_3, so cu_3 gets placed next, and cu_2 is the one left out.
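To make the "cu_2 is the one left out" outcome concrete, here is a toy Python model, not Pacemaker's real allocator. Resources are placed one at a time in the order cu_4, cu_3, cu_2. A resource may not land on a node that already holds something it is anti-colocated with. Among the remaining nodes it keeps its current node if it can (a stand-in for stickiness), otherwise it takes the first node by name. That name tie-break is purely an assumption of this sketch; the real scheduler uses the merged scores seen in the log lines above. The sketch also assumes that only Redun_CU4_Wb30 and Redund_CU5_WB30 were online, as the status output in this thread suggests. `place_in_order` is a hypothetical helper.

```python
def place_in_order(order, anti, current, nodes):
    """Greedy placement in dependency order (toy model).

    order   -- resources in the order they are placed
    anti    -- resource -> resources it must not share a node with
    current -- resource -> node it runs on now (missing if its node is gone)
    nodes   -- nodes that are online
    Returns resource -> node, or None if the resource must stay stopped.
    """
    assignment = {}
    for rsc in order:
        # Nodes already taken by an anti-colocated, previously placed resource.
        banned = {assignment[o] for o in anti.get(rsc, ()) if assignment.get(o)}
        allowed = sorted(n for n in nodes if n not in banned)
        if not allowed:
            assignment[rsc] = None            # nowhere left to run
        elif current.get(rsc) in allowed:
            assignment[rsc] = current[rsc]    # stickiness: stay put
        else:
            assignment[rsc] = allowed[0]      # hypothetical tie-break by name
    return assignment


order = ["cu_4", "cu_3", "cu_2"]
anti = {"cu_3": ["cu_4"], "cu_2": ["cu_4", "cu_3"]}
current = {"cu_2": "Redund_CU5_WB30", "cu_3": "Redun_CU4_Wb30"}  # cu_4's node rebooted
nodes = ["Redun_CU4_Wb30", "Redund_CU5_WB30"]

print(place_in_order(order, anti, current, nodes))
# {'cu_4': 'Redun_CU4_Wb30', 'cu_3': 'Redund_CU5_WB30', 'cu_2': None}
```

With two online nodes and three mutually exclusive resources, whichever resource comes last in the order is stopped. Adding Redund_CU3_WB30 to `nodes` leaves nothing stopped, but in this toy model cu_3 still moves, because cu_4 is placed first and the tie-break hands it cu_3's node.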
>
> The anti-colocation scores of -INFINITY outweigh the stickiness of 100.
> I'm not sure whether setting stickiness to INFINITY would change
> anything; hopefully, it would stop cu_3 from moving, but cu_2 would
> still be stopped.
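The point that -INFINITY outweighs a stickiness of 100 follows from the "infinity math" described in Pacemaker Explained: INFINITY is stored internally as 1,000,000, and any score combined with -INFINITY is -INFINITY, even +INFINITY. The sketch below mirrors those rules. `add_scores` is a hypothetical helper, not a Pacemaker API, and clamping finite sums to the +/-INFINITY range is an assumption of the sketch. It only shows why a node already holding a -INFINITY colocation partner stays banned however large the stickiness; it says nothing about which resource the scheduler places first.

```python
INFINITY = 1_000_000  # Pacemaker's internal value for INFINITY


def add_scores(a, b):
    """Combine two placement scores following Pacemaker's infinity rules."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY  # -INFINITY wins over everything, even +INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))  # assumed clamp for finite sums


# cu_3's current node also hosts cu_4, which cu_3 must not run with.
print(add_scores(100, -INFINITY))       # stickiness 100      -> -1000000
print(add_scores(INFINITY, -INFINITY))  # stickiness INFINITY -> -1000000
```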
>
> I don't see a good way around this. The cluster has to place some
> resource first, in order to know not to place some other resource on the
> same node. I don't think there's a way to make them "equal", because
> then none of them could be placed to begin with -- unless you went with
> utilization attributes, as someone else suggested, with
> placement-strategy=balanced:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140521708557280
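In Pacemaker, the utilization approach means giving each node a utilization attribute, giving each resource a matching utilization requirement, and setting the placement-strategy cluster property (for example, to balanced). The Python below is only a toy model of that idea, not Pacemaker's allocator: the attribute name "capacity" and the helper `place_by_capacity` are hypothetical, and keeping running resources on their current node first is this sketch's stand-in for stickiness. It shows how capacity alone keeps resources apart with no pairwise constraints at all.

```python
def place_by_capacity(resources, capacity, current=None, need=1):
    """Assign each resource to a node with enough free capacity, or None."""
    current = current or {}
    free = dict(capacity)  # remaining capacity per online node
    result = {}
    # First pass (stand-in for stickiness): keep running resources in place.
    for rsc in resources:
        node = current.get(rsc)
        if node in free and free[node] >= need:
            free[node] -= need
            result[rsc] = node
    # Second pass: place the rest on the node with the most free capacity.
    for rsc in resources:
        if rsc in result:
            continue
        candidates = [n for n in sorted(free) if free[n] >= need]
        if not candidates:
            result[rsc] = None  # no capacity left anywhere: stays stopped
            continue
        best = max(candidates, key=lambda n: free[n])
        free[best] -= need
        result[rsc] = best
    return result


# Every online node offers one "token"; every resource needs one.
online = {"Redun_CU4_Wb30": 1, "Redund_CU5_WB30": 1}
running = {"cu_2": "Redund_CU5_WB30", "cu_3": "Redun_CU4_Wb30",
           "cu_4": "Redund_CU1_WB30"}  # Redund_CU1_WB30 has just rebooted
print(place_by_capacity(["cu_2", "cu_3", "cu_4"], online, running))
# {'cu_2': 'Redund_CU5_WB30', 'cu_3': 'Redun_CU4_Wb30', 'cu_4': None}
```

With the surviving nodes from the failover in this thread, cu_2 and cu_3 keep their nodes and cu_4 waits until another node, such as Redund_CU3_WB30, offers free capacity. Whether Pacemaker behaves exactly this way also depends on its allocation order, which this sketch does not model.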
>
> >
> > [root at Redund_CU5_WB30 root]# pcs constraint
> > Location Constraints:
> > Resource: cu_2
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> > Resource: cu_3
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> > Resource: cu_4
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> > Ordering Constraints:
> > Colocation Constraints:
> > cu_2 with cu_4 (score:-INFINITY)
> > cu_3 with cu_4 (score:-INFINITY)
> > cu_2 with cu_3 (score:-INFINITY)
> >
> >
> > On Mon, Oct 17, 2016 at 8:16 PM, Nikhil Utane <nikhil.subscribed at gmail.com> wrote:
> >
> > This is driving me insane.
> >
> > This is how the resources were started. Redund_CU1_WB30, which I
> > rebooted, was the DC.
> > cu_4(ocf::redundancy:RedundancyRA):Started Redund_CU1_WB30
> > cu_2(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30
> > cu_3(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30
> >
> > Since the standby node was not up, I was expecting resource cu_4 to
> > wait until it could be scheduled.
> > Instead, the cluster rearranged everything as below:
> > cu_4(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30
> > cu_2(ocf::redundancy:RedundancyRA):Stopped
> > cu_3(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30
> >
> > There is not much information in the logs on the new DC. They just
> > show what it decided to do, with nothing to suggest why it did it
> > that way.
> >
> > notice: Start cu_4(Redun_CU4_Wb30)
> > notice: Stop cu_2(Redund_CU5_WB30)
> > notice: Move cu_3(Started Redun_CU4_Wb30 -> Redund_CU5_WB30)
> >
> > I have default stickiness set to 100, which is higher than any score
> > that I have configured.
> > I have migration-threshold set to 1. Should I bump that up instead?
> >
> > -Thanks
> > Nikhil
> >
> > On Sat, Oct 15, 2016 at 12:36 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
> >
> > On 10/14/2016 06:56 AM, Nikhil Utane wrote:
> > > Hi,
> > >
> > > Thank you for the responses so far.
> > > I added the reverse colocation as well. However, I am seeing another
> > > issue with resource movement that I am analyzing.
> > >
> > > Thinking further on this, why doesn't "a not with b" imply "b not
> > > with a"? Wouldn't putting "b with a" violate "a not with b"?
> > >
> > > Can someone confirm that colocation is required to be configured
> > > both ways?
> >
> > The anti-colocation constraint should only be defined one way.
> > Otherwise, you get a dependency loop (as seen in logs you showed
> > elsewhere).
> >
> > The one-way constraint is enough to keep the resources apart.
> > However, the question is whether the cluster might move resources
> > around unnecessarily.
> >
> > For example, "A not with B" means that the cluster will place B first,
> > then place A somewhere else. So, if B's node fails, can the cluster
> > decide that A's node is now the best place for B, and move A to a free
> > node, rather than simply start B on the free node?
> >
> > The cluster does take dependencies into account when placing a
> > resource, so I would hope that wouldn't happen. But I'm not sure.
> > Having some stickiness might help, so that A has some preference
> > against moving.
> >
> > > -Thanks
> > > Nikhil
> > >
> > >
> > > On Fri, Oct 14, 2016 at 1:09 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> > >
> > > On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > > >>>> Nikhil Utane <nikhil.subscribed at gmail.com> wrote on 13.10.2016 at
> > > >16:43 in message
> > > ><CAGNWmJUbPucnBGXroHkHSbQ0LXovwsLFPkUPg1R8gJqRFqM9Dg at mail.gmail.com>:
> > > >> Ulrich,
> > > >>
> > > >> I have only 4 resources (not 5; there are 5 nodes). So I only need
> > > >> 6 constraints, right?
> > > >>
> > > >> [,1] [,2] [,3] [,4] [,5] [,6]
> > > >> [1,] "A" "A" "A" "B" "B" "C"
> > > >> [2,] "B" "C" "D" "C" "D" "D"
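Both counts in this exchange are n(n-1)/2, the number of unordered pairs: 6 for four resources, 10 for five. A minimal Python equivalent of the pair table, using the resource names cu_2 to cu_5 from Nikhil's original message, emits one one-way constraint per pair in the "X with Y" form that pcs prints. `anti_colocation_pairs` is a hypothetical helper; the output is the same set of six colocation constraints shown in the pcs output of that message.

```python
from itertools import combinations
from math import comb


def anti_colocation_pairs(resources):
    """One (dependent, primary) pair per unordered pair of resources."""
    return [(b, a) for a, b in combinations(resources, 2)]


resources = ["cu_2", "cu_3", "cu_4", "cu_5"]
for dependent, primary in anti_colocation_pairs(resources):
    print(f"{dependent} with {primary} (score:-INFINITY)")

print(comb(4, 2), comb(5, 2))  # 6 10
```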
> > > >
> > > >Sorry for my confusion. As Andrei Borzenkov said in
> > > ><CAA91j0W+epAHFLg9u6VX_X8LgFkf9Rp55g3nocY4oZNA9BbZ+g@mail.gmail.com>,
> > > >you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
> > > >wonder whether an easier solution would be to use "utilization": if
> > > >every node has one token to give, and every resource needs one token,
> > > >no two resources will run on one node. That sounds like an easier
> > > >solution to me.
> > > >
> > > >Regards,
> > > >Ulrich
> > > >
> > > >
> > > >>
> > > >> I understand that if I configure a constraint of R1 with R2 with
> > > >> score -infinity, then the same applies to R2 with R1 at score
> > > >> -infinity (I don't have to configure it explicitly).
> > > >> I am not having a problem with multiple resources getting scheduled
> > > >> on the same node. Rather, one working resource is unnecessarily
> > > >> getting relocated.
> > > >>
> > > >> -Thanks
> > > >> Nikhil
> > > >>
> > > >>
> > > >> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > > >>
> > > >>> Hi!
> > > >>>
> > > >>> Don't you need 10 constraints, excluding every possible pair of your
> > > >>> 5 resources (named A-E here), like in this table (produced with R):
> > > >>>
> > > >>> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> > > >>> [1,] "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
> > > >>> [2,] "B" "C" "D" "E" "C" "D" "E" "D" "E" "E"
> > > >>>
> > > >>> Ulrich
> > > >>>
> > > >>> >>> Nikhil Utane <nikhil.subscribed at gmail.com> wrote on 13.10.2016
> > > >>> at 15:59 in message
> > > >>> <CAGNWmJW0CWMr3bvR3L9xZCAcJUzyczQbZEzUzpaJxi+Pn7Oj_A at mail.gmail.com>:
> > > >>> > Hi,
> > > >>> >
> > > >>> > I have 5 nodes and 4 resources configured.
> > > >>> > I have configured constraints such that no two resources can be
> > > >>> > co-located.
> > > >>> > I brought down a node (which happened to be the DC). I was
> > > >>> > expecting the resource on the failed node to be migrated to the
> > > >>> > 5th, waiting node (the one not running any resource).
> > > >>> > However, what happened was that the failed node's resource was
> > > >>> > started on another active node (after stopping that node's
> > > >>> > existing resource), and that node's resource was moved to the
> > > >>> > waiting node.
> > > >>> >
> > > >>> > What could I be doing wrong?
> > > >>> >
> > > >>> > <nvpair id="cib-bootstrap-options-have-watchdog" value="true" name="have-watchdog"/>
> > > >>> > <nvpair id="cib-bootstrap-options-dc-version" value="1.1.14-5a6cdd1" name="dc-version"/>
> > > >>> > <nvpair id="cib-bootstrap-options-cluster-infrastructure" value="corosync" name="cluster-infrastructure"/>
> > > >>> > <nvpair id="cib-bootstrap-options-stonith-enabled" value="false" name="stonith-enabled"/>
> > > >>> > <nvpair id="cib-bootstrap-options-no-quorum-policy" value="ignore" name="no-quorum-policy"/>
> > > >>> > <nvpair id="cib-bootstrap-options-default-action-timeout" value="240" name="default-action-timeout"/>
> > > >>> > <nvpair id="cib-bootstrap-options-symmetric-cluster" value="false" name="symmetric-cluster"/>
> > > >>> >
> > > >>> > # pcs constraint
> > > >>> > Location Constraints:
> > > >>> > Resource: cu_2
> > > >>> > Enabled on: Redun_CU4_Wb30 (score:0)
> > > >>> > Enabled on: Redund_CU2_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU3_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU5_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU1_WB30 (score:0)
> > > >>> > Resource: cu_3
> > > >>> > Enabled on: Redun_CU4_Wb30 (score:0)
> > > >>> > Enabled on: Redund_CU2_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU3_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU5_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU1_WB30 (score:0)
> > > >>> > Resource: cu_4
> > > >>> > Enabled on: Redun_CU4_Wb30 (score:0)
> > > >>> > Enabled on: Redund_CU2_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU3_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU5_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU1_WB30 (score:0)
> > > >>> > Resource: cu_5
> > > >>> > Enabled on: Redun_CU4_Wb30 (score:0)
> > > >>> > Enabled on: Redund_CU2_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU3_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU5_WB30 (score:0)
> > > >>> > Enabled on: Redund_CU1_WB30 (score:0)
> > > >>> > Ordering Constraints:
> > > >>> > Colocation Constraints:
> > > >>> > cu_3 with cu_2 (score:-INFINITY)
> > > >>> > cu_4 with cu_2 (score:-INFINITY)
> > > >>> > cu_4 with cu_3 (score:-INFINITY)
> > > >>> > cu_5 with cu_2 (score:-INFINITY)
> > > >>> > cu_5 with cu_3 (score:-INFINITY)
> > > >>> > cu_5 with cu_4 (score:-INFINITY)
> > > >>> >
> > > >>> > -Thanks
> > > >>> > Nikhil
> > > >>>
> > > >>>
> > > >>>
> > >
> > > Hi,
> > >
> > > Use of utilization (balanced strategy) has one caveat: resources are
> > > not moved just because one node's utilization is lower, when nodes
> > > have the same allocation score for the resource.
> > > So, after the simultaneous outage of two nodes in a 5-node cluster,
> > > it may appear that one node runs two resources and two recovered
> > > nodes run nothing.
> > >
> > > The original 'utilization' strategy only limits resource placement;
> > > it is not considered when choosing a node for a resource.
> > > Vladislav
>