[ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover
Nikhil Utane
nikhil.subscribed at gmail.com
Mon Oct 17 14:55:44 UTC 2016
I see these log messages:
pengine: info: rsc_merge_weights: cu_4: Rolling back scores from cu_3
pengine: debug: native_assign_node: Assigning Redun_CU4_Wb30 to cu_4
pengine: info: rsc_merge_weights: cu_3: Rolling back scores from cu_2
pengine: debug: native_assign_node: Assigning Redund_CU5_WB30 to cu_3
It looks like rolling back the scores is what causes the new decision to
relocate the resources.
Am I using the scores incorrectly?
[root@Redund_CU5_WB30 root]# pcs constraint
Location Constraints:
Resource: cu_2
Enabled on: Redun_CU4_Wb30 (score:0)
Enabled on: Redund_CU5_WB30 (score:0)
Enabled on: Redund_CU3_WB30 (score:0)
Enabled on: Redund_CU1_WB30 (score:0)
Resource: cu_3
Enabled on: Redun_CU4_Wb30 (score:0)
Enabled on: Redund_CU5_WB30 (score:0)
Enabled on: Redund_CU3_WB30 (score:0)
Enabled on: Redund_CU1_WB30 (score:0)
Resource: cu_4
Enabled on: Redun_CU4_Wb30 (score:0)
Enabled on: Redund_CU5_WB30 (score:0)
Enabled on: Redund_CU3_WB30 (score:0)
Enabled on: Redund_CU1_WB30 (score:0)
Ordering Constraints:
Colocation Constraints:
cu_2 with cu_4 (score:-INFINITY)
cu_3 with cu_4 (score:-INFINITY)
cu_2 with cu_3 (score:-INFINITY)
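A quick sanity check for a constraint set like the one above is to confirm that every unordered pair of resources is covered by a -INFINITY colocation. The sketch below is a hypothetical helper (`parse_colocations` and `uncovered_pairs` are illustrative names, not part of pcs); it assumes the colocation lines of `pcs constraint` keep the `X with Y (score:S)` shape shown above.

```python
import re
from itertools import combinations

# Colocation lines copied from the "pcs constraint" output above.
PCS_COLOCATIONS = """\
cu_2 with cu_4 (score:-INFINITY)
cu_3 with cu_4 (score:-INFINITY)
cu_2 with cu_3 (score:-INFINITY)
"""

# Assumed line shape: "<rsc> with <with-rsc> (score:<score>)"
LINE_RE = re.compile(r"^(\S+) with (\S+) \(score:(\S+)\)$")


def parse_colocations(text):
    """Return (rsc, with_rsc, score) tuples for every matching line."""
    result = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            result.append(m.groups())
    return result


def uncovered_pairs(resources, colocations):
    """Unordered pairs with no -INFINITY colocation in either direction."""
    covered = {
        frozenset((rsc, with_rsc))
        for rsc, with_rsc, score in colocations
        if score == "-INFINITY"
    }
    return [pair for pair in combinations(sorted(resources), 2)
            if frozenset(pair) not in covered]


colocations = parse_colocations(PCS_COLOCATIONS)
print(uncovered_pairs(["cu_2", "cu_3", "cu_4"], colocations))  # []
```

Because a pair counts as covered whichever direction it is written in, this check also matches the one-way style recommended later in the thread.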
On Mon, Oct 17, 2016 at 8:16 PM, Nikhil Utane <nikhil.subscribed at gmail.com>
wrote:
> This is driving me insane.
>
> This is how the resources were started. Redund_CU1_WB30 was the DC which
> I rebooted.
> cu_4 (ocf::redundancy:RedundancyRA): Started Redund_CU1_WB30
> cu_2 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30
> cu_3 (ocf::redundancy:RedundancyRA): Started Redun_CU4_Wb30
>
> Since the standby node was not up, I was expecting resource cu_4 to wait
> to be scheduled.
> Instead, it rearranged everything as below:
> cu_4 (ocf::redundancy:RedundancyRA): Started Redun_CU4_Wb30
> cu_2 (ocf::redundancy:RedundancyRA): Stopped
> cu_3 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30
>
> There is not much information in the logs on the new DC. They only show
> what it decided to do, with nothing to suggest why it did it that way.
>
> notice: Start cu_4 (Redun_CU4_Wb30)
> notice: Stop cu_2 (Redund_CU5_WB30)
> notice: Move cu_3 (Started Redun_CU4_Wb30 -> Redund_CU5_WB30)
>
> I have the default stickiness set to 100, which is higher than any score
> I have configured.
> I have migration-threshold set to 1. Should I bump that up instead?
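To see why stickiness alone would be expected to keep cu_3 where it was, here is a deliberately simplified, hypothetical scoring model: a node's score is its location score, plus resource-stickiness if the resource already runs there, or -INFINITY if the node is banned or has no location constraint (symmetric-cluster=false). It is not Pacemaker's allocator. In particular, it leaves out the merging of colocation scores that the "Rolling back scores" messages refer to, which is exactly where the real result can differ from this simple picture. It also assumes the standby node that was down is Redund_CU3_WB30.

```python
NEG_INF = float("-inf")


def node_scores(nodes, location, current_node, stickiness, banned=frozenset()):
    """Toy per-node score for one resource (NOT Pacemaker's real allocator).

    location: node -> location-constraint score. With
    symmetric-cluster=false, a node without one is not allowed.
    """
    scores = {}
    for node in nodes:
        if node in banned or node not in location:
            scores[node] = NEG_INF
            continue
        score = location[node]
        if node == current_node:
            score += stickiness  # resource-stickiness rewards staying put
        scores[node] = score
    return scores


# Nodes still up: Redund_CU1_WB30 was rebooted and the standby node
# (assumed to be Redund_CU3_WB30) was not up. All location scores are 0.
nodes = ["Redun_CU4_Wb30", "Redund_CU5_WB30"]
location = {node: 0 for node in nodes}

scores = node_scores(nodes, location, current_node="Redun_CU4_Wb30", stickiness=100)
print(scores)                      # {'Redun_CU4_Wb30': 100, 'Redund_CU5_WB30': 0}
print(max(scores, key=scores.get))  # Redun_CU4_Wb30
```

In this toy model cu_3 stays on Redun_CU4_Wb30. The fact that the cluster moved it anyway points at the colocation-driven part of the calculation rather than at stickiness or migration-threshold.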
>
> -Thanks
> Nikhil
>
> On Sat, Oct 15, 2016 at 12:36 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>
>> On 10/14/2016 06:56 AM, Nikhil Utane wrote:
>> > Hi,
>> >
>> > Thank you for the responses so far.
>> > I added the reverse colocation as well. However, I am seeing another
>> > issue with resource movement that I am analyzing.
>> >
>> > Thinking further on this, why doesn't "a not with b" imply "b not
>> > with a"?
>> > Wouldn't putting "b with a" violate "a not with b"?
>> >
>> > Can someone confirm that colocation is required to be configured
>> > both ways?
>>
>> The anti-colocation should only be defined one-way. Otherwise, you get a
>> dependency loop (as seen in logs you showed elsewhere).
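Ken's point about dependency loops can be checked mechanically. Treat each colocation "rsc with with-rsc" as an edge rsc -> with-rsc (rsc is placed relative to with-rsc) and look for a cycle. Defining the same pair in both directions creates the smallest possible cycle. `find_cycle` is a hypothetical helper written for illustration, not a pcs or Pacemaker tool.

```python
def find_cycle(colocations):
    """Return one dependency cycle as a list of resources, or None.

    Each (rsc, with_rsc) pair becomes an edge rsc -> with_rsc,
    i.e. "rsc is placed relative to with_rsc".
    """
    graph = {}
    for rsc, with_rsc in colocations:
        graph.setdefault(rsc, []).append(with_rsc)
        graph.setdefault(with_rsc, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# One-way constraints as in the original configuration: no loop.
one_way = [("cu_3", "cu_2"), ("cu_4", "cu_2"), ("cu_4", "cu_3"),
           ("cu_5", "cu_2"), ("cu_5", "cu_3"), ("cu_5", "cu_4")]
print(find_cycle(one_way))  # None

# Adding the reverse of one pair creates a loop.
print(find_cycle([("cu_2", "cu_3"), ("cu_3", "cu_2")]))  # ['cu_2', 'cu_3', 'cu_2']
```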
>>
>> The one-way constraint is enough to keep the resources apart. However,
>> the question is whether the cluster might move resources around
>> unnecessarily.
>>
>> For example, "A not with B" means that the cluster will place B first,
>> then place A somewhere else. So, if B's node fails, can the cluster
>> decide that A's node is now the best place for B, and move A to a free
>> node, rather than simply start B on the free node?
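The scenario above can be reproduced with a toy greedy allocator. It places resources in dependency order (B before A, since A is "not with B"), picks the highest-scoring allowed node, and breaks ties by node order. With `consider_dependents=False`, B lands on A's node and A is pushed to the free node. With `consider_dependents=True`, B is penalised for evicting a sticky dependent. Everything here (`place`, the tie-breaking, the penalty, the node names n1 to n3) is a hypothetical simplification, not how Pacemaker actually weighs dependents.

```python
def place(order, nodes, current, anti, stickiness, consider_dependents):
    """Toy greedy placement; NOT Pacemaker's real algorithm.

    order:   resources in placement order (with-targets first)
    nodes:   surviving nodes; list order acts as the tie-breaker
    current: resource -> node it was running on before the failure
    anti:    set of (rsc, with_rsc) pairs meaning "rsc not with with_rsc"
    """
    placement = {}
    for rsc in order:
        best_node, best_score = None, float("-inf")
        for node in nodes:
            # rsc must avoid any already-placed resource it is anti-colocated with
            if any(r == rsc and placement.get(w) == node for r, w in anti):
                continue
            score = stickiness if current.get(rsc) == node else 0
            if consider_dependents:
                # Penalise nodes where a dependent that must avoid rsc is sticking
                score -= sum(stickiness for r, w in anti
                             if w == rsc and current.get(r) == node)
            if score > best_score:
                best_node, best_score = node, score
        placement[rsc] = best_node
    return placement


# A ran on n1, B ran on n3; n3 has failed and n2 is the free node.
args = dict(order=["B", "A"], nodes=["n1", "n2"],
            current={"A": "n1", "B": "n3"}, anti={("A", "B")}, stickiness=100)
print(place(consider_dependents=False, **args))  # {'B': 'n1', 'A': 'n2'}
print(place(consider_dependents=True, **args))   # {'B': 'n2', 'A': 'n1'}
```

Note that A's own stickiness cannot help in the first case, because B is placed before A is even considered. Only a mechanism that looks at dependents while placing B avoids the extra move.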
>>
>> The cluster does take dependencies into account when placing a resource,
>> so I would hope that wouldn't happen. But I'm not sure. Having some
>> stickiness might help, so that A has some preference against moving.
>>
>> > -Thanks
>> > Nikhil
>> >
>> >
>> > On Fri, Oct 14, 2016 at 1:09 PM, Vladislav Bogdanov
>> > <bubble at hoster-ok.com> wrote:
>> >
>> > On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl
>> > <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>> > >>>> Nikhil Utane <nikhil.subscribed at gmail.com> wrote on 13.10.2016
>> > >at 16:43 in message
>> > ><CAGNWmJUbPucnBGXroHkHSbQ0LXovwsLFPkUPg1R8gJqRFqM9Dg at mail.gmail.com>:
>> > >> Ulrich,
>> > >>
>> > >> I have only 4 resources (not 5; there are 5 nodes). So I only need
>> > >> 6 constraints, right?
>> > >>
>> > >> [,1] [,2] [,3] [,4] [,5] [,6]
>> > >> [1,] "A" "A" "A" "B" "B" "C"
>> > >> [2,] "B" "C" "D" "C" "D" "D"
>> > >
>> > >Sorry for my confusion. As Andrei Borzenkov said in
>> > ><CAA91j0W+epAHFLg9u6VX_X8LgFkf9Rp55g3nocY4oZNA9BbZ+g at mail.gmail.com>,
>> > >you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
>> > >wonder whether an easier solution would be to use "utilization": if
>> > >every node has one token to give, and every resource needs one token,
>> > >no two resources will run on one node. That sounds like an easier
>> > >solution to me.
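Ulrich's token idea is a capacity check: each node offers one unit, each resource consumes one, so any placement that puts two resources on one node exceeds capacity. Pacemaker does support node and resource utilization attributes together with the `placement-strategy` cluster property; the sketch below only models the arithmetic of that check, using a hypothetical `token` attribute and the hypothetical node names n1 to n5, and does not reproduce how Pacemaker chooses nodes.

```python
from collections import Counter


def fits(placement, capacity, demand):
    """True if no node's summed demand exceeds its capacity (toy check)."""
    used = Counter()
    for rsc, node in placement.items():
        used[node] += demand[rsc]
    return all(used[node] <= capacity.get(node, 0) for node in used)


# Hypothetical "token" utilization: every node offers 1, every resource needs 1.
capacity = {n: 1 for n in ["n1", "n2", "n3", "n4", "n5"]}
demand = {r: 1 for r in ["cu_2", "cu_3", "cu_4", "cu_5"]}

spread = {"cu_2": "n1", "cu_3": "n2", "cu_4": "n3", "cu_5": "n4"}
doubled = {"cu_2": "n1", "cu_3": "n1", "cu_4": "n3", "cu_5": "n4"}
print(fits(spread, capacity, demand))   # True
print(fits(doubled, capacity, demand))  # False
```

Unlike pairwise colocations, this needs no constraint per pair: adding a fifth resource only adds one more `demand` entry.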
>> > >
>> > >Regards,
>> > >Ulrich
>> > >
>> > >
>> > >>
>> > >> I understand that if I configure a constraint of R1 with R2 with a
>> > >> score of -INFINITY, then the same applies to R2 with R1 with a score
>> > >> of -INFINITY (I don't have to configure it explicitly).
>> > >> My problem is not multiple resources getting scheduled on the same
>> > >> node. Rather, one working resource is unnecessarily getting
>> > >> relocated.
>> > >>
>> > >> -Thanks
>> > >> Nikhil
>> > >>
>> > >>
>> > >> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl <
>> > >> Ulrich.Windl at rz.uni-regensburg.de> wrote:
>> > >>
>> > >>> Hi!
>> > >>>
>> > >>> Don't you need 10 constraints, excluding every possible pair of
>> > >>> your 5 resources (named A-E here), as in this table (produced
>> > >>> with R):
>> > >>>
>> > >>> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>> > >>> [1,] "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
>> > >>> [2,] "B" "C" "D" "E" "C" "D" "E" "D" "E" "E"
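The same table can be produced in Python with `itertools.combinations`. The count is n(n-1)/2, which gives 10 pairs for 5 resources and 6 for the 4 resources actually configured. This only illustrates the counting; whether each pair needs one constraint or two is what the rest of the thread discusses.

```python
from itertools import combinations
from math import comb


def anti_colocation_pairs(resources):
    """Every unordered pair of resources that must be kept apart."""
    return list(combinations(resources, 2))


for names in ("ABCDE", "ABCD"):
    pairs = anti_colocation_pairs(names)
    assert len(pairs) == comb(len(names), 2)  # n(n-1)/2
    print(len(names), "resources ->", len(pairs), "pairs:", pairs)
# 5 resources -> 10 pairs, 4 resources -> 6 pairs
```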
>> > >>>
>> > >>> Ulrich
>> > >>>
>> > >>> >>> Nikhil Utane <nikhil.subscribed at gmail.com> wrote on
>> > >>> 13.10.2016 at 15:59 in message
>> > >>> <CAGNWmJW0CWMr3bvR3L9xZCAcJUzyczQbZEzUzpaJxi+Pn7Oj_A at mail.gmail.com>:
>> > >>> > Hi,
>> > >>> >
>> > >>> > I have 5 nodes and 4 resources configured.
>> > >>> > I have configured constraints such that no two resources can be
>> > >>> > co-located.
>> > >>> > I brought down a node (which happened to be the DC). I was
>> > >>> > expecting the resource on the failed node to be migrated to the
>> > >>> > 5th, waiting node (the one not running any resource).
>> > >>> > However, what happened was that the failed node's resource was
>> > >>> > started on another active node (after stopping that node's
>> > >>> > existing resource), and that node's resource was moved to the
>> > >>> > waiting node.
>> > >>> >
>> > >>> > What could I be doing wrong?
>> > >>> >
>> > >>> > <nvpair id="cib-bootstrap-options-have-watchdog" value="true" name="have-watchdog"/>
>> > >>> > <nvpair id="cib-bootstrap-options-dc-version" value="1.1.14-5a6cdd1" name="dc-version"/>
>> > >>> > <nvpair id="cib-bootstrap-options-cluster-infrastructure" value="corosync" name="cluster-infrastructure"/>
>> > >>> > <nvpair id="cib-bootstrap-options-stonith-enabled" value="false" name="stonith-enabled"/>
>> > >>> > <nvpair id="cib-bootstrap-options-no-quorum-policy" value="ignore" name="no-quorum-policy"/>
>> > >>> > <nvpair id="cib-bootstrap-options-default-action-timeout" value="240" name="default-action-timeout"/>
>> > >>> > <nvpair id="cib-bootstrap-options-symmetric-cluster" value="false" name="symmetric-cluster"/>
>> > >>> >
>> > >>> > # pcs constraint
>> > >>> > Location Constraints:
>> > >>> > Resource: cu_2
>> > >>> > Enabled on: Redun_CU4_Wb30 (score:0)
>> > >>> > Enabled on: Redund_CU2_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU3_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU5_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU1_WB30 (score:0)
>> > >>> > Resource: cu_3
>> > >>> > Enabled on: Redun_CU4_Wb30 (score:0)
>> > >>> > Enabled on: Redund_CU2_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU3_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU5_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU1_WB30 (score:0)
>> > >>> > Resource: cu_4
>> > >>> > Enabled on: Redun_CU4_Wb30 (score:0)
>> > >>> > Enabled on: Redund_CU2_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU3_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU5_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU1_WB30 (score:0)
>> > >>> > Resource: cu_5
>> > >>> > Enabled on: Redun_CU4_Wb30 (score:0)
>> > >>> > Enabled on: Redund_CU2_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU3_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU5_WB30 (score:0)
>> > >>> > Enabled on: Redund_CU1_WB30 (score:0)
>> > >>> > Ordering Constraints:
>> > >>> > Colocation Constraints:
>> > >>> > cu_3 with cu_2 (score:-INFINITY)
>> > >>> > cu_4 with cu_2 (score:-INFINITY)
>> > >>> > cu_4 with cu_3 (score:-INFINITY)
>> > >>> > cu_5 with cu_2 (score:-INFINITY)
>> > >>> > cu_5 with cu_3 (score:-INFINITY)
>> > >>> > cu_5 with cu_4 (score:-INFINITY)
>> > >>> >
>> > >>> > -Thanks
>> > >>> > Nikhil
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> _______________________________________________
>> > >>> Users mailing list: Users at clusterlabs.org
>> > >>> http://clusterlabs.org/mailman/listinfo/users
>> > >>>
>> > >>> Project Home: http://www.clusterlabs.org
>> > >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > >>> Bugs: http://bugs.clusterlabs.org
>> > >
>> >
>> > Hi,
>> >
>> > Using utilization (with the balanced strategy) has one caveat:
>> > resources are not moved just because one node's utilization is lower
>> > when the nodes have the same allocation score for the resource.
>> > So, after a simultaneous outage of two nodes in a 5-node cluster, one
>> > node may end up running two resources while the two recovered nodes
>> > run nothing.
>> >
>> > The original 'utilization' strategy only limits resource placement;
>> > it is not considered when choosing a node for a resource.
>> >
>> > Vladislav
>>
>>
>
>