[ClusterLabs] Re: Colocation constraint moving resource

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Wed Mar 27 04:43:27 EDT 2019


On Wed, 27 Mar 2019 08:10:23 +0100
"Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:

> >>> Ken Gaillot <kgaillot at redhat.com> wrote on 26.03.2019 at 20:28 in
> message
> <1d8d000ab946586783fc9adec3063a1748a5b06f.camel at redhat.com>:
> > On Tue, 2019-03-26 at 22:12 +0300, Andrei Borzenkov wrote:  
> >> 26.03.2019 17:14, Ken Gaillot wrote:  
> >> > On Tue, 2019-03-26 at 14:11 +0100, Thomas Singleton wrote:  
> >> > > Dear all
> >> > > 
> >> > > I am encountering an issue with colocation constraints. 
> >> > > 
> >> > > I have created a 4-node cluster (3 "main" and 1 "spare") with 3
> >> > > resources, and I wish to have each resource run only on its own
> >> > > node (or on the spare); the resources must never run together on
> >> > > the spare.
> >> > > 
> >> > > I understand that this implies defining priorities between the
> >> > > resources in case two nodes fail at the same time. This is the
> >> > > desired behavior: resource 1 is more important than resource 2,
> >> > > which is more important than resource 3. Thus, in the case of a
> >> > > multiple-node failure, the spare node must run the
> >> > > highest-priority resource, even if this means the lower-priority
> >> > > resources will be stopped.
> >> > > 
> >> > > When the resources are created with the adequate priorities but
> >> > > without colocation constraints, they do run on each "main" node
> >> > > as expected.
> >> > > 
> >> > > Trouble arises when I start adding the colocation constraints.
> >> > > 
> >> > > If I add one colocation constraint (resource3 cannot run with
> >> > > resource2), the resources remain correctly on their nodes.
> >> > > 
> >> > > But as soon as I add a second colocation constraint (resource2
> >> > > cannot run with resource1), resource1 switches to the spare node
> >> > > because the resource1 allocation score on node1 becomes
> >> > > -INFINITY, and I cannot understand why.
> >> > > 
> >> > > Setup and commands output below
> >> > > 
> >> > > Thank you
> >> > > 
> >> > > 
> >> > > ****************
> >> > > 
> >> > > Resources definition, opt-in cluster
> >> > > 
> >> > > # pcs property set symmetric-cluster=false
> >> > > # pcs resource create TestResourceNode1 ocf:pacemaker:Dummy op monitor interval=120s
> >> > > # pcs constraint location TestResourceNode1 prefers node1=100
> >> > > # pcs constraint location TestResourceNode1 prefers nodespare=80
> >> > > # pcs resource create TestResourceNode2 ocf:pacemaker:Dummy op monitor interval=120s
> >> > > # pcs constraint location TestResourceNode2 prefers node2=50
> >> > > # pcs constraint location TestResourceNode2 prefers nodespare=30
> >> > > # pcs resource create TestResourceNode3 ocf:pacemaker:Dummy op monitor interval=120s
> >> > > # pcs constraint location TestResourceNode3 prefers node3=10
> >> > > # pcs constraint location TestResourceNode3 prefers nodespare=3  
> >> > 
> >> > Side comment: Using different location constraint scores for each
> >> > resource doesn't establish a priority among the resources when they
> >> > can't all be run. For that, there is an actual "priority"
> >> > meta-attribute for resources, so you want to set that for all three.
> >> >   
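> >> > For example, a minimal sketch with pcs (the priority values here
> >> > are illustrative; higher means more important):
> >> > 
> >> > # pcs resource meta TestResourceNode1 priority=3
> >> > # pcs resource meta TestResourceNode2 priority=2
> >> > # pcs resource meta TestResourceNode3 priority=1
> >> > 
> >> > When not all resources can run, the lowest-priority ones are
> >> > stopped first.
> >> > 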
> >> > > # crm_simulate -sL
> >> > > 
> >> > > Current cluster status:
> >> > > Online: [ node1 node2 node3 nodespare ]
> >> > > 
> >> > >  TestResourceNode1	(ocf::pacemaker:Dummy):	Started node1
> >> > >  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
> >> > >  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
> >> > > 
> >> > > Allocation scores:
> >> > > native_color: TestResourceNode1 allocation score on node1: 100
> >> > > native_color: TestResourceNode1 allocation score on nodespare: 80
> >> > > native_color: TestResourceNode2 allocation score on node2: 50
> >> > > native_color: TestResourceNode2 allocation score on nodespare: 30
> >> > > native_color: TestResourceNode3 allocation score on node3: 10
> >> > > native_color: TestResourceNode3 allocation score on nodespare: 3
> >> > > 
> >> > > # pcs constraint colocation add TestResourceNode3 with TestResourceNode2 score=-INFINITY
> >> > > # crm_simulate -sL
> >> > > 
> >> > > Current cluster status:
> >> > > Online: [ node1 node2 node3 nodespare ]
> >> > > 
> >> > >  TestResourceNode1	(ocf::pacemaker:Dummy):	Started node1
> >> > >  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
> >> > >  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
> >> > > 
> >> > > Allocation scores:
> >> > > native_color: TestResourceNode1 allocation score on node1: 100
> >> > > native_color: TestResourceNode1 allocation score on nodespare: 80
> >> > > native_color: TestResourceNode2 allocation score on node2: 50
> >> > > native_color: TestResourceNode2 allocation score on nodespare: 27
> >> > > native_color: TestResourceNode3 allocation score on node3: 10
> >> > > native_color: TestResourceNode3 allocation score on nodespare: 3
> >> > > 
> >> > > # pcs constraint colocation add TestResourceNode2 with TestResourceNode1 score=-INFINITY
> >> > > # crm_simulate -sL
> >> > > 
> >> > > Current cluster status:
> >> > > Online: [ node1 node2 node3 nodespare ]
> >> > > 
> >> > >  TestResourceNode1	(ocf::pacemaker:Dummy):	Started nodespare
> >> > >  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
> >> > >  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
> >> > > 
> >> > > Allocation scores:
> >> > > native_color: TestResourceNode1 allocation score on node1: -INFINITY
> >> > > native_color: TestResourceNode1 allocation score on nodespare: 53
> >> > > native_color: TestResourceNode2 allocation score on node2: 50
> >> > > native_color: TestResourceNode2 allocation score on nodespare: -INFINITY
> >> > > native_color: TestResourceNode3 allocation score on node3: 10
> >> > > native_color: TestResourceNode3 allocation score on nodespare: 3  
> >> > 
> >> > This seems like a bug to me. Can you attach (or e-mail me privately)
> >> > the pe-input file that led to the above situation?
> >> >   
> >> 
> >> What apparently happens is a problem with INFINITY math. We have the
> >> chain
> >> 
> >> TestResourceNode3 -> TestResourceNode2 -> TestResourceNode1
> >> 
> >> from the colocation constraints
> >> 
> >> colocate(TestResourceNode3, TestResourceNode2, -INFINITY)
> >> colocate(TestResourceNode2, TestResourceNode1, -INFINITY)
> >> 
> >> TestResourceNode1 gets score 100 on node1 and then tries to include
> >> the score of TestResourceNode2. The factor is -1 (-INFINITY/INFINITY)
> >> and the score on the node is -INFINITY, so the result is
> >> 100 + (-1)*(-INFINITY) == INFINITY. The next step is to include the
> >> score of TestResourceNode3. The problem is, rsc_merge_weights flips
> >> the factor:
> >> 
> >>     if (factor < 0) {
> >>         multiplier = -1;
> >>     }
> >> ...
> >>             work = rsc_merge_weights(other, rhs, work,
> >>                                      constraint->node_attribute,
> >>                                      multiplier * (float)constraint->score / INFINITY,
> >>                                      flags|pe_weights_rollback);
> >> 
> >> so the factor becomes (-1)*(-INFINITY/INFINITY) == 1. The score of
> >> TestResourceNode3 on node1 is -INFINITY, so pacemaker adds
> >> (1)*(-INFINITY) == -INFINITY, with the final result -INFINITY,
> >> blocking TestResourceNode1 from node1.  
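> >> 
> >> To make the arithmetic concrete, here is a minimal standalone sketch
> >> (not pacemaker's actual code) of the clamped score addition described
> >> above, where -INFINITY dominates +INFINITY:
> >> 
> >> #include <stdio.h>
> >> 
> >> #define INF 1000000  /* pacemaker stores INFINITY internally as 1000000 */
> >> 
> >> /* Clamped addition: either operand at +/-INFINITY pins the result,
> >>  * and -INFINITY wins over +INFINITY. */
> >> static int merge_weights(int w1, int w2)
> >> {
> >>     if (w1 <= -INF || w2 <= -INF) {
> >>         return -INF;
> >>     }
> >>     if (w1 >= INF || w2 >= INF) {
> >>         return INF;
> >>     }
> >>     return w1 + w2;
> >> }
> >> 
> >> int main(void)
> >> {
> >>     int score = 100;  /* TestResourceNode1's location preference on node1 */
> >> 
> >>     /* factor -1 applied to TestResourceNode2's -INFINITY: 100 + INF == INF */
> >>     score = merge_weights(score, -1 * -INF);
> >> 
> >>     /* flipped factor 1 applied to TestResourceNode3's -INFINITY: -INF wins */
> >>     score = merge_weights(score, 1 * -INF);
> >> 
> >>     printf("%d\n", score);  /* prints -1000000: node1 is blocked */
> >>     return 0;
> >> }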
> > 
> > Your explanation triggered a distant memory :)
> > 
> > This is a similar situation to CLBZ#5320:
> > https://bugs.clusterlabs.org/show_bug.cgi?id=5320 
> > 
> > The solution I recommended there is probably the best for this one,
> > too: use utilization attributes instead of colocation constraints to
> > keep the resources on different nodes.  
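> > 
> > A rough sketch of that approach (the attribute name "capacity" and
> > the values are illustrative):
> > 
> > # pcs property set placement-strategy=balanced
> > # pcs node utilization node1 capacity=1
> > # pcs node utilization node2 capacity=1
> > # pcs node utilization node3 capacity=1
> > # pcs node utilization nodespare capacity=1
> > # pcs resource utilization TestResourceNode1 capacity=1
> > # pcs resource utilization TestResourceNode2 capacity=1
> > # pcs resource utilization TestResourceNode3 capacity=1
> > 
> > With each node's capacity set to 1 and each resource consuming 1, no
> > two of these resources can ever be placed on the same node, without
> > any colocation constraints.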
> 
> Sounds like the "carpet solution" (lift the carpet, move the dirt under
> it, then lower the carpet, and everything looks just fine) ;-)
> 
> No efforts to clean up the mess or document the brokenness?

Agreed. This looks like a fairly common scenario. At the least, it deserves to
be properly documented, if not fixed.

