[ClusterLabs] Antw: Re: Colocation constraint moving resource

Ken Gaillot kgaillot at redhat.com
Wed Mar 27 10:27:32 EDT 2019


On Wed, 2019-03-27 at 09:43 +0100, Jehan-Guillaume de Rorthais wrote:
> On Wed, 27 Mar 2019 08:10:23 +0100
> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> 
> > > > > Ken Gaillot <kgaillot at redhat.com> wrote on 26.03.2019 at 20:28
> > > > > in message
> > > > > <1d8d000ab946586783fc9adec3063a1748a5b06f.camel at redhat.com>:
> > > On Tue, 2019-03-26 at 22:12 +0300, Andrei Borzenkov wrote:
> > > > On 26.03.2019 17:14, Ken Gaillot wrote:
> > > > > On Tue, 2019-03-26 at 14:11 +0100, Thomas Singleton wrote:  
> > > > > > Dear all
> > > > > > 
> > > > > > I am encountering an issue with colocation constraints.
> > > > > > 
> > > > > > I have created a 4-node cluster (3 "main" and 1 "spare") with 3
> > > > > > resources. I wish to have each resource run only on its own node
> > > > > > (or on the spare), and resources must never run together on the
> > > > > > spare.
> > > > > > 
> > > > > > I understand that this implies a definition of priorities
> > > > > > between resources should two nodes fail at the same time. This
> > > > > > is the desired behavior. Resource 1 is more important than
> > > > > > resource 2, which is more important than resource 3. Thus, in
> > > > > > the case of multiple node failures, the spare node must be
> > > > > > running the higher-priority resource, even if this means the
> > > > > > lower-priority resources will be stopped.
> > > > > > 
> > > > > > When the resources are created with the adequate priorities but
> > > > > > without colocation constraints, they are indeed running on each
> > > > > > "main" node as expected.
> > > > > > 
> > > > > > Trouble arises when I start adding the colocation constraints.
> > > > > > 
> > > > > > If I add one colocation constraint (resource3 cannot run with
> > > > > > resource2), the resources remain correctly on their nodes.
> > > > > > 
> > > > > > But as soon as I add a second colocation constraint (resource2
> > > > > > cannot run with resource1), resource1 switches to the spare
> > > > > > node, because the resource1 allocation score on node1 becomes
> > > > > > -INFINITY, and I cannot understand why.
> > > > > > 
> > > > > > Setup and command output below.
> > > > > > 
> > > > > > Thank you
> > > > > > 
> > > > > > ****************
> > > > > > 
> > > > > > Resources definition, opt-in cluster
> > > > > > 
> > > > > > # pcs property set symmetric-cluster=false
> > > > > > # pcs resource create TestResourceNode1 ocf:pacemaker:Dummy op monitor interval=120s
> > > > > > # pcs constraint location TestResourceNode1 prefers node1=100
> > > > > > # pcs constraint location TestResourceNode1 prefers nodespare=80
> > > > > > # pcs resource create TestResourceNode2 ocf:pacemaker:Dummy op monitor interval=120s
> > > > > > # pcs constraint location TestResourceNode2 prefers node2=50
> > > > > > # pcs constraint location TestResourceNode2 prefers nodespare=30
> > > > > > # pcs resource create TestResourceNode3 ocf:pacemaker:Dummy op monitor interval=120s
> > > > > > # pcs constraint location TestResourceNode3 prefers node3=10
> > > > > > # pcs constraint location TestResourceNode3 prefers nodespare=3
> > > > > 
> > > > > Side comment: Using different location constraint scores for each
> > > > > resource doesn't establish a priority of resources if they can't
> > > > > all be run. For that, there is an actual "priority" meta-attribute
> > > > > for resources, so you want to set that for all three.
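> > > > > 
> > > > > For example, something along these lines (untested sketch, reusing
> > > > > your resource names; higher number = more important) would rank
> > > > > TestResourceNode1 highest and TestResourceNode3 lowest:
> > > > > 
> > > > > # pcs resource meta TestResourceNode1 priority=3
> > > > > # pcs resource meta TestResourceNode2 priority=2
> > > > > # pcs resource meta TestResourceNode3 priority=1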
> > > > >   
> > > > > > # crm_simulate -sL
> > > > > > 
> > > > > > Current cluster status:
> > > > > > Online: [ node1 node2 node3 nodespare ]
> > > > > > 
> > > > > >  TestResourceNode1	(ocf::pacemaker:Dummy):	Started node1
> > > > > >  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
> > > > > >  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
> > > > > > 
> > > > > > Allocation scores:
> > > > > > native_color: TestResourceNode1 allocation score on node1: 100
> > > > > > native_color: TestResourceNode1 allocation score on nodespare: 80
> > > > > > native_color: TestResourceNode2 allocation score on node2: 50
> > > > > > native_color: TestResourceNode2 allocation score on nodespare: 30
> > > > > > native_color: TestResourceNode3 allocation score on node3: 10
> > > > > > native_color: TestResourceNode3 allocation score on nodespare: 3
> > > > > > 
> > > > > > # pcs constraint colocation add TestResourceNode3 with TestResourceNode2 score=-INFINITY
> > > > > > # crm_simulate -sL
> > > > > > 
> > > > > > Current cluster status:
> > > > > > Online: [ node1 node2 node3 nodespare ]
> > > > > > 
> > > > > >  TestResourceNode1	(ocf::pacemaker:Dummy):	Started node1
> > > > > >  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
> > > > > >  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
> > > > > > 
> > > > > > Allocation scores:
> > > > > > native_color: TestResourceNode1 allocation score on node1: 100
> > > > > > native_color: TestResourceNode1 allocation score on nodespare: 80
> > > > > > native_color: TestResourceNode2 allocation score on node2: 50
> > > > > > native_color: TestResourceNode2 allocation score on nodespare: 27
> > > > > > native_color: TestResourceNode3 allocation score on node3: 10
> > > > > > native_color: TestResourceNode3 allocation score on nodespare: 3
> > > > > > 
> > > > > > # pcs constraint colocation add TestResourceNode2 with TestResourceNode1 score=-INFINITY
> > > > > > # crm_simulate -sL
> > > > > > 
> > > > > > Current cluster status:
> > > > > > Online: [ node1 node2 node3 nodespare ]
> > > > > > 
> > > > > >  TestResourceNode1	(ocf::pacemaker:Dummy):	Started nodespare
> > > > > >  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
> > > > > >  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
> > > > > > 
> > > > > > Allocation scores:
> > > > > > native_color: TestResourceNode1 allocation score on node1: -INFINITY
> > > > > > native_color: TestResourceNode1 allocation score on nodespare: 53
> > > > > > native_color: TestResourceNode2 allocation score on node2: 50
> > > > > > native_color: TestResourceNode2 allocation score on nodespare: -INFINITY
> > > > > > native_color: TestResourceNode3 allocation score on node3: 10
> > > > > > native_color: TestResourceNode3 allocation score on nodespare: 3
> > > > > 
> > > > > This seems like a bug to me. Can you attach (or e-mail me
> > > > > privately) the pe-input file that led to the above situation?
> > > > > 
> > > > 
> > > > What apparently happens is a problem with INFINITY math. We have
> > > > the chain
> > > > 
> > > > TestResourceNode3 -> TestResourceNode2 -> TestResourceNode1
> > > > 
> > > > from the colocation constraints
> > > > 
> > > > colocate(TestResourceNode3, TestResourceNode2, -INFINITY)
> > > > colocate(TestResourceNode2, TestResourceNode1, -INFINITY)
> > > > 
> > > > TestResourceNode1 gets score 100 on node1 and then tries to include
> > > > the score of TestResourceNode2. The factor is -1
> > > > (-INFINITY/INFINITY) and the score on the node is -INFINITY, so the
> > > > result is 100 + (-1)*(-INFINITY) == INFINITY. The next step is to
> > > > include the scores of TestResourceNode3. The problem is,
> > > > rsc_merge_weights() flips the factor:
> > > > 
> > > >     if (factor < 0) {
> > > >         multiplier = -1;
> > > >     }
> > > > ...
> > > >             work = rsc_merge_weights(other, rhs, work,
> > > >                                      constraint->node_attribute,
> > > >                                      multiplier * (float)constraint->score / INFINITY,
> > > >                                      flags|pe_weights_rollback);
> > > > 
> > > > so the factor becomes (-1)*(-INFINITY/INFINITY) == 1, and the score
> > > > of TestResourceNode3 on node1 is -INFINITY, so pacemaker adds
> > > > (1)*(-INFINITY) == -INFINITY, with the final result -INFINITY,
> > > > blocking TestResourceNode1 from node1.
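> > > > 
> > > > In other words, with every intermediate sum clamped to the
> > > > +/-INFINITY range, the merge for TestResourceNode1 on node1 goes
> > > > roughly:
> > > > 
> > > >   start:                                  100
> > > >   merge TestResourceNode2, factor -1:     100 + (-1)*(-INFINITY)      -> INFINITY
> > > >   merge TestResourceNode3, factor flipped
> > > >   to +1:                                  INFINITY + (1)*(-INFINITY)  -> -INFINITY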
> > > 
> > > Your explanation triggered a distant memory :)
> > > 
> > > This is a similar situation to CLBZ#5320:
> > > https://bugs.clusterlabs.org/show_bug.cgi?id=5320
> > > 
> > > The solution I recommended there is probably the best for this one,
> > > too: use utilization attributes instead of colocation constraints to
> > > keep the resources on different nodes.
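> > > 
> > > Roughly (an untested sketch, with a made-up utilization attribute
> > > name): drop the two colocation constraints, give every node a
> > > capacity of 1, make every resource consume 1, and switch the
> > > placement strategy away from the default:
> > > 
> > > # pcs property set placement-strategy=balanced
> > > # pcs node utilization node1 capacity=1
> > > # pcs node utilization node2 capacity=1
> > > # pcs node utilization node3 capacity=1
> > > # pcs node utilization nodespare capacity=1
> > > # pcs resource utilization TestResourceNode1 capacity=1
> > > # pcs resource utilization TestResourceNode2 capacity=1
> > > # pcs resource utilization TestResourceNode3 capacity=1
> > > 
> > > With at most one unit of capacity per node, no two of the resources
> > > can land on the same node, and the "priority" meta-attribute decides
> > > which one gets the spare when nodes fail.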
> > 
> > Sounds like the "carpet solution" (lift the carpet, move the dirt
> > under it, then lower the carpet, and everything looks just fine) ;-)
> > 
> > No effort to clean up the mess or document the brokenness?
> 
> Agreed. This looks to be a fairly common scenario. At least, it
> deserves to be properly documented, if not fixed.

The recommended solution is a workaround until #5320 is fixed.
Documenting it is a good idea; I'll try to do that.

At this point, it's not clear whether the behavior can be corrected
automatically, or whether additional configuration syntax will be
needed to handle the case.

Of course I would love all reported bugs to be fixed by tomorrow :) but
the reality is that developer time is in short supply and triage is
practically the Trolley Problem. Volunteers are always welcome.
-- 
Ken Gaillot <kgaillot at redhat.com>


