[ClusterLabs] Antw: Re: Colocation constraint moving resource

Andrei Borzenkov arvidjaar at gmail.com
Sun Mar 31 02:04:51 EDT 2019


27.03.2019 17:27, Ken Gaillot wrote:
> On Wed, 2019-03-27 at 09:43 +0100, Jehan-Guillaume de Rorthais wrote:
>> On Wed, 27 Mar 2019 08:10:23 +0100
>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>
>>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 26.03.2019 at 20:28 in
>>> message <1d8d000ab946586783fc9adec3063a1748a5b06f.camel at redhat.com>:
>>>> On Tue, 2019-03-26 at 22:12 +0300, Andrei Borzenkov wrote:  
>>>>> 26.03.2019 17:14, Ken Gaillot wrote:  
>>>>>> On Tue, 2019-03-26 at 14:11 +0100, Thomas Singleton wrote:  
>>>>>>> Dear all
>>>>>>>
>>>>>>> I am encountering an issue with colocation constraints.
>>>>>>>
>>>>>>> I have created a 4-node cluster (3 "main" and 1 "spare") with 3
>>>>>>> resources, and I wish to have each resource run only on its own node
>>>>>>> (or on the spare); resources must never run together on the spare.
>>>>>>>
>>>>>>> I understand that this implies a definition of priorities between
>>>>>>> resources should two nodes fail at the same time. This is the desired
>>>>>>> behavior: resource 1 is more important than resource 2, which is more
>>>>>>> important than resource 3. Thus, in the case of a multiple-node
>>>>>>> failure, the spare node must be running the higher-priority resource,
>>>>>>> even if this means the lower-priority resources will be stopped.
>>>>>>>
>>>>>>> When the resources are created with the adequate priorities but
>>>>>>> without colocation constraints, they are indeed running on each
>>>>>>> "main" node as expected.
>>>>>>>
>>>>>>> Trouble arises when I start adding the colocation constraints.
>>>>>>>
>>>>>>> If I add one colocation constraint (resource3 cannot run with
>>>>>>> resource2), resources remain correctly on their nodes.
>>>>>>>
>>>>>>> But as soon as I add a second colocation constraint (resource2 cannot
>>>>>>> run with resource1), resource1 switches to the spare node because the
>>>>>>> resource1 allocation score on node1 becomes -INFINITY, and I cannot
>>>>>>> understand why.
>>>>>>>
>>>>>>> Setup and commands output below
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>>
>>>>>>> ****************
>>>>>>>
>>>>>>> Resources definition, opt-in cluster
>>>>>>>
>>>>>>> # pcs property set symmetric-cluster=false
>>>>>>> # pcs resource create TestResourceNode1 ocf:pacemaker:Dummy op monitor interval=120s
>>>>>>> # pcs constraint location TestResourceNode1 prefers node1=100
>>>>>>> # pcs constraint location TestResourceNode1 prefers nodespare=80
>>>>>>> # pcs resource create TestResourceNode2 ocf:pacemaker:Dummy op monitor interval=120s
>>>>>>> # pcs constraint location TestResourceNode2 prefers node2=50
>>>>>>> # pcs constraint location TestResourceNode2 prefers nodespare=30
>>>>>>> # pcs resource create TestResourceNode3 ocf:pacemaker:Dummy op monitor interval=120s
>>>>>>> # pcs constraint location TestResourceNode3 prefers node3=10
>>>>>>> # pcs constraint location TestResourceNode3 prefers nodespare=3
>>>>>>
>>>>>> Side comment: Using different location constraint scores for each
>>>>>> resource doesn't establish a priority of resources if they can't
>>>>>> all be run. For that, there is an actual "priority" meta-attribute
>>>>>> for resources, so you want to set that for all three.
>>>>>>   
>>>>>>> # crm_simulate -sL
>>>>>>>
>>>>>>> Current cluster status:
>>>>>>> Online: [ node1 node2 node3 nodespare ]
>>>>>>>
>>>>>>>  TestResourceNode1	(ocf::pacemaker:Dummy):	Started node1
>>>>>>>  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
>>>>>>>  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
>>>>>>>
>>>>>>> Allocation scores:
>>>>>>> native_color: TestResourceNode1 allocation score on node1: 100
>>>>>>> native_color: TestResourceNode1 allocation score on nodespare: 80
>>>>>>> native_color: TestResourceNode2 allocation score on node2: 50
>>>>>>> native_color: TestResourceNode2 allocation score on nodespare: 30
>>>>>>> native_color: TestResourceNode3 allocation score on node3: 10
>>>>>>> native_color: TestResourceNode3 allocation score on nodespare: 3
>>>>>>>
>>>>>>> # pcs constraint colocation add TestResourceNode3 with TestResourceNode2 score=-INFINITY
>>>>>>> # crm_simulate -sL
>>>>>>>
>>>>>>> Current cluster status:
>>>>>>> Online: [ node1 node2 node3 nodespare ]
>>>>>>>
>>>>>>>  TestResourceNode1	(ocf::pacemaker:Dummy):	Started node1
>>>>>>>  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
>>>>>>>  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
>>>>>>>
>>>>>>> Allocation scores:
>>>>>>> native_color: TestResourceNode1 allocation score on node1: 100
>>>>>>> native_color: TestResourceNode1 allocation score on nodespare: 80
>>>>>>> native_color: TestResourceNode2 allocation score on node2: 50
>>>>>>> native_color: TestResourceNode2 allocation score on nodespare: 27
>>>>>>> native_color: TestResourceNode3 allocation score on node3: 10
>>>>>>> native_color: TestResourceNode3 allocation score on nodespare: 3
>>>>>>>
>>>>>>> # pcs constraint colocation add TestResourceNode2 with TestResourceNode1 score=-INFINITY
>>>>>>> # crm_simulate -sL
>>>>>>>
>>>>>>> Current cluster status:
>>>>>>> Online: [ node1 node2 node3 nodespare ]
>>>>>>>
>>>>>>>  TestResourceNode1	(ocf::pacemaker:Dummy):	Started nodespare
>>>>>>>  TestResourceNode2	(ocf::pacemaker:Dummy):	Started node2
>>>>>>>  TestResourceNode3	(ocf::pacemaker:Dummy):	Started node3
>>>>>>>
>>>>>>> Allocation scores:
>>>>>>> native_color: TestResourceNode1 allocation score on node1: -INFINITY
>>>>>>> native_color: TestResourceNode1 allocation score on nodespare: 53
>>>>>>> native_color: TestResourceNode2 allocation score on node2: 50
>>>>>>> native_color: TestResourceNode2 allocation score on nodespare: -INFINITY
>>>>>>> native_color: TestResourceNode3 allocation score on node3: 10
>>>>>>> native_color: TestResourceNode3 allocation score on nodespare: 3
>>>>>>
>>>>>> This seems like a bug to me. Can you attach (or e-mail me privately)
>>>>>> the pe-input file that led to the above situation?
>>>>>>   
>>>>>
>>>>> What apparently happens is a problem with INFINITY math. We have
>>>>> the chain
>>>>>
>>>>> TestResourceNode3 -> TestResourceNode2 -> TestResourceNode1
>>>>>
>>>>> from the colocation constraints
>>>>>
>>>>> colocate(TestResourceNode3, TestResourceNode2, -INFINITY)
>>>>> colocate(TestResourceNode2, TestResourceNode1, -INFINITY)
>>>>>
>>>>> TestResourceNode1 gets score 100 on node1 and then tries to include
>>>>> the score of TestResourceNode2. The factor is -1
>>>>> (-INFINITY/INFINITY), the score on the node is -INFINITY, so the
>>>>> result is 100 + (-1)*(-INFINITY) == INFINITY. The next step is to
>>>>> include the scores of TestResourceNode3. The problem is,
>>>>> rsc_merge_weights flips the factor:
>>>>>
>>>>>     if (factor < 0) {
>>>>>         multiplier = -1;
>>>>>     }
>>>>> ...
>>>>>     work = rsc_merge_weights(other, rhs, work,
>>>>>                              constraint->node_attribute,
>>>>>                              multiplier * (float)constraint->score / INFINITY,
>>>>>                              flags|pe_weights_rollback);
>>>>>
>>>>> so the factor becomes (-1)*(-INFINITY/INFINITY) == 1, the score of
>>>>> TestResourceNode3 on node1 is -INFINITY, so pacemaker adds
>>>>> (1)*(-INFINITY) == -INFINITY, with the final result -INFINITY,
>>>>> blocking TestResourceNode1 from node1.  
>>>>
>>>> Your explanation triggered a distant memory :)
>>>>
>>>> This is a similar situation to CLBZ#5320:
>>>> https://bugs.clusterlabs.org/show_bug.cgi?id=5320 
>>>>
>>>> The solution I recommended there is probably the best for this one,
>>>> too: use utilization attributes instead of colocation constraints to
>>>> keep the resources on different nodes.  
>>>
>>> Sounds like the "carpet solution" (lift the carpet, move the dirt
>>> under it, then lower the carpet, and everything looks just fine) ;-)
>>>
>>> No efforts to clean up the mess or document the brokenness?
>>
>> Agreed. This looks to be a fairly common scenario. At least, it
>> deserves to be properly documented, if not fixed.
> 


Pacemaker sorely lacks a basic explanation of its logic. "Everything is a
score" is very unusual to anyone coming from a different HA stack, and
this basic logic is not documented anywhere. There are scattered
descriptions of "weights", "scores", etc., but nowhere is it explained
how they are actually combined, where pacemaker starts, and how it
proceeds.

Yes, after lurking on this list for some time one can get a vague picture,
but it really should not be that difficult (for something that is at the
very heart of a high-availability solution).


> The recommended solution is a workaround until #5320 is fixed.
> Documenting it is a good idea; I'll try to do that.
> 
> At this point, it's not clear whether the behavior can be corrected
> automatically, or whether additional configuration syntax will be
> needed to handle the case.
> 

I have a feeling that this is a logical fallacy.

We know that positive and negative constraints are asymmetrical, to the
extent that pacemaker has special-cased them.

node_hash_update():

        if (factor < 0 && score < 0) {
            /* Negative preference for a node with a negative score
             * should not become a positive preference
             *
             * TODO - Decide if we want to filter only if weight == -INFINITY
             */

This actually explains why a single colocation constraint did not affect
the score of TestResourceNode2 on node2: the factor is -1, the score of
TestResourceNode3 is -INFINITY, so this constraint is simply ignored.
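
To make the arithmetic concrete, here is a minimal standalone sketch (my
own simplification for illustration, not pacemaker's actual code;
PCMK_INFINITY stands in for pacemaker's 1000000 score cap, and
merge_scores() mimics the documented rule that -INFINITY plus INFINITY
is -INFINITY):

#include <stdio.h>

#define PCMK_INFINITY 1000000

/* Combine two scores roughly the way pacemaker does: -INFINITY always
 * wins, then +INFINITY, otherwise a sum clamped to the score range. */
static int merge_scores(int a, int b)
{
    if (a <= -PCMK_INFINITY || b <= -PCMK_INFINITY) {
        return -PCMK_INFINITY;
    }
    if (a >= PCMK_INFINITY || b >= PCMK_INFINITY) {
        return PCMK_INFINITY;
    }
    int sum = a + b;
    if (sum > PCMK_INFINITY)  return PCMK_INFINITY;
    if (sum < -PCMK_INFINITY) return -PCMK_INFINITY;
    return sum;
}

/* Weigh in a colocated resource's score, with the filtering rule quoted
 * above: a negative factor combined with a negative score is ignored,
 * so it cannot become a positive preference. */
static int weigh_in(int node_score, float factor, int other_score)
{
    if (factor < 0 && other_score < 0) {
        return node_score;                              /* filtered */
    }
    return merge_scores(node_score, (int)(factor * (float)other_score));
}

int main(void)
{
    /* TestResourceNode2 on node2: location score 50; the anti-colocated
     * TestResourceNode3 has -INFINITY there -> filtered, stays 50. */
    printf("node2: %d\n", weigh_in(50, -1.0f, -PCMK_INFINITY));

    /* On nodespare, TestResourceNode3's score is positive (3), so the
     * constraint applies: 30 + (-1)*3 == 27, matching the crm_simulate
     * output above. */
    printf("nodespare: %d\n", weigh_in(30, -1.0f, 3));
    return 0;
}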

Unfortunately, pacemaker subtly inverts constraint polarity when
traversing a colocation chain.

Consider the general case of colocation(B,A,factor1). We take the score
of A and then (according to "Colocation Explained") we want to weigh in
B's score adjusted by factor1. So far so good.

Now there is colocation(C,B,factor2). Logically, to compute the score of
B before it is used in computing the score of A, we should take the score
of B and then weigh in the score of C adjusted by factor2. But that's not
what happens here. Instead, pacemaker *reverses* the polarity of the
second colocation, which now effectively becomes colocation(C,B,-factor2).
But positive and negative constraints are not symmetrical; INFINITY is
not the opposite of -INFINITY.
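
Here is the same kind of sketch for the chain traversal with the factor
flip (again my simplification, not pacemaker's code); it reproduces,
step by step as described earlier, the -INFINITY score of
TestResourceNode1 on node1 from the crm_simulate output:

#include <stdio.h>

#define PCMK_INFINITY 1000000

/* Simplified pacemaker-style score addition: -INFINITY always wins,
 * then +INFINITY, otherwise a sum clamped to the score range. */
static int merge_scores(int a, int b)
{
    if (a <= -PCMK_INFINITY || b <= -PCMK_INFINITY) {
        return -PCMK_INFINITY;
    }
    if (a >= PCMK_INFINITY || b >= PCMK_INFINITY) {
        return PCMK_INFINITY;
    }
    int sum = a + b;
    if (sum > PCMK_INFINITY)  return PCMK_INFINITY;
    if (sum < -PCMK_INFINITY) return -PCMK_INFINITY;
    return sum;
}

int main(void)
{
    /* TestResourceNode1 starts from its location score 100 on node1. */
    int work = 100;

    /* colocation(TestResourceNode2, TestResourceNode1, -INFINITY):
     * factor -1, TestResourceNode2's score on node1 is -INFINITY, so
     * 100 + (-1)*(-INFINITY) clamps to +INFINITY. */
    float factor = -1.0f;
    work = merge_scores(work, (int)(factor * (float)-PCMK_INFINITY));

    /* Recursing into colocation(TestResourceNode3, TestResourceNode2,
     * -INFINITY), the multiplier flips the factor: (-1)*(-1) == +1, so
     * TestResourceNode3's -INFINITY on node1 is added with its sign
     * kept, and -INFINITY wins. */
    float multiplier = (factor < 0) ? -1.0f : 1.0f;
    float factor2 = multiplier * -1.0f;                 /* becomes +1 */
    work = merge_scores(work, (int)(factor2 * (float)-PCMK_INFINITY));

    printf("TestResourceNode1 on node1: %d\n", work);   /* -1000000 */
    return 0;
}

The net effect is that a negative colocation two hops down the chain
behaves as if it targeted the top resource directly.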

With this (non-inverted) evaluation,

location(B,node1,-INFINITY)
location(C,node1,-INFINITY)
colocation(B,A,-INFINITY)
colocation(C,B,-INFINITY)

would result in first computing the score of B on node1: we evaluate
colocation(C,B,-INFINITY); we have factor -1 and C's score is -INFINITY,
so this colocation constraint is ignored according to the filtering rule,
and the score of B does not change.

The same now repeats for colocation(B,A,-INFINITY): the factor is -1,
B's score is -INFINITY, so the constraint is ignored.

That does not mean that we should allocate B first! We still start with
A and allocate A first; we just take into account the "true" scores the
dependent resources would have had once the overall allocation is
completed.
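
A sketch of this non-inverted evaluation, with the same simplified
helpers as before (again just an illustration of the proposal, not
working pacemaker code):

#include <stdio.h>

#define PCMK_INFINITY 1000000

/* Simplified pacemaker-style score addition, as in the sketches above. */
static int merge_scores(int a, int b)
{
    if (a <= -PCMK_INFINITY || b <= -PCMK_INFINITY) {
        return -PCMK_INFINITY;
    }
    if (a >= PCMK_INFINITY || b >= PCMK_INFINITY) {
        return PCMK_INFINITY;
    }
    int sum = a + b;
    if (sum > PCMK_INFINITY)  return PCMK_INFINITY;
    if (sum < -PCMK_INFINITY) return -PCMK_INFINITY;
    return sum;
}

/* The node_hash_update() filtering rule: a negative factor combined
 * with a negative score is ignored. */
static int weigh_in(int node_score, float factor, int other_score)
{
    if (factor < 0 && other_score < 0) {
        return node_score;                              /* filtered */
    }
    return merge_scores(node_score, (int)(factor * (float)other_score));
}

int main(void)
{
    /* Scores on node1: A prefers it, B and C are banned there. */
    int a = 100;
    int b = -PCMK_INFINITY;
    int c = -PCMK_INFINITY;

    /* colocation(C,B,-INFINITY) with its original factor -1: C's score
     * is negative, so the constraint is filtered; b stays -INFINITY. */
    b = weigh_in(b, -1.0f, c);

    /* colocation(B,A,-INFINITY): factor -1, B's score is negative, so
     * this constraint is filtered too; a stays 100. */
    a = weigh_in(a, -1.0f, b);

    printf("A on node1: %d\n", a);                      /* 100 */
    return 0;
}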

> Of course I would love all reported bugs to be fixed by tomorrow :) but
> the reality is developer time is in short supply and triage is
> practically the Trolley Problem. Volunteers are always welcome.
> 


