[ClusterLabs] Question about ping nodes

Ken Gaillot kgaillot at redhat.com
Mon Apr 19 12:15:26 EDT 2021


On Mon, 2021-04-19 at 15:02 +0000, Walker, Chris wrote:
> I see the same behavior as Andrei.  On a two-node test cluster
> (symmetric-cluster=true, resource-stickiness=100) I configure a
> simple Dummy resource and add the rule
>  
>       <rsc_location id="dummy_loc" rsc="dummy">
>         <rule score="-10000" id="dummy_loc-rule-0">
>           <expression attribute="mgmt" operation="lt" value="1"
>                       type="number" id="dummy_loc-rule-0-expression"/>
>         </rule>
>       </rsc_location>
>  
> If I then change the value of ‘mgmt’ to zero for both nodes, the
> resource stops.  We have code much like Andrei outlined to work
> around this behavior.

Well, this is a new one to me. I've confirmed that this is the
intentional behavior, and the documentation is an oversimplification.

The documentation is partly correct: a -INFINITY score means the node
is *never* eligible to run the resource, and a nonnegative score means
the node *may* be eligible to run the resource under certain
circumstances.

What it omits is what those circumstances are: when some other
positive factor, such as another constraint or stickiness, outweighs
the negative score.
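
To make that concrete with the example above (assuming no other scores
are in play): with the -10000 rule and Chris's resource-stickiness=100,
a node that loses connectivity while running the resource totals
-10000 + 100 = -9900. That is still negative, so the resource stops; a
stickiness of more than 10000 would keep the total positive and let
the resource stay put.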

That does give an interesting configuration possibility: with a
stickiness able to outweigh the location constraint, any node running a
resource when connectivity is lost would be able to continue running
the resource, but once the resource stopped, it couldn't be started
again.
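
For example, something like this on the dummy resource would produce
that effect (just a sketch; 20000 is an arbitrary value chosen to
outweigh the -10000 rule):

  <primitive id="dummy" class="ocf" provider="pacemaker" type="Dummy">
    <meta_attributes id="dummy-meta_attributes">
      <!-- stickiness large enough to outweigh the ping rule -->
      <nvpair id="dummy-meta_attributes-resource-stickiness"
              name="resource-stickiness" value="20000"/>
    </meta_attributes>
  </primitive>

A node running the resource would then total -10000 + 20000 = 10000
after losing connectivity, so it keeps the resource; once the resource
stopped, it would see only the -10000 everywhere and couldn't start
until connectivity returned.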

Anyway, a workaround to achieve the desired effect would be to give
positive location constraints for the resource on all nodes, equal in
magnitude to the negative constraint score. Because the positive scores
are all equal, they shouldn't affect which node is chosen, but any node
that loses connectivity will have its score drop to 0 rather than
negative, allowing it to run resources, while still preferring any node
with connectivity.
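
For the two-node example above, that could look like this (a sketch;
node1 and node2 are placeholder node names, and the +10000 scores
match the -10000 rule):

  <!-- equal positive preference on every node, same magnitude as
       the negative ping rule -->
  <rsc_location id="dummy-on-node1" rsc="dummy" node="node1" score="10000"/>
  <rsc_location id="dummy-on-node2" rsc="dummy" node="node2" score="10000"/>

A node with connectivity then totals 10000, while a node that loses it
totals 10000 - 10000 = 0, which is nonnegative and therefore still
eligible.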

I've filed a bug to update either the documentation or the behavior
(I'm not sure which would be better at this point):

  https://bugs.clusterlabs.org/show_bug.cgi?id=5472

> Thanks,
> Chris
>  
> From: Users <users-bounces at clusterlabs.org>
> Date: Monday, April 19, 2021 at 10:28 AM
> To: Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Question about ping nodes
> 
> On Sun, 2021-04-18 at 17:31 +0300, Andrei Borzenkov wrote:
> > On 18.04.2021 08:41, Andrei Borzenkov wrote:
> > > On 17.04.2021 22:41, Piotr Kandziora wrote:
> > > > Hi,
> > > > 
> > > > Hope some guru will advise here ;)
> > > > 
> > > > I've got a two-node cluster with some resource placement
> > > > dependent on ping node visibility (
> > > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-moving_resources_due_to_connectivity_changes-haar
> > > > ).
> > > > 
> > > > Is it possible to do nothing with these resources when both
> > > > nodes do not have access to the ping node?
> > > > 
> > > > Currently, when the ping node is unavailable (the ping node
> > > > itself becomes unavailable), both nodes stop the resources.
> > > > 
> > > 
> > > Just use any negative score higher than -INFINITY in the
> > > location constraint.
> > > 
> > 
> > No, it does not work. I was misled by the documentation (5.2.1
> > location properties):
> 
> Actually your initial interpretation was correct. Using a
> non-infinite negative score for the location constraint should work
> as expected -- the resource can still run there if there's no better
> place.
> 
> I'm not sure why you didn't see that in your test, maybe some other
> factor prevented it from running?
> 
> BTW another solution to the initial problem would be to use multiple
> IPs in the ping agent. It would be less likely for all of them to be
> down at the same time without a network issue.
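> 
> Something along these lines (a sketch; the addresses are
> placeholders, and the ping resource would normally be cloned so it
> runs on every node):
> 
>   <primitive id="ping" class="ocf" provider="pacemaker" type="ping">
>     <instance_attributes id="ping-instance_attributes">
>       <!-- host_list takes a space-separated list of hosts to ping -->
>       <nvpair id="ping-instance_attributes-host_list"
>               name="host_list" value="192.0.2.1 192.0.2.2 192.0.2.3"/>
>     </instance_attributes>
>   </primitive>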
> 
> > 
> > score:
> > 
> > Negative values indicate the resource(s) should avoid this node
> > (a value of -INFINITY changes "should" to "must").
> > 
> > I interpreted "should" as "pacemaker will normally avoid this node
> > but still may choose it if nothing better is possible". That is
> > not what happens. Apparently a negative score completely prevents
> > assigning the resource to this node, and "should" here probably
> > means "it is still possible that the final score may become
> > positive".
> > 
> > As it is not possible to refer to attributes of multiple nodes in
> > a rule, you would need something that combines the current pingd
> > status for the individual nodes and makes it available. The
> > logical place is the ocf:pacemaker:ping resource agent itself.
-- 
Ken Gaillot <kgaillot at redhat.com>


