[ClusterLabs] Trying to understand dampening (ping)

Klaus Wenninger kwenning at redhat.com
Fri Oct 15 02:24:33 EDT 2021


On Thu, Oct 14, 2021 at 10:51 PM martin doc <db1280 at hotmail.com> wrote:

>
>
> ------------------------------
> *From:* Andrei Borzenkov <arvidjaar at gmail.com>, Friday, 15 October 2021 4:59 AM
> *...*
> > Dampening defines the delay before attributes are committed to the CIB.
> > Private attributes are never written into the CIB, so dampening
> > makes no sense here. Private attributes are managed by attrd
> > itself and you see the latest value.
>
> > If you change the value of a transient attribute (without the -p
> > option), you will see different values reported by
>
> > attrd_updater -n my_ping -Q
>
> > and
>
> > cibadmin -Q -A "//nvpair[@name='my_ping']"
>
> > until dampening timeout expires.
>
> > This applies even to deleting an attribute.
>
> Ok, now I understand what the dampen function does.
>
> If I understand this correctly then this probably makes every documented
> example of using ocf:pacemaker:ping with a colocation statement wrong
> because the only way to see the effect of dampen is to use a rule that
> references the value of pingd directly. That or the script for ping has a
> major flaw with respect to dampen.
>

As we've already tried to explain, the purpose of dampening is not to
implement any kind of resilience against the loss of a certain
percentage of packets or anything similar.

The basic idea is to have more than one ping host so that, given that
failure_score is low enough, there is a certain resilience against
packet loss. If your number of ping hosts isn't large enough, you might
play with adding them multiple times to get some kind of resilience.
But I agree that this one-out-of-two behavior is probably too resilient
for most cases, and thus there might be room for improvement.

The main pain point here is that the ping RA allows us to configure the
count of pings sent, but it just uses the exit value from ping, which
already indicates failure when one of the answers is missing.

This is why with
https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
I chose to take both the number of packets sent and the number of
replies that must be received for the host to be assumed alive. If we
assume the latter, when not given at all, to be equal to the number of
packets sent, we would preserve unchanged behavior for existing
configurations.
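
The threshold idea described above can be sketched in shell roughly as
follows. This is only an illustration under stated assumptions: the
helper names parse_received, host_alive and ping_score are hypothetical
and are not functions of ocf:pacemaker:ping or fence_heuristics_ping,
and the score model (one multiplier per host counted as alive) is a
simplification of what the ping RA does.

```shell
#!/bin/sh
# Sketch only: all helper names below are hypothetical.

# Extract the number of replies from a ping summary line such as
# "5 packets transmitted, 4 received, 20% packet loss, time 4005ms".
parse_received() {
    printf '%s\n' "$1" |
        sed -n 's/.*transmitted, \([0-9][0-9]*\) .*received.*/\1/p'
}

# host_alive SENT RECEIVED [REQUIRED]
# Succeeds when at least REQUIRED replies arrived. If REQUIRED is omitted
# or empty it defaults to SENT, i.e. every packet must be answered, which
# mirrors the "unchanged behavior" default suggested in the mail.
host_alive() {
    ha_sent=$1
    ha_received=$2
    ha_required=${3:-$ha_sent}
    [ "$ha_received" -ge "$ha_required" ]
}

# ping_score MULTIPLIER REQUIRED SENT:RECEIVED...
# Adds MULTIPLIER to the score for every host that counts as alive.
ping_score() {
    ps_multiplier=$1
    ps_required=$2
    shift 2
    ps_score=0
    for ps_result in "$@"; do
        if host_alive "${ps_result%%:*}" "${ps_result##*:}" "$ps_required"; then
            ps_score=$((ps_score + ps_multiplier))
        fi
    done
    echo "$ps_score"
}

# Usage: one lost reply out of five.
summary='5 packets transmitted, 4 received, 20% packet loss, time 4005ms'
replies=$(parse_received "$summary")
host_alive 5 "$replies" 4 && echo "alive when 4 of 5 replies are required"
host_alive 5 "$replies" || echo "not alive when all 5 replies are required"
```

With a single ping host and failure_score=1, the strict default yields a
score of 0 after one lost reply, whereas a required count of 4 keeps the
score at 1.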

Klaus


>
> That is when I do this:
>
> pcs resource create myPing ocf:pacemaker:ping host_list=192.168.1.1 \
>     failure_score=1
> pcs resource create database ocf:heartbeat:pgsql
> pcs resource group add pgrp myPing database
>
> PCS will move everything to a new node if there is even 1 ping failure
> because monitor in ping doesn't look at the dampened value, only at the
> immediately returned value.
>
> The same is true with colocation statements: if a constraint is made with
> a ping resource without using a rule that references pingd, then the
> dampen behaviour is ignored completely.
>
> Is the ping'er missing something that does this:
>
> score=`cibadmin -Q -A "//nvpair[@name='ping']" |
>     sed -e 's/.*value="\([^"]*\)".*/\1/'`
>
> before it checks if $score is less than $OCF_RESKEY_failure_score?
>
> Thanks