[ClusterLabs] Antw: [EXT] Re: Trying to understand dampening (ping)

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Oct 15 03:16:55 EDT 2021


Oh well, pingd is interesting:
My guess is that it was originally designed to check the connectivity of an interface by pinging some hosts, but some people seem to use it to check the reachability of a specific host.
Regardless of the number of packets sent, some non-binary behavior would be desirable: instead of setting the attribute to either 0 or 100 (for example), the value could _range_ from 0 to 1000, indicating the quality of the reachability. As said before, some moving average or exponential average, maybe.
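
To make the averaging idea concrete, here is a minimal sketch in plain shell arithmetic. It is not what pingd or ping do today; the function name ema_score, the 0..1000 scale and the 25% default weight are assumptions chosen only for illustration.

```shell
# ema_score PREVIOUS SAMPLE [WEIGHT_PERCENT]
# Hypothetical helper: exponentially weighted average on a 0..1000 scale.
# SAMPLE is the newest measurement (0 = nothing answered, 1000 = all answered).
# WEIGHT_PERCENT is the influence of the newest sample (assumed default: 25).
ema_score() {
    ema_prev=$1
    ema_sample=$2
    ema_weight=${3:-25}
    echo $(( (ema_weight * ema_sample + (100 - ema_weight) * ema_prev) / 100 ))
}

# Three completely lost rounds in a row, starting from a perfect score.
score=1000
for sample in 0 0 0; do
    score=$(ema_score "$score" "$sample")
    echo "$score"
done
```

With these assumed numbers a single lost round lowers the value to 750 instead of dropping it straight to 0, and the value keeps sinking only if the loss persists.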

When trying to find out more about pingd, I found this interesting thing in SLES15 SP2 (resource-agents-4.4.0+git57.70549516-3.36.1.x86_64):
"crm ra info pingd" reports:
---
Monitors connectivity to specific hosts or
IP addresses ("ping nodes") (deprecated) (ocf:heartbeat:pingd)

Deprecation warning: This agent is deprecated and may be removed from
a future release. See the ocf:pacemaker:pingd resource agent for a
supported alternative. --
This is a pingd Resource Agent.
...
---

However, when I use the recommended "crm ra info ocf:pacemaker:pingd", I also get:
---
pingd resource agent (ocf:pacemaker:pingd)

This agent (ocf:pacemaker:pingd) is deprecated and broken, and has been
replaced by the more reliable ocf:pacemaker:ping. It records (in the CIB)
the current number of ping nodes (specified in the 'host_list' parameter)
a cluster node can connect to.
---
The final ocf:pacemaker:ping still has the same poor description:
---
dampen (integer, [5s]): Dampening interval
    The time to wait (dampening) further changes occur
---

(IMHO "wait ... _for_ further changes _to_ occur" would be a halfway correct sentence)

Regards,
Ulrich


>>> Klaus Wenninger <kwenning at redhat.com> wrote on 15.10.2021 at 08:24 in
message
<CALrDAo3C5DOHx2KrUqLLwnqeo9YmoD0ygDzsrt54o6xe1Yz+GQ at mail.gmail.com>:
> On Thu, Oct 14, 2021 at 10:51 PM martin doc <db1280 at hotmail.com> wrote:
> 
>>
>>
>> ------------------------------
>> *From: *Andrei Borzenkov <arvidjaar at gmail.com>,  Friday, 15 October 2021
>> 4:59 AM
>> *...*
>> > Dampening defines delay before attributes are committed to CIB.
>> > Private attributes are never ever written into CIB, so dampening
>> > makes no sense here. Private attributes are managed by attrd
>> > itself and you see the latest value.
>>
>> > If you change transient attribute (without -p option) value you
>> > will see different values reported by
>>
>> > attrd_updater -n my_ping -Q
>>
>> > and
>>
>> > cibadmin -Q -A "//nvpair[@name='my_ping']"
>>
>> > until dampening timeout expires.
>>
>> > This applies even to deleting attribute.
>>
>> Ok, now I understand what the dampen function does.
>>
>> If I understand this correctly then this probably makes every documented
>> example of using ocf:pacemaker:ping with a colocation statement wrong
>> because the only way to see the effect of dampen is to use a rule that
>> references the value of pingd directly. That or the script for ping has a
>> major flaw with respect to dampen.
>>
> 
> As we've already tried to explain, the purpose of dampening is not to
> implement any kind of resilience against the loss of a certain percentage
> of packets or anything similar.
> 
> The basic idea is to have more than one ping host so that - given
> failure_score is low enough - there is gonna be a certain resilience
> against packet loss.
> If your number of ping hosts isn't large enough you might play with adding
> them multiple times to get some kind of resilience.
> But I agree that this one-out-of-two behavior is probably too resilient for
> most cases and thus there might be room for improvement.
> The main pain point here is that the ping RA allows us to configure the
> count of pings sent, but it just uses the exit value from ping, which
> already becomes negative when one of the answers is missing.
> This is why with
> https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
> I chose to give both the number of packets sent and the number received
> that is necessary to be assumed alive. If we assume the latter, when not
> given at all, to be equal to the number of packets sent, we would preserve
> unchanged behavior for existing configurations.
> 
> Klaus
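
The "sent plus required" idea described above could look roughly like this in shell. This is only a sketch under assumptions: is_alive and its argument order are made up for illustration and are not taken from the ping RA or from fence_heuristics_ping.py.

```shell
# is_alive RECEIVED SENT [REQUIRED]
# Hypothetical helper: succeeds (exit 0) when at least REQUIRED replies
# came back. REQUIRED defaults to SENT, which keeps the current
# "every packet must be answered" behavior for existing configurations.
is_alive() {
    alive_received=$1
    alive_sent=$2
    alive_required=${3:-$alive_sent}
    [ "$alive_received" -ge "$alive_required" ]
}

# 4 of 5 answered: dead with the default, alive when 3 replies are enough.
if is_alive 4 5; then echo "default: alive"; else echo "default: dead"; fi
if is_alive 4 5 3; then echo "required=3: alive"; else echo "required=3: dead"; fi
```

Defaulting the third argument to the number sent is the point of the design: nothing changes unless someone explicitly asks for a lower threshold.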
> 
> 
>>
>> That is when I do this:
>>
>> pcs resource create myPing ocf:pacemaker:ping host_list=192.168.1.1 \
>>     failure_score=1
>> pcs resource create database ocf:heartbeat:pgsql
>> pcs resource group add pgrp myPing database
>>
>> PCS will move everything to a new node if there is even 1 ping failure
>> because monitor in ping doesn't look at the dampened value, only at the
>> immediately returned value.
>>
>> The same is true with colocation statements - if a constraint is made with
>> a ping resource without using a rule that references pingd then the dampen
>> behaviour is ignored completely.
>>
>> Is the ping'er missing something that does this:
>>
>> score=`cibadmin -Q -A "//nvpair[@name='ping']" |
>>     sed -e 's/.*value="\([^"]*\)".*/\1/'`
>>
>> before it checks if $score is less than $OCF_RESKEY_failure_score?
>>
>> Thanks
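
For illustration only, the check asked about above could be sketched as follows. This shows what the question proposes, not what the ping agent does. The helper names and the sample nvpair line (including its id) are hypothetical, and the sample stands in for real cibadmin output so that the snippet runs without a cluster.

```shell
# nvpair_value: read one nvpair line on stdin and print its value="..." field.
# The sed expression is the one quoted in the question above.
nvpair_value() {
    sed -e 's/.*value="\([^"]*\)".*/\1/'
}

# below_failure_score SCORE FAILURE_SCORE
# Succeeds (exit 0) when SCORE is less than FAILURE_SCORE.
below_failure_score() {
    [ "$1" -lt "$2" ]
}

# Hypothetical stand-in for what cibadmin would print for the attribute.
sample='<nvpair id="status-1-ping" name="ping" value="0"/>'
score=$(printf '%s\n' "$sample" | nvpair_value)

if below_failure_score "$score" 1; then
    echo "score $score is below failure_score 1"
fi
```

Because the value in the CIB only changes after the dampening interval, a check like this would react later than one based on the immediate ping result. Whether that is desirable is exactly what this thread is debating.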
>>
