[Pacemaker] pingd process dies for no reason

Tue Jan 11 10:21:13 EST 2011

On Tue, Jan 11, 2011 at 03:53:29PM +0100, Andrew Beekhof wrote:
> On Tue, Jan 11, 2011 at 2:45 PM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
> > On Tue, Jan 11, 2011 at 11:24:35AM +0100, Patrik.Rapposch at knapp.com wrote:
> >> we already made changes to the interval and timeout (<op
> >> id="pingd-op-monitor-30s" interval="30s" name="monitor" timeout="10s"/>).
> >>
> >> how big should dampen be set?
> >>
> >> please correct me if i am wrong, as i calculate it as follows:
> >> assuming the last check was ok and the failure takes place in the
> >> next second:
> >> then there would be 29s until the next check starts, and another 10
> >> seconds timeout, plus 5 seconds dampen. this would be 44 seconds, isn't
> >> that enough?
> >
> > I think "dampen" needs to be larger than the monitoring interval.
> > And the timeout on the operation should be large enough that
> > ping, even if the remote is unreachable for the first time,
> > will time out by itself (and not be killed prematurely by lrmd because
> > the operation timeout elapsed).
> >
> > try with interval 15s, dampen 20,
> >  instance parameter timeout: something explicit, if you want to.
> >  instance parameter attempts: something explicit, if you want to.
> >  monitor operation timeout=60s
> >
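To make the arithmetic in this exchange concrete, here is a rough shell
sketch. It uses the poster's own model (time until the next monitor, plus
the operation timeout, plus dampen) and the rule of thumb that dampen
should be larger than the monitor interval. The function names are made
up for illustration, and the result is only an upper bound: ping normally
gives up on its own well before the operation timeout.

```shell
#!/bin/sh
# Worst-case seconds between a link failure and the attribute change,
# following the model in the question: the failure happens one second
# after a successful check, the next check runs into its operation
# timeout, and then the dampen delay has to pass.
worst_case_delay() {
    interval=$1
    op_timeout=$2
    dampen=$3
    echo $(( ($interval - 1) + $op_timeout + $dampen ))
}

# Succeeds (exit 0) when dampen is larger than the monitor interval.
dampen_covers_interval() {
    interval=$1
    dampen=$2
    [ "$dampen" -gt "$interval" ]
}

worst_case_delay 30 10 5     # settings from the question
worst_case_delay 15 60 20    # interval 15s, op timeout 60s, dampen 20
```

Under this model the first call prints 44, matching the figure in the
question, and the second prints 94. dampen_covers_interval 30 5 fails,
while dampen_covers_interval 15 20 succeeds.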
> > BTW, someone should really implement the fping based ping RA ...
> 
> Thank you for volunteering :-)

  :-P

 Date: Fri, 3 Sep 2010 12:12:58 +0200
 From: Bernd Schubert <bs_lists at aakef.fastmail.fm>
 Subject: Re: [Pacemaker] pingd

On Friday, September 03, 2010, Lars Ellenberg wrote:
> > > how about an fping RA ?
> > > active=$(fping -a -i 5 -t 250 -B1 -r1 $host_list 2>/dev/null | wc -l)
> > > 
> > > terminates in about 3 seconds for a hostlist of 100 (on the LAN, 29 of
> > > which are alive).
> > 
> > Happy to add if someone writes it :-)
> 
> I thought so ;-)
> Additional note to whomever is going to:
> 
> With fping you can get fancy about "better connectivity",
> you are not limited to the measure "number of nodes responding".

I think to begin with, just the basic feature should be sufficient.
Actually I thought about adding an option to the existing ping RA to let
the user choose between ping and fping; it would default to ping. I will
do that in the middle of next week.
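
For illustration only, the ping/fping choice could look roughly like the
sketch below. This is not code from the ping RA: the function name
count_alive and its "tool" argument are hypothetical. The fping flags are
the ones from the invocation quoted above, and the ping flags (-n -q -c
-W) assume Linux iputils.

```shell
#!/bin/sh
# count_alive TOOL HOST...
# Prints how many of the given hosts answered. TOOL selects the probe:
# "fping" uses a single fping run, anything else falls back to ping.
count_alive() {
    tool=$1
    shift
    case "$tool" in
    fping)
        # -a prints only the hosts that are alive, one per line, so the
        # line count is the number of reachable hosts.
        fping -a -i 5 -t 250 -B1 -r1 "$@" 2>/dev/null | wc -l | tr -d ' '
        ;;
    *)
        alive=0
        for host in "$@"; do
            # One echo request per host, waiting at most one second.
            if ping -n -q -c 1 -W 1 "$host" >/dev/null 2>&1; then
                alive=$((alive + 1))
            fi
        done
        echo "$alive"
        ;;
    esac
}

# Example (hypothetical host list):
#   count_alive fping 192.168.1.1 192.168.1.2
#   count_alive ping  192.168.1.1 192.168.1.2
```

The fping branch probes all hosts in one run, whereas the ping branch
probes them one after another, so its run time grows with the number of
unreachable hosts.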

...

Bernd?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.