[Pacemaker] pingd

Bernd Schubert bs_lists at aakef.fastmail.fm
Thu Sep 2 09:00:12 UTC 2010


On Thursday, September 02, 2010, Andrew Beekhof wrote:
> On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert
> > My proposal is to rip out all network code out of pingd and to add
> > slightly modified files from 'iputils'.
> 
> Close, but thats not portable.
> Instead use ocf:pacemaker:ping which goes a step further and ditches
> the daemon piece altogether.

Hmm, we are already using that as a temporary solution. But I don't think the
ping RA is suitable for larger clusters. The ping RA is a shell script that
runs everything serially, and only at the intervals at which lrmd calls it.
Now let's assume we have a 20-node cluster:

nodes = 20
timeout = 2
attempts = 2

That makes up to 80s (20 * 2 * 2) for a single run, even with the default and
already rather small timeouts, which is IMHO a bit long. And with a shell
script I don't see a way to improve that. While we could send the pings in
parallel, I have no idea how to lock the variable that counts the active nodes
(active=`expr $active + 1`). I don't think that plain sh or even bash has a
semaphore or mutex. So IMHO we need a language that supports that: rewriting
the pingd RA is one choice, rewriting the ping RA in Python is another.
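
To make the arithmetic explicit, here is a small Python sketch (the function
name is only illustrative, not part of any RA). It assumes the worst case, in
which no host answers and every attempt runs into its timeout:

```python
def worst_case_seconds(nodes, timeout, attempts, parallel=False):
    """Upper bound for one monitor run when no host answers."""
    per_host = timeout * attempts
    # Serial: the hosts are probed one after another.
    # Parallel: all hosts are probed at once, so one host's cost dominates.
    return per_host if parallel else nodes * per_host


serial = worst_case_seconds(nodes=20, timeout=2, attempts=2)
parallel = worst_case_seconds(nodes=20, timeout=2, attempts=2, parallel=True)
print(serial, parallel)
```

With the numbers above, the serial bound is 80s and the parallel bound is 4s.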

So in fact my proposal was only the first step: first add better network
code, and then make it multi-threaded, so that each ping host gets its own
thread.
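
A minimal sketch of the thread-per-host idea, assuming Python and the iputils
`ping` binary on Linux (`-c` for the count, `-W` for the reply timeout in
seconds). The names `ping_once` and `count_active` are hypothetical and not
taken from pingd or the ping RA; the point is only that the counter of active
hosts is updated under a lock:

```python
import subprocess
import threading


def ping_once(host, timeout=2):
    """Return True if a single echo request is answered.

    Assumption: iputils ping on Linux is available in PATH.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def count_active(hosts, probe=ping_once, attempts=2):
    """Probe every host in its own thread and count the reachable ones."""
    active = 0
    lock = threading.Lock()

    def worker(host):
        nonlocal active
        # any() stops at the first successful attempt.
        if any(probe(host) for _ in range(attempts)):
            # The lock protects the shared counter against lost updates.
            with lock:
                active += 1

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return active
```

The `probe` argument is only there so that the counting logic can be tried
without touching the network, e.g. `count_active(hosts, probe=lambda h: True)`.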

Another reason why I don't like the shell RA too much is that shell takes a
considerable amount of CPU time. For the subset of systems where we need ping
as a replacement for the quorum policy (*), CPU time is precious.

Thanks,
Bernd

PS: (*) As you insist ;) on quorum with n/2 + 1 nodes, we use ping as a
replacement. We simply cannot fulfill n/2 + 1, as a controller failure takes
down 50% of the systems (virtual machines), and the systems (VMs) of the 2nd
controller are then supposed to take over the failed services. I see that
n/2 + 1 is optimal and also required for a small number of nodes. But if you
have a larger set of systems (e.g. a minimum of 6 with the VM systems I have
in mind), n/2 is sufficient, IMHO. Therefore I asked before to make the quorum
policy configurable. Now, with Lustre's multiple-mount protection and the
additional stopping of resources due to ping, I'm willing to set the quorum
policy to ignore.
