[Pacemaker] SBD Fencing daemon: explain me more clear

Tue Jun 15 09:12:58 EDT 2010

On 2010-06-14T17:24:16, Aleksey Zholdak <aleksey at zholdak.com> wrote:

Hi Aleksey,

> Can anybody explain me more clear than on official and (IMHO)
> outdated page http://www.linux-ha.org/wiki/SBD_Fencing next:
> 
> What timeouts I must specify, if my multipath needs from 90 to 160
> secs to be switched off the dead path... Timeouts below are maybe
> wrong because sometime node1 kills node2 (or vice versa) or some
> node makes suicide...
> 
> > Timeout (watchdog) : 90
> > Timeout (allocate) : 2
> > Timeout (loop)     : 10
> > Timeout (msgwait)  : 180
> 
> And what logic in the calculation of the above timeouts?

Well, 90-160s is a very long time. It could effectively make SBD
unusable in your environment, because you are introducing a delay of at
least 160s on each fail-over. (At least with the current sbd
implementation.)

You need to increase the watchdog timeout to >160s. Probably 180s
should be good in your environment, if you want to eliminate spurious
self-fencing completely.

msgwait should be larger than the watchdog timeout; so probably 200s,
which will imply a 200s latency on fail-over.
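
To make the arithmetic explicit, here is a rough Python sketch of the
two rules above (watchdog above the worst-case MPIO path switch time,
msgwait above watchdog). The function names and the 20s margin are
assumptions for illustration only, chosen so that the result matches
the 180s/200s suggested here; they are not part of sbd.

```python
def suggest_sbd_timeouts(mpio_failover_max_s, margin_s=20):
    """Return (watchdog, msgwait) in seconds.

    mpio_failover_max_s: worst-case time for multipath to drop a dead path.
    margin_s: safety margin; 20s is an assumed value, not an sbd default.
    """
    # Rule 1: the watchdog timeout must exceed the worst-case MPIO stall.
    watchdog = mpio_failover_max_s + margin_s
    # Rule 2: msgwait must exceed the watchdog timeout.
    msgwait = watchdog + margin_s
    return watchdog, msgwait


def check_sbd_timeouts(watchdog_s, msgwait_s, mpio_failover_max_s):
    """Return a list of rule violations for an existing configuration."""
    problems = []
    if watchdog_s <= mpio_failover_max_s:
        problems.append("watchdog does not exceed worst-case MPIO failover")
    if msgwait_s <= watchdog_s:
        problems.append("msgwait does not exceed watchdog")
    return problems


if __name__ == "__main__":
    print(suggest_sbd_timeouts(160))
    # The configuration quoted in the question: watchdog 90, msgwait 180.
    print(check_sbd_timeouts(90, 180, 160))
```

With the quoted settings (watchdog 90, msgwait 180) the check flags the
watchdog timeout, which fits the suicides you are seeing.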

You may want to make the timeouts lower, leading to a faster fail-over.
I assume the work-load is paused during the MPIO downtime too, so
fail-over may actually be faster than waiting for MPIO to recover.

But with a ~160s MPIO latency, I'd personally be wary of using sbd
fencing. Why is the MPIO scenario so slow?

Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde