[ClusterLabs] Timeout - SBD's vs Watchdog's

Jorge Fábregas jorge.fabregas at gmail.com
Tue Sep 15 16:32:43 EDT 2015


Hi,

I've finished my tests with SBD on x86 (using the emulated 6300esb
watchdog provided by qemu) but now I'm doing final tests on the target
platform (s390x).

I have a situation where the watchdog provided by the hypervisor (z/VM)
is not configurable (you can't change the heartbeat via the provided
kernel module).  SBD warns me about this and suggests the -T option (so
it doesn't try to change it to match the "watchdog" timeout as specified
in SBD's metadata). The -T option helped there.

Now, I want to use the SBD defaults (5 seconds for watchdog timeout and
10 seconds for msgwait).  I plan to use the -P option so storage &
multipath latency issues are not a problem for me.

The problem is that I don't want to set SBD's watchdog timeout to 1
minute (so that it matches the "hardware" watchdog), because I would
then have to raise msgwait to at least 2 minutes, which is too much
time.  So I plan to leave the defaults (5 & 10 seconds).  My question
is:  is there a problem if I leave the defaults (5 & 10 seconds for SBD)
and the "hardware" watchdog timeout stays set at one minute?
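To make the arithmetic concrete, here is a minimal sketch of the two
configurations being compared. The factor of 2 between the watchdog timeout
and msgwait is an assumption read off the numbers in this post (5 -> 10
seconds, 1 -> 2 minutes), and the function name is hypothetical, not part
of SBD.

```python
# Assumption taken from the numbers in this post: msgwait should be at
# least twice the watchdog timeout (5 s -> 10 s, 60 s -> 120 s).
MSGWAIT_FACTOR = 2


def min_msgwait(watchdog_timeout_s: int) -> int:
    """Smallest msgwait (seconds) under the assumed 2x rule of thumb."""
    return MSGWAIT_FACTOR * watchdog_timeout_s


# Compare the SBD defaults with a config matching the 1-minute z/VM watchdog.
for label, watchdog_s in [("SBD defaults", 5), ("matching z/VM watchdog", 60)]:
    print(f"{label}: watchdog={watchdog_s}s, msgwait>={min_msgwait(watchdog_s)}s")
```

Matching the hypervisor's watchdog would mean waiting two minutes for a
fencing message to be considered delivered, which is the delay the defaults
are meant to avoid.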

In this situation SBD will help in the following ways:

- if it sees a poison-pill on its slot, it will self-fence right away
- if it can't read the SBD device for 5 seconds it will self-fence
- it will "kick the dog" every 5 seconds (even though the timeout is set
at 1 minute at the hardware level)

And if SBD misbehaves or the OS hangs:

- the "hardware" watchdog kicks in (once it has gone 1 minute without being kicked)
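
The scenarios above can be laid out as a small timing model. This is only a
sketch under stated assumptions: the timeouts come from this post, the
scenario labels are hypothetical, "right away" is idealised as 0 seconds,
and comparing each case against msgwait is just one way to frame the
question, not a statement of how SBD evaluates it.

```python
# Timeouts taken from the post; everything else is an illustrative model.
SBD_WATCHDOG_TIMEOUT_S = 5   # SBD default watchdog timeout
SBD_MSGWAIT_S = 10           # SBD default msgwait
HW_WATCHDOG_TIMEOUT_S = 60   # fixed heartbeat of the z/VM watchdog

# Worst-case seconds until the node resets itself, per failure scenario.
# Scenario names are hypothetical labels; 0 idealises "right away".
WORST_CASE_RESET_S = {
    "poison_pill_in_slot": 0,
    "sbd_device_unreadable": SBD_WATCHDOG_TIMEOUT_S,
    "sbd_or_os_hang": HW_WATCHDOG_TIMEOUT_S,
}


def slower_than_msgwait(reset_times, msgwait_s):
    """Return the scenarios whose worst-case reset time exceeds msgwait."""
    return [name for name, secs in reset_times.items() if secs > msgwait_s]


for name, secs in WORST_CASE_RESET_S.items():
    print(f"{name}: resets within {secs}s (msgwait is {SBD_MSGWAIT_S}s)")
print("slower than msgwait:", slower_than_msgwait(WORST_CASE_RESET_S, SBD_MSGWAIT_S))
```

In this model only the hang case takes longer than msgwait to reset the
node; whether that gap matters in practice is the open question.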

Is there something I might be missing?

Thanks!
Jorge



