[ClusterLabs] Timeout - SBD's vs Watchdog's
jorge.fabregas at gmail.com
Tue Sep 15 16:32:43 EDT 2015
I've finished my tests with SBD on x86 (using the emulated 6300esb
watchdog provided by qemu) but now I'm doing final tests on the target
I have a situation where the watchdog provided by the hypervisor (z/VM)
is not configurable (you can't change the heartbeat via the provided
kernel module). SBD warns me about this and suggests the -T option (so
it doesn't try to change it to match the "watchdog" timeout as specified
in SBD's metadata). The -T option helped there.
Now, I want to use the SBD defaults (5 seconds for watchdog timeout and
10 seconds for msgwait). I plan to use the -P option so that storage &
multipath latency is not an issue for me.
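
As a rough sketch of how the two defaults relate, assuming the usual
rule of thumb that msgwait should be at least twice the watchdog
timeout. The function name and the factor parameter are only
illustrative; they are not part of SBD:

```python
def min_msgwait(watchdog_timeout_s: int, factor: int = 2) -> int:
    """Smallest msgwait in seconds, assuming msgwait >= factor * watchdog timeout."""
    return factor * watchdog_timeout_s


# Compare the SBD defaults with a 60-second watchdog timeout.
for label, wd in (("SBD defaults", 5), ("60-second watchdog", 60)):
    print(f"{label}: watchdog={wd}s -> msgwait>={min_msgwait(wd)}s")
```

With a 5-second watchdog timeout this gives the default msgwait of 10
seconds; with a 60-second one it gives 120 seconds.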
The problem is that I don't want to set SBD's watchdog timeout to 1
minute (so that it matches the "hardware" watchdog), because I would
then have to raise msgwait to at least 2 minutes, which is too long.
So I plan to leave the defaults (5 & 10 seconds). My question is: is
there a problem if I leave the SBD defaults in place while the
"hardware" watchdog timeout stays fixed at one minute?
In this situation SBD will help in the following ways:
- if it sees a poison-pill on its slot, it will self-fence right away
- if it can't read the SBD device for 5 seconds it will self-fence
- it will "kick the dog" every 5 seconds (even though the timeout is set
at 1 minute at the hardware level)
And if SBD misbehaves or the OS hangs:
- the "hardware" watchdog kicks in (if it has been like that for 1 minute)
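
To make the scenarios above easier to compare, here is a small sketch
that lists the worst-case time to self-fence for each one and flags any
that take longer than msgwait. It is only a model of my understanding,
not SBD code: the class and function names are made up, "right away" is
approximated as 0 seconds, and it assumes that a peer considers the node
fenced once msgwait has elapsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    sbd_watchdog_s: int = 5   # SBD "watchdog" timeout from the device metadata
    msgwait_s: int = 10       # SBD msgwait
    hw_watchdog_s: int = 60   # fixed timeout of the hypervisor watchdog


def worst_case_self_fence(t: Timeouts) -> dict:
    """Worst-case seconds until the node is down, per failure scenario."""
    return {
        "poison pill seen in slot": 0,              # assumed immediate
        "SBD device unreadable": t.sbd_watchdog_s,  # SBD self-fences
        "SBD or OS hang": t.hw_watchdog_s,          # only the hardware watchdog acts
    }


def exceeds_msgwait(t: Timeouts) -> list:
    """Scenarios whose worst case is longer than msgwait."""
    return [name for name, secs in worst_case_self_fence(t).items()
            if secs > t.msgwait_s]


print(exceeds_msgwait(Timeouts()))
```

With the values from this setup, the only scenario flagged is the hang
case, where the one-minute hardware timeout is longer than the 10-second
msgwait.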
Is there something I might be missing?