[ClusterLabs] setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Thu Dec 8 05:47:20 EST 2016


Hello,

While setting these parameters, I couldn't find any documentation or
details about them. Below are some questions.

Assuming the watchdog module on a server is set up with a 30s timer
(let's call it the wdt, the "watchdog timer"), how should
"SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be set?

Here is my thinking so far:

"SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before the
wdt expires so the server stays alive. Online resources and default values
usually use "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd fails to
reset the timer several times in a row (e.g. because of excessive load, a swap
storm, etc.)? The server will not reset before some arbitrary multiple of
SBD_WATCHDOG_TIMEOUT (random*SBD_WATCHDOG_TIMEOUT) or wdt, right?

"stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what
stonith-watchdog-timeout is. Is it the maximum time stonithd waits, after it
has asked for a node to be fenced, before it considers that the watchdog was
actually triggered and the node was reset, even without any confirmation? I
suppose "stonith-watchdog-timeout" is mostly useful to stonithd, right?
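If that reading is correct (an assumption, not confirmed), the surviving nodes simply wait stonith-watchdog-timeout after the fencing request and then treat the node as gone. The hypothetical Python sketch below compares that assumed time with the moment the node actually resets; the function name and parameters are illustrative only.

```python
def watchdog_fencing_check(request_time, stonith_watchdog_timeout, node_reset_time):
    """Return (assumed_fenced_at, safe).

    Hypothetical model: survivors wait stonith_watchdog_timeout seconds after
    the fencing request, then consider the node fenced without confirmation.
    This is only safe if the node has really reset by then.
    """
    assumed_fenced_at = request_time + stonith_watchdog_timeout
    safe = node_reset_time <= assumed_fenced_at
    return assumed_fenced_at, safe


# The node stops petting at t=0 and, with wdt=30s, resets at t=30.
print(watchdog_fencing_check(0, 10, node_reset_time=30))  # (10, False)
print(watchdog_fencing_check(0, 60, node_reset_time=30))  # (60, True)
```

Under this model, what matters is that stonith-watchdog-timeout covers the time the node actually needs to reset, which with a 30s wdt is longer than SBD_WATCHDOG_TIMEOUT alone.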

"stonith-watchdog-timeout < stonith-timeout". I understand the stonith action
timeout should be greater than the wdt, so that stonithd does not raise a
timeout before the wdt has had a chance to expire and reset the node. Is that
right?
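The relations proposed above, including the stonith-timeout > wdt condition, can be written down as a small checker. This is a sketch of the hypotheses in this post, not validated guidance; the Timeouts class and its field names are hypothetical, and all values are in seconds.

```python
from dataclasses import dataclass


@dataclass
class Timeouts:
    """Hypothetical container; all values in seconds."""
    wdt: float                       # hardware watchdog timer
    sbd_watchdog_timeout: float      # SBD_WATCHDOG_TIMEOUT
    stonith_watchdog_timeout: float  # stonith-watchdog-timeout
    stonith_timeout: float           # stonith-timeout


def check_ordering(t: Timeouts) -> list[str]:
    """Return the proposed relations that the given values violate."""
    problems = []
    if not t.sbd_watchdog_timeout < t.wdt:
        problems.append("SBD_WATCHDOG_TIMEOUT < wdt")
    if not t.stonith_watchdog_timeout > t.sbd_watchdog_timeout:
        problems.append("stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT")
    if not t.stonith_watchdog_timeout < t.stonith_timeout:
        problems.append("stonith-watchdog-timeout < stonith-timeout")
    if not t.stonith_timeout > t.wdt:
        problems.append("stonith-timeout > wdt")
    return problems


print(check_ordering(Timeouts(30, 5, 10, 40)))  # []
print(check_ordering(Timeouts(30, 5, 10, 20)))  # ['stonith-timeout > wdt']
```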

Any other comments?

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo