[ClusterLabs] setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout
emi2fast at gmail.com
Thu Dec 8 08:37:20 EST 2016
The only thing I can say is that sbd is a realtime process.
2016-12-08 11:47 GMT+01:00 Jehan-Guillaume de Rorthais <jgdr at dalibo.com>:
> While setting these various parameters, I couldn't find documentation or
> details about them. Below are some questions.
> Considering the watchdog module used on a server is set up with a 30s timer
> (let's call it the wdt, the "watchdog timer"), how should
> "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be set?
> Here is my thinking so far:
> "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before the
> wdt expires so the server stays alive. Online resources and default values are
> usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd fails to reset
> the timer multiple times (e.g. because of excessive load, a swap storm, etc.)? The
> server will not reset before random*SBD_WATCHDOG_TIMEOUT or wdt, right?
> "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what
> stonith-watchdog-timeout is. Is it the maximum time stonithd waits, after it
> asked for a node to be fenced, before it considers that the watchdog was actually
> triggered and the node reset, even with no confirmation? I suppose
> "stonith-watchdog-timeout" is mostly useful to stonithd, right?
> "stonith-watchdog-timeout < stonith-timeout". I understand the stonith action
> timeout should be at least greater than the wdt so stonithd will not raise a
> timeout before the wdt has had a chance to expire and reset the node. Is that right?
> Any other comments?
> Jehan-Guillaume de Rorthais
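For what it's worth, the orderings proposed in the quoted message can be written down as a small check. This is only a sketch of the poster's working hypotheses, not confirmed sbd or Pacemaker behaviour. The class and function names are made up for illustration, and apart from wdt=30 and SBD_WATCHDOG_TIMEOUT=5 (both taken from the message) the sample values are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Timeouts:
    """All values in seconds. Field names are illustrative, not real option names."""
    wdt: int                       # hardware watchdog timer
    sbd_watchdog_timeout: int      # SBD_WATCHDOG_TIMEOUT
    stonith_watchdog_timeout: int  # stonith-watchdog-timeout
    stonith_timeout: int           # stonith-timeout


def violated_hypotheses(t: Timeouts) -> list[str]:
    """Return the orderings hypothesised in the quoted message that t breaks."""
    problems = []
    if not t.sbd_watchdog_timeout < t.wdt:
        problems.append("SBD_WATCHDOG_TIMEOUT < wdt")
    if not t.stonith_watchdog_timeout > t.sbd_watchdog_timeout:
        problems.append("stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT")
    if not t.stonith_watchdog_timeout < t.stonith_timeout:
        problems.append("stonith-watchdog-timeout < stonith-timeout")
    if not t.stonith_timeout > t.wdt:
        problems.append("stonith-timeout > wdt")
    return problems


# wdt=30 and SBD_WATCHDOG_TIMEOUT=5 come from the message; 10 and 60 are made up.
sample = Timeouts(wdt=30, sbd_watchdog_timeout=5,
                  stonith_watchdog_timeout=10, stonith_timeout=60)
print(violated_hypotheses(sample))
```

An empty list only means the values are consistent with those guesses; whether the guesses themselves are right is exactly the open question.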