[ClusterLabs] Antw: Re: setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Dec 9 02:11:30 EST 2016


>>> emmanuel segura <emi2fast at gmail.com> wrote on 08.12.2016 at 14:37 in
message
<CAE7pJ3CSQyxQvBqLFvsFU=NLp95JWQdZvAP_6cLyBo5rSdCRng at mail.gmail.com>:
> the only thing that I can say is: sbd is a realtime process

Hi! 

You are saying it's scheduled with policy SCHED_RR and priority 0? A realtime process is more than its scheduling policy IMHO.
What are you really trying to say?
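
For anyone who wants to check what policy and priority a process actually runs with, here is a minimal sketch in Python (Linux only). The helper name describe_scheduling and the PID lookup are my own illustration, not anything shipped with sbd; pass the PID of the sbd process you care about, or 0 for the calling process. As far as I know, on Linux the static priority for SCHED_RR and SCHED_FIFO ranges from 1 to 99, while 0 is what you see with SCHED_OTHER.

```python
import os

# Map the scheduling policy constants exposed by Python's os module to names.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_scheduling(pid=0):
    """Return (policy_name, static_priority) for pid; 0 means this process.

    Hypothetical helper for illustration only.
    """
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    # Fall back to the raw number for policies not listed above.
    return POLICY_NAMES.get(policy, str(policy)), priority


if __name__ == "__main__":
    name, prio = describe_scheduling(0)
    print(f"policy={name} priority={prio}")
```

This only reports the scheduling class and priority; it says nothing about memory locking or anything else one might expect from a realtime process.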

Regards,
Ulrich

> 
> 2016-12-08 11:47 GMT+01:00 Jehan-Guillaume de Rorthais <jgdr at dalibo.com>:
>> Hello,
>>
>> While setting these various parameters, I couldn't find documentation and
>> details about them. Below are some questions.
>>
>> Considering the watchdog module used on a server is set up with a 30s timer
>> (let's call it the wdt, the "watchdog timer"), how should
>> "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be
>> set?
>>
>> Here is my thinking so far:
>>
>> "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before the
>> wdt expires so the server stays alive. Online resources and default values are
>> usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd fails to reset
>> the timer multiple times (e.g. because of excessive load, a swap storm, etc.)? The
>> server will not reset before random*SBD_WATCHDOG_TIMEOUT or wdt, right?
>>
>> "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what
>> stonith-watchdog-timeout is. Is it the maximum time stonithd waits, after it
>> has asked for a node to be fenced, before it considers that the watchdog was
>> actually triggered and the node reset, even with no confirmation? I suppose
>> "stonith-watchdog-timeout" is mostly useful to stonithd, right?
>>
>> "stonith-watchdog-timeout < stonith-timeout". I understand the stonith action
>> timeout should be at least greater than the wdt so stonithd will not raise a
>> timeout before the wdt has had a chance to expire and reset the node. Is that
>> right?
>>
>> Any other comments?
>>
>> Regards,
>> --
>> Jehan-Guillaume de Rorthais
>> Dalibo
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org 
> 
> 
> 
> -- 
>   .~.
>   /V\
>  //  \\
> /(   )\
> ^`~'^
> 
