[ClusterLabs] setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout

Mon Dec 19 15:02:04 CET 2016

On Mon, 19 Dec 2016 13:37:09 +0100
Klaus Wenninger <kwenning at redhat.com> wrote:

> On 12/17/2016 11:55 PM, Jehan-Guillaume de Rorthais wrote:
> > On Wed, 14 Dec 2016 14:52:41 +0100
> > Klaus Wenninger <kwenning at redhat.com> wrote:
> >  
> >> On 12/14/2016 01:26 PM, Jehan-Guillaume de Rorthais wrote:  
> >>> On Thu, 8 Dec 2016 11:47:20 +0100
> >>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
> >>>    
> >>>> Hello,
> >>>>
> >>>> While setting these various parameters, I couldn't find documentation and
> >>>> details about them. Below are some questions.
> >>>>
> >>>> Considering the watchdog module used on a server is set up with a 30s
> >>>> timer (let's call it the wdt, the "watchdog timer"), how should
> >>>> "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout"
> >>>> be set?
> >>>>
> >>>> Here is my thinking so far:
> >>>>
> >>>> "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer
> >>>> before the wdt expires so the server stays alive. Online resources and
> >>>> default values are usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But
> >>>> what if sbd fails to reset the timer multiple times (e.g. because of
> >>>> excessive load, swap storm, etc.)? The server will not reset before
> >>>> random*SBD_WATCHDOG_TIMEOUT or wdt, right?     
> >> SBD_WATCHDOG_TIMEOUT (e.g. in /etc/sysconfig/sbd) is already the
> >> timeout the hardware watchdog is configured to by sbd-daemon.  
> > Oh, ok, I did not realize sbd was actually setting the hardware watchdog
> > timeout itself based on this variable. After a quick search to make sure
> > I understand it right, I suppose it is done there?
> > https://github.com/ClusterLabs/sbd/blob/172dcd03eaf26503a10a18501aa1b9f30eed7ee2/src/sbd-common.c#L123
> >  
> >> sbd-daemon is triggering faster - timeout_loop defaults to 1s but
> >> is configurable.
> >>
> >> SBD_WATCHDOG_TIMEOUT (and maybe the loop timeout as well
> >> but significantly shorter should be sufficient)
> >> has to be configured so that failing to trigger within time means
> >> a failure with high enough certainty or the machine showing
> >> comparable response-times would anyway violate timing requirements
> >> of the services running on itself and in the cluster.  
> > OK. So I understand now why 5s is fine as a default value then.
> >  
> >> Keep in mind that sbd-daemon defaults to running realtime-scheduled
> >> and thus is gonna be more responsive than the usual services
> >> on the system. Although you of course have to consider that
> >> the watchers (child-processes of sbd that are observing e.g.
> >> the block-device(s), corosync, pacemaker_remoted or
> >> pacemaker node-health) might be significantly less responsive
> >> due to their communication partners.  
> > I'm not sure I clearly understand yet the mechanism and interactions of sbd
> > with other daemons. So far, I understood that Pacemaker/stonithd was able to
> > poke sbd to ask it to trigger a node reset through the wd device. I'm very
> > new to this area and still need to read up on it.
> 
> Pacemaker is setting the node unclean which pacemaker-watcher
> (one of the sbd daemons) sees as it is connected to the cib.
> This is why the mechanism is working (sort of - see the discussion
> in my pull request in the sbd-repo) on nodes without stonithd as
> well (remote-nodes).
> If you are running sbd with a block-device there is of course this
> way of communication as well between pacemaker and sbd.
> (e.g. via fence_sbd fence-agent)
> Be aware that there are different levels of support for these
> features in the distributions. (RHEL more on the watchdog-side,
> SLES more on the block-device side ... roughly as far as I got it)

OK, I have a better understanding of the need for various sbd watchers and
how it all seems to work.

> >>>> "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure
> >>>> what stonith-watchdog-timeout is. Is it the maximum time stonithd waits
> >>>> after it asked for a node fencing before it considers that the
> >>>> watchdog was actually triggered and the node reset, even with no
> >>>> confirmation? I suppose "stonith-watchdog-timeout" is mostly useful to
> >>>> stonithd, right?
> >> Yes, the time we can assume a node to be killed by the hardware-watchdog...
> >> Double the hardware-watchdog-timeout is a good choice.  
> > OK, thank you
> >  
> >>>> "stonith-watchdog-timeout < stonith-timeout". I understand the stonith
> >>>> action timeout should be greater than the wdt so stonithd will
> >>>> not raise a timeout before the wdt had a chance to expire and reset the
> >>>> node. Is that right?
> >> stonith-timeout is the cluster-wide default to wait for stonith-devices
> >> to carry out their duty. In the sbd-case without a block-device (sbd used
> >> for pacemaker to be observed by a hardware-watchdog) it shouldn't
> >> play a role.  
> > I thought self-fencing through sbd/wd was carried out by stonithd because of
> > such messages in my PoC log files:
> >
> >   stonith-ng: notice: unpack_config: Relying on watchdog integration for fencing
> 
> see above ... or read as sit still and wait for the watchdog to do the
> job ;-)

Ok, perfectly clear now :)
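
To double-check my understanding, here is a minimal Python sketch of how the
values discussed above relate. It only encodes what was said in this thread:
sbd programs the hardware watchdog with SBD_WATCHDOG_TIMEOUT, the sbd loop
(timeout_loop) defaults to 1s, and doubling the hardware watchdog timeout is a
good choice for stonith-watchdog-timeout. The function names are mine
(hypothetical helpers), not part of sbd or Pacemaker, and the loop count is
only a rough estimate of how many attempts sbd gets before the watchdog fires.

```python
def parse_seconds(value: str) -> int:
    """Parse a timeout written as "5" or "5s" into whole seconds."""
    return int(value.strip().rstrip("s"))


def recommended_stonith_watchdog_timeout(sbd_watchdog_timeout: int) -> int:
    """Double the hardware watchdog timeout, as suggested in this thread."""
    if sbd_watchdog_timeout <= 0:
        raise ValueError("SBD_WATCHDOG_TIMEOUT must be positive")
    return 2 * sbd_watchdog_timeout


def loops_per_watchdog_period(sbd_watchdog_timeout: int, timeout_loop: int = 1) -> int:
    """Rough number of sbd loop iterations that fit in one watchdog period.

    Assumption: each loop iteration is one chance to reset the timer.
    """
    if timeout_loop <= 0:
        raise ValueError("timeout_loop must be positive")
    return sbd_watchdog_timeout // timeout_loop


if __name__ == "__main__":
    wdt = parse_seconds("5s")  # SBD_WATCHDOG_TIMEOUT from /etc/sysconfig/sbd
    print("stonith-watchdog-timeout:", recommended_stonith_watchdog_timeout(wdt))
    print("loop iterations per watchdog period:", loops_per_watchdog_period(wdt))
```

So with SBD_WATCHDOG_TIMEOUT=5 and the default 1s loop, sbd gets several
chances to reset the timer, and stonith-watchdog-timeout would be 10s.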

Thank you again!