[ClusterLabs] XenServer guest and host watchdog

Andrew Cooper andrew.cooper3 at citrix.com
Fri Sep 8 20:39:26 UTC 2017


On 08/09/2017 21:20, Valentin Vidic wrote:
> On Fri, Sep 08, 2017 at 12:57:12PM +0000, Mark Syms wrote:
>> As we discussed regarding the handling of watchdog in XenServer, both
>> guest and host, I've had a discussion with our subject matter expert
>> (Andrew, cc'd) on this topic. The guest watchdogs are handled by a
>> hardware timer in the hypervisor but if the timers themselves are not
>> serviced within 5 seconds the host watchdog will fire and pull the
>> host down.
> I presume the host watchdog is the NMI watchdog described in the
> Xen Hypervisor Command Line Options?
>
> watchdog = force | <boolean> (Default: false)
> Run an NMI watchdog on each processor. If a processor is stuck for
> longer than the watchdog_timeout, a panic occurs. When force is
> specified, in addition to running an NMI watchdog on each processor,
> unknown NMIs will still be processed.
>
> watchdog_timeout = <integer> (Default: 5)
> Set the NMI watchdog timeout in seconds. Specifying 0 will turn off the
> watchdog.
>

Yes.  The internal mechanism of the host watchdog is to use one
performance counter to count retired instructions and generate an NMI
roughly once every half second (give or take C and P states).

Separately, there is a one second timer (the same framework as all other
timers in Xen, including the guest watchdog), which triggers a softirq
(lower priority, runs on the return-to-guest path), which increments a
local variable.  If the NMI handler doesn't observe this local variable
incrementing in the timeout period, Xen crashes the entire system.
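
To make the interaction between the two halves concrete, here is a rough
sketch in C of the logic described above. It is a toy simulation, not Xen's
actual code: every name in it (cpu_watchdog, softirq_tick, nmi_check,
simulate) is made up for illustration, and it assumes an exact 500ms NMI
period and the default 5 second timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names and constants; this is not taken from Xen's source. */
#define NMI_PERIOD_MS      500   /* NMI roughly every half second */
#define TIMER_PERIOD_MS    1000  /* one second timer that raises the softirq */
#define WATCHDOG_TIMEOUT_S 5     /* default watchdog_timeout */

struct cpu_watchdog {
    unsigned int heartbeat;  /* incremented from the softirq */
    unsigned int last_seen;  /* heartbeat value when the NMI last saw progress */
    unsigned int stale_ms;   /* time since the NMI handler last saw progress */
};

/* Softirq side: only runs if the CPU reaches the return-to-guest path. */
static void softirq_tick(struct cpu_watchdog *wd)
{
    wd->heartbeat++;
}

/* NMI side: returns true when the watchdog should fire. */
static bool nmi_check(struct cpu_watchdog *wd)
{
    if (wd->heartbeat != wd->last_seen) {
        /* The counter moved, so the CPU is still servicing softirqs. */
        wd->last_seen = wd->heartbeat;
        wd->stale_ms = 0;
        return false;
    }
    wd->stale_ms += NMI_PERIOD_MS;
    return wd->stale_ms >= WATCHDOG_TIMEOUT_S * 1000;
}

/*
 * Step simulated time in NMI-period increments. From stuck_at_ms onwards
 * the CPU no longer runs softirqs (pass a negative value for "never stuck").
 * Returns the simulated time in ms at which the watchdog fires, or -1 if it
 * does not fire within run_ms.
 */
static long simulate(long stuck_at_ms, long run_ms)
{
    struct cpu_watchdog wd = {0, 0, 0};

    assert(run_ms >= 0);
    for (long t = NMI_PERIOD_MS; t <= run_ms; t += NMI_PERIOD_MS) {
        bool stuck = stuck_at_ms >= 0 && t >= stuck_at_ms;

        if (!stuck && t % TIMER_PERIOD_MS == 0)
            softirq_tick(&wd);
        if (nmi_check(&wd))
            return t;
    }
    return -1;
}
```

In this model a healthy CPU never trips the check, so simulate(-1, 20000)
returns -1. A CPU that stops servicing softirqs at 3000ms last made visible
progress at 2000ms, so simulate(3000, 20000) returns 7000, i.e. five seconds
after the last observed increment.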

~Andrew



