[ClusterLabs] How to reduce SBD watchdog timeout?

Lars Marowsky-Bree lmb at suse.com
Mon Apr 8 06:38:05 EDT 2019


On 2019-04-07T12:06:40, Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> After reading the sources and experimenting, I still do not see how it
> can help in a two-node cluster. In this case SBD will assume both nodes
> are out of quorum and both nodes will commit suicide.

It helps by not making a single SBD device a single point of failure in
the cluster.

That's particularly relevant in two-node clusters, which can more readily
lose quorum temporarily when connectivity is lost.

You're right that it's not a perfect choice. Multiple SBD devices or more
than two nodes are preferable.

> Pacemaker integration may be useful in a two-node cluster as a backup to
> avoid suicide in case of temporary disk access issues, but we still need
> the disk as the primary channel, with all the associated considerations
> for proper timeouts.

Right. If a node loses access to the device while also being in a
non-quorate or dirty state, it will still suicide. So even if it can't
read the poison pill written by the surviving other node, it'll
self-fence, and the other node can trust this and proceed. That is, at
least, the idea.
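
As a rough illustration of that decision, here is a minimal Python sketch.
It is not the sbd source code: the NodeView type, its field names, and
must_self_fence() are hypothetical names made up for this example, and the
logic models only the rule described above (device lost plus non-quorate or
dirty state means self-fence), assuming a single SBD device with Pacemaker
integration enabled.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """Hypothetical snapshot of what one node can observe."""
    device_readable: bool           # can the node still read its SBD device?
    quorate: bool                   # does the cluster layer report quorum?
    clean: bool                     # is the node's state clean (not "dirty")?
    poison_pill_seen: bool = False  # only meaningful if the device is readable


def must_self_fence(view: NodeView) -> bool:
    """Return True if the node should self-fence under the modelled rule."""
    if view.device_readable:
        # Disk is the primary channel: obey a poison pill if one is present.
        return view.poison_pill_seen
    # Device lost: survive only while the cluster state is quorate and clean.
    return not (view.quorate and view.clean)


# Temporary disk outage on a healthy node: keep running.
print(must_self_fence(NodeView(device_readable=False, quorate=True, clean=True)))
# Disk outage while non-quorate: self-fence, so the peer may proceed.
print(must_self_fence(NodeView(device_readable=False, quorate=False, clean=True)))
```

The second case is the one that matters for the surviving node: it cannot
deliver the poison pill, but under this rule it can assume the peer has
fenced itself once the relevant timeouts have expired.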



-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)
"Architects should open possibilities and not determine everything." (Ueli Zbinden)

