[ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

Sat Apr 22 04:39:50 EDT 2017

22.04.2017 11:31, Klaus Wenninger wrote:
>>>>
>>> I wonder how SBD fits into this discussion. It is marketed as a stonith
>>> agent, but it is based on committing suicide, so it relies on well-behaving
>>> nodes. Which we by definition cannot trust to behave well, otherwise
>>> we'd not need stonith in the first place.
>> The logic, when using a watchdog timer, is that if the node is alive
>> enough to kick the watchdog, it's alive enough to not do something dumb
>> to the cluster. If it's not able to kick the timer, the watchdog timer
>> will reset the machine. This works *if* all resources hang when messages
>> stop coming back from the peer (a side effect of corosync's virtual
>> synchrony).
> 
> In fact watchdog-implementations (meaning the software that
> kicks the hardware-watchdog) are a little bit smarter - and
> so is SBD.
> By having the watchdog-kicking and observation-code in a
> simple loop that is executed periodically you don't need the
> 'if it is alive enough to do the kicking it will behave well'
> paradigm.
> This boils down to making the critical part of the code very
> small; on top of that, hard-to-control failures that result in
> any kind of hang no longer bother us.
> 
>>
>> So as I understand it, for SBD to be safe, it requires a hardware
>> watchdog timer and a properly configured cluster.
> 
> Yes, yes and yes ... as important as fencing I would say ;-)
> 

So I gather that for SBD to be reasonably safe, it needs a real hardware
watchdog. I often see SBD recommended as a stonith agent inside a VM,
where by definition we do not have a "hardware watchdog". I still wonder
whether it can be trusted in this case.
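
For reference, the kick-and-observe loop described in the quoted reply above
can be sketched roughly as follows. This is a simulation only, not SBD code:
SimulatedWatchdog, run_loop and the healthy callback are hypothetical names
made up for illustration, and time is modelled as a simple counter instead
of a real timer. A check that returns False stands in both for a failed
observation and for a hung loop, since in either case the kick does not
happen.

```python
from typing import Callable, Optional


class SimulatedWatchdog:
    """Stand-in for a hardware watchdog (hypothetical, for illustration).

    It "fires" if it has not been kicked for `timeout` time units.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_kick = 0.0

    def kick(self, now: float) -> None:
        # Restart the countdown.
        self.last_kick = now

    def expired(self, now: float) -> bool:
        return now - self.last_kick >= self.timeout


def run_loop(
    watchdog: SimulatedWatchdog,
    healthy: Callable[[float], bool],
    interval: float,
    ticks: int,
) -> Optional[float]:
    """Run the periodic loop; return the reset time, or None if never reset."""
    for i in range(1, ticks + 1):
        now = i * interval
        if watchdog.expired(now):
            # Real hardware would reset the node here, regardless of
            # what the software on the node is doing.
            return now
        # Observation and kicking live in the same small loop:
        # no successful observation means no kick.
        if healthy(now):
            watchdog.kick(now)
    return None
```

With a timeout of 5 and one tick per time unit, a node whose check stops
passing at t=3 is last kicked at t=2 and is reset at t=7; a node whose check
always passes is never reset. The sketch says nothing about how trustworthy
the timer itself is, which is the open question for the VM case.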