Hello,<div><br></div><div>I'm just looking to verify that I'm understanding/configuring SBD correctly. It works great in the controlled cases where you unplug a node from the network (it gets fenced via SBD) or remove its access to the shared disk (the node suicides). However, in the event of a hardware failure or power interruption that takes a node offline before SBD can fence it, if that node never comes back into the cluster, its resources can never start anywhere else. The surviving nodes will continue to try to fence the dead node at regular intervals but can never succeed.</div>
<div><br></div><div>It makes sense why this would be the case, as without a successful fence operation the remaining nodes have no way of knowing if it's safe to start those resources. Still, am I missing some option or setting that may allow for a safe auto-recovery, or is it a caveat of SBD that if a node leaves suddenly and uncleanly, its resources are gone until you do some heavy manual intervention? I suppose this may be one of the reasons that fencing via power devices is pretty much the best way to go about it?</div>
<div><br></div><div>Thanks,</div><div>Mark</div>