[ClusterLabs] Re: SBD & Failed Peer

Mon Sep 7 06:15:35 UTC 2015

>>> Jorge Fábregas <jorge.fabregas at gmail.com> wrote on 06.09.2015 at 22:23
in message <55ECA0B8.6090708 at gmail.com>:
> Hi,
> 
> I was reading one of the latest posts [1] from Andrew Beekhof on SBD and
> it got me thinking...
> 
> Assume an active/active cluster using OCFS2 and SBD with shared storage.
> Then one node explodes (the hardware watchdog is gone as well
> obviously).  At this point my guess is that the remaining node will
> notice that its partner hasn't updated its mailbox slot on the SBD
> shared-storage.
> 
> My question:  Is this enough proof (confirmation) that the other node
> isn't capable of causing corruption? And so...will DLM/OCFS2 resume
> operation?

IMHO it will work differently: if the node goes down, the network layer
(corosync) will notice that (sooner or later, depending on some settings). The
remaining node will then try a fencing operation. After some time (also
configurable) the remaining node will assume the other node was fenced
successfully. That does not mean that anything actually happened, but that's the
way it's designed. You'll have to make sure things work as configured.
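
To make that timeline a bit more concrete, here is a rough sketch in Python. It
is only a simplified model, not how corosync or sbd compute anything
internally. The names Timeouts, assumed_fenced_after and sanity_problems are
made up for illustration. The model assumes that detection takes about one
corosync token timeout, that the survivor then waits for the SBD msgwait
timeout before treating the peer as fenced, and that msgwait should be at least
twice the watchdog timeout (a commonly quoted rule of thumb, so check it against
your distribution's documentation).

```python
from dataclasses import dataclass


@dataclass
class Timeouts:
    token_ms: int    # corosync totem token timeout, in milliseconds
    watchdog_s: int  # SBD watchdog timeout, in seconds
    msgwait_s: int   # SBD msgwait timeout, in seconds


def assumed_fenced_after(t: Timeouts) -> float:
    """Rough seconds from node failure until the survivor treats the peer as fenced.

    Simplified model (assumption): detection takes about one token timeout,
    then the survivor waits msgwait before declaring the fencing successful.
    """
    return t.token_ms / 1000.0 + t.msgwait_s


def sanity_problems(t: Timeouts) -> list[str]:
    """Return warnings for settings that violate the rule of thumb assumed here."""
    problems = []
    # Assumed rule of thumb: msgwait >= 2 x watchdog timeout.
    if t.msgwait_s < 2 * t.watchdog_s:
        problems.append("msgwait is shorter than 2 x watchdog timeout")
    return problems


if __name__ == "__main__":
    t = Timeouts(token_ms=5000, watchdog_s=5, msgwait_s=10)
    print(assumed_fenced_after(t))  # 5 s detection + 10 s msgwait
    print(sanity_problems(t))       # empty list: rule of thumb is satisfied
```

Note that nothing in this calculation looks at the failed node itself: the
survivor's conclusion comes purely from elapsed time, which is why the
configured timeouts have to match what the hardware really does.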

> 
> Thanks,
> Jorge
> 
> [1]: http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit/ 