[ClusterLabs] Antw: [EXT] Stonith failing

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Tue Aug 18 06:07:08 EDT 2020


On Tue, 18 Aug 2020 08:21:50 +0200
Klaus Wenninger <kwenning at redhat.com> wrote:

> On 8/18/20 7:49 AM, Andrei Borzenkov wrote:
> > 17.08.2020 23:39, Jehan-Guillaume de Rorthais wrote:  
> >> On Mon, 17 Aug 2020 10:19:45 -0500
> >> Ken Gaillot <kgaillot at redhat.com> wrote:
> >>  
> >>> On Fri, 2020-08-14 at 15:09 +0200, Gabriele Bulfon wrote:  
> >>>> Thanks to all your suggestions, I now have the systems with stonith
> >>>> configured on ipmi.    
> >>> A word of caution: if the IPMI is on-board -- i.e. it shares the same
> >>> power supply as the computer -- power becomes a single point of
> >>> failure. If the node loses power, the other node can't fence because
> >>> the IPMI is also down, and the cluster can't recover.
> >>>
> >>> Some on-board IPMI controllers can share an Ethernet port with the main
> >>> computer, which would be a similar situation.
> >>>
> >>> It's best to have a backup fencing method when using IPMI as the
> >>> primary fencing method. An example would be an intelligent power switch
> >>> or sbd.  
> >> How would SBD be useful in this scenario? The poison pill will not be
> >> swallowed by the dead node... Is it just to wait for the watchdog timeout?
> >>  
> > A node is expected to commit suicide if SBD loses access to the shared
> > block device. So either the node swallowed the poison pill and died, or
> > it died because it realized it could not see the poison pill, or it was
> > dead already. After the watchdog timeout (twice the watchdog timeout for
> > safety) we assume the node is dead.  
> Yes, like this a suicide via watchdog will be triggered if there are
> issues with the disk. This is why it is important to have a reliable
> watchdog with SBD even when using poison pill. As this alone would
> make a single shared disk a SPOF, when running with pacemaker integration
> (default) a node with SBD will survive despite losing the disk
> as long as it has quorum and pacemaker looks healthy. As corosync-quorum
> in 2-node-mode obviously won't be fit for this purpose, SBD will switch
> to checking for presence of both nodes if the 2-node-flag is set.
> 
> Sorry for the lengthy explanation but the full picture is required
> to understand why it is sufficiently reliable and useful if configured

Thank you Andrei and Klaus for the explanation.
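
The behaviour described in the quoted explanation can be summarised in a
minimal Python sketch. This is not SBD's actual implementation: it models
only the rules stated above, and every name in it (NodeView,
must_self_fence, seconds_until_assumed_dead) is hypothetical and chosen for
illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """What a node can observe locally (hypothetical field names)."""
    disk_reachable: bool     # shared block device still readable?
    poison_pill: bool        # a peer wrote a fence request for this node?
    pacemaker_healthy: bool  # local pacemaker looks healthy?
    quorate: bool            # corosync reports quorum?
    two_node: bool           # corosync 2-node flag is set?
    peer_present: bool       # the other node is visible in the membership?


def must_self_fence(v: NodeView) -> bool:
    """Return True if the node should let the watchdog reset it."""
    if v.disk_reachable:
        # Disk is fine: only an explicit poison pill kills the node.
        return v.poison_pill
    # Disk lost: survive only if the cluster layer vouches for the node.
    if not v.pacemaker_healthy:
        return True
    if v.two_node:
        # Quorum is not meaningful in 2-node mode, so both nodes
        # must be present instead.
        return not v.peer_present
    return not v.quorate


def seconds_until_assumed_dead(watchdog_timeout: int) -> int:
    """Delay the surviving node waits before assuming the peer is dead
    (twice the watchdog timeout, for safety)."""
    return 2 * watchdog_timeout
```

For example, a node that loses the disk in a 2-node cluster keeps running
only while its peer is still present; if the peer is gone as well, it stops
feeding the watchdog, and the survivor can assume it is dead after twice the
watchdog timeout.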

Regards,

