[ClusterLabs] Antw: [EXT] Stonith failing

Ken Gaillot kgaillot at redhat.com
Mon Aug 17 11:19:45 EDT 2020


On Fri, 2020-08-14 at 15:09 +0200, Gabriele Bulfon wrote:
> Thanks to all your suggestions, I now have the systems with stonith
> configured on ipmi.

A word of caution: if the IPMI is on-board -- i.e. it shares the same
power supply as the computer -- power becomes a single point of
failure. If the node loses power, the other node can't fence because
the IPMI is also down, and the cluster can't recover.

Some on-board IPMI controllers share an Ethernet port with the main
computer, which makes the network link a single point of failure in
the same way.

It's best to have a backup fencing method when IPMI is the primary
one; an intelligent power switch or sbd are good candidates.
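
If you do configure a second method, Pacemaker's fencing topology
lets the cluster try IPMI first and fall back to the other device
automatically. A minimal sketch using the pcs shell, with
hypothetical device names (crmsh's fencing_topology section
expresses the same thing):

    # Level 1: try the on-board IPMI device first
    pcs stonith level add 1 node1 fence-ipmi-node1
    # Level 2: fall back to a managed power switch if IPMI fails
    pcs stonith level add 2 node1 fence-pdu-node1

The cluster only moves to level 2 after every device at level 1 has
failed.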

> Two questions:
> - how can I simulate a stonith situation to check that everything is
> ok?
> - considering that I have both nodes with stonith against the other
> node, once the two nodes can communicate, how can I be sure the two
> nodes will not try to stonith each other?
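
On the first question: the simplest test is to ask the cluster to
fence a node on purpose and watch it reboot and rejoin. A sketch,
assuming Pacemaker's stonith_admin tool and placeholder node names:

    # From node1, request that node2 be fenced
    stonith_admin --reboot node2

Killing corosync on one node (or pulling its network cable) also
simulates the kind of failure the cluster is supposed to handle.

On the second question: in a two-node cluster a fence race is
possible once the nodes lose sight of each other. A common mitigation
is a delay on the fencing device(s), e.g. the pcmk_delay_max
parameter, so the two nodes are unlikely to shoot at the same instant
(device names here are hypothetical):

    # Add a random delay of up to 10s before fencing is executed
    pcs stonith update fence-ipmi-node1 pcmk_delay_max=10s
    pcs stonith update fence-ipmi-node2 pcmk_delay_max=10s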
>  
> :)
> Thanks!
> Gabriele
> 
>  
>  
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
> 
> 
> 
> From: Gabriele Bulfon <gbulfon at sonicle.com>
> To: Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>
> Date: 29 July 2020 14:22:42 CEST
> Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
> 
> 
> >  
> > It is a ZFS based illumos system.
> > I don't think SBD is an option.
> > Is there a reliable ZFS based stonith?
> >  
> > Gabriele
> > 
> >  
> >  
> > Sonicle S.r.l. : http://www.sonicle.com
> > Music: http://www.gabrielebulfon.com
> > Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
> > 
> > 
> > 
> > From: Andrei Borzenkov <arvidjaar at gmail.com>
> > To: Cluster Labs - All topics related to open-source clustering
> > welcomed <users at clusterlabs.org>
> > Date: 29 July 2020 9:46:09 CEST
> > Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
> > 
> > 
> > >  
> > > 
> > > On Wed, Jul 29, 2020 at 9:01 AM Gabriele Bulfon <
> > > gbulfon at sonicle.com> wrote:
> > > > That one was taken from a specific implementation on Solaris
> > > > 11.
> > > > The situation is a dual node server with shared storage
> > > > controller: both nodes see the same disks concurrently.
> > > > Here we must be sure that the two nodes are not going to
> > > > import/mount the same zpool at the same time, or we will
> > > > encounter data corruption:
> > > > 
> > > 
> > >  
> > > An ssh-based "stonith" agent cannot guarantee that.
> > >  
> > > > node 1 will be preferred for pool 1, node 2 for pool 2; only
> > > > in case one of the nodes goes down or is taken offline should
> > > > the resources first be freed by the leaving node and taken
> > > > over by the other node.
> > > >  
> > > > Would you suggest one of the available stonith in this case?
> > > >  
> > > > 
> > > 
> > >  
> > > IPMI, a managed PDU, SBD ...
> > > In practice, the only stonith method that works in the case of a
> > > complete node outage, including loss of the power supply, is SBD.
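
For the preferred placement described above, ordinary location
constraints are enough; a sketch with hypothetical resource names
zpool1 and zpool2:

    # Prefer node1 for pool 1 and node2 for pool 2
    pcs constraint location zpool1 prefers node1=100
    pcs constraint location zpool2 prefers node2=100

Note that the constraints only express placement while both nodes are
healthy; it is still fencing that actually prevents a double import
of the same zpool.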
-- 
Ken Gaillot <kgaillot at redhat.com>


