[ClusterLabs] Antw: Re: Antw: [EXT] Stonith failing

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Aug 18 03:10:07 EDT 2020


>>> Ken Gaillot <kgaillot at redhat.com> wrote on 17.08.2020 at 17:19 in message
<73d6ecf113098a3154a2e7db2e2a59557272024a.camel at redhat.com>:
> On Fri, 2020‑08‑14 at 15:09 +0200, Gabriele Bulfon wrote:
>> Thanks to all your suggestions, I now have the systems with stonith
>> configured on ipmi.
> 
> A word of caution: if the IPMI is on‑board ‑‑ i.e. it shares the same
> power supply as the computer ‑‑ power becomes a single point of
> failure. If the node loses power, the other node can't fence because
> the IPMI is also down, and the cluster can't recover.

This may not always be true: we had servers with three(!) power supplies and a
BMC (what today would be called "lights-out management"). You could "power down"
the server while the BMC was still operational (and could thus "power up" the
server again).
With standard PC architecture these days, things seem to be a bit more
complicated (meaning "primitive")...

> 
> Some on‑board IPMI controllers can share an Ethernet port with the main
> computer, which would be a similar situation.
> 
> It's best to have a backup fencing method when using IPMI as the
> primary fencing method. An example would be an intelligent power switch
> or sbd.
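
A minimal sketch of such a two-level fencing setup, assuming a Linux cluster
managed with pcs and the fence_ipmilan / fence_sbd agents (node names, IPMI
addresses, credentials and the disk path below are placeholders, and exact
option names vary between fence-agents versions):

    # Primary fencing device: node1's on-board IPMI (placeholder address/credentials)
    pcs stonith create fence-node1-ipmi fence_ipmilan \
        ip=10.0.0.101 username=admin password=secret lanplus=1 \
        pcmk_host_list=node1

    # Backup fencing device: sbd "poison pill" over a small shared disk
    # (requires the sbd daemon and a watchdog configured on every node)
    pcs stonith create fence-sbd fence_sbd devices=/dev/disk/by-id/shared-sbd-disk

    # Fencing topology for node1: try IPMI first, fall back to sbd
    pcs stonith level add 1 node1 fence-node1-ipmi
    pcs stonith level add 2 node1 fence-sbd

    # Repeat with a second IPMI device and matching levels for node2.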
> 
>> Two questions:
>> ‑ how can I simulate a stonith situation to check that everything is
>> ok?
>> ‑ considering that I have both nodes with stonith against the other
>> node, once the two nodes can communicate, how can I be sure the two
>> nodes will not try to stonith each other?
>>  
>> :)
>> Thanks!
>> Gabriele
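
On the first question, a fence action can be triggered by hand to verify the
configuration before relying on it, for example (a sketch assuming Pacemaker's
stonith_admin and crmsh are available; "node2" is a placeholder):

    # Ask the cluster to fence node2 via the configured stonith devices
    stonith_admin --reboot node2

    # Equivalent with crmsh
    crm node fence node2

    # Afterwards, review which fencing actions were executed
    stonith_admin --history node2

A harsher test is to cut a node off abruptly (e.g. pull its network links) and
check that the surviving node fences it and takes over the resources.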
>> 
>>  
>>  
>> Sonicle S.r.l. : http://www.sonicle.com 
>> Music: http://www.gabrielebulfon.com 
>> Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon 
>> 
>> 
>> 
>> From: Gabriele Bulfon <gbulfon at sonicle.com>
>> To: Cluster Labs ‑ All topics related to open‑source clustering
>> welcomed <users at clusterlabs.org>
>> Date: 29 July 2020 14.22.42 CEST
>> Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
>> 
>> 
>> >  
>> > It is a ZFS based illumos system.
>> > I don't think SBD is an option.
>> > Is there a reliable ZFS based stonith?
>> >  
>> > Gabriele
>> > 
>> >  
>> >  
>> > From: Andrei Borzenkov <arvidjaar at gmail.com>
>> > To: Cluster Labs ‑ All topics related to open‑source clustering
>> > welcomed <users at clusterlabs.org>
>> > Date: 29 July 2020 9.46.09 CEST
>> > Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
>> > 
>> > 
>> > >  
>> > > 
>> > > On Wed, Jul 29, 2020 at 9:01 AM Gabriele Bulfon <
>> > > gbulfon at sonicle.com> wrote:
>> > > > That one was taken from a specific implementation on Solaris
>> > > > 11.
>> > > > The situation is a dual-node server with a shared storage
>> > > > controller: both nodes see the same disks concurrently.
>> > > > Here we must be sure that the two nodes are not going to
>> > > > import/mount the same zpool at the same time, or we will
>> > > > encounter data corruption:
>> > > > 
>> > > 
>> > >  
>> > > ssh-based "stonith" cannot guarantee it.
>> > >  
>> > > > node 1 will be preferred for pool 1, node 2 for pool 2; only in
>> > > > case one of the nodes goes down or is taken offline should the
>> > > > resources first be freed by the leaving node and taken over by
>> > > > the other node.
>> > > >  
>> > > > Would you suggest one of the available stonith in this case?
>> > > >  
>> > > > 
>> > > 
>> > >  
>> > > IPMI, managed PDU, SBD ...
>> > > In practice, the only stonith method that works in case of a
>> > > complete node outage, including loss of power, is SBD.
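
Where sbd is available, preparing such a shared-disk sbd device is roughly as
follows (a sketch only; the disk path and node name are placeholders, and the
sbd daemon plus a hardware watchdog must be configured on every node before
this is useful):

    # Initialize a small shared LUN as an sbd device (destructive!)
    sbd -d /dev/disk/by-id/shared-sbd-disk create

    # Inspect the on-disk header and the per-node message slots
    sbd -d /dev/disk/by-id/shared-sbd-disk dump
    sbd -d /dev/disk/by-id/shared-sbd-disk list

    # Send a harmless test message to one node's slot
    sbd -d /dev/disk/by-id/shared-sbd-disk message node2 test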
> ‑‑ 
> Ken Gaillot <kgaillot at redhat.com>
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 




