[ClusterLabs] Stonith

Ken Gaillot kgaillot at redhat.com
Mon Dec 19 15:44:49 EST 2022


On Mon, 2022-12-19 at 16:17 +0300, Andrei Borzenkov wrote:
> On Mon, Dec 19, 2022 at 4:01 PM Antony Stone <Antony.Stone at ha.open.source.it> wrote:
> > On Monday 19 December 2022 at 13:55:45, Andrei Borzenkov wrote:
> > 
> > > On Mon, Dec 19, 2022 at 3:44 PM Antony Stone <Antony.Stone at ha.open.source.it> wrote:
> > > > So, do I simply create one stonith resource for each server, and rely on
> > > > some other random server to invoke it when needed?
> > > 
> > > Yes, this is the simplest approach. You need to restrict this stonith
> > > resource to fencing only one cluster node (set pcmk_host_list).
> > 
> > So, just to be clear, I create one stonith resource for each machine that
> > some other server needs to be able to shut down?
> > 
> 
> Correct.
> 
> > I ask simply because the acronym stonith refers to "the other node", so it
> > sounds to me more like something I need to define so that a working machine
> > can kill another one.
> > 
> 
> Yes, you define a stonith resource that can kill node A, and nodes B, C,
> D, ... will use this resource to kill A when needed. As long as your
> stonith resource can actually work on any node, it does not matter which
> one does the killing. You can restrict which nodes can use this stonith
> agent with the usual location constraints if necessary.
> 
> But keep in mind that if the whole site is down (or inaccessible), you
> will not have access to the IPMI/PDU/whatever at that site, so your
> stonith agents will fail ...

This is the main problem I see. Presumably the goal of the three-center
setup is to survive a network interruption at one of the centers, but
without network access to that center, fencing will fail and the cluster
will be unable to recover the resources from that center.
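To make that failure mode concrete, here is a toy model in Python. It is a sketch under stated assumptions, not how Pacemaker's fencer actually decides anything: it assumes three hypothetical data centers (dc1, dc2, dc3) with one node each, one fence device per node whose single target stands in for pcmk_host_list, an optional set of allowed executors standing in for location constraints, and a device that is usable only if its IPMI/PDU site is reachable. The names FenceDevice, can_fence and can_recover are made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Set


@dataclass(frozen=True)
class FenceDevice:
    name: str
    target: str  # stands in for pcmk_host_list containing exactly one host
    site: str    # site where the device's IPMI/PDU endpoint lives
    # Stands in for location constraints; None means any node may use it.
    allowed: Optional[FrozenSet[str]] = None


def can_fence(devices: Sequence[FenceDevice], target: str, executor: str,
              reachable_sites: Set[str]) -> bool:
    """True if `executor` can fence `target` with at least one device."""
    if executor == target:
        return False  # a node does not fence itself in this model
    return any(
        d.target == target
        and d.site in reachable_sites  # the device endpoint must be reachable
        and (d.allowed is None or executor in d.allowed)
        for d in devices
    )


def can_recover(devices: Sequence[FenceDevice], target: str,
                survivors: Set[str], reachable_sites: Set[str]) -> bool:
    """Resources of `target` can move only after some survivor fences it."""
    return any(can_fence(devices, target, n, reachable_sites) for n in survivors)


# Hypothetical layout: one node per data center, one fence device per node.
NODE_SITE = {"node-a": "dc1", "node-b": "dc2", "node-c": "dc3"}
DEVICES = [FenceDevice(f"fence-{n}", n, s) for n, s in NODE_SITE.items()]

# node-c hangs, but dc3's network is fine: node-a or node-b can fence it.
print(can_recover(DEVICES, "node-c", {"node-a", "node-b"}, {"dc1", "dc2", "dc3"}))  # True
# dc3 is cut off: its IPMI/PDU is unreachable, so recovery is blocked.
print(can_recover(DEVICES, "node-c", {"node-a", "node-b"}, {"dc1", "dc2"}))  # False
```

The second case is the one that matters: having a fence device defined for every node does not help when the device itself sits behind the broken network.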

You may want to look at designing this as two independent clusters,
coordinated via booth. The third site only needs to run a booth
"arbitrator" (quorum server), not pacemaker. With this design, if one
site loses network access, it will shut itself down, and fencing only
needs to be able to work locally at each site.
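The arbitrator idea can be illustrated with a much-simplified majority model in Python. Assumptions, all hypothetical: the booth group is two sites (site-1, site-2) plus one arbitrator, a member counts as having quorum when it can talk to a strict majority of the group (itself included), and a site keeps or may be granted the ticket only while it has quorum. The real booth daemon runs its own election protocol with ticket expiry and renewal timers, none of which is modeled here; the function names are made up.

```python
from typing import FrozenSet, Optional, Set

# Hypothetical booth group: two Pacemaker sites plus an arbitrator.
MEMBERS = ("site-1", "site-2", "arbitrator")
SITES = ("site-1", "site-2")


def reachable_count(member: str, links: Set[FrozenSet[str]]) -> int:
    """Number of members `member` can talk to, counting itself."""
    return 1 + sum(
        1 for other in MEMBERS
        if other != member and frozenset((member, other)) in links
    )


def has_quorum(member: str, links: Set[FrozenSet[str]]) -> bool:
    """Strict majority of the booth group, i.e. 2 of 3 here."""
    return reachable_count(member, links) * 2 > len(MEMBERS)


def next_holder(holder: Optional[str],
                links: Set[FrozenSet[str]]) -> Optional[str]:
    """Site allowed to run the ticketed resources next, or None."""
    if holder is not None and has_quorum(holder, links):
        return holder  # the current holder still sees a majority
    for site in SITES:
        if site != holder and has_quorum(site, links):
            return site  # the ticket may be granted to the other site
    return None


ALL_LINKS = {frozenset(p) for p in [("site-1", "site-2"),
                                    ("site-1", "arbitrator"),
                                    ("site-2", "arbitrator")]}
# site-1 loses its network; only site-2 and the arbitrator can still talk.
SITE1_CUT_OFF = {frozenset(("site-2", "arbitrator"))}

print(next_holder("site-1", ALL_LINKS))      # site-1
print(has_quorum("site-1", SITE1_CUT_OFF))   # False: site-1 stops its resources
print(next_holder("site-1", SITE1_CUT_OFF))  # site-2
```

The middle line is the point of the design: the isolated site can tell on its own that it has lost the majority, so it stops its resources locally, and site-2 never needs to reach site-1's IPMI/PDU in order to take over.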

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#document-multi-site-clusters

-- 
Ken Gaillot <kgaillot at redhat.com>