[ClusterLabs] What triggers fencing?

Wed Jul 11 10:11:16 EDT 2018

On Wed, 2018-07-11 at 11:06 +0200, Klaus Wenninger wrote:
> On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
> > 11.07.2018 05:45, Confidential Company wrote:
> > > Not true, the faster node will kill the slower node first. It is
> > > possible that through misconfiguration, both could die, but it's
> > > rare
> > > and easily avoided with a 'delay="15"' set on the fence config
> > > for the
> > > node you want to win.
> > > 
> > > Don't use a delay on the other node, just the node you want to
> > > live in
> > > such a case.
> > > 
> > > **
> > >                 1. Given Active/Passive setup, resources are
> > > active on Node1
> > >                 2. fence1(prefers to Node1, delay=15) and
> > > fence2(prefers to
> > > Node2, delay=30)
> > >                 3. Node2 goes down

What do you mean by "down" in this case?

If you mean the host itself has crashed, then it will not do anything,
and node1 will fence it.

If you mean node2's network goes out, so it's still functioning but no
one can reach the managed service on it, then you are correct, the
"wrong" node can get shot -- because you didn't specify anything about
what the right node would be. This is a somewhat tricky area, but it
can be done with a quorum-only node, qdisk, or fence_heuristics_ping,
all of which are different ways of "preferring" the node that can reach
a certain host.

If you mean the cluster-managed resource crashes on node2, but node2
itself is still functioning properly, then what happens depends on how
you've configured failure recovery. By default, there is no fencing,
and the cluster tries to restart the resource.
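
To make the three cases concrete, here is a rough Python sketch. It is
purely illustrative: the enum and function names are made up for this
mail, and it only restates the default behaviour described above; it is
not how Pacemaker is implemented.

```python
from enum import Enum


class Failure(Enum):
    """Three things "node2 goes down" could mean (hypothetical labels)."""
    HOST_CRASHED = "host crashed"
    NETWORK_LOST = "network lost, node still running"
    RESOURCE_CRASHED = "resource crashed, node healthy"


def expected_reaction(failure):
    """Return the default reaction described in the text for each case."""
    if failure is Failure.HOST_CRASHED:
        # A crashed host cannot act, so only the surviving node fences.
        return "node1 fences node2"
    if failure is Failure.NETWORK_LOST:
        # Both nodes are alive and each sees the other as missing.
        return "either node may be fenced unless a tie-breaker is configured"
    if failure is Failure.RESOURCE_CRASHED:
        # Default failure recovery: restart the resource, no fencing.
        return "no fencing; cluster tries to restart the resource"
    raise ValueError(f"unknown failure type: {failure!r}")


for failure in Failure:
    print(f"{failure.value}: {expected_reaction(failure)}")
```

The tie-breaker in the middle case is where a quorum-only node, qdisk,
or fence_heuristics_ping would come in.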

> > >                 4. Node1 thinks Node2 goes down / Node2 thinks
> > > Node1 goes
> > > down
> > 
> > If node2 is down, it cannot think anything.
> 
> True. Assuming it is not really down but just somehow disconnected
> for my answer below.
> 
> > 
> > >                 5. fence1 counts 15 seconds before it fences Node1
> > > while
> > > fence2 counts 30 seconds before it fences Node2
> > >                 6. Since fence1 has a shorter delay than fence2,
> > > fence1
> > > executes and shuts down Node1.
> > >                 7. fence1 (action: shut down Node1) will always
> > > trigger first
> > > because it has a shorter delay than fence2.
> > > 
> > > ** Okay what's important is that they should be different. But in
> > > the case
> > > above, even though Node2 goes down but Node1 has shorter delay,
> > > Node1 gets
> > > fenced/shutdown. This is a sample scenario. I don't get the
> > > point. Can you
> > > comment on this?
> 
> You didn't send the actual config but from your description
> I get the scenario that way:
> 
> fencing-resource fence1 is running on Node2 and it is there
> to fence Node1 and it has a delay of 15s.
> fencing-resource fence2 is running on Node1 and it is there
> to fence Node2 and it has a delay of 30s.
> If they now begin to fence each other at the same time the
> node actually fenced would be Node1 of course as the
> fencing-resource fence1 is gonna shoot 15s earlier than the
> fence2.
> Looks consistent to me ...
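
The reasoning above can be sketched in a few lines of Python. This is
only a toy model under stated assumptions: both nodes start fencing at
the same moment, the delay is the only thing that differs, and a node
that has been shot never completes its own pending fence action. The
class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FenceDevice:
    name: str       # fencing resource, e.g. "fence1"
    target: str     # node this device shoots
    delay_s: float  # configured delay before the fence action fires


def first_to_fire(devices):
    """Return the device that fires first, or None if it is a race.

    Assumes all devices start counting at the same time.
    """
    ordered = sorted(devices, key=lambda d: d.delay_s)
    if len(ordered) > 1 and ordered[0].delay_s == ordered[1].delay_s:
        # Equal delays: timing decides, and both nodes may end up down.
        return None
    return ordered[0]


devices = [
    FenceDevice("fence1", target="Node1", delay_s=15),
    FenceDevice("fence2", target="Node2", delay_s=30),
]
winner = first_to_fire(devices)
print(f"{winner.name} fires first and fences {winner.target}")
```

With the delays from the description, fence1 wins and Node1 is the one
that gets shot, so the delay belongs on the device that targets the node
you want to survive.
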
> 
> Regards,
> Klaus
> 
> > > 
> > > Thanks
> > > 
> > > On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger
> > > <kwenning at redhat.com> wrote:
> > > 
> > > > On 07/09/2018 05:53 PM, Digimer wrote:
> > > > > On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
> > > > > > On 07/09/2018 05:33 PM, Digimer wrote:
> > > > > > > On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
> > > > > > > > On 07/09/2018 03:49 PM, Digimer wrote:
> > > > > > > > > On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
> > > > > > > > > > On 07/09/2018 02:04 PM, Confidential Company wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > Any ideas what triggers fencing script or
> > > > > > > > > > > stonith?
> > > > > > > > > > > 
> > > > > > > > > > > Given the setup below:
> > > > > > > > > > > 1. I have two nodes
> > > > > > > > > > > 2. Configured fencing on both nodes
> > > > > > > > > > > 3. Configured delay=15 and delay=30 on fence1(for
> > > > > > > > > > > Node1) and
> > > > > > > > > > > fence2(for Node2) respectively
> > > > > > > > > > > 
> > > > > > > > > > > > *What does it mean to configure a delay in
> > > > > > > > > > > > stonith? Wait for 15 seconds
> > > > > > > > > > > > before it fences the node?
> > > > > > > > > > 
> > > > > > > > > > > Given that on a 2-node-cluster you don't have real
> > > > > > > > > > > quorum to make one
> > > > > > > > > > > partial cluster fence the rest of the nodes the
> > > > > > > > > > > different delays are meant
> > > > > > > > > > > to prevent a fencing-race.
> > > > > > > > > > Without different delays that would lead to both
> > > > > > > > > > nodes fencing each
> > > > > > > > > > other at the same time - finally both being down.
> > > > > > > > > 
> > > > > > > > > Not true, the faster node will kill the slower node
> > > > > > > > > first. It is
> > > > > > > > > possible that through misconfiguration, both could
> > > > > > > > > die, but it's rare
> > > > > > > > > > and easily avoided with a 'delay="15"' set on the
> > > > > > > > > > fence config for the
> > > > > > > > > > node you want to win.
> > > > > > > > 
> > > > > > > > What exactly is not true? Aren't we saying the same?
> > > > > > > > Of course one of the delays can be 0 (most important is
> > > > > > > > that
> > > > > > > > they are different).
> > > > > > > 
> > > > > > > Perhaps I misunderstood your message. It seemed to me
> > > > > > > that the
> > > > > > > implication was that fencing in 2-node without a delay
> > > > > > > always ends up
> > > > > > > with both nodes being down, which isn't the case. It can
> > > > > > > happen if the
> > > > > > > > fence methods are not set up right (ie: the node isn't set
> > > > > > > > to immediately
> > > > > > > > power off on ACPI power button event).
> > > > > > 
> > > > > > Yes, a misunderstanding I guess.
> > > > > > 
> > > > > > Should have been more verbose in saying that due to the
> > > > > > time between the fencing-command fired off to the fencing
> > > > > > device and the actual fencing taking place (as you state
> > > > > > dependent on how it is configured in detail - but a
> > > > > > measurable
> > > > > > time in all cases) there is a certain probability that when
> > > > > > both nodes start fencing at roughly the same time we will
> > > > > > end up with 2 nodes down.
> > > > > > 
> > > > > > > Everybody has to find his own tradeoff between how reliably
> > > > > > > fence-races are prevented and the fencing delay I guess.
> > > > > 
> > > > > We've used this;
> > > > > 
> > > > > 1. IPMI (with the guest OS set to immediately power off) as
> > > > > primary,
> > > > > with a 15 second delay on the active node.
> > > > > 
> > > > > 2. Two Switched PDUs (two power circuits, two PSUs) as backup
> > > > > fencing
> > > > > for when IPMI fails, with no delay.
> > > > > 
> > > > > In ~8 years, across dozens and dozens of clusters and
> > > > > countless fence
> > > > > actions, we've never had a dual-fence event (where both nodes
> > > > > go down).
> > > > > So it can be done safely, but as always, test test test
> > > > > before prod.
> > > > 
> > > > No doubt about that this setup is working reliably.
> > > > You just have to know your fencing-devices and
> > > > which delays they involve.
> > > > 
> > > > If we are talking about SBD (with disk as otherwise
> > > > it doesn't work in a sensible way in 2-node-clusters)
> > > > for instance I would strongly advise using a delay.
> > > > 
> > > > So I guess it is important to understand the basic
> > > > idea behind this different delay-based fence-race
> > > > avoidance.
> > > > Afterwards you can still decide why it is no issue
> > > > in your own setup.
> > > > 
> > > > > > > If the delay is set on both nodes, and they are
> > > > > > > different, it will work
> > > > > > > fine. The reason not to do this is that if you use 0,
> > > > > > > then don't use
> > > > > > > anything at all (0 is default), and any other value
> > > > > > > causes avoidable
> > > > > > > fence delays.
> > > > > > > 
> > > > > > > > > > Don't use a delay on the other node, just the node
> > > > > > > > > > you want to live in
> > > > > > > > > > such a case.
> > > > > > > > > 
> > > > > > > > > > > > *Given Node1 is active and Node2 goes down, does
> > > > > > > > > > > > it mean fence1 will
> > > > > > > > > > > > first execute and shut down Node1 even though
> > > > > > > > > > > > Node2 goes down?
> > > > > > > > > > 
> > > > > > > > > > If Node2 managed to sign off properly it will not.
> > > > > > > > > > > If network-connection is down so that Node2 can't
> > > > > > > > > > > inform Node1 that it is going
> > > > > > > > > > > down and finally has stopped all resources it will
> > > > > > > > > > > be fenced by Node1.
> > > > > > > > > > Regards,
> > > > > > > > > > Klaus
> > > > > > > > > 
> > > > > > > > > Fencing occurs in two cases;
> > > > > > > > > 
> > > > > > > > > > 1. The node stops responding (meaning it's in an
> > > > > > > > > > unknown state, so it is
> > > > > > > > > > fenced to force it into a known state).
> > > > > > > > > > 2. A resource / service fails to stop. In this
> > > > > > > > > > case, the service is
> > > > > > > > > > in an unknown state, so the node is fenced to force
> > > > > > > > > > the service into a
> > > > > > > > > > known state so that it can be safely recovered on the
> > > > > > > > > > peer.
> > > > > > > > > 
> > > > > > > > > > Graceful withdrawal of the node from the cluster, and
> > > > > > > > > > graceful stopping
> > > > > > > > > > of services will not lead to a fence (because in both
> > > > > > > > > > cases, the node /
> > > > > > > > > > service are in a known state - off).
> > > > > > > > > 
> > > 
> > > 
> > > _______________________________________________
> > > Users mailing list: Users at clusterlabs.org
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > 
> > > Project Home: http://www.clusterlabs.org
> > > Getting started:
> > > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
> > > 
> > 
> 
-- 
Ken Gaillot <kgaillot at redhat.com>