[ClusterLabs] What triggers fencing?

Klaus Wenninger kwenning at redhat.com
Thu Jul 12 06:22:55 EDT 2018


On 07/12/2018 09:39 AM, Confidential Company wrote:
> Message: 2
> Date: Wed, 11 Jul 2018 16:33:31 +0200
> From: Klaus Wenninger <kwenning at redhat.com>
> To: Ken Gaillot <kgaillot at redhat.com>,
> Cluster Labs - All topics
>         related to open-source clustering welcomed
> <users at clusterlabs.org>,
>         Andrei Borzenkov <arvidjaar at gmail.com>
> Subject: Re: [ClusterLabs] What triggers fencing?
> Message-ID: <2bf61b9f-98b0-482f-fa65-263ba9490950 at redhat.com>
> Content-Type: text/plain; charset=utf-8
>
> On 07/11/2018 04:11 PM, Ken Gaillot wrote:
> > On Wed, 2018-07-11 at 11:06 +0200, Klaus Wenninger wrote:
> >> On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
> >>> 11.07.2018 05:45, Confidential Company wrote:
> >>>> Not true, the faster node will kill the slower node first. It is
> >>>> possible that through misconfiguration, both could die, but it's
> >>>> rare
> >>>> and easily avoided with a 'delay="15"' set on the fence config
> >>>> for the
> >>>> node you want to win.
> >>>>
> >>>> Don't use a delay on the other node, just the node you want to
> >>>> live in
> >>>> such a case.
> >>>>
> >>>> **
> >>>> 1. Given Active/Passive setup, resources are active on Node1
> >>>> 2. fence1 (prefers Node1, delay=15) and fence2 (prefers Node2, delay=30)
> >>>> 3. Node2 goes down
> > What do you mean by "down" in this case?
> >
> > If you mean the host itself has crashed, then it will not do anything,
> > and node1 will fence it.
> >
> > If you mean node2's network goes out, so it's still functioning but no
> > one can reach the managed service on it, then you are correct, the
> > "wrong" node can get shot -- because you didn't specify anything about
> > what the right node would be. This is a somewhat tricky area, but it
> > can be done with a quorum-only node, qdisk, or fence_heuristics_ping,
> > all of which are different ways of "preferring" the node that can reach
> > a certain host.
>
>
>
> Or in other words: why would I, as a cluster node, shoot the
> peer to be able to start the services locally, if I can somehow
> tell beforehand that my services wouldn't be reachable by
> anybody anyway (e.g. network disconnected)?
> Then it might make more sense to sit still and wait to be shot by
> the other side, in case that node is luckier and
> e.g. still has access to the network.
>
>
> -Klaus
>
>
> In case of a 2-node setup, both nodes know nothing about whether
> their services are reachable by anybody.

Of course they cannot get that knowledge from the cluster peer, but
maybe it is possible to bring some additional instance into the game.
As Ken already mentioned, that might be a disk, an additional node
just for quorum, qdevice or fence_heuristics_ping.
The latter is put on the same fencing level in front of your real
fencing device; it tries to reach the IP address(es) you configure
and, depending on that, gains some knowledge about whether the
local node is still reachable from the outside.
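
A rough sketch of that idea with pcs (the stonith device names match
your config below; the heuristics devices and the ping target
172.16.10.1 are made up, so point ping_targets at whatever a healthy
node should be able to reach, e.g. your gateway):

  # one heuristics device per node, tried before the real device
  pcs stonith create ping1 fence_heuristics_ping \
      ping_targets=172.16.10.1 pcmk_host_list=ArcosRhel1
  pcs stonith create ping2 fence_heuristics_ping \
      ping_targets=172.16.10.1 pcmk_host_list=ArcosRhel2
  # level 1: heuristics first, then the real vmware device
  pcs stonith level add 1 ArcosRhel1 ping1,Fence1
  pcs stonith level add 1 ArcosRhel2 ping2,fence2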

Btw. in your config I saw that you are using pcmk_delay_max on just
one of the nodes. That is not how it is designed to be used, as
you will get a random delay between 0 and that maximum. I would
rather recommend pcmk_delay_base on one of the devices (a fixed
delay) if you want to prioritize one of the nodes, or pcmk_delay_max
with the same value on both devices if you rather want random behavior.
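
If for example ArcosRhel1 is the node you want to win a fence race,
something like this should do (just a sketch against your config;
pcmk_delay_base needs a reasonably recent Pacemaker/fencing stack):

  # fixed delay on the device that fences the preferred survivor
  pcs stonith update Fence1 pcmk_delay_base=15s
  # drop the one-sided random delay (empty value removes the attribute)
  pcs stonith update fence2 pcmk_delay_max=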

Unfortunately the current implementation of fencing doesn't
allow things like dynamic location rules that react to e.g.
certain resources running, so as to prioritize the active node.
What you can still do is go the way fence_heuristics_ping goes
(put something into a fencing hierarchy in front of the real
fencing device) and add a fence agent that returns successfully
immediately if the node has certain resources running (active),
and waits a certain time before returning successfully if they
are not running (passive).
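
Just to illustrate the idea, a rough, untested sketch of such an
agent's core (it uses your cluster IP 172.16.10.243 as the "am I the
active node?" check; a real fence agent would additionally have to
provide proper metadata, monitor/status handling and error reporting):

  #!/bin/sh
  # fence_delay_if_passive - illustration only, not a complete agent.
  # The fencer passes its options as key=value lines on stdin.
  action=""
  while read line; do
      case "$line" in
          action=*) action="${line#action=}" ;;
      esac
  done

  case "$action" in
      on|off|reboot)
          # Active node (holds the cluster IP): succeed immediately so
          # the real device listed after it on the same level fires
          # without delay.
          if ip -o addr show | grep -q "172.16.10.243"; then
              exit 0
          fi
          # Passive node: hold back before letting the real fencing
          # device shoot, so the active node wins the race.
          sleep 15
          exit 0
          ;;
      *)
          # metadata, monitor, status, ... would need real handling
          exit 0
          ;;
  esac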

Otherwise - without checking the logs - I don't know why
disconnecting node2 versus node1 makes a difference.
(Is that reproducible at all?)
In the back of my mind I remember an issue with Corosync
where an interface going down might somehow prevent loss
detection - I don't remember the details exactly.

Regards,
Klaus



>
> Sharing you my config and my tests:
>
> Last login: Thu Jul 12 14:57:21 2018
> [root at ArcosRhel1 ~]# pcs config
> Cluster Name: ARCOSCLUSTER
> Corosync Nodes:
>  ArcosRhel1 ArcosRhel2
> Pacemaker Nodes:
>  ArcosRhel1 ArcosRhel2
>
> Resources:
>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: cidr_netmask=32 ip=172.16.10.243
>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>
> Stonith Devices:
>  Resource: Fence1 (class=stonith type=fence_vmware_soap)
>   Attributes: action=off ipaddr=172.16.11.201 login=test
> passwd=testing pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s
> port=ArcosRhel1(Joniel) ssl_insecure=1
>   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
>  Resource: fence2 (class=stonith type=fence_vmware_soap)
>   Attributes: action=off ipaddr=172.16.11.202 login=test
> passwd=testing pcmk_delay_max=10s pcmk_host_list=ArcosRhel2
> pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
>   Operations: monitor interval=60s (fence2-monitor-interval-60s)
> Fencing Levels:
>
> Location Constraints:
>   Resource: Fence1
>     Enabled on: ArcosRhel2 (score:INFINITY)
> (id:location-Fence1-ArcosRhel2-INFINITY)
>   Resource: fence2
>     Enabled on: ArcosRhel1 (score:INFINITY)
> (id:location-fence2-ArcosRhel1-INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  No defaults set
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: ARCOSCLUSTER
>  dc-version: 1.1.16-12.el7-94ff4df
>  have-watchdog: false
>  last-lrm-refresh: 1531375458
>  stonith-enabled: true
>
> Quorum:
>   Options:
> [root at ArcosRhel1 ~]#
>
> **Test scenario:
> Given:
> Nodes have two interfaces (ens192 for corosync traffic / ens224 for
> esxi traffic)
>
> a.) Node1=Active and Node2=Passive.
>  Action=disconnect ens192 of Node1 
> Output= Node2 was fenced and shutdown
> b.) Node1=Passive and Node2=Active
> Action=disconnect ens192 of Node1
> Output= Node1 was fenced and shutdown
> c.) Node1=Passive and Node2=Active
> Action=disconnect ens192 of Node2
> Output=Node2 was fenced and shutdown
>
>
> Thanks,
> imnotarobot
>
>
>
> >
> > If you mean the cluster-managed resource crashes on node2, but node2
> > itself is still functioning properly, then what happens depends on how
> > you've configured failure recovery. By default, there is no fencing,
> > and the cluster tries to restart the resource.
> >
> >>>> 4. Node1 thinks Node2 goes down / Node2 thinks Node1 goes down
> >>> If node2 is down, it cannot think anything.
> >> True. Assuming it is not really down but just somehow disconnected
> >> for my answer below.
> >>
> >>>> 5. fence1 counts 15 seconds before it fences Node1 while
> >>>> fence2 counts 30 seconds before it fences Node2
> >>>> 6. Since fence1 has a shorter delay than fence2, fence1
> >>>> executes and shuts down Node1.
> >>>> 7. fence1 (action: shutdown Node1) will always trigger first
> >>>> because it has a shorter delay than fence2.
> >>>>
> >>>> ** Okay, what's important is that they should be different. But
> >>>> in the case above, even though Node2 goes down, Node1 (which has
> >>>> the shorter delay) gets fenced/shut down. This is a sample
> >>>> scenario. I don't get the point. Can you comment on this?
> >> You didn't send the actual config but from your description
> >> I get the scenario that way:
> >>
> >> fencing-resource fence1 is running on Node2 and it is there
> >> to fence Node1 and it has a delay of 15s.
> >> fencing-resource fence2 is running on Node1 and it is there
> >> to fence Node2 and it has a delay of 30s.
> >> If they now begin to fence each other at the same time the
> >> node actually fenced would be Node1 of course as the
> >> fencing-resource fence1 is gonna shoot 15s earlier than
> >> fence2.
> >> Looks consistent to me ...
> >>
> >> Regards,
> >> Klaus
> >>
> >>>> Thanks
> >>>>
> >>>> On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger
> >>>> <kwenning at redhat.com> wrote:
> >>>>
> >>>>> On 07/09/2018 05:53 PM, Digimer wrote:
> >>>>>> On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
> >>>>>>> On 07/09/2018 05:33 PM, Digimer wrote:
> >>>>>>>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
> >>>>>>>>> On 07/09/2018 03:49 PM, Digimer wrote:
> >>>>>>>>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
> >>>>>>>>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any ideas what triggers fencing script or
> >>>>>>>>>>>> stonith?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Given the setup below:
> >>>>>>>>>>>> 1. I have two nodes
> >>>>>>>>>>>> 2. Configured fencing on both nodes
> >>>>>>>>>>>> 3. Configured delay=15 and delay=30 on fence1(for
> >>>>>>>>>>>> Node1) and
> >>>>>>>>>>>> fence2(for Node2) respectively
> >>>>>>>>>>>>
> >>>>>>>>>>>> *What does it mean to configured delay in
> >>>>>>>>>>>> stonith? wait for 15
> >>>>> seconds
> >>>>>>>>>>>> before it fence the node?
> >>>>>>>>>>> Given that on a 2-node-cluster you don't have real
> >>>>>>>>>>> quorum to make
> >>>>> one
> >>>>>>>>>>> partial cluster fence the rest of the nodes the
> >>>>>>>>>>> different delays
> >>>>> are meant
> >>>>>>>>>>> to prevent a fencing-race.
> >>>>>>>>>>> Without different delays that would lead to both
> >>>>>>>>>>> nodes fencing each
> >>>>>>>>>>> other at the same time - finally both being down.
> >>>>>>>>>> Not true, the faster node will kill the slower node
> >>>>>>>>>> first. It is
> >>>>>>>>>> possible that through misconfiguration, both could
> >>>>>>>>>> die, but it's rare
> >>>>>>>>>> and easily avoided with a 'delay="15"' set on the
> >>>>>>>>>> fence config for
> >>>>> the
> >>>>>>>>>> node you want to win.
> >>>>>>>>> What exactly is not true? Aren't we saying the same?
> >>>>>>>>> Of course one of the delays can be 0 (most important is
> >>>>>>>>> that
> >>>>>>>>> they are different).
> >>>>>>>> Perhaps I misunderstood your message. It seemed to me
> >>>>>>>> that the
> >>>>>>>> implication was that fencing in 2-node without a delay
> >>>>>>>> always ends up
> >>>>>>>> with both nodes being down, which isn't the case. It can
> >>>>>>>> happen if the
> >>>>>>>> fence methods are not setup right (ie: the node isn't set
> >>>>>>>> to
> >>>>> immediately
> >>>>>>>> power off on ACPI power button event).
> >>>>>>> Yes, a misunderstanding I guess.
> >>>>>>>
> >>>>>>> Should have been more verbose in saying that due to the
> >>>>>>> time between the fencing-command fired off to the fencing
> >>>>>>> device and the actual fencing taking place (as you state
> >>>>>>> dependent on how it is configured in detail - but a
> >>>>>>> measurable
> >>>>>>> time in all cases) there is a certain probability that when
> >>>>>>> both nodes start fencing at roughly the same time we will
> >>>>>>> end up with 2 nodes down.
> >>>>>>>
> >>>>>>> Everybody has to find his own tradeoff between how reliably
> >>>>>>> fence-races are prevented and the fencing delay, I guess.
> >>>>>> We've used this;
> >>>>>>
> >>>>>> 1. IPMI (with the guest OS set to immediately power off) as
> >>>>>> primary,
> >>>>>> with a 15 second delay on the active node.
> >>>>>>
> >>>>>> 2. Two Switched PDUs (two power circuits, two PSUs) as backup
> >>>>>> fencing
> >>>>>> for when IPMI fails, with no delay.
> >>>>>>
> >>>>>> In ~8 years, across dozens and dozens of clusters and
> >>>>>> countless fence
> >>>>>> actions, we've never had a dual-fence event (where both nodes
> >>>>>> go down).
> >>>>>> So it can be done safely, but as always, test test test
> >>>>>> before prod.
> >>>>> No doubt about that this setup is working reliably.
> >>>>> You just have to know your fencing-devices and
> >>>>> which delays they involve.
> >>>>>
> >>>>> If we are talking about SBD (with disk as otherwise
> >>>>> it doesn't work in a sensible way in 2-node-clusters)
> >>>>> for instance I would strongly advise using a delay.
> >>>>>
> >>>>> So I guess it is important to understand the basic
> >>>>> idea behind this different delay-based fence-race
> >>>>> avoidance.
> >>>>> Afterwards you can still decide why it is no issue
> >>>>> in your own setup.
> >>>>>
> >>>>>>>> If the delay is set on both nodes, and they are
> >>>>>>>> different, it will work
> >>>>>>>> fine. The reason not to do this is that if you use 0,
> >>>>>>>> then don't use
> >>>>>>>> anything at all (0 is default), and any other value
> >>>>>>>> causes avoidable
> >>>>>>>> fence delays.
> >>>>>>>>
> >>>>>>>>>> Don't use a delay on the other node, just the node
> >>>>>>>>>> you want to live
> >>>>> in
> >>>>>>>>>> such a case.
> >>>>>>>>>>
> >>>>>>>>>>>> *Given Node1 is active and Node2 goes down, does
> >>>>>>>>>>>> it mean fence1
> >>>>> will
> >>>>>>>>>>>> first execute and shutdowns Node1 even though
> >>>>>>>>>>>> Node2 goes down?
> >>>>>>>>>>> If Node2 managed to sign off properly it will not.
> >>>>>>>>>>> If network-connection is down so that Node2 can't
> >>>>>>>>>>> inform Node1 that
> >>>>> it
> >>>>>>>>>>> is going
> >>>>>>>>>>> down and finally has stopped all resources it will
> >>>>>>>>>>> be fenced by
> >>>>> Node1.
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Klaus
> >>>>>>>>>> Fencing occurs in two cases;
> >>>>>>>>>>
> >>>>>>>>>> 1. The node stops responding (meaning it's in an
> >>>>>>>>>> unknown state, so
> >>>>> it is
> >>>>>>>>>> fenced to force it into a known state).
> >>>>>>>>>> 2. A resource / service fails to stop. In this
> >>>>>>>>>> case, the
> >>>>> service is
> >>>>>>>>>> in an unknown state, so the node is fenced to force
> >>>>>>>>>> the service into
> >>>>> a
> >>>>>>>>>> known state so that it can be safely recovered on the
> >>>>>>>>>> peer.
> >>>>>>>>>>
> >>>>>>>>>> Graceful withdrawal of the node from the cluster, and
> >>>>>>>>>> graceful
> >>>>> stopping
> >>>>>>>>>> of services will not lead to a fence (because in both
> >>>>>>>>>> cases, the
> >>>>> node /
> >>>>>>>>>> service are in a known state - off).
> >>>>>>>>>>
> >>>>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
