[ClusterLabs] What triggers fencing?

Confidential Company sgurovosa at gmail.com
Thu Jul 12 07:39:29 UTC 2018


Date: Wed, 11 Jul 2018 16:33:31 +0200
From: Klaus Wenninger <kwenning at redhat.com>
To: Ken Gaillot <kgaillot at redhat.com>, Cluster Labs - All topics
        related to open-source clustering welcomed <users at clusterlabs.org>,
        Andrei Borzenkov <arvidjaar at gmail.com>
Subject: Re: [ClusterLabs] What triggers fencing?
Message-ID: <2bf61b9f-98b0-482f-fa65-263ba9490950 at redhat.com>
Content-Type: text/plain; charset=utf-8

On 07/11/2018 04:11 PM, Ken Gaillot wrote:
> On Wed, 2018-07-11 at 11:06 +0200, Klaus Wenninger wrote:
>> On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
>>> 11.07.2018 05:45, Confidential Company ?????:
>>>> Not true, the faster node will kill the slower node first. It is
>>>> possible that through misconfiguration, both could die, but it's
>>>> rare and easily avoided with a 'delay="15"' set on the fence config
>>>> for the node you want to win.
>>>>
>>>> Don't use a delay on the other node, just the node you want to live
>>>> in such a case.
>>>>
>>>> **
>>>> 1. Given Active/Passive setup, resources are active on Node1
>>>> 2. fence1 (prefers to Node1, delay=15) and fence2 (prefers to
>>>> Node2, delay=30)
>>>> 3. Node2 goes down
> What do you mean by "down" in this case?
>
> If you mean the host itself has crashed, then it will not do anything,
> and node1 will fence it.
>
> If you mean node2's network goes out, so it's still functioning but no
> one can reach the managed service on it, then you are correct, the
> "wrong" node can get shot -- because you didn't specify anything about
> what the right node would be. This is a somewhat tricky area, but it
> can be done with a quorum-only node, qdisk, or fence_heuristics_ping,
> all of which are different ways of "preferring" the node that can reach
> a certain host.



Or in other words: why would I, as a cluster node, shoot the
peer in order to start the services locally, if I can somehow
tell beforehand that my services wouldn't be reachable by
anybody anyway (e.g. network disconnected)?
Then it might make more sense to sit still and wait to be shot
by the other side, in case that node is luckier and, for
example, still has access to the network.


-Klaus


In a 2-node setup, though, both nodes know nothing about whether their
services are reachable by anybody.
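
That is the gap the fence_heuristics_ping approach mentioned above is
meant to close. A rough illustration (the device names, the gateway
address and the parameter names below are assumptions for the sketch,
not taken from this thread; check them against the agent's metadata):

  pcs stonith create ping-check fence_heuristics_ping ping_targets=192.168.1.1
  pcs stonith level add 1 node1 ping-check,fence-node1
  pcs stonith level add 1 node2 ping-check,fence-node2

The heuristic agent never powers anything off; placed first in a
fencing level it simply fails on a node that cannot reach the ping
target, so an isolated node refuses to shoot its possibly healthy peer.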

Sharing my config and my tests:

[root at ArcosRhel1 ~]# pcs config
Cluster Name: ARCOSCLUSTER
Corosync Nodes:
 ArcosRhel1 ArcosRhel2
Pacemaker Nodes:
 ArcosRhel1 ArcosRhel2

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.16.10.243
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
 Resource: Fence1 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.11.201 login=test passwd=testing
pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel)
ssl_insecure=1
  Operations: monitor interval=60s (Fence1-monitor-interval-60s)
 Resource: fence2 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.11.202 login=test passwd=testing
pcmk_delay_max=10s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s
port=ArcosRhel2(Ben) ssl_insecure=1
  Operations: monitor interval=60s (fence2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: Fence1
    Enabled on: ArcosRhel2 (score:INFINITY)
(id:location-Fence1-ArcosRhel2-INFINITY)
  Resource: fence2
    Enabled on: ArcosRhel1 (score:INFINITY)
(id:location-fence2-ArcosRhel1-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ARCOSCLUSTER
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531375458
 stonith-enabled: true

Quorum:
  Options:
[root at ArcosRhel1 ~]#
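
If ArcosRhel1 is, say, the node that should survive a fence race, the
pattern suggested earlier in this thread would translate to a fixed
delay on Fence1 (the device that targets ArcosRhel1) and no delay on
fence2. A possible sketch against the config above; the exact pcs
syntax for clearing an attribute is worth double-checking on your
version:

  # delay fencing of ArcosRhel1 by 15s so it wins a mutual-fencing race
  pcs stonith update Fence1 delay=15
  # drop the random delay on the device that targets ArcosRhel2
  pcs stonith update fence2 pcmk_delay_max=

With that in place, if both nodes request fencing at the same moment,
fence2 fires immediately and shuts down ArcosRhel2, while the shot at
ArcosRhel1 is still waiting out its 15 seconds.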

**Test scenario:
Given:
Nodes have two interfaces (ens192 for corosync traffic, ens224 for
ESXi traffic)

a.) Node1=Active and Node2=Passive
    Action: disconnect ens192 of Node1
    Output: Node2 was fenced and shut down
b.) Node1=Passive and Node2=Active
    Action: disconnect ens192 of Node1
    Output: Node1 was fenced and shut down
c.) Node1=Passive and Node2=Active
    Action: disconnect ens192 of Node2
    Output: Node2 was fenced and shut down
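
One way to confirm, after each test, which node requested and executed
the fence action is to look at the fence history and the logs on the
surviving node, for example (the exact options and log locations vary
a bit between pacemaker versions and distributions):

  stonith_admin --history ArcosRhel1
  stonith_admin --history ArcosRhel2
  grep -i stonith-ng /var/log/messages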


Thanks,
imnotarobot



>
> If you mean the cluster-managed resource crashes on node2, but node2
> itself is still functioning properly, then what happens depends on how
> you've configured failure recovery. By default, there is no fencing,
> and the cluster tries to restart the resource.
>
>>>> 4. Node1 thinks Node2 goes down / Node2 thinks Node1 goes down
>>> If node2 is down, it cannot think anything.
>> True. Assuming it is not really down but just somehow disconnected
>> for my answer below.
>>
>>>> 5. fence1 counts 15 seconds before it fences Node1 while
>>>> fence2 counts 30 seconds before it fences Node2
>>>> 6. Since fence1 has a shorter delay than fence2, fence1
>>>> executes and shuts down Node1.
>>>> 7. fence1 (action: shutdown Node1) will always trigger first
>>>> because it has a shorter delay than fence2.
>>>>
>>>> ** Okay, what's important is that they should be different. But
>>>> in the case above, even though it is Node2 that goes down, Node1
>>>> has the shorter delay, so Node1 gets fenced/shut down. This is a
>>>> sample scenario. I don't get the point. Can you comment on this?
>> You didn't send the actual config but from your description
>> I get the scenario that way:
>>
>> fencing-resource fence1 is running on Node2 and it is there
>> to fence Node1 and it has a delay of 15s.
>> fencing-resource fence2 is running on Node1 and it is there
>> to fence Node2 and it has a delay of 30s.
>> If they now begin to fence each other at the same time, the
>> node actually fenced would of course be Node1, as the
>> fencing-resource fence1 is gonna shoot 15s earlier than
>> fence2.
>> Looks consistent to me ...
>>
>> Regards,
>> Klaus
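
To make the race described above concrete, assuming both nodes lose
sight of each other and request fencing at the same moment:

  t=0s   fence1 (on Node2, targets Node1, delay=15) starts waiting;
         fence2 (on Node1, targets Node2, delay=30) starts waiting
  t=15s  fence1 fires and powers off Node1
         fence2 never reaches its 30s mark because Node1 is already
         off, so Node2 survives regardless of which node originally
         misbehaved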
>>
>>>> Thanks
>>>>
>>>> On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger
>>>> <kwenning at redhat.com> wrote:
>>>>
>>>>> On 07/09/2018 05:53 PM, Digimer wrote:
>>>>>> On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
>>>>>>> On 07/09/2018 05:33 PM, Digimer wrote:
>>>>>>>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
>>>>>>>>> On 07/09/2018 03:49 PM, Digimer wrote:
>>>>>>>>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>>>>>>>>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Any ideas what triggers fencing script or
>>>>>>>>>>>> stonith?
>>>>>>>>>>>>
>>>>>>>>>>>> Given the setup below:
>>>>>>>>>>>> 1. I have two nodes
>>>>>>>>>>>> 2. Configured fencing on both nodes
>>>>>>>>>>>> 3. Configured delay=15 and delay=30 on fence1(for
>>>>>>>>>>>> Node1) and
>>>>>>>>>>>> fence2(for Node2) respectively
>>>>>>>>>>>>
>>>>>>>>>>>> *What does it mean to configure delay in
>>>>>>>>>>>> stonith? Wait for 15 seconds
>>>>>>>>>>>> before it fences the node?
>>>>>>>>>>> Given that on a 2-node-cluster you don't have real
>>>>>>>>>>> quorum to make one partial cluster fence the rest of
>>>>>>>>>>> the nodes, the different delays are meant to prevent
>>>>>>>>>>> a fencing-race.
>>>>>>>>>>> Without different delays that would lead to both
>>>>>>>>>>> nodes fencing each other at the same time - finally
>>>>>>>>>>> both being down.
>>>>>>>>>> Not true, the faster node will kill the slower node
>>>>>>>>>> first. It is possible that through misconfiguration,
>>>>>>>>>> both could die, but it's rare and easily avoided with
>>>>>>>>>> a 'delay="15"' set on the fence config for the node
>>>>>>>>>> you want to win.
>>>>>>>>> What exactly is not true? Aren't we saying the same?
>>>>>>>>> Of course one of the delays can be 0 (most important is
>>>>>>>>> that
>>>>>>>>> they are different).
>>>>>>>> Perhaps I misunderstood your message. It seemed to me
>>>>>>>> that the implication was that fencing in 2-node without
>>>>>>>> a delay always ends up with both nodes being down, which
>>>>>>>> isn't the case. It can happen if the fence methods are
>>>>>>>> not set up right (ie: the node isn't set to immediately
>>>>>>>> power off on an ACPI power button event).
>>>>>>> Yes, a misunderstanding I guess.
>>>>>>>
>>>>>>> Should have been more verbose in saying that, due to the
>>>>>>> time between the fencing command being fired off to the
>>>>>>> fencing device and the actual fencing taking place (as you
>>>>>>> state, dependent on how it is configured in detail - but a
>>>>>>> measurable time in all cases), there is a certain
>>>>>>> probability that when both nodes start fencing at roughly
>>>>>>> the same time we will end up with 2 nodes down.
>>>>>>>
>>>>>>> Everybody has to find his own tradeoff between how reliably
>>>>>>> fence-races are prevented and the extra fencing delay, I
>>>>>>> guess.
>>>>>> We've used this;
>>>>>>
>>>>>> 1. IPMI (with the guest OS set to immediately power off) as
>>>>>> primary,
>>>>>> with a 15 second delay on the active node.
>>>>>>
>>>>>> 2. Two Switched PDUs (two power circuits, two PSUs) as backup
>>>>>> fencing
>>>>>> for when IPMI fails, with no delay.
>>>>>>
>>>>>> In ~8 years, across dozens and dozens of clusters and
>>>>>> countless fence
>>>>>> actions, we've never had a dual-fence event (where both nodes
>>>>>> go down).
>>>>>> So it can be done safely, but as always, test test test
>>>>>> before prod.
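
A rough pcs sketch of that layered setup (all device names here are
made up; level 1 is tried first, and a level listing both PDU devices
only succeeds if both outlets are switched off):

  pcs stonith level add 1 node1 ipmi-node1
  pcs stonith level add 2 node1 pdu1-node1,pdu2-node1
  pcs stonith level add 1 node2 ipmi-node2
  pcs stonith level add 2 node2 pdu1-node2,pdu2-node2

The 15 second delay mentioned above would then be set only on the IPMI
device that targets the currently active node.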
>>>>> No doubt that this setup is working reliably.
>>>>> You just have to know your fencing-devices and
>>>>> which delays they involve.
>>>>>
>>>>> If we are talking about SBD (with disk as otherwise
>>>>> it doesn't work in a sensible way in 2-node-clusters)
>>>>> for instance I would strongly advise using a delay.
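
For disk-based SBD one way to add such a delay is on the SBD fence
resource itself, e.g. as a random delay via pcmk_delay_max. A minimal
sketch, assuming SBD and the watchdog are already set up, with a
placeholder device path:

  pcs stonith create fence-sbd fence_sbd \
      devices=/dev/disk/by-id/<shared-lun> pcmk_delay_max=30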
>>>>>
>>>>> So I guess it is important to understand the basic
>>>>> idea behind this different delay-based fence-race
>>>>> avoidance.
>>>>> Afterwards you can still decide why it is no issue
>>>>> in your own setup.
>>>>>
>>>>>>>> If the delay is set on both nodes, and the values are
>>>>>>>> different, it will work fine. The reason not to do this
>>>>>>>> is that if you would use 0, you may as well not set
>>>>>>>> anything at all (0 is the default), and any other value
>>>>>>>> causes avoidable fence delays.
>>>>>>>>
>>>>>>>>>> Don't use a delay on the other node, just the node
>>>>>>>>>> you want to live in such a case.
>>>>>>>>>>
>>>>>>>>>>>> *Given Node1 is active and Node2 goes down, does
>>>>>>>>>>>> it mean fence1 will execute first and shut down
>>>>>>>>>>>> Node1 even though Node2 goes down?
>>>>>>>>>>> If Node2 managed to sign off properly it will not.
>>>>>>>>>>> If the network connection is down, so that Node2
>>>>>>>>>>> can't inform Node1 that it is going down and has
>>>>>>>>>>> finally stopped all resources, it will be fenced by
>>>>>>>>>>> Node1.
>>>>>>>>>>> Regards,
>>>>>>>>>>> Klaus
>>>>>>>>>> Fencing occurs in two cases;
>>>>>>>>>>
>>>>>>>>>> 1. The node stops responding (meaning it's in an
>>>>>>>>>> unknown state, so it is fenced to force it into a
>>>>>>>>>> known state).
>>>>>>>>>> 2. A resource / service fails to stop. In this case,
>>>>>>>>>> the service is in an unknown state, so the node is
>>>>>>>>>> fenced to force the service into a known state so that
>>>>>>>>>> it can be safely recovered on the peer.
>>>>>>>>>>
>>>>>>>>>> Graceful withdrawal of the node from the cluster, and
>>>>>>>>>> graceful stopping of services, will not lead to a fence
>>>>>>>>>> (because in both cases the node / service is in a known
>>>>>>>>>> state - off).
>>>>>>>>>>