[ClusterLabs] What triggers fencing?

Wed Jul 11 14:12:55 UTC 2018

Message: 1
Date: Wed, 11 Jul 2018 11:06:56 +0200
From: Klaus Wenninger <kwenning at redhat.com>
To: Cluster Labs - All topics related to open-source clustering
        welcomed <users at clusterlabs.org>, Andrei Borzenkov
        <arvidjaar at gmail.com>
Subject: Re: [ClusterLabs] What triggers fencing?
Message-ID: <db834639-3f15-7861-ca62-d42971b93085 at redhat.com>
Content-Type: text/plain; charset=utf-8

On 07/11/2018 05:48 AM, Andrei Borzenkov wrote:
> 11.07.2018 05:45, Confidential Company wrote:
>> Not true, the faster node will kill the slower node first. It is
>> possible that through misconfiguration, both could die, but it's rare
>> and easily avoided with a 'delay="15"' set on the fence config for the
>> node you want to win.
>>
>> Don't use a delay on the other node, just the node you want to live in
>> such a case.
>>
>> **
>>                 1. Given Active/Passive setup, resources are active on
>> Node1
>>                 2. fence1(prefers to Node1, delay=15) and fence2(prefers
>> to Node2, delay=30)
>>                 3. Node2 goes down
>>                 4. Node1 thinks Node2 goes down / Node2 thinks Node1 goes
>> down
> If node2 is down, it cannot think anything.

True. For my answer below I am assuming it is not really down
but just somehow disconnected.

>
>>                 5. fence1 counts 15 seconds before it fences Node1, while
>> fence2 counts 30 seconds before it fences Node2
>>                 6. Since fence1 has a shorter delay than fence2, fence1
>> executes and shuts down Node1.
>>                 7. fence1 (action: shut down Node1) will always trigger
>> first because it has a shorter delay than fence2.
>>
>> ** Okay, what's important is that they should be different. But in the
>> case above, even though it is Node2 that goes down, Node1 has the shorter
>> delay, so Node1 gets fenced/shut down. This is a sample scenario. I don't
>> get the point. Can you comment on this?

You didn't send the actual config, but from your description
I read the scenario this way:

fencing-resource fence1 is running on Node2; it is there
to fence Node1 and it has a delay of 15s.
fencing-resource fence2 is running on Node1; it is there
to fence Node2 and it has a delay of 30s.
If they now begin to fence each other at the same time, the
node actually fenced would of course be Node1, as the
fencing-resource fence1 is going to shoot 15s earlier than
fence2.
Looks consistent to me ...

Regards,
Klaus
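
The race Klaus describes can be sketched in a few lines of Python. This is
only a toy model, not how Pacemaker is implemented: it assumes both nodes
decide to fence at the same instant, that a shot takes effect immediately
once its delay expires, and that a fenced node can no longer fire its own
device. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FenceDevice:
    name: str      # fencing resource name
    runs_on: str   # node that executes the fence action
    target: str    # node that gets powered off
    delay: float   # seconds to wait before shooting


def fence_race(devices):
    """Return the nodes that end up fenced, in firing order.

    Toy model: every device starts its delay timer at t=0 and a shot
    takes effect instantly, so a fenced node never gets to fire.
    """
    alive = {d.runs_on for d in devices} | {d.target for d in devices}
    fenced = []
    for dev in sorted(devices, key=lambda d: d.delay):
        if dev.runs_on in alive and dev.target in alive:
            alive.discard(dev.target)
            fenced.append(dev.target)
    return fenced


devices = [
    FenceDevice("fence1", runs_on="Node2", target="Node1", delay=15),
    FenceDevice("fence2", runs_on="Node1", target="Node2", delay=30),
]
print(fence_race(devices))  # ['Node1']
```

With the delays as described, Node1 is always the loser of a simultaneous
race, whichever node actually lost its network link. Swapping the two delay
values makes Node2 the loser instead.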

*******
Yes, that is right, Klaus. fence1, running on Node2, will fence Node1; fence1
will execute first whichever node goes down because it has the shorter delay.
But if Node2 goes down or gets disconnected, how can it be fenced by Node1
using fence2, if fence2 can never be triggered because fence1 always comes
first?

My point here is that putting a delay on fencing resolves the issue of double
fencing, but it doesn't determine which node should be fenced.
Even though it is Node2 that gets disconnected, Node1 will be fenced and the
whole service goes down.

******Let me share my actual config:

I have two ESXi hosts and 2 virtual machines, with 2 interfaces on each VM
(1 = corosync interface, 1 = interface for the VM to contact the ESXi host)

Pacemaker Nodes:
 ArcosRhel1 ArcosRhel2

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=32 ip=172.16.10.243
  Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

Stonith Devices:
 Resource: Fence1 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.201 login=test passwd=testing
              pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1
              ssl_insecure=1
  Operations: monitor interval=60s (Fence1-monitor-interval-60s)
 Resource: fence2 (class=stonith type=fence_vmware_soap)
  Attributes: action=off ipaddr=172.16.10.202 login=test passwd=testing
              pcmk_delay_max=10s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s
              port=ArcosRhel2 ssl_insecure=1
  Operations: monitor interval=60s (fence2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: Fence1
    Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
  Resource: fence2
    Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: ARCOSCLUSTER
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1531300540
 stonith-enabled: true

*****
>>
>> Thanks
>>
>> On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger <kwenning at redhat.com>
>> wrote:
>>
>>> On 07/09/2018 05:53 PM, Digimer wrote:
>>>> On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
>>>>> On 07/09/2018 05:33 PM, Digimer wrote:
>>>>>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
>>>>>>> On 07/09/2018 03:49 PM, Digimer wrote:
>>>>>>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>>>>>>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Any ideas what triggers fencing script or stonith?
>>>>>>>>>>
>>>>>>>>>> Given the setup below:
>>>>>>>>>> 1. I have two nodes
>>>>>>>>>> 2. Configured fencing on both nodes
>>>>>>>>>> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
>>>>>>>>>> fence2(for Node2) respectively
>>>>>>>>>>
>>>>>>>>>> *What does it mean to configure a delay in stonith? Wait for 15
>>>>>>>>>> seconds before it fences the node?
>>>>>>>>> Given that on a 2-node-cluster you don't have real quorum to make
>>>>>>>>> one partial cluster fence the rest of the nodes, the different
>>>>>>>>> delays are meant to prevent a fencing-race.
>>>>>>>>> Without different delays that would lead to both nodes fencing each
>>>>>>>>> other at the same time - finally both being down.
>>>>>>>> Not true, the faster node will kill the slower node first. It is
>>>>>>>> possible that through misconfiguration, both could die, but it's rare
>>>>>>>> and easily avoided with a 'delay="15"' set on the fence config for
>>>>>>>> the node you want to win.
>>>>>>> What exactly is not true? Aren't we saying the same?
>>>>>>> Of course one of the delays can be 0 (most important is that
>>>>>>> they are different).
>>>>>> Perhaps I misunderstood your message. It seemed to me that the
>>>>>> implication was that fencing in 2-node without a delay always ends up
>>>>>> with both nodes being down, which isn't the case. It can happen if the
>>>>>> fence methods are not set up right (ie: the node isn't set to
>>>>>> immediately power off on ACPI power button event).
>>>>> Yes, a misunderstanding I guess.
>>>>>
>>>>> Should have been more verbose in saying that due to the
>>>>> time between the fencing-command fired off to the fencing
>>>>> device and the actual fencing taking place (as you state
>>>>> dependent on how it is configured in detail - but a measurable
>>>>> time in all cases) there is a certain probability that when
>>>>> both nodes start fencing at roughly the same time we will
>>>>> end up with 2 nodes down.
>>>>>
>>>>> Everybody has to find his own tradeoff between how reliably
>>>>> fence-races are prevented and the fencing delay, I guess.
>>>> We've used this;
>>>>
>>>> 1. IPMI (with the guest OS set to immediately power off) as primary,
>>>> with a 15 second delay on the active node.
>>>>
>>>> 2. Two Switched PDUs (two power circuits, two PSUs) as backup fencing
>>>> for when IPMI fails, with no delay.
>>>>
>>>> In ~8 years, across dozens and dozens of clusters and countless fence
>>>> actions, we've never had a dual-fence event (where both nodes go down).
>>>> So it can be done safely, but as always, test test test before prod.
>>> No doubt that this setup is working reliably.
>>> You just have to know your fencing-devices and
>>> which delays they involve.
>>>
>>> If we are talking about SBD (with disk as otherwise
>>> it doesn't work in a sensible way in 2-node-clusters)
>>> for instance I would strongly advise using a delay.
>>>
>>> So I guess it is important to understand the basic
>>> idea behind this different delay-based fence-race
>>> avoidance.
>>> Afterwards you can still decide why it is no issue
>>> in your own setup.
>>>
>>>>>> If the delay is set on both nodes, and they are different, it will
>>>>>> work fine. The reason not to do this is that if you use 0, then don't
>>>>>> use anything at all (0 is default), and any other value causes
>>>>>> avoidable fence delays.
>>>>>>
>>>>>>>> Don't use a delay on the other node, just the node you want to live
>>>>>>>> in such a case.
>>>>>>>>
>>>>>>>>>> *Given Node1 is active and Node2 goes down, does it mean fence1
>>>>>>>>>> will execute first and shut down Node1 even though Node2 goes down?
>>>>>>>>> If Node2 managed to sign off properly it will not.
>>>>>>>>> If the network-connection is down so that Node2 can't inform Node1
>>>>>>>>> that it is going down and has finally stopped all resources, it
>>>>>>>>> will be fenced by Node1.
>>>>>>>>> Regards,
>>>>>>>>> Klaus
>>>>>>>> Fencing occurs in two cases;
>>>>>>>>
>>>>>>>> 1. The node stops responding (meaning it's in an unknown state, so
>>>>>>>> it is fenced to force it into a known state).
>>>>>>>> 2. A resource / service fails to stop. In this case, the service is
>>>>>>>> in an unknown state, so the node is fenced to force the service into
>>>>>>>> a known state so that it can be safely recovered on the peer.
>>>>>>>>
>>>>>>>> Graceful withdrawal of the node from the cluster, and graceful
>>>>>>>> stopping of services will not lead to a fence (because in both cases,
>>>>>>>> the node / service are in a known state - off).
>>>>>>>>
>>>
>>
>
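
The two triggers listed in the quoted text above (an unresponsive node, or a
resource that fails to stop) can be written down as a small decision function.
This is only an illustrative Python sketch of that rule; NodeState and
should_fence are made-up names and not part of any Pacemaker API.

```python
from enum import Enum


class NodeState(Enum):
    ONLINE = "online"                # node is a healthy cluster member
    LEFT_CLEANLY = "left cleanly"    # node withdrew gracefully
    UNRESPONSIVE = "unresponsive"    # peer no longer hears from the node


def should_fence(node_state, stop_failed):
    """Return True if the node is in an unknown state and must be fenced.

    Case 1: the node stopped responding.
    Case 2: a resource on the node failed to stop.
    A graceful exit or a clean stop leaves a known state, so no fence.
    """
    if node_state is NodeState.UNRESPONSIVE:
        return True
    return bool(stop_failed)


print(should_fence(NodeState.UNRESPONSIVE, stop_failed=False))
print(should_fence(NodeState.ONLINE, stop_failed=True))
print(should_fence(NodeState.LEFT_CLEANLY, stop_failed=False))
```

Note that a pure network split looks like case 1 from both sides, which is
why the delay discussion above matters in a 2-node cluster.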