<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 07/12/2018 09:39 AM, Confidential
Company wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJiLmDT6--ioK-HvMFxEo_ratFnBMVY1x3tQEXPeDQ1rdNEaYg@mail.gmail.com">
<div dir="ltr"><span style="font-size:12.8px">Message: 2</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">Date: Wed, 11 Jul 2018 16:33:31
+0200</span><br style="font-size:12.8px">
<span style="font-size:12.8px">From: Klaus Wenninger <</span><a
href="mailto:kwenning@redhat.com" style="font-size:12.8px"
moz-do-not-send="true">kwenning@redhat.com</a><span
style="font-size:12.8px">></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">To: Ken Gaillot <</span><a
href="mailto:kgaillot@redhat.com" style="font-size:12.8px"
moz-do-not-send="true">kgaillot@redhat.com</a><span
style="font-size:12.8px">>, Cluster Labs - All topics</span><br
style="font-size:12.8px">
<span style="font-size:12.8px"> related to open-source
clustering welcomed <</span><a
href="mailto:users@clusterlabs.org" style="font-size:12.8px"
moz-do-not-send="true">users@clusterlabs.org</a><span
style="font-size:12.8px">>,</span><br
style="font-size:12.8px">
<span style="font-size:12.8px"> Andrei Borzenkov <</span><a
href="mailto:arvidjaar@gmail.com" style="font-size:12.8px"
moz-do-not-send="true">arvidjaar@gmail.com</a><span
style="font-size:12.8px">></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">Subject: Re: [ClusterLabs] What
triggers fencing?</span><br style="font-size:12.8px">
<span style="font-size:12.8px">Message-ID: <</span><a
href="mailto:2bf61b9f-98b0-482f-fa65-263ba9490950@redhat.com"
style="font-size:12.8px" moz-do-not-send="true">2bf61b9f-98b0-482f-fa65-<wbr>263ba9490950@redhat.com</a><span
style="font-size:12.8px">></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">Content-Type: text/plain;
charset=utf-8</span><br style="font-size:12.8px">
<br style="font-size:12.8px">
<span style="font-size:12.8px">On 07/11/2018 04:11 PM, Ken
Gaillot wrote:</span><br style="font-size:12.8px">
<span style="font-size:12.8px">> On Wed, 2018-07-11 at 11:06
+0200, Klaus Wenninger wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> On 07/11/2018 05:48 AM,
Andrei Borzenkov wrote:</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>> 11.07.2018 05:45,
Confidential Company wrote:</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Not true, the
faster node will kill the slower node first. It is</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> possible that
through misconfiguration, both could die, but it's</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> rare</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> and easily
avoided with a 'delay="15"' set on the fence config</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> for the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> node you want to
win.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Don't use a
delay on the other node, just the node you want to</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> live in</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> such a case.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> **</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
1. Given Active/Passive setup, resources are</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> active on Node1</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
2. fence1(prefers to Node1, delay=15) and</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> fence2(prefers
to</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Node2, delay=30)</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
3. Node2 goes down</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> What do you mean by "down"
in this case?</span><br style="font-size:12.8px">
<span style="font-size:12.8px">></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> If you mean the host itself
has crashed, then it will not do anything,</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> and node1 will fence it.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> If you mean node2's network
goes out, so it's still functioning but no</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> one can reach the managed
service on it, then you are correct, the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> "wrong" node can get shot --
because you didn't specify anything about</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> what the right node would
be. This is a somewhat tricky area, but it</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> can be done with a
quorum-only node, qdisk, or fence_heuristics_ping,</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> all of which are different
ways of "preferring" the node that can reach</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> a certain host.</span>
<div><br>
</div>
<div><br>
</div>
<div><br style="font-size:12.8px">
<span style="font-size:12.8px">Or in other words why would I -
as a cluster-node - shoot the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">peer to be able to start the
services locally if I can somehow</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">tell beforehand that my
services anyway wouldn't be</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">reachable by anybody (e.g.
network disconnected).</span><br style="font-size:12.8px">
<span style="font-size:12.8px">Then it might make more sense
to sit still and wait to be shot by</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">the other side for the case
that guy is more lucky and</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">has e.g. access to the network.</span>
<div><br>
</div>
<div><br>
</div>
<div>-Klaus<br>
<br>
<br>
In a 2-node setup, neither node knows whether its
services are reachable by anybody.<br>
</div>
</div>
</div>
</blockquote>
<br>
Of course they cannot get that knowledge from the cluster peer, but<br>
it may be possible to bring an additional instance into the game.<br>
As Ken already mentioned, that might be a disk, an additional node<br>
just for quorum, qdevice or fence_heuristics_ping.<br>
The latter is used on the same fencing level, in front of your real<br>
fencing device. It tries to reach the IP address(es) you configure<br>
and, depending on the result, gains some knowledge of how far the<br>
local node is accessible from outside.<br>
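<br>
A rough sketch of that idea in Python, purely as an illustration and<br>
not the actual fence_heuristics_ping code: the heuristic runs first and<br>
the real device is only fired if the local node can reach at least one<br>
configured address. The function names, the injected probe callable and<br>
the "at least one target" rule are assumptions of this sketch.<br>
<pre>
```python
from typing import Callable, Iterable


def heuristic_passes(targets: Iterable[str],
                     probe: Callable[[str], bool]) -> bool:
    """Return True if at least one configured address answers the probe.

    Simplified model: the real agent has its own tunables, which are
    not reproduced here.
    """
    return any(probe(address) for address in targets)


def run_fencing_level(targets: Iterable[str],
                      probe: Callable[[str], bool],
                      real_fence_device: Callable[[], bool]) -> bool:
    """Run the heuristic first and the real device only if it passes.

    All devices on one fencing level have to succeed, so a failing
    heuristic keeps an isolated node from shooting its peer.
    """
    if not heuristic_passes(targets, probe):
        return False
    return real_fence_device()
```
</pre>
Here probe stands for whatever reachability check is used (e.g. a ping)<br>
and real_fence_device for the actual fence agent call.<br>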
<br>
Btw. in your config I saw that you are using pcmk_delay_max on just<br>
one of the nodes. That is not how it is designed to be used, as it<br>
gives you a random delay between 0 and max. I would rather<br>
recommend using pcmk_delay_base on one of the nodes (fixed delay)<br>
if you want to prioritize one of them, or pcmk_delay_max with the<br>
same value on both if you would rather have random behavior.<br>
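<br>
To make the difference concrete, here is a small Python model (an<br>
illustration only, not Pacemaker code). It assumes that the protected<br>
node only reliably wins the race if its head start is at least as long<br>
as the time the fence action itself takes. The function names and the<br>
fence_latency value are made up for this sketch.<br>
<pre>
```python
import random


def random_delay(delay_max: float, rng: random.Random) -> float:
    """Delay drawn uniformly between 0 and delay_max (pcmk_delay_max style)."""
    return rng.uniform(0.0, delay_max)


def head_start_is_safe(protected_delay: float, other_delay: float,
                       fence_latency: float) -> bool:
    """True if the protected node's head start covers the fence latency.

    protected_delay is the wait before the protected node may be shot,
    other_delay the wait before its peer may be shot. fence_latency is
    an assumed time between firing the fence command and the target
    actually being off.
    """
    return protected_delay - other_delay >= fence_latency


def unsafe_share(delay_max: float, fence_latency: float,
                 runs: int = 10000, seed: int = 1) -> float:
    """Share of runs in which a purely random delay gives no safe head start."""
    rng = random.Random(seed)
    unsafe = sum(
        not head_start_is_safe(random_delay(delay_max, rng), 0.0, fence_latency)
        for _ in range(runs)
    )
    return unsafe / runs
```
</pre>
With an assumed latency of 2 seconds, a fixed delay of 10 seconds always<br>
passes the check, while a random delay between 0 and 10 seconds falls<br>
short in roughly one run out of five.<br>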
<br>
Unfortunately the current implementation of fencing doesn't<br>
allow things like dynamic location rules that could react to e.g.<br>
certain resources running, so as to prioritize the active node.<br>
What you can still do is go the way fence_heuristics_ping goes<br>
(put something in a fencing hierarchy in front of the real<br>
fencing device) and add a fence agent that returns successfully<br>
right away if the node has certain resources running (active), and<br>
waits a certain time before returning successfully if they are not<br>
running (passive).<br>
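<br>
The core of such an agent could look roughly like the Python sketch<br>
below. No such agent is shipped as far as this mail is concerned; the<br>
function name and parameters are hypothetical, and how the agent finds<br>
out whether the resources are running locally is left open.<br>
<pre>
```python
import time
from typing import Callable


def priority_delay_agent(resources_active: bool, passive_wait: float,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Succeed at once on the active node, after a wait on the passive one.

    Meant to run in front of the real fence device on the same level.
    Returns 0, the usual exit code for success.
    """
    if not resources_active:
        # Passive node: give the active peer a head start in the race.
        sleep(passive_wait)
    return 0
```
</pre>
The sleep function is passed in only so that the logic can be tried<br>
out without really waiting.<br>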
<br>
Otherwise - without checking the logs - I don't know why it makes<br>
a difference whether node2 or node1 is disconnected.<br>
(Is that reproducible at all?)<br>
In the back of my mind I remember an issue with Corosync<br>
where an interface going down might somehow prevent loss<br>
detection, but I don't remember the details.<br>
<br>
Regards,<br>
Klaus <br>
<br>
<br>
<br>
<blockquote type="cite"
cite="mid:CAJiLmDT6--ioK-HvMFxEo_ratFnBMVY1x3tQEXPeDQ1rdNEaYg@mail.gmail.com">
<div dir="ltr">
<div>
<div><br>
Sharing my config and my tests:<br>
<br>
<div>Last login: Thu Jul 12 14:57:21 2018</div>
<div>[root@ArcosRhel1 ~]# pcs config</div>
<div>Cluster Name: ARCOSCLUSTER</div>
<div>Corosync Nodes:</div>
<div> ArcosRhel1 ArcosRhel2</div>
<div>Pacemaker Nodes:</div>
<div> ArcosRhel1 ArcosRhel2</div>
<div><br>
</div>
<div>Resources:</div>
<div> Resource: ClusterIP (class=ocf provider=heartbeat
type=IPaddr2)</div>
<div> Attributes: cidr_netmask=32 ip=172.16.10.243</div>
<div> Operations: monitor interval=30s
(ClusterIP-monitor-interval-30s)</div>
<div> start interval=0s timeout=20s
(ClusterIP-start-interval-0s)</div>
<div> stop interval=0s timeout=20s
(ClusterIP-stop-interval-0s)</div>
<div><br>
</div>
<div>Stonith Devices:</div>
<div> Resource: Fence1 (class=stonith
type=fence_vmware_soap)</div>
<div> Attributes: action=off ipaddr=172.16.11.201
login=test passwd=testing pcmk_host_list=ArcosRhel1
pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel)
ssl_insecure=1</div>
<div> Operations: monitor interval=60s
(Fence1-monitor-interval-60s)</div>
<div> Resource: fence2 (class=stonith
type=fence_vmware_soap)</div>
<div> Attributes: action=off ipaddr=172.16.11.202
login=test passwd=testing pcmk_delay_max=10s
pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s
port=ArcosRhel2(Ben) ssl_insecure=1</div>
<div> Operations: monitor interval=60s
(fence2-monitor-interval-60s)</div>
<div>Fencing Levels:</div>
<div><br>
</div>
<div>Location Constraints:</div>
<div> Resource: Fence1</div>
<div> Enabled on: ArcosRhel2 (score:INFINITY)
(id:location-Fence1-ArcosRhel2-INFINITY)</div>
<div> Resource: fence2</div>
<div> Enabled on: ArcosRhel1 (score:INFINITY)
(id:location-fence2-ArcosRhel1-INFINITY)</div>
<div>Ordering Constraints:</div>
<div>Colocation Constraints:</div>
<div>Ticket Constraints:</div>
<div><br>
</div>
<div>Alerts:</div>
<div> No alerts defined</div>
<div><br>
</div>
<div>Resources Defaults:</div>
<div> No defaults set</div>
<div>Operations Defaults:</div>
<div> No defaults set</div>
<div><br>
</div>
<div>Cluster Properties:</div>
<div> cluster-infrastructure: corosync</div>
<div> cluster-name: ARCOSCLUSTER</div>
<div> dc-version: 1.1.16-12.el7-94ff4df</div>
<div> have-watchdog: false</div>
<div> last-lrm-refresh: 1531375458</div>
<div> stonith-enabled: true</div>
<div><br>
</div>
<div>Quorum:</div>
<div> Options:</div>
<div>[root@ArcosRhel1 ~]#</div>
<br>
</div>
<div>**Test scenario:</div>
<div>Given:<br>
</div>
<div>Each node has two interfaces (ens192 for corosync traffic /
ens224 for ESXi traffic)</div>
<div><br>
</div>
<div>a.) Node1=Active and Node2=Passive</div>
<div>Action=disconnect ens192 of Node1</div>
<div>Output=Node2 was fenced and shut down</div>
<div>b.) Node1=Passive and Node2=Active</div>
<div>Action=disconnect ens192 of Node1</div>
<div>Output=Node1 was fenced and shut down</div>
<div>c.) Node1=Passive and Node2=Active</div>
<div>Action=disconnect ens192 of Node2</div>
<div>Output=Node2 was fenced and shut down<br>
<br>
<br>
Thanks,<br>
imnotarobot</div>
<div><br>
</div>
<div><br>
</div>
<div><br style="font-size:12.8px">
<span style="font-size:12.8px">></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> If you mean the
cluster-managed resource crashes on node2, but node2</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> itself is still
functioning properly, then what happens depends on how</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> you've configured
failure recovery. By default, there is no fencing,</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">> and the cluster tries to
restart the resource.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
4. Node1 thinks Node2 goes down / Node2
thinks</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Node1 goes</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> down</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>> If node2 is
down, it cannot think anything.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> True. Assuming it is
not really down but just somehow disconnected</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> for my answer below.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
5. fence1 counts 15 seconds before he
fence Node1</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>> while</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> fence2
counts 30 seconds before he fence Node2</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
6. Since fence1 do have shorter time than
fence2,</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>> fence1</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> executes and
shutdown Node1.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
7. fence1(action: shutdown Node1) will
trigger</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>> first</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> always
because it has shorter delay than fence2.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> ** Okay
what's important is that they should be different. But in</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> the case</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> above, even
though Node2 goes down but Node1 has shorter delay,</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Node1 gets</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
fenced/shutdown. This is a sample scenario. I don't get
the</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>> point. Can
you</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>> comment on
this?</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>> You didn't send the
actual config but from your description</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> I get the scenario
that way:</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> fencing-resource
fence1 is running on Node2 and it is there</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> to fence Node1 and
it has a delay of 15s.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>> fencing-resource
fence2 is running on Node1 and it is there</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> to fence Node2 and
it has a delay of 30s.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>> If they now begin to
fence each other at the same time the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> node actually fenced
would be Node1 of course as the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> fencing-resource
fence1 is gonna shoot 15s earlier than the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> fence2.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> Looks consistent to
me ...</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> Regards,</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>> Klaus</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Thanks</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> On Tue, Jul
10, 2018 at 12:18 AM, Klaus Wenninger <kwenning@redha</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> </span><a
href="http://t.com/" rel="noreferrer" target="_blank"
style="font-size:12.8px" moz-do-not-send="true">t.com</a><span
style="font-size:12.8px">></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> On
07/09/2018 05:53 PM, Digimer wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>> On
2018-07-09 11:45 AM, Klaus Wenninger wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
On 07/09/2018 05:33 PM, Digimer wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
On 2018-07-09 09:56 AM, Klaus Wenninger wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>
On 07/09/2018 03:49 PM, Digimer wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
On 2018-07-09 08:31 AM, Klaus Wenninger wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
On 07/09/2018 02:04 PM, Confidential Company wrote:</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
Hi,</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
Any ideas what triggers fencing script or</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
stonith?</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
Given the setup below:</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
1. I have two nodes</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
2. Configured fencing on both nodes</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
3. Configured delay=15 and delay=30 on fence1(for</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
Node1) and</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
fence2(for Node2) respectively</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
*What does it mean to configured delay in</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
stonith? wait for 15</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> seconds</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
before it fence the node?</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
Given that on a 2-node-cluster you don't have real</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
quorum to make</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> one</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
partial cluster fence the rest of the nodes the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
different delays</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> are
meant</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
to prevent a fencing-race.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
Without different delays that would lead to both</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
nodes fencing each</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
other at the same time - finally both being down.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
Not true, the faster node will kill the slower node</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
first. It is</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
possible that through misconfiguration, both could</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
die, but it's rare</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
and easily avoided with a 'delay="15"' set on the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
fence config for</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
node you want to win.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>
What exactly is not true? Aren't we saying the same?</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>
Of course one of the delays can be 0 (most important is</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>
that</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>
they are different).</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
Perhaps I misunderstood your message. It seemed to me</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
that the</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
implication was that fencing in 2-node without a delay</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
always ends up</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
with both nodes being down, which isn't the case. It can</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
happen if the</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
fence methods are not setup right (ie: the node isn't set</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
to</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>
immediately</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
power off on ACPI power button event).</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
Yes, a misunderstanding I guess.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
Should have been more verbose in saying that due to the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
time between the fencing-command fired off to the fencing</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
device and the actual fencing taking place (as you state</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
dependent on how it is configured in detail - but a</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
measurable</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
time in all cases) there is a certain probability that
when</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
both nodes start fencing at roughly the same time we will</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
end up with 2 nodes down.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
Everybody has to find his own tradeoff between reliability</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>
fence-races are prevented and fencing delay I guess.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>
We've used this;</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>> 1.
IPMI (with the guest OS set to immediately power off) as</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>
primary,</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>> with
a 15 second delay on the active node.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>> 2.
Two Switched PDUs (two power circuits, two PSUs) as backup</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>
fencing</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>> for
when IPMI fails, with no delay.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>> In
~8 years, across dozens and dozens of clusters and</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>
countless fence</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>
actions, we've never had a dual-fence event (where both
nodes</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>> go
down).</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>> So
it can be done safely, but as always, test test test</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>
before prod.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> No doubt
about that this setup is working reliably.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> You just
have to know your fencing-devices and</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> which
delays they involve.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> If we
are talking about SBD (with disk as otherwise</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> it
doesn't work in a sensible way in 2-node-clusters)</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> for
instance I would strongly advise using a delay.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> So I
guess it is important to understand the basic</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> idea
behind this different delay-based fence-race</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>
avoidance.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>
Afterwards you can still decide why it is no issue</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> in your
own setup.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
If the delay is set on both nodes, and they are</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
different, it will work</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
fine. The reason not to do this is that if you use 0,</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
then don't use</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
anything at all (0 is default), and any other value</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
causes avoidable</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>
fence delays.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
Don't use a delay on the other node, just the node</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
you want to live</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> in</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
such a case.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
*Given Node1 is active and Node2 goes down, does</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
it mean fence1</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> will</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
first execute and shutdowns Node1 even though</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>>
Node2 goes down?</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
If Node2 managed to sign off properly it will not.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
If network-connection is down so that Node2 can't</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
inform Node1 that</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> it</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
is going</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
down and finally has stopped all resources it will</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
be fenced by</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> Node1.</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
Regards,</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>>
Klaus</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
Fencing occurs in two cases;</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
1. The node stops responding (meaning it's in an</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
unknown state, so</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> it is</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
fenced to force it into a known state).</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
2. A resource / service fails to stop. In this</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
case, the</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> service
is</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
in an unknown state, so the node is fenced to force</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
the service into</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> a</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
known state so that it can be safely recovered on the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
peer.</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
Graceful withdrawal of the node from the cluster, and</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
graceful</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> stopping</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
of services will not lead to a fence (because in both</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
cases, the</span><br style="font-size:12.8px">
<span style="font-size:12.8px">>>>>> node /</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>>
service are in a known state - off).</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>>>>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>>
______________________________</span><wbr
style="font-size:12.8px"><span style="font-size:12.8px">_________________</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Users
mailing list: </span><a
href="mailto:Users@clusterlabs.org"
style="font-size:12.8px" moz-do-not-send="true">Users@clusterlabs.org</a><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> </span><a
href="https://lists.clusterlabs.org/mailman/listinfo/users"
rel="noreferrer" target="_blank" style="font-size:12.8px"
moz-do-not-send="true">https://lists.clusterlabs.org/<wbr>mailman/listinfo/users</a><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>></span><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Project
Home: </span><a href="http://www.clusterlabs.org/"
rel="noreferrer" target="_blank" style="font-size:12.8px"
moz-do-not-send="true">http://www.clusterlabs.org</a><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Getting
started: </span><a
href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf"
rel="noreferrer" target="_blank" style="font-size:12.8px"
moz-do-not-send="true">http://www.clusterlabs.org/<wbr>doc/Cluster_from_Scratch.pdf</a><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>> Bugs: </span><a
href="http://bugs.clusterlabs.org/" rel="noreferrer"
target="_blank" style="font-size:12.8px"
moz-do-not-send="true">http://bugs.clusterlabs.org</a><br
style="font-size:12.8px">
<span style="font-size:12.8px">>>>></span><br
style="font-size:12.8px">
</div>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>