<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 07/12/2018 09:39 AM, Confidential
      Company wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAJiLmDT6--ioK-HvMFxEo_ratFnBMVY1x3tQEXPeDQ1rdNEaYg@mail.gmail.com">
      <div dir="ltr"><span style="font-size:12.8px">Message: 2</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">Date: Wed, 11 Jul 2018 16:33:31
          +0200</span><br style="font-size:12.8px">
        <span style="font-size:12.8px">From: Klaus Wenninger <</span><a
          href="mailto:kwenning@redhat.com" style="font-size:12.8px"
          moz-do-not-send="true">kwenning@redhat.com</a><span
          style="font-size:12.8px">></span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">To: Ken Gaillot <</span><a
          href="mailto:kgaillot@redhat.com" style="font-size:12.8px"
          moz-do-not-send="true">kgaillot@redhat.com</a><span
          style="font-size:12.8px">>, Cluster Labs - All topics</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">        related to open-source
          clustering welcomed <</span><a
          href="mailto:users@clusterlabs.org" style="font-size:12.8px"
          moz-do-not-send="true">users@clusterlabs.org</a><span
          style="font-size:12.8px">>,</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">        Andrei Borzenkov <</span><a
          href="mailto:arvidjaar@gmail.com" style="font-size:12.8px"
          moz-do-not-send="true">arvidjaar@gmail.com</a><span
          style="font-size:12.8px">></span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">Subject: Re: [ClusterLabs] What
          triggers fencing?</span><br style="font-size:12.8px">
        <span style="font-size:12.8px">Message-ID: <</span><a
          href="mailto:2bf61b9f-98b0-482f-fa65-263ba9490950@redhat.com"
          style="font-size:12.8px" moz-do-not-send="true">2bf61b9f-98b0-482f-fa65-<wbr>263ba9490950@redhat.com</a><span
          style="font-size:12.8px">></span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">Content-Type: text/plain;
          charset=utf-8</span><br style="font-size:12.8px">
        <br style="font-size:12.8px">
        <span style="font-size:12.8px">On 07/11/2018 04:11 PM, Ken
          Gaillot wrote:</span><br style="font-size:12.8px">
        <span style="font-size:12.8px">> On Wed, 2018-07-11 at 11:06
          +0200, Klaus Wenninger wrote:</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>> On 07/11/2018 05:48 AM,
          Andrei Borzenkov wrote:</span><br style="font-size:12.8px">
        <span style="font-size:12.8px">>>> 11.07.2018 05:45,
          Confidential Company wrote:</span><br style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> Not true, the
          faster node will kill the slower node first. It is</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> possible that
          through misconfiguration, both could die, but it's</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> rare</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> and easily
          avoided with a 'delay="15"' set on the fence config</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> for the</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> node you want to
          win.</span><br style="font-size:12.8px">
        <span style="font-size:12.8px">>>>></span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> Don't use a
          delay on the other node, just the node you want to</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> live in</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> such a case.</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>></span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> **</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>>
          1. Given Active/Passive setup, resources are</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> active on Node1</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>>
          2. fence1(prefers to Node1, delay=15) and</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> fence2(prefers
          to</span><br style="font-size:12.8px">
        <span style="font-size:12.8px">>>>> Node2, delay=30)</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">>>>>
          3. Node2 goes down</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> What do you mean by "down"
          in this case?</span><br style="font-size:12.8px">
        <span style="font-size:12.8px">></span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> If you mean the host itself
          has crashed, then it will not do anything,</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> and node1 will fence it.</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">></span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> If you mean node2's network
          goes out, so it's still functioning but no</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> one can reach the managed
          service on it, then you are correct, the</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> "wrong" node can get shot --
          because you didn't specify anything about</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> what the right node would
          be. This is a somewhat tricky area, but it</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> can be done with a
          quorum-only node, qdisk, or fence_heuristics_ping,</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> all of which are different
          ways of "preferring" the node that can reach</span><br
          style="font-size:12.8px">
        <span style="font-size:12.8px">> a certain host.</span>
        <div><br>
        </div>
        <div><br>
        </div>
        <div><br style="font-size:12.8px">
          <span style="font-size:12.8px">Or in other words why would I -
            as a cluster-node - shoot the</span><br
            style="font-size:12.8px">
          <span style="font-size:12.8px">peer to be able to start the
            services locally if I can somehow</span><br
            style="font-size:12.8px">
          <span style="font-size:12.8px">tell beforehand that my
            services anyway wouldn't be</span><br
            style="font-size:12.8px">
          <span style="font-size:12.8px">reachable by anybody (e.g.
            network disconnected).</span><br style="font-size:12.8px">
          <span style="font-size:12.8px">Then it might make more sense
            to sit still and wait to be shot by</span><br
            style="font-size:12.8px">
          <span style="font-size:12.8px">the other side for the case
            that guy is more lucky and</span><br
            style="font-size:12.8px">
          <span style="font-size:12.8px">has e.g. access to the network.</span>
          <div><br>
          </div>
          <div><br>
          </div>
          <div>-Klaus<br>
            <br>
            <br>
            In a 2-node setup, neither node knows whether its
            services are reachable by anybody.<br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Of course they cannot get that knowledge from the cluster peer, but<br>
    it may be possible to bring an additional instance into the game.<br>
    As Ken already mentioned, that might be a disk, an additional node<br>
    just for quorum, qdevice or fence_heuristics_ping.<br>
    The latter is placed on the same fencing level, ahead of your real<br>
    fencing device. It tries to reach the IP address(es) you configure<br>
    and, depending on the result, gains some knowledge of how far the<br>
    local node is accessible from outside.<br>
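    <br>
    As a rough illustration of that ordering, here is a minimal Python<br>
    sketch. It is not the actual fence_heuristics_ping code. It assumes<br>
    that the devices on one fencing level run in order and that the level<br>
    stops at the first failure. The names heuristic_passes,<br>
    run_fencing_level and build_level, the "required" count and the<br>
    192.0.2.x addresses are hypothetical.<br>
    <pre>
```python
def heuristic_passes(targets, can_reach, required=1):
    """Return True if at least `required` of the configured addresses answer."""
    reachable = sum(1 for ip in targets if can_reach(ip))
    return reachable >= required


def run_fencing_level(devices):
    """Run (name, action) pairs in order and stop at the first failure.

    Returns (level_succeeded, names_of_devices_that_were_run).
    """
    executed = []
    for name, action in devices:
        executed.append(name)
        if not action():
            return False, executed
    return True, executed


def build_level(can_reach, targets=("192.0.2.1", "192.0.2.2")):
    """Heuristic first, real fencing device second (both are stand-ins)."""
    return [
        ("ping-heuristic", lambda: heuristic_passes(targets, can_reach)),
        ("real-fence-device", lambda: True),  # pretend the real device works
    ]


# A node that has lost its network never gets as far as the real device.
isolated = run_fencing_level(build_level(lambda ip: False))
# A node that still reaches one target goes on to fence its peer.
connected = run_fencing_level(build_level(lambda ip: ip == "192.0.2.1"))
print(isolated)
print(connected)
```
    </pre>
    In this model the disconnected node sits still because its heuristic<br>
    fails, which leaves the connected peer free to shoot it.<br>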
    <br>
    Btw., in your config I saw that you are using pcmk_delay_max for just<br>
    one of the nodes. That is not how it is designed to be used, as<br>
    it gives you a random delay between 0 and max. I would rather<br>
    recommend pcmk_delay_base (a fixed delay) for one of the nodes<br>
    if you want to prioritize one of them, or pcmk_delay_max<br>
    with the same value for both if you prefer random behavior.<br>
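    <br>
    To make the difference concrete, here is a small Python simulation.<br>
    It is only a sketch under assumptions: the random delay is modelled as<br>
    uniform between 0 and max, and needed_margin (the lead one fence<br>
    request needs over the other for the race to be decided safely) is a<br>
    hypothetical 2 seconds. All function names are made up for<br>
    illustration.<br>
    <pre>
```python
import random


def fixed(seconds):
    """Model of a static delay, in the spirit of pcmk_delay_base."""
    return lambda rng: float(seconds)


def random_up_to(max_seconds):
    """Model of a random delay, in the spirit of pcmk_delay_max."""
    return lambda rng: rng.uniform(0.0, max_seconds)


def lead_time(delay_fencing_node1, delay_fencing_node2, rng):
    """Seconds by which one fence request fires ahead of the other."""
    return abs(delay_fencing_node1(rng) - delay_fencing_node2(rng))


def risky_fraction(delay_fencing_node1, delay_fencing_node2,
                   needed_margin=2.0, trials=10000, seed=1):
    """Share of simulated split-brains where the lead is below needed_margin."""
    rng = random.Random(seed)
    risky = sum(
        1 for _ in range(trials)
        if needed_margin > lead_time(delay_fencing_node1, delay_fencing_node2, rng)
    )
    return risky / trials


# Fixed 10 s delay on one device only: the lead is always 10 s.
static_case = risky_fraction(fixed(0), fixed(10))
# Random delay of up to 10 s on one device only: the lead is sometimes tiny.
random_case = risky_fraction(fixed(0), random_up_to(10))
print(static_case, random_case)
```
    </pre>
    With a fixed delay the lead never shrinks, while a random delay on a<br>
    single device lets a noticeable share of the runs end up close to a tie.<br>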
    <br>
    Unfortunately, the current implementation of fencing doesn't<br>
    allow things like dynamic location rules that react to, e.g.,<br>
    certain resources running, so as to prioritize the active node.<br>
    What you can still do is follow the approach of fence_heuristics_ping<br>
    (put something in the fencing hierarchy in front of the real<br>
    fencing device) and add a fence agent that returns successfully<br>
    right away if the node has certain resources running (active),<br>
    and waits a certain time before returning successfully if they<br>
    are not running (passive).<br>
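    <br>
    The core of such an agent could look roughly like the Python sketch<br>
    below. It is not a complete fence agent (no metadata, no option<br>
    parsing, no real check of which resources run locally). The function<br>
    names, the 15 second default and the injected sleep function are<br>
    assumptions made for illustration.<br>
    <pre>
```python
import time


def wait_before_fencing(local_node_is_active, passive_wait_seconds):
    """Seconds to pause before the real fencing device is allowed to run."""
    return 0.0 if local_node_is_active else float(passive_wait_seconds)


def run_delay_stage(local_node_is_active, passive_wait_seconds=15,
                    sleep=time.sleep):
    """Pause if this node is passive, then report success.

    The return value 0 stands for a successful exit status, so the next
    device on the same fencing level (the real one) would be tried next.
    """
    pause = wait_before_fencing(local_node_is_active, passive_wait_seconds)
    if pause > 0:
        sleep(pause)
    return 0


# Record the requested pauses instead of really sleeping.
pauses = []
active_status = run_delay_stage(True, sleep=pauses.append)
passive_status = run_delay_stage(False, sleep=pauses.append)
print(active_status, passive_status, pauses)
```
    </pre>
    The active node proceeds to the real device at once, so it gets a head<br>
    start over the passive node, which first has to sit out its wait.<br>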
    <br>
    Otherwise, without checking the logs, I don't know why it makes<br>
    a difference whether node1 or node2 is disconnected.<br>
    (Is that reproducible at all?)<br>
    In the back of my mind I remember an issue with Corosync<br>
    where an interface going down might somehow prevent loss<br>
    detection, but I don't remember the details.<br>
    <br>
    Regards,<br>
    Klaus <br>
    <br>
    <br>
    <br>
    <blockquote type="cite"
cite="mid:CAJiLmDT6--ioK-HvMFxEo_ratFnBMVY1x3tQEXPeDQ1rdNEaYg@mail.gmail.com">
      <div dir="ltr">
        <div>
          <div><br>
            Sharing my config and my tests:<br>
            <br>
            <div>Last login: Thu Jul 12 14:57:21 2018</div>
            <div>[root@ArcosRhel1 ~]# pcs config</div>
            <div>Cluster Name: ARCOSCLUSTER</div>
            <div>Corosync Nodes:</div>
            <div> ArcosRhel1 ArcosRhel2</div>
            <div>Pacemaker Nodes:</div>
            <div> ArcosRhel1 ArcosRhel2</div>
            <div><br>
            </div>
            <div>Resources:</div>
            <div> Resource: ClusterIP (class=ocf provider=heartbeat
              type=IPaddr2)</div>
            <div>  Attributes: cidr_netmask=32 ip=172.16.10.243</div>
            <div>  Operations: monitor interval=30s
              (ClusterIP-monitor-interval-30s)</div>
            <div>              start interval=0s timeout=20s
              (ClusterIP-start-interval-0s)</div>
            <div>              stop interval=0s timeout=20s
              (ClusterIP-stop-interval-0s)</div>
            <div><br>
            </div>
            <div>Stonith Devices:</div>
            <div> Resource: Fence1 (class=stonith
              type=fence_vmware_soap)</div>
            <div>  Attributes: action=off ipaddr=172.16.11.201
              login=test passwd=testing pcmk_host_list=ArcosRhel1
              pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel)
              ssl_insecure=1</div>
            <div>  Operations: monitor interval=60s
              (Fence1-monitor-interval-60s)</div>
            <div> Resource: fence2 (class=stonith
              type=fence_vmware_soap)</div>
            <div>  Attributes: action=off ipaddr=172.16.11.202
              login=test passwd=testing pcmk_delay_max=10s
              pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s
              port=ArcosRhel2(Ben) ssl_insecure=1</div>
            <div>  Operations: monitor interval=60s
              (fence2-monitor-interval-60s)</div>
            <div>Fencing Levels:</div>
            <div><br>
            </div>
            <div>Location Constraints:</div>
            <div>  Resource: Fence1</div>
            <div>    Enabled on: ArcosRhel2 (score:INFINITY)
              (id:location-Fence1-ArcosRhel2-INFINITY)</div>
            <div>  Resource: fence2</div>
            <div>    Enabled on: ArcosRhel1 (score:INFINITY)
              (id:location-fence2-ArcosRhel1-INFINITY)</div>
            <div>Ordering Constraints:</div>
            <div>Colocation Constraints:</div>
            <div>Ticket Constraints:</div>
            <div><br>
            </div>
            <div>Alerts:</div>
            <div> No alerts defined</div>
            <div><br>
            </div>
            <div>Resources Defaults:</div>
            <div> No defaults set</div>
            <div>Operations Defaults:</div>
            <div> No defaults set</div>
            <div><br>
            </div>
            <div>Cluster Properties:</div>
            <div> cluster-infrastructure: corosync</div>
            <div> cluster-name: ARCOSCLUSTER</div>
            <div> dc-version: 1.1.16-12.el7-94ff4df</div>
            <div> have-watchdog: false</div>
            <div> last-lrm-refresh: 1531375458</div>
            <div> stonith-enabled: true</div>
            <div><br>
            </div>
            <div>Quorum:</div>
            <div>  Options:</div>
            <div>[root@ArcosRhel1 ~]#</div>
            <br>
          </div>
          <div>**Test scenario:</div>
          <div>Given:<br>
          </div>
          <div>Each node has two interfaces (ens192 for corosync traffic /
            ens224 for ESXi traffic).</div>
          <div><br>
          </div>
          <div>a.) Node1=Active and Node2=Passive</div>
          <div>Action=disconnect ens192 of Node1</div>
          <div>Output=Node2 was fenced and shut down</div>
          <div>b.) Node1=Passive and Node2=Active</div>
          <div>Action=disconnect ens192 of Node1</div>
          <div>Output=Node1 was fenced and shut down</div>
          <div>c.) Node1=Passive and Node2=Active</div>
          <div>Action=disconnect ens192 of Node2</div>
          <div>Output=Node2 was fenced and shut down<br>
            <br>
            <br>
            Thanks,<br>
            imnotarobot</div>
          <div><br>
          </div>
          <div><br>
          </div>
          <div><br style="font-size:12.8px">
            <span style="font-size:12.8px">></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">> If you mean the
              cluster-managed resource crashes on node2, but node2</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">> itself is still
              functioning properly, then what happens depends on how</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">> you've configured
              failure recovery. By default, there is no fencing,</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">> and the cluster tries to
              restart the resource.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>
              4. Node1 thinks Node2 goes down / Node2
              thinks</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> Node1 goes</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> down</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>> If node2 is
              down, it cannot think anything.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> True. Assuming it is
              not really down but just somehow disconnected</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> for my answer below.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>
              5. fence1 counts 15 seconds before he
              fence Node1</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> while</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> fence2
              counts 30 seconds before he fence Node2</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>
              6. Since fence1 do have shorter time than
              fence2,</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> fence1</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> executes and
              shutdown Node1.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>
              7. fence1(action: shutdown Node1) will
              trigger</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> first</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> always
              because it has shorter delay than fence2.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> ** Okay
              what's important is that they should be different. But in</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> the case</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> above, even
              though Node2 goes down but Node1 has shorter delay,</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> Node1 gets</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>
              fenced/shutdown. This is a sample scenario. I don't get
              the</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> point. Can
              you</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> comment on
              this?</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>> You didn't send the
              actual config but from your description</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> I get the scenario
              that way:</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> fencing-resource
              fence1 is running on Node2 and it is there</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> to fence Node1 and
              it has a delay of 15s.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>> fencing-resource
              fence2 is running on Node1 and it is there</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> to fence Node2 and
              it has a delay of 30s.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>> If they now begin to
              fence each other at the same time the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> node actually fenced
              would be Node1 of course as the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> fencing-resource
              fence1 is gonna shoot 15s earlier than the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> fence2.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> Looks consistent to
              me ...</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> Regards,</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>> Klaus</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> Thanks</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> On Tue, Jul
              10, 2018 at 12:18 AM, Klaus Wenninger &lt;kwenning@redhat.com&gt;</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> wrote:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> On
              07/09/2018 05:53 PM, Digimer wrote:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>> On
              2018-07-09 11:45 AM, Klaus Wenninger wrote:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              On 07/09/2018 05:33 PM, Digimer wrote:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              On 2018-07-09 09:56 AM, Klaus Wenninger wrote:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>
              On 07/09/2018 03:49 PM, Digimer wrote:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              On 2018-07-09 08:31 AM, Klaus Wenninger wrote:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              On 07/09/2018 02:04 PM, Confidential Company wrote:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              Hi,</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              Any ideas what triggers fencing script or</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              stonith?</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              Given the setup below:</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              1. I have two nodes</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              2. Configured fencing on both nodes</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              3. Configured delay=15 and delay=30 on fence1(for</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              Node1) and</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              fence2(for Node2) respectively</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              *What does it mean to configured delay in</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              stonith? wait for 15</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> seconds</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              before it fence the node?</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              Given that on a 2-node-cluster you don't have real</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              quorum to make</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> one</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              partial cluster fence the rest of the nodes the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              different delays</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> are
              meant</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              to prevent a fencing-race.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              Without different delays that would lead to both</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              nodes fencing each</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              other at the same time - finally both being down.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              Not true, the faster node will kill the slower node</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              first. It is</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              possible that through misconfiguration, both could</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              die, but it's rare</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              and easily avoided with a 'delay="15"' set on the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              fence config for</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              node you want to win.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>
              What exactly is not true? Aren't we saying the same?</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>
              Of course one of the delays can be 0 (most important is</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>
              that</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>
              they are different).</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              Perhaps I misunderstood your message. It seemed to me</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              that the</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              implication was that fencing in 2-node without a delay</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              always ends up</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              with both nodes being down, which isn't the case. It can</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              happen if the</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              fence methods are not setup right (ie: the node isn't set</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              to</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>
              immediately</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              power off on ACPI power button event).</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              Yes, a misunderstanding I guess.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              Should have been more verbose in saying that due to the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              time between the fencing-command fired off to the fencing</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              device and the actual fencing taking place (as you state</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              dependent on how it is configured in detail - but a</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              measurable</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              time in all cases) there is a certain probability that
              when</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              both nodes start fencing at roughly the same time we will</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              end up with 2 nodes down.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              Everybody has to find his own tradeoff between reliability</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>
              fence-races are prevented and fencing delay I guess.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>
              We've used this;</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>> 1.
              IPMI (with the guest OS set to immediately power off) as</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>
              primary,</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>> with
              a 15 second delay on the active node.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>> 2.
              Two Switched PDUs (two power circuits, two PSUs) as backup</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>
              fencing</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>> for
              when IPMI fails, with no delay.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>></span><br
              style="font-size:12.8px">
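            <span style="font-size:12.8px">The primary/backup ordering in the two items above can be sketched roughly as follows. This is only an illustration, assuming each method reports success or failure; fence_with_fallback, ipmi_down and pdu_pair are hypothetical names, not real fence-agent calls.</span><br style="font-size:12.8px">
      <pre wrap="">
```python
from typing import Callable, Sequence, Tuple

# A fence method takes a node name and reports whether the node is known to be off.
# These are hypothetical stand-ins, not real fence-agent calls.
FenceMethod = Callable[[str], bool]


def fence_with_fallback(target: str, levels: Sequence[Tuple[str, FenceMethod]]) -> str:
    """Try each fencing level in order; return the name of the level that succeeded."""
    for name, method in levels:
        if method(target):
            return name
    raise RuntimeError(f"all fencing levels failed for {target}")


def ipmi_down(target: str) -> bool:
    # Simulates an unreachable BMC, e.g. because the node lost all power.
    return False


def pdu_pair(target: str) -> bool:
    # Both PDUs must cut their outlet, since the node has two PSUs.
    outlets_off = [True, True]  # assumed results from PDU 1 and PDU 2
    return all(outlets_off)


levels = [("ipmi", ipmi_down), ("pdu", pdu_pair)]
used_level = fence_with_fallback("node2", levels)
print(used_level)
```
</pre>
            <span style="font-size:12.8px">With the simulated IPMI failure, the sketch falls through to the PDU level, which only counts as a success when both outlets are off.</span><br style="font-size:12.8px">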
            <span style="font-size:12.8px">>>>>>> In
              ~8 years, across dozens and dozens of clusters and</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>
              countless fence</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>
              actions, we've never had a dual-fence event (where both
              nodes</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>> go
              down).</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>> So
              it can be done safely, but as always, test test test</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>
              before prod.</span><br style="font-size:12.8px">
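            <span style="font-size:12.8px">The fence race and the effect of a delay can be modelled very roughly as below. This is a toy model under stated assumptions, not how Pacemaker implements it: both nodes request fencing of their peer at t=0, a delay postpones the action against that node, a command that has been sent cannot be recalled, and both devices need the same (made-up) latency to cut power.</span><br style="font-size:12.8px">
      <pre wrap="">
```python
def fence_race(delay_a: float, delay_b: float, latency: float) -> set:
    """Return the nodes still up after a two-node fence race (toy model).

    delay_a / delay_b: seconds to wait before the fence action against
                       node A / node B is sent (a head start for that node).
    latency: seconds between sending a fence command and the target
             actually going down; assumed equal for both devices.
    """
    # The action with the shorter delay is always sent, because nobody
    # can be down before the first command has been issued.
    if delay_b >= delay_a:
        first_target, second_target = "A", "B"
    else:
        first_target, second_target = "B", "A"

    survivors = {"A", "B"} - {first_target}

    # The second action is sent by the node that is about to be fenced.
    # It still goes out if that node is alive when the longer delay expires,
    # i.e. if the head start is shorter than the fencing latency.
    head_start = abs(delay_a - delay_b)
    if latency > head_start:
        survivors.discard(second_target)
    return survivors


# No delay anywhere: both commands are sent, both nodes go down.
print(fence_race(0, 0, 3))
# 15 second delay protecting node A: B is down long before A's delay expires.
print(fence_race(15, 0, 3))
# A head start shorter than the device latency does not help.
print(fence_race(2, 0, 3))
```
</pre>
            <span style="font-size:12.8px">In this model the delay only has to exceed the time the fencing device needs to act, which is why it matters to know the devices and the delays they involve.</span><br style="font-size:12.8px">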
            <span style="font-size:12.8px">>>>>> No doubt
              that this setup works reliably.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> You just
              have to know your fencing-devices and</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> which
              delays they involve.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> If we
              are talking about SBD (with disk as otherwise</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> it
              doesn't work in a sensible way in 2-node-clusters)</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> for
              instance I would strongly advise using a delay.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> So I
              guess it is important to understand the basic</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> idea
              behind this different delay-based fence-race</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>
              avoidance.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>
              Afterwards you can still decide why it is no issue</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>> in your
              own setup.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              If the delay is set on both nodes, and they are</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              different, it will work</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              fine. The reason not to do this is that if you use 0,</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              then don't use</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              anything at all (0 is default), and any other value</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              causes avoidable</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>
              fence delays.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              Don't use a delay on the other node, just the node</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              you want to live in such a case.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              *Given Node1 is active and Node2 goes down, does</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              it mean fence1 will first execute and shut down Node1 even though</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>>
              Node2 goes down?</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              If Node2 managed to sign off properly it will not.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              If network-connection is down so that Node2 can't</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              inform Node1 that it is going</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              down and finally has stopped all resources it will</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              be fenced by Node1.</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              Regards,</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>>
              Klaus</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              Fencing occurs in two cases:</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              1. The node stops responding (meaning it's in an</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              unknown state, so it is fenced to force it into a known state).</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              2. A resource / service fails to stop. In this</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              case, the service is</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              in an unknown state, so the node is fenced to force</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              the service into a known state so that it can be safely recovered on the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              peer.</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              Graceful withdrawal of the node from the cluster, and</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              graceful stopping of services will not lead to a fence (because in both</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>>>>>>>
              cases, the node / service are in a known state - off).</span><br
              style="font-size:12.8px">
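            <span style="font-size:12.8px">The two cases above, plus the graceful ones that do not lead to a fence, can be written down as a small decision function. This is a sketch only; the enum and function names are hypothetical and do not correspond to Pacemaker internals.</span><br style="font-size:12.8px">
      <pre wrap="">
```python
from enum import Enum


class NodeState(Enum):
    ONLINE = "online"
    LEFT_CLEANLY = "left cleanly"      # graceful withdrawal: known state (off)
    UNRESPONSIVE = "unresponsive"      # unknown state


class StopResult(Enum):
    NOT_REQUESTED = "not requested"
    STOPPED = "stopped"                # graceful stop: known state (off)
    FAILED = "failed"                  # service is in an unknown state


def needs_fencing(node: NodeState, stop: StopResult) -> bool:
    """Fence only when the node or one of its services is in an unknown state."""
    if node is NodeState.UNRESPONSIVE:
        return True   # case 1: the node stopped responding
    if stop is StopResult.FAILED:
        return True   # case 2: a resource failed to stop
    return False      # everything is in a known state


print(needs_fencing(NodeState.UNRESPONSIVE, StopResult.NOT_REQUESTED))
print(needs_fencing(NodeState.ONLINE, StopResult.FAILED))
print(needs_fencing(NodeState.LEFT_CLEANLY, StopResult.STOPPED))
```
</pre>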
            <span style="font-size:12.8px">>>>>>>>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>>
              ______________________________</span><wbr
              style="font-size:12.8px"><span style="font-size:12.8px">_________________</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> Users
              mailing list: </span><a
              href="mailto:Users@clusterlabs.org"
              style="font-size:12.8px" moz-do-not-send="true">Users@clusterlabs.org</a><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> </span><a
              href="https://lists.clusterlabs.org/mailman/listinfo/users"
              rel="noreferrer" target="_blank" style="font-size:12.8px"
              moz-do-not-send="true">https://lists.clusterlabs.org/<wbr>mailman/listinfo/users</a><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>></span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> Project
              Home: </span><a href="http://www.clusterlabs.org/"
              rel="noreferrer" target="_blank" style="font-size:12.8px"
              moz-do-not-send="true">http://www.clusterlabs.org</a><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> Getting
              started: </span><a
              href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf"
              rel="noreferrer" target="_blank" style="font-size:12.8px"
              moz-do-not-send="true">http://www.clusterlabs.org/<wbr>doc/Cluster_from_Scratch.pdf</a><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>> Bugs: </span><a
              href="http://bugs.clusterlabs.org/" rel="noreferrer"
              target="_blank" style="font-size:12.8px"
              moz-do-not-send="true">http://bugs.clusterlabs.org</a><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">>>>></span><br
              style="font-size:12.8px">
          </div>
        </div>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Users mailing list: <a class="moz-txt-link-abbreviated" href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a>
<a class="moz-txt-link-freetext" href="https://lists.clusterlabs.org/mailman/listinfo/users">https://lists.clusterlabs.org/mailman/listinfo/users</a>

Project Home: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org">http://www.clusterlabs.org</a>
Getting started: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>
Bugs: <a class="moz-txt-link-freetext" href="http://bugs.clusterlabs.org">http://bugs.clusterlabs.org</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>