<div dir="ltr">Thanks. This implies that I officially do not understand what it is that fencing can do for me, in my simple cluster. Back to the drawing board.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 17, 2019 at 3:33 PM digimer &lt;<a href="mailto:lists@alteeve.ca">lists@alteeve.ca</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    <p>Fencing requires some mechanism, outside the nodes themselves,
      that can terminate the nodes. Typically, IPMI (iLO, iRMC, RSA,
      DRAC, etc) is used for this. Alternatively, switched PDUs are
      common. If you don't have these but do have a watchdog timer on
      your nodes, SBD (storage-based death) can work.</p>
    <p>You can use 'fence_&lt;device&gt; &lt;options&gt; -o status' at
      the command line to figure out what will work with your
      hardware. Once you can call 'fence_foo ... -o status' and get
      the status of each node, translating that into a pacemaker
      configuration is pretty simple. That's when you enable stonith. <br>
    </p>
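    <p>As a rough sketch of that status check (not part of the original
      message): the helper below assumes an IPMI-style agent such as
      fence_ipmilan and its long options --ip, --username and --password.
      The function name, the FENCE_AGENT variable and every address or
      credential are hypothetical placeholders; substitute whatever agent
      matches your hardware.</p>

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: query one node's fence device.
# FENCE_AGENT defaults to fence_ipmilan (an assumption); override it to
# match your hardware, e.g. a PDU agent.
FENCE_AGENT="${FENCE_AGENT:-fence_ipmilan}"

check_fence_status() {
    # $1 = node name, $2 = management (BMC) address, $3 = user, $4 = password
    # Note: a password given on the command line is visible in the process list.
    if "$FENCE_AGENT" --ip="$2" --username="$3" --password="$4" -o status; then
        echo "$1: status query succeeded"
    else
        rc=$?
        echo "$1: status query failed or node reported off (exit code $rc)"
        return "$rc"
    fi
}
```

    <p>Calling it once per node, for example 'check_fence_status one
      192.0.2.10 admin secret' (placeholder values), shows whether the
      agent can reach each node's fence device before anything is added
      to pacemaker.</p>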
    <p>Once stonith is set up and working in pacemaker (i.e., you can crash
      a node and the peer reboots it), go to DRBD and set
      'fencing resource-and-stonith;' (this tells DRBD to block on
      communication failure with the peer and request a fence), and then
      set up the 'fence-peer' handler ('/path/to/crm-fence-peer.sh') and
      the matching unfence handler ('/path/to/crm-unfence-peer.sh'). I am going from
      memory, so check the man page to verify the syntax. <br>
    </p>
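    <p>For orientation only, a sketch of how these settings are commonly
      laid out in a DRBD 8.4 resource file. The resource name r0 and the
      /usr/lib/drbd script location are assumptions, not details from
      this thread. In DRBD 9 the fencing option lives in the net section
      and the handler and script names differ, so verify against the
      drbd.conf man page for your installed version.</p>

```
# Sketch for DRBD 8.4; resource name and script paths are assumptions.
resource r0 {
  disk {
    # Block I/O on loss of the peer and ask the cluster to fence it.
    fencing resource-and-stonith;
  }
  handlers {
    # Called when DRBD loses contact with the peer.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes the fencing constraint once the peer has resynced.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```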
    <p>With all this done, if either pacemaker/corosync or DRBD loses
      contact with the peer, it will block and fence. Only after the
      peer has been confirmed terminated will IO resume. This way,
      split-brain becomes effectively impossible.</p>
    <p>digimer<br>
    </p>
    <div class="gmail-m_-5179552301465381124moz-cite-prefix">On 2019-04-17 5:17 p.m., JCA wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">Here is what I did:
            <div><br>
            </div>
            <div>
              <div># pcs stonith create disk_fencing fence_scsi
                pcmk_host_list="one two" pcmk_monitor_action="metadata"
                pcmk_reboot_action="off"
                devices="/dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb"
                meta provides="unfencing"</div>
            </div>
            <div><br>
            </div>
            <div>where ata-VBOX-... corresponds to the device where I
              have the partition that is shared between both nodes in my
              cluster. The command completes without any errors (that I
              can see) and after that I have</div>
            <div><br>
            </div>
            <div>
              <div># pcs status</div>
              <div>Cluster name: ClusterOne</div>
              <div>Stack: corosync</div>
              <div>Current DC: one (version 1.1.19-8.el7_6.4-c3c624ea3d)
                - partition with quorum</div>
              <div>Last updated: Wed Apr 17 14:35:25 2019</div>
              <div>Last change: Wed Apr 17 14:11:14 2019 by root via
                cibadmin on one</div>
              <div><br>
              </div>
              <div>2 nodes configured</div>
              <div>5 resources configured</div>
              <div><br>
              </div>
              <div>Online: [ one two ]</div>
              <div><br>
              </div>
              <div>Full list of resources:</div>
              <div><br>
              </div>
              <div> MyCluster<span style="white-space:pre-wrap"> </span>(ocf::myapp:myapp-script):<span style="white-space:pre-wrap">      </span>Stopped</div>
              <div> Master/Slave Set: DrbdDataClone [DrbdData]</div>
              <div>     Stopped: [ one two ]</div>
              <div> DrbdFS<span style="white-space:pre-wrap">    </span>(ocf::heartbeat:Filesystem):<span style="white-space:pre-wrap">    </span>Stopped</div>
              <div> disk_fencing <span style="white-space:pre-wrap">    </span>(stonith:fence_scsi):<span style="white-space:pre-wrap">   </span>Stopped</div>
              <div><br>
              </div>
              <div>Daemon Status:</div>
              <div>  corosync: active/enabled</div>
              <div>  pacemaker: active/enabled</div>
              <div>  pcsd: active/enabled</div>
            </div>
            <div><br>
            </div>
            <div>Things stay that way indefinitely, until I set
              stonith-enabled to false - at which point all the
              resources above get started immediately.</div>
            <div><br>
            </div>
            <div>Obviously, I am missing something big here. But, what
              is it?</div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, Apr 17, 2019 at 2:59
          PM Adam Budziński &lt;<a href="mailto:budzinski.adam@gmail.com" target="_blank">budzinski.adam@gmail.com</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div dir="auto">You did not configure any fencing device.</div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Wed, Apr 17, 2019 at 22:51,
              JCA &lt;<a href="mailto:1.41421@gmail.com" target="_blank">1.41421@gmail.com</a>&gt;
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div dir="ltr">
                <div dir="ltr">I am trying to get fencing working, as
                  described in the "Clusters from Scratch" guide, and I
                  am stymied at the get-go :-(
                  <div><br>
                  </div>
                  <div>The document mentions a property named
                    stonith-enabled. When I was trying to get my first
                    cluster going, I noticed that my resources would
                    start only when this property is set to false, by
                    means of </div>
                  <div><br>
                  </div>
                  <div>    # pcs property set stonith-enabled=false<br>
                  </div>
                  <div><br>
                  </div>
                  <div>Otherwise, all the resources remain stopped.</div>
                  <div><br>
                  </div>
                  <div>I created a fencing resource for the partition
                    that I am sharing across the nodes, by means of
                    DRBD. This works fine - but I still have the same
                    problem as above - i.e. when stonith-enabled is set
                    to true, all the resources get stopped, and remain
                    in that state.</div>
                  <div><br>
                  </div>
                  <div>I am very confused here. Can anybody point me in
                    the right direction out of this conundrum?</div>
                  <div><br>
                  </div>
                  <div><br>
                  </div>
                  <div><br>
                  </div>
                </div>
              </div>
              _______________________________________________<br>
              Manage your subscription:<br>
              <a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
              <br>
              ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer noreferrer" target="_blank">https://www.clusterlabs.org/</a></blockquote>
          </div>
          </blockquote>
      </div>
      <br>
    </blockquote>
  </div>

</blockquote></div>