[ClusterLabs] gfs2 mounts don't come online after fence/reboot

Frederick . nagemnna at gmail.com
Mon Apr 6 10:06:22 EDT 2020


Good Day All!

I am trying to set up a new gfs2 cluster on CentOS 7 using the included
HA/Pacemaker stack.  Previously we did this on CentOS 6 using cman.

We have a 7-node cluster that needs to share gfs2 iSCSI SAN mounts
between its members.  I set up two test mounts, but when I fence or
reboot a node the mounts do not come back online on the node that was
rebooted.  I can clear the failures by hand with

pcs resource cleanup testvol3_mnt
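
I assume I could also put a failure-timeout on the mount clones so the
failcount expires on its own, something along these lines (the 60s value
is just a guess, and I am not sure this even applies to a hard "not
installed" start failure):

pcs resource meta testvol2_mnt-clone failure-timeout=60s
pcs resource meta testvol3_mnt-clone failure-timeout=60s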

What can I change in my setup so that the cluster brings the gfs2
mounts back online on its own once the problem node is back up and
responsive?
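
For reference, this is roughly how the dlm, clvmd and mount resources
were set up.  I have reconstructed the commands from the cib.xml paste
further down, so they may not be letter for letter what I originally
typed:

pcs resource create dlm ocf:pacemaker:controld \
    op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm \
    op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs constraint order start dlm-clone then clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone
pcs resource create testvol2_mnt ocf:heartbeat:Filesystem \
    device=/dev/testvol2/testvol2 directory=/archive/testvol2 \
    fstype=gfs2 options=noatime,nodiratime \
    op monitor interval=10s on-fail=fence clone interleave=true

testvol3_mnt was created the same way, with an extra SELinux context=
mount option.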

Thank you so much for any help.


Here is what I see in the log:

Apr  6 14:00:51 data2-eval crmd[2375]:  notice: Initiating start
operation testvol3_mnt:6_start_0 on tools-eval
Apr  6 14:00:51 data2-eval crmd[2375]: warning: Action 102
(testvol2_mnt:6_start_0) on tools-eval failed (target: 0 vs. rc: 5):
Error
Apr  6 14:00:51 data2-eval crmd[2375]:  notice: Transition aborted by
operation testvol2_mnt_start_0 'modify' on tools-eval: Event failed
Apr  6 14:00:51 data2-eval crmd[2375]:  notice: Transition aborted by
transient_attributes.1 'create': Transient attribute change
Apr  6 14:00:51 data2-eval crmd[2375]: warning: Action 120
(testvol3_mnt:6_start_0) on tools-eval failed (target: 0 vs. rc: 5):
Error
Apr  6 14:00:52 data2-eval crmd[2375]:  notice: Transition 437
(Complete=21, Pending=0, Fired=0, Skipped=2, Incomplete=6,
Source=/var/lib/pacemaker/pengine/pe-input-131.bz2): Stopped
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of fencemap2 on data1-eval: unknown error
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of fencemap2 on map1-eval: unknown error
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of testvol2_mnt:2 on tools-eval: not installed
Apr  6 14:00:52 data2-eval pengine[2374]:  notice: Preventing
testvol2_mnt-clone from re-starting on tools-eval: operation start
failed 'not installed' (5)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of testvol2_mnt:2 on tools-eval: not installed
Apr  6 14:00:52 data2-eval pengine[2374]:  notice: Preventing
testvol2_mnt-clone from re-starting on tools-eval: operation start
failed 'not installed' (5)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of testvol3_mnt:2 on tools-eval: not installed
Apr  6 14:00:52 data2-eval pengine[2374]:  notice: Preventing
testvol3_mnt-clone from re-starting on tools-eval: operation start
failed 'not installed' (5)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of testvol3_mnt:2 on tools-eval: not installed
Apr  6 14:00:52 data2-eval pengine[2374]:  notice: Preventing
testvol3_mnt-clone from re-starting on tools-eval: operation start
failed 'not installed' (5)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of fencemap2 on data2-eval: unknown error
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of fencemap2 on cache1-eval: unknown error
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of fencemap2 on map2-eval: unknown error
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Processing failed
start of fencemap2 on cache2-eval: unknown error
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing fencemap2
away from cache1-eval after 1000000 failures (max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing fencemap2
away from cache2-eval after 1000000 failures (max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing fencemap2
away from data1-eval after 1000000 failures (max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing fencemap2
away from data2-eval after 1000000 failures (max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing fencemap2
away from map1-eval after 1000000 failures (max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing fencemap2
away from map2-eval after 1000000 failures (max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing
testvol2_mnt-clone away from tools-eval after 1000000 failures
(max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing
testvol2_mnt-clone away from tools-eval after 1000000 failures
(max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing
testvol2_mnt-clone away from tools-eval after 1000000 failures
(max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing
testvol2_mnt-clone away from tools-eval after 1000000 failures
(max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing
testvol2_mnt-clone away from tools-eval after 1000000 failures
(max=1000000)
Apr  6 14:00:52 data2-eval pengine[2374]: warning: Forcing
testvol2_mnt-clone away from tools-eval after 1000000 failures
(max=1000000)



And here is my cib.xml; I removed the fencing resources to keep the
paste shorter:


<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10"
epoch="128" num_updates="0" admin_epoch="0" cib-last-written="Mon Apr
6 14:00:49 2020" update-origin="tools-eval" update-client="crmd"
update-user="hacluster" have-quorum="1" dc-uuid="3">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog"
name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version"
name="dc-version" value="1.1.20-5.el7_7.2-3c4c782f70"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name"
name="cluster-name" value="gibseval"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="freeze"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh"
name="last-lrm-refresh" value="1586181264"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="tools-eval">
        <instance_attributes id="nodes-1"/>
      </node>
      <node id="2" uname="data1-eval"/>
      <node id="3" uname="data2-eval">
        <instance_attributes id="nodes-3"/>
      </node>
      <node id="4" uname="map1-eval"/>
      <node id="5" uname="map2-eval"/>
      <node id="6" uname="cache1-eval"/>
      <node id="7" uname="cache2-eval"/>
    </nodes>
    <resources>
.
.
.

        <operations>
          <op id="fencedata2-monitor-interval-60s" interval="60s"
name="monitor"/>
        </operations>
      </primitive>
      <clone id="dlm-clone">
        <primitive class="ocf" id="dlm" provider="pacemaker" type="controld">
          <operations>
            <op id="dlm-monitor-interval-30s" interval="30s"
name="monitor" on-fail="fence"/>
            <op id="dlm-start-interval-0s" interval="0s" name="start"
timeout="90"/>
            <op id="dlm-stop-interval-0s" interval="0s" name="stop"
timeout="100"/>
          </operations>
        </primitive>
        <meta_attributes id="dlm-clone-meta_attributes">
          <nvpair id="dlm-clone-meta_attributes-interleave"
name="interleave" value="true"/>
          <nvpair id="dlm-clone-meta_attributes-ordered"
name="ordered" value="true"/>
        </meta_attributes>
      </clone>
      <clone id="clvmd-clone">
        <primitive class="ocf" id="clvmd" provider="heartbeat" type="clvm">
          <operations>
            <op id="clvmd-monitor-interval-30s" interval="30s"
name="monitor" on-fail="fence"/>
            <op id="clvmd-start-interval-0s" interval="0s"
name="start" timeout="90s"/>
            <op id="clvmd-stop-interval-0s" interval="0s" name="stop"
timeout="90s"/>
          </operations>
        </primitive>
        <meta_attributes id="clvmd-clone-meta_attributes">
          <nvpair id="clvmd-clone-meta_attributes-interleave"
name="interleave" value="true"/>
          <nvpair id="clvmd-clone-meta_attributes-ordered"
name="ordered" value="true"/>
        </meta_attributes>
      </clone>
      <clone id="testvol2_mnt-clone">
        <primitive class="ocf" id="testvol2_mnt" provider="heartbeat"
type="Filesystem">
          <instance_attributes id="testvol2_mnt-instance_attributes">
            <nvpair id="testvol2_mnt-instance_attributes-device"
name="device" value="/dev/testvol2/testvol2"/>
            <nvpair id="testvol2_mnt-instance_attributes-directory"
name="directory" value="/archive/testvol2"/>
            <nvpair id="testvol2_mnt-instance_attributes-fstype"
name="fstype" value="gfs2"/>
            <nvpair id="testvol2_mnt-instance_attributes-options"
name="options" value="noatime,nodiratime"/>
          </instance_attributes>
          <operations>
            <op id="testvol2_mnt-monitor-interval-10s" interval="10s"
name="monitor" on-fail="fence"/>
            <op id="testvol2_mnt-notify-interval-0s" interval="0s"
name="notify" timeout="60s"/>
            <op id="testvol2_mnt-start-interval-0s" interval="0s"
name="start" timeout="60s"/>
            <op id="testvol2_mnt-stop-interval-0s" interval="0s"
name="stop" timeout="60s"/>
          </operations>
        </primitive>
        <meta_attributes id="testvol2_mnt-clone-meta_attributes">
          <nvpair id="testvol2_mnt-clone-meta_attributes-interleave"
name="interleave" value="true"/>
        </meta_attributes>
      </clone>
      <clone id="testvol3_mnt-clone">
        <primitive class="ocf" id="testvol3_mnt" provider="heartbeat"
type="Filesystem">
          <meta_attributes id="testvol3_mnt-meta_attributes"/>
          <instance_attributes id="testvol3_mnt-instance_attributes">
            <nvpair id="testvol3_mnt-instance_attributes-device"
name="device" value="/dev/testvol3/testvol3"/>
            <nvpair id="testvol3_mnt-instance_attributes-directory"
name="directory" value="/archive/testvol3"/>
            <nvpair id="testvol3_mnt-instance_attributes-fstype"
name="fstype" value="gfs2"/>
            <nvpair id="testvol3_mnt-instance_attributes-options"
name="options" value="noatime,nodiratime,context="system_u:object_r:httpd_sys_content_t:s0""/>
          </instance_attributes>
          <operations>
            <op id="testvol3_mnt-monitor-interval-10s" interval="10s"
name="monitor" on-fail="fence"/>
            <op id="testvol3_mnt-notify-interval-0s" interval="0s"
name="notify" timeout="60s"/>
            <op id="testvol3_mnt-start-interval-0s" interval="0s"
name="start" timeout="60s"/>
            <op id="testvol3_mnt-stop-interval-0s" interval="0s"
name="stop" timeout="60s"/>
          </operations>
        </primitive>
        <meta_attributes id="testvol3_mnt-clone-meta_attributes">
          <nvpair id="testvol3_mnt-clone-meta_attributes-interleave"
name="interleave" value="true"/>
        </meta_attributes>
      </clone>
    </resources>
    <constraints>
      <rsc_order first="dlm-clone" first-action="start"
id="order-dlm-clone-clvmd-clone-mandatory" then="clvmd-clone"
then-action="start"/>
      <rsc_colocation id="colocation-clvmd-clone-dlm-clone-INFINITY"
rsc="clvmd-clone" score="INFINITY" with-rsc="dlm-clone"/>
    </constraints>
  </configuration>
</cib>
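
One thing I notice in the constraints section is that I only have the
ordering/colocation between dlm-clone and clvmd-clone; nothing ties the
mount clones to clvmd-clone.  If that turns out to be part of the
problem, I assume the fix would look something like the lines below (I
am guessing at the exact pcs invocations), but I wanted to ask before
changing anything:

pcs constraint order start clvmd-clone then testvol2_mnt-clone
pcs constraint colocation add testvol2_mnt-clone with clvmd-clone
pcs constraint order start clvmd-clone then testvol3_mnt-clone
pcs constraint colocation add testvol3_mnt-clone with clvmd-clone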

