[ClusterLabs] Two-node cluster stops resources when second node is running alone

Thu Feb 20 12:05:58 EST 2020

I have a two-node cluster, meqfc0 and meqfc1.  When both nodes are up, the cluster will run OK on either meqfc0 or meqfc1.   

My practice for OS patching is to patch the inactive node, migrate, then patch the formerly active node.  Patching requires a reboot.

The cluster had been running peacefully with meqfc0 active, so I patched and rebooted meqfc1.  The cluster stayed active.

Then I migrated the resources to meqfc1.  The cluster stabilized and ran OK.

I patched and rebooted meqfc0.  As soon as it shut down, all the cluster resources on meqfc1 stopped.  The cluster was still up: crm status listed meqfc1 as online and meqfc0 as offline, but all the resources showed Stopped on meqfc1.

When meqfc0 finished rebooting and rejoined the cluster, the resources migrated themselves over to meqfc0 and started up.

It does not make sense to me that the cluster can run as A+B or as A alone, but not as B alone.

Basic configuration from cib.xml:

<cib crm_feature_set="3.0.13" validate-with="pacemaker-2.7" epoch="93" num_updates="0" admin_epoch="0" cib-last-written="Wed Feb 19 15:36:41 2020" update-origin="meqfc1" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.16-4.8-77ea74d"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="hacluster-uni"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
        <nvpair name="placement-strategy" value="balanced" id="cib-bootstrap-options-placement-strategy"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1582148201"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="meqfc0"/>
      <node id="2" uname="meqfc1"/>
    </nodes>
    <resources>
	(about 200 lines removed)
    </resources>
    <constraints/>
    <rsc_defaults>
      <meta_attributes id="rsc-options">
        <nvpair name="resource-stickiness" value="1" id="rsc-options-resource-stickiness"/>
        <nvpair name="migration-threshold" value="3" id="rsc-options-migration-threshold"/>
      </meta_attributes>
    </rsc_defaults>
    <op_defaults>
      <meta_attributes id="op-options">
        <nvpair name="timeout" value="600" id="op-options-timeout"/>
        <nvpair name="record-pending" value="true" id="op-options-record-pending"/>
      </meta_attributes>
    </op_defaults>
  </configuration>
</cib>

There is one anomalous entry in cib.xml, the line:

    <constraints/>

That syntax looks wrong to me.  Shouldn't there be an opening and a closing constraints tag?

Please advise, thank you.

John Reynolds