[ClusterLabs] iSCSI Target resource starts on both nodes -- despite my colocation constraint

Bryan K. Walton bwalton+1546953805 at leepfrog.com
Mon Jun 24 11:55:37 EDT 2019


We have a two-node cluster managing two iSCSI targets on top of DRBD.
Except for the fence agents, the resources are configured to reside on
only one node at a time. These are CentOS 7 servers running Pacemaker
version 1.1.19-8.el7_6.4.  DRBD is version 9.0.16-1.

Both iSCSI targets are served up by Pacemaker.  I configured these
servers so that the targets are 100% controlled by Pacemaker -- or at
least that was my intention.

My iSCSI targets are part of a resource group called ISCSIGroup.  I
have a colocation constraint that, I believe, restricts this group so
that it can only run on the node that hosts my DRBD resource. DRBD is
configured as primary/secondary.

-----

ISCSIGroup: storageLVM ISCSICentipedeIP ISCSIMillipedeIP targetRHEVM lunRHEVM targetVMStorage lunVMStorage

Colocation Constraints:
  ISCSIGroup with storageClusterClone (score:INFINITY) (with-rsc-role:Master)

-----
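
For reference, the constraint was created with pcs, roughly as follows
(a sketch from memory; the exact invocation may have differed):

-----

# Keep ISCSIGroup on whichever node holds the DRBD Master role:
pcs constraint colocation add ISCSIGroup with master storageClusterClone INFINITY

-----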

I was doing OS maintenance on the node that was NOT hosting the iSCSI
targets (storage2).  I took the following steps to do the updates: 

I put the cluster into maintenance mode and ran a yum update on
storage2. Then I took the cluster out of maintenance mode, which
completed successfully.

Next, I put storage2 into standby mode and stopped the cluster on it.
I then rebooted storage2. After storage2 came back online, I started
the cluster on it. All of these steps were successful. Then, as a
final step, I instructed Pacemaker to unstandby storage2. That is when
things went horribly wrong.
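
In pcs terms, the sequence was roughly the following (reconstructed
from memory rather than pasted from my shell history):

-----

pcs property set maintenance-mode=true    # whole cluster into maintenance
yum update                                # on storage2
pcs property set maintenance-mode=false   # out of maintenance again

pcs cluster standby storage2              # put storage2 in standby
pcs cluster stop                          # run on storage2
reboot                                    # reboot storage2
pcs cluster start                         # run on storage2 once it was back
pcs cluster unstandby storage2            # this is where it went wrong

-----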

What happened is that storage1 (the primary node) complained that the
probe (monitor) actions for the iSCSI targets had failed on storage2.
The error, "target: 7 vs. rc: 0", means the cluster expected the probes
to report the targets as not running (OCF return code 7), but they came
back as already running (return code 0) on storage2.
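
In case it is relevant, here is what I could check on storage2 to see
whether the targets exist outside of Pacemaker's control (a sketch,
assuming the LIO/targetcli stack that the iSCSITarget agent drives):

-----

# The stock target.service restores /etc/target/saveconfig.json at
# boot, outside of Pacemaker -- is it enabled?
systemctl is-enabled target.service

# List any iSCSI targets currently configured in the kernel:
targetcli ls /iscsi

-----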

Are my constraints not sufficient?  I would really appreciate some
input on what I've got wrong, or what is missing, that would cause the
target resources to start on both nodes.

Thanks,
Bryan

Here are the logs from storage1, which was the node hosting the resources:

Jun 20 11:48:35 storage1 crmd[240695]:  notice: Initiating monitor operation targetRHEVM_monitor_0 on storage2
Jun 20 11:48:35 storage1 crmd[240695]:  notice: Initiating monitor operation lunRHEVM_monitor_0 on storage2
Jun 20 11:48:35 storage1 crmd[240695]:  notice: Initiating monitor operation targetVMStorage_monitor_0 on storage2
Jun 20 11:48:35 storage1 crmd[240695]:  notice: Initiating monitor operation lunVMStorage_monitor_0 on storage2
Jun 20 11:48:36 storage1 crmd[240695]: warning: Action 18 (targetRHEVM_monitor_0) on storage2 failed (target: 7 vs. rc: 0): Error
Jun 20 11:48:36 storage1 crmd[240695]:  notice: Transition aborted by operation targetRHEVM_monitor_0 'modify' on storage2: Event failed
Jun 20 11:48:36 storage1 crmd[240695]: warning: Action 20 (targetVMStorage_monitor_0) on storage2 failed (target: 7 vs. rc: 0): Error
Jun 20 11:48:36 storage1 crmd[240695]:  notice: Transition 1 (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1054.bz2): Complete
Jun 20 11:48:36 storage1 pengine[240694]:   error: Resource targetRHEVM is active on 2 nodes (attempting recovery)
Jun 20 11:48:36 storage1 pengine[240694]:  notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Jun 20 11:48:36 storage1 pengine[240694]:   error: Resource targetVMStorage is active on 2 nodes (attempting recovery)
Jun 20 11:48:36 storage1 pengine[240694]:  notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Jun 20 11:48:36 storage1 pengine[240694]:  notice:  * Move       targetRHEVM     ( storage2 -> storage1 )
Jun 20 11:48:36 storage1 pengine[240694]:  notice:  * Restart    lunRHEVM        (             storage1 )   due to required targetRHEVM start
Jun 20 11:48:36 storage1 pengine[240694]:  notice:  * Move       targetVMStorage     ( storage2 -> storage1 )
Jun 20 11:48:36 storage1 pengine[240694]:  notice:  * Restart    lunVMStorage        (             storage1 )   due to required targetVMStorage start
Jun 20 11:48:36 storage1 pengine[240694]:   error: Calculated transition 2 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-7.bz2
Jun 20 11:48:36 storage1 crmd[240695]:  notice: Initiating stop operation lunVMStorage_stop_0 locally on storage1
Jun 20 11:48:36 storage1 iSCSILogicalUnit(lunVMStorage)[41725]: INFO: Deleted LUN 1.
Jun 20 11:48:37 storage1 iSCSILogicalUnit(lunVMStorage)[41725]: INFO: Deleted storage object lunVMStorage.
Jun 20 11:48:37 storage1 crmd[240695]:  notice: Result of stop operation for lunVMStorage on storage1: 0 (ok)
Jun 20 11:48:37 storage1 crmd[240695]:  notice: Initiating stop operation targetVMStorage_stop_0 locally on storage1
Jun 20 11:48:37 storage1 crmd[240695]:  notice: Initiating stop operation targetVMStorage_stop_0 on storage2
Jun 20 11:48:37 storage1 iSCSITarget(targetVMStorage)[41806]: INFO: Deleted Target iqn.2019-03.com.leepfrog:storage.vmstorage.
Jun 20 11:48:37 storage1 crmd[240695]:  notice: Result of stop operation for targetVMStorage on storage1: 0 (ok)
Jun 20 11:48:37 storage1 crmd[240695]:  notice: Initiating stop operation lunRHEVM_stop_0 locally on storage1
Jun 20 11:48:37 storage1 iSCSILogicalUnit(lunRHEVM)[41863]: INFO: Deleted LUN 0.
Jun 20 11:48:38 storage1 iSCSILogicalUnit(lunRHEVM)[41863]: INFO: Deleted storage object lunRHEVM.
Jun 20 11:48:38 storage1 crmd[240695]:  notice: Result of stop operation for lunRHEVM on storage1: 0 (ok)
Jun 20 11:48:38 storage1 crmd[240695]:  notice: Initiating stop operation targetRHEVM_stop_0 locally on storage1
Jun 20 11:48:38 storage1 crmd[240695]:  notice: Initiating stop operation targetRHEVM_stop_0 on storage2
Jun 20 11:48:38 storage1 iSCSITarget(targetRHEVM)[42032]: INFO: Deleted Target iqn.2019-02.com.leepfrog:storage.rhevm.
Jun 20 11:48:38 storage1 crmd[240695]:  notice: Result of stop operation for targetRHEVM on storage1: 0 (ok)
Jun 20 11:48:38 storage1 crmd[240695]:  notice: Initiating start operation targetRHEVM_start_0 locally on storage1
Jun 20 11:48:38 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Parameter auto_add_default_portal is now 'false'.
Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created target iqn.2019-02.com.leepfrog:storage.rhevm. Created TPG 1.
Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created Node ACL for iqn.1994-05.com.redhat:3d066d1f423e
Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created Node ACL for iqn.1994-05.com.redhat:84f0f7458c58
Jun 20 11:48:39 storage1 crmd[240695]:  notice: Result of start operation for targetRHEVM on storage1: 0 (ok)
Jun 20 11:48:39 storage1 crmd[240695]:  notice: Initiating monitor operation targetRHEVM_monitor_10000 locally on storage1
Jun 20 11:48:39 storage1 crmd[240695]:  notice: Initiating start operation lunRHEVM_start_0 locally on storage1
Jun 20 11:48:39 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: INFO: Created block storage object lunRHEVM using /dev/storage/rhevm.
Jun 20 11:48:40 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: INFO: Created LUN 0. Created LUN 0->0 mapping in node ACL iqn.1994-05.com.redhat:84f0f7458c58 Created LUN 0->0 mapping in node ACL iqn.1994-05.com.redhat:3d066d1f423e
Jun 20 11:48:40 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: ERROR: No such NetworkPortal in configfs: /sys/kernel/config/target/iscsi/iqn.2019-02.com.leepfrog:storage.rhevm/tpgt_1/np/0.0.0.0:3260
Jun 20 11:48:40 storage1 crmd[240695]:  notice: Result of start operation for lunRHEVM on storage1: 0 (ok)
Jun 20 11:48:40 storage1 crmd[240695]:  notice: Initiating monitor operation lunRHEVM_monitor_10000 locally on storage1
Jun 20 11:48:40 storage1 crmd[240695]:  notice: Initiating start operation targetVMStorage_start_0 locally on storage1
Jun 20 11:48:40 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Parameter auto_add_default_portal is now 'false'.
Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created target iqn.2019-03.com.leepfrog:storage.vmstorage. Created TPG 1.
Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created Node ACL for iqn.1994-05.com.redhat:3d066d1f423e
Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created Node ACL for iqn.1994-05.com.redhat:84f0f7458c58
Jun 20 11:48:41 storage1 crmd[240695]:  notice: Result of start operation for targetVMStorage on storage1: 0 (ok)
Jun 20 11:48:41 storage1 crmd[240695]:  notice: Initiating monitor operation targetVMStorage_monitor_10000 locally on storage1
Jun 20 11:48:41 storage1 crmd[240695]:  notice: Initiating start operation lunVMStorage_start_0 locally on storage1
Jun 20 11:48:41 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: INFO: Created block storage object lunVMStorage using /dev/storage/vmstorage.
Jun 20 11:48:42 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: INFO: Created LUN 1. Created LUN 1->1 mapping in node ACL iqn.1994-05.com.redhat:84f0f7458c58 Created LUN 1->1 mapping in node ACL iqn.1994-05.com.redhat:3d066d1f423e
Jun 20 11:48:42 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: ERROR: No such NetworkPortal in configfs: /sys/kernel/config/target/iscsi/iqn.2019-03.com.leepfrog:storage.vmstorage/tpgt_1/np/0.0.0.0:3260
Jun 20 11:48:42 storage1 lrmd[240692]:  notice: lunVMStorage_start_0:42555:stderr [ /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line 410: /sys/kernel/config/target/core/iblock_0/lunVMStorage/wwn/vpd_unit_serial: No such file or directory ]
Jun 20 11:48:42 storage1 crmd[240695]:  notice: Result of start operation for lunVMStorage on storage1: 0 (ok)
Jun 20 11:48:42 storage1 crmd[240695]:  notice: Initiating monitor operation lunVMStorage_monitor_10000 locally on storage1
Jun 20 11:48:42 storage1 crmd[240695]:  notice: Transition 2 (Complete=18, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-7.bz2): Complete
Jun 20 11:48:42 storage1 crmd[240695]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

Here are the logs from storage2, which was coming back online:

Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for ISCSIMillipedeIP on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for ISCSICentipedeIP on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for targetRHEVM on storage2: 0 (ok)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for targetVMStorage on storage2: 0 (ok)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for lunRHEVM on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for lunVMStorage on storage2: 7 (not running)



-- 
Bryan K. Walton                                           319-337-3877 
Linux Systems Administrator                 Leepfrog Technologies, Inc 

