[ClusterLabs] iSCSI Target resource starts on both nodes -- despite my colocation constraint

Ken Gaillot kgaillot at redhat.com
Mon Jun 24 13:02:59 EDT 2019


On Mon, 2019-06-24 at 10:55 -0500, Bryan K. Walton wrote:
> We have a two-node cluster managing two iSCSI targets on top of DRBD.
> Except for the fence agents, the resources are configured to reside
> on only one node at a time. These are CentOS 7 servers running
> Pacemaker version 1.1.19-8.el7_6.4; DRBD is version 9.0.16-1.
> 
> We have two iSCSI targets that are served up by Pacemaker. I
> configured these servers so that these targets are 100% controlled
> by Pacemaker -- or at least that was my intention.
> 
> My iSCSI targets are part of a resource group called ISCSIGroup. I
> have a colocation constraint that I believe restricts this group so
> that it can only run on the same node that hosts my DRBD resource.
> DRBD is configured as primary/secondary.
> 
> -----
> 
> ISCSIGroup: storageLVM ISCSICentipedeIP ISCSIMillipedeIP targetRHEVM lunRHEVM targetVMStorage lunVMStorage
> 
> Colocation Constraints:
>   ISCSIGroup with storageClusterClone (score:INFINITY) (with-rsc-role:Master)
> 
> -----
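> 
> For reference, a constraint like that is typically created with
> something like the following (pcs syntax, using the resource names
> above -- a sketch, not a transcript of my actual commands):
> 
> -----
> 
> # Keep the iSCSI group on whichever node holds the DRBD Master, and
> # start it only after the promotion has happened:
> pcs constraint colocation add ISCSIGroup with master storageClusterClone INFINITY
> pcs constraint order promote storageClusterClone then start ISCSIGroup
> 
> -----
> 
> (The ordering constraint is included in the sketch because colocation
> alone controls only where the group runs, not when it starts relative
> to the promotion.)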
> 
> I was doing OS maintenance on the node that was NOT hosting the iSCSI
> targets (storage2).  I took the following steps to do the updates: 
> 
> I put the cluster into maintenance mode and did a yum update on
> storage2. Then I took the cluster out of maintenance mode, which
> completed successfully.
> 
> Next, I put storage2 into standby mode and stopped the cluster on it.
> I then rebooted storage2. After storage2 came back online, I started
> the cluster on it. All of these steps were successful. Then, as a
> final step, I instructed pacemaker to unstandby storage2. That is
> when things went horribly wrong.
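> 
> In pcs terms, that sequence was roughly the following (a sketch of
> the commands, not a transcript):
> 
> -----
> 
> pcs property set maintenance-mode=true
> yum update                        # on storage2
> pcs property set maintenance-mode=false
> pcs cluster standby storage2
> pcs cluster stop storage2
> reboot                            # storage2
> pcs cluster start storage2        # after storage2 is back up
> pcs cluster unstandby storage2    # <-- this is where things went wrong
> 
> -----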
> 
> What happened is that storage1 (the primary node) complained that the
> monitor action for the iSCSI targets failed on storage2. The error
> suggests that they were already running on storage2.
> 
> Are my constraints not sufficient? I would really appreciate some
> input on what I've got wrong, or what is missing, that would cause
> the target resources to start on both nodes.
> 
> Thanks,
> Bryan
> 
> Here are logs from storage1, which was the node hosting the resource:
> 
> Jun 20 11:48:35 storage1 crmd[240695]:  notice: Initiating monitor operation targetRHEVM_monitor_0 on storage2
> Jun 20 11:48:35 storage1 crmd[240695]:  notice: Initiating monitor operation lunRHEVM_monitor_0 on storage2
> Jun 20 11:48:35 storage1 crmd[240695]:  notice: Initiating monitor operation targetVMStorage_monitor_0 on storage2
> Jun 20 11:48:35 storage1 crmd[240695]:  notice: Initiating monitor operation lunVMStorage_monitor_0 on storage2
> Jun 20 11:48:36 storage1 crmd[240695]: warning: Action 18 (targetRHEVM_monitor_0) on storage2 failed (target: 7 vs. rc: 0): Error
> Jun 20 11:48:36 storage1 crmd[240695]:  notice: Transition aborted by operation targetRHEVM_monitor_0 'modify' on storage2: Event failed
> Jun 20 11:48:36 storage1 crmd[240695]: warning: Action 20 (targetVMStorage_monitor_0) on storage2 failed (target: 7 vs. rc: 0): Error
> Jun 20 11:48:36 storage1 crmd[240695]:  notice: Transition 1 (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1054.bz2): Complete
> Jun 20 11:48:36 storage1 pengine[240694]:   error: Resource targetRHEVM is active on 2 nodes (attempting recovery)

This means that pacemaker found the target active on storage2 *without*
having scheduled it there -- so either something outside pacemaker is
setting up the target, or (less likely) the agent is returning the
wrong status.
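
On CentOS 7, the usual culprit is the target.service unit shipped
with targetcli, which restores a saved LIO configuration from
/etc/target/saveconfig.json at boot. Something like this on storage2
should confirm or rule that out (a sketch, assuming the stock
packaging):

  # Is the OS restoring a saved target config at boot,
  # behind pacemaker's back?
  systemctl is-enabled target.service
  ls -l /etc/target/saveconfig.json

  # What is configured in the kernel target right now?
  targetcli ls

  # If it is enabled, disable it so only pacemaker manages the targets:
  systemctl disable target.service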

> Jun 20 11:48:36 storage1 pengine[240694]:  notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> Jun 20 11:48:36 storage1 pengine[240694]:   error: Resource targetVMStorage is active on 2 nodes (attempting recovery)
> Jun 20 11:48:36 storage1 pengine[240694]:  notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> Jun 20 11:48:36 storage1 pengine[240694]:  notice:  * Move       targetRHEVM     ( storage2 -> storage1 )
> Jun 20 11:48:36 storage1 pengine[240694]:  notice:  * Restart    lunRHEVM        (             storage1 )   due to required targetRHEVM start
> Jun 20 11:48:36 storage1 pengine[240694]:  notice:  * Move       targetVMStorage     ( storage2 -> storage1 )
> Jun 20 11:48:36 storage1 pengine[240694]:  notice:  * Restart    lunVMStorage        (             storage1 )   due to required targetVMStorage start
> Jun 20 11:48:36 storage1 pengine[240694]:   error: Calculated transition 2 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-7.bz2
> Jun 20 11:48:36 storage1 crmd[240695]:  notice: Initiating stop operation lunVMStorage_stop_0 locally on storage1
> Jun 20 11:48:36 storage1 iSCSILogicalUnit(lunVMStorage)[41725]: INFO: Deleted LUN 1.
> Jun 20 11:48:37 storage1 iSCSILogicalUnit(lunVMStorage)[41725]: INFO: Deleted storage object lunVMStorage.
> Jun 20 11:48:37 storage1 crmd[240695]:  notice: Result of stop operation for lunVMStorage on storage1: 0 (ok)
> Jun 20 11:48:37 storage1 crmd[240695]:  notice: Initiating stop operation targetVMStorage_stop_0 locally on storage1
> Jun 20 11:48:37 storage1 crmd[240695]:  notice: Initiating stop operation targetVMStorage_stop_0 on storage2
> Jun 20 11:48:37 storage1 iSCSITarget(targetVMStorage)[41806]: INFO: Deleted Target iqn.2019-03.com.leepfrog:storage.vmstorage.
> Jun 20 11:48:37 storage1 crmd[240695]:  notice: Result of stop operation for targetVMStorage on storage1: 0 (ok)
> Jun 20 11:48:37 storage1 crmd[240695]:  notice: Initiating stop operation lunRHEVM_stop_0 locally on storage1
> Jun 20 11:48:37 storage1 iSCSILogicalUnit(lunRHEVM)[41863]: INFO: Deleted LUN 0.
> Jun 20 11:48:38 storage1 iSCSILogicalUnit(lunRHEVM)[41863]: INFO: Deleted storage object lunRHEVM.
> Jun 20 11:48:38 storage1 crmd[240695]:  notice: Result of stop operation for lunRHEVM on storage1: 0 (ok)
> Jun 20 11:48:38 storage1 crmd[240695]:  notice: Initiating stop operation targetRHEVM_stop_0 locally on storage1
> Jun 20 11:48:38 storage1 crmd[240695]:  notice: Initiating stop operation targetRHEVM_stop_0 on storage2
> Jun 20 11:48:38 storage1 iSCSITarget(targetRHEVM)[42032]: INFO: Deleted Target iqn.2019-02.com.leepfrog:storage.rhevm.
> Jun 20 11:48:38 storage1 crmd[240695]:  notice: Result of stop operation for targetRHEVM on storage1: 0 (ok)
> Jun 20 11:48:38 storage1 crmd[240695]:  notice: Initiating start operation targetRHEVM_start_0 locally on storage1
> Jun 20 11:48:38 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Parameter auto_add_default_portal is now 'false'.
> Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created target iqn.2019-02.com.leepfrog:storage.rhevm. Created TPG 1.
> Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created Node ACL for iqn.1994-05.com.redhat:3d066d1f423e
> Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created Node ACL for iqn.1994-05.com.redhat:84f0f7458c58
> Jun 20 11:48:39 storage1 crmd[240695]:  notice: Result of start operation for targetRHEVM on storage1: 0 (ok)
> Jun 20 11:48:39 storage1 crmd[240695]:  notice: Initiating monitor operation targetRHEVM_monitor_10000 locally on storage1
> Jun 20 11:48:39 storage1 crmd[240695]:  notice: Initiating start operation lunRHEVM_start_0 locally on storage1
> Jun 20 11:48:39 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: INFO: Created block storage object lunRHEVM using /dev/storage/rhevm.
> Jun 20 11:48:40 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: INFO: Created LUN 0. Created LUN 0->0 mapping in node ACL iqn.1994-05.com.redhat:84f0f7458c58 Created LUN 0->0 mapping in node ACL iqn.1994-05.com.redhat:3d066d1f423e
> Jun 20 11:48:40 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: ERROR: No such NetworkPortal in configfs: /sys/kernel/config/target/iscsi/iqn.2019-02.com.leepfrog:storage.rhevm/tpgt_1/np/0.0.0.0:3260
> Jun 20 11:48:40 storage1 crmd[240695]:  notice: Result of start operation for lunRHEVM on storage1: 0 (ok)
> Jun 20 11:48:40 storage1 crmd[240695]:  notice: Initiating monitor operation lunRHEVM_monitor_10000 locally on storage1
> Jun 20 11:48:40 storage1 crmd[240695]:  notice: Initiating start operation targetVMStorage_start_0 locally on storage1
> Jun 20 11:48:40 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Parameter auto_add_default_portal is now 'false'.
> Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created target iqn.2019-03.com.leepfrog:storage.vmstorage. Created TPG 1.
> Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created Node ACL for iqn.1994-05.com.redhat:3d066d1f423e
> Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created Node ACL for iqn.1994-05.com.redhat:84f0f7458c58
> Jun 20 11:48:41 storage1 crmd[240695]:  notice: Result of start operation for targetVMStorage on storage1: 0 (ok)
> Jun 20 11:48:41 storage1 crmd[240695]:  notice: Initiating monitor operation targetVMStorage_monitor_10000 locally on storage1
> Jun 20 11:48:41 storage1 crmd[240695]:  notice: Initiating start operation lunVMStorage_start_0 locally on storage1
> Jun 20 11:48:41 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: INFO: Created block storage object lunVMStorage using /dev/storage/vmstorage.
> Jun 20 11:48:42 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: INFO: Created LUN 1. Created LUN 1->1 mapping in node ACL iqn.1994-05.com.redhat:84f0f7458c58 Created LUN 1->1 mapping in node ACL iqn.1994-05.com.redhat:3d066d1f423e
> Jun 20 11:48:42 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: ERROR: No such NetworkPortal in configfs: /sys/kernel/config/target/iscsi/iqn.2019-03.com.leepfrog:storage.vmstorage/tpgt_1/np/0.0.0.0:3260
> Jun 20 11:48:42 storage1 lrmd[240692]:  notice: lunVMStorage_start_0:42555:stderr [ /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line 410: /sys/kernel/config/target/core/iblock_0/lunVMStorage/wwn/vpd_unit_serial: No such file or directory ]
> Jun 20 11:48:42 storage1 crmd[240695]:  notice: Result of start operation for lunVMStorage on storage1: 0 (ok)
> Jun 20 11:48:42 storage1 crmd[240695]:  notice: Initiating monitor operation lunVMStorage_monitor_10000 locally on storage1
> Jun 20 11:48:42 storage1 crmd[240695]:  notice: Transition 2 (Complete=18, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-7.bz2): Complete
> Jun 20 11:48:42 storage1 crmd[240695]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> 
> Here are the logs from storage2, which was coming back online:
> 
> Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for ISCSIMillipedeIP on storage2: 7 (not running)
> Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for ISCSICentipedeIP on storage2: 7 (not running)
> Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for targetRHEVM on storage2: 0 (ok)
> Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for targetVMStorage on storage2: 0 (ok)
> Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for lunRHEVM on storage2: 7 (not running)
> Jun 20 11:48:36 storage2 crmd[22305]:  notice: Result of probe operation for lunVMStorage on storage2: 7 (not running)
> 
> 
> 
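The probe results above fit that picture: the probes found both
targets already configured on storage2 (rc 0 where the cluster
expected 7, "not running"), while the LUNs and IPs were correctly
absent. If you want to see exactly what the scheduler was reacting
to, you can replay the saved transition with crm_simulate, e.g.:

  crm_simulate -S -x /var/lib/pacemaker/pengine/pe-error-7.bz2
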
-- 
Ken Gaillot <kgaillot at redhat.com>


