[ClusterLabs] Pacemaker not starting ISCSI LUNs and Targets

Octavian Ciobanu coctavian1979 at gmail.com
Sat Aug 26 12:41:55 UTC 2017


Hey John.

I also encountered the same error message "ERROR: This Target already
exists in configFS" a while back. When I ran targetcli and listed the
configuration contents I could see the target in the iscsi folder; it had
been left there by a forced reboot of the node.
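
For reference, this is roughly how I checked for the leftover target (the
commands below are just an illustration of that check, not copied from my
notes):

  # list the targets targetcli currently knows about
  targetcli ls /iscsi
  # or look at configFS directly
  ls /sys/kernel/config/target/iscsi/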

To work around it I added the line "ocf_run targetcli /iscsi delete
${OCF_RESKEY_iqn}" to /usr/lib/ocf/resource.d/heartbeat/iSCSITarget at line
330, just before "ocf_run targetcli /iscsi create ${OCF_RESKEY_iqn} || exit
$OCF_ERR_GENERIC". That command deletes the target that is about to be
created if it already exists.
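
A minimal sketch of the patched spot, assuming the lio-t start branch of the
agent looks roughly like this (only the delete line is new; the surrounding
line is paraphrased from the agent):

  # workaround: remove a stale target left behind by a forced reboot;
  # on a clean node this only logs an error and the script continues,
  # because the delete is not followed by "|| exit"
  ocf_run targetcli /iscsi delete ${OCF_RESKEY_iqn}
  # existing line: create the target, abort the start action on failure
  ocf_run targetcli /iscsi create ${OCF_RESKEY_iqn} || exit $OCF_ERR_GENERIC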

I hope this workaround will help you with your issue until a valid solution
is available.

Best regards
Octavian Ciobanu

On Tue, Aug 22, 2017 at 12:19 AM, John Keates <john at keates.nl> wrote:

> Hi,
>
> I have a strange issue where LIO-T based iSCSI targets and LUNs simply don’t
> work most of the time. They either don’t start, or bounce around until no
> more nodes are left to try.
> The less-than-useful information in the logs looks like this:
>
> Aug 21 22:49:06 [10531] storage-1-prod    pengine:  warning:
> check_migration_threshold: Forcing iscsi0-target away from storage-1-prod
> after 1000000 failures (max=1000000)
>
> Aug 21 22:54:47 storage-1-prod crmd[2757]:   notice: Result of start
> operation for ip-iscsi0-vlan40 on storage-1-prod: 0 (ok)
> Aug 21 22:54:47 storage-1-prod iSCSITarget(iscsi0-target)[5427]: WARNING:
> Configuration parameter "tid" is not supported by the iSCSI implementation
> and will be ignored.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO:
> Parameter auto_add_default_portal is now 'false'.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO:
> Created target iqn.2017-08.acccess.net:prod-1-ha. Created TPG 1.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: ERROR:
> This Target already exists in configFS
> Aug 21 22:54:48 storage-1-prod crmd[2757]:   notice: Result of start
> operation for iscsi0-target on storage-1-prod: 1 (unknown error)
> Aug 21 22:54:49 storage-1-prod iSCSITarget(iscsi0-target)[5536]: INFO:
> Deleted Target iqn.2017-08.access.net:prod-1-ha.
> Aug 21 22:54:49 storage-1-prod crmd[2757]:   notice: Result of stop
> operation for iscsi0-target on storage-1-prod: 0 (ok)
>
> Now, the unknown error actually seems to be an error from targetcli:
> “This Target already exists in configFS”. Checking with targetcli shows
> zero configured items on either node.
> Manually starting the target gives:
>
>
> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-target
> Error performing operation: Operation not permitted
> Operation start for iscsi0-target (ocf:heartbeat:iSCSITarget) returned 1
>  >  stderr: WARNING: Configuration parameter "tid" is not supported by the
> iSCSI implementation and will be ignored.
>  >  stderr: INFO: Parameter auto_add_default_portal is now 'false'.
>  >  stderr: INFO: Created target iqn.2017-08.access.net:prod-1-ha.
> Created TPG 1.
>  >  stderr: ERROR: This Target already exists in configFS
>
> but now targetcli shows at least the target. Checking with crm status
> still shows the target as stopped.
> Manually starting the LUNs gives:
>
>
> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun0
> Operation start for iscsi0-lun0 (ocf:heartbeat:iSCSILogicalUnit) returned
> 0
>  >  stderr: INFO: Created block storage object iscsi0-lun0 using
> /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-root.
>  >  stderr: INFO: Created LUN 0.
>  >  stderr: DEBUG: iscsi0-lun0 start : 0
> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun1
> Operation start for iscsi0-lun1 (ocf:heartbeat:iSCSILogicalUnit) returned
> 0
>  >  stderr: INFO: Created block storage object iscsi0-lun1 using
> /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-swap.
>  >  stderr: /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line 378:
> /sys/kernel/config/target/core/iblock_0/iscsi0-lun1/wwn/vpd_unit_serial:
> No such file or directory
>  >  stderr: INFO: Created LUN 1.
>  >  stderr: DEBUG: iscsi0-lun1 start : 0
>
> So the second LUN seems to have had some bad parameters created by the
> iSCSILogicalUnit script. Checking with targetcli, however, shows both LUNs
> and the target up and running.
> Checking again with crm status (and pcs status) shows all three resources
> still stopped. Since the LUNs are colocated with the target and the target
> still has fail counts, I clear them with:
>
> sudo pcs resource cleanup iscsi0-target
>
> Now the LUNs and target are all active in crm status / pcs status. But
> it’s quite a manual process to get this to work! I’m thinking either my
> configuration is bad or there is some bug somewhere in targetcli / LIO or
> the iSCSI heartbeat script.
> On top of all the manual work, any action instantly breaks it again: a move,
> failover, reboot, etc. Everything else (the underlying ZFS pool, the DRBD
> device, the IPv4 IPs, etc.) moves just fine; it’s only the iSCSI that’s
> being problematic.
>
> Concrete questions:
>
> - Is my config bad?
> - Is there a known issue with ISCSI? (I have only found old references
> about ordering)
>
> I have attached the output of crm config show as cib.txt, and the status
> after a fresh boot of both nodes is:
>
> Current DC: storage-2-prod (version 1.1.16-94ff4df) - partition with quorum
> Last updated: Mon Aug 21 22:55:05 2017
> Last change: Mon Aug 21 22:36:23 2017 by root via cibadmin on
> storage-1-prod
>
> 2 nodes configured
> 21 resources configured
>
> Online: [ storage-1-prod storage-2-prod ]
>
> Full list of resources:
>
>  ip-iscsi0-vlan10       (ocf::heartbeat:IPaddr2):       Started
> storage-1-prod
>  ip-iscsi0-vlan20       (ocf::heartbeat:IPaddr2):       Started
> storage-1-prod
>  ip-iscsi0-vlan30       (ocf::heartbeat:IPaddr2):       Started
> storage-1-prod
>  ip-iscsi0-vlan40       (ocf::heartbeat:IPaddr2):       Started
> storage-1-prod
>  Master/Slave Set: drbd_master_slave0 [drbd_disk0]
>      Masters: [ storage-1-prod ]
>      Slaves: [ storage-2-prod ]
>  Master/Slave Set: drbd_master_slave1 [drbd_disk1]
>      Masters: [ storage-2-prod ]
>      Slaves: [ storage-1-prod ]
>  ip-iscsi1-vlan10       (ocf::heartbeat:IPaddr2):       Started
> storage-2-prod
>  ip-iscsi1-vlan20       (ocf::heartbeat:IPaddr2):       Started
> storage-2-prod
>  ip-iscsi1-vlan30       (ocf::heartbeat:IPaddr2):       Started
> storage-2-prod
>  ip-iscsi1-vlan40       (ocf::heartbeat:IPaddr2):       Started
> storage-2-prod
>  st-storage-1-prod      (stonith:meatware):     Started storage-2-prod
>  st-storage-2-prod      (stonith:meatware):     Started storage-1-prod
>  zfs-iscsipool0 (ocf::heartbeat:ZFS):   Started storage-1-prod
>  zfs-iscsipool1 (ocf::heartbeat:ZFS):   Started storage-2-prod
>  iscsi0-lun0    (ocf::heartbeat:iSCSILogicalUnit):      Stopped
>  iscsi0-lun1    (ocf::heartbeat:iSCSILogicalUnit):      Stopped
>  iscsi0-target  (ocf::heartbeat:iSCSITarget):   Stopped
>  Clone Set: dlm-clone [dlm]
>      Started: [ storage-1-prod storage-2-prod ]
>
> Failed Actions:
> * iscsi0-target_start_0 on storage-2-prod 'unknown error' (1): call=99,
> status=complete, exitreason='none',
>     last-rc-change='Mon Aug 21 22:54:49 2017', queued=0ms, exec=954ms
> * iscsi0-target_start_0 on storage-1-prod 'unknown error' (1): call=98,
> status=complete, exitreason='none',
>     last-rc-change='Mon Aug 21 22:54:47 2017', queued=0ms, exec=1062ms
>
> Regards,
> John
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>