[ClusterLabs] Pacemaker not starting ISCSI LUNs and Targets

John Keates john at keates.nl
Sat Aug 26 11:25:41 EDT 2017


Hey Octavian,

I did that too, and it worked. Afterwards, I did some more checking to see why the control flow ended up there, since the script works fine with tgtd and plain LIO (the non-lio-t variant).
It seems the agent has a slight implementation issue: the for loop over the portals fails if no portals are listed, or if the only one listed equals the default value. Setting just one portal doesn’t help either, for some reason (which I don’t understand, since the loop would run exactly once). When I manually configure 2 or more target portals it works just fine.

This is something I can work around since I use SaltStack to configure the CIB, so I just pull the IP addresses off the interfaces I want to use and stick them in a space-delimited list as the portals.
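Concretely, the workaround just means always passing an explicit portal list to the target resource, something along these lines (the resource name, IQN and addresses here are placeholders, not my actual values):

  pcs resource create iscsi0-target ocf:heartbeat:iSCSITarget \
      implementation=lio-t \
      iqn="iqn.2017-08.example.net:prod-1-ha" \
      portals="10.0.10.10:3260 10.0.20.10:3260" \
      op monitor interval=15s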

I created an issue about it at https://github.com/ClusterLabs/resource-agents/issues/1026

At the same time, I’m still not sure about my configuration. It seems I have way more location, colocation and ordering constraints than most configs I come across.
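For illustration, the kind of thing I mean per target is roughly the following (just a sketch using the resource names from the status output further down, not a copy of my actual cib.txt; the scores and exact ordering are precisely the part I’m unsure about):

  pcs constraint colocation add iscsi0-target with master drbd_master_slave0 INFINITY
  pcs constraint order promote drbd_master_slave0 then start iscsi0-target
  pcs constraint colocation add iscsi0-lun0 with iscsi0-target INFINITY
  pcs constraint order start iscsi0-target then start iscsi0-lun0
  pcs constraint colocation add ip-iscsi0-vlan40 with iscsi0-target INFINITY
  pcs constraint order start iscsi0-lun0 then start ip-iscsi0-vlan40

Multiply that by the ZFS pools, the second LUN and the other three VLAN IPs per target and the constraint count adds up quickly.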

Kind regards,
John Keates


> On 26 Aug 2017, at 14:41, Octavian Ciobanu <coctavian1979 at gmail.com> wrote:
> 
> Hey John.
> 
> I also encountered the same error message "ERROR: This Target already exists in configFS" a while back, and when I ran targetcli and listed its configuration contents I could see the target in the iscsi folder. That turned out to be due to a forced reboot of the node.
> 
> To solve it I made a workaround by adding the line "ocf_run targetcli /iscsi delete ${OCF_RESKEY_iqn}" to /usr/lib/ocf/resource.d/heartbeat/iSCSITarget at line 330, just before "ocf_run targetcli /iscsi create ${OCF_RESKEY_iqn} || exit $OCF_ERR_GENERIC". That command deletes the target to be created if it already exists.
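> In context, the start path then reads roughly like this (only the delete line is added, the create line is already there; the delete just logs an error and does not abort the start if the target does not exist yet):
> 
>     ocf_run targetcli /iscsi delete ${OCF_RESKEY_iqn}
>     ocf_run targetcli /iscsi create ${OCF_RESKEY_iqn} || exit $OCF_ERR_GENERIC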
> 
> I hope this workaround will help you with your issue until a valid solution is available.
> 
> Best regards
> Octavian Ciobanu
> 
> On Tue, Aug 22, 2017 at 12:19 AM, John Keates <john at keates.nl> wrote:
> Hi,
> 
> I have a strange issue where LIO-T based iSCSI targets and LUNs simply don’t work most of the time. They either don’t start, or bounce around until there are no more nodes left to try.
> The less-than-useful information in the logs looks like this:
> 
> Aug 21 22:49:06 [10531] storage-1-prod    pengine:  warning: check_migration_threshold: Forcing iscsi0-target away from storage-1-prod after 1000000 failures (max=1000000)
> 
> Aug 21 22:54:47 storage-1-prod crmd[2757]:   notice: Result of start operation for ip-iscsi0-vlan40 on storage-1-prod: 0 (ok)
> Aug 21 22:54:47 storage-1-prod iSCSITarget(iscsi0-target)[5427]: WARNING: Configuration parameter "tid" is not supported by the iSCSI implementation and will be ignored.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO: Parameter auto_add_default_portal is now 'false'.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO: Created target iqn.2017-08.access.net:prod-1-ha. Created TPG 1.
> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: ERROR: This Target already exists in configFS
> Aug 21 22:54:48 storage-1-prod crmd[2757]:   notice: Result of start operation for iscsi0-target on storage-1-prod: 1 (unknown error)
> Aug 21 22:54:49 storage-1-prod iSCSITarget(iscsi0-target)[5536]: INFO: Deleted Target iqn.2017-08.access.net:prod-1-ha.
> Aug 21 22:54:49 storage-1-prod crmd[2757]:   notice: Result of stop operation for iscsi0-target on storage-1-prod: 0 (ok)
> 
> Now, the unknown error actually seems to be a targetcli error: "This Target already exists in configFS". Checking with targetcli shows zero configured items on either node.
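> (That check was simply something along the lines of "sudo targetcli ls /iscsi" on both nodes.)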
> Manually starting the LUNs and target gives:
> 
> 
> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-target
> Error performing operation: Operation not permitted
> Operation start for iscsi0-target (ocf:heartbeat:iSCSITarget) returned 1
>  >  stderr: WARNING: Configuration parameter "tid" is not supported by the iSCSI implementation and will be ignored.
>  >  stderr: INFO: Parameter auto_add_default_portal is now 'false'.
>  >  stderr: INFO: Created target iqn.2017-08.access.net:prod-1-ha. Created TPG 1.
>  >  stderr: ERROR: This Target already exists in configFS
> 
> But now targetcli shows at least the target. Checking with crm status still shows the target as stopped.
> Manually starting the LUNs gives:
> 
> 
> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun0
> Operation start for iscsi0-lun0 (ocf:heartbeat:iSCSILogicalUnit) returned 0
>  >  stderr: INFO: Created block storage object iscsi0-lun0 using /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-root.
>  >  stderr: INFO: Created LUN 0.
>  >  stderr: DEBUG: iscsi0-lun0 start : 0
> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun1
> Operation start for iscsi0-lun1 (ocf:heartbeat:iSCSILogicalUnit) returned 0
>  >  stderr: INFO: Created block storage object iscsi0-lun1 using /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-swap.
>  >  stderr: /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line 378: /sys/kernel/config/target/core/iblock_0/iscsi0-lun1/wwn/vpd_unit_serial: No such file or directory
>  >  stderr: INFO: Created LUN 1.
>  >  stderr: DEBUG: iscsi0-lun1 start : 0
> 
> So the second LUN seems to get created with some bad parameters by the iSCSILogicalUnit script. Checking with targetcli, however, shows both LUNs and the target up and running.
> Checking again with crm status (and pcs status) shows all three resources still stopped. Since the LUNs are colocated with the target and the target still has fail counts, I clear them with:
> 
> sudo pcs resource cleanup iscsi0-target
> 
> Now the LUNs and target are all active in crm status / pcs status. But it’s quite a manual process to get this to work! I’m thinking either my configuration is bad or there is some bug somewhere in targetcli / LIO or the iSCSI heartbeat script.
> On top of all the manual work, it still breaks on any action: a move, failover, reboot, etc. instantly breaks it again. Everything else (the underlying ZFS pool, the DRBD device, the IPv4 addresses, etc.) moves just fine; it’s only the iSCSI part that’s being problematic.
> 
> Concrete questions:
> 
> - Is my config bad?
> - Is there a known issue with ISCSI? (I have only found old references about ordering)
> 
> I have attached the output of crm configure show as cib.txt, and the status output after a fresh boot of both nodes is:
> 
> Current DC: storage-2-prod (version 1.1.16-94ff4df) - partition with quorum
> Last updated: Mon Aug 21 22:55:05 2017
> Last change: Mon Aug 21 22:36:23 2017 by root via cibadmin on storage-1-prod
> 
> 2 nodes configured
> 21 resources configured
> 
> Online: [ storage-1-prod storage-2-prod ]
> 
> Full list of resources:
> 
>  ip-iscsi0-vlan10       (ocf::heartbeat:IPaddr2):       Started storage-1-prod
>  ip-iscsi0-vlan20       (ocf::heartbeat:IPaddr2):       Started storage-1-prod
>  ip-iscsi0-vlan30       (ocf::heartbeat:IPaddr2):       Started storage-1-prod
>  ip-iscsi0-vlan40       (ocf::heartbeat:IPaddr2):       Started storage-1-prod
>  Master/Slave Set: drbd_master_slave0 [drbd_disk0]
>      Masters: [ storage-1-prod ]
>      Slaves: [ storage-2-prod ]
>  Master/Slave Set: drbd_master_slave1 [drbd_disk1]
>      Masters: [ storage-2-prod ]
>      Slaves: [ storage-1-prod ]
>  ip-iscsi1-vlan10       (ocf::heartbeat:IPaddr2):       Started storage-2-prod
>  ip-iscsi1-vlan20       (ocf::heartbeat:IPaddr2):       Started storage-2-prod
>  ip-iscsi1-vlan30       (ocf::heartbeat:IPaddr2):       Started storage-2-prod
>  ip-iscsi1-vlan40       (ocf::heartbeat:IPaddr2):       Started storage-2-prod
>  st-storage-1-prod      (stonith:meatware):     Started storage-2-prod
>  st-storage-2-prod      (stonith:meatware):     Started storage-1-prod
>  zfs-iscsipool0 (ocf::heartbeat:ZFS):   Started storage-1-prod
>  zfs-iscsipool1 (ocf::heartbeat:ZFS):   Started storage-2-prod
>  iscsi0-lun0    (ocf::heartbeat:iSCSILogicalUnit):      Stopped
>  iscsi0-lun1    (ocf::heartbeat:iSCSILogicalUnit):      Stopped
>  iscsi0-target  (ocf::heartbeat:iSCSITarget):   Stopped
>  Clone Set: dlm-clone [dlm]
>      Started: [ storage-1-prod storage-2-prod ]
> 
> Failed Actions:
> * iscsi0-target_start_0 on storage-2-prod 'unknown error' (1): call=99, status=complete, exitreason='none',
>     last-rc-change='Mon Aug 21 22:54:49 2017', queued=0ms, exec=954ms
> * iscsi0-target_start_0 on storage-1-prod 'unknown error' (1): call=98, status=complete, exitreason='none',
>     last-rc-change='Mon Aug 21 22:54:47 2017', queued=0ms, exec=1062ms
> 
> Regards,
> John
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org