[ClusterLabs] Pacemaker not starting ISCSI LUNs and Targets

Octavian Ciobanu coctavian1979 at gmail.com
Sat Aug 26 16:42:30 UTC 2017


Hey John,

For the portal issue I've added another line containing "ocf_run targetcli
/iscsi set global auto_add_default_portal=false || exit $OCF_ERR_GENERIC"
at line 328 of the iSCSITarget script, just before the "for portal in
${OCF_RESKEY_portals}; do" loop, and this way I can use only one portal in
my cluster configuration. I found this line while searching for a fix for
the same issue you had.
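
In case it helps, this is roughly how that part of the lio-t section of the
script ends up looking with the extra line (just a sketch; the exact line
numbers and surrounding code differ between resource-agents versions):

    # workaround: keep targetcli from adding the default 0.0.0.0:3260 portal,
    # so only the portals from OCF_RESKEY_portals are created in the loop below
    ocf_run targetcli /iscsi set global auto_add_default_portal=false || exit $OCF_ERR_GENERIC
    for portal in ${OCF_RESKEY_portals}; do
        # ... existing per-portal targetcli commands from the agent ...
    done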

Best regards
Octavian Ciobanu

On Sat, Aug 26, 2017 at 6:25 PM, John Keates <john at keates.nl> wrote:

> Hey Octavian,
>
> I did that too, and it worked. Afterwards, I did some more checking to see
> why the control flow ended up there, since the script works with tgtd and
> LIO (non -T / fe version).
> It seems it has a slight implementation issue where the for loop fails if
> there are no portals listed or the one(s) listed are equal to the default
> value. Setting only one portal doesn’t help either, for some reason (though
> I don’t see why, since the loop would run exactly once). When I configure 2
> or more target portals manually it works just fine.
>
> This is something I can work around since I use SaltStack to configure the
> CIB, so I just pull the IP addresses off of the interfaces I want to use,
> and stick them in a space delimited list as the portals.
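>
> For reference, the portals then end up as a plain space-delimited list on
> the target resource, along these lines (the IQN, addresses and resource
> name here are only placeholders, not my actual config):
>
>     pcs resource create iscsi0-target ocf:heartbeat:iSCSITarget \
>         iqn=iqn.2017-08.example.net:demo \
>         portals="192.168.10.10:3260 192.168.20.10:3260" \
>         op monitor interval=30s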
>
> I created an issue about it at
> https://github.com/ClusterLabs/resource-agents/issues/1026
>
> At the same time, I’m still not sure about my configuration. It seems I
> have way more locations, colocation and ordering constraints than most
> configs I come across.
>
> Kind regards,
> John Keates
>
>
> On 26 Aug 2017, at 14:41, Octavian Ciobanu <coctavian1979 at gmail.com>
> wrote:
>
> Hey John.
>
> I also encountered the same error message "ERROR: This Target already
> exists in configFS" a while back, and when I ran targetcli and listed its
> configuration contents I could see the target in the iscsi folder. That
> turned out to be due to a forced reboot of the node.
>
> To work around it I added the line "ocf_run targetcli /iscsi delete
> ${OCF_RESKEY_iqn}" in /usr/lib/ocf/resource.d/heartbeat/iSCSITarget at
> line 330, just before "ocf_run targetcli /iscsi create ${OCF_RESKEY_iqn}
> || exit $OCF_ERR_GENERIC". That command deletes the target about to be
> created if it already exists.
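>
> Concretely, the patched section then looks roughly like this (a sketch;
> the line numbers shift between resource-agents versions):
>
>     # workaround: drop a stale target left over from a forced reboot; the
>     # result is deliberately not checked, since the target usually won't exist
>     ocf_run targetcli /iscsi delete ${OCF_RESKEY_iqn}
>     ocf_run targetcli /iscsi create ${OCF_RESKEY_iqn} || exit $OCF_ERR_GENERIC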
>
> I hope this workaround will help you with your issue until a valid
> solution is available.
>
> Best regards
> Octavian Ciobanu
>
> On Tue, Aug 22, 2017 at 12:19 AM, John Keates <john at keates.nl> wrote:
>
>> Hi,
>>
>> I have a strange issue where LIO-T based ISCSI targets and LUNs most of
>> the time simply don’t work. They either don’t start, or bounce around until
>> no more nodes are tried.
>> The less-than-useful information in the logs looks like this:
>>
>> Aug 21 22:49:06 [10531] storage-1-prod    pengine:  warning:
>> check_migration_threshold: Forcing iscsi0-target away from storage-1-prod
>> after 1000000 failures (max=1000000)
>>
>> Aug 21 22:54:47 storage-1-prod crmd[2757]:   notice: Result of start
>> operation for ip-iscsi0-vlan40 on storage-1-prod: 0 (ok)
>> Aug 21 22:54:47 storage-1-prod iSCSITarget(iscsi0-target)[5427]:
>> WARNING: Configuration parameter "tid" is not supported by the iSCSI
>> implementation and will be ignored.
>> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO:
>> Parameter auto_add_default_portal is now 'false'.
>> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO:
>> Created target iqn.2017-08.acccess.net:prod-1-ha. Created TPG 1.
>> Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: ERROR:
>> This Target already exists in configFS
>> Aug 21 22:54:48 storage-1-prod crmd[2757]:   notice: Result of start
>> operation for iscsi0-target on storage-1-prod: 1 (unknown error)
>> Aug 21 22:54:49 storage-1-prod iSCSITarget(iscsi0-target)[5536]: INFO:
>> Deleted Target iqn.2017-08.access.net:prod-1-ha.
>> Aug 21 22:54:49 storage-1-prod crmd[2757]:   notice: Result of stop
>> operation for iscsi0-target on storage-1-prod: 0 (ok)
>>
>> Now, the unknown error actually seems to be a targetcli-level error:
>> "This Target already exists in configFS". Checking with targetcli shows
>> zero configured items on either node.
>> Manually starting the LUNs and target gives:
>>
>>
>> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-target
>> Error performing operation: Operation not permitted
>> Operation start for iscsi0-target (ocf:heartbeat:iSCSITarget) returned 1
>>  >  stderr: WARNING: Configuration parameter "tid" is not supported by
>> the iSCSI implementation and will be ignored.
>>  >  stderr: INFO: Parameter auto_add_default_portal is now 'false'.
>>  >  stderr: INFO: Created target iqn.2017-08.access.net:prod-1-ha.
>> Created TPG 1.
>>  >  stderr: ERROR: This Target already exists in configFS
>>
>> but now targetcli shows at least the target. Checking with crm status
>> still shows the target as stopped.
>> Manually starting the LUNs gives:
>>
>>
>> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun0
>> Operation start for iscsi0-lun0 (ocf:heartbeat:iSCSILogicalUnit)
>> returned 0
>>  >  stderr: INFO: Created block storage object iscsi0-lun0 using
>> /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-root.
>>  >  stderr: INFO: Created LUN 0.
>>  >  stderr: DEBUG: iscsi0-lun0 start : 0
>> john at storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun1
>> Operation start for iscsi0-lun1 (ocf:heartbeat:iSCSILogicalUnit)
>> returned 0
>>  >  stderr: INFO: Created block storage object iscsi0-lun1 using
>> /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-swap.
>>  >  stderr: /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line
>> 378: /sys/kernel/config/target/core/iblock_0/iscsi0-lun1/wwn/vpd_unit_serial:
>> No such file or directory
>>  >  stderr: INFO: Created LUN 1.
>>  >  stderr: DEBUG: iscsi0-lun1 start : 0
>>
>> So the second LUN seems to have had some bad parameters created by the
>> iSCSILogicalUnit script. Checking with targetcli, however, shows both LUNs
>> and the target up and running.
>> Checking again with crm status (and pcs status) shows all three resources
>> still stopped. Since the LUNs are colocated with the target and the target
>> still has fail counts, I clear them with:
>>
>> sudo pcs resource cleanup iscsi0-target
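>>
>> (For reference, the per-resource fail counts can also be inspected before
>> cleaning up, e.g.:
>>
>> sudo pcs resource failcount show iscsi0-target
>>
>> which shows the count per node that the cleanup then resets.)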
>>
>> Now the LUNs and target are all active in crm status / pcs status. But
>> it’s quite a manual process to get this to work! I’m thinking either my
>> configuration is bad or there is some bug somewhere in targetcli / LIO or
>> the iSCSI heartbeat script.
>> On top of all the manual work, it still breaks on any action: a move,
>> failover, reboot etc. instantly breaks it. Everything else (the underlying
>> ZFS pool, the DRBD device, the IPv4 IPs etc.) moves just fine; it’s only
>> the ISCSI that’s being problematic.
>>
>> Concrete questions:
>>
>> - Is my config bad?
>> - Is there a known issue with ISCSI? (I have only found old references
>> about ordering)
>>
>> I have added the output of crm config show as cib.txt and the output of a
>> fresh boot of both nodes is:
>>
>> Current DC: storage-2-prod (version 1.1.16-94ff4df) - partition with
>> quorum
>> Last updated: Mon Aug 21 22:55:05 2017
>> Last change: Mon Aug 21 22:36:23 2017 by root via cibadmin on
>> storage-1-prod
>>
>> 2 nodes configured
>> 21 resources configured
>>
>> Online: [ storage-1-prod storage-2-prod ]
>>
>> Full list of resources:
>>
>>  ip-iscsi0-vlan10       (ocf::heartbeat:IPaddr2):       Started
>> storage-1-prod
>>  ip-iscsi0-vlan20       (ocf::heartbeat:IPaddr2):       Started
>> storage-1-prod
>>  ip-iscsi0-vlan30       (ocf::heartbeat:IPaddr2):       Started
>> storage-1-prod
>>  ip-iscsi0-vlan40       (ocf::heartbeat:IPaddr2):       Started
>> storage-1-prod
>>  Master/Slave Set: drbd_master_slave0 [drbd_disk0]
>>      Masters: [ storage-1-prod ]
>>      Slaves: [ storage-2-prod ]
>>  Master/Slave Set: drbd_master_slave1 [drbd_disk1]
>>      Masters: [ storage-2-prod ]
>>      Slaves: [ storage-1-prod ]
>>  ip-iscsi1-vlan10       (ocf::heartbeat:IPaddr2):       Started
>> storage-2-prod
>>  ip-iscsi1-vlan20       (ocf::heartbeat:IPaddr2):       Started
>> storage-2-prod
>>  ip-iscsi1-vlan30       (ocf::heartbeat:IPaddr2):       Started
>> storage-2-prod
>>  ip-iscsi1-vlan40       (ocf::heartbeat:IPaddr2):       Started
>> storage-2-prod
>>  st-storage-1-prod      (stonith:meatware):     Started storage-2-prod
>>  st-storage-2-prod      (stonith:meatware):     Started storage-1-prod
>>  zfs-iscsipool0 (ocf::heartbeat:ZFS):   Started storage-1-prod
>>  zfs-iscsipool1 (ocf::heartbeat:ZFS):   Started storage-2-prod
>>  iscsi0-lun0    (ocf::heartbeat:iSCSILogicalUnit):      Stopped
>>  iscsi0-lun1    (ocf::heartbeat:iSCSILogicalUnit):      Stopped
>>  iscsi0-target  (ocf::heartbeat:iSCSITarget):   Stopped
>>  Clone Set: dlm-clone [dlm]
>>      Started: [ storage-1-prod storage-2-prod ]
>>
>> Failed Actions:
>> * iscsi0-target_start_0 on storage-2-prod 'unknown error' (1): call=99,
>> status=complete, exitreason='none',
>>     last-rc-change='Mon Aug 21 22:54:49 2017', queued=0ms, exec=954ms
>> * iscsi0-target_start_0 on storage-1-prod 'unknown error' (1): call=98,
>> status=complete, exitreason='none',
>>     last-rc-change='Mon Aug 21 22:54:47 2017', queued=0ms, exec=1062ms
>>
>> Regards,
>> John
>>