[ClusterLabs] serious problem with iSCSILogicalUnit

Stefan K shadow_7 at gmx.net
Wed Dec 11 09:58:51 EST 2019


Hello,

I have a working HA setup with iSCSI and ZFS, but last week I added an iSCSI allowed initiator, and then it happened: my whole VMware infrastructure failed because iSCSI stopped working. Today I have time to take a closer look at this.

To reproduce it, I created 2 VMs and put (more or less) the same config into them.
What I do:
- I create an iSCSI target with allowed initiators
- I create iSCSI logical units (see the targetcli sketch below for what I expect this to produce)
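
As far as I understand the resource agents, this should be roughly equivalent to the following manual targetcli commands. This is only my assumption of what they do under the hood, using the IQNs from my setup:

targetcli /backstores/block create iscsi-lun00 /dev/loop1
targetcli /iscsi create iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3/tpg1/portals create 172.16.101.166 3260
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3/tpg1/acls create iqn.1998-01.com.vmware:brainslug9-75488e35
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3/tpg1/luns create /backstores/block/iscsi-lun00 0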

But this is what I actually get:

targetcli
targetcli shell version 2.1.fb43
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> ls
o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 3]
  | | o- iscsi-lun00 .................................................................. [/dev/loop1 (1.0GiB) write-thru deactivated]
  | | o- iscsi-lun01 .................................................................. [/dev/loop2 (1.0GiB) write-thru deactivated]
  | | o- iscsi-lun02 ................................................................. [/dev/loop3 (0 bytes) write-thru deactivated]
  | o- fileio ................................................................................................. [Storage Objects: 0]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 1]
  | o- iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3 ...................................................... [TPGs: 1]
  |   o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
  |     o- acls .......................................................................................................... [ACLs: 4]
  |     | o- iqn.1993-08.org.debian:01:fee35be01c4d ............................................................... [Mapped LUNs: 0]
  |     | o- iqn.1998-01.com.vmware:brainslug10-34ad648763 ........................................................ [Mapped LUNs: 0]
  |     | o- iqn.1998-01.com.vmware:brainslug10-5564u4325 ......................................................... [Mapped LUNs: 0]
  |     | o- iqn.1998-01.com.vmware:brainslug9-75488e35 ........................................................... [Mapped LUNs: 0]
  |     o- luns .......................................................................................................... [LUNs: 0]
  |     o- portals .................................................................................................... [Portals: 1]
  |       o- 172.16.101.166:3260 .............................................................................................. [OK]
  o- loopback ......................................................................................................... [Targets: 0]
  o- vhost ............................................................................................................ [Targets: 0]


Here you can see that the LUNs are missing. When I move the resource to the other node, the LUNs show up again; but if I then add/remove/change an "allowed_initiators" entry, it happens again and all LUNs are gone. That is a very serious problem for us.
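
I guess I could re-create the mappings by hand with targetcli, something like the commands below (untested on my side, and obviously not a real fix, because the next parameter change would wipe them again):

targetcli /iscsi/iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3/tpg1/luns create /backstores/block/iscsi-lun00 0
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3/tpg1/luns create /backstores/block/iscsi-lun01 1
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3/tpg1/luns create /backstores/block/iscsi-lun02 2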

So my question is: did I misconfigure something, or is this a bug? My pacemaker config looks like the following:

crm conf sh
node 1: ha-test1 \
        attributes \
        attributes standby=off maintenance=off
node 2: ha-test2 \
        attributes \
        attributes standby=off
primitive ha-ip IPaddr2 \
        params ip=172.16.101.166 cidr_netmask=24 nic=ens192 \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s \
        op monitor interval=10s timeout=20s \
        meta target-role=Started
primitive iscsi-lun00 iSCSILogicalUnit \
        params implementation=lio-t target_iqn="iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3" lun=0 lio_iblock=0 path="/dev/loop1" \
        op start interval=0 trace_ra=1 \
        op stop interval=0 trace_ra=1 \
        meta target-role=Started
primitive iscsi-lun01 iSCSILogicalUnit \
        params implementation=lio-t target_iqn="iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3" lun=1 lio_iblock=1 path="/dev/loop2" \
        meta
primitive iscsi-lun02 iSCSILogicalUnit \
        params implementation=lio-t target_iqn="iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3" lun=2 lio_iblock=2 path="/dev/loop3" \
        meta
primitive iscsi-server iSCSITarget \
        params implementation=lio-t iqn="iqn.2003-01.org.linux-iscsi.vm-storage.x8664:sn.cf6fa66tgyh3" portals="172.16.101.166:3260" allowed_initiators="iqn.1998-01.com.vmware:brainslug9-75488e35 iqn.1998-01.com.vmware:brainslug10-5564u4325 iqn.1993-08.org.debian:01:fee35be01c4d iqn.1998-01.com.vmware:brainslug10-34ad648763" \
        meta
colocation pcs_rsc_colocation_set_ha-ip_vm_storage_iscsi-server inf: ha-ip iscsi-server iscsi-lun00 iscsi-lun01 iscsi-lun02
order pcs_rsc_order_set_ha-ip_iscsi-server_vm_storage ha-ip:stop iscsi-lun00:stop iscsi-lun01:stop iscsi-lun02:stop iscsi-server:stop symmetrical=false
order pcs_rsc_order_set_iscsi-server_vm_storage_ha-ip iscsi-lun00:start iscsi-server:start iscsi-lun01:start iscsi-lun02:start ha-ip:start symmetrical=false
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=ha-vmstorage \
        no-quorum-policy=stop \
        stonith-enabled=false \
        last-lrm-refresh=1576056627
rsc_defaults rsc_defaults-options: \
        resource-stickiness=100
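
For completeness: the change that triggers this is nothing more than editing the allowed_initiators parameter of iscsi-server, for example (the appended IQN is just a placeholder for a new VMware host, not one of my real initiators):

crm configure edit iscsi-server
# then append e.g. iqn.1998-01.com.vmware:new-esxi-host to allowed_initiators and commit

As soon as the changed parameter is applied, the LUN mappings on the active node are gone again.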


The system is running on Debian Stretch.

Thank you very much for your help!

best regards
Stefan



