[ClusterLabs] LIO (iSCSI Target) does not release DRBD device?

Per von Zweigbergk pvz at itassistans.se
Tue May 26 07:33:57 EDT 2015


I was able to resolve this by updating the entire software stack using a third-party service. The update must have pulled in a newer version of Pacemaker (or of some related component), because failover now works as expected without any configuration changes.

-----Original Message-----
From: Per von Zweigbergk [mailto:pvz at itassistans.se]
Sent: 25 May 2015 17:35
To: users at clusterlabs.org
Subject: [ClusterLabs] LIO (iSCSI Target) does not release DRBD device?

I'm attempting to get a two-node cluster setup working. The workload is going to be a DRBD-backed iSCSI target. I have chosen the Linux-IO Target (LIO) for this purpose. I'm running on Ubuntu LTS 14.04, with the software as packaged by the distro.

Unfortunately, it doesn't quite work the way I expect. When I do a "crm resource move g_iscsi" to force a move of the iSCSI target, and thereby DRBD, from what I can tell:

First the iSCSI target resource (ocf:heartbeat:iSCSITarget) is torn down. After that, the LUN resource is torn down (ocf:heartbeat:iSCSILogicalUnit). After that, the two IP address resources are torn down (ocf:heartbeat:IPaddr2). This is all as I expect to happen.

Then, it attempts to demote DRBD to secondary, which is where it seems to fail according to:

May 25 16:44:12 node01 kernel: [  702.206628] block drbd1: State change failed: Device is held open by someone

This is despite the fact that I have verified that the LIO LUN is "deactivated" according to:

root@node01:~# targetcli
targetcli GIT_VERSION (rtslib GIT_VERSION)
Copyright (c) 2011-2013 by Datera, Inc.
All rights reserved.
/> ls
o- / .................................................................................................... [...]
  o- backstores ......................................................................................... [...]
  | o- fileio .............................................................................. [0 Storage Object]
  | o- iblock .............................................................................. [1 Storage Object]
  | | o- p_lun_iscsi ................................................................. [/dev/drbd1 deactivated]
  | o- pscsi ............................................................................... [0 Storage Object]
  | o- rd_dr ............................................................................... [0 Storage Object]
  | o- rd_mcp .............................................................................. [0 Storage Object]
  o- ib_srpt ...................................................................................... [0 Targets]
  o- iscsi ........................................................................................ [0 Targets]
  o- loopback ..................................................................................... [0 Targets]
  o- qla2xxx ...................................................................................... [0 Targets]
  o- tcm_fc ....................................................................................... [0 Targets]
/>
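
For cross-checking the targetcli listing above: LIO keeps its backstore definitions in configfs, so a small script can list which storage objects still point at a given block device. This is only a sketch. It assumes configfs is mounted at /sys/kernel/config and that each backstore directory under target/core/ carries a udev_path file; the function name lio_backstores_for is made up for illustration.

```shell
#!/bin/sh
# List LIO backstore objects whose udev_path matches a block device.
# Assumption: backstores live under <root>/<hba>/<name>/udev_path, where
# <root> defaults to the usual configfs location for the target core.
lio_backstores_for() {
    dev=$1
    root=${2:-/sys/kernel/config/target/core}
    for f in "$root"/*/*/udev_path; do
        # Skip the literal pattern when nothing matches the glob.
        [ -r "$f" ] || continue
        # udev_path holds the device path the backstore was created with.
        if [ "$(cat "$f")" = "$dev" ]; then
            obj=${f%/udev_path}
            echo "${obj#"$root"/}"
        fi
    done
}
```

Calling "lio_backstores_for /dev/drbd1" prints one <hba>/<name> line per storage object that still references the device, and nothing if no object does.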

No joy on using fuser or lsof to check what might be holding /dev/drbd1 open unfortunately (perhaps because LIO lives in the kernel?), but if I go in and manually delete the p_lun_iscsi object, I'm able to demote to secondary, as below:

(in targetcli)
/backstores/iblock> delete p_lun_iscsi
Deleted storage object p_lun_iscsi.
/backstores/iblock> exit
There are unsaved configuration changes.
If you exit now, configuration will not be updated and changes will be lost upon reboot.
Type 'exit' if you want to exit anyway: exit
root@node01:~# drbdadm secondary fs01_data
root@node01:~# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
srcversion: 6551AD2C98F533733BE558C

 1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:7095200 nr:0 dw:2028504 dr:5069232 al:472 bm:312 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
root@node01:~#

The DRBD node is in "StandAlone" mode because the other node seems to have forcibly taken over the resource somehow, so this is not unexpected.
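
To make the state in the /proc/drbd output above easier to check from a script, here is a minimal sketch that pulls the connection state (cs), roles (ro) and disk states (ds) for one minor number. It assumes the DRBD 8.x /proc/drbd layout shown above; the function name drbd_minor_state is hypothetical.

```shell
#!/bin/sh
# Print "cs=... ro=... ds=..." for one DRBD minor, reading
# /proc/drbd-style (8.x) text on stdin.
drbd_minor_state() {
    awk -v minor="$1" '
        # Status lines start with "<minor>:", e.g. " 1: cs:StandAlone ..."
        $1 == (minor ":") {
            out = ""; sep = ""
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == "cs" || kv[1] == "ro" || kv[1] == "ds") {
                    out = out sep kv[1] "=" kv[2]
                    sep = " "
                }
            }
            print out
        }'
}
```

Usage would be "drbd_minor_state 1 < /proc/drbd". A cs value of StandAlone means the node is no longer replicating with its peer.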

So in summary, what I think is happening is that failover fails, because Pacemaker isn't telling LIO to sufficiently let go of the DRBD device, which is causing it to be unable to go into secondary mode. After a bunch of failures, the other node realizes this, and goes into standalone mode to force-take-over the DRBD resource, killing replication.
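
The manual sequence that worked above (delete the backstore, then demote) can be written down as a small script. This is a sketch of the workaround, not a fix for the resource agents. It assumes targetcli accepts a path and command as arguments for non-interactive use; the helper names run and release_and_demote and the DRY_RUN variable are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Delete the LIO iblock backstore so the kernel lets go of the device,
# then demote the DRBD resource. Stops if the first step fails.
release_and_demote() {
    backstore=$1   # e.g. p_lun_iscsi
    resource=$2    # e.g. fs01_data
    run targetcli /backstores/iblock delete "$backstore" &&
    run drbdadm secondary "$resource"
}
```

Running "DRY_RUN=1 release_and_demote p_lun_iscsi fs01_data" only prints the two commands, which is a safe way to review them before running them on a node that Pacemaker is still managing.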

What can I do to try to get this working?

Finally, here's a dump of "crm configure show", for good measure, with some potentially sensitive data redacted (I'm not actually running production in the RFC 3330 TEST-NET subnet, and my node IDs aren't as listed):

node $id="111111111" node01 \
        attributes maintenance="off"
node $id="222222222" node02 \
        attributes maintenance="off"
primitive p_drbd_iscsi ocf:linbit:drbd \
        params drbd_resource="fs01_data" \
        op start timeout="240" interval="0" \
        op stop timeout="180" interval="0" \
        op monitor interval="60" timeout="60"
primitive p_ip1_iscsi ocf:heartbeat:IPaddr2 \
        params ip="192.0.2.131" cidr_netmask="28" nic="eth1" iflabel="iscsi" \
        op monitor interval="30s"
primitive p_ip2_iscsi ocf:heartbeat:IPaddr2 \
        params ip="192.0.2.147" cidr_netmask="28" nic="eth2" iflabel="iscsi" \
        op monitor interval="30s"
primitive p_iscsitarget_iscsi ocf:heartbeat:iSCSITarget \
        params iqn="iqn.2015-05.com.example:iscsi" implementation="lio" portals="192.0.2.131 192.0.2.147" \
        meta is-managed="true"
primitive p_lun_iscsi ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn="iqn.2015-05.com.example:iscsi" lun="0" path="/dev/drbd1"
group g_iscsi p_ip1_iscsi p_ip2_iscsi p_lun_iscsi p_iscsitarget_iscsi \
        meta target-role="Started"
ms ms_drbd_iscsi p_drbd_iscsi \
        meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="true" is-managed="true" target-role="Started"
location lo_drbd_iscsi ms_drbd_iscsi \
        rule $id="lo_drbd_iscsi-rule" -inf: #uname ne node01 and #uname ne node02
colocation co_iscsitarget_iscsi inf: g_iscsi ms_drbd_iscsi:Master
order o_iscsi inf: ms_drbd_iscsi:promote g_iscsi:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        stonith-enabled="false" \
        last-lrm-refresh="1432564681" \
        no-quorum-policy="ignore" \
        default-resource-stickiness="200"
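
Since the target's portals need to be addresses that the cluster actually brings up, a quick consistency check on the dump can be useful. The sketch below reads "crm configure show" text on stdin and prints any portal address that has no matching ip="..." parameter. It assumes IPv4 portals with an optional :port suffix; the function name unbacked_portals is hypothetical.

```shell
#!/bin/sh
# Print portal addresses that do not appear as an ip="..." parameter
# anywhere in the crm configuration read from stdin.
unbacked_portals() {
    cfg=$(cat)
    # All ip="..." values, one per line.
    ips=$(printf '%s\n' "$cfg" | sed -n 's/.*[[:space:]]ip="\([^"]*\)".*/\1/p')
    # The space-separated list inside portals="...".
    portals=$(printf '%s\n' "$cfg" | sed -n 's/.*portals="\([^"]*\)".*/\1/p')
    for p in $portals; do
        addr=${p%%:*}   # drop an optional :port suffix (IPv4 only)
        printf '%s\n' "$ips" | grep -qxF "$addr" || echo "$addr"
    done
}
```

Usage would be "crm configure show | unbacked_portals"; empty output means every portal address is backed by an ip parameter.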
