[ClusterLabs] fence_scsi problem
Patrick Vranckx
patrick.vranckx at uclouvain.be
Wed Oct 28 08:17:48 EDT 2020
Hi,
I am trying to set up an HA cluster for ZFS. I think fence_scsi is not working
properly. I can reproduce the problem on two kinds of hardware: iSCSI and
SAS storage.
Here is what I did:
- set up a storage server with 3 iSCSI targets to be accessed by my
2-node cluster
- set up two nodes with CentOS 8.1. The iSCSI initiators let me see three
SCSI disks on each node:
/dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca -> ../../sda
/dev/disk/by-id/wwn-0x6001405a38892587b3f4a739046b9ff4 -> ../../sdb
/dev/disk/by-id/wwn-0x6001405b249db0e617b41b19f3147af5 -> ../../sdc
- configured zpool "vol1" on one node (rough commands for the initiator
login and pool creation are sketched after this list)
- defined the resources on the cluster:
    pcs resource create vol1 ZFS pool="vol1" op start timeout="90" \
        op stop timeout="90" --group=group-vol1

    pcs resource create vol1-ip IPaddr2 ip=10.30.0.100 cidr_netmask=16 \
        --group group-vol1

    pcs stonith create fence-vol1 fence_scsi \
        pcmk_monitor_action="metadata" pcmk_host_list="node1,node2" \
        devices="/dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca,/dev/disk/by-id/wwn-0x6001405a38892587b3f4a739046b9ff4,/dev/disk/by-id/wwn-0x6001405b249db0e617b41b19f3147af5" \
        meta provides=unfencing
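For reference, the initiator login and the pool creation were roughly as
follows (the portal IP below is a placeholder and the pool is shown as a
simple stripe; the exact layout may have differed):

    # on each node: discover the targets and log in (portal IP is a placeholder)
    iscsiadm -m discovery -t sendtargets -p 10.30.0.10
    iscsiadm -m node --login

    # on one node: create the pool on the stable by-id paths
    zpool create vol1 \
        /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca \
        /dev/disk/by-id/wwn-0x6001405a38892587b3f4a739046b9ff4 \
        /dev/disk/by-id/wwn-0x6001405b249db0e617b41b19f3147af5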
Everything seems ok:
[root@node1 /]# pcs status
Cluster name: mycluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
  * Last updated: Wed Oct 28 12:55:52 2020
  * Last change: Wed Oct 28 12:32:36 2020 by root via crm_resource on node2
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ node1 node2 ]

Full List of Resources:
  * fence-vol1 (stonith:fence_scsi): Started node1
  * Resource Group: group-vol1:
    * vol1-ip (ocf::heartbeat:IPaddr2): Started node2
    * vol1 (ocf::heartbeat:ZFS): Started node2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Moving the resource to the other node runs fine and the zpool is only
imported on the active node.
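For example, a move and a quick check look roughly like this:

    # move the group and then verify where the pool is imported
    pcs resource move group-vol1 node1
    zpool list    # on each node: the pool shows up only on the node running group-vol1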
The SCSI reservations on the disks, as shown by sg_persist -s (here for the
first device, disk1), are as follows:
  LIO-ORG   disk1   4.0
  Peripheral device type: disk
  PR generation=0x2
    Key=0x6c9d0000
      All target ports bit clear
      Relative port address: 0x1
      << Reservation holder >>
      scope: LU_SCOPE, type: Write Exclusive, registrants only
      Transport Id of initiator:
        iSCSI world wide unique port id: iqn.2020-10.localhost.store:node1
    Key=0x6c9d0001
      All target ports bit clear
      Relative port address: 0x1
      not reservation holder
      Transport Id of initiator:
        iSCSI world wide unique port id: iqn.2020-10.localhost.store:node2
QUESTION: when I move the resource to the other node, the SCSI reservations
do not change. I would have expected the Write Exclusive reservation to
follow the active node. Is this normal?
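For what it's worth, the registrations and the current reservation can also
be checked separately with sg_persist's read-only options:

    # list the keys registered on the device (one per node, set up at unfencing)
    sg_persist -k /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca

    # show only the current reservation holder and type
    sg_persist -r /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca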
If I manually fence the active node for group-vol1:
[root@node1 /]# pcs stonith fence node1
Node: node1 fenced
the resource is restarted on the other node, BUT THE ZFS FILESYSTEMS ARE
MOUNTED AND WRITABLE ON BOTH NODES, even though the SCSI reservations seem OK:
[root@node2 by-id]# sg_persist -s /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca
  LIO-ORG   disk1   4.0
  Peripheral device type: disk
  PR generation=0x3
    Key=0x6c9d0001
      All target ports bit clear
      Relative port address: 0x1
      << Reservation holder >>
      scope: LU_SCOPE, type: Write Exclusive, registrants only
      Transport Id of initiator:
        iSCSI world wide unique port id: iqn.2020-10.localhost.store:node2
I can reproduce the same thing with a two-node cluster with SAS-attached
storage running CentOS 7.8.
In my understanding, if node2 holds the Write Exclusive reservation, node1
shouldn't be able to write to those disks.
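If that understanding is right, a direct write from node1 after it has been
fenced should be rejected by the target; a rough (and destructive,
test-LUN-only) way to check would be something like:

    # on the fenced node: bypass the page cache and write one sector directly;
    # with its key removed, the write should fail with an I/O error and the
    # kernel should log a reservation conflict
    dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca \
        bs=512 count=1 oflag=direct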
What's the problem?
Thanks for your help,
Patrick