[ClusterLabs] fence_scsi problem

Patrick Vranckx patrick.vranckx at uclouvain.be
Wed Oct 28 08:17:48 EDT 2020


Hi,

I am trying to set up an HA cluster for ZFS, and I think fence_scsi is not
working properly. I can reproduce the problem on two kinds of hardware:
iSCSI and SAS storage.

Here is what I did:

- set up a storage server with 3 iSCSI targets to be accessed by my
2-node cluster

- set up two nodes with CentOS 8.1. The iSCSI initiators let me see three
SCSI disks on each node (initiator setup sketched after the listing):

     /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca -> ../../sda
     /dev/disk/by-id/wwn-0x6001405a38892587b3f4a739046b9ff4 -> ../../sdb
     /dev/disk/by-id/wwn-0x6001405b249db0e617b41b19f3147af5 -> ../../sdc
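
(For completeness, the initiator side is roughly the standard iscsiadm
discovery/login; the portal address below is just a placeholder:)

     # discover the targets on the storage server and log in to them
     iscsiadm -m discovery -t sendtargets -p <storage-server-ip>
     iscsiadm -m node --login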

- configured a zpool "vol1" on one node (creation sketched below)
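
(The exact vdev layout should not matter for the question; a three-way
mirror is shown below just as an illustration of how the pool was built
from the by-id devices:)

     zpool create vol1 mirror \
         /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca \
         /dev/disk/by-id/wwn-0x6001405a38892587b3f4a739046b9ff4 \
         /dev/disk/by-id/wwn-0x6001405b249db0e617b41b19f3147af5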

- defined resources on the cluster:

     pcs resource create vol1 ZFS pool="vol1" \
         op start timeout="90" op stop timeout="90" \
         --group=group-vol1
     pcs resource create vol1-ip IPaddr2 ip=10.30.0.100 cidr_netmask=16 \
         --group group-vol1
     pcs stonith create fence-vol1 fence_scsi \
         pcmk_monitor_action="metadata" pcmk_host_list="node1,node2" \
         devices="/dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca,/dev/disk/by-id/wwn-0x6001405a38892587b3f4a739046b9ff4,/dev/disk/by-id/wwn-0x6001405b249db0e617b41b19f3147af5" \
         meta provides=unfencing
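
(Once the stonith resource is started, unfencing should register one key
per node on each device; that can be checked with something like:)

     sg_persist --in -k -d /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca
     # expected: two registered keys, one per cluster node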

Everything seems ok:

[root@node1 /]# pcs status

Cluster name: mycluster
Cluster Summary:
   * Stack: corosync
   * Current DC: node2 (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
   * Last updated: Wed Oct 28 12:55:52 2020
   * Last change:  Wed Oct 28 12:32:36 2020 by root via crm_resource on node2
   * 2 nodes configured
   * 3 resource instances configured

Node List:
   * Online: [ node1 node2 ]

Full List of Resources:
   * fence-vol1  (stonith:fence_scsi):   Started node1
   * Resource Group: group-vol1:
     * vol1-ip   (ocf::heartbeat:IPaddr2):       Started node2
     * vol1      (ocf::heartbeat:ZFS):   Started node2

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled

Moving the resource to the other node runs fine and the zpool is only 
imported on the active node.
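
(For example, with the names used above; running "zpool list" on each node
is just one way to confirm where the pool is actually imported:)

     pcs resource move group-vol1 node1
     zpool list     # vol1 only shows up on the node where the group is active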

SCSI reservations for the disks are as follows:

  LIO-ORG   disk1             4.0
   Peripheral device type: disk
   PR generation=0x2
     Key=0x6c9d0000
       All target ports bit clear
       Relative port address: 0x1
       << Reservation holder >>
       scope: LU_SCOPE,  type: Write Exclusive, registrants only
       Transport Id of initiator:
         iSCSI world wide unique port id: iqn.2020-10.localhost.store:node1
     Key=0x6c9d0001
       All target ports bit clear
       Relative port address: 0x1
       not reservation holder
       Transport Id of initiator:
         iSCSI world wide unique port id: iqn.2020-10.localhost.store:node2
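
(A quicker way to watch just the current reservation holder while moving
the group, e.g. on the first disk:)

     sg_persist --in -r -d /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca
     # reports only the reservation key and type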

QUESTION: when I move the resource to the other node, the SCSI reservations
do not change. I would expect the Write Exclusive reservation to follow the
active node. Is this normal?

If I manually fence the active node for group-vol1:

     [root@node1 /]# pcs stonith fence node1
     Node: node1 fenced

the resource group is restarted on the other node, BUT THE ZFS FILESYSTEMS
ARE MOUNTED AND WRITABLE ON BOTH NODES, even though the SCSI reservations
look ok:

[root@node2 by-id]# sg_persist -s /dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca
   LIO-ORG   disk1             4.0
   Peripheral device type: disk
   PR generation=0x3
     Key=0x6c9d0001
       All target ports bit clear
       Relative port address: 0x1
       << Reservation holder >>
       scope: LU_SCOPE,  type: Write Exclusive, registrants only
       Transport Id of initiator:
         iSCSI world wide unique port id: iqn.2020-10.localhost.store:node2
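
(By "mounted and writable on both nodes" I mean that, after the fence,
something like the following succeeds on both node1 and node2; the /vol1
path assumes the pool's default mountpoint:)

     zfs list                       # vol1 datasets show as mounted on both nodes
     touch /vol1/test-$(hostname)   # a test write succeeds on both nodes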

I can reproduce the same behaviour with a two-node cluster with SAS-attached
storage running CentOS 7.8.

In my understanding, if node2 holds the Write Exclusive reservation, node1
should not be able to write to those disks.
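
(If it helps: one way to test the reservation outside of ZFS would be a
direct write from node1 to one of the devices. This is destructive for the
pool, so only on a scratch setup, but with an effective Write Exclusive,
registrants-only reservation I would expect it to be rejected:)

     # WARNING: overwrites pool data, scratch setup only
     dd if=/dev/zero of=/dev/disk/by-id/wwn-0x60014053374288c674e4add8679947ca \
        bs=4k count=1 oflag=direct
     # expected from a node whose key was removed: reservation conflict / I/O error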

What's the problem?

Thanks for your help,

Patrick




