[ClusterLabs] Help with tweaking an active/passive NFS cluster

Ronny Adsetts ronny.adsetts at amazinginternet.com
Thu Mar 30 17:41:54 EDT 2023


Hi,

I wonder if someone more familiar with the workings of pacemaker/corosync would be able to assist in solving an issue.

I have a 3-node NFS cluster which exports several iSCSI LUNs. The LUNs are presented to the nodes via multipathd.

This all works fine except that I can't stop just one export. Sometimes I need to take a single filesystem offline for maintenance, for example, or a filesystem develops a problem, goes offline and can't come back.

There's a trimmed-down config below, but essentially I want all the NFS exports on one node without any export blocking the others, so it's OK to stop (or fail) a single export.

My config has one group per export (each containing the filesystem and its export) and another group for the NFS server and VIP. I then colocate all the groups.

Cut-down config to limit the number of exports:

node 1: nfs-01
node 2: nfs-02
node 3: nfs-03
primitive NFSExportAdminHomes exportfs \
        params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" directory="/srv/adminhomes" fsid=dcfd1bbb-c026-4d6d-8541-7fc29d6fef1a \
        op monitor timeout=20 interval=10 \
        op_params interval=10
primitive NFSExportArchive exportfs \
        params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" directory="/srv/archive" fsid=3abb6e34-bff2-4896-b8ff-fc1123517359 \
        op monitor timeout=20 interval=10 \
        op_params interval=10 \
        meta target-role=Started
primitive NFSExportDBBackups exportfs \
        params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" directory="/srv/dbbackups" fsid=df58b9c0-593b-45c0-9923-155b3d7d9483 \
        op monitor timeout=20 interval=10 \
        op_params interval=10
primitive NFSFSAdminHomes Filesystem \
        params device="/dev/mapper/adminhomes-part1" directory="/srv/adminhomes" fstype=xfs \
        op start interval=0 timeout=120 \
        op monitor interval=60 timeout=60 \
        op_params OCF_CHECK_LEVEL=20 \
        op stop interval=0 timeout=240
primitive NFSFSArchive Filesystem \
        params device="/dev/mapper/archive-part1" directory="/srv/archive" fstype=xfs \
        op start interval=0 timeout=120 \
        op monitor interval=60 timeout=60 \
        op_params OCF_CHECK_LEVEL=20 \
        op stop interval=0 timeout=240 \
        meta target-role=Started
primitive NFSFSDBBackups Filesystem \
        params device="/dev/mapper/dbbackups-part1" directory="/srv/dbbackups" fstype=xfs \
        op start timeout=60 interval=0 \
        op monitor interval=20 timeout=40 \
        op stop timeout=60 interval=0 \
        op_params OCF_CHECK_LEVEL=20
primitive NFSIP-01 IPaddr2 \
        params ip=172.16.40.17 cidr_netmask=24 nic=ens14 \
        op monitor interval=30s
group AdminHomes NFSFSAdminHomes NFSExportAdminHomes \
        meta target-role=Started
group Archive NFSFSArchive NFSExportArchive \
        meta target-role=Started
group DBBackups NFSFSDBBackups NFSExportDBBackups \
        meta target-role=Started
group NFSServerIP NFSIP-01 NFSServer \
        meta target-role=Started
colocation NFSMaster inf: NFSServerIP AdminHomes Archive DBBackups
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.1-9e909a5bdd \
        cluster-infrastructure=corosync \
        cluster-name=nfs-cluster \
        stonith-enabled=false \
        last-lrm-refresh=1675344768
rsc_defaults rsc-options: \
        resource-stickiness=200


The problem is that if one export fails, none of the exports listed after it in the colocation will be attempted. Reading the docs, that's to be expected, as each item in the colocation needs the preceding item to succeed.

I tried changing the colocation line like so to remove the dependency:

colocation NFSMaster inf: NFSServerIP ( AdminHomes Archive DBBackups )

but this gave me two problems:

1. Issuing a "resource stop DBBackups" took everything offline briefly

2. Issuing a "resource start DBBackups" brought it back on a different node to NFSServerIP 
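
For reference, the plain two-resource form of the constraint would tie each export group to NFSServerIP separately, in place of the single NFSMaster line, so that no export group depends on another. This is only a sketch and has not been tried on this cluster; the constraint IDs (AdminHomesWithNFS and so on) are made-up names, and it assumes NFSServerIP is the resource the exports should follow:

colocation AdminHomesWithNFS inf: AdminHomes NFSServerIP
colocation ArchiveWithNFS inf: Archive NFSServerIP
colocation DBBackupsWithNFS inf: DBBackups NFSServerIP

In crm shell syntax the first resource of a two-resource colocation is placed relative to the second. With these lines, stopping DBBackups should leave NFSServerIP and the other groups where they are, while an export group can only run on the node where NFSServerIP is active.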

I'm very obviously missing something here.

Could someone kindly point me in the right direction?

TIA.

Ronny

-- 
Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8977 8943
w: www.amazinginternet.com

Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
Registered in England. Company No. 4042957


