[ClusterLabs] Help with tweaking an active/passive NFS cluster

Ronny Adsetts ronny.adsetts at amazinginternet.com
Mon Apr 17 06:17:45 EDT 2023


Andrei Borzenkov wrote on 05/04/2023 08:36:
> On Fri, Mar 31, 2023 at 12:42 AM Ronny Adsetts
> <ronny.adsetts at amazinginternet.com> wrote:
>>
>> Hi,
>>
>> I wonder if someone more familiar with the workings of pacemaker/corosync would be able to assist in solving an issue.
>>
>> I have a 3-node NFS cluster which exports several iSCSI LUNs. The LUNs are presented to the nodes via multipathd.
>>
>> This all works fine except that I can't stop just one export. Sometimes I need to take a single filesystem offline for maintenance, for example, or there's an issue and a filesystem goes offline and can't come back.
>>
>> There's a trimmed-down config below, but essentially I want all the NFS exports on one node without any single export blocking the others, so it's OK to stop (or fail) a single export.
>>
>> My config has a group for each export and filesystem and another group for the NFS server and VIP. I then co-locate them together.
>>
>> Cut-down config to limit the number of exports:
>>
>> node 1: nfs-01
>> node 2: nfs-02
>> node 3: nfs-03
>> primitive NFSExportAdminHomes exportfs \
>>         params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" directory="/srv/adminhomes" fsid=dcfd1bbb-c026-4d6d-8541-7fc29d6fef1a \
>>         op monitor timeout=20 interval=10 \
>>         op_params interval=10
>> primitive NFSExportArchive exportfs \
>>         params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" directory="/srv/archive" fsid=3abb6e34-bff2-4896-b8ff-fc1123517359 \
>>         op monitor timeout=20 interval=10 \
>>         op_params interval=10 \
>>         meta target-role=Started
>> primitive NFSExportDBBackups exportfs \
>>         params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" directory="/srv/dbbackups" fsid=df58b9c0-593b-45c0-9923-155b3d7d9483 \
>>         op monitor timeout=20 interval=10 \
>>         op_params interval=10
>> primitive NFSFSAdminHomes Filesystem \
>>         params device="/dev/mapper/adminhomes-part1" directory="/srv/adminhomes" fstype=xfs \
>>         op start interval=0 timeout=120 \
>>         op monitor interval=60 timeout=60 \
>>         op_params OCF_CHECK_LEVEL=20 \
>>         op stop interval=0 timeout=240
>> primitive NFSFSArchive Filesystem \
>>         params device="/dev/mapper/archive-part1" directory="/srv/archive" fstype=xfs \
>>         op start interval=0 timeout=120 \
>>         op monitor interval=60 timeout=60 \
>>         op_params OCF_CHECK_LEVEL=20 \
>>         op stop interval=0 timeout=240 \
>>         meta target-role=Started
>> primitive NFSFSDBBackups Filesystem \
>>         params device="/dev/mapper/dbbackups-part1" directory="/srv/dbbackups" fstype=xfs \
>>         op start timeout=60 interval=0 \
>>         op monitor interval=20 timeout=40 \
>>         op stop timeout=60 interval=0 \
>>         op_params OCF_CHECK_LEVEL=20
>> primitive NFSIP-01 IPaddr2 \
>>         params ip=172.16.40.17 cidr_netmask=24 nic=ens14 \
>>         op monitor interval=30s
>> group AdminHomes NFSFSAdminHomes NFSExportAdminHomes \
>>         meta target-role=Started
>> group Archive NFSFSArchive NFSExportArchive \
>>         meta target-role=Started
>> group DBBackups NFSFSDBBackups NFSExportDBBackups \
>>         meta target-role=Started
>> group NFSServerIP NFSIP-01 NFSServer \
>>         meta target-role=Started
>> colocation NFSMaster inf: NFSServerIP AdminHomes Archive DBBackups
> 
> This is entirely equivalent to defining a group and says that
> resources must be started in strict order on the same node. Like with
> a group, if an earlier resource cannot be started, all following
> resources are not started either.
> 
>> property cib-bootstrap-options: \
>>         have-watchdog=false \
>>         dc-version=2.0.1-9e909a5bdd \
>>         cluster-infrastructure=corosync \
>>         cluster-name=nfs-cluster \
>>         stonith-enabled=false \
>>         last-lrm-refresh=1675344768
>> rsc_defaults rsc-options: \
>>         resource-stickiness=200
>>
>>
>> The problem is that if one export fails, none of the following exports will be attempted. Reading the docs, that's to be expected, as each item in the colocation needs the preceding item to succeed.
>>
>> I tried changing the colocation line like so to remove the dependency:
>>
>> colocation NFSMaster inf: NFSServerIP ( AdminHomes Archive DBBackups )
>>
> 
> 1. The ( AdminHomes Archive DBBackups ) creates a set with
> sequential=false. Now, the documentation for "sequential" is one of
> the most obscure I have seen, but judging by "the individual members
> within any one set may or may not be colocated relative to each other
> (determined by the set’s sequential property)" and "A colocated set
> with sequential="false" makes sense only if there is another set in
> the constraint. Otherwise, the constraint has no effect", members of a
> set with sequential=false are not colocated on the same node.
> 
> 2. The condition is backward. You colocate NFSServerIP *with* set (
> AdminHomes Archive DBBackups ), while you actually want to colocate
> set ( AdminHomes Archive DBBackups ) *with* NFSServerIP.
> 
> So the
> 
> colocation NFSMaster inf: ( AdminHomes Archive DBBackups ) ( NFSServerIP )
> 
> may work.
> 
> The pacemaker behavior is rather puzzling, though. According to the
> documentation, "in order for any member of one set in the constraint to
> be active, all members of sets listed after it must also be active
> (and naturally on the same node)", but in your case members of the set
> are on the same node, which would imply that NFSServerIP (which is the
> sole member of an implicit set) should not be active.

Thanks for the explainer here; that's really useful.

I don't spend a lot of time tinkering with pacemaker as it's only a tiny part of what I do, so I suffer from a lack of in-depth knowledge, which can be both painful and annoying. :-)

This particular issue only came to the fore, and became urgent to solve, when one of the LUNs failed to mount.

> Anyway, an alternative is to define a separate colocation for each
> group, which likely makes the configuration clearer.

Yes, this seems the sensible way forward. I'll reconfigure and give it a go.
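
Something like this, if I've understood correctly (untested sketch; the
resource names are from the config above, the constraint IDs are just
placeholders, and each group is colocated *with* NFSServerIP as you
suggest):

colocation AdminHomesWithIP inf: AdminHomes NFSServerIP
colocation ArchiveWithIP inf: Archive NFSServerIP
colocation DBBackupsWithIP inf: DBBackups NFSServerIP

That way each export group depends on NFSServerIP's location, but a
failed group shouldn't drag down NFSServerIP or the other groups.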

I've no idea why I did it the way I did - it was a couple of years ago now. Probably some aversion to having NFSServerIP listed in multiple colocation lines.
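
I should probably also remember that colocation alone doesn't imply any
start order, so if the exports need the NFS server group up first I may
want matching order constraints as well, something like (again untested,
placeholder IDs):

order AdminHomesAfterIP inf: NFSServerIP AdminHomes
order ArchiveAfterIP inf: NFSServerIP Archive
order DBBackupsAfterIP inf: NFSServerIP DBBackups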

Ronny

-- 
Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8977 8943
w: www.amazinginternet.com

Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
Registered in England. Company No. 4042957


