[ClusterLabs] Antw: [EXT] Multiple nfsserver resource groups

Christoforos Christoforou christoforos at globalreach.com
Mon Mar 9 08:28:56 EDT 2020


From the man pages:
https://linux.die.net/man/8/sm-notify
"After an NFS client reboots, an NFS server must release all file locks held by applications that were running on that client. After a server reboots, a client must remind the server of file locks held by applications running on that client."
https://linux.die.net/man/8/rpc.statd 
" Prevents rpc.statd from running the sm-notify command when it starts up, preserving the existing NSM state number and monitor list."

I am no expert on this, but it seems that "rpc.statd --no-notify" is there to prevent rpc.statd from sending an additional reboot notification on startup.

So from what I understand, the clients, once notified via sm-notify, will try to reclaim the locks that were released prior to the reboot.
If rpc.statd sent another reboot notification on startup, the existing NSM state would be lost.

As Strahil mentioned in another response, this is how the Red Hat documentation indicates it should be set up (i.e. nfs_no_notify=true on the nfsserver resource plus a separate nfsnotify resource).
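
For reference, a rough pcs sketch of that setup (the resource and group names r_nfsserver, r_vip and g_nfs below are placeholders, not the actual configuration discussed here):

    # have the nfsserver agent start rpc.statd with --no-notify, so that
    # statd itself does not send the reboot notification
    pcs resource update r_nfsserver nfs_no_notify=true

    # let a dedicated nfsnotify resource send the reboot notification from
    # the floating IP; placing it after the IP in the group makes it start
    # last and stop first
    pcs resource create r_nfsnotify ocf:heartbeat:nfsnotify \
        source_host=IPADDRESS --group g_nfs --after r_vip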

-----Original Message-----
From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> 
Sent: Monday, March 9, 2020 1:40 PM
To: users at clusterlabs.org; christoforos at globalreach.com
Subject: RE: Antw: [EXT] [ClusterLabs] Multiple nfsserver resource groups

>>> "Christoforos Christoforou" <christoforos at globalreach.com> schrieb 
>>> am
09.03.2020 um 12:05 in Nachricht
<030101d5f602$9c9e3880$d5daa980$@globalreach.com>:
>> Start: RAID1, LVM, fs, nfsserver, exportfs, IP address (stop is the
>> other way 'round)

> If your IP resource stops before the nfsserver, I would think it's
> possible that connections and file handles are left hanging.

That's intended: it should look as if the server crashed. The clients would continue trying to reach the server. Then, when the server comes up on a different node and sets the IP address, the clients would think the original server has recovered. I also think the (non-crashed) clients should then try to recover their locks (we use NFS hard mounts).

> If you don't have an nfsnotify service, then this could be your problem.

OK, I thought the server would use nfsnotify to inform clients once it's up, but it looks like it does not; I see:
 /usr/sbin/rpc.statd --no-notify

> Does your nfsserver resource also have nfs_no_notify=true?

The parameter isn't set, so it should have the default value.

> 
> I'd say test the following:
> Test1: Set nfs_no_notify=true on your nfsserver resource and create an
> nfsnotify resource that starts after the IP and stops first.
> Something like this:
>   pcs resource create r_nfsnotify nfsnotify source_host=IPADDRESS
> Test2: Set the IP to stop after the nfsserver resource and have
> nfs_no_notify=false (default)
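
(For Test1, if r_nfsnotify were not placed in the same group as the IP, one way to express "starts after the IP and stops first" would be explicit constraints; a sketch, with r_vip as a placeholder for the IPaddr2 resource:

    pcs constraint order start r_vip then start r_nfsnotify
    pcs constraint colocation add r_nfsnotify with r_vip

Order constraints are symmetrical by default, so the stop happens in the reverse direction.)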

The interesting thing is that I've set up a test NFS server and that one works perfectly; only the real production server has the problem. It's possible that some user (or systemd) does something stupid that I'm not aware of.
One indicator is that a filesystem unmounted during the stop sequence seems to be mounted again before the stop sequence is complete.

Regards,
Ulrich

> 
> Christoforos Christoforou
> Senior Systems Administrator
> Global Reach Internet Productions
> Twitter | Facebook | LinkedIn
> p (515) 996-0996 | globalreach.com
> 
> -----Original Message-----
> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
> Sent: Monday, March 9, 2020 12:37 PM
> To: users at clusterlabs.org; christoforos at globalreach.com
> Subject: RE: Antw: [EXT] [ClusterLabs] Multiple nfsserver resource 
> groups
> 
>>>> "Christoforos Christoforou" <christoforos at globalreach.com> schrieb 
>>>> am
> 09.03.2020 um 10:38 in Nachricht
> <02ff01d5f5f6$7024af70$506e0e50$@globalreach.com>:
>> Thanks for the advice.
>> We haven’t had any issues with the time it takes to prepare the 
>> exportfs resources so far, and we've been running this setup for 2 
>> years now, but I will keep it in mind as we increase the number of 
>> exportfs resources. I have already implemented the solution discussed
>> and merged all filesystems and exports into one resource group, and
>> everything looks good.
>> 
>> For your problem, what is the order in which your resources 
>> startup/shutdown?
> 
> Start: RAID1, LVM, fs, nfsserver, exportfs, IP address (stop is the
> other way 'round)
> 
>> Is your nfs info dir a filesystem resource or an LVM resource?
> 
> We have everything as LVs (see above)
> 
>> Do you have an nfsnotify resource in place?
> 
> Not explicitly. We are using NFSv3 only.
> 
>> We have found that the order in which resources startup/shutdown
>> without any problems is to have the nfsnotify resource stop first, then
>> stop the exportfs resources, nfsserver after that, filesystems (one of
>> which is the nfs shared info dir) and finally the virtual IP resource.
>> Startup is the reverse of that.
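
(In pcs terms that corresponds to a group whose members are listed in start order; a sketch with placeholder resource names:

    pcs resource group add g_nfs r_vip r_fs_infodir r_fs_share1 r_fs_share2 \
        r_nfsserver r_export1 r_export2 r_nfsnotify

A group starts its members in the listed order and stops them in reverse.)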
> 
> Regards,
> Ulrich
> 
> 
>> 
>> -----Original Message-----
>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>> Sent: Monday, March 9, 2020 9:26 AM
>> To: users at clusterlabs.org; christoforos at globalreach.com
>> Subject: Antw: [EXT] [ClusterLabs] Multiple nfsserver resource groups
>> 
>>>>> "Christoforos Christoforou" <christoforos at globalreach.com> schrieb 
>>>>> am
>> 06.03.2020 um 18:56 in Nachricht
>>
>
<25205_1583517421_5E628EE5_25205_75_1_01e001d5f3e0$804a6240$80df26c0$@globalr
>> eac
>> .com>:
>>> Hello,
>>> 
>>>  
>>> 
>>> We have a PCS cluster running on 2 CentOS 7 nodes, exposing 2 NFSv3 
>>> volumes which are then mounted to multiple servers (around 8).
>>> 
>>> We want to have 2 more sets of additional shared NFS volumes, for a 
>>> total of 6.
>>> 
>>>  
>>> 
>>> I have successfully configured 3 resource groups, with each group 
>>> having the following resources:
>>> 
>>> *	1x ocf_heartbeat_IPaddr2 resource for the Virtual IP that exposes
>>> the NFS share assigned to its own NIC.
>>> *	3x ocf_heartbeat_Filesystem resources (1 is for the
>>> nfs_shared_infodir and the other 2 are the ones exposed via the NFS
>>> server)
>>> *	1x ocf_heartbeat_nfsserver resource that uses the aforementioned
>>> nfs_shared_infodir.
>>> *	2x ocf_heartbeat_exportfs resources that expose the other 2
>>> filesystems as NFS shares.
>>> *	1x ocf_heartbeat_nfsnotify resource that has the Virtual IP set as
>>> its own source_host.
>>> 
>>>  
>>> 
>>> All 9 filesystem volumes are presented to the PCS nodes via iSCSI as
>>> /dev/mapper/mpathX multipath devices.
>>> 
>>> So the structure is like so:
>>> 
>>> Resource group 1:
>>> 
>>> *	/dev/mapper/mpatha ‑ shared volume 1
>>> *	/dev/mapper/mpathb ‑ shared volume 2
>>> *	/dev/mapper/mpathc ‑ nfs_shared_infodir for resource group 1
>>> 
>>> Resource group 2:
>>> 
>>> *	/dev/mapper/mpathd ‑ shared volume 3
>>> *	/dev/mapper/mpathe ‑ shared volume 4
>>> *	/dev/mapper/mpathf ‑ nfs_shared_infodir for resource group 2
>>> 
>>> Resource group 3:
>>> 
>>> *	/dev/mapper/mpathg ‑ shared volume 5
>>> *	/dev/mapper/mpathh ‑ shared volume 6
>>> *	/dev/mapper/mpathi ‑ nfs_shared_infodir for resource group 3
>>> 
>>>  
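(As an illustration only, the skeleton of resource group 1 might be created along these lines; the device names are the ones listed above, while the resource/group names, mount points and IP are invented:

    pcs resource create rg1_vip ocf:heartbeat:IPaddr2 \
        ip=192.0.2.10 cidr_netmask=24 --group rg1
    pcs resource create rg1_fs_infodir ocf:heartbeat:Filesystem \
        device=/dev/mapper/mpathc directory=/srv/nfsinfo1 fstype=xfs --group rg1
    pcs resource create rg1_fs_vol1 ocf:heartbeat:Filesystem \
        device=/dev/mapper/mpatha directory=/srv/vol1 fstype=xfs --group rg1
    pcs resource create rg1_nfsserver ocf:heartbeat:nfsserver \
        nfs_shared_infodir=/srv/nfsinfo1 --group rg1
    ...

with the second shared volume, the exportfs resources and the nfsnotify resource added to the group in the same way.)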
>>> 
>>> My concern is that when I run a df command on the active node, the
>>> last ocf_heartbeat_nfsserver volume (/dev/mapper/mpathi) is mounted to
>>> /var/lib/nfs.
>>> I understand that I cannot change this, but I can change the
>>> location of the rpc_pipefs folder.
>>> 
>>>  
>>> 
>>> I have had this setup running with 2 resource groups in our 
>>> development environment, and have not noticed any issues, but since 
>>> we're planning to move to production and add a 3rd resource group, I 
>>> want to make sure that this setup will not cause any issues. I am by 
>>> no means an expert on NFS, so some insight is appreciated.
>>> 
>>>  
>>> 
>>> If this kind of setup is not supported or recommended, I have 2 
>>> alternate plans in mind:
>>> 
>>> 1.	Have all resources in the same resource group, in a setup that will
>>> look like this:
>>> 
>>> a.	1x ocf_heartbeat_IPaddr2 resource for the Virtual IP that exposes
>>> the NFS share.
>>> b.	7x ocf_heartbeat_Filesystem resources (1 is for the
>>> nfs_shared_infodir and 6 exposed via the NFS server)
>>> c.	1x ocf_heartbeat_nfsserver resource that uses the aforementioned
>>> nfs_shared_infodir.
>>> d.	6x ocf_heartbeat_exportfs resources that expose the other 6
>>> filesystems as NFS shares. Use the clientspec option to restrict to 
>>> IPs and prevent unwanted mounts.
>>> e.	1x ocf_heartbeat_nfsnotify resource that has the Virtual IP set as
>>> its own source_host.
>>> 
>>> 2.	Set up 2 more clusters to accommodate our needs
>>> 
>>>  
>>> 
>>> I really want to avoid #2, due to the fact that it will be overkill 
>>> for our case.
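
(For option 1, the per-export restriction mentioned in d. is just the clientspec parameter of each ocf_heartbeat_exportfs resource; a sketch with invented directory, network and fsid values:

    pcs resource create r_export_vol3 ocf:heartbeat:exportfs \
        directory=/srv/vol3 clientspec=192.0.2.0/24 \
        options=rw,sync fsid=3 --group g_nfs_all

where g_nfs_all stands for the single combined resource group.)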
>> 
>> One thing you might consider is to get rid of the groups and use
>> explicit colocation and ordering constraints. The advantage is that
>> several agents can be executed in parallel (e.g. preparing all
>> filesystems in parallel). In the past we found that exportfs resources
>> can take quite some time, and if you have 20 or more of them, it delays
>> the shutdown/startup significantly.
>> So we moved to using netgroups provided by LDAP instead, and we could
>> reduce the number of exportfs statements drastically.
>> However, we have one odd problem (SLES12 SP4): the NFS resource, which
>> uses systemd, does not shut down cleanly due to some unmount issue
>> related to the shared info dir.
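
(A sketch of the constraint-based layout mentioned above, with placeholder resource names; because the two filesystems are only ordered relative to the nfsserver and not to each other, they can be prepared in parallel:

    pcs constraint order start r_fs_share1 then start r_nfsserver
    pcs constraint order start r_fs_share2 then start r_nfsserver
    pcs constraint colocation add r_nfsserver with r_fs_share1
    pcs constraint colocation add r_nfsserver with r_fs_share2

The exportfs, nfsnotify and IP resources would get analogous order and colocation constraints.)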
>> 
>>> 
>>> Thanks
>>> 
>>>  
>>> 
>>> Christoforos Christoforou
>>> 
>>> Senior Systems Administrator
>>> 
>>> Global Reach Internet Productions
>>> 
>>>  <http://www.twitter.com/globalreach> Twitter | 
>>> <http://www.facebook.com/globalreach> Facebook | 
>>> <https://www.linkedin.com/company/global‑reach‑internet‑productions>
>>> LinkedIn
>>> 
>>> p (515) 996‑0996 |  <http://www.globalreach.com/> globalreach.com
>>> 
>>>  








