[ClusterLabs Developers] Issue with ocf:heartbeat:Filesystem

Fri Apr 23 11:32:37 EDT 2021

On Thu, 2021-04-22 at 13:15 +0000, Yanakiev, Vladimir x wrote:
> Team,
>  
> This is an old issue that persists across different HA platforms.
> The automounter (the autofs service) lets a filesystem be mounted
> additionally, on demand, at a specific location. A typical example
> is the home environment for different applications.
> An app ID in a naming service (LDAP, AD, etc.) has one home
> environment, let's say “/home/appid”.
> On different hosts the actual location of this home environment may
> differ: on one host it can be “/export/home/appid/v1.0”, on another
> host it might be “/export/home/appid/v2.0”, etc.
> Still, when the app executes something on a host it needs its
> home environment as listed in the naming layer, so we configure the
> automounter to mount “/export/home/appid/v1.0” as /home/appid on the
> first server and “/export/home/appid/v2.0” on the second server.
> A directory used by the automounter CANNOT BE USED by anyone else: any
> attempt to mount something under /home in the above example will fail
> with a “device busy” message.
> A filesystem that is mounted and then loopback-mounted under autofs
> control looks like this:
> tlsys-ucs-eng08a:/appl/test # cat /etc/auto.master
> # Sample auto.master file
> # Format of this file:
> # mountpoint map options
> # Also see variable AUTOFS_OPTIONS in /etc/sysconfig/autofs
> # For details of the format look at autofs(8).
> /appl   /etc/auto_appl -rw,intr,nosuid,nobrowse
> …
> tlsys-ucs-eng08a:/appl/test # cat /etc/auto_appl
> # Local auto_appl automounter file.
> # Example:
> # directory             --bind         localhost:/export/appl/directory
> #
> # When adding entries for NFS mounts from Solaris servers add the following
> # options:
> # -rsize=32768,wsize=32768,nfsvers=3,tcp,retrans=5,timeo=600
> # Example
> # dir -rsize=32768,wsize=32768,nfsvers=3,tcp,retrans=5,timeo=600 server:/mount
> …
> test    --bind  localhost:/export/appl/test
> tlsys-ucs-eng08a:/usr/lib/ocf/resource.d/heartbeat # cd /appl/test
> tlsys-ucs-eng08a:/appl/test # mount |grep test
> /dev/mapper/DG1-test on /export/appl/test type ext4 (rw,relatime)
> /dev/mapper/DG1-test on /appl/test type ext4 (rw,relatime)
> If we try to unmount /export/appl/test in the above example we will
> get a “device busy” message, but no process in the process table
> will show any usage, and lsof will not show anything for this FS
> either.
> With SLES HA, an attempt to stop the resource or to migrate it to
> another server will cause the server to panic, because the Filesystem
> agent is unable to stop the resource.
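
A minimal sketch of how one might list every place a device is mounted, which makes the extra autofs bind mount visible even when lsof shows nothing. The function name mounts_of_device and the sample table are illustrative assumptions modeled on the mount output quoted above; they are not part of the Filesystem agent.

```shell
#!/bin/sh
# Print every mount point whose source device equals $1.
# Reads a mount table in /proc/mounts format from stdin
# (fields: device mountpoint fstype options dump pass).
mounts_of_device() {
    awk -v dev="$1" '$1 == dev { print $2 }'
}

# Hypothetical sample table modeled on the quoted mount output.
sample_mounts() {
    cat <<'EOF'
/dev/mapper/DG1-test /export/appl/test ext4 rw,relatime 0 0
/dev/mapper/DG1-test /appl/test ext4 rw,relatime 0 0
/dev/mapper/DG1-other /export/appl/other ext4 rw,relatime 0 0
EOF
}

# Lists both the primary mount and the autofs bind mount of the device.
sample_mounts | mounts_of_device /dev/mapper/DG1-test
```

On a live system the same filter could be fed from /proc/mounts instead of the sample table, for example: mounts_of_device /dev/mapper/DG1-test < /proc/mounts
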
>  
> The above behavior is not acceptable. We have configured multiple
> service groups that can run independently on any of the
> members of an HA cluster, so one host may run more than one
> service. A panic on the host would disrupt the work of other
> applications.
>  
> To avoid this I lightly modified the Filesystem agent so that it
> searches for such cases. As a base I will use version
> resource-agents-4.4.0+git57.70549516-3.12.1.x86_64:
>  
> …
> # Lists all filesystems potentially mounted under a given path,
> # excluding the path itself.
> list_submounts() {
>         list_mounts | grep " $1/" | cut -d' ' -f2 | sort -r
> }
>
> # FNMA - Lists automounter loopback
> list_loopbacks() {
>         list_mounts | grep "$1" | grep -v "$2" | cut -d' ' -f2 | sort -r
> }
>
> …
>                 # for SUB in `list_submounts $MOUNTPOINT` $MOUNTPOINT; do
>                 # FNMA: original line above was modified below:
>                 for SUB in `list_submounts $MOUNTPOINT` `list_loopbacks $DEVICE $MOUNTPOINT` $MOUNTPOINT; do
>  
> The change adds one extra subroutine that looks for loopback mounts
> and appends its result to the list of submounts being built. This
> lets the FS resource stop gracefully, without the panic.
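
A rough, self-contained sketch of what the added helper returns, for anyone who wants to try it outside the agent. The list_mounts stub and its sample data are hypothetical stand-ins (assumed to emit "device mountpoint fstype" lines, as the cut -f2 in the quoted code suggests). list_loopbacks_exact is only an illustrative variant, not part of the submitted change: because grep matches substrings, a device whose name begins with the same string would also be picked up, so the variant compares whole fields instead.

```shell
#!/bin/sh
# Hypothetical stand-in for the agent's list_mounts helper:
# one "device mountpoint fstype" line per mount (sample data only).
list_mounts() {
    cat <<'EOF'
/dev/mapper/DG1-test /export/appl/test ext4
/dev/mapper/DG1-test /appl/test ext4
/dev/mapper/DG1-test2 /export/appl/data ext4
EOF
}

# As quoted above: substring match on the device ($1), substring
# exclusion of the resource's own mount point ($2).
list_loopbacks() {
    list_mounts | grep "$1" | grep -v "$2" | cut -d' ' -f2 | sort -r
}

# Illustrative variant (an assumption, not the submitted patch):
# compare whole fields so a longer device name does not match.
list_loopbacks_exact() {
    list_mounts | awk -v dev="$1" -v mp="$2" \
        '$1 == dev && $2 != mp { print $2 }' | sort -r
}

echo "grep version:"
list_loopbacks /dev/mapper/DG1-test /export/appl/test
echo "exact-match version:"
list_loopbacks_exact /dev/mapper/DG1-test /export/appl/test
```

With this sample data the grep version also reports /export/appl/data, which belongs to DG1-test2, while the exact-match variant reports only /appl/test. Whether that matters in practice depends on how devices are named on the cluster.
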
>  
> The logic of the proposed change is this: if we stop the FS, we need to
> stop ALL of its representations on the current host. Most likely this
> will be in preparation for the next step, disabling the vg for
> migration. Failure to release the device that holds the FS will prevent
> the vg from being disabled.
>  
> Attached is a copy that we at Fannie Mae use with no problems. The
> inconvenience for us is that with each and every patch upgrade or major
> version release we need to redo the agent modification. I believe
> this small change deserves to be part of the original code. Any
> thoughts?
>  
> Vladimir Yanakiev
> Unix Engineer, Hosting & Engineering Services – Solution Engineering
> Compute
> Phone: 703-833-3770 (direct) | 571-246-1946 (mobile)

I think it's a great enhancement. If you can, submit a pull request at:

https://github.com/ClusterLabs/resource-agents

If you can't do that for whatever reason, update the latest version of
the agent with your changes and post it here with permission given to
merge it into the project.

Either way, from there we can discuss it and merge it if appropriate.
-- 
Ken Gaillot <kgaillot at redhat.com>