[ClusterLabs Developers] Issue with ocf:heartbeat:Filesystem

Yanakiev, Vladimir x vladimir_x_yanakiev at fanniemae.com
Thu Apr 22 09:15:31 EDT 2021


Team,

This is an old issue that persists across different HA platforms:

  *   The automounter (the autofs service) allows a file system to be mounted, on demand, at an additional location. A typical example is the home environment for different applications:
     *   An app ID in the naming services (LDAP, AD, etc.) has one home environment, let's say "/home/appid".
     *   The actual location of this home environment may differ between hosts: on one host it can be "/export/home/appid/v1.0", on another host "/export/home/appid/v2.0", etc.
     *   Still, when the app executes something on a host, it needs its home environment as listed in the naming layer, so we configure the automounter to mount "/export/home/appid/v1.0" as /home/appid on the first server and "/export/home/appid/v2.0" as /home/appid on the second server.
  *   A directory used by the automounter CANNOT BE USED by anyone else: any attempt to mount something under /home in the above example fails with a "device busy" message.
  *   A file system that is mounted and then loopback (bind) mounted under autofs control looks like this:
tlsys-ucs-eng08a:/appl/test # cat /etc/auto.master
# Sample auto.master file
# Format of this file:
# mountpoint map options
# Also see variable AUTOFS_OPTIONS in /etc/sysconfig/autofs
# For details of the format look at autofs(8).
/appl   /etc/auto_appl -rw,intr,nosuid,nobrowse
...
tlsys-ucs-eng08a:/appl/test # cat /etc/auto_appl
# Local auto_appl automounter file.
# Example:
# directory             --bind          localhost:/export/appl/directory
#
# When adding entries for NFS mounts from Solaris servers add the following
# options:
# -rsize=32768,wsize=32768,nfsvers=3,tcp,retrans=5,timeo=600
# Example
# dir -rsize=32768,wsize=32768,nfsvers=3,tcp,retrans=5,timeo=600 server:/mount
...
test    --bind  localhost:/export/appl/test
tlsys-ucs-eng08a:/usr/lib/ocf/resource.d/heartbeat # cd /appl/test
tlsys-ucs-eng08a:/appl/test # mount |grep test
/dev/mapper/DG1-test on /export/appl/test type ext4 (rw,relatime)
/dev/mapper/DG1-test on /appl/test type ext4 (rw,relatime)

  *   If we try to unmount /export/appl/test in the above example, we get a "device busy" message, but no process in the process table shows any usage of it, and lsof does not show anything for this FS either.
  *   With SLES HA, an attempt to stop the resource or to migrate it to another server causes the server to panic, because the Filesystem agent is unable to stop the resource.
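Since neither the process table nor lsof points at the culprit, it can help to list every mountpoint backed by the same device. The following is only a minimal sketch: the function name mounts_of_device and its optional second argument are hypothetical, and it assumes the device appears in /proc/mounts under the same name that is passed in.

```shell
#!/bin/sh
# mounts_of_device DEVICE [MOUNTS_FILE]   (hypothetical helper)
# Prints every mountpoint whose source device is exactly DEVICE.
# MOUNTS_FILE defaults to /proc/mounts; the parameter exists only so the
# function can be tried against a saved copy of the mount table.
mounts_of_device() {
    awk -v dev="$1" '$1 == dev { print $2 }' "${2:-/proc/mounts}"
}

# Example call:
#   mounts_of_device /dev/mapper/DG1-test
```

For the mount table quoted above, this should list both /export/appl/test and /appl/test, which is why unmounting only the first one does not release the device.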

The above behavior is not acceptable. We have configured multiple service groups that can run independently on any member of an HA cluster, so one host may run more than one service. A panic on that host would disrupt the work of the other applications.

To avoid this, I lightly modified the Filesystem agent so that it searches for such cases. As a base I use version resource-agents-4.4.0+git57.70549516-3.12.1.x86_64:

...
# Lists all filesystems potentially mounted under a given path,
# excluding the path itself.
list_submounts() {
        list_mounts | grep " $1/" | cut -d' ' -f2 | sort -r
}

# FNMA - Lists automounter loopback mounts of device $1,
# excluding the mountpoint $2 itself.
list_loopbacks() {
        list_mounts | grep "$1" | grep -v "$2" | cut -d' ' -f2 | sort -r
}

...
                # for SUB in `list_submounts $MOUNTPOINT` $MOUNTPOINT; do
                # FNMA: original line above was modified below:
                for SUB in `list_submounts $MOUNTPOINT` `list_loopbacks $DEVICE $MOUNTPOINT` $MOUNTPOINT; do

The change adds one extra subroutine that looks for loopback mounts and includes its result when the list of submounts is built. With this, the FS resource stops gracefully, without the panic.
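To illustrate what list_loopbacks returns, here is a small stand-alone sketch. The list_mounts function below is only a stand-in that prints a fixed "device mountpoint fstype" table (the first two rows come from the mount output above, the third row is a hypothetical extra device), and list_loopbacks_exact is a hypothetical stricter variant, not part of the agent or of the attached copy.

```shell
#!/bin/sh
# Stand-in for the agent's list_mounts: one "device mountpoint fstype" per line.
# The DG1-test2 row is hypothetical and only there to show substring matching.
list_mounts() {
    cat <<'EOF'
/dev/mapper/DG1-test /export/appl/test ext4
/dev/mapper/DG1-test /appl/test ext4
/dev/mapper/DG1-test2 /data/test2 ext4
EOF
}

# Same pipeline as the helper quoted in this mail.
list_loopbacks() {
    list_mounts | grep "$1" | grep -v "$2" | cut -d' ' -f2 | sort -r
}

# Hypothetical stricter variant: compares the device and mountpoint fields
# exactly instead of using substring matches (assumes no spaces in paths).
list_loopbacks_exact() {
    list_mounts | awk -v dev="$1" -v mp="$2" '$1 == dev && $2 != mp { print $2 }' | sort -r
}

list_loopbacks /dev/mapper/DG1-test /export/appl/test
# prints:
#   /data/test2
#   /appl/test

list_loopbacks_exact /dev/mapper/DG1-test /export/appl/test
# prints:
#   /appl/test
```

The first call finds the loopback mount /appl/test, as intended. Because grep matches substrings, it also picks up the similarly named device, so an exact comparison along the lines of the second function may be worth considering.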

The logic of the proposed change is this: if we stop the FS, we need to stop ALL of its representations on the current host. Most likely this is done in preparation for the next step, disabling the vg for migration. Failure to release the device that holds the FS will prevent the vg from being disabled.
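The ordering described here can be sketched as a dry run that only prints the commands. The function name plan_stop, its arguments, and the use of "vgchange -an" as the deactivation step are assumptions made for illustration; the agent itself does not contain this function.

```shell
#!/bin/sh
# Dry run only: prints the commands instead of executing them.
# plan_stop DEVICE MOUNTPOINT VG [MOUNTS_FILE]   (all names hypothetical)
plan_stop() {
    dev=$1; mp=$2; vg=$3; table=${4:-/proc/mounts}
    # Every other mountpoint backed by the same device goes first ...
    awk -v dev="$dev" -v mp="$mp" '$1 == dev && $2 != mp { print $2 }' "$table" |
        sort -r |
        while read -r extra; do
            echo "umount $extra"
        done
    # ... then the primary mountpoint, and only then the volume group.
    echo "umount $mp"
    echo "vgchange -an $vg"
}

# Example call, assuming the VG behind /dev/mapper/DG1-test is named DG1:
#   plan_stop /dev/mapper/DG1-test /export/appl/test DG1
```

With the mounts shown earlier, the plan would unmount /appl/test first, then /export/appl/test, and deactivate the vg last.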

Attached is a copy that we at Fannie Mae use with no problems. The inconvenience for us is that with each and every patch upgrade or major version release we need to redo the agent modification. I believe this small change deserves to be part of the original code. Any thoughts?

Vladimir Yanakiev
Unix Engineer, Hosting & Engineering Services - Solution Engineering Compute
Phone: 703-833-3770 (direct) | 571-246-1946 (mobile)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Filesystem
Type: application/octet-stream
Size: 27538 bytes
Desc: Filesystem
URL: <https://lists.clusterlabs.org/pipermail/developers/attachments/20210422/cc1ec825/attachment-0001.obj>

