[ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.

yuta takeshita y.takeshita0311 at gmail.com
Thu Jan 14 02:20:19 EST 2016


Hello.

I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
When the nfsd processes are lost due to an unexpected failure, nfsserver_monitor()
doesn't detect it, so no failover is executed.

I am using the RA below (but the problem may exist in the latest nfsserver RA
as well):
https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver

The cause is as follows.

1. After running "pkill -9 nfsd", "systemctl status nfs-server.service"
still returns 0.
2. nfsserver_monitor() decides based on the return value of "systemctl status
nfs-server.service".
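
To make point 2 concrete, here is a minimal sketch of a monitor that relies
only on the exit status of a status command. It is not the actual RA code:
status_based_monitor is a hypothetical name, and I am assuming the usual OCF
return codes (0 for running, 7 for not running).

```shell
#!/bin/sh
# Sketch only: status_based_monitor is a hypothetical name, not RA code.
OCF_SUCCESS=0       # OCF return code: resource is running
OCF_NOT_RUNNING=7   # OCF return code: resource is not running

# Runs the status command passed as arguments and maps its exit status
# to an OCF return code. Nothing else about the service is inspected.
status_based_monitor() {
    if "$@" >/dev/null 2>&1; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

Called as "status_based_monitor systemctl status nfs-server.service", this
keeps returning 0 for as long as systemctl does, whatever has happened to the
nfsd threads.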

----------------------------------------------------------------------
# ps ax | grep nfsd
25193 ?        S<     0:00 [nfsd4]
25194 ?        S<     0:00 [nfsd4_callbacks]
25197 ?        S      0:00 [nfsd]
25198 ?        S      0:00 [nfsd]
25199 ?        S      0:00 [nfsd]
25200 ?        S      0:00 [nfsd]
25201 ?        S      0:00 [nfsd]
25202 ?        S      0:00 [nfsd]
25203 ?        S      0:00 [nfsd]
25204 ?        S      0:00 [nfsd]
25238 pts/0    S+     0:00 grep --color=auto nfsd
#
# pkill -9 nfsd
#
# systemctl status nfs-server.service
● nfs-server.service - NFS server and services
   Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
   Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
  Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
  Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 25184 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service
(snip)
#
# echo $?
0
#
# ps ax | grep nfsd
25256 pts/0    S+     0:00 grep --color=auto nfsd
----------------------------------------------------------------------

This is because nfsd is a kernel process (kernel thread), and systemd does not
monitor whether such kernel processes are still running.

Is there a good way to handle this?
(When I use "pidof" instead of "systemctl status", the failover
succeeds.)
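
For reference, the "pidof" variant I tried looks roughly like the sketch
below. It is a simplified illustration, not a patch against the RA:
pidof_based_monitor is a hypothetical name, and the lookup command is a
parameter only so that it can be swapped out. I have not checked whether every
pidof implementation lists kernel threads.

```shell
#!/bin/sh
# Sketch only: pidof_based_monitor is a hypothetical name, not RA code.
OCF_SUCCESS=0       # OCF return code: resource is running
OCF_NOT_RUNNING=7   # OCF return code: resource is not running

# $1: process name to look for (defaults to nfsd)
# $2: lookup command (defaults to pidof)
# Returns OCF_SUCCESS if the lookup finds the name, OCF_NOT_RUNNING otherwise.
pidof_based_monitor() {
    name="${1:-nfsd}"
    finder="${2:-pidof}"
    if "$finder" "$name" >/dev/null 2>&1; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

Unlike the "systemctl status" check, the result here follows the nfsd threads
themselves, so it changes to 7 once they are gone.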

Regards,
Yuta Takeshita