[ClusterLabs] nfsserver_monitor() doesn't detect that the nfsd process is lost.

Thu Jan 14 05:04:09 EST 2016

Hi,

On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> Hello.
> 
> I have run into a problem with the nfsserver RA on RHEL 7.1 with systemd.
> When the nfsd process is lost due to an unexpected failure, nfsserver_monitor()
> doesn't detect it and doesn't trigger a failover.
> 
> I use the RA below (but this problem may occur with the latest nfsserver RA
> as well):
> https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
> 
> The cause is as follows.
> 
> 1. After running "pkill -9 nfsd", "systemctl status nfs-server.service"
> still returns 0.

I think that it should be "systemctl is-active". There has already
been a problem with "systemctl status" not being what one would
assume a status would be. Could you please test that and then open
either a pull request or an issue at
https://github.com/ClusterLabs/resource-agents
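A minimal sketch of a monitor check along these lines, assuming a POSIX shell. It is not the nfsserver RA's code, and the function name nfsd_monitor_sketch is hypothetical. The transcript shows the unit as "active (exited)", which systemd counts as active, so "systemctl is-active" on its own would likely still succeed after the nfsd threads are killed. The sketch therefore also looks for nfsd with pidof, the workaround reported in the original message.

```shell
#!/bin/sh
# Minimal sketch of a monitor check; NOT the actual nfsserver RA code.
# The OCF return codes are normally provided by ocf-shellfuncs; they are
# defined here only so that the sketch is self-contained.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical function name, for illustration only.
nfsd_monitor_sketch() {
    # Unit state as systemd sees it. For nfs-server.service this can stay
    # "active" (active (exited)) even after the nfsd kernel threads are gone.
    if ! systemctl is-active --quiet nfs-server.service; then
        return $OCF_NOT_RUNNING
    fi
    # Also check for the nfsd kernel threads themselves (the pidof workaround);
    # pidof exits non-zero when no process named nfsd exists.
    if ! pidof nfsd >/dev/null 2>&1; then
        return $OCF_NOT_RUNNING
    fi
    return $OCF_SUCCESS
}
```

Checking both conditions keeps systemd's view of the unit in the decision while still catching the case where the kernel threads have disappeared underneath it.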

Thanks,

Dejan

> 2. nfsserver_monitor() decides based on the return value of "systemctl
> status nfs-server.service".
> 
> ----------------------------------------------------------------------
> # ps ax | grep nfsd
> 25193 ?        S<     0:00 [nfsd4]
> 25194 ?        S<     0:00 [nfsd4_callbacks]
> 25197 ?        S      0:00 [nfsd]
> 25198 ?        S      0:00 [nfsd]
> 25199 ?        S      0:00 [nfsd]
> 25200 ?        S      0:00 [nfsd]
> 25201 ?        S      0:00 [nfsd]
> 25202 ?        S      0:00 [nfsd]
> 25203 ?        S      0:00 [nfsd]
> 25204 ?        S      0:00 [nfsd]
> 25238 pts/0    S+     0:00 grep --color=auto nfsd
> #
> # pkill -9 nfsd
> #
> # systemctl status nfs-server.service
> ● nfs-server.service - NFS server and services
>    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
>    Active: active (exited) since Thu 2016-01-14 11:35:39 JST; 1min 3s ago
>   Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
>   Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
>  Main PID: 25184 (code=exited, status=0/SUCCESS)
>    CGroup: /system.slice/nfs-server.service
> (snip)
> #
> # echo $?
> 0
> #
> # ps ax | grep nfsd
> 25256 pts/0    S+     0:00 grep --color=auto nfsd
> ----------------------------------------------------------------------
> 
> This is because the nfsd processes are kernel threads, and systemd does not
> monitor whether those kernel threads are still running.
> 
> Is there a good way to handle this?
> (When I use "pidof" instead of "systemctl status", the failover
> succeeds.)
> 
> Regards,
> Yuta Takeshita
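If pidof is not available, a similar check can be made by reading /proc/<pid>/comm directly, since kernel threads such as nfsd have a comm entry even though ps shows them in brackets. A minimal sketch follows, assuming a POSIX shell; count_procs_named is a hypothetical helper, not part of the RA. It matches the exact name "nfsd", so the nfsd4 and nfsd4_callbacks threads seen in the ps output are not counted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the nfsserver RA): print how many processes
# have a command name exactly equal to NAME, by reading /proc/<pid>/comm.
# Usage: count_procs_named NAME [PROC_ROOT]
# PROC_ROOT defaults to /proc and can be replaced, e.g. for testing.
count_procs_named() {
    cpn_name=$1
    cpn_root=${2:-/proc}
    cpn_count=0
    for cpn_file in "$cpn_root"/[0-9]*/comm; do
        # Skip the unexpanded glob pattern and processes that exited meanwhile.
        [ -r "$cpn_file" ] || continue
        cpn_comm=$(cat "$cpn_file" 2>/dev/null) || continue
        if [ "$cpn_comm" = "$cpn_name" ]; then
            cpn_count=$((cpn_count + 1))
        fi
    done
    echo "$cpn_count"
}
```

For example, `[ "$(count_procs_named nfsd)" -gt 0 ]` succeeds only while at least one nfsd thread exists.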

> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org