[ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.
Dejan Muhamedagic
dejanmm at fastmail.fm
Thu Jan 14 10:16:50 UTC 2016
On Thu, Jan 14, 2016 at 11:04:09AM +0100, Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> > Hello.
> >
> > I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
> > When the nfsd process is lost due to an unexpected failure,
> > nfsserver_monitor() doesn't detect it and no failover is executed.
> >
> > I use the RA below (but this problem may also occur with the latest
> > nfsserver RA):
> > https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
> >
> > The cause is as follows.
> >
> > 1. After executing "pkill -9 nfsd", "systemctl status nfs-server.service"
> > still returns 0.
>
> I think that it should be systemctl is-active. Already had a
> problem with systemctl status, well, not being what one would
> assume status would be. Can you please test that and then open
> either a pull request or issue at
> https://github.com/ClusterLabs/resource-agents
I already made a pull request:
https://github.com/ClusterLabs/resource-agents/pull/741
Please test if you find time.
Thanks for reporting!
Dejan
> Thanks,
>
> Dejan
>
> > 2. nfsserver_monitor() decides based on the return value of "systemctl status
> > nfs-server.service".
> >
> > ----------------------------------------------------------------------
> > # ps ax | grep nfsd
> > 25193 ? S< 0:00 [nfsd4]
> > 25194 ? S< 0:00 [nfsd4_callbacks]
> > 25197 ? S 0:00 [nfsd]
> > 25198 ? S 0:00 [nfsd]
> > 25199 ? S 0:00 [nfsd]
> > 25200 ? S 0:00 [nfsd]
> > 25201 ? S 0:00 [nfsd]
> > 25202 ? S 0:00 [nfsd]
> > 25203 ? S 0:00 [nfsd]
> > 25204 ? S 0:00 [nfsd]
> > 25238 pts/0 S+ 0:00 grep --color=auto nfsd
> > #
> > # pkill -9 nfsd
> > #
> > # systemctl status nfs-server.service
> > ● nfs-server.service - NFS server and services
> > Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor
> > preset: disabled)
> > Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
> > Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited,
> > status=0/SUCCESS)
> > Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited,
> > status=0/SUCCESS)
> > Main PID: 25184 (code=exited, status=0/SUCCESS)
> > CGroup: /system.slice/nfs-server.service
> > (snip)
> > #
> > # echo $?
> > 0
> > #
> > # ps ax | grep nfsd
> > 25256 pts/0 S+ 0:00 grep --color=auto nfsd
> > ----------------------------------------------------------------------
> >
> > This is because the nfsd process is a kernel process, and systemd does not
> > monitor the state of a running kernel process.
> >
> > Is there a good way to handle this?
> > (When I use "pidof" instead of "systemctl status", the failover is
> > successful.)
> >
> > Regards,
> > Yuta Takeshita
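
The process-based check mentioned in the report above (looking for nfsd itself instead of trusting the unit state) could look roughly like the following. This is only a minimal sketch, not the code of the shipped nfsserver RA and not necessarily what the pull request does. The function names process_is_running and nfsd_monitor_sketch are hypothetical, and pgrep is used here in place of pidof as an assumption. The return codes 0 and 7 are the standard OCF_SUCCESS and OCF_NOT_RUNNING values.

```shell
#!/bin/sh
# Standard OCF return codes used by resource agents.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical helper: succeeds only if at least one process whose
# name is exactly "$1" exists (pgrep also sees kernel threads).
process_is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Hypothetical monitor sketch: look for the nfsd threads directly
# instead of relying on the exit code of "systemctl status".
# The process name defaults to "nfsd" and can be overridden for testing.
nfsd_monitor_sketch() {
    if process_is_running "${1:-nfsd}"; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

Called without arguments, nfsd_monitor_sketch returns 0 while nfsd threads exist and 7 once they are gone (for example after "pkill -9 nfsd"), which is the condition the cluster needs in order to trigger recovery.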
>
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > http://clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org