[ClusterLabs] NFS Failover - client hangs

Tue Jun 23 17:03:43 UTC 2015

OK, I ran some tests and can now be more precise.
My configuration:
NODE1 192.168.122.143 (rep net 10.1.0.2), CentOS 6.5 64-bit
NODE2 192.168.122.63 (rep net 10.1.0.3), CentOS 6.5 64-bit
Both nodes are VMs, so on my host I have virbr0 at 192.168.122.1
and eth0 at 192.168.1.45 (the NFS ACL is OK!).

Failover simulation: I put one node in standby, so all resources are
migrated to the other node. Is this OK?

Following the Red Hat tutorial, I run tcpdump while simulating a
failover, but I cannot see any NFS4ERR_GRACE error. I think the NFS grace
and lease times are OK, because the values in /proc/fs/nfsd/nfsv4leasetime
and /proc/fs/nfsd/nfsv4gracetime are correct. I do see a lot of TCP Dup ACK
packets. Could this be a problem related to the environment? Should I test
this configuration on physical nodes?
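
For reference, this is roughly how the two values can be checked on the
active node. It is only a sketch: the /proc paths are the ones mentioned
above, the helper name read_nfsd_seconds is made up for illustration, and
it assumes each file holds a single integer number of seconds.

```python
from pathlib import Path

# Paths mentioned above; they are only present while nfsd is running.
LEASE_FILE = "/proc/fs/nfsd/nfsv4leasetime"
GRACE_FILE = "/proc/fs/nfsd/nfsv4gracetime"


def read_nfsd_seconds(path):
    """Return the integer stored in `path`, or None if it cannot be read.

    Hypothetical helper: assumes the file contains one integer (seconds).
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for label, path in (("lease", LEASE_FILE), ("grace", GRACE_FILE)):
        print(f"{label} time: {read_nfsd_seconds(path)} (from {path})")
```

Running it on both nodes before and after a failover would show whether
the node that takes over really uses the lowered values.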

I also noticed another strange behavior:
When I simulate a node going down, I run "time ls" on the client. If the
client mounts from NODE1 (the active server) and this node goes down,
"time ls" takes a few seconds. If I then simulate another failure (NODE2),
"time ls" takes 2 to 5 minutes. The behavior is the same if the client
first mounts from NODE2 (the active server): the first failover takes less
time than the second.
Is this simulation wrong?
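
To make the "time ls" measurement repeatable, the listing can be timed in
a loop while the failover runs. This is only a sketch of that idea, not
the exact procedure I used: the mount point /mnt/nfs and the helper name
time_listing are placeholders. It assumes that on a hard mount the call
simply blocks until the server answers again, so the elapsed time
approximates the length of the hang.

```python
import os
import time


def time_listing(path):
    """Return (seconds, error) for one os.listdir() call on `path`."""
    start = time.monotonic()
    try:
        os.listdir(path)
        error = None
    except OSError as exc:
        error = exc
    return time.monotonic() - start, error


if __name__ == "__main__":
    # Placeholder mount point; replace it with the real client mount.
    mount_point = "/mnt/nfs"
    for _ in range(300):
        elapsed, error = time_listing(mount_point)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} listing took {elapsed:.2f}s error={error}")
        time.sleep(1)
```

Comparing the timestamps with the moment the node is put in standby shows
how long each failover blocks the client.
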
Can someone help me? Please, I need a truly HA NFS server.
Thanks,
MM

2015-06-22 12:18 GMT+02:00 Marco Marino <marino.mrc at gmail.com>:

> Following the solution proposed by Red Hat, I noticed that the resource
> agent cannot manage the NFSD_V4_LEASE and NFSD_V4_GRACE options in
> /etc/sysconfig/nfs.
> I manually changed the script
> /usr/lib/ocf/resource.d/heartbeat/nfsserver on both nodes, but the
> problem remains. How can I check whether NFS "understands" these
> parameters? Should I reduce some timeout in the exportfs resources or the
> nfsserver resource?
>
> Thanks,
> MM
>
> 2015-06-22 11:12 GMT+02:00 Michael Schwartzkopff <ms at sys4.de>:
>
>> On Monday, 22 June 2015, 10:51:16, Marco Marino wrote:
>> > Hi,
>> > I'm building an NFS server with DRBD and Pacemaker on CentOS 6.5 and I
>> > have some questions about failover. In my installation, after a
>> > simulated failover, clients hang for a random time (between a few
>> > seconds and 140 seconds) before commands like "ls" or "touch" become
>> > responsive. This also happens if I use nfsvers=3 on the clients. Why
>> > does this happen? How can I handle this case to reduce the delay?
>> > Chapter 11 of the guide on the LINBIT site ("Nfs on rhel 6") describes
>> > some failover tests, and according to it failover should work without
>> > this kind of problem.
>> >
>> > Thanks,
>> > MM
>>
>> It looks like your server is waiting for the lease / grace timeout.
>> Please see:
>>
>> https://access.redhat.com/solutions/42868
>>
>> or google for "nfsv4 lease timeout".
>>
>> The grace / lease timeout options can be configured as agent parameters.
>> Lower them according to your needs.
>>
>> Kind regards,
>>
>> Michael Schwartzkopff
>>
>> --
>> [*] sys4 AG
>>
>> http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
>> Franziskanerstraße 15, 81669 München
>>
>> Registered office: München, Amtsgericht München: HRB 199263
>> Executive board: Patrick Ben Koetter, Marc Schiffbauer
>> Chairman of the supervisory board: Florian Kirstein