[ClusterLabs] Two Node NFS doesn't failover with hardware glitches

Daniel O'Brien Daniel.OBrien at slicedtech.com.au
Wed Apr 8 02:38:20 EDT 2015


> -----Original Message-----
> From: David Vossel [mailto:dvossel at redhat.com]
> Sent: Wednesday, 8 April 2015 4:16 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Two Node NFS doesn't failover with hardware
> glitches
> 
> 
> 
> ----- Original Message -----
> > Another behavior that was undesirable was the almost 20-minute delay
> > in shutting down the corosync+pacemaker services on the primary node
> > to force a failover. This left the NFS clients with stale connections
> > that could only be cleared by restarting the client machines
> > (web servers).
> 
> Yep, I've seen this. This is typically a result of the floating IP address
> becoming available before the exports after a failover.
> 
> The startup order should be:
> 
> 1. mount shared storage
> 2. nfs server
> 3. exports
> 4. floating IP.
> 
> 
> Here are some slides that outline how I'd recommend deploying NFS
> active/passive. It is a little different from what you have deployed.
> https://github.com/davidvossel/phd/blob/master/doc/presentations/nfs-ap-overview.pdf
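
That ordering helps a lot. To check my understanding, below is roughly how I was
planning to express it in crm configure on our pair. It is only a sketch: the
resource names, shared device, mount point, export network and address are
placeholders from my own notes, not anything taken from your slides.

  # Members of a group start in the listed order and stop in reverse,
  # which should give the storage -> nfsd -> exports -> floating IP sequence.
  primitive nfs-storage ocf:heartbeat:Filesystem \
      params device=/dev/sdb1 directory=/mnt/nfsshare fstype=ext4
  primitive nfs-daemon ocf:heartbeat:nfsserver \
      params nfs_shared_infodir=/mnt/nfsshare/nfsinfo
  primitive nfs-export ocf:heartbeat:exportfs \
      params clientspec=192.168.1.0/24 options=rw,sync \
      directory=/mnt/nfsshare/exports fsid=1
  primitive nfs-ip ocf:heartbeat:IPaddr2 \
      params ip=192.168.1.100 cidr_netmask=24
  group nfs-group nfs-storage nfs-daemon nfs-export nfs-ip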

Great explanation there, thanks! Tangential question: to get the nfsnotify/lock recovery implemented in crm, is it better to use lsb:statd or ocf:heartbeat:nfsserver?

Reading the documentation on statd, it sounds as though it handles what to do in the case of a reboot. Do you know if the two can be used in conjunction?
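
In case it helps frame the question, this is what I had pencilled in for the
lock recovery piece, assuming ocf:heartbeat:nfsnotify is the agent meant by
"nfsnotify" here and that it is available in my resource-agents build; the
source_host value is just a placeholder matching the floating IP in the sketch
above:

  # Sends the NSM reboot notification from the floating IP after a failover so
  # clients know to reclaim their locks. I would append it to the end of the
  # nfs-group above so it only runs once the IP is up.
  primitive nfs-notify ocf:heartbeat:nfsnotify \
      params source_host=192.168.1.100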

--Dan




