[ClusterLabs] Two Node NFS doesn't failover with hardware glitches
David Vossel
dvossel at redhat.com
Tue Apr 7 18:15:51 UTC 2015
----- Original Message -----
> Hello All,
>
> First post to the list and a newbie to HA - please be patient with me!
>
> We have a very simple 2 node NFS HA setup using Linbit's recipe. It's been in
> use since September of 2014. Initial testing showed successful failover to
> the other node and it's been left alone since that time.
>
> Two weeks ago, the primary node was hit by two issues at once. First, the
> primary RAID 1 HDD for the OS began to fail, and this masked a second
> issue: the Adaptec controller card (a RAID 10 array for the NFS mounts)
> started to throw errors as well, giving spurious access to files.
>
> Amazingly, the entire node continued to operate, but that's really not the
> behavior I was expecting. The logs show (I'll post the relevant logs by
> Friday if they are needed - the machine is currently down) several attempts
> to cut over to the other node, but the NFS mounts would not release. It
> eventually led to a split-brain condition.
>
> Another undesirable behavior was the almost 20-minute delay in shutting
> down the corosync+pacemaker services on the primary node to force a
> failover. This left the NFS clients with stale connections that could only
> be cleared by restarting the client machines (web servers).
Yep, I've seen this. This is typically a result of the floating IP address
becoming available before the exports after a failover. The startup order
should be:
1. mount shared storage
2. nfs server
3. exports
4. floating IP
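A minimal crm shell sketch of that ordering, reusing the resource names from
the configuration quoted below (the group name g_nfs_stack is just an
illustration, and this is a sketch rather than a drop-in replacement for the
existing g_fs/g_nfs groups and constraints):

    # a group starts its members left to right and stops them in reverse,
    # so one group captures both the startup and the shutdown ordering
    group g_nfs_stack p_fs_user_assets p_lsb_nfsserver \
            p_exportfs_root p_exportfs_user_assets p_ip_nfs
    colocation c_nfs_with_drbd inf: g_nfs_stack ms_drbdr0_nfs:Master
    order o_drbd_before_nfs inf: ms_drbdr0_nfs:promote g_nfs_stack:start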
Here are some slides that outline how I'd recommend deploying NFS
active/passive. It is a little different from what you have deployed.
https://github.com/davidvossel/phd/blob/master/doc/presentations/nfs-ap-overview.pdf
-- David
> Restarting the rpcbind, autofs, and nfs services wasn't enough to clear
> the problem.
>
> I've done quite a bit of digging to understand more of the issue. One item is
> adjusting the /proc/nfsv4leasetime to 10 seconds and the nfs.conf
> nfsv4gracetime setting to 10 seconds.
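For reference, a sketch of that tuning on the server side, assuming the usual
nfsd proc interface (the paths are illustrative, not taken from the setup
above - verify what a CentOS 6 kernel actually exposes):

    # run as root on the NFS server node(s); values are in seconds and the
    # new lease generally applies from the next nfsd start
    echo 10 > /proc/fs/nfsd/nfsv4leasetime
    # a matching nfsv4gracetime file only exists on some kernels, so check
    # for it before relying on it
    [ -w /proc/fs/nfsd/nfsv4gracetime ] && echo 10 > /proc/fs/nfsd/nfsv4gracetime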
>
> Still, this doesn't solve the problem of a resource hanging on the primary
> node. Everything I'm reading indicates fencing is required, yet the
> boilerplate configuration from Linbit has stonith disabled.
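For what it's worth, a hedged sketch of what enabling fencing could look like
in crm shell, assuming IPMI-capable management controllers and the
fence_ipmilan agent; the addresses and credentials are placeholders, not
details from this cluster:

    primitive p_fence_store1 stonith:fence_ipmilan \
            params pcmk_host_list=store-1.usync.us ipaddr=10.0.2.201 \
                   login=admin passwd=changeme lanplus=1 \
            op monitor interval=60s
    primitive p_fence_store2 stonith:fence_ipmilan \
            params pcmk_host_list=store-2.usync.us ipaddr=10.0.2.202 \
                   login=admin passwd=changeme lanplus=1 \
            op monitor interval=60s
    # keep each fence device off the node it is meant to kill
    location l_fence_store1 p_fence_store1 -inf: store-1.usync.us
    location l_fence_store2 p_fence_store2 -inf: store-2.usync.us
    property stonith-enabled=true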
>
> These units are running CentOS 6.5
> corosync 1.4.1
> pacemaker 1.1.10
> drbd
>
> Two questions then:
>
> 1. How do we handle cranky hardware issues to ensure a smooth failover?
> 2. What additional steps are needed to ensure the NFS mounts don't go stale
> on the clients?
>
>
>
> Below is the current output of crm configure show
>
> node store-1.usync.us \
> attributes standby=off maintenance=off
> node store-2.usync.us \
> attributes standby=off maintenance=on
> primitive p_drbdr0_nfs ocf:linbit:drbd \
> params drbd_resource=r0 \
> op monitor interval=31s role=Master \
> op monitor interval=29s role=Slave \
> op start interval=0 timeout=240s \
> op stop interval=0 timeout=120s
> primitive p_exportfs_root exportfs \
> params fsid=0 directory="/export" options="rw,sync,crossmnt" clientspec="10.0.2.0/255.255.255.0" wait_for_leasetime_on_stop=false \
> op start interval=0 timeout=240s \
> op stop interval=0 timeout=100s \
> meta target-role=Started
> primitive p_exportfs_user_assets exportfs \
> params fsid=1 directory="/export/user_assets" options="rw,sync,no_root_squash,mountpoint" clientspec="10.0.2.0/255.255.255.0" wait_for_leasetime_on_stop=false \
> op monitor interval=30s \
> op start interval=0 timeout=240s \
> op stop interval=0 timeout=100s \
> meta is-managed=true target-role=Started
> primitive p_fs_user_assets Filesystem \
> params device="/dev/drbd0" directory="/export/user_assets" fstype=ext4 \
> op monitor interval=10s \
> meta target-role=Started is-managed=true
> primitive p_ip_nfs IPaddr2 \
> params ip=10.0.2.200 cidr_netmask=24 \
> op monitor interval=30s \
> meta target-role=Started
> primitive p_lsb_nfsserver lsb:nfs \
> op monitor interval=30s \
> meta target-role=Started
> primitive p_ping ocf:pacemaker:ping \
> params host_list=10.0.2.100 multiplier=1000 name=p_ping \
> op monitor interval=30 timeout=60
> group g_fs p_fs_user_assets \
> meta target-role=Started
> group g_nfs p_ip_nfs p_exportfs_user_assets \
> meta target-role=Started
> ms ms_drbdr0_nfs p_drbdr0_nfs \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started is-managed=true
> clone cl_ping p_ping
> location g_fs_on_connected_node g_fs \
> rule -inf: not_defined p_ping or p_ping lte 0
> colocation c_filesystem_with_drbdr0master inf: g_fs ms_drbdr0_nfs:Master
> colocation c_rootexport_with_nfsserver inf: p_exportfs_root p_lsb_nfsserver
> order o_drbdr0_before_filesystems inf: ms_drbdr0_nfs:promote g_fs:start
> order o_filesystem_before_nfsserver inf: g_fs p_lsb_nfsserver
> order o_nfsserver_before_rootexport inf: p_lsb_nfsserver p_exportfs_root
> order o_rootexport_before_exports inf: p_exportfs_root g_nfs
> property cib-bootstrap-options: \
> stonith-enabled=false \
> dc-version=1.1.10-14.el6_5.3-368c726 \
> cluster-infrastructure="classic openais (with plugin)" \
> expected-quorum-votes=2 \
> no-quorum-policy=ignore \
> maintenance-mode=false
> rsc_defaults rsc-options: \
> resource-stickiness=200
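Note that g_nfs above starts p_ip_nfs before p_exportfs_user_assets, so the
floating IP comes up ahead of the export. A sketch of the group reordered to
match the startup order recommended earlier (illustrative, keeping the
existing resource names):

    # group members start left to right, so list the export before the IP
    group g_nfs p_exportfs_user_assets p_ip_nfs \
            meta target-role=Started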
>
>
> #########################################
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>