[ClusterLabs] Two Node NFS doesn't failover with hardware glitches

Tue Apr 7 12:51:17 EDT 2015

Hello All,

First post to the list and a newbie to HA - please be patient with me!

We have a very simple 2 node NFS HA setup using Linbit's recipe. It's been in use since September of 2014. Initial testing showed successful failover to the other node and it's been left alone since that time. 

Two weeks ago, the primary node had two issues hit at once. First, the primary RAID 1 HDD for the OS began to fail and this actually masked a second issue where the adaptec controller card started to throw errors as well (a RAID10 setup for the NFS mounts) giving spurious access to files.

Amazingly the entire node continued to operate, but that's really not the behavior I was expecting. The logs show (I'll post the relevant logs by Friday if they are needed - the machine is currently down) several attempts to cut-over to the other node, but the NFS mounts would not release. It eventually lead to a split-brain condition. 

Another behavior that was undesirable was the almost 20 minute delay in shutting down the corosync+pacemaker services on the primary node to force a failover. This left the NFS clients with stale connections that were only able to be cleared by restarting the client machines (web servers.) Restarting rpcbind, autofs, and nfs services weren't enough to kick the problem.

I've done quite a bit of digging to understand more of the issue. One item is adjusting the /proc/nfsv4leasetime to 10 seconds and the nfs.conf nfsv4gracetime setting to 10 seconds.

Still, this doesn't solve for the problem of a resource hanging on the primary node. Everything I'm reading indicates fencing is required, yet the boilerplate configuration from Linbit has stonith disabled.

These units are running CentOS 6.5
corosync 1.4.1
pacemaker 1.1.10
drbd

Two questions then:

1. how do we handle cranky hardware issues to ensure a smooth failover?
2. what additional steps are needed to ensure the NFS mounts don't go stale on the clients?

Below is the current output of crm configure show

node store-1.usync.us \
	attributes standby=off maintenance=off
node store-2.usync.us \
	attributes standby=off maintenance=on
primitive p_drbdr0_nfs ocf:linbit:drbd \
	params drbd_resource=r0 \
	op monitor interval=31s role=Master \
	op monitor interval=29s role=Slave \
	op start interval=0 timeout=240s \
	op stop interval=0 timeout=120s
primitive p_exportfs_root exportfs \
	params fsid=0 directory="/export" options="rw,sync,crossmnt" clientspec="10.0.2.0/255.255.255.0" wait_for_leasetime_on_stop=false \
	op start interval=0 timeout=240s \
	op stop interval=0 timeout=100s \
	meta target-role=Started
primitive p_exportfs_user_assets exportfs \
	params fsid=1 directory="/export/user_assets" options="rw,sync,no_root_squash,mountpoint" clientspec="10.0.2.0/255.255.255.0" wait_for_leasetime_on_stop=false \
	op monitor interval=30s \
	op start interval=0 timeout=240s \
	op stop interval=0 timeout=100s \
	meta is-managed=true target-role=Started
primitive p_fs_user_assets Filesystem \
	params device="/dev/drbd0" directory="/export/user_assets" fstype=ext4 \
	op monitor interval=10s \
	meta target-role=Started is-managed=true
primitive p_ip_nfs IPaddr2 \
	params ip=10.0.2.200 cidr_netmask=24 \
	op monitor interval=30s \
	meta target-role=Started
primitive p_lsb_nfsserver lsb:nfs \
	op monitor interval=30s \
	meta target-role=Started
primitive p_ping ocf:pacemaker:ping \
	params host_list=10.0.2.100 multiplier=1000 name=p_ping \
	op monitor interval=30 timeout=60
group g_fs p_fs_user_assets \
	meta target-role=Started
group g_nfs p_ip_nfs p_exportfs_user_assets \
	meta target-role=Started
ms ms_drbdr0_nfs p_drbdr0_nfs \
	meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started is-managed=true
clone cl_ping p_ping
location g_fs_on_connected_node g_fs \
	rule -inf: not_defined p_ping or p_ping lte 0
colocation c_filesystem_with_drbdr0master inf: g_fs ms_drbdr0_nfs:Master
colocation c_rootexport_with_nfsserver inf: p_exportfs_root p_lsb_nfsserver
order o_drbdr0_before_filesystems inf: ms_drbdr0_nfs:promote g_fs:start
order o_filesystem_before_nfsserver inf: g_fs p_lsb_nfsserver
order o_nfsserver_before_rootexport inf: p_lsb_nfsserver p_exportfs_root
order o_rootexport_before_exports inf: p_exportfs_root g_nfs
property cib-bootstrap-options: \
	stonith-enabled=false \
	dc-version=1.1.10-14.el6_5.3-368c726 \
	cluster-infrastructure="classic openais (with plugin)" \
	expected-quorum-votes=2 \
	no-quorum-policy=ignore \
	maintenance-mode=false
rsc_defaults rsc-options: \
	resource-stickiness=200

#########################################