[ClusterLabs] Active-Active NFS cluster failover test - system hangs (VirtualBox)
ArekW
arkaduis at gmail.com
Fri Jul 14 04:49:24 EDT 2017
I'm still having trouble with a 2-node active-active configuration for
NFS. Standby and unstandby of either node seem to work fine, but NFS
hangs every time a node's state changes.
When both nodes are up and I do ls on client1 I get the directory listing.
Sometimes when a node is put into standby the ls on client1 is OK, but
sometimes it hangs and takes about a minute to respond. It seems
that after unstandby, the cluster stops nfs on the healthy node and then
starts it over on both. That could make client1 temporarily unable to
reach the nfs export. Sometimes (not always) when a hang occurs there is
a message in the logs: ERROR: nfs-mountd is not running. Maybe the problem
is not caused by nfsserver itself but by some problem with
ClusterIP. I've tried many configurations and still no luck.
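For clarity, the failover test I run is roughly this (the client-side
mount point /mnt/nfs is just an example name, not necessarily what I
typed):

```shell
# On a cluster node: put node2 into standby, forcing its resources off
pcs node standby nfsnode2

# On client1: this ls sometimes hangs for ~a minute before responding
ls /mnt/nfs

# Bring node2 back; this is when nfs also gets restarted on the healthy node
pcs node unstandby nfsnode2
```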
- logs after unstandby node2:
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Status: rpcbind
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Status: nfs-mountd
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: ERROR: nfs-mountd is not running
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Starting NFS server ...
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: rpcbind i: 1
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: v3locking: 0
Jul 14 09:49:22 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: nfs-mountd i: 1
Jul 14 09:49:22 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: nfs-idmapd i: 1
Jul 14 09:49:22 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: rpc-statd i: 1
Jul 14 09:49:22 nfsnode2 nfsserver(nfs)[9420]: INFO: NFS server started
- logs after standby node2:
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stopping NFS server ...
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: threads
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: rpc-statd
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: nfs-idmapd
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: nfs-mountd
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: rpcbind
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: rpc-gssd
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: umount (1/10 attempts)
Jul 14 09:54:38 nfsnode2 nfsserver(nfs)[23284]: INFO: NFS server stopped
I don't see anything else in the logs (I could paste them all, but it would be long).
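For reference, client1 mounts the export via the ClusterIP, something
like this (exact options from memory; since the export has fsid=0 it is
the NFSv4 pseudo-root, and /mnt/nfs is just my mount point):

```shell
# client1: mount the clustered NFSv4 export through the floating ClusterIP
mount -t nfs4 10.0.2.7:/ /mnt/nfs
```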
# pcs resource --full
Master: StorageClone
Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=2 clone-node-max=1
Resource: Storage (class=ocf provider=linbit type=drbd)
Attributes: drbd_resource=storage
Operations: start interval=0s timeout=240 (Storage-start-interval-0s)
promote interval=0s timeout=90 (Storage-promote-interval-0s)
demote interval=0s timeout=90 (Storage-demote-interval-0s)
stop interval=0s timeout=100 (Storage-stop-interval-0s)
monitor interval=60s (Storage-monitor-interval-60s)
Clone: ClusterIP-clone
Meta Attrs: clone-max=2 globally-unique=true clone-node-max=2
Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.0.2.7 cidr_netmask=32 clusterip_hash=sourceip
Meta Attrs: resource-stickiness=0
Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
monitor interval=5s (ClusterIP-monitor-interval-5s)
Clone: ping-clone
Resource: ping (class=ocf provider=pacemaker type=ping)
Attributes: host_list="nfsnode1 nfsnode2"
Operations: start interval=0s timeout=60 (ping-start-interval-0s)
stop interval=0s timeout=20 (ping-stop-interval-0s)
monitor interval=10 timeout=60 (ping-monitor-interval-10)
Clone: vbox-fencing-clone
Resource: vbox-fencing (class=stonith type=fence_vbox)
Attributes: ip=10.0.2.2 username=AW23321
identity_file=/root/.ssh/id_rsa host_os=windows
vboxmanage_path="/cygdrive/c/Program Files/Oracle/VirtualBox/VBoxManage"
pcmk_host_map=nfsnode1:centos1;nfsnode2:centos2 secure=true
inet4_only=true login_timeout=30
Operations: monitor interval=10 (vbox-fencing-monitor-interval-10)
Clone: dlm-clone
Meta Attrs: clone-max=2 clone-node-max=1 on-fail=fence
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
stop interval=0s timeout=100 (dlm-stop-interval-0s)
monitor interval=3s (dlm-monitor-interval-3s)
Clone: StorageFS-clone
Resource: StorageFS (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd1 directory=/mnt/drbd fstype=gfs2
Operations: start interval=0s timeout=60 (StorageFS-start-interval-0s)
stop interval=0s timeout=60 (StorageFS-stop-interval-0s)
monitor interval=20 timeout=40 (StorageFS-monitor-interval-20)
Clone: nfs-group-clone
Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
Group: nfs-group
Resource: nfs (class=ocf provider=heartbeat type=nfsserver)
Attributes: nfs_ip=10.0.2.7 nfs_no_notify=true
Operations: start interval=0s timeout=40 (nfs-start-interval-0s)
stop interval=0s timeout=20s (nfs-stop-interval-0s)
monitor interval=10s (nfs-monitor-interval-10s)
Resource: nfs-export (class=ocf provider=heartbeat type=exportfs)
Attributes: clientspec=10.0.2.0/255.255.255.0
options=rw,sync,no_root_squash directory=/mnt/drbd/nfs fsid=0
Operations: start interval=0s timeout=40 (nfs-export-start-interval-0s)
stop interval=0s timeout=120 (nfs-export-stop-interval-0s)
monitor interval=10s (nfs-export-monitor-interval-10s)
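In case it helps, this is roughly how I created the nfsserver resource
and its clone (reconstructed from the config above, so the option
spelling may differ slightly from what I actually typed):

```shell
# nfsserver resource, grouped with the exportfs resource
pcs resource create nfs ocf:heartbeat:nfsserver \
    nfs_ip=10.0.2.7 nfs_no_notify=true \
    op monitor interval=10s --group nfs-group

# clone the group so it runs on both nodes
pcs resource clone nfs-group clone-max=2 clone-node-max=1 interleave=true
```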
# pcs constraint
Location Constraints:
Ordering Constraints:
promote StorageClone then start StorageFS-clone (kind:Mandatory)
start dlm-clone then start StorageFS-clone (kind:Mandatory)
start StorageFS-clone then start nfs-group-clone (kind:Mandatory)
Colocation Constraints:
StorageFS-clone with StorageClone (score:INFINITY) (with-rsc-role:Master)
StorageFS-clone with dlm-clone (score:INFINITY)
StorageFS-clone with nfs-group-clone (score:INFINITY)