[ClusterLabs] Active-Active NFS cluster failover test - system hangs (VirtualBox)
ArekW
arkaduis at gmail.com
Fri Jul 14 04:49:24 EDT 2017
I'm still having trouble with a 2-node active-active configuration for
NFS. Standby and unstandby of either node seem to work fine, but NFS
hangs every time a node's state changes.
When both nodes are up and I do ls on client1 I get the directory listing.
Sometimes when a node is put into standby the ls on client1 is OK, but
sometimes it hangs and takes about a minute to respond. It seems
that after unstandby, the cluster stops nfs on the healthy node and then
starts it over on both. That could make client1 temporarily unable to
reach the nfs export. Sometimes (not always) when a hang occurs there is
a message in the logs: ERROR: nfs-mountd is not running. Maybe the problem
is not caused by nfsserver itself but by some problem with
ClusterIP. I've tried many configurations and still no luck.
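For clarity, the failover test I run is roughly this (the client-side
mount point /mnt/nfs is just an example name, not necessarily what I
typed):

```shell
# On a cluster node: put node2 into standby, forcing its resources off
pcs node standby nfsnode2

# On client1: this ls sometimes hangs for ~a minute before responding
ls /mnt/nfs

# Bring node2 back; this is when nfs also gets restarted on the healthy node
pcs node unstandby nfsnode2
```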
- logs after unstandby node2:
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Status: rpcbind
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Status: nfs-mountd
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: ERROR: nfs-mountd is not running
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Starting NFS server ...
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: rpcbind i: 1
Jul 14 09:49:21 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: v3locking: 0
Jul 14 09:49:22 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: nfs-mountd i: 1
Jul 14 09:49:22 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: nfs-idmapd i: 1
Jul 14 09:49:22 nfsnode2 nfsserver(nfs)[9420]: INFO: Start: rpc-statd i: 1
Jul 14 09:49:22 nfsnode2 nfsserver(nfs)[9420]: INFO: NFS server started
- logs after standby node2:
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stopping NFS server ...
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: threads
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: rpc-statd
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: nfs-idmapd
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: nfs-mountd
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: rpcbind
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: rpc-gssd
Jul 14 09:54:36 nfsnode2 nfsserver(nfs)[23284]: INFO: Stop: umount (1/10 attempts)
Jul 14 09:54:38 nfsnode2 nfsserver(nfs)[23284]: INFO: NFS server stopped
I don't see anything else in the logs (I could paste them all, but it would be long).
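For reference, client1 mounts the export via the ClusterIP, something
like this (exact options from memory; since the export has fsid=0 it is
the NFSv4 pseudo-root, and /mnt/nfs is just my mount point):

```shell
# client1: mount the clustered NFSv4 export through the floating ClusterIP
mount -t nfs4 10.0.2.7:/ /mnt/nfs
```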
# pcs resource --full
Master: StorageClone
Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=2 clone-node-max=1
Resource: Storage (class=ocf provider=linbit type=drbd)
Attributes: drbd_resource=storage
Operations: start interval=0s timeout=240 (Storage-start-interval-0s)
promote interval=0s timeout=90 (Storage-promote-interval-0s)
demote interval=0s timeout=90 (Storage-demote-interval-0s)
stop interval=0s timeout=100 (Storage-stop-interval-0s)
monitor interval=60s (Storage-monitor-interval-60s)
Clone: ClusterIP-clone
Meta Attrs: clone-max=2 globally-unique=true clone-node-max=2
Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.0.2.7 cidr_netmask=32 clusterip_hash=sourceip
Meta Attrs: resource-stickiness=0
Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
monitor interval=5s (ClusterIP-monitor-interval-5s)
Clone: ping-clone
Resource: ping (class=ocf provider=pacemaker type=ping)
Attributes: host_list="nfsnode1 nfsnode2"
Operations: start interval=0s timeout=60 (ping-start-interval-0s)
stop interval=0s timeout=20 (ping-stop-interval-0s)
monitor interval=10 timeout=60 (ping-monitor-interval-10)
Clone: vbox-fencing-clone
Resource: vbox-fencing (class=stonith type=fence_vbox)
Attributes: ip=10.0.2.2 username=AW23321
identity_file=/root/.ssh/id_rsa host_os=windows
vboxmanage_path="/cygdrive/c/Program Files/Oracle/VirtualBox/VBoxManage"
pcmk_host_map=nfsnode1:centos1;nfsnode2:centos2 secure=true
inet4_only=true login_timeout=30
Operations: monitor interval=10 (vbox-fencing-monitor-interval-10)
Clone: dlm-clone
Meta Attrs: clone-max=2 clone-node-max=1 on-fail=fence
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
stop interval=0s timeout=100 (dlm-stop-interval-0s)
monitor interval=3s (dlm-monitor-interval-3s)
Clone: StorageFS-clone
Resource: StorageFS (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd1 directory=/mnt/drbd fstype=gfs2
Operations: start interval=0s timeout=60 (StorageFS-start-interval-0s)
stop interval=0s timeout=60 (StorageFS-stop-interval-0s)
monitor interval=20 timeout=40 (StorageFS-monitor-interval-20)
Clone: nfs-group-clone
Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
Group: nfs-group
Resource: nfs (class=ocf provider=heartbeat type=nfsserver)
Attributes: nfs_ip=10.0.2.7 nfs_no_notify=true
Operations: start interval=0s timeout=40 (nfs-start-interval-0s)
stop interval=0s timeout=20s (nfs-stop-interval-0s)
monitor interval=10s (nfs-monitor-interval-10s)
Resource: nfs-export (class=ocf provider=heartbeat type=exportfs)
Attributes: clientspec=10.0.2.0/255.255.255.0
options=rw,sync,no_root_squash directory=/mnt/drbd/nfs fsid=0
Operations: start interval=0s timeout=40 (nfs-export-start-interval-0s)
stop interval=0s timeout=120 (nfs-export-stop-interval-0s)
monitor interval=10s (nfs-export-monitor-interval-10s)
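In case it helps, this is roughly how I created the nfsserver resource
and its clone (reconstructed from the config above, so the option
spelling may differ slightly from what I actually typed):

```shell
# nfsserver resource, grouped with the exportfs resource
pcs resource create nfs ocf:heartbeat:nfsserver \
    nfs_ip=10.0.2.7 nfs_no_notify=true \
    op monitor interval=10s --group nfs-group

# clone the group so it runs on both nodes
pcs resource clone nfs-group clone-max=2 clone-node-max=1 interleave=true
```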
# pcs constraint
Location Constraints:
Ordering Constraints:
promote StorageClone then start StorageFS-clone (kind:Mandatory)
start dlm-clone then start StorageFS-clone (kind:Mandatory)
start StorageFS-clone then start nfs-group-clone (kind:Mandatory)
Colocation Constraints:
StorageFS-clone with StorageClone (score:INFINITY) (with-rsc-role:Master)
StorageFS-clone with dlm-clone (score:INFINITY)
StorageFS-clone with nfs-group-clone (score:INFINITY)