[Pacemaker] nfs4 cluster fail-over stops working once I introduce ipaddr2 resource
Dennis Jacobfeuerborn
dennisml at conversis.de
Fri Feb 14 01:50:10 UTC 2014
Hi,
I'm still working on my NFSv4 cluster, and things are working as
expected... as long as I don't add an IPaddr2 resource.
The DRBD, filesystem and exportfs resources work fine, and when I put the
active node into standby, everything fails over as expected.
Once I add a VIP as an IPaddr2 resource, however, I get monitor
failures on the p_exportfs_root resource.
I've attached the configuration, status and a log file.
The transition status is a snapshot taken a moment after I put nfs1
(192.168.100.41) into standby. It looks like stopping p_ip_nfs does
something to the p_exportfs_root resource, although I have no idea what
that could be.
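In case it helps anyone looking at the attached log, the interleaving of the
p_ip_nfs stop and the p_exportfs_root monitor failure can be pulled out with
something along these lines (path assuming the default cman/corosync log
location; the attached corosync.log is a tail of the same log):

  # show only the lines for the two resources in question, in order
  grep -E 'p_ip_nfs|p_exportfs_root' /var/log/cluster/corosync.log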
The final status is the state after the cluster has settled. The
fail-over finished, but the failed action is still present and cannot be
cleared with "crm resource cleanup p_exportfs_root".
The log is the result of a "tail -f" on corosync.log from just before I
issued "crm node standby nfs1" until the cluster had settled.
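For completeness, the sequence I run to reproduce this is roughly:

  crm node standby nfs1                  # trigger the fail-over
  crm_mon -1                             # status snapshots (or "crm status"); attached below
  crm resource cleanup p_exportfs_root   # does not clear the failed action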
Does anybody know what the issue could be here? At first I thought that
using a VIP from the same network as the cluster nodes might be the
problem, but when I change it to an IP in a different network
(192.168.101.43/24), the same thing happens.
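The VIP primitive itself is the one in the attached config; the second attempt
only swaps the address, roughly:

  primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
          params ip="192.168.101.43" cidr_netmask="24" \
          op monitor interval="30s"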
The moment I remove p_ip_nfs from the configuration again, fail-over back
and forth works without a hitch.
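"Removing" here means roughly the following (the exact crm shell invocation
may vary):

  crm resource stop p_ip_nfs
  crm configure edit g_nfs        # take p_ip_nfs out of the group
  crm configure delete p_ip_nfs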
Regards,
Dennis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync.log
Type: text/x-log
Size: 65132 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140214/82187b08/attachment-0003.bin>
-------------- next part --------------
node nfs1 \
        attributes standby="off"
node nfs2 \
        attributes standby="off"
primitive p_drbd_nfs ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15" role="Master" \
        op monitor interval="30" role="Slave"
primitive p_exportfs_data ocf:heartbeat:exportfs \
        params fsid="1" directory="/srv/nfs/data" options="rw,mountpoint,no_root_squash" clientspec="192.168.100.0/255.255.255.0" wait_for_leasetime_on_stop="true" \
        op monitor interval="30s" \
        op stop interval="0" timeout="20s"
primitive p_exportfs_root ocf:heartbeat:exportfs \
        params fsid="0" directory="/srv/nfs" options="rw,crossmnt" clientspec="192.168.100.0/255.255.255.0" \
        op monitor interval="10s" \
        op stop interval="0" timeout="20s"
primitive p_fs_data ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/srv/nfs/data" fstype="ext4" \
        op monitor interval="10s" \
        op stop interval="0" timeout="20s"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
        params ip="192.168.100.43" cidr_netmask="24" \
        op monitor interval="30s"
primitive p_lsb_nfsserver lsb:nfs \
        op monitor interval="30s"
group g_nfs p_fs_data p_exportfs_root p_exportfs_data p_ip_nfs
ms ms_drbd_nfs p_drbd_nfs \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone cl_lsb_nfsserver p_lsb_nfsserver
colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6_5.2-368c726" \
        cluster-infrastructure="cman" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1392341228" \
        maintenance-mode="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
-------------- next part --------------
Last updated: Fri Feb 14 01:26:36 2014
Last change: Fri Feb 14 01:22:53 2014 via crm_attribute on nfs2
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
8 Resources configured
Node nfs1: standby
Online: [ nfs2 ]
Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
     Masters: [ nfs2 ]
     Stopped: [ nfs1 ]
Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
     Started: [ nfs2 ]
     Stopped: [ nfs1 ]
Resource Group: g_nfs
     p_fs_data        (ocf::heartbeat:Filesystem):  Started nfs2
     p_exportfs_root  (ocf::heartbeat:exportfs):    Started nfs2
     p_exportfs_data  (ocf::heartbeat:exportfs):    Started nfs2
     p_ip_nfs         (ocf::heartbeat:IPaddr2):     Started nfs2
Failed actions:
    p_exportfs_root_monitor_10000 on nfs1 'not running' (7): call=337, status=complete, last-rc-change='Fri Feb 14 01:23:02 2014', queued=0ms, exec=0ms
-------------- next part --------------
Last updated: Fri Feb 14 01:44:07 2014
Last change: Fri Feb 14 01:43:56 2014 via crm_attribute on nfs2
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
8 Resources configured
Node nfs1: standby
Online: [ nfs2 ]
Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
     Masters: [ nfs1 ]
     Slaves: [ nfs2 ]
Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
     Started: [ nfs2 ]
     Stopped: [ nfs1 ]
Resource Group: g_nfs
     p_fs_data        (ocf::heartbeat:Filesystem):  Started nfs1
     p_exportfs_root  (ocf::heartbeat:exportfs):    FAILED nfs1
     p_exportfs_data  (ocf::heartbeat:exportfs):    Started nfs1
     p_ip_nfs         (ocf::heartbeat:IPaddr2):     Stopped
Failed actions:
    p_exportfs_root_monitor_10000 on nfs1 'not running' (7): call=485, status=complete, last-rc-change='Fri Feb 14 01:43:58 2014', queued=0ms, exec=0ms