[Pacemaker] Fail-over NFS Server (need cluster configuration check)

Sun Feb 26 21:32:08 EST 2012

On two machines (A and B) I've created three identical LVM
logical volumes (used as DRBD backing devices) called srv, home
and software.

The fs on all of them is ext4.

The home fs has quotas.

srv, home and software are exported via NFS.

A and B each also have an extra locally mounted fs with quotas
(data1 on A, data2 on B). data1 and data2 are exported via NFS
too, but they have NO DRBD backing device: they are just local
file systems.

Both A and B have a dhcp server installed, but it must run on
only one of them: the machine that has all three drbd fs in
primary mode.

A floating IP is used for mounting srv, software and home on all
NFS clients.

The cluster configuration I'd like to have should reproduce the
following scenario:

A: ( srv + home + software + IP + dhcp + nfs-server + quota-server )
B: ( nfs-server + quota-server )

or

A: ( nfs-server + quota-server )
B: ( srv + home + software + IP + dhcp + nfs-server + quota-server )

### Cluster Configuration ###

1) All ms_drbd must be in primary mode on the same host:

primitive p_drbd_home ocf:linbit:drbd \
  params drbd_resource="home" \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
primitive p_drbd_software ocf:linbit:drbd \
  params drbd_resource="software" \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
primitive p_drbd_srv ocf:linbit:drbd \
  params drbd_resource="srv" \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
ms ms_drbd_home p_drbd_home \
  meta master-max="1" master-node-max="1" \
  clone-max="2" clone-node-max="1" notify="true"
ms ms_drbd_software p_drbd_software \
  meta master-max="1" master-node-max="1" \
  clone-max="2" clone-node-max="1" notify="true"
ms ms_drbd_srv p_drbd_srv \
  meta master-max="1" master-node-max="1" \
  clone-max="2" clone-node-max="1" notify="true"
colocation co_ms_drbd_home_with_ms_drbd_srv_and_ms_drbd_software \
  inf: ms_drbd_home:Master ms_drbd_srv:Master ms_drbd_software:Master

Questions:

 - is the "colocation" definition correct/enough?

 - how to enforce a sequence of events such as: promote software first,
   then if everything went ok promote srv, then if everything went ok
   promote home? (I would need this behavior because... see questions at
   the end of point 2)
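
To make the second question concrete, this is roughly the kind of
constraint chain I mean. It is only an untested sketch: the constraint
names are made up for illustration, and it assumes that chaining
promote actions with mandatory (inf:) order constraints is the right
tool for this.

# hypothetical constraint names; promote software, then srv, then home
order o_ms_drbd_software_before_ms_drbd_srv \
  inf: ms_drbd_software:promote ms_drbd_srv:promote
order o_ms_drbd_srv_before_ms_drbd_home \
  inf: ms_drbd_srv:promote ms_drbd_home:promote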

2) Mounting srv, software and home fs + floating IP + dhcp server on the
   node hosting all drbd devices in primary mode:

primitive p_fs_home ocf:heartbeat:Filesystem \
  params device="/dev/drbd/by-res/home" \
  directory="/share/drbd/nfs/home" fstype="ext4" \
  options="noatime,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0" \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
primitive p_fs_software ocf:heartbeat:Filesystem \
  params device="/dev/drbd/by-res/software" \
  directory="/share/drbd/nfs/software" fstype="ext4" \
  options="noatime" \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
primitive p_fs_srv ocf:heartbeat:Filesystem \
  params device="/dev/drbd/by-res/srv" \
  directory="/share/drbd/nfs/srv" fstype="ext4" \
  options="noatime" \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
  params ip="192.168.0.50" cidr_netmask="24" iflabel="nfs" \
  op monitor interval="20"
primitive p_service_isc-dhcp-server lsb:isc-dhcp-server \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
group g_service_fs_ip_dhcp p_fs_srv p_fs_software p_fs_home \
  p_ip_nfs p_service_isc-dhcp-server
colocation co_ms_drbd_home_with_g_service_fs_ip_dhcp \
  inf: g_service_fs_ip_dhcp ms_drbd_home:Master
order o_g_service_fs_ip_dhcp_after_ms_drbd_home_promote \
  inf: ms_drbd_home:promote g_service_fs_ip_dhcp:start

Questions:

 - If I know that home is the last drbd device promoted into
   primary mode, then I'm ready to mount all fs, start the
   floating IP and dhcp server on the node where drbd home is
   in primary mode... are both colocation and order constraints
   correct? 
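
   If relying on home alone is not enough, an alternative might be to
   tie the group to every DRBD master explicitly, in addition to the
   two home constraints above. Again an untested sketch with
   hypothetical constraint names:

# hypothetical constraint names; the group follows the srv and software masters too
colocation co_g_service_fs_ip_dhcp_with_ms_drbd_srv \
  inf: g_service_fs_ip_dhcp ms_drbd_srv:Master
colocation co_g_service_fs_ip_dhcp_with_ms_drbd_software \
  inf: g_service_fs_ip_dhcp ms_drbd_software:Master
order o_g_service_fs_ip_dhcp_after_ms_drbd_srv_promote \
  inf: ms_drbd_srv:promote g_service_fs_ip_dhcp:start
order o_g_service_fs_ip_dhcp_after_ms_drbd_software_promote \
  inf: ms_drbd_software:promote g_service_fs_ip_dhcp:start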

3) nfs-server and quota-server must be started on both hosts
   once all filesystems are mounted:

primitive p_service_nfs-common lsb:nfs-common \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
primitive p_service_nfs-kernel-server lsb:nfs-kernel-server \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
primitive p_service_quota lsb:quota \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="240" \
  op monitor interval="20"
group g_service_nfs_quota p_service_nfs-common \
  p_service_nfs-kernel-server p_service_quota
clone cl_g_service_nfs_quota g_service_nfs_quota
order o_cl_g_service_nfs_quota_after_service_fs_ip_dhcp_start \
  inf: g_service_fs_ip_dhcp:start cl_g_service_nfs_quota

Questions:

  - Here I'm really lost... with this configuration my
    cluster does not behave properly (many error messages) once I
    put one of the two nodes in standby. Do you see anything weird
    here?
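
    One variant that might be worth testing is an advisory ordering
    (score 0) in place of the mandatory one. As far as I understand
    it, an advisory order only applies when both resources are being
    started in the same transition, so moving the group to the other
    node should not force the nfs/quota clone to stop as well. This
    is an untested sketch under that assumption, and it would replace
    the inf: order constraint above:

# assumption: score 0 makes the ordering advisory instead of mandatory
order o_cl_g_service_nfs_quota_after_service_fs_ip_dhcp_start \
  0: g_service_fs_ip_dhcp:start cl_g_service_nfs_quota:start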

###

I can post the error messages but I'd first like to make sure that
the cluster configuration is at least not that bad...

Thanks to all.

--matt