[ClusterLabs] Problems with pcs/corosync/pacemaker/drbd/vip/nfs

Ken Gaillot kgaillot at redhat.com
Tue Mar 15 17:18:49 UTC 2016


On 03/14/2016 12:47 PM, Todd Hebert wrote:
> Hello,
> 
> I'm working on setting up a test-system that can handle NFS failover.
> 
> The base is CentOS 7.
> I'm using ZVOL block devices out of ZFS to back DRBD replicated volumes.
> 
> I have four DRBD resources (r0, r1, r2, r3, which are /dev/drbd1, drbd2, drbd3, and drbd4 respectively)
> 
> These all have XFS filesystems on them that mount properly and serve content, etc.
> 
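For reference, one DRBD device plus its filesystem is usually expressed
in Pacemaker roughly like this on CentOS 7. This is only a sketch, and
the names (drbd_r0, fs_r0, /srv/nfs/r0) are placeholders; substitute
whatever your configuration actually uses:

    # placeholder resource/device/mount names; adapt to your setup
    pcs resource create drbd_r0 ocf:linbit:drbd \
        drbd_resource=r0 op monitor interval=60s
    pcs resource master ms_drbd_r0 drbd_r0 \
        master-max=1 master-node-max=1 clone-max=2 \
        clone-node-max=1 notify=true
    pcs resource create fs_r0 ocf:heartbeat:Filesystem \
        device=/dev/drbd1 directory=/srv/nfs/r0 fstype=xfs
    pcs constraint colocation add fs_r0 with master ms_drbd_r0 INFINITY
    pcs constraint order promote ms_drbd_r0 then start fs_r0

The colocation and ordering constraints are what keep each filesystem
on whichever node currently holds the DRBD Master role, so those are
worth comparing against the constraints in your paste.
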
> I tried using corosync/pacemaker/drbd/lsb:nfs-kernel-server on Ubuntu, and it would serve content on the primary server without issue.  Any attempt to fail over or migrate services... everything would say it migrated fine, and the filesystems would be mounted and readable/writable etc., but the NFS clients' access to them would "pause"
> 
> This appears to be an issue with the nfs-kernel-server in Ubuntu, where it simply would not recognise the NFS session information, which was on a replicated volume.
> 
> If the primary node is put back online, everything migrates back "perfect" and traffic that had "paused" on failover to the secondary system resumes, even if it's been sitting there for 15-20 minutes not working.
> 
> There is no difference in behaviour between offlining the primary node and migrating lsb:nfs-kernel-server to another node (by its primitive name, not as lsb:nfs-kernel-server, obviously)
> 
> If I create new connections into NFS while test-sanb is active, they work, only to "freeze" as above on an offline of, or a migration away from, test-sanb, so the symptoms are the same in both "directions"
> 
> ----
> 
> After not being able to get lsb:nfs-kernel-server working properly on Ubuntu, and reading similar stories from other users after a series of googles, I switched over to CentOS 7.
> On CentOS 7, instead of lsb:nfs-kernel-server, I am trying to use systemd:nfs-server, since CentOS 7 uses systemd rather than SysVinit for managing services.

I'm not very familiar with NFS in a cluster, but there is an
ocf:heartbeat:nfsserver resource agent in the resource-agents package.
OCF agents are generally preferable to lsb/systemd ones because they
give the cluster more detailed status information, and it looks like in
this case the RA runs some RPC-related commands that the system scripts
don't.

I'd give it a shot and see if it helps.
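
A minimal sketch of what that could look like with pcs. The filesystem
resource name (fs_r3) and the state directory path are placeholders for
whichever replicated volume holds /var/lib/nfs in your setup:

    # placeholder names; fs_r3 = the Filesystem resource for the volume
    # that carries the NFS state directory
    pcs resource create nfs-server ocf:heartbeat:nfsserver \
        nfs_shared_infodir=/srv/nfs/state/nfsinfo \
        op monitor interval=30s
    pcs constraint colocation add nfs-server with fs_r3 INFINITY
    pcs constraint order start fs_r3 then start nfs-server

nfs_shared_infodir points the server's state (the /var/lib/nfs
contents) at shared storage so that lock/state information moves with
the service, and if I remember right the agent also takes care of the
rpc_pipefs bind/unbind itself.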

> Pretty much everything in the configuration except lsb:nfs-kernel-server came right over.
> 
> Now, everything will run properly on the primary node (as was the case with Ubuntu), but...
> If I put the "test-sana" node into standby, first NFS stops, then the VIP stops, then the three NFS-shared filesystems get unmounted (perfect so far).
> Then... it appears that parts of the NFS service, either idmapd or rpcbind, haven't released their hold on the rpc_pipefs filesystem, so it's still mounted... it's mounted inside /var/lib/nfs, which is on the last drbd volume.
> Pacemaker, or some other element, detects that rpc_pipefs is still mounted, unmounts it, then unmounts /var/lib/nfs, which should clear the way for everything else to work... but that's not what happens.
> 
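When it gets stuck like that, it is worth checking on test-sana what is
actually still holding the mounts. Nothing cluster-specific, just:

    findmnt -R /var/lib/nfs            # is rpc_pipefs still mounted under it?
    fuser -vm /var/lib/nfs             # which processes keep the mount busy?
    systemctl status nfs-idmapd rpcbind rpc-statd

Also note that mandatory ordering constraints are symmetrical by
default, so "start the filesystem, then start NFS" also means "stop NFS
before unmounting the filesystem". If the daemons are only being
stopped after the unmount is attempted, a missing ordering constraint
is the first thing I would look for.
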
> At this point, the ms_drbd_r<N> resources should demote to "Secondary" on the primary node, allowing the secondary node to promote to "Primary" and services to start on "test-sanb", but instead the drbd processes on "test-sana" end up marked as "Stopped" and checking `cat /proc/drbd` shows that the volumes are still Primary/Secondary UpToDate/UpToDate on test-sana (and the opposite on test-sanb)
> 
> It takes AGES (several minutes) for things to reach this state.
> 
> They stay this way indefinitely.  If I manually demote DRBD resources on test-sana, they end up listed as "Master" in a "crm status" or "pcs status" again, and eventually the status changes to "Primary/Secondary" in /proc/drbd as well.
> 
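For the DRBD side, a few things are worth checking by hand when the ms
resource shows "Stopped" but /proc/drbd disagrees (r0 below is a
placeholder resource name):

    cat /proc/drbd                    # what the kernel actually thinks
    drbdadm role r0                   # per-resource role as DRBD sees it
    drbdadm secondary r0              # manual demote; only succeeds if
                                      # nothing holds the device open
    pcs resource cleanup ms_drbd_r0   # clear failed actions so Pacemaker
                                      # re-probes the resource

If the demote is failing because something (an unexported filesystem,
rpc_pipefs, a stray process) still has the device open, and there is no
fencing to recover the node, the cluster can end up sitting there
looking very much like what you describe. The "Failed Actions" section
at the bottom of "pcs status" should say which operation it gave up on.
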
> If I put node test-sana back online (node online test-sana) it takes a few seconds for services to start back up and serve content again.
> 
> Since I cannot get services to run on test-sanb at all thus far, I don't know if the symptoms would be the same in both directions.
> I can't find any differences in the two nodes that should account for this.
> 
> ---
> 
> In any case, what I need to arrive at is a working solution for NFS failover across two nodes.
> 
> I have several systems where I'm using just heartbeat to fail over an IP address, drbd, and nfs, but only for a single instance of drbd/nfs.
> 
> I cannot find any working examples for either Ubuntu 14.04 or CentOS 7 for this scenario.  (There are some out there for Ubuntu, but they do not appear to actually work with modern pacemaker et al.)
> 
> Does anyone have an example of working configurations for this?
> 
> My existing pacemaker configuration can be found here:  http://paste.ie/view/c766a4ff
> 
> As I mentioned, the configurations are nearly identical for both the Ubuntu 14.04 and CentOS 7 setups, and the hardware used is the same in both cases.
> 
> I also know that I do not have STONITH configured.  I do not have access to any fencing devices for a test system and have to rely on software only.
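
The lack of fencing matters here beyond best practice: when a resource
fails to stop, Pacemaker's normal recovery is to fence the node, and
without fencing configured it can only leave the resource blocked,
which matches the stuck state you describe. For a software-only option,
watchdog-based sbd fencing is worth a look. A fencing-less two-node
test setup normally at least has these properties set explicitly (shown
only to make the trade-off visible, not as a recommendation):

    pcs property set stonith-enabled=false
    pcs property set no-quorum-policy=ignore   # two-node cluster

Just be aware that running DRBD with no fencing at all is how
split-brain happens.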
