[Pacemaker] nfs4 cluster fail-over stops working once I introduce ipaddr2 resource

Dennis Jacobfeuerborn dennisml at conversis.de
Fri Feb 14 19:32:42 EST 2014

On 14.02.2014 23:51, Dennis Jacobfeuerborn wrote:
> On 14.02.2014 22:33, David Vossel wrote:
>> ----- Original Message -----
>>> From: "Dennis Jacobfeuerborn" <dennisml at conversis.de>
>>> To: "The Pacemaker cluster resource manager"
>>> <pacemaker at oss.clusterlabs.org>
>>> Sent: Thursday, February 13, 2014 11:18:04 PM
>>> Subject: Re: [Pacemaker] nfs4 cluster fail-over stops working once I
>>> introduce ipaddr2 resource
>>> On 14.02.2014 02:50, Dennis Jacobfeuerborn wrote:
>>>> Hi,
>>>> I'm still working on my NFSv4 cluster and things are working as
>>>> expected...as long as I don't add an IPAddr2 resource.
>>>> The DRBD, filesystem and exportfs resources work fine and when I put
>>>> the
>>>> active node into standby everything fails over as expected.
>>>> Once I add a VIP as an IPAddr2 resource, however, I seem to get monitor
>>>> problems with the p_exportfs_root resource.
>>>> I've attached the configuration, status and a log file.
>>>> The transition status is the status a moment after I take nfs1
>>>> offline. It looks like the stopping of p_ip_nfs does
>>>> something to the p_exportfs_root resource although I have no idea what
>>>> that could be.
>>>> The final status is the status after the cluster has settled. The
>>>> fail-over finished but the failed action is still present and cannot be
>>>> cleared with a "crm resource cleanup p_exportfs_root".
>>>> The log is the result of a "tail -f" on the corosync.log from the
>>>> moment
>>>> before I issued the "crm node standby nfs1" to when the cluster has
>>>> settled.
>>>> Does anybody know what the issue could be here? At first I thought that
>>>> using a VIP from the same network as the cluster nodes could be an
>>>> issue
>>>> but when I change this to use an IP in a different network
>>>> the same thing happens.
>>>> The moment I remove p_ip_nfs from the configuration again fail-over
>>>> back
>>>> and forth works without a hitch.
>>> So after a lot of digging I think I pinpointed the issue: A race between
>>> the monitoring and stop actions of the exportfs resource script.
>>> When "wait_for_leasetime_on_stop" is set the following happens for the
>>> stop action and in this specific order:
>>> 1. The directory is unexported
>>> 2. Sleep nfs lease time + 2 seconds
>>> The problem seems to be that during the sleep phase the monitoring
>>> action is still invoked and since the directory has already been
>>> unexported it reports a failure.
>>> Once I add enabled="false" to the monitoring action of the exportfs
>>> resource the problem disappears.
>>> The question is how to ensure that the monitoring action is not called
>>> while the stop action is still sleeping?
>>> Would it be a solution to create a lock file for the duration of the
>>> sleep and check for that lock file in the monitoring action?
>>> I'm not 100% sure if this analysis is correct because if monitoring
>> right, I doubt that is happening.
>> What happens if you put the IP before the NFS server?
>> group g_nfs p_ip_nfs p_fs_data p_exportfs_root p_exportfs_data
> The first thing I have now done is to simplify this a little more by
> removing the clone for the nfs-server and putting it first in the g_nfs
> group instead.
> This apparently fixed things, although I have no idea why that would
> make a difference.
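
For reference, the reordering described above would look roughly like this
in crm configure syntax (p_nfs_server is an assumed name for the uncloned
nfs-server primitive; the other resource names are from my configuration,
and the actual primitive definitions are omitted):

    # Group members start left to right and stop right to left, so the
    # nfs-server comes up first and the VIP last.
    group g_nfs p_nfs_server p_fs_data p_exportfs_root p_exportfs_data p_ip_nfs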

This was a mistake on my end. I forgot that I still had 
'enabled="false"' in the config for the monitor actions.

When I have a "watch -n 1 exportfs" running on the active node, I notice 
that the moment the stop action is called for the exportfs_data resource, 
the entries for both _root and _data are no longer exported.
Since at that point the stop action for _data is still riding out the 
lease time, the _root resource is still active; as a result its monitor 
action is still called and notices that the directory is no longer 
exported.

The question is why both exportfs entries disappear at the same time.
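
The lock-file idea raised earlier in the thread can be sketched in plain
shell. Everything here is illustrative, not the actual exportfs resource
agent: the lock path, the function names, and the 1-second sleep standing
in for "NFS lease time + 2 seconds" are all made up for the sketch.

```shell
#!/bin/sh
# Sketch of the lock-file guard: stop() takes a lock before unexporting,
# and monitor() treats "lock present" as still-stopping rather than failed.
LOCKFILE="${TMPDIR:-/tmp}/exportfs_root.stopping.$$"

monitor() {
    if [ -f "$LOCKFILE" ]; then
        echo "stopping"          # stop() is riding out the lease time
    else
        echo "running"
    fi
}

stop() {
    touch "$LOCKFILE"            # taken before the directory is unexported
    # exportfs -u ... would happen here, followed by the lease-time sleep:
    sleep 1                      # stand-in for nfs lease time + 2 seconds
    rm -f "$LOCKFILE"
}

stop &                           # run the stop action concurrently
sleep 0.2                        # give stop() time to take the lock
during=$(monitor)
wait                             # let stop() finish and release the lock
after=$(monitor)
echo "during=$during after=$after"   # prints during=stopping after=running
```

With this guard, a monitor invocation that lands inside the sleep window
reports "still stopping" instead of a hard failure; whether that is the
right OCF return code to map it to is exactly the open question above.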

> Also, before this the FAILED state only ever happened with the
> p_exportfs_root resource and never with the p_exportfs_data resource,
> and it seemed to fail even before the _data resource was stopped, which
> led me to the suspicion that it is the monitoring call that puts the
> resource into the FAILED state.
> The other bit of evidence was that when I had a "watch -n 1 exportfs"
> running, the exports disappeared long before the lease-time sleep
> expired. Looking at the script, I verified that the exports are first
> removed before the sleep happens.
> Can you confirm that pacemaker actually suspends monitoring calls for a
> resource *before* the stop action is called? I just want to make sure
> that I'm not making unwarranted assumptions and base my debugging on that.
>> Without drbd, I have a scenario I test for active/passive nfs server
>> here that works for me.
>> https://github.com/davidvossel/phd/blob/master/scenarios/nfs-basic.scenario
>> I'm using the actual nfsserver ocf script from the latest
>> resource-agent github branch.
> Interesting. I will take a closer look at that once I understand what is
> going wrong in my seemingly simple yet failing configuration.
> Regards,
>    Dennis
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
