[ClusterLabs] [Linux-HA] file system resource becomes inaccessible when any of the nodes goes down

Mon Jul 6 10:14:34 UTC 2015

On 07/06/2015 02:50 PM, Dejan Muhamedagic wrote:
> Hi,
>
> On Sun, Jul 05, 2015 at 09:13:56PM +0500, Muhammad Sharfuddin wrote:
>> SLES 11 SP 3 + online updates (pacemaker-1.1.11-0.8.11.70,
>> openais-1.1.4-5.22.1.7)
>>
>> It's a dual-primary DRBD cluster that mounts a file system resource
>> on both cluster nodes simultaneously (the file system type is ocfs2).
>>
>> Whenever either node goes down, the file system (/sharedata)
>> becomes inaccessible for exactly 35 seconds on the other
>> (surviving/online) node, and then becomes available again on the
>> online node.
>>
>> Please help me understand why the node that survives (remains
>> online) is unable to access the file system resource (/sharedata) for 35
>> seconds, and how I can fix the cluster so that the file system remains
>> accessible on the surviving node without any interruption or delay
>> (about 35 seconds in my case).
>>
>> By inaccessible, I mean that running "ls -l /sharedata" and
>> "df /sharedata" returns no output and does not give the
>> prompt back on the online node for exactly 35 seconds after the other
>> node goes offline.
>>
>> E.g. "node1" went offline at around 01:37:15, and the
>> /sharedata file system was then inaccessible between 01:37:35 and 01:38:18
>> on the online node, i.e. "node2".
> Before the failing node gets fenced you won't be able to use the
> ocfs2 filesystem. In this case, the fencing operation takes 40
> seconds:
So it's expected.
>> [...]
>> Jul  5 01:37:35 node2 sbd: [6197]: info: Writing reset to node slot node1
>> Jul  5 01:37:35 node2 sbd: [6197]: info: Messaging delay: 40
>> Jul  5 01:38:15 node2 sbd: [6197]: info: reset successfully delivered to node1
>> Jul  5 01:38:15 node2 sbd: [6196]: info: Message successfully delivered.
>> [...]
> You may want to reduce that sbd timeout.
OK, so would reducing the sbd timeout (or msgwait) provide
uninterrupted access to the ocfs2 file system on the surviving/online node,
or would it just minimize the downtime?
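
For reference, the 40 seconds can be read straight from the sbd log lines
quoted above: it is the gap between "Writing reset to node slot" and "reset
successfully delivered". A minimal sketch in Python that does the arithmetic
(the function names are my own, not part of sbd or pacemaker, and the year
2015 is assumed from the mail date because syslog timestamps omit it):

```python
from datetime import datetime

# Log lines copied from the sbd excerpt quoted above (node2's syslog).
LOG = """\
Jul  5 01:37:35 node2 sbd: [6197]: info: Writing reset to node slot node1
Jul  5 01:37:35 node2 sbd: [6197]: info: Messaging delay: 40
Jul  5 01:38:15 node2 sbd: [6197]: info: reset successfully delivered to node1
"""


def parse_ts(line, year=2015):
    """Parse the leading syslog timestamp; the year is an assumption."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def fencing_seconds(log):
    """Seconds between sbd writing the reset and confirming delivery."""
    start = end = None
    for line in log.splitlines():
        if "Writing reset to node slot" in line:
            start = parse_ts(line)
        elif "reset successfully delivered" in line:
            end = parse_ts(line)
    if start is None or end is None:
        raise ValueError("both sbd messages are needed to measure the delay")
    return int((end - start).total_seconds())


print(fencing_seconds(LOG))  # prints 40
```

The result matches the "Messaging delay: 40" line, and per Dejan's
explanation the ocfs2 mount stays blocked until that fencing step completes.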
>
> Thanks,
>
> Dejan
--
Regards,

Muhammad Sharfuddin