[ClusterLabs] [Linux-HA] Antw: Re: Antw: Re: file system resource becomes inaccessible when any of the node goes down
Muhammad Sharfuddin
M.Sharfuddin at nds.com.pk
Thu Jul 9 09:24:23 UTC 2015
On 07/09/2015 01:54 PM, Lars Marowsky-Bree wrote:
> On 2015-07-07T14:15:14, Muhammad Sharfuddin <M.Sharfuddin at nds.com.pk> wrote:
>
>> now msgwait timeout is set to 10s, and a delay/inaccessibility of 15 seconds
>> was observed. If a service (app, DB, file server) is installed and running
>> from the ocfs2 file system via the surviving/online node, then
>> wouldn't that service crash or go offline due to the
>> inaccessibility of the file system (even though it's ocfs2) while a member
>> node goes down?
> You're seeing a trade-off of using OCFS2. The semantics of the file
> system require all accessing nodes to be very closely synchronized (that
> is not optional), and that requires access to the fs to be paused
> during recovery. (See the CAP theorem.)
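
For reference, the SBD timeouts that bound this pause can be inspected and
re-initialized roughly as below (a sketch only; the device path is a
placeholder, and the usual rule of thumb is msgwait = 2 x watchdog, with the
cluster-wide stonith-timeout kept above msgwait). The observed ~15 seconds is
roughly failure detection plus fencing confirmation (msgwait) plus DLM/OCFS2
recovery.

    # show the watchdog and msgwait timeouts stored in the SBD header
    sbd -d /dev/disk/by-id/<sbd-device> dump

    # re-initialize the header with a 5s watchdog and 10s msgwait timeout
    # (only with the cluster stopped on all nodes)
    sbd -d /dev/disk/by-id/<sbd-device> -1 5 -4 10 create

    # keep the cluster-wide stonith-timeout above msgwait
    crm configure property stonith-timeout=20s
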
>
> The apps don't crash, they are simply blocked. (To them it looks like
> slow IO.)
>
> The same is true for DRBD in active/active mode; the block device is
> tightly synchronized, and this requires both nodes to be up, or cleanly
> reported as down.
>
>> If the cluster is configured to run two independent services, starting one
>> on node1 and the other on node2, while both services share the same file
>> system, /sharedata (ocfs2), then in case of a failure of one node, the
>> other/online node won't be able to
>> keep running its particular service, because the file system holding the
>> binaries/configuration/service is unavailable for at least around 15
>> seconds.
>>
>> I don't understand the advantage of the OCFS2 file system in such a setup.
> If that's your setup, indeed, you're not getting any advantages. OCFS2
> makes sense if you have services that indeed need access to the same
> file system and directory structure.
>
> If you have two independent services, or even services that are
> essentially node local, you're much better off using independent,
> separate file system mounts with XFS or extX.
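
A minimal sketch of that alternative in crm shell syntax (all names, devices,
and mount points below are made-up placeholders; app1 and app2 stand for the
two service primitives):

    primitive fs_app1 ocf:heartbeat:Filesystem \
        params device="/dev/vg1/lv_app1" directory="/srv/app1" fstype="xfs" \
        op monitor interval="20s"
    primitive fs_app2 ocf:heartbeat:Filesystem \
        params device="/dev/vg1/lv_app2" directory="/srv/app2" fstype="xfs" \
        op monitor interval="20s"
    # each service travels with its own local mount, so a node failure
    # only fails over that node's group and never pauses the other mount
    group grp_app1 fs_app1 app1
    group grp_app2 fs_app2 app2

Neither mount uses the DLM, so losing a node does not block I/O on the file
system the surviving node is using.
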
>
>
>
> Regards,
> Lars
>
Thanks for keeping this thread alive and for sharing the explanation.
Sorry, I didn't quite understand; at one point you wrote:
a - The apps don't crash, they are simply blocked.
while at another you wrote:
b - If that's your setup, indeed, you're not getting any advantages.
If the service configured to run on the node that survives / remains
online gets blocked (due to slow I/O) but
eventually becomes healthy again after, say, 30 seconds, then that is also
acceptable. E.g. a user is connected to the service
via a client app; the service is running on a node that will remain online,
and the other member goes down. That client app won't
get a response from the service for the next 35 seconds, but the queries
the client has executed, or the information that was being
uploaded when the other member went down, won't be lost, as
the service will resume shortly. Isn't that so?
--
Regards,
Muhammad Sharfuddin