[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

Wed Oct 19 04:50:22 EDT 2016

>>> Eric Ren <zren at suse.com> wrote on 19.10.2016 at 10:32 in message
<bd71c012-0832-6ad3-9958-5df2b7e69fa5 at suse.com>:
> Hi,
> 
> On 10/14/2016 03:17 PM, Ulrich Windl wrote:
>>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 13.10.2016 at 16:49 in message
>> <97fdafc7-7efe-41d8-99fa-20abb20506f6 at redhat.com>:
>>> On 10/13/2016 03:36 AM, Ulrich Windl wrote:
>>>> That's what I'm talking about: If 1 of 3 nodes is rebooting (or the cluster
>>>> is split-brain 1:2), the single node CANNOT continue due to lack of quorum,
>>>> while the remaining two nodes can. Is it still necessary to wait for
>>>> completion of stonith?
>>>
>>> If the 2 nodes have working communication with the 1 node, then the 1
>>> node will leave the cluster in an orderly way, and fencing will not be
>>> involved. In that case, yes, quorum is used to prevent the 1 node from
>>> starting services until it rejoins the cluster.
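A simplified sketch of the quorum rule behind the 1:2 split discussed above. It assumes one vote per node and a simple-majority threshold (more than half of the expected votes); the function names are hypothetical and this is a model, not the cluster stack's actual code:

```python
# Hypothetical model: every node has one vote; a partition is quorate
# only if it holds strictly more than half of the expected votes.
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is strictly more than half."""
    return expected_votes // 2 + 1


def has_quorum(partition_votes: int, expected_votes: int) -> bool:
    """True if a partition holding `partition_votes` may run services."""
    return partition_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # 3-node cluster split 1:2, as in the question.
    for side in (1, 2):
        print(f"partition with {side} of 3 nodes quorate: {has_quorum(side, 3)}")
```

With 3 nodes the threshold is 2, so the single node loses quorum and the pair keeps it. With 4 nodes a 2:2 split leaves neither side quorate.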
>> The $%&/@ problem of a root process having a file open on OCFS prevented the
>> clean unmount of the filesystem. I think newer versions of the RA now even
>> kill root processes.
>> Can anybody explain why root processes were excluded before?
> @Ken Gaillot
> Thanks for your explanation of the fencing issues.
> 
> @Ulrich Windl
> I didn't try the RAs you mentioned. But can the new version of the RA kill
> a root process that is doing I/O in D state?

Hi!

I'm not absolutely sure, but I always thought one of the major differences between HP-UX and Linux was that in Linux you could even kill processes that were waiting for I/O ;-) I'm not making any statement about what will happen to the I/O actually being performed...

Ulrich
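On Linux, a process's scheduler state is the field right after the command name in `/proc/<pid>/stat`; `D` means uninterruptible sleep, usually waiting on I/O. A minimal sketch for spotting such processes, assuming a standard procfs; the helper names are hypothetical, and the sketch only identifies D-state processes, it says nothing about whether a kill will take effect:

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split at the LAST ')' instead of on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]  # first field after ')'
    return pid, comm, state


def processes_in_d_state(proc_root="/proc"):
    """Yield (pid, comm) for processes currently in uninterruptible sleep ('D')."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if state == "D":
            yield pid, comm
```

Usage on a Linux node: `for pid, comm in processes_in_d_state(): print(pid, comm)`.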

> 
> Eric
>>
>>> However, if the 2 nodes lose communication with the 1 node, they cannot
>>> be sure it is functioning well enough to respect quorum. In this case,
>>> they have to fence it. DLM has to wait for the fencing to succeed to be
>>> sure the 1 node is not messing with shared resources.
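A simplified model of the two cases Ken describes: a node that leaves cleanly needs no fencing, while a node that simply disappears must be fenced before DLM lets lock recovery proceed. The enum and function names are hypothetical, and this is not dlm_controld's actual logic:

```python
from enum import Enum, auto


class PeerExit(Enum):
    CLEAN_LEAVE = auto()   # peer told the cluster it is leaving
    LOST_CONTACT = auto()  # peer vanished without an orderly leave


def must_fence(exit_kind: PeerExit) -> bool:
    """A peer that was lost, rather than having left, cannot be trusted to respect quorum."""
    return exit_kind is PeerExit.LOST_CONTACT


def dlm_may_recover(exit_kind: PeerExit, fencing_confirmed: bool) -> bool:
    """Hypothetical rule: recover the departed node's locks only once it
    can no longer touch shared storage."""
    if not must_fence(exit_kind):
        return True
    return fencing_confirmed


if __name__ == "__main__":
    print(dlm_may_recover(PeerExit.CLEAN_LEAVE, fencing_confirmed=False))
    print(dlm_may_recover(PeerExit.LOST_CONTACT, fencing_confirmed=False))
```

The second case is where the "waiting for fencing timeout" in the subject comes from: until fencing is confirmed, lock requests that depend on the lost node stay blocked.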
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org 
>>> http://clusterlabs.org/mailman/listinfo/users 
>>>
>>> Project Home: http://www.clusterlabs.org 
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>> Bugs: http://bugs.clusterlabs.org 