[ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

Eric Ren zren at suse.com
Thu Oct 13 03:48:56 EDT 2016


Hi,

On 10/11/2016 02:18 PM, Ulrich Windl wrote:
> >>>> emmanuel segura <emi2fast at gmail.com> wrote on 10.10.2016 at 16:49 in
>> message
>> <CAE7pJ3CBJR3pctT3N_jaMCXBuUGD3nta=yA8FZNbNfAifK3uXg at mail.gmail.com>:
>>
>
> Node h01 (old DC) was fenced at Oct 10 10:06:33
> Node h01 went down around Oct 10 10:06:37.
> DLM noticed that on node h05:
> Oct 10 10:06:44 h05 cluster-dlm[12063]: dlm_process_node: Removed inactive node 739512321: born-on=3180, last-seen=3208, this-event=3212, last-event=3208
> cLVM and OCFS noticed the event also:
> Oct 10 10:06:44 h05 ocfs2_controld[12147]: Sending notification of node 739512321 for "490B9FCAFA3D4B2F9A586A5893E00730"
> Oct 10 10:06:44 h05 ocfs2_controld[12147]: Notified for "490B9FCAFA3D4B2F9A586A5893E00730", node 739512321, status 0
>
> Similar on node h10 (new DC):
> Oct 10 10:06:44 h10 cluster-dlm[32150]: dlm_process_node: Removed inactive node 739512321: born-on=3180, last-seen=3208, this-event=3212, last-event=3208
> Oct 10 10:06:44 h10 ocfs2_controld[32271]:   notice: crm_update_peer_state: plugin_handle_membership: Node h01[739512321] - state is now lost (was member)
> Oct 10 10:06:44 h10 ocfs2_controld[32271]: node daemon left 739512321
> Oct 10 10:06:44 h10 ocfs2_controld[32271]: Sending notification of node 739512321 for "490B9FCAFA3D4B2F9A586A5893E00730"
>
> My point is this: For a resource that can only run exclusively on one node, it's important that the other node is down before taking action. But cLVM and OCFS2 resources can run concurrently on every node, so I don't see why every node virtually freezes until STONITH has completed.

OCFS2 uses DLM (fs/dlm in the kernel), and DLM uses the cpg service provided by corosync [1] to 
track node membership.
The membership only becomes stable after STONITH has completed.

[1] https://en.wikipedia.org/wiki/Corosync_Cluster_Engine
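To make the blocking behavior concrete, here is a toy model (not the real fs/dlm code; all names are illustrative) of the rule described above: from the moment cpg reports a node gone until fencing of that node is confirmed, the lockspace grants no new locks on any surviving node.

```python
# Toy model of DLM recovery gating -- NOT the real fs/dlm implementation.
# It only illustrates why every node's lockspace stalls between a
# membership change and STONITH completion.

class Lockspace:
    def __init__(self, members):
        self.members = set(members)
        self.pending_fence = set()  # nodes seen dead but not yet fenced

    def node_down(self, node):
        # corosync/cpg reports the node gone; recovery must wait for fencing
        self.members.discard(node)
        self.pending_fence.add(node)

    def fence_complete(self, node):
        # stonith confirms the node is really down; recovery can finish
        self.pending_fence.discard(node)

    def can_grant_locks(self):
        # lock traffic resumes only once no fence is outstanding
        return not self.pending_fence

ls = Lockspace(["h01", "h05", "h10"])
ls.node_down("h01")
print(ls.can_grant_locks())   # False: h05 and h10 are blocked, too
ls.fence_complete("h01")
print(ls.can_grant_locks())   # True: membership is stable again
```

This is why even nodes that never touched h01's locks appear frozen: the safety rule is global to the lockspace, not per-resource.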

> If you have a large cluster (maybe 100 nodes), OCFS will be unavailable most of the time if any node fails.
The upper limit is 32 nodes AFAIK. But I think it's unusual to see clusters with more than 3 nodes?

Assuming such a cluster exists, yes, recovery from a node failure would take correspondingly longer.
>
> When assuming node h01 still lived when communication failed, wouldn't quorum prevent h01 from doing anything with DLM and OCFS2 anyway?
Not sure I understand you correctly. By default, losing quorum will make DLM stop serving lockspace operations. 
See `man dlm_controld`:
```
--enable_quorum_lockspace 0|1
                enable/disable quorum requirement for lockspace operations
```
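For illustration, this requirement could be relaxed by passing that option to the daemon, or (assuming your dlm_controld build reads a configuration file, which newer releases do via /etc/dlm/dlm.conf) by setting it there. Note this weakens split-brain protection, so it is rarely advisable:

```
# /etc/dlm/dlm.conf -- assumption: this dlm_controld build reads dlm.conf.
# Allow lockspace operations to continue without quorum (use with care).
enable_quorum_lockspace=0
```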

Eric
>
> Regards,
> Ulrich
>
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>




