[ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout
emi2fast at gmail.com
Thu Oct 13 04:33:26 EDT 2016
If you want to reduce the multipath switching time when one controller goes down
2016-10-13 10:27 GMT+02:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>>>> Eric Ren <zren at suse.com> wrote on 13.10.2016 at 09:31 in message
> <e23ba209-fdc6-987e-db14-5c57b72c63d4 at suse.com>:
>> On 10/10/2016 10:46 PM, Ulrich Windl wrote:
>>> I observed an interesting thing: In a three-node cluster (SLES11 SP4) with
>> cLVM and OCFS2 on top, one node was fenced because the OCFS2 filesystem was
>> somehow busy on unmount. We have (mainly for paranoid reasons) an excessively
>> long fencing timeout for SBD: 180 seconds.
>>> While one node was actually reset immediately (the cluster was still waiting
>> for the fencing to "complete" through the timeout), the other nodes seemed to
>> freeze the filesystem. Thus I observed a read delay of more than 140 seconds
>> on one node; the other was also close to 140 seconds.
>> OCFS2 and cLVM both depend on DLM. The DLM daemon notifies them to
>> stop service (which means any cluster
>> locking request is blocked) during the fencing process.
>> So I'm wondering why it takes so long to finish the fencing process?
> As I wrote: with SBD this is paranoia, as fencing doesn't report back a status like "completed" or "failed". The fencing itself actually only needs a few seconds, but the timeout is 3 minutes; only then does the cluster believe that the node is down (our servers boot so slowly that they are not back up within three minutes either). Why three minutes? A write to a SCSI disk may be retried for up to one minute, and a read may also be retried for a minute. So with a bad SBD disk (or some strange transport problem) it could take two minutes until the receiving SBD gets the fencing command. If the timeout is too low, resources could be restarted before the node was actually fenced, causing data corruption.
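For readers following along: the timeouts SBD uses are stored in the device header and can be inspected with `sbd dump`. A minimal sketch of the usual sizing rule (the device path and the watchdog value here are placeholders for illustration, not the poster's actual configuration; the common convention is msgwait = 2 x watchdog, with Pacemaker's stonith-timeout set somewhat above msgwait):

```shell
# Inspect the timeouts written to the SBD header (path is a placeholder):
#   sbd -d /dev/disk/by-id/my-sbd-device dump
# shows "Timeout (watchdog)" and "Timeout (msgwait)", among others.

# Common sizing rule of thumb, matching the 180 s discussed above:
watchdog=90                                # seconds (example value)
msgwait=$((2 * watchdog))                  # msgwait = 2 x watchdog -> 180 s
stonith_timeout=$((msgwait + msgwait / 5)) # ~20% margin on top of msgwait
echo "msgwait=${msgwait} stonith-timeout=${stonith_timeout}"
```

The margin matters for exactly the reason given above: if stonith-timeout expires before the target node has certainly processed the poison pill, resources could be recovered on a node that is still alive.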
> P.S.: One common case where our SAN disks seem slow is an "online" firmware update, during which a controller may be down for 20 to 30 seconds. Multipathing is expected to switch to another controller within a few seconds; however, the commands multipath uses to check the disk's health are themselves SCSI commands that may hang for a while...
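On the multipath switching time raised at the top of the thread: failover latency is mostly governed by a few settings in /etc/multipath.conf. A hedged fragment (values are illustrative, not recommendations for this setup; the right numbers depend on the array and HBA):

```
defaults {
    polling_interval   5     # seconds between path checker runs
    fast_io_fail_tmo   5     # fail I/O on a dead transport after 5 s
                             # instead of waiting out SCSI retries
    dev_loss_tmo       600   # keep the device around long enough for
                             # a 20-30 s controller firmware update
    no_path_retry      12    # queue I/O for 12 polling intervals
                             # before failing when all paths are down
}
```

Lowering fast_io_fail_tmo is what shortens the window in which path-checker SCSI commands can hang, which is the effect described in the P.S. above.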
>>> This was not expected (by me) for a cluster filesystem.
>>> I wonder: is that expected behavior?
>>> Users mailing list: Users at clusterlabs.org
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org