[ClusterLabs] Antw: RES: Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it

Ken Gaillot kgaillot at redhat.com
Fri May 27 16:37:21 UTC 2016


On 05/27/2016 12:58 AM, Ulrich Windl wrote:
> Hi!
> 
> Thanks for this info. We actually run the "noop" scheduler for the SAN
> storage (as per the manufacturer's recommendation), because one "disk" is
> actually spread over up to 40 disks.
> Other settings we changed were:
> queue/rotational:0
> queue/add_random:0
> queue/max_sectors_kb:128 (manufacturer's recommendation, before up to 1MB
> transfers were seen)
> queue/read_ahead_kb:0
> 
> And we apply those settings (where available) to the whole stack (disk
> devices, multipath device, LV).
> 
> Regards,
> Ulrich
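
(For reference, queue settings like the ones listed above are usually
applied per block device via sysfs, e.g. from a boot script or udev rule;
a rough sketch with a placeholder device name, since the exact devices
depend on your stack:)

  # run as root, and repeat for each path device and the multipath dm device
  echo noop > /sys/block/sdX/queue/scheduler
  echo 0 > /sys/block/sdX/queue/rotational
  echo 0 > /sys/block/sdX/queue/add_random
  echo 128 > /sys/block/sdX/queue/max_sectors_kb
  echo 0 > /sys/block/sdX/queue/read_ahead_kb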

I don't have anything to add about clvm specifically, but here are some
general RAID tips that are often overlooked:

If you're using striped RAID (i.e. a RAID level above 1), it's important to choose a
stripe size wisely and make sure everything is aligned with it. Somewhat
counterintuitively, smaller stripe sizes are better for large reads and
writes, while larger stripe sizes are better for small reads and writes.
There's a big performance penalty from setting the stripe size too small,
but not much penalty from setting it too large.
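
(If you're building the array with Linux md rather than a hardware RAID
controller, the stripe/chunk size is set at creation time; a hypothetical
example, with made-up device names, for a 6-disk RAID 5 and a 128K chunk:

  mdadm --create /dev/md0 --level=5 --raid-devices=6 --chunk=128 /dev/sd[b-g]

On hardware controllers the equivalent setting lives in the controller's
configuration tool.)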

Things that should be aligned:

* Partition boundaries. A disk's first usable partition should generally
start at (your stripe size in kilobytes * 2) sectors, since each kilobyte
is two 512-byte sectors.
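
  As an illustration, with a hypothetical 128K stripe that means starting
  the first partition at sector 256 (128 * 2), e.g. with parted (the device
  name is a placeholder):

    parted -s -a none /dev/sdX mklabel gpt mkpart primary 256s 100%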

* LVM physical volume metadata (via the --metadatasize option to
pvcreate). pvcreate rounds the metadata size up to the next 64K boundary
above the given value, so set it to just under the size you want; e.g.
--metadatasize 1.99M results in a 2MB metadata area.
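
  A quick sketch (device name is made up), together with a way to check
  where the first physical extent actually starts:

    pvcreate --metadatasize 1.99m /dev/mapper/mpatha
    pvs -o pv_name,pe_start

  (Newer LVM versions also offer pvcreate --dataalignment to set the data
  offset directly.)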

* The filesystem creation options (these vary by fs type). For example,
with ext3/ext4, where N1 is the stripe size in kilobytes divided by 4
(assuming the default 4K filesystem block size), and N2 is $N1 times the
number of non-parity disks in the array, use -E stride=$N1,stripe-width=$N2.
For xfs, where STRIPE is the stripe size in kilobytes and NONPARITY is the
number of non-parity disks in the array, use -d su=${STRIPE}k,sw=${NONPARITY}
-l su=${STRIPE}k.
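
  For instance, a hypothetical 6-disk RAID 5 (5 non-parity disks) with a
  128K stripe gives N1 = 128/4 = 32 and N2 = 32 * 5 = 160 (the device name
  is made up):

    mkfs.ext4 -E stride=32,stripe-width=160 /dev/vg_san/lv_data
    mkfs.xfs -d su=128k,sw=5 -l su=128k /dev/vg_san/lv_data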

If your RAID controller has power backup (BBU or supercapacitor), mount
filesystems with the nobarrier option.
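
  A sketch of what that looks like (paths and fs type are made up, and only
  do this when the write cache really is battery- or flash-backed):

    mount -o nobarrier /dev/vg_san/lv_data /srv/data
    # or the equivalent /etc/fstab entry:
    # /dev/vg_san/lv_data  /srv/data  xfs  defaults,nobarrier  0 2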

>>>> "Carlos Xavier" <cbastos at connection.com.br> schrieb am 25.05.2016 um 22:25
> in
> Nachricht <01da01d1b6c3$8f5c3dc0$ae14b940$@com.br>:
>> Hi.
>>
>> I have been running OCFS2 on clusters for quite long time.
>> We started running it over DRBD and now we have it running on a Dell
>> storage array.
>> Over DRBD it showed very poor performance, mostly because of the way DRBD
>> works.
>> To improve the performance we had to change the I/O scheduler of the disk
>> to "deadline".
>>
>> When we migrated the system to the storage, the issue showed up again.
>> Sometimes the system was hanging due to disk access; to solve it I changed
>> the I/O scheduler to "deadline" and the trouble vanished.
>>
>> Regards,
>> Carlos.
>>
>>
>>> -----Original Message-----
>>> From: Kristoffer Grönlund [mailto:kgronlund at suse.com]
>>> Sent: Wednesday, May 25, 2016 06:55
>>> To: Ulrich Windl; users at clusterlabs.org
>>> Subject: Re: [ClusterLabs] Performance of a mirrored LV (cLVM) with OCFS:
>>> Attempt to monitor it
>>>
>>> Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> writes:
>>>
>>>> cLVM has never made a good impression regarding performance, so I wonder
>>>> if there's anything we could do to improve the performance. I suspect
>>>> that one VM paging heavily on OCFS2 kills the performance of the whole
>>>> cluster (that hosts Xen PV guests only). Anyone with deeper insights?
>>>
>>> My understanding is that this is a problem inherent in the design of CLVM
>>> and there is work ongoing to mitigate this by handling clustering in md
>>> instead. See this LWN article for more details:
>>>
>>> http://lwn.net/Articles/674085/ 
>>>
>>> Cheers,
>>> Kristoffer
>>>
>>> --
>>> // Kristoffer Grönlund
>>> // kgronlund at suse.com 



