[ClusterLabs] Antw: Re: Antw: Re: Antw: DRBD and SSD TRIM - Slow! -- RESOLVED!

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Aug 4 08:14:49 CEST 2017


Hi!

Thanks for sharing your interesting insights!
I have no idea how the trimming works internally, but I can imagine that trimming a portion smaller than the device's managed block size (assuming the request does not fail outright) may cause the rest of the data in that block to be migrated to a new block before the old one is trimmed. That's only a guess, as I said.

Another thing is stacking block layers: I once caused I/O errors myself by reducing the maximum I/O size of a lower layer. I expected the upper layers to send small-enough chunks to the lower layers, but instead the whole request failed at the higher layer. So obviously the I/O size limit at the lowest layer is important.
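
(If you want to check such limits yourself: every layer exposes its queue limits in sysfs. A minimal sketch; the device names are examples only:

    cat /sys/block/sda/queue/max_sectors_kb     # max I/O size currently issued
    cat /sys/block/sda/queue/max_hw_sectors_kb  # hardware limit
    cat /sys/block/md0/queue/max_sectors_kb     # same attribute on a stacked layer

Comparing these across the stacked devices shows whether an upper layer advertises a larger I/O size than the layer below it accepts.)
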
I did not play with trim, but I did lots of tests with parallel I/O and different block sizes to find out the optimal value for some storage device.
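
(For such block-size sweeps, a tool like fio is convenient. A sketch only, not the exact commands I used; the target device and job parameters are placeholders:

    # WARNING: raw writes destroy data on the target device
    for bs in 4k 64k 512k 4m; do
        fio --name=bs-$bs --filename=/dev/sdX --ioengine=libaio --direct=1 \
            --rw=write --bs=$bs --iodepth=16 --numjobs=4 \
            --runtime=30 --time_based --group_reporting
    done
)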

Regards,
Ulrich

>>> Eric Robinson <eric.robinson at psmnv.com> wrote on 03.08.2017 at 19:52 in
message
<DM5PR03MB27295F164251B8AC4C6A05ADFAB10 at DM5PR03MB2729.namprd03.prod.outlook.com>

> For anyone else who has this problem, we have reduced the time required to 
> trim a 1.3TB volume from 3 days to 1.5 minutes.
> 
> Initially, we had used mdraid to build a raid0 array with a 32K chunk size. 
> We initialized it as a drbd disk, synced it, built an lvm logical volume on 
> it, and created an ext4 filesystem on the volume. Creating the filesystem and 
> trimming it took 3 days (each time, every time, across multiple tests). 
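> 
> (Roughly the sequence used, reconstructed for illustration; device names 
> and sizes are placeholders, not the exact commands:
> 
>     mdadm --create /dev/md0 --level=0 --raid-devices=6 --chunk=32 /dev/sd[b-g]
>     drbdadm create-md r0 && drbdadm up r0    # /dev/md0 as DRBD backing device
>     pvcreate /dev/drbd0 && vgcreate vg0 /dev/drbd0
>     lvcreate -L 1.3T -n lv0 vg0
>     mkfs.ext4 /dev/vg0/lv0
>     mount /dev/vg0/lv0 /mnt/data && fstrim -v /mnt/data
> )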
> 
> When running lsblk -D, we noticed that the DISC-MAX value for the array was 
> only 32K, compared to 4GB for the SSD drive itself. We also noticed that the 
> number matched the chunk size. We deleted the array and built a new one with 
> a 4MB chunk size. The DISC-MAX value changed to 4MB, which is the max 
> selectable chunk size (but still way below the other DISC-MAX values shown in 
> lsblk -D). We realized that, when using mdadm, the DISC-MAX value ends up 
> matching the array chunk size. We theorized that the small DISC-MAX value was 
> responsible for the slow trim rate across the DRBD link.
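> 
> (For anyone checking this on their own stack: lsblk's DISC-MAX column 
> reflects /sys/block/<dev>/queue/discard_max_bytes, so the limit can be 
> read per layer; device names below are examples:
> 
>     lsblk -D
>     cat /sys/block/md0/queue/discard_max_bytes
>     cat /sys/block/sda/queue/discard_max_bytes
> )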
> 
> Instead of using mdadm to build the array, we used LVM to create a striped 
> logical volume and made that the backing device for drbd. Then lsblk -D showed 
> a DISC-MAX size of 128MB.  Creating an ext4 filesystem on it and trimming only 
> took 1.5 minutes (across multiple tests).
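> 
> (The striped LV was created along these lines; illustrative values, not 
> the exact command, with -i setting the stripe count and -I the stripe size:
> 
>     pvcreate /dev/sd[b-g]
>     vgcreate vg0 /dev/sd[b-g]
>     lvcreate --type striped -i 6 -I 64k -L 1.3T -n lv0 vg0
> 
> /dev/vg0/lv0 then becomes the DRBD backing device.)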
> 
> Somebody knowledgeable may be able to explain how DISC-MAX affects the trim 
> speed, and why the DISC-MAX value is different when creating the array with 
> mdadm versus LVM.
> 
> --
> Eric Robinson
> 
>> -----Original Message-----
>> From: Ulrich Windl [mailto:Ulrich.Windl at rz.uni-regensburg.de]
>> Sent: Wednesday, August 02, 2017 11:36 PM
>> To: users at clusterlabs.org 
>> Subject: [ClusterLabs] Antw: Re: Antw: DRBD and SSD TRIM - Slow!
>> 
>> >>> Eric Robinson <eric.robinson at psmnv.com> wrote on 02.08.2017 at
>> >>> 23:20 in message
>> <DM5PR03MB2729C66CEC1E3B8B9E297185FAB00 at DM5PR03MB2729.namprd03.prod.outlook.com>
>> 
>> > 1) iotop did not show any significant I/O, just maybe 30 KB/second of
>> > DRBD traffic.
>> >
>> > 2) okay. I've never done that before. I'll give it a shot.
>> >
>> > 3) I'm not sure what I'm looking at there.
>> 
>> See /usr/src/linux/Documentation/block/stat.txt ;-) I wrote an NRPE plugin
>> to monitor those with performance data and verbose text output, e.g.:
>> CFS_VMs-xen: [delta 120s], 1.15086 IO/s read, 60.7789 IO/s write, 0 req/s
>> read merges, 0 req/s write merges, 4.53674 sec/s read, 486.231 sec/s write,
>> 2.36844 ms/s read wait, 2702.19 ms/s write wait, 0 req in_flight,
>> 115.987 ms/s active, 2704.53 ms/s wait
>> 
>> Regards,
>> Ulrich
>> 
>> >
>> > --
>> > Eric Robinson
>> >
>> >> -----Original Message-----
>> >> From: Ulrich Windl [mailto:Ulrich.Windl at rz.uni-regensburg.de]
>> >> Sent: Tuesday, August 01, 2017 11:28 PM
>> >> To: users at clusterlabs.org 
>> >> Subject: [ClusterLabs] Antw: DRBD and SSD TRIM - Slow!
>> >>
>> >> Hi!
>> >>
>> >> I know little about trim operations, but you could try one of these:
>> >>
>> >> 1) iotop to see whether some I/O is done during trimming (assuming
>> >> trimming itself is not considered to be I/O)
>> >>
>> >> 2) Try blktrace on the affected devices to see what's going on. It's
>> >> hard to set up and to extract the info you are looking for, but it
>> >> provides deep insights
>> >>
>> >> 3) Watch /sys/block/$BDEV/stat for performance statistics. I don't
>> >> know how well DRBD supports these, however (e.g. MDRAID shows no wait
>> >> times and no busy operations, while a multipath map has it all).
>> >> Example invocations for all three follow below.
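>> >>
>> >> (Illustrative only, assuming /dev/drbd0 is the device in question:
>> >>
>> >>     iotop -o                                      # 1) only tasks doing I/O
>> >>     blktrace -d /dev/drbd0 -o - | blkparse -i -   # 2) live trace
>> >>     watch -d cat /sys/block/drbd0/stat            # 3) raw counters
>> >> )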
>> >>
>> >> Regards,
>> >> Ulrich
>> >>
>> >> >>> Eric Robinson <eric.robinson at psmnv.com> wrote on 02.08.2017 at
>> >> >>> 07:09 in message
>> >> <DM5PR03MB27297014DF96DC01FE849A63FAB00 at DM5PR03MB2729.namprd03.prod.outlook.com>
>> >>
>> >> > Does anyone know why trimming a filesystem mounted on a DRBD volume
>> >> > takes so long? I mean like three days to trim a 1.2TB filesystem.
>> >> >
>> >> > Here are some pertinent details:
>> >> >
>> >> > OS: SLES 12 SP2
>> >> > Kernel: 4.4.74-92.29
>> >> > Drives: 6 x Samsung SSD 840 Pro 512GB
>> >> > RAID: 0 (mdraid)
>> >> > DRBD: 9.0.8
>> >> > Protocol: C
>> >> > Network: Gigabit
>> >> > Utilization: 10%
>> >> > Latency: < 1ms
>> >> > Loss: 0%
>> >> > Iperf test: 900 Mbit/sec
>> >> >
>> >> > When I write to a non-DRBD partition, I get 400MB/sec (bypassing caches).
>> >> > When I trim a non-DRBD partition, it completes fast.
>> >> > When I write to a DRBD volume, I get 80MB/sec.
>> >> >
>> >> > When I trim a DRBD volume, it takes bloody ages!
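>> >> >
>> >> > (Measured roughly like this; illustrative commands, the mount point 
>> >> > is a placeholder:
>> >> >
>> >> >     dd if=/dev/zero of=/mnt/data/testfile bs=1M count=4096 oflag=direct
>> >> >     time fstrim -v /mnt/data
>> >> > )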
>> >> >
>> >> > --
>> >> > Eric Robinson
>> >>
>> >
>> 
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org 