[ClusterLabs] Antw: Re: Antw: DRBD and SSD TRIM - Slow! -- RESOLVED!

Eric Robinson eric.robinson at psmnv.com
Thu Aug 3 19:52:42 CEST 2017


For anyone else who has this problem, we have reduced the time required to trim a 1.3TB volume from 3 days to 1.5 minutes.

Initially, we used mdraid to build a RAID0 array with a 32K chunk size. We initialized it as a DRBD disk, synced it, built an LVM logical volume on it, and created an ext4 filesystem on the volume. Creating the filesystem and trimming it took 3 days (each time, every time, across multiple tests).
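
For reference, the original stack was built roughly like this (device names and sizes are illustrative, not our exact commands):

    # RAID0 across the six SSDs with a 32K chunk (the problematic setup)
    mdadm --create /dev/md0 --level=0 --raid-devices=6 --chunk=32 /dev/sd[b-g]
    # /dev/md0 becomes the DRBD backing disk; after the initial sync:
    pvcreate /dev/drbd0
    vgcreate vg0 /dev/drbd0
    lvcreate -l 100%FREE -n lv0 vg0
    mkfs.ext4 /dev/vg0/lv0      # this step, plus fstrim, took ~3 days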

When running lsblk -D, we noticed that the DISC-MAX value for the array was only 32K, compared to 4GB for the SSD drives themselves, and that the number matched the array's chunk size. We deleted the array and built a new one with a 4MB chunk size. The DISC-MAX value changed to 4MB, the largest chunk size we could select (but still way below the other DISC-MAX values shown in lsblk -D). In other words, when using mdadm, the DISC-MAX value ends up matching the array chunk size. We theorized that this small DISC-MAX value was responsible for the slow trim rate across the DRBD link.
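
The limit is easy to confirm from sysfs; a quick check looked like this (device name illustrative):

    lsblk -D                                    # DISC-GRAN / DISC-MAX per device
    cat /sys/block/md0/queue/discard_max_bytes
    # 32768  -> matches the 32K chunk size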

Instead of using mdadm to build the array, we used LVM to create a striped logical volume and made that the backing device for DRBD. lsblk -D then showed a DISC-MAX size of 128MB. Creating an ext4 filesystem on it and trimming it took only 1.5 minutes (across multiple tests).
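
A minimal sketch of the working layout (volume names, stripe size, and mount point are illustrative):

    # Striped LV directly on the SSDs, used as the DRBD backing device
    pvcreate /dev/sd[b-g]
    vgcreate vg_back /dev/sd[b-g]
    lvcreate -i 6 -I 128 -l 100%FREE -n lv_back vg_back
    # point the DRBD resource's "disk" option at /dev/vg_back/lv_back, then:
    mkfs.ext4 /dev/drbd0
    mount /dev/drbd0 /mnt/data
    fstrim -v /mnt/data                         # completed in ~1.5 minutes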

Perhaps somebody knowledgeable can explain how DISC-MAX affects trim speed, and why the value differs when the array is created with mdadm versus LVM.

--
Eric Robinson

> -----Original Message-----
> From: Ulrich Windl [mailto:Ulrich.Windl at rz.uni-regensburg.de]
> Sent: Wednesday, August 02, 2017 11:36 PM
> To: users at clusterlabs.org
> Subject: [ClusterLabs] Antw: Re: Antw: DRBD and SSD TRIM - Slow!
> 
> >>> Eric Robinson <eric.robinson at psmnv.com> wrote on 02.08.2017 at 23:20
> >>> in message
> >>> <DM5PR03MB2729C66CEC1E3B8B9E297185FAB00 at DM5PR03MB2729.namprd03.prod.outlook.com>
> 
> > 1) iotop did not show any significant io, just maybe 30k/second of
> > drbd traffic.
> >
> > 2) okay. I've never done that before. I'll give it a shot.
> >
> > 3) I'm not sure what I'm looking at there.
> 
> See /usr/src/linux/Documentation/block/stat.txt ;-) I wrote an NRPE plugin
> to monitor those with performance data and verbose text output, e.g.:
> CFS_VMs-xen: [delta 120s], 1.15086 IO/s read, 60.7789 IO/s write, 0 req/s
> read merges, 0 req/s write merges, 4.53674 sec/s read, 486.231 sec/s write,
> 2.36844 ms/s read wait, 2702.19 ms/s write wait, 0 req in_flight, 115.987 ms/s
> active, 2704.53 ms/s wait
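> 
> A minimal way to sample those counters by hand (device name illustrative):
> 
>     # field order per Documentation/block/stat.txt: read I/Os, read merges,
>     # read sectors, read ticks (ms), write I/Os, write merges, write sectors,
>     # write ticks (ms), in_flight, io_ticks (ms), time_in_queue (ms)
>     cat /sys/block/drbd0/stat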
> 
> Regards,
> Ulrich
> 
> >
> > --
> > Eric Robinson
> >
> >> -----Original Message-----
> >> From: Ulrich Windl [mailto:Ulrich.Windl at rz.uni-regensburg.de]
> >> Sent: Tuesday, August 01, 2017 11:28 PM
> >> To: users at clusterlabs.org
> >> Subject: [ClusterLabs] Antw: DRBD and SSD TRIM - Slow!
> >>
> >> Hi!
> >>
> >> I know little about trim operations, but you could try one of these:
> >>
> >> 1) iotop to see whether some I/O is done during trimming (assuming
> >> trimming itself is not considered to be I/O)
> >>
> >> 2) Try blktrace on the affected devices to see what's going on (sketch
> >> below). It's hard to set up and to extract the info you are looking for,
> >> but it provides deep insights.
> >>
> >> 3) Watch /sys/block/$BDEV/stat for performance statistics. I don't
> >> know how well DRBD supports these, however (e.g. MDRAID shows no wait
> >> times and no busy operations, while a multipath map has it all).
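> >>
> >> For (2), a minimal blktrace run might look like this (device name
> >> illustrative; blkparse ships in the same package):
> >>
> >>     # capture 30 seconds of block-layer events and decode them inline
> >>     blktrace -d /dev/drbd0 -w 30 -o - | blkparse -i -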
> >>
> >> Regards,
> >> Ulrich
> >>
> >> >>> Eric Robinson <eric.robinson at psmnv.com> wrote on 02.08.2017 at 07:09
> >> >>> in message
> >> >>> <DM5PR03MB27297014DF96DC01FE849A63FAB00 at DM5PR03MB2729.namprd03.prod.outlook.com>
> >>
> >> > Does anyone know why trimming a filesystem mounted on a DRBD volume
> >> > takes so long? I mean like three days to trim a 1.2TB filesystem.
> >> >
> >> > Here are some pertinent details:
> >> >
> >> > OS: SLES 12 SP2
> >> > Kernel: 4.4.74-92.29
> >> > Drives: 6 x Samsung SSD 840 Pro 512GB
> >> > RAID: 0 (mdraid)
> >> > DRBD: 9.0.8
> >> > Protocol: C
> >> > Network: Gigabit
> >> > Utilization: 10%
> >> > Latency: < 1ms
> >> > Loss: 0%
> >> > Iperf test: 900 Mbits/sec
> >> >
> >> > When I write to a non-DRBD partition, I get 400MB/sec (bypassing caches).
> >> > When I trim a non-DRBD partition, it completes fast.
> >> > When I write to a DRBD volume, I get 80MB/sec.
> >> >
> >> > When I trim a DRBD volume, it takes bloody ages!
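> >> >
> >> > (For reference, a cache-bypassing write and a timed trim along these
> >> > lines will reproduce measurements like the above; paths and sizes are
> >> > illustrative:
> >> >
> >> >     dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=4096 oflag=direct
> >> >     time fstrim -v /mnt/test
> >> > )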
> >> >
> >> > --
> >> > Eric Robinson
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


