[Pacemaker] DRBD < LVM < EXT4 < NFS performance

Lars Ellenberg lars.ellenberg at linbit.com
Thu May 24 12:43:02 EDT 2012


On Thu, May 24, 2012 at 03:34:51PM +0300, Dan Frincu wrote:
> Hi,
> 
> On Mon, May 21, 2012 at 4:24 PM, Christoph Bartoschek <bartoschek at gmx.de> wrote:
> > Florian Haas wrote:
> >
> >>> Thus I would expect to have a write performance of about 100 MByte/s. But
> >>> dd gives me only 20 MByte/s.
> >>>
> >>> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
> >>> 1310720+0 records in
> >>> 1310720+0 records out
> >>> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
> >>
> >> If you used that same dd invocation for your local test that allegedly
> >> produced 450 MB/s, you've probably been testing only your page cache.
> >> Add oflag=dsync or oflag=direct (the latter will only work locally, as
> >> NFS doesn't support O_DIRECT).
> >>
> >> If your RAID is one of reasonably contemporary SAS or SATA drives,
> >> then a sustained to-disk throughput of 450 MB/s would require about
> >> 7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
> >> got? Or are you writing to SSDs?
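
Just to illustrate Florian's point about the oflag options (file name and
size below are made up), the three variants measure rather different things:

  dd if=/dev/zero of=testfile bs=1M count=4096                # mostly page cache
  dd if=/dev/zero of=testfile bs=1M count=4096 oflag=dsync    # syncs after every write
  dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct   # bypasses the page cache
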
> >
> > I used the same invocation with different filenames each time. To which page
> > cache do you refer? To the one on the client or on the server side?
> >
> > We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10
> > times with different files in a row:
> >
> > for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192
> > count=1310720; done
> >
> > The resulting values, as reported by dd on a system that is also used by
> > other programs, are:
> >
> > 515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s, 320
> > MB/s, 242 MB/s,  289 MB/s
> >
> > So I think that the system is capable of more than 200 MB/s, which is way
> > more than what can arrive over the network.
> 
> A bit off-topic maybe.
> 
> Whenever you do these kinds of local disk performance tests to measure
> actual speed and not some caching, as Florian said, you should use the
> oflag=direct option to dd and also echo 3 > /proc/sys/vm/drop_caches
> and sync.
> 

You should sync before you drop caches,
or the pages that are still dirty at that time won't be dropped.
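
Something along these lines, that is (just a sketch; file name and size are
placeholders, and oflag=direct is for the local test only):

  sync                                # flush dirty pages out first
  echo 3 > /proc/sys/vm/drop_caches   # then drop the (now clean) page cache
  dd if=/dev/zero of=testfile bs=1M count=4096 oflag=direct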

> I usually use echo 3 > /proc/sys/vm/drop_caches && sync && date &&
> time dd if=/dev/zero of=whatever bs=1G count=x oflag=direct && sync &&
> date
> 
> You can tell whether data is still being flushed if the rate reported by
> dd differs from the one you get by dividing the amount of data written
> by the time between the two date calls. It also helps to push more data
> than the controller can cache.
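
Right. For example, if the two date calls are 100 s apart and 10 GiB were
written, the wall-clock rate is 10737418240 / 100, about 107 MB/s; if dd
itself reported substantially more, part of the data was still sitting in
some cache at that point. (Numbers made up purely for illustration.)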

Also, dd only does one bs-sized chunk at a time.

fio with appropriate options can be more useful,
once you have learned all those options, and how to interpret the results...
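
A starting point might look like this (purely illustrative; path, size and
iodepth are made up):

  fio --name=seqwrite --filename=/mnt/test/fio.tmp --rw=write \
      --bs=1M --size=4G --ioengine=libaio --iodepth=16 --direct=1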

> Regards,
> Dan
> 
> >
> > I've done the measurements on the filesystem that sits on top of LVM and
> > DRBD. Thus I think that DRBD is not a problem.
> >
> > However, the strange thing is that I get 108 MB/s on the clients as soon as I
> > disable the secondary node for DRBD. Maybe there is some strange interaction
> > between DRBD and NFS.

Dedicated replication link?

Maybe the additional latency is all that kills you.
Do you have non-volatile write cache on your IO backend?
Did you post your drbd configuration settings already?
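
If not, please do, together with something like the following (for example):

  cat /proc/drbd          # version, connection state, sync progress
  drbdadm dump all        # the configuration drbd is actually running with

Whether the controller cache is battery backed you will have to check with
the vendor's tool.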

> >
> > After re-enabling the secondary node, the DRBD synchronization is quite slow.
> >
> >
> >>>
> >>> Does anyone have an idea what could cause such problems? I am out of ideas
> >>> for further analysis.
> >>
> >> As a knee-jerk response, that might be the classic issue of NFS
> >> filling up the page cache until it hits the vm.dirty_ratio and then
> >> having a ton of stuff to write to disk, which the local I/O subsystem
> >> can't cope with.
> >
> > Sounds reasonable, but shouldn't the I/O subsystem be capable of writing
> > away anything that arrives?
> >
> > Christoph
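
Sustained, probably; the problem is the burst: once vm.dirty_ratio is hit,
there is suddenly a lot to push through DRBD, and the writers stall until it
drains. You can at least inspect (and, for experiments, lower) the thresholds
involved; the values below are examples, not recommendations:

  sysctl vm.dirty_ratio vm.dirty_background_ratio
  sysctl -w vm.dirty_background_ratio=2   # start background writeback earlier
  sysctl -w vm.dirty_ratio=5              # block writers before the backlog grows huge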

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



