[Pacemaker] DRBD < LVM < EXT4 < NFS performance

Mon May 21 08:59:29 UTC 2012

On Sun, May 20, 2012 at 12:05 PM, Christoph Bartoschek
<ponto at pontohonk.de> wrote:
> Hi,
>
> we have a two node setup with drbd below LVM and an Ext4 filesystem that is
> shared via NFS. The system shows low performance and lots of timeouts
> resulting in unnecessary failovers from pacemaker.
>
> The connection between both nodes is capable of 1 GByte/s as shown by iperf.
> The network between the clients and the nodes is capable of 110 MByte/s. The
> RAID can be filled with 450 MByte/s.

No, most likely it can't; see below.

> Thus I would expect to have a write performance of about 100 MByte/s. But dd
> gives me only 20 MByte/s.
>
> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
> 1310720+0 records in
> 1310720+0 records out
> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s

If you used that same dd invocation for your local test that allegedly
produced 450 MB/s, you've probably been testing only your page cache.
Add oflag=dsync or oflag=direct (the latter will only work locally, as
NFS doesn't support O_DIRECT).
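
To make that concrete, here is a minimal sketch of the re-run I have in
mind. The helper name sync_write_test is made up for illustration; the
block size and count default to the values from your original run.
Note that with oflag=dsync every 8 KiB write has to reach stable
storage before dd issues the next one, so the full 10 GiB may take a
long time and a smaller count is fine for a first look.

```shell
# Hypothetical helper (the name is an assumption, not an existing tool):
# write COUNT blocks of BS bytes to FILE with O_DSYNC, so that each
# write must hit stable storage instead of just landing in the page cache.
sync_write_test() {
    file=$1
    bs=${2:-8192}          # block size from the original dd run
    count=${3:-1310720}    # block count from the original dd run
    dd if=/dev/zero of="$file" bs="$bs" count="$count" oflag=dsync
}

# Same parameters as the original run (commented out, it writes 10 GiB):
# sync_write_test bigfile.10G

# Local-only variant that bypasses the page cache entirely:
# dd if=/dev/zero of=bigfile.10G bs=8192 count=1310720 oflag=direct
```

Comparing the throughput dd reports here against the 450 MB/s figure
should show how much of that number came from the page cache.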

If your RAID is one of reasonably contemporary SAS or SATA drives,
then a sustained to-disk throughput of 450 MB/s would require about
7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
got? Or are you writing to SSDs?
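
The arithmetic behind that estimate, as a rough sketch. The per-drive
figures of 50 and 65 MB/s sustained are my assumption for spinning
disks of that class, not measurements of your hardware, and
stripes_needed is just an illustrative helper.

```shell
# stripes_needed TARGET_MBS PER_DRIVE_MBS
# Prints the number of stripes needed to sustain TARGET_MBS when each
# stripe delivers PER_DRIVE_MBS (integer ceiling division).
stripes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

stripes_needed 450 65   # assumed faster drive: prints 7
stripes_needed 450 50   # assumed slower drive: prints 9
```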

> While the slow dd runs there are timeouts on the server resulting in a
> restart of some resources. In the logfile I also see:
>
> [329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
> [329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [329014.593273] nfsd            D 0000000000000007     0  2252      2 0x00000000
> [329014.593278]  ffff88060a847c40 0000000000000046 ffff88060a847bf8 0000000300000001
> [329014.593284]  ffff88060a847fd8 ffff88060a847fd8 ffff88060a847fd8 0000000000013780
> [329014.593290]  ffff8806091416f0 ffff8806085bc4d0 ffff88060a847c50 ffff88061870c800
> [329014.593295] Call Trace:
> [329014.593303]  [<ffffffff8165a55f>] schedule+0x3f/0x60
> [329014.593309]  [<ffffffff81265085>] jbd2_log_wait_commit+0xb5/0x130
> [329014.593315]  [<ffffffff8108aec0>] ? add_wait_queue+0x60/0x60
> [329014.593321]  [<ffffffff812111b8>] ext4_sync_file+0x208/0x2d0
> [329014.593328]  [<ffffffff811a62dd>] vfs_fsync_range+0x1d/0x40
> [329014.593339]  [<ffffffffa0227e51>] nfsd_commit+0xb1/0xd0 [nfsd]
> [329014.593349]  [<ffffffffa022f28d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
> [329014.593356]  [<ffffffffa0222a4b>] nfsd_dispatch+0xeb/0x230 [nfsd]
> [329014.593373]  [<ffffffffa00e9d95>] svc_process_common+0x345/0x690 [sunrpc]
> [329014.593379]  [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
> [329014.593391]  [<ffffffffa00ea1e2>] svc_process+0x102/0x150 [sunrpc]
> [329014.593397]  [<ffffffffa02221ad>] nfsd+0xbd/0x160 [nfsd]
> [329014.593403]  [<ffffffffa02220f0>] ? nfsd_startup+0xf0/0xf0 [nfsd]
> [329014.593407]  [<ffffffff8108a42c>] kthread+0x8c/0xa0
> [329014.593412]  [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10
> [329014.593416]  [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
> [329014.593420]  [<ffffffff81666bf0>] ? gs_change+0x13/0x13
>
>
> Does anyone have an idea what could cause such problems? I am out of ideas
> for further analysis.

As a knee-jerk response, that might be the classic issue of NFS
filling up the page cache until it hits the vm.dirty_ratio and then
having a ton of stuff to write to disk, which the local I/O subsystem
can't cope with.
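
A quick way to check whether that is what is happening, sketched under
the assumption of a Linux server with /proc mounted (show_dirty_state
is just an illustrative name). It prints the two writeback thresholds
and the amount of dirty and in-flight data; running it repeatedly on
the NFS server while the slow dd is in progress should show whether
Dirty climbs toward the limit before the stalls begin.

```shell
# Hypothetical helper: print the writeback thresholds and the current
# amount of dirty and in-flight page cache data on this machine.
show_dirty_state() {
    for knob in dirty_ratio dirty_background_ratio; do
        printf 'vm.%s = %s\n' "$knob" "$(cat /proc/sys/vm/$knob)"
    done
    # Dirty: data waiting to be written; Writeback: data being written now.
    grep -E '^(Dirty|Writeback):' /proc/meminfo
}

show_dirty_state
```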

Cheers,
Florian
