1
linux/block
Vivek Goyal 1efe8fe1c2 cfq-iosched: Do not idle on async queues
Few weeks back, Shaohua Li had posted similar patch. I am reposting it
with more test results.

This patch does two things.

- Do not idle on async queues.

- It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
  Currently, we seem to driving queue depth of 1 always for WRITES. This is
  true even if there is only one write queue in the system and all the logic
  of infinite queue depth in case of single busy queue as well as slowly
  increasing queue depth based on last delayed sync request does not seem to
  be kicking in at all.

This patch will allow deeper WRITE queue depths (subjected to the other
WRITE queue depth contstraints like cfq_quantum and last delayed sync
request).

Shaohua Li had reported getting more out of his SSD. For me, I have got
one Lun exported from an HP EVA and when pure buffered writes are on, I
can get more out of the system. Following are test results of pure
buffered writes (with end_fsync=1) with vanilla and patched kernel. These
results are average of 3 sets of run with increasing number of threads.

AVERAGE[bufwfs][vanilla]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
bufwfs    3   1   0              0              95349          474141
bufwfs    3   2   0              0              100282         806926
bufwfs    3   4   0              0              109989         2.7301e+06
bufwfs    3   8   0              0              116642         3762231
bufwfs    3   16  0              0              118230         6902970

AVERAGE[bufwfs] [patched kernel]
-------
bufwfs    3   1   0              0              270722         404352
bufwfs    3   2   0              0              206770         1.06552e+06
bufwfs    3   4   0              0              195277         1.62283e+06
bufwfs    3   8   0              0              260960         2.62979e+06
bufwfs    3   16  0              0              299260         1.70731e+06

I also ran buffered writes along with some sequential reads and some
buffered reads going on in the system on a SATA disk because the potential
risk could be that we should not be driving queue depth higher in presence
of sync IO going to keep the max clat low.

With some random and sequential reads going on in the system on one SATA
disk I did not see any significant increase in max clat. So it looks like
other WRITE queue depth control logic is doing its job. Here are the
results.

AVERAGE[brr, bsr, bufw together] [vanilla]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
brr       3   1   850            546345         0              0
bsr       3   1   14650          729543         0              0
bufw      3   1   0              0              23908          8274517

brr       3   2   981.333        579395         0              0
bsr       3   2   14149.7        1175689        0              0
bufw      3   2   0              0              21921          1.28108e+07

brr       3   4   898.333        1.75527e+06    0              0
bsr       3   4   12230.7        1.40072e+06    0              0
bufw      3   4   0              0              19722.3        2.4901e+07

brr       3   8   900            3160594        0              0
bsr       3   8   9282.33        1.91314e+06    0              0
bufw      3   8   0              0              18789.3        23890622

AVERAGE[brr, bsr, bufw mixed] [patched kernel]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
brr       3   1   837            417973         0              0
bsr       3   1   14357.7        591275         0              0
bufw      3   1   0              0              24869.7        8910662

brr       3   2   1038.33        543434         0              0
bsr       3   2   13351.3        1205858        0              0
bufw      3   2   0              0              18626.3        13280370

brr       3   4   913            1.86861e+06    0              0
bsr       3   4   12652.3        1430974        0              0
bufw      3   4   0              0              15343.3        2.81305e+07

brr       3   8   890            2.92695e+06    0              0
bsr       3   8   9635.33        1.90244e+06    0              0
bufw      3   8   0              0              17200.3        24424392

So looks like it might make sense to include this patch.

Thanks
Vivek

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-02 20:46:10 +01:00
..
blk-barrier.c block: Honor the gfp_mask for alloc_page() in blkdev_issue_discard() 2009-12-29 08:53:54 +01:00
blk-cgroup.c blk-cgroup: Fix potential deadlock in blk-cgroup 2010-02-01 09:58:54 +01:00
blk-cgroup.h blkio: Implement dynamic io controlling policy registration 2009-12-04 16:38:14 +01:00
blk-core.c block: add helpers to run flush_dcache_page() against a bio and a request's pages 2009-11-26 09:16:19 +01:00
blk-exec.c block: don't set REQ_NOMERGE unnecessarily 2009-04-28 07:37:33 +02:00
blk-integrity.c block: fix improper kobject release in blk_integrity_unregister 2009-07-28 09:11:14 +02:00
blk-ioc.c block: removed unused as_io_context 2010-01-11 14:29:20 +01:00
blk-iopoll.c tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
blk-map.c block: Use accessor functions for queue limits 2009-05-22 23:22:54 +02:00
blk-merge.c block: Seperate read and write statistics of in_flight requests v2 2009-10-06 20:16:55 +02:00
blk-settings.c block: bdev_stack_limits wrapper 2010-01-11 14:29:20 +01:00
blk-softirq.c generic-ipi: remove CSD_FLAG_WAIT 2009-02-25 14:13:44 +01:00
blk-sysfs.c block: Allow devices to indicate whether discarded blocks are zeroed 2009-12-03 09:24:48 +01:00
blk-tag.c block: use proper BLK_RW_ASYNC in blk_queue_start_tag() 2009-10-06 20:19:02 +02:00
blk-timeout.c block: clean up misc stuff after block layer timeout conversion 2009-04-28 07:37:34 +02:00
blk.h block: implement mixed merge of different failfast requests 2009-09-11 14:33:30 +02:00
bsg.c block: jiffies fixes 2009-11-11 13:47:45 +01:00
cfq-iosched.c cfq-iosched: Do not idle on async queues 2010-02-02 20:46:10 +01:00
compat_ioctl.c block: Allow devices to indicate whether discarded blocks are zeroed 2009-12-03 09:24:48 +01:00
deadline-iosched.c block: convert to pos and nr_sectors accessors 2009-05-11 09:50:54 +02:00
elevator.c Merge branch 'for-linus' into for-2.6.33 2009-10-13 12:29:45 +02:00
genhd.c block: Fix discard alignment calculation and printing 2010-01-11 14:29:19 +01:00
ioctl.c block: Allow devices to indicate whether discarded blocks are zeroed 2009-12-03 09:24:48 +01:00
Kconfig blkio: Some debugging aids for CFQ 2009-12-03 19:28:52 +01:00
Kconfig.iosched blkio: Allow CFQ group IO scheduling even when CFQ is a module 2009-12-04 16:38:14 +01:00
Makefile blkio: Introduce blkio controller cgroup interface 2009-12-03 19:28:51 +01:00
noop-iosched.c block: get rid of elevator_t typedef 2008-12-29 08:29:50 +01:00
scsi_ioctl.c block/scsi_ioctl.c: quiet sparse noise 2009-11-04 09:10:33 +01:00