linux

Author	SHA1	Message	Date
Linus Torvalds	ada3fa1505	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c	2009-09-15 09:39:44 -07:00
Jeff Moyer	06d2188644	cfq: choose a new next_req when a request is dispatched This patch addresses http://bugzilla.kernel.org/show_bug.cgi?id=13401, a regression introduced in 2.6.30. From the bug report: Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-14 08:24:52 +02:00
Shan Wei	b217a903ab	cfq: fix the log message after dispatched a request The blktrace tools can show process id when cfq dispatched a request, using cfq_log_cfqq() instead of cfq_log(). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:34:33 +02:00
Jens Axboe	1b379d8daf	cfq-iosched: get rid of must_alloc flag It's not currently used, as pointed out by Gui Jianfeng <guijianfeng@cn.fujitsu.com>. We already check the wait_request flag to allow an idling queue priority allocation access, so we don't need this extra flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:32 +02:00
Jens Axboe	1f98a13f62	bio: first step in sanitizing the bio->bi_rw flag testing Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:31 +02:00
Vivek Goyal	d58b85e1e8	cfq-iosched: no need to keep track of busy_rt_queues o Get rid of busy_rt_queues infrastructure. Looks like it is redundant. o Once an RT queue gets request it will preempt any of the BE or IDLE queues immediately. Otherwise this queue will be put on service tree and scheduler will anyway select this queue before any of the BE or IDLE queue. Hence looks like there is no need to keep track of how many busy RT queues are currently on service tree. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:30 +02:00
Jens Axboe	5ad531db6e	cfq-iosched: drain device queue before switching to a sync queue To lessen the impact of async IO on sync IO, let the device drain of any async IO in progress when switching to a sync cfqq that has idling enabled. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:30 +02:00
Tejun Heo	384be2b18a	Merge branch 'percpu-for-linus' into percpu-for-next Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 14:45:31 +09:00
Vivek Goyal	32f2e807a3	cfq-iosched: reset oom_cfqq in cfq_set_request() In case memory is scarce, we now default to oom_cfqq. Once memory is available again, we should allocate a new cfqq and stop using oom_cfqq for a particular io context. Once a new request comes in, check if we are using oom_cfqq, and if yes, try to allocate a new cfqq. Tested the patch by forcing the use of oom_cfqq and upon next request thread realized that it was using oom_cfqq and it allocated a new cfqq. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:54 +02:00
Tejun Heo	c43768cbb7	Merge branch 'master' into for-next Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h	2009-07-04 07:13:18 +09:00
Shan Wei	b706f64281	cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request() With the changes for falling back to an oom_cfqq, we never fail to find/allocate a queue in cfq_get_queue(). So remove the check. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 12:41:14 +02:00
Jens Axboe	6118b70b3a	cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue() Setup an emergency fallback cfqq that we allocate at IO scheduler init time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue() never fails without having to ensure free memory. On cfqq lookup, always try to allocate a new cfqq if the given cfq io context has the oom_cfqq assigned. This ensures that we only temporarily punt to this shared queue. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 10:56:25 +02:00
Jens Axboe	d5036d770f	cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue() We're going to be needing that init code outside of that function to get rid of the __GFP_NOFAIL in cfqq allocation. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 10:56:25 +02:00
Tejun Heo	245b2e70ea	percpu: clean up percpu variable definitions Percpu variable definition is about to be updated such that all percpu symbols including the static ones must be unique. Update percpu variable definitions accordingly. * as,cfq: rename ioc_count uniquely * cpufreq: rename cpu_dbs_info uniquely * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and rename it * ipv4,6: rename cookie_scratch uniquely * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to pmc_irq_entry and nmi_entry to pmc_nmi_entry * perf_counter: rename disable_count to perf_disable_count * ftrace: rename test_event_disable to ftrace_test_event_disable * kmemleak: rename test_pointer to kmemleak_test_pointer * mce: rename next_interval to mce_next_interval [ Impact: percpu usage cleanups, no duplicate static percpu var names ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm <linux-mm@kvack.org> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andi Kleen <andi@firstfloor.org>	2009-06-24 15:13:48 +09:00
Jeff Moyer	6923715ae3	cfq: remove extraneous '\n' in blktrace output I noticed a blank line in blktrace output. This patch fixes that. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:21:04 +02:00
Gui Jianfeng	81be834713	cfq: cleanup for last_end_request in cfq_data Actually, last_end_request in cfq_data isn't used now. So lets just remove it. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:21:03 +02:00
Nikanth Karthikesan	d9c7d394a8	block: prevent possible io_context->refcount overflow Currently io_context has an atomic_t(32-bit) as refcount. In the case of cfq, for each device against whcih a task does I/O, a reference to the io_context would be taken. And when there are multiple process sharing io_contexts(CLONE_IO) would also have a reference to the same io_context. Theoretically the possible maximum number of processes sharing the same io_context + the number of disks/cfq_data referring to the same io_context can overflow the 32-bit counter on a very high-end machine. Even though it is an improbable case, let us make it atomic_long_t. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-10 23:07:15 +02:00
Tejun Heo	2e46e8b27a	block: drop request->hard_* and nr_sectors struct request has had a few different ways to represent some properties of a request. ->hard_ represent block layer's view of the request progress (completion cursor) and the ones without the prefix are supposed to represent the issue cursor and allowed to be updated as necessary by the low level drivers. The thing is that as block layer supports partial completion, the two cursors really aren't necessary and only cause confusion. In addition, manual management of request detail from low level drivers is cumbersome and error-prone at the very least. Another interesting duplicate fields are rq->[hard_]nr_sectors and rq->{hard_cur\|current}_nr_sectors against rq->data_len and rq->bio->bi_size. This is more convoluted than the hard_ case. rq->[hard_]nr_sectors are initialized for requests with bio but blk_rq_bytes() uses it only for !pc requests. rq->data_len is initialized for all request but blk_rq_bytes() uses it only for pc requests. This causes good amount of confusion throughout block layer and its drivers and determining the request length has been a bit of black magic which may or may not work depending on circumstances and what the specific LLD is actually doing. rq->{hard_cur\|current}_nr_sectors represent the number of sectors in the contiguous data area at the front. This is mainly used by drivers which transfers data by walking request segment-by-segment. This value always equals rq->bio->bi_size >> 9. However, data length for pc requests may not be multiple of 512 bytes and using this field becomes a bit confusing. In general, having multiple fields to represent the same property leads only to confusion and subtle bugs. With recent block low level driver cleanups, no driver is accessing or manipulating these duplicate fields directly. Drop all the duplicates. Now rq->sector means the current sector, rq->data_len the current total length and rq->bio->bi_size the current segment length. Everything else is defined in terms of these three and available only through accessors. * blk_recalc_rq_sectors() is collapsed into blk_update_request() and now handles pc and fs requests equally other than rq->sector update. This means that now pc requests can use partial completion too (no in-kernel user yet tho). * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer now uses byte count as the primary data length. * blk_rq_pos() is now guranteed to be always correct. In-block users converted. * blk_rq_bytes() is now guaranteed to be always valid as is blk_rq_sectors(). In-block users converted. * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9. More convenient one is used. * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const pointer to request. [ Impact: API cleanup, single way to represent one property of a request ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:54 +02:00
Tejun Heo	83096ebf12	block: convert to pos and nr_sectors accessors With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors. While at it, drop superflous blk_rq_pos() < 0 test in swim.c. [ Impact: use pos and nr_sectors accessors ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Dario Ballabio <ballabio_dario@emc.com> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: unsik Kim <donari75@gmail.com> Cc: Laurent Vivier <Laurent@lvivier.info> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:54 +02:00
Tejun Heo	5b93629b45	block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones Implement accessors - blk_rq_pos(), blk_rq_sectors() and blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors and rq->hard_cur_sectors respectively and convert direct references of the said fields to the accessors. This is in preparation of request data length handling cleanup. Geert : suggested adding const to struct request * parameter to accessors Sergei : spotted error in patch description [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Ackec-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:53 +02:00
Tejun Heo	a7f5579234	block: kill blk_start_queueing() blk_start_queueing() is identical to __blk_run_queue() except that it doesn't check for recursion. None of the current users depends on blk_start_queueing() running request_fn directly. Replace usages of blk_start_queueing() with [__]blk_run_queue() and kill it. [ Impact: removal of mostly duplicate interface function ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:33 +02:00
Jens Axboe	f2d1f0ae78	cfq-iosched: cache prio_tree root in cfqq->p_root Currently we look it up from ->ioprio, but ->ioprio can change if either the process gets its IO priority changed explicitly, or if cfq decides to temporarily boost it. So if we are unlucky, we can end up attempting to remove a node from a different rbtree root than where it was added. Fix this by using ->org_ioprio as the prio_tree index, since that will only change for explicit IO priority settings (not for a boost). Additionally cache the rbtree root inside the cfqq, then we don't have to add code to reinsert the cfqq in the prio_tree if IO priority changes. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-24 08:54:22 +02:00
Jens Axboe	3ac6c9f8a6	cfq-iosched: fix bug with aliased request and cooperation detection cfq_prio_tree_lookup() should return the direct match, yet it always returns zero. Fix that. cfq_prio_tree_add() assumes that we don't get a direct match, while it is very possible that we do. Using O_DIRECT, you can have different cfqq with matching requests, since you don't have the page cache to serialize things for you. Fix this bug by only adding the cfqq if there isn't an existing match. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-24 08:54:22 +02:00
Jens Axboe	26a2ac009c	cfq-iosched: clear ->prio_trees[] on cfqd alloc Not strictly needed, but we should make it clear that we init the rbtree roots here. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-24 08:54:22 +02:00
Jeff Moyer	04dc6e71a2	cfq-iosched: use the default seek distance when there aren't enough seek samples If the cfq io context doesn't have enough samples yet to provide a mean seek distance, then use the default threshold we have for seeky IO instead of defaulting to 0. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-22 08:35:11 +02:00
Jeff Moyer	4d00aa47e2	cfq-iosched: make seek_mean converge more quickly Right now, depending on the first sector to which a process issues I/O, the seek time may start out way out of whack. So make sure we start with 0 sectors in seek, instead of the offset of the first request issued. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-22 08:35:11 +02:00
Jens Axboe	a36e71f996	cfq-iosched: add close cooperator code If we have processes that are working in close proximity to each other on disk, we don't want to idle wait. Instead allow the close process to issue a request, getting better aggregate bandwidth. The anticipatory scheduler has similar checks, noop and deadline do not need it since they don't care about process <-> io mappings. The code for CFQ is a little more involved though, since we split request queues into per-process contexts. This fixes a performance problem with eg dump(8), since it uses several processes in some silly attempt to speed IO up. Even if dump(8) isn't really a valid case (it should be fixed by using CLONE_IO), there are other cases where we see close processes and where idling ends up hurting performance. Credit goes to Jeff Moyer <jmoyer@redhat.com> for writing the initial implementation. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 12:15:11 +02:00
Jens Axboe	9481ffdc61	cfq-iosched: log responsible 'cfqq' in idle timer arm Makes it easier to read the traces. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 12:14:13 +02:00
Jens Axboe	2d87072296	cfq-iosched: tweak kick logic a bit more We only kick the dispatch for an idling queue, if we think it's a (somewhat) fully merged request. Also allow a kick if we have other busy queues in the system, since we don't want to risk waiting for a potential merge in that case. It's better to get some work done and proceed. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 12:12:46 +02:00
Jens Axboe	40bb54d197	cfq-iosched: no need to save interrupts in cfq_kick_queue() It's called from the workqueue handlers from process context, so we always have irqs enabled when entered. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 12:11:10 +02:00
Jens Axboe	d6ceb25e8d	cfq-iosched: don't delay queue kick for a merged request "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> reports that commit `b029195dda` introduced a regression of about 50% with sequential threaded read workloads. The test case is: tiotest -k0 -k1 -k3 -f 80 -t 32 which starts 32 threads each reading a 80MB file. Twiddle the kick queue logic so that we do start IO immediately, if it appears to be a fully merged request. We can't really detect that, so just check if the request is bigger than a page or not. The assumption is that since single bio issues will first queue a single request with just one page attached and then later do merges on that, if we already have more than a page worth of data in the request, then the request is most likely good to go. Verified that this doesn't cause a regression with the test case that commit `b029195dda` was fixing. It does not, we still see maximum sized requests for the queue-then-merge cases. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 08:28:12 +02:00
Jens Axboe	ff6657c6c8	cfq-iosched: get rid of private SYNC/ASYNC defines We can just use the block layer BLK_RW_SYNC/ASYNC defines now. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 08:28:10 +02:00
Jens Axboe	b0b78f81a5	cfq-iosched: use rw_is_sync() to see if rw flags are sync or not Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 08:28:10 +02:00
Jens Axboe	b029195dda	cfq-iosched: don't let idling interfere with plugging When CFQ is waiting for a new request from a process, currently it'll immediately restart queuing when it sees such a request. This doesn't work very well with streamed IO, since we then end up splitting IO that would otherwise have been merged nicely. For a simple dd test, this causes 10x as many requests to be issued as we should have. Normally this goes unnoticed due to the low overhead of requests at the device side, but some hardware is very sensitive to request sizes and there it can cause big slow downs. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-07 11:38:31 +02:00
Jens Axboe	75e50984f0	cfq-iosched: kill two unused cfqq flags We only manipulate the must_dispatch and queue_new flags, they are not tested anymore. So get rid of them. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-07 08:56:14 +02:00
Jens Axboe	2f5cb7381b	cfq-iosched: change dispatch logic to deal with single requests at the time The IO scheduler core calls into the IO scheduler dispatch_request hook to move requests from the IO scheduler and into the driver dispatch list. It only does so when the dispatch list is empty. CFQ moves several requests to the dispatch list, which can cause higher latencies if we suddenly have to switch to some important sync IO. Change the logic to move one request at the time instead. This should almost be functionally equivalent to what we did before, except that we now honor 'quantum' as the maximum queue depth at the device side from any single cfqq. If there's just a single active cfqq, we allow up to 4 times the normal quantum. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-07 08:51:19 +02:00
Jens Axboe	aeb6fafb8f	block: Add flag for telling the IO schedulers NOT to anticipate more IO By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:54 -07:00
Divyesh Shah	3a9a3f6cc5	cfq-iosched: Allow RT requests to pre-empt ongoing BE timeslice This patch adds the ability to pre-empt an ongoing BE timeslice when a RT request is waiting for the current timeslice to complete. This reduces the wait time to disk for RT requests from an upper bound of 4 (current value of cfq_quantum) to 1 disk request. Applied Jens' suggeested changes to avoid the rb lookup and use !cfq_class_rt() and retested. Latency(secs) for the RT task when doing sequential reads from 10G file. \| only RT \| RT + BE \| RT + BE + this patch small (512 byte) reads \| 143 \| 163 \| 145 large (1Mb) reads \| 142 \| 158 \| 146 Signed-off-by: Divyesh Shah <dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:47:33 +01:00
Jens Axboe	62c1fe9d9f	cfq-iosched: fix race between exiting queue and exiting task Original patch from Nikanth Karthikesan <knikanth@suse.de> When a queue exits the queue lock is taken and cfq_exit_queue() would free all the cic's associated with the queue. But when a task exits, cfq_exit_io_context() gets cic one by one and then locks the associated queue to call __cfq_exit_single_io_context. It looks like between getting a cic from the ioc and locking the queue, the queue might have exited on another cpu. Fix this by rechecking the cfq_io_context queue key inside the queue lock again, and not calling into __cfq_exit_single_io_context() if somebody beat us to it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:52 +01:00
Jens Axboe	30e0dc28bf	cfq-iosched: remove limit of dispatch depth of max 4 times quantum This basically limits the hardware queue depth to 4*quantum at any point in time, which is 16 with the default settings. As CFQ uses other means to shrink the hardware queue when necessary in the first place, there's really no need for this extra heuristic. Additionally, it ends up hurting performance in some cases. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
Jens Axboe	b374d18a4b	block: get rid of elevator_t typedef Just use struct elevator_queue everywhere instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:50 +01:00
Cheng Renquan	64d01dc9e1	block: use cancel_work_sync() instead of kblockd_flush_work() After many improvements on kblockd_flush_work, it is now identical to cancel_work_sync, so a direct call to cancel_work_sync is suggested. The only difference is that cancel_work_sync is a GPL symbol, so no non-GPL modules anymore. Signed-off-by: Cheng Renquan <crquan@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Jens Axboe	f7d7b7a7a3	block: as/cfq ssd idle check update We really need to know about the hardware tagging support as well, since if the SSD does not do tagging then we still want to idle. Otherwise have the same dependent sync IO vs flooding async IO problem as on rotational media. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:19 +02:00
Jens Axboe	a68bbddba4	block: add queue flag for SSD/non-rotational devices We don't want to idle in AS/CFQ if the device doesn't have a seek penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational device, low level drivers should set this flag upon discovery of an SSD or similar device type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:19 +02:00
Aaron Carroll	45333d5a31	cfq-iosched: fix queue depth detection CFQ's detection of queueing devices assumes a non-queuing device and detects if the queue depth reaches a certain threshold. Under some workloads (e.g. synchronous reads), CFQ effectively forces a unit queue depth, thus defeating the detection logic. This leads to poor performance on queuing hardware, since the idle window remains enabled. This patch inverts the sense of the logic: assume a queuing-capable device, and detect if the depth does not exceed the threshold. Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:09 +02:00
Jens Axboe	18887ad910	block: make kblockd_schedule_work() take the queue as parameter Preparatory patch for checking queuing affinity. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:09 +02:00
Jens Axboe	c265a7f417	cfq-iosched: get rid of enable_idle being unused warning Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:14 +02:00
Jens Axboe	7b679138b3	cfq-iosched: add message logging through blktrace Now that blktrace has the ability to carry arbitrary messages in its stream, use that for some CFQ logging. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:12 +02:00
Jens Axboe	9a11b4ed0e	cfq-iosched: properly protect ioc_gone and ioc count If we have multiple tasks freeing cfq_io_contexts when cfq-iosched is being unloaded, we could complete() ioc_gone twice. Fix that by protecting ioc_gone complete() and clearing with a spinlock for just that purpose. Doesn't matter from a performance perspective, since it'll only enter that path when ioc_gone != NULL (when cfq-iosched is being rmmod'ed). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:12 +02:00
Jens Axboe	d6de8be711	cfq-iosched: fix RCU problem in cfq_cic_lookup() cfq_cic_lookup() needs to properly protect ioc->ioc_data before dereferencing it and also exclude updaters of ioc->ioc_data as well. Also add a number of comments documenting why the existing RCU usage is OK. Thanks a lot to "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> for review and comments! Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-05-28 14:49:28 +02:00

1 2 3 4

180 Commits