1
linux/mm
Michel Lespinasse d065bd810b mm: retry page fault when blocking on disk transfer
This change reduces mmap_sem hold times that are caused by waiting for
disk transfers when accessing file mapped VMAs.

It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call
site wants mmap_sem to be released if blocking on a pending disk transfer.
In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
do_page_fault() will then re-acquire mmap_sem and retry the page fault.

It is expected that the retry will hit the same page which will now be
cached, and thus it will complete with a low mmap_sem hold time.

Tests:

- microbenchmark: thread A mmaps a large file and does random read accesses
  to the mmaped area - achieves about 55 iterations/s. Thread B does
  mmap/munmap in a loop at a separate location - achieves 55 iterations/s
  before, 15000 iterations/s after.

- We are seeing related effects in some applications in house, which show
  significant performance regressions when running without this change.

[akpm@linux-foundation.org: fix warning & crash]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:09 -07:00
..
backing-dev.c writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone 2010-10-26 16:52:07 -07:00
bootmem.c x86, memblock: Replace e820_/_early string with memblock_ 2010-08-27 11:13:47 -07:00
bounce.c bounce: call flush_dcache_page() after bounce_copy_vec() 2010-09-09 18:57:25 -07:00
compaction.c mm: compaction: handle active and inactive fairly in too_many_isolated 2010-09-09 18:57:24 -07:00
debug-pagealloc.c
dmapool.c mm: add a might_sleep_if() to dma_pool_alloc() 2010-10-26 16:52:08 -07:00
fadvise.c readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM 2010-03-06 11:26:25 -08:00
failslab.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
filemap_xip.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
filemap.c mm: retry page fault when blocking on disk transfer 2010-10-26 16:52:09 -07:00
fremap.c Avoid pgoff overflow in remap_file_pages 2010-09-25 09:34:58 -07:00
highmem.c mm: stack based kmap_atomic() 2010-10-26 16:52:08 -07:00
hugetlb.c Encode huge page size for VM_FAULT_HWPOISON errors 2010-10-08 09:32:46 +02:00
hwpoison-inject.c HWPOISON, hugetlb: support hwpoison injection for hugepage 2010-08-11 09:23:11 +02:00
init-mm.c mm: provide init_mm mm_context initializer 2010-08-09 20:44:54 -07:00
internal.h
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-10-22 17:31:36 -07:00
Kconfig.debug
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c
kmemleak.c kmemleak: Fix typo in the comment 2010-08-08 21:57:23 +01:00
ksm.c ksm: fix bad user data when swapping 2010-10-04 11:09:53 -07:00
maccess.c maccess,probe_kernel: Allow arch specific override probe_kernel_(read|write) 2010-01-07 11:58:36 -06:00
madvise.c
Makefile percpu: use percpu allocator on UP too 2010-10-02 10:26:05 +03:00
memblock.c memblock: Annotate memblock functions with __init_memblock 2010-10-11 16:00:52 -07:00
memcontrol.c memcg: fix thresholds with use_hierarchy == 1 2010-10-07 13:31:21 -07:00
memory_hotplug.c memory hotplug: unify is_removable and offline detection code 2010-10-26 16:52:06 -07:00
memory-failure.c mm: compaction: fix COMPACTPAGEFAILED counting 2010-10-26 16:52:06 -07:00
memory.c mm: retry page fault when blocking on disk transfer 2010-10-26 16:52:09 -07:00
mempolicy.c mm/mempolicy.c: check return code of check_range 2010-10-26 16:52:06 -07:00
mempool.c
migrate.c mm: compaction: fix COMPACTPAGEFAILED counting 2010-10-26 16:52:06 -07:00
mincore.c mincore: do nested page table walks 2010-05-25 08:06:58 -07:00
mlock.c mm: Move vma_stack_continue into mm.h 2010-09-09 09:05:06 -07:00
mm_init.c
mmap.c mmap: call unlink_anon_vmas() in __split_vma() in case of error 2010-09-22 17:22:40 -07:00
mmu_context.c exit: fix oops in sync_mm_rss 2010-03-24 16:31:21 -07:00
mmu_notifier.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mmzone.c mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake 2010-09-09 18:57:25 -07:00
mprotect.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mremap.c mm: remove pte_*map_nested() 2010-10-26 16:52:08 -07:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nommu.c mm: make the vma list be doubly linked 2010-08-21 08:49:21 -07:00
oom_kill.c oom: kill all threads sharing oom killed task's mm 2010-10-26 16:52:05 -07:00
page_alloc.c writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone 2010-10-26 16:52:07 -07:00
page_cgroup.c kmemleak: Annotate false positive in init_section_page_cgroup() 2010-07-19 11:54:14 +01:00
page_io.c block: unify flags for struct bio and struct request 2010-08-07 18:20:39 +02:00
page_isolation.c
page-writeback.c writeback: remove the internal 5% low bound on dirty_ratio 2010-10-26 16:52:08 -07:00
pagewalk.c pagemap: fix pfn calculation for hugepage 2010-04-07 08:38:04 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c percpu: move vmalloc based chunk management into percpu-vm.c 2010-05-01 08:30:50 +02:00
percpu.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-10-24 13:41:39 -07:00
prio_tree.c
quicklist.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
readahead.c readahead.c: fix comment 2010-05-25 08:07:00 -07:00
rmap.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2010-10-26 10:13:10 -07:00
shmem.c shmem: put_super must percpu_counter_destroy 2010-08-17 18:33:11 -07:00
slab.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 2010-08-22 10:08:52 -07:00
slob.c slob: fix gfp flags for order-0 page allocations 2010-10-02 10:24:28 +03:00
slub.c SLUB: Fix memory hotplug with !NUMA 2010-10-06 21:16:42 +03:00
sparse-vmemmap.c x86: Use memblock to replace early_res 2010-08-27 11:12:29 -07:00
sparse.c sparsemem: on no vmemmap path put mem_map on node high too 2010-05-25 08:06:56 -07:00
swap_state.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
swap.c mm: export lru_cache_add_*() to modules 2010-05-25 15:06:06 +02:00
swapfile.c Merge branch 'v2.6.36-rc8' into for-2.6.37/barrier 2010-10-19 09:13:04 +02:00
thrash.c
truncate.c check ATTR_SIZE contraints in inode_change_ok 2010-08-09 16:47:39 -04:00
util.c export __get_user_pages_fast() function 2010-10-24 10:51:24 +02:00
vmalloc.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-10-22 17:31:36 -07:00
vmscan.c vmscan,tmpfs: treat used once pages on tmpfs as used once 2010-10-26 16:52:08 -07:00
vmstat.c writeback: report dirty thresholds in /proc/vmstat 2010-10-26 16:52:06 -07:00