linux/mm
Mel Gorman fc1b8a73dd hugetlb: move hugetlb_acct_memory()
This is a patchset to give reliable behaviour to a process that
successfully calls mmap(MAP_PRIVATE) on a hugetlbfs file.  Currently, it
is possible for the process to be killed due to a small hugepage pool size
even if it calls mlock().

MAP_SHARED mappings on hugetlbfs reserve huge pages at mmap() time.  This
guarantees all future faults against the mapping will succeed.  This
allows local allocations at first use improving NUMA locality whilst
retaining reliability.

MAP_PRIVATE mappings do not reserve pages.  This can result in an
application being SIGKILLed later if a huge page is not available at fault
time.  This makes huge page usage very ill-advised in some cases as the
unexpected application failure cannot be detected and handled as it is
immediately fatal.  Although an application may force instantiation of the
pages using mlock(), this may lead to poor memory placement and the
process may still be killed when performing COW.
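
The difference between reserving at mmap() time and allocating at fault
time can be illustrated with a toy model.  This is only a sketch: the
struct toy_pool type and the toy_*() helpers are hypothetical names made
up for illustration and are not the kernel's data structures or
functions.  It shows that a reservation turns a fatal fault-time failure
into an mmap() failure the caller can handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a huge page pool (hypothetical, not kernel code). */
struct toy_pool {
	long free_pages;     /* pages not yet handed out */
	long reserved_pages; /* free pages already promised to mappings */
};

/*
 * Reserve at mmap() time. Failing here is recoverable: the caller
 * simply sees mmap() fail and can fall back to something else.
 */
static bool toy_reserve(struct toy_pool *p, long npages)
{
	if (p->free_pages - p->reserved_pages < npages)
		return false;
	p->reserved_pages += npages;
	return true;
}

/* Fault backed by a reservation: cannot fail by construction. */
static void toy_fault_reserved(struct toy_pool *p)
{
	assert(p->reserved_pages > 0 && p->free_pages > 0);
	p->reserved_pages--;
	p->free_pages--;
}

/*
 * Fault without a reservation: may only use free pages that nobody
 * has reserved. A false return models the case where the real
 * process would be killed.
 */
static bool toy_fault_unreserved(struct toy_pool *p)
{
	if (p->free_pages - p->reserved_pages < 1)
		return false;
	p->free_pages--;
	return true;
}
```

With a two-page pool, a mapping that reserves both pages can always fault
them in, while an unreserved mapping competing for the same pool fails at
fault time rather than at mmap() time.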

This patchset introduces a reliability guarantee for the process which
creates a private mapping, i.e.  the process that calls mmap() on a
hugetlbfs file successfully.  The first patch of the set is a purely
mechanical code move to make later diffs easier to read.  The second patch
will guarantee faults up until the process calls fork().  After patch two,
as long as the child keeps the mappings, the parent is no longer
guaranteed to be reliable.  Patch 3 guarantees that the parent will always
successfully COW by unmapping the pages from the child in the event there
are insufficient pages in the hugepage pool to allocate a new page, be it
via a static or dynamic pool.

Existing hugepage-aware applications are unlikely to be affected by this
change.  For much of hugetlbfs's history, pages were pre-faulted at mmap()
time or mmap() failed, which acts in a reserve-like manner.  If the pool is
sized correctly already so that parent and child can fault reliably, the
application will not even notice the reserves.  It's only when the pool is
too small for the application to function perfectly reliably that the
reserves come into play.

Credit goes to Andy Whitcroft for cleaning up a number of mistakes during
review before the patches were released.

This patch:

A later patch in this set needs to call hugetlb_acct_memory() before it is
defined.  This patch moves the function without modification.  This makes
later diffs easier to read.
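
As a generic illustration of why the move is needed: C requires a static
function to be declared or defined before its first use, so placing the
definition above its callers avoids a forward declaration.  The names
below (toy_acct_memory(), toy_reserve_region(), toy_unreserve_region(),
toy_pool_used, TOY_POOL_LIMIT) are hypothetical and the body is not the
real hugetlb_acct_memory(); only the ordering is the point.

```c
#include <assert.h>

#define TOY_POOL_LIMIT 4	/* hypothetical pool size */

static long toy_pool_used;	/* hypothetical accounting counter */

/*
 * Defined first, so the callers below need no forward declaration.
 * A positive delta charges the pool, a negative delta returns pages.
 */
static int toy_acct_memory(long delta)
{
	if (delta > 0 && toy_pool_used + delta > TOY_POOL_LIMIT)
		return -1;	/* not enough pages to account */
	toy_pool_used += delta;
	assert(toy_pool_used >= 0);
	return 0;
}

/* Caller that appears later in the file than the helper it uses. */
static int toy_reserve_region(long npages)
{
	return toy_acct_memory(npages);
}

static void toy_unreserve_region(long npages)
{
	toy_acct_memory(-npages);
}
```

Had toy_acct_memory() been defined below toy_reserve_region(), the file
would need a separate prototype near the top; moving the definition keeps
a single copy of the signature.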

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:16 -07:00
allocpercpu.c Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c mm: unexport __alloc_bootmem_core() 2008-07-24 10:47:14 -07:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap_xip.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap.c kill generic_file_direct_IO() 2008-07-24 10:47:14 -07:00
fremap.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
highmem.c highmem: Export totalhigh_pages. 2008-07-19 22:39:46 -07:00
hugetlb.c hugetlb: move hugetlb_acct_memory() 2008-07-24 10:47:16 -07:00
internal.h mm: remove double indirection on tlb parameter to free_pgd_range() & Co 2008-07-24 10:47:15 -07:00
Kconfig Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 2008-07-14 13:37:29 -07:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
Makefile mm: add a basic debugging framework for memory initialisation 2008-07-24 10:47:13 -07:00
memcontrol.c memcg: simple stats for memory resource controller 2008-05-01 08:04:02 -07:00
memory_hotplug.c mm: drop unneeded pgdat argument from free_area_init_node() 2008-07-24 10:47:16 -07:00
memory.c mm: remove double indirection on tlb parameter to free_pgd_range() & Co 2008-07-24 10:47:15 -07:00
mempolicy.c mempolicy: mask off internal flags for userspace API 2008-07-04 13:03:05 -07:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c mm/migrate.c should #include <linux/syscalls.h> 2008-07-24 10:47:14 -07:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c
mm_init.c mm: print out the zonelists on request for manual verification 2008-07-24 10:47:14 -07:00
mmap.c mm: remove double indirection on tlb parameter to free_pgd_range() & Co 2008-07-24 10:47:15 -07:00
mmzone.c mm: filter based on a nodemask as well as a gfp_mask 2008-04-28 08:58:19 -07:00
mprotect.c Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build 2008-07-15 15:44:51 +10:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c
nommu.c nommu: Correct kobjsize() page validity checks. 2008-06-12 07:56:17 -07:00
oom_kill.c oom_kill: remove unused parameter in badness() 2008-04-28 08:58:26 -07:00
page_alloc.c mm: drop unneeded pgdat argument from free_area_init_node() 2008-07-24 10:47:16 -07:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
page-writeback.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2008-07-15 08:36:38 -07:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c mm/pdflush.c: merge the same code in two path 2008-05-13 08:02:24 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: bdi: export BDI attributes in sysfs 2008-04-30 08:29:49 -07:00
rmap.c mm: remove nopage 2008-04-28 08:58:18 -07:00
shmem_acl.c
shmem.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
slab.c Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
slob.c slob: record page flag overlays explicitly 2008-07-24 10:47:15 -07:00
slub.c slub: record page flag overlays explicitly 2008-07-24 10:47:15 -07:00
sparse-vmemmap.c Christoph has moved 2008-07-04 10:40:04 -07:00
sparse.c mm: make defensive checks around PFN values registered for memory usage 2008-07-24 10:47:13 -07:00
swap_state.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
swap.c mm: fix atomic_t overflow in vm 2008-05-24 09:56:09 -07:00
swapfile.c mm: use non-racy method for /proc/swaps creation 2008-04-29 08:06:20 -07:00
thrash.c
tiny-shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
truncate.c fix invalidate_inode_pages2_range() to not clear ret 2008-04-28 08:58:18 -07:00
util.c fix mm/util.c:krealloc() 2007-11-14 18:45:41 -08:00
vmalloc.c docbook: fix vmalloc missing parameter notation 2008-05-01 08:03:59 -07:00
vmscan.c mm: fix incorrect variable type in do_try_to_free_pages() 2008-06-12 18:05:39 -07:00
vmstat.c mm/vmstat.c: proper externs 2008-07-24 10:47:14 -07:00