1
linux/mm
Mel Gorman a1e78772d7 hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork()
This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in
a similar manner to the reservations taken for MAP_SHARED mappings.  The
reserve count is accounted both globally and on a per-VMA basis for
private mappings.  This guarantees that a process that successfully calls
mmap() will successfully fault all pages in the future unless fork() is
called.

The characteristics of private mappings of hugetlbfs files behaviour after
this patch are;

1. The process calling mmap() is guaranteed to succeed all future faults until
   it forks().
2. On fork(), the parent may die due to SIGKILL on writes to the private
   mapping if enough pages are not available for the COW. For reasonably
   reliable behaviour in the face of a small huge page pool, children of
   hugepage-aware processes should not reference the mappings; such as
   might occur when fork()ing to exec().
3. On fork(), the child VMAs inherit no reserves. Reads on pages already
   faulted by the parent will succeed. Successful writes will depend on enough
   huge pages being free in the pool.
4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper
   and at fault time otherwise.

Before this patch, all reads or writes in the child potentially needs page
allocations that can later lead to the death of the parent.  This applies
to reads and writes of uninstantiated pages as well as COW.  After the
patch it is only a write to an instantiated page that causes problems.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:16 -07:00
..
allocpercpu.c Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c mm: unexport __alloc_bootmem_core() 2008-07-24 10:47:14 -07:00
bounce.c
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap_xip.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap.c kill generic_file_direct_IO() 2008-07-24 10:47:14 -07:00
fremap.c
highmem.c highmem: Export totalhigh_pages. 2008-07-19 22:39:46 -07:00
hugetlb.c hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() 2008-07-24 10:47:16 -07:00
internal.h mm: remove double indirection on tlb parameter to free_pgd_range() & Co 2008-07-24 10:47:15 -07:00
Kconfig Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 2008-07-14 13:37:29 -07:00
maccess.c
madvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
Makefile mm: add a basic debugging framework for memory initialisation 2008-07-24 10:47:13 -07:00
memcontrol.c memcg: simple stats for memory resource controller 2008-05-01 08:04:02 -07:00
memory_hotplug.c mm: drop unneeded pgdat argument from free_area_init_node() 2008-07-24 10:47:16 -07:00
memory.c mm: remove double indirection on tlb parameter to free_pgd_range() & Co 2008-07-24 10:47:15 -07:00
mempolicy.c mempolicy: mask off internal flags for userspace API 2008-07-04 13:03:05 -07:00
mempool.c
migrate.c mm/migrate.c should #include <linux/syscalls.h> 2008-07-24 10:47:14 -07:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c
mm_init.c mm: print out the zonelists on request for manual verification 2008-07-24 10:47:14 -07:00
mmap.c mm: remove double indirection on tlb parameter to free_pgd_range() & Co 2008-07-24 10:47:15 -07:00
mmzone.c mm: filter based on a nodemask as well as a gfp_mask 2008-04-28 08:58:19 -07:00
mprotect.c Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build 2008-07-15 15:44:51 +10:00
mremap.c
msync.c
nommu.c nommu: Correct kobjsize() page validity checks. 2008-06-12 07:56:17 -07:00
oom_kill.c oom_kill: remove unused parameter in badness() 2008-04-28 08:58:26 -07:00
page_alloc.c mm: drop unneeded pgdat argument from free_area_init_node() 2008-07-24 10:47:16 -07:00
page_io.c
page_isolation.c
page-writeback.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2008-07-15 08:36:38 -07:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c mm/pdflush.c: merge the same code in two path 2008-05-13 08:02:24 -07:00
prio_tree.c
quicklist.c
readahead.c mm: bdi: export BDI attributes in sysfs 2008-04-30 08:29:49 -07:00
rmap.c mm: remove nopage 2008-04-28 08:58:18 -07:00
shmem_acl.c
shmem.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
slab.c Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
slob.c slob: record page flag overlays explicitly 2008-07-24 10:47:15 -07:00
slub.c slub: record page flag overlays explicitly 2008-07-24 10:47:15 -07:00
sparse-vmemmap.c Christoph has moved 2008-07-04 10:40:04 -07:00
sparse.c mm: make defensive checks around PFN values registered for memory usage 2008-07-24 10:47:13 -07:00
swap_state.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
swap.c mm: fix atomic_t overflow in vm 2008-05-24 09:56:09 -07:00
swapfile.c mm: use non-racy method for /proc/swaps creation 2008-04-29 08:06:20 -07:00
thrash.c
tiny-shmem.c
truncate.c fix invalidate_inode_pages2_range() to not clear ret 2008-04-28 08:58:18 -07:00
util.c
vmalloc.c docbook: fix vmalloc missing parameter notation 2008-05-01 08:03:59 -07:00
vmscan.c mm: fix incorrect variable type in do_try_to_free_pages() 2008-06-12 18:05:39 -07:00
vmstat.c mm/vmstat.c: proper externs 2008-07-24 10:47:14 -07:00