1
linux/mm
Nick Piggin 479db0bf40 mm: dirty page tracking race fix
There is a race with dirty page accounting where a page may not properly
be accounted for.

clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.

page_mkclean walks the rmaps for that page, and for each one it cleans and
write protects the pte if it was dirty.  It uses page_check_address to
find the pte.  That function has a shortcut to avoid the ptl if the pte is
not present.  Unfortunately, the pte can be switched to not-present then
back to present by other code while holding the page table lock -- this
should not be a signal for page_mkclean to ignore that pte, because it may
be dirty.

For example, powerpc64's set_pte_at will clear a previously present pte
before setting it to the desired value.  There may also be other code in
core mm or in arch which do similar things.

The consequence of the bug is loss of data integrity due to msync, and
loss of dirty page accounting accuracy.  XIP's __xip_unmap could easily
also be unreliable (depending on the exact XIP locking scheme), which can
lead to data corruption.

Fix this by having an option to always take ptl to check the pte in
page_check_address.

It's possible to retain this optimization for page_referenced and
try_to_unmap.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Carsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:32 -07:00
..
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c bootmem: fix aligning of node-relative indexes and offsets 2008-08-20 15:40:31 -07:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap_xip.c mm: dirty page tracking race fix 2008-08-20 15:40:32 -07:00
filemap.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
fremap.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
highmem.c highmem: Export totalhigh_pages. 2008-07-19 22:39:46 -07:00
hugetlb.c allocate structures for reservation tracking in hugetlbfs outside of spinlocks v2 2008-08-12 16:07:28 -07:00
internal.h mm: export prep_compound_page to mm 2008-07-24 10:47:17 -07:00
Kconfig mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c madvise: update function comment of madvise_dontneed 2008-07-30 09:41:45 -07:00
Makefile mmu-notifiers: core 2008-07-28 16:30:21 -07:00
memcontrol.c memcg: fix oops in mem_cgroup_shrink_usage 2008-08-12 16:07:28 -07:00
memory_hotplug.c memory-hotplug: add sysfs removable attribute for hotplug memory remove 2008-07-24 10:47:21 -07:00
memory.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
mempolicy.c do_migrate_pages(): remove unused variable 2008-08-12 16:07:29 -07:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c mlock() fix return values 2008-08-04 16:58:45 -07:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c Merge branch 'core/locking' into core/urgent 2008-08-12 00:11:49 +02:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: filter based on a nodemask as well as a gfp_mask 2008-04-28 08:58:19 -07:00
mprotect.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mremap.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
msync.c
nommu.c nommu: Provide vmalloc_exec(). 2008-08-04 16:01:47 +09:00
oom_kill.c security: Fix setting of PF_SUPERPRIV by __capable() 2008-08-14 22:59:43 +10:00
page_alloc.c page allocator: use no-panic variant of alloc_bootmem() in alloc_large_system_hash() 2008-08-12 16:07:27 -07:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
page-writeback.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c pdflush: use time_after() instead of open-coding it 2008-07-25 10:53:28 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: readahead scan lockless 2008-07-26 12:00:06 -07:00
rmap.c mm: dirty page tracking race fix 2008-08-20 15:40:32 -07:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
shmem.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
slab.c mm: unexport ksize 2008-07-29 23:44:26 +03:00
slob.c mm: unexport ksize 2008-07-29 23:44:26 +03:00
slub.c SLUB: dynamic per-cache MIN_PARTIAL 2008-08-05 09:28:47 +03:00
sparse-vmemmap.c Christoph has moved 2008-07-04 10:40:04 -07:00
sparse.c mm/sparse.c: removed duplicated include 2008-08-12 16:07:30 -07:00
swap_state.c mm: show free swap as signed 2008-08-20 15:40:30 -07:00
swap.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
swapfile.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
thrash.c
tiny-shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
truncate.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
util.c mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
vmalloc.c Use WARN() in mm/vmalloc.c 2008-07-26 12:00:07 -07:00
vmscan.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
vmstat.c mm/vmstat.c: proper externs 2008-07-24 10:47:14 -07:00