1
linux/mm
Nick Piggin 362a61ad61 fix SMP data race in pagetable setup vs walking
There is a possible data race in the page table walking code. After the split
ptlock patches, it actually seems to have been introduced to the core code, but
even before that I think it would have impacted some architectures (powerpc
and sparc64, at least, walk the page tables without taking locks eg. see
find_linux_pte()).

The race is as follows:
The pte page is allocated, zeroed, and its struct page gets its spinlock
initialized. The mm-wide ptl is then taken, and then the pte page is inserted
into the pagetables.

At this point, the spinlock is not guaranteed to have ordered the previous
stores to initialize the pte page with the subsequent store to put it in the
page tables. So another Linux page table walker might be walking down (without
any locks, because we have split-leaf-ptls), and find that new pte we've
inserted. It might try to take the spinlock before the store from the other
CPU initializes it. And subsequently it might read a pte_t out before stores
from the other CPU have cleared the memory.

There are also similar races in higher levels of the page tables. They
obviously don't involve the spinlock, but could see uninitialized memory.

Arch code and hardware pagetable walkers that walk the pagetables without
locks could see similar uninitialized memory problems, regardless of whether
split ptes are enabled or not.

I prefer to put the barriers in core code, because that's where the higher
level logic happens, but the page table accessors are per-arch, and open-coding
them everywhere I don't think is an option. I'll put the read-side barriers
in alpha arch code for now (other architectures perform data-dependent loads
in order).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-14 10:05:18 -07:00
..
allocpercpu.c cpumask: Cleanup more uses of CPU_MASK and NODE_MASK 2008-04-19 19:44:58 +02:00
backing-dev.c mm: bdi: move statistics to debugfs 2008-04-30 08:29:50 -07:00
bootmem.c memory hotplug: make alloc_bootmem_section() 2008-04-28 08:58:25 -07:00
bounce.c
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap_xip.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap.c vfs: splice remove_suid() cleanup 2008-05-07 09:29:00 +02:00
fremap.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
highmem.c mm: highmem kernel-doc additions 2008-03-19 18:53:35 -07:00
hugetlb.c page allocator: explicitly retry hugepage allocations 2008-04-29 08:05:58 -07:00
internal.h memory hotplug: free memmaps allocated by bootmem 2008-04-28 08:58:26 -07:00
Kconfig PAGEFLAGS_EXTENDED and separate page flags for Head and Tail 2008-04-28 08:58:22 -07:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
Makefile uaccess: add probe_kernel_write() 2008-04-17 20:05:36 +02:00
memcontrol.c memcg: simple stats for memory resource controller 2008-05-01 08:04:02 -07:00
memory_hotplug.c mm/memory_hotplug.c must #include "internal.h" 2008-04-28 13:44:29 -07:00
memory.c fix SMP data race in pagetable setup vs walking 2008-05-14 10:05:18 -07:00
mempolicy.c mempolicy: use struct mempolicy pointer in shmem_sb_info 2008-04-28 08:58:25 -07:00
mempool.c
migrate.c mm: fix warning on memory offline 2008-04-30 08:29:55 -07:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c
mmap.c procfs task exe symlink 2008-04-29 08:06:17 -07:00
mmzone.c mm: filter based on a nodemask as well as a gfp_mask 2008-04-28 08:58:19 -07:00
mprotect.c
mremap.c
msync.c
nommu.c procfs task exe symlink 2008-04-29 08:06:17 -07:00
oom_kill.c oom_kill: remove unused parameter in badness() 2008-04-28 08:58:26 -07:00
page_alloc.c infrastructure to debug (dynamic) objects 2008-04-30 08:29:53 -07:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c
page-writeback.c mm: Add NR_WRITEBACK_TEMP counter 2008-04-30 08:29:50 -07:00
pagewalk.c mm: fix possible off-by-one in walk_pte_range() 2008-04-28 08:58:16 -07:00
pdflush.c mm/pdflush.c: merge the same code in two path 2008-05-13 08:02:24 -07:00
prio_tree.c
quicklist.c
readahead.c mm: bdi: export BDI attributes in sysfs 2008-04-30 08:29:49 -07:00
rmap.c mm: remove nopage 2008-04-28 08:58:18 -07:00
shmem_acl.c
shmem.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
slab.c mm: remove remaining __FUNCTION__ occurrences 2008-04-30 08:29:53 -07:00
slob.c slob: fix bug - when slob allocates "struct kmem_cache", it does not force alignment. 2008-04-27 18:25:51 +03:00
slub.c slub: fix atomic usage in any_slab_objects() 2008-05-08 10:46:56 -07:00
sparse-vmemmap.c NULL noise: fs/*, mm/*, kernel/* 2008-03-30 14:18:41 -07:00
sparse.c revert "memory hotplug: allocate usemap on the section with pgdat" 2008-04-30 08:29:55 -07:00
swap_state.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
swap.c mm: rotate_reclaimable_page() cleanup 2008-04-28 08:58:20 -07:00
swapfile.c mm: use non-racy method for /proc/swaps creation 2008-04-29 08:06:20 -07:00
thrash.c
tiny-shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
truncate.c fix invalidate_inode_pages2_range() to not clear ret 2008-04-28 08:58:18 -07:00
util.c
vmalloc.c docbook: fix vmalloc missing parameter notation 2008-05-01 08:03:59 -07:00
vmscan.c mm: remove remaining __FUNCTION__ occurrences 2008-04-30 08:29:53 -07:00
vmstat.c make vmstat cpu-unplug safe 2008-05-13 08:02:23 -07:00