1
Commit Graph

1300 Commits

Author SHA1 Message Date
Ingo Molnar
daab7fc734 Merge commit 'v2.6.36-rc3' into x86/memblock
Conflicts:
	arch/x86/kernel/trampoline.c
	mm/memblock.c

Merge reason: Resolve the conflicts, update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31 09:45:46 +02:00
Yinghai Lu
774ea0bcb2 x86: Remove old bootmem code
Requested by Ingo, Thomas and HPA.

The old bootmem code is no longer necessary, and the transition is
complete.  Remove it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:14:37 -07:00
Yinghai Lu
6f2a75369e x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve
memblock_memory_size() will return memory size in memblock.memory.region.
memblock_free_memory_size() will return free memory size in memblock.memory.region.

So We can get exact reseved size in specified range.

Set the size right after initmem_init(), because later bootmem API will
get area above 16M. (except some fallback).

Later after we remove the bootmem, We could call that just before paging_init().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:13:54 -07:00
Yinghai Lu
a9ce6bc151 x86, memblock: Replace e820_/_early string with memblock_
1.include linux/memblock.h directly. so later could reduce e820.h reference.
2 this patch is done by sed scripts mainly

-v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:13:47 -07:00
Yinghai Lu
72d7c3b33c x86: Use memblock to replace early_res
1. replace find_e820_area with memblock_find_in_range
2. replace reserve_early with memblock_x86_reserve_range
3. replace free_early with memblock_x86_free_range.
4. NO_BOOTMEM will switch to use memblock too.
5. use _e820, _early wrap in the patch, in following patch, will
   replace them all
6. because memblock_x86_free_range support partial free, we can remove some special care
7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill()
   so adjust some calling later in setup.c::setup_arch()
   -- corruption_check and mptable_update

-v2: Move reserve_brk() early
    Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range()
    that could happen We have more then 128 RAM entry in E820 tables, and
    memblock_x86_fill() could use memblock_find_in_range() to find a new place for
    memblock.memory.region array.
    and We don't need to use extend_brk() after fill_memblock_area()
    So move reserve_brk() early before fill_memblock_area().
-v3: Move find_smp_config early
    To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable
    in right place.
-v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in
    memblock.reserved already..
    use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later.
-v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit
    active_region for 32bit does include high pages
    need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped()
-v6: Use current_limit instead
-v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L
-v8: Set memblock_can_resize early to handle EFI with more RAM entries
-v9: update after kmemleak changes in mainline

Suggested-by: David S. Miller <davem@davemloft.net>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:12:29 -07:00
Yinghai Lu
301ff3e88e x86, memblock: Use memblock_debug to control debug message print out
Also let memblock_x86_reserve_range/memblock_x86_free_range could print out name if memblock=debug is
specified

will also print ther name when reserve_memblock_area/free_memblock_area are called.

-v2: according to Ingo, put " if (memblock_debug) " in one place

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:35 -07:00
Yinghai Lu
e82d42be24 x86, memblock: Add memblock_x86_memory_in_range()
It will return memory size in specified range according to memblock.memory.region

Try to share some code with memblock_x86_free_memory_in_range() by passing get_free to
__memblock_x86_memory_in_range().

-v2: Ben want _in_range in the name instead of size

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:30 -07:00
Yinghai Lu
b52c17ce85 x86, memblock: Add memblock_x86_free_memory_in_range()
It will return free memory size in specified range.

We can not use memory_size - reserved_size here, because some reserved area
may not be in the scope of memblock.memory.region.

Use memblock.memory.region subtracting memblock.reserved.region to get free range array.
then count size of all free ranges.

-v2: Ben insist on using _in_range

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:16 -07:00
Yinghai Lu
6bcc8176d0 x86, memblock: Add memblock_x86_find_in_range_node()
It can be used to find NODE_DATA for numa.

Need to make sure early_node_map[] is filled before it is called, otherwise
it will fallback to memblock_find_in_range(), with node range.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:57 -07:00
Yinghai Lu
88ba088c18 x86, memblock: Add memblock_x86_register_active_regions() and memblock_x86_hole_size()
memblock_x86_register_active_regions() will be used to fill early_node_map,
the result will be memblock.memory.region AND numa data

memblock_x86_hole_size will be used to find hole size on memblock.memory.region
with specified range.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:52 -07:00
Yinghai Lu
4d5cf86ce1 x86, memblock: Add get_free_all_memory_range()
get_free_all_memory_range is for CONFIG_NO_BOOTMEM=y, and will be called by
free_all_memory_core_early().

It will use early_node_map aka active ranges subtract memblock.reserved to
get all free range, and those ranges will convert to slab pages.

-v4: increase range size

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:48 -07:00
Yinghai Lu
9dc5d569c1 x86, memblock: Add memblock_x86_reserve_range/memblock_x86_free_range
They are wrappers for core versions, which take start/end/name instead
of base/size.  This will make x86 conversion eaasier.

could add more debug print out

-v2: change get_max_mapped() to memblock.default_alloc_limit according to Michael
      Ellerman and Ben
     change to memblock_x86_reserve_range and memblock_x86_free_range according to Michael Ellerman
-v3: call check_and_double after reserve/free, so could avoid to use
      find_memblock_area. Suggested by Michael Ellerman

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:41 -07:00
Yinghai Lu
27de794365 x86, memblock: Add memblock_x86_to_bootmem()
memblock_x86_to_bootmem() will reserve memblock.reserved.region in
bootmem after bootmem is set up.

We can use it to with all arches that support memblock later.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:21 -07:00
Yinghai Lu
f88eff74aa bootmem, x86: Add weak version of reserve_bootmem_generic
It will be used memblock_x86_to_bootmem converting

It is an wrapper for reserve_bootmem, and x86 64bit is using special one.

Also clean up that version for x86_64. We don't need to take care of numa
path for that, bootmem can handle it how

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:13 -07:00
Yinghai Lu
fb74fb6db9 x86, memblock: Add memblock_x86_find_in_range_size()
size is returned according free range.
Will be used to find free ranges for early_memtest and memory corruption check

Do not mess it up with lib/memblock.c yet.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:06 -07:00
Linus Torvalds
9605456919 x86: don't send SIGBUS for kernel page faults
It's wrong for several reasons, but the most direct one is that the
fault may be for the stack accesses to set up a previous SIGBUS.  When
we have a kernel exception, the kernel exception handler does all the
fixups, not some user-level signal handler.

Even apart from the nested SIGBUS issue, it's also wrong to give out
kernel fault addresses in the signal handler info block, or to send a
SIGBUS when a system call already returns EFAULT.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-13 09:49:20 -07:00
Cesar Eduardo Barros
597781f3e5 kmap_atomic: make kunmap_atomic() harder to misuse
kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse"
list[1] ("Follow common convention and you'll get it wrong"), except in
some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3].

kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes
takes a pointer to within the page itself.  This seems to once in a while
trip people up (the convention they are following is the one from
kunmap()).

Make it much harder to misuse, by moving it to level 9 on Rusty's list[4]
("The compiler/linker won't let you get it wrong").  This is done by
refusing to build if the type of its first argument is a pointer to a
struct page.

The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck()
(which is what you would call in case for some strange reason calling it
with a pointer to a struct page is not incorrect in your code).

The previous version of this patch was compile tested on x86-64.

[1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
[2] In these cases, it is at level 5, "Do it right or it will always
    break at runtime."
[3] At least mips and powerpc look very similar, and sparc also seems to
    share a common ancestor with both; there seems to be quite some
    degree of copy-and-paste coding here. The include/asm/highmem.h file
    for these three archs mention x86 CPUs at its top.
[4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
[5] As an aside, could someone tell me why mn10300 uses unsigned long as
    the first parameter of kunmap_atomic() instead of void *?

Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Cc: Russell King <linux@arm.linux.org.uk> (arch/arm)
Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips)
Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300)
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300)
Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc)
Cc: Helge Deller <deller@gmx.de> (arch/parisc)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc)
Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc)
Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc)
Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86)
Cc: Ingo Molnar <mingo@redhat.com> (arch/x86)
Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86)
Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic)
Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:44:54 -07:00
Linus Torvalds
9faa1e5942 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Ioremap: fix wrong physical address handling in PAT code
  x86, tlb: Clean up and correct used type
  x86, iomap: Fix wrong page aligned size calculation in ioremapping code
  x86, mm: Create symbolic index into address_markers array
  x86, ioremap: Fix normal ram range check
  x86, ioremap: Fix incorrect physical address handling in PAE mode
  x86-64, mm: Initialize VDSO earlier on 64 bits
  x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages
2010-08-06 10:17:52 -07:00
Linus Torvalds
4aed2fd8e3 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
  tracing/kprobes: unregister_trace_probe needs to be called under mutex
  perf: expose event__process function
  perf events: Fix mmap offset determination
  perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
  perf, powerpc: Convert the FSL driver to use local64_t
  perf tools: Don't keep unreferenced maps when unmaps are detected
  perf session: Invalidate last_match when removing threads from rb_tree
  perf session: Free the ref_reloc_sym memory at the right place
  x86,mmiotrace: Add support for tracing STOS instruction
  perf, sched migration: Librarize task states and event headers helpers
  perf, sched migration: Librarize the GUI class
  perf, sched migration: Make the GUI class client agnostic
  perf, sched migration: Make it vertically scrollable
  perf, sched migration: Parameterize cpu height and spacing
  perf, sched migration: Fix key bindings
  perf, sched migration: Ignore unhandled task states
  perf, sched migration: Handle ignored migrate out events
  perf: New migration tool overview
  tracing: Drop cpparg() macro
  perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
  ...

Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
2010-08-06 09:30:52 -07:00
Jiri Kosina
d790d4d583 Merge branch 'master' into for-next 2010-08-04 15:14:38 +02:00
Marcin Slusarz
cc05152ab7 x86,mmiotrace: Add support for tracing STOS instruction
Add support for stos access tracing with mmiotrace.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Pekka Paalanen <pq@iki.fi>
Cc: Nouveau <nouveau@lists.freedesktop.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20100731205101.GA5860@joi.lan>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-02 01:32:01 +02:00
Yasuaki Ishimatsu
3709c85735 x86: Ioremap: fix wrong physical address handling in PAT code
The following two commits fixed a problem that x86 ioremap() doesn't handle
physical address higher than 32-bit properly in X86_32 PAE mode.

 ffa71f33a8 (x86, ioremap: Fix incorrect
 physical address handling in PAE mode)

 35be1b716a (x86, ioremap: Fix normal
 ram range check)

But these fixes are not enough, since pat_pagerange_is_ram() in PAT code
also has a same problem. This patch fixes it.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <4C47DDCF.80300@jp.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-29 15:26:11 +02:00
Borislav Petkov
3f8afb77cd x86, tlb: Clean up and correct used type
smp_processor_id() returns an int and not an unsigned long.
Also, since the function is small enough, there's no need for a
local variable caching its value.

No functionality change, just cleanup.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100721124705.GA674@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21 21:48:15 +02:00
Florian Zumbiehl
468c30f2bb x86, iomap: Fix wrong page aligned size calculation in ioremapping code
x86 early_iounmap(): fix off-by-one error in page alignment of allocation
size for sizes where size%PAGE_SIZE==1.

Signed-off-by: Florian Zumbiehl <florz@florz.de>
LKML-Reference: <201007202219.o6KMJlES021058@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 16:56:35 -07:00
Andres Salomon
92851e2fca x86, mm: Create symbolic index into address_markers array
Without this, adding entries into the address_markers array means adding
more and more of an #ifdef maze in pt_dump_init().  By using indices, we
can keep it a bit saner.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <201007202219.o6KMJkUs021052@imap1.linux-foundation.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-20 16:56:19 -07:00
Pavel Machek
a2531293db update email address
pavel@suse.cz no longer works, replace it with working address.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-19 10:56:54 +02:00
Kenji Kaneshige
35be1b716a x86, ioremap: Fix normal ram range check
Check for normal RAM in x86 ioremap() code seems to not work for the
last page frame in the specified physical address range.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <4C1AE6CD.1080704@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-09 11:42:11 -07:00
Kenji Kaneshige
ffa71f33a8 x86, ioremap: Fix incorrect physical address handling in PAE mode
Current x86 ioremap() doesn't handle physical address higher than
32-bit properly in X86_32 PAE mode. When physical address higher than
32-bit is passed to ioremap(), higher 32-bits in physical address is
cleared wrongly. Due to this bug, ioremap() can map wrong address to
linear address space.

In my case, 64-bit MMIO region was assigned to a PCI device (ioat
device) on my system. Because of the ioremap()'s bug, wrong physical
address (instead of MMIO region) was mapped to linear address space.
Because of this, loading ioatdma driver caused unexpected behavior
(kernel panic, kernel hangup, ...).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-09 11:42:03 -07:00
Peter Zijlstra
b945d6b255 rbtree: Undo augmented trees performance damage and regression
Reimplement augmented RB-trees without sprinkling extra branches
all over the RB-tree code (which lives in the scheduler hot path).

This approach is 'borrowed' from Fabio's BFQ implementation and
relies on traversing the rebalance path after the RB-tree-op to
correct the heap property for insertion/removal and make up for
the damage done by the tree rotations.

For insertion the rebalance path is trivially that from the new
node upwards to the root, for removal it is that from the deepest
node in the path from the to be removed node that will still
be around after the removal.

[ This patch also fixes a video driver regression reported by
  Ali Gholami Rudi - the memtype->subtree_max_end was updated
  incorrectly. ]

Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Ali Gholami Rudi <ali@rudi.ir>
Cc: Fabio Checconi <fabio@gandalf.sssup.it>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1275414172.27810.27961.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-05 14:43:50 +02:00
Marcin Slusarz
8b8f79b927 x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages
After every iounmap mmiotrace has to free kmmio_fault_pages, but
it can't do it directly, so it defers freeing by RCU.

It usually works, but when mmiotraced code calls ioremap-iounmap
multiple times without sleeping between (so RCU won't kick in
and start freeing) it can be given the same virtual address, so
at every iounmap mmiotrace will schedule the same pages for
release. Obviously it will explode on second free.

Fix it by marking kmmio_fault_pages which are scheduled for
release and not adding them second time.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Marcin Kocielnicki <koriakin@0x04.net>
Tested-by: Shinpei KATO <shinpei@il.is.s.u-tokyo.ac.jp>
Acked-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Marcin Kocielnicki <koriakin@0x04.net>
Cc: nouveau@lists.freedesktop.org
Cc: <stable@kernel.org>
LKML-Reference: <20100613215654.GA3829@joi.lan>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-18 11:30:09 +02:00
Venkatesh Pallipadi
6a4f3b5237 x86, pat: Proper init of memtype subtree_max_end
subtree_max_end that was recently added to struct memtype was not getting
properly initialized resulting in

WARNING: kmemcheck: Caught 64-bit read from uninitialized memory
in memtype_rb_augment_cb()
reported here
https://bugzilla.kernel.org/show_bug.cgi?id=16092

This change fixes the problem.

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Tested-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <1276217101-11515-1-git-send-email-venki@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2010-06-11 14:12:22 -07:00
Akinobu Mita
e565813ab9 x86/mm: Remove unused DBG() macro
DBG() macro for CONFIG_DEBUG_PER_CPU_MAPS is unused.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <1274706291-13554-1-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-31 10:01:53 +02:00
Linus Torvalds
021fad8b70 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cpufeature: Unbreak compile with gcc 3.x
  x86, pat: Fix memory leak in free_memtype
  x86, k8: Fix section mismatch for powernowk8_exit()
  lib/atomic64_test: fix missing include of linux/kernel.h
  x86: remove last traces of quicklist usage
  x86, setup: Phoenix BIOS fixup is needed on Dell Inspiron Mini 1012
  x86: "nosmp" command line option should force the system into UP mode
  arch/x86/pci: use kasprintf
  x86, apic: ack all pending irqs when crashed/on kexec
2010-05-30 09:06:13 -07:00
Linus Torvalds
35926ff5fb Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()"
This reverts commit 0ac0c0d0f8, which
caused cross-architecture build problems for all the wrong reasons.
IA64 already added its own version of __node_random(), but the fact is,
there is nothing architectural about the function, and the original
commit was just badly done. Revert it, since no fix is forthcoming.

Requested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-30 09:00:03 -07:00
Linus Torvalds
c5617b200a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
  tracing: Add __used annotation to event variable
  perf, trace: Fix !x86 build bug
  perf report: Support multiple events on the TUI
  perf annotate: Fix up usage of the build id cache
  x86/mmiotrace: Remove redundant instruction prefix checks
  perf annotate: Add TUI interface
  perf tui: Remove annotate from popup menu after failure
  perf report: Don't start the TUI if -D is used
  perf: Fix getline undeclared
  perf: Optimize perf_tp_event_match()
  perf: Remove more code from the fastpath
  perf: Optimize the !vmalloc backed buffer
  perf: Optimize perf_output_copy()
  perf: Fix wakeup storm for RO mmap()s
  perf-record: Share per-cpu buffers
  perf-record: Remove -M
  perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
  perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
  perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
  perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
  ...
2010-05-27 15:23:47 -07:00
Lee Schermerhorn
e534c7c5f8 numa: x86_64: use generic percpu var numa_node_id() implementation
x86 arch specific changes to use generic numa_node_id() based on generic
percpu variable infrastructure.  Back out x86's custom version of
numa_node_id()

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:57 -07:00
Jack Steiner
0ac0c0d0f8 cpusets: randomize node rotor used in cpuset_mem_spread_node()
Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number of
the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
Xiaotian Feng
20413f2716 x86, pat: Fix memory leak in free_memtype
Reserve_memtype will allocate memory for new memtype, but
in free_memtype, after the memtype erased from rbtree, the
memory is not freed.

Changes since V1:
	make rbt_memtype_erase return erased memtype so that
	it can be freed in free_memtype.

[ hpa: not for -stable: 2.6.34 and earlier not affected ]

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1274838670-8731-1-git-send-email-dfeng@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Jack Steiner <steiner@sgi.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-26 11:26:04 -07:00
Peter Zijlstra
48691ff86d x86: remove last traces of quicklist usage
We still have a stray quicklist header included even though we axed
quicklist usage quite a while back.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201005241913.o4OJDJe9010881@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-24 13:33:31 -07:00
Akinobu Mita
841bca1393 x86/mmiotrace: Remove redundant instruction prefix checks
Get rid of the duplicated entries in prefix_codes[]
to eliminate redundant checks by skip_prefix().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Pekka Paalanen <pq@iki.fi>
LKML-Reference: <1274140110-5841-1-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-23 11:02:43 +02:00
Linus Torvalds
59534f7298 Merge branch 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (207 commits)
  drm/radeon/kms/pm/r600: select the mid clock mode for single head low profile
  drm/radeon: fix power supply kconfig interaction.
  drm/radeon/kms: record object that have been list reserved
  drm/radeon: AGP memory is only I/O if the aperture can be mapped by the CPU.
  drm/radeon/kms: don't default display priority to high on rs4xx
  drm/edid: fix typo in 1600x1200@75 mode
  drm/nouveau: fix i2c-related init table handlers
  drm/nouveau: support init table i2c device identifier 0x81
  drm/nouveau: ensure we've parsed i2c table entry for INIT_*I2C* handlers
  drm/nouveau: display error message for any failed init table opcode
  drm/nouveau: fix init table handlers to return proper error codes
  drm/nv50: support fractional feedback divider on newer chips
  drm/nv50: fix monitor detection on certain chipsets
  drm/nv50: store full dcb i2c entry from vbios
  drm/nv50: fix suspend/resume with DP outputs
  drm/nv50: output calculated crtc pll when debugging on
  drm/nouveau: dump pll limits entries when debugging is on
  drm/nouveau: bios parser fixes for eDP boards
  drm/nouveau: fix a nouveau_bo dereference after it's been destroyed
  drm/nv40: remove some completed ctxprog TODOs
  ...
2010-05-21 11:14:52 -07:00
Linus Torvalds
c4fd308ed6 Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pat: Update the page flags for memtype atomically instead of using memtype_lock
  x86, pat: In rbt_memtype_check_insert(), update new->type only if valid
  x86, pat: Migrate to rbtree only backend for pat memtype management
  x86, pat: Preparatory changes in pat.c for bigger rbtree change
  rbtree: Add support for augmented rbtrees
2010-05-18 09:28:04 -07:00
Linus Torvalds
1f8caa986a Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64: Combine SRAT regions when possible
2010-05-18 09:17:50 -07:00
Eric Anholt
34dc4d4423 Merge remote branch 'origin/master' into drm-intel-next
Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/radeon/r300.c

The BSD ringbuffer support that is landing in this branch
significantly conflicts with the Ironlake PIPE_CONTROL fix on master,
and requires it to be tested successfully anyway.
2010-05-10 13:36:52 -07:00
David Rientjes
b0c4d952a1 x86: Fix fake apicid to node mapping for numa emulation
With NUMA emulation, it's possible for a single cpu to be bound
to multiple nodes since more than one may have affinity if
allocated on a physical node that is local to the cpu.

APIC ids must therefore be mapped to the lowest node ids to
maintain generic kernel use of functions such as cpu_to_node()
that determine device affinity.  For example, if a device has
proximity to physical node 1, for instance, and a cpu happens to
be mapped to a higher emulated node id 8, the proximity may not
be correctly determined by comparison in generic code even
though the cpu may be truly local and allocated on physical node 1.

When this happens, the true topology of the machine isn't
accurately represented in the emulated environment; although
this isn't critical to the system's uptime, any generic code
that is NUMA aware benefits from the physical topology being
accurately represented.

This can affect any system that maps multiple APIC ids to a
single node and is booted with numa=fake=N where N is greater
than the number of physical nodes.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <alpine.DEB.2.00.1005060224140.19473@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-06 12:02:05 +02:00
Ingo Molnar
56f0e74c9c x86: Fix parse_reservetop() build failure on certain configs
Commit e67a807 ("x86: Fix 'reservetop=' functionality") added a
fixup_early_ioremap() call to parse_reservetop() and declared it
in io.h.

But asm/io.h was only included indirectly - and on some configs
not at all, causing a build failure on those configs.

Cc: Liang Li <liang.li@windriver.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Wang Chen <wangchen@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-03 09:22:19 +02:00
Liang Li
e67a807f3d x86: Fix 'reservetop=' functionality
When specifying the 'reservetop=0xbadc0de' kernel parameter,
the kernel will stop booting due to a early_ioremap bug that
relates to commit 8827247ff.

The root cause of boot failure problem is the value of
'slot_virt[i]' was initialized in setup_arch->early_ioremap_init().
But later in setup_arch, the function 'parse_early_param' will
modify 'FIXADDR_TOP' when 'reservetop=0xbadc0de' being specified.

The simplest fix might be use __fix_to_virt(idx0) to get updated
value of 'FIXADDR_TOP' in '__early_ioremap' instead of reference
old value from slot_virt[slot] directly.

Changelog since v0:

-v1: When reservetop being handled then FIXADDR_TOP get
     adjusted, Hence check prev_map then re-initialize slot_virt and
     PMD based on new FIXADDR_TOP.

-v2: place fixup_early_ioremap hence call early_ioremap_init in
     reserve_top_address  to re-initialize slot_virt and
     corresponding PMD when parse_reservertop

-v3: move fixup_early_ioremap out of reserve_top_address to make
     sure other clients of reserve_top_address like xen/lguest won't
     broken

Signed-off-by: Liang Li <liang.li@windriver.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Wang Chen <wangchen@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
[ fixed three small cleanliness details in fixup_early_ioremap() ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-30 12:19:53 +02:00
Jan Beulich
2e61878698 x86-64: Combine SRAT regions when possible
... i.e. when the hole between two regions isn't occupied by memory on
another node. This reduces the memory->node table size, thus reducing
cache footprint of lookups, which got increased significantly some
time ago, and things go back to how they were before that change on
the systems I looked at.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF3230020000780003B3CA@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:14:11 -07:00
Robin Holt
1f9cc3cb6a x86, pat: Update the page flags for memtype atomically instead of using memtype_lock
While testing an application using the xpmem (out of kernel) driver, we
noticed a significant page fault rate reduction of x86_64 with respect
to ia64.  For one test running with 32 cpus, one thread per cpu, it
took 01:08 for each of the threads to vm_insert_pfn 2GB worth of pages.
For the same test running on 256 cpus, one thread per cpu, it took 14:48
to vm_insert_pfn 2 GB worth of pages.

The slowdown was tracked to lookup_memtype which acquires the
spinlock memtype_lock.  This heavily contended lock was slowing down
vm_insert_pfn().

With the cmpxchg on page->flags method, both the 32 cpu and 256 cpu
cases take approx 00:01.3 seconds to complete.

Signed-off-by: Robin Holt <holt@sgi.com>
LKML-Reference: <20100423153627.751194346@gulag1.americas.sgi.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
Cc: Rafael Wysocki <rjw@novell.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-23 15:57:23 -07:00
Dave Airlie
c2b41276da Merge branch 'drm-ttm-pool' into drm-core-next
* drm-ttm-pool:
  drm/ttm: using kmalloc/kfree requires including slab.h
  drm/ttm: include linux/seq_file.h for seq_printf
  drm/ttm: Add sysfs interface to control pool allocator.
  drm/ttm: Use set_pages_array_wc instead of set_memory_wc.
  arch/x86: Add array variants for setting memory to wc caching.
  drm/nouveau: Add ttm page pool debugfs file.
  drm/radeon/kms: Add ttm page pool debugfs file.
  drm/ttm: Add debugfs output entry to pool allocator.
  drm/ttm: add pool wc/uc page allocator V3
2010-04-20 13:12:28 +10:00