1
Commit Graph

250 Commits

Author SHA1 Message Date
Linus Torvalds
cb110171a6 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, xen: fix use of pgd_page now that it really does return a page
2008-11-07 09:17:59 -08:00
Jeremy Fitzhardinge
d05fdf3160 xen: make sure stray alias mappings are gone before pinning
Xen requires that all mappings of pagetable pages are read-only, so
that they can't be updated illegally.  As a result, if a page is being
turned into a pagetable page, we need to make sure all its mappings
are RO.

If the page had been used for ioremap or vmalloc, it may still have
left over mappings as a result of not having been lazily unmapped.
This change makes sure we explicitly mop them all up before pinning
the page.

Unlike aliases created by kmap, the there can be vmalloc aliases even
for non-high pages, so we must do the flush unconditionally.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-07 10:05:59 +01:00
Jeremy Fitzhardinge
47cb2ed9df x86, xen: fix use of pgd_page now that it really does return a page
Impact: fix 32-bit Xen guest boot crash

On 32-bit PAE, pud_page, for no good reason, didn't really return a
struct page *.  Since Jan Beulich's fix "i386/PAE: fix pud_page()",
pud_page does return a struct page *.

Because PAE has 3 pagetable levels, the pud level is folded into the
pgd level, so pgd_page() is the same as pud_page(), and now returns
a struct page *.  Update the xen/mmu.c code which uses pgd_page()
accordingly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06 23:20:47 +01:00
Linus Torvalds
e946217e4f Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
  ftrace: fix current_tracer error return
  tracing: fix a build error on alpha
  ftrace: use a real variable for ftrace_nop in x86
  tracing/ftrace: make boot tracer select the sched_switch tracer
  tracepoint: check if the probe has been registered
  asm-generic: define DIE_OOPS in asm-generic
  trace: fix printk warning for u64
  ftrace: warning in kernel/trace/ftrace.c
  ftrace: fix build failure
  ftrace, powerpc, sparc64, x86: remove notrace from arch ftrace file
  ftrace: remove ftrace hash
  ftrace: remove mcount set
  ftrace: remove daemon
  ftrace: disable dynamic ftrace for all archs that use daemon
  ftrace: add ftrace warn on to disable ftrace
  ftrace: only have ftrace_kill atomic
  ftrace: use probe_kernel
  ftrace: comment arch ftrace code
  ftrace: return error on failed modified text.
  ftrace: dynamic ftrace process only text section
  ...
2008-10-28 09:52:25 -07:00
Chris Lalancette
9f32d21c98 xen: fix Xen domU boot with batched mprotect
Impact: fix guest kernel boot crash on certain configs

Recent i686 2.6.27 kernels with a certain amount of memory (between
736 and 855MB) have a problem booting under a hypervisor that supports
batched mprotect (this includes the RHEL-5 Xen hypervisor as well as
any 3.3 or later Xen hypervisor).

The problem ends up being that xen_ptep_modify_prot_commit() is using
virt_to_machine to calculate which pfn to update.  However, this only
works for pages that are in the p2m list, and the pages coming from
change_pte_range() in mm/mprotect.c are kmap_atomic pages.  Because of
this, we can run into the situation where the lookup in the p2m table
returns an INVALID_MFN, which we then try to pass to the hypervisor,
which then (correctly) denies the request to a totally bogus pfn.

The right thing to do is to use arbitrary_virt_to_machine, so that we
can be sure we are modifying the right pfn.  This unfortunately
introduces a performance penalty because of a full page-table-walk,
but we can avoid that penalty for pages in the p2m list by checking if
virt_addr_valid is true, and if so, just doing the lookup in the p2m
table.

The attached patch implements this, and allows my 2.6.27 i686 based
guest with 768MB of memory to boot on a RHEL-5 hypervisor again.
Thanks to Jeremy for the suggestions about how to fix this particular
issue.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-27 14:11:20 +01:00
Ingo Molnar
debfcaf93e Merge branch 'tracing/ftrace' into tracing/urgent 2008-10-22 09:08:14 +02:00
Linus Torvalds
9301975ec2 Merge branch 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
This merges branches irq/genirq, irq/sparseirq-v4, timers/hpet-percpu
and x86/uv.

The sparseirq branch is just preliminary groundwork: no sparse IRQs are
actually implemented by this tree anymore - just the new APIs are added
while keeping the old way intact as well (the new APIs map 1:1 to
irq_desc[]).  The 'real' sparse IRQ support will then be a relatively
small patch ontop of this - with a v2.6.29 merge target.

* 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (178 commits)
  genirq: improve include files
  intr_remapping: fix typo
  io_apic: make irq_mis_count available on 64-bit too
  genirq: fix name space collisions of nr_irqs in arch/*
  genirq: fix name space collision of nr_irqs in autoprobe.c
  genirq: use iterators for irq_desc loops
  proc: fixup irq iterator
  genirq: add reverse iterator for irq_desc
  x86: move ack_bad_irq() to irq.c
  x86: unify show_interrupts() and proc helpers
  x86: cleanup show_interrupts
  genirq: cleanup the sparseirq modifications
  genirq: remove artifacts from sparseirq removal
  genirq: revert dynarray
  genirq: remove irq_to_desc_alloc
  genirq: remove sparse irq code
  genirq: use inline function for irq_to_desc
  genirq: consolidate nr_irqs and for_each_irq_desc()
  x86: remove sparse irq from Kconfig
  genirq: define nr_irqs for architectures with GENERIC_HARDIRQS=n
  ...
2008-10-20 13:23:01 -07:00
Steven Rostedt
606576ce81 ftrace: rename FTRACE to FUNCTION_TRACER
Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER.  The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.

This patch was generated mostly by script, and partially by hand.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-20 18:27:03 +02:00
Nick Piggin
db64fe0225 mm: rewrite vmap layer
Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).

The biggest problem with vmap is actually vunmap.  Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache.  This is all done under a global lock.  As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
 This gives terrible quadratic scalability characteristics.

Another problem is that the entire vmap subsystem works under a single
lock.  It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.

This is a rewrite of vmap subsystem to solve those problems.  The existing
vmalloc API is implemented on top of the rewritten subsystem.

The TLB flushing problem is solved by using lazy TLB unmapping.  vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated.  So the addresses aren't allocated again until
a subsequent TLB flush.  A single TLB flush then can flush multiple
vunmaps from each CPU.

XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.

The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.

There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.

To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap.  Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).

As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages.  Different numbers
of tests were run in parallel on an 4 core, 2 socket opteron.  Results are
in nanoseconds per map+touch+unmap.

threads           vanilla         vmap rewrite
1                 14700           2900
2                 33600           3000
4                 49500           2800
8                 70631           2900

So with a 8 cores, the rewritten version is already 25x faster.

In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram...  along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system.  I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now.  vmap is pretty well blown off the
profiles.

Before:
1352059 total                                      0.1401
798784 _write_lock                              8320.6667 <- vmlist_lock
529313 default_idle                             1181.5022
 15242 smp_call_function                         15.8771  <- vmap tlb flushing
  2472 __get_vm_area_node                         1.9312  <- vmap
  1762 remove_vm_area                             4.5885  <- vunmap
   316 map_vm_area                                0.2297  <- vmap
   312 kfree                                      0.1950
   300 _spin_lock                                 3.1250
   252 sn_send_IPI_phys                           0.4375  <- tlb flushing
   238 vmap                                       0.8264  <- vmap
   216 find_lock_page                             0.5192
   196 find_next_bit                              0.3603
   136 sn2_send_IPI                               0.2024
   130 pio_phys_write_mmr                         2.0312
   118 unmap_kernel_range                         0.1229

After:
 78406 total                                      0.0081
 40053 default_idle                              89.4040
 33576 ia64_spinlock_contention                 349.7500
  1650 _spin_lock                                17.1875
   319 __reg_op                                   0.5538
   281 _atomic_dec_and_lock                       1.0977
   153 mutex_unlock                               1.5938
   123 iget_locked                                0.1671
   117 xfs_dir_lookup                             0.1662
   117 dput                                       0.1406
   114 xfs_iget_core                              0.0268
    92 xfs_da_hashname                            0.1917
    75 d_alloc                                    0.0670
    68 vmap_page_range                            0.0462 <- vmap
    58 kmem_cache_alloc                           0.0604
    57 memset                                     0.0540
    52 rb_next                                    0.1625
    50 __copy_user                                0.0208
    49 bitmap_find_free_region                    0.2188 <- vmap
    46 ia64_sn_udelay                             0.1106
    45 find_inode_fast                            0.1406
    42 memcmp                                     0.2188
    42 finish_task_switch                         0.1094
    42 __d_lookup                                 0.0410
    40 radix_tree_lookup_slot                     0.1250
    37 _spin_unlock_irqrestore                    0.3854
    36 xfs_bmapi                                  0.0050
    36 kmem_cache_free                            0.0256
    35 xfs_vn_getattr                             0.0322
    34 radix_tree_lookup                          0.1062
    33 __link_path_walk                           0.0035
    31 xfs_da_do_buf                              0.0091
    30 _xfs_buf_find                              0.0204
    28 find_get_page                              0.0875
    27 xfs_iread                                  0.0241
    27 __strncpy_from_user                        0.2812
    26 _xfs_buf_initialize                        0.0406
    24 _xfs_buf_lookup_pages                      0.0179
    24 vunmap_page_range                          0.0250 <- vunmap
    23 find_lock_page                             0.0799
    22 vm_map_ram                                 0.0087 <- vmap
    20 kfree                                      0.0125
    19 put_page                                   0.0330
    18 __kmalloc                                  0.0176
    17 xfs_da_node_lookup_int                     0.0086
    17 _read_lock                                 0.0885
    17 page_waitqueue                             0.0664

vmap has gone from being the top 5 on the profiles and flushing the crap
out of all TLBs, to using less than 1% of kernel time.

[akpm@linux-foundation.org: cleanups, section fix]
[akpm@linux-foundation.org: fix build on alpha]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:32 -07:00
Thomas Gleixner
d6c88a507e genirq: revert dynarray
Revert the dynarray changes. They need more thought and polishing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-10-16 16:53:15 +02:00
Alex Nixon
bf9d3cf73e xen: Fix bug `do_IRQ: cannot handle IRQ -1 vector 0x6 cpu 1'
Following commit 9c3f2468d8339866d9ef6a25aae31a8909c6be0d, do_IRQ()
looks up the IRQ number in the per-cpu variable vector_irq.

This commit makes Xen initialise an identity vector_irq map for both X86_32 and X86_64.

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:58 +02:00
Yinghai Lu
7f95ec9e4c x86: move kstat_irqs from kstat to irq_desc
based on Eric's patch ...

together mold it with dyn_array for irq_desc, will allcate kstat_irqs for
nr_irq_desc alltogether if needed. -- at that point nr_cpus is known already.

v2: make sure system without generic_hardirqs works they don't have irq_desc
v3: fix merging
v4: [mingo@elte.hu] fix typo

[ mingo@elte.hu ] irq: build fix

fix:

 arch/x86/xen/spinlock.c: In function 'xen_spin_lock_slow':
 arch/x86/xen/spinlock.c:90: error: 'struct kernel_stat' has no member named 'irqs'

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:32 +02:00
Glauber Costa
3807f345b2 x86: paravirt: factor out cpu_khz to common code
KVM intends to use paravirt code to calibrate khz. Xen
current code will do just fine. So as a first step, factor out
code to pvclock.c.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Ingo Molnar
365d46dc9b Merge branch 'linus' into x86/xen
Conflicts:
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/process_64.c
	arch/x86/xen/enlighten.c
2008-10-12 12:37:32 +02:00
Ingo Molnar
d84705969f Merge branch 'x86/apic' into x86-v28-for-linus-phase4-B
Conflicts:
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/apic_64.c
	arch/x86/kernel/setup.c
	drivers/pci/intel-iommu.c
	include/asm-x86/cpufeature.h
	include/asm-x86/dma-mapping.h
2008-10-11 20:17:36 +02:00
Ian Campbell
5dc64a3442 xen: do not reserve 2 pages of padding between hypervisor and fixmap.
When reserving space for the hypervisor the Xen paravirt backend adds
an extra two pages (this was carried forward from the 2.6.18-xen tree
which had them "for safety"). Depending on various CONFIG options this
can cause the boot time fixmaps to span multiple PMDs which is not
supported and triggers a WARN in early_ioremap_init().

This was exposed by 2216d199b1 which
moved the dmi table parsing earlier.
    x86: fix CONFIG_X86_RESERVE_LOW_64K=y

    The bad_bios_dmi_table() quirk never triggered because we do DMI setup
    too late. Move it a bit earlier.

There is no real reason to reserve these two extra pages and the
fixmap already incorporates FIX_HOLE which serves the same
purpose. None of the other callers of reserve_top_address do this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 13:00:15 +02:00
Jeremy Fitzhardinge
eefb47f6a1 xen: use spin_lock_nest_lock when pinning a pagetable
When pinning/unpinning a pagetable with split pte locks, we can end up
holding multiple pte locks at once (we need to hold the locks while
there's a pending batched hypercall affecting the pte page).  Because
all the pte locks are in the same lock class, lockdep thinks that
we're potentially taking a lock recursively.

This warning is spurious because we always take the pte locks while
holding mm->page_table_lock.  lockdep now has spin_lock_nest_lock to
express this kind of dominant lock use, so use it here so that lockdep
knows what's going on.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-09 14:25:19 +02:00
Ingo Molnar
e496e3d645 Merge branches 'x86/alternatives', 'x86/cleanups', 'x86/commandline', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/doc', 'x86/exports', 'x86/fpu', 'x86/gart', 'x86/idle', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/oprofile', 'x86/paravirt', 'x86/reboot', 'x86/sparse-fixes', 'x86/tsc', 'x86/urgent' and 'x86/vmalloc' into x86-v28-for-linus-phase1 2008-10-06 18:17:07 +02:00
Jeremy Fitzhardinge
db053b86f4 xen: clean up x86-64 warnings
There are a couple of Xen features which rely on directly accessing
per-cpu data via a segment register, which is not yet available on
x86-64.  In the meantime, just disable direct access to the vcpu info
structure; this leaves some of the code as dead, but it will come to
life in time, and the warnings are suppressed.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-03 10:04:10 +02:00
Chuck Ebbert
08115ab4d9 xen: make CONFIG_XEN_SAVE_RESTORE depend on CONFIG_XEN
Xen options need to depend on XEN.
Also, add newline at end of file.

Without this patch you need to disable CONFIG_PM in order to
disable CPU hotplugging.

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Acked-by Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-30 09:58:05 +02:00
Ingo Molnar
07bbc16a86 Merge branch 'timers/urgent' into x86/xen
Conflicts:
	arch/x86/kernel/process_32.c
	arch/x86/kernel/process_64.c

Manual merge:

	arch/x86/kernel/smpboot.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-23 23:26:42 +02:00
Jeremy Fitzhardinge
5670a43d71 xen: fix for xen guest with mem > 3.7G
PFN_PHYS() can truncate large addresses unless its passed a suitable
large type.  This is fixed more generally in the patch series
introducing phys_addr_t, but we need a short-term fix to solve a
Xen regression reported by Roberto De Ioris.

Reported-by: Roberto De Ioris <roberto@unbit.it>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-14 16:46:34 +02:00
Jeremy Fitzhardinge
6a9e91846b xen: fix pinning when not using split pte locks
We only pin PTE pages when using split PTE locks, so don't do the
pin/unpin when attaching/detaching pte pages to a pinned pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-10 14:05:53 +02:00
Ingo Molnar
3ce9bcb583 Merge branch 'core/xen' into x86/xen 2008-09-10 14:05:45 +02:00
Jeremy Fitzhardinge
f7d0b926ac mm: define USE_SPLIT_PTLOCKS rather than repeating expression
Define USE_SPLIT_PTLOCKS as a constant expression rather than repeating
"NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS" all over the place.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-10 14:04:59 +02:00
Alex Nixon
26fd10517e xen: make CPU hotplug functions static
There's no need for these functions to be accessed from outside of xen/smp.c

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-08 19:12:24 +02:00
Alex Nixon
2737146b3a x86, xen: fix build when !CONFIG_HOTPLUG_CPU
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-08 19:12:23 +02:00
Eduardo Habkost
e4a6be4d28 x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags
Using native_pte_val triggers the BUG_ON() in the paravirt_ops
version of pte_flags().

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-06 20:13:58 +02:00
Alex Nixon
d68d82afd4 xen: implement CPU hotplugging
Note the changes from 2.6.18-xen CPU hotplugging:

A vcpu_down request from the remote admin via Xenbus both hotunplugs the
CPU, and disables it by removing it from the cpu_present map, and removing
its entry in /sys.

A vcpu_up request from the remote admin only re-enables the CPU, and does
not immediately bring the CPU up. A udev event is emitted, which can be
caught by the user if he wishes to automatically re-up CPUs when available,
or implement a more complex policy.

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-25 11:25:14 +02:00
Jeremy Fitzhardinge
ee7686bc04 x86: build fix in "xen spinlock updates and performance measurements"
Ingo Molnar wrote:
> -tip testing found this build failure:
>
>  arch/x86/xen/spinlock.c: In function ‘spin_time_start’:
>  arch/x86/xen/spinlock.c:60: error: implicit declaration of function ‘xen_clocksource_read’
>
> i've excluded these new commits for now from tip/master - could you
> please send a delta fix against tip/x86/xen?

Make xen_clocksource_read non-static.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-22 08:08:46 +02:00
Eduardo Habkost
f86399396c x86, paravirt_ops: use unsigned long instead of u32 for alloc_p*() pfn args
This patch changes the pfn args from 'u32' to 'unsigned long'
on alloc_p*() functions on paravirt_ops, and the corresponding
implementations for Xen and VMI. The prototypes for CONFIG_PARAVIRT=n
are already using unsigned long, so paravirt.h now matches the prototypes
on asm-x86/pgalloc.h.

It shouldn't result in any changes on generated code on 32-bit, with
or without CONFIG_PARAVIRT. On both cases, 'codiff -f' didn't show any
change after applying this patch.

On 64-bit, there are (expected) binary changes only when CONFIG_PARAVIRT
is enabled, as the patch is really supposed to change the size of the
pfn args.

[ v2: KVM_GUEST: use the right parameter type on kvm_release_pt() ]

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-22 05:34:44 +02:00
Jeremy Fitzhardinge
f8eca41f0a xen: measure how long spinlocks spend blocking
Measure how long spinlocks spend blocked.  Also rename some fields to
be more consistent.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 13:52:59 +02:00
Jeremy Fitzhardinge
1e696f638b xen: allow interrupts to be enabled while doing a blocking spin
If spin_lock is called in an interrupts-enabled context, we can safely
enable interrupts while spinning.  We don't bother for the actual spin
loop, but if we timeout and fall back to blocking, it's definitely
worthwhile enabling interrupts if possible.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 13:52:58 +02:00
Jeremy Fitzhardinge
994025caba xen: add debugfs support
Add support for exporting statistics on mmu updates, multicall
batching and pv spinlocks into debugfs. The base path is xen/ and
each subsystem adds its own directory: mmu, multicalls, spinlocks.

In each directory, writing 1 to "zero_stats" will cause the
corresponding stats to be zeroed the next time they're updated.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 13:52:58 +02:00
Jeremy Fitzhardinge
168d2f464a xen: save previous spinlock when blocking
A spinlock can be interrupted while spinning, so make sure we preserve
the previous lock of interest if we're taking a lock from within an
interrupt handler.

We also need to deal with the case where the blocking path gets
interrupted between testing to see if the lock is free and actually
blocking.  If we get interrupted there and end up in the state where
the lock is free but the irq isn't pending, then we'll block
indefinitely in the hypervisor.  This fix is to make sure that any
nested lock-takers will always leave the irq pending if there's any
chance the outer lock became free.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 13:52:57 +02:00
Jeremy Fitzhardinge
7708ad64a2 xen: add xen_ prefixes to make tracing with ftrace easier
It's easier to pattern match on Xen function if they all start with xen_.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-20 12:40:08 +02:00
Jeremy Fitzhardinge
11ad93e59d xen: clarify locking used when pinning a pagetable.
Add some comments explaining the locking and pinning algorithm when
using split pte locks.  Also implement a minor optimisation of not
pinning the PTE when not using split pte locks.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-20 12:40:08 +02:00
Jeremy Fitzhardinge
6e833587e1 xen: clean up domain mode predicates
There are four operating modes Xen code may find itself running in:
 - native
 - hvm domain
 - pv dom0
 - pv domU

Clean up predicates for testing for these states to make them more consistent.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-20 12:40:07 +02:00
Eduardo Habkost
169ad16bb8 xen_alloc_ptpage: cast PFN_PHYS() argument to unsigned long
Currently paravirt_ops alloc_p*() uses u32 for the pfn args. We should
change that later, but while the pfn parameter is still u32, we need to
cast the PFN_PHYS() argument at xen_alloc_ptpage() to unsigned long,
otherwise it will lose bits on the shift.

I think PFN_PHYS() should behave better when fed with smaller integers,
but a cast to unsigned long won't be enough for all cases on 32-bit PAE,
and a cast to u64 would be overkill for most users of PFN_PHYS().

We could have two different flavors of PFN_PHYS: one for low pages
only (unsigned long) and another that works for any page (u64)),
but while we don't have it, we will need the cast to unsigned long on
xen_alloc_ptpage().

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-31 17:10:35 +02:00
Jeremy Fitzhardinge
cef43bf6b3 xen: fix allocation and use of large ldts, cleanup
Add a proper comment for set_aliased_prot() and fix an
unsigned long/void * warning.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-31 17:10:35 +02:00
Jeremy Fitzhardinge
0d1edf46ba xen: compile irq functions without -pg for ftrace
For some reason I managed to miss a bunch of irq-related functions
which also need to be compiled without -pg when using ftrace.  This
patch moves them into their own file, and starts a cleanup process
I've been meaning to do anyway.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: "Alex Nixon (Intern)" <Alex.Nixon@eu.citrix.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-31 12:39:39 +02:00
Ingo Molnar
eac4345be6 Merge branch 'x86/spinlocks' into x86/xen 2008-07-31 12:39:15 +02:00
Jeremy Fitzhardinge
d89961e2dc xen: suppress known wrmsrs
In general, Xen doesn't support wrmsr from an unprivileged domain; it
just ends up ignoring the instruction and printing a message on the
console.

Given that there are sets of MSRs we know the kernel will try to write
to, but we don't care, just eat them in xen_write_msr to cut down on
console noise.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 16:33:08 +02:00
Jeremy Fitzhardinge
a05d2ebab2 xen: fix allocation and use of large ldts
When the ldt gets to more than 1 page in size, the kernel uses vmalloc
to allocate it.  This means that:

 - when making the ldt RO, we must update the pages in both the vmalloc
   mapping and the linear mapping to make sure there are no RW aliases.

 - we need to use arbitrary_virt_to_machine to compute the machine addr
   for each update

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 14:26:27 +02:00
Eduardo Habkost
b56afe1d41 x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags
Using native_pte_val triggers the BUG_ON() in the paravirt_ops
version of pte_flags().

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 17:49:33 +02:00
Ingo Molnar
c3cc99ff5d Merge branch 'linus' into x86/xen 2008-07-26 17:48:49 +02:00
Ingo Molnar
10a010f695 Merge branch 'linus' into x86/x2apic
Conflicts:

	drivers/pci/dmar.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-25 13:08:16 +02:00
Jeremy Fitzhardinge
d5de884135 x86: split spinlock implementations out into their own files
ftrace requires certain low-level code, like spinlocks and timestamps,
to be compiled without -pg in order to avoid infinite recursion.  This
patch splits out the core paravirt spinlocks and the Xen spinlocks
into separate files which can be compiled without -pg.

Also do xen/time.c while we're about it.  As a result, we can now use
ftrace within a Xen domain.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-24 12:31:51 +02:00
Jeremy Fitzhardinge
38ffbe66d5 x86/paravirt/xen: properly fill out the ldt ops
LTP testing showed that Xen does not properly implement
sys_modify_ldt().  This patch does the final little bits needed to
make the ldt work properly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-24 12:30:06 +02:00
Jeremy Fitzhardinge
2dc1697eb3 xen: don't use sysret for sysexit32
When implementing sysexit32, don't let Xen use sysret to return to
userspace.  That results in usermode register state being trashed.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-24 12:28:12 +02:00