linux/arch
Konrad Rzeszutek Wilk a38647837a xen/mmu: Add workaround "x86-64, mm: Put early page table high"
As a consequence of the commit:

commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

the Linux kernel crashes under Xen:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason is that at some point init_memory_mapping reaches the
pagetable pages area and maps those pages too, as normal memory, because
they fall within the address range passed to init_memory_mapping as
argument. Some of those pages are already pagetable pages (they are in
the range pgt_buf_start-pgt_buf_end), so they are mapped RO and
everything is fine.

Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end), so they
are mapped RW. When these pages become pagetable pages and are hooked
into the pagetable, Xen finds that the guest already has an RW mapping
of them somewhere and fails the operation.

The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO. However, when the pagetable allocation
is completed, only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (because
that range is still mapped RO).
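
The mapping rule and the RO marking described above can be modelled
with a small userspace sketch. It is not the kernel code: struct
pgt_buf, prot_before_fix, prot_with_fix and pin_allowed are
hypothetical names, and pin_allowed is only a stand-in for the check
Xen performs when a page becomes a pagetable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the early pagetable buffer, in PFNs:
 * [start, end) already holds pagetables, [end, top) is still unused. */
struct pgt_buf {
    unsigned long start, end, top;
};

enum pg_prot { PG_RO, PG_RW };

/* Before the fix: only pages that already are pagetables get RO. */
static enum pg_prot prot_before_fix(const struct pgt_buf *b, unsigned long pfn)
{
    assert(b->start <= b->end && b->end <= b->top);
    return (pfn >= b->start && pfn < b->end) ? PG_RO : PG_RW;
}

/* With the fix: the whole buffer, including the unused tail, gets RO. */
static enum pg_prot prot_with_fix(const struct pgt_buf *b, unsigned long pfn)
{
    assert(b->start <= b->end && b->end <= b->top);
    return (pfn >= b->start && pfn < b->top) ? PG_RO : PG_RW;
}

/* Simplified stand-in for the hypervisor check: a page may become a
 * pagetable only if the guest holds no RW mapping of it. */
static bool pin_allowed(enum pg_prot existing_mapping)
{
    return existing_mapping == PG_RO;
}
```

With a buffer of start=100, end=104, top=110, the page at PFN 104 is
mapped RW before the fix, so pin_allowed() rejects it once it is used
as a pagetable. With the fix it is mapped RO and accepted, but the
tail 104-110 then stays RO even if it is never used for pagetables.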

For this reason, this function is introduced, which is called _after_
init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but let's ignore that).

Because we are called _after_ init_memory_mapping, the
pgt_buf_[start,end,top] have all changed to new values (because another
init_memory_mapping call has run). Hence, the first time we enter this
function, we save away the pgt_buf_start value and update the
pgt_buf_[end,top].

When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately remap the "old" pgt_buf_end through pgt_buf_top as RW.

We then update those "old" pgt_buf_[end,top] with the new ones
so that we can redo this on the next pagetable.
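
The bookkeeping described above can be sketched as a small userspace
state machine. This is a model under stated assumptions, not the code
of this patch: struct pgt_range, struct tracker and tracker_update are
hypothetical names, the old_reserved flag stands in for the check that
memblock_x86_reserve_range has been called on the old range, and
refreshing the saved end/top while the range is not yet reserved is an
assumption about the intermediate step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of pgt_buf_start/end/top, in PFNs. */
struct pgt_range {
    unsigned long start, end, top;
};

/* Hypothetical tracker: the saved "old" values plus the last tail
 * range [rw_from, rw_to) that was switched back to RW. */
struct tracker {
    bool have_old;
    struct pgt_range old;
    unsigned long rw_from, rw_to;
};

/* Model of one call made after init_memory_mapping has completed.
 * Returns true when the old unused tail was switched back to RW. */
static bool tracker_update(struct tracker *t, const struct pgt_range *cur,
                           bool old_reserved)
{
    assert(cur->start <= cur->end && cur->end <= cur->top);

    if (!t->have_old) {
        /* First entry: save the current values away. */
        t->old = *cur;
        t->have_old = true;
        return false;
    }

    if (!old_reserved) {
        /* Assumption: not reserved yet, so only refresh end/top. */
        t->old.end = cur->end;
        t->old.top = cur->top;
        return false;
    }

    /* Old [start, end) is reserved: [end, top) can become RW again. */
    t->rw_from = t->old.end;
    t->rw_to = t->old.top;

    /* Track the new values so the next pagetable is handled too. */
    t->old = *cur;
    return true;
}
```

For example, a first call with {100, 104, 110} only records the
values. A later call with {200, 203, 220} and old_reserved set reports
the tail 104-110 as RW again and starts tracking the new buffer.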

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02 16:33:34 -04:00
alpha alpha: Fix uninitialized value in read_persistent_clock. 2011-04-17 14:41:30 -07:00
arm Merge branch 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson 2011-04-25 19:00:55 -07:00
avr32 avr32: add ATAG_BOARDINFO 2011-04-13 15:46:59 +02:00
blackfin Blackfin: SMP: fix cache flush loop 2011-04-13 19:34:06 -04:00
cris Fix common misspellings 2011-03-31 11:26:23 -03:00
frv Fix common misspellings 2011-03-31 11:26:23 -03:00
h8300 genirq: Remove the now obsolete config options and select statements 2011-03-30 14:13:23 +02:00
ia64 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 2011-04-07 11:14:49 -07:00
m32r Fix common misspellings 2011-03-31 11:26:23 -03:00
m68k m68k,m68knommu: Wire up name_to_handle_at, open_by_handle_at, clock_adjtime, syncfs 2011-04-12 19:02:03 -07:00
microblaze usb: Fix Kconfig unmet dependencies for Microblaze EHCI 2011-04-13 15:43:59 -07:00
mips Fix common misspellings 2011-03-31 11:26:23 -03:00
mn10300 Fix common misspellings 2011-03-31 11:26:23 -03:00
parisc Fix common misspellings 2011-03-31 11:26:23 -03:00
powerpc powerpc/powermac: Build fix with SMP and CPU hotplug 2011-04-18 15:46:35 +10:00
s390 [S390] kvm-390: Let kernel exit SIE instruction on work 2011-04-20 10:15:44 +02:00
score Fix common misspellings 2011-03-31 11:26:23 -03:00
sh Merge branch 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6 2011-04-07 12:49:17 -07:00
sparc Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 2011-04-07 11:14:49 -07:00
tile Fix common misspellings 2011-03-31 11:26:23 -03:00
um um: disable CONFIG_CMPXCHG_LOCAL 2011-04-14 16:06:56 -07:00
unicore32 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 2011-04-07 11:14:49 -07:00
x86 xen/mmu: Add workaround "x86-64, mm: Put early page table high" 2011-05-02 16:33:34 -04:00
xtensa xtensa: Fixup irq conversion fallout and nmi_count 2011-04-20 00:32:09 +02:00
.gitignore
Kconfig oprofile, s390: Cleanups 2011-03-16 14:30:40 +01:00