1
linux/arch/x86/kernel
Andi Kleen 8346ea17aa x86: split large page mapping for AMD TSEG
On AMD SMM protected memory is part of the address map, but handled
internally like an MTRR. That leads to large pages getting split
internally which has some performance implications. Check for the
AMD TSEG MSR and split the large page mapping on that area
explicitely if it is part of the direct mapping.

There is also SMM ASEG, but it is in the first 1MB and already covered by
the earlier split first page patch.

Idea for this came from an earlier patch by Andreas Herrmann

On a RevF dual Socket Opteron system kernbench shows a clear
improvement from this:
(together with the earlier patches in this series, especially the
split first 2MB patch)

[lower is better]
              no split stddev         split  stddev    delta
Elapsed Time   87.146 (0.727516)     84.296 (1.09098)  -3.2%
User Time     274.537 (4.05226)     273.692 (3.34344)  -0.3%
System Time    34.907 (0.42492)      34.508 (0.26832)  -1.1%
Percent CPU   322.5   (38.3007)     326.5   (44.5128)  +1.2%

=> About 3.2% improvement in elapsed time for kernbench.

With GB pages on AMD Fam1h the impact of splitting is much higher of course,
since it would split two full GB pages (together with the first
1MB split patch) instead of two 2MB pages.  I could not benchmark
a clear difference in kernbench on gbpages, so I kept it disabled
for that case

That was only limited benchmarking of course, so if someone
was interested in running more tests for the gbpages case
that could be revisited (contributions welcome)

I didn't bother implementing this for 32bit because it is very
unlikely the 32bit lowmem mapping overlaps into the TSEG near 4GB
and the 2MB low split is already handled for both.

[ mingo@elte.hu: do it on gbpages kernels too, there's no clear reason
                 why it shouldnt help there. ]

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:30 +02:00
..
acpi x86: replace the now useless max_pfn_mapped define 2008-04-17 17:41:30 +02:00
cpu x86: pat cpu feature bit setting for known cpus 2008-04-17 17:41:20 +02:00
.gitignore
alternative.c x86: fix test_poke for vmalloced pages 2008-04-17 17:41:29 +02:00
aperture_64.c x86: clean up aperture_64.c 2008-04-17 17:41:19 +02:00
apic_32.c x86: provide an end_local_APIC_setup function 2008-04-17 17:41:02 +02:00
apic_64.c x86: apic_is_clustered_box to indicate unsynched TSC's on multiboard vSMP systems 2008-04-17 17:41:08 +02:00
apm_32.c x86: switch to proc_create() 2008-04-17 17:40:51 +02:00
asm-offsets_32.c x86: move struct definitions to unifed sigframe.h 2008-04-17 17:40:46 +02:00
asm-offsets_64.c x86: add asm_offset PARAVIRT constants 2008-01-30 13:33:19 +01:00
asm-offsets.c
audit_64.c
bootflag.c x86: coding style cleanup for kernel/bootflag.c 2008-01-30 13:32:31 +01:00
bugs_64.c x86: don't use large pages to map the first 2/4MB of memory 2008-04-17 17:41:30 +02:00
cpuid.c x86: cpuid, msr: use inode mutex instead of big kernel lock 2008-02-04 16:47:59 +01:00
crash_dump_32.c
crash_dump_64.c
crash.c
doublefault_32.c x86: unify tss_struct 2008-01-30 13:31:31 +01:00
ds.c x86: fix small sparse warning 2008-01-31 22:05:47 +01:00
e820_32.c x86: reserve end-of-conventional-memory to 1MB on 32-bit 2008-04-17 17:40:51 +02:00
e820_64.c x86: replace the now useless max_pfn_mapped define 2008-04-17 17:41:30 +02:00
early_printk.c x86: coding style fixes to x86/kernel/early_printk.c 2008-04-17 17:40:51 +02:00
early-quirks.c x86: fix section mismatch warning in early-quirks.c 2008-01-30 13:33:37 +01:00
efi_32.c x86: sparse error in efi_32.c 2008-02-19 16:18:28 +01:00
efi_64.c x86: EFI runtime code mapping enhancement 2008-02-13 16:20:35 +01:00
efi_stub_32.S
efi_stub_64.S x86: EFI runtime service support 2008-01-30 13:31:19 +01:00
efi.c x86: sparse warning in efi.c 2008-02-19 16:18:28 +01:00
entry_32.S x86: only enable interrupts when kernel state has been set up 2008-04-17 17:41:29 +02:00
entry_64.S x86: ptrace vs -ENOSYS 2008-04-17 17:41:13 +02:00
genapic_64.c x86: cleanup x86_cpu_to_apicid references 2008-01-30 13:33:11 +01:00
genapic_flat_64.c
geode_32.c x86: GEODE: MFGPT: Use "just-in-time" detection for the MFGPT timers 2008-02-09 23:24:08 +01:00
head32.c x86: introduce kernel/head32.c 2008-04-17 17:40:49 +02:00
head64.c x86: don't set up early exception handlers for external interrupts 2008-04-17 17:41:29 +02:00
head_32.S x86: introduce kernel/head32.c 2008-04-17 17:40:49 +02:00
head_64.S x86: move early exception handlers into init.text 2008-04-17 17:41:29 +02:00
hpet.c x86: revert assign IRQs to hpet timer 2008-04-04 18:36:49 +02:00
i386_ksyms_32.c
i387.c x86: clean up i387.c 2008-04-17 17:40:57 +02:00
i8237.c
i8253.c x86: pit_clockevent can be static 2008-02-13 16:20:35 +01:00
i8259_32.c x86: i8259A: remove redundant irq_descinitialization 2008-02-19 16:18:34 +01:00
i8259_64.c x86: provide a native_init_IRQ function on 64-bit 2008-01-30 13:33:19 +01:00
init_task.c x86: delay the export removal of init_mm 2008-02-29 18:55:42 +01:00
io_apic_32.c x86: fix ioapic bug again 2008-04-17 17:41:21 +02:00
io_apic_64.c arch/x86/kernel/io_apic_{64,32}.c: use time_before 2008-01-30 13:32:19 +01:00
io_delay.c x86: add dmi quirk for io_delay 2008-03-26 22:23:40 +01:00
ioport.c x86: refactor ioport unification 2008-01-30 13:33:10 +01:00
ipi.c x86: create ipi.c 2008-04-17 17:40:56 +02:00
irq_32.c x86: replace remaining __FUNCTION__ occurances 2008-04-17 17:40:57 +02:00
irq_64.c
k8.c
kdebugfs.c x86 boot : export boot_params via debugfs for debugging 2008-01-30 13:32:51 +01:00
kprobes.c x86, kprobes: correct post-eip value in post_hander() 2008-04-17 17:41:13 +02:00
ldt.c x86: cleanup - eliminate numbers in LDT allocation code 2008-02-04 16:48:03 +01:00
machine_kexec_32.c vmcoreinfo: fix the configuration dependencies 2008-02-07 08:42:25 -08:00
machine_kexec_64.c vmcoreinfo: add the symbol "phys_base" 2008-04-02 15:28:19 -07:00
Makefile x86: fix build breakage when PCI is define and PARAVIRT is not 2008-04-17 17:41:08 +02:00
mca_32.c x86: coding style fixes to arch/x86/kernel/mca_32.c 2008-04-17 17:40:49 +02:00
mfgpt_32.c x86: GEODE: add missing module.h include 2008-03-26 22:23:40 +01:00
microcode.c x86: fix section mismatch warnings when referencing notifiers 2008-02-01 17:49:42 +01:00
module_32.c
module_64.c
mpparse_32.c x86: use same index for processor maps 2008-04-17 17:41:21 +02:00
mpparse_64.c x86: rename gsi_start to gsi_base to match mpparse_32.c 2008-04-17 17:41:05 +02:00
msr.c x86: coding style fixes to arch/x86/kernel/msr.c 2008-04-17 17:40:50 +02:00
nmi_32.c x86: fix ioapic bug again 2008-04-17 17:41:21 +02:00
nmi_64.c x86: wipe get_nmi_reason out of nmi_64.h 2008-04-17 17:41:01 +02:00
numaq_32.c x86: convert TSC disabling to generic cpuid disable bitmap 2008-01-30 13:33:20 +01:00
paravirt_patch_32.c x86: move patching code to arch-specific file. 2008-01-30 13:32:10 +01:00
paravirt_patch_64.c x86: add stringify header 2008-01-30 13:33:19 +01:00
paravirt.c x86: fill in missing pv_mmu_ops entries for PAGETABLE_LEVELS >= 3 2008-01-30 13:33:20 +01:00
pci-calgary_64.c iommu sg: x86: convert calgary IOMMU to use the IOMMU helper 2008-02-05 09:44:11 -08:00
pci-dma_32.c
pci-dma_64.c x86 iommu: add more documentation 2008-04-17 17:40:59 +02:00
pci-gart_64.c x86, agpgart: scary messages are fortunately obsolete 2008-04-04 18:36:46 +02:00
pci-nommu_64.c
pci-swiotlb_64.c
pcspeaker.c
pmtimer_64.c
process_32.c x86: always enable irqs when entering idle 2008-04-17 17:41:00 +02:00
process_64.c x86: prevent unconditional writes to DebugCtl MSR 2008-04-17 17:40:58 +02:00
ptrace.c x86: regparm(3) is mandatory, no need to annotate 2008-04-17 17:40:45 +02:00
quirks.c x86: hpet clock enable quirk on nVidia nForce 430 2008-03-21 17:06:15 +01:00
reboot_fixups_32.c x86: add the RDC machine specific reboot fixup 2008-01-30 13:33:36 +01:00
reboot.c x86: cleanup duplicate includes 2008-04-17 17:40:58 +02:00
relocate_kernel_32.S x86: relocate_kernel - use predefined macroses for page attributes 2008-04-17 17:41:29 +02:00
relocate_kernel_64.S x86: relocate_kernel - use predefined macroses for page attributes 2008-04-17 17:41:29 +02:00
rtc.c x86: fix cmos read and write to not use inb_p and outb_p 2008-04-17 17:40:47 +02:00
scx200_32.c x86: fix sparse warning in kernel/scx200_32.c 2008-01-31 22:05:45 +01:00
setup64.c x86: use specialized routine for setup per-cpu area 2008-04-17 17:41:01 +02:00
setup_32.c x86: use get_bios_ebda in mpparse_64.c 2008-04-17 17:41:05 +02:00
setup_64.c x86: split large page mapping for AMD TSEG 2008-04-17 17:41:30 +02:00
setup.c x86: use specialized routine for setup per-cpu area 2008-04-17 17:41:01 +02:00
sigframe.h x86: move struct definitions to unifed sigframe.h 2008-04-17 17:40:46 +02:00
signal_32.c x86: add KERN_INFO to show_unhandled_signals printout 2008-04-17 17:40:57 +02:00
signal_64.c x86: remove DEBUG_SIG 2008-04-17 17:40:57 +02:00
smp.c x86: Don't send RESCHEDULE_VECTOR to offlined cpus 2008-04-17 17:40:58 +02:00
smpboot.c x86: remove smpboot_32.c and smpboot_64.c 2008-04-17 17:41:04 +02:00
smpcommon_32.c x86: create smpcommon.c 2008-04-17 17:40:55 +02:00
smpcommon.c x86: create smpcommon.c 2008-04-17 17:40:55 +02:00
srat_32.c x86: replace remaining __FUNCTION__ occurances 2008-04-17 17:40:57 +02:00
stacktrace.c x86: don't save unreliable stack trace entries 2008-02-26 12:55:58 +01:00
step.c x86: prevent unconditional writes to DebugCtl MSR 2008-04-17 17:40:58 +02:00
summit_32.c x86: move mp_bus_id_to_node to numa.c 2008-04-17 17:40:59 +02:00
sys_i386_32.c
sys_x86_64.c
syscall_64.c x86: coding style fixes to arch/x86/kernel/syscall_64.c 2008-04-17 17:40:48 +02:00
syscall_table_32.S timerfd: wire the new timerfd API to the x86 family 2008-02-05 09:44:07 -08:00
tce_64.c
test_nx.c x86: Explicitly include required header files. 2008-04-17 17:41:15 +02:00
test_rodata.c x86: include proper prototypes for rodata_test 2008-02-14 23:30:20 +01:00
time_32.c
time_64.c time: fix typo in comments 2008-02-08 09:22:29 -08:00
tlb_32.c x86: create tlb files 2008-04-17 17:40:56 +02:00
tlb_64.c x86: create tlb files 2008-04-17 17:40:56 +02:00
tls.c asmlinkage_protect replaces prevent_tail_call 2008-04-10 17:28:26 -07:00
tls.h x86: x86 user_regset TLS 2008-01-30 13:31:52 +01:00
topology.c x86: fix section mismatch warning in topology.c:arch_register_cpu 2008-02-19 16:18:30 +01:00
trampoline_32.S x86: remove misleading comments in trampoline_*.S 2008-02-04 16:48:01 +01:00
trampoline_64.S x86: remove misleading comments in trampoline_*.S 2008-02-04 16:48:01 +01:00
traps_32.c x86: clean up traps_32.c 2008-04-17 17:40:51 +02:00
traps_64.c x86: wipe get_nmi_reason out of nmi_64.h 2008-04-17 17:41:01 +02:00
tsc_32.c x86: if we cannot calibrate the TSC, we panic. 2008-04-17 17:40:52 +02:00
tsc_64.c x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier() 2008-04-07 21:09:14 +02:00
tsc_sync.c x86: add warning to check_tsc_warp() 2008-01-30 13:33:24 +01:00
verify_cpu_64.S
vm86_32.c x86: handle_vm86_trap cleanup 2008-04-17 17:41:13 +02:00
vmi_32.c x86: VMI fix 2008-02-04 16:47:54 +01:00
vmiclock_32.c x86: isolate PIC/PIT in/out calls 2008-01-30 13:33:14 +01:00
vmlinux_32.lds.S x86: use ELF section to list CPU vendor specific code 2008-04-17 17:40:47 +02:00
vmlinux_64.lds.S x86: use ELF section to list CPU vendor specific code 2008-04-17 17:40:47 +02:00
vmlinux.lds.S
vsmp_64.c x86: clean up vSMP detection 2008-04-17 17:41:29 +02:00
vsyscall_64.c x86: restore vsyscall64 prochandler 2008-02-29 18:55:39 +01:00
x8664_ksyms_64.c x86: coding style fixes to arch/x86/kernel/x8664_ksyms_64.c 2008-04-17 17:40:48 +02:00