Currently, CONFIG_X86_CMPXCHG64 both enables boot-time checking of
the cmpxchg64b feature and enables compilation of the set_64bit() family.
Since the option is dependent on PAE, and since KVM depends on set_64bit(),
this effectively disables KVM on i386 nopae.
Simplify by removing the config option altogether: the boot check is made
dependent on CONFIG_X86_PAE directly, and the set_64bit() family is exposed
without constraints. It is up to users to check for the feature flag (KVM
does not as virtualiation extensions imply its existence).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only at the CPU_DYING stage can we be sure that no user process will
be scheduled onto the cpu and oops when trying to use virtualization
extensions.
Signed-off-by: Avi Kivity <avi@qumranet.com>
The hotplug IPIs can be called from the cpu on which we are currently
running on, so use on_cpu(). Similarly, drop on_each_cpu() for the
suspend/resume callbacks, as we're in atomic context here and only one
cpu is up anyway.
Signed-off-by: Avi Kivity <avi@qumranet.com>
By keeping track of which cpus have virtualization enabled, we
prevent double-enable or double-disable during hotplug, which is a
very fatal oops.
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm uses a pseudo filesystem, kvmfs, to generate inodes, a job that the
new anonymous inodes source does much better.
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds an implementation to the svm is_disabled function to
detect reliably if the BIOS disabled the SVM feature in the CPU. This
fixes the issues with kernel panics when loading the kvm-amd module on
machines where SVM is available but disabled.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Protected mode code may have corrupted the real-mode tss, so re-initialize
it when switching to real mode.
Signed-off-by: Avi Kivity <avi@qumranet.com>
When writing to normal memory and the memory area is unchanged the write
can be safely skipped, avoiding the costly kvm_mmu_pte_write.
Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When the old value and new one are the same the emulator skips the
write; this is undesirable when the destination is a MMIO area and the
write shall be performed regardless of the previous value. This
optimization breaks e.g. a Linux guest APIC compiled without
X86_GOOD_APIC.
Remove the check and perform the writeback stage in the emulation unless
it's explicitly disabled (currently push and some 2 bytes instructions
may disable the writeback).
Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
With kernel-injected interrupts, we need to check for interrupts on
lightweight exits too.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If the time stamp counter goes backwards, a guest delay loop can become
infinite. This can happen if a vcpu is migrated to another cpu, where
the counter has a lower value than the first cpu.
Since we're doing an IPI to the first cpu anyway, we can use that to pick
up the old tsc, and use that to calculate the adjustment we need to make
to the tsc offset.
Signed-off-by: Avi Kivity <avi@qumranet.com>
When a vcpu causes a shadow tlb entry to have reduced permissions, it
must also clear the tlb on remote vcpus. We do that by:
- setting a bit on the vcpu that requests a tlb flush before the next entry
- if the vcpu is currently executing, we send an ipi to make sure it
exits before we continue
Signed-off-by: Avi Kivity <avi@qumranet.com>
A vcpu can pin up to four mmu shadow pages, which means the freeing
loop will never terminate. Fix by first unpinning shadow pages on
all vcpus, then freeing shadow pages.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Switch guest paging context may require us to allocate memory, which
might fail. Instead of wiring up error paths everywhere, make context
switching lazy and actually do the switch before the next guest entry,
where we can return an error if allocation fails.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This was once used to avoid accessing the guest pte when upgrading
the shadow pte from read-only to read-write. But usually we need
to set the guest pte dirty or accessed bits anyway, so this wasn't
really exploited.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Always set the accessed and dirty bit (since having them cleared causes
a read-modify-write cycle), always set the present bit, and copy the
nx bit from the guest.
Signed-off-by: Avi Kivity <avi@qumranet.com>
With guest smp, a second vcpu might see partial updates when the first
vcpu services a page fault. So delay all updates until we have figured
out what the pte should look like.
Note that on i386, this is still not completely atomic as a 64-bit write
will be split into two on a 32-bit machine.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This prevents some work from being performed twice, and, more importantly,
reduces the number of places where we modify shadow ptes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
We will need the accessed bit (in addition to the dirty bit) and
also write access (for setting the dirty bit) in a future patch.
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM compilation fails for some .configs. This fixes it.
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
MSR_EFER.LME/LMA bits are automatically save/restored by VMX
hardware, KVM only needs to save NX/SCE bits at time of heavy
weight VM Exit. But clearing NX bits in host envirnment may
cause system hang if the host page table is using EXB bits,
thus we leave NX bits as it is. If Host NX=1 and guest NX=0, we
can do guest page table EXB bits check before inserting a shadow
pte (though no guest is expecting to see this kind of gp fault).
If host NX=0, we present guest no Execute-Disable feature to guest,
thus no host NX=0, guest NX=1 combination.
This patch reduces raw vmexit time by ~27%.
Me: fix compile warnings on i386.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>