// SPDX-License-Identifier: GPL-2.0-only
/*
 * irq.c: API for in kernel interrupt controller
 * Copyright (c) 2007, Intel Corporation.
 * Copyright 2009 Red Hat, Inc. and/or its affiliates.
 *
 * Authors:
 *   Yaozu (Eddie) Dong <Eddie.dong@intel.com>
 */
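
/*
 * Prefix printks with the module name (e.g. "kvm", "kvm_intel", "kvm_amd")
 * so that messages are formatted consistently across the KVM modules.
 */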
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/export.h>
#include <linux/kvm_host.h>

#include "irq.h"
#include "i8254.h"
#include "x86.h"
#include "xen.h"

/*
 * check if there are pending timer events
 * to be processed.
 */
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
{
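	/*
	 * Accumulate pending events from the in-kernel APIC timer and the
	 * Xen timer (if enabled).
	 */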
	int r = 0;

	if (lapic_in_kernel(vcpu))
		r = apic_has_pending_timer(vcpu);
	if (kvm_xen_timer_enabled(vcpu))
		r += kvm_xen_has_pending_timer(vcpu);

	return r;
}

/*
 * check if there is a pending userspace external interrupt
 */
static int pending_userspace_extint(struct kvm_vcpu *v)
{
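	/*
	 * -1 means no vector is buffered.  Only one userspace-injected vector
	 * is buffered at a time: userspace must let the vCPU enter the guest
	 * to complete injection before injecting another interrupt.
	 */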
	return v->arch.pending_external_vector != -1;
}

/*
 * check if there is a pending interrupt from a
 * non-APIC source, without intack.
 */
int kvm_cpu_has_extint(struct kvm_vcpu *v)
{
	/*
	 * FIXME: interrupt.injected represents an interrupt whose
	 * side-effects have already been applied (e.g. the bit has already
	 * moved from IRR to ISR).  Therefore, it is incorrect to rely on
	 * interrupt.injected to know if there is a pending interrupt in
	 * the user-mode LAPIC.
	 * This leads to nVMX/nSVM not being able to distinguish whether it
	 * should exit from L2 to L1 on EXTERNAL_INTERRUPT for a pending
	 * interrupt or re-inject an already-injected interrupt.
	 */
	if (!lapic_in_kernel(v))
		return v->arch.interrupt.injected;

	if (kvm_xen_has_interrupt(v))
		return 1;

	if (!kvm_apic_accept_pic_intr(v))
		return 0;
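
	/*
	 * With a split irqchip the PIC is emulated in userspace, so the
	 * pending ExtINT is whatever vector userspace has buffered; with a
	 * full in-kernel irqchip it is the in-kernel PIC's output pin.
	 */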
	if (irqchip_split(v->kvm))
		return pending_userspace_extint(v);
	else
		return v->kvm->arch.vpic->output;
}

/*
 * check if there is an injectable interrupt:
 * when virtual interrupt delivery is enabled,
 * interrupts from the APIC are handled by hardware and
 * don't need to be checked here.
 */
int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
{
	if (kvm_cpu_has_extint(v))
		return 1;

	if (!is_guest_mode(v) && kvm_vcpu_apicv_active(v))
		return 0;

	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
}
EXPORT_SYMBOL_GPL(kvm_cpu_has_injectable_intr);

/*
 * check if there is a pending interrupt, without intack.
 */
int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
{
	if (kvm_cpu_has_extint(v))
		return 1;

	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
}
EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);

/*
 * Read the pending interrupt vector (from a non-APIC source) and intack.
 */
int kvm_cpu_get_extint(struct kvm_vcpu *v)
{
	if (!kvm_cpu_has_extint(v)) {
		WARN_ON(!lapic_in_kernel(v));
		return -1;
	}

	if (!lapic_in_kernel(v))
		return v->arch.interrupt.nr;

#ifdef CONFIG_KVM_XEN
	if (kvm_xen_has_interrupt(v))
		return v->kvm->arch.xen.upcall_vector;
#endif

	if (irqchip_split(v->kvm)) {
		int vector = v->arch.pending_external_vector;
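
		/* Consume (intack) the vector buffered by userspace. */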
		v->arch.pending_external_vector = -1;
		return vector;
	} else
		return kvm_pic_read_irq(v->kvm); /* PIC */
}
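/*
 * Exported for, e.g., nested VM-Exit emulation, which acknowledges the IRQ
 * at the point of injection.
 */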
EXPORT_SYMBOL_GPL(kvm_cpu_get_extint);

/*
 * Read pending interrupt vector and intack.
 */
int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
{
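	/*
	 * ExtINT (PIC or a userspace-buffered vector) has priority over the
	 * local APIC; if no ExtINT is pending, the highest-priority APIC
	 * interrupt is returned and acknowledged (moved from IRR to ISR).
	 */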
	int vector = kvm_cpu_get_extint(v);
	if (vector != -1)
		return vector; /* PIC */

	vector = kvm_apic_has_interrupt(v);	/* APIC */
	if (vector != -1)
		kvm_apic_ack_interrupt(v, vector);

	return vector;
}

void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
{
	if (lapic_in_kernel(vcpu))
		kvm_inject_apic_timer_irqs(vcpu);
	if (kvm_xen_timer_enabled(vcpu))
		kvm_xen_inject_timer_irqs(vcpu);
}
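
/*
 * Migrate the in-kernel APIC timer, the PIT timer and any vendor-specific
 * timers (via the migrate_timers hook) along with the vCPU.
 */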
void __kvm_migrate_timers(struct kvm_vcpu *vcpu)
{
	__kvm_migrate_apic_timer(vcpu);
	__kvm_migrate_pit_timer(vcpu);
	kvm_x86_call(migrate_timers)(vcpu);
}

bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args)
{
	bool resample = args->flags & KVM_IRQFD_FLAG_RESAMPLE;

	return resample ? irqchip_kernel(kvm) : irqchip_in_kernel(kvm);
}

bool kvm_arch_irqchip_in_kernel(struct kvm *kvm)
{
	return irqchip_in_kernel(kvm);
}