License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0                                                 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results of that were:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           270
GPL-2.0+ WITH Linux-syscall-note                          169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)        21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)        17
LGPL-2.1+ WITH Linux-syscall-note                          15
GPL-1.0+ WITH Linux-syscall-note                           14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)        5
LGPL-2.0+ WITH Linux-syscall-note                           4
LGPL-2.1 WITH Linux-syscall-note                            3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                  3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)                 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 07:07:57 -07:00
/* SPDX-License-Identifier: GPL-2.0 */
2015-06-19 04:54:23 -07:00
#ifndef __KVM_X86_PMU_H
#define __KVM_X86_PMU_H

2019-12-11 13:47:48 -07:00
#include <linux/nospec.h>

KVM: x86/pmu: Move pmc_idx => pmc translation helper to common code
Add a common helper for *internal* PMC lookups, and delete the ops hook
and Intel's implementation. Keep AMD's implementation, but rename it to
amd_pmu_get_pmc() to make it somewhat more obvious that it's suited for
both KVM-internal and guest-initiated lookups.
Because KVM tracks all counters in a single bitmap, getting a counter
when iterating over a bitmap, e.g. of all valid PMCs, requires a small
amount of math, that while simple, isn't super obvious and doesn't use the
same semantics as PMC lookups from RDPMC! Although AMD doesn't support
fixed counters, the common PMU code still behaves as if there is a split, the
high half of which just happens to always be empty.
Opportunistically add a comment to explain both what is going on, and why
KVM uses a single bitmap, e.g. the boilerplate for iterating over separate
bitmaps could be done via macros, so it's not (just) about deduplicating
code.
Link: https://lore.kernel.org/r/20231110022857.1273836-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-09 19:28:50 -07:00
#include <asm/kvm_host.h>

2015-06-19 04:54:23 -07:00
#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
#define pmu_to_vcpu(pmu)  (container_of((pmu), struct kvm_vcpu, arch.pmu))
#define pmc_to_pmu(pmc)   (&(pmc)->vcpu->arch.pmu)

2022-05-31 20:19:25 -07:00
#define MSR_IA32_MISC_ENABLE_PMU_RO_MASK (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |	\
					  MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)

2015-06-19 06:45:05 -07:00
/* retrieve the 4 bits for EN and PMI out of IA32_FIXED_CTR_CTRL */
2024-04-29 17:52:39 -07:00
#define fixed_ctrl_field(ctrl_reg, idx) \
	(((ctrl_reg) >> ((idx) * INTEL_FIXED_BITS_STRIDE)) & INTEL_FIXED_BITS_MASK)
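
The field extraction done by fixed_ctrl_field() can be sketched in a
standalone form. This is only an illustration: the stride of 4 bits and the
mask of 0xF are assumed here to match the "4 bits" mentioned in the comment,
and the SKETCH_* and sketch_* names are hypothetical, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel defines the real constants elsewhere. */
#define SKETCH_FIXED_BITS_STRIDE 4
#define SKETCH_FIXED_BITS_MASK   0xFULL

/* Same shape as fixed_ctrl_field(): shift the per-counter field down, then mask it. */
static inline uint64_t sketch_fixed_ctrl_field(uint64_t ctrl_reg, unsigned int idx)
{
	assert(idx < 64 / SKETCH_FIXED_BITS_STRIDE); /* keep the shift in range */
	return (ctrl_reg >> (idx * SKETCH_FIXED_BITS_STRIDE)) & SKETCH_FIXED_BITS_MASK;
}
```

Under these assumptions, a control value of 0xb3 yields 0x3 for fixed
counter 0 and 0xb for fixed counter 1.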
2015-06-19 06:45:05 -07:00

2018-03-12 04:12:53 -07:00
#define VMWARE_BACKDOOR_PMC_HOST_TSC		0x10000
#define VMWARE_BACKDOOR_PMC_REAL_TIME		0x10001
#define VMWARE_BACKDOOR_PMC_APPARENT_TIME	0x10002

2023-11-09 19:28:49 -07:00
#define KVM_FIXED_PMC_BASE_IDX INTEL_PMC_IDX_FIXED

2023-11-09 19:28:54 -07:00
struct kvm_pmu_emulated_event_selectors {
	u64 INSTRUCTIONS_RETIRED;
	u64 BRANCH_INSTRUCTIONS_RETIRED;
};

2015-06-19 06:45:05 -07:00
struct kvm_pmu_ops {
2019-10-27 03:52:40 -07:00
	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
		unsigned int idx, u64 *mask);
2019-10-27 03:52:41 -07:00
	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, u32 msr);
KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad index
Apply the pre-intercepts RDPMC validity check only to AMD, and rename all
relevant functions to make it as clear as possible that the check is not a
standard PMC index check. On Intel, the basic rule is that only invalid
opcodes and privilege/permission/mode checks have priority over VM-Exit,
i.e. RDPMC with an invalid index should VM-Exit, not #GP. While the SDM
doesn't explicitly call out RDPMC, it _does_ explicitly use RDMSR of a
non-existent MSR as an example where VM-Exit has priority over #GP, and
RDPMC is effectively just a variation of RDMSR.
Manually testing on various Intel CPUs confirms this behavior, and the
inverted priority was introduced for SVM compatibility, i.e. was not an
intentional change for Intel PMUs. On AMD, *all* exceptions on RDPMC have
priority over VM-Exit.
Check for a NULL kvm_pmu_ops.check_rdpmc_early instead of using a RET0
static call so as to provide a convenient location to document the
difference between Intel and AMD, and to again try to make it as obvious
as possible that the early check is a one-off thing, not a generic "is
this PMC valid?" helper.
Fixes: 8061252ee0d2 ("KVM: SVM: Add intercept checks for remaining twobyte instructions")
Cc: Jim Mattson <jmattson@google.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240109230250.424295-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-09 16:02:27 -07:00
	int (*check_rdpmc_early)(struct kvm_vcpu *vcpu, unsigned int idx);
2022-06-10 17:57:52 -07:00
	bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
2020-05-29 00:43:44 -07:00
	int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
2015-06-19 06:45:05 -07:00
	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
	void (*refresh)(struct kvm_vcpu *vcpu);
	void (*init)(struct kvm_vcpu *vcpu);
	void (*reset)(struct kvm_vcpu *vcpu);
2021-01-31 22:10:36 -07:00
	void (*deliver_pmi)(struct kvm_vcpu *vcpu);
2021-01-31 22:10:37 -07:00
	void (*cleanup)(struct kvm_vcpu *vcpu);
2022-12-20 09:12:30 -07:00

	const u64 EVENTSEL_EVENT;
2023-01-24 16:49:00 -07:00
	const int MAX_NR_GP_COUNTERS;
2023-06-02 18:10:53 -07:00
	const int MIN_NR_GP_COUNTERS;
2015-06-19 06:45:05 -07:00
};
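
The RDPMC commit message above says the caller checks for a NULL
check_rdpmc_early hook rather than relying on a default stub. A minimal
sketch of that optional-hook pattern follows. The struct, the vendor hook
and its limit of 6 counters are hypothetical stand-ins, not the KVM code.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_pmu_ops {
	/* Optional: may be NULL when the vendor has no early check. */
	int (*check_rdpmc_early)(unsigned int idx);
};

/* Hypothetical vendor hook: reject indices at or above 6 with a nonzero code. */
static int sketch_amd_check(unsigned int idx)
{
	return idx >= 6 ? 1 : 0;
}

/* Treat a missing hook as "no early check", i.e. success. */
static int sketch_check_rdpmc_early(const struct sketch_pmu_ops *ops, unsigned int idx)
{
	if (!ops->check_rdpmc_early)
		return 0;
	return ops->check_rdpmc_early(idx);
}
```

Keeping the NULL test in the caller gives one obvious place to document why
one vendor performs the early check and the other does not.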

2022-03-29 16:50:52 -07:00
void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);

2023-06-02 18:10:50 -07:00
static inline bool kvm_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu)
{
	/*
	 * Architecturally, Intel's SDM states that IA32_PERF_GLOBAL_CTRL is
	 * supported if "CPUID.0AH: EAX[7:0] > 0", i.e. if the PMU version is
	 * greater than zero. However, KVM only exposes and emulates the MSR
	 * to/for the guest if the guest PMU supports at least "Architectural
	 * Performance Monitoring Version 2".
	 *
	 * AMD's version of PERF_GLOBAL_CTRL conveniently shows up with v2.
	 */
	return pmu->version > 1;
}

2023-11-09 19:28:50 -07:00
/*
 * KVM tracks all counters in 64-bit bitmaps, with general purpose counters
 * mapped to bits 31:0 and fixed counters mapped to 63:32, e.g. fixed counter 0
 * is tracked internally via index 32. On Intel (AMD doesn't support fixed
 * counters), this mirrors how fixed counters are mapped to PERF_GLOBAL_CTRL
 * and similar MSRs, i.e. tracking fixed counters at base index 32 reduces the
 * amount of boilerplate needed to iterate over PMCs *and* simplifies common
 * enable/disable/reset operations.
 *
 * WARNING! This helper is only for lookups that are initiated by KVM, it is
 * NOT safe for guest lookups, e.g. will do the wrong thing if passed a raw
 * ECX value from RDPMC (fixed counters are accessed by setting bit 30 in ECX
 * for RDPMC, not by adding 32 to the fixed counter index).
 */
static inline struct kvm_pmc *kvm_pmc_idx_to_pmc(struct kvm_pmu *pmu, int idx)
{
	if (idx < pmu->nr_arch_gp_counters)
		return &pmu->gp_counters[idx];

	idx -= KVM_FIXED_PMC_BASE_IDX;
	if (idx >= 0 && idx < pmu->nr_arch_fixed_counters)
		return &pmu->fixed_counters[idx];

	return NULL;
}
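
The warning in the comment above can be made concrete with a small
standalone sketch that contrasts the two encodings of a fixed counter: the
KVM-internal bitmap index (base 32) and the ECX value used by RDPMC (bit 30
set). The helper names and SKETCH_* constants are hypothetical and only
restate what the comment describes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FIXED_BASE_IDX   32         /* internal base index, per the comment above */
#define SKETCH_RDPMC_FIXED_FLAG (1u << 30) /* RDPMC selects fixed counters via ECX bit 30 */

/* Hypothetical helper: internal bitmap index of fixed counter n. */
static inline int sketch_fixed_to_internal_idx(int n)
{
	assert(n >= 0 && n < 32);
	return SKETCH_FIXED_BASE_IDX + n;
}

/* Hypothetical helper: ECX value a guest would pass to RDPMC for fixed counter n. */
static inline uint32_t sketch_fixed_to_rdpmc_ecx(int n)
{
	assert(n >= 0 && n < 32);
	return SKETCH_RDPMC_FIXED_FLAG | (uint32_t)n;
}
```

Because the two values differ for every fixed counter, passing a raw ECX
value to an index-based lookup would select the wrong counter or none at all.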

2023-11-09 19:28:52 -07:00
#define kvm_for_each_pmc(pmu, pmc, i, bitmap)			\
	for_each_set_bit(i, bitmap, X86_PMC_IDX_MAX)		\
		if (!(pmc = kvm_pmc_idx_to_pmc(pmu, i)))	\
			continue;				\
		else						\

2015-06-19 06:45:05 -07:00
static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
{
	struct kvm_pmu *pmu = pmc_to_pmu(pmc);

	return pmu->counter_bitmask[pmc->type];
}

static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
{
	u64 counter, enabled, running;

KVM: x86/pmu: Track emulated counter events instead of previous counter
Explicitly track emulated counter events instead of using the common
counter value that's shared with the hardware counter owned by perf.
Bumping the common counter requires snapshotting the pre-increment value
in order to detect overflow from emulation, and the snapshot approach is
inherently flawed.
Snapshotting the previous counter at every increment assumes that there is
at most one emulated counter event per emulated instruction (or rather,
between checks for KVM_REQ_PMU). That mostly holds true today because
KVM only emulates (branch) instructions retired, but the approach will
fall apart if KVM ever supports event types that don't have a 1:1
relationship with instructions.
And KVM already has a relevant bug, as handle_invalid_guest_state()
emulates multiple instructions without checking KVM_REQ_PMU, i.e. could
miss an overflow event due to clobbering pmc->prev_counter. Not checking
KVM_REQ_PMU is problematic in both cases, but at least with the emulated
counter approach, the resulting behavior is delayed overflow detection,
as opposed to completely lost detection.
Tracking the emulated count fixes another bug where the snapshot approach
can signal spurious overflow due to incorporating both the emulated count
and perf's count in the check, i.e. if overflow is detected by perf, then
KVM's emulation will also incorrectly signal overflow. Add a comment in
the related code to call out the need to process emulated events *after*
pausing the perf event (big kudos to Mingwei for figuring out that
particular wrinkle).
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Roman Kagan <rkagan@amazon.de>
Cc: Jim Mattson <jmattson@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20231103230541.352265-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-03 16:05:41 -07:00
	counter = pmc->counter + pmc->emulated_counter;

KVM: x86/pmu: Introduce pmc->is_paused to reduce the call time of perf interfaces
Based on our observations, after any vm-exit associated with vPMU, there
are at least two or more perf interfaces to be called for guest counter
emulation, such as perf_event_{pause, read_value, period}(), and each one
will {lock, unlock} the same perf_event_ctx. The frequency of calls becomes
more severe when guests use counters in a multiplexed manner.
Holding a lock once and completing the KVM request operations in the perf
context would introduce a set of impractical new interfaces. So we can
further optimize the vPMU implementation by avoiding repeated calls to
these interfaces in the KVM context for at least one pattern:
After we call perf_event_pause() once, the event will be disabled and its
internal count will be reset to 0. So there is no need to pause it again
or read its value. Once the event is paused, event period will not be
updated until the next time it's resumed or reprogrammed. And there is
also no need to call perf_event_period twice for a non-running counter,
considering the perf_event for a running counter is never paused.
Based on this implementation, for the following common usage of
sampling 4 events using perf on a 4u8g guest:
echo 0 > /proc/sys/kernel/watchdog
echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate
echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
for i in `seq 1 1 10`
do
taskset -c 0 perf record \
-e cpu-cycles -e instructions -e branch-instructions -e cache-misses \
/root/br_instr a
done
the average latency of the guest NMI handler is reduced from
37646.7 ns to 32929.3 ns (~1.14x speed up) on the Intel ICX server.
Also, in addition to collecting more samples, no loss of sampling
accuracy was observed compared to before the optimization.
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20210728120705.6855-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
2021-07-28 05:07:05 -07:00
	if (pmc->perf_event && !pmc->is_paused)
2015-06-19 06:45:05 -07:00
		counter += perf_event_read_value(pmc->perf_event,
						 &enabled, &running);
	/* FIXME: Scaling needed? */
	return counter & pmc_bitmask(pmc);
}

2023-11-03 16:05:40 -07:00
void pmc_write_counter(struct kvm_pmc *pmc, u64 val);
KVM: x86/pmu: Truncate counter value to allowed width on write
Performance counters are defined to have width less than 64 bits. The
vPMU code maintains the counters in u64 variables but assumes the value
to fit within the defined width. However, for Intel non-full-width
counters (MSR_IA32_PERFCTRx) the value received from the guest is
truncated to 32 bits and then sign-extended to full 64 bits. If a
negative value is set, it's sign-extended to 64 bits, but then in
kvm_pmu_incr_counter() it's incremented, truncated, and compared to the
previous value for overflow detection.
That previous value is not truncated, so it always evaluates bigger than
the truncated new one, and a PMI is injected. If the PMI handler writes
a negative counter value itself, the vCPU never quits the PMI loop.
Turns out that Linux PMI handler actually does write the counter with
the value just read with RDPMC, so when no full-width support is exposed
via MSR_IA32_PERF_CAPABILITIES, and the guest initializes the counter to
a negative value, it locks up.
This has been observed in the field, for example, when the guest configures
atop to use perfevents and runs two instances of it simultaneously.
To address the problem, maintain the invariant that the counter value
always fits in the defined bit width, by truncating the received value
in the respective set_msr methods. For better readability, factor the
truncation out into a helper function, pmc_write_counter(), shared by the
vmx and svm parts.
Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
Cc: stable@vger.kernel.org
Signed-off-by: Roman Kagan <rkagan@amazon.de>
Link: https://lore.kernel.org/all/20230504120042.785651-1-rkagan@amazon.de
Tested-by: Like Xu <likexu@tencent.com>
[sean: tweak changelog, s/set/write in the helper]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-05-04 05:00:42 -07:00
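
The sign-extension problem described in the commit message above can be
reproduced with plain integer arithmetic. The sketch below is an
illustration only: the helper names are hypothetical, and the 48-bit counter
width used in the usage note is an assumed example rather than a value taken
from this file.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of a guest write to a full 64-bit value. */
static inline uint64_t sketch_sign_extend_32(uint64_t data)
{
	uint64_t low = data & 0xFFFFFFFFULL;

	return (low & 0x80000000ULL) ? (low | 0xFFFFFFFF00000000ULL) : low;
}

/* Keep only the bits that fit in a counter of the given width (1..63 bits). */
static inline uint64_t sketch_truncate_to_width(uint64_t val, unsigned int width)
{
	assert(width > 0 && width < 64);
	return val & ((1ULL << width) - 1);
}
```

With an assumed 48-bit counter, a guest write of 0xFFFFFFFF sign-extends to
all ones in 64 bits, and truncation brings it back to 0x0000FFFFFFFFFFFF, so
the stored value never exceeds the counter width.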

2015-06-19 06:45:05 -07:00
static inline bool pmc_is_gp(struct kvm_pmc *pmc)
{
	return pmc->type == KVM_PMC_GP;
}

static inline bool pmc_is_fixed(struct kvm_pmc *pmc)
{
	return pmc->type == KVM_PMC_FIXED;
}

2019-11-13 17:17:15 -07:00
static inline bool kvm_valid_perf_global_ctrl(struct kvm_pmu *pmu,
						 u64 data)
{
2024-04-29 17:52:38 -07:00
	return !(pmu->global_ctrl_rsvd & data);
2019-11-13 17:17:15 -07:00
}
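
The reserved-bit test above can be tried in isolation. The sketch below
assumes a simplified PMU that has only general purpose counters 0..nr_gp-1,
so every higher bit is reserved. How the real global_ctrl_rsvd mask is
computed is not shown in this file, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reserved mask when only GP counters 0..nr_gp-1 exist. */
static inline uint64_t sketch_global_ctrl_rsvd(unsigned int nr_gp)
{
	assert(nr_gp > 0 && nr_gp < 64);
	return ~((1ULL << nr_gp) - 1);
}

/* Same test as kvm_valid_perf_global_ctrl(): valid only if no reserved bit is set. */
static inline bool sketch_valid_global_ctrl(uint64_t rsvd, uint64_t data)
{
	return !(rsvd & data);
}
```

With four counters in this simplified model, a write of 0xF is accepted and
a write of 0x10 is rejected.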

2015-06-19 06:45:05 -07:00
/* returns general purpose PMC with the specified MSR. Note that it can be
 * used for both PERFCTRn and EVNTSELn; that is why it accepts base as a
2021-03-18 07:28:01 -07:00
 * parameter to tell them apart.
2015-06-19 06:45:05 -07:00
 */
static inline struct kvm_pmc *get_gp_pmc(struct kvm_pmu *pmu, u32 msr,
					 u32 base)
{
2019-12-11 13:47:48 -07:00
	if (msr >= base && msr < base + pmu->nr_arch_gp_counters) {
		u32 index = array_index_nospec(msr - base,
					       pmu->nr_arch_gp_counters);

		return &pmu->gp_counters[index];
	}
2015-06-19 06:45:05 -07:00

	return NULL;
}
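
The range check in get_gp_pmc() can be sketched outside the kernel as
follows. This simplified version deliberately omits the array_index_nospec()
speculation hardening, and the struct, the array size and the helper name
are hypothetical stand-ins for the KVM types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_GP 8 /* hypothetical array size for this sketch */

struct sketch_pmu {
	unsigned int nr_gp;         /* number of valid GP counters */
	uint64_t gp[SKETCH_MAX_GP]; /* stand-in for the gp_counters array */
};

/* Map an MSR number to a counter slot, or NULL if it is outside [base, base + nr_gp). */
static inline uint64_t *sketch_get_gp(struct sketch_pmu *pmu, uint32_t msr, uint32_t base)
{
	assert(pmu->nr_gp <= SKETCH_MAX_GP);
	if (msr >= base && msr < base + pmu->nr_gp)
		return &pmu->gp[msr - base];
	return NULL;
}
```

Passing a different base for the counter MSRs and for the event select MSRs
lets the same helper resolve both ranges to the same counter slot, which is
the reason the original takes base as a parameter.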

/* returns fixed PMC with the specified MSR */
static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
{
	int base = MSR_CORE_PERF_FIXED_CTR0;

2019-12-11 13:47:48 -07:00
	if (msr >= base && msr < base + pmu->nr_arch_fixed_counters) {
		u32 index = array_index_nospec(msr - base,
					       pmu->nr_arch_fixed_counters);

		return &pmu->fixed_counters[index];
	}
2015-06-19 06:45:05 -07:00

	return NULL;
}

2022-04-11 03:19:42 -07:00
static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
{
	struct kvm_pmu *pmu = pmc_to_pmu(pmc);

	if (pmc_is_fixed(pmc))
		return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
2024-04-29 17:52:39 -07:00
					pmc->idx - KVM_FIXED_PMC_BASE_IDX) &
					(INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER);
2022-04-11 03:19:42 -07:00

	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
}

2022-04-11 03:19:44 -07:00
extern struct x86_pmu_capability kvm_pmu_cap;
2023-11-09 19:28:54 -07:00
extern struct kvm_pmu_emulated_event_selectors kvm_pmu_eventsel;
2022-04-11 03:19:44 -07:00

2023-01-24 16:49:00 -07:00
static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
2022-04-11 03:19:44 -07:00
{
2022-05-18 10:01:18 -07:00
	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
2023-06-02 18:10:53 -07:00
	int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
2022-05-18 10:01:18 -07:00

KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)
Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until
KVM gains a sane way to enumerate the hybrid vPMU to userspace and/or
gains a mechanism to let userspace opt-in to the dangers of exposing a
hybrid vPMU to KVM guests. Virtualizing a hybrid PMU, or at least part of
a hybrid PMU, is possible, but it requires careful, deliberate
configuration from userspace.
E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to
prevent migrating a vCPU between a big core and a little core, userspace
must enumerate a reasonable topology to the guest, and guest CPUID must be
curated per vCPU to enumerate accurate vPMU capabilities.
The last point is especially problematic, as KVM doesn't control which
pCPU it runs on when enumerating KVM's vPMU capabilities to userspace,
i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in its current form.
Alternatively, userspace could enable vPMU support by enumerating the
set of features that are common and coherent across all cores, e.g. by
filtering PMU events and restricting guest capabilities. But again, that
requires userspace to take action far beyond reflecting KVM's supported
feature set into the guest.
For now, simply disable vPMU support on hybrid CPUs to avoid inducing
seemingly random #GPs in guests, and punt support for hybrid CPUs to a
future enabling effort.
Reported-by: Jianfeng Gao <jianfeng.gao@intel.com>
Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208204230.1360502-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-08 13:42:29 -07:00
	/*
	 * Hybrid PMUs don't play nice with virtualization without careful
	 * configuration by userspace, and KVM's APIs for reporting supported
	 * vPMU features do not account for hybrid PMUs. Disable vPMU support
	 * for hybrid PMUs until KVM gains a way to let userspace opt-in.
	 */
	if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
2022-05-18 10:01:18 -07:00
		enable_pmu = false;
2022-05-31 20:19:24 -07:00

2023-02-08 13:42:29 -07:00
	if (enable_pmu) {
		perf_get_x86_pmu_capability(&kvm_pmu_cap);

		/*
2023-06-02 18:10:53 -07:00
		 * WARN if perf did NOT disable hardware PMU if the number of
		 * architecturally required GP counters aren't present, i.e. if
		 * there are a non-zero number of counters, but fewer than what
		 * is architecturally required.
2023-02-08 13:42:29 -07:00
		 */
2023-06-02 18:10:53 -07:00
		if (!kvm_pmu_cap.num_counters_gp ||
		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < min_nr_gp_ctrs))
			enable_pmu = false;
		else if (is_intel && !kvm_pmu_cap.version)
2023-02-08 13:42:29 -07:00
|
|
|
enable_pmu = false;
|
|
|
|
}
|
|
|
|
|
2022-05-31 20:19:24 -07:00
|
|
|
if (!enable_pmu) {
|
|
|
|
memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
|
2022-05-18 10:01:18 -07:00
|
|
|
return;
|
|
|
|
}
|
2022-04-11 03:19:44 -07:00
|
|
|
|
|
|
|
kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
|
2023-01-24 16:49:00 -07:00
|
|
|
kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
|
|
|
|
pmu_ops->MAX_NR_GP_COUNTERS);
|
2022-04-11 03:19:44 -07:00
|
|
|
kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
|
2024-06-26 19:17:55 -07:00
|
|
|
KVM_MAX_NR_FIXED_COUNTERS);
|
2023-11-09 19:28:54 -07:00
|
|
|
|
|
|
|
kvm_pmu_eventsel.INSTRUCTIONS_RETIRED =
|
|
|
|
perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
|
|
|
|
kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED =
|
|
|
|
perf_get_hw_event_config(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
|
2022-04-11 03:19:44 -07:00
|
|
|
}
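The tail of the function above zeroes the capability structure when the PMU is disabled and otherwise clamps each field to what KVM can virtualize. A standalone sketch of that pattern follows. The `SKETCH_*` limits are illustrative stand-ins for `pmu_ops->MAX_NR_GP_COUNTERS` and `KVM_MAX_NR_FIXED_COUNTERS`, whose real values are not shown here; only the version cap of 2 comes from the code above.

```c
#include <string.h>

/* Illustrative limits only; the real values come from the vendor pmu_ops. */
#define SKETCH_MAX_PMU_VERSION	2
#define SKETCH_MAX_NR_GP	8
#define SKETCH_MAX_NR_FIXED	3

/* Hypothetical stand-in for the fields of kvm_pmu_cap used above. */
struct sketch_pmu_cap {
	int version;
	int num_counters_gp;
	int num_counters_fixed;
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Zero everything if the PMU is disabled, otherwise clamp to the limits. */
static void sketch_clamp_pmu_cap(struct sketch_pmu_cap *cap, int enable_pmu)
{
	if (!enable_pmu) {
		memset(cap, 0, sizeof(*cap));
		return;
	}

	cap->version = min_int(cap->version, SKETCH_MAX_PMU_VERSION);
	cap->num_counters_gp = min_int(cap->num_counters_gp, SKETCH_MAX_NR_GP);
	cap->num_counters_fixed = min_int(cap->num_counters_fixed,
					  SKETCH_MAX_NR_FIXED);
}
```

Zeroing the whole structure on the disabled path means later readers of the capabilities do not need to consult `enable_pmu` separately.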
|
|
|
|
|
2023-03-10 04:33:49 -07:00
|
|
|
static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
|
2022-09-22 17:13:54 -07:00
|
|
|
{
|
|
|
|
set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
|
|
|
|
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
|
|
|
|
}
|
2015-06-19 06:45:05 -07:00
|
|
|
|
2023-06-02 18:10:48 -07:00
|
|
|
static inline void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
|
|
|
|
{
|
|
|
|
int bit;
|
|
|
|
|
|
|
|
if (!diff)
|
|
|
|
return;
|
|
|
|
|
|
|
|
for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
|
|
|
|
set_bit(bit, pmu->reprogram_pmi);
|
|
|
|
kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
|
|
|
|
}
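The helper above marks every counter whose bit is set in `diff` for reprogramming, then raises a single request for the vCPU. The following userspace sketch mirrors that flow under simplifying assumptions: `struct sketch_pmu` and its `request_pending` flag are hypothetical, the bitmap is a plain 64-bit word, and the updates are not atomic, unlike the kernel's `set_bit()`.

```c
#include <stdint.h>

#define SKETCH_PMC_IDX_MAX	64

/* Hypothetical stand-in for the relevant parts of struct kvm_pmu. */
struct sketch_pmu {
	uint64_t reprogram_pmi;	/* one bit per counter needing reprogramming */
	int request_pending;	/* stands in for kvm_make_request(KVM_REQ_PMU) */
};

static void sketch_reprogram_counters(struct sketch_pmu *pmu, uint64_t diff)
{
	int bit;

	/* Nothing changed, so do not raise a request. */
	if (!diff)
		return;

	/* Open-coded equivalent of for_each_set_bit() over a single word. */
	for (bit = 0; bit < SKETCH_PMC_IDX_MAX; bit++) {
		if (diff & (UINT64_C(1) << bit))
			pmu->reprogram_pmi |= UINT64_C(1) << bit;
	}

	/* One request covers all of the counters marked above. */
	pmu->request_pending = 1;
}
```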
|
|
|
|
|
2023-06-02 18:10:51 -07:00
|
|
|
/*
|
|
|
|
* Check if a PMC is enabled by comparing it against global_ctrl bits.
|
|
|
|
*
|
|
|
|
* If the vPMU doesn't have global_ctrl MSR, all vPMCs are enabled.
|
|
|
|
*/
|
|
|
|
static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
|
|
|
|
{
|
|
|
|
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
|
|
|
|
|
|
|
|
if (!kvm_pmu_has_perf_global_ctrl(pmu))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
|
|
|
|
}
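The rule stated in the comment above can be shown in isolation: without a global control register every counter counts as enabled, otherwise the counter's bit decides. This is a minimal sketch; `struct sketch_gpmu` and `has_global_ctrl` are hypothetical stand-ins for `struct kvm_pmu` and `kvm_pmu_has_perf_global_ctrl()`, and `idx` is assumed to be below 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant parts of struct kvm_pmu. */
struct sketch_gpmu {
	bool has_global_ctrl;	/* does the vPMU expose a global control MSR? */
	uint64_t global_ctrl;	/* one enable bit per counter index */
};

/* Assumes idx < 64; the caller is responsible for validating the index. */
static bool sketch_pmc_is_globally_enabled(const struct sketch_gpmu *pmu,
					   unsigned int idx)
{
	/* Without a global control MSR, every counter is treated as enabled. */
	if (!pmu->has_global_ctrl)
		return true;

	return (pmu->global_ctrl >> idx) & 1;
}
```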
|
|
|
|
|
2015-06-19 04:54:23 -07:00
|
|
|
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
|
|
|
|
void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
|
|
|
|
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
|
KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad index
Apply the pre-intercepts RDPMC validity check only to AMD, and rename all
relevant functions to make it as clear as possible that the check is not a
standard PMC index check. On Intel, the basic rule is that only invalid
opcodes and privilege/permission/mode checks have priority over VM-Exit,
i.e. RDPMC with an invalid index should VM-Exit, not #GP. While the SDM
doesn't explicitly call out RDPMC, it _does_ explicitly use RDMSR of a
non-existent MSR as an example where VM-Exit has priority over #GP, and
RDPMC is effectively just a variation of RDMSR.
Manually testing on various Intel CPUs confirms this behavior, and the
inverted priority was introduced for SVM compatibility, i.e. was not an
intentional change for Intel PMUs. On AMD, *all* exceptions on RDPMC have
priority over VM-Exit.
Check for a NULL kvm_pmu_ops.check_rdpmc_early instead of using a RET0
static call so as to provide a convenient location to document the
difference between Intel and AMD, and to again try to make it as obvious
as possible that the early check is a one-off thing, not a generic "is
this PMC valid?" helper.
Fixes: 8061252ee0d2 ("KVM: SVM: Add intercept checks for remaining twobyte instructions")
Cc: Jim Mattson <jmattson@google.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240109230250.424295-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-09 16:02:27 -07:00
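The message above describes checking for a NULL `check_rdpmc_early` hook rather than calling a default that returns 0. Below is a minimal sketch of that optional-callback pattern. All `sketch_*` names are hypothetical, and the bounds check in the AMD-like hook is only a placeholder, not the real validity check.

```c
#include <stddef.h>

/* Hypothetical vCPU state; only what the placeholder check needs. */
struct sketch_vcpu {
	unsigned int nr_counters;
};

/* Hypothetical ops table with an optional early RDPMC check. */
struct sketch_pmu_ops {
	int (*check_rdpmc_early)(struct sketch_vcpu *vcpu, unsigned int idx);
};

/* Placeholder for a vendor hook: non-zero means "fault before intercept". */
static int sketch_amd_check_rdpmc_early(struct sketch_vcpu *vcpu,
					unsigned int idx)
{
	return idx < vcpu->nr_counters ? 0 : 1;
}

static int sketch_check_rdpmc_early(const struct sketch_pmu_ops *ops,
				    struct sketch_vcpu *vcpu, unsigned int idx)
{
	/*
	 * No hook (the Intel-like case): skip the early check so that the
	 * intercept, i.e. the VM-Exit, takes priority over a fault on a bad
	 * index.
	 */
	if (!ops->check_rdpmc_early)
		return 0;

	return ops->check_rdpmc_early(vcpu, idx);
}
```

Keeping the NULL test in one wrapper gives a single place to document why the two vendors differ.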
|
|
|
int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
|
2022-06-10 17:57:52 -07:00
|
|
|
bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
|
2020-05-29 00:43:44 -07:00
|
|
|
int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
|
2015-06-19 04:54:23 -07:00
|
|
|
int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
|
|
|
|
void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
|
|
|
|
void kvm_pmu_init(struct kvm_vcpu *vcpu);
|
2019-10-27 03:52:43 -07:00
|
|
|
void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
|
2015-06-19 04:54:23 -07:00
|
|
|
void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
|
2019-07-10 18:25:15 -07:00
|
|
|
int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
|
2023-11-09 19:28:54 -07:00
|
|
|
void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel);
|
2015-06-19 04:54:23 -07:00
|
|
|
|
2018-03-12 04:12:53 -07:00
|
|
|
bool is_vmware_backdoor_pmc(u32 pmc_idx);
|
|
|
|
|
2015-06-19 06:45:05 -07:00
|
|
|
extern struct kvm_pmu_ops intel_pmu_ops;
|
|
|
|
extern struct kvm_pmu_ops amd_pmu_ops;
|
2015-06-19 04:54:23 -07:00
|
|
|
#endif /* __KVM_X86_PMU_H */
|