License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 07:07:57 -07:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
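The SPDX commit message above mentions that the tagging script had to distinguish between header and source .c files because they need different comment types. A minimal user-space sketch of that choice follows. It is not the script Thomas and Greg used; the helper names has_suffix() and spdx_line() are hypothetical, and the sketch assumes only two cases: ".h" files get a block comment and everything else is treated like a ".c" file.

```c
#include <stdio.h>
#include <string.h>

/* Return nonzero if name ends with the given suffix. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);

	return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Hypothetical helper: format the SPDX tag line for a file into buf.
 * Assumption: headers take a block comment, all other files take the
 * line-comment form shown at the top of this file.
 */
static int spdx_line(const char *filename, const char *license,
		     char *buf, size_t len)
{
	if (has_suffix(filename, ".h"))
		return snprintf(buf, len,
				"/* SPDX-License-Identifier: %s */", license);

	return snprintf(buf, len, "// SPDX-License-Identifier: %s", license);
}
```

Called with a .c path and "GPL-2.0", spdx_line() produces the same first line this file carries; a real script would also have to handle assembly, Makefiles and other file types, which this sketch leaves out.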
|
2012-05-21 19:50:07 -07:00
|
|
|
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
|
|
|
|
|
2008-03-10 15:28:04 -07:00
|
|
|
#include <linux/errno.h>
|
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/smp.h>
|
2023-02-14 01:38:57 -07:00
|
|
|
#include <linux/cpu.h>
|
2009-02-27 14:25:28 -07:00
|
|
|
#include <linux/prctl.h>
|
2008-03-10 15:28:04 -07:00
|
|
|
#include <linux/slab.h>
|
|
|
|
#include <linux/sched.h>
|
2017-02-01 08:36:40 -07:00
|
|
|
#include <linux/sched/idle.h>
|
2017-02-08 10:51:35 -07:00
|
|
|
#include <linux/sched/debug.h>
|
2017-02-08 10:51:36 -07:00
|
|
|
#include <linux/sched/task.h>
|
2017-02-08 10:51:37 -07:00
|
|
|
#include <linux/sched/task_stack.h>
|
2016-07-13 17:18:56 -07:00
|
|
|
#include <linux/init.h>
|
|
|
|
#include <linux/export.h>
|
2008-04-25 08:39:01 -07:00
|
|
|
#include <linux/pm.h>
|
2015-04-02 17:01:28 -07:00
|
|
|
#include <linux/tick.h>
|
2009-05-11 19:05:28 -07:00
|
|
|
#include <linux/random.h>
|
2009-09-18 23:40:22 -07:00
|
|
|
#include <linux/user-return-notifier.h>
|
2009-12-08 01:29:42 -07:00
|
|
|
#include <linux/dmi.h>
|
|
|
|
#include <linux/utsname.h>
|
2012-03-25 14:00:04 -07:00
|
|
|
#include <linux/stackprotector.h>
|
|
|
|
#include <linux/cpuidle.h>
|
2018-11-21 19:04:09 -07:00
|
|
|
#include <linux/acpi.h>
|
|
|
|
#include <linux/elf-randomize.h>
|
2023-01-12 12:43:16 -07:00
|
|
|
#include <linux/static_call.h>
|
2009-09-17 07:11:28 -07:00
|
|
|
#include <trace/events/power.h>
|
2009-09-09 10:22:48 -07:00
|
|
|
#include <linux/hw_breakpoint.h>
|
2023-06-23 15:55:29 -07:00
|
|
|
#include <linux/entry-common.h>
|
2011-01-20 07:42:52 -07:00
|
|
|
#include <asm/cpu.h>
|
2008-11-11 06:33:44 -07:00
|
|
|
#include <asm/apic.h>
|
2016-12-24 12:46:01 -07:00
|
|
|
#include <linux/uaccess.h>
|
sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
In Linux-3.9 we removed the mwait_idle() loop:
69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.
But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.
2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.
https://bugzilla.kernel.org/show_bug.cgi?id=60770
So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.
Maintainer notes:
For 3.9, simply revert 69fb3676df.
For 3.10, patch -F3 applies; fuzz is needed due to __cpuinit use in context.
For 3.11, 3.12, 3.13, this patch applies cleanly.
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.9+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-14 22:37:34 -07:00
|
|
|
#include <asm/mwait.h>
|
2021-10-21 15:55:10 -07:00
|
|
|
#include <asm/fpu/api.h>
|
2021-10-14 18:16:20 -07:00
|
|
|
#include <asm/fpu/sched.h>
|
2021-10-21 15:55:22 -07:00
|
|
|
#include <asm/fpu/xstate.h>
|
2009-06-01 11:14:55 -07:00
|
|
|
#include <asm/debugreg.h>
|
2012-03-25 14:00:04 -07:00
|
|
|
#include <asm/nmi.h>
|
2014-10-24 15:58:07 -07:00
|
|
|
#include <asm/tlbflush.h>
|
2015-08-12 09:29:40 -07:00
|
|
|
#include <asm/mce.h>
|
2015-07-28 22:41:16 -07:00
|
|
|
#include <asm/vm86.h>
|
2016-08-13 09:38:18 -07:00
|
|
|
#include <asm/switch_to.h>
|
2017-02-20 09:56:14 -07:00
|
|
|
#include <asm/desc.h>
|
2017-03-20 01:16:26 -07:00
|
|
|
#include <asm/prctl.h>
|
2018-04-29 06:21:42 -07:00
|
|
|
#include <asm/spec-ctrl.h>
|
2019-11-11 15:03:21 -07:00
|
|
|
#include <asm/io_bitmap.h>
|
2018-11-21 19:04:09 -07:00
|
|
|
#include <asm/proto.h>
|
2020-09-14 10:04:22 -07:00
|
|
|
#include <asm/frame.h>
|
2021-10-22 07:53:02 -07:00
|
|
|
#include <asm/unwind.h>
|
x86/tdx: Add HLT support for TDX guests
The HLT instruction is a privileged instruction, executing it stops
instruction execution and places the processor in a HALT state. It
is used in kernel for cases like reboot, idle loop and exception fixup
handlers. For the idle case, interrupts will be enabled (using STI)
before the HLT instruction (this is also called safe_halt()).
To support the HLT instruction in TDX guests, it needs to be emulated
using TDVMCALL (hypercall to VMM). More details about it can be found
in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication
Interface (GHCI) specification, section TDVMCALL[Instruction.HLT].
In TDX guests, executing HLT instruction will generate a #VE, which is
used to emulate the HLT instruction. But #VE based emulation will not
work for the safe_halt() flavor, because it requires STI instruction to
be executed just before the TDCALL. Since idle loop is the only user of
safe_halt() variant, handle it as a special case.
To avoid *safe_halt() call in the idle function, define the
tdx_guest_idle() and use it to override the "x86_idle" function pointer
for a valid TDX guest.
Alternative choices like PV ops have been considered for adding
safe_halt() support. But it was rejected because HLT paravirt calls
only exist under PARAVIRT_XXL, and enabling it in TDX guest just for
safe_halt() use case is not worth the cost.
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220405232939.73860-9-kirill.shutemov@linux.intel.com
2022-04-05 16:29:17 -07:00
|
|
|
#include <asm/tdx.h>
|
2023-03-12 04:26:01 -07:00
|
|
|
#include <asm/mmu_context.h>
|
x86/shstk: Handle thread shadow stack
When a process is duplicated, but the child shares the address space with
the parent, there is potential for the threads sharing a single stack to
cause conflicts for each other. In the normal non-CET case this is handled
in two ways.
With regular CLONE_VM a new stack is provided by userspace such that the
parent and child have different stacks.
For vfork, the parent is suspended until the child exits. So as long as
the child doesn't return from the vfork()/CLONE_VFORK calling function and
sticks to a limited set of operations, the parent and child can share the
same stack.
For shadow stack, these scenarios present similar sharing problems. For the
CLONE_VM case, the child and the parent must have separate shadow stacks.
Instead of changing clone to take a shadow stack, have the kernel just
allocate one and switch to it.
Use stack_size passed from clone3() syscall for thread shadow stack size. A
compat-mode thread shadow stack size is further reduced to 1/4. This
allows more threads to run in a 32-bit address space. The clone() does not
pass stack_size, which was added to clone3(). In that case, use
RLIMIT_STACK size and cap to 4 GB.
For shadow stack enabled vfork(), the parent and child can share the same
shadow stack, like they can share a normal stack. Since the parent is
suspended until the child terminates, the child will not interfere with
the parent while executing as long as it doesn't return from the vfork()
and overwrite up the shadow stack. The child can safely overwrite down
the shadow stack, as the parent can just overwrite this later. So CET does
not add any additional limitations for vfork().
Free the shadow stack on thread exit by doing it in mm_release(). Skip
this when exiting a vfork() child since the stack is shared in the
parent.
During this operation, the shadow stack pointer of the new thread needs
to be updated to point to the newly allocated shadow stack. Since the
ability to do this is confined to the FPU subsystem, change
fpu_clone() to take the new shadow stack pointer, and update it
internally inside the FPU subsystem. This part was suggested by Thomas
Gleixner.
Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-30-rick.p.edgecombe%40intel.com
2023-06-12 17:10:55 -07:00
|
|
|
#include <asm/shstk.h>
|
2012-03-25 14:00:04 -07:00
|
|
|
|
2018-11-25 11:33:47 -07:00
|
|
|
#include "process.h"
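The x86/shstk commit message above describes how the size of a new thread's shadow stack is chosen: take stack_size from clone3(), fall back to RLIMIT_STACK capped at 4 GB when clone() passes no size, and reduce the result to 1/4 for compat-mode threads. The rule can be sketched in user space as below. This is an illustration of the description only, not the kernel's code: thread_shstk_size() and the SKETCH_* constants are hypothetical names, and both the page rounding and applying the 1/4 reduction to the fallback size as well are assumptions.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */
#define SKETCH_SZ_4G     (1ULL << 32)	/* 4 GB cap from the commit message */

/*
 * Hypothetical helper modelling the sizing rule.
 * stack_size:   value passed via clone3(), or 0 when clone() was used
 * rlimit_stack: the caller's RLIMIT_STACK soft limit
 * compat:       nonzero for a compat-mode (32-bit) thread
 * No overflow handling is attempted for very large stack_size values.
 */
static uint64_t thread_shstk_size(uint64_t stack_size, uint64_t rlimit_stack,
				  int compat)
{
	uint64_t size = stack_size;

	/* clone() passes no size: use RLIMIT_STACK, capped at 4 GB. */
	if (size == 0)
		size = rlimit_stack < SKETCH_SZ_4G ? rlimit_stack : SKETCH_SZ_4G;

	/* Compat-mode threads get a quarter, to fit more threads in 32 bits. */
	if (compat)
		size /= 4;

	/* Assumption: round up to a whole number of pages. */
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

With an 8 MB RLIMIT_STACK and no explicit size, the sketch yields an 8 MB shadow stack; an unlimited RLIMIT_STACK is clamped to 4 GB.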
|
|
|
|
|
2012-05-03 02:03:01 -07:00
|
|
|
/*
|
|
|
|
* per-CPU TSS segments. Threads are completely 'soft' on Linux,
|
|
|
|
* no more per-task TSS's. The TSS size is kept cacheline-aligned
|
|
|
|
* so they are allowed to end up in the .data..cacheline_aligned
|
|
|
|
* section. Since TSS's are completely CPU-local, we want them
|
|
|
|
* on exact cacheline boundaries, to eliminate cacheline ping-pong.
|
|
|
|
*/
|
2018-01-03 13:39:52 -07:00
|
|
|
__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
|
2015-03-05 20:19:06 -07:00
|
|
|
.x86_tss = {
|
2017-11-02 00:59:13 -07:00
|
|
|
/*
|
|
|
|
* .sp0 is only used when entering ring 0 from a lower
|
|
|
|
* privilege level. Since the init task never runs anything
|
|
|
|
* but ring 0 code, there is no need for a valid value here.
|
|
|
|
* Poison it.
|
|
|
|
*/
|
|
|
|
.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
|
2017-12-04 07:07:21 -07:00
|
|
|
|
2021-01-25 10:34:29 -07:00
|
|
|
#ifdef CONFIG_X86_32
|
2017-12-04 07:07:21 -07:00
|
|
|
.sp1 = TOP_OF_INIT_STACK,
|
|
|
|
|
2015-03-05 20:19:06 -07:00
|
|
|
.ss0 = __KERNEL_DS,
|
|
|
|
.ss1 = __KERNEL_CS,
|
|
|
|
#endif
|
2019-11-11 15:03:20 -07:00
|
|
|
.io_bitmap_base = IO_BITMAP_OFFSET_INVALID,
|
2015-03-05 20:19:06 -07:00
|
|
|
},
|
|
|
|
};
|
2017-12-04 07:07:29 -07:00
|
|
|
EXPORT_PER_CPU_SYMBOL(cpu_tss_rw);
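The .sp0 initializer above is described only as a poison value. Working the arithmetic through shows what it evaluates to: the top bit of a long plus one. The sketch below parameterises the word width so both cases can be checked on any host; sp0_poison() is a hypothetical name used for illustration only.

```c
#include <stdint.h>

/*
 * Evaluate (1UL << (BITS_PER_LONG-1)) + 1 for a given word width.
 * bits_per_long is expected to be 32 or 64.
 */
static uint64_t sp0_poison(unsigned int bits_per_long)
{
	return ((uint64_t)1 << (bits_per_long - 1)) + 1;
}
```

For a 64-bit long this gives 0x8000000000000001 and for a 32-bit long 0x80000001. Either way the value is odd, so it cannot be mistaken for a properly aligned stack pointer.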
|
2012-05-03 02:03:01 -07:00
|
|
|
|
2017-02-22 08:36:16 -07:00
|
|
|
DEFINE_PER_CPU(bool, __tss_limit_invalid);
|
|
|
|
EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid);
|
2017-02-20 09:56:14 -07:00
|
|
|
|
2012-05-16 15:03:51 -07:00
|
|
|
/*
|
|
|
|
* this gets called so that we can store lazy state into memory and copy the
|
|
|
|
* current task into the new thread.
|
|
|
|
*/
|
2008-03-10 15:28:04 -07:00
|
|
|
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
|
|
|
|
{
|
2015-07-17 03:28:12 -07:00
|
|
|
memcpy(dst, src, arch_task_struct_size);
|
2015-10-30 22:42:46 -07:00
|
|
|
#ifdef CONFIG_VM86
|
|
|
|
dst->thread.vm86 = NULL;
|
|
|
|
#endif
|
2021-10-13 07:55:43 -07:00
|
|
|
/* Drop the copied pointer to current's fpstate */
|
|
|
|
dst->thread.fpu.fpstate = NULL;
|
2021-10-21 15:55:22 -07:00
|
|
|
|
2021-10-14 18:16:04 -07:00
|
|
|
return 0;
|
2008-03-10 15:28:04 -07:00
|
|
|
}
|
2008-04-25 08:39:01 -07:00
|
|
|
|
2021-10-21 15:55:22 -07:00
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
void arch_release_task_struct(struct task_struct *tsk)
|
|
|
|
{
|
|
|
|
if (fpu_state_size_dynamic())
|
|
|
|
fpstate_free(&tsk->thread.fpu);
|
2008-03-10 15:28:04 -07:00
|
|
|
}
|
2021-10-21 15:55:22 -07:00
|
|
|
#endif
|
2008-04-25 08:39:01 -07:00
|
|
|
|
2009-02-27 14:25:28 -07:00
|
|
|
/*
|
x86/ioperm: Prevent a memory leak when fork fails
In the copy_process() routine called by _do_fork(), failure to allocate
a PID (or further along in the function) will trigger an invocation to
exit_thread(). This is done to clean up from an earlier call to
copy_thread_tls(). Naturally, the child task is passed into exit_thread(),
however during the process, io_bitmap_exit() nullifies the parent's
io_bitmap rather than the child's.
As copy_thread_tls() has been called ahead of the failure, the reference
count on the calling thread's io_bitmap is incremented as we would expect.
However, io_bitmap_exit() doesn't accept any arguments, and thus assumes
it should trash the current thread's io_bitmap reference rather than the
child's. This is pretty sneaky in practice, because in all instances but
this one, exit_thread() is called with respect to the current task and
everything works out.
A determined attacker can issue an appropriate ioctl (i.e. KDENABIO) to
get a bitmap allocated, and force a clone3() syscall to fail by passing
in a zeroed clone_args structure. The kernel handles the erroneous struct
and the buggy code path is followed, and even though the parent's reference
to the io_bitmap is trashed, the child still holds a reference and thus
the structure will never be freed.
Fix this by tweaking io_bitmap_exit() and its subroutines to accept a
task_struct argument on which to operate.
Fixes: ea5f1cd7ab49 ("x86/ioperm: Remove bitmap if all permissions dropped")
Signed-off-by: Jay Lang <jaytlang@mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
2020-05-24 09:27:39 -07:00
|
|
|
* Free thread data structures etc..
|
2009-02-27 14:25:28 -07:00
|
|
|
*/
|
2016-05-20 17:00:20 -07:00
|
|
|
void exit_thread(struct task_struct *tsk)
|
2009-02-27 14:25:28 -07:00
|
|
|
{
|
2016-05-20 17:00:20 -07:00
|
|
|
struct thread_struct *t = &tsk->thread;
|
2015-04-23 03:33:50 -07:00
|
|
|
struct fpu *fpu = &t->fpu;
|
2019-11-11 15:03:24 -07:00
|
|
|
|
|
|
|
if (test_thread_flag(TIF_IO_BITMAP))
|
2020-05-24 09:27:39 -07:00
|
|
|
io_bitmap_exit(tsk);
|
2012-05-16 15:03:54 -07:00
|
|
|
|
2015-07-28 22:41:16 -07:00
|
|
|
free_vm86(t);
|
|
|
|
|
2023-06-12 17:10:55 -07:00
|
|
|
shstk_free(tsk);
|
x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
drop_fpu() and fpu_reset_state() are similar in functionality
and in scope, yet this is not apparent from their names.
drop_fpu() deactivates FPU contents (both the fpregs and the fpstate),
but leaves register contents intact in the eager-FPU case, mostly as an
optimization. It disables fpregs in the lazy FPU case. The drop_fpu()
method can be used to destroy FPU state in an optimized way, when we
know that a new state will be loaded before user-space might see
any remains of the old FPU state:
- such as in sys_exit()'s exit_thread() where we know this task
won't execute any user-space instructions anymore and the
next context switch cleans up the FPU. The old FPU state
might still be around in the eagerfpu case but won't be
saved.
- in __restore_xstate_sig(), where we use drop_fpu() before
copying a new state into the fpstate and activating that one.
No user-space instructions can execute between those steps.
- in sys_execve()'s fpu__clear(): there we use drop_fpu() in
the !eagerfpu case, where it's equivalent to a full reinit.
fpu_reset_state() is a stronger version of drop_fpu(): both in
the eagerfpu and the lazy-FPU case it guarantees that fpregs
are reinitialized to init state. This method is used in cases
where we need a full reset:
- handle_signal() uses fpu_reset_state() to reset the FPU state
to init before executing a user-space signal handler. While we
have already saved the original FPU state at this point, and
always restore the original state, the signal handling code
still has to do this reinit, because signals may interrupt
any user-space instruction, and the FPU might be in various
intermediate states (such as an unbalanced x87 stack) that is
not immediately usable for general C signal handler code.
- __restore_xstate_sig() uses fpu_reset_state() when the signal
frame has no FP context. Since the signal handler may have
modified the FPU state, it gets reset back to init state.
- in another branch __restore_xstate_sig() uses fpu_reset_state()
to handle a restoration error: when restore_user_xstate() fails
to restore FPU state and we might have inconsistent FPU data,
fpu_reset_state() is used to reset it back to a known good
state.
- __kernel_fpu_end() uses fpu_reset_state() in an error branch.
This is in a 'must not trigger' error branch, so on bug-free
kernels this never triggers.
- fpu__restore() uses fpu_reset_state() in an error path
as well: if the fpstate was set up with invalid FPU state
(via ptrace or via a signal handler), then it's reset back
to init state.
- likewise, the scheduler's switch_fpu_finish() uses it in a
restoration error path too.
Move both drop_fpu() and fpu_reset_state() to the fpu__*() namespace
and harmonize their naming with their function:
fpu__drop()
fpu__reset()
This clearly shows that both methods operate on the full state of the
FPU, just like fpu__restore().
Also add comments to explain what each function does.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-29 10:04:31 -07:00
|
|
|
fpu__drop(fpu);
|
2009-02-27 14:25:28 -07:00
|
|
|
}
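The x86/ioperm commit message attached to exit_thread() explains why io_bitmap_exit() now takes the task as an argument: on a failed fork the cleanup runs in the parent's context on behalf of the child, so a helper that implicitly works on the current task drops the wrong reference. A small user-space model of that reference counting is sketched below. All names (task_model, io_bitmap_model, current_task and the helper functions) are hypothetical stand-ins, not the kernel's types.

```c
#include <stdlib.h>

struct io_bitmap_model { int refcnt; };
struct task_model { struct io_bitmap_model *io_bitmap; };

/* Stand-in for "current": the task whose context the code runs in. */
static struct task_model *current_task;

/* Drop one reference and free the bitmap when the last one goes away. */
static void bitmap_put(struct io_bitmap_model *bm)
{
	if (bm && --bm->refcnt == 0)
		free(bm);
}

/* Copy step: the child shares the parent's bitmap and takes a reference. */
static void copy_bitmap(struct task_model *child, struct task_model *parent)
{
	child->io_bitmap = parent->io_bitmap;
	if (child->io_bitmap)
		child->io_bitmap->refcnt++;
}

/* Buggy shape: no argument, so it always clears the current task. */
static void io_bitmap_exit_buggy(void)
{
	struct io_bitmap_model *bm = current_task->io_bitmap;

	current_task->io_bitmap = NULL;
	bitmap_put(bm);
}

/* Fixed shape: operate on the task that was passed in. */
static void io_bitmap_exit_fixed(struct task_model *tsk)
{
	struct io_bitmap_model *bm = tsk->io_bitmap;

	tsk->io_bitmap = NULL;
	bitmap_put(bm);
}
```

In the buggy shape, a failed fork leaves the parent without its bitmap pointer while the never-started child still holds one, so the count never reaches zero. In the fixed shape, the child's reference is dropped and the parent's is untouched.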
|
|
|
|
|
2019-11-11 15:03:16 -07:00
|
|
|
static int set_new_tls(struct task_struct *p, unsigned long tls)
|
|
|
|
{
|
|
|
|
struct user_desc __user *utls = (struct user_desc __user *)tls;
|
|
|
|
|
|
|
|
if (in_ia32_syscall())
|
|
|
|
return do_set_thread_area(p, -1, utls, 0);
|
|
|
|
else
|
|
|
|
return do_set_thread_area_64(p, ARCH_SET_FS, tls);
|
|
|
|
}
|
|
|
|
|
2023-06-23 15:55:29 -07:00
|
|
|
__visible void ret_from_fork(struct task_struct *prev, struct pt_regs *regs,
|
|
|
|
int (*fn)(void *), void *fn_arg)
|
|
|
|
{
|
|
|
|
schedule_tail(prev);
|
|
|
|
|
|
|
|
/* Is this a kernel thread? */
|
|
|
|
if (unlikely(fn)) {
|
|
|
|
fn(fn_arg);
|
|
|
|
/*
|
|
|
|
* A kernel thread is allowed to return here after successfully
|
|
|
|
* calling kernel_execve(). Exit to userspace to complete the
|
|
|
|
* execve() syscall.
|
|
|
|
*/
|
|
|
|
regs->ax = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
syscall_exit_to_user_mode(regs);
|
|
|
|
}
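The control flow of ret_from_fork() above can be modelled in user space to make the two cases explicit: a kernel thread arrives with a non-NULL fn, runs it, and only reaches the exit path if fn returns, in which case ax is zeroed first; a user task skips straight to the exit path. The sketch below is a simplified model under those assumptions. regs_model, exit_to_user_model(), sample_kthread() and ret_from_fork_model() are hypothetical names, and schedule_tail() is omitted.

```c
#include <stddef.h>

struct regs_model {
	unsigned long ax;	/* models the return-value register */
	int exited_to_user;	/* set once the exit path has run */
};

/* Stand-in for the exit-to-user-mode step. */
static void exit_to_user_model(struct regs_model *regs)
{
	regs->exited_to_user = 1;
}

/* Example thread function: counts how often it was called. */
static int sample_kthread(void *arg)
{
	*(int *)arg += 1;
	return 0;
}

static void ret_from_fork_model(struct regs_model *regs,
				int (*fn)(void *), void *fn_arg)
{
	/* Kernel thread: run its function first. */
	if (fn) {
		fn(fn_arg);
		/* fn returned, so report 0 as the execve() result. */
		regs->ax = 0;
	}

	exit_to_user_model(regs);
}
```

Passing sample_kthread as fn runs it once and leaves ax at 0; passing NULL leaves ax as the caller set it, which matches the code above not touching regs->ax on the user path.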
|
|
|
|
|
2022-04-08 16:07:50 -07:00
|
|
|
int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
|
2019-11-11 15:03:16 -07:00
|
|
|
{
|
2022-04-08 16:07:50 -07:00
|
|
|
unsigned long clone_flags = args->flags;
|
|
|
|
unsigned long sp = args->stack;
|
|
|
|
unsigned long tls = args->tls;
|
2019-11-11 15:03:16 -07:00
|
|
|
struct inactive_task_frame *frame;
|
|
|
|
struct fork_frame *fork_frame;
|
|
|
|
struct pt_regs *childregs;
|
2023-06-12 17:10:55 -07:00
|
|
|
unsigned long new_ssp;
|
2019-11-11 15:03:25 -07:00
|
|
|
int ret = 0;
|
2019-11-11 15:03:16 -07:00
|
|
|
|
|
|
|
childregs = task_pt_regs(p);
|
|
|
|
fork_frame = container_of(childregs, struct fork_frame, regs);
|
|
|
|
frame = &fork_frame->frame;
|
|
|
|
|
2020-09-14 10:04:22 -07:00
|
|
|
frame->bp = encode_frame_pointer(childregs);
|
2023-06-23 15:55:29 -07:00
|
|
|
frame->ret_addr = (unsigned long) ret_from_fork_asm;
|
2019-11-11 15:03:16 -07:00
|
|
|
p->thread.sp = (unsigned long) fork_frame;
|
2019-11-11 15:03:21 -07:00
|
|
|
p->thread.io_bitmap = NULL;
|
2021-09-17 02:20:04 -07:00
|
|
|
p->thread.iopl_warn = 0;
|
2019-11-11 15:03:16 -07:00
|
|
|
memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
|
|
|
|
|
|
|
|
#ifdef CONFIG_X86_64
|
2020-05-28 13:13:53 -07:00
|
|
|
current_save_fsgs();
|
|
|
|
p->thread.fsindex = current->thread.fsindex;
|
|
|
|
p->thread.fsbase = current->thread.fsbase;
|
|
|
|
p->thread.gsindex = current->thread.gsindex;
|
|
|
|
p->thread.gsbase = current->thread.gsbase;
|
|
|
|
|
2019-11-11 15:03:16 -07:00
|
|
|
savesegment(es, p->thread.es);
|
|
|
|
savesegment(ds, p->thread.ds);
|
2023-03-12 04:26:03 -07:00
|
|
|
|
|
|
|
if (p->mm && (clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM)
|
|
|
|
set_bit(MM_CONTEXT_LOCK_LAM, &p->mm->context.flags);
|
2019-11-11 15:03:16 -07:00
|
|
|
#else
|
|
|
|
p->thread.sp0 = (unsigned long) (childregs + 1);
|
2022-03-25 08:39:52 -07:00
|
|
|
savesegment(gs, p->thread.gs);
|
2019-11-11 15:03:16 -07:00
|
|
|
/*
|
|
|
|
* Clear all status flags including IF and set fixed bit. 64bit
|
|
|
|
* does not have this initialization as the frame does not contain
|
|
|
|
* flags. The flags consistency (especially vs. AC) is there
|
|
|
|
* ensured via objtool, which lacks 32bit support.
|
|
|
|
*/
|
|
|
|
frame->flags = X86_EFLAGS_FIXED;
|
|
|
|
#endif
|
|
|
|
|
2023-06-12 17:10:55 -07:00
|
|
|
/*
|
|
|
|
* Allocate a new shadow stack for the thread if needed. If shadow stack
|
|
|
|
* is disabled, new_ssp will remain 0, and fpu_clone() will know not to
|
|
|
|
* update it.
|
|
|
|
*/
|
|
|
|
new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
|
|
|
|
if (IS_ERR_VALUE(new_ssp))
|
|
|
|
return PTR_ERR((void *)new_ssp);
|
|
|
|
|
|
|
|
fpu_clone(p, clone_flags, args->fn, new_ssp);
|
2021-10-14 18:16:04 -07:00
|
|
|
|
2019-11-11 15:03:16 -07:00
|
|
|
/* Kernel thread ? */
|
2021-05-05 04:03:10 -07:00
|
|
|
if (unlikely(p->flags & PF_KTHREAD)) {
|
2021-06-23 05:02:18 -07:00
|
|
|
p->thread.pkru = pkru_get_init_value();
|
2019-11-11 15:03:16 -07:00
|
|
|
memset(childregs, 0, sizeof(struct pt_regs));
|
2022-04-12 08:18:48 -07:00
|
|
|
kthread_frame_init(frame, args->fn, args->fn_arg);
|
2019-11-11 15:03:16 -07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-06-23 05:02:18 -07:00
|
|
|
/*
|
|
|
|
* Clone current's PKRU value from hardware. tsk->thread.pkru
|
|
|
|
* is only valid when scheduled out.
|
|
|
|
*/
|
|
|
|
p->thread.pkru = read_pkru();
|
|
|
|
|
2019-11-11 15:03:16 -07:00
|
|
|
frame->bx = 0;
|
|
|
|
*childregs = *current_pt_regs();
|
|
|
|
childregs->ax = 0;
|
|
|
|
if (sp)
|
|
|
|
childregs->sp = sp;
|
|
|
|
|
2022-04-12 08:18:48 -07:00
|
|
|
if (unlikely(args->fn)) {
|
2021-05-05 04:03:10 -07:00
|
|
|
/*
|
2022-04-12 08:18:48 -07:00
|
|
|
* A user space thread, but it doesn't return to
|
|
|
|
* ret_after_fork().
|
2021-05-05 04:03:10 -07:00
|
|
|
*
|
|
|
|
* In order to indicate that to tools like gdb,
|
|
|
|
* we reset the stack and instruction pointers.
|
|
|
|
*
|
|
|
|
* It does the same kernel frame setup to return to a kernel
|
|
|
|
* function that a kernel thread does.
|
|
|
|
*/
|
|
|
|
childregs->sp = 0;
|
|
|
|
childregs->ip = 0;
|
2022-04-12 08:18:48 -07:00
|
|
|
kthread_frame_init(frame, args->fn, args->fn_arg);
|
2021-05-05 04:03:10 -07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-11-11 15:03:16 -07:00
|
|
|
/* Set a new TLS for the child thread? */
|
2019-11-11 15:03:25 -07:00
|
|
|
if (clone_flags & CLONE_SETTLS)
|
2019-11-11 15:03:16 -07:00
|
|
|
ret = set_new_tls(p, tls);
|
2019-11-11 15:03:25 -07:00
|
|
|
|
|
|
|
if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
|
|
|
|
io_bitmap_share(p);
|
|
|
|
|
2019-11-11 15:03:16 -07:00
|
|
|
return ret;
|
|
|
|
}
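copy_thread() above recovers the enclosing fork_frame from the child's pt_regs pointer with container_of(). The following is a minimal userspace sketch of that pointer arithmetic, not the kernel's definitions: the demo_ structs are hypothetical stand-ins with far fewer fields, and demo_container_of() omits the type checking of the real macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct demo_pt_regs {
	unsigned long ax;
	unsigned long ip;
	unsigned long sp;
};

struct demo_inactive_task_frame {
	unsigned long bx;
	unsigned long bp;
	unsigned long ret_addr;
};

/* The switch frame sits directly below the saved user registers. */
struct demo_fork_frame {
	struct demo_inactive_task_frame frame;
	struct demo_pt_regs regs;
};

/* Same idea as the kernel macro: step back by the member's offset. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Given a pointer to the embedded regs, return the enclosing fork frame. */
static struct demo_fork_frame *demo_fork_frame_of(struct demo_pt_regs *regs)
{
	return demo_container_of(regs, struct demo_fork_frame, regs);
}
```

Calling demo_fork_frame_of(&ff.regs) on a struct demo_fork_frame ff yields &ff, which is why the code above can fill in frame->bp and frame->ret_addr starting from nothing but task_pt_regs(p).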
|
|
|
|
|
2021-06-23 05:02:13 -07:00
|
|
|
static void pkru_flush_thread(void)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* If PKRU is enabled the default PKRU value has to be loaded into
|
|
|
|
* the hardware right here (similar to context switch).
|
|
|
|
*/
|
|
|
|
pkru_write_default();
|
|
|
|
}
|
|
|
|
|
2009-02-27 14:25:28 -07:00
|
|
|
void flush_thread(void)
|
|
|
|
{
|
|
|
|
struct task_struct *tsk = current;
|
|
|
|
|
2009-09-09 10:22:48 -07:00
|
|
|
flush_ptrace_hw_breakpoint(tsk);
|
2009-02-27 14:25:28 -07:00
|
|
|
memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
|
2015-01-19 11:52:12 -07:00
|
|
|
|
2021-06-23 05:02:12 -07:00
|
|
|
fpu_flush_thread();
|
2021-06-23 05:02:13 -07:00
|
|
|
pkru_flush_thread();
|
2009-02-27 14:25:28 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
void disable_TSC(void)
|
|
|
|
{
|
|
|
|
preempt_disable();
|
|
|
|
if (!test_and_set_thread_flag(TIF_NOTSC))
|
|
|
|
/*
|
|
|
|
* Must flip the CPU state synchronously with
|
|
|
|
* TIF_NOTSC in the current running context.
|
|
|
|
*/
|
2017-02-14 01:11:04 -07:00
|
|
|
cr4_set_bits(X86_CR4_TSD);
|
2009-02-27 14:25:28 -07:00
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
static void enable_TSC(void)
|
|
|
|
{
|
|
|
|
preempt_disable();
|
|
|
|
if (test_and_clear_thread_flag(TIF_NOTSC))
|
|
|
|
/*
|
|
|
|
* Must flip the CPU state synchronously with
|
|
|
|
* TIF_NOTSC in the current running context.
|
|
|
|
*/
|
2017-02-14 01:11:04 -07:00
|
|
|
cr4_clear_bits(X86_CR4_TSD);
|
2009-02-27 14:25:28 -07:00
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
int get_tsc_mode(unsigned long adr)
|
|
|
|
{
|
|
|
|
unsigned int val;
|
|
|
|
|
|
|
|
if (test_thread_flag(TIF_NOTSC))
|
|
|
|
val = PR_TSC_SIGSEGV;
|
|
|
|
else
|
|
|
|
val = PR_TSC_ENABLE;
|
|
|
|
|
|
|
|
return put_user(val, (unsigned int __user *)adr);
|
|
|
|
}
|
|
|
|
|
|
|
|
int set_tsc_mode(unsigned int val)
|
|
|
|
{
|
|
|
|
if (val == PR_TSC_SIGSEGV)
|
|
|
|
disable_TSC();
|
|
|
|
else if (val == PR_TSC_ENABLE)
|
|
|
|
enable_TSC();
|
|
|
|
else
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-03-20 01:16:26 -07:00
|
|
|
DEFINE_PER_CPU(u64, msr_misc_features_shadow);
|
|
|
|
|
|
|
|
static void set_cpuid_faulting(bool on)
|
|
|
|
{
|
|
|
|
u64 msrval;
|
|
|
|
|
|
|
|
msrval = this_cpu_read(msr_misc_features_shadow);
|
|
|
|
msrval &= ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
|
|
|
|
msrval |= (on << MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT);
|
|
|
|
this_cpu_write(msr_misc_features_shadow, msrval);
|
|
|
|
wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
|
|
|
|
}
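set_cpuid_faulting() above is a clear-then-set update of one bit in a cached MSR value. A small sketch of just that arithmetic follows; DEMO_CPUID_FAULT_BIT is an assumed bit position chosen for illustration, and the function returns the new value instead of writing a per-CPU shadow or an MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define DEMO_CPUID_FAULT_BIT	0
#define DEMO_CPUID_FAULT	(UINT64_C(1) << DEMO_CPUID_FAULT_BIT)

/* Return 'shadow' with the CPUID-fault bit forced to the value of 'on'. */
static uint64_t demo_update_cpuid_fault(uint64_t shadow, bool on)
{
	shadow &= ~DEMO_CPUID_FAULT;			/* drop the old state */
	shadow |= ((uint64_t)on << DEMO_CPUID_FAULT_BIT);	/* install the new one */
	return shadow;
}
```

Clearing first makes the result independent of the previous state of the bit, so the same two statements serve both the enable and the disable path, and all other bits of the cached value pass through unchanged.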
|
|
|
|
|
|
|
|
static void disable_cpuid(void)
|
|
|
|
{
|
|
|
|
preempt_disable();
|
|
|
|
if (!test_and_set_thread_flag(TIF_NOCPUID)) {
|
|
|
|
/*
|
|
|
|
* Must flip the CPU state synchronously with
|
|
|
|
* TIF_NOCPUID in the current running context.
|
|
|
|
*/
|
|
|
|
set_cpuid_faulting(true);
|
|
|
|
}
|
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
static void enable_cpuid(void)
|
|
|
|
{
|
|
|
|
preempt_disable();
|
|
|
|
if (test_and_clear_thread_flag(TIF_NOCPUID)) {
|
|
|
|
/*
|
|
|
|
* Must flip the CPU state synchronously with
|
|
|
|
* TIF_NOCPUID in the current running context.
|
|
|
|
*/
|
|
|
|
set_cpuid_faulting(false);
|
|
|
|
}
|
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
static int get_cpuid_mode(void)
|
|
|
|
{
|
|
|
|
return !test_thread_flag(TIF_NOCPUID);
|
|
|
|
}
|
|
|
|
|
2022-05-12 05:04:08 -07:00
|
|
|
static int set_cpuid_mode(unsigned long cpuid_enabled)
|
2017-03-20 01:16:26 -07:00
|
|
|
{
|
2019-03-29 11:52:59 -07:00
|
|
|
if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
|
2017-03-20 01:16:26 -07:00
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
if (cpuid_enabled)
|
|
|
|
enable_cpuid();
|
|
|
|
else
|
|
|
|
disable_cpuid();
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Called immediately after a successful exec.
|
|
|
|
*/
|
|
|
|
void arch_setup_new_exec(void)
|
|
|
|
{
|
|
|
|
/* If cpuid was previously disabled for this task, re-enable it. */
|
|
|
|
if (test_thread_flag(TIF_NOCPUID))
|
|
|
|
enable_cpuid();
|
2019-01-16 15:01:36 -07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Don't inherit TIF_SSBD across exec boundary when
|
|
|
|
* PR_SPEC_DISABLE_NOEXEC is used.
|
|
|
|
*/
|
|
|
|
if (test_thread_flag(TIF_SSBD) &&
|
|
|
|
task_spec_ssb_noexec(current)) {
|
|
|
|
clear_thread_flag(TIF_SSBD);
|
|
|
|
task_clear_spec_ssb_disable(current);
|
|
|
|
task_clear_spec_ssb_noexec(current);
|
2021-11-29 06:06:53 -07:00
|
|
|
speculation_ctrl_update(read_thread_flags());
|
2019-01-16 15:01:36 -07:00
|
|
|
}
|
2023-03-12 04:26:01 -07:00
|
|
|
|
|
|
|
mm_reset_untag_mask(current->mm);
|
2017-03-20 01:16:26 -07:00
|
|
|
}
|
|
|
|
|
2019-11-12 13:40:33 -07:00
|
|
|
#ifdef CONFIG_X86_IOPL_IOPERM
|
2019-11-11 15:03:23 -07:00
|
|
|
static inline void switch_to_bitmap(unsigned long tifp)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Invalidate I/O bitmap if the previous task used it. This prevents
|
|
|
|
* any possible leakage of an active I/O bitmap.
|
|
|
|
*
|
|
|
|
* If the next task has an I/O bitmap it will handle it on exit to
|
|
|
|
* user mode.
|
|
|
|
*/
|
|
|
|
if (tifp & _TIF_IO_BITMAP)
|
2020-07-17 16:53:55 -07:00
|
|
|
tss_invalidate_io_bitmap();
|
2019-11-11 15:03:23 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void tss_copy_io_bitmap(struct tss_struct *tss, struct io_bitmap *iobm)
|
2019-11-11 15:03:22 -07:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Copy at least the byte range of the incoming task's bitmap which
|
|
|
|
* covers the permitted I/O ports.
|
|
|
|
*
|
|
|
|
* If the previous task which used an I/O bitmap had more bits
|
|
|
|
* permitted, then the copy needs to cover those as well so they
|
|
|
|
* get turned off.
|
|
|
|
*/
|
|
|
|
memcpy(tss->io_bitmap.bitmap, iobm->bitmap,
|
|
|
|
max(tss->io_bitmap.prev_max, iobm->max));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Store the new max and the sequence number of this bitmap
|
|
|
|
* and a pointer to the bitmap itself.
|
|
|
|
*/
|
|
|
|
tss->io_bitmap.prev_max = iobm->max;
|
|
|
|
tss->io_bitmap.prev_sequence = iobm->sequence;
|
|
|
|
}
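The comment in tss_copy_io_bitmap() explains why the copy length is the larger of the previous and the new maximum. The sketch below replays that rule on a toy 16-byte bitmap; the demo_ types and the size are hypothetical. It relies on the x86 convention that a cleared bit grants port access and a set bit denies it, and it assumes each task bitmap is filled with 0xff beyond its own max.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BITMAP_BYTES 16	/* hypothetical, far smaller than the real bitmap */

struct demo_io_bitmap {
	unsigned long long sequence;
	unsigned int max;		/* bytes that carry permitted ports */
	unsigned char bitmap[DEMO_BITMAP_BYTES];
};

struct demo_tss_bitmap {
	unsigned long long prev_sequence;
	unsigned int prev_max;
	unsigned char bitmap[DEMO_BITMAP_BYTES];
};

static void demo_copy_io_bitmap(struct demo_tss_bitmap *tss,
				const struct demo_io_bitmap *iobm)
{
	/* Cover the previous task's range too, so its grants are revoked. */
	unsigned int len = tss->prev_max > iobm->max ? tss->prev_max : iobm->max;

	memcpy(tss->bitmap, iobm->bitmap, len);

	/* Remember what is now installed. */
	tss->prev_max = iobm->max;
	tss->prev_sequence = iobm->sequence;
}
```

If a task that permits 8 bytes' worth of ports is followed by one that permits only 2, copying just 2 bytes would leave bytes 2 to 7 of the installed bitmap still granting access. Copying max(prev_max, max) bytes overwrites them with the incoming task's 0xff filler.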
|
|
|
|
|
2019-11-11 15:03:23 -07:00
|
|
|
/**
|
2022-04-13 23:21:10 -07:00
|
|
|
* native_tss_update_io_bitmap - Update I/O bitmap before exiting to user mode
|
2019-11-11 15:03:23 -07:00
|
|
|
*/
|
2020-02-18 08:47:12 -07:00
|
|
|
void native_tss_update_io_bitmap(void)
|
2017-02-14 01:11:02 -07:00
|
|
|
{
|
2018-11-25 11:33:47 -07:00
|
|
|
struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
|
2019-11-30 08:00:53 -07:00
|
|
|
struct thread_struct *t = &current->thread;
|
2019-11-11 15:03:28 -07:00
|
|
|
u16 *base = &tss->x86_tss.io_bitmap_base;
|
2018-11-25 11:33:47 -07:00
|
|
|
|
2019-11-30 08:00:53 -07:00
|
|
|
if (!test_thread_flag(TIF_IO_BITMAP)) {
|
2020-07-17 16:53:55 -07:00
|
|
|
native_tss_invalidate_io_bitmap();
|
2019-11-30 08:00:53 -07:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (IS_ENABLED(CONFIG_X86_IOPL_IOPERM) && t->iopl_emul == 3) {
|
|
|
|
*base = IO_BITMAP_OFFSET_VALID_ALL;
|
|
|
|
} else {
|
|
|
|
struct io_bitmap *iobm = t->io_bitmap;
|
|
|
|
|
2017-02-14 01:11:02 -07:00
|
|
|
/*
|
2019-11-30 08:00:53 -07:00
|
|
|
* Only copy bitmap data when the sequence number differs. The
|
|
|
|
* update time is accounted to the incoming task.
|
2017-02-14 01:11:02 -07:00
|
|
|
*/
|
2019-11-30 08:00:53 -07:00
|
|
|
if (tss->io_bitmap.prev_sequence != iobm->sequence)
|
|
|
|
tss_copy_io_bitmap(tss, iobm);
|
|
|
|
|
|
|
|
/* Enable the bitmap */
|
|
|
|
*base = IO_BITMAP_OFFSET_VALID_MAP;
|
2017-02-14 01:11:02 -07:00
|
|
|
}
|
2019-11-30 08:00:53 -07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure that the TSS limit is covering the IO bitmap. It might have
|
|
|
|
* been cut down by a VMEXIT to 0x67 which would cause a subsequent I/O
|
2024-01-02 17:40:11 -07:00
|
|
|
* access from user space to trigger a #GP because the bitmap is outside
|
2019-11-30 08:00:53 -07:00
|
|
|
* the TSS limit.
|
|
|
|
*/
|
|
|
|
refresh_tss_limit();
|
2017-02-14 01:11:02 -07:00
|
|
|
}
|
2019-11-12 13:40:33 -07:00
|
|
|
#else /* CONFIG_X86_IOPL_IOPERM */
|
|
|
|
static inline void switch_to_bitmap(unsigned long tifp) { }
|
|
|
|
#endif
|
2017-02-14 01:11:02 -07:00
|
|
|
|
2018-05-09 12:53:09 -07:00
|
|
|
#ifdef CONFIG_SMP
|
|
|
|
|
|
|
|
struct ssb_state {
|
|
|
|
struct ssb_state *shared_state;
|
|
|
|
raw_spinlock_t lock;
|
|
|
|
unsigned int disable_state;
|
|
|
|
unsigned long local_state;
|
|
|
|
};
|
|
|
|
|
|
|
|
#define LSTATE_SSB 0
|
|
|
|
|
|
|
|
static DEFINE_PER_CPU(struct ssb_state, ssb_state);
|
|
|
|
|
|
|
|
void speculative_store_bypass_ht_init(void)
|
2018-04-29 06:21:42 -07:00
|
|
|
{
|
2018-05-09 12:53:09 -07:00
|
|
|
struct ssb_state *st = this_cpu_ptr(&ssb_state);
|
|
|
|
unsigned int this_cpu = smp_processor_id();
|
|
|
|
unsigned int cpu;
|
|
|
|
|
|
|
|
st->local_state = 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Shared state setup happens once on the first bringup
|
|
|
|
* of the CPU. It's not destroyed on CPU hotunplug.
|
|
|
|
*/
|
|
|
|
if (st->shared_state)
|
|
|
|
return;
|
|
|
|
|
|
|
|
raw_spin_lock_init(&st->lock);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Go over HT siblings and check whether one of them has set up the
|
|
|
|
* shared state pointer already.
|
|
|
|
*/
|
|
|
|
for_each_cpu(cpu, topology_sibling_cpumask(this_cpu)) {
|
|
|
|
if (cpu == this_cpu)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (!per_cpu(ssb_state, cpu).shared_state)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* Link it to the state of the sibling: */
|
|
|
|
st->shared_state = per_cpu(ssb_state, cpu).shared_state;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* First HT sibling to come up on the core. Link shared state of
|
|
|
|
* the first HT sibling to itself. The siblings on the same core
|
|
|
|
* which come up later will see the shared state pointer and link
|
2021-03-18 07:28:01 -07:00
|
|
|
* themselves to the state of this CPU.
|
2018-05-09 12:53:09 -07:00
|
|
|
*/
|
|
|
|
st->shared_state = st;
|
|
|
|
}
|
2018-04-29 06:21:42 -07:00
|
|
|
|
2018-05-09 12:53:09 -07:00
|
|
|
/*
|
|
|
|
* Logic is: First HT sibling enables SSBD for both siblings in the core
|
|
|
|
* and last sibling to disable it, disables it for the whole core. This is how
|
|
|
|
* MSR_SPEC_CTRL works in "hardware":
|
|
|
|
*
|
|
|
|
* CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL
|
|
|
|
*/
|
|
|
|
static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
|
|
|
|
{
|
|
|
|
struct ssb_state *st = this_cpu_ptr(&ssb_state);
|
|
|
|
u64 msr = x86_amd_ls_cfg_base;
|
|
|
|
|
|
|
|
if (!static_cpu_has(X86_FEATURE_ZEN)) {
|
|
|
|
msr |= ssbd_tif_to_amd_ls_cfg(tifn);
|
2018-04-29 06:21:42 -07:00
|
|
|
wrmsrl(MSR_AMD64_LS_CFG, msr);
|
2018-05-09 12:53:09 -07:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (tifn & _TIF_SSBD) {
|
|
|
|
/*
|
|
|
|
* Since this can race with prctl(), block reentry on the
|
|
|
|
* same CPU.
|
|
|
|
*/
|
|
|
|
if (__test_and_set_bit(LSTATE_SSB, &st->local_state))
|
|
|
|
return;
|
|
|
|
|
|
|
|
msr |= x86_amd_ls_cfg_ssbd_mask;
|
|
|
|
|
|
|
|
raw_spin_lock(&st->shared_state->lock);
|
|
|
|
/* First sibling enables SSBD: */
|
|
|
|
if (!st->shared_state->disable_state)
|
|
|
|
wrmsrl(MSR_AMD64_LS_CFG, msr);
|
|
|
|
st->shared_state->disable_state++;
|
|
|
|
raw_spin_unlock(&st->shared_state->lock);
|
2018-04-29 06:21:42 -07:00
|
|
|
} else {
|
2018-05-09 12:53:09 -07:00
|
|
|
if (!__test_and_clear_bit(LSTATE_SSB, &st->local_state))
|
|
|
|
return;
|
|
|
|
|
|
|
|
raw_spin_lock(&st->shared_state->lock);
|
|
|
|
st->shared_state->disable_state--;
|
|
|
|
if (!st->shared_state->disable_state)
|
|
|
|
wrmsrl(MSR_AMD64_LS_CFG, msr);
|
|
|
|
raw_spin_unlock(&st->shared_state->lock);
|
2018-04-29 06:21:42 -07:00
|
|
|
}
|
|
|
|
}
|
2018-05-09 12:53:09 -07:00
|
|
|
#else
|
|
|
|
static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
|
|
|
|
{
|
|
|
|
u64 msr = x86_amd_ls_cfg_base | ssbd_tif_to_amd_ls_cfg(tifn);
|
|
|
|
|
|
|
|
wrmsrl(MSR_AMD64_LS_CFG, msr);
|
|
|
|
}
|
|
|
|
#endif
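The CONFIG_SMP variant above combines two ideas: HT siblings link to one shared state, and a counter makes the first requester enable SSBD and the last one disable it. A single-threaded sketch of that bookkeeping follows. All demo_ names are hypothetical, the MSR write is replaced by a boolean, and the raw spinlock is left out, so this shows only the counting logic and not the concurrency handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ssb_state {
	struct demo_ssb_state *shared_state;
	unsigned int disable_state;	/* siblings currently requesting SSBD */
	bool local_ssb;			/* this thread's own request */
};

/* Hypothetical stand-in for the per-core MSR: true while SSBD is on. */
struct demo_core {
	bool ssbd_on;
};

/* Link to a sibling's shared state if it has one, else to ourselves. */
static void demo_link(struct demo_ssb_state *st, struct demo_ssb_state *sibling)
{
	st->local_ssb = false;
	st->disable_state = 0;
	if (sibling && sibling->shared_state)
		st->shared_state = sibling->shared_state;
	else
		st->shared_state = st;
}

static void demo_set_ssbd(struct demo_ssb_state *st, struct demo_core *core,
			  bool want)
{
	struct demo_ssb_state *sh = st->shared_state;

	if (want) {
		if (st->local_ssb)		/* already counted */
			return;
		st->local_ssb = true;
		if (!sh->disable_state)		/* first sibling enables */
			core->ssbd_on = true;
		sh->disable_state++;
	} else {
		if (!st->local_ssb)		/* nothing to undo */
			return;
		st->local_ssb = false;
		sh->disable_state--;
		if (!sh->disable_state)		/* last sibling disables */
			core->ssbd_on = false;
	}
}
```

With two linked threads, SSBD stays on after the first one drops its request and goes off only when the second one does, matching the CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL rule quoted in the comment.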
|
|
|
|
|
2018-05-17 08:09:18 -07:00
|
|
|
static __always_inline void amd_set_ssb_virt_state(unsigned long tifn)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* SSBD has the same definition in SPEC_CTRL and VIRT_SPEC_CTRL,
|
|
|
|
* so ssbd_tif_to_spec_ctrl() just works.
|
|
|
|
*/
|
|
|
|
wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn));
|
|
|
|
}
|
|
|
|
|
2018-11-25 11:33:35 -07:00
|
|
|
/*
|
|
|
|
* Update the MSRs managing speculation control, during context switch.
|
|
|
|
*
|
|
|
|
* tifp: Previous task's thread flags
|
|
|
|
* tifn: Next task's thread flags
|
|
|
|
*/
|
|
|
|
static __always_inline void __speculation_ctrl_update(unsigned long tifp,
|
|
|
|
unsigned long tifn)
|
2018-05-09 12:53:09 -07:00
|
|
|
{
|
2018-11-25 11:33:46 -07:00
|
|
|
unsigned long tif_diff = tifp ^ tifn;
|
2018-11-25 11:33:35 -07:00
|
|
|
u64 msr = x86_spec_ctrl_base;
|
|
|
|
bool updmsr = false;
|
|
|
|
|
2019-04-14 10:51:06 -07:00
|
|
|
lockdep_assert_irqs_disabled();
|
|
|
|
|
2020-01-05 13:19:43 -07:00
|
|
|
/* Handle change of TIF_SSBD depending on the mitigation method. */
|
|
|
|
if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) {
|
|
|
|
if (tif_diff & _TIF_SSBD)
|
2018-11-25 11:33:35 -07:00
|
|
|
amd_set_ssb_virt_state(tifn);
|
2020-01-05 13:19:43 -07:00
|
|
|
} else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) {
|
|
|
|
if (tif_diff & _TIF_SSBD)
|
2018-11-25 11:33:35 -07:00
|
|
|
amd_set_core_ssb_state(tifn);
|
2020-01-05 13:19:43 -07:00
|
|
|
} else if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
|
|
|
|
static_cpu_has(X86_FEATURE_AMD_SSBD)) {
|
|
|
|
updmsr |= !!(tif_diff & _TIF_SSBD);
|
|
|
|
msr |= ssbd_tif_to_spec_ctrl(tifn);
|
2018-11-25 11:33:35 -07:00
|
|
|
}
|
2018-05-09 12:53:09 -07:00
|
|
|
|
2020-01-05 13:19:43 -07:00
|
|
|
/* Only evaluate TIF_SPEC_IB if conditional STIBP is enabled. */
|
2018-11-25 11:33:46 -07:00
|
|
|
if (IS_ENABLED(CONFIG_SMP) &&
|
|
|
|
static_branch_unlikely(&switch_to_cond_stibp)) {
|
|
|
|
updmsr |= !!(tif_diff & _TIF_SPEC_IB);
|
|
|
|
msr |= stibp_tif_to_spec_ctrl(tifn);
|
|
|
|
}
|
|
|
|
|
2018-11-25 11:33:35 -07:00
|
|
|
if (updmsr)
|
2022-11-30 08:25:51 -07:00
|
|
|
update_spec_ctrl_cond(msr);
|
2018-05-09 12:53:09 -07:00
|
|
|
}
|
|
|
|
|
2018-11-28 02:56:57 -07:00
|
|
|
static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk)
|
2018-05-09 12:53:09 -07:00
|
|
|
{
|
2018-11-28 02:56:57 -07:00
|
|
|
if (test_and_clear_tsk_thread_flag(tsk, TIF_SPEC_FORCE_UPDATE)) {
|
|
|
|
if (task_spec_ssb_disable(tsk))
|
|
|
|
set_tsk_thread_flag(tsk, TIF_SSBD);
|
|
|
|
else
|
|
|
|
clear_tsk_thread_flag(tsk, TIF_SSBD);
|
x86/speculation: Add prctl() control for indirect branch speculation
Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
indirect branch speculation via STIBP and IBPB.
Invocations:
Check indirect branch speculation status with
- prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
Enable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
Disable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
Force disable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
See Documentation/userspace-api/spec_ctrl.rst.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
2018-11-25 11:33:53 -07:00
|
|
|
|
|
|
|
if (task_spec_ib_disable(tsk))
|
|
|
|
set_tsk_thread_flag(tsk, TIF_SPEC_IB);
|
|
|
|
else
|
|
|
|
clear_tsk_thread_flag(tsk, TIF_SPEC_IB);
|
2018-11-28 02:56:57 -07:00
|
|
|
}
|
|
|
|
/* Return the updated threadinfo flags */
|
2021-11-29 06:06:53 -07:00
|
|
|
return read_task_thread_flags(tsk);
|
2018-05-09 12:53:09 -07:00
|
|
|
}
|
2018-04-29 06:21:42 -07:00
|
|
|
|
2018-11-25 11:33:34 -07:00
|
|
|
void speculation_ctrl_update(unsigned long tif)
|
2018-04-29 06:21:42 -07:00
|
|
|
{
|
2019-04-14 10:51:06 -07:00
|
|
|
unsigned long flags;
|
|
|
|
|
2018-11-25 11:33:35 -07:00
|
|
|
/* Forced update. Make sure all relevant TIF flags are different */
|
2019-04-14 10:51:06 -07:00
|
|
|
local_irq_save(flags);
|
2018-11-25 11:33:35 -07:00
|
|
|
__speculation_ctrl_update(~tif, tif);
|
2019-04-14 10:51:06 -07:00
|
|
|
local_irq_restore(flags);
|
2018-04-29 06:21:42 -07:00
|
|
|
}
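speculation_ctrl_update() forces a full update by passing ~tif as the "previous" flags. The sketch below shows why that works, using the same XOR that computes tif_diff in __speculation_ctrl_update(); the DEMO_TIF_ bit positions are hypothetical and chosen only for the example.

```c
#include <assert.h>

#define DEMO_TIF_SSBD		(1UL << 0)	/* hypothetical bit positions */
#define DEMO_TIF_SPEC_IB	(1UL << 1)

/* Bits that differ between the previous and the next task's flags. */
static unsigned long demo_tif_diff(unsigned long tifp, unsigned long tifn)
{
	return tifp ^ tifn;
}
```

Since x ^ ~x has every bit set, demo_tif_diff(~tif, tif) reports all flags as changed, so each "tif_diff & _TIF_..." test in the update path fires regardless of what the hardware state was before.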
|
|
|
|
|
2018-11-28 02:56:57 -07:00
|
|
|
/* Called from seccomp/prctl update */
|
|
|
|
void speculation_ctrl_update_current(void)
|
|
|
|
{
|
|
|
|
preempt_disable();
|
|
|
|
speculation_ctrl_update(speculation_ctrl_update_tif(current));
|
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
2020-04-21 02:20:29 -07:00
|
|
|
static inline void cr4_toggle_bits_irqsoff(unsigned long mask)
|
|
|
|
{
|
|
|
|
unsigned long newval, cr4 = this_cpu_read(cpu_tlbstate.cr4);
|
|
|
|
|
|
|
|
newval = cr4 ^ mask;
|
|
|
|
if (newval != cr4) {
|
|
|
|
this_cpu_write(cpu_tlbstate.cr4, newval);
|
|
|
|
__write_cr4(newval);
|
|
|
|
}
|
|
|
|
}
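cr4_toggle_bits_irqsoff() flips bits in a cached copy of CR4 and touches the register only when the value changes. A userspace sketch of that shadow-register pattern follows; struct demo_cpu and its write counter are hypothetical, and DEMO_CR4_TSD uses bit 2, the position of CR4.TSD.

```c
#include <assert.h>

#define DEMO_CR4_TSD	(1UL << 2)

struct demo_cpu {
	unsigned long cr4_shadow;	/* cached copy, cheap to read */
	unsigned long cr4_hw;		/* stands in for the real register */
	unsigned int hw_writes;		/* counts simulated register writes */
};

static void demo_cr4_toggle_bits(struct demo_cpu *cpu, unsigned long mask)
{
	unsigned long cr4 = cpu->cr4_shadow;
	unsigned long newval = cr4 ^ mask;

	if (newval != cr4) {
		cpu->cr4_shadow = newval;
		cpu->cr4_hw = newval;	/* the expensive step in real life */
		cpu->hw_writes++;
	}
}
```

A toggle suits the context-switch caller because it runs only when the two tasks' flags differ, so flipping the bit always moves the CPU to the state the next task expects.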
|
|
|
|
|
2018-11-25 11:33:47 -07:00
|
|
|
void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
|
2009-02-27 14:25:28 -07:00
|
|
|
{
|
2017-02-14 01:11:02 -07:00
|
|
|
unsigned long tifp, tifn;
|
2009-02-27 14:25:28 -07:00
|
|
|
|
2021-11-29 06:06:53 -07:00
|
|
|
tifn = read_task_thread_flags(next_p);
|
|
|
|
tifp = read_task_thread_flags(prev_p);
|
2019-11-11 15:03:23 -07:00
|
|
|
|
|
|
|
switch_to_bitmap(tifp);
|
2017-02-14 01:11:02 -07:00
|
|
|
|
|
|
|
propagate_user_return_notify(prev_p, next_p);
|
|
|
|
|
2017-02-14 01:11:03 -07:00
|
|
|
if ((tifp & _TIF_BLOCKSTEP || tifn & _TIF_BLOCKSTEP) &&
|
|
|
|
arch_has_block_step()) {
|
|
|
|
unsigned long debugctl, msk;
|
2010-03-25 06:51:51 -07:00
|
|
|
|
2017-02-14 01:11:03 -07:00
|
|
|
rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
|
2010-03-25 06:51:51 -07:00
|
|
|
debugctl &= ~DEBUGCTLMSR_BTF;
|
2017-02-14 01:11:03 -07:00
|
|
|
msk = tifn & _TIF_BLOCKSTEP;
|
|
|
|
debugctl |= (msk >> TIF_BLOCKSTEP) << DEBUGCTLMSR_BTF_SHIFT;
|
|
|
|
wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
|
2010-03-25 06:51:51 -07:00
|
|
|
}
|
2009-02-27 14:25:28 -07:00
|
|
|
|
2017-02-14 01:11:04 -07:00
|
|
|
if ((tifp ^ tifn) & _TIF_NOTSC)
|
2017-11-24 20:29:07 -07:00
|
|
|
cr4_toggle_bits_irqsoff(X86_CR4_TSD);
|
2017-03-20 01:16:26 -07:00
|
|
|
|
|
|
|
if ((tifp ^ tifn) & _TIF_NOCPUID)
|
|
|
|
set_cpuid_faulting(!!(tifn & _TIF_NOCPUID));
|
2018-04-29 06:21:42 -07:00
|
|
|
|
2018-11-28 02:56:57 -07:00
|
|
|
if (likely(!((tifp | tifn) & _TIF_SPEC_FORCE_UPDATE))) {
|
|
|
|
__speculation_ctrl_update(tifp, tifn);
|
|
|
|
} else {
|
|
|
|
speculation_ctrl_update_tif(prev_p);
|
|
|
|
tifn = speculation_ctrl_update_tif(next_p);
|
|
|
|
|
|
|
|
/* Enforce MSR update to ensure consistent state */
|
|
|
|
__speculation_ctrl_update(~tifn, tifn);
|
|
|
|
}
|
2009-02-27 14:25:28 -07:00
|
|
|
}
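The block-step handling in __switch_to_xtra() moves one flag bit from the thread flags into a different bit position of the debug control value. The sketch below isolates that shift-down, shift-up step; the DEMO_ bit positions are assumptions made for the example, and the MSR read and write are replaced by a plain argument and return value.

```c
#include <assert.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_TIF_BLOCKSTEP	25
#define DEMO_TIF_BLOCKSTEP_MASK	(1UL << DEMO_TIF_BLOCKSTEP)
#define DEMO_DEBUGCTL_BTF_SHIFT	1
#define DEMO_DEBUGCTL_BTF	(1UL << DEMO_DEBUGCTL_BTF_SHIFT)

/* Return debugctl with BTF mirroring the next task's block-step flag. */
static unsigned long demo_debugctl_for_next(unsigned long debugctl,
					    unsigned long tifn)
{
	unsigned long msk = tifn & DEMO_TIF_BLOCKSTEP_MASK;

	debugctl &= ~DEMO_DEBUGCTL_BTF;
	/* Normalize the flag to 0 or 1, then move it to the BTF position. */
	debugctl |= (msk >> DEMO_TIF_BLOCKSTEP) << DEMO_DEBUGCTL_BTF_SHIFT;
	return debugctl;
}
```

Shifting instead of branching keeps the update free of conditionals: the flag is reduced to 0 or 1 and then placed at the target bit, while the other bits of the value are preserved.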
|
|
|
|
|
2008-06-09 09:35:28 -07:00
|
|
|
/*
|
|
|
|
* Idle related variables and functions
|
|
|
|
*/
|
2010-11-03 09:06:14 -07:00
|
|
|
unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
|
2008-06-09 09:35:28 -07:00
|
|
|
EXPORT_SYMBOL(boot_option_idle_override);
|
|
|
|
|
2023-01-12 12:43:16 -07:00
|
|
|
/*
|
|
|
|
* We use this if we don't have any better idle routine.
|
|
|
|
*/
|
|
|
|
void __cpuidle default_idle(void)
|
|
|
|
{
|
|
|
|
raw_safe_halt();
|
2023-01-12 12:43:35 -07:00
|
|
|
raw_local_irq_disable();
|
2023-01-12 12:43:16 -07:00
|
|
|
}
|
|
|
|
#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
|
|
|
|
EXPORT_SYMBOL(default_idle);
|
|
|
|
#endif
|
|
|
|
|
|
|
|
DEFINE_STATIC_CALL_NULL(x86_idle, default_idle);
|
|
|
|
|
|
|
|
static bool x86_idle_set(void)
|
|
|
|
{
|
|
|
|
return !!static_call_query(x86_idle);
|
|
|
|
}
|
2008-06-09 09:35:28 -07:00
|
|
|
|
2012-03-25 14:00:04 -07:00
|
|
|
#ifndef CONFIG_SMP
|
2023-02-14 00:05:52 -07:00
|
|
|
static inline void __noreturn play_dead(void)
|
2012-03-25 14:00:04 -07:00
|
|
|
{
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2013-03-21 14:50:03 -07:00
|
|
|
void arch_cpu_idle_enter(void)
|
|
|
|
{
|
2016-12-13 06:14:17 -07:00
|
|
|
tsc_verify_tsc_adjust(false);
|
2013-03-21 14:50:03 -07:00
|
|
|
local_touch_nmi();
|
|
|
|
}
|
2012-03-25 14:00:04 -07:00
|
|
|
|
2023-02-14 00:05:58 -07:00
|
|
|
void __noreturn arch_cpu_idle_dead(void)
|
2013-03-21 14:50:03 -07:00
|
|
|
{
|
|
|
|
play_dead();
|
|
|
|
}
|
2012-03-25 14:00:04 -07:00
|
|
|
|
2013-03-21 14:50:03 -07:00
|
|
|
/*
|
|
|
|
* Called from the generic idle code.
|
|
|
|
*/
|
2023-01-12 12:43:16 -07:00
|
|
|
void __cpuidle arch_cpu_idle(void)
|
2008-06-09 09:35:28 -07:00
|
|
|
{
|
2023-01-12 12:43:16 -07:00
|
|
|
static_call(x86_idle)();
|
2008-06-09 09:35:28 -07:00
|
|
|
}
|
2023-01-05 21:03:42 -07:00
|
|
|
EXPORT_SYMBOL_GPL(arch_cpu_idle);
|
2008-06-09 09:35:28 -07:00
|
|
|
|
2013-02-09 21:08:07 -07:00
|
|
|
#ifdef CONFIG_XEN
|
|
|
|
bool xen_set_default_idle(void)
|
2011-11-21 16:02:02 -07:00
|
|
|
{
|
2023-01-12 12:43:16 -07:00
|
|
|
bool ret = x86_idle_set();
|
2011-11-21 16:02:02 -07:00
|
|
|
|
2023-01-12 12:43:16 -07:00
|
|
|
static_call_update(x86_idle, default_idle);
|
2011-11-21 16:02:02 -07:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
2013-02-09 21:08:07 -07:00
|
|
|
#endif
|
2017-07-17 14:10:28 -07:00
|
|
|
|
2023-04-26 09:37:00 -07:00
|
|
|
struct cpumask cpus_stop_mask;
|
|
|
|
|
2022-03-08 08:30:47 -07:00
|
|
|
void __noreturn stop_this_cpu(void *dummy)
|
2008-11-11 06:33:44 -07:00
|
|
|
{
|
2023-06-15 13:33:52 -07:00
|
|
|
struct cpuinfo_x86 *c = this_cpu_ptr(&cpu_info);
|
2023-04-26 09:37:00 -07:00
|
|
|
unsigned int cpu = smp_processor_id();
|
|
|
|
|
2008-11-11 06:33:44 -07:00
|
|
|
local_irq_disable();
|
2023-04-26 09:37:00 -07:00
|
|
|
|
2008-11-11 06:33:44 -07:00
|
|
|
/*
|
2023-04-26 09:37:00 -07:00
|
|
|
* Remove this CPU from the online mask and disable it
|
|
|
|
* unconditionally. This might be redundant in case that the reboot
|
|
|
|
* vector was handled late and stop_other_cpus() sent an NMI.
|
|
|
|
*
|
|
|
|
* According to the SDM and APM, NMIs can be accepted even after soft
|
|
|
|
* disabling the local APIC.
|
2008-11-11 06:33:44 -07:00
|
|
|
*/
|
2023-04-26 09:37:00 -07:00
|
|
|
set_cpu_online(cpu, false);
|
2008-11-11 06:33:44 -07:00
|
|
|
disable_local_APIC();
|
2023-06-15 13:33:52 -07:00
|
|
|
mcheck_cpu_clear(c);
|
2008-11-11 06:33:44 -07:00
|
|
|
|
2018-01-17 16:41:41 -07:00
|
|
|
/*
|
|
|
|
* Use wbinvd on processors that support SME. This provides support
|
|
|
|
* for performing a successful kexec when going from SME inactive
|
|
|
|
* to SME active (or vice-versa). The cache must be cleared so that
|
|
|
|
* if there are entries with the same physical address, both with and
|
|
|
|
* without the encryption bit, they don't race each other when flushed
|
|
|
|
* and potentially end up with the wrong entry being committed to
|
|
|
|
* memory.
|
2022-02-15 20:44:46 -07:00
|
|
|
*
|
|
|
|
* Test the CPUID bit directly because the machine might've cleared
|
|
|
|
* X86_FEATURE_SME due to cmdline options.
|
2018-01-17 16:41:41 -07:00
|
|
|
*/
|
2023-06-15 13:33:52 -07:00
|
|
|
if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
|
2018-01-17 16:41:41 -07:00
|
|
|
native_wbinvd();
|
2023-04-26 09:37:00 -07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* This brings a cache line back and dirties it, but
|
|
|
|
* native_stop_other_cpus() will overwrite cpus_stop_mask after it
|
|
|
|
* observed that all CPUs reported stop. This write will invalidate
|
|
|
|
* the related cache line on this CPU.
|
|
|
|
*/
|
|
|
|
cpumask_clear_cpu(cpu, &cpus_stop_mask);
|
|
|
|
|
2024-06-14 02:59:01 -07:00
|
|
|
#ifdef CONFIG_SMP
|
|
|
|
if (smp_ops.stop_this_cpu) {
|
|
|
|
smp_ops.stop_this_cpu();
|
|
|
|
unreachable();
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2017-07-17 14:10:28 -07:00
|
|
|
for (;;) {
|
|
|
|
/*
|
2018-01-17 16:41:41 -07:00
|
|
|
* Use native_halt() so that memory contents don't change
|
|
|
|
* (stack usage and variables) after possibly issuing the
|
|
|
|
* native_wbinvd() above.
|
2017-07-17 14:10:28 -07:00
|
|
|
*/
|
2018-01-17 16:41:41 -07:00
|
|
|
native_halt();
|
2017-07-17 14:10:28 -07:00
|
|
|
}
|
2008-04-25 08:39:01 -07:00
|
|
|
}
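The SME test in stop_this_cpu() above can be sketched as a pure function. This is an illustration under stated assumptions, not the kernel's code: the caller supplies the EAX value of CPUID leaf 0x8000001f, whereas the kernel executes CPUID itself and only does so once the extended-level check has passed. The sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SME_LEAF 0x8000001fu  /* leaf number used in the code above */
#define SKETCH_SME_BIT  (1u << 0)    /* EAX bit 0, tested as BIT(0) above */

/*
 * Decide whether the cache flush (WBINVD) is needed before halting.
 * max_ext_leaf: highest extended CPUID leaf the CPU reports.
 * leaf_eax:     EAX of leaf 0x8000001f; ignored if the leaf is absent.
 */
static bool sketch_needs_wbinvd(uint32_t max_ext_leaf, uint32_t leaf_eax)
{
	if (max_ext_leaf < SKETCH_SME_LEAF)
		return false;

	return (leaf_eax & SKETCH_SME_BIT) != 0;
}
```

Testing the raw CPUID bit instead of a cached feature flag matches the comment above: the feature flag may have been cleared by a command line option even though the hardware still supports SME.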
|
|
|
|
|
sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
In Linux-3.9 we removed the mwait_idle() loop:
69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.
But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.
2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.
https://bugzilla.kernel.org/show_bug.cgi?id=60770
So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.
Maintainer notes:
For 3.9, simply revert 69fb3676df
For 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.9+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-14 22:37:34 -07:00
|
|
|
/*
|
2022-06-06 11:03:35 -07:00
|
|
|
* Prefer MWAIT over HALT if MWAIT is supported, MWAIT_CPUID leaf
|
|
|
|
* exists and whenever MONITOR/MWAIT extensions are present there is at
|
|
|
|
* least one C1 substate.
|
2014-01-14 22:37:34 -07:00
|
|
|
*
|
2022-06-06 11:03:35 -07:00
|
|
|
* Do not prefer MWAIT if MONITOR instruction has a bug or idle=nomwait
|
|
|
|
* is passed on the kernel command line.
|
2014-01-14 22:37:34 -07:00
|
|
|
*/
|
2024-02-28 15:20:32 -07:00
|
|
|
static __init bool prefer_mwait_c1_over_halt(void)
|
2014-01-14 22:37:34 -07:00
|
|
|
{
|
2024-02-28 15:20:32 -07:00
|
|
|
const struct cpuinfo_x86 *c = &boot_cpu_data;
|
2022-06-06 11:03:35 -07:00
|
|
|
u32 eax, ebx, ecx, edx;
|
|
|
|
|
2024-02-29 07:23:40 -07:00
|
|
|
/* If override is enforced on the command line, fall back to HALT. */
|
|
|
|
if (boot_option_idle_override != IDLE_NO_OVERRIDE)
|
2024-02-29 07:23:41 -07:00
|
|
|
return false;
|
2014-01-14 22:37:34 -07:00
|
|
|
|
2022-06-06 11:03:35 -07:00
|
|
|
/* MWAIT is not supported on this platform. Fallback to HALT */
|
|
|
|
if (!cpu_has(c, X86_FEATURE_MWAIT))
|
2024-02-29 07:23:41 -07:00
|
|
|
return false;
|
2014-01-14 22:37:34 -07:00
|
|
|
|
2024-02-28 15:13:00 -07:00
|
|
|
/* Monitor has a bug or APIC stops in C1E. Fallback to HALT */
|
|
|
|
if (boot_cpu_has_bug(X86_BUG_MONITOR) || boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E))
|
2024-02-29 07:23:41 -07:00
|
|
|
return false;
|
2014-01-14 22:37:34 -07:00
|
|
|
|
2022-06-06 11:03:35 -07:00
|
|
|
cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If MWAIT extensions are not available, it is safe to use MWAIT
|
|
|
|
* with EAX=0, ECX=0.
|
|
|
|
*/
|
|
|
|
if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED))
|
2024-02-29 07:23:41 -07:00
|
|
|
return true;
|
2022-06-06 11:03:35 -07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If MWAIT extensions are available, there should be at least one
|
|
|
|
* MWAIT C1 substate present.
|
|
|
|
*/
|
2024-02-29 07:23:41 -07:00
|
|
|
return !!(edx & MWAIT_C1_SUBSTATE_MASK);
|
2014-01-14 22:37:34 -07:00
|
|
|
}
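The decision order in prefer_mwait_c1_over_halt() above can be sketched in user space as follows. This is a minimal illustration, not the kernel's implementation: the struct, the sketch_* names, and the two mask values are assumptions standing in for boot_cpu_data, the CPUID leaf 5 output, and the kernel's CPUID5_ECX_EXTENSIONS_SUPPORTED and MWAIT_C1_SUBSTATE_MASK definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: leaf 5 ECX bit 0 and EDX bits 7:4. */
#define SKETCH_ECX_EXTENSIONS_SUPPORTED 0x1u
#define SKETCH_C1_SUBSTATE_MASK         0xf0u

/* Hypothetical snapshot of the inputs the kernel function consults. */
struct sketch_idle_probe {
	bool idle_override;  /* an idle= option was given on the command line */
	bool has_mwait;      /* X86_FEATURE_MWAIT present */
	bool has_mwait_bug;  /* X86_BUG_MONITOR or X86_BUG_AMD_APIC_C1E */
	uint32_t ecx;        /* ECX from the MWAIT CPUID leaf */
	uint32_t edx;        /* EDX from the MWAIT CPUID leaf */
};

/* Returns true when MWAIT C1 should be preferred over HALT. */
static bool sketch_prefer_mwait(const struct sketch_idle_probe *p)
{
	/* Any command line override falls back to HALT. */
	if (p->idle_override)
		return false;

	/* No MWAIT at all: fall back to HALT. */
	if (!p->has_mwait)
		return false;

	/* Known errata make MWAIT unsafe here: fall back to HALT. */
	if (p->has_mwait_bug)
		return false;

	/* Without extensions, MWAIT with EAX=0, ECX=0 is safe to use. */
	if (!(p->ecx & SKETCH_ECX_EXTENSIONS_SUPPORTED))
		return true;

	/* With extensions, require at least one C1 substate. */
	return (p->edx & SKETCH_C1_SUBSTATE_MASK) != 0;
}
```

Each early return rules out one reason to avoid MWAIT, so only the final two checks depend on the CPUID output.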
|
|
|
|
|
|
|
|
/*
|
2015-05-26 01:28:09 -07:00
|
|
|
* MONITOR/MWAIT with no hints, used for default C1 state. This invokes MWAIT
|
|
|
|
* with interrupts enabled and no flags, which is backwards compatible with the
|
|
|
|
* original MWAIT implementation.
|
2014-01-14 22:37:34 -07:00
|
|
|
*/
|
2016-10-07 17:02:55 -07:00
|
|
|
static __cpuidle void mwait_idle(void)
|
2014-01-14 22:37:34 -07:00
|
|
|
{
|
2014-01-18 09:14:44 -07:00
|
|
|
if (!current_set_polling_and_test()) {
|
|
|
|
if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
|
2016-01-28 10:02:51 -07:00
|
|
|
mb(); /* quirk */
|
2014-01-14 22:37:34 -07:00
|
|
|
clflush((void *)&current_thread_info()->flags);
|
2016-01-28 10:02:51 -07:00
|
|
|
mb(); /* quirk */
|
2014-01-18 09:14:44 -07:00
|
|
|
}
|
2014-01-14 22:37:34 -07:00
|
|
|
|
|
|
|
__monitor((void *)&current_thread_info()->flags, 0, 0);
|
2023-01-12 12:43:35 -07:00
|
|
|
if (!need_resched()) {
|
2014-01-14 22:37:34 -07:00
|
|
|
__sti_mwait(0, 0);
|
2023-01-12 12:43:35 -07:00
|
|
|
raw_local_irq_disable();
|
|
|
|
}
|
2014-01-18 09:14:44 -07:00
|
|
|
}
|
|
|
|
__current_clr_polling();
|
2014-01-14 22:37:34 -07:00
}

2024-02-28 15:20:32 -07:00
void __init select_idle_routine(void)
2008-04-25 08:39:01 -07:00
{
2024-02-29 07:23:38 -07:00
	if (boot_option_idle_override == IDLE_POLL) {
Core x86 changes for v6.9:
- The biggest change is the rework of the percpu code,
to support the 'Named Address Spaces' GCC feature,
by Uros Bizjak:
- This allows C code to access GS and FS segment relative
memory via variables declared with such attributes,
which allows the compiler to better optimize those accesses
than the previous inline assembly code.
- The series also includes a number of micro-optimizations
for various percpu access methods, plus a number of
cleanups of %gs accesses in assembly code.
- These changes have been exposed to linux-next testing for
the last ~5 months, with no known regressions in this area.
- Fix/clean up __switch_to()'s broken but accidentally
working handling of FPU switching - which also generates
better code.
- Propagate more RIP-relative addressing in assembly code,
to generate slightly better code.
- Rework the CPU mitigations Kconfig space to be less idiosyncratic,
to make it easier for distros to follow & maintain these options.
- Rework the x86 idle code to cure RCU violations and
to clean up the logic.
- Clean up the vDSO Makefile logic.
- Misc cleanups and fixes.
[ Please note that there's a higher number of merge commits in
this branch (three) than is usual in x86 topic trees. This happened
due to the long testing lifecycle of the percpu changes that
involved 3 merge windows, which generated a longer history
and various interactions with other core x86 changes that we
felt better about to carry in a single branch. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
* tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
x86/idle: Select idle routine only once
x86/idle: Let prefer_mwait_c1_over_halt() return bool
x86/idle: Cleanup idle_setup()
x86/idle: Clean up idle selection
x86/idle: Sanitize X86_BUG_AMD_E400 handling
sched/idle: Conditionally handle tick broadcast in default_idle_call()
x86: Increase brk randomness entropy for 64-bit systems
x86/vdso: Move vDSO to mmap region
x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together
x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o
x86/retpoline: Ensure default return thunk isn't used at runtime
x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32
x86/vdso: Use $(addprefix ) instead of $(foreach )
x86/vdso: Simplify obj-y addition
x86/vdso: Consolidate targets and clean-files
x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK
x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO
x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY
x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY
x86/bugs: Rename CONFIG_SLS => CONFIG_MITIGATION_SLS
...
2024-03-11 19:53:15 -07:00
		if (IS_ENABLED(CONFIG_SMP) && __max_threads_per_core > 1)
2024-02-29 07:23:38 -07:00
			pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
		return;
	}

2024-02-28 15:20:32 -07:00
	/* Required to guard against xen_set_default_idle() */
2024-02-29 07:23:38 -07:00
	if (x86_idle_set())
2008-06-09 07:59:53 -07:00
		return;

2024-02-28 15:20:32 -07:00
	if (prefer_mwait_c1_over_halt()) {
2014-01-14 22:37:34 -07:00
		pr_info("using mwait in idle threads\n");
2023-01-12 12:43:16 -07:00
		static_call_update(x86_idle, mwait_idle);
x86/tdx: Add HLT support for TDX guests
The HLT instruction is a privileged instruction, executing it stops
instruction execution and places the processor in a HALT state. It
is used in kernel for cases like reboot, idle loop and exception fixup
handlers. For the idle case, interrupts will be enabled (using STI)
before the HLT instruction (this is also called safe_halt()).
To support the HLT instruction in TDX guests, it needs to be emulated
using TDVMCALL (hypercall to VMM). More details about it can be found
in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication
Interface (GHCI) specification, section TDVMCALL[Instruction.HLT].
In TDX guests, executing HLT instruction will generate a #VE, which is
used to emulate the HLT instruction. But #VE based emulation will not
work for the safe_halt() flavor, because it requires STI instruction to
be executed just before the TDCALL. Since idle loop is the only user of
safe_halt() variant, handle it as a special case.
To avoid *safe_halt() call in the idle function, define the
tdx_guest_idle() and use it to override the "x86_idle" function pointer
for a valid TDX guest.
Alternative choices like PV ops have been considered for adding
safe_halt() support. But it was rejected because HLT paravirt calls
only exist under PARAVIRT_XXL, and enabling it in TDX guest just for
safe_halt() use case is not worth the cost.
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220405232939.73860-9-kirill.shutemov@linux.intel.com
2022-04-05 16:29:17 -07:00
	} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
		pr_info("using TDX aware idle routine\n");
2023-01-12 12:43:16 -07:00
		static_call_update(x86_idle, tdx_safe_halt);
2024-02-28 15:13:00 -07:00
	} else {
2023-01-12 12:43:16 -07:00
		static_call_update(x86_idle, default_idle);
2024-02-28 15:13:00 -07:00
	}
2008-04-25 08:39:01 -07:00
}
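
The selection order in select_idle_routine() above can be modelled in plain userspace C. This is only a sketch of the decision chain as it reads in the listing: the struct, the enum names and model_select_idle() are hypothetical stand-ins, and the boolean inputs replace the kernel checks (boot_option_idle_override, x86_idle_set(), prefer_mwait_c1_over_halt() and the X86_FEATURE_TDX_GUEST feature test).

```c
#include <stdbool.h>

/* Hypothetical model of the boot-time override values. */
enum idle_override { IDLE_NO_OVERRIDE, IDLE_HALT, IDLE_NOMWAIT, IDLE_POLL };

/* Hypothetical inputs; each field stands in for one kernel-side check. */
struct idle_inputs {
	enum idle_override override; /* models boot_option_idle_override */
	bool already_set;            /* models x86_idle_set() returning true */
	bool prefer_mwait;           /* models prefer_mwait_c1_over_halt() */
	bool tdx_guest;              /* models the TDX guest feature check */
};

enum idle_choice {
	CHOICE_UNCHANGED,    /* function returns without updating x86_idle */
	CHOICE_MWAIT,        /* mwait_idle */
	CHOICE_TDX_HALT,     /* tdx_safe_halt */
	CHOICE_DEFAULT_HALT  /* default_idle */
};

/* Mirrors the order of the checks in the listing: poll, preset, MWAIT, TDX, HALT. */
enum idle_choice model_select_idle(const struct idle_inputs *in)
{
	if (in->override == IDLE_POLL)
		return CHOICE_UNCHANGED;
	if (in->already_set)
		return CHOICE_UNCHANGED;
	if (in->prefer_mwait)
		return CHOICE_MWAIT;
	if (in->tdx_guest)
		return CHOICE_TDX_HALT;
	return CHOICE_DEFAULT_HALT;
}
```

In this model MWAIT wins over the TDX routine whenever both inputs are true, because the MWAIT check comes first in the chain; whether both can be true on real hardware is not something the listing states.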

2016-12-09 11:29:11 -07:00
void amd_e400_c1e_apic_setup(void)
2009-03-16 21:20:34 -07:00
{
2016-12-09 11:29:11 -07:00
	if (boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
		pr_info("Switch to broadcast mode on CPU%d\n", smp_processor_id());
		local_irq_disable();
		tick_broadcast_force();
		local_irq_enable();
	}
2009-03-16 21:20:34 -07:00
}

2016-12-09 11:29:10 -07:00
void __init arch_post_acpi_subsys_init(void)
{
	u32 lo, hi;

	if (!boot_cpu_has_bug(X86_BUG_AMD_E400))
		return;

	/*
	 * AMD E400 detection needs to happen after ACPI has been enabled. If
	 * the machine is affected K8_INTP_C1E_ACTIVE_MASK bits are set in
	 * MSR_K8_INT_PENDING_MSG.
	 */
	rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
	if (!(lo & K8_INTP_C1E_ACTIVE_MASK))
		return;

	boot_cpu_set_bug(X86_BUG_AMD_APIC_C1E);

	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
		mark_tsc_unstable("TSC halt in AMD C1E");
2024-02-28 15:13:00 -07:00

	if (IS_ENABLED(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST_IDLE))
		static_branch_enable(&arch_needs_tick_broadcast);
	pr_info("System has AMD C1E erratum E400. Workaround enabled.\n");
2016-12-09 11:29:10 -07:00
}

2008-04-25 08:39:01 -07:00
static int __init idle_setup(char *str)
{
2008-07-05 04:53:36 -07:00
	if (!str)
		return -EINVAL;

2008-04-25 08:39:01 -07:00
	if (!strcmp(str, "poll")) {
2012-05-21 19:50:07 -07:00
		pr_info("using polling idle threads\n");
2010-11-03 09:06:14 -07:00
		boot_option_idle_override = IDLE_POLL;
2013-03-21 14:50:03 -07:00
		cpu_idle_poll_ctrl(true);
2010-11-03 09:06:14 -07:00
	} else if (!strcmp(str, "halt")) {
2024-02-29 07:23:40 -07:00
		/* 'idle=halt' HALT for idle. C-states are disabled. */
2010-11-03 09:06:14 -07:00
		boot_option_idle_override = IDLE_HALT;
2008-06-24 03:01:09 -07:00
	} else if (!strcmp(str, "nomwait")) {
2024-02-29 07:23:40 -07:00
		/* 'idle=nomwait' disables MWAIT for idle */
2010-11-03 09:06:14 -07:00
		boot_option_idle_override = IDLE_NOMWAIT;
2024-02-29 07:23:40 -07:00
	} else {
		return -EINVAL;
	}
2008-04-25 08:39:01 -07:00

	return 0;
}
early_param("idle", idle_setup);
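
The mapping that idle_setup() applies to the idle= boot parameter can be exercised outside the kernel with a small stand-in. This is a sketch only: parse_idle_option() and its enum are hypothetical names, the return value -1 replaces -EINVAL, and the side effects of the real handler (pr_info() and cpu_idle_poll_ctrl()) are left out.

```c
#include <string.h>

/* Hypothetical model of the override values set by the handler. */
enum idle_override { IDLE_NO_OVERRIDE, IDLE_HALT, IDLE_NOMWAIT, IDLE_POLL };

/*
 * Returns 0 and stores the selected override in *out for "poll", "halt"
 * and "nomwait". Returns -1 for a missing or unrecognised value and
 * leaves *out untouched.
 */
int parse_idle_option(const char *str, enum idle_override *out)
{
	if (!str)
		return -1;

	if (!strcmp(str, "poll"))
		*out = IDLE_POLL;
	else if (!strcmp(str, "halt"))
		*out = IDLE_HALT;
	else if (!strcmp(str, "nomwait"))
		*out = IDLE_NOMWAIT;
	else
		return -1;

	return 0;
}
```

Note that "mwait" is rejected like any other unknown string, which matches the listing: the handler has no branch for it.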

2009-05-11 19:05:28 -07:00
unsigned long arch_align_stack(unsigned long sp)
{
	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
2022-10-09 19:44:02 -07:00
		sp -= get_random_u32_below(8192);
2009-05-11 19:05:28 -07:00
	return sp & ~0xf;
}
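
The arithmetic in arch_align_stack() is easy to check in isolation. In the sketch below, align_stack_sketch() is a hypothetical userspace helper and the offset argument stands in for the value the kernel obtains from get_random_u32_below(8192); the personality and randomize_va_space checks are omitted.

```c
#include <stdint.h>

/*
 * Move the stack pointer down by an offset below 8192 bytes, then round
 * the result down to a 16-byte boundary by clearing the low four bits.
 */
uintptr_t align_stack_sketch(uintptr_t sp, uint32_t offset)
{
	sp -= offset % 8192;              /* keep the offset in [0, 8192) */
	return sp & ~(uintptr_t)0xf;      /* 16-byte alignment */
}
```

For example, a stack pointer of 0x7ffd1000 with an offset of 5 becomes 0x7ffd0ffb after the subtraction and 0x7ffd0ff0 after masking.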

unsigned long arch_randomize_brk(struct mm_struct *mm)
{
2024-02-16 23:25:43 -07:00
	if (mmap_is_ia32())
		return randomize_page(mm->brk, SZ_32M);

	return randomize_page(mm->brk, SZ_1G);
2009-05-11 19:05:28 -07:00
}
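
To see what the two window sizes passed to randomize_page() mean in practice, the sketch below counts the page-aligned positions available in each window. It assumes 4 KiB pages and an already page-aligned start address; brk_slots() and randomize_page_sketch() are hypothetical helpers that only approximate the idea of picking a random page inside a window, and the rnd argument stands in for a kernel-supplied random number.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Number of distinct page-aligned positions in a window of `range` bytes. */
uint64_t brk_slots(uint64_t range)
{
	return range >> SKETCH_PAGE_SHIFT;
}

/*
 * Pick a page-aligned address in [start, start + range).
 * `start` is assumed to be page aligned already.
 */
uint64_t randomize_page_sketch(uint64_t start, uint64_t range, uint64_t rnd)
{
	uint64_t slots = brk_slots(range);

	if (slots == 0)
		return start;
	return start + ((rnd % slots) << SKETCH_PAGE_SHIFT);
}
```

Under these assumptions the 32 MiB window used for ia32 tasks offers 8192 positions (13 bits), while the 1 GiB window used otherwise offers 262144 positions (18 bits).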

2015-09-30 01:38:23 -07:00
/*
 * Called from fs/proc with a reference on @p to find the function
 * which called into schedule(). This needs to be done carefully
 * because the task might wake up and we might look at a stack
 * changing under us.
 */
2021-09-29 15:02:14 -07:00
unsigned long __get_wchan(struct task_struct *p)
2015-09-30 01:38:23 -07:00
{
2021-10-22 07:53:02 -07:00
	struct unwind_state state;
	unsigned long addr = 0;
2015-09-30 01:38:23 -07:00

2021-11-19 02:29:47 -07:00
	if (!try_get_task_stack(p))
		return 0;

2021-10-22 07:53:02 -07:00
	for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
	     unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr)
			break;
		if (in_sched_functions(addr))
			continue;
		break;
	}

2021-11-19 02:29:47 -07:00
	put_task_stack(p);

2021-10-22 07:53:02 -07:00
	return addr;
2015-09-30 01:38:23 -07:00
}
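
The frame walk in __get_wchan() amounts to "skip return addresses inside scheduler code and report the first one outside it". The sketch below replays that loop over a plain array. Everything in it is a hypothetical stand-in: the array replaces the kernel unwinder, and the text_range predicate replaces in_sched_functions().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the scheduler's text as one range [lo, hi). */
struct text_range {
	unsigned long lo;
	unsigned long hi;
};

bool in_range(const struct text_range *r, unsigned long addr)
{
	return addr >= r->lo && addr < r->hi;
}

/*
 * frames[] holds return addresses, innermost first; 0 models a frame the
 * unwinder could not resolve. Returns the first address outside the
 * scheduler range, or 0 if an unresolved frame is hit or n is 0.
 */
unsigned long first_non_sched_frame(const unsigned long *frames, size_t n,
				    const struct text_range *sched)
{
	unsigned long addr = 0;

	for (size_t i = 0; i < n; i++) {
		addr = frames[i];
		if (!addr)
			break;
		if (in_range(sched, addr))
			continue;
		break;
	}
	return addr;
}
```

As in the listing, if every frame lies inside the scheduler range the loop simply runs out and the last address examined is returned.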
2017-03-20 01:16:23 -07:00

2022-05-12 05:04:08 -07:00
long do_arch_prctl_common(int option, unsigned long arg2)
2017-03-20 01:16:23 -07:00
{
2017-03-20 01:16:26 -07:00
	switch (option) {
	case ARCH_GET_CPUID:
		return get_cpuid_mode();
	case ARCH_SET_CPUID:
2022-05-12 05:04:08 -07:00
		return set_cpuid_mode(arg2);
2021-10-21 15:55:10 -07:00
	case ARCH_GET_XCOMP_SUPP:
	case ARCH_GET_XCOMP_PERM:
	case ARCH_REQ_XCOMP_PERM:
2022-01-05 05:35:12 -07:00
	case ARCH_GET_XCOMP_GUEST_PERM:
	case ARCH_REQ_XCOMP_GUEST_PERM:
2022-05-12 05:04:08 -07:00
		return fpu_xstate_prctl(option, arg2);
2017-03-20 01:16:26 -07:00
	}

2017-03-20 01:16:23 -07:00
	return -EINVAL;
}