384a746bb5
There was insufficient review and no agreement that this is the right
approach.
There are serious flaws with the implementation that make processes using
mlock() not even work with simple fork() [1] and we get reliable crashes
when rebooting.
Further, simply because we might be unmapping a single PTE of a large
mlocked folio, we shouldn't zero out the whole folio.
... especially because the code can also *corrupt* urelated memory because
kernel_init_pages(page, folio_nr_pages(folio));
Could end up writing outside of the actual folio if we work with a tail
page.
Let's revert it. Once there is agreement that this is the right approach,
the issues were fixed and there was reasonable review and proper testing,
we can consider it again.
[1] https://lkml.kernel.org/r/4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com
Link: https://lkml.kernel.org/r/20240605091710.38961-1-david@redhat.com
Fixes: ba42b524a0
("mm: init_mlocked_on_free_v3")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/lkml/20240528151340.4282-1-00107082@163.com/
Reported-by: Lance Yang <ioworker0@gmail.com>
Closes: https://lkml.kernel.org/r/20240601140917.43562-1-ioworker0@gmail.com
Acked-by: Lance Yang <ioworker0@gmail.com>
Cc: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
382 lines
15 KiB
Plaintext
382 lines
15 KiB
Plaintext
# SPDX-License-Identifier: GPL-2.0-only
|
|
menu "Kernel hardening options"
|
|
|
|
config GCC_PLUGIN_STRUCTLEAK
|
|
bool
|
|
help
|
|
While the kernel is built with warnings enabled for any missed
|
|
stack variable initializations, this warning is silenced for
|
|
anything passed by reference to another function, under the
|
|
occasionally misguided assumption that the function will do
|
|
the initialization. As this regularly leads to exploitable
|
|
flaws, this plugin is available to identify and zero-initialize
|
|
such variables, depending on the chosen level of coverage.
|
|
|
|
This plugin was originally ported from grsecurity/PaX. More
|
|
information at:
|
|
* https://grsecurity.net/
|
|
* https://pax.grsecurity.net/
|
|
|
|
menu "Memory initialization"
|
|
|
|
config CC_HAS_AUTO_VAR_INIT_PATTERN
|
|
def_bool $(cc-option,-ftrivial-auto-var-init=pattern)
|
|
|
|
config CC_HAS_AUTO_VAR_INIT_ZERO_BARE
|
|
def_bool $(cc-option,-ftrivial-auto-var-init=zero)
|
|
|
|
config CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
|
|
# Clang 16 and later warn about using the -enable flag, but it
|
|
# is required before then.
|
|
def_bool $(cc-option,-ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang)
|
|
depends on !CC_HAS_AUTO_VAR_INIT_ZERO_BARE
|
|
|
|
config CC_HAS_AUTO_VAR_INIT_ZERO
|
|
def_bool CC_HAS_AUTO_VAR_INIT_ZERO_BARE || CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
|
|
|
|
choice
|
|
prompt "Initialize kernel stack variables at function entry"
|
|
default GCC_PLUGIN_STRUCTLEAK_BYREF_ALL if COMPILE_TEST && GCC_PLUGINS
|
|
default INIT_STACK_ALL_PATTERN if COMPILE_TEST && CC_HAS_AUTO_VAR_INIT_PATTERN
|
|
default INIT_STACK_ALL_ZERO if CC_HAS_AUTO_VAR_INIT_ZERO
|
|
default INIT_STACK_NONE
|
|
help
|
|
This option enables initialization of stack variables at
|
|
function entry time. This has the possibility to have the
|
|
greatest coverage (since all functions can have their
|
|
variables initialized), but the performance impact depends
|
|
on the function calling complexity of a given workload's
|
|
syscalls.
|
|
|
|
This chooses the level of coverage over classes of potentially
|
|
uninitialized variables. The selected class of variable will be
|
|
initialized before use in a function.
|
|
|
|
config INIT_STACK_NONE
|
|
bool "no automatic stack variable initialization (weakest)"
|
|
help
|
|
Disable automatic stack variable initialization.
|
|
This leaves the kernel vulnerable to the standard
|
|
classes of uninitialized stack variable exploits
|
|
and information exposures.
|
|
|
|
config GCC_PLUGIN_STRUCTLEAK_USER
|
|
bool "zero-init structs marked for userspace (weak)"
|
|
# Plugin can be removed once the kernel only supports GCC 12+
|
|
depends on GCC_PLUGINS && !CC_HAS_AUTO_VAR_INIT_ZERO
|
|
select GCC_PLUGIN_STRUCTLEAK
|
|
help
|
|
Zero-initialize any structures on the stack containing
|
|
a __user attribute. This can prevent some classes of
|
|
uninitialized stack variable exploits and information
|
|
exposures, like CVE-2013-2141:
|
|
https://git.kernel.org/linus/b9e146d8eb3b9eca
|
|
|
|
config GCC_PLUGIN_STRUCTLEAK_BYREF
|
|
bool "zero-init structs passed by reference (strong)"
|
|
# Plugin can be removed once the kernel only supports GCC 12+
|
|
depends on GCC_PLUGINS && !CC_HAS_AUTO_VAR_INIT_ZERO
|
|
depends on !(KASAN && KASAN_STACK)
|
|
select GCC_PLUGIN_STRUCTLEAK
|
|
help
|
|
Zero-initialize any structures on the stack that may
|
|
be passed by reference and had not already been
|
|
explicitly initialized. This can prevent most classes
|
|
of uninitialized stack variable exploits and information
|
|
exposures, like CVE-2017-1000410:
|
|
https://git.kernel.org/linus/06e7e776ca4d3654
|
|
|
|
As a side-effect, this keeps a lot of variables on the
|
|
stack that can otherwise be optimized out, so combining
|
|
this with CONFIG_KASAN_STACK can lead to a stack overflow
|
|
and is disallowed.
|
|
|
|
config GCC_PLUGIN_STRUCTLEAK_BYREF_ALL
|
|
bool "zero-init everything passed by reference (very strong)"
|
|
# Plugin can be removed once the kernel only supports GCC 12+
|
|
depends on GCC_PLUGINS && !CC_HAS_AUTO_VAR_INIT_ZERO
|
|
depends on !(KASAN && KASAN_STACK)
|
|
select GCC_PLUGIN_STRUCTLEAK
|
|
help
|
|
Zero-initialize any stack variables that may be passed
|
|
by reference and had not already been explicitly
|
|
initialized. This is intended to eliminate all classes
|
|
of uninitialized stack variable exploits and information
|
|
exposures.
|
|
|
|
As a side-effect, this keeps a lot of variables on the
|
|
stack that can otherwise be optimized out, so combining
|
|
this with CONFIG_KASAN_STACK can lead to a stack overflow
|
|
and is disallowed.
|
|
|
|
config INIT_STACK_ALL_PATTERN
|
|
bool "pattern-init everything (strongest)"
|
|
depends on CC_HAS_AUTO_VAR_INIT_PATTERN
|
|
depends on !KMSAN
|
|
help
|
|
Initializes everything on the stack (including padding)
|
|
with a specific debug value. This is intended to eliminate
|
|
all classes of uninitialized stack variable exploits and
|
|
information exposures, even variables that were warned about
|
|
having been left uninitialized.
|
|
|
|
Pattern initialization is known to provoke many existing bugs
|
|
related to uninitialized locals, e.g. pointers receive
|
|
non-NULL values, buffer sizes and indices are very big. The
|
|
pattern is situation-specific; Clang on 64-bit uses 0xAA
|
|
repeating for all types and padding except float and double
|
|
which use 0xFF repeating (-NaN). Clang on 32-bit uses 0xFF
|
|
repeating for all types and padding.
|
|
|
|
config INIT_STACK_ALL_ZERO
|
|
bool "zero-init everything (strongest and safest)"
|
|
depends on CC_HAS_AUTO_VAR_INIT_ZERO
|
|
depends on !KMSAN
|
|
help
|
|
Initializes everything on the stack (including padding)
|
|
with a zero value. This is intended to eliminate all
|
|
classes of uninitialized stack variable exploits and
|
|
information exposures, even variables that were warned
|
|
about having been left uninitialized.
|
|
|
|
Zero initialization provides safe defaults for strings
|
|
(immediately NUL-terminated), pointers (NULL), indices
|
|
(index 0), and sizes (0 length), so it is therefore more
|
|
suitable as a production security mitigation than pattern
|
|
initialization.
|
|
|
|
endchoice
|
|
|
|
config GCC_PLUGIN_STRUCTLEAK_VERBOSE
|
|
bool "Report forcefully initialized variables"
|
|
depends on GCC_PLUGIN_STRUCTLEAK
|
|
depends on !COMPILE_TEST # too noisy
|
|
help
|
|
This option will cause a warning to be printed each time the
|
|
structleak plugin finds a variable it thinks needs to be
|
|
initialized. Since not all existing initializers are detected
|
|
by the plugin, this can produce false positive warnings.
|
|
|
|
config GCC_PLUGIN_STACKLEAK
|
|
bool "Poison kernel stack before returning from syscalls"
|
|
depends on GCC_PLUGINS
|
|
depends on HAVE_ARCH_STACKLEAK
|
|
help
|
|
This option makes the kernel erase the kernel stack before
|
|
returning from system calls. This has the effect of leaving
|
|
the stack initialized to the poison value, which both reduces
|
|
the lifetime of any sensitive stack contents and reduces
|
|
potential for uninitialized stack variable exploits or information
|
|
exposures (it does not cover functions reaching the same stack
|
|
depth as prior functions during the same syscall). This blocks
|
|
most uninitialized stack variable attacks, with the performance
|
|
impact being driven by the depth of the stack usage, rather than
|
|
the function calling complexity.
|
|
|
|
The performance impact on a single CPU system kernel compilation
|
|
sees a 1% slowdown, other systems and workloads may vary and you
|
|
are advised to test this feature on your expected workload before
|
|
deploying it.
|
|
|
|
This plugin was ported from grsecurity/PaX. More information at:
|
|
* https://grsecurity.net/
|
|
* https://pax.grsecurity.net/
|
|
|
|
config GCC_PLUGIN_STACKLEAK_VERBOSE
|
|
bool "Report stack depth analysis instrumentation" if EXPERT
|
|
depends on GCC_PLUGIN_STACKLEAK
|
|
depends on !COMPILE_TEST # too noisy
|
|
help
|
|
This option will cause a warning to be printed each time the
|
|
stackleak plugin finds a function it thinks needs to be
|
|
instrumented. This is useful for comparing coverage between
|
|
builds.
|
|
|
|
config STACKLEAK_TRACK_MIN_SIZE
|
|
int "Minimum stack frame size of functions tracked by STACKLEAK"
|
|
default 100
|
|
range 0 4096
|
|
depends on GCC_PLUGIN_STACKLEAK
|
|
help
|
|
The STACKLEAK gcc plugin instruments the kernel code for tracking
|
|
the lowest border of the kernel stack (and for some other purposes).
|
|
It inserts the stackleak_track_stack() call for the functions with
|
|
a stack frame size greater than or equal to this parameter.
|
|
If unsure, leave the default value 100.
|
|
|
|
config STACKLEAK_METRICS
|
|
bool "Show STACKLEAK metrics in the /proc file system"
|
|
depends on GCC_PLUGIN_STACKLEAK
|
|
depends on PROC_FS
|
|
help
|
|
If this is set, STACKLEAK metrics for every task are available in
|
|
the /proc file system. In particular, /proc/<pid>/stack_depth
|
|
shows the maximum kernel stack consumption for the current and
|
|
previous syscalls. Although this information is not precise, it
|
|
can be useful for estimating the STACKLEAK performance impact for
|
|
your workloads.
|
|
|
|
config STACKLEAK_RUNTIME_DISABLE
|
|
bool "Allow runtime disabling of kernel stack erasing"
|
|
depends on GCC_PLUGIN_STACKLEAK
|
|
help
|
|
This option provides 'stack_erasing' sysctl, which can be used in
|
|
runtime to control kernel stack erasing for kernels built with
|
|
CONFIG_GCC_PLUGIN_STACKLEAK.
|
|
|
|
config INIT_ON_ALLOC_DEFAULT_ON
|
|
bool "Enable heap memory zeroing on allocation by default"
|
|
depends on !KMSAN
|
|
help
|
|
This has the effect of setting "init_on_alloc=1" on the kernel
|
|
command line. This can be disabled with "init_on_alloc=0".
|
|
When "init_on_alloc" is enabled, all page allocator and slab
|
|
allocator memory will be zeroed when allocated, eliminating
|
|
many kinds of "uninitialized heap memory" flaws, especially
|
|
heap content exposures. The performance impact varies by
|
|
workload, but most cases see <1% impact. Some synthetic
|
|
workloads have measured as high as 7%.
|
|
|
|
config INIT_ON_FREE_DEFAULT_ON
|
|
bool "Enable heap memory zeroing on free by default"
|
|
depends on !KMSAN
|
|
help
|
|
This has the effect of setting "init_on_free=1" on the kernel
|
|
command line. This can be disabled with "init_on_free=0".
|
|
Similar to "init_on_alloc", when "init_on_free" is enabled,
|
|
all page allocator and slab allocator memory will be zeroed
|
|
when freed, eliminating many kinds of "uninitialized heap memory"
|
|
flaws, especially heap content exposures. The primary difference
|
|
with "init_on_free" is that data lifetime in memory is reduced,
|
|
as anything freed is wiped immediately, making live forensics or
|
|
cold boot memory attacks unable to recover freed memory contents.
|
|
The performance impact varies by workload, but is more expensive
|
|
than "init_on_alloc" due to the negative cache effects of
|
|
touching "cold" memory areas. Most cases see 3-5% impact. Some
|
|
synthetic workloads have measured as high as 8%.
|
|
|
|
config CC_HAS_ZERO_CALL_USED_REGS
|
|
def_bool $(cc-option,-fzero-call-used-regs=used-gpr)
|
|
# https://github.com/ClangBuiltLinux/linux/issues/1766
|
|
# https://github.com/llvm/llvm-project/issues/59242
|
|
depends on !CC_IS_CLANG || CLANG_VERSION > 150006
|
|
|
|
config ZERO_CALL_USED_REGS
|
|
bool "Enable register zeroing on function exit"
|
|
depends on CC_HAS_ZERO_CALL_USED_REGS
|
|
help
|
|
At the end of functions, always zero any caller-used register
|
|
contents. This helps ensure that temporary values are not
|
|
leaked beyond the function boundary. This means that register
|
|
contents are less likely to be available for side channels
|
|
and information exposures. Additionally, this helps reduce the
|
|
number of useful ROP gadgets by about 20% (and removes compiler
|
|
generated "write-what-where" gadgets) in the resulting kernel
|
|
image. This has a less than 1% performance impact on most
|
|
workloads. Image size growth depends on architecture, and should
|
|
be evaluated for suitability. For example, x86_64 grows by less
|
|
than 1%, and arm64 grows by about 5%.
|
|
|
|
endmenu
|
|
|
|
menu "Hardening of kernel data structures"
|
|
|
|
config LIST_HARDENED
|
|
bool "Check integrity of linked list manipulation"
|
|
help
|
|
Minimal integrity checking in the linked-list manipulation routines
|
|
to catch memory corruptions that are not guaranteed to result in an
|
|
immediate access fault.
|
|
|
|
If unsure, say N.
|
|
|
|
config BUG_ON_DATA_CORRUPTION
|
|
bool "Trigger a BUG when data corruption is detected"
|
|
select LIST_HARDENED
|
|
help
|
|
Select this option if the kernel should BUG when it encounters
|
|
data corruption in kernel memory structures when they get checked
|
|
for validity.
|
|
|
|
If unsure, say N.
|
|
|
|
endmenu
|
|
|
|
config CC_HAS_RANDSTRUCT
|
|
def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)
|
|
# Randstruct was first added in Clang 15, but it isn't safe to use until
|
|
# Clang 16 due to https://github.com/llvm/llvm-project/issues/60349
|
|
depends on !CC_IS_CLANG || CLANG_VERSION >= 160000
|
|
|
|
choice
|
|
prompt "Randomize layout of sensitive kernel structures"
|
|
default RANDSTRUCT_FULL if COMPILE_TEST && (GCC_PLUGINS || CC_HAS_RANDSTRUCT)
|
|
default RANDSTRUCT_NONE
|
|
help
|
|
If you enable this, the layouts of structures that are entirely
|
|
function pointers (and have not been manually annotated with
|
|
__no_randomize_layout), or structures that have been explicitly
|
|
marked with __randomize_layout, will be randomized at compile-time.
|
|
This can introduce the requirement of an additional information
|
|
exposure vulnerability for exploits targeting these structure
|
|
types.
|
|
|
|
Enabling this feature will introduce some performance impact,
|
|
slightly increase memory usage, and prevent the use of forensic
|
|
tools like Volatility against the system (unless the kernel
|
|
source tree isn't cleaned after kernel installation).
|
|
|
|
The seed used for compilation is in scripts/basic/randomize.seed.
|
|
It remains after a "make clean" to allow for external modules to
|
|
be compiled with the existing seed and will be removed by a
|
|
"make mrproper" or "make distclean". This file should not be made
|
|
public, or the structure layout can be determined.
|
|
|
|
config RANDSTRUCT_NONE
|
|
bool "Disable structure layout randomization"
|
|
help
|
|
Build normally: no structure layout randomization.
|
|
|
|
config RANDSTRUCT_FULL
|
|
bool "Fully randomize structure layout"
|
|
depends on CC_HAS_RANDSTRUCT || GCC_PLUGINS
|
|
select MODVERSIONS if MODULES
|
|
help
|
|
Fully randomize the member layout of sensitive
|
|
structures as much as possible, which may have both a
|
|
memory size and performance impact.
|
|
|
|
One difference between the Clang and GCC plugin
|
|
implementations is the handling of bitfields. The GCC
|
|
plugin treats them as fully separate variables,
|
|
introducing sometimes significant padding. Clang tries
|
|
to keep adjacent bitfields together, but with their bit
|
|
ordering randomized.
|
|
|
|
config RANDSTRUCT_PERFORMANCE
|
|
bool "Limit randomization of structure layout to cache-lines"
|
|
depends on GCC_PLUGINS
|
|
select MODVERSIONS if MODULES
|
|
help
|
|
Randomization of sensitive kernel structures will make a
|
|
best effort at restricting randomization to cacheline-sized
|
|
groups of members. It will further not randomize bitfields
|
|
in structures. This reduces the performance hit of RANDSTRUCT
|
|
at the cost of weakened randomization.
|
|
endchoice
|
|
|
|
config RANDSTRUCT
|
|
def_bool !RANDSTRUCT_NONE
|
|
|
|
config GCC_PLUGIN_RANDSTRUCT
|
|
def_bool GCC_PLUGINS && RANDSTRUCT
|
|
help
|
|
Use GCC plugin to randomize structure layout.
|
|
|
|
This plugin was ported from grsecurity/PaX. More
|
|
information at:
|
|
* https://grsecurity.net/
|
|
* https://pax.grsecurity.net/
|
|
|
|
endmenu
|