4e58aaeebb
Although the RCU CPU stall notifiers can be useful for dumping state when tracking down delicate forward-progress bugs where NUMA effects cause cache lines to be delivered to a given CPU regularly, but always in a state that prevents that CPU from making forward progress. These bugs can be detected by the RCU CPU stall-warning mechanism, but in some cases, the stall-warnings printk()s disrupt the forward-progress bug before any useful state can be obtained. Unfortunately, the notifier mechanism added by commit5b404fdaba
("rcu: Add RCU CPU stall notifier") can make matters worse if used at all carelessly. For example, if the stall warning was caused by a lock not being released, then any attempt to acquire that lock in the notifier will hang. This will prevent not only the notifier from producing any useful output, but it will also prevent the stall-warning message from ever appearing. This commit therefore hides this new RCU CPU stall notifier mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that depends on both DEBUG_KERNEL and RCU_EXPERT. In addition, the rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must also be specified. The RCU_CPU_STALL_NOTIFIER Kconfig option's help text contains a warning and explains the dangers of careless use, recommending lockless notifier code. In addition, a WARN() is triggered each time that an attempt is made to register a stall-warning notifier in kernels built with CONFIG_RCU_CPU_STALL_NOTIFIER=y. This combination of measures will keep use of this mechanism confined to debug kernels and away from routine deployments. [ paulmck: Apply Dan Carpenter feedback. ] Fixes:5b404fdaba
("rcu: Add RCU CPU stall notifier") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
172 lines
6.0 KiB
Plaintext
172 lines
6.0 KiB
Plaintext
# SPDX-License-Identifier: GPL-2.0-only
|
|
#
|
|
# RCU-related debugging configuration options
|
|
#
|
|
|
|
menu "RCU Debugging"
|
|
|
|
config PROVE_RCU
|
|
def_bool PROVE_LOCKING
|
|
|
|
config PROVE_RCU_LIST
|
|
bool "RCU list lockdep debugging"
|
|
depends on PROVE_RCU && RCU_EXPERT
|
|
default n
|
|
help
|
|
Enable RCU lockdep checking for list usages. By default it is
|
|
turned off since there are several list RCU users that still
|
|
need to be converted to pass a lockdep expression. To prevent
|
|
false-positive splats, we keep it default disabled but once all
|
|
users are converted, we can remove this config option.
|
|
|
|
config TORTURE_TEST
|
|
tristate
|
|
default n
|
|
|
|
config RCU_SCALE_TEST
|
|
tristate "performance tests for RCU"
|
|
depends on DEBUG_KERNEL
|
|
select TORTURE_TEST
|
|
default n
|
|
help
|
|
This option provides a kernel module that runs performance
|
|
tests on the RCU infrastructure. The kernel module may be built
|
|
after the fact on the running kernel to be tested, if desired.
|
|
|
|
Say Y here if you want RCU performance tests to be built into
|
|
the kernel.
|
|
Say M if you want the RCU performance tests to build as a module.
|
|
Say N if you are unsure.
|
|
|
|
config RCU_TORTURE_TEST
|
|
tristate "torture tests for RCU"
|
|
depends on DEBUG_KERNEL
|
|
select TORTURE_TEST
|
|
default n
|
|
help
|
|
This option provides a kernel module that runs torture tests
|
|
on the RCU infrastructure. The kernel module may be built
|
|
after the fact on the running kernel to be tested, if desired.
|
|
|
|
Say Y here if you want RCU torture tests to be built into
|
|
the kernel.
|
|
Say M if you want the RCU torture tests to build as a module.
|
|
Say N if you are unsure.
|
|
|
|
config RCU_REF_SCALE_TEST
|
|
tristate "Scalability tests for read-side synchronization (RCU and others)"
|
|
depends on DEBUG_KERNEL
|
|
select TORTURE_TEST
|
|
default n
|
|
help
|
|
This option provides a kernel module that runs performance tests
|
|
useful comparing RCU with various read-side synchronization mechanisms.
|
|
The kernel module may be built after the fact on the running kernel to be
|
|
tested, if desired.
|
|
|
|
Say Y here if you want these performance tests built into the kernel.
|
|
Say M if you want to build it as a module instead.
|
|
Say N if you are unsure.
|
|
|
|
config RCU_CPU_STALL_TIMEOUT
|
|
int "RCU CPU stall timeout in seconds"
|
|
depends on RCU_STALL_COMMON
|
|
range 3 300
|
|
default 21
|
|
help
|
|
If a given RCU grace period extends more than the specified
|
|
number of seconds, a CPU stall warning is printed. If the
|
|
RCU grace period persists, additional CPU stall warnings are
|
|
printed at more widely spaced intervals.
|
|
|
|
config RCU_EXP_CPU_STALL_TIMEOUT
|
|
int "Expedited RCU CPU stall timeout in milliseconds"
|
|
depends on RCU_STALL_COMMON
|
|
range 0 300000
|
|
default 0
|
|
help
|
|
If a given expedited RCU grace period extends more than the
|
|
specified number of milliseconds, a CPU stall warning is printed.
|
|
If the RCU grace period persists, additional CPU stall warnings
|
|
are printed at more widely spaced intervals. A value of zero
|
|
says to use the RCU_CPU_STALL_TIMEOUT value converted from
|
|
seconds to milliseconds.
|
|
|
|
config RCU_CPU_STALL_CPUTIME
|
|
bool "Provide additional RCU stall debug information"
|
|
depends on RCU_STALL_COMMON
|
|
default n
|
|
help
|
|
Collect statistics during the sampling period, such as the number of
|
|
(hard interrupts, soft interrupts, task switches) and the cputime of
|
|
(hard interrupts, soft interrupts, kernel tasks) are added to the
|
|
RCU stall report. For multiple continuous RCU stalls, all sampling
|
|
periods begin at half of the first RCU stall timeout.
|
|
The boot option rcupdate.rcu_cpu_stall_cputime has the same function
|
|
as this one, but will override this if it exists.
|
|
|
|
config RCU_CPU_STALL_NOTIFIER
|
|
bool "Provide RCU CPU-stall notifiers"
|
|
depends on RCU_STALL_COMMON
|
|
depends on DEBUG_KERNEL
|
|
depends on RCU_EXPERT
|
|
default n
|
|
help
|
|
WARNING: You almost certainly do not want this!!!
|
|
|
|
Enable RCU CPU-stall notifiers, which are invoked just before
|
|
printing the RCU CPU stall warning. As such, bugs in notifier
|
|
callbacks can prevent stall warnings from being printed.
|
|
And the whole reason that a stall warning is being printed is
|
|
that something is hung up somewhere. Therefore, the notifier
|
|
callbacks must be written extremely carefully, preferably
|
|
containing only lockless code. After all, it is quite possible
|
|
that the whole reason that the RCU CPU stall is happening in
|
|
the first place is that someone forgot to release whatever lock
|
|
that you are thinking of acquiring. In which case, having your
|
|
notifier callback acquire that lock will hang, preventing the
|
|
RCU CPU stall warning from appearing.
|
|
|
|
Say Y here if you want RCU CPU stall notifiers (you don't want them)
|
|
Say N if you are unsure.
|
|
|
|
config RCU_TRACE
|
|
bool "Enable tracing for RCU"
|
|
depends on DEBUG_KERNEL
|
|
default y if TREE_RCU
|
|
select TRACE_CLOCK
|
|
help
|
|
This option enables additional tracepoints for ftrace-style
|
|
event tracing.
|
|
|
|
Say Y here if you want to enable RCU tracing
|
|
Say N if you are unsure.
|
|
|
|
config RCU_EQS_DEBUG
|
|
bool "Provide debugging asserts for adding NO_HZ support to an arch"
|
|
depends on DEBUG_KERNEL
|
|
help
|
|
This option provides consistency checks in RCU's handling of
|
|
NO_HZ. These checks have proven quite helpful in detecting
|
|
bugs in arch-specific NO_HZ code.
|
|
|
|
Say N here if you need ultimate kernel/user switch latencies
|
|
Say Y if you are unsure
|
|
|
|
config RCU_STRICT_GRACE_PERIOD
|
|
bool "Provide debug RCU implementation with short grace periods"
|
|
depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4 && !TINY_RCU
|
|
default n
|
|
select PREEMPT_COUNT if PREEMPT=n
|
|
help
|
|
Select this option to build an RCU variant that is strict about
|
|
grace periods, making them as short as it can. This limits
|
|
scalability, destroys real-time response, degrades battery
|
|
lifetime and kills performance. Don't try this on large
|
|
machines, as in systems with more than about 10 or 20 CPUs.
|
|
But in conjunction with tools like KASAN, it can be helpful
|
|
when looking for certain types of RCU usage bugs, for example,
|
|
too-short RCU read-side critical sections.
|
|
|
|
endmenu # "RCU Debugging"
|