2019-05-19 05:07:45 -07:00
|
|
|
# SPDX-License-Identifier: GPL-2.0-only
|
2008-05-12 12:20:42 -07:00
|
|
|
#
|
2008-10-06 16:06:12 -07:00
|
|
|
# Architectures that offer a FUNCTION_TRACER implementation should
|
|
|
|
# select HAVE_FUNCTION_TRACER:
|
2008-05-12 12:20:42 -07:00
|
|
|
#
|
2008-09-21 11:12:14 -07:00
|
|
|
|
2008-11-23 03:39:08 -07:00
|
|
|
config USER_STACKTRACE_SUPPORT
|
|
|
|
bool
|
|
|
|
|
2008-09-21 11:12:14 -07:00
|
|
|
config NOP_TRACER
|
|
|
|
bool
|
|
|
|
|
2022-03-15 07:00:50 -07:00
|
|
|
config HAVE_RETHOOK
|
|
|
|
bool
|
|
|
|
|
|
|
|
config RETHOOK
|
|
|
|
bool
|
|
|
|
depends on HAVE_RETHOOK
|
|
|
|
help
|
|
|
|
Enable generic return hooking feature. This is an internal
|
|
|
|
API, which will be used by other function-entry hooking
|
|
|
|
features like fprobe and kprobes.
|
|
|
|
|
2008-10-06 16:06:12 -07:00
|
|
|
config HAVE_FUNCTION_TRACER
|
2008-05-12 12:20:42 -07:00
|
|
|
bool
|
2009-09-14 17:10:15 -07:00
|
|
|
help
|
2018-05-08 11:14:57 -07:00
|
|
|
See Documentation/trace/ftrace-design.rst
|
2008-05-12 12:20:42 -07:00
|
|
|
|
2008-11-25 13:07:04 -07:00
|
|
|
config HAVE_FUNCTION_GRAPH_TRACER
|
2008-11-10 23:14:25 -07:00
|
|
|
bool
|
2009-09-14 17:10:15 -07:00
|
|
|
help
|
2018-05-08 11:14:57 -07:00
|
|
|
See Documentation/trace/ftrace-design.rst
|
2008-11-10 23:14:25 -07:00
|
|
|
|
function_graph: Support recording and printing the return value of function
Analyzing system call failures with the function_graph tracer can be a
time-consuming process, particularly when locating the kernel function
that first returns an error in the trace logs. This change aims to
simplify the process by recording the function return value to the
'retval' member of 'ftrace_graph_ret' and printing it when outputting
the trace log.
We have introduced new trace options: funcgraph-retval and
funcgraph-retval-hex. The former controls whether to display the return
value, while the latter controls the display format.
Please note that even if a function's return type is void, a return
value will still be printed. You can simply ignore it.
This patch only establishes the fundamental infrastructure. Subsequent
patches will make this feature available on some commonly used processor
architectures.
Here is an example:
I attempted to attach the demo process to a cpu cgroup, but it failed:
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
-bash: echo: write error: Invalid argument
The strace logs indicate that the write system call returned -EINVAL(-22):
...
write(1, "273\n", 4) = -1 EINVAL (Invalid argument)
...
To capture trace logs during a write system call, use the following
commands:
cd /sys/kernel/debug/tracing/
echo 0 > tracing_on
echo > trace
echo *sys_write > set_graph_function
echo *spin* > set_graph_notrace
echo *rcu* >> set_graph_notrace
echo *alloc* >> set_graph_notrace
echo preempt* >> set_graph_notrace
echo kfree* >> set_graph_notrace
echo $$ > set_ftrace_pid
echo function_graph > current_tracer
echo 1 > options/funcgraph-retval
echo 0 > options/funcgraph-retval-hex
echo 1 > tracing_on
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
echo 0 > tracing_on
cat trace > ~/trace.log
To locate the root cause, search for error code -22 directly in the file
trace.log and identify the first function that returned -22. Once you
have identified this function, examine its code to determine the root
cause.
For example, in the trace log below, cpu_cgroup_can_attach
returned -22 first, so we can focus our analysis on this function to
identify the root cause.
...
1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */
...
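The search for the first function that returned -22 can also be scripted. The following is a minimal userspace sketch, not part of the patch: it assumes the log lines have the shape shown in the example above, and the helper names parse_retval_line() and first_returning() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Parse one function_graph line. If it carries a return-value comment,
// copy the function name into name[] and the value into *ret.
// Returns 1 on success, 0 when the line has no return value.
static int parse_retval_line(const char *line, char *name, size_t name_len,
                             long long *ret)
{
    const char *comment = strstr(line, "/* ");
    const char *eq, *start, *end;

    assert(name_len > 0);
    if (!comment)
        return 0;
    comment += 3;
    eq = strstr(comment, "= ");
    if (!eq)
        return 0;

    if (eq == comment) {
        // Leaf call: the name sits between '|' and '('.
        const char *bar = strchr(line, '|');
        if (!bar)
            return 0;
        start = bar + 1;
        while (*start == ' ')
            start++;
        end = strchr(start, '(');
        if (!end || end > comment)
            return 0;
    } else {
        // Closing brace: the name precedes " = " inside the comment.
        start = comment;
        end = eq;
        while (end > start && end[-1] == ' ')
            end--;
    }
    if ((size_t)(end - start) >= name_len)
        return 0;
    memcpy(name, start, (size_t)(end - start));
    name[end - start] = '\0';
    // Base 0 accepts decimal (-22) and hex (0x0). Kernel pointers overflow
    // and clamp to LLONG_MAX, which never matches a negative errno.
    *ret = strtoll(eq + 2, NULL, 0);
    return 1;
}

// Return the name of the first function in the log that returned err,
// or NULL if none did. The name is stored in the caller's buffer.
static const char *first_returning(const char *const *lines, size_t n,
                                   long long err, char *name, size_t name_len)
{
    for (size_t i = 0; i < n; i++) {
        long long ret;

        if (parse_retval_line(lines[i], name, name_len, &ret) && ret == err)
            return name;
    }
    return NULL;
}
```

Because the innermost function closes first in function_graph output, the first match in file order is the function closest to the origin of the error. The sketch relies on funcgraph-retval-hex being 0 so that error codes are printed in decimal.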
Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn
Tested-by: Florian Kauer <florian.kauer@linutronix.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-08 05:42:15 -07:00
|
|
|
config HAVE_FUNCTION_GRAPH_RETVAL
|
|
|
|
bool
|
|
|
|
|
2008-05-16 21:01:36 -07:00
|
|
|
config HAVE_DYNAMIC_FTRACE
|
|
|
|
bool
|
2009-09-14 17:10:15 -07:00
|
|
|
help
|
2018-05-08 11:14:57 -07:00
|
|
|
See Documentation/trace/ftrace-design.rst
|
2008-05-16 21:01:36 -07:00
|
|
|
|
2012-09-28 01:15:17 -07:00
|
|
|
config HAVE_DYNAMIC_FTRACE_WITH_REGS
|
|
|
|
bool
|
|
|
|
|
2019-11-08 11:07:06 -07:00
|
|
|
config HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
|
|
|
|
bool
|
|
|
|
|
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible for calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
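The per-call-site pointer scheme can be modelled in userspace. The following is a rough sketch under stated assumptions, not the arm64 implementation: the struct and function names are hypothetical, the "call site" is a plain byte array, and C11 atomics stand in for the architecture's atomic pointer load and update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct ftrace_ops_sketch;
typedef void (*ops_func_t)(uintptr_t ip, struct ftrace_ops_sketch *ops);

// Stand-in for ftrace_ops: a callback plus some private state.
struct ftrace_ops_sketch {
    ops_func_t func;
    unsigned long hits;
};

// A call site with its ops pointer literal at a fixed offset before
// the patchable instruction bytes.
struct call_site_sketch {
    _Atomic(struct ftrace_ops_sketch *) ops;
    unsigned char insn[4];
};

// Shared trampoline: recover the ops pointer from the call-site address
// alone, then call ops->func directly, without walking any list.
static void shared_trampoline(unsigned char *ip)
{
    struct call_site_sketch *site = (struct call_site_sketch *)
        (ip - offsetof(struct call_site_sketch, insn));
    // One atomic load; func is then read from that same ops, so the
    // callback and its ops argument cannot be mismatched.
    struct ftrace_ops_sketch *ops = atomic_load(&site->ops);

    assert(ops && ops->func);
    ops->func((uintptr_t)ip, ops);
}

// Example callback: count invocations in the ops' private state.
static void count_hits(uintptr_t ip, struct ftrace_ops_sketch *ops)
{
    (void)ip;
    ops->hits++;
}
```

Retargeting a call site is then a single atomic_store() to site->ops; the trampoline itself never changes, which is why no per-ops dynamic trampoline is needed.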
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 06:45:56 -07:00
|
|
|
config HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
|
|
|
|
bool
|
|
|
|
|
2020-10-27 07:55:55 -07:00
|
|
|
config HAVE_DYNAMIC_FTRACE_WITH_ARGS
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
If this is set, then arguments and stack can be found from
|
2022-11-03 10:05:19 -07:00
|
|
|
the ftrace_regs passed into the function callback regs parameter
|
2020-10-27 07:55:55 -07:00
|
|
|
by default, even without setting the REGS flag in the ftrace_ops.
|
2022-11-03 10:05:19 -07:00
|
|
|
This allows for use of ftrace_regs_get_argument() and
|
|
|
|
ftrace_regs_get_stack_pointer().
|
2020-10-27 07:55:55 -07:00
|
|
|
|
2022-09-03 06:11:53 -07:00
|
|
|
config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
If the architecture generates __patchable_function_entries sections
|
|
|
|
but does not want them included in the ftrace locations.
|
|
|
|
|
ftrace: create __mcount_loc section
This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.
For example:
objdump -dr init/main.o
[...]
Disassembly of section .text:
0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]
We will add a section to point to each function call.
.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]
The offset of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.
.text + 0x185 == init_post + 0xa
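The conversion between a section-relative offset and a function-relative offset is plain subtraction. As an illustrative sketch only (the struct and helper names are hypothetical, and the symbol table is filled in by hand from the objdump output above):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One function symbol: its name and start offset within the section.
struct sym_sketch {
    const char *name;
    uint64_t start;
};

// Return the symbol with the highest start offset that is <= off,
// i.e. the function that contains the given section offset.
static const struct sym_sketch *containing_symbol(const struct sym_sketch *syms,
                                                  size_t n, uint64_t off)
{
    const struct sym_sketch *best = NULL;

    assert(syms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (syms[i].start <= off && (!best || syms[i].start > best->start))
            best = &syms[i];
    }
    return best;
}

// Offset of a section-relative address from the start of its function.
static uint64_t offset_in_function(const struct sym_sketch *sym, uint64_t off)
{
    return off - sym->start;
}
```

With do_one_initcall at 0x0 and init_post at 0x17b, offset 0x185 resolves to init_post with a function-relative offset of 0xa, matching the equation above.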
We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.
To do this, we use the start of the first function in the section,
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.
.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]
Then we can compile the tmp.s into a tmp.o
gcc -c tmp.s -o tmp.o
And link it back into main.o.
ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o
But we have a problem. What happens if the first function in a section
is not exported, and is a static function? The linker will not let
the tmp.o use it. This case exists in main.o as well.
Disassembly of section .init.text:
0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc
The first function in .init.text is a static function.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.
.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices
We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worse, it could be exported someplace else,
and then we will have a reference to the wrong location.
To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.
Note: a well-documented Perl script (scripts/recordmcount.pl) is used
to do all this in one location.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 12:45:07 -07:00
|
|
|
config HAVE_FTRACE_MCOUNT_RECORD
|
|
|
|
bool
|
2009-09-14 17:10:15 -07:00
|
|
|
help
|
2018-05-08 11:14:57 -07:00
|
|
|
See Documentation/trace/ftrace-design.rst
|
2008-08-14 12:45:07 -07:00
|
|
|
|
2009-08-24 14:43:11 -07:00
|
|
|
config HAVE_SYSCALL_TRACEPOINTS
|
2009-03-06 21:52:59 -07:00
|
|
|
bool
|
2009-09-14 17:10:15 -07:00
|
|
|
help
|
2018-05-08 11:14:57 -07:00
|
|
|
See Documentation/trace/ftrace-design.rst
|
2009-03-06 21:52:59 -07:00
|
|
|
|
2011-02-09 11:15:59 -07:00
|
|
|
config HAVE_FENTRY
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
Arch supports the gcc options -pg with -mfentry
|
|
|
|
|
2018-08-06 06:17:46 -07:00
|
|
|
config HAVE_NOP_MCOUNT
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
Arch supports the gcc options -pg with -mrecord-mcount and -mnop-mcount
|
|
|
|
|
2020-09-25 16:43:53 -07:00
|
|
|
config HAVE_OBJTOOL_MCOUNT
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
Arch supports objtool --mcount
|
|
|
|
|
2022-11-14 10:57:49 -07:00
|
|
|
config HAVE_OBJTOOL_NOP_MCOUNT
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
Arch supports the objtool options --mcount with --mnop.
|
|
|
|
An architecture can select this if it wants to enable nop'ing
|
|
|
|
of ftrace locations.
|
|
|
|
|
2010-10-14 20:32:44 -07:00
|
|
|
config HAVE_C_RECORDMCOUNT
|
2010-10-13 14:12:30 -07:00
|
|
|
bool
|
|
|
|
help
|
|
|
|
C version of recordmcount available?
|
|
|
|
|
2022-01-25 07:19:10 -07:00
|
|
|
config HAVE_BUILDTIME_MCOUNT_SORT
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
An architecture selects this if it sorts the mcount_loc section
|
|
|
|
at build time.
|
|
|
|
|
2022-01-22 07:17:10 -07:00
|
|
|
config BUILDTIME_MCOUNT_SORT
|
|
|
|
bool
|
|
|
|
default y
|
2022-01-25 07:19:10 -07:00
|
|
|
depends on HAVE_BUILDTIME_MCOUNT_SORT && DYNAMIC_FTRACE
|
2022-01-22 07:17:10 -07:00
|
|
|
help
|
|
|
|
Sort the mcount_loc section at build time.
|
|
|
|
|
2008-05-12 12:20:42 -07:00
|
|
|
config TRACER_MAX_TRACE
|
|
|
|
bool
|
|
|
|
|
trace: Stop compiling in trace_clock unconditionally
Commit 56449f437 "tracing: make the trace clocks available generally",
in April 2009, made trace_clock available unconditionally, since
CONFIG_X86_DS used it too.
Commit faa4602e47 "x86, perf, bts, mm: Delete the never used BTS-ptrace code",
in March 2010, removed CONFIG_X86_DS, and now only CONFIG_RING_BUFFER (split
out from CONFIG_TRACING for general use) has a dependency on trace_clock. So,
only compile in trace_clock with CONFIG_RING_BUFFER or CONFIG_TRACING
enabled.
Link: http://lkml.kernel.org/r/20120903024513.GA19583@leaf
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-02 19:45:14 -07:00
|
|
|
config TRACE_CLOCK
|
|
|
|
bool
|
|
|
|
|
tracing: unified trace buffer
This is a unified tracing buffer that implements a ring buffer that
hopefully everyone will eventually be able to use.
The events recorded into the buffer have the following structure:
struct ring_buffer_event {
u32 type:2, len:3, time_delta:27;
u32 array[];
};
The minimum size of an event is 8 bytes. All events are 4 byte
aligned inside the buffer.
There are 4 types (all internal use for the ring buffer, only
the data type is exported to the interface users).
RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
of a buffer page.
RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
is greater than the 27 bit delta can hold. We add another
32 bits, and record that in its own event (8 byte size).
RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
help keep the buffer timestamps in sync.
RINGBUF_TYPE_DATA: The event actually holds user data.
The "len" field is only three bits. Since the data must be
4 byte aligned, this field is shifted left by 2, giving a
max length of 28 bytes. If the data load is greater than 28
bytes, the first array field holds the full length of the
data load and the len field is set to zero.
Example, data size of 7 bytes:
type = RINGBUF_TYPE_DATA
len = 2
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0..1]: <7 bytes of data> <1 byte empty>
This event is saved in 12 bytes of the buffer.
An event with 82 bytes of data:
type = RINGBUF_TYPE_DATA
len = 0
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0]: 84 (Note the alignment)
array[1..21]: <82 bytes of data> <2 bytes empty>
The above event is saved in 92 bytes (if my math is correct).
82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
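The two worked examples follow from one small rule. The sketch below reproduces that arithmetic as described in this commit message only; it is not the ring buffer's actual code, the names event_layout_sketch and layout_for() are hypothetical, and it assumes at least one byte of data.

```c
#include <assert.h>

#define EVENT_HEADER_BYTES 4u   // the u32 holding type, len and time_delta
#define EVENT_MAX_INLINE   28u  // largest length the 3-bit len field encodes

// How an event of a given data size is laid out in the buffer.
struct event_layout_sketch {
    unsigned len_field;  // value of the 3-bit len field
    unsigned array0;     // length stored in array[0], or 0 if unused
    unsigned total;      // bytes the event occupies in the buffer
};

static struct event_layout_sketch layout_for(unsigned data_bytes)
{
    struct event_layout_sketch l = { 0, 0, 0 };
    // Data is 4-byte aligned, so round up to a multiple of 4.
    unsigned padded = (data_bytes + 3u) & ~3u;

    assert(data_bytes > 0);
    if (padded <= EVENT_MAX_INLINE) {
        // Small event: len holds the padded size shifted right by 2.
        l.len_field = padded >> 2;
        l.total = EVENT_HEADER_BYTES + padded;
    } else {
        // Large event: len is zero and array[0] holds the padded size,
        // which costs one extra 4-byte word.
        l.len_field = 0;
        l.array0 = padded;
        l.total = EVENT_HEADER_BYTES + 4u + padded;
    }
    return l;
}
```

layout_for(7) gives len 2 and 12 bytes in total, and layout_for(82) gives len 0, array[0] of 84 and 92 bytes in total, matching the two examples above.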
Do not reference the above event struct directly. Use the following
functions to gain access to the event table, since the
ring_buffer_event structure may change in the future.
ring_buffer_event_length(event): get the length of the event.
This is the size of the memory used to record this
event, and not the size of the data payload.
ring_buffer_time_delta(event): get the time delta of the event
This returns the delta time stamp since the last event.
Note: Even though this is in the header, there should
be no reason to access this directly, except
for debugging.
ring_buffer_event_data(event): get the data from the event
This is the function to use to get the actual data
from the event. Note, it is only a pointer to the
data inside the buffer. This data must be copied to
another location otherwise you risk it being written
over in the buffer.
ring_buffer_lock: A way to lock the entire buffer.
ring_buffer_unlock: unlock the buffer.
ring_buffer_alloc: create a new ring buffer. Can choose between
overwrite or consumer/producer mode. Overwrite will
overwrite old data, whereas consumer/producer will
throw away new data if the consumer catches up with the
producer. The consumer/producer is the default.
ring_buffer_free: free the ring buffer.
ring_buffer_resize: resize the buffer. Changes the size of each cpu
buffer. Note, it is up to the caller to ensure that
the buffer is not being used while this is happening.
This requirement may go away but do not count on it.
ring_buffer_lock_reserve: locks the ring buffer and allocates an
entry on the buffer to write to.
ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
the buffer.
ring_buffer_write: writes some data into the ring buffer.
ring_buffer_peek: Look at the next item in the cpu buffer.
ring_buffer_consume: get the next item in the cpu buffer and
consume it. That is, this function increments the head
pointer.
ring_buffer_read_start: Start an iterator of a cpu buffer.
For now, this disables the cpu buffer, until you issue
a finish. This is just because we do not want the iterator
to be overwritten. This restriction may change in the future.
But note, this is used for static reading of a buffer which
is usually done "after" a trace. Live readings would want
to use the ring_buffer_consume above, which will not
disable the ring buffer.
ring_buffer_read_finish: Finishes the read iterator and reenables
the ring buffer.
ring_buffer_iter_peek: Look at the next item in the cpu iterator.
ring_buffer_read: Read the iterator and increment it.
ring_buffer_iter_reset: Reset the iterator to point to the beginning
of the cpu buffer.
ring_buffer_iter_empty: Returns true if the iterator is at the end
of the cpu buffer.
ring_buffer_size: returns the size in bytes of each cpu buffer.
Note, the real size is this times the number of CPUs.
ring_buffer_reset_cpu: Sets the cpu buffer to empty
ring_buffer_reset: sets all cpu buffers to empty
ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
cpu buffer of another buffer. This is handy when you
want to take a snapshot of a running trace on just one
cpu. Having a backup buffer to swap with facilitates this.
Ftrace max latencies use this.
ring_buffer_empty: Returns true if the ring buffer is empty.
ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
ring_buffer_record_disable: disable all cpu buffers (read only)
ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
ring_buffer_record_enable: enable all cpu buffers.
ring_buffer_record_enable_cpu: enable a single cpu buffer.
ring_buffer_entries: The number of entries in a ring buffer.
ring_buffer_overruns: The number of entries removed due to writing wrap.
ring_buffer_time_stamp: Get the time stamp used by the ring buffer
ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
into nanosecs.
I still need to implement the GTOD feature. But we need support from
the cpu frequency infrastructure. But this can be done at a later
time without affecting the ring buffer interface.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-29 20:02:38 -07:00
|
|
|
config RING_BUFFER
|
|
|
|
bool
|
2012-09-02 19:45:14 -07:00
|
|
|
select TRACE_CLOCK
|
2013-05-03 08:16:18 -07:00
|
|
|
select IRQ_WORK
|
tracing: unified trace buffer
This is a unified tracing buffer that implements a ring buffer that
hopefully everyone will eventually be able to use.
The events recorded into the buffer have the following structure:
struct ring_buffer_event {
u32 type:2, len:3, time_delta:27;
u32 array[];
};
The minimum size of an event is 8 bytes. All events are 4 byte
aligned inside the buffer.
There are 4 types (all internal use for the ring buffer, only
the data type is exported to the interface users).
RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
of a buffer page.
RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
is greater than the 27 bit delta can hold. We add another
32 bits, and record that in its own event (8 byte size).
RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
help keep the buffer timestamps in sync.
RINGBUF_TYPE_DATA: The event actually holds user data.
The "len" field is only three bits. Since the data must be
4 byte aligned, this field is shifted left by 2, giving a
max length of 28 bytes. If the data load is greater than 28
bytes, the first array field holds the full length of the
data load and the len field is set to zero.
Example, data size of 7 bytes:
type = RINGBUF_TYPE_DATA
len = 2
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0..1]: <7 bytes of data> <1 byte empty>
This event is saved in 12 bytes of the buffer.
An event with 82 bytes of data:
type = RINGBUF_TYPE_DATA
len = 0
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0]: 84 (Note the alignment)
array[1..14]: <82 bytes of data> <2 bytes empty>
The above event is saved in 92 bytes (if my math is correct).
82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
Do not reference the above event struct directly. Use the following
functions to gain access to the event table, since the
ring_buffer_event structure may change in the future.
ring_buffer_event_length(event): get the length of the event.
This is the size of the memory used to record this
event, and not the size of the data pay load.
ring_buffer_time_delta(event): get the time delta of the event
This returns the delta time stamp since the last event.
Note: Even though this is in the header, there should
be no reason to access this directly, accept
for debugging.
ring_buffer_event_data(event): get the data from the event
This is the function to use to get the actual data
from the event. Note, it is only a pointer to the
data inside the buffer. This data must be copied to
another location otherwise you risk it being written
over in the buffer.
ring_buffer_lock: A way to lock the entire buffer.
ring_buffer_unlock: unlock the buffer.
ring_buffer_alloc: create a new ring buffer. Can choose between
overwrite or consumer/producer mode. Overwrite will
overwrite old data, where as consumer producer will
throw away new data if the consumer catches up with the
producer. The consumer/producer is the default.
ring_buffer_free: free the ring buffer.
ring_buffer_resize: resize the buffer. Changes the size of each cpu
buffer. Note, it is up to the caller to provide that
the buffer is not being used while this is happening.
This requirement may go away but do not count on it.
ring_buffer_lock_reserve: locks the ring buffer and allocates an
entry on the buffer to write to.
ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
the buffer.
ring_buffer_write: writes some data into the ring buffer.
ring_buffer_peek: Look at a next item in the cpu buffer.
ring_buffer_consume: get the next item in the cpu buffer and
consume it. That is, this function increments the head
pointer.
ring_buffer_read_start: Start an iterator of a cpu buffer.
For now, this disables the cpu buffer, until you issue
a finish. This is just because we do not want the iterator
to be overwritten. This restriction may change in the future.
But note, this is used for static reading of a buffer which
is usually done "after" a trace. Live readings would want
to use the ring_buffer_consume above, which will not
disable the ring buffer.
ring_buffer_read_finish: Finishes the read iterator and reenables
the ring buffer.
ring_buffer_iter_peek: Look at the next item in the cpu iterator.
ring_buffer_read: Read the iterator and increment it.
ring_buffer_iter_reset: Reset the iterator to point to the beginning
of the cpu buffer.
ring_buffer_iter_empty: Returns true if the iterator is at the end
of the cpu buffer.
ring_buffer_size: returns the size in bytes of each cpu buffer.
Note, the real size is this times the number of CPUs.
ring_buffer_reset_cpu: Sets the cpu buffer to empty
ring_buffer_reset: sets all cpu buffers to empty
ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
cpu buffer of another buffer. This is handy when you
want to take a snapshot of a running trace on just one
cpu. Having a backup buffer to swap with facilitates this.
Ftrace max latencies use this.
ring_buffer_empty: Returns true if the ring buffer is empty.
ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
ring_buffer_record_disable: disable all cpu buffers (read only)
ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
ring_buffer_record_enable: enable all cpu buffers.
ring_buffer_record_enable_cpu: enable a single cpu buffer.
ring_buffer_entries: The number of entries in a ring buffer.
ring_buffer_overruns: The number of entries removed due to writing wrap.
ring_buffer_time_stamp: Get the time stamp used by the ring buffer
ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
into nanosecs.
I still need to implement the GTOD feature, but that needs support from
the cpu frequency infrastructure. It can be done at a later
time without affecting the ring buffer interface.
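On current kernels the two buffer policies described for ring_buffer_alloc
(overwrite old data, or drop new data when the buffer is full) can also be
selected from user space for the ftrace buffers, and the per-cpu buffer size
can be changed there as well. What follows is only a minimal sketch, assuming
tracefs is mounted at /sys/kernel/tracing and provides the options/overwrite
and buffer_size_kb files. The TRACEFS variable and the function names are
illustrative and are not part of this commit.

```shell
# Location of tracefs; override TRACEFS if it is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# Select what happens when a per-cpu buffer fills up:
#   overwrite -> the oldest events are overwritten
#   keep      -> the newest events are dropped (consumer/producer behaviour)
set_buffer_mode() {
    case "$1" in
        overwrite) echo 1 > "$TRACEFS/options/overwrite" ;;
        keep)      echo 0 > "$TRACEFS/options/overwrite" ;;
        *)         echo "usage: set_buffer_mode overwrite|keep" >&2; return 2 ;;
    esac
}

# Resize every per-cpu buffer to the given number of kilobytes.
resize_buffers_kb() {
    echo "$1" > "$TRACEFS/buffer_size_kb"
}
```

Run as root, something like "set_buffer_mode keep && resize_buffers_kb 4096"
would keep the oldest events and give each cpu a 4 MB buffer.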
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-29 20:02:38 -07:00
|
|
|
|
2009-04-08 01:14:01 -07:00
|
|
|
config EVENT_TRACING
|
2009-05-25 03:11:59 -07:00
|
|
|
select CONTEXT_SWITCH_TRACER
|
2019-11-20 06:38:07 -07:00
|
|
|
select GLOB
|
2009-05-25 03:11:59 -07:00
|
|
|
bool
|
|
|
|
|
|
|
|
config CONTEXT_SWITCH_TRACER
|
2009-04-08 01:14:01 -07:00
|
|
|
bool
|
|
|
|
|
2009-09-04 11:24:40 -07:00
|
|
|
config RING_BUFFER_ALLOW_SWAP
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
Allow the use of ring_buffer_swap_cpu.
|
|
|
|
Adds a very slight overhead to tracing when enabled.
|
|
|
|
|
2018-07-30 15:24:23 -07:00
|
|
|
config PREEMPTIRQ_TRACEPOINTS
|
|
|
|
bool
|
|
|
|
depends on TRACE_PREEMPT_TOGGLE || TRACE_IRQFLAGS
|
|
|
|
select TRACING
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
Create preempt/irq toggle tracepoints if needed, so that other parts
|
|
|
|
of the kernel can use them to generate or add hooks to them.
|
|
|
|
|
2009-05-28 12:50:13 -07:00
|
|
|
# All tracer options should select GENERIC_TRACER. For those options that are
|
|
|
|
# enabled by all tracers (context switch and event tracer) they select TRACING.
|
|
|
|
# This allows those options to appear when no other tracer is selected. But the
|
|
|
|
# options do not appear when something else selects it. We need the two options
|
|
|
|
# GENERIC_TRACER and TRACING to avoid circular dependencies to accomplish the
|
2009-12-21 13:01:17 -07:00
|
|
|
# hiding of the automatic options.
|
2009-05-28 12:50:13 -07:00
|
|
|
|
2008-05-12 12:20:42 -07:00
|
|
|
config TRACING
|
|
|
|
bool
|
2008-09-29 20:02:38 -07:00
|
|
|
select RING_BUFFER
|
2008-10-31 12:50:41 -07:00
|
|
|
select STACKTRACE if STACKTRACE_SUPPORT
|
2008-07-23 05:15:22 -07:00
|
|
|
select TRACEPOINTS
|
2008-10-29 08:15:57 -07:00
|
|
|
select NOP_TRACER
|
2009-03-06 09:21:49 -07:00
|
|
|
select BINARY_PRINTF
|
2009-04-08 01:14:01 -07:00
|
|
|
select EVENT_TRACING
|
trace: Stop compiling in trace_clock unconditionally
Commit 56449f437 "tracing: make the trace clocks available generally",
in April 2009, made trace_clock available unconditionally, since
CONFIG_X86_DS used it too.
Commit faa4602e47 "x86, perf, bts, mm: Delete the never used BTS-ptrace code",
in March 2010, removed CONFIG_X86_DS, and now only CONFIG_RING_BUFFER (split
out from CONFIG_TRACING for general use) has a dependency on trace_clock. So,
only compile in trace_clock with CONFIG_RING_BUFFER or CONFIG_TRACING
enabled.
Link: http://lkml.kernel.org/r/20120903024513.GA19583@leaf
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-02 19:45:14 -07:00
|
|
|
select TRACE_CLOCK
|
2024-02-22 11:25:44 -07:00
|
|
|
select NEED_TASKS_RCU
|
2008-05-12 12:20:42 -07:00
|
|
|
|
2009-05-28 12:50:13 -07:00
|
|
|
config GENERIC_TRACER
|
|
|
|
bool
|
|
|
|
select TRACING
|
|
|
|
|
2009-03-05 13:19:55 -07:00
|
|
|
#
|
|
|
|
# Minimum requirements an architecture has to meet for us to
|
|
|
|
# be able to offer generic tracing facilities:
|
|
|
|
#
|
|
|
|
config TRACING_SUPPORT
|
|
|
|
bool
|
2018-05-02 04:29:48 -07:00
|
|
|
depends on TRACE_IRQFLAGS_SUPPORT
|
2009-03-05 13:19:55 -07:00
|
|
|
depends on STACKTRACE_SUPPORT
|
2009-03-05 18:40:53 -07:00
|
|
|
default y
|
2009-03-05 13:19:55 -07:00
|
|
|
|
2009-04-20 07:47:36 -07:00
|
|
|
menuconfig FTRACE
|
|
|
|
bool "Tracers"
|
2021-07-30 22:22:31 -07:00
|
|
|
depends on TRACING_SUPPORT
|
2009-05-07 09:49:27 -07:00
|
|
|
default y if DEBUG_KERNEL
|
2009-04-20 07:47:36 -07:00
|
|
|
help
|
2009-12-21 13:01:17 -07:00
|
|
|
Enable the kernel tracing infrastructure.
|
2009-04-20 07:47:36 -07:00
|
|
|
|
|
|
|
if FTRACE
|
2008-10-21 07:31:18 -07:00
|
|
|
|
2020-01-29 14:30:30 -07:00
|
|
|
config BOOTTIME_TRACING
|
|
|
|
bool "Boot-time Tracing support"
|
2020-02-20 05:18:33 -07:00
|
|
|
depends on TRACING
|
|
|
|
select BOOT_CONFIG
|
2020-01-29 14:30:30 -07:00
|
|
|
help
|
|
|
|
Enable developers to set up the ftrace subsystem via a supplemental
|
|
|
|
kernel cmdline at boot time for debugging (tracing) driver
|
|
|
|
initialization and boot process.
|
|
|
|
|
2008-10-06 16:06:12 -07:00
|
|
|
config FUNCTION_TRACER
|
ftrace: function tracer
This is a simple trace that uses the ftrace infrastructure. It is
designed to be fast and small, and easy to use. It is useful to
record things that happen over a very short period of time, and
not to analyze the system in general.
Updates:
available_tracers
"function" is added to this file.
current_tracer
To enable the function tracer:
echo function > /debugfs/tracing/current_tracer
To disable the tracer:
echo disable > /debugfs/tracing/current_tracer
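The paths above reflect the debugfs layout of 2008. On current kernels the
same files are normally found under /sys/kernel/tracing, and the "nop"
tracer is what switches tracing off. A rough equivalent is sketched below,
assuming tracefs is mounted at that location. The TRACEFS variable and the
function names are hypothetical helpers, not something this commit provides.

```shell
# Location of tracefs; override TRACEFS if it is mounted elsewhere.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# Succeeds if "function" is listed in available_tracers.
# -w keeps "function_graph" from counting as a match.
have_function_tracer() {
    grep -qw function "$TRACEFS/available_tracers"
}

# Select the function tracer, but only if the kernel offers it.
start_function_tracer() {
    have_function_tracer || return 1
    echo function > "$TRACEFS/current_tracer"
}

# Go back to the nop tracer, which records no function calls.
stop_function_tracer() {
    echo nop > "$TRACEFS/current_tracer"
}
```

Both helpers need root privileges when pointed at the real tracefs.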
The output of the function_trace file is as follows
"echo noverbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
 swapper-0     0d.h. 1595131us+: _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
Or with verbose turned on:
"echo verbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
swapper 0 0 9 00000000 00000000 [f3675f41] 1595.128ms (+0.003ms): set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
swapper 0 0 9 00000000 00000001 [f3675f45] 1595.131ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
swapper 0 0 9 00000000 00000002 [f3675f48] 1595.135ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
The "trace" file is not affected by the verbose mode, but it is affected by symonly.
echo "nosymonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479967] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <ffffffff80337a4d> <-- _spin_unlock_irqrestore+0xe/0x5a <ffffffff8048cc8f>
[ 81.479967] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <ffffffff8048ccbf> <-- sub_preempt_count+0xc/0x7a <ffffffff80233d7b>
[ 81.479968] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <ffffffff80233d9f> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
[ 81.479968] CPU 0: bash:3154 vfs_write+0x11d/0x155 <ffffffff8029a043> <-- dnotify_parent+0x12/0x78 <ffffffff802d54fb>
[ 81.479968] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <ffffffff802d5516> <-- _spin_lock+0xe/0x70 <ffffffff8048c910>
[ 81.479969] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <ffffffff8048c91d> <-- add_preempt_count+0xe/0x77 <ffffffff80233df7>
[ 81.479969] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <ffffffff80233e27> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
echo "symonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a
[ 81.479913] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a
[ 81.479913] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24
[ 81.479914] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78
[ 81.479914] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70
[ 81.479914] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77
[ 81.479914] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 12:20:42 -07:00
|
|
|
bool "Kernel Function Tracer"
|
2008-10-06 16:06:12 -07:00
|
|
|
depends on HAVE_FUNCTION_TRACER
|
2009-02-18 20:06:18 -07:00
|
|
|
select KALLSYMS
|
2009-05-28 12:50:13 -07:00
|
|
|
select GENERIC_TRACER
|
2008-05-12 12:20:42 -07:00
|
|
|
select CONTEXT_SWITCH_TRACER
|
2017-04-06 07:28:12 -07:00
|
|
|
select GLOB
|
2024-02-22 11:25:44 -07:00
|
|
|
select NEED_TASKS_RCU
|
2020-04-03 12:10:28 -07:00
|
|
|
select TASKS_RUDE_RCU
|
2008-05-12 12:20:42 -07:00
|
|
|
help
|
|
|
|
Enable the kernel to trace every kernel function. This is done
|
|
|
|
by using a compiler feature to insert a small, 5-byte No-Operation
|
2009-12-21 13:01:17 -07:00
|
|
|
instruction at the beginning of every kernel function, which NOP
|
2008-05-12 12:20:42 -07:00
|
|
|
sequence is then dynamically patched into a tracer call when
|
|
|
|
tracing is enabled by the administrator. If it's runtime disabled
|
|
|
|
(the bootup default), then the overhead of the instructions is very
|
2022-07-06 13:12:31 -07:00
|
|
|
small and not measurable even in micro-benchmarks (at least on
|
|
|
|
x86, but may have impact on other architectures).
|
2008-05-12 12:20:42 -07:00
|
|
|
|
2008-11-25 13:07:04 -07:00
|
|
|
config FUNCTION_GRAPH_TRACER
|
|
|
|
bool "Kernel Function Graph Tracer"
|
|
|
|
depends on HAVE_FUNCTION_GRAPH_TRACER
|
2008-11-10 23:14:25 -07:00
|
|
|
depends on FUNCTION_TRACER
|
function-graph: disable when both x86_32 and optimize for size are configured
On x86_32, when optimize for size is set, gcc may align the frame pointer
and make a copy of the return address inside the stack frame.
The return address that is located in the stack frame may not be
the one used to return to the calling function. This will break the
function graph tracer.
The function graph tracer replaces the return address with a jump to a hook
function that can trace the exit of the function. If it only replaces
a copy, then the hook will not be called when the function returns.
Worse yet, when the parent function returns, the function graph tracer
will return back to the location of the child function which will
easily crash the kernel with weird results.
To see the problem, when i386 is compiled with -Os we get:
c106be03: 57 push %edi
c106be04: 8d 7c 24 08 lea 0x8(%esp),%edi
c106be08: 83 e4 e0 and $0xffffffe0,%esp
c106be0b: ff 77 fc pushl 0xfffffffc(%edi)
c106be0e: 55 push %ebp
c106be0f: 89 e5 mov %esp,%ebp
c106be11: 57 push %edi
c106be12: 56 push %esi
c106be13: 53 push %ebx
c106be14: 81 ec 8c 00 00 00 sub $0x8c,%esp
c106be1a: e8 f5 57 fb ff call c1021614 <mcount>
When it is compiled with -O2 instead we get:
c10896f0: 55 push %ebp
c10896f1: 89 e5 mov %esp,%ebp
c10896f3: 83 ec 28 sub $0x28,%esp
c10896f6: 89 5d f4 mov %ebx,0xfffffff4(%ebp)
c10896f9: 89 75 f8 mov %esi,0xfffffff8(%ebp)
c10896fc: 89 7d fc mov %edi,0xfffffffc(%ebp)
c10896ff: e8 d0 08 fa ff call c1029fd4 <mcount>
The compile with -Os will align the stack pointer, then set up the
frame pointer (%ebp), and it copies the return address back into
the stack frame. The change to the return address in mcount is done
to the copy and not the real placeholder of the return address.
The compile with -O2 sets up the frame pointer first; this makes
the change to the return address by mcount affect where the function
will jump on exit.
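The resulting restriction, "!X86_32 || !CC_OPTIMIZE_FOR_SIZE", can be
checked against an existing kernel configuration before expecting the
function graph tracer to be selectable. This is a minimal sketch, assuming a
.config-style file in which enabled symbols appear as CONFIG_<NAME>=y. The
function name is made up for illustration.

```shell
# Succeeds (exit status 0) when "!X86_32 || !CC_OPTIMIZE_FOR_SIZE" holds
# for the given .config-style file, that is, unless both symbols are =y.
graph_tracer_dep_ok() {
    config_file="$1"
    if grep -q '^CONFIG_X86_32=y$' "$config_file" &&
       grep -q '^CONFIG_CC_OPTIMIZE_FOR_SIZE=y$' "$config_file"; then
        return 1
    fi
    return 0
}
```

For example, "graph_tracer_dep_ok .config || echo 'x86_32 with -Os'" would
flag the one combination this commit excludes.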
Reported-by: Jake Edge <jake@lwn.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-18 09:53:21 -07:00
|
|
|
depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
|
2008-12-03 02:33:58 -07:00
|
|
|
default y
|
2008-11-10 23:14:25 -07:00
|
|
|
help
|
2008-11-25 13:07:04 -07:00
|
|
|
Enable the kernel to trace a function at both its return
|
|
|
|
and its entry.
|
2009-01-26 03:12:25 -07:00
|
|
|
Its first purpose is to trace the duration of functions and
|
|
|
|
draw a call graph for each thread with some information like
|
2009-12-21 13:01:17 -07:00
|
|
|
the return value. This is done by setting the current return
|
2009-01-26 03:12:25 -07:00
|
|
|
address on the current task structure into a stack of calls.
|
2008-11-10 23:14:25 -07:00
|
|
|
|
function_graph: Support recording and printing the return value of function
Analyzing system call failures with the function_graph tracer can be a
time-consuming process, particularly when locating the kernel function
that first returns an error in the trace logs. This change aims to
simplify the process by recording the function return value to the
'retval' member of 'ftrace_graph_ret' and printing it when outputting
the trace log.
We have introduced new trace options: funcgraph-retval and
funcgraph-retval-hex. The former controls whether to display the return
value, while the latter controls the display format.
Please note that even if a function's return type is void, a return
value will still be printed. You can simply ignore it.
This patch only establishes the fundamental infrastructure. Subsequent
patches will make this feature available on some commonly used processor
architectures.
Here is an example:
I attempted to attach the demo process to a cpu cgroup, but it failed:
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
-bash: echo: write error: Invalid argument
The strace logs indicate that the write system call returned -EINVAL(-22):
...
write(1, "273\n", 4) = -1 EINVAL (Invalid argument)
...
To capture trace logs during a write system call, use the following
commands:
cd /sys/kernel/debug/tracing/
echo 0 > tracing_on
echo > trace
echo *sys_write > set_graph_function
echo *spin* > set_graph_notrace
echo *rcu* >> set_graph_notrace
echo *alloc* >> set_graph_notrace
echo preempt* >> set_graph_notrace
echo kfree* >> set_graph_notrace
echo $$ > set_ftrace_pid
echo function_graph > current_tracer
echo 1 > options/funcgraph-retval
echo 0 > options/funcgraph-retval-hex
echo 1 > tracing_on
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
echo 0 > tracing_on
cat trace > ~/trace.log
To locate the root cause, search for error code -22 directly in the file
trace.log and identify the first function that returned -22. Once you
have identified this function, examine its code to determine the root
cause.
For example, in the trace log below, cpu_cgroup_can_attach
returned -22 first, so we can focus our analysis on this function to
identify the root cause.
...
1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */
...
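The manual search described above can be scripted. The sketch below prints
the name of the first function in a saved log that returned a given value,
assuming the log uses the line formats shown in this example, that is
"} /* name = value */" for functions with children and
"name(); /* = value */" for leaf functions. The function name
first_return_of is hypothetical.

```shell
# Print the name of the first function in a function_graph log that
# returned the given value (for example -22).
first_return_of() {
    log="$1"
    value="$2"
    # Take the first line whose return-value comment ends in "= <value> */",
    # drop everything up to the "|" column separator, then pull the name
    # out of either the closing-brace form or the leaf form.
    grep -m1 -F -- "= $value */" "$log" |
        sed -E -e 's/.*[|] *//' \
               -e 's#^[}] /[*] ([^ ]+) = .*#\1#' \
               -e 's#^([^ (]+)[(][)];.*#\1#'
}
```

Called as "first_return_of ~/trace.log -22" on a log like the one above,
this should point at cpu_cgroup_can_attach, because inner functions return,
and are therefore logged, before their callers.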
Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn
Tested-by: Florian Kauer <florian.kauer@linutronix.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-08 05:42:15 -07:00
|
|
|
config FUNCTION_GRAPH_RETVAL
|
|
|
|
bool "Kernel Function Graph Return Value"
|
|
|
|
depends on HAVE_FUNCTION_GRAPH_RETVAL
|
|
|
|
depends on FUNCTION_GRAPH_TRACER
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
Support recording and printing the function return value when
|
|
|
|
using function graph tracer. It can be helpful to locate functions
|
|
|
|
that return errors. This feature is off by default, and you can
|
|
|
|
enable it via the trace option funcgraph-retval.
|
|
|
|
See Documentation/trace/ftrace.rst
|
|
|
|
|
2020-01-29 14:19:10 -07:00
|
|
|
config DYNAMIC_FTRACE
|
|
|
|
bool "enable/disable function tracing dynamically"
|
|
|
|
depends on FUNCTION_TRACER
|
|
|
|
depends on HAVE_DYNAMIC_FTRACE
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
This option will modify all the calls to function tracing
|
|
|
|
dynamically (will patch them out of the binary image and
|
|
|
|
replace them with a No-Op instruction) on boot up. During
|
|
|
|
compile time, a table is made of all the locations that ftrace
|
|
|
|
can function trace, and this table is linked into the kernel
|
|
|
|
image. When this is enabled, functions can be individually
|
|
|
|
enabled, and the functions not enabled will not affect
|
|
|
|
performance of the system.
|
|
|
|
|
2023-02-15 15:33:45 -07:00
|
|
|
See the files in /sys/kernel/tracing:
|
2020-01-29 14:19:10 -07:00
|
|
|
available_filter_functions
|
|
|
|
set_ftrace_filter
|
|
|
|
set_ftrace_notrace
|
|
|
|
|
|
|
|
This way a CONFIG_FUNCTION_TRACER kernel is slightly larger, but
|
|
|
|
otherwise has native performance as long as no tracing is active.
|
|
|
|
|
|
|
|
config DYNAMIC_FTRACE_WITH_REGS
|
|
|
|
def_bool y
|
|
|
|
depends on DYNAMIC_FTRACE
|
|
|
|
depends on HAVE_DYNAMIC_FTRACE_WITH_REGS
|
|
|
|
|
|
|
|
config DYNAMIC_FTRACE_WITH_DIRECT_CALLS
|
|
|
|
def_bool y
|
2023-03-21 07:04:23 -07:00
|
|
|
depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
|
2020-01-29 14:19:10 -07:00
|
|
|
depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
|
|
|
|
|
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible for calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
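As a rough illustration of the per-callsite pointer scheme described above, the following Python toy model uses a dictionary in place of the pointer stored at a fixed offset from each call site. The names FtraceOps, callsite_ops, shared_trampoline and the call-site labels are hypothetical and are not kernel APIs; real call sites are machine code, so this only sketches the control flow under those assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FtraceOps:
    """Hypothetical stand-in for struct ftrace_ops: only the callback."""
    func: Callable[[str], None]


# Plays the role of the ops pointer associated with each call site.
callsite_ops: Dict[str, FtraceOps] = {}


def shared_trampoline(callsite: str) -> None:
    """Load the call site's ops pointer once, then call ops.func directly."""
    ops = callsite_ops[callsite]
    ops.func(callsite)  # no iteration over a list of all registered ops


def demo() -> List[str]:
    """Attach two distinct ops to two call sites and trigger both."""
    log: List[str] = []
    callsite_ops["site_a"] = FtraceOps(lambda site: log.append(f"ops1 saw {site}"))
    callsite_ops["site_b"] = FtraceOps(lambda site: log.append(f"ops2 saw {site}"))
    shared_trampoline("site_a")
    shared_trampoline("site_b")
    return log


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that one trampoline serves every call site, and the cost per call does not grow with the number of registered ops.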
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 06:45:56 -07:00
|
|
|
config DYNAMIC_FTRACE_WITH_CALL_OPS
|
|
|
|
def_bool y
|
|
|
|
depends on HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
|
|
|
|
|
tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS
Commit 2860cd8a2353 ("livepatch: Use the default ftrace_ops instead of
REGS when ARGS is available") intends to enable config LIVEPATCH when
ftrace with ARGS is available. However, the chain of configs to enable
LIVEPATCH is incomplete, as HAVE_DYNAMIC_FTRACE_WITH_ARGS is available,
but the definition of DYNAMIC_FTRACE_WITH_ARGS, combining DYNAMIC_FTRACE
and HAVE_DYNAMIC_FTRACE_WITH_ARGS, needed to enable LIVEPATCH, is missing
in the commit.
Fortunately, ./scripts/checkkconfigsymbols.py detects this and warns:
DYNAMIC_FTRACE_WITH_ARGS
Referencing files: kernel/livepatch/Kconfig
So, define the config DYNAMIC_FTRACE_WITH_ARGS analogously to the already
existing similar configs, DYNAMIC_FTRACE_WITH_REGS and
DYNAMIC_FTRACE_WITH_DIRECT_CALLS, in ./kernel/trace/Kconfig to connect the
chain of configs.
Link: https://lore.kernel.org/kernel-janitors/CAKXUXMwT2zS9fgyQHKUUiqo8ynZBdx2UEUu1WnV_q0OCmknqhw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20210806195027.16808-1-lukas.bulwahn@gmail.com
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: stable@vger.kernel.org
Fixes: 2860cd8a2353 ("livepatch: Use the default ftrace_ops instead of REGS when ARGS is available")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-06 12:50:27 -07:00
|
|
|
config DYNAMIC_FTRACE_WITH_ARGS
|
|
|
|
def_bool y
|
|
|
|
depends on DYNAMIC_FTRACE
|
|
|
|
depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS
|
|
|
|
|
2022-03-15 07:00:38 -07:00
|
|
|
config FPROBE
|
|
|
|
bool "Kernel Function Probe (fprobe)"
|
|
|
|
depends on FUNCTION_TRACER
|
|
|
|
depends on DYNAMIC_FTRACE_WITH_REGS
|
2022-03-15 07:01:48 -07:00
|
|
|
depends on HAVE_RETHOOK
|
|
|
|
select RETHOOK
|
2022-03-15 07:00:38 -07:00
|
|
|
default n
|
|
|
|
help
|
2022-03-15 07:01:48 -07:00
|
|
|
This option enables kernel function probe (fprobe) based on ftrace.
|
|
|
|
The fprobe is similar to kprobes, but probes only for kernel function
|
|
|
|
entries and exits. It can also probe multiple functions with one
|
|
|
|
fprobe.
|
2022-03-15 07:00:38 -07:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-01-29 14:19:10 -07:00
|
|
|
config FUNCTION_PROFILER
|
|
|
|
bool "Kernel function profiler"
|
|
|
|
depends on FUNCTION_TRACER
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
This option enables the kernel function profiler. A file is created
|
|
|
|
in debugfs called function_profile_enabled which defaults to zero.
|
|
|
|
When a 1 is echoed into this file profiling begins, and when a
|
|
|
|
zero is entered, profiling stops. A "functions" file is created in
|
|
|
|
the trace_stat directory; this file shows the list of functions that
|
|
|
|
have been hit and their counters.
|
|
|
|
|
|
|
|
If in doubt, say N.
|
|
|
|
|
|
|
|
config STACK_TRACER
|
|
|
|
bool "Trace max stack"
|
|
|
|
depends on HAVE_FUNCTION_TRACER
|
|
|
|
select FUNCTION_TRACER
|
|
|
|
select STACKTRACE
|
|
|
|
select KALLSYMS
|
|
|
|
help
|
|
|
|
This special tracer records the maximum stack footprint of the
|
2023-02-15 15:33:45 -07:00
|
|
|
kernel and displays it in /sys/kernel/tracing/stack_trace.
|
2020-01-29 14:19:10 -07:00
|
|
|
|
|
|
|
This tracer works by hooking into every function call that the
|
|
|
|
kernel executes, and keeping a maximum stack depth value and
|
|
|
|
stack-trace saved. If this is configured with DYNAMIC_FTRACE
|
|
|
|
then it will not have any overhead while the stack tracer
|
|
|
|
is disabled.
|
|
|
|
|
|
|
|
To enable the stack tracer on bootup, pass in 'stacktrace'
|
|
|
|
on the kernel command line.
|
|
|
|
|
|
|
|
The stack tracer can also be enabled or disabled via the
|
|
|
|
sysctl kernel.stack_tracer_enabled
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2018-07-30 15:24:23 -07:00
|
|
|
config TRACE_PREEMPT_TOGGLE
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
Enables hooks which will be called when preemption is first disabled,
|
|
|
|
and last enabled.
|
2009-03-20 09:50:56 -07:00
|
|
|
|
2008-05-12 12:20:42 -07:00
|
|
|
config IRQSOFF_TRACER
|
|
|
|
bool "Interrupts-off Latency Tracer"
|
|
|
|
default n
|
|
|
|
depends on TRACE_IRQFLAGS_SUPPORT
|
|
|
|
select TRACE_IRQFLAGS
|
2009-05-28 12:50:13 -07:00
|
|
|
select GENERIC_TRACER
|
2008-05-12 12:20:42 -07:00
|
|
|
select TRACER_MAX_TRACE
|
2009-09-04 11:24:40 -07:00
|
|
|
select RING_BUFFER_ALLOW_SWAP
|
2013-03-05 05:30:24 -07:00
|
|
|
select TRACER_SNAPSHOT
|
2013-03-05 12:50:23 -07:00
|
|
|
select TRACER_SNAPSHOT_PER_CPU_SWAP
|
2008-05-12 12:20:42 -07:00
|
|
|
help
|
|
|
|
This option measures the time spent in irqs-off critical
|
|
|
|
sections, with microsecond accuracy.
|
|
|
|
|
|
|
|
The default measurement method is a maximum search, which is
|
|
|
|
disabled by default and can be runtime (re-)started
|
|
|
|
via:
|
|
|
|
|
2023-02-15 15:33:45 -07:00
|
|
|
echo 0 > /sys/kernel/tracing/tracing_max_latency
|
2008-05-12 12:20:42 -07:00
|
|
|
|
2009-12-21 13:01:17 -07:00
|
|
|
(Note that kernel size and overhead increase with this option
|
2008-05-12 12:20:42 -07:00
|
|
|
enabled. This option and the preempt-off timing option can be
|
|
|
|
used together or separately.)
|
|
|
|
|
|
|
|
config PREEMPT_TRACER
|
|
|
|
bool "Preemption-off Latency Tracer"
|
|
|
|
default n
|
2019-07-26 14:19:40 -07:00
|
|
|
depends on PREEMPTION
|
2009-05-28 12:50:13 -07:00
|
|
|
select GENERIC_TRACER
|
2008-05-12 12:20:42 -07:00
|
|
|
select TRACER_MAX_TRACE
|
2009-09-04 11:24:40 -07:00
|
|
|
select RING_BUFFER_ALLOW_SWAP
|
2013-03-05 05:30:24 -07:00
|
|
|
select TRACER_SNAPSHOT
|
2013-03-05 12:50:23 -07:00
|
|
|
select TRACER_SNAPSHOT_PER_CPU_SWAP
|
2018-07-30 15:24:23 -07:00
|
|
|
select TRACE_PREEMPT_TOGGLE
|
2008-05-12 12:20:42 -07:00
|
|
|
help
|
2009-12-21 13:01:17 -07:00
|
|
|
This option measures the time spent in preemption-off critical
|
2008-05-12 12:20:42 -07:00
|
|
|
sections, with microsecond accuracy.
|
|
|
|
|
|
|
|
The default measurement method is a maximum search, which is
|
|
|
|
disabled by default and can be runtime (re-)started
|
|
|
|
via:
|
|
|
|
|
2023-02-15 15:33:45 -07:00
|
|
|
echo 0 > /sys/kernel/tracing/tracing_max_latency
|
2008-05-12 12:20:42 -07:00
|
|
|
|
2009-12-21 13:01:17 -07:00
|
|
|
(Note that kernel size and overhead increase with this option
|
2008-05-12 12:20:42 -07:00
|
|
|
enabled. This option and the irqs-off timing option can be
|
|
|
|
used together or separately.)
|
|
|
|
|
2008-05-12 12:20:42 -07:00
|
|
|
config SCHED_TRACER
|
|
|
|
bool "Scheduling Latency Tracer"
|
2009-05-28 12:50:13 -07:00
|
|
|
select GENERIC_TRACER
|
2008-05-12 12:20:42 -07:00
|
|
|
select CONTEXT_SWITCH_TRACER
|
|
|
|
select TRACER_MAX_TRACE
|
2013-03-05 05:30:24 -07:00
|
|
|
select TRACER_SNAPSHOT
|
2008-05-12 12:20:42 -07:00
|
|
|
help
|
|
|
|
This tracer tracks the latency of the highest priority task
|
|
|
|
to be scheduled in, starting from the point it has woken up.
|
|
|
|
|
2016-06-23 09:45:36 -07:00
|
|
|
config HWLAT_TRACER
|
|
|
|
bool "Tracer to detect hardware latencies (like SMIs)"
|
|
|
|
select GENERIC_TRACER
|
2022-12-06 07:18:01 -07:00
|
|
|
select TRACER_MAX_TRACE
|
2016-06-23 09:45:36 -07:00
|
|
|
help
|
|
|
|
This tracer, when enabled, will create one or more kernel threads,
|
2017-06-13 04:06:59 -07:00
|
|
|
depending on what the cpumask file is set to, with each thread
|
2016-06-23 09:45:36 -07:00
|
|
|
spinning in a loop looking for interruptions caused by
|
|
|
|
something other than the kernel. For example, if a
|
|
|
|
System Management Interrupt (SMI) takes a noticeable amount of
|
|
|
|
time, this tracer will detect it. This is useful for testing
|
|
|
|
if a system is reliable for Real Time tasks.
|
|
|
|
|
|
|
|
Some files are created in the tracing directory when this
|
|
|
|
is enabled:
|
|
|
|
|
|
|
|
hwlat_detector/width - time in usecs for how long to spin for
|
|
|
|
hwlat_detector/window - time in usecs between the start of each
|
|
|
|
iteration
|
|
|
|
|
|
|
|
A kernel thread is created that will spin with interrupts disabled
|
2017-06-13 04:06:59 -07:00
|
|
|
for "width" microseconds in every "window" cycle. It will not spin
|
2016-06-23 09:45:36 -07:00
|
|
|
for "window - width" microseconds, where the system can
|
|
|
|
continue to operate.
|
|
|
|
|
|
|
|
The output will appear in the trace and trace_pipe files.
|
|
|
|
|
|
|
|
When the tracer is not running, it has no effect on the system,
|
|
|
|
but when it is running, it can cause the system to be
|
|
|
|
periodically non-responsive. Do not run this tracer on a
|
|
|
|
production system.
|
|
|
|
|
|
|
|
To enable this tracer, echo "hwlat" into the current_tracer
|
|
|
|
file. Every time a latency is greater than tracing_thresh, it will
|
|
|
|
be recorded into the ring buffer.
|
|
|
|
|
trace: Add osnoise tracer
In the context of high-performance computing (HPC), the Operating System
Noise (*osnoise*) refers to the interference experienced by an application
due to activities inside the operating system. In the context of Linux,
NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
system. Moreover, hardware-related jobs can also cause noise, for example,
via SMIs.
The osnoise tracer leverages the hwlat_detector by running a similar
loop with preemption, SoftIRQs and IRQs enabled, thus allowing all
the sources of *osnoise* during its execution. Using the same approach
of hwlat, osnoise takes note of the entry and exit point of any
source of interferences, increasing a per-cpu interference counter. The
osnoise tracer also saves an interference counter for each source of
interference. The interference counter for NMI, IRQs, SoftIRQs, and
threads is increased anytime the tool observes these interferences' entry
events. When a noise happens without any interference from the operating
system level, the hardware noise counter increases, pointing to a
hardware-related noise. In this way, osnoise can account for any
source of interference. At the end of the period, the osnoise tracer
prints the sum of all noise, the max single noise, the percentage of CPU
available for the thread, and the counters for the noise sources.
Usage
Write the ASCII text "osnoise" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).
For example::
[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo osnoise > current_tracer
It is possible to follow the trace by reading the trace file::
[root@f32 tracing]# cat trace
# tracer: osnoise
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth                            MAX
#                              || /                                             SINGLE     Interference counters:
#                              ||||               RUNTIME      NOISE   % OF CPU  NOISE    +-----------------------------+
#           TASK-PID      CPU# ||||   TIMESTAMP    IN US       IN US  AVAILABLE  IN US     HW    NMI    IRQ   SIRQ THREAD
#              | |         |   ||||       |          |           |        |        |        |      |      |      |      |
           <...>-859     [000] ....    81.637220: 1000000        190  99.98100       9     18      0   1007     18      1
           <...>-860     [001] ....    81.638154: 1000000        656  99.93440      74     23      0   1006     16      3
           <...>-861     [002] ....    81.638193: 1000000       5675  99.43250     202      6      0   1013     25     21
           <...>-862     [003] ....    81.638242: 1000000        125  99.98750      45      1      0   1011     23      0
           <...>-863     [004] ....    81.638260: 1000000       1721  99.82790     168      7      0   1002     49     41
           <...>-864     [005] ....    81.638286: 1000000        263  99.97370      57      6      0   1006     26      2
           <...>-865     [006] ....    81.638302: 1000000        109  99.98910      21      3      0   1006     18      1
           <...>-866     [007] ....    81.638326: 1000000       7816  99.21840     107      8      0   1016     39     19
In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the
tracer prints a message at the end of each period for each CPU that is
running an osnoise/CPU thread. The osnoise specific fields report:
- The RUNTIME IN US reports the amount of time in microseconds that
the osnoise thread kept looping reading the time.
- The NOISE IN US reports the sum of noise in microseconds observed
by the osnoise tracer during the associated runtime.
- The % OF CPU AVAILABLE reports the percentage of CPU available for
the osnoise thread during the runtime window.
- The MAX SINGLE NOISE IN US reports the maximum single noise observed
during the runtime window.
- The Interference counters display how many times each of the respective
interferences happened during the runtime window.
Note that the example above shows a high number of HW noise samples.
The reason is that this sample was taken on a virtual machine,
and the host interference is detected as a hardware interference.
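The relation between the RUNTIME, NOISE and % OF CPU AVAILABLE columns can be checked with a few lines of Python. The formula below is an assumption inferred from the sample rows, not taken from the tracer source, and the function name cpu_available_percent is made up for this sketch.

```python
def cpu_available_percent(runtime_us: int, noise_us: int) -> float:
    """Share of the runtime window left to the osnoise thread, in percent."""
    return (runtime_us - noise_us) / runtime_us * 100.0


if __name__ == "__main__":
    # (RUNTIME IN US, NOISE IN US) taken from the first three sample rows.
    for runtime_us, noise_us in [(1000000, 190), (1000000, 656), (1000000, 5675)]:
        print(f"{cpu_available_percent(runtime_us, noise_us):.5f}")
    # prints 99.98100, 99.93440 and 99.43250, matching the trace above
```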
Tracer options
The tracer has a set of options inside the osnoise directory, they are:
- osnoise/cpus: CPUs at which an osnoise thread will execute.
- osnoise/period_us: the period of the osnoise thread.
- osnoise/runtime_us: how long an osnoise thread will look for noise.
- osnoise/stop_tracing_us: stop the system tracing if a single noise
higher than the configured value happens. Writing 0 disables this
option.
- osnoise/stop_tracing_total_us: stop the system tracing if total noise
higher than the configured value happens. Writing 0 disables this
option.
- tracing_threshold: the minimum delta between two time() reads to be
considered as noise, in us. When set to 0, the default value will
be used, which is currently 5 us.
Additional Tracing
In addition to the tracer, a set of tracepoints were added to
facilitate the identification of the osnoise source.
- osnoise:sample_threshold: printed anytime a noise is higher than
the configurable tolerance_ns.
- osnoise:nmi_noise: noise from NMI, including the duration.
- osnoise:irq_noise: noise from an IRQ, including the duration.
- osnoise:softirq_noise: noise from a SoftIRQ, including the
duration.
- osnoise:thread_noise: noise from a thread, including the duration.
Note that all the values are *net values*. For example, if while osnoise
is running, another thread preempts the osnoise thread, it will start a
thread_noise duration at the start. Then, an IRQ takes place, preempting
the thread_noise, starting an irq_noise. When the IRQ ends its execution,
it will compute its duration, and this duration will be subtracted from
the thread_noise, in such a way as to avoid the double accounting of the
IRQ execution. This logic is valid for all sources of noise.
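The net-value accounting can be sketched in Python as follows. This is only a model of the arithmetic, assuming interferences are either disjoint or properly nested and that each has a unique name; net_durations and the interval tuples are hypothetical and do not mirror the tracer's internal data structures.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (name, start_ns, end_ns)


def net_durations(intervals: List[Interval]) -> Dict[str, int]:
    """Return each interference's duration minus the time spent in the
    interferences that preempted it (directly nested intervals)."""
    # Earlier start first; for equal starts, the longer (outer) one first.
    ordered = sorted(intervals, key=lambda iv: (iv[1], -iv[2]))
    net: Dict[str, int] = {}
    open_stack: List[Interval] = []  # enclosing intervals, innermost last
    for name, start, end in ordered:
        # Drop intervals that already finished before this one started.
        while open_stack and open_stack[-1][2] <= start:
            open_stack.pop()
        net[name] = end - start
        if open_stack:
            # Charge this duration to itself only, not to what it preempted.
            net[open_stack[-1][0]] -= end - start
        open_stack.append((name, start, end))
    return net


if __name__ == "__main__":
    # A thread runs for 10000 ns and is preempted by a 2000 ns IRQ.
    print(net_durations([("thread_noise", 0, 10000), ("irq_noise", 3000, 5000)]))
    # prints {'thread_noise': 8000, 'irq_noise': 2000}
```

The net values add up to the total interrupted time (10000 ns here), which is the property that avoids double accounting.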
Here is one example of the usage of these tracepoints::
osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8723 ns interferences 2
In this example, a noise sample of 8 microseconds was reported in the last
line, pointing to two interferences. Looking backward in the trace, the
two previous entries were about the migration thread running after a
timer IRQ execution. The first event is not part of the noise because
it took place one millisecond before.
It is worth noticing that the sum of the durations reported in the
tracepoints is smaller than the eight us reported in the sample_threshold.
The reason lies in the overhead of the entry and exit code that happens
before and after any interference execution. This justifies the dual
approach: measuring thread and tracing.
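The gap between the sample and its interferences can be worked out from the four example lines. The Python sketch below parses them, keeps the events that start inside the sample_threshold window, and subtracts their durations. The helper names and the whitespace-based parsing are assumptions made for this illustration, not a general trace parser.

```python
TRACE = """\
osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8723 ns interferences 2
"""


def to_ns(stamp: str) -> int:
    """Convert 'seconds.nanoseconds' text to integer nanoseconds."""
    sec, frac = stamp.split(".")
    return int(sec) * 1_000_000_000 + int(frac.ljust(9, "0"))


def parse(line: str):
    """Return (event name, start in ns, duration in ns) for one trace line."""
    tok = line.split()
    i = tok.index("start")
    return tok[4].rstrip(":"), to_ns(tok[i + 1]), int(tok[i + 3])


def overhead_ns(trace: str) -> int:
    """Sample duration minus the interferences that started inside it."""
    events = [parse(line) for line in trace.splitlines() if line.strip()]
    _, s_start, s_dur = next(e for e in events if e[0] == "sample_threshold")
    inside = [dur for name, start, dur in events
              if name != "sample_threshold" and s_start <= start < s_start + s_dur]
    return s_dur - sum(inside)


if __name__ == "__main__":
    print(overhead_ns(TRACE))  # prints 2807
```

Only the second irq_noise (2848 ns) and the thread_noise (3068 ns) fall inside the sample, so 8723 - 5916 = 2807 ns remain unattributed, consistent with the entry and exit overhead mentioned above.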
Link: https://lkml.kernel.org/r/e649467042d60e7b62714c9c6751a56299d15119.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
[
Made the following functions static:
trace_irqentry_callback()
trace_irqexit_callback()
trace_intel_irqentry_callback()
trace_intel_irqexit_callback()
Added to include/trace.h:
osnoise_arch_register()
osnoise_arch_unregister()
Fixed define logic for LATENCY_FS_NOTIFY
Reported-by: kernel test robot <lkp@intel.com>
]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-22 07:42:27 -07:00
|
|
|
config OSNOISE_TRACER
|
|
|
|
bool "OS Noise tracer"
|
|
|
|
select GENERIC_TRACER
|
2022-12-06 07:18:01 -07:00
|
|
|
select TRACER_MAX_TRACE
|
2021-06-22 07:42:27 -07:00
|
|
|
help
|
|
|
|
In the context of high-performance computing (HPC), the Operating
|
|
|
|
System Noise (osnoise) refers to the interference experienced by an
|
|
|
|
application due to activities inside the operating system. In the
|
|
|
|
context of Linux, NMIs, IRQs, SoftIRQs, and any other system thread
|
|
|
|
can cause noise to the system. Moreover, hardware-related jobs can
|
|
|
|
also cause noise, for example, via SMIs.
|
|
|
|
|
|
|
|
The osnoise tracer leverages the hwlat_detector by running a similar
|
|
|
|
loop with preemption, SoftIRQs and IRQs enabled, thus allowing all
|
|
|
|
the sources of osnoise during its execution. The osnoise tracer takes
|
|
|
|
note of the entry and exit point of any source of interferences,
|
|
|
|
increasing a per-cpu interference counter. It saves an interference
|
|
|
|
counter for each source of interference. The interference counter for
|
|
|
|
NMI, IRQs, SoftIRQs, and threads is increased anytime the tool
|
|
|
|
observes these interferences' entry events. When a noise happens
|
|
|
|
without any interference from the operating system level, the
|
|
|
|
hardware noise counter increases, pointing to a hardware-related
|
|
|
|
noise. In this way, osnoise can account for any source of
|
|
|
|
interference. At the end of the period, the osnoise tracer prints
|
|
|
|
the sum of all noise, the max single noise, the percentage of CPU
|
|
|
|
available for the thread, and the counters for the noise sources.
|
|
|
|
|
|
|
|
In addition to the tracer, a set of tracepoints were added to
|
|
|
|
facilitate the identification of the osnoise source.
|
|
|
|
|
|
|
|
The output will appear in the trace and trace_pipe files.
|
|
|
|
|
|
|
|
To enable this tracer, echo "osnoise" into the current_tracer
|
|
|
|
file.
|
|
|
|
|
2021-06-22 07:42:28 -07:00
|
|
|
config TIMERLAT_TRACER
|
|
|
|
bool "Timerlat tracer"
|
|
|
|
select OSNOISE_TRACER
|
|
|
|
select GENERIC_TRACER
|
|
|
|
help
|
|
|
|
The timerlat tracer aims to help the preemptive kernel developers
|
|
|
|
to find sources of wakeup latencies of real-time threads.
|
|
|
|
|
|
|
|
The tracer creates a per-cpu kernel thread with real-time priority.
|
|
|
|
The tracer thread sets a periodic timer to wake itself up, and goes
|
|
|
|
to sleep waiting for the timer to fire. At the wakeup, the thread
|
|
|
|
then computes a wakeup latency value as the difference between
|
|
|
|
the current time and the absolute time that the timer was set
|
|
|
|
to expire.
|
|
|
|
|
|
|
|
The tracer prints two lines at every activation. The first is the
|
|
|
|
timer latency observed at the hardirq context before the
|
|
|
|
activation of the thread. The second is the timer latency observed
|
|
|
|
by the thread, which is the same level that cyclictest reports. The
|
|
|
|
ACTIVATION ID field serves to relate the irq execution to its
|
|
|
|
respective thread execution.
|
|
|
|
|
|
|
|
The tracer is built on top of the osnoise tracer, and the osnoise:
|
|
|
|
events can be used to trace the source of interference from NMI,
|
|
|
|
IRQs and other threads. It also enables the capture of the
|
|
|
|
stacktrace at the IRQ context, which helps to identify the code
|
|
|
|
path that can cause thread delay.
|
|
|
|
|
2020-01-29 14:26:45 -07:00
|
|
|
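The latency computation described for timerlat (current time minus the absolute time the timer was set to expire) has a close user-space analogue, similar in spirit to what cyclictest measures. The sketch below is only that analogue and not the tracer's code; the function name `measure_wakeup_latency_ns` is hypothetical, and it assumes a POSIX system providing `clock_nanosleep()` with `TIMER_ABSTIME`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t timespec_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Arm an absolute timer period_ns in the future, sleep until it fires,
 * and return how late the wakeup was in nanoseconds (-1 on error).
 */
static int64_t measure_wakeup_latency_ns(int64_t period_ns)
{
	struct timespec now, expiry;
	int64_t expiry_ns;

	assert(period_ns > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;

	expiry_ns = timespec_to_ns(&now) + period_ns;
	expiry.tv_sec = expiry_ns / NSEC_PER_SEC;
	expiry.tv_nsec = expiry_ns % NSEC_PER_SEC;

	/* Sleep until the absolute expiry time, not for a relative duration. */
	if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &expiry, NULL) != 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;

	/* Latency = current time - absolute time the timer was set to expire. */
	return timespec_to_ns(&now) - expiry_ns;
}
```

Calling `measure_wakeup_latency_ns(1000000)` in a loop gives a rough thread-level latency figure; unlike timerlat, it cannot report the hardirq-context latency.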
config MMIOTRACE
	bool "Memory mapped IO tracing"
	depends on HAVE_MMIOTRACE_SUPPORT && PCI
	select GENERIC_TRACER
	help
	  Mmiotrace traces Memory Mapped I/O access and is meant for
	  debugging and reverse engineering. It is called from the ioremap
	  implementation and works via page faults. Tracing is disabled by
	  default and can be enabled at run-time.

	  See Documentation/trace/mmiotrace.rst.
	  If you are not helping to develop drivers, say N.

config ENABLE_DEFAULT_TRACERS
	bool "Trace process context switches and events"
	depends on !GENERIC_TRACER
	select TRACING
	help
	  This tracer hooks to various trace points in the kernel,
	  allowing the user to pick and choose which trace point they
	  want to trace. It also includes the sched_switch tracer plugin.

config FTRACE_SYSCALLS
	bool "Trace syscalls"
	depends on HAVE_SYSCALL_TRACEPOINTS
	select GENERIC_TRACER
	select KALLSYMS
	help
	  Basic tracer to catch the syscall entry and exit events.

config TRACER_SNAPSHOT
	bool "Create a snapshot trace buffer"
	select TRACER_MAX_TRACE
	help
	  Allow tracing users to take a snapshot of the current buffer using
	  the ftrace interface, e.g.:

	      echo 1 > /sys/kernel/tracing/snapshot
	      cat snapshot

config TRACER_SNAPSHOT_PER_CPU_SWAP
	bool "Allow snapshot to swap per CPU"
	depends on TRACER_SNAPSHOT
	select RING_BUFFER_ALLOW_SWAP
	help
	  Allow doing a snapshot of a single CPU buffer instead of a
	  full swap (all buffers). If this is set, then the following is
	  allowed:

	      echo 1 > /sys/kernel/tracing/per_cpu/cpu2/snapshot

	  After which, only the tracing buffer for CPU 2 is swapped with
	  the main tracing buffer, and the other CPU buffers remain the same.

	  When this is enabled, this adds a little more overhead to the
	  trace recording, as it needs to add some checks to synchronize
	  recording with swaps. But this does not affect the performance
	  of the overall system. This is enabled by default when the preempt
	  or irq latency tracers are enabled, as those need to swap as well
	  and already add the overhead (plus a lot more).

config TRACE_BRANCH_PROFILING
	bool
	select GENERIC_TRACER

choice
	prompt "Branch Profiling"
	default BRANCH_PROFILE_NONE
	help
	  Branch profiling is a software profiler. It will add hooks
	  into the C conditionals to test which path a branch takes.

	  The likely/unlikely profiler only looks at the conditions that
	  are annotated with a likely or unlikely macro.

	  The "all branch" profiler will profile every if-statement in the
	  kernel. This profiler will also enable the likely/unlikely
	  profiler.

	  Either of the above profilers adds a bit of overhead to the system.
	  If unsure, choose "No branch profiling".

config BRANCH_PROFILE_NONE
	bool "No branch profiling"
	help
	  No branch profiling. Branch profiling adds a bit of overhead.
	  Only enable it if you want to analyse the branching behavior.
	  Otherwise keep it disabled.

config PROFILE_ANNOTATED_BRANCHES
	bool "Trace likely/unlikely profiler"
	select TRACE_BRANCH_PROFILING
	help
	  This tracer profiles all likely and unlikely macros
	  in the kernel. It will display the results in:

	  /sys/kernel/tracing/trace_stat/branch_annotated

	  Note: this will add a significant overhead; only turn this
	  on if you need to profile the system's use of these macros.

config PROFILE_ALL_BRANCHES
	bool "Profile all if conditionals" if !FORTIFY_SOURCE
	select TRACE_BRANCH_PROFILING
	help
	  This tracer profiles all branch conditions. Every if ()
	  taken in the kernel is recorded, along with whether it hit or
	  missed. The results will be displayed in:

	  /sys/kernel/tracing/trace_stat/branch_all

	  This option also enables the likely/unlikely profiler.

	  This configuration, when enabled, will impose a great overhead
	  on the system. This should only be enabled when the system
	  is to be analyzed in much detail.

endchoice

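The idea behind the likely/unlikely profiler (hooking an annotated conditional so that correct and incorrect predictions are counted) can be sketched in plain C. This is a simplified illustration and not the kernel's implementation: `struct branch_stat`, `record_branch`, `profiled_likely` and `profiled_unlikely` are hypothetical names, and the statistics are passed explicitly rather than stored per call site. It assumes a compiler that provides `__builtin_expect` (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-annotation counters. */
struct branch_stat {
	unsigned long correct;   /* outcome matched the annotation */
	unsigned long incorrect; /* outcome contradicted the annotation */
};

/* Record one evaluation and pass the condition's value through. */
static int record_branch(struct branch_stat *stat, int taken, int expected)
{
	assert(stat != NULL);
	if (taken == expected)
		stat->correct++;
	else
		stat->incorrect++;
	return taken;
}

/* Wrappers that keep the compiler hint and add the counting hook. */
#define profiled_likely(stat, x) \
	__builtin_expect(record_branch((stat), !!(x), 1), 1)
#define profiled_unlikely(stat, x) \
	__builtin_expect(record_branch((stat), !!(x), 0), 0)

/* Example caller: count non-zero entries, annotated as the likely case. */
static size_t count_nonzero(const int *values, size_t n,
			    struct branch_stat *stat)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (profiled_likely(stat, values[i] != 0))
			count++;
	return count;
}
```

For the input {1, 2, 0, 4}, the annotation is right three times and wrong once, which is the kind of hit/miss tally the profiler's output files report.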
config TRACING_BRANCHES
	bool
	help
	  Selected by tracers that will trace the likely and unlikely
	  conditions. This prevents the tracers themselves from being
	  profiled. Profiling the tracing infrastructure can only happen
	  when the likelys and unlikelys are not being traced.

config BRANCH_TRACER
	bool "Trace likely/unlikely instances"
	depends on TRACE_BRANCH_PROFILING
	select TRACING_BRANCHES
	help
	  This traces the events of likely and unlikely condition
	  calls in the kernel. The difference between this and the
	  "Trace likely/unlikely profiler" is that this is not a
	  histogram of the callers, but actually places the calling
	  events into a running trace buffer to see when and where the
	  events happened, as well as their results.

	  Say N if unsure.

config BLK_DEV_IO_TRACE
	bool "Support for tracing block IO actions"
	depends on SYSFS
	depends on BLOCK
	select RELAY
	select DEBUG_FS
	select TRACEPOINTS
	select GENERIC_TRACER
	select STACKTRACE
	help
	  Say Y here if you want to be able to trace the block layer actions
	  on a given queue. Tracing allows you to see any traffic happening
	  on a block device queue. For more information (and the userspace
	  support tools needed), fetch the blktrace tools from:

	  git://git.kernel.dk/blktrace.git

	  Tracing is also possible using the ftrace interface, e.g.:

	    echo 1 > /sys/block/sda/sda1/trace/enable
	    echo blk > /sys/kernel/tracing/current_tracer
	    cat /sys/kernel/tracing/trace_pipe

	  If unsure, say N.

config FPROBE_EVENTS
	depends on FPROBE
	depends on HAVE_REGS_AND_STACK_ACCESS_API
	bool "Enable fprobe-based dynamic events"
	select TRACING
	select PROBE_EVENTS
	select DYNAMIC_EVENTS
	default y
	help
	  This allows the user to add tracing events on function entry and
	  exit via the ftrace interface. The syntax is the same as for kprobe
	  events, and kprobe events on function entry and exit will be
	  transparently converted to these fprobe events.

config PROBE_EVENTS_BTF_ARGS
	depends on HAVE_FUNCTION_ARG_ACCESS_API
	depends on FPROBE_EVENTS || KPROBE_EVENTS
	depends on DEBUG_INFO_BTF && BPF_SYSCALL
	bool "Support BTF function arguments for probe events"
	default y
	help
	  The user can specify the arguments of the probe event using the names
	  of the arguments of the probed function, when the probe location is a
	  kernel function entry or a tracepoint.
	  This is available only if BTF (BPF Type Format) support is enabled.

config KPROBE_EVENTS
	depends on KPROBES
	depends on HAVE_REGS_AND_STACK_ACCESS_API
	bool "Enable kprobes-based dynamic events"
	select TRACING
	select PROBE_EVENTS
	select DYNAMIC_EVENTS
	default y
	help
	  This allows the user to add tracing events (similar to tracepoints)
	  on the fly via the ftrace interface. See
	  Documentation/trace/kprobetrace.rst for more details.

	  Those events can be inserted wherever kprobes can probe, and record
	  various register and memory values.

	  This option is also required by the perf-probe subcommand of perf
	  tools. If you want to use perf tools, this option is strongly
	  recommended.

config KPROBE_EVENTS_ON_NOTRACE
	bool "Do NOT protect notrace function from kprobe events"
	depends on KPROBE_EVENTS
	depends on DYNAMIC_FTRACE
	default n
	help
	  This is only for developers who want to debug ftrace itself
	  using kprobe events.

	  If kprobes can use ftrace instead of a breakpoint, ftrace-related
	  functions are protected from kprobe-events to prevent an infinite
	  recursion or any unexpected execution path which leads to a kernel
	  crash.

	  This option disables such protection and allows you to put kprobe
	  events on ftrace functions for debugging ftrace by itself.
	  Note that this might let you shoot yourself in the foot.

	  If unsure, say N.

config UPROBE_EVENTS
	bool "Enable uprobes-based dynamic events"
	depends on ARCH_SUPPORTS_UPROBES
	depends on MMU
	depends on PERF_EVENTS
	select UPROBES
	select PROBE_EVENTS
	select DYNAMIC_EVENTS
	select TRACING
	default y
	help
	  This allows the user to add tracing events on top of userspace
	  dynamic events (similar to tracepoints) on the fly via the trace
	  events interface. Those events can be inserted wherever uprobes
	  can probe, and record various registers.
	  This option is required if you plan to use the perf-probe subcommand
	  of perf tools on user space applications.

config BPF_EVENTS
	depends on BPF_SYSCALL
	depends on (KPROBE_EVENTS || UPROBE_EVENTS) && PERF_EVENTS
	bool
	default y
	help
	  This allows the user to attach BPF programs to kprobe, uprobe, and
	  tracepoint events.

config DYNAMIC_EVENTS
	def_bool n

config PROBE_EVENTS
	def_bool n

config BPF_KPROBE_OVERRIDE
	bool "Enable BPF programs to override a kprobed function"
	depends on BPF_EVENTS
	depends on FUNCTION_ERROR_INJECTION
	default n
	help
	  Allows BPF to override the execution of a probed function and
	  set a different return value. This is used for error injection.

ftrace: create __mcount_loc section
This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.
For example:
objdump -dr init/main.o
[...]
Disassembly of section .text:
0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]
We will add a section to point to each function call.
.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]
The offset of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.
.text + 0x185 == init_post + 0xa
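The conversion between a section-relative offset and a function-relative offset used above is plain subtraction, which a small C sketch can make explicit. The function name `section_to_symbol_offset` is hypothetical and exists only for this illustration; the addresses in the usage note are the ones from the objdump listing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the section-relative address of an mcount call site and the
 * section-relative start of the enclosing function, return the call
 * site's offset from the start of that function.
 */
static uint64_t section_to_symbol_offset(uint64_t call_site, uint64_t sym_start)
{
	assert(call_site >= sym_start);
	return call_site - sym_start;
}
```

With the relocation at 0x185 and init_post starting at 0x17b, the result is 0xa, matching `.text + 0x185 == init_post + 0xa`. Relative to do_one_initcall, which starts at 0, the same call site is at offset 0x185.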
We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.
To do this, we use the start of the first function in the section,
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.
.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]
Then we can compile the tmp.s into a tmp.o
gcc -c tmp.s -o tmp.o
And link it back into main.o.
ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o
But we have a problem. What happens if the first function in a section
is not exported, and is a static function? The linker will not let
the tmp.o use it. This case exists in main.o as well.
Disassembly of section .init.text:
0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc
The first function in .init.text is a static function.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.
.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices
We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worse, it could be exported someplace else,
and then we will have a reference to the wrong location.
To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.
Note: a well-documented perl script (scripts/recordmcount.pl) is used
to do all this in one location.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 12:45:07 -07:00

config FTRACE_MCOUNT_RECORD
	def_bool y
	depends on DYNAMIC_FTRACE
	depends on HAVE_FTRACE_MCOUNT_RECORD

config FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
	bool
	depends on FTRACE_MCOUNT_RECORD

config FTRACE_MCOUNT_USE_CC
	def_bool y
	depends on $(cc-option,-mrecord-mcount)
	depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
	depends on FTRACE_MCOUNT_RECORD

config FTRACE_MCOUNT_USE_OBJTOOL
	def_bool y
	depends on HAVE_OBJTOOL_MCOUNT
	depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
	depends on !FTRACE_MCOUNT_USE_CC
	depends on FTRACE_MCOUNT_RECORD
	select OBJTOOL

config FTRACE_MCOUNT_USE_RECORDMCOUNT
	def_bool y
	depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
	depends on !FTRACE_MCOUNT_USE_CC
	depends on !FTRACE_MCOUNT_USE_OBJTOOL
	depends on FTRACE_MCOUNT_RECORD

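The `depends on` lines of the four FTRACE_MCOUNT_USE_* options form a priority chain in which at most one option ends up enabled. The following C sketch restates that chain as a function. It is only a reading aid and not part of the build system; the enum, struct and function names are hypothetical, and the inputs stand in for the corresponding Kconfig symbols and the compiler check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mcount_method {
	MCOUNT_NONE,                     /* FTRACE_MCOUNT_RECORD is off */
	MCOUNT_PATCHABLE_FUNCTION_ENTRY,
	MCOUNT_CC,                       /* compiler supports -mrecord-mcount */
	MCOUNT_OBJTOOL,
	MCOUNT_RECORDMCOUNT,             /* fallback: recordmcount script */
};

/* Hypothetical inputs mirroring the symbols the options depend on. */
struct mcount_inputs {
	bool ftrace_mcount_record;
	bool use_patchable_function_entry;
	bool cc_has_mrecord_mcount;
	bool have_objtool_mcount;
};

/* Each later method applies only if every earlier one is unavailable. */
static enum mcount_method pick_mcount_method(const struct mcount_inputs *in)
{
	assert(in != NULL);
	if (!in->ftrace_mcount_record)
		return MCOUNT_NONE;
	if (in->use_patchable_function_entry)
		return MCOUNT_PATCHABLE_FUNCTION_ENTRY;
	if (in->cc_has_mrecord_mcount)
		return MCOUNT_CC;
	if (in->have_objtool_mcount)
		return MCOUNT_OBJTOOL;
	return MCOUNT_RECORDMCOUNT;
}
```

For example, when the compiler lacks `-mrecord-mcount` but objtool mcount support is present, the chain selects the objtool method, and recordmcount is used only when none of the other three applies.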
config TRACING_MAP
	bool
	depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
	help
	  tracing_map is a special-purpose lock-free map for tracing,
	  separated out as a stand-alone facility in order to allow it
	  to be shared between multiple tracers. It isn't meant to be
	  generally used outside of that context, and is normally
	  selected by tracers that use it.

config SYNTH_EVENTS
	bool "Synthetic trace events"
	select TRACING
	select DYNAMIC_EVENTS
	default n
	help
	  Synthetic events are user-defined trace events that can be
	  used to combine data from other trace events or in fact any
	  data source. Synthetic events can be generated indirectly
	  via the trace() action of histogram triggers or directly
	  by way of an in-kernel API.

	  See Documentation/trace/events.rst or
	  Documentation/trace/histogram.rst for details and examples.

	  If in doubt, say N.

user_events: Add minimal support for trace_event into ftrace
Minimal support for interacting with dynamic events, trace_event and
ftrace. Core outline of flow between user process, ioctl and trace_event
APIs.
User mode processes that wish to use trace events to get data into
ftrace, perf, eBPF, etc are limited to uprobes today. The user events
feature enables an ABI for user mode processes to create and write to
trace events that are isolated from kernel level trace events. This
enables a faster path for tracing from user mode data as well as opens
managed code to participate in trace events, where stub locations are
dynamic.
User processes often want to trace only when it's useful. To enable this
a set of pages are mapped into the user process space that indicate the
current state of the user events that have been registered. User
processes can check if their event is hooked to a trace/probe, and if it
is, emit the event data out via the write() syscall.
Two new files are introduced into tracefs to accomplish this:
user_events_status - This file is mmap'd into participating user mode
processes to indicate event status.
user_events_data - This file is opened and register/delete ioctl's are
issued to create/open/delete trace events that can be used for tracing.
The typical scenario is on process start to mmap user_events_status. Processes
then register the events they plan to use via the REG ioctl. The ioctl reads
and updates the passed in user_reg struct. The status_index of the struct is
used to know the byte in the status page to check for that event. The
write_index of the struct is used to describe that event when writing out to
the fd that was used for the ioctl call. The data must always include this
index first when writing out data for an event. Data can be written either by
write() or by writev().
For example, in memory:
int index;
char data[];
Pseudocode example of typical usage:
struct user_reg reg;
int page_fd = open("user_events_status", O_RDWR);
char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0);
close(page_fd);
int data_fd = open("user_events_data", O_RDWR);
reg.size = sizeof(reg);
reg.name_args = (__u64)"test";
ioctl(data_fd, DIAG_IOCSREG, &reg);
int status_id = reg.status_index;
int write_id = reg.write_index;
struct iovec io[2];
io[0].iov_base = &write_id;
io[0].iov_len = sizeof(write_id);
io[1].iov_base = payload;
io[1].iov_len = sizeof(payload);
if (page_data[status_id])
writev(data_fd, io, 2);
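The last step of the pseudocode (check the status byte, then write the index followed by the payload) can be isolated into a small function that compiles without the user_events header. This is a sketch under assumptions: `emit_if_enabled` is a hypothetical helper name, and the file descriptor can be any writable descriptor, so the layout can be checked against a pipe instead of the real user_events_data file.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Emit one event if its status byte is non-zero.
 * Returns 0 when the event is disabled, otherwise the writev() result.
 * The write index always goes first, followed by the event payload.
 */
static ssize_t emit_if_enabled(int data_fd, const volatile char *status_page,
			       int status_index, int write_index,
			       const void *payload, size_t payload_len)
{
	struct iovec io[2];

	assert(status_page != NULL && payload != NULL);

	/* Skip the syscall entirely when nothing is attached to the event. */
	if (!status_page[status_index])
		return 0;

	io[0].iov_base = &write_index;
	io[0].iov_len = sizeof(write_index);
	io[1].iov_base = (void *)payload; /* writev() does not modify it */
	io[1].iov_len = payload_len;

	return writev(data_fd, io, 2);
}
```

Gathering the index and payload with writev() avoids copying them into one contiguous buffer first; a single write() of a packed buffer with the same layout would be equivalent.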
User events are also exposed via the dynamic_events tracefs file for
both create and delete. Current status is exposed via the user_events_status
tracefs file.
Simple example to register a user event via dynamic_events:
echo u:test >> dynamic_events
cat dynamic_events
u:test
If an event is hooked to a probe, the probe hooked shows up:
echo 1 > events/user_events/test/enable
cat user_events_status
1:test # Used by ftrace
Active: 1
Busy: 1
Max: 4096
If an event is not hooked to a probe, no probe status shows up:
echo 0 > events/user_events/test/enable
cat user_events_status
1:test
Active: 1
Busy: 0
Max: 4096
Users can describe the trace event format via the following format:
name[:FLAG1[,FLAG2...]] [field1[;field2...]]
Each field has the following format:
type name
Example for char array with a size of 20 named msg:
echo 'u:detailed char[20] msg' >> dynamic_events
cat dynamic_events
u:detailed char[20] msg
Data offsets are based on the data written out via write() and will be
updated to reflect the correct offset in the trace_event fields. For dynamic
data it is recommended to use the new __rel_loc data type. This type will be
the same as __data_loc, but the offset is relative to this entry. This allows
user_events to not worry about what common fields are being inserted before
the data.
The above format is valid for both the ioctl and the dynamic_events file.
Link: https://lkml.kernel.org/r/20220118204326.2169-2-beaub@linux.microsoft.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-18 13:43:15 -07:00

config USER_EVENTS
	bool "User trace events"
	select TRACING
	select DYNAMIC_EVENTS
	help
	  User trace events are user-defined trace events that
	  can be used like an existing kernel trace event. User trace
	  events are generated by writing to a tracefs file. User
	  processes can determine if their tracing events should be
	  generated by registering a value and bit with the kernel
	  that reflects when it is enabled or not.

	  See Documentation/trace/user_events.rst.
user_events: Add minimal support for trace_event into ftrace
Minimal support for interacting with dynamic events, trace_event and
ftrace. Core outline of flow between user process, ioctl and trace_event
APIs.
User mode processes that wish to use trace events to get data into
ftrace, perf, eBPF, etc are limited to uprobes today. The user events
features enables an ABI for user mode processes to create and write to
trace events that are isolated from kernel level trace events. This
enables a faster path for tracing from user mode data as well as opens
managed code to participate in trace events, where stub locations are
dynamic.
User processes often want to trace only when it's useful. To enable this
a set of pages are mapped into the user process space that indicate the
current state of the user events that have been registered. User
processes can check if their event is hooked to a trace/probe, and if it
is, emit the event data out via the write() syscall.
Two new files are introduced into tracefs to accomplish this:
user_events_status - This file is mmap'd into participating user mode
processes to indicate event status.
user_events_data - This file is opened and register/delete ioctl's are
issued to create/open/delete trace events that can be used for tracing.
The typical scenario is on process start to mmap user_events_status. Processes
then register the events they plan to use via the REG ioctl. The ioctl reads
and updates the passed in user_reg struct. The status_index of the struct is
used to know the byte in the status page to check for that event. The
write_index of the struct is used to describe that event when writing out to
the fd that was used for the ioctl call. The data must always include this
index first when writing out data for an event. Data can be written either by
write() or by writev().
For example, in memory:
int index;
char data[];
Psuedo code example of typical usage:
struct user_reg reg;
int page_fd = open("user_events_status", O_RDWR);
char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0);
close(page_fd);
int data_fd = open("user_events_data", O_RDWR);
reg.size = sizeof(reg);
reg.name_args = (__u64)"test";
ioctl(data_fd, DIAG_IOCSREG, ®);
int status_id = reg.status_index;
int write_id = reg.write_index;
struct iovec io[2];
io[0].iov_base = &write_id;
io[0].iov_len = sizeof(write_id);
io[1].iov_base = payload;
io[1].iov_len = sizeof(payload);
if (page_data[status_id])
writev(data_fd, io, 2);
User events are also exposed via the dynamic_events tracefs file for
both create and delete. Current status is exposed via the user_events_status
tracefs file.
Simple example to register a user event via dynamic_events:
echo u:test >> dynamic_events
cat dynamic_events
u:test
If an event is hooked to a probe, the probe hooked shows up:
echo 1 > events/user_events/test/enable
cat user_events_status
1:test # Used by ftrace
Active: 1
Busy: 1
Max: 4096
If an event is not hooked to a probe, no probe status shows up:
echo 0 > events/user_events/test/enable
cat user_events_status
1:test
Active: 1
Busy: 0
Max: 4096
Users can describe the trace event format via the following format:
name[:FLAG1[,FLAG2...] [field1[;field2...]]
Each field has the following format:
type name
Example for char array with a size of 20 named msg:
echo 'u:detailed char[20] msg' >> dynamic_events
cat dynamic_events
u:detailed char[20] msg
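The description format above can be sketched as a small parser. This is an illustrative model in Python, not the kernel's parser: the function name parse_event_desc is hypothetical, the leading "u:" used by dynamic_events is assumed to have been stripped already, and each field is assumed to be a simple "type name" pair.

```python
def parse_event_desc(desc: str):
    """Split 'name[:FLAG1[,FLAG2...]] [field1[;field2...]]' into its parts."""
    head, _, rest = desc.strip().partition(" ")
    name, _, flag_str = head.partition(":")
    flags = [f for f in flag_str.split(",") if f]
    fields = []
    for raw in rest.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        # The last word is the field name; everything before it is the type.
        ftype, _, fname = raw.rpartition(" ")
        fields.append((ftype, fname))
    return name, flags, fields

print(parse_event_desc("detailed char[20] msg"))
```

Splitting on the last space keeps multi-word types such as "unsigned int" intact.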
Data offsets are based on the data written out via write() and will be
updated to reflect the correct offset in the trace_event fields. For dynamic
data it is recommended to use the new __rel_loc data type. This type will be
the same as __data_loc, but the offset is relative to this entry. This allows
user_events to not worry about what common fields are being inserted before
the data.
The above format is valid for both the ioctl and the dynamic_events file.
Link: https://lkml.kernel.org/r/20220118204326.2169-2-beaub@linux.microsoft.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-18 13:43:15 -07:00
|
|
|
If in doubt, say N.
|
|
|
|
|
tracing: Add 'hist' event trigger command
'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.
The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.
This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.
A surprising number of typical use cases can be accomplished by users
via this simple mechanism. In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.
The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event. By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system. See Documentation/trace/events.txt for examples.
hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.
The basic syntax follows the existing trigger syntax. Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:
# echo hist:keys=xxx [ if filter] > event/trigger
Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.
To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:
# cat event/hist
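The default aggregation can be modelled in a few lines. This is a conceptual sketch in Python, not the kernel implementation: the function name aggregate, the event dictionaries, and the field names are all made up for illustration.

```python
from collections import Counter

def aggregate(events, keys, event_filter=None):
    """Count events per key tuple, like a hist trigger's default 'hitcount'."""
    table = Counter()
    for ev in events:
        if event_filter is not None and not event_filter(ev):
            continue  # Events rejected by the optional filter are not counted.
        table[tuple(ev[k] for k in keys)] += 1
    return table

# Hypothetical events with made-up field names.
events = [
    {"comm": "bash", "bytes": 64},
    {"comm": "bash", "bytes": 4096},
    {"comm": "sshd", "bytes": 128},
]
hist = aggregate(events, keys=["comm"], event_filter=lambda ev: ev["bytes"] >= 100)
for key, hitcount in sorted(hist.items()):
    print(key, "hitcount:", hitcount)
```

Reading the 'hist' file corresponds to the final loop: it dumps whatever the table holds at that moment.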
The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.
Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-03 11:54:42 -07:00
|
|
|
config HIST_TRIGGERS
|
|
|
|
bool "Histogram triggers"
|
|
|
|
depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
|
|
|
|
select TRACING_MAP
|
2016-07-03 06:51:34 -07:00
|
|
|
select TRACING
|
2018-11-05 02:03:33 -07:00
|
|
|
select DYNAMIC_EVENTS
|
2020-05-28 12:32:37 -07:00
|
|
|
select SYNTH_EVENTS
|
2016-03-03 11:54:42 -07:00
|
|
|
default n
|
|
|
|
help
|
|
|
|
Hist triggers allow one or more arbitrary trace event fields
|
|
|
|
to be aggregated into hash tables and dumped to stdout by
|
|
|
|
reading a debugfs/tracefs file. They're useful for
|
|
|
|
gathering quick and dirty (though precise) summaries of
|
|
|
|
event activity as an initial guide for further investigation
|
|
|
|
using more advanced tools.
|
|
|
|
|
2018-01-15 19:52:10 -07:00
|
|
|
Inter-event tracing of quantities such as latencies is also
|
|
|
|
supported using hist triggers under this option.
|
|
|
|
|
2018-06-26 02:49:11 -07:00
|
|
|
See Documentation/trace/histogram.rst.
|
2016-03-03 11:54:42 -07:00
|
|
|
If in doubt, say N.
|
|
|
|
|
tracing: Introduce trace event injection
We have been trying to use rasdaemon to monitor hardware errors like
correctable memory errors. rasdaemon uses trace events to monitor
various hardware errors. To test it, we have to inject some
hardware errors, but unfortunately not all of them provide error
injection. MCE does provide a way to inject MCE errors, but errors
like PCI errors and devlink errors don't, and it is not easy to add
error injection to each of them. Instead, it is easier to just
allow users to inject trace events in a generic way so that all trace
events can be injected.
This patch introduces trace event injection, where a new 'inject' file is
added to each tracepoint directory. Users can write key=value pairs into
this file to specify the value of each field of the trace event; all
unspecified fields are set to zero by default.
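The key=value handling can be sketched as follows. This is a simplified model in Python, not the kernel parser: the function name build_event and the FIELDS table are hypothetical, and a real event takes its fields from the event's format file.

```python
import shlex

def build_event(fields, text):
    """Apply 'key=value' pairs to an event whose fields all start zeroed."""
    # Unspecified fields keep a zero value: 0 for ints, "" for strings.
    event = {name: ftype() for name, ftype in fields.items()}
    for pair in shlex.split(text):  # shlex strips the quotes around 'test'
        key, sep, value = pair.partition("=")
        if not sep or key not in fields:
            raise ValueError(f"bad field assignment: {pair!r}")
        event[key] = fields[key](value)
    return event

# Hypothetical field table; the real fields come from the event's format file.
FIELDS = {"name": str, "len": int}
print(build_event(FIELDS, ""))
print(build_event(FIELDS, "name='test' len=1024"))
```

An empty write still produces an event; every field simply keeps its zero value.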
For example, for the net/net_dev_queue tracepoint, we can inject:
INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject
echo "" > $INJECT
echo "name='test'" > $INJECT
echo "name='test' len=1024" > $INJECT
cat /sys/kernel/debug/tracing/trace
...
<...>-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
<...>-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
<...>-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
Triggers fire as usual too:
echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger
echo "len=1025" > $INJECT
cat /sys/kernel/debug/tracing/trace
...
bash-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
bash-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
bash-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
bash-614 [001] .N.1 284.236349: <stack trace>
=> event_inject_write
=> vfs_write
=> ksys_write
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
The only things that can't be injected are string pointers: they must
point to constant strings, which can't be set up at run time.
Link: http://lkml.kernel.org/r/20191130045218.18979-1-xiyou.wangcong@gmail.com
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-29 21:52:18 -07:00
|
|
|
config TRACE_EVENT_INJECT
|
|
|
|
bool "Trace event injection"
|
|
|
|
depends on TRACING
|
|
|
|
help
|
|
|
|
Allow user-space to inject a specific trace event into the ring
|
|
|
|
buffer. This is mainly used for testing purposes.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2014-05-29 19:49:07 -07:00
|
|
|
config TRACEPOINT_BENCHMARK
|
2019-11-20 06:38:07 -07:00
|
|
|
bool "Add tracepoint that benchmarks tracepoints"
|
2014-05-29 19:49:07 -07:00
|
|
|
help
|
|
|
|
This option creates the tracepoint "benchmark:benchmark_event".
|
|
|
|
When the tracepoint is enabled, it kicks off a kernel thread that
|
2021-03-02 01:49:28 -07:00
|
|
|
goes into an infinite loop (calling cond_resched() to let other tasks
|
2014-05-29 19:49:07 -07:00
|
|
|
run), and calls the tracepoint. Each iteration will record the time
|
|
|
|
it took to write to the tracepoint and the next iteration that
|
|
|
|
data will be passed to the tracepoint itself. That is, the tracepoint
|
|
|
|
will report the time it took to do the previous tracepoint.
|
|
|
|
The string written to the tracepoint is a static string of 128 bytes
|
|
|
|
to keep the time the same. The initial string is simply a write of
|
|
|
|
"START". The second string records the cold cache time of the first
|
|
|
|
write which is not added to the rest of the calculations.
|
|
|
|
|
|
|
|
As it is a tight loop, it benchmarks as hot cache. That's fine because
|
|
|
|
we care most about hot paths that are probably in cache already.
|
|
|
|
|
|
|
|
An example of the output:
|
|
|
|
|
|
|
|
START
|
|
|
|
first=3672 [COLD CACHED]
|
|
|
|
last=632 first=3672 max=632 min=632 avg=316 std=446 std^2=199712
|
|
|
|
last=278 first=3672 max=632 min=278 avg=303 std=316 std^2=100337
|
|
|
|
last=277 first=3672 max=632 min=277 avg=296 std=258 std^2=67064
|
|
|
|
last=273 first=3672 max=632 min=273 avg=292 std=224 std^2=50411
|
|
|
|
last=273 first=3672 max=632 min=273 avg=288 std=200 std^2=40389
|
|
|
|
last=281 first=3672 max=632 min=273 avg=287 std=183 std^2=33666
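The statistics in the sample output can be reproduced with the sketch below. The arithmetic is inferred from the numbers shown above rather than quoted from the kernel source, and the function name benchmark_lines is hypothetical. It assumes the cold first run is counted in the number of samples but excluded from the sums, and that all divisions and the square root are truncated to integers.

```python
from math import isqrt

def benchmark_lines(samples):
    """Yield summary lines for successive timings; samples[0] is the cold run."""
    first = samples[0]
    yield f"first={first} [COLD CACHED]"
    total = total_sq = 0
    hi, lo = 0, None
    # n counts the cold run too, so the first hot sample has n == 2.
    for n, last in enumerate(samples[1:], start=2):
        total += last
        total_sq += last * last
        hi = max(hi, last)
        lo = last if lo is None else min(lo, last)
        avg = total // n
        # Sample variance (n*sum(x^2) - sum(x)^2) / (n*(n-1)), integer math.
        var = (n * total_sq - total * total) // (n * (n - 1))
        yield (f"last={last} first={first} max={hi} min={lo} "
               f"avg={avg} std={isqrt(var)} std^2={var}")

for line in benchmark_lines([3672, 632, 278, 277]):
    print(line)
```

Under these assumptions the generated lines match the first rows of the example, which is why avg starts at 316 (632 divided by two samples) rather than 632.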
|
|
|
|
|
|
|
|
|
2009-05-05 19:47:18 -07:00
|
|
|
config RING_BUFFER_BENCHMARK
|
|
|
|
tristate "Ring buffer benchmark stress tester"
|
|
|
|
depends on RING_BUFFER
|
|
|
|
help
|
2009-12-21 13:01:17 -07:00
|
|
|
This option creates a test to stress the ring buffer and benchmark it.
|
|
|
|
It creates its own ring buffer such that it will not interfere with
|
2009-05-05 19:47:18 -07:00
|
|
|
any other users of the ring buffer (such as ftrace). It then creates
|
|
|
|
a producer and consumer that will run for 10 seconds and sleep for
|
|
|
|
10 seconds. Each interval it will print out the number of events
|
|
|
|
it recorded and give a rough estimate of how long each iteration took.
|
|
|
|
|
|
|
|
It does not disable interrupts or raise its priority, so it may be
|
|
|
|
affected by processes that are running.
|
|
|
|
|
2009-12-21 13:01:17 -07:00
|
|
|
If unsure, say N.
|
2009-05-05 19:47:18 -07:00
|
|
|
|
2020-01-29 14:30:30 -07:00
|
|
|
config TRACE_EVAL_MAP_FILE
|
|
|
|
bool "Show eval mappings for trace events"
|
|
|
|
depends on TRACING
|
|
|
|
help
|
|
|
|
The "print fmt" of the trace events will show the enum/sizeof names
|
|
|
|
instead of their values. This can cause problems for user space tools
|
|
|
|
that use this string to parse the raw data as user space does not know
|
|
|
|
how to convert the string to its value.
|
|
|
|
|
|
|
|
To fix this, there's a special macro in the kernel that can be used
|
|
|
|
to convert an enum/sizeof into its value. If this macro is used, then
|
|
|
|
the print fmt strings will be converted to their values.
|
|
|
|
|
|
|
|
If something does not get converted properly, this option can be
|
|
|
|
used to show what enums/sizeof the kernel tried to convert.
|
|
|
|
|
|
|
|
This option is for debugging the conversions. A file is created
|
|
|
|
in the tracing directory called "eval_map" that will show the
|
|
|
|
names matched with their values and what trace event system they
|
|
|
|
belong to.
|
|
|
|
|
|
|
|
Normally, the mapping of the strings to values will be freed after
|
|
|
|
boot up or module load. With this option, they will not be freed, as
|
|
|
|
they are needed for the "eval_map" file. Enabling this option will
|
|
|
|
increase the memory footprint of the running kernel.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-11-05 19:32:46 -07:00
|
|
|
config FTRACE_RECORD_RECURSION
|
|
|
|
bool "Record functions that recurse in function tracing"
|
|
|
|
depends on FUNCTION_TRACER
|
|
|
|
help
|
|
|
|
All callbacks that attach to the function tracing have some sort
|
|
|
|
of protection against recursion. Even though the protection exists,
|
|
|
|
it adds overhead. This option will create a file in the tracefs
|
|
|
|
file system called "recursed_functions" that will list the functions
|
|
|
|
that triggered a recursion.
|
|
|
|
|
|
|
|
This will add more overhead to cases that have recursion.
|
|
|
|
|
|
|
|
If unsure, say N
|
|
|
|
|
|
|
|
config FTRACE_RECORD_RECURSION_SIZE
|
|
|
|
int "Max number of recursed functions to record"
|
2024-03-22 05:18:01 -07:00
|
|
|
default 128
|
2020-11-05 19:32:46 -07:00
|
|
|
depends on FTRACE_RECORD_RECURSION
|
|
|
|
help
|
|
|
|
This defines the limit of number of functions that can be
|
|
|
|
listed in the "recursed_functions" file, that lists all
|
|
|
|
the functions that caused a recursion to happen.
|
|
|
|
This file can be reset, but the limit can not change in
|
|
|
|
size at runtime.
|
|
|
|
|
2024-04-18 12:09:08 -07:00
|
|
|
config FTRACE_VALIDATE_RCU_IS_WATCHING
|
|
|
|
bool "Validate RCU is on during ftrace execution"
|
|
|
|
depends on FUNCTION_TRACER
|
|
|
|
depends on ARCH_WANTS_NO_INSTR
|
|
|
|
help
|
|
|
|
All callbacks that attach to the function tracing have some sort of
|
|
|
|
protection against recursion. This option is only to verify that
|
|
|
|
ftrace (and other users of ftrace_test_recursion_trylock()) are not
|
|
|
|
called outside of RCU, as if they are, it can cause a race. But it
|
|
|
|
also has a noticeable overhead when enabled.
|
|
|
|
|
|
|
|
If unsure, say N
|
|
|
|
|
2020-11-02 12:43:10 -07:00
|
|
|
config RING_BUFFER_RECORD_RECURSION
|
|
|
|
bool "Record functions that recurse in the ring buffer"
|
|
|
|
depends on FTRACE_RECORD_RECURSION
|
|
|
|
# default y, because it is coupled with FTRACE_RECORD_RECURSION
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
The ring buffer has its own internal recursion. When
|
2023-01-24 11:16:47 -07:00
|
|
|
recursion happens it won't cause harm because of the protection,
|
|
|
|
but it does cause unwanted overhead. Enabling this option will
|
2020-11-02 12:43:10 -07:00
|
|
|
record where recursion was detected in the ftrace "recursed_functions"
|
|
|
|
file.
|
|
|
|
|
|
|
|
This will add more overhead to cases that have recursion.
|
|
|
|
|
2020-01-29 14:30:30 -07:00
|
|
|
config GCOV_PROFILE_FTRACE
|
|
|
|
bool "Enable GCOV profiling on ftrace subsystem"
|
|
|
|
depends on GCOV_KERNEL
|
|
|
|
help
|
|
|
|
Enable GCOV profiling on ftrace subsystem for checking
|
|
|
|
which functions/lines are tested.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
|
|
|
Note that on a kernel compiled with this config, ftrace will
|
|
|
|
run significantly slower.
|
|
|
|
|
|
|
|
config FTRACE_SELFTEST
|
|
|
|
bool
|
|
|
|
|
|
|
|
config FTRACE_STARTUP_TEST
|
|
|
|
bool "Perform a startup test on ftrace"
|
|
|
|
depends on GENERIC_TRACER
|
|
|
|
select FTRACE_SELFTEST
|
|
|
|
help
|
|
|
|
This option performs a series of startup tests on ftrace. On bootup
|
|
|
|
a series of tests are made to verify that the tracer is
|
|
|
|
functioning properly. It will do tests on all the configured
|
|
|
|
tracers of ftrace.
|
|
|
|
|
|
|
|
config EVENT_TRACE_STARTUP_TEST
|
|
|
|
bool "Run selftest on trace events"
|
|
|
|
depends on FTRACE_STARTUP_TEST
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
This option performs a test on all trace events in the system.
|
|
|
|
It basically just enables each event and runs some code that
|
|
|
|
will trigger events (not necessarily the event it enables).
|
|
|
|
This may take some time to run as there are a lot of events.
|
|
|
|
|
|
|
|
config EVENT_TRACE_TEST_SYSCALLS
|
|
|
|
bool "Run selftest on syscall events"
|
|
|
|
depends on EVENT_TRACE_STARTUP_TEST
|
|
|
|
help
|
|
|
|
This option will also enable testing every syscall event.
|
|
|
|
It only enables the event and disables it and runs various loads
|
|
|
|
with the event enabled. This adds a bit more time for kernel boot
|
|
|
|
up since it runs this on every system call defined.
|
|
|
|
|
|
|
|
TBD - enable a way to actually call the syscalls as we test their
|
|
|
|
events
|
|
|
|
|
2021-12-06 13:18:58 -07:00
|
|
|
config FTRACE_SORT_STARTUP_TEST
|
|
|
|
bool "Verify compile time sorting of ftrace functions"
|
|
|
|
depends on DYNAMIC_FTRACE
|
2022-01-22 07:17:10 -07:00
|
|
|
depends on BUILDTIME_MCOUNT_SORT
|
2021-12-06 13:18:58 -07:00
|
|
|
help
|
|
|
|
Sorting of the mcount_loc section that ftrace uses to find
|
|
|
|
where to patch functions for tracing
|
|
|
|
and other callbacks is done at compile time. But if the sort
|
|
|
|
is not done correctly, it will cause non-deterministic failures.
|
|
|
|
When this is set, the sorted sections are checked to verify that they
|
|
|
|
are indeed sorted, and a warning is issued if they are not.
|
|
|
|
|
|
|
|
If unsure, say N
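The kind of check this option describes can be illustrated with a short sketch. It is a conceptual model in Python, not the kernel's startup test: the function name first_unsorted and the sample addresses are hypothetical stand-ins for mcount_loc entries.

```python
def first_unsorted(addrs):
    """Return the index of the first out-of-order entry, or None if sorted."""
    for i in range(1, len(addrs)):
        if addrs[i - 1] > addrs[i]:
            return i
    return None

# Hypothetical addresses standing in for mcount_loc entries.
print(first_unsorted([0x1000, 0x1040, 0x1100]))
print(first_unsorted([0x1000, 0x1100, 0x1040]))
```

A single linear pass is enough to confirm ordering, so a check of this shape is cheap compared with redoing the sort.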
|
|
|
|
|
2013-03-15 08:32:53 -07:00
|
|
|
config RING_BUFFER_STARTUP_TEST
|
|
|
|
bool "Ring buffer startup self test"
|
|
|
|
depends on RING_BUFFER
|
|
|
|
help
|
2019-11-20 06:38:07 -07:00
|
|
|
Run a simple self test on the ring buffer on boot up. Late in the
|
2013-03-15 08:32:53 -07:00
|
|
|
kernel boot sequence, the test will start that kicks off
|
|
|
|
a thread per cpu. Each thread will write various size events
|
|
|
|
into the ring buffer. Another thread is created to send IPIs
|
|
|
|
to each of the threads, where the IPI handler will also write
|
|
|
|
to the ring buffer, to test/stress the nesting ability.
|
|
|
|
If any anomalies are discovered, a warning will be displayed
|
|
|
|
and all ring buffers will be disabled.
|
|
|
|
|
|
|
|
The test runs for 10 seconds. This will slow your boot time
|
|
|
|
by at least 10 more seconds.
|
|
|
|
|
2023-01-24 11:16:47 -07:00
|
|
|
At the end of the test, statistics and more checks are done.
|
|
|
|
It will output the stats of each per cpu buffer: What
|
2013-03-15 08:32:53 -07:00
|
|
|
was written, the sizes, what was read, what was lost, and
|
|
|
|
other similar details.
|
|
|
|
|
|
|
|
If unsure, say N
|
|
|
|
|
2020-11-30 21:37:33 -07:00
|
|
|
config RING_BUFFER_VALIDATE_TIME_DELTAS
|
|
|
|
bool "Verify ring buffer time stamp deltas"
|
|
|
|
depends on RING_BUFFER
|
|
|
|
help
|
|
|
|
This will audit the time stamps on the ring buffer sub
|
|
|
|
buffer to make sure that all the time deltas for the
|
|
|
|
events on a sub buffer match the current time stamp.
|
|
|
|
This audit is performed for every event that is not
|
|
|
|
interrupted, or interrupting another event. A check
|
|
|
|
is also made when traversing sub buffers to make sure
|
|
|
|
that all the deltas on the previous sub buffer do not
|
|
|
|
add up to be greater than the current time stamp.
|
|
|
|
|
|
|
|
NOTE: This adds significant overhead to recording of events,
|
|
|
|
and should only be used to test the logic of the ring buffer.
|
|
|
|
Do not use it on production systems.
|
|
|
|
|
|
|
|
Only say Y if you understand what this does, and you
|
|
|
|
still want it enabled. Otherwise say N
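The two audits described above can be sketched as simple sums. This is an assumption-laden model in Python, not the ring buffer code: it assumes each sub buffer has a base time stamp and each event stores a delta from the previous event, and both function names are hypothetical.

```python
def check_sub_buffer(base_ts, deltas, current_ts):
    """Return True if base_ts plus all event deltas equals current_ts."""
    return base_ts + sum(deltas) == current_ts

def check_previous_sub_buffer(base_ts, deltas, current_ts):
    """Return True if the previous sub buffer's deltas do not overshoot current_ts."""
    return base_ts + sum(deltas) <= current_ts

# Hypothetical time stamps and deltas, in arbitrary units.
print(check_sub_buffer(1000, [5, 7, 3], 1015))
print(check_previous_sub_buffer(900, [40, 70], 1000))
```

Summing every delta for each event is what makes a check like this expensive, which fits the overhead warning in the help text.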
|
|
|
|
|
2020-01-29 14:23:04 -07:00
|
|
|
config MMIOTRACE_TEST
|
|
|
|
tristate "Test module for mmiotrace"
|
|
|
|
depends on MMIOTRACE && m
|
|
|
|
help
|
|
|
|
This is a dumb module for testing mmiotrace. It is very dangerous
|
|
|
|
as it will write garbage to IO memory starting at a given address.
|
|
|
|
However, it should be safe to use on e.g. unused portion of VRAM.
|
|
|
|
|
|
|
|
Say N, unless you absolutely know what you are doing.
|
|
|
|
|
2018-07-12 14:36:11 -07:00
|
|
|
config PREEMPTIRQ_DELAY_TEST
|
2020-01-29 14:23:04 -07:00
|
|
|
tristate "Test module to create a preempt / IRQ disable delay thread to test latency tracers"
|
2018-07-12 14:36:11 -07:00
|
|
|
depends on m
|
|
|
|
help
|
|
|
|
Select this option to build a test module that can help test latency
|
|
|
|
tracers by executing a preempt or irq disable section with a user
|
|
|
|
configurable delay. The module busy waits for the duration of the
|
|
|
|
critical section.
|
|
|
|
|
2019-10-08 15:08:22 -07:00
|
|
|
For example, the following invocation generates a burst of three
|
|
|
|
irq-disabled critical sections for 500us:
|
|
|
|
modprobe preemptirq_delay_test test_mode=irq delay=500 burst_size=3
|
2018-07-12 14:36:11 -07:00
|
|
|
|
2021-01-27 18:35:13 -07:00
|
|
|
What's more, if you want to attach the test on the cpu which the latency
|
|
|
|
tracer is running on, specify cpu_affinity=cpu_num at the end of the
|
|
|
|
command.
|
|
|
|
|
2018-07-12 14:36:11 -07:00
|
|
|
If unsure, say N
|
|
|
|
|
2020-01-29 11:59:28 -07:00
|
|
|
config SYNTH_EVENT_GEN_TEST
|
|
|
|
tristate "Test module for in-kernel synthetic event generation"
|
2024-06-11 06:30:37 -07:00
|
|
|
depends on SYNTH_EVENTS && m
|
2020-01-29 11:59:28 -07:00
|
|
|
help
|
|
|
|
This option creates a test module to check the base
|
|
|
|
functionality of in-kernel synthetic event definition and
|
|
|
|
generation.
|
|
|
|
|
|
|
|
To test, insert the module, and then check the trace buffer
|
|
|
|
for the generated sample events.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-01-29 11:59:31 -07:00
|
|
|
config KPROBE_EVENT_GEN_TEST
|
|
|
|
tristate "Test module for in-kernel kprobe event generation"
|
2024-06-11 06:30:37 -07:00
|
|
|
depends on KPROBE_EVENTS && m
|
2020-01-29 11:59:31 -07:00
|
|
|
help
|
|
|
|
This option creates a test module to check the base
|
|
|
|
functionality of in-kernel kprobe event definition.
|
|
|
|
|
|
|
|
To test, insert the module, and then check the trace buffer
|
|
|
|
for the generated kprobe events.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-04-03 12:31:21 -07:00
|
|
|
config HIST_TRIGGERS_DEBUG
|
|
|
|
bool "Hist trigger debug support"
|
|
|
|
depends on HIST_TRIGGERS
|
|
|
|
help
|
|
|
|
Add "hist_debug" file for each event, which when read will
|
|
|
|
dump out a bunch of internal details about the hist triggers
|
|
|
|
defined on that event.
|
|
|
|
|
|
|
|
The hist_debug file serves a couple of purposes:
|
|
|
|
|
|
|
|
- Helps developers verify that nothing is broken.
|
|
|
|
|
|
|
|
- Provides educational information to support the details
|
|
|
|
of the hist trigger internals as described by
|
|
|
|
Documentation/trace/histogram-design.rst.
|
|
|
|
|
|
|
|
The hist_debug output only covers the data structures
|
|
|
|
related to the histogram definitions themselves and doesn't
|
|
|
|
display the internals of map buckets or variable values of
|
|
|
|
running histograms.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
rv: Add Runtime Verification (RV) interface
RV is a lightweight (yet rigorous) method that complements classical
exhaustive verification techniques (such as model checking and
theorem proving) with a more practical approach to complex systems.
RV works by analyzing the trace of the system's actual execution,
comparing it against a formal specification of the system behavior.
RV can give precise information on the runtime behavior of the
monitored system while enabling the reaction for unexpected
events, avoiding, for example, the propagation of a failure on
safety-critical systems.
This interface has its roots in the work behind the
paper:
De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo
Silva. Efficient formal verification for the Linux kernel. In:
International Conference on Software Engineering and Formal Methods.
Springer, Cham, 2019. p. 315-332.
And:
De Oliveira, Daniel Bristot. Automata-based formal analysis
and verification of the real-time Linux kernel. PhD Thesis, 2020.
The RV interface resembles the tracing/ interface on purpose. The current
path for the RV interface is /sys/kernel/tracing/rv/.
It presents these files:
"available_monitors"
- List the available monitors, one per line.
For example:
# cat available_monitors
wip
wwnr
"enabled_monitors"
- Lists the enabled monitors, one per line;
- Writing to it enables a given monitor;
- Writing a monitor name with a '!' prefix disables it;
- Truncating the file disables all enabled monitors.
For example:
# cat enabled_monitors
# echo wip > enabled_monitors
# echo wwnr >> enabled_monitors
# cat enabled_monitors
wip
wwnr
# echo '!wip' >> enabled_monitors
# cat enabled_monitors
wwnr
# echo > enabled_monitors
# cat enabled_monitors
#
Note that more than one monitor can be enabled concurrently.
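The write semantics of "enabled_monitors" can be modelled as below. This is a toy model in Python of the behaviour listed above, not the RV implementation: the class EnabledMonitors and its methods are hypothetical, with truncate=True standing in for a shell '>' redirect and the default for '>>'.

```python
class EnabledMonitors:
    """Toy model of the enabled_monitors file semantics described above."""

    def __init__(self, available):
        self.available = list(available)  # keeps listing order stable
        self.enabled = set()

    def write(self, text, truncate=False):
        """Model `echo text > file` (truncate=True) or `echo text >> file`."""
        if truncate:
            self.enabled.clear()  # truncating disables every monitor
        name = text.strip()
        if not name:
            return
        disable = name.startswith("!")
        name = name.lstrip("!")
        if name not in self.available:
            raise ValueError(f"unknown monitor: {name}")
        if disable:
            self.enabled.discard(name)
        else:
            self.enabled.add(name)

    def read(self):
        return [m for m in self.available if m in self.enabled]

mons = EnabledMonitors(["wip", "wwnr"])
mons.write("wip", truncate=True)
mons.write("wwnr")
mons.write("!wip")
print(mons.read())
```

The sequence at the end mirrors the shell session above and leaves only wwnr enabled.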
"monitoring_on"
- It is a general on/off switch for monitoring. Note
that it does not disable enabled monitors or detach events,
but stops the per-entity monitors from monitoring the events
received from the system. It resembles the "tracing_on" switch.
"monitors/"
Each monitor will have its own directory inside "monitors/". There
the monitor-specific files are presented.
The "monitors/" directory resembles the "events" directory on
tracefs.
For example:
# cd monitors/wip/
# ls
desc enable
# cat desc
wakeup in preemptive per-cpu testing monitor.
# cat enable
0
For further information, see the comments in the header of
kernel/trace/rv/rv.c from this patch.
Link: https://lkml.kernel.org/r/a4bfe038f50cb047bfb343ad0e12b0e646ab308b.1659052063.git.bristot@kernel.org
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-29 02:38:40 -07:00
|
|
|
source "kernel/trace/rv/Kconfig"
|
|
|
|
|
2009-04-20 07:47:36 -07:00
|
|
|
endif # FTRACE
|