1f0d69a9fc
Impact: new unlikely/likely profiler Andrew Morton recently suggested having an in-kernel way to profile likely and unlikely macros. This patch achieves that goal. When configured, every(*) likely and unlikely macro gets a counter attached to it. When the condition is hit, the hit and misses of that condition are recorded. These numbers can later be retrieved by: /debugfs/tracing/profile_likely - All likely markers /debugfs/tracing/profile_unlikely - All unlikely markers. # cat /debug/tracing/profile_unlikely | head correct incorrect % Function File Line ------- --------- - -------- ---- ---- 2167 0 0 do_arch_prctl process_64.c 832 0 0 0 do_arch_prctl process_64.c 804 2670 0 0 IS_ERR err.h 34 71230 5693 7 __switch_to process_64.c 673 76919 0 0 __switch_to process_64.c 639 43184 33743 43 __switch_to process_64.c 624 12740 64181 83 __switch_to process_64.c 594 12740 64174 83 __switch_to process_64.c 590 # cat /debug/tracing/profile_unlikely | \ awk '{ if ($3 > 25) print $0; }' |head -20 44963 35259 43 __switch_to process_64.c 624 12762 67454 84 __switch_to process_64.c 594 12762 67447 84 __switch_to process_64.c 590 1478 595 28 syscall_get_error syscall.h 51 0 2821 100 syscall_trace_leave ptrace.c 1567 0 1 100 native_smp_prepare_cpus smpboot.c 1237 86338 265881 75 calc_delta_fair sched_fair.c 408 210410 108540 34 calc_delta_mine sched.c 1267 0 54550 100 sched_info_queued sched_stats.h 222 51899 66435 56 pick_next_task_fair sched_fair.c 1422 6 10 62 yield_task_fair sched_fair.c 982 7325 2692 26 rt_policy sched.c 144 0 1270 100 pre_schedule_rt sched_rt.c 1261 1268 48073 97 pick_next_task_rt sched_rt.c 884 0 45181 100 sched_info_dequeued sched_stats.h 177 0 15 100 sched_move_task sched.c 8700 0 15 100 sched_move_task sched.c 8690 53167 33217 38 schedule sched.c 4457 0 80208 100 sched_info_switch sched_stats.h 270 30585 49631 61 context_switch sched.c 2619 # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }' 39900 36577 47 pick_next_task sched.c 4397 20824 15233 42 switch_mm mmu_context_64.h 18 0 7 100 __cancel_work_timer workqueue.c 560 617 66484 99 clocksource_adjust timekeeping.c 456 0 346340 100 audit_syscall_exit auditsc.c 1570 38 347350 99 audit_get_context auditsc.c 732 0 345244 100 audit_syscall_entry auditsc.c 1541 38 1017 96 audit_free auditsc.c 1446 0 1090 100 audit_alloc auditsc.c 862 2618 1090 29 audit_alloc auditsc.c 858 0 6 100 move_masked_irq migration.c 9 1 198 99 probe_sched_wakeup trace_sched_switch.c 58 2 2 50 probe_wakeup trace_sched_wakeup.c 227 0 2 100 probe_wakeup_sched_switch trace_sched_wakeup.c 144 4514 2090 31 __grab_cache_page filemap.c 2149 12882 228786 94 mapping_unevictable pagemap.h 50 4 11 73 __flush_cpu_slab slub.c 1466 627757 330451 34 slab_free slub.c 1731 2959 61245 95 dentry_lru_del_init dcache.c 153 946 1217 56 load_elf_binary binfmt_elf.c 904 102 82 44 disk_put_part genhd.h 206 1 1 50 dst_gc_task dst.c 82 0 19 100 tcp_mss_split_point tcp_output.c 1126 As you can see by the above, there's a bit of work to do in rethinking the use of some unlikelys and likelys. Note: the unlikely case had 71 hits that were more than 25%. Note: After submitting my first version of this patch, Andrew Morton showed me a version written by Daniel Walker, where I picked up the following ideas from: 1) Using __builtin_constant_p to avoid profiling fixed values. 2) Using __FILE__ instead of instruction pointers. 3) Using the preprocessor to stop all profiling of likely annotations from vsyscall_64.c. Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar for their feed back on this patch. (*) Not ever unlikely is recorded, those that are used by vsyscalls (a few of them) had to have profiling disabled. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
236 lines
6.7 KiB
Plaintext
236 lines
6.7 KiB
Plaintext
#
|
|
# Architectures that offer an FUNCTION_TRACER implementation should
|
|
# select HAVE_FUNCTION_TRACER:
|
|
#
|
|
|
|
config NOP_TRACER
|
|
bool
|
|
|
|
config HAVE_FUNCTION_TRACER
|
|
bool
|
|
|
|
config HAVE_FUNCTION_RET_TRACER
|
|
bool
|
|
|
|
config HAVE_FUNCTION_TRACE_MCOUNT_TEST
|
|
bool
|
|
help
|
|
This gets selected when the arch tests the function_trace_stop
|
|
variable at the mcount call site. Otherwise, this variable
|
|
is tested by the called function.
|
|
|
|
config HAVE_DYNAMIC_FTRACE
|
|
bool
|
|
|
|
config HAVE_FTRACE_MCOUNT_RECORD
|
|
bool
|
|
|
|
config TRACER_MAX_TRACE
|
|
bool
|
|
|
|
config RING_BUFFER
|
|
bool
|
|
|
|
config TRACING
|
|
bool
|
|
select DEBUG_FS
|
|
select RING_BUFFER
|
|
select STACKTRACE if STACKTRACE_SUPPORT
|
|
select TRACEPOINTS
|
|
select NOP_TRACER
|
|
|
|
menu "Tracers"
|
|
|
|
config FUNCTION_TRACER
|
|
bool "Kernel Function Tracer"
|
|
depends on HAVE_FUNCTION_TRACER
|
|
depends on DEBUG_KERNEL
|
|
select FRAME_POINTER
|
|
select TRACING
|
|
select CONTEXT_SWITCH_TRACER
|
|
help
|
|
Enable the kernel to trace every kernel function. This is done
|
|
by using a compiler feature to insert a small, 5-byte No-Operation
|
|
instruction to the beginning of every kernel function, which NOP
|
|
sequence is then dynamically patched into a tracer call when
|
|
tracing is enabled by the administrator. If it's runtime disabled
|
|
(the bootup default), then the overhead of the instructions is very
|
|
small and not measurable even in micro-benchmarks.
|
|
|
|
config FUNCTION_RET_TRACER
|
|
bool "Kernel Function return Tracer"
|
|
depends on !DYNAMIC_FTRACE
|
|
depends on HAVE_FUNCTION_RET_TRACER
|
|
depends on FUNCTION_TRACER
|
|
help
|
|
Enable the kernel to trace a function at its return.
|
|
It's first purpose is to trace the duration of functions.
|
|
This is done by setting the current return address on the thread
|
|
info structure of the current task.
|
|
|
|
config IRQSOFF_TRACER
|
|
bool "Interrupts-off Latency Tracer"
|
|
default n
|
|
depends on TRACE_IRQFLAGS_SUPPORT
|
|
depends on GENERIC_TIME
|
|
depends on DEBUG_KERNEL
|
|
select TRACE_IRQFLAGS
|
|
select TRACING
|
|
select TRACER_MAX_TRACE
|
|
help
|
|
This option measures the time spent in irqs-off critical
|
|
sections, with microsecond accuracy.
|
|
|
|
The default measurement method is a maximum search, which is
|
|
disabled by default and can be runtime (re-)started
|
|
via:
|
|
|
|
echo 0 > /debugfs/tracing/tracing_max_latency
|
|
|
|
(Note that kernel size and overhead increases with this option
|
|
enabled. This option and the preempt-off timing option can be
|
|
used together or separately.)
|
|
|
|
config PREEMPT_TRACER
|
|
bool "Preemption-off Latency Tracer"
|
|
default n
|
|
depends on GENERIC_TIME
|
|
depends on PREEMPT
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
select TRACER_MAX_TRACE
|
|
help
|
|
This option measures the time spent in preemption off critical
|
|
sections, with microsecond accuracy.
|
|
|
|
The default measurement method is a maximum search, which is
|
|
disabled by default and can be runtime (re-)started
|
|
via:
|
|
|
|
echo 0 > /debugfs/tracing/tracing_max_latency
|
|
|
|
(Note that kernel size and overhead increases with this option
|
|
enabled. This option and the irqs-off timing option can be
|
|
used together or separately.)
|
|
|
|
config SYSPROF_TRACER
|
|
bool "Sysprof Tracer"
|
|
depends on X86
|
|
select TRACING
|
|
help
|
|
This tracer provides the trace needed by the 'Sysprof' userspace
|
|
tool.
|
|
|
|
config SCHED_TRACER
|
|
bool "Scheduling Latency Tracer"
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
select CONTEXT_SWITCH_TRACER
|
|
select TRACER_MAX_TRACE
|
|
help
|
|
This tracer tracks the latency of the highest priority task
|
|
to be scheduled in, starting from the point it has woken up.
|
|
|
|
config CONTEXT_SWITCH_TRACER
|
|
bool "Trace process context switches"
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
select MARKERS
|
|
help
|
|
This tracer gets called from the context switch and records
|
|
all switching of tasks.
|
|
|
|
config BOOT_TRACER
|
|
bool "Trace boot initcalls"
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
select CONTEXT_SWITCH_TRACER
|
|
help
|
|
This tracer helps developers to optimize boot times: it records
|
|
the timings of the initcalls and traces key events and the identity
|
|
of tasks that can cause boot delays, such as context-switches.
|
|
|
|
Its aim is to be parsed by the /scripts/bootgraph.pl tool to
|
|
produce pretty graphics about boot inefficiencies, giving a visual
|
|
representation of the delays during initcalls - but the raw
|
|
/debug/tracing/trace text output is readable too.
|
|
|
|
( Note that tracing self tests can't be enabled if this tracer is
|
|
selected, because the self-tests are an initcall as well and that
|
|
would invalidate the boot trace. )
|
|
|
|
config TRACE_UNLIKELY_PROFILE
|
|
bool "Trace likely/unlikely profiler"
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
help
|
|
This tracer profiles all the the likely and unlikely macros
|
|
in the kernel. It will display the results in:
|
|
|
|
/debugfs/tracing/profile_likely
|
|
/debugfs/tracing/profile_unlikely
|
|
|
|
Note: this will add a significant overhead, only turn this
|
|
on if you need to profile the system's use of these macros.
|
|
|
|
Say N if unsure.
|
|
|
|
config STACK_TRACER
|
|
bool "Trace max stack"
|
|
depends on HAVE_FUNCTION_TRACER
|
|
depends on DEBUG_KERNEL
|
|
select FUNCTION_TRACER
|
|
select STACKTRACE
|
|
help
|
|
This special tracer records the maximum stack footprint of the
|
|
kernel and displays it in debugfs/tracing/stack_trace.
|
|
|
|
This tracer works by hooking into every function call that the
|
|
kernel executes, and keeping a maximum stack depth value and
|
|
stack-trace saved. Because this logic has to execute in every
|
|
kernel function, all the time, this option can slow down the
|
|
kernel measurably and is generally intended for kernel
|
|
developers only.
|
|
|
|
Say N if unsure.
|
|
|
|
config DYNAMIC_FTRACE
|
|
bool "enable/disable ftrace tracepoints dynamically"
|
|
depends on FUNCTION_TRACER
|
|
depends on HAVE_DYNAMIC_FTRACE
|
|
depends on DEBUG_KERNEL
|
|
default y
|
|
help
|
|
This option will modify all the calls to ftrace dynamically
|
|
(will patch them out of the binary image and replaces them
|
|
with a No-Op instruction) as they are called. A table is
|
|
created to dynamically enable them again.
|
|
|
|
This way a CONFIG_FUNCTION_TRACER kernel is slightly larger, but otherwise
|
|
has native performance as long as no tracing is active.
|
|
|
|
The changes to the code are done by a kernel thread that
|
|
wakes up once a second and checks to see if any ftrace calls
|
|
were made. If so, it runs stop_machine (stops all CPUS)
|
|
and modifies the code to jump over the call to ftrace.
|
|
|
|
config FTRACE_MCOUNT_RECORD
|
|
def_bool y
|
|
depends on DYNAMIC_FTRACE
|
|
depends on HAVE_FTRACE_MCOUNT_RECORD
|
|
|
|
config FTRACE_SELFTEST
|
|
bool
|
|
|
|
config FTRACE_STARTUP_TEST
|
|
bool "Perform a startup test on ftrace"
|
|
depends on TRACING && DEBUG_KERNEL && !BOOT_TRACER
|
|
select FTRACE_SELFTEST
|
|
help
|
|
This option performs a series of startup tests on ftrace. On bootup
|
|
a series of tests are made to verify that the tracer is
|
|
functioning properly. It will do tests on all the configured
|
|
tracers of ftrace.
|
|
|
|
endmenu
|