linux

Eduard Zingerman aa30eb3260 bpf: Force checkpoint when jmp history is too long	2024-10-29 11:42:21 -07:00

A specifically crafted program might trick verifier into growing very
long jump history within a single bpf_verifier_state instance.
Very long jump history makes mark_chain_precision() unreasonably slow,
especially in case if verifier processes a loop.

Mitigate this by forcing new state in is_state_visited() in case if
current state's jump history is too long.

Use same constant as in `skip_inf_loop_check`, but multiply it by
arbitrarily chosen value 2 to account for jump history containing not
only information about jumps, but also information about stack access.

For an example of problematic program consider the code below,
w/o this patch the example is processed by verifier for ~15 minutes,
before failing to allocate big-enough chunk for jmp_history.

    0: r7 = *(u16 *)(r1 +0);
    1: r7 += 0x1ab064b9;
    2: if r7 & 0x702000 goto 1b;
    3: r7 &= 0x1ee60e;
    4: r7 += r1;
    5: if r7 s> 0x37d2 goto +0;
    6: r0 = 0;
    7: exit;

Perf profiling shows that most of the time is spent in
mark_chain_precision() ~95%.

The easiest way to explain why this program causes problems is to
apply the following patch:

    diff --git a/include/linux/bpf.h b/include/linux/bpf.h
    index 0c216e71cec7..4b4823961abe 100644
    --- a/include/linux/bpf.h
    +++ b/include/linux/bpf.h
    @@ -1926,7 +1926,7 @@ struct bpf_array {
            };
     };

    -#define BPF_COMPLEXITY_LIMIT_INSNS      1000000 /* yes. 1M insns */
    +#define BPF_COMPLEXITY_LIMIT_INSNS      256 /* yes. 1M insns */
     #define MAX_TAIL_CALL_CNT 33

     /* Maximum number of loops for bpf_loop and bpf_iter_num.
    diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
    index f514247ba8ba..75e88be3bb3e 100644
    --- a/kernel/bpf/verifier.c
    +++ b/kernel/bpf/verifier.c
    @@ -18024,8 +18024,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
     skip_inf_loop_check:
                            if (!force_new_state &&
                                env->jmps_processed - env->prev_jmps_processed < 20 &&
    -                           env->insn_processed - env->prev_insn_processed < 100)
    +                           env->insn_processed - env->prev_insn_processed < 100) {
    +                               verbose(env, "is_state_visited: suppressing checkpoint at %d, %d jmps processed, cur->jmp_history_cnt is %d\n",
    +                                       env->insn_idx,
    +                                       env->jmps_processed - env->prev_jmps_processed,
    +                                       cur->jmp_history_cnt);
                                    add_new_state = false;
    +                       }
                            goto miss;
                    }
                    /* If sl->state is a part of a loop and this loop's entry is a part of
    @@ -18142,6 +18147,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
            if (!add_new_state)
                    return 0;

    +       verbose(env, "is_state_visited: new checkpoint at %d, resetting env->jmps_processed\n",
    +               env->insn_idx);
    +
            /* There were no equivalent states, remember the current one.
             * Technically the current state is not proven to be safe yet,
             * but it will either reach outer most bpf_exit (which means it's safe)

And observe verification log:

    ...
    is_state_visited: new checkpoint at 5, resetting env->jmps_processed
    5: R1=ctx() R7=ctx(...)
    5: (65) if r7 s> 0x37d2 goto pc+0     ; R7=ctx(...)
    6: (b7) r0 = 0                        ; R0_w=0
    7: (95) exit

    from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0
    6: R1=ctx() R7=ctx(...) R10=fp0
    6: (b7) r0 = 0                        ; R0_w=0
    7: (95) exit
    is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur->jmp_history_cnt is 74

    from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0
    1: R1=ctx() R7_w=scalar(...) R10=fp0
    1: (07) r7 += 447767737
    is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur->jmp_history_cnt is 75
    2: R7_w=scalar(...)
    2: (45) if r7 & 0x702000 goto pc-2
    ... mark_precise 152 steps for r7 ...
    2: R7_w=scalar(...)
    is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur->jmp_history_cnt is 75
    1: (07) r7 += 447767737
    is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur->jmp_history_cnt is 76
    2: R7_w=scalar(...)
    2: (45) if r7 & 0x702000 goto pc-2
    ...
    BPF program is too large. Processed 257 insn

The log output shows that checkpoint at label (1) is never created,
because it is suppressed by `skip_inf_loop_check` logic:
a. When 'if' at (2) is processed it pushes a state with insn_idx (1)
   onto stack and proceeds to (3);
b. At (5) checkpoint is created, and this resets
   env->{jmps,insns}_processed.
c. Verification proceeds and reaches `exit`;
d. State saved at step (a) is popped from stack and is_state_visited()
   considers if checkpoint needs to be added, but because
   env->{jmps,insns}_processed had been just reset at step (b)
   the `skip_inf_loop_check` logic forces `add_new_state` to false.
e. Verifier proceeds with current state, which slowly accumulates
   more and more entries in the jump history.

The accumulation of entries in the jump history is a problem because
of two factors:
- it eventually exhausts memory available for kmalloc() allocation;
- mark_chain_precision() traverses the jump history of a state,
  meaning that if `r7` is marked precise, verifier would iterate
  ever growing jump history until parent state boundary is reached.

(note: the log also shows a REG INVARIANTS VIOLATION warning
       upon jset processing, but that's another bug to fix).

With this patch applied, the example above is rejected by verifier
under 1s of time, reaching 1M instructions limit.

The program is a simplified reproducer from syzbot report.
Previous discussion could be found at [1].
The patch does not cause any changes in verification performance,
when tested on selftests from veristat.cfg and cilium programs taken
from [2].

[1] https://lore.kernel.org/bpf/20241009021254.2805446-1-eddyz87@gmail.com/
[2] https://github.com/anakryiko/cilium

Changelog:
- v1 -> v2:
  - moved patch to bpf tree;
  - moved force_new_state variable initialization after declaration and
    shortened the comment.
v1: https://lore.kernel.org/bpf/20241018020307.1766906-1-eddyz87@gmail.com/

Fixes: `2589726d12` ("bpf: introduce bounded loops")
Reported-by: syzbot+7e46cdef14bf496a3ab4@syzkaller.appspotmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241029172641.1042523-1-eddyz87@gmail.com
Closes: https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/
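
The effect described in the commit message can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not the kernel code: `struct toy_env`, `toy_visit()` and `toy_run()` are hypothetical names, the threshold of 40 is assumed from the message's "same constant as in `skip_inf_loop_check`" (20 jumps) multiplied by 2, and each loop iteration is reduced to steps (b), (d) and (e) of the explanation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; field names echo the commit message only. */
struct toy_env {
	int jmps_processed, prev_jmps_processed;
	int insn_processed, prev_insn_processed;
	int jmp_history_cnt;   /* entries in the current state's jump history */
	int max_history;       /* longest jump history observed */
	long backtrack_steps;  /* history entries walked by precision marking */
};

/* Decide whether a checkpoint is created at the loop head. */
static bool toy_visit(struct toy_env *env, bool use_fix)
{
	/* Assumption: 40 = 20 (jmps constant in skip_inf_loop_check) * 2. */
	bool force_new_state = use_fix && env->jmp_history_cnt > 40;
	bool add_new_state = true;

	/* The suppression heuristic quoted in the commit's debug patch. */
	if (!force_new_state &&
	    env->jmps_processed - env->prev_jmps_processed < 20 &&
	    env->insn_processed - env->prev_insn_processed < 100)
		add_new_state = false;

	if (add_new_state) {
		/* A new checkpoint starts a fresh, empty jump history. */
		env->prev_jmps_processed = env->jmps_processed;
		env->prev_insn_processed = env->insn_processed;
		env->jmp_history_cnt = 0;
	}
	return add_new_state;
}

/* Simulate `iterations` trips around the loop formed by insns (1)-(2). */
static void toy_run(struct toy_env *env, int iterations, bool use_fix)
{
	assert(iterations >= 0);
	for (int i = 0; i < iterations; i++) {
		/* Step (b): a checkpoint on the other branch resets counters. */
		env->prev_jmps_processed = env->jmps_processed;
		env->prev_insn_processed = env->insn_processed;

		/* Step (d): back at the loop head, two insns and one jump. */
		env->insn_processed += 2;
		env->jmps_processed += 1;
		toy_visit(env, use_fix);

		/* Step (e): the surviving state gains one history entry. */
		env->jmp_history_cnt += 1;
		if (env->jmp_history_cnt > env->max_history)
			env->max_history = env->jmp_history_cnt;

		/* Precision marking walks the whole history of this state. */
		env->backtrack_steps += env->jmp_history_cnt;
	}
}
```

In this model, running `toy_run()` on a zero-initialized `struct toy_env` for 1000 iterations with `use_fix` set to `false` never creates a checkpoint, so the history reaches 1000 entries and the total backtracking work grows quadratically. With `use_fix` set to `true`, the history is capped at 41 entries, which keeps each backtracking pass short. The real verifier tracks far more state than this, so the numbers are illustrative only.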
..
bpf	bpf: Force checkpoint when jmp history is too long	2024-10-29 11:42:21 -07:00
cgroup	struct fd layout change (and conversion to accessor helpers)	2024-09-23 09:35:36 -07:00
configs	tinyconfig: remove unnecessary 'is not set' for choice blocks	2024-09-01 20:34:38 +09:00
debug	move asm/unaligned.h to linux/unaligned.h	2024-10-02 17:23:23 -04:00
dma	dma-mapping: report unlimited DMA addressing in IOMMU DMA path	2024-09-23 08:38:56 +02:00
entry	treewide: context_tracking: Rename CONTEXT_* into CT_STATE_*	2024-07-29 07:33:10 +05:30
events	sched/fair: Fix external p->on_rq users	2024-10-14 09:14:35 +02:00
futex	fault-inject: improve build for CONFIG_FAULT_INJECTION=n	2024-09-01 20:43:33 -07:00
gcov
irq	pci-v6.12-changes	2024-09-23 12:47:06 -07:00
kcsan	kcsan: Use min() to fix Coccinelle warning	2024-08-01 16:40:44 -07:00
livepatch
locking	Locking changes for v6.12:	2024-09-29 08:51:30 -07:00
module	Modules changes for v6.12-rc1	2024-09-28 09:06:15 -07:00
power	[tree-wide] finally take no_llseek out	2024-09-27 08:18:43 -07:00
printk	drm next for 6.12-rc1	2024-09-19 10:18:15 +02:00
rcu	Merge branch 'linus' into sched/urgent, to resolve conflict	2024-10-17 09:58:07 +02:00
sched	Merge branch 'linus' into sched/urgent, to resolve conflict	2024-10-17 09:58:07 +02:00
time	Including fixes from netfiler, xfrm and bluetooth.	2024-10-24 16:43:50 -07:00
trace	BPF fixes:	2024-10-24 16:53:20 -07:00
.gitignore
acct.c
async.c
audit_fsnotify.c
audit_tree.c
audit_watch.c
audit.c	audit: Make use of str_enabled_disabled() helper	2024-09-03 16:35:16 -04:00
audit.h
auditfilter.c	audit: use task_tgid_nr() instead of task_pid_nr()	2024-08-28 16:48:28 -04:00
auditsc.c	audit: use task_tgid_nr() instead of task_pid_nr()	2024-08-28 16:48:28 -04:00
backtracetest.c
bounds.c
capability.c
cfi.c
compat.c
configs.c
context_tracking.c	context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching	2024-08-15 21:30:43 +05:30
cpu_pm.c
cpu.c	Updates for timers and timekeeping:	2024-09-17 07:25:37 +02:00
crash_core.c	Document/kexec: generalize crash hotplug description	2024-09-01 20:43:37 -07:00
crash_reserve.c	crash: fix crash memory reserve exceed system memory bug	2024-09-01 20:43:30 -07:00
cred.c
delayacct.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
dma.c
elfcorehdr.c
exec_domain.c
exit.c	ALong with the usual shower of singleton patches, notable patch series in	2024-09-21 07:29:05 -07:00
exit.h
extable.c
fail_function.c
fork.c	close_range(): fix the logics in descriptor table trimming	2024-09-29 21:52:29 -04:00
freezer.c	sched/fair: Fix external p->on_rq users	2024-10-14 09:14:35 +02:00
gen_kheaders.sh
groups.c
hung_task.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
iomem.c
irq_work.c
jump_label.c	jump_label: Fix static_key_slow_dec() yet again	2024-09-10 11:57:27 +02:00
kallsyms_internal.h	kallsyms: get rid of code for absolute kallsyms	2024-07-20 16:33:21 +09:00
kallsyms_selftest.c	kallsyms: Match symbols exactly with CONFIG_LTO_CLANG	2024-08-15 09:33:35 -07:00
kallsyms_selftest.h
kallsyms.c	kallsyms: Match symbols exactly with CONFIG_LTO_CLANG	2024-08-15 09:33:35 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.kexec	crash: clean up kdump related config items	2024-02-23 17:48:22 -08:00
Kconfig.locks
Kconfig.preempt	sched_ext: Build fix on !CONFIG_STACKTRACE[_SUPPORT]	2024-08-01 07:08:01 -10:00
kcov.c	Updates for KCOV instrumentation on x86:	2024-09-17 12:40:34 +02:00
kexec_core.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
kexec_elf.c
kexec_file.c	kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y	2024-09-01 17:59:01 -07:00
kexec_internal.h	kexec: use atomic_try_cmpxchg_acquire() in kexec_trylock()	2024-09-01 20:43:23 -07:00
kexec.c
kheaders.c
kprobes.c	kprobes: Fix to check symbol prefixes correctly	2024-08-05 14:04:03 +09:00
ksyms_common.c
ksysfs.c	profiling: remove prof_cpu_mask	2024-07-29 10:45:54 -07:00
kthread.c	kthread: unpark only parked kthread	2024-10-09 12:47:19 -07:00
latencytop.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
Makefile	mm: move kernel/numa.c to mm/	2024-09-03 21:15:26 -07:00
module_signature.c
notifier.c
nsproxy.c	introduce fd_file(), convert all accessors to it.	2024-08-12 22:00:43 -04:00
padata.c	This update includes the following changes:	2024-09-16 06:28:28 +02:00
panic.c	drm next for 6.12-rc1	2024-09-19 10:18:15 +02:00
params.c
pid_namespace.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
pid_sysctl.h	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
pid.c	introduce fd_file(), convert all accessors to it.	2024-08-12 22:00:43 -04:00
profile.c	profiling: remove profile=sleep support	2024-08-04 13:36:28 -07:00
ptrace.c	ptrace_attach: shift send(SIGSTOP) into ptrace_set_stopped()	2024-02-22 15:38:52 -08:00
range.c
reboot.c	kernel misc: Remove the now superfluous sentinel elements from ctl_table array	2024-04-24 09:43:53 +02:00
regset.c
relay.c	[tree-wide] finally take no_llseek out	2024-09-27 08:18:43 -07:00
resource_kunit.c	resource, kunit: fix user-after-free in resource_test_region_intersects()	2024-10-09 12:47:19 -07:00
resource.c	ALong with the usual shower of singleton patches, notable patch series in	2024-09-21 07:29:05 -07:00
rseq.c
scftorture.c
scs.c
seccomp.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
signal.c	Revert "binfmt_elf, coredump: Log the reason of the failed core dumps"	2024-09-26 11:39:02 -07:00
smp.c	smp: print only local CPU info when sched_clock goes backward	2024-08-15 00:06:48 +05:30
smpboot.c
smpboot.h
softirq.c	softirq: Remove unused 'action' parameter from action callback	2024-08-20 17:13:40 +02:00
stackleak.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
stacktrace.c
static_call_inline.c	static_call: Replace pointless WARN_ON() in static_call_module_notify()	2024-09-06 16:29:22 +02:00
static_call.c
stop_machine.c	rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()	2024-08-15 21:30:42 +05:30
sys_ni.c	Probes updates for v6.11:	2024-07-18 12:19:20 -07:00
sys.c	struct fd layout change (and conversion to accessor helpers)	2024-09-23 09:35:36 -07:00
sysctl-test.c
sysctl.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
task_work.c	sched/core: Disable page allocation in task_tick_mm_cid()	2024-10-11 10:49:32 +02:00
taskstats.c	introduce fd_file(), convert all accessors to it.	2024-08-12 22:00:43 -04:00
torture.c
tracepoint.c	tracepoint: Support iterating tracepoints in a loading module	2024-09-25 23:23:44 +09:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
up.c
user_namespace.c	user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation	2024-09-09 16:47:42 -07:00
user-return-notifier.c
user.c	uidgid: make sure we fit into one cacheline	2024-09-12 12:16:09 +02:00
usermode_driver.c
utsname_sysctl.c	sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-24 20:59:29 +02:00
utsname.c
vhost_task.c	vhost_task: Handle SIGKILL by flushing work and exiting	2024-05-22 08:31:15 -04:00
vmcore_info.c	mm: support only one page_type per page	2024-09-03 21:15:43 -07:00
watch_queue.c	introduce fd_file(), convert all accessors to it.	2024-08-12 22:00:43 -04:00
watchdog_buddy.c
watchdog_perf.c	watchdog/perf: properly initialize the turbo mode timestamp and rearm counter	2024-07-17 21:11:34 -07:00
watchdog.c	watchdog: handle the ENODEV failure case of lockup_detector_delay_init() separately	2024-09-01 20:43:32 -07:00
workqueue_internal.h
workqueue.c	workqueue: Changes for v6.12	2024-09-18 06:59:44 +02:00