linux

Author	SHA1	Message	Date
Lorenzo Bianconi	3e705251d9	net: netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init() Move nf flowtable bpf initialization in nf_flow_table module load routine since nf_flow_table_bpf is part of nf_flow_table module and not nf_flow_table_inet one. This patch allows to avoid the following kernel warning running the reproducer below: $modprobe nf_flow_table_inet $rmmod nf_flow_table_inet $modprobe nf_flow_table_inet modprobe: ERROR: could not insert 'nf_flow_table_inet': Invalid argument [ 184.081501] ------------[ cut here ]------------ [ 184.081527] WARNING: CPU: 0 PID: 1362 at kernel/bpf/btf.c:8206 btf_populate_kfunc_set+0x23c/0x330 [ 184.081550] CPU: 0 UID: 0 PID: 1362 Comm: modprobe Kdump: loaded Not tainted 6.11.0-0.rc5.22.el10.x86_64 #1 [ 184.081553] Hardware name: Red Hat OpenStack Compute, BIOS 1.14.0-1.module+el8.4.0+8855+a9e237a9 04/01/2014 [ 184.081554] RIP: 0010:btf_populate_kfunc_set+0x23c/0x330 [ 184.081558] RSP: 0018:ff22cfb38071fc90 EFLAGS: 00010202 [ 184.081559] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000 [ 184.081560] RDX: 000000000000006e RSI: ffffffff95c00000 RDI: ff13805543436350 [ 184.081561] RBP: ffffffffc0e22180 R08: ff13805543410808 R09: 000000000001ec00 [ 184.081562] R10: ff13805541c8113c R11: 0000000000000010 R12: ff13805541b83c00 [ 184.081563] R13: ff13805543410800 R14: 0000000000000001 R15: ffffffffc0e2259a [ 184.081564] FS: 00007fa436c46740(0000) GS:ff1380557ba00000(0000) knlGS:0000000000000000 [ 184.081569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 184.081570] CR2: 000055e7b3187000 CR3: 0000000100c48003 CR4: 0000000000771ef0 [ 184.081571] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 184.081572] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 184.081572] PKRU: 55555554 [ 184.081574] Call Trace: [ 184.081575] <TASK> [ 184.081578] ? show_trace_log_lvl+0x1b0/0x2f0 [ 184.081580] ? show_trace_log_lvl+0x1b0/0x2f0 [ 184.081582] ? __register_btf_kfunc_id_set+0x199/0x200 [ 184.081585] ? btf_populate_kfunc_set+0x23c/0x330 [ 184.081586] ? __warn.cold+0x93/0xed [ 184.081590] ? btf_populate_kfunc_set+0x23c/0x330 [ 184.081592] ? report_bug+0xff/0x140 [ 184.081594] ? handle_bug+0x3a/0x70 [ 184.081596] ? exc_invalid_op+0x17/0x70 [ 184.081597] ? asm_exc_invalid_op+0x1a/0x20 [ 184.081601] ? btf_populate_kfunc_set+0x23c/0x330 [ 184.081602] __register_btf_kfunc_id_set+0x199/0x200 [ 184.081605] ? __pfx_nf_flow_inet_module_init+0x10/0x10 [nf_flow_table_inet] [ 184.081607] do_one_initcall+0x58/0x300 [ 184.081611] do_init_module+0x60/0x230 [ 184.081614] __do_sys_init_module+0x17a/0x1b0 [ 184.081617] do_syscall_64+0x7d/0x160 [ 184.081620] ? __count_memcg_events+0x58/0xf0 [ 184.081623] ? handle_mm_fault+0x234/0x350 [ 184.081626] ? do_user_addr_fault+0x347/0x640 [ 184.081630] ? clear_bhb_loop+0x25/0x80 [ 184.081633] ? clear_bhb_loop+0x25/0x80 [ 184.081634] ? clear_bhb_loop+0x25/0x80 [ 184.081637] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 184.081639] RIP: 0033:0x7fa43652e4ce [ 184.081647] RSP: 002b:00007ffe8213be18 EFLAGS: 00000246 ORIG_RAX: 00000000000000af [ 184.081649] RAX: ffffffffffffffda RBX: 000055e7b3176c20 RCX: 00007fa43652e4ce [ 184.081650] RDX: 000055e7737fde79 RSI: 0000000000003990 RDI: 000055e7b3185380 [ 184.081651] RBP: 000055e7737fde79 R08: 0000000000000007 R09: 000055e7b3179bd0 [ 184.081651] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000040000 [ 184.081652] R13: 000055e7b3176fa0 R14: 0000000000000000 R15: 000055e7b3179b80 Fixes: `391bb6594f` ("netfilter: Add bpf_xdp_flow_lookup kfunc") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/20240911-nf-flowtable-bpf-modprob-fix-v1-1-f9fc075aafc3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-09-12 15:41:03 +02:00
Paolo Abeni	8700970971	netfilter pull request 24-09-12 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmbiF/oACgkQ1V2XiooU IOQRQBAAoeOp8RlVU4XTm2++A+jwZODeeKoDcnyrdXWFZFUmORHLuUmIzVUDMJyB rp9/Jnw/J8aS5pT4Bx79cWuPlZ9UylSsPr8lt7oJ3NNxRwzbdQjX97wKhONAYUGZ j4Jnd1ObWj5uDHvI4hbMoqZlqzk64Cgdw6tMgYDxmjTnUuJbJztNL6QkUWvZz4mR 0gY3SluDadKc3dLqoFDefi7ZLidn5Fc3W99JZu295y40pe3qzWNhFPhQPC1s+1T6 9svzC95cIN1JncEqnuplcZQJEylRikOH2W6sH1SFflvI4QhiUXnOqluwjVGiDuxm nPsoXZx3/uqlWwNKEMn/WwjTg9bqUnTzaT8M6RlZKimwmo+FKIgOQb6QjCgaBMWm kbp2XWM/rmsDhjM58asCVgw7Bcftw6mrh8qt5fq2gxGnyE6G+3M9WDR8F9VmhxME AtjgNqogrFYJkMSjVBvrqIr6CSzad3GgG+upWCbArAAyIZvAJdX54DSvXxgiv7bS CV5J03/zIMohiLAwiGZC5q0IPNaHN4hUnBSTRhWSsQpnjg00dUeplYLlOfIRvgJO uPKhzpYBzo+E4CzZLWeeNC+ArWT7iIO4NOYLxxuD1Olm71LNNHMlt2XFCJWV9oJl UsB9QKWDh3MwFmbgyzxf/0QalGawGL/MVWa8MGmRKcUsxM9uXuU= =Oc51 -----END PGP SIGNATURE----- Merge tag 'nf-24-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains two fixes from Florian Westphal: Patch #1 fixes a sk refcount leak in nft_socket on mismatch. Patch #2 fixes cgroupsv2 matching from containers due to incorrect level in subtree. netfilter pull request 24-09-12 * tag 'nf-24-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_socket: make cgroupsv2 matching work with namespaces netfilter: nft_socket: fix sk refcount leaks ==================== Link: https://patch.msgid.link/20240911222520.3606-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-09-12 15:26:19 +02:00
Willem de Bruijn	6513eb3d31	net: tighten bad gso csum offset check in virtio_net_hdr The referenced commit drops bad input, but has false positives. Tighten the check to avoid these. The check detects illegal checksum offload requests, which produce csum_start/csum_off beyond end of packet after segmentation. But it is based on two incorrect assumptions: 1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO. True in callers that inject into the tx path, such as tap. But false in callers that inject into rx, like virtio-net. Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal. 2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL. False, as tcp[46]_gso_segment will fix up csum_start and offset for all other ip_summed by calling __tcp_v4_send_check. Because of 2, we can limit the scope of the fix to virtio_net_hdr that do try to set these fields, with a bogus value. Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/ Fixes: `89add40066` ("net: drop bad gso csum_start and offset in virtio_net_hdr") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240910213553.839926-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 20:43:07 -07:00
Asbjørn Sloth Tønnesen	09a45a5553	netlink: specs: mptcp: fix port endianness The MPTCP port attribute is in host endianness, but was documented as big-endian in the ynl specification. Below are two examples from net/mptcp/pm_netlink.c showing that the attribute is converted to/from host endianness for use with netlink. Import from netlink: addr->port = htons(nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_PORT])) Export to netlink: nla_put_u16(skb, MPTCP_PM_ADDR_ATTR_PORT, ntohs(addr->port)) Where addr->port is defined as __be16. No functional change intended. Fixes: `bc8aeb2045` ("Documentation: netlink: add a YAML spec for mptcp") Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240911091003.1112179-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 20:42:47 -07:00
Sean Anderson	cbd7ec0834	net: dpaa: Pad packets to ETH_ZLEN When sending packets under 60 bytes, up to three bytes of the buffer following the data may be leaked. Avoid this by extending all packets to ETH_ZLEN, ensuring nothing is leaked in the padding. This bug can be reproduced by running $ ping -s 11 destination Fixes: `9ad1a37493` ("dpaa_eth: add support for DPAA Ethernet") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240910143144.1439910-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 16:01:25 -07:00
Edward Adam Davis	b4cd80b033	mptcp: pm: Fix uaf in __timer_delete_sync There are two paths to access mptcp_pm_del_add_timer, result in a race condition: CPU1 CPU2 ==== ==== net_rx_action napi_poll netlink_sendmsg __napi_poll netlink_unicast process_backlog netlink_unicast_kernel __netif_receive_skb genl_rcv __netif_receive_skb_one_core netlink_rcv_skb NF_HOOK genl_rcv_msg ip_local_deliver_finish genl_family_rcv_msg ip_protocol_deliver_rcu genl_family_rcv_msg_doit tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit tcp_v4_do_rcv mptcp_nl_remove_addrs_list tcp_rcv_established mptcp_pm_remove_addrs_and_subflows tcp_data_queue remove_anno_list_by_saddr mptcp_incoming_options mptcp_pm_del_add_timer mptcp_pm_del_add_timer kfree(entry) In remove_anno_list_by_saddr(running on CPU2), after leaving the critical zone protected by "pm.lock", the entry will be released, which leads to the occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). Keeping a reference to add_timer inside the lock, and calling sk_stop_timer_sync() with this reference, instead of "entry->add_timer". Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock, do not directly access any members of the entry outside the pm lock, which can avoid similar "entry->x" uaf. Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 16:00:11 -07:00
Jiawen Wu	077ee7e6b1	net: libwx: fix number of Rx and Tx descriptors The number of transmit and receive descriptors must be a multiple of 128 due to the hardware limitation. If it is set to a multiple of 8 instead of a multiple 128, the queues will easily be hung. Cc: stable@vger.kernel.org Fixes: `883b5984a5` ("net: wangxun: add ethtool_ops for ring parameters") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240910095629.570674-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 15:59:43 -07:00
Xiaoliang Yang	70654f4c21	net: dsa: felix: ignore pending status of TAS module when it's disabled The TAS module could not be configured when it's running in pending status. We need disable the module and configure it again. However, the pending status is not cleared after the module disabled. TC taprio set will always return busy even it's disabled. For example, a user uses tc-taprio to configure Qbv and a future basetime. The TAS module will run in a pending status. There is no way to reconfigure Qbv, it always returns busy. Actually the TAS module can be reconfigured when it's disabled. So it doesn't need to check the pending status if the TAS module is disabled. After the patch, user can delete the tc taprio configuration to disable Qbv and reconfigure it again. Fixes: `de143c0e27` ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Link: https://patch.msgid.link/20240906093550.29985-1-xiaoliang.yang_1@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 15:54:51 -07:00
Jeongjun Park	a7789fd4ca	net: hsr: prevent NULL pointer dereference in hsr_proxy_announce() In the function hsr_proxy_annouance() added in the previous commit `5f703ce5c9` ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data"), the return value of the hsr_port_get_hsr() function is not checked to be a NULL pointer, which causes a NULL pointer dereference. To solve this, we need to add code to check whether the return value of hsr_port_get_hsr() is NULL. Reported-by: syzbot+02a42d9b1bd395cbcab4@syzkaller.appspotmail.com Fixes: `5f703ce5c9` ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240907190341.162289-1-aha310510@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 15:49:41 -07:00
Jakub Kicinski	6254031777	Merge branch 'selftests-mptcp-misc-small-fixes' Matthieu Baerts says: ==================== selftests: mptcp: misc. small fixes Here are some various fixes for the MPTCP selftests. Patch 1 fixes a recently modified test to continue to work as expected on older kernels. This is a fix for a recent fix that can be backported up to v5.15. Patch 2 and 3 include dependences when exporting or installing the tests. Two fixes for v6.11-rc1. ==================== Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-0-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 15:18:23 -07:00
Matthieu Baerts (NGI0)	c66c08e51b	selftests: mptcp: include net_helper.sh file Similar to the previous commit, the net_helper.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: `1af3bc912e` ("selftests: mptcp: lib: use wait_local_port_listen helper") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-3-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 15:18:20 -07:00
Matthieu Baerts (NGI0)	1a5a2d19e8	selftests: mptcp: include lib.sh file The lib.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: `f265d3119a` ("selftests: mptcp: lib: use setup/cleanup_ns helpers") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-2-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 15:18:19 -07:00
Matthieu Baerts (NGI0)	49ac6f05ac	selftests: mptcp: join: restrict fullmesh endp on 1st sf A new endpoint using the IP of the initial subflow has been recently added to increase the code coverage. But it breaks the test when using old kernels not having commit `86e39e0448` ("mptcp: keep track of local endpoint still available for each msk"), e.g. on v5.15. Similar to commit `d4c81bbb86` ("selftests: mptcp: join: support local endpoint being tracked or not"), it is possible to add the new endpoint conditionally, by checking if "mptcp_pm_subflow_check_next" is present in kallsyms: this is not directly linked to the commit introducing this symbol but for the parent one which is linked anyway. So we can know in advance what will be the expected behaviour, and add the new endpoint only when it makes sense to do so. Fixes: `4878f9f842` ("selftests: mptcp: join: validate fullmesh endp on 1st sf") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-11 15:18:19 -07:00
Florian Westphal	7f3287db65	netfilter: nft_socket: make cgroupsv2 matching work with namespaces When running in container environmment, /sys/fs/cgroup/ might not be the real root node of the sk-attached cgroup. Example: In container: % stat /sys//fs/cgroup/ Device: 0,21 Inode: 2214 .. % stat /sys/fs/cgroup/foo Device: 0,21 Inode: 2264 .. The expectation would be for: nft add rule .. socket cgroupv2 level 1 "foo" counter to match traffic from a process that got added to "foo" via "echo $pid > /sys/fs/cgroup/foo/cgroup.procs". However, 'level 3' is needed to make this work. Seen from initial namespace, the complete hierarchy is: % stat /sys/fs/cgroup/system.slice/docker-.../foo Device: 0,21 Inode: 2264 .. i.e. hierarchy is 0 1 2 3 / -> system.slice -> docker-1... -> foo ... but the container doesn't know that its "/" is the "docker-1.." cgroup. Current code will retrieve the 'system.slice' cgroup node and store its kn->id in the destination register, so compare with 2264 ("foo" cgroup id) will not match. Fetch "/" cgroup from ->init() and add its level to the level we try to extract. cgroup root-level is 0 for the init-namespace or the level of the ancestor that is exposed as the cgroup root inside the container. In the above case, cgrp->level of "/" resolved in the container is 2 (docker-1...scope/) and request for 'level 1' will get adjusted to fetch the actual level (3). v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it. (kernel test robot) Cc: cgroups@vger.kernel.org Fixes: `e0bb96db96` ("netfilter: nft_socket: add support for cgroupsv2") Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-09-12 00:16:58 +02:00
Florian Westphal	8b26ff7af8	netfilter: nft_socket: fix sk refcount leaks We must put 'sk' reference before returning. Fixes: `039b1f4f24` ("netfilter: nft_socket: fix erroneous socket assignment") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-09-12 00:16:54 +02:00
Jakub Kicinski	d1aaaa2e0a	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-09-09 (ice, igb) This series contains updates to ice and igb drivers. Martyna moves LLDP rule removal to the proper uninitialization function for ice. Jake corrects accounting logic for FWD_TO_VSI_LIST switch filters on ice. Przemek removes incorrect, explicit calls to pci_disable_device() for ice. Michal Schmidt stops incorrect use of VSI list for VLAN use on ice. Sriram Yagnaraman adjusts igb_xdp_ring_update_tail() to be called under Tx lock on igb. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igb: Always call igb_xdp_ring_update_tail() under Tx lock ice: fix VSI lists confusion when adding VLANs ice: stop calling pci_disable_device() as we use pcim ice: fix accounting for filters shared by multiple VSIs ice: Fix lldp packets dropping after changing the number of channels ==================== Link: https://patch.msgid.link/20240909203842.3109822-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 20:15:10 -07:00
Jakub Kicinski	3d731dc9b1	mlx5-fixes-2024-09-09 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmbfTw4ACgkQSD+KveBX +j4jxggAnxuWbJuvFBVkiU+62SpPldhKy/ut7Dc3KTOOezb7HMD7suYawgZl0jxr 1cSltKL3lpmaN2FEKITRxESsOKjHqVShkWpZCi+c8hMwd+vWowlaO4r6BY/5ZYhj 2KPx3PjJl6d30d0gw4zMNu3XlOnpunuaRXJv5dbmRkz6G2XGVQzyOH2pfzSJWxyk bcqYm/3Ma0psfEQhIP6I0LDBvHU4rOAlIGQN4IAzmLmwi4Whk6ECI7Q91v3PH/c9 nTJNTQhvyUJEc5aYuHftNU2MHlzejDPx5F3xd4dcQs30MXk5efSD9+OWnxHivjrP c9GE3+PmWAWJtSLLb/iOMyTvY+x63Q== =mGZl -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2024-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-09-09 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2024-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Fix bridge mode operations when there are no VFs net/mlx5: Verify support for scheduling element and TSAR type net/mlx5: Add missing masks and QoS bit masks for scheduling elements net/mlx5: Explicitly set scheduling element and TSAR type net/mlx5e: Add missing link mode to ptys2ext_ethtool_map net/mlx5e: Add missing link modes to ptys2ethtool_map net/mlx5: Update the list of the PCI supported devices ==================== Link: https://patch.msgid.link/20240909194505.69715-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 20:11:40 -07:00
Kory Maincent	330dadacc5	MAINTAINERS: Add ethtool pse-pd to PSE NETWORK DRIVER Add net/ethtool/pse-pd.c to PSE NETWORK DRIVER to receive emails concerning modifications to the ethtool part. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909114336.362174-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 18:31:23 -07:00
Wei Fang	2f9caba9b2	dt-bindings: net: tja11xx: fix the broken binding As Rob pointed in another mail thread [1], the binding of tja11xx PHY is completely broken, the schema cannot catch the error in the DTS. A compatiable string must be needed if we want to add a custom propety. So extract known PHY IDs from the tja11xx PHY drivers and convert them into supported compatible string list to fix the broken binding issue. Fixes: `52b2fe4535` ("dt-bindings: net: tja11xx: add nxp,refclk_in property") Link: https://lore.kernel.org/31058f49-bac5-49a9-a422-c43b121bf049@kernel.org # [1] Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240909012152.431647-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 17:12:58 -07:00
Sean Anderson	e8a63d473b	selftests: net: csum: Fix checksums for packets with non-zero padding Padding is not included in UDP and TCP checksums. Therefore, reduce the length of the checksummed data to include only the data in the IP payload. This fixes spurious reported checksum failures like rx: pkt: sport=33000 len=26 csum=0xc850 verify=0xf9fe pkt: bad csum Technically it is possible for there to be trailing bytes after the UDP data but before the Ethernet padding (e.g. if sizeof(ip) + sizeof(udp) + udp.len < ip.len). However, we don't generate such packets. Fixes: `91a7de8560` ("selftests/net: add csum offload test") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240906210743.627413-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 16:33:31 -07:00
Tomas Paukrt	3f62ea572b	net: phy: dp83822: Fix NULL pointer dereference on DP83825 devices The probe() function is only used for DP83822 and DP83826 PHY, leaving the private data pointer uninitialized for the DP83825 models which causes a NULL pointer dereference in the recently introduced/changed functions dp8382x_config_init() and dp83822_set_wol(). Add the dp8382x_probe() function, so all PHY models will have a valid private data pointer to fix this issue and also prevent similar issues in the future. Fixes: `9ef9ecfa9e` ("net: phy: dp8382x: keep WOL settings across suspends") Signed-off-by: Tomas Paukrt <tomaspaukrt@email.cz> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/66w.ZbGt.65Ljx42yHo5.1csjxu@seznam.cz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 16:20:15 -07:00
Jakub Kicinski	48aa361c5d	Merge branch 'revert-virtio_net-rx-enable-premapped-mode-by-default' Xuan Zhuo says: ==================== Revert "virtio_net: rx enable premapped mode by default" Regression: http://lore.kernel.org/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com ==================== Link: https://patch.msgid.link/20240906123137.108741-1-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 09:01:08 -07:00
Xuan Zhuo	111fc9f517	virtio_net: disable premapped mode by default Now, the premapped mode encounters some problem. http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com So we disable the premapped mode by default. We can re-enable it in the future. Fixes: `f9dac92ba9` ("virtio_ring: enable premapped mode whatever use_dma_api") Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Takero Funaki <flintglass@gmail.com> Link: https://patch.msgid.link/20240906123137.108741-4-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 09:01:06 -07:00
Xuan Zhuo	38eef112a8	Revert "virtio_net: big mode skip the unmap check" This reverts commit `a377ae542d`. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Takero Funaki <flintglass@gmail.com> Link: https://patch.msgid.link/20240906123137.108741-3-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 09:01:06 -07:00
Xuan Zhuo	dc4547fbba	Revert "virtio_net: rx remove premapped failover code" This reverts commit `defd28aa5a`. Recover the code to disable premapped mode. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Takero Funaki <flintglass@gmail.com> Link: https://patch.msgid.link/20240906123137.108741-2-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 09:01:06 -07:00
Jacky Chou	fef2843bb4	net: ftgmac100: Enable TX interrupt to avoid TX timeout Currently, the driver only enables RX interrupt to handle RX packets and TX resources. Sometimes there is not RX traffic, so the TX resource needs to wait for RX interrupt to free. This situation will toggle the TX timeout watchdog when the MAC TX ring has no more resources to transmit packets. Therefore, enable TX interrupt to release TX resources at any time. When I am verifying iperf3 over UDP, the network hangs. Like the log below. root# iperf3 -c 192.168.100.100 -i1 -t10 -u -b0 Connecting to host 192.168.100.100, port 5201 [ 4] local 192.168.100.101 port 35773 connected to 192.168.100.100 port 5201 [ ID] Interval Transfer Bandwidth Total Datagrams [ 4] 0.00-20.42 sec 160 KBytes 64.2 Kbits/sec 20 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams [ 4] 0.00-20.42 sec 160 KBytes 64.2 Kbits/sec 0.000 ms 0/20 (0%) [ 4] Sent 20 datagrams iperf3: error - the server has terminated The network topology is FTGMAC connects directly to a PC. UDP does not need to wait for ACK, unlike TCP. Therefore, FTGMAC needs to enable TX interrupt to release TX resources instead of waiting for the RX interrupt. Fixes: `10cbd64076` ("ftgmac100: Rework NAPI & interrupts handling") Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20240906062831.2243399-1-jacky_chou@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-09-10 12:50:07 +02:00
Naveen Mamindlapalli	019aba04f0	octeontx2-af: Modify SMQ flush sequence to drop packets The current implementation of SMQ flush sequence waits for the packets in the TM pipeline to be transmitted out of the link. This sequence doesn't succeed in HW when there is any issue with link such as lack of link credits, link down or any other traffic that is fully occupying the link bandwidth (QoS). This patch modifies the SMQ flush sequence to drop the packets after TL1 level (SQM) instead of polling for the packets to be sent out of RPM/CGX link. Fixes: `5d9b976d44` ("octeontx2-af: Support fixed transmit scheduler topology") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Reviewed-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Link: https://patch.msgid.link/20240906045838.1620308-1-naveenm@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-09-10 12:16:13 +02:00
Muhammad Usama Anjum	4c80022771	fou: fix initialization of grc The grc must be initialize first. There can be a condition where if fou is NULL, goto out will be executed and grc would be used uninitialized. Fixes: `7e41969350` ("fou: Fix null-ptr-deref in GRO.") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240906102839.202798-1-usama.anjum@collabora.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-09 17:21:47 -07:00
Benjamin Poirier	b1d305abef	net/mlx5: Fix bridge mode operations when there are no VFs Currently, trying to set the bridge mode attribute when numvfs=0 leads to a crash: bridge link set dev eth2 hwmode vepa [ 168.967392] BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] [ 168.969989] RIP: 0010:mlx5_add_flow_rules+0x1f/0x300 [mlx5_core] [...] [ 168.976037] Call Trace: [ 168.976188] <TASK> [ 168.978620] _mlx5_eswitch_set_vepa_locked+0x113/0x230 [mlx5_core] [ 168.979074] mlx5_eswitch_set_vepa+0x7f/0xa0 [mlx5_core] [ 168.979471] rtnl_bridge_setlink+0xe9/0x1f0 [ 168.979714] rtnetlink_rcv_msg+0x159/0x400 [ 168.980451] netlink_rcv_skb+0x54/0x100 [ 168.980675] netlink_unicast+0x241/0x360 [ 168.980918] netlink_sendmsg+0x1f6/0x430 [ 168.981162] ____sys_sendmsg+0x3bb/0x3f0 [ 168.982155] ___sys_sendmsg+0x88/0xd0 [ 168.985036] __sys_sendmsg+0x59/0xa0 [ 168.985477] do_syscall_64+0x79/0x150 [ 168.987273] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 168.987773] RIP: 0033:0x7f8f7950f917 (esw->fdb_table.legacy.vepa_fdb is null) The bridge mode is only relevant when there are multiple functions per port. Therefore, prevent setting and getting this setting when there are no VFs. Note that after this change, there are no settings to change on the PF interface using `bridge link` when there are no VFs, so the interface no longer appears in the `bridge link` output. Fixes: `4b89251de0` ("net/mlx5: Support ndo bridge_setlink and getlink") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-09-09 12:39:57 -07:00
Carolina Jubran	861cd9b9cb	net/mlx5: Verify support for scheduling element and TSAR type Before creating a scheduling element in a NIC or E-Switch scheduler, ensure that the requested element type is supported. If the element is of type Transmit Scheduling Arbiter (TSAR), also verify that the specific TSAR type is supported. Fixes: `214baf2287` ("net/mlx5e: Support HTB offload") Fixes: `85c5f7c920` ("net/mlx5: E-switch, Create QoS on demand") Fixes: `0fe132eac3` ("net/mlx5: E-switch, Allow to add vports to rate groups") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-09-09 12:39:57 -07:00
Carolina Jubran	452ef7f860	net/mlx5: Add missing masks and QoS bit masks for scheduling elements Add the missing masks for supported element types and Transmit Scheduling Arbiter (TSAR) types in scheduling elements. Also, add the corresponding bit masks for these types in the QoS capabilities of a NIC scheduler. Fixes: `214baf2287` ("net/mlx5e: Support HTB offload") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-09-09 12:39:57 -07:00
Carolina Jubran	c88146abe4	net/mlx5: Explicitly set scheduling element and TSAR type Ensure the scheduling element type and TSAR type are explicitly initialized in the QoS rate group creation. This prevents potential issues due to default values. Fixes: `1ae258f8b3` ("net/mlx5: E-switch, Introduce rate limiting groups API") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-09-09 12:39:57 -07:00
Shahar Shitrit	80bf474242	net/mlx5e: Add missing link mode to ptys2ext_ethtool_map Add MLX5E_400GAUI_8_400GBASE_CR8 to the extended modes in ptys2ext_ethtool_table, since it was missing. Fixes: `6a89737241` ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-09-09 12:39:57 -07:00
Shahar Shitrit	7617d62cba	net/mlx5e: Add missing link modes to ptys2ethtool_map Add MLX5E_1000BASE_T and MLX5E_100BASE_TX to the legacy modes in ptys2legacy_ethtool_table, since they were missing. Fixes: `665bc53969` ("net/mlx5e: Use new ethtool get/set link ksettings API") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-09-09 12:39:57 -07:00
Maher Sanalla	7472d157cb	net/mlx5: Update the list of the PCI supported devices Add the upcoming ConnectX-9 device ID to the table of supported PCI device IDs. Fixes: `f908a35b22` ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2024-09-09 12:39:56 -07:00
Sriram Yagnaraman	27717f8b17	igb: Always call igb_xdp_ring_update_tail() under Tx lock Always call igb_xdp_ring_update_tail() under __netif_tx_lock, add a comment and lockdep assert to indicate that. This is needed to share the same TX ring between XDP, XSK and slow paths. Furthermore, the current XDP implementation is racy on tail updates. Fixes: `9cbc948b5a` ("igb: add XDP support") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Add lockdep assert and fixes tag] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-09-09 11:01:01 -07:00
Michal Schmidt	d2940002b0	ice: fix VSI lists confusion when adding VLANs The description of function ice_find_vsi_list_entry says: Search VSI list map with VSI count 1 However, since the blamed commit (see Fixes below), the function no longer checks vsi_count. This causes a problem in ice_add_vlan_internal, where the decision to share VSI lists between filter rules relies on the vsi_count of the found existing VSI list being 1. The reproducing steps: 1. Have a PF and two VFs. There will be a filter rule for VLAN 0, referring to a VSI list containing VSIs: 0 (PF), 2 (VF#0), 3 (VF#1). 2. Add VLAN 1234 to VF#0. ice will make the wrong decision to share the VSI list with the new rule. The wrong behavior may not be immediately apparent, but it can be observed with debug prints. 3. Add VLAN 1234 to VF#1. ice will unshare the VSI list for the VLAN 1234 rule. Due to the earlier bad decision, the newly created VSI list will contain VSIs 0 (PF) and 3 (VF#1), instead of expected 2 (VF#0) and 3 (VF#1). 4. Try pinging a network peer over the VLAN interface on VF#0. This fails. Reproducer script at: https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/test-vlan-vsi-list-confusion.sh Commented debug trace: https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/ice-vlan-vsi-lists-debug.txt Patch adding the debug prints: `f8a8814623` (Unsafe, by the way. Lacks rule_lock when dumping in ice_remove_vlan.) Michal Swiatkowski added to the explanation that the bug is caused by reusing a VSI list created for VLAN 0. All created VFs' VSIs are added to VLAN 0 filter. When a non-zero VLAN is created on a VF which is already in VLAN 0 (normal case), the VSI list from VLAN 0 is reused. It leads to a problem because all VFs (VSIs to be specific) that are subscribed to VLAN 0 will now receive a new VLAN tag traffic. This is one bug, another is the bug described above. Removing filters from one VF will remove VLAN filter from the previous VF. It happens a VF is reset. Example: - creation of 3 VFs - we have VSI list (used for VLAN 0) [0 (pf), 2 (vf1), 3 (vf2), 4 (vf3)] - we are adding VLAN 100 on VF1, we are reusing the previous list because 2 is there - VLAN traffic works fine, but VLAN 100 tagged traffic can be received on all VSIs from the list (for example broadcast or unicast) - trust is turning on VF2, VF2 is resetting, all filters from VF2 are removed; the VLAN 100 filter is also removed because 3 is on the list - VLAN traffic to VF1 isn't working anymore, there is a need to recreate VLAN interface to readd VLAN filter One thing I'm not certain about is the implications for the LAG feature, which is another caller of ice_find_vsi_list_entry. I don't have a LAG-capable card at hand to test. Fixes: `23ccae5ce1` ("ice: changes to the interface with the HW and FW for SRIOV_VF+LAG") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Dave Ertman <David.m.ertman@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-09-09 11:01:01 -07:00
Przemek Kitszel	e6501fc38a	ice: stop calling pci_disable_device() as we use pcim Our driver uses devres to manage resources, in particular we call pcim_enable_device(), what also means we express the intent to get automatic pci_disable_device() call at driver removal. Manual calls to pci_disable_device() misuse the API. Recent commit (see "Fixes" tag) has changed the removal action from conditional (silent ignore of double call to pci_disable_device()) to unconditional, but able to catch unwanted redundant calls; see cited "Fixes" commit for details. Since that, unloading the driver yields following warn+splat: [70633.628490] ice 0000:af:00.7: disabling already-disabled device [70633.628512] WARNING: CPU: 52 PID: 33890 at drivers/pci/pci.c:2250 pci_disable_device+0xf4/0x100 ... [70633.628744] ? pci_disable_device+0xf4/0x100 [70633.628752] release_nodes+0x4a/0x70 [70633.628759] devres_release_all+0x8b/0xc0 [70633.628768] device_unbind_cleanup+0xe/0x70 [70633.628774] device_release_driver_internal+0x208/0x250 [70633.628781] driver_detach+0x47/0x90 [70633.628786] bus_remove_driver+0x80/0x100 [70633.628791] pci_unregister_driver+0x2a/0xb0 [70633.628799] ice_module_exit+0x11/0x3a [ice] Note that this is the only Intel ethernet driver that needs such fix. Fixes: `f748a07a0b` ("PCI: Remove legacy pcim_release()") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-09-09 11:01:00 -07:00
Jacob Keller	e843cf7b34	ice: fix accounting for filters shared by multiple VSIs When adding a switch filter (such as a MAC or VLAN filter), it is expected that the driver will detect the case where the filter already exists, and return -EEXIST. This is used by calling code such as ice_vc_add_mac_addr, and ice_vsi_add_vlan to avoid incrementing the accounting fields such as vsi->num_vlan or vf->num_mac. This logic works correctly for the case where only a single VSI has added a given switch filter. When a second VSI adds the same switch filter, the driver converts the existing filter from an ICE_FWD_TO_VSI filter into an ICE_FWD_TO_VSI_LIST filter. This saves switch resources, by ensuring that multiple VSIs can re-use the same filter. The ice_add_update_vsi_list() function is responsible for doing this conversion. When first converting a filter from the FWD_TO_VSI into FWD_TO_VSI_LIST, it checks if the VSI being added is the same as the existing rule's VSI. In such a case it returns -EEXIST. However, when the switch rule has already been converted to a FWD_TO_VSI_LIST, the logic is different. Adding a new VSI in this case just requires extending the VSI list entry. The logic for checking if the rule already exists in this case returns 0 instead of -EEXIST. This breaks the accounting logic mentioned above, so the counters for how many MAC and VLAN filters exist for a given VF or VSI no longer accurately reflect the actual count. This breaks other code which relies on these counts. In typical usage this primarily affects such filters generally shared by multiple VSIs such as VLAN 0, or broadcast and multicast MAC addresses. Fix this by correctly reporting -EEXIST in the case of adding the same VSI to a switch rule already converted to ICE_FWD_TO_VSI_LIST. Fixes: `9daf8208dd` ("ice: Add support for switch filter programming") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-09-09 11:01:00 -07:00
Martyna Szapar-Mudlaw	9debb703e1	ice: Fix lldp packets dropping after changing the number of channels After vsi setup refactor commit `6624e780a5` ("ice: split ice_vsi_setup into smaller functions") ice_cfg_sw_lldp function which removes rx rule directing LLDP packets to vsi is moved from ice_vsi_release to ice_vsi_decfg function. ice_vsi_decfg is used in more cases than just in vsi_release resulting in unnecessary removal of rx lldp packets handling switch rule. This leads to lldp packets being dropped after a change number of channels via ethtool. This patch moves ice_cfg_sw_lldp function that removes rx lldp sw rule back to ice_vsi_release function. Fixes: `6624e780a5` ("ice: split ice_vsi_setup into smaller functions") Reported-by: Matěj Grégr <mgregr@netx.as> Closes: https://lore.kernel.org/intel-wired-lan/1be45a76-90af-4813-824f-8398b69745a9@netx.as/T/#u Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-09-09 11:00:25 -07:00
Eric Dumazet	b3c9e65eb2	net: hsr: remove seqnr_lock syzbot found a new splat [1]. Instead of adding yet another spin_lock_bh(&hsr->seqnr_lock) / spin_unlock_bh(&hsr->seqnr_lock) pair, remove seqnr_lock and use atomic_t for hsr->sequence_nr and hsr->sup_sequence_nr. This also avoid a race in hsr_fill_info(). Also remove interlink_sequence_nr which is unused. [1] WARNING: CPU: 1 PID: 9723 at net/hsr/hsr_forward.c:602 handle_std_frame+0x247/0x2c0 net/hsr/hsr_forward.c:602 Modules linked in: CPU: 1 UID: 0 PID: 9723 Comm: syz.0.1657 Not tainted 6.11.0-rc6-syzkaller-00026-g88fac17500f4 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:handle_std_frame+0x247/0x2c0 net/hsr/hsr_forward.c:602 Code: 49 8d bd b0 01 00 00 be ff ff ff ff e8 e2 58 25 00 31 ff 89 c5 89 c6 e8 47 53 a8 f6 85 ed 0f 85 5a ff ff ff e8 fa 50 a8 f6 90 <0f> 0b 90 e9 4c ff ff ff e8 cc e7 06 f7 e9 8f fe ff ff e8 52 e8 06 RSP: 0018:ffffc90000598598 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffc90000598670 RCX: ffffffff8ae2c919 RDX: ffff888024e94880 RSI: ffffffff8ae2c926 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003 R13: ffff8880627a8cc0 R14: 0000000000000000 R15: ffff888012b03c3a FS: 0000000000000000(0000) GS:ffff88802b700000(0063) knlGS:00000000f5696b40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000020010000 CR3: 00000000768b4000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> hsr_fill_frame_info+0x2c8/0x360 net/hsr/hsr_forward.c:630 fill_frame_info net/hsr/hsr_forward.c:700 [inline] hsr_forward_skb+0x7df/0x25c0 net/hsr/hsr_forward.c:715 hsr_handle_frame+0x603/0x850 net/hsr/hsr_slave.c:70 __netif_receive_skb_core.constprop.0+0xa3d/0x4330 net/core/dev.c:5555 __netif_receive_skb_list_core+0x357/0x950 net/core/dev.c:5737 __netif_receive_skb_list net/core/dev.c:5804 [inline] netif_receive_skb_list_internal+0x753/0xda0 net/core/dev.c:5896 gro_normal_list include/net/gro.h:515 [inline] gro_normal_list include/net/gro.h:511 [inline] napi_complete_done+0x23f/0x9a0 net/core/dev.c:6247 gro_cell_poll+0x162/0x210 net/core/gro_cells.c:66 __napi_poll.constprop.0+0xb7/0x550 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0xa92/0x1010 net/core/dev.c:6963 handle_softirqs+0x216/0x8f0 kernel/softirq.c:554 do_softirq kernel/softirq.c:455 [inline] do_softirq+0xb2/0xf0 kernel/softirq.c:442 </IRQ> <TASK> Fixes: `06afd2c31d` ("hsr: Synchronize sending frames to have always incremented outgoing seq nr.") Fixes: `f421436a59` ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-09-09 10:25:01 +01:00
Linus Torvalds	d759ee240d	Including fixes from can, bluetooth and wireless. No known regressions at this point. Another calm week, but chances are that has more to do with vacation season than the quality of our work. Current release - new code bugs: - smc: prevent NULL pointer dereference in txopt_get - eth: ti: am65-cpsw: number of XDP-related fixes Previous releases - regressions: - Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE", it breaks existing user space - Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid later problems with suspend - can: mcp251x: fix deadlock if an interrupt occurs during mcp251x_open - eth: r8152: fix the firmware communication error due to use of bulk write - ptp: ocp: fix serial port information export - eth: igb: fix not clearing TimeSync interrupts for 82580 - Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo Previous releases - always broken: - eth: intel: fix crashes and bugs when reconfiguration and resets happening in parallel - wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power() Misc: - docs: netdev: document guidance on cleanup.h Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmbaMbsACgkQMUZtbf5S IrtpUg/+J6rNaZuGVTHJQAjdSlMx/HzpN3GIbhYyUSg+iHNclqtxJ706b2vyrG88 Rw5a+f8aQueONNsoFfa/ooU4cGsdO1oYlch0Wtuj5taCqy2SvtVqJAyiuDyNNjU0 BQ1Rf7aRLI7enmEpZJN2FFu106YTVccBcLqhPkx0CPcEjV+p5RvypfeQL72H6ZKx +7/HzEl4bagHIQW3W1uJGNUdwNP7fP2/Kg7TrTJ1t629nLiJCxKL7LrsmebO5o9a v+2NxAa1eujTZ1k7ITcM0wYlxKOaGNFF4sT+dA+GfMe+SFssdhGeZYyv1t0zm5VI 3apJ/pSHza1a/hXFa6PaOSw5M5LWn4bJOqeZLl/yIV0upu5xadWqmT0gVay8V9lY +x/MURGr3seuNRSMsaToHDIq+Us45Dt/qkDDNO/P+9R/BsJKCW05Pfqx3Mr/OHzv eeCPbXRh4YYBdrUicBWo04gSD+BUA53vW8FC3pxU5ieLOOcX4kOPeb8wNPHcXjMU 73D+kyO1ufsfsFMkd3VfgDI1mMz+xpEuZ6pxs33tJ/1Ny7DdG1Q49xlQVh4Wnobk uQqUSzdoelOROeg1rwmsIbfwIvj5a5dVIyBu8TDjHlb/rk1QNTkyu+fFmQRWEotL fQ7U62wXlpoCT8WchSMtiU32IDJ2+Lwhwecguy1Z7kLOLrtL8XU= =ju86 -----END PGP SIGNATURE----- Merge tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can, bluetooth and wireless. No known regressions at this point. Another calm week, but chances are that has more to do with vacation season than the quality of our work. Current release - new code bugs: - smc: prevent NULL pointer dereference in txopt_get - eth: ti: am65-cpsw: number of XDP-related fixes Previous releases - regressions: - Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE", it breaks existing user space - Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid later problems with suspend - can: mcp251x: fix deadlock if an interrupt occurs during mcp251x_open - eth: r8152: fix the firmware communication error due to use of bulk write - ptp: ocp: fix serial port information export - eth: igb: fix not clearing TimeSync interrupts for 82580 - Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo Previous releases - always broken: - eth: intel: fix crashes and bugs when reconfiguration and resets happening in parallel - wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power() Misc: - docs: netdev: document guidance on cleanup.h" * tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) ila: call nf_unregister_net_hooks() sooner tools/net/ynl: fix cli.py --subscribe feature MAINTAINERS: fix ptp ocp driver maintainers address selftests: net: enable bind tests net: dsa: vsc73xx: fix possible subblocks range of CAPT block sched: sch_cake: fix bulk flow accounting logic for host fairness docs: netdev: document guidance on cleanup.h net: xilinx: axienet: Fix race in axienet_stop net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN r8152: fix the firmware doesn't work fou: Fix null-ptr-deref in GRO. bareudp: Fix device stats updates. net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup bpf, net: Fix a potential race in do_sock_getsockopt() net: dqs: Do not use extern for unused dql_group sch/netem: fix use after free in netem_dequeue usbnet: modern method to get random MAC MAINTAINERS: wifi: cw1200: add net-cw1200.h ice: do not bring the VSI up, if it was down before the XDP setup ice: remove ICE_CFG_BUSY locking from AF_XDP code ...	2024-09-05 17:08:01 -07:00
Linus Torvalds	f95359996a	spi: Fixes for v6.11 A few small driver specific fixes (including some of the widespread work on fixing missing ID tables for module autoloading and the revert of some problematic PM work in spi-rockchip), some improvements to the MAINTAINERS information for the NXP drivers and the addition of a new device ID to spidev. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmbaK/gACgkQJNaLcl1U h9DzvAf+LDdI7hKt61JqwMNpmAB9n2AlXBCksfiWHVVXqoYaRBUQeMsoXjaAmkZ5 CAZneRV0yud9JR9BdRWITbYkMc3XzoMVDN5Z6C329lQjAJKyOE4n/VhgODcT1Hcj PmKigUI0Ub6xEq7NyZZxl4/W/1Pp8Ybg2Z1+1aF4vW2SVMGg6+NlmEZitV61M64f ZHH4c6SpW1fQPcUeNzlhOg3v6dGuyDf1/Mn4gopdrNzXP8VUvltMtnKBo4xZk34o kMxXMgWQ8bDD6vHQP7P50D49y70OMmoySo4GkWpnyzuIQVEN1W/QoC3py2NbPhQE WtAgG/9sf90tuPgaDQ6h3fknbYEGZg== =fedt -----END PGP SIGNATURE----- Merge tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small driver specific fixes (including some of the widespread work on fixing missing ID tables for module autoloading and the revert of some problematic PM work in spi-rockchip), some improvements to the MAINTAINERS information for the NXP drivers and the addition of a new device ID to spidev" * tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers MAINTAINERS: SPI: Add freescale lpspi maintainer information spi: spi-fsl-lpspi: Fix off-by-one in prescale max spi: spidev: Add missing spi_device_id for jg10309-01 spi: bcm63xx: Enable module autoloading spi: intel: Add check devm_kasprintf() returned value spi: spidev: Add an entry for elgin,jg10309-01 spi: rockchip: Resolve unbalanced runtime PM / system PM handling	2024-09-05 16:49:10 -07:00
Linus Torvalds	2a66044754	regulator: Fix for v6.11 A fix from Doug Anderson for a missing stub, required to fix the build for some newly added users of devm_regulator_bulk_get_const() in !REGULATOR configurations. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmbaKwgACgkQJNaLcl1U h9A0Twf+Nw4D6a/LVuGTArpWXahY/HEP/BndU9Su4modr30SQY1/Ih4/PMAlIVyd RQvWTGXZdesxv04gOM9TSoLl7IUgVR5nrVPbj7H36C+Nw+l9e6lH24+PgVN96FIb jFXOAyI6ka+ghj3xRlxObDqnjxRBFdcY6WTw2RbaJIPj3XjY3EQ9gQhp2k75ktTS JWU0QX5Q9jaNirQGjLUuDRNyhIZZHur7VM1TuzH44qDmbZ/O5e3UKQUAWUGKJgxE bH64OYvQjWPnbhr0ABOPRd+0cVCAXbW+7KxcnhhHtIf/023YCAmMIDqAjsLlIzvP cb3sWErzFS7pzQUTE1RMMbbfQ/eqKw== =hPtk -----END PGP SIGNATURE----- Merge tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "A fix from Doug Anderson for a missing stub, required to fix the build for some newly added users of devm_regulator_bulk_get_const() in !REGULATOR configurations" * tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATOR	2024-09-05 16:41:16 -07:00
Linus Torvalds	6c5b3e30e5	Rust fixes for v6.11 (2nd) Toolchain and infrastructure: - Fix builds for nightly compiler users now that 'new_uninit' was split into new features by using an alternative approach for the code that used what is now called the 'box_uninit_write' feature. - Allow the 'stable_features' lint to preempt upcoming warnings about them, since soon there will be unstable features that will become stable in nightly compilers. - Export bss symbols too. 'kernel' crate: - 'block' module: fix wrong usage of lockdep API. 'macros' crate: - Provide correct provenance when constructing 'THIS_MODULE'. Documentation: - Remove unintended indentation (blockquotes) in generated output. - Fix a couple typos. MAINTAINERS: - Remove Wedson as Rust maintainer. - Update Andreas' email. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmbaIkMACgkQGXyLc2ht IW2PtQ//ZMLngRVmzOtek7oz3NfR7vAbMjwknC8uy9xKsBGGYpyI9LUdBziMB/SD +/moWfLacJwuFRRlgsPVjo6IcwCnMauO5V77cWWTbcH6dgigfD4fz/IpZlMuTdUA gQ8xAjr8jMK0uT3ME5v7QpXQYOE/1wvOHJpULO71On1l4JqxCbSRozxI4TTxFEmP mVRCV7c38NzX2YZyYkIzbkXNS8y0tbCTMC1HLCfROOrxbGh7eqjq/AZcHl2SkxHg C0u0Rc7BQtMkCQqtceEWPNloe2SlznbhqiJi6vOttM3LUFYCtS76zqrGnnqtFkMF zJuXYNpbjFDvZCfbOL50p0oYb5G/+FgZarK+TPXympPnyqoyWNufwS5N0Hn10wn8 5GbI3ZGrg6q5SE39Sf/wiC9tMP4h2dUTeLwabk2yuW7QVBAxw52SkEtpRIevSN2n b0IUExRHilXMBY0FmkP3x+Ot+qkydxWH0c/rpf9NldNfP7F6QY384hn47/cl8bMo LUygXBjAbDTtKage97JkZTfmOUeS6nrYrQjLL/kUu1H+AdnhOUndtaHv6W5zArQQ hFU751+izJFiEw8wqw1MoK2CcRL5QNMDdrj+kwXaCMNSv0Ie3p4jqXuWXSVUBoMz l0pJMgjE1Dc2QYTDV3Jq8ptiiBBM9Oyxen/doBA0OJiMm9S1drQ= =nJge -----END PGP SIGNATURE----- Merge tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux Pull Rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Fix builds for nightly compiler users now that 'new_uninit' was split into new features by using an alternative approach for the code that used what is now called the 'box_uninit_write' feature - Allow the 'stable_features' lint to preempt upcoming warnings about them, since soon there will be unstable features that will become stable in nightly compilers - Export bss symbols too 'kernel' crate: - 'block' module: fix wrong usage of lockdep API 'macros' crate: - Provide correct provenance when constructing 'THIS_MODULE' Documentation: - Remove unintended indentation (blockquotes) in generated output - Fix a couple typos MAINTAINERS: - Remove Wedson as Rust maintainer - Update Andreas' email" * tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux: MAINTAINERS: update Andreas Hindborg's email address MAINTAINERS: Remove Wedson as Rust maintainer rust: macros: provide correct provenance when constructing THIS_MODULE rust: allow `stable_features` lint docs: rust: remove unintended blockquote in Quick Start rust: alloc: eschew `Box<MaybeUninit<T>>::write` rust: kernel: fix typos in code comments docs: rust: remove unintended blockquote in Coding Guidelines rust: block: fix wrong usage of lockdep API rust: kbuild: fix export of bss symbols	2024-09-05 16:35:57 -07:00
Linus Torvalds	e4b42053b7	Tracing fixes for 6.11: - Fix adding a new fgraph callback after function graph tracing has already started. If the new caller does not initialize its hash before registering the fgraph_ops, it can cause a NULL pointer dereference. Fix this by adding a new parameter to ftrace_graph_enable_direct() passing in the newly added gops directly and not rely on using the fgraph_array[], as entries in the fgraph_array[] must be initialized. Assign the new gops to the fgraph_array[] after it goes through ftrace_startup_subops() as that will properly initialize the gops->ops and initialize its hashes. - Fix a memory leak in fgraph storage memory test. If the "multiple fgraph storage on a function" boot up selftest fails in the registering of the function graph tracer, it will not free the memory it allocated for the filter. Break the loop up into two where it allocates the filters first and then registers the functions where any errors will do the appropriate clean ups. - Only clear the timerlat timers if it has an associated kthread. In the rtla tool that uses timerlat, if it was killed just as it was shutting down, the signals can free the kthread and the timer. But the closing of the timerlat files could cause the hrtimer_cancel() to be called on the already freed timer. As the kthread variable is is set to NULL when the kthreads are stopped and the timers are freed it can be used to know not to call hrtimer_cancel() on the timer if the kthread variable is NULL. - Use a cpumask to keep track of osnoise/timerlat kthreads The timerlat tracer can use user space threads for its analysis. With the killing of the rtla tool, the kernel can get confused between if it is using a user space thread to analyze or one of its own kernel threads. When this confusion happens, kthread_stop() can be called on a user space thread and bad things happen. As the kernel threads are per-cpu, a bitmask can be used to know when a kernel thread is used or when a user space thread is used. - Add missing interface_lock to osnoise/timerlat stop_kthread() The stop_kthread() function in osnoise/timerlat clears the osnoise kthread variable, and if it was a user space thread does a put_task on it. But this can race with the closing of the timerlat files that also does a put_task on the kthread, and if the race happens the task will have put_task called on it twice and oops. - Add cond_resched() to the tracing_iter_reset() loop. The latency tracers keep writing to the ring buffer without resetting when it issues a new "start" event (like interrupts being disabled). When reading the buffer with an iterator, the tracing_iter_reset() sets its pointer to that start event by walking through all the events in the buffer until it gets to the time stamp of the start event. In the case of a very large buffer, the loop that looks for the start event has been reported taking a very long time with a non preempt kernel that it can trigger a soft lock up warning. Add a cond_resched() into that loop to make sure that doesn't happen. - Use list_del_rcu() for eventfs ei->list variable It was reported that running loops of creating and deleting kprobe events could cause a crash due to the eventfs list iteration hitting a LIST_POISON variable. This is because the list is protected by SRCU but when an item is deleted from the list, it was using list_del() which poisons the "next" pointer. This is what list_del_rcu() was to prevent. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZtohNBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qtoNAQDQKjomYLCpLz2EqgHZ6VB81QVrHuqt cU7xuEfUJDzyyAEA/n0t6quIdjYRd6R2/KxGkP6By/805Coq4IZMTgNQmw0= =nZ7k -----END PGP SIGNATURE----- Merge tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix adding a new fgraph callback after function graph tracing has already started. If the new caller does not initialize its hash before registering the fgraph_ops, it can cause a NULL pointer dereference. Fix this by adding a new parameter to ftrace_graph_enable_direct() passing in the newly added gops directly and not rely on using the fgraph_array[], as entries in the fgraph_array[] must be initialized. Assign the new gops to the fgraph_array[] after it goes through ftrace_startup_subops() as that will properly initialize the gops->ops and initialize its hashes. - Fix a memory leak in fgraph storage memory test. If the "multiple fgraph storage on a function" boot up selftest fails in the registering of the function graph tracer, it will not free the memory it allocated for the filter. Break the loop up into two where it allocates the filters first and then registers the functions where any errors will do the appropriate clean ups. - Only clear the timerlat timers if it has an associated kthread. In the rtla tool that uses timerlat, if it was killed just as it was shutting down, the signals can free the kthread and the timer. But the closing of the timerlat files could cause the hrtimer_cancel() to be called on the already freed timer. As the kthread variable is is set to NULL when the kthreads are stopped and the timers are freed it can be used to know not to call hrtimer_cancel() on the timer if the kthread variable is NULL. - Use a cpumask to keep track of osnoise/timerlat kthreads The timerlat tracer can use user space threads for its analysis. With the killing of the rtla tool, the kernel can get confused between if it is using a user space thread to analyze or one of its own kernel threads. When this confusion happens, kthread_stop() can be called on a user space thread and bad things happen. As the kernel threads are per-cpu, a bitmask can be used to know when a kernel thread is used or when a user space thread is used. - Add missing interface_lock to osnoise/timerlat stop_kthread() The stop_kthread() function in osnoise/timerlat clears the osnoise kthread variable, and if it was a user space thread does a put_task on it. But this can race with the closing of the timerlat files that also does a put_task on the kthread, and if the race happens the task will have put_task called on it twice and oops. - Add cond_resched() to the tracing_iter_reset() loop. The latency tracers keep writing to the ring buffer without resetting when it issues a new "start" event (like interrupts being disabled). When reading the buffer with an iterator, the tracing_iter_reset() sets its pointer to that start event by walking through all the events in the buffer until it gets to the time stamp of the start event. In the case of a very large buffer, the loop that looks for the start event has been reported taking a very long time with a non preempt kernel that it can trigger a soft lock up warning. Add a cond_resched() into that loop to make sure that doesn't happen. - Use list_del_rcu() for eventfs ei->list variable It was reported that running loops of creating and deleting kprobe events could cause a crash due to the eventfs list iteration hitting a LIST_POISON variable. This is because the list is protected by SRCU but when an item is deleted from the list, it was using list_del() which poisons the "next" pointer. This is what list_del_rcu() was to prevent. * tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread() tracing/timerlat: Only clear timer if a kthread exists tracing/osnoise: Use a cpumask to know what threads are kthreads eventfs: Use list_del_rcu() for SRCU protected list variable tracing: Avoid possible softlockup in tracing_iter_reset() tracing: Fix memory leak in fgraph storage selftest tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()	2024-09-05 16:29:41 -07:00
Eric Dumazet	031ae72825	ila: call nf_unregister_net_hooks() sooner syzbot found an use-after-free Read in ila_nf_input [1] Issue here is that ila_xlat_exit_net() frees the rhashtable, then call nf_unregister_net_hooks(). It should be done in the reverse way, with a synchronize_rcu(). This is a good match for a pre_exit() method. [1] BUG: KASAN: use-after-free in rht_key_hashfn include/linux/rhashtable.h:159 [inline] BUG: KASAN: use-after-free in __rhashtable_lookup include/linux/rhashtable.h:604 [inline] BUG: KASAN: use-after-free in rhashtable_lookup include/linux/rhashtable.h:646 [inline] BUG: KASAN: use-after-free in rhashtable_lookup_fast+0x77a/0x9b0 include/linux/rhashtable.h:672 Read of size 4 at addr ffff888064620008 by task ksoftirqd/0/16 CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.11.0-rc4-syzkaller-00238-g2ad6d23f465a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 rht_key_hashfn include/linux/rhashtable.h:159 [inline] __rhashtable_lookup include/linux/rhashtable.h:604 [inline] rhashtable_lookup include/linux/rhashtable.h:646 [inline] rhashtable_lookup_fast+0x77a/0x9b0 include/linux/rhashtable.h:672 ila_lookup_wildcards net/ipv6/ila/ila_xlat.c:132 [inline] ila_xlat_addr net/ipv6/ila/ila_xlat.c:652 [inline] ila_nf_input+0x1fe/0x3c0 net/ipv6/ila/ila_xlat.c:190 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] NF_HOOK+0x29e/0x450 include/linux/netfilter.h:312 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x1ea/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6108 __napi_poll+0xcb/0x490 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6963 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 run_ksoftirqd+0xca/0x130 kernel/softirq.c:928 smpboot_thread_fn+0x544/0xa30 kernel/smpboot.c:164 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x64620 flags: 0xfff00000000000(node=0\|zone=1\|lastcpupid=0x7ff) page_type: 0xbfffffff(buddy) raw: 00fff00000000000 ffffea0000959608 ffffea00019d9408 0000000000000000 raw: 0000000000000000 0000000000000003 00000000bfffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as freed page last allocated via order 3, migratetype Unmovable, gfp_mask 0x52dc0(GFP_KERNEL\|__GFP_NOWARN\|__GFP_NORETRY\|__GFP_COMP\|__GFP_ZERO), pid 5242, tgid 5242 (syz-executor), ts 73611328570, free_ts 618981657187 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1493 prep_new_page mm/page_alloc.c:1501 [inline] get_page_from_freelist+0x2e4c/0x2f10 mm/page_alloc.c:3439 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4695 __alloc_pages_node_noprof include/linux/gfp.h:269 [inline] alloc_pages_node_noprof include/linux/gfp.h:296 [inline] ___kmalloc_large_node+0x8b/0x1d0 mm/slub.c:4103 __kmalloc_large_node_noprof+0x1a/0x80 mm/slub.c:4130 __do_kmalloc_node mm/slub.c:4146 [inline] __kmalloc_node_noprof+0x2d2/0x440 mm/slub.c:4164 __kvmalloc_node_noprof+0x72/0x190 mm/util.c:650 bucket_table_alloc lib/rhashtable.c:186 [inline] rhashtable_init_noprof+0x534/0xa60 lib/rhashtable.c:1071 ila_xlat_init_net+0xa0/0x110 net/ipv6/ila/ila_xlat.c:613 ops_init+0x359/0x610 net/core/net_namespace.c:139 setup_net+0x515/0xca0 net/core/net_namespace.c:343 copy_net_ns+0x4e2/0x7b0 net/core/net_namespace.c:508 create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0x124/0x180 kernel/nsproxy.c:228 ksys_unshare+0x619/0xc10 kernel/fork.c:3328 __do_sys_unshare kernel/fork.c:3399 [inline] __se_sys_unshare kernel/fork.c:3397 [inline] __x64_sys_unshare+0x38/0x40 kernel/fork.c:3397 page last free pid 11846 tgid 11846 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1094 [inline] free_unref_page+0xd22/0xea0 mm/page_alloc.c:2612 __folio_put+0x2c8/0x440 mm/swap.c:128 folio_put include/linux/mm.h:1486 [inline] free_large_kmalloc+0x105/0x1c0 mm/slub.c:4565 kfree+0x1c4/0x360 mm/slub.c:4588 rhashtable_free_and_destroy+0x7c6/0x920 lib/rhashtable.c:1169 ila_xlat_exit_net+0x55/0x110 net/ipv6/ila/ila_xlat.c:626 ops_exit_list net/core/net_namespace.c:173 [inline] cleanup_net+0x802/0xcc0 net/core/net_namespace.c:640 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312 worker_thread+0x86d/0xd40 kernel/workqueue.c:3390 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Memory state around the buggy address: ffff88806461ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88806461ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888064620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888064620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888064620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Fixes: `7f00feaf10` ("ila: Add generic ILA translation facility") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240904144418.1162839-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-05 14:57:12 -07:00
Arkadiusz Kubalewski	6fda63c45f	tools/net/ynl: fix cli.py --subscribe feature Execution of command: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml / --subscribe "monitor" --sleep 10 fails with: File "/repo/./tools/net/ynl/cli.py", line 109, in main ynl.check_ntf() File "/repo/tools/net/ynl/lib/ynl.py", line 924, in check_ntf op = self.rsp_by_value[nl_msg.cmd()] KeyError: 19 Parsing Generic Netlink notification messages performs lookup for op in the message. The message was not yet decoded, and is not yet considered GenlMsg, thus msg.cmd() returns Generic Netlink family id (19) instead of proper notification command id (i.e.: DPLL_CMD_PIN_CHANGE_NTF=13). Allow the op to be obtained within NetlinkProtocol.decode(..) itself if the op was not passed to the decode function, thus allow parsing of Generic Netlink notifications without causing the failure. Suggested-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/netdev/m2le0n5xpn.fsf@gmail.com/ Fixes: `0a966d606c` ("tools/net/ynl: Fix extack decoding for directional ops") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20240904135034.316033-1-arkadiusz.kubalewski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-05 14:56:45 -07:00
Vadim Fedorenko	20d664ebd2	MAINTAINERS: fix ptp ocp driver maintainers address While checking the latest series for ptp_ocp driver I realised that MAINTAINERS file has wrong item about email on linux.dev domain. Fixes: `795fd9342c` ("ptp_ocp: adjust MAINTAINERS and mailmap") Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240904131855.559078-1-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-05 14:47:46 -07:00
Jamie Bainbridge	e4af74a53b	selftests: net: enable bind tests bind_wildcard is compiled but not run, bind_timewait is not compiled. These two tests complete in a very short time, use the test harness properly, and seem reasonable to enable. The author of the tests confirmed via email that these were intended to be run. Enable these two tests. Fixes: `13715acf8a` ("selftest: Add test for bind() conflicts.") Fixes: `2c042e8e54` ("tcp: Add selftest for bind() and TIME_WAIT.") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/5a009b26cf5fb1ad1512d89c61b37e2fac702323.1725430322.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-05 14:38:15 -07:00

1 2 3 4 5 ...

1296352 Commits