OpenCloudOS-Kernel/net/core
Yan Zhai e92971a7ed net: report RCU QS on threaded NAPI repolling
[ Upstream commit d6dbbb11247c71203785a2c9da474c36f4b19eae ]

NAPI threads can keep polling packets under load. Currently it is only
calling cond_resched() before repolling, but it is not sufficient to
clear out the holdout of RCU tasks, which prevent BPF tracing programs
from detaching for long period. This can be reproduced easily with
following set up:

ip netns add test1
ip netns add test2

ip -n test1 link add veth1 type veth peer name veth2 netns test2

ip -n test1 link set veth1 up
ip -n test1 link set lo up
ip -n test2 link set veth2 up
ip -n test2 link set lo up

ip -n test1 addr add 192.168.1.2/31 dev veth1
ip -n test1 addr add 1.1.1.1/32 dev lo
ip -n test2 addr add 192.168.1.3/31 dev veth2
ip -n test2 addr add 2.2.2.2/31 dev lo

ip -n test1 route add default via 192.168.1.3
ip -n test2 route add default via 192.168.1.2

for i in `seq 10 210`; do
 for j in `seq 10 210`; do
    ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201
 done
done

ip netns exec test2 ethtool -K veth2 gro on
ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded'
ip netns exec test1 ethtool -K veth1 tso off

Then run an iperf3 client/server and a bpftrace script can trigger it:

ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null&
ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null&
bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'

Report RCU quiescent states periodically will resolve the issue.

Fixes: 29863d41bb ("net: implement threaded-able napi poll loop support")
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/4c3b0d3f32d3b18949d75b18e5e1d9f13a24f025.1710877680.git.yan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:20:12 -04:00
..
Makefile net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
bpf_sk_storage.c bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing 2023-07-27 10:07:56 -07:00
datagram.c net: datagram: fix data-races in datagram_poll() 2023-05-10 19:06:49 -07:00
dev.c net: report RCU QS on threaded NAPI repolling 2024-03-26 18:20:12 -04:00
dev.h net: fix removing a namespace with conflicting altnames 2024-01-31 16:19:01 -08:00
dev_addr_lists.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
dev_addr_lists_test.c kunit: Use KUNIT_EXPECT_MEMEQ macro 2022-10-27 02:40:14 -06:00
dev_ioctl.c net: omit ndo_hwtstamp_get() call when possible in dev_set_hwtstamp_phylib() 2023-08-06 13:25:10 +01:00
drop_monitor.c drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group 2023-12-13 18:45:10 +01:00
dst.c net: remove unnecessary input parameter 'how' in ifdown function 2023-08-22 13:19:02 +02:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-11-29 19:50:45 -08:00
failover.c net: failover: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf 2022-12-12 15:18:25 -08:00
fib_notifier.c
fib_rules.c fib: expand fib_rule_policy 2021-12-16 07:18:35 -08:00
filter.c bpf: Derive source IP addr via bpf_*_fib_lookup() 2024-03-01 13:35:04 +01:00
flow_dissector.c net/core: Fix ETH_P_1588 flow dissector 2023-09-15 10:40:04 +01:00
flow_offload.c tc: flower: Enable offload support IPSEC SPI field. 2023-08-02 10:09:32 +01:00
gen_estimator.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
gen_stats.c net: Remove the obsolte u64_stats_fetch_*_irq() users (net). 2022-10-28 20:13:54 -07:00
gro.c gro: move the tc_ext comparison to a helper 2023-06-18 18:08:35 +01:00
gro_cells.c net: drop the weight argument from netif_napi_add 2022-09-28 18:57:14 -07:00
gso.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
hwbm.c
link_watch.c net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down 2022-11-16 09:45:00 +00:00
lwt_bpf.c lwt: Fix return values of BPF xmit ops 2023-08-18 16:05:26 +02:00
lwtunnel.c xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available 2022-10-12 10:45:51 +02:00
neighbour.c neighbour: Don't let neigh_forced_gc() disable preemption for long 2024-01-20 11:51:43 +01:00
net-procfs.c net-sysfs: display two backlog queue len separately 2023-03-22 12:03:52 +01:00
net-sysfs.c net: move struct netdev_rx_queue out of netdevice.h 2023-08-03 08:38:07 -07:00
net-sysfs.h
net-traces.c udp6: add a missing call into udp_fail_queue_rcv_skb tracepoint 2023-07-07 09:16:52 +01:00
net_namespace.c lib/ref_tracker: improve printing stats 2023-06-05 15:28:42 -07:00
netclassid_cgroup.c core: Variable type completion 2022-08-31 09:40:34 +01:00
netdev-genl-gen.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
netdev-genl-gen.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
netdev-genl.c netdev-genl: use struct genl_info for reply construction 2023-08-15 15:01:03 -07:00
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c netpoll: allocate netdev tracker right away 2023-06-15 08:21:11 +01:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
of_net.c net: Explicitly include correct DT includes 2023-07-27 20:33:16 -07:00
page_pool.c net: page_pool: add missing free_percpu when page_pool_init fail 2023-11-20 11:59:34 +01:00
pktgen.c net: pktgen: Fix interface flags printing 2023-10-18 10:12:30 +01:00
ptp_classifier.c ptp: Add generic PTP is_sync() function 2022-03-07 11:31:34 +00:00
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-01-31 16:19:00 -08:00
rtnetlink.c rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back 2024-03-06 14:48:36 +00:00
scm.c io_uring/unix: drop usage of io_uring socket 2024-03-26 18:19:09 -04:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-20 10:14:49 +01:00
selftests.c net: core: constify mac addrs in selftests 2021-10-24 13:59:44 +01:00
skbuff.c net: mctp: copy skb ext data when fragmenting 2024-03-26 18:19:34 -04:00
skmsg.c bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready() 2024-03-01 13:35:08 +01:00
sock.c udp: fix busy polling 2024-01-31 16:19:01 -08:00
sock_destructor.h skb_expand_head() adjust skb->truesize incorrectly 2021-10-22 12:35:51 -07:00
sock_diag.c sock_diag: annotate data-races around sock_diag_handlers[family] 2024-03-26 18:19:22 -04:00
sock_map.c bpf: syzkaller found null ptr deref in unix_bpf proto add 2024-01-01 12:42:28 +00:00
sock_reuseport.c soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-10-25 11:35:16 +02:00
stream.c net: Return error from sk_stream_wait_connect() if sk_wait_event() fails 2024-01-01 12:42:30 +00:00
sysctl_net_core.c networking: Update to register_net_sysctl_sz 2023-08-15 15:26:18 -07:00
timestamping.c
tso.c net: tso: inline tso_count_descs() 2022-12-12 15:04:39 -08:00
utils.c net: core: inet[46]_pton strlen len types 2022-11-01 21:14:39 -07:00
xdp.c page_pool: split types and declarations from page_pool.h 2023-08-07 13:05:19 -07:00