OpenCloudOS-Kernel/tools/testing/selftests/net
Omid Ehtemam-Haghighi 52da02521e ipv6: Fix soft lockups in fib6_select_path under high next hop churn
commit d9ccb18f83ea2bb654289b6ecf014fd267cc988b upstream.

Soft lockups have been observed on a cluster of Linux-based edge routers
located in a highly dynamic environment. Using the `bird` service, these
routers continuously update BGP-advertised routes due to frequently
changing nexthop destinations, while also managing significant IPv6
traffic. The lockups occur during the traversal of the multipath
circular linked-list in the `fib6_select_path` function, particularly
while iterating through the siblings in the list. The issue typically
arises when nodes of the linked list are deleted concurrently on a
different core. This is indicated by their 'next' and 'previous'
elements pointing back to the node itself and their reference count
dropping to zero. The traversal then never terminates, leading to a
soft lockup that triggers a system panic via the watchdog timer.

Apply RCU primitives in the problematic code sections to resolve the
issue. Where necessary, update the references to fib6_siblings to
annotate or use the RCU APIs.

Include a test script that reproduces the issue. The script
periodically updates the routing table while generating a heavy load
of outgoing IPv6 traffic through multiple iperf3 clients. It
consistently induces infinite soft lockups within a couple of minutes.

Kernel log:

 0 [ffffbd13003e8d30] machine_kexec at ffffffff8ceaf3eb
 1 [ffffbd13003e8d90] __crash_kexec at ffffffff8d0120e3
 2 [ffffbd13003e8e58] panic at ffffffff8cef65d4
 3 [ffffbd13003e8ed8] watchdog_timer_fn at ffffffff8d05cb03
 4 [ffffbd13003e8f08] __hrtimer_run_queues at ffffffff8cfec62f
 5 [ffffbd13003e8f70] hrtimer_interrupt at ffffffff8cfed756
 6 [ffffbd13003e8fd0] __sysvec_apic_timer_interrupt at ffffffff8cea01af
 7 [ffffbd13003e8ff0] sysvec_apic_timer_interrupt at ffffffff8df1b83d
-- <IRQ stack> --
 8 [ffffbd13003d3708] asm_sysvec_apic_timer_interrupt at ffffffff8e000ecb
    [exception RIP: fib6_select_path+299]
    RIP: ffffffff8ddafe7b  RSP: ffffbd13003d37b8  RFLAGS: 00000287
    RAX: ffff975850b43600  RBX: ffff975850b40200  RCX: 0000000000000000
    RDX: 000000003fffffff  RSI: 0000000051d383e4  RDI: ffff975850b43618
    RBP: ffffbd13003d3800   R8: 0000000000000000   R9: ffff975850b40200
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffbd13003d3830
    R13: ffff975850b436a8  R14: ffff975850b43600  R15: 0000000000000007
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 9 [ffffbd13003d3808] ip6_pol_route at ffffffff8ddb030c
10 [ffffbd13003d3888] ip6_pol_route_input at ffffffff8ddb068c
11 [ffffbd13003d3898] fib6_rule_lookup at ffffffff8ddf02b5
12 [ffffbd13003d3928] ip6_route_input at ffffffff8ddb0f47
13 [ffffbd13003d3a18] ip6_rcv_finish_core.constprop.0 at ffffffff8dd950d0
14 [ffffbd13003d3a30] ip6_list_rcv_finish.constprop.0 at ffffffff8dd96274
15 [ffffbd13003d3a98] ip6_sublist_rcv at ffffffff8dd96474
16 [ffffbd13003d3af8] ipv6_list_rcv at ffffffff8dd96615
17 [ffffbd13003d3b60] __netif_receive_skb_list_core at ffffffff8dc16fec
18 [ffffbd13003d3be0] netif_receive_skb_list_internal at ffffffff8dc176b3
19 [ffffbd13003d3c50] napi_gro_receive at ffffffff8dc565b9
20 [ffffbd13003d3c80] ice_receive_skb at ffffffffc087e4f5 [ice]
21 [ffffbd13003d3c90] ice_clean_rx_irq at ffffffffc0881b80 [ice]
22 [ffffbd13003d3d20] ice_napi_poll at ffffffffc088232f [ice]
23 [ffffbd13003d3d80] __napi_poll at ffffffff8dc18000
24 [ffffbd13003d3db8] net_rx_action at ffffffff8dc18581
25 [ffffbd13003d3e40] __do_softirq at ffffffff8df352e9
26 [ffffbd13003d3eb0] run_ksoftirqd at ffffffff8ceffe47
27 [ffffbd13003d3ec0] smpboot_thread_fn at ffffffff8cf36a30
28 [ffffbd13003d3ee8] kthread at ffffffff8cf2b39f
29 [ffffbd13003d3f28] ret_from_fork at ffffffff8ce5fa64
30 [ffffbd13003d3f50] ret_from_fork_asm at ffffffff8ce03cbb

Fixes: 66f5d6ce53 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Reported-by: Adrian Oliver <kernel@aoliver.ca>
Signed-off-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241106010236.1239299-1-omid.ehtemamhaghighi@menlosecurity.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Rajani Kantha <rajanikantha@engineer.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-01 18:37:52 +01:00
af_unix selftests/net: unix: fix unused variable compiler warning 2023-12-08 08:52:22 +01:00
forwarding selftests: net: no_forwarding: fix VID for $swp2 in one_bridge_two_pvids() test 2024-10-17 15:24:24 +02:00
hsr selftests: hsr: Extend the testsuite to also cover HSRv1. 2023-09-18 08:26:19 +01:00
mptcp selftests: mptcp: avoid spurious errors on disconnect 2025-01-23 17:21:15 +01:00
openvswitch selftests: openvswitch: fix tcpdump execution 2024-12-27 13:58:49 +01:00
.gitignore selftests/net: Add log.txt and tools to .gitignore 2023-08-20 15:15:41 +01:00
Makefile ipv6: Fix soft lockups in fib6_select_path under high next hop churn 2025-02-01 18:37:52 +01:00
altnames.sh selftest: net: fix typo in altname test 2021-09-12 10:48:26 +01:00
amt.sh selftests: net: kill smcrouted in the cleanup logic in amt.sh 2024-06-12 11:12:47 +02:00
arp_ndisc_evict_nocarrier.sh selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier 2023-01-01 11:59:53 +00:00
arp_ndisc_untracked_subnets.sh selftests/net: specify the interface when do arping 2024-01-25 15:35:22 -08:00
bareudp.sh
big_tcp.sh selftests: net: let big_tcp test cope with slow env 2024-02-16 19:10:50 +01:00
bind_bhash.c selftests/net: Add test for timing a bind request to a port with a populated bhash entry 2022-08-24 19:30:09 -07:00
bind_bhash.sh selftests/net: Improve bind_bhash.sh to accommodate predictable network interface names 2023-09-10 18:49:29 +01:00
bind_timewait.c tcp: Add selftest for bind() and TIME_WAIT. 2022-12-30 07:25:53 +00:00
bind_wildcard.c selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c. 2023-09-13 07:18:04 +01:00
cmsg_ipv6.sh selftests: cmsg_ipv6: repeat the exact packet 2024-02-16 19:10:50 +01:00
cmsg_sender.c selftests/net: fix a char signedness issue 2023-12-08 08:52:22 +01:00
cmsg_so_mark.sh selftests: net: cmsg_so_mark: test with SO_MARK set by setsockopt 2022-02-10 15:04:51 +00:00
cmsg_time.sh selftests: net: test standard socket cmsgs across UDP and ICMP sockets 2022-02-10 15:04:52 +00:00
config selftests: net: add missing config for amt.sh 2024-06-12 11:11:51 +02:00
csum.c selftests: net: csum: Fix checksums for packets with non-zero padding 2024-09-18 19:24:09 +02:00
devlink_port_split.py selftests: net: devlink_port_split.py: skip test if no suitable device available 2023-03-16 17:38:05 -07:00
drop_monitor_tests.sh
fcnal-test.sh Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-22 18:40:38 -07:00
fib-onlink-tests.sh
fib_nexthop_multiprefix.sh selftests/net: fix grep checking for fib_nexthop_multiprefix 2024-01-25 15:35:26 -08:00
fib_nexthop_nongw.sh selftests/net: test nexthop without gw 2022-07-14 14:41:19 +02:00
fib_nexthops.sh Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-10 14:10:53 -07:00
fib_rule_tests.sh selftests: fib_rule_tests: Test UDP and TCP connections with DSCP rules. 2023-02-09 22:49:04 -08:00
fib_tests.sh ipv4: Fix incorrect TOS in fibmatch route get reply 2024-08-03 08:54:05 +02:00
fin_ack_lat.c
fin_ack_lat.sh
gre_gso.sh selftests: net: switch to socat in the GSO GRE test 2021-11-12 19:59:01 -08:00
gro.c selftests/net: fix gro.c compilation failure due to non-existent opt_ipproto_off 2024-07-18 13:21:24 +02:00
gro.sh selftests/net: allow GRO coalesce test on veth 2021-08-26 12:03:49 +01:00
hwtstamp_config.c selftests/net: replace manual array size calc with ARRAYSIZE macro. 2023-07-18 17:43:51 -07:00
icmp.sh selftests/net: Add icmp.sh for testing ICMP dummy address responses 2021-06-18 12:13:24 -07:00
icmp_redirect.sh selftests: icmp_redirect: pass xfail=0 to log_test() 2021-12-12 12:56:41 +00:00
in_netns.sh
io_uring_zerocopy_tx.c selftest/net: adjust io_uring sendzc notif handling 2022-09-23 14:57:27 -06:00
io_uring_zerocopy_tx.sh selftests/net: don't tests batched TCP io_uring zc 2022-11-02 08:27:24 -06:00
ioam6.sh selftests: net: fix IOAM test skip return code 2022-08-02 09:51:31 +02:00
ioam6_parser.c selftests: net: ioam: expect support for Queue depth data 2022-01-21 19:26:39 -08:00
ip6_gre_headroom.sh
ip_defrag.c
ip_defrag.sh
ip_local_port_range.c selftests/net: fix uninitialized variables 2024-07-11 12:49:08 +02:00
ip_local_port_range.sh selftests/net: Cover the IP_LOCAL_PORT_RANGE socket option 2023-01-25 22:45:00 -08:00
ipsec.c selftests/net: ipsec: fix constant out of range 2023-12-08 08:52:22 +01:00
ipv6_flowlabel.c ping: support ipv6 ping socket flow labels 2022-07-22 12:40:27 +01:00
ipv6_flowlabel.sh ping: support ipv6 ping socket flow labels 2022-07-22 12:40:27 +01:00
ipv6_flowlabel_mgr.c
ipv6_route_update_soft_lockup.sh ipv6: Fix soft lockups in fib6_select_path under high next hop churn 2025-02-01 18:37:52 +01:00
l2_tos_ttl_inherit.sh selftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure. 2023-01-10 10:13:52 +01:00
l2tp.sh
lib.sh selftests: net: lib: kill PIDs before del netns 2024-08-29 17:33:21 +02:00
msg_zerocopy.c selftests: make order checking verbose in msg_zerocopy selftest 2024-07-11 12:49:14 +02:00
msg_zerocopy.sh
nat6to4.c selftests/net: mv bpf/nat6to4.c to net folder 2023-01-19 13:25:53 +01:00
ndisc_unsolicited_na_test.sh net/ipv6: Expand and rename accept_unsolicited_na to accept_untracked_na 2022-05-31 11:36:57 +02:00
net_helper.sh selftests: net: more strict check in net_helper 2024-06-16 13:47:48 +02:00
netdevice.sh
netns-name.sh selftests: net: add very basic test for netdev names and namespaces 2023-10-19 15:51:16 +02:00
nettest.c selftests: Add SO_DONTROUTE option to nettest. 2023-05-12 09:43:56 +01:00
pmtu.sh selftests: net: really check for bg process completion 2024-12-09 10:32:10 +01:00
psock_fanout.c selftests: net: fix array_size.cocci warning 2022-03-17 15:21:16 +01:00
psock_lib.h selftests/net: replace manual array size calc with ARRAYSIZE macro. 2023-07-18 17:43:51 -07:00
psock_snd.c selftests/net: enable lo.accept_local in psock_snd test 2022-05-25 21:58:35 -07:00
psock_snd.sh selftests/net: remove min gso test in packet_snd 2021-08-02 10:34:04 +01:00
psock_tpacket.c
reuseaddr_conflict.c selftests: reuseaddr_conflict: add missing new line at the end of the output 2024-04-10 16:35:53 +02:00
reuseaddr_ports_exhausted.c selftests/net: fix warnings on reuseaddr_ports_exhausted 2021-03-16 15:01:21 -07:00
reuseaddr_ports_exhausted.sh
reuseport_addr_any.c
reuseport_addr_any.sh
reuseport_bpf.c selftests: Fix the if conditions of in test_extra_filter() 2022-09-27 11:00:02 +02:00
reuseport_bpf_cpu.c
reuseport_bpf_numa.c selftests: net: fix array_size.cocci warning 2022-03-07 12:23:27 +00:00
reuseport_dualstack.c
route_localnet.sh
rps_default_mask.sh selftests: net: fix rps_default_mask with >32 CPUs 2024-01-31 16:19:04 -08:00
rtnetlink.sh selftests: rtnetlink: add MACsec offload tests 2023-07-14 09:16:53 +01:00
run_afpackettests selftests/net: Use kselftest skip code for skipped tests 2021-08-24 16:49:09 -07:00
run_netsocktests
rxtimestamp.c selftests/net: remove ARRAY_SIZE define from individual tests 2021-12-10 17:50:57 -07:00
rxtimestamp.sh
sctp_hello.c selftests: add a selftest for sctp vrf 2022-11-18 11:42:54 +00:00
sctp_vrf.sh selftests: add a selftest for sctp vrf 2022-11-18 11:42:54 +00:00
settings kselftests/net: adapt the timeout to the largest runtime 2022-01-13 12:53:22 +00:00
setup_loopback.sh selftests: net: Remove executable bits from library scripts 2024-10-17 15:24:13 +02:00
setup_veth.sh selftests: net: give more time for GRO aggregation 2024-02-05 20:14:35 +00:00
sk_bind_sendto_listen.c selftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addr 2022-08-24 19:30:09 -07:00
sk_connect_zero_addr.c selftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addr 2022-08-24 19:30:09 -07:00
so_incoming_cpu.c selftest: Don't reuse port for SO_INCOMING_CPU test. 2024-01-31 16:19:02 -08:00
so_netns_cookie.c tools/testing: add a selftest for SO_NETNS_COOKIE 2021-06-24 11:13:05 -07:00
so_txtime.c selftests/net: so_txtime: usage(): fix documentation of default clock 2022-05-03 13:18:26 +02:00
so_txtime.sh selftests/net: so_txtime multi-host support 2021-03-31 17:48:21 -07:00
socket.c selftests/net: remove ARRAY_SIZE define from individual tests 2021-12-10 17:50:57 -07:00
srv6_end_dt4_l3vpn_test.sh selftets: seg6: disable rp_filter by default in srv6_end_dt4_l3vpn_test 2023-05-11 18:01:38 -07:00
srv6_end_dt6_l3vpn_test.sh selftests/net: Use kselftest skip code for skipped tests 2021-08-24 16:49:09 -07:00
srv6_end_dt46_l3vpn_test.sh selftests: srv6: make srv6_end_dt46_l3vpn_test more robust 2023-04-28 09:51:40 +01:00
srv6_end_flavors_test.sh selftests: seg6: add selftest for PSP flavor in SRv6 End behavior 2023-02-16 13:18:06 +01:00
srv6_end_next_csid_l3vpn_test.sh selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End behavior 2022-09-20 12:33:22 +02:00
srv6_end_x_next_csid_l3vpn_test.sh selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End.X behavior 2023-08-15 18:51:47 -07:00
srv6_hencap_red_l3vpn_test.sh selftests: seg6: add selftest for SRv6 H.Encaps.Red behavior 2022-07-29 12:14:03 +01:00
srv6_hl2encap_red_l2vpn_test.sh selftests: seg6: add selftest for SRv6 H.L2Encaps.Red behavior 2022-07-29 12:14:03 +01:00
stress_reuseport_listen.c net: selftests: Stress reuseport listen 2022-05-12 16:52:18 -07:00
stress_reuseport_listen.sh net: selftests: Stress reuseport listen 2022-05-12 16:52:18 -07:00
tap.c selftests: add few test cases for tap driver 2022-08-05 08:59:15 +01:00
tcp_fastopen_backup_key.c selftests/net: remove ARRAY_SIZE define from individual tests 2021-12-10 17:50:57 -07:00
tcp_fastopen_backup_key.sh
tcp_inq.c
tcp_mmap.c selftests/net: report rcv_mss in tcp_mmap 2023-08-02 11:40:49 +01:00
test_blackhole_dev.sh
test_bpf.sh
test_bridge_backup_port.sh selftests: net: Fix bridge backup port test flakiness 2024-02-23 09:24:51 +01:00
test_bridge_neigh_suppress.sh selftests: test_bridge_neigh_suppress.sh: Fix failures due to duplicate MAC 2024-05-17 12:02:23 +02:00
test_ingress_egress_chaining.sh selftests: add selftest for chaining of tc ingress handling to egress 2022-10-19 14:04:36 +01:00
test_vxlan_fdb_changelink.sh
test_vxlan_mdb.sh selftests: vxlan_mdb: Fix failures with old libnet 2024-04-10 16:35:42 +02:00
test_vxlan_nolocalbypass.sh selftests: net: vxlan: Fix selftest regression after changes in iproute2. 2023-06-11 21:05:53 +01:00
test_vxlan_under_vrf.sh selftests: test_vxlan_under_vrf: Fix broken test case 2022-03-25 17:00:11 -07:00
test_vxlan_vnifiltering.sh selftests: Fix failing VXLAN VNI filtering test 2023-02-08 16:54:03 -08:00
timestamping.c selftests/net: timestamping: Fix bind_phc check 2022-01-31 11:44:04 +00:00
tls.c net: tls, add test to capture error on large splice 2024-08-19 06:04:26 +02:00
toeplitz.c selftests/net: toeplitz: fix race on tpacket_v3 block close 2023-01-19 09:27:15 -08:00
toeplitz.sh selftests: net: Use "grep -E" instead of "egrep" 2022-12-02 20:56:41 -08:00
toeplitz_client.sh selftests/net: toeplitz test 2021-08-05 13:14:09 +01:00
traceroute.sh
tun.c selftest: tun: add test for NAPI dismantle 2022-06-30 11:34:10 -07:00
txring_overwrite.c
txtimestamp.c selftests: net: change fprintf format specifiers 2022-03-21 16:37:54 -07:00
txtimestamp.sh
udpgro.sh selftests: udpgro: report error when receive failed 2024-08-29 17:33:46 +02:00
udpgro_bench.sh selftests/net: synchronize udpgro tests' tx and rx connection 2024-06-16 13:47:31 +02:00
udpgro_frglist.sh selftests/net: synchronize udpgro tests' tx and rx connection 2024-06-16 13:47:31 +02:00
udpgro_fwd.sh selftests: net: gro fwd: update vxlan GRO test expectations 2024-04-10 16:35:52 +02:00
udpgso.c net: change maximum number of UDP segments to 128 2024-04-27 17:11:32 +02:00
udpgso.sh
udpgso_bench.sh selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs 2023-02-02 13:29:51 +01:00
udpgso_bench_rx.c selftests: net: cut more slack for gro fwd tests. 2024-02-16 19:10:48 +01:00
udpgso_bench_tx.c selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking 2023-02-02 13:29:51 +01:00
unicast_extensions.sh selftests/net: change shebang to bash to support "source" 2024-02-16 19:10:48 +01:00
veth.sh selftests: net: remove dependency on ebpf tests 2024-02-05 20:14:34 +00:00
vrf-xfrm-tests.sh selftests: net: vrf-xfrm-tests: change authentication and encryption algos 2023-06-15 22:24:01 -07:00
vrf_route_leaking.sh selftests: vrf_route_leaking: remove ipv6_ping_frag from default testing 2023-08-20 15:25:10 +01:00
vrf_strict_mode_test.sh selftests: net: vrf_strict_mode_test: add support to select a test to run 2022-05-02 10:48:29 +02:00
xdp_dummy.c selftests: net: remove dependency on ebpf tests 2024-02-05 20:14:34 +00:00
xfrm_policy.sh xfrm: Fix wraparound in xfrm_policy_addr_delta() 2021-01-04 10:35:09 +01:00