OpenCloudOS-Kernel/net/ipv4
John Fastabend ea444185a6 bpf, sockmap: TCP data stall on recv before accept
A common mechanism to put a TCP socket into the sockmap is to hook the
BPF_SOCK_OPS_{ACTIVE_PASSIVE}_ESTABLISHED_CB event with a BPF program
that can map the socket info to the correct BPF verdict parser. When
the user adds the socket to the map the psock is created and the new
ops are assigned to ensure the verdict program will 'see' the sk_buffs
as they arrive.

Part of this process hooks the sk_data_ready op with a BPF specific
handler to wake up the BPF verdict program when data is ready to read.
The logic is simple enough (posted here for easy reading)

 static void sk_psock_verdict_data_ready(struct sock *sk)
 {
	struct socket *sock = sk->sk_socket;

	if (unlikely(!sock || !sock->ops || !sock->ops->read_skb))
		return;
	sock->ops->read_skb(sk, sk_psock_verdict_recv);
 }

The oversight here is sk->sk_socket is not assigned until the application
accepts() the new socket. However, its entirely ok for the peer application
to do a connect() followed immediately by sends. The socket on the receiver
is sitting on the backlog queue of the listening socket until its accepted
and the data is queued up. If the peer never accepts the socket or is slow
it will eventually hit data limits and rate limit the session. But,
important for BPF sockmap hooks when this data is received TCP stack does
the sk_data_ready() call but the read_skb() for this data is never called
because sk_socket is missing. The data sits on the sk_receive_queue.

Then once the socket is accepted if we never receive more data from the
peer there will be no further sk_data_ready calls and all the data
is still on the sk_receive_queue(). Then user calls recvmsg after accept()
and for TCP sockets in sockmap we use the tcp_bpf_recvmsg_parser() handler.
The handler checks for data in the sk_msg ingress queue expecting that
the BPF program has already run from the sk_data_ready hook and enqueued
the data as needed. So we are stuck.

To fix do an unlikely check in recvmsg handler for data on the
sk_receive_queue and if it exists wake up data_ready. We have the sock
locked in both read_skb and recvmsg so should avoid having multiple
runners.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-7-john.fastabend@gmail.com
2023-05-23 16:10:28 +02:00
..
bpfilter net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces" 2020-08-10 12:06:44 -07:00
netfilter xtables: move icmp/icmpv6 logic to xt_tcpudp 2023-03-22 21:48:59 +01:00
Kconfig tcp: configurable source port perturb table size 2022-11-16 13:02:04 +00:00
Makefile bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
af_inet.c tcp: add annotations around sk->sk_shutdown accesses 2023-05-10 10:27:31 +01:00
ah4.c net: ipv4: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
arp.c neighbour: annotate lockless accesses to n->nud_state 2023-03-15 00:37:32 -07:00
bpf_tcp_ca.c bpf: Remove unused arguments from btf_struct_access(). 2023-04-04 16:57:10 -07:00
cipso_ipv4.c cipso_ipv4: use iph_set_totlen in skbuff_setattr 2023-02-01 20:54:27 -08:00
datagram.c Networking fixes for 6.1-rc2, including fixes from netfilter 2022-10-20 17:24:59 -07:00
devinet.c net: ipv4: Allow changing IPv4 address protocol 2023-03-23 08:32:52 +00:00
esp4.c net: ipv4: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
esp4_offload.c xfrm: replay: Fix ESN wrap around for GSO 2022-10-19 09:00:53 +02:00
fib_frontend.c ipv4: Fix incorrect table ID in IOCTL path 2023-03-16 17:26:31 -07:00
fib_lookup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fib_notifier.c net: ipv4: remove superfluous header files from fib_notifier.c 2021-09-28 17:32:56 -07:00
fib_rules.c ipv4: remove unnecessary type castings 2022-04-30 15:12:58 +01:00
fib_semantics.c neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
fib_trie.c ipv4: Fix error return code in fib_table_insert() 2022-11-22 20:18:20 -08:00
fou_bpf.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_core.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_nl.c ynl: broaden the license even more 2023-03-16 21:20:32 -07:00
fou_nl.h ynl: broaden the license even more 2023-03-16 21:20:32 -07:00
gre_demux.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
gre_offload.c net: gro: skb_gro_header helper function 2022-08-25 10:33:21 +02:00
icmp.c icmp: guard against too small mtu 2023-03-31 21:37:06 -07:00
igmp.c ipv4: constify ip_mc_sf_allow() socket argument 2023-03-17 08:56:37 +00:00
inet_connection_sock.c net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). 2023-02-20 16:31:49 -08:00
inet_diag.c net: inet: Retire port only listening_hash 2022-05-12 16:52:18 -07:00
inet_fragment.c net: dropreason: add SKB_DROP_REASON_FRAG_REASM_TIMEOUT 2022-10-31 20:14:27 -07:00
inet_hashtables.c ipv6: Remove in6addr_any alternatives. 2023-03-29 08:22:52 +01:00
inet_timewait_sock.c net: no longer support SOCK_REFCNT_DEBUG feature 2023-02-15 10:25:21 +00:00
inetpeer.c inetpeer: Fix data-races around sysctl. 2022-07-08 12:10:33 +01:00
ip_forward.c ip: Fix data-races around sysctl_ip_fwd_update_priority. 2022-07-15 11:49:55 +01:00
ip_fragment.c net: dropreason: add SKB_DROP_REASON_FRAG_TOO_FAR 2022-10-31 20:14:27 -07:00
ip_gre.c erspan: do not use skb_mac_header() in ndo_start_xmit() 2023-03-21 21:16:26 -07:00
ip_input.c net: add support for ipv4 big tcp 2023-02-01 20:54:27 -08:00
ip_options.c ipv4: drop fragmentation code from ip_options_build() 2022-01-29 17:53:07 +00:00
ip_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-26 10:17:46 +02:00
ip_sockglue.c inet: Add IP_LOCAL_PORT_RANGE socket option 2023-01-25 22:45:00 -08:00
ip_tunnel.c bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
ip_tunnel_core.c net: Add helper function to parse netlink msg of ip_tunnel_parm 2022-10-03 07:59:06 +01:00
ip_vti.c ipv4: tunnels: use DEV_STATS_INC() 2022-11-16 12:48:44 +00:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c Driver core / kernfs changes for 6.0-rc1 2022-08-04 11:31:20 -07:00
ipip.c ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices 2023-04-12 16:40:39 -07:00
ipmr.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
ipmr_base.c ipmr: adopt rcu_read_lock() in mr_dump() 2022-06-24 11:34:38 +01:00
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-01-23 21:37:25 -08:00
netfilter.c netfilter: Use l3mdev flow key when re-routing mangled packets 2022-05-16 13:03:29 +02:00
netlink.c
nexthop.c neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
ping.c ping: Fix potentail NULL deref for /proc/net/icmp. 2023-04-04 18:56:58 -07:00
proc.c icmp: Add counters for rate limits 2023-01-26 10:52:18 +01:00
protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
raw.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
route.c net: dst: fix missing initialization of rt_uncached 2023-04-21 20:26:56 -07:00
syncookies.c mptcp: remove MPTCP 'ifdef' in TCP SYN cookies 2022-12-12 13:11:24 -08:00
sysctl_net_ipv4.c tcp: restrict net.ipv4.tcp_app_win 2023-04-07 08:19:11 +01:00
tcp.c bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
tcp_bbr.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_bic.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_bpf.c bpf, sockmap: TCP data stall on recv before accept 2023-05-23 16:10:28 +02:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c net: Update an existing TCP congestion control algorithm. 2023-03-22 22:53:00 -07:00
tcp_cubic.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.h
tcp_diag.c tcp: Access &tcp_hashinfo via net. 2022-09-20 10:21:49 -07:00
tcp_fastopen.c tcp: Make SYN ACK RTO tunable by BPF programs with TFO 2022-08-17 10:19:22 +01:00
tcp_highspeed.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_htcp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_hybla.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_illinois.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_input.c tcp: add annotations around sk->sk_shutdown accesses 2023-05-10 10:27:31 +01:00
tcp_ipv4.c tcp: fix possible sk_priority leak in tcp_v4_send_reset() 2023-05-12 10:05:50 +01:00
tcp_lp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_metrics.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
tcp_minisocks.c tcp: preserve const qualifier in tcp_sk() 2023-03-18 12:23:34 +00:00
tcp_nv.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_offload.c gro: add support of (hw)gro packets to gro stack 2022-10-03 12:38:34 +01:00
tcp_output.c tcp: preserve const qualifier in tcp_sk() 2023-03-18 12:23:34 +00:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-04-28 13:02:01 -07:00
tcp_recovery.c tcp: preserve const qualifier in tcp_sk() 2023-03-18 12:23:34 +00:00
tcp_scalable.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_timer.c tcp: annotate lockless access to sk->sk_err 2023-03-17 08:25:05 +00:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_vegas.h
tcp_veno.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_westwood.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_yeah.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tunnel4.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
udp.c bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c udp: Access &udp_table via net. 2022-11-16 09:43:35 +00:00
udp_impl.h net: remove noblock parameter from recvmsg() entities 2022-04-12 15:00:25 +02:00
udp_offload.c udp: allow header check for dodgy GSO_UDP_L4 packets. 2022-12-12 09:29:56 +00:00
udp_tunnel_core.c net/tunnel: wait until all sk_user_data reader finish before releasing the sock 2022-12-12 09:51:52 +00:00
udp_tunnel_nic.c udp_tunnel: Add checks for nla_nest_start() in __udp_tunnel_nic_dump_write() 2022-11-29 08:44:24 -08:00
udp_tunnel_stub.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udplite.c tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
xfrm4_input.c
xfrm4_output.c
xfrm4_policy.c net: dst: fix missing initialization of rt_uncached 2023-04-21 20:26:56 -07:00
xfrm4_protocol.c net: xfrm: unexport __init-annotated xfrm4_protocol_init() 2022-06-08 10:10:13 -07:00
xfrm4_state.c
xfrm4_tunnel.c xfrm: tunnel: add extack to ipip_init_state, xfrm6_tunnel_init_state 2022-09-29 07:18:00 +02:00