OpenCloudOS-Kernel/net
Aditi Ghag 9378096e8a bpf: tcp: Avoid taking fast sock lock in iterator
This is a preparatory commit to replace `lock_sock_fast` with
`lock_sock`,and facilitate BPF programs executed from the TCP sockets
iterator to be able to destroy TCP sockets using the bpf_sock_destroy
kfunc (implemented in follow-up commits).

Previously, BPF TCP iterator was acquiring the sock lock with BH
disabled. This led to scenarios where the sockets hash table bucket lock
can be acquired with BH enabled in some path versus disabled in other.
In such situation, kernel issued a warning since it thinks that in the
BH enabled path the same bucket lock *might* be acquired again in the
softirq context (BH disabled), which will lead to a potential dead lock.
Since bpf_sock_destroy also happens in a process context, the potential
deadlock warning is likely a false alarm.

Here is a snippet of annotated stack trace that motivated this change:

```

Possible interrupt unsafe locking scenario:

      CPU0                    CPU1
      ----                    ----
 lock(&h->lhash2[i].lock);
                              local_bh_disable();
                              lock(&h->lhash2[i].lock);
kernel imagined possible scenario:
  local_bh_disable();  /* Possible softirq */
  lock(&h->lhash2[i].lock);
*** Potential Deadlock ***

process context:

lock_acquire+0xcd/0x330
_raw_spin_lock+0x33/0x40
------> Acquire (bucket) lhash2.lock with BH enabled
__inet_hash+0x4b/0x210
inet_csk_listen_start+0xe6/0x100
inet_listen+0x95/0x1d0
__sys_listen+0x69/0xb0
__x64_sys_listen+0x14/0x20
do_syscall_64+0x3c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

bpf_sock_destroy run from iterator:

lock_acquire+0xcd/0x330
_raw_spin_lock+0x33/0x40
------> Acquire (bucket) lhash2.lock with BH disabled
inet_unhash+0x9a/0x110
tcp_set_state+0x6a/0x210
tcp_abort+0x10d/0x200
bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9
bpf_iter_run_prog+0x1ff/0x340
------> lock_sock_fast that acquires sock lock with BH disabled
bpf_iter_tcp_seq_show+0xca/0x190
bpf_seq_read+0x177/0x450

```

Also, Yonghong reported a deadlock for non-listening TCP sockets that
this change resolves. Previously, `lock_sock_fast` held the sock spin
lock with BH which was again being acquired in `tcp_abort`:

```
watchdog: BUG: soft lockup - CPU#0 stuck for 86s! [test_progs:2331]
RIP: 0010:queued_spin_lock_slowpath+0xd8/0x500
Call Trace:
 <TASK>
 _raw_spin_lock+0x84/0x90
 tcp_abort+0x13c/0x1f0
 bpf_prog_88539c5453a9dd47_iter_tcp6_client+0x82/0x89
 bpf_iter_run_prog+0x1aa/0x2c0
 ? preempt_count_sub+0x1c/0xd0
 ? from_kuid_munged+0x1c8/0x210
 bpf_iter_tcp_seq_show+0x14e/0x1b0
 bpf_seq_read+0x36c/0x6a0

bpf_iter_tcp_seq_show
   lock_sock_fast
     __lock_sock_fast
       spin_lock_bh(&sk->sk_lock.slock);
	/* * Fast path return with bottom halves disabled and * sock::sk_lock.slock held.* */

 ...
 tcp_abort
   local_bh_disable();
   spin_lock(&((sk)->sk_lock.slock)); // from bh_lock_sock(sk)

```

With the switch to `lock_sock`, it calls `spin_unlock_bh` before returning:

```
lock_sock
    lock_sock_nested
       spin_lock_bh(&sk->sk_lock.slock);
       :
       spin_unlock_bh(&sk->sk_lock.slock);
```

Acked-by: Yonghong Song <yhs@meta.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-2-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-05-19 17:45:46 -07:00
..
6lowpan 6lowpan: Remove redundant initialisation. 2023-03-29 08:22:52 +01:00
9p Including fixes from netfilter. 2023-05-05 19:12:01 -07:00
802 treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
8021q vlan: Add MACsec offload operations for VLAN interface 2023-04-21 08:22:14 +01:00
appletalk
atm Networking changes for 6.4. 2023-04-26 16:07:23 -07:00
ax25 ax25: af_ax25: Remove unnecessary (void*) conversions 2022-11-16 13:31:03 +00:00
batman-adv net: vlan: introduce skb_vlan_eth_hdr() 2023-04-23 14:16:44 +01:00
bluetooth Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
bpf bpf: Move kernel test kfuncs to bpf_testmod 2023-05-16 22:09:24 -07:00
bpfilter
bridge net: add vlan_get_protocol_and_depth() helper 2023-05-10 10:25:55 +01:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-02 22:22:07 -08:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
ceph Networking changes for 6.3. 2023-02-21 18:24:12 -08:00
core bpf-next-for-netdev 2023-05-16 19:50:05 -07:00
dcb net: dcb: add helper functions to retrieve PCP and DSCP rewrite maps 2023-01-20 09:33:22 +00:00
dccp netfilter: keep conntrack reference until IPsecv6 policy checks are done 2023-03-22 21:50:23 +01:00
devlink devlink: drop leftover duplicate/unused code 2023-02-20 11:38:35 +00:00
dns_resolver
dsa net: dsa: tag_ocelot: call only the relevant portion of __skb_vlan_pop() on TX 2023-04-23 14:16:45 +01:00
ethernet net: ethernet: use sysfs_emit() to instead of scnprintf() 2022-12-07 20:02:44 -08:00
ethtool ethtool: Fix uninitialized number of lanes 2023-05-03 09:13:20 +01:00
handshake net/handshake: Enable the SNI extension to work properly 2023-05-12 09:24:08 +01:00
hsr hsr: ratelimit only when errors are printed 2023-03-16 21:11:03 -07:00
ieee802154 Revert "net: Remove low_thresh in ip defrag" 2023-05-16 20:46:30 -07:00
ife
ipv4 bpf: tcp: Avoid taking fast sock lock in iterator 2023-05-19 17:45:46 -07:00
ipv6 Revert "net: Remove low_thresh in ip defrag" 2023-05-16 20:46:30 -07:00
iucv net/iucv: Fix size of interrupt data 2023-03-16 17:34:40 -07:00
kcm net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
key af_key: Fix heap information leak 2023-02-13 09:30:14 +00:00
l2tp l2tp: generate correct module alias strings 2023-03-31 09:25:12 +01:00
l3mdev
lapb
llc net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
mac80211 wireless-next patches for v6.4 2023-04-21 07:35:51 -07:00
mac802154 mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() 2023-04-05 13:48:04 +00:00
mctp mctp: remove MODULE_LICENSE in non-modules 2023-03-09 23:06:21 -08:00
mpls net: mpls: fix stale pointer if allocation fails during device rename 2023-02-15 10:26:37 +00:00
mptcp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-20 16:29:51 -07:00
ncsi net/ncsi: clear Tx enable mode when handling a Config required AEN 2023-04-28 09:35:33 +01:00
netfilter netfilter: conntrack: fix possible bug_on with enable_hooks=1 2023-05-10 08:50:39 +02:00
netlabel
netlink netlink: annotate accesses to nlk->cb_running 2023-05-10 09:28:38 +01:00
netrom netrom: Fix use-after-free caused by accept on already connected socket 2023-01-30 07:30:47 +00:00
nfc nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect() 2023-05-15 13:03:34 +01:00
nsh
openvswitch net: openvswitch: fix race on port output 2023-04-07 19:42:53 -07:00
packet net: add vlan_get_protocol_and_depth() helper 2023-05-10 10:25:55 +01:00
phonet net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
psample
qrtr net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() 2023-04-13 09:35:30 +02:00
rds rds: rds_rm_zerocopy_callback() correct order for list_add_tail() 2023-02-13 09:33:39 +00:00
rfkill net: rfkill-gpio: Add explicit include for of.h 2023-04-06 20:36:27 +02:00
rose net/rose: Fix to not accept on connected socket 2023-01-28 00:19:57 -08:00
rxrpc Including fixes from netfilter. 2023-05-05 19:12:01 -07:00
sched sch_htb: Allow HTB priority parameter in offload mode 2023-05-15 09:31:07 +01:00
sctp sctp: add bpf_bypass_getsockopt proto callback 2023-05-12 10:07:16 +01:00
smc net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
strparser
sunrpc NFSD 6.4 Release Notes 2023-04-29 11:04:14 -07:00
switchdev
tipc net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
tls net: introduce and use skb_frag_fill_page_desc() 2023-05-13 19:47:56 +01:00
unix af_unix: Fix data races around sk->sk_shutdown. 2023-05-10 19:06:53 -07:00
vmw_vsock vsock/loopback: don't disable irqs for queue access 2023-04-14 11:04:04 +01:00
wireless Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
x25 net/x25: Fix to not accept on connected socket 2023-01-25 09:51:04 +00:00
xdp xsk: Use pool->dma_pages to check for DMA 2023-04-27 22:24:51 +02:00
xfrm net: introduce and use skb_frag_fill_page_desc() 2023-05-13 19:47:56 +01:00
Kconfig net/handshake: Add Kunit tests for the handshake consumer API 2023-04-19 18:48:48 -07:00
Kconfig.debug
Makefile net/handshake: Create a NETLINK service for handling handshake requests 2023-04-19 18:48:48 -07:00
compat.c net/compat: Update msg_control_is_user when setting a kernel pointer 2023-04-14 11:09:27 +01:00
devres.c
socket.c net: annotate sk->sk_err write from do_recvmmsg() 2023-05-10 09:58:29 +01:00
sysctl_net.c