OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jakub Kicinski	abb47dc95d	tls: rx: don't keep decrypted skbs on ctx->recv_pkt Detach the skb from ctx->recv_pkt after decryption is done, even if we can't consume it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:24:10 +01:00
Jakub Kicinski	008141de85	tls: rx: don't try to keep the skbs always on the list I thought that having the skb either always on the ctx->rx_list or ctx->recv_pkt will simplify the handling, as we would not have to remember to flip it from one to the other on exit paths. This became a little harder to justify with the fix for BPF sockmaps. Subsequent changes will make the situation even worse. Queue the skbs only when really needed. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:24:10 +01:00
Jakub Kicinski	4cbc325ed6	tls: rx: allow only one reader at a time recvmsg() in TLS gets data from the skb list (rx_list) or fresh skbs we read from TCP via strparser. The former holds skbs which were already decrypted for peek or decrypted and partially consumed. tls_wait_data() only notices appearance of fresh skbs coming out of TCP (or psock). It is possible, if there is a concurrent call to peek() and recv() that the peek() will move the data from input to rx_list without recv() noticing. recv() will then read data out of order or never wake up. This is not a practical use case/concern, but it makes the self tests less reliable. This patch solves the problem by allowing only one reader in. Because having multiple processes calling read()/peek() is not normal avoid adding a lock and try to fast-path the single reader case. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:24:10 +01:00
Wen Gu	ddefb2d205	net/smc: Extend SMC-R link group netlink attribute Extend SMC-R link group netlink attribute SMC_GEN_LGR_SMCR. Introduce SMC_NLA_LGR_R_BUF_TYPE to show the buffer type of SMC-R link group. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:19:17 +01:00
Wen Gu	b8d199451c	net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R On long-running enterprise production servers, high-order contiguous memory pages are usually very rare and in most cases we can only get fragmented pages. When replacing TCP with SMC-R in such production scenarios, attempting to allocate high-order physically contiguous sndbufs and RMBs may result in frequent memory compaction, which will cause unexpected hung issue and further stability risks. So this patch is aimed to allow SMC-R link group to use virtually contiguous sndbufs and RMBs to avoid potential issues mentioned above. Whether to use physically or virtually contiguous buffers can be set by sysctl smcr_buf_type. Note that using virtually contiguous buffers will bring an acceptable performance regression, which can be mainly divided into two parts:

1) regression in data path, which is brought by additional address translation of sndbuf by RNIC in Tx. But in general, translating address through MTT is fast. Taking 256KB sndbuf and RMB as an example, the comparisons in qperf latency and bandwidth test with physically and virtually contiguous buffers are as follows:

- client: smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2 \
          -t 5 -vu tcp_{bw|lat}
- server: smc_run taskset -c <cpu> qperf

[latency]
msgsize   tcp          smcr         smcr-use-virt-buf
1         11.17 us     7.56 us      7.51 us (-0.67%)
2         10.65 us     7.74 us      7.56 us (-2.31%)
4         11.11 us     7.52 us      7.59 us ( 0.84%)
8         10.83 us     7.55 us      7.51 us (-0.48%)
16        11.21 us     7.46 us      7.51 us ( 0.71%)
32        10.65 us     7.53 us      7.58 us ( 0.61%)
64        10.95 us     7.74 us      7.80 us ( 0.76%)
128       11.14 us     7.83 us      7.87 us ( 0.47%)
256       10.97 us     7.94 us      7.92 us (-0.28%)
512       11.23 us     7.94 us      8.20 us ( 3.25%)
1024      11.60 us     8.12 us      8.20 us ( 0.96%)
2048      14.04 us     8.30 us      8.51 us ( 2.49%)
4096      16.88 us     9.13 us      9.07 us (-0.64%)
8192      22.50 us     10.56 us     11.22 us ( 6.26%)
16384     28.99 us     12.88 us     13.83 us ( 7.37%)
32768     40.13 us     16.76 us     16.95 us ( 1.16%)
65536     68.70 us     24.68 us     24.85 us ( 0.68%)

[bandwidth]
msgsize   tcp             smcr            smcr-use-virt-buf
1         1.65 MB/s       1.59 MB/s       1.53 MB/s (-3.88%)
2         3.32 MB/s       3.17 MB/s       3.08 MB/s (-2.67%)
4         6.66 MB/s       6.33 MB/s       6.09 MB/s (-3.85%)
8         13.67 MB/s      13.45 MB/s      11.97 MB/s (-10.99%)
16        25.36 MB/s      27.15 MB/s      24.16 MB/s (-11.01%)
32        48.22 MB/s      54.24 MB/s      49.41 MB/s (-8.89%)
64        106.79 MB/s     107.32 MB/s     99.05 MB/s (-7.71%)
128       210.21 MB/s     202.46 MB/s     201.02 MB/s (-0.71%)
256       400.81 MB/s     416.81 MB/s     393.52 MB/s (-5.59%)
512       746.49 MB/s     834.12 MB/s     809.99 MB/s (-2.89%)
1024      1292.33 MB/s    1641.96 MB/s    1571.82 MB/s (-4.27%)
2048      2007.64 MB/s    2760.44 MB/s    2717.68 MB/s (-1.55%)
4096      2665.17 MB/s    4157.44 MB/s    4070.76 MB/s (-2.09%)
8192      3159.72 MB/s    4361.57 MB/s    4270.65 MB/s (-2.08%)
16384     4186.70 MB/s    4574.13 MB/s    4501.17 MB/s (-1.60%)
32768     4093.21 MB/s    4487.42 MB/s    4322.43 MB/s (-3.68%)
65536     4057.14 MB/s    4735.61 MB/s    4555.17 MB/s (-3.81%)

2) regression in buffer initialization and destruction path, which is brought by additional MR operations of sndbufs. But thanks to link group buffer reuse mechanism, the impact of this kind of regression decreases as times of buffer reuse increases. Taking 256KB sndbuf and RMB as an example, latency of some key SMC-R buffer-related function obtained by bpftrace are as follows:

Function                      Phys-bufs    Virt-bufs
smcr_new_buf_create()         67154 ns     79164 ns
smc_ib_buf_map_sg()           525 ns       928 ns
smc_ib_get_memory_region()    162294 ns    161191 ns
smc_wr_reg_send()             9957 ns      9635 ns
smc_ib_put_memory_region()    203548 ns    198374 ns
smc_ib_buf_unmap_sg()         508 ns       1158 ns

------------
Test environment notes:
1. Above tests run on 2 VMs within the same Host.
2. The NIC is ConnectX-4Lx, using SRIOV and passing through 2 VFs to the each VM respectively.
3. VMs' vCPUs are binded to different physical CPUs, and the binded physical CPUs are isolated by `isolcpus=xxx` cmdline.
4. NICs' queue number are set to 1.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:19:17 +01:00
Wen Gu	b984f370ed	net/smc: Use sysctl-specified types of buffers in new link group This patch introduces a new SMC-R specific element buf_type in struct smc_link_group, for recording the value of sysctl smcr_buf_type when link group is created. New created link group will create and reuse buffers of the type specified by buf_type. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:19:17 +01:00
Wen Gu	4bc5008e43	net/smc: Introduce a sysctl for setting SMC-R buffer type This patch introduces the sysctl smcr_buf_type for setting the type of SMC-R sndbufs and RMBs. Valid values includes: - SMCR_PHYS_CONT_BUFS, which means use physically contiguous buffers for better performance and is the default value. - SMCR_VIRT_CONT_BUFS, which means use virtually contiguous buffers in case of physically contiguous memory is scarce. - SMCR_MIXED_BUFS, which means first try to use physically contiguous buffers. If not available, then use virtually contiguous buffers. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:19:17 +01:00
Guangguan Wang	0ef69e7884	net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu Some CPU, such as Xeon, can guarantee DMA cache coherency. So it is no need to use dma sync APIs to flush cache on such CPUs. In order to avoid calling dma sync APIs on the IO path, use the dma_need_sync to check whether smc_buf_desc needs dma sync when creating smc_buf_desc. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:19:17 +01:00
Guangguan Wang	6d52e2de64	net/smc: remove redundant dma sync ops smc_ib_sync_sg_for_cpu/device are the ops used for dma memory cache consistency. Smc sndbufs are dma buffers, where CPU writes data to it and PCIE device reads data from it. So for sndbufs, smc_ib_sync_sg_for_device is needed and smc_ib_sync_sg_for_cpu is redundant as PCIE device will not write the buffers. Smc rmbs are dma buffers, where PCIE device write data to it and CPU read data from it. So for rmbs, smc_ib_sync_sg_for_cpu is needed and smc_ib_sync_sg_for_device is redundant as CPU will not write the buffers. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:19:16 +01:00
Jaehee Park	aaa5f515b1	net: ipv6: new accept_untracked_na option to accept na only if in-network This patch adds a third knob, '2', which extends the accept_untracked_na option to learn a neighbor only if the src ip is in the same subnet as an address configured on the interface that received the neighbor advertisement. This is similar to the arp_accept configuration for ipv4. Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-15 18:55:50 -07:00
Jaehee Park	e68c5dcf0a	net: ipv4: new arp_accept option to accept garp only if in-network In many deployments, we want the option to not learn a neighbor from garp if the src ip is not in the same subnet as an address configured on the interface that received the garp message. net.ipv4.arp_accept sysctl is currently used to control creation of a neigh from a received garp packet. This patch adds a new option '2' to net.ipv4.arp_accept which extends option '1' by including the subnet check. Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-15 18:55:49 -07:00
Kuniyuki Iwashima	11052589cf	tcp/udp: Make early_demux back namespacified. Commit `e21145a987` ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit `dddb64bcb3` ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns. We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob. This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time. Fixes: `dddb64bcb3` ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-15 18:50:35 -07:00
Johannes Berg	bd363ee533	wifi: mac80211: mlme: set sta.mlo correctly Due to some changes and rebasing between different patches this fell through the cracks; we need to set sta.mlo if the connection is using MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 15:12:45 +02:00
Johannes Berg	8f5d9e68c9	wifi: mac80211: remove stray printk Unfortunately, a printk snuck into a previous patch, remove it. Fixes: `81151ce462` ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 15:05:35 +02:00
Kuniyuki Iwashima	2a85388f1d	tcp: Fix a data-race around sysctl_tcp_probe_interval. While reading sysctl_tcp_probe_interval, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `05cbc0db03` ("ipv4: Create probe timer for tcp PMTU as per RFC4821") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:56 +01:00
Kuniyuki Iwashima	92c0aa4175	tcp: Fix a data-race around sysctl_tcp_probe_threshold. While reading sysctl_tcp_probe_threshold, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `6b58e0a5f3` ("ipv4: Use binary search to choose tcp PMTU probe_size") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:56 +01:00
Kuniyuki Iwashima	8e92d44236	tcp: Fix a data-race around sysctl_tcp_mtu_probe_floor. While reading sysctl_tcp_mtu_probe_floor, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `c04b79b6cf` ("tcp: add new tcp_mtu_probe_floor sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:56 +01:00
Kuniyuki Iwashima	78eb166cde	tcp: Fix data-races around sysctl_tcp_min_snd_mss. While reading sysctl_tcp_min_snd_mss, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `5f3e2bf008` ("tcp: add tcp_min_snd_mss sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:56 +01:00
Kuniyuki Iwashima	88d78bc097	tcp: Fix data-races around sysctl_tcp_base_mss. While reading sysctl_tcp_base_mss, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `5d424d5a67` ("[TCP]: MTU probing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	f47d00e077	tcp: Fix data-races around sysctl_tcp_mtu_probing. While reading sysctl_tcp_mtu_probing, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `5d424d5a67` ("[TCP]: MTU probing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	0db2327658	ip: Fix a data-race around sysctl_ip_autobind_reuse. While reading sysctl_ip_autobind_reuse, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `4b01a96742` ("tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	289d3b21fb	ip: Fix data-races around sysctl_ip_nonlocal_bind. While reading sysctl_ip_nonlocal_bind, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	7bf9e18d9a	ip: Fix data-races around sysctl_ip_fwd_update_priority. While reading sysctl_ip_fwd_update_priority, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `432e05d328` ("net: ipv4: Control SKB reprioritization after forwarding") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	60c158dc7b	ip: Fix data-races around sysctl_ip_fwd_use_pmtu. While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `f87c10a8aa` ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	0968d2a441	ip: Fix data-races around sysctl_ip_no_pmtu_disc. While reading sysctl_ip_no_pmtu_disc, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	8281b7ec5c	ip: Fix data-races around sysctl_ip_default_ttl. While reading sysctl_ip_default_ttl, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Peilin Ye	88b3822cdf	net/sched: sch_cbq: Delete unused delay_timer delay_timer has been unused since commit `c3498d34dd` ("cbq: remove TCA_CBQ_OVL_STRATEGY support"). Delete it. Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:29:09 +01:00
Johannes Berg	81151ce462	wifi: mac80211: support MLO authentication/association with one link It might seem a bit pointless to do a multi-link operation connection with just a single link, but this is already a big change, so for now, limit MLO connections to a single link. Extending that to multiple links will require * work on parsing the multi-link element with STA profile properly, including element fragmentation; * checking the per-link status in the multi-link element * implementing logic to have active/inactive links to let drivers decide which links should be active; * implementing multicast RX deduplication; * and likely more. For now this is still useful since it lets us do multi-link connections for the purposes of testing APIs and the higher layers such as wpa_supplicant. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:24 +02:00
Johannes Berg	425f4b5fce	wifi: mac80211: add API to parse multi-link element Add the necessary API to parse the multi-link element in the future. For now, link only to the element when found so we can use it in the client-side code later. Later, we'll need to fill this in to deal with element fragmentation, parse the STA profile, etc. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:24 +02:00
Johannes Berg	42fb9148c0	wifi: mac80211: do link->MLD address translation on RX In some cases, e.g. with Qualcomm devices and management frames, or in hwsim, frames may be reported from the driver with link addresses, but for decryption and matching needs we really want to have them with MLD addresses. Support the translation on RX. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Andrei Otcheretianski	3e0278b717	wifi: mac80211: select link when transmitting to non-MLO stations When an MLO AP is transmitting to a non-MLO station, addr2 should be set to a link address. This should be done before the frame is encrypted as otherwise aad verification would fail. In case of software encryption this can't be left for the device to handle, and should be done by mac80211 when building the frame hdr. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Johannes Berg	f36fe0a2df	wifi: mac80211: fix up link station creation/insertion When we create a station with a non-default link, then we should have a link address, and we definitely need to insert it into the link hash table on insertion. Split the API into with and without link creation and if it has a link, insert the link into the link hash table on sta_info_insert(). Fixes: `ba6ddab94f` ("wifi: mac80211: maintain link-sta hash table") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Johannes Berg	175ad2ec89	wifi: mac80211: limit A-MSDU subframes for client too In AP/mesh where the stations are added by userspace, we limit the number of A-MSDU subframes according to the extended capabilities. Refactor the code and extend that also to client-side. Fixes: `506bcfa8ab` ("mac80211: limit the A-MSDU Tx based on peer's capabilities") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Johannes Berg	5d3a341c0d	wifi: mac80211: mlme: refactor ieee80211_set_associated() Split out much of the code in ieee80211_set_associated() into a new ieee80211_link_set_associated() which can be called per link later for MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Johannes Berg	7464f66515	wifi: cfg80211: add cfg80211_get_iftype_ext_capa() Add a helper function cfg80211_get_iftype_ext_capa() to look up interface type-specific (extended) capabilities. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Johannes Berg	74e1309ace	wifi: mac80211: mlme: look up beacon elems only if needed If NEED_DTIM_BEFORE_ASSOC isn't set, then we don't need to enter an RCU critical section and look up the beacon elements. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Johannes Berg	1845c1d4a4	wifi: mac80211: mlme: refactor assoc link setup Factor out the code to set up the assoc link into a new function ieee80211_setup_assoc_link(). While at it, also modify the 'override' handling to just take into account whether or not the conn_flags were changed, which is what we need to setup again the channel later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Johannes Berg	a857c21eaf	wifi: mac80211: mlme: remove address arg to ieee80211_mark_sta_auth() There's no need to pass the address, we can look at the auth_data inside the function rather than outside. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:22 +02:00
Johannes Berg	6911458dc4	wifi: mac80211: mlme: refactor assoc success handling Refactor the per-link setup out of ieee80211_assoc_success() into a new function ieee80211_assoc_config_link(). It looks useless for now to parse the elements again inside ieee80211_assoc_config_link(), but that will be done with the link ID in the future. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:22 +02:00
Johannes Berg	7781f0d81c	wifi: mac80211: mlme: refactor ieee80211_prep_channel() a bit Refactor ieee80211_prep_channel() to make the link argument optional and add a conn_flags pointer argument instead, so that we can later use this for links that don't exist yet to build the right information for MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:22 +02:00
Johannes Berg	978420c210	wifi: mac80211: mlme: refactor assoc req element building For MLO, we will need to build these elements per link, so factor out the code that does this, returning the capability, to simplify building the multi-link element in the future. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:22 +02:00
Johannes Berg	39d805998c	wifi: mac80211: mlme: switch some things back to deflink With MLO, when we'll disconnect from an AP MLD, we'll just destroy all the links. Therefore, the only thing we (may) need to reset is the deflink data, so switch back to that and adjust the comments accordingly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:22 +02:00
Johannes Berg	4a21a8ae79	wifi: mac80211: mlme: change flags in ieee80211_determine_chantype() For MLO we'll need to read flags not directly from the link as it may not even exist yet if we're just setting up flags for a secondary link before sending the association request, so pass the incoming conn_flags separately. Also, while at it, pass the sdata/link separately as for non-tracking now the link may be NULL. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:22 +02:00
Johannes Berg	61513162aa	wifi: mac80211: mlme: shift some code around We'll need ieee80211_prep_channel() in other code for MLO later, so move the code up - unchanged for now - to avoid forward declarations in the future. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:22 +02:00
Johannes Berg	bbe90107e1	wifi: mac80211: mlme: refactor link station setup Refactor the code here since we need to have it also for each link station after association in MLO later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:22 +02:00
Johannes Berg	39eac2de00	wifi: mac80211: move IEEE80211_SDATA_OPERATING_GMODE to link The flag here is currently per interface, but the way we set and clear it means it should be per link, so change it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	de03f8ac5c	wifi: mac80211: make ieee80211_check_rate_mask() link-aware Change ieee80211_check_rate_mask() to use a link rather than the sdata and deflink/bss_conf. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	8ec9a96b83	wifi: mac80211: add multi-link element to AUTH frames When sending an authentication frame from an MLD, include the multi-link element with the MLD address and use the link address for transmission. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	64f4b93afa	wifi: mac80211: mlme: clean up supported channels element code Clean up the code building the supported channels element a little bit by using a local variable instead of the long line. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	b048c98447	wifi: mac80211: release channel context on link stop When a link is stopped for removal, release the channel context it may have. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	19343659c8	wifi: mac80211: prohibit DEAUTH_NEED_MGD_TX_PREP in MLO For now, prohibit DEAUTH_NEED_MGD_TX_PREP since we can't really transmit this on a specific link yet as we don't know which links are active. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	ff5c4dc4cd	wifi: nl80211: fix some attribute policy entries The new NL80211_CMD_ADD_LINK_STA and NL80211_CMD_MODIFY_LINK_STA commands have strict policy validation, so fix the policy so it can be validated correctly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	df35f3164e	wifi: nl80211: reject fragmented and non-inheritance elements The underlying mac80211 code cannot deal with fragmented elements for purposes of sorting the elements into the association frame, so reject those inside the link. We might want to reject them inside the assoc frame, but they're used today for FILS, so cannot do that. The non-inheritance element inside the links similarly cannot be handled by mac80211, and outside the links it makes no sense. Reject both since using them could lead to an incorrect implementation. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	34d76a14f8	wifi: nl80211: reject link specific elements on assoc link When we associate, we'll include all the elements for the link we're sending the association request on in the frame and the specific ones for other links in the multi-link element container. Prohibit adding link-specific elements for the association link. Fixes: `d648c23024` ("wifi: nl80211: support MLO in auth/assoc") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Johannes Berg	e3d331c9b6	wifi: cfg80211: set country_elem to NULL The link loop will always have a valid link so that it's always set, but static checkers don't always see that, so set it to NULL explicitly. Fixes: `efbabc1165` ("cfg80211: Indicate MLO connection info in connect and roam callbacks") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:21 +02:00
Gregory Greenman	7840bd468a	wifi: mac80211: remove link_id parameter from link_info_changed() Since struct ieee80211_bss_conf already contains link_id, passing link_id is not necessary. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Gregory Greenman	727eff4dd1	wifi: mac80211: replace link_id with link_conf in switch/(un)assign_vif_chanctx() Since mac80211 already has a protected pointer to link_conf, pass it to the driver to avoid additional RCU locking. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Johannes Berg	fa2ca639c4	wifi: nl80211: advertise MLO support At least while we don't have any more specific interface combinations support, add a simple flag for MLO support, we can keep this later based on something other than the wiphy flag. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Andrei Otcheretianski	0cbf348a9a	wifi: mac80211: Support multi link in ieee80211_recalc_min_chandef() Recalculate min channel context for the given or all interface links, depending on the caller. For a station state change, we need to recalculate all of them since we don't know which link (or multiple) it might be on. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Andrei Otcheretianski	e10b680118	wifi: mac80211: don't check carrier in chanctx code We check here that we don't enable TX (netif_carrier_ok()) before we actually start using some channel context, but to our knowledge this check has never triggered, and with MLO it's just wrong since links can be added and removed much more dynamically than before. Simply remove the checks, there's no really good way to do anything that would replace them. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Ilan Peer	69c3f2d30c	wifi: nl80211: allow link ID in set_wiphy with frequency This simplifies hostapd implementation, since it didn't switch to NL80211_CMD_SET_CHANNEL. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Andrei Otcheretianski	0d5891e347	wifi: mac80211: Allow EAPOL tx from specific link Allow link source address on TX. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Andrei Otcheretianski	d06faef148	wifi: mac80211: Allow EAPOL frames from link addresses Allow transmitting EAPOL frames not only from the interface address (which is the MLD address) but also any link addresses, in order to support non-MLO stations on AP interfaces. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Andrei Otcheretianski	67207bab93	wifi: cfg80211/mac80211: Support control port TX from specific link In case of authentication with a legacy station, link addressed EAPOL frames should be sent. Support it. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Andrei Otcheretianski	d2bc52498b	wifi: nl80211: Support MLD parameters in nl80211_set_station() Set the MLD parameters in NL80211_CMD_SET_STATION handling to be able to change an MLD station. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	45aaf17c0c	wifi: nl80211: check MLO support in authenticate We should check that MLO connections are supported before attempting to authenticate with MLO parameters, check that. Fixes: `d648c23024` ("wifi: nl80211: support MLO in auth/assoc") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	e434254946	wifi: mac80211: add a helper to fragment an element The way this works is that you add all the element data, keeping a pointer to the length field of the element. Then call this helper function, which will fragment the element if there was more than 255 bytes in the element, memmove()ing the data back if needed. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	8a263dcb58	wifi: mac80211: skip rate statistics for MLD STAs For now, skip rate statistics here to avoid warnings in the called code, we'll need to adjust this to have all the statistics for link stations. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	9b6bf4d612	wifi: nl80211: set BSS to NULL if IS_ERR() If the BSS lookup returned an error, set it to NULL so we don't try to free it. Fixes: `d648c23024` ("wifi: nl80211: support MLO in auth/assoc") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	4e9c3af398	wifi: nl80211: add EML/MLD capabilities to per-iftype capabilities We have the per-interface type capabilities, currently for extended capabilities, add the EML/MLD capabilities there to have this advertised by the driver. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	efbfe5165e	wifi: nl80211: better validate link ID for stations If we add a station on an MLD, we need a link ID to see where it lives (by default). Validate the link ID against the valid_links. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	d3e2439b0f	wifi: mac80211: fix link manipulation When we add non-deflink pointers, we need to remove the link[0] pointer to deflink in case link[0] is not valid afterwards. Also, we need to add that back when there are no more valid links. Reorg the code to fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	939c4c7e82	wifi: mac80211: tighten locking check When we remove a link that doesn't have a channel context, we don't really need the local->mtx locking. Tighten the check here. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	cdf0a0a80c	wifi: cfg80211: clean up links appropriately This was missing earlier, we need to remove links when interfaces are being destroyed, and we also need to stop (AP) operations when a link is being destroyed. Address these issues to remove many warnings that will otherwise appear in mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Johannes Berg	a95fe06782	wifi: mac80211: consider EHT element size in assoc request We need to consider the (maximum) size of the EHT element we'll add for the association request, otherwise we may run out of space. Fixes: `820acc810f` ("mac80211: Add EHT capabilities to association/probe request") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Johannes Berg	df9a9c44e9	wifi: mac80211: mlme: simplify adding ht/vht/he/eht elements The functions currently take a link and check data from it, but this needs to change for MLO. Simplify the prototypes by passing only the needed arguments. Remove the regulatory checks, the warnings shouldn't trigger, and haven't as far as I know. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Johannes Berg	3c68cb81bf	wifi: mac80211: refactor adding custom elements Rework the sorting of custom elements into the association request by moving the elements before HT/VHT/HE to each their own function. While at it, fix the placement of the ones that should be between VHT and HE. This doesn't fix the placement of elements that should be between HE and EHT yet, a similar change might be needed in the future. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Johannes Berg	c1690b66ba	wifi: mac80211: refactor adding rates to assoc request There's some awkward code that really only exists because we want to optimize the allocation size, but that's not really all that necessary. Refactor the code that adds rates to the association request frame to have a separate function, removing the goto. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Johannes Berg	3dc05935ea	wifi: mac80211: use only channel width in ieee80211_parse_bitrates() For MLO, we may not have a full chandef here later, so change the API to pass only the width. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Johannes Berg	c57d2e6a65	wifi: mac80211: remove redundant condition Here, ext_capa is checked and can only be non-NULL if assoc_data->ie_len was set before, so the check here is redundant. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Johannes Berg	483456590a	wifi: mac80211: don't set link address for station We need to handle the link addresses for station differently, they will be determined by the association code, stored, and then applied when the links are actually created on success, cfg80211 will fill in the right addresses per the data we're sending back to it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Johannes Berg	38c6aa29d4	wifi: mac80211: fix multi-BSSID element parsing When parsing a frame containing a multi-BSSID element, we need to know both the transmitted and non-transmitted BSSID so we can parse it correctly. Unfortunately, in quite a number of cases, we got this wrong and were passing the wrong BSSID or useless information: * the mgmt->bssid from a frame is only the transmitted BSSID if the frame is a beacon * passing just one of the parameters as non-NULL isn't useful and ignored In those case where we need to parse for a specific BSS we always have a BSS structure pointer, representing the BSS we need, whether transmitted or not. Thus, pass that pointer to the parsing function instead of the two BSSIDs. Also fix two bugs: * we need to re-parse all the elements for the other BSS when iterating the non-transmitted BSSes in scan * we need to parse for the correct BSS when setting up the channel data in client code Fixes: `78ac51f815` ("mac80211: support multi-bssid") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	ab3a830d96	wifi: mac80211: move tdls_chan_switch_prohibited to link data This value should be per link, since a TDLS connection is only established on a given link. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	635495e9c4	wifi: mac80211: don't re-parse elems in ieee80211_assoc_success() We're already passing the elems pointer, and have parsed them from the same frame with exactly the same parameters, so don't need to do that again. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Gregory Greenman	b327c84c32	wifi: mac80211: replace link_id with link_conf in start/stop_ap() When calling start/stop_ap(), mac80211 already has a protected link_conf pointer. Pass it to the driver, so it shouldn't handle RCU protection. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	fd17bf041b	wifi: mac80211: refactor elements parsing with parameter struct Refactor the element parsing into a version that has a parameter struct so we can add more parameters more easily in the future. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	5cd212cb64	wifi: cfg80211: extend cfg80211_rx_assoc_resp() for MLO Extend the cfg80211_rx_assoc_resp() to cover multiple BSSes, the AP MLD address and local link addresses for MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	cd47c0f57a	wifi: cfg80211: put cfg80211_rx_assoc_resp() arguments into a struct For MLO we'll need a lot more arguments, including all the BSS pointers and link addresses, so move the data to a struct to be able to extend it more easily later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	e69dac88a1	wifi: cfg80211: adjust assoc comeback for MLO We only report the BSSID to userspace, so change the argument from BSS struct pointer to AP address, which we'll use to carry either the BSSID or AP MLD address. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	afa2d65938	wifi: mac80211: mlme: unify assoc data event sending There are a few cases where we send an event to cfg80211 manually, but ieee80211_destroy_assoc_data() also handles the case of abandoning; some cases don't need an event and success is handled yet differently. Unify this by providing a single status argument to the ieee80211_destroy_assoc_data() function and then handling all the different cases of events (or no events) there. This will help simplify the code when MLO support is added. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	f662d2f4e2	wifi: cfg80211: prepare association failure APIs for MLO For MLO, we need the ability to report back multiple BSS structures to release, as well as the AP MLD address (if attempting to make an MLO connection). Unify cfg80211_assoc_timeout() and cfg80211_abandon_assoc() into a new cfg80211_assoc_failure() that gets a structure parameter with the necessary data. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	8f6e0dfc22	wifi: cfg80211: remove BSS pointer from cfg80211_disassoc_request The race described by the comment in mac80211 hasn't existed since the locking rework to use the same lock and for MLO we need to pass the AP MLD address, so just pass the BSSID or AP MLD address instead of the BSS struct pointer, and adjust all the code accordingly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	98b0b46746	wifi: mac80211: mlme: use correct link_sta For station capabilities, e.g. TWT, we need to use the correct link station instead of deflink. Switch the code to do that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	d3853f700c	wifi: mac80211: mlme: remove sta argument from ieee80211_config_bw The argument is unused except for NULL checking, but we already do that anyway, so it's not needed. Remove the argument. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	1dd0f31c23	wifi: mac80211: mlme: use ieee80211_get_link_sband() This requires a few more changes. While at it, also add a warning to ieee80211_get_sband() to avoid it being used when there are multiple links. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	6359598df6	wifi: mac80211: split IEEE80211_STA_DISABLE_WMM to link data If we decide to stop tracking QoS/WMM parameters, then this should be a per-link decision. Move the flag to the link instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	5bd5666d8a	wifi: mac80211: mlme: first adjustments for MLO Do the first adjustments in the client-side code to pass the link pointer (instead of sdata) to most places etc. This is just preparation, so the real MLO patches become smaller. Note that this isn't complete, notably there are still quite a few references to sta->deflink and sta->sta.deflink. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	42ed6748af	wifi: mac80211: mlme: do IEEE80211_STA_RESET_SIGNAL_AVE per link Remove the IEEE80211_STA_RESET_SIGNAL_AVE flag and use a bool instead, but invert the polarity (now calling it tracking_signal_avg) so we don't have to initialize it, and put that into the link instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	b65567b03c	wifi: mac80211: mlme: track AP (MLD) address separately To prepare a bit more for MLO in the client code, track the AP's address (for now only the BSSID, but will track the AP MLD's address later) separately from the per-link BSSID. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	7ebe994fbd	wifi: mac80211: remove unused bssid variable This variable is only written to, remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	b3e2130bf5	wifi: mac80211: change QoS settings API to take link into account Take the link into account in the QoS settings (EDCA parameters) APIs. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	8c7c6b5819	wifi: mac80211: expect powersave handling in driver for MLO In MLO, expect the driver fully handles powersave handling, including tracking whether or not a beacon was received, the DTIM period, etc. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	a3b8008dc1	wifi: mac80211: move ps setting to vif config This really shouldn't be in a per-link config, we don't want to let anyone control it that way (if anything, link powersave could be forced through APIs to activate/deactivate a link), and we don't support powersave in software with devices that can do MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	3fbddae46e	wifi: mac80211: provide link ID in link_conf It might be useful to drivers to be able to pass only the link_conf pointer, rather than both the pointer and the link_id; add the link_id to the link_conf to facilitate that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	b2e8434f18	wifi: mac80211: set up/tear down client vif links properly In station/client mode, the link data needs a bit more initialization and destruction than just zero-init and kfree() respectively, implement that. This required some shuffling of the link data handling in general, as we should set it up in setup and do the teardown in teardown, otherwise we're asymmetric in case of interface type changes. Also stop using kfree_rcu(), we cannot guarantee that nothing is scheduling things that live within the link (e.g. the u.mgd.request_smps_work) until we're sure it cannot be referenced anymore, therefore synchronize instead. This isn't very efficient, but we can always optimize it later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	94ddc3b5aa	wifi: mac80211: move ieee80211_request_smps_mgd_work This function can be static. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	284b38b690	wifi: nl80211: acquire wdev mutex for dump_survey At least the quantenna driver calls wdev_chandef() here which now requires the lock, so acquire it. Fixes: `7b0a0e3c3a` ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	e2722d278e	wifi: mac80211: fix key lookup With the split into keys[]/deflink.gtk[] arrays, WEP keys are still installed into the keys[] array, but we didn't look them up there. This meant they weren't deleted correctly. Fix this by looking up the key there even if it's not pairwise so we can be sure we don't have it. Fixes: `bfd8403add` ("wifi: mac80211: reorg some iface data structs for MLD") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	ba323e2985	wifi: mac80211: separate out connection downgrade flags Separate out the connection downgrade flags from the ifmgd->flags and put them into the link information instead. While at it, make them a separate sparse type so we don't get confused about where they belong and have static checking on correct handling. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Ilan Peer	1e0b3b0b6c	wifi: mac80211: Align with Draft P802.11be_D1.5 Align the mac80211 implementation with P802.11be_D1.5. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Johannes Berg	28977e790b	wifi: mac80211: skip powersave recalc if driver SUPPORTS_DYNAMIC_PS There are a few places that check ps_sdata and/or the dynamic PS timeout, but they're erroneous in case SUPPORTS_DYNAMIC_PS is set by the driver. Skip the entire recalculation in this case so we cannot get into those paths elsewhere, and so we simplify this for the purpose of implementing MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Johannes Berg	c5c48a11dd	wifi: mac80211: debug: omit link if non-MLO connection If we don't really have multiple links, omit the link ID from link debug prints, otherwise we change the format for all of the existing drivers (most of which might never support MLO), and also have extra noise in the logs. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Johannes Berg	1d4c0f0405	wifi: cfg80211: drop BSS elements from assoc trace for now For multi-link operation, this cannot work as the req->bss pointer will be NULL, and we'll need to do more work on this to really add tracing for the MLO case here. Drop the BSS elements for now as they're not the most useful thing, and it's hard to size things correctly for the MLO case (without adding a lot of code that's also executed when tracing isn't enabled.) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Shaul Triebitz	c0d6701261	wifi: nl80211: enable setting the link address at new station Since for an MLD station the default link is added together with the add station command, allow also setting the link MAC address. Otherwise, it is needed to use the modify link API only for setting the link MAC address. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Johannes Berg	d8675a6351	wifi: mac80211: RCU-ify link/link_conf pointers Since links can be added and removed dynamically, we need to somehow protect the sdata->link[] and vif->link_conf[] array pointers from disappearing when accessing them without locks. RCU-ify the pointers to achieve this, which requires quite a bit of rework. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Johannes Berg	3d1cc7cdf2	wifi: nl80211: hold wdev mutex for station APIs Since this will need to refer - at least in part - to the link stations of an MLD, hold the wdev mutex for driver convenience. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Johannes Berg	4e2f3d67e3	wifi: nl80211: hold wdev mutex for channel switch APIs Since we deal with links in an MLD here, hold the wdev mutex now. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Johannes Berg	858fd1880b	wifi: nl80211: hold wdev mutex in add/mod/del link station Since we deal with links, and that requires looking at wdev links, we should hold the wdev mutex for driver convenience. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Shaul Triebitz	21476ad16d	wifi: mac80211: implement callbacks for <add/mod/del>_link_station Implement callbacks for cfg80211 add_link_station, mod_link_station, and del_link_station API. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Shaul Triebitz	b95eb7f0ee	wifi: cfg80211/mac80211: separate link params from station params Put the link_station_parameters structure in the station_parameters structure (and remove the station_parameters fields already existing in link_station_parameters). Now, for an MLD station, the default link is added together with the station. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Shaul Triebitz	577e5b8c39	wifi: cfg80211: add API to add/modify/remove a link station Add an API for adding/modifying/removing a link of a station. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Shaul Triebitz	f91cb507e6	wifi: mac80211: add an ieee80211_get_link_sband Similar to ieee80211_get_sband but get the sband of the link_conf. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Andrei Otcheretianski	0866f8e3ef	wifi: mac80211: Remove AP SMPS leftovers AP SMPS was removed and not needed anymore. Remove the leftovers. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Andrei Otcheretianski	6df2810ac9	wifi: cfg80211: Allow MLO TX with link source address Management frames are transmitted from link address and not device address. Allow that. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:12 +02:00
Andrei Otcheretianski	54283409cd	wifi: mac80211: Consider MLO links in offchannel logic Check all the MLO links to decide whether offchannel TX is needed. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:12 +02:00
Johannes Berg	892b3bceb0	wifi: mac80211: rx: accept link-addressed frames When checking whether or not to accept a frame in RX, take into account the configured link addresses. Also look up the link station correctly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:12 +02:00
Johannes Berg	6858ad75c2	wifi: mac80211: consistently use sdata_dereference() Instead of open-coding it, use sdata_dereference(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:12 +02:00
Aditya Kumar Singh	0bd5093255	wifi: mac80211: fix mesh airtime link metric estimating ieee80211s_update_metric function uses sta_set_rate_info_tx function to get struct rate_info data from ieee80211_tx_rate struct, present in ieee80211_sta->deflink.tx_stats. However, drivers can skip tx rate calculation by setting rate idx as -1. Such drivers provide rate_info directly and hence ieee80211s metric is updated incorrectly since ieee80211_tx_rate has inconsistent data. Add fix to use rate_info directly if present instead of sta_set_rate_info_tx for updating ieee80211s metric. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://lore.kernel.org/r/20220701133611.544-1-quic_adisi@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:12 +02:00
Lian Chen	bf326cf53a	wifi: mac80211: make 4addr null frames using min_rate for WDS WDS needs 4addr packets to trigger AP for wlan0.sta creation. However, the 4addr null frame is sent at a high rate so that sometimes the AP can't receive it. Switch to using min rate. Signed-off-by: Lian Chen <lian.chen@mediatek.com> Link: https://lore.kernel.org/r/20220714091636.59107-1-lian.chen@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:12 +02:00
XueBing Chen	59e8ef18f6	wifi: cfg80211: use strscpy to replace strlcpy The strlcpy should not be used because it doesn't limit the source length. Preferred is strscpy. Signed-off-by: XueBing Chen <chenxuebing@jari.cn> Link: https://lore.kernel.org/r/2d2fcbf7.e33.181eda8e70e.Coremail.chenxuebing@jari.cn Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:12 +02:00
Felix Fietkau	51d3cfaf99	wifi: mac80211: exclude multicast packets from AQL pending airtime In AP mode, multicast traffic is handled very differently from normal traffic, especially if at least one client is in powersave mode. This means that multicast packets can be buffered a lot longer than normal unicast packets, and can eat up the AQL budget very quickly because of the low data rate. Along with the recent change to maintain a global PHY AQL limit, this can lead to significant latency spikes for unicast traffic. Since queueing multicast to hardware is currently not constrained by AQL limits anyway, let's just exclude it from the AQL pending airtime calculation entirely. Fixes: `8e4bac0671` ("wifi: mac80211: add a per-PHY AQL limit to improve fairness") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220713083444.86129-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:12 +02:00
Eric Biggers	ec8f7f4821	crypto: lib - make the sha1 library optional Since the Linux RNG no longer uses sha1_transform(), the SHA-1 library is no longer needed unconditionally. Make it possible to build the Linux kernel without the SHA-1 library by putting it behind a kconfig option, and selecting this new option from the kconfig options that gate the remaining users: CRYPTO_SHA1 for crypto/sha1_generic.c, BPF for kernel/bpf/core.c, and IPV6 for net/ipv6/addrconf.c. Unfortunately, since BPF is selected by NET, for now this can only make a difference for kernels built without networking support. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-07-15 16:43:59 +08:00
Jiri Pirko	a44c4511ff	net: devlink: fix return statement in devlink_port_new_notify() Return directly without intermediate value store at the end of devlink_port_new_notify() function. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 21:58:46 -07:00
Jiri Pirko	ced92571af	net: devlink: fix a typo in function name devlink_port_new_notifiy() Fix the typo in a name of devlink_port_new_notifiy() function. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 21:58:46 -07:00
Jiri Pirko	9a7923668b	net: devlink: make devlink_dpipe_headers_register() return void The return value is not used, so change the return value type to void. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 21:58:46 -07:00
Jakub Kicinski	816cd16883	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/net/sock.h `310731e2f1` ("net: Fix data-races around sysctl_mem.") `e70f3c7012` ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/ net/ipv4/fib_semantics.c `747c143072` ("ip: fix dflt addr selection for connected nexthop") `d62607c3fe` ("net: rename reference+tracking helpers") net/tls/tls.h include/net/tls.h `3d8c51b25a` ("net/tls: Check for errors in tls_device_init") `5879031423` ("tls: create an internal header") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 15:27:35 -07:00
Ben Dooks	96a233e600	bpf: Add endian modifiers to fix endian warnings A couple of the syscalls which load values (bpf_skb_load_helper_16() and bpf_skb_load_helper_32()) are using u16/u32 types which are triggering warnings as they are then converted from big-endian to CPU-endian. Fix these by making the types __be instead. Fixes the following sparse warnings: net/core/filter.c:246:32: warning: cast to restricted __be16 net/core/filter.c:246:32: warning: cast to restricted __be16 net/core/filter.c:246:32: warning: cast to restricted __be16 net/core/filter.c:246:32: warning: cast to restricted __be16 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220714105101.297304-1-ben.dooks@sifive.com	2022-07-14 23:00:48 +02:00
Maciej Fijalkowski	ca2e1a6270	xsk: Mark napi_id on sendmsg() When application runs in busy poll mode and does not receive a single packet but only sends them, it is currently impossible to get into napi_busy_loop() as napi_id is only marked on Rx side in xsk_rcv_check(). In there, napi_id is being taken from xdp_rxq_info carried by xdp_buff. From Tx perspective, we do not have access to it. What we have handy is the xsk pool. Xsk pool works on a pool of internal xdp_buff wrappers called xdp_buff_xsk. AF_XDP ZC enabled drivers call xp_set_rxq_info() so each of xdp_buff_xsk has a valid pointer to xdp_rxq_info of underlying queue. Therefore, on Tx side, napi_id can be pulled from xs->pool->heads[0].xdp.rxq->napi_id. Hide this pointer chase under helper function, xsk_pool_get_napi_id(). Do this only for sockets working in ZC mode as otherwise rxq pointers would not be initialized. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220707130842.49408-1-maciej.fijalkowski@intel.com	2022-07-14 22:45:34 +02:00
Tariq Toukan	3d8c51b25a	net/tls: Check for errors in tls_device_init Add missing error checks in tls_device_init. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Reported-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 10:12:39 -07:00
Nicolas Dichtel	747c143072	ip: fix dflt addr selection for connected nexthop When a nexthop is added, without a gw address, the default scope was set to 'host'. Thus, when a source address is selected, 127.0.0.1 may be chosen but rejected when the route is used. When using a route without a nexthop id, the scope can be configured in the route, thus the problem doesn't exist. To explain more deeply: when a user creates a nexthop, it cannot specify the scope. To create it, the function nh_create_ipv4() calls fib_check_nh() with scope set to 0. fib_check_nh() calls fib_check_nh_nongw() which was setting scope to 'host'. Then, nh_create_ipv4() calls fib_info_update_nhc_saddr() with scope set to 'host'. The src addr is chosen before the route is inserted. When a 'standard' route (ie without a reference to a nexthop) is added, fib_create_info() calls fib_info_update_nhc_saddr() with the scope set by the user. iproute2 set the scope to 'link' by default. Here is a way to reproduce the problem: ip netns add foo ip -n foo link set lo up ip netns add bar ip -n bar link set lo up sleep 1 ip -n foo link add name eth0 type dummy ip -n foo link set eth0 up ip -n foo address add 192.168.0.1/24 dev eth0 ip -n foo link add name veth0 type veth peer name veth1 netns bar ip -n foo link set veth0 up ip -n bar link set veth1 up ip -n bar address add 192.168.1.1/32 dev veth1 ip -n bar route add default dev veth1 ip -n foo nexthop add id 1 dev veth0 ip -n foo route add 192.168.1.1 nhid 1 Try to get/use the route: > $ ip -n foo route get 192.168.1.1 > RTNETLINK answers: Invalid argument > $ ip netns exec foo ping -c1 192.168.1.1 > ping: connect: Invalid argument Try without nexthop group (iproute2 sets scope to 'link' by dflt): ip -n foo route del 192.168.1.1 ip -n foo route add 192.168.1.1 dev veth0 Try to get/use the route: > $ ip -n foo route get 192.168.1.1 > 192.168.1.1 dev veth0 src 192.168.0.1 uid 0 > cache > $ ip netns exec foo ping -c1 192.168.1.1 > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.039 ms > > --- 192.168.1.1 ping statistics --- > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms CC: stable@vger.kernel.org Fixes: `597cfe4fc3` ("nexthop: Add support for IPv4 nexthops") Reported-by: Edwin Brossette <edwin.brossette@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20220713114853.29406-1-nicolas.dichtel@6wind.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-14 14:41:19 +02:00
Andrea Mayer	4889fbd98d	seg6: bpf: fix skb checksum in bpf_push_seg6_encap() Both helper functions bpf_lwt_seg6_action() and bpf_lwt_push_encap() use the bpf_push_seg6_encap() to encapsulate the packet in an IPv6 with Segment Routing Header (SRH) or insert an SRH between the IPv6 header and the payload. To achieve this result, such helper functions rely on bpf_push_seg6_encap() which, in turn, leverages seg6_do_srh_{encap,inline}() to perform the required operation (i.e. encap/inline). This patch removes the initialization of the IPv6 header payload length from bpf_push_seg6_encap(), as it is now handled properly by seg6_do_srh_{encap,inline}() to prevent corruption of the skb checksum. Fixes: `fe94cc290f` ("bpf: Add IPv6 Segment Routing helpers") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-14 10:15:15 +02:00
Andrea Mayer	f048880fc7	seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors The SRv6 End.B6 and End.B6.Encaps behaviors rely on functions seg6_do_srh_{encap,inline}() to, respectively: i) encapsulate the packet within an outer IPv6 header with the specified Segment Routing Header (SRH); ii) insert the specified SRH directly after the IPv6 header of the packet. This patch removes the initialization of the IPv6 header payload length from the input_action_end_b6{_encap}() functions, as it is now handled properly by seg6_do_srh_{encap,inline}() to avoid corruption of the skb checksum. Fixes: `140f04c33b` ("ipv6: sr: implement several seg6local actions") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-14 10:15:15 +02:00
Andrea Mayer	df8386d13e	seg6: fix skb checksum evaluation in SRH encapsulation/insertion Support for SRH encapsulation and insertion was introduced with commit `6c8702c60b` ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels"), through the seg6_do_srh_encap() and seg6_do_srh_inline() functions, respectively. The former encapsulates the packet in an outer IPv6 header along with the SRH, while the latter inserts the SRH between the IPv6 header and the payload. Then, the headers are initialized/updated according to the operating mode (i.e., encap/inline). Finally, the skb checksum is calculated to reflect the changes applied to the headers. The IPv6 payload length ('payload_len') is not initialized within seg6_do_srh_{inline,encap}() but is deferred in seg6_do_srh(), i.e. the caller of seg6_do_srh_{inline,encap}(). However, this operation invalidates the skb checksum, since the 'payload_len' is updated only after the checksum is evaluated. To solve this issue, the initialization of the IPv6 payload length is moved from seg6_do_srh() directly into the seg6_do_srh_{inline,encap}() functions and before the skb checksum update takes place. Fixes: `6c8702c60b` ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Reported-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/all/20220705190727.69d532417be7438b15404ee1@uniroma2.it Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-14 10:15:15 +02:00
Zhengchao Shao	bc5c8260f4	net/sched: remove return value of unregister_tcf_proto_ops Return value of unregister_tcf_proto_ops is unused, remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 14:46:59 +01:00
David S. Miller	736002fb6a	Merge tag 'wireless-next-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== A fairly large set of updates for next, highlights: ath10k * ethernet frame format support rtw89 * TDLS support cfg80211/mac80211 * airtime fairness fixes * EHT support continued, especially in AP mode * initial (and still major) rework for multi-link operation (MLO) from 802.11be/wifi 7 As usual, also many small updates/cleanups/fixes/etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 14:28:52 +01:00
David S. Miller	67de8acdd3	Merge tag 'wireless-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A small set of fixes for * queue selection in mesh/ocb * queue handling on interface stop * hwsim virtio device vs. some other virtio changes * dt-bindings email addresses * color collision memory allocation * a const variable in rtw88 * shared SKB transmit in the ethernet format path * P2P client port authorization ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 14:27:38 +01:00
David Lamparter	d7c31cbde4	net: ip6mr: add RTM_GETROUTE netlink op The IPv6 multicast routing code previously implemented only the dump variant of RTM_GETROUTE. Implement single MFC item retrieval by copying and adapting the respective IPv4 code. Tested against FRRouting's IPv6 PIM stack. Signed-off-by: David Lamparter <equinox@diac24.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 13:53:48 +01:00
Jiri Pirko	7715023aa5	net: devlink: use helpers to work with devlink->lock mutex As far as the lock helpers exist as the drivers need to work with the devlink->lock mutex, use the helpers internally in devlink.c in order to be consistent. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 13:49:44 +01:00
Jiri Pirko	1abfb265f0	net: devlink: fix unlocked vs locked functions descriptions To be unified with the rest of the code, the unlocked version (devl_*) of function should have the same description in documentation as the locked one. Add the missing documentation. Also, add "Context" annotation for the locked versions where it is missing. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 13:49:43 +01:00
Kuniyuki Iwashima	bdf00bf24b	nexthop: Fix data-races around nexthop_compat_mode. While reading nexthop_compat_mode, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `4f80116d3d` ("net: ipv4: add sysctl for nexthop api compatibility mode") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:50 +01:00
Kuniyuki Iwashima	e49e4aff7e	ipv4: Fix data-races around sysctl_ip_dynaddr. While reading sysctl_ip_dynaddr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	12b8d9ca7e	tcp: Fix a data-race around sysctl_tcp_ecn_fallback. While reading sysctl_tcp_ecn_fallback, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `492135557d` ("tcp: add rfc3168, section 6.1.1.1. fallback") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	4785a66702	tcp: Fix data-races around sysctl_tcp_ecn. While reading sysctl_tcp_ecn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	1ebcb25ad6	icmp: Fix a data-race around sysctl_icmp_ratemask. While reading sysctl_icmp_ratemask, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	2a4eb71484	icmp: Fix a data-race around sysctl_icmp_ratelimit. While reading sysctl_icmp_ratelimit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	d2efabce81	icmp: Fix a data-race around sysctl_icmp_errors_use_inbound_ifaddr. While reading sysctl_icmp_errors_use_inbound_ifaddr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1c2fb7f93c` ("[IPV4]: Sysctl configurable icmp error source address.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	b04f9b7e85	icmp: Fix a data-race around sysctl_icmp_ignore_bogus_error_responses. While reading sysctl_icmp_ignore_bogus_error_responses, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	66484bb98e	icmp: Fix a data-race around sysctl_icmp_echo_ignore_broadcasts. While reading sysctl_icmp_echo_ignore_broadcasts, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	4a2f7083cc	icmp: Fix data-races around sysctl_icmp_echo_enable_probe. While reading sysctl_icmp_echo_enable_probe, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `d329ea5bd8` ("icmp: add response to RFC 8335 PROBE messages") Fixes: `1fd07f33c3` ("ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	bb7bb35a63	icmp: Fix a data-race around sysctl_icmp_echo_ignore_all. While reading sysctl_icmp_echo_ignore_all, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Kuniyuki Iwashima	6f605b57f3	tcp: Fix a data-race around sysctl_max_tw_buckets. While reading sysctl_max_tw_buckets, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Maksym Glubokiy	83d85bb069	net: extract port range fields from fl_flow_key So it can be used for port range filter offloading. Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:16:56 +01:00
Matthias May	b09ab9c92e	ip6_tunnel: allow to inherit from VLAN encapsulated IP The current code allows inheriting the TTL (hop_limit) from the payload when skb->protocol is ETH_P_IP or ETH_P_IPV6. However, when the payload is VLAN encapsulated (e.g. because the tunnel is of type GRETAP), this inheriting does not work, because the visible skb->protocol is of type ETH_P_8021Q or ETH_P_8021AD. Instead of skb->protocol, use skb_protocol(). Signed-off-by: Matthias May <matthias.may@westermo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:10:22 +01:00
Matthias May	3f8a8447fd	ip6_gre: use actual protocol to select xmit When the payload is a VLAN encapsulated IPv4/IPv6 frame, we can skip the 802.1q/802.1ad ethertypes and jump to the actual protocol. This way we treat IPv4/IPv6 frames as IP instead of as "other". Signed-off-by: Matthias May <matthias.may@westermo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:10:22 +01:00
Matthias May	41337f52b9	ip6_gre: set DSCP for non-IP The current code always forces a dscp of 0 for all non-IP frames. However, when setting a specific TOS with the command ip link add name tep0 type ip6gretap local fdd1:ced0:5d88:3fce::1 remote fdd1:ced0:5d88:3fce::2 tos 0xa0 one would expect all GRE encapsulated frames to have a TOS of 0xA0, and not only when the payload is IPv4/IPv6. Signed-off-by: Matthias May <matthias.may@westermo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:10:22 +01:00
Matthias May	7ae29fd1be	ip_tunnel: allow to inherit from VLAN encapsulated IP The current code allows inheriting the TOS, TTL, DF from the payload when skb->protocol is ETH_P_IP or ETH_P_IPV6. However, when the payload is VLAN encapsulated (e.g. because the tunnel is of type GRETAP), this inheriting does not work, because the visible skb->protocol is of type ETH_P_8021Q or ETH_P_8021AD. Instead of skb->protocol, use skb_protocol(). Signed-off-by: Matthias May <matthias.may@westermo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:10:21 +01:00
Paolo Abeni	3ad14f54bd	mptcp: more accurate MPC endpoint tracking Currently the id accounting for the ID 0 subflow is not correct: at creation time we (correctly) mark as unavailable the endpoint id corresponding to the MPC subflow source address, while at subflow removal time we set as available the id 0. With this change we track explicitly the endpoint id corresponding to the MPC subflow so that we can mark it as available at removal time. Additionally this allows deleting the initial subflow via the NL PM specifying the corresponding endpoint id. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-12 18:37:20 -07:00
Paolo Abeni	c157bbe776	mptcp: allow the in kernel PM to set MPC subflow priority Any local endpoints configured on the address matching the MPC subflow are currently ignored. Specifically, setting a backup flag on them has no effect on the first subflow, as the MPC handshake can't carry such info. This change refactors the MPC endpoint id accounting to additionally fetch the priority info from the relevant endpoint and eventually trigger the MP_PRIO handshake as needed. As a result, the MPC subflow now switches to backup priority after that the MPTCP socket is fully established, according to the local endpoint configuration. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-12 18:37:19 -07:00
Paolo Abeni	bedee0b561	mptcp: address lookup improvements When looking-up a socket address in the endpoint list, we must prefer port-based matches over address only match. Ensure that port-based endpoints are listed first, using head insertion for them. Additionally be sure that only port-based endpoints carry a non zero port number. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-12 18:37:19 -07:00
Paolo Abeni	f5360e9b31	mptcp: introduce and use mptcp_pm_send_ack() The in-kernel PM has a bit of duplicate code related to ack generation. Create a new helper factoring out the PM-specific needs and use it in a couple of places. As a bonus, mptcp_subflow_send_ack() is not used anymore outside its own compilation unit and can become static. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-12 18:37:19 -07:00
XueBing Chen	512b2dc48e	net: ip_tunnel: use strscpy to replace strlcpy The strlcpy should not be used because it doesn't limit the source length. Preferred is strscpy. Signed-off-by: XueBing Chen <chenxuebing@jari.cn> Link: https://lore.kernel.org/r/2a08f6c1.e30.181ed8b49ad.Coremail.chenxuebing@jari.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-12 18:31:57 -07:00
Yonglong Li	536a6c8e05	tcp: make retransmitted SKB fit into the send window Current code of __tcp_retransmit_skb only checks that TCP_SKB_CB(skb)->seq is in the send window, and TCP_SKB_CB(skb)->seq_end may be out of the send window. If the receiver has shrunk its window, and the skb is out of the new window, it should retransmit a smaller portion of the payload. Test packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) +0 fcntl(3, F_SETFL, O_RDWR\|O_NONBLOCK) = 0 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8> +.05 < S. 0:0(0) ack 1 win 6000 <mss 1000,nop,nop,sackOK> +0 > . 1:1(0) ack 1 +0 write(3, ..., 10000) = 10000 +0 > . 1:2001(2000) ack 1 win 65535 +0 > . 2001:4001(2000) ack 1 win 65535 +0 > . 4001:6001(2000) ack 1 win 65535 +.05 < . 1:1(0) ack 4001 win 1001 and tcpdump shows: 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 1:2001, ack 1, win 65535, length 2000 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 2001:4001, ack 1, win 65535, length 2000 192.168.226.67.55 > 192.0.2.1.8080: Flags [P.], seq 4001:5001, ack 1, win 65535, length 1000 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 5001:6001, ack 1, win 65535, length 1000 192.0.2.1.8080 > 192.168.226.67.55: Flags [.], ack 4001, win 1001, length 0 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 5001:6001, ack 1, win 65535, length 1000 192.168.226.67.55 > 192.0.2.1.8080: Flags [P.], seq 4001:5001, ack 1, win 65535, length 1000 When the client retracts the window to 1001, the send window is [4001,5002], but TLP sends the 5001-6001 packet, which is out of the send window. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1657532838-20200-1-git-send-email-liyonglong@chinatelecom.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-12 18:13:48 -07:00
Zhengchao Shao	5022e221c9	net: change the type of ip_route_input_rcu to static The type of ip_route_input_rcu should be static. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220711073549.8947-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-12 15:08:45 +02:00
Justin Stitt	e79b9473e9	net: ipv4: fix clang -Wformat warnings When building with Clang we encounter these warnings: \| net/ipv4/ah4.c:513:4: error: format specifies type 'unsigned short' but \| the argument has type 'int' [-Werror,-Wformat] \| aalg_desc->uinfo.auth.icv_fullbits / 8); - \| net/ipv4/esp4.c:1114:5: error: format specifies type 'unsigned short' \| but the argument has type 'int' [-Werror,-Wformat] \| aalg_desc->uinfo.auth.icv_fullbits / 8); `aalg_desc->uinfo.auth.icv_fullbits` is a u16 but due to default argument promotion becomes an int. Variadic functions (printf-like) undergo default argument promotion. Documentation/core-api/printk-formats.rst specifically recommends using the promoted-to-type's format flag. As per C11 6.3.1.1: (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) `If an int can represent all values of the original type ..., the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.` Thus it makes sense to change %hu to %d not only to follow this standard but to suppress the warning as well. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <justinstitt@google.com> Suggested-by: Joe Perches <joe@perches.com> Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-07-12 12:58:53 +02:00
Moshe Shemesh	f0680ef0f9	devlink: Hold the instance lock in port_new / port_del callbacks Let the core take the devlink instance lock around port_new and port_del callbacks and remove the now redundant locking in the only driver that currently uses them. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-12 10:26:23 +02:00
Moshe Shemesh	df539fc62b	devlink: Remove unused functions devlink_rate_leaf_create/destroy The previous patch removed the last usage of the functions devlink_rate_leaf_create() and devlink_rate_leaf_destroy(). Thus, remove these functions from the devlink API. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-12 10:26:22 +02:00
Moshe Shemesh	868232f5cd	devlink: Remove unused function devlink_rate_nodes_destroy The previous patch removed the last usage of the function devlink_rate_nodes_destroy(). Thus, remove this function from devlink API. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-12 10:26:22 +02:00
Jakub Kicinski	57128e98c3	tls: rx: fix the NoPad getsockopt Maxim reports do_tls_getsockopt_no_pad() will always return an error. Indeed looks like refactoring gone wrong - remove err and use value. Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com> Fixes: `88527790c0` ("tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-11 19:48:33 -07:00
Jakub Kicinski	bb56cea9ab	tls: rx: add counter for NoPad violations As discussed with Maxim add a counter for true NoPad violations. This should help deployments catch unexpected padded records vs just control records which always need re-encryption. https://lore.kernel.org/all/b111828e6ac34baad9f4e783127eba8344ac252d.camel@nvidia.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-11 19:48:33 -07:00
Jakub Kicinski	1090c1ea22	tls: fix spelling of MIB MIN -> MIB Fixes: `88527790c0` ("tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-11 19:48:32 -07:00
Liu Jian	9974d37ea7	skmsg: Fix invalid last sg check in sk_msg_recvmsg() In sk_psock_skb_ingress_enqueue function, if the linear area + nr_frags + frag_list of the SKB has NR_MSG_FRAG_IDS blocks in total, skb_to_sgvec will return NR_MSG_FRAG_IDS, then msg->sg.end will be set to NR_MSG_FRAG_IDS, and in addition, (NR_MSG_FRAG_IDS - 1) is set to the last SG of msg. Recv the msg in sk_msg_recvmsg, when i is (NR_MSG_FRAG_IDS - 1), the sk_msg_iter_var_next(i) will change i to 0 (not NR_MSG_FRAG_IDS), the judgment condition "msg_rx->sg.start==msg_rx->sg.end" and "i != msg_rx->sg.end" can not work. As a result, the processed msg cannot be deleted from ingress_msg list. But the length of all the sge of the msg has changed to 0. Then the next recvmsg syscall will process the msg repeatedly, because the length of sge is 0, the -EFAULT error is always returned. Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220628123616.186950-1-liujian56@huawei.com	2022-07-11 18:22:07 +02:00
Florian Westphal	6b77205374	netfilter: nf_tables: move nft_cmp_fast_mask to where its used ... and cast result to u32 so sparse won't complain anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:46 +02:00
Florian Westphal	ffb3d9a30c	netfilter: nf_tables: use correct integer types Sparse tool complains about mixing of different endianness types, so use the correct ones. Add type casts where needed. objdiff shows no changes except in nft_tunnel (type is changed). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:46 +02:00
Florian Westphal	7278b3c1e4	netfilter: nf_tables: add and use BE register load-store helpers Same as the existing ones, no conversions. This is just for sparse sake only so that we no longer mix be16/u16 and be32/u32 types. Alternative is to add __force __beX in various places, but this seems nicer. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:46 +02:00
Florian Westphal	d86473bf2f	netfilter: nf_tables: use the correct get/put helpers Switch to be16/32 and u16/32 respectively. No code changes here, the functions do the same thing, this is just for sparse checkers' sake. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:46 +02:00
Florian Westphal	168141f7e0	netfilter: x_tables: use correct integer types Sparse complains because __be32 and u32 are mixed without conversions. Use the correct types, no code changes. Furthermore, xt_DSCP generates a bit truncation warning: "cast truncates bits from constant value (ffffff03 becomes 3)" The truncation is fine (and wanted). Add a private definition and use that instead. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:45 +02:00
Florian Westphal	ec6f2ff0a3	netfilter: nfnetlink: add missing __be16 cast Sparse flags this as suspicious, because this compares an integer with a be16 with no conversion. It's a compat check for old userspace that sends host byte order, so force a be16 cast here. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:45 +02:00
Zhang Jiaming	f72547473f	netfilter: nft_set_bitmap: Fix spelling mistake Change 'succesful' to 'successful'. Change 'transation' to 'transaction'. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:37 +02:00
Florian Westphal	d3f2d0a292	netfilter: h323: merge nat hook pointers into one sparse complains about incorrect rcu usage. Code uses the correct rcu access primitives, but the function pointers lack rcu annotations. Collapse all of them into a single structure, then annotate the pointer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:16 +02:00
Florian Westphal	e14575fa75	netfilter: nf_conntrack: use rcu accessors where needed Sparse complains about direct access to the 'helper' and timeout members. Both have __rcu annotation, so use the accessors. xt_CT is fine, accesses occur before the structure is visible to other cpus. Switch to rcu accessors there as well to reduce noise. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:15 +02:00
Florian Westphal	6976890e89	netfilter: nf_conntrack: add missing __rcu annotations Access to the hook pointers use correct helpers but the pointers lack the needed __rcu annotation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:15 +02:00
Vlad Buslov	b038177636	netfilter: nf_flow_table: count pending offload workqueue tasks To improve hardware offload debuggability count pending 'add', 'del' and 'stats' flow_table offload workqueue tasks. Counters are incremented before scheduling new task and decremented when workqueue handler finishes executing. These counters allow user to diagnose congestion on hardware offload workqueues that can happen when either CPU is starved and workqueue jobs are executed at lower rate than new ones are added or when hardware/driver can't keep up with the rate. Implement the described counters as percpu counters inside new struct netns_ft which is stored inside struct net. Expose them via new procfs file '/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack' file. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:14 +02:00
Vlad Buslov	fc54d9065f	net/sched: act_ct: set 'net' pointer when creating new nf_flow_table Following patches in series use the pointer to access flow table offload debug variables. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:14 +02:00
Bill Wendling	b8acd43148	netfilter: conntrack: use correct format characters When compiling with -Wformat, clang emits the following warnings: net/netfilter/nf_conntrack_helper.c:168:18: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security] request_module(mod_name); ^~~~~~~~ Use a string literal for the format string. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <isanbard@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:14 +02:00
Jackie Liu	6be7915612	netfilter: conntrack: use fallthrough to cleanup These cases all use the same function. We can simplify the code through fallthrough. $ size net/netfilter/nf_conntrack_core.o text data bss dec hex filename before 81601 81430 768 163799 27fd7 net/netfilter/nf_conntrack_core.o after 80361 81430 768 162559 27aff net/netfilter/nf_conntrack_core.o Arch: aarch64 Gcc : gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1) Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:13 +02:00
sewookseo	e22aa14866	net: Find dst with sk's xfrm policy not ctl_sk If we set XFRM security policy by calling setsockopt with option IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' struct. However tcp_v6_send_response doesn't look up dst_entry with the actual socket but looks up with tcp control socket. This may cause a problem that a RST packet is sent without ESP encryption & peer's TCP socket can't receive it. This patch will make the function look up dest_entry with actual socket, if the socket has XFRM policy(sock_policy), so that the TCP response packet via this function can be encrypted, & aligned on the encrypted TCP socket. Tested: We encountered this problem when a TCP socket which is encrypted in ESP transport mode encryption, receives challenge ACK at SYN_SENT state. After receiving challenge ACK, TCP needs to send RST to establish the socket at next SYN try. But the RST was not encrypted & peer TCP socket still remains on ESTABLISHED state. So we verified this with test step as below. [Test step] 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). 2. Client tries a new connection on the same TCP ports(src & dst). 3. Server will return challenge ACK instead of SYN,ACK. 4. Client will send RST to server to clear the SOCKET. 5. Client will retransmit SYN to server on the same TCP ports. [Expected result] The TCP connection should be established. Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Sehee Lee <seheele@google.com> Signed-off-by: Sewook Seo <sewookseo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-11 13:39:56 +01:00
David S. Miller	e45955766b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) refcount_inc_not_zero() is not semantically equivalent to atomic_inc_not_zero(), from Florian Westphal. My understanding was that the refcount_*() API provides a wrapper to ease debugging of reference count leaks, however, there are semantic differences between these two APIs, where refcount_inc_not_zero() needs a barrier. The reason for this subtle difference is unknown to me. 2) packet logging is not correct for ARP and IP packets, from the ARP family and netdev/egress respectively. Use skb_network_offset() to reach the headers accordingly. 3) set element extension lengths have been growing over time, replace a BUG_ON by EINVAL which might be triggerable from userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-11 11:58:38 +01:00
Paolo Abeni	5c835bb142	mptcp: fix subflow traversal at disconnect time At disconnect time the MPTCP protocol traverses the subflows list, closing each of them. In some circumstances - MPJ subflow, passive MPTCP socket, the latter operation can remove the subflow from the list, invalidating the current iterator. Address the issue using the safe list traversing helper variant. Reported-by: van fantasy <g1042620637@gmail.com> Fixes: `b29fcfb54c` ("mptcp: full disconnect implementation") Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-11 11:31:38 +01:00
Felix Fietkau	50e2ab3929	wifi: mac80211: fix queue selection for mesh/OCB interfaces When using iTXQ, the code assumes that there is only one vif queue for broadcast packets, using the BE queue. Allowing non-BE queue marking violates that assumption and txq->ac == skb_queue_mapping is no longer guaranteed. This can cause issues with queue handling in the driver and also causes issues with the recent ATF change, resulting in an AQL underflow warning. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220702145227.39356-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-11 10:36:55 +02:00
Christophe JAILLET	37babce912	wifi: mac80211: Use the bitmap API to allocate bitmaps Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantics. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/dfb438a6a199ee4c95081fa01bd758fd30e50931.1656962156.git.christophe.jaillet@wanadoo.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-11 10:21:25 +02:00
MeiChia Chiu	68608f9991	wifi: mac80211: fix center freq calculation in ieee80211_chandef_downgrade When mac80211 downgrades working bandwidth, the center_freq and center_freq1 need to be recalculated. There is a typo in the case of downgrading bandwidth from 320MHz to 160MHz which would cause a wrong frequency value. Reviewed-by: Money Wang <Money.Wang@mediatek.com> Signed-off-by: MeiChia Chiu <MeiChia.Chiu@mediatek.com> Link: https://lore.kernel.org/r/20220708095823.12959-1-MeiChia.Chiu@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-11 10:21:04 +02:00
Veerendranath Jakkam	3c512307de	wifi: nl80211: fix sending link ID info of associated BSS commit `dd374f84ba` ("wifi: nl80211: expose link ID for associated BSSes") used a top-level attribute to send link ID of the associated BSS in the nested attribute NL80211_ATTR_BSS. But since NL80211_ATTR_BSS is a nested attribute of the attributes defined in enum nl80211_bss, define a new attribute in enum nl80211_bss and use it for sending the link ID of the BSS. Fixes: `dd374f84ba` ("wifi: nl80211: expose link ID for associated BSSes") Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20220708122607.1836958-1-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-11 10:20:18 +02:00
Veerendranath Jakkam	c528d7a275	wifi: cfg80211: fix a comment in cfg80211_mlme_mgmt_tx() A comment in cfg80211_mlme_mgmt_tx() is describing this API used only for transmitting action frames. Fix the comment since cfg80211_mlme_mgmt_tx() can be used to transmit any management frame. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20220708165545.2072999-1-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-11 10:19:54 +02:00
Veerendranath Jakkam	ff3821bc35	wifi: nl80211: Fix reading NL80211_ATTR_MLO_LINK_ID in nl80211_pre_doit nl80211_pre_doit() uses nla_get_u16() to read the u8 attribute NL80211_ATTR_MLO_LINK_ID. Fix this by using nla_get_u8() to read the NL80211_ATTR_MLO_LINK_ID attribute. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/1657517683-5724-1-git-send-email-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-11 10:19:32 +02:00
Jakub Kicinski	0076cad301	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-07-09 We've added 94 non-merge commits during the last 19 day(s) which contain a total of 125 files changed, 5141 insertions(+), 6701 deletions(-). The main changes are: 1) Add new way for performing BTF type queries to BPF, from Daniel Müller. 2) Add inlining of calls to bpf_loop() helper when its function callback is statically known, from Eduard Zingerman. 3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz. 4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM hooks, from Stanislav Fomichev. 5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko. 6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky. 7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP selftests, from Magnus Karlsson & Maciej Fijalkowski. 8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet. 9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki. 10) Sockmap optimizations around throughput of UDP transmissions which have been improved by 61%, from Cong Wang. 11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa. 12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend. 13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang. 14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma macro, from James Hilliard. 15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan. 16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao. 
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits) selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n bpf: Correctly propagate errors up from bpf_core_composites_match libbpf: Disable SEC pragma macro on GCC bpf: Check attach_func_proto more carefully in check_return_code selftests/bpf: Add test involving restrict type qualifier bpftool: Add support for KIND_RESTRICT to gen min_core_btf command MAINTAINERS: Add entry for AF_XDP selftests files selftests, xsk: Rename AF_XDP testing app bpf, docs: Remove deprecated xsk libbpf APIs description selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage libbpf, riscv: Use a0 for RC register libbpf: Remove unnecessary usdt_rel_ip assignments selftests/bpf: Fix few more compiler warnings selftests/bpf: Fix bogus uninitialized variable warning bpftool: Remove zlib feature test from Makefile libbpf: Cleanup the legacy uprobe_event on failed add/attach_event() libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy() libbpf: Cleanup the legacy kprobe_event on failed add/attach_event() selftests/bpf: Add type match test against kernel's task_struct selftests/bpf: Add nested type to type based tests ... ==================== Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-09 12:24:16 -07:00
Pablo Neira Ayuso	c39ba4de6b	netfilter: nf_tables: replace BUG_ON by element length check BUG_ON can be triggered from userspace with an element with a large userdata area. Replace it by length check and return EINVAL instead. Over time extensions have been growing in size. Pick a sufficiently old Fixes: tag to propagate this fix. Fixes: `7d7402642e` ("netfilter: nf_tables: variable sized set element keys / data") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-09 16:25:09 +02:00
Eric Dumazet	44ac441a51	af_unix: fix unix_sysctl_register() error path We want to kfree(table) if @table has been kmalloced, i.e. for a non-initial network namespace. Fixes: `849d5aa3a1` ("af_unix: Do not call kmemdup() for init_net's sysctl table.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-09 12:27:33 +01:00
Eric Dumazet	72a0b32911	vlan: fix memory leak in vlan_newlink() Blamed commit added back a bug I fixed in commit `9bbd917e0b` ("vlan: fix memory leak in vlan_dev_set_egress_priority") If a memory allocation fails in vlan_changelink() after other allocations succeeded, we need to call vlan_dev_free_egress_priority() to free all allocated memory because after a failed ->newlink() we do not call any methods like ndo_uninit() or dev->priv_destructor(). In following example, if the allocation for last element 2000:2001 fails, we need to free eight prior allocations: ip link add link dummy0 dummy0.100 type vlan id 100 \ egress-qos-map 1:2 2:3 3:4 4:5 5:6 6:7 7:8 8:9 2000:2001 syzbot report was: BUG: memory leak unreferenced object 0xffff888117bd1060 (size 32): comm "syz-executor408", pid 3759, jiffies 4294956555 (age 34.090s) hex dump (first 32 bytes): 09 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83fc60ad>] kmalloc include/linux/slab.h:600 [inline] [<ffffffff83fc60ad>] vlan_dev_set_egress_priority+0xed/0x170 net/8021q/vlan_dev.c:193 [<ffffffff83fc6628>] vlan_changelink+0x178/0x1d0 net/8021q/vlan_netlink.c:128 [<ffffffff83fc67c8>] vlan_newlink+0x148/0x260 net/8021q/vlan_netlink.c:185 [<ffffffff838b1278>] rtnl_newlink_create net/core/rtnetlink.c:3363 [inline] [<ffffffff838b1278>] __rtnl_newlink+0xa58/0xdc0 net/core/rtnetlink.c:3580 [<ffffffff838b1629>] rtnl_newlink+0x49/0x70 net/core/rtnetlink.c:3593 [<ffffffff838ac66c>] rtnetlink_rcv_msg+0x21c/0x5c0 net/core/rtnetlink.c:6089 [<ffffffff839f9c37>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2501 [<ffffffff839f8da7>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] [<ffffffff839f8da7>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345 [<ffffffff839f9266>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921 [<ffffffff8384dbf6>] sock_sendmsg_nosec net/socket.c:714 [inline] 
[<ffffffff8384dbf6>] sock_sendmsg+0x56/0x80 net/socket.c:734 [<ffffffff8384e15c>] ____sys_sendmsg+0x36c/0x390 net/socket.c:2488 [<ffffffff838523cb>] ___sys_sendmsg+0x8b/0xd0 net/socket.c:2542 [<ffffffff838525b8>] __sys_sendmsg net/socket.c:2571 [inline] [<ffffffff838525b8>] __do_sys_sendmsg net/socket.c:2580 [inline] [<ffffffff838525b8>] __se_sys_sendmsg net/socket.c:2578 [inline] [<ffffffff838525b8>] __x64_sys_sendmsg+0x78/0xf0 net/socket.c:2578 [<ffffffff845ad8d5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845ad8d5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: `37aa50c539` ("vlan: introduce vlan_dev_free_egress_priority") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-09 12:26:59 +01:00
Geliang Tang	f7657ff4a7	mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to include/net/mptcp.h. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-09 12:19:23 +01:00
Pablo Neira Ayuso	7a847c00ee	netfilter: nf_log: incorrect offset to network header NFPROTO_ARP is expecting to find the ARP header at the network offset. In the particular case of ARP, HTYPE= field shows the initial bytes of the ethernet header destination MAC address. netdev out: IN= OUT=bridge0 MACSRC=c2:76:e5:71:e1:de MACDST=36:b0:4a:e2:72:ea MACPROTO=0806 ARP HTYPE=14000 PTYPE=0x4ae2 OPCODE=49782 NFPROTO_NETDEV egress hook is also expecting to find the IP headers at the network offset. Fixes: `35b9395104` ("netfilter: add generic ARP packet logger") Reported-by: Tom Yan <tom.ty89@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-09 09:55:43 +02:00
Justin Stitt	5b47d23646	net: rxrpc: fix clang -Wformat warning When building with Clang we encounter this warning: | net/rxrpc/rxkad.c:434:33: error: format specifies type 'unsigned short' | but the argument has type 'u32' (aka 'unsigned int') [-Werror,-Wformat] | _leave(" = %d [set %hx]", ret, y); y is a u32 but the format specifier is `%hx`. Going from unsigned int to short int results in a loss of data. This is surely not intended behavior. If it is intended, the warning should be suppressed through other means. This patch should get us closer to the goal of enabling the -Wformat flag for Clang builds. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20220707182052.769989-1-justinstitt@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 20:15:11 -07:00
Jakub Kicinski	35560b7f06	tls: rx: make tls_wait_data() return an recvmsg retcode tls_wait_data() sets the return code as an output parameter and always returns ctx->recv_pkt on success. Return the error code directly and let the caller read the skb from the context. Use positive return code to indicate ctx->recv_pkt is ready. While touching the definition of the function rename it. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:45 -07:00
Jakub Kicinski	5879031423	tls: create an internal header include/net/tls.h is getting a little long, and is probably hard for driver authors to navigate. Split out the internals into a header which will live under net/tls/. While at it move some static inlines with a single user into the source files, add a few tls_ prefixes and fix spelling of 'proccess'. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:45 -07:00
Jakub Kicinski	03957d8405	tls: rx: coalesce exit paths in tls_decrypt_sg() Jump to the free() call, instead of having to remember to free the memory in multiple places. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:45 -07:00
Jakub Kicinski	b89fec54fd	tls: rx: wrap decrypt params in a struct The max size of iv + aad + tail is 22B. That's smaller than a single sg entry (32B). Don't bother with the memory packing, just create a struct which holds the max size of those members. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:45 -07:00
Jakub Kicinski	50a07aa531	tls: rx: always allocate max possible aad size for decrypt AAD size is either 5 or 13. Really no point complicating the code for the 8B of difference. This will also let us turn the chunked up buffer into a sane struct. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:45 -07:00
Jakub Kicinski	2d91ecace6	strparser: pad sk_skb_cb to avoid straddling cachelines sk_skb_cb lives within skb->cb[]. skb->cb[] straddles 2 cache lines, each containing 24B of data. The first cache line does not contain much interesting information for users of strparser, so pad things a little. Previously strp_msg->full_len would live in the first cache line and strp_msg->offset in the second. We need to reorder the 8 byte temp_reg with struct tls_msg to prevent a 4B hole which would push the struct over 48B. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:44 -07:00
Jakub Kicinski	7c895ef884	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2022-07-08 We've added 3 non-merge commits during the last 2 day(s) which contain a total of 7 files changed, 40 insertions(+), 24 deletions(-). The main changes are: 1) Fix cBPF splat triggered by skb not having a mac header, from Eric Dumazet. 2) Fix spurious packet loss in generic XDP when pushing packets out (note that native XDP is not affected by the issue), from Johan Almbladh. 3) Fix bpf_dynptr_{read,write}() helper signatures with flag argument before it's set in stone as UAPI, from Joanne Koong. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs bpf: Make sure mac_header was set before using it xdp: Fix spurious packet loss in generic XDP TX path ==================== Link: https://lore.kernel.org/r/20220708213418.19626-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 15:24:16 -07:00
Eric Dumazet	c2dd4059dc	net: minor optimization in __alloc_skb() TCP allocates 'fast clones' skbs for packets in tx queues. Currently, __alloc_skb() initializes the companion fclone field to SKB_FCLONE_CLONE, and leaves other fields untouched. It makes sense to defer this init much later in skb_clone(), because all fclone fields are copied and hot in cpu caches at that time. This removes one cache line miss in __alloc_skb(), a cost seen on a host with 256 cpus all competing on memory accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-08 14:21:08 +01:00
Justin Stitt	9d899dbe23	l2tp: l2tp_debugfs: fix Clang -Wformat warnings When building with Clang we encounter the following warnings: | net/l2tp/l2tp_debugfs.c:187:40: error: format specifies type 'unsigned | short' but the argument has type 'u32' (aka 'unsigned int') | [-Werror,-Wformat] seq_printf(m, " nr %hu, ns %hu\n", session->nr, | session->ns); - | net/l2tp/l2tp_debugfs.c:196:32: error: format specifies type 'unsigned | short' but the argument has type 'int' [-Werror,-Wformat] | session->l2specific_type, l2tp_get_l2specific_len(session)); - | net/l2tp/l2tp_debugfs.c:219:6: error: format specifies type 'unsigned | short' but the argument has type 'u32' (aka 'unsigned int') | [-Werror,-Wformat] session->nr, session->ns, Both session->nr and session->ns are of type `u32`. The currently used format specifier is `%hu` which describes a `u16`. My proposed fix is to listen to Clang and use the correct format specifier `%u`. For the warning at line 196, l2tp_get_l2specific_len() returns an int and should therefore be using the `%d` format specifier. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <justinstitt@google.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-08 12:14:36 +01:00
Kuniyuki Iwashima	73318c4b7d	ipv4: Fix a data-race around sysctl_fib_sync_mem. While reading sysctl_fib_sync_mem, it can be changed concurrently. So, we need to add READ_ONCE() to avoid a data-race. Fixes: `9ab948a91b` ("ipv4: Allow amount of dirty memory from fib resizing to be controllable") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-08 12:10:34 +01:00
Kuniyuki Iwashima	48d7ee321e	icmp: Fix data-races around sysctl. While reading icmp sysctl variables, they can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: `4cdf507d54` ("icmp: add a global rate limitation") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-08 12:10:34 +01:00
Kuniyuki Iwashima	dd44f04b92	cipso: Fix data-races around sysctl. While reading cipso sysctl variables, they can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: `446fda4f26` ("[NetLabel]: CIPSOv4 engine") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima	3d32edf1f3	inetpeer: Fix data-races around sysctl. While reading inetpeer sysctl variables, they can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima	47e6ab24e8	tcp: Fix a data-race around sysctl_tcp_max_orphans. While reading sysctl_tcp_max_orphans, it can be changed concurrently. So, we need to add READ_ONCE() to avoid a data-race. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-08 12:10:33 +01:00
Justin Stitt	a2b6111b55	net: l2tp: fix clang -Wformat warning When building with clang we encounter this warning: | net/l2tp/l2tp_ppp.c:1557:6: error: format specifies type 'unsigned | short' but the argument has type 'u32' (aka 'unsigned int') | [-Werror,-Wformat] session->nr, session->ns, Both session->nr and session->ns are of type u32. The format specifier previously used is `%hu` which would truncate our unsigned integer from 32 to 16 bits. This doesn't seem like intended behavior, if it is then perhaps we need to consider suppressing the warning with pragma clauses. This patch should get us closer to the goal of enabling the -Wformat flag for Clang builds. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <justinstitt@google.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20220706230833.535238-1-justinstitt@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-07 18:07:01 -07:00
Jie Wang	d810d367ec	net: page_pool: optimize page pool page allocation in NUMA scenario Currently NIC packet receiving performance based on page pool deteriorates occasionally. To analyze the causes of this problem, page allocation stats are collected. Here are the stats when NIC rx performance deteriorates: bandwidth(Gbits/s) 16.8 6.91 rx_pp_alloc_fast 13794308 21141869 rx_pp_alloc_slow 108625 166481 rx_pp_alloc_slow_h 0 0 rx_pp_alloc_empty 8192 8192 rx_pp_alloc_refill 0 0 rx_pp_alloc_waive 100433 158289 rx_pp_recycle_cached 0 0 rx_pp_recycle_cache_full 0 0 rx_pp_recycle_ring 362400 420281 rx_pp_recycle_ring_full 6064893 9709724 rx_pp_recycle_released_ref 0 0 The rx_pp_alloc_waive count indicates that the NUMA node of a large number of pages is inconsistent with the NIC device NUMA node. Therefore these pages can't be reused by the page pool. As a result, many new pages would be allocated by __page_pool_alloc_pages_slow which is time consuming. This causes the NIC rx performance fluctuations. The main reason for the large number of NUMA-mismatched pages in the page pool is that page pool uses alloc_pages_bulk_array to allocate original pages. This function is not suitable for page allocation in a NUMA scenario. So this patch uses alloc_pages_bulk_array_node, which has a NUMA id input parameter, to ensure NUMA consistency between the NIC device and allocated pages. Repeated NIC rx performance tests are performed 40 times. NIC rx bandwidth is higher and more stable compared to the data above. Here are three test stats; the rx_pp_alloc_waive count is zero and rx_pp_alloc_slow, which indicates pages allocated from the slow path, is relatively low. 
bandwidth(Gbits/s) 93 93.9 93.8 rx_pp_alloc_fast 60066264 61266386 60938254 rx_pp_alloc_slow 16512 16517 16539 rx_pp_alloc_slow_ho 0 0 0 rx_pp_alloc_empty 16512 16517 16539 rx_pp_alloc_refill 473841 481910 481585 rx_pp_alloc_waive 0 0 0 rx_pp_recycle_cached 0 0 0 rx_pp_recycle_cache_full 0 0 0 rx_pp_recycle_ring 29754145 30358243 30194023 rx_pp_recycle_ring_full 0 0 0 rx_pp_recycle_released_ref 0 0 0 Signed-off-by: Jie Wang <wangjie125@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/20220705113515.54342-1-huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-07 17:03:16 -07:00
Jakub Kicinski	83ec88d81a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-07 12:07:37 -07:00
Florian Westphal	0ed8f619b4	netfilter: conntrack: fix crash due to confirmed bit load reordering Kajetan Puchalski reports crash on ARM, with backtrace of: __nf_ct_delete_from_lists nf_ct_delete early_drop __nf_conntrack_alloc Unlike atomic_inc_not_zero, refcount_inc_not_zero is not a full barrier. conntrack uses SLAB_TYPESAFE_BY_RCU, i.e. it is possible that a 'newly' allocated object is still in use on another CPU: CPU1 CPU2 encounter 'ct' during hlist walk delete_from_lists refcount drops to 0 kmem_cache_free(ct); __nf_conntrack_alloc() // returns same object refcount_inc_not_zero(ct); /* might fail */ /* If set, ct is public/in the hash table */ test_bit(IPS_CONFIRMED_BIT, &ct->status); In case CPU1 already set refcount back to 1, refcount_inc_not_zero() will succeed. The expected possibilities for a CPU that obtained the object 'ct' (but no reference so far) are: 1. refcount_inc_not_zero() fails. CPU2 ignores the object and moves to the next entry in the list. This happens for objects that are about to be free'd, that have been free'd, or that have been reallocated by __nf_conntrack_alloc(), but where the refcount has not been increased back to 1 yet. 2. refcount_inc_not_zero() succeeds. CPU2 checks the CONFIRMED bit in ct->status. If set, the object is public/in the table. If not, the object must be skipped; CPU2 calls nf_ct_put() to un-do the refcount increment and moves to the next object. Parallel deletion from the hlists is prevented by a 'test_and_set_bit(IPS_DYING_BIT, &ct->status);' check, i.e. only one cpu will do the unlink, the other one will only drop its reference count. Because refcount_inc_not_zero is not a full barrier, CPU2 may try to delete an object that is not on any list: 1. refcount_inc_not_zero() successful (refcount inited to 1 on other CPU) 2. CONFIRMED test also successful (load was reordered or zeroing of ct->status not yet visible) 3. delete_from_lists unlinks entry not on the hlist, because IPS_DYING_BIT is 0 (already cleared). 
2) is already wrong: CPU2 will handle a partially initialised object that is supposed to be private to CPU1. Add needed barriers when refcount_inc_not_zero() is successful. It also inserts a smp_wmb() before the refcount is set to 1 during allocation. Because other CPU might still see the object, refcount_set(1) "resurrects" it, so we need to make sure that other CPUs will also observe the right content. In particular, the CONFIRMED bit test must only pass once the object is fully initialised and either in the hash or about to be inserted (with locks held to delay possible unlink from early_drop or gc worker). I did not change flow_offload_alloc(), as far as I can see it should call refcount_inc(), not refcount_inc_not_zero(): the ct object is attached to the skb so its refcount should be >= 1 in all cases. v2: prefer smp_acquire__after_ctrl_dep to smp_rmb (Will Deacon). v3: keep smp_acquire__after_ctrl_dep close to refcount_inc_not_zero call add comment in nf_conntrack_netlink, no control dependency there due to locks. Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/Yr7WTfd6AVTQkLjI@e126311.manchester.arm.com/ Reported-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Diagnosed-by: Will Deacon <will@kernel.org> Fixes: `7197743776` ("netfilter: conntrack: convert to refcount_t api") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Will Deacon <will@kernel.org>	2022-07-07 20:55:18 +02:00
Linus Torvalds	ef4ab3ba4e	Networking fixes for 5.19-rc6, including fixes from bpf, netfilter, can, bluetooth Current release - regressions: - bluetooth: fix deadlock on hci_power_on_sync. Previous releases - regressions: - sched: act_police: allow 'continue' action offload - eth: usbnet: fix memory leak in error case - eth: ibmvnic: properly dispose of all skbs during a failover. Previous releases - always broken: - bpf: - fix insufficient bounds propagation from adjust_scalar_min_max_vals - clear page contiguity bit when unmapping pool - netfilter: nft_set_pipapo: release elements in clone from abort path - mptcp: netlink: issue MP_PRIO signals from userspace PMs - can: - rcar_canfd: fix data transmission failed on R-Car V3U - gs_usb: gs_usb_open/close(): fix memory leak Misc: - add Wenjia as SMC maintainer Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmLGqsUSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkz8kQAINYcsrZ7sBKAVeGNq/PzPXpIuIvxLVL XP+9nqs+8JiBG0xPQNfV/AlRWilWckMzQf1F8SfuDwg5ahz0HSN9XJVf+v9p9uYs GthlBgLCH+Kp06831wVC/j8GBcQm2cneOaaZN4udLRORztbOGkn5xFhJOu3lezap IqvAIlyQFCi6uan+iGUXEwh/hEPgH2imOM+1ICao/fp9m7cGkBQKyqAY/ztxgby4 H1DdSsPSZ7e1wjAczdr0oGPzEE5OMxdJUk9yigSNnKwGavoGtizRefStWD+yEUBj XzeWwlAO/otJsklp9cesRYPKiiIx1bmVG14ZTSRpzobg3FEKjP0H4iBgtO67972W RJcolGUtxPd6lgrP5ZxzcStS2v44GeuKkvhKbMMsEEvEDg/we9vBZc6AX6Xs8yr3 fBBkSQnzCJF7CtHxSf7n/6RM4VfaHMbSBb2u23DVsf9N0rU2atNPRvwT2koe0SyO 8lSECzUdjRE2f48PIk0/+nl4zFmAjDBMI1W8+YeeBrjcYQmBtkmHn9eMjAWu5E1f 1pGqmtc3N/LqI4f6l9/oAE2IuiIvdTyo53/Zdqm5SLmIDttVzxAeHrEAaOCwoiWV QXxpvwG3nYd1mE0MfBQLcjD0tpw7ZK3oG/IqDTSiLwGaRXVPxqqQ6jdSriWFUzGm 3zl8fnai73hd =x7Dr -----END PGP SIGNATURE----- Merge tag 'net-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, netfilter, can, and bluetooth. 
Current release - regressions: - bluetooth: fix deadlock on hci_power_on_sync Previous releases - regressions: - sched: act_police: allow 'continue' action offload - eth: usbnet: fix memory leak in error case - eth: ibmvnic: properly dispose of all skbs during a failover Previous releases - always broken: - bpf: - fix insufficient bounds propagation from adjust_scalar_min_max_vals - clear page contiguity bit when unmapping pool - netfilter: nft_set_pipapo: release elements in clone from abort path - mptcp: netlink: issue MP_PRIO signals from userspace PMs - can: - rcar_canfd: fix data transmission failed on R-Car V3U - gs_usb: gs_usb_open/close(): fix memory leak Misc: - add Wenjia as SMC maintainer" * tag 'net-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) wireguard: Kconfig: select CRYPTO_CHACHA_S390 crypto: s390 - do not depend on CRYPTO_HW for SIMD implementations wireguard: selftests: use microvm on x86 wireguard: selftests: always call kernel makefile wireguard: selftests: use virt machine on m68k wireguard: selftests: set fake real time in init r8169: fix accessing unset transport header net: rose: fix UAF bug caused by rose_t0timer_expiry usbnet: fix memory leak in error case Revert "tls: rx: move counting TlsDecryptErrors for sync" mptcp: update MIB_RMSUBFLOW in cmd_sf_destroy mptcp: fix local endpoint accounting selftests: mptcp: userspace PM support for MP_PRIO signals mptcp: netlink: issue MP_PRIO signals from userspace PMs mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags mptcp: Avoid acquiring PM lock for subflow priority changes mptcp: fix locking in mptcp_nl_cmd_sf_destroy() net/mlx5e: Fix matchall police parameters validation net/sched: act_police: allow 'continue' action offload net: lan966x: hardcode the number of external ports ...	2022-07-07 10:08:20 -07:00
Kuniyuki Iwashima	cf21b355cc	af_unix: Optimise hash table layout. Commit `6dd4142fb5` ("Merge branch 'af_unix-per-netns-socket-hash'") and commit `51bae889fe` ("af_unix: Put pathname sockets in the global hash table.") changed a hash table layout. Before: unix_socket_table [0 - 255] : abstract & pathname sockets [256 - 511] : unnamed sockets After: per-netns table [0 - 255] : abstract & pathname sockets [256 - 511] : unnamed sockets bsd_socket_table [0 - 255] : pathname sockets (sk_bind_node) Now, while looking up sockets, we traverse the global table for the pathname sockets and the first half of each per-netns hash table for abstract sockets, where pathname sockets are also linked. Thus, the more pathname sockets we have, the longer we take to look up abstract sockets. This characteristic has been there before the layout change, but we can improve it now. This patch changes the per-netns hash table's layout so that sockets not requiring lookup reside in the first half and do not impact the lookup of abstract sockets. per-netns table [0 - 255] : pathname & unnamed sockets [256 - 511] : abstract sockets bsd_socket_table [0 - 255] : pathname sockets (sk_bind_node) We have run a test that bind()s 100,000 abstract/pathname sockets for each, bind()s an abstract socket 100,000 times and measures the time on __unix_find_socket_byname(). The result shows that the patch makes each lookup faster. 
Without this patch: $ sudo ./funclatency -p 2278 --microseconds __unix_find_socket_byname.isra.44 usec : count distribution 0 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 126 \| \| 16 -> 31 : 1438 \|* \| 32 -> 63 : 4150 \|* \| 64 -> 127 : 9049 \|*** \| 128 -> 255 : 37704 \|*************************** \| 256 -> 511 : 47533 \|*************************************\| With this patch: $ sudo ./funclatency -p 3648 --microseconds __unix_find_socket_byname.isra.46 usec : count distribution 0 -> 1 : 109 \| \| 2 -> 3 : 318 \| \| 4 -> 7 : 725 \| \| 8 -> 15 : 2501 \| \| 16 -> 31 : 3061 \| \| 32 -> 63 : 4028 \|* \| 64 -> 127 : 9312 \|***** \| 128 -> 255 : 51372 \|************************************\| 256 -> 511 : 28574 \|******************** \| Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220705233715.759-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-07 13:19:01 +02:00
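The new per-netns layout described in this commit can be sketched as two bucket-index helpers. This is a minimal model, assuming a 512-bucket table split into two 256-bucket halves as stated in the message; the macro and function names are illustrative and the input hash is taken as given.

```c
#include <assert.h>

#define UNIX_HASH_MOD  (256 - 1)   /* mask for one 256-bucket half */
#define UNIX_HASH_SIZE (256 * 2)   /* whole per-netns table */

/* Pathname and unnamed sockets (no lookup through this table):
 * first half, buckets [0, 255]. */
static unsigned int bucket_no_lookup(unsigned int hash)
{
    return hash & UNIX_HASH_MOD;
}

/* Abstract sockets: second half, buckets [256, 511]. */
static unsigned int bucket_abstract(unsigned int hash)
{
    unsigned int b = UNIX_HASH_MOD + 1 + (hash & UNIX_HASH_MOD);

    assert(b < UNIX_HASH_SIZE);
    return b;
}
```

With this split, an abstract-socket lookup only walks chains in the second half, so the number of pathname sockets no longer affects its chain length.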
Duoming Zhou	148ca04518	net: rose: fix UAF bug caused by rose_t0timer_expiry There are UAF bugs caused by rose_t0timer_expiry(). The root cause is that del_timer() could not stop the timer handler that is running and there is no synchronization. One of the race conditions is shown below: (thread 1) \| (thread 2) \| rose_device_event \| rose_rt_device_down \| rose_remove_neigh rose_t0timer_expiry \| rose_stop_t0timer(rose_neigh) ... \| del_timer(&neigh->t0timer) \| kfree(rose_neigh) //[1]FREE neigh->dce_mode //[2]USE \| The rose_neigh is deallocated in position [1] and used in position [2]. The crash trace triggered by POC is like below: BUG: KASAN: use-after-free in expire_timers+0x144/0x320 Write of size 8 at addr ffff888009b19658 by task swapper/0/0 ... Call Trace: <IRQ> dump_stack_lvl+0xbf/0xee print_address_description+0x7b/0x440 print_report+0x101/0x230 ? expire_timers+0x144/0x320 kasan_report+0xed/0x120 ? expire_timers+0x144/0x320 expire_timers+0x144/0x320 __run_timers+0x3ff/0x4d0 run_timer_softirq+0x41/0x80 __do_softirq+0x233/0x544 ... This patch changes rose_stop_ftimer() and rose_stop_t0timer() in rose_remove_neigh() to del_timer_sync() so that the timer handler has finished before resources such as rose_neigh are deallocated. As a result, the UAF bugs could be mitigated. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220705125610.77971-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-06 19:49:11 -07:00
Johan Almbladh	1fd6e56753	xdp: Fix spurious packet loss in generic XDP TX path The byte queue limits (BQL) mechanism is intended to move queuing from the driver to the network stack in order to reduce latency caused by excessive queuing in hardware. However, when transmitting or redirecting a packet using generic XDP, the qdisc layer is bypassed and there are no additional queues. Since netif_xmit_stopped() also takes BQL limits into account, but without having any alternative queuing, packets are silently dropped. This patch modifies the drop condition to only consider cases when the driver itself cannot accept any more packets. This is analogous to the condition in __dev_direct_xmit(). Dropped packets are also counted on the device. Bypassing the qdisc layer in the generic XDP TX path means that XDP packets are able to starve other packets going through a qdisc, and DDOS attacks will be more effective. In-driver-XDP use dedicated TX queues, so they do not have this starvation issue. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220705082345.2494312-1-johan.almbladh@anyfinetworks.com	2022-07-06 16:43:53 +02:00
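The change in the drop condition can be modelled in userspace as a difference in which queue-state bits block transmission. This is a sketch under stated assumptions: the bit names, `struct tx_stats` and `generic_xdp_tx_model()` are hypothetical and only mirror the behaviour the commit message describes (a BQL-style stack stop no longer causes a drop, and drops are counted).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of TX queue state bits. */
enum {
    Q_DRV_XOFF   = 1u << 0,  /* driver cannot accept more packets */
    Q_STACK_XOFF = 1u << 1,  /* stack-side stop, e.g. BQL limit reached */
    Q_FROZEN     = 1u << 2,  /* queue temporarily frozen */
};

/* Old condition: any stop bit, including the BQL one, blocks transmit. */
static bool xmit_stopped(unsigned int state)
{
    return (state & (Q_DRV_XOFF | Q_STACK_XOFF)) != 0;
}

/* New condition: only a driver stop or a frozen queue blocks transmit. */
static bool xmit_frozen_or_drv_stopped(unsigned int state)
{
    return (state & (Q_DRV_XOFF | Q_FROZEN)) != 0;
}

struct tx_stats {
    unsigned long sent;
    unsigned long dropped;
};

/* Model of the generic XDP TX decision after the patch. */
static void generic_xdp_tx_model(unsigned int state, struct tx_stats *st)
{
    assert(st);
    if (!xmit_frozen_or_drv_stopped(state))
        st->sent++;
    else
        st->dropped++;  /* drops are now counted on the device */
}
```

In this model a queue with only `Q_STACK_XOFF` set would have been treated as stopped by the old check, while the new check lets the packet through.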
Gal Pressman	a069a90554	Revert "tls: rx: move counting TlsDecryptErrors for sync" This reverts commit `284b4d93da`. When using TLS device offload and coming from tls_device_reencrypt() flow, -EBADMSG error in tls_do_decryption() should not be counted towards the TLSTlsDecryptError counter. Move the counter increase back to the decrypt_internal() call site in decrypt_skb_update(). This also fixes an issue where: if (n_sgin < 1) return -EBADMSG; Errors in decrypt_internal() were not counted after the cited patch. Fixes: `284b4d93da` ("tls: rx: move counting TlsDecryptErrors for sync") Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 13:10:59 +01:00
Jakub Kicinski	c46b01839f	tls: rx: periodically flush socket backlog We continuously hold the socket lock during large reads and writes. This may inflate RTT and negatively impact TCP performance. Flush the backlog periodically. I tried to pick a flush period (128kB) which gives significant benefit but the max Bps rate is not yet visibly impacted. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:56:35 +01:00
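The periodic flush can be sketched as a byte counter compared against the 128 kB period mentioned above. This is a simplified model: `should_flush_backlog()` and `FLUSH_PERIOD` are hypothetical names, and any further conditions the real receive path may apply are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLUSH_PERIOD (128u * 1024u)  /* 128 kB, the period named in the commit message */

/* Returns true when at least FLUSH_PERIOD bytes were processed since the
 * last flush, and records the new flush point in *flushed_at. */
static bool should_flush_backlog(size_t done, size_t *flushed_at)
{
    assert(flushed_at && done >= *flushed_at);
    if (done - *flushed_at < FLUSH_PERIOD)
        return false;
    *flushed_at = done;
    return true;
}
```

A read loop would call this after each processed record, passing the running total of bytes, and flush the socket backlog whenever it returns true.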
Jakub Kicinski	88527790c0	tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3 Since optimistic decrypt may add extra load in case of retries, require the socket owner to explicitly opt in. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:56:35 +01:00
Jakub Kicinski	ce61327ce9	tls: rx: support optimistic decrypt to user buffer with TLS 1.3 We currently don't support decrypt to user buffer with TLS 1.3 because we don't know the record type and how much padding record contains before decryption. In practice data records are by far most common and padding gets used rarely so we can assume data record, no padding, and if we find out that wasn't the case - retry the crypto in place (decrypt to skb). To safeguard from user overwriting content type and padding before we can check it attach a 1B sg entry where last byte of the record will land. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:56:35 +01:00
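The check that decides whether the optimistic guess was right can be sketched from the TLS 1.3 record layout, in which the decrypted inner plaintext is the content, followed by a one-byte content type, followed by optional zero padding. The helper names below are hypothetical and the sketch only covers the post-decryption check, not the crypto or the retry itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_CT_APPLICATION_DATA 23  /* application_data content type */

/* Scan a decrypted TLS 1.3 inner plaintext (content || type || zero padding).
 * On success stores the content type and content length; returns false if
 * the buffer is empty or all zeros. */
static bool tls13_parse_inner(const uint8_t *buf, size_t len,
                              uint8_t *type, size_t *content_len)
{
    assert(buf && type && content_len);
    while (len > 0 && buf[len - 1] == 0)
        len--;                      /* skip trailing zero padding */
    if (len == 0)
        return false;
    *type = buf[len - 1];
    *content_len = len - 1;
    return true;
}

/* The optimistic guess holds only for a data record with no padding;
 * otherwise the caller would have to retry the decryption in place. */
static bool optimistic_guess_ok(const uint8_t *buf, size_t len)
{
    uint8_t type;
    size_t content_len;

    if (!tls13_parse_inner(buf, len, &type, &content_len))
        return false;
    return type == TLS_CT_APPLICATION_DATA && content_len == len - 1;
}
```

Because the type byte is the last byte of an unpadded record, protecting that single byte from being written to the user buffer is enough to verify the guess afterwards, which matches the 1-byte sg entry described in the message.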
Jakub Kicinski	603380f54f	tls: rx: don't include tail size in data_len To make future patches easier to review make data_len contain the length of the data, without the tail. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:56:35 +01:00
Geliang Tang	d2d21f175f	mptcp: update MIB_RMSUBFLOW in cmd_sf_destroy This patch increases MPTCP_MIB_RMSUBFLOW mib counter in userspace pm destroy subflow function mptcp_nl_cmd_sf_destroy() when removing subflow. Fixes: `702c2f646d` ("mptcp: netlink: allow userspace-driven subflow establishment") Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:50:26 +01:00
Paolo Abeni	843b5e75ef	mptcp: fix local endpoint accounting In mptcp_pm_nl_rm_addr_or_subflow() we always mark as available the id corresponding to the just removed address. The used bitmap actually tracks only the local IDs: we must restrict the operation when a (local) subflow is removed. Fixes: `a88c9e4969` ("mptcp: do not block subflows creation on errors") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:50:26 +01:00
Kishen Maloor	892f396c8e	mptcp: netlink: issue MP_PRIO signals from userspace PMs This change updates MPTCP_PM_CMD_SET_FLAGS to allow userspace PMs to issue MP_PRIO signals over a specific subflow selected by the connection token, local and remote address+port. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/286 Fixes: `702c2f646d` ("mptcp: netlink: allow userspace-driven subflow establishment") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:50:26 +01:00
Mat Martineau	a657430260	mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags When setting up a subflow's flags for sending MP_PRIO MPTCP options, the subflow socket lock was not held while reading and modifying several struct members that are also read and modified in mptcp_write_options(). Acquire the subflow socket lock earlier and send the MP_PRIO ACK with that lock already acquired. Add a new variant of the mptcp_subflow_send_ack() helper to use with the subflow lock held. Fixes: `067065422f` ("mptcp: add the outgoing MP_PRIO support") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:50:26 +01:00
Mat Martineau	c21b50d591	mptcp: Avoid acquiring PM lock for subflow priority changes The in-kernel path manager code for changing subflow flags acquired both the msk socket lock and the PM lock when possibly changing the "backup" and "fullmesh" flags. mptcp_pm_nl_mp_prio_send_ack() does not access anything protected by the PM lock, and it must release and reacquire the PM lock. By pushing the PM lock to where it is needed in mptcp_pm_nl_fullmesh(), the lock is only acquired when the fullmesh flag is changed and the backup flag code no longer has to release and reacquire the PM lock. The change in locking context requires the MIB update to be modified - move that to a better location instead. This change also makes it possible to call mptcp_pm_nl_mp_prio_send_ack() for the userspace PM commands without manipulating the in-kernel PM lock. Fixes: `0f9f696a50` ("mptcp: add set_flags command in PM netlink") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:50:26 +01:00
Paolo Abeni	5ccecaec5c	mptcp: fix locking in mptcp_nl_cmd_sf_destroy() The user-space PM subflow removal path uses a couple of helpers that must be called under the msk socket lock and the current code lacks such requirement. Change the existing lock scope so that the relevant code is under its protection. Fixes: `702c2f646d` ("mptcp: netlink: allow userspace-driven subflow establishment") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/287 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:50:26 +01:00
Vlad Buslov	052f744f44	net/sched: act_police: allow 'continue' action offload Offloading police with action TC_ACT_UNSPEC was erroneously disabled even though it was supported by mlx5 matchall offload implementation, which didn't verify the action type but instead assumed that any single police action attached to matchall classifier is a 'continue' action. Lack of action type check made it non-obvious what mlx5 matchall implementation actually supports and caused implementers and reviewers of referenced commits to disallow it as a part of improved validation code. Fixes: `b8cd5831c6` ("net: flow_offload: add tc police action parameters") Fixes: `b50e462bc2` ("net/sched: act_police: Add extack messages for offload failure") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:44:39 +01:00
Vasyl Vavrychuk	e36bea6e78	Bluetooth: core: Fix deadlock on hci_power_on_sync. `cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in commit [1] to ensure that power_on work is canceled after HCI interface down. But, in certain cases power_on work function may call hci_dev_close_sync itself: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync -> cancel_work_sync(&hdev->power_on), causing deadlock. In particular, this happens when device is rfkilled on boot. To avoid deadlock, move power_on work canceling out of hci_dev_do_close/hci_dev_close_sync. Deadlock introduced by commit [1] was reported in [2,3] as broken suspend. Suspend did not work because `hdev->req_lock` held as result of `power_on` work deadlock. In fact, other BT features were not working. It was not observed when testing [1] since it was verified without rfkill in place. NOTE: It is not needed to cancel power_on work from other places where hci_dev_do_close/hci_dev_close_sync is called in case: * Requests were serialized due to `hdev->req_workqueue`. The power_on work is first in that workqueue. * hci_rfkill_set_block which won't close device anyway until HCI_SETUP is on. * hci_sock_release which runs after hci_sock_bind which ensures HCI_SETUP was cleared. As result, behaviour is the same as in pre-dd06ed7 commit, except power_on work cancel added to hci_dev_close. 
[1]: commit `ff7f292611` ("Bluetooth: core: Fix missing power_on work cancel on HCI close") [2]: https://lore.kernel.org/lkml/20220614181706.26513-1-max.oss.09@gmail.com/ [3]: https://lore.kernel.org/lkml/1236061d-95dd-c3ad-a38f-2dae7aae51ef@o2.pl/ Fixes: `ff7f292611` ("Bluetooth: core: Fix missing power_on work cancel on HCI close") Signed-off-by: Vasyl Vavrychuk <vasyl.vavrychuk@opensynergy.com> Reported-by: Max Krummenacher <max.krummenacher@toradex.com> Reported-by: Mateusz Jonczyk <mat.jonczyk@o2.pl> Tested-by: Max Krummenacher <max.krummenacher@toradex.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-05 13:20:03 -07:00
Tobias Klauser	2064a132c0	bpf: Omit superfluous address family check in __bpf_skc_lookup family is only set to either AF_INET or AF_INET6 based on len. In all other cases we return early. Thus the check against AF_UNSPEC can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220630082618.15649-1-tklauser@distanz.ch	2022-07-05 11:51:30 +02:00
Kuniyuki Iwashima	51bae889fe	af_unix: Put pathname sockets in the global hash table. Commit `cf2f225e26` ("af_unix: Put a socket into a per-netns hash table.") accidentally broke user API for pathname sockets. A socket was able to connect() to a pathname socket whose file was visible even if they were in different network namespaces. The commit puts all sockets into a per-netns hash table. As a result, connect() to a pathname socket in a different netns fails to find it in the caller's per-netns hash table and returns -ECONNREFUSED even when the task can view the peer socket file. We can reproduce this issue by: Console A: # python3 >>> from socket import * >>> s = socket(AF_UNIX, SOCK_STREAM, 0) >>> s.bind('test') >>> s.listen(32) Console B: # ip netns add test # ip netns exec test sh # python3 >>> from socket import * >>> s = socket(AF_UNIX, SOCK_STREAM, 0) >>> s.connect('test') Note when dumping sockets by sock_diag, procfs, and bpf_iter, they are filtered only by netns. In other words, even if they are visible and connect()able, all sockets in different netns are skipped while iterating sockets. Thus, we need a fix only for finding a peer pathname socket. This patch adds a global hash table for pathname sockets, links them with sk_bind_node, and uses it in unix_find_socket_byinode(). By doing so, we can keep sockets in per-netns hash tables and dump them easily. Thanks to Sachin Sant and Leonard Crestez for reports, logs and a reproducer. Fixes: `cf2f225e26` ("af_unix: Put a socket into a per-netns hash table.") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Reported-by: Leonard Crestez <cdleonard@gmail.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Leonard Crestez <cdleonard@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-05 11:34:58 +02:00
XueBing Chen	634b215b73	net: ipconfig: use strscpy to replace strlcpy The strlcpy should not be used because it doesn't limit the source length. Preferred is strscpy. Signed-off-by: XueBing Chen <chenxuebing@jari.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-04 10:28:00 +01:00
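The difference the commit message refers to is that strlcpy() returns the length of the source string, so it has to walk the whole source even when the destination is small, whereas strscpy() stops at the destination size and reports truncation with -E2BIG. A minimal userspace sketch of those semantics follows; `strscpy_sketch()` is a hypothetical stand-in, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Copy at most size - 1 bytes, always NUL-terminate when size > 0, and
 * return the copied length or -E2BIG on truncation. Reads at most "size"
 * bytes of src. */
static ssize_t strscpy_sketch(char *dst, const char *src, size_t size)
{
    size_t i;

    assert(dst && src);
    if (size == 0)
        return -E2BIG;
    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    /* If the source did not end here, the copy was truncated. */
    return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

Callers that previously compared the strlcpy() return value against the buffer size would instead check for a negative return.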
Oliver Hartkopp	f1b4e32aca	can: bcm: use call_rcu() instead of costly synchronize_rcu() In commit `d5f9023fa6` ("can: bcm: delay release of struct bcm_op after synchronize_rcu()") Thadeu Lima de Souza Cascardo introduced two synchronize_rcu() calls in bcm_release() (only once at socket close) and in bcm_delete_rx_op() (called on removal of each single bcm_op). Unfortunately this slow removal of the bcm_op's affects user space applications like cansniffer where the modification of a filter removes 2048 bcm_op's which blocks the cansniffer application for 40(!) seconds. In commit `181d444790` ("can: gw: use call_rcu() instead of costly synchronize_rcu()") Eric Dumazet replaced the synchronize_rcu() calls with several call_rcu()'s to safely remove the data structures after the removal of CAN ID subscriptions with can_rx_unregister() calls. This patch adopts Eric's approach for can-bcm, which should be applicable since the removal of tasklet_kill() in bcm_remove_op() and the introduction of the HRTIMER_MODE_SOFT timer handling in Linux 5.4. Fixes: `d5f9023fa6` ("can: bcm: delay release of struct bcm_op after synchronize_rcu()") # >= 5.4 Link: https://lore.kernel.org/all/20220520183239.19111-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Cc: Eric Dumazet <edumazet@google.com> Cc: Norbert Slusarek <nslusarek@gmx.net> Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-04 10:33:39 +02:00
Zhang Jiaming	cf746bac6c	esp6: Fix spelling mistake Change 'accomodate' to 'accommodate'. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-07-04 10:20:11 +02:00
David S. Miller	280e3a857d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Insufficient validation of element datatype and length in nft_setelem_parse_data(). At least commit `7d7402642e` updates maximum element data area up to 64 bytes when only 16 bytes were supported at the time. Support for larger element size came later in `fdb9c405e3` though. Picking this older commit as Fixes: tag to be safe rather than sorry. 2) Memleak in pipapo destroy path, reproducible when a transaction is aborted. This is already triggering in the existing netfilter test infrastructure since more recent new tests are covering this path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-03 12:29:18 +01:00
Pablo Neira Ayuso	9827a0e6e2	netfilter: nft_set_pipapo: release elements in clone from abort path New elements that reside in the clone are not released in case that the transaction is aborted. [16302.231754] ------------[ cut here ]------------ [16302.231756] WARNING: CPU: 0 PID: 100509 at net/netfilter/nf_tables_api.c:1864 nf_tables_chain_destroy+0x26/0x127 [nf_tables] [...] [16302.231882] CPU: 0 PID: 100509 Comm: nft Tainted: G W 5.19.0-rc3+ #155 [...] [16302.231887] RIP: 0010:nf_tables_chain_destroy+0x26/0x127 [nf_tables] [16302.231899] Code: f3 fe ff ff 41 55 41 54 55 53 48 8b 6f 10 48 89 fb 48 c7 c7 82 96 d9 a0 8b 55 50 48 8b 75 58 e8 de f5 92 e0 83 7d 50 00 74 09 <0f> 0b 5b 5d 41 5c 41 5d c3 4c 8b 65 00 48 8b 7d 08 49 39 fc 74 05 [...] [16302.231917] Call Trace: [16302.231919] <TASK> [16302.231921] __nf_tables_abort.cold+0x23/0x28 [nf_tables] [16302.231934] nf_tables_abort+0x30/0x50 [nf_tables] [16302.231946] nfnetlink_rcv_batch+0x41a/0x840 [nfnetlink] [16302.231952] ? __nla_validate_parse+0x48/0x190 [16302.231959] nfnetlink_rcv+0x110/0x129 [nfnetlink] [16302.231963] netlink_unicast+0x211/0x340 [16302.231969] netlink_sendmsg+0x21e/0x460 Add nft_set_pipapo_match_destroy() helper function to release the elements in the lookup tables. Stefano Brivio says: "We additionally look for elements pointers in the cloned matching data if priv->dirty is set, because that means that cloned data might point to additional elements we did not commit to the working copy yet (such as the abort path case, but perhaps not limited to it)." Fixes: `3c4287f620` ("nf_tables: Add set type for arbitrary concatenation of ranges") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-02 21:04:19 +02:00
Pablo Neira Ayuso	7e6bc1f6ca	netfilter: nf_tables: stricter validation of element data Make sure element data type and length do not mismatch the one specified by the set declaration. Fixes: `7d7402642e` ("netfilter: nf_tables: variable sized set element keys / data") Reported-by: Hugues ANGUELKOV <hanguelkov@randorisec.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-02 21:04:10 +02:00
Linus Torvalds	69cb6c6556	Notable regression fixes: - Fix NFSD crash during NFSv4.2 READ_PLUS operation - Fix incorrect status code returned by COMMIT operation Merge tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Notable regression fixes: - Fix NFSD crash during NFSv4.2 READ_PLUS operation - Fix incorrect status code returned by COMMIT operation" * tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix READ_PLUS crasher NFSD: restore EINVAL error translation in nfsd_commit()	2022-07-02 11:20:56 -07:00
Prasanna Vengateshan	092f875131	net: dsa: tag_ksz: add tag handling for Microchip LAN937x The Microchip LAN937X switches have a tagging protocol which is very similar to KSZ tagging, so the implementation is added to tag_ksz.c and reuses the common APIs. Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-02 16:34:05 +01:00
Eric Dumazet	504148fedb	net: add skb_[inner_]tcp_all_headers helpers Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)" to compute headers length for a TCP packet, but others use more convoluted (but equivalent) ways. Add skb_tcp_all_headers() and skb_inner_tcp_all_headers() helpers to harmonize this a bit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-02 16:22:25 +01:00
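The computation that the new helpers wrap, transport offset plus TCP header length, can be shown on a reduced model. `struct pkt_model` and both function names are hypothetical stand-ins for the skb accessors named in the message; the only protocol fact used is that the TCP data offset field counts 32-bit words.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced packet descriptor. */
struct pkt_model {
    size_t transport_offset;  /* bytes from packet start to the TCP header */
    unsigned int tcp_doff;    /* TCP data offset field, in 32-bit words */
};

/* TCP header length in bytes: the data offset counts 32-bit words. */
static size_t tcp_hdrlen_model(const struct pkt_model *p)
{
    assert(p && p->tcp_doff >= 5 && p->tcp_doff <= 15);
    return (size_t)p->tcp_doff * 4;
}

/* Length of all headers up to and including TCP, as a single helper. */
static size_t tcp_all_headers_model(const struct pkt_model *p)
{
    return p->transport_offset + tcp_hdrlen_model(p);
}
```

For an Ethernet plus IPv4 packet without options, the transport offset is 34 bytes, so a 20-byte TCP header gives 54 bytes of headers in this model.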
Jakub Kicinski	bc38fae3a6	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2022-07-02 We've added 7 non-merge commits during the last 14 day(s) which contain a total of 6 files changed, 193 insertions(+), 86 deletions(-). The main changes are: 1) Fix clearing of page contiguity when unmapping XSK pool, from Ivan Malov. 2) Two verifier fixes around bounds data propagation, from Daniel Borkmann. 3) Fix fprobe sample module's parameter descriptions, from Masami Hiramatsu. 4) General BPF maintainer entry revamp to better scale patch reviews. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Add verifier test case for jmp32's jeq/jne bpf, selftests: Add verifier test case for imm=0,umin=0,umax=1 scalar bpf: Fix insufficient bounds propagation from adjust_scalar_min_max_vals bpf: Fix incorrect verifier simulation around jmp32's jeq/jne xsk: Clear page contiguity bit when unmapping pool bpf, docs: Better scale maintenance of BPF subsystem fprobe, samples: Add module parameter descriptions ==================== Link: https://lore.kernel.org/r/20220701230121.10354-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-01 19:56:28 -07:00
Paolo Abeni	69d93daec0	mptcp: refine memory scheduling Similar to commit `7c80b038d2` ("net: fix sk_wmem_schedule() and sk_rmem_schedule() errors"), let the MPTCP receive path schedule exactly the required amount of memory. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-01 13:24:59 +01:00
Paolo Abeni	d24141fe7b	mptcp: drop SK_RECLAIM_* macros After commit `4890b686f4` ("net: keep sk->sk_forward_alloc as small as possible"), the MPTCP protocol is the last user of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD. Update the MPTCP reclaim schema to match the core/TCP one and drop the mentioned macros. This additionally cleans up the MPTCP code a bit. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-01 13:24:59 +01:00
Paolo Abeni	4aaa1685f7	mptcp: never fetch fwd memory from the subflow The memory accounting is broken in such exceptional code path, and after commit `4890b686f4` ("net: keep sk->sk_forward_alloc as small as possible") we can't find much help there. Drop the broken code. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-01 13:24:59 +01:00
Aloka Dixit	8bc65d38ee	wifi: nl80211: retrieve EHT related elements in AP mode Add support to retrieve EHT capabilities and EHT operation elements passed by the userspace in the beacon template and store the pointers in struct cfg80211_ap_settings to be used by the drivers. Co-developed-by: Vikram Kandukuri <quic_vikram@quicinc.com> Signed-off-by: Vikram Kandukuri <quic_vikram@quicinc.com> Co-developed-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Link: https://lore.kernel.org/r/20220523064904.28523-1-quic_alokad@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 12:37:54 +02:00
Veerendranath Jakkam	ecad3b0b99	wifi: cfg80211: Increase akm_suites array size in cfg80211_crypto_settings Increase akm_suites array size in struct cfg80211_crypto_settings to 10 and advertise the capability to userspace. This allows userspace to send more than two AKMs to driver in netlink commands such as NL80211_CMD_CONNECT. This capability is needed for implementing WPA3-Personal transition mode correctly with any driver that handles roaming internally. Currently, the possible AKMs for multi-AKM connect can include PSK, PSK-SHA-256, SAE, FT-PSK and FT-SAE. Since the count is already 5, increasing the akm_suites array size to 10 should be reasonable for future usecases. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/1653312358-12321-1-git-send-email-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 12:07:08 +02:00
Johannes Berg	d6f671c8a3	wifi: cfg80211: remove chandef check in cfg80211_cac_event() The current check only worked for AP mode, but we can do radar detection in mesh as well (for example). We could try to check this using wdev_chandef(), but we also don't really care since the chandef is passed in and we have no need to use it anymore (since we added the argument in commit `d2859df5e7` ("cfg80211/mac80211: DFS setup chandef for cac event")). Change-Id: I856e4344d5e64ff4d2eead0b4c53b11f264be9b8 Fixes: `7b0a0e3c3a` ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 12:05:48 +02:00
Johannes Berg	31177127e0	wifi: nl80211: relax wdev mutex check in wdev_chandef() In many cases we might get here from driver code that's not really set up to care about the locking, and for the non-MLO cases we really don't care so much about it. So relax the checking here for now, perhaps we should even remove it completely since we might not really care if we point to an invalid link's chandef and can require the caller to check the link validity first. Fixes: `7b0a0e3c3a` ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 11:42:58 +02:00
Johannes Berg	c2653990d5	wifi: nl80211: acquire wdev mutex earlier in start_ap We need to hold the wdev mutex already in order to call nl80211_parse_tx_bitrate_mask(), so acquire it earlier. Fixes: `7b0a0e3c3a` ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 11:18:38 +02:00
Johannes Berg	206bbcf761	wifi: nl80211: hold wdev mutex for tid config We need wdev_chandef() in this code, which now requires the wdev mutex due to the per-link nature. Hold it here to make sure we can access the link. Reported-by: syzbot+b4e9aa0f32ffd9902442@syzkaller.appspotmail.com Fixes: `7b0a0e3c3a` ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 11:14:04 +02:00
Johannes Berg	77e7b6ba78	wifi: cfg80211: handle IBSS in channel switch Prior to commit `7b0a0e3c3a` ("wifi: cfg80211: do some rework towards MLO link APIs") the interface type didn't really matter here, but now we need to handle all of the possible cases. Add IBSS ("ADHOC") and handle it. Fixes: `7b0a0e3c3a` ("wifi: cfg80211: do some rework towards MLO link APIs") Reported-by: syzbot+90d912872157e63589e4@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 11:13:50 +02:00
Johannes Berg	591e73ee3f	wifi: mac80211: properly skip link info driver update If the interface isn't (yet) added to the driver, skip the link info update. This was previously done for the BSS info changes, but I forgot to copy the same check here. Fixes: `7b7090b4c6` ("wifi: mac80211: split bss_info_changed method") Reported-by: syzbot+bce2ca140cc00578ed07@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 11:13:35 +02:00
Felix Fietkau	c77bfab923	wifi: mac80211: only accumulate airtime deficit for active clients When a client does not generate any local tx activity, accumulating airtime deficit for the round-robin scheduler can be harmful. If this goes on for too long, the deficit could grow quite large, which might cause unreasonable initial latency once the client becomes active Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220625212411.36675-7-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:51:48 +02:00
Felix Fietkau	3db2c5604f	wifi: mac80211: add debugfs file to display per-phy AQL pending airtime Now that the global pending airtime is more relevant for airtime fairness, it makes sense to make it accessible via debugfs for debugging Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220625212411.36675-6-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:51:48 +02:00
Felix Fietkau	8e4bac0671	wifi: mac80211: add a per-PHY AQL limit to improve fairness In order to maintain fairness, the amount of queueing needs to be limited beyond the simple per-station AQL budget, otherwise the driver can simply repeatedly do scheduling rounds until all queues that have not used their AQL budget become eligble. To be conservative, use the high AQL limit for the first txq and add half of the low AQL for each subsequent queue. Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220625212411.36675-5-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:51:48 +02:00
Felix Fietkau	8ccc07028c	wifi: mac80211: keep recently active tx queues in scheduling list This allows proper deficit accounting to ensure that they don't carry their deficit until the next time they become active Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220625212411.36675-4-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:51:48 +02:00
Felix Fietkau	9c1be3cde0	wifi: mac80211: consider aql_tx_pending when checking airtime deficit When queueing packets for a station, deficit only gets added once the packets have been transmitted, which could be much later. During that time, a lot of temporary unfairness could happen, which could lead to bursty behavior. Fix this by subtracting the aql_tx_pending when checking the deficit in tx scheduling. Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220625212411.36675-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:51:48 +02:00
Felix Fietkau	445452d438	wifi: mac80211: make sta airtime deficit field s32 instead of s64 32 bit is more than enough range for the airtime deficit Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220625212411.36675-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:51:48 +02:00
Felix Fietkau	942741dabc	wifi: mac80211: switch airtime fairness back to deficit round-robin scheduling This reverts commits `6a789ba679` and `2433647bc8`. The virtual time scheduler code has a number of issues: - queues slowed down by hardware/firmware powersave handling were not properly handled. - on ath10k in push-pull mode, tx queues that the driver tries to pull from were starved, causing excessive latency - delay between tx enqueue and reported airtime use were causing excessively bursty tx behavior The bursty behavior may also be present on the round-robin scheduler, but there it is much easier to fix without introducing additional regressions Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220625212411.36675-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:51:41 +02:00
Mauro Carvalho Chehab	fe37f73d11	wifi: mac80211: sta_info: fix a missing kernel-doc struct element struct link_sta_info has now a cur_max_bandwidth data: net/mac80211/sta_info.h:569: warning: Function parameter or member 'cur_max_bandwidth' not described in 'link_sta_info' Copy the meaning from struct sta_info, documenting it. Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/37d898634bb30776442a33833c48cbb21c90ecc6.1656409369.git.mchehab@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:30:25 +02:00
Vladimir Oltean	837ced3a1a	time64.h: consolidate uses of PSEC_PER_NSEC Time-sensitive networking code needs to work with PTP times expressed in nanoseconds, and with packet transmission times expressed in picoseconds, since those would be fractional at higher than gigabit speed when expressed in nanoseconds. Convert the existing uses in tc-taprio and the ocelot/felix DSA driver to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h because the vDSO library does not (yet) need/use it. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-30 21:18:16 -07:00
Jakub Kicinski	0d8730f07c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c `9c5de246c1` ("net: sparx5: mdb add/del handle non-sparx5 devices") `fbb89d02e3` ("net: sparx5: Allow mdb entries to both CPU and ports") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-30 16:31:00 -07:00
Chuck Lever	a23dd544de	SUNRPC: Fix READ_PLUS crasher Looks like there are still cases when "space_left - frag1bytes" can legitimately exceed PAGE_SIZE. Ensure that xdr->end always remains within the current encode buffer. Reported-by: Bruce Fields <bfields@fieldses.org> Reported-by: Zorro Lang <zlang@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216151 Fixes: `6c254bf3b6` ("SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-06-30 17:41:08 -04:00
Yuwei Wang	211da42eaa	net, neigh: introduce interval_probe_time_ms for periodic probe commit `ed6cd6a178` ("net, neigh: Set lower cap for neigh_managed_work rearming") fixed a case when DELAY_PROBE_TIME is configured to 0, the processing of the system work queue hog CPU to 100%, and further more we should introduce a new option used by periodic probe Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-30 13:14:35 +02:00
Duoming Zhou	9cc02ede69	net: rose: fix UAF bugs caused by timer handler There are UAF bugs in rose_heartbeat_expiry(), rose_timer_expiry() and rose_idletimer_expiry(). The root cause is that del_timer() could not stop the timer handler that is running and the refcount of sock is not managed properly. One of the UAF bugs is shown below: (thread 1) \| (thread 2) \| rose_bind \| rose_connect \| rose_start_heartbeat rose_release \| (wait a time) case ROSE_STATE_0 \| rose_destroy_socket \| rose_heartbeat_expiry rose_stop_heartbeat \| sock_put(sk) \| ... sock_put(sk) // FREE \| \| bh_lock_sock(sk) // USE The sock is deallocated by sock_put() in rose_release() and then used by bh_lock_sock() in rose_heartbeat_expiry(). Although rose_destroy_socket() calls rose_stop_heartbeat(), it could not stop the timer that is running. The KASAN report triggered by POC is shown below: BUG: KASAN: use-after-free in _raw_spin_lock+0x5a/0x110 Write of size 4 at addr ffff88800ae59098 by task swapper/3/0 ... Call Trace: <IRQ> dump_stack_lvl+0xbf/0xee print_address_description+0x7b/0x440 print_report+0x101/0x230 ? irq_work_single+0xbb/0x140 ? _raw_spin_lock+0x5a/0x110 kasan_report+0xed/0x120 ? _raw_spin_lock+0x5a/0x110 kasan_check_range+0x2bd/0x2e0 _raw_spin_lock+0x5a/0x110 rose_heartbeat_expiry+0x39/0x370 ? rose_start_heartbeat+0xb0/0xb0 call_timer_fn+0x2d/0x1c0 ? rose_start_heartbeat+0xb0/0xb0 expire_timers+0x1f3/0x320 __run_timers+0x3ff/0x4d0 run_timer_softirq+0x41/0x80 __do_softirq+0x233/0x544 irq_exit_rcu+0x41/0xa0 sysvec_apic_timer_interrupt+0x8c/0xb0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1b/0x20 RIP: 0010:default_idle+0xb/0x10 RSP: 0018:ffffc9000012fea0 EFLAGS: 00000202 RAX: 000000000000bcae RBX: ffff888006660f00 RCX: 000000000000bcae RDX: 0000000000000001 RSI: ffffffff843a11c0 RDI: ffffffff843a1180 RBP: dffffc0000000000 R08: dffffc0000000000 R09: ffffed100da36d46 R10: dfffe9100da36d47 R11: ffffffff83cf0950 R12: 0000000000000000 R13: 1ffff11000ccc1e0 R14: ffffffff8542af28 R15: dffffc0000000000 ... Allocated by task 146: __kasan_kmalloc+0xc4/0xf0 sk_prot_alloc+0xdd/0x1a0 sk_alloc+0x2d/0x4e0 rose_create+0x7b/0x330 __sock_create+0x2dd/0x640 __sys_socket+0xc7/0x270 __x64_sys_socket+0x71/0x80 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 152: kasan_set_track+0x4c/0x70 kasan_set_free_info+0x1f/0x40 ____kasan_slab_free+0x124/0x190 kfree+0xd3/0x270 __sk_destruct+0x314/0x460 rose_release+0x2fa/0x3b0 sock_close+0xcb/0x230 __fput+0x2d9/0x650 task_work_run+0xd6/0x160 exit_to_user_mode_loop+0xc7/0xd0 exit_to_user_mode_prepare+0x4e/0x80 syscall_exit_to_user_mode+0x20/0x40 do_syscall_64+0x4f/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This patch adds refcount of sock when we use functions such as rose_start_heartbeat() and so on to start timer, and decreases the refcount of sock when timer is finished or deleted by functions such as rose_stop_heartbeat() and so on. As a result, the UAF bugs could be mitigated. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Tested-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220629002640.5693-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-30 11:07:30 +02:00
Colin Ian King	74fd304f23	ipv6: remove redundant store to value after addition There is no need to store the result of the addition back to variable count after the addition. The store is redundant, replace += with just + Cleans up clang scan build warning: warning: Although the value stored to 'count' is used in the enclosing expression, the value is never actually read from 'count' Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220628145406.183527-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-29 20:42:42 -07:00
Eric Dumazet	4e43e64d0f	ipv6: fix lockdep splat in in6_dump_addrs() As reported by syzbot, we should not use rcu_dereference() when rcu_read_lock() is not held. WARNING: suspicious RCU usage 5.19.0-rc2-syzkaller #0 Not tainted net/ipv6/addrconf.c:5175 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by syz-executor326/3617: #0: ffffffff8d5848e8 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xae/0xc20 net/netlink/af_netlink.c:2223 stack backtrace: CPU: 0 PID: 3617 Comm: syz-executor326 Not tainted 5.19.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 in6_dump_addrs+0x12d1/0x1790 net/ipv6/addrconf.c:5175 inet6_dump_addr+0x9c1/0xb50 net/ipv6/addrconf.c:5300 netlink_dump+0x541/0xc20 net/netlink/af_netlink.c:2275 __netlink_dump_start+0x647/0x900 net/netlink/af_netlink.c:2380 netlink_dump_start include/linux/netlink.h:245 [inline] rtnetlink_rcv_msg+0x73e/0xc90 net/core/rtnetlink.c:6046 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 ____sys_sendmsg+0x6eb/0x810 net/socket.c:2492 ___sys_sendmsg+0xf3/0x170 net/socket.c:2546 __sys_sendmsg net/socket.c:2575 [inline] __do_sys_sendmsg net/socket.c:2584 [inline] __se_sys_sendmsg net/socket.c:2582 [inline] __x64_sys_sendmsg+0x132/0x220 net/socket.c:2582 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: `88e2ca3080` ("mld: convert ifmcaddr6 to RCU") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220628121248.858695-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-29 20:41:09 -07:00
Oleksij Rempel	3d410403a5	net: dsa: add get_pause_stats support Add support for pause stats Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-29 20:17:11 -07:00
Jakub Kicinski	236d59292e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Restore set counter when one of the CPU loses race to add elements to sets. 2) After NF_STOLEN, skb might be there no more, update nftables trace infra to avoid access to skb in this case. From Florian Westphal. 3) nftables bridge might register a prerouting hook with zero priority, br_netfilter incorrectly skips it. Also from Florian. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: br_netfilter: do not skip all hooks with 0 priority netfilter: nf_tables: avoid skb access on nf_stolen netfilter: nft_dynset: restore set element counter when failing to update ==================== Link: https://lore.kernel.org/r/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-29 20:09:32 -07:00
Stanislav Fomichev	9113d7e48e	bpf: expose bpf_{g,s}etsockopt to lsm cgroup I don't see how to make it nice without introducing btf id lists for the hooks where these helpers are allowed. Some LSM hooks work on the locked sockets, some are triggering early and don't grab any locks, so have two lists for now: 1. LSM hooks which trigger under socket lock - minority of the hooks, but ideal case for us, we can expose existing BTF-based helpers 2. LSM hooks which trigger without socket lock, but they trigger early in the socket creation path where it should be safe to do setsockopt without any locks 3. The rest are prohibited. I'm thinking that this use-case might be a good gateway to sleeping lsm cgroup hooks in the future. We can either expose lock/unlock operations (and add tracking to the verifier) or have another set of bpf_setsockopt wrapper that grab the locks and might sleep. Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-7-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-29 13:21:52 -07:00
Hangyu Hua	00aff3590f	net: tipc: fix possible refcount leak in tipc_sk_create() Free sk in case tipc_sk_insert() fails. Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-29 13:49:06 +01:00
Vinayak Yadawad	8d70f33ed7	wifi: cfg80211: Allow P2P client interface to indicate port authorization In case of 4way handshake offload, cfg80211_port_authorized enables driver to indicate successful 4way handshake to cfg80211 layer. Currently this path of port authorization is restricted to interface type NL80211_IFTYPE_STATION. This patch extends the use of port authorization API for P2P client as well. Signed-off-by: Vinayak Yadawad <vinayak.yadawad@broadcom.com> Link: https://lore.kernel.org/r/ef25cb49fcb921df2e5d99e574f65e8a009cc52c.1655905440.git.vinayak.yadawad@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-06-29 11:43:15 +02:00
Felix Fietkau	f856373e2f	wifi: mac80211: do not wake queues on a vif that is being stopped When a vif is being removed and sdata->bss is cleared, __ieee80211_wake_txqs can still be called on it, which crashes as soon as sdata->bss is being dereferenced. To fix this properly, check for SDATA_STATE_RUNNING before waking queues, and take the fq lock when setting it (to ensure that __ieee80211_wake_txqs observes the change when running on a different CPU) Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://lore.kernel.org/r/20220531190824.60019-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-06-29 11:43:15 +02:00
Ryder Lee	a4926abb78	wifi: mac80211: check skb_shared in ieee80211_8023_xmit() Add a missing skb_shared check into 802.3 path to prevent potential use-after-free from happening. This also uses skb_share_check() instead of open-coding in tx path. Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://lore.kernel.org/r/e7a73aaf7742b17e43421c56625646dfc5c4d2cb.1653571902.git.ryder.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-06-29 11:43:15 +02:00
Lorenzo Bianconi	03895c8414	wifi: mac80211: add gfp_t parameter to ieeee80211_obss_color_collision_notify Introduce the capability to specify gfp_t parameter to ieeee80211_obss_color_collision_notify routine since it runs in interrupt context in ieee80211_rx_check_bss_color_collision(). Fixes: `6d945a33f2` ("mac80211: introduce BSS color collision detection") Co-developed-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/02c990fb3fbd929c8548a656477d20d6c0427a13.1655419135.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-06-29 11:43:15 +02:00
Menglong Dong	d640516a65	net: mptcp: fix some spelling mistake in mptcp codespell finds some spelling mistake in mptcp: net/mptcp/subflow.c:1624: interaces ==> interfaces net/mptcp/pm_netlink.c:1130: regarless ==> regardless Just fix them. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20220627121626.1595732-1-imagedong@tencent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-28 22:11:45 -07:00
Eric Dumazet	af9784d007	tcp: diag: add support for TIME_WAIT sockets to tcp_abort() Currently, "ss -K -ta ..." does not support TIME_WAIT sockets. Issue has been raised at least two times in the past [1] [2] it is time to fix it. [1] https://lore.kernel.org/netdev/ba65f579-4e69-ae0d-4770-bc6234beb428@gmail.com/ [2] https://lore.kernel.org/netdev/CANn89i+R9RgmD=AQ4vX1Vb_SQAj4c3fi7-ZtQz-inYY4Sq4CMQ@mail.gmail.com/T/ While we are at it, use inet_sk_state_load() while tcp_abort() does not hold a lock on the socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20220627121038.226500-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-28 21:26:56 -07:00
YueHaibing	53ad46169f	net: ipv6: unexport __init-annotated seg6_hmac_net_init() As of commit `5801f064e3` ("net: ipv6: unexport __init-annotated seg6_hmac_init()"), EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. This remove the EXPORT_SYMBOL to fix modpost warning: WARNING: modpost: vmlinux.o(___ksymtab+seg6_hmac_net_init+0x0): Section mismatch in reference from the variable __ksymtab_seg6_hmac_net_init to the function .init.text:seg6_hmac_net_init() The symbol seg6_hmac_net_init is exported and annotated __init Fix this by removing the __init annotation of seg6_hmac_net_init or drop the export. Fixes: `bf355b8d2c` ("ipv6: sr: add core files for SR HMAC support") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20220628033134.21088-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-28 21:23:30 -07:00
katrinzhou	adabdd8f6a	ipv6/sit: fix ipip6_tunnel_get_prl return value When kcalloc fails, ipip6_tunnel_get_prl() should return -ENOMEM. Move the position of label "out" to return correctly. Addresses-Coverity: ("Unused value") Fixes: `300aaeeaab` ("[IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.") Signed-off-by: katrinzhou <katrinzhou@tencent.com> Reviewed-by: Eric Dumazet<edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220628035030.1039171-1-zys.zljxml@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-28 21:00:34 -07:00
Kuniyuki Iwashima	849d5aa3a1	af_unix: Do not call kmemdup() for init_net's sysctl table. While setting up init_net's sysctl table, we need not duplicate the global table and can use it directly as ipv4_sysctl_init_net() does. Unlike IPv4, AF_UNIX does not have a huge sysctl table for now, so it cannot be a problem, but this patch makes code consistent. Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220627233627.51646-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-28 20:58:57 -07:00
Paolo Abeni	6aeed90450	mptcp: fix race on unaccepted mptcp sockets When the listener socket owning the relevant request is closed, it frees the unaccepted subflows and that causes later deletion of the paired MPTCP sockets. The mptcp socket's worker can run in the time interval between such delete operations. When that happens, any access to msk->first will cause an UaF access, as the subflow cleanup did not cleared such field in the mptcp socket. Address the issue explicitly traversing the listener socket accept queue at close time and performing the needed cleanup on the pending msk. Note that the locking is a bit tricky, as we need to acquire the msk socket lock, while still owning the subflow socket one. Fixes: `86e39e0448` ("mptcp: keep track of local endpoint still available for each msk") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-28 20:45:42 -07:00
Paolo Abeni	f745a3ebdf	mptcp: consistent map handling on failure When the MPTCP receive path reach a non fatal fall-back condition, e.g. when the MPC sockets must fall-back to TCP, the existing code is a little self-inconsistent: it reports that new data is available - return true - but sets the MPC flag to the opposite value. As the consequence read operations in some exceptional scenario may block unexpectedly. Address the issue setting the correct MPC read status. Additionally avoid some code duplication in the fatal fall-back scenario. Fixes: `9c81be0dbc` ("mptcp: add MP_FAIL response support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-28 20:45:42 -07:00
Paolo Abeni	d51991e2e3	mptcp: fix shutdown vs fallback race If the MPTCP socket shutdown happens before a fallback to TCP, and all the pending data have been already spooled, we never close the TCP connection. Address the issue explicitly checking for critical condition at fallback time. Fixes: `1e39e5a32a` ("mptcp: infinite mapping sending") Fixes: `0348c690ed` ("mptcp: add the fallback check") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-28 20:45:42 -07:00


70156 Commits