OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Daniel Golle	b9b05381e5	net: dsa: mt7530: improve and relax PHY driver dependency Different MT7530 variants require different PHY drivers. Use 'imply' instead of 'select' to relax the dependency on the PHY driver, and choose the appropriate driver. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:22:55 +01:00
David S. Miller	c992fde9f9	Merge branch 'smc-fixes' Gerd Bayer says: ==================== net/smc: Fix effective buffer size commit `0227f058aa` ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") started to derive the effective buffer size for SMC connections inconsistently in case a TCP fallback was used and memory consumption of SMC with the default settings was doubled when a connection negotiated SMC. That was not what we want. This series consolidates the resulting effective buffer size that is used with SMC sockets, which is based on Jan Karcher's effort (see [1]). For all TCP exchanges (in particular in case of a fall back when no SMC connection was possible) the values from net.ipv4.tcp_[rw]mem are used. If SMC succeeds in establishing a SMC connection, the newly introduced values from net.smc.[rw]mem are used. net.smc.[rw]mem is initialized to 64kB, respectively. Internal test have show this to be a good compromise between throughput/latency and memory consumption. Also net.smc.[rw]mem is now decoupled completely from any tuning through net.ipv4.tcp_[rw]mem. If a user chose to tune a socket's receive or send buffer size with setsockopt, this tuning is now consistently applied to either fall-back TCP or proper SMC connections over the socket. Thanks, Gerd v2 - v3: - Rebase to and resolve conflict of second patch with latest net/master. v1 - v2: - In second patch, use sock_net() helper as suggested by Tony and demanded by kernel test robot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:20:29 +01:00
Gerd Bayer	30c3c4a449	net/smc: Use correct buffer sizes when switching between TCP and SMC Tuning of the effective buffer size through setsockopts was working for SMC traffic only but not for TCP fall-back connections even before commit `0227f058aa` ("net/smc: Unbind r/w buffer size from clcsock and make them tunable"). That change made it apparent that TCP fall-back connections would use net.smc.[rw]mem as buffer size instead of net.ipv4_tcp_[rw]mem. Amend the code that copies attributes between the (TCP) clcsock and the SMC socket and adjust buffer sizes appropriately: - Copy over sk_userlocks so that both sockets agree on whether tuning via setsockopt is active. - When falling back to TCP use sk_sndbuf or sk_rcvbuf as specified with setsockopt. Otherwise, use the sysctl value for TCP/IPv4. - Likewise, use either values from setsockopt or from sysctl for SMC (duplicated) on successful SMC connect. In smc_tcp_listen_work() drop the explicit copy of buffer sizes as that is taken care of by the attribute copy. Fixes: `0227f058aa` ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:20:29 +01:00
Gerd Bayer	833bac7ec3	net/smc: Fix setsockopt and sysctl to specify same buffer size again Commit `0227f058aa` ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") introduced the net.smc.rmem and net.smc.wmem sysctls to specify the size of buffers to be used for SMC type connections. This created a regression for users that specified the buffer size via setsockopt() as the effective buffer size was now doubled. Re-introduce the division by 2 in the SMC buffer create code and level this out by duplicating the net.smc.[rw]mem values used for initializing sk_rcvbuf/sk_sndbuf at socket creation time. This gives users of both methods (setsockopt or sysctl) the effective buffer size that they expect. Initialize net.smc.[rw]mem from its own constant of 64kB, respectively. Internal performance tests show that this value is a good compromise between throughput/latency and memory consumption. Also, this decouples it from any tuning that was done to net.ipv4.tcp_[rw]mem[1] before the module for SMC protocol was loaded. Check that no more than INT_MAX / 2 is assigned to net.smc.[rw]mem, in order to avoid any overflow condition when that is doubled for use in sk_sndbuf or sk_rcvbuf. While at it, drop the confusing sk_buf_size variable from __smc_buf_create and name "compressed" buffer size variables more consistently. Background: Before the commit mentioned above, SMC's buffer allocator in __smc_buf_create() always used half of the sockets' sk_rcvbuf/sk_sndbuf value as initial value to search for appropriate buffers. If the search resorted to using a bigger buffer when all buffers of the specified size were busy, the duplicate of the used effective buffer size is stored back to sk_rcvbuf/sk_sndbuf. When available, buffers of exactly the size that a user had specified as input to setsockopt() were used, despite setsockopt()'s documentation in "man 7 socket" talking of a mandatory duplication: [...] SO_SNDBUF Sets or gets the maximum socket send buffer in bytes. The kernel doubles this value (to allow space for book‐ keeping overhead) when it is set using setsockopt(2), and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/wmem_default file and the maximum allowed value is set by the /proc/sys/net/core/wmem_max file. The minimum (doubled) value for this option is 2048. [...] Fixes: `0227f058aa` ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Co-developed-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:20:28 +01:00
David S. Miller	ae1ae5eb14	Merge branch 'sfc-conntrack-offload' Edward Cree says: ==================== sfc: basic conntrack offload Support offloading tracked connections and matching against them in TC chains on the PF and on representors. Later patch serieses will add NAT and conntrack-on-tunnel-netdevs; keep it simple for now. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:14:38 +01:00
Edward Cree	01ad088fb0	sfc: offload left-hand side rules for conntrack Handle the (comparatively) simple case of a -trk rule on an efx netdev (i.e. not a tunnel decap rule) with ct and goto chain actions. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:14:38 +01:00
Edward Cree	1dfc29be4d	sfc: conntrack state matches in TC rules Parse ct_state trk/est, mark and zone out of flower keys, and plumb them through to the hardware, performing some minor translations. Nothing can actually hit them yet as we're not offloading any DO_CT actions. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:14:38 +01:00
Edward Cree	2941602518	sfc: handle non-zero chain_index on TC rules Map it to an 8-bit recirc_id for use by the hardware. Currently nothing in the driver is offloading 'goto chain' actions, so these rules cannot yet be hit. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:14:38 +01:00
Edward Cree	1909387fcf	sfc: offload conntrack flow entries (match only) from CT zones No handling yet for FLOW_ACTION_MANGLE (NAT or NAPT) actions. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:14:38 +01:00
Edward Cree	94aa05bdc7	sfc: functions to insert/remove conntrack entries to MAE hardware Translate from software struct efx_tc_ct_entry objects to the key and response bitstrings, and implement insertion and removal of these entries from the hardware table. Callers of these functions will be added in subsequent patches. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:14:38 +01:00
Edward Cree	c3bb5c6acd	sfc: functions to register for conntrack zone offload Bind a stub callback to the netfilter flow table. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:14:37 +01:00
Edward Cree	3bf969e88a	sfc: add MAE table machinery for conntrack table Access to the connection tracking table in EF100 hardware is through a "generic" table mechanism, whereby a firmware call at probe time gives the driver a description of the field widths and offsets, so that the driver can then construct key and response bitstrings at runtime. Probe the NIC for this information and populate the needed metadata into a new meta_ct field of struct efx_tc_state. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 11:14:37 +01:00
David S. Miller	d0378ae6d1	Merge branch 'enetc-probe-fix' Vladimir Oltean says: ==================== Fix ENETC probing after `6fffbc7ae1` ("PCI: Honor firmware's device disabled status") I'm not sure who should take this patch set (net maintainers or PCI maintainers). Everyone could pick up just their part, and that would work (no compile time dependencies). However, the entire series needs ACK from both sides and Rob for sure. v1 at: https://lore.kernel.org/netdev/20230521115141.2384444-1-vladimir.oltean@nxp.com/ ==================== Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 09:18:31 +01:00
Vladimir Oltean	bfce089ddd	net: enetc: remove of_device_is_available() handling Since commit `6fffbc7ae1` ("PCI: Honor firmware's device disabled status"), this is redundant and does nothing, because enetc_pf_probe() no longer even gets called. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 09:18:31 +01:00
Vladimir Oltean	f0168042a2	net: enetc: reimplement RFS/RSS memory clearing as PCI quirk The workaround implemented in commit `3222b5b613` ("net: enetc: initialize RFS/RSS memories for unused ports too") is no longer effective after commit `6fffbc7ae1` ("PCI: Honor firmware's device disabled status"). Thus, it has introduced a regression and we see AER errors being reported again: $ ip link set sw2p0 up && dhclient -i sw2p0 && ip addr show sw2p0 fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode fsl_enetc 0000:00:00.2 eno2: Link is Up - 2.5Gbps/Full - flow control rx/tx mscc_felix 0000:00:00.5 swp2: configuring for fixed/sgmii link mode mscc_felix 0000:00:00.5 swp2: Link is Up - 1Gbps/Full - flow control off sja1105 spi2.2 sw2p0: configuring for phy/rgmii-id link mode sja1105 spi2.2 sw2p0: Link is Up - 1Gbps/Full - flow control off pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0 pcieport 0000:00:1f.0: AER: can't find device of ID0000 Rob's suggestion is to reimplement the enetc driver workaround as a PCI fixup, and to modify the PCI core to run the fixups for all PCI functions. This change handles the first part. We refactor the common code in enetc_psi_create() and enetc_psi_destroy(), and use the PCI fixup only for those functions for which enetc_pf_probe() won't get called. This avoids some work being done twice for the PFs which are enabled. Fixes: `6fffbc7ae1` ("PCI: Honor firmware's device disabled status") Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/ Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 09:18:31 +01:00
Vladimir Oltean	1a8c251cff	PCI: move OF status = "disabled" detection to dev->match_driver The blamed commit has broken probing on arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi when &enetc_port0 (PCI function 0) has status = "disabled". Background: pci_scan_slot() has logic to say that if the function 0 of a device is absent, the entire device is absent and we can skip the other functions entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id() returns an error code for that function. However, since the blamed commit, there is an extra confounding condition: function 0 of the device exists and has a valid vendor id, but it is disabled in the device tree. In that case, pci_scan_slot() would incorrectly skip the entire device instead of just that function. In the case of NXP LS1028A, status = "disabled" does not mean that the PCI function's config space is not available for reading. It is, but the Ethernet port is just not functionally useful with a particular SerDes protocol configuration (0x9999) due to pinmuxing constraints of the Soc. So, pci_scan_slot() skips all other functions on the ENETC ECAM (enetc_port1, enetc_port2, enetc_mdio_pf3 etc) when just enetc_port0 had to not be probed. There is an additional regression introduced by the change, caused by its fundamental premise. The enetc driver needs to run code for all PCI functions, regardless of whether they're enabled or not in the device tree. That is no longer possible if the driver's probe function is no longer called. But Rob recommends that we move the of_device_is_available() detection to dev->match_driver, and this makes the PCI fixups still run on all functions, while just probing drivers for those functions that are enabled. So, a separate change in the enetc driver will have to move the workarounds to a PCI fixup. Fixes: `6fffbc7ae1` ("PCI: Honor firmware's device disabled status") Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/ Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-09 09:18:30 +01:00
Yue Haibing	2adbb7637f	bpf: btf: Remove two unused function declarations Commit `db55911782` ("bpf: Consolidate spin_lock, timer management into btf_record") removed the implementations but leave declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230808145741.33292-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-08 17:25:02 -07:00
Yue Haibing	526bc5ba19	bpf: lru: Remove unused declaration bpf_lru_promote() Commit `3a08c2fd76` ("bpf: LRU List") declared but never implemented this. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230808145531.19692-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-08 17:21:42 -07:00
Eduard Zingerman	898f55f50a	selftests/bpf: relax expected log messages to allow emitting BPF_ST Update [1] to LLVM BPF backend seeks to enable generation of BPF_ST instruction when CPUv4 is selected. This affects expected log messages for the following selftests: - log_fixup/missing_map - spin_lock/lock_id_mapval_preserve - spin_lock/lock_id_innermapval_preserve Expected messages in these tests hard-code instruction numbers for BPF programs compiled from C. These instruction numbers change when BPF_ST is allowed because single BPF_ST instruction replaces a pair of BPF_MOV/BPF_STX instructions, e.g.: r1 = 42; (u32 )(r10 - 8) = r1; ---> (u32 )(r10 - 8) = 42; This commit updates expected log messages to avoid matching specific instruction numbers (program position still could be uniquely identified). [1] https://reviews.llvm.org/D140804 "[BPF] support for BPF_ST instruction in codegen" Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230808162755.392606-1-eddyz87@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-08 17:02:22 -07:00
Kui-Feng Lee	96ead1e702	selftests/bpf: remove duplicated functions The file cgroup_tcp_skb.c contains redundant implementations of the similar functions (create_server_sock_v6(), connect_client_server_v6() and get_sock_port_v6()) found in network_helpers.c. Let's eliminate these duplicated functions. Changes from v1: - Remove get_sock_port_v6() as well. v1: https://lore.kernel.org/all/20230807193840.567962-1-thinker.li@gmail.com/ Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20230808162858.326871-1-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-08 17:01:19 -07:00
Jakub Kicinski	1c2c8c3517	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-08-07 (ice) This series contains updates to ice driver only. Wojciech allows for LAG interfaces to be used for bridge offloads. Marcin tracks additional metadata for filtering rules to aid in proper differentiation of similar rules. He also renames some flags that do not entirely describe their representation. Karol and Jan add additional waiting for firmware load on devices that require it. Przemek refactors RSS implementation to clarify/simplify configurations. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: clean up __ice_aq_get_set_rss_lut() ice: add FW load wait ice: Add get C827 PHY index function ice: Rename enum ice_pkt_flags values ice: Add direction metadata ice: Accept LAG netdevs in bridge offloads ==================== Link: https://lore.kernel.org/r/20230807204835.3129164-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:41:30 -07:00
Piotr Gardocki	0fb1d8eb23	iavf: fix potential races for FDIR filters Add fdir_fltr_lock locking in unprotected places. The change in iavf_fdir_is_dup_fltr adds a spinlock around a loop which iterates over all filters and looks for a duplicate. The filter can be removed from list and freed from memory at the same time it's being compared. All other places where filters are deleted are already protected with spinlock. The remaining changes protect adapter->fdir_active_fltr variable so now all its uses are under a spinlock. Fixes: `527691bf06` ("iavf: Support IPv4 Flow Director filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807205011.3129224-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:38:28 -07:00
Muhammad Husaini Zulkifli	06b412589e	igc: Add lock to safeguard global Qbv variables Access to shared variables through hrtimer requires locking in order to protect the variables because actions to write into these variables (oper_gate_closed, admin_gate_closed, and qbv_transition) might potentially occur simultaneously. This patch provides a locking mechanisms to avoid such scenarios. Fixes: `175c241288` ("igc: Fix TX Hang issue when QBV Gate is closed") Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230807205129.3129346-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:36:23 -07:00
Jakub Kicinski	b9077ef4c1	mlx5-fixes-2023-08-07 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmTRPIkACgkQSD+KveBX +j4ozgf/aS4GktVhN0DkTksN3K8D4QSrYYR+hiW4e7o7xz1K32qw+GRiC2r/FSGh XHzOGybuj7+V9TxcOb8NRSqKVtpci4MChQTrWzGutqwtcxU18SyDSo/kEmHfqkT+ kuH+PDpDpNtcCwr+z3Cb+M22ZjpqwZnWdxfKa9rG+ur9QPTUnBY1+MGGYn6eeMnC DD7HiB+q7YnCNbsFHJNp4ZeUVsTWO4gD6aOUkUhXDaBkBTpKwrCUQ+dNMm2AG1/z 3v4VdtB28BHV6o4QSop7HYk1DEnpTjZt/R+FnC6bZpIu7bDYVqw9p97EoPMvSlMv RgorORXPr30WpvWcH6ff2y/A0MmhIw== =nDa5 -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-08-07 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Add capability check for vnic counters net/mlx5: Reload auxiliary devices in pci error handlers net/mlx5: Skip clock update work when device is in error state net/mlx5: LAG, Check correct bucket when modifying LAG net/mlx5e: Unoffload post act rule when handling FIB events net/mlx5: Fix devlink controller number for ECVF net/mlx5: Allow 0 for total host VFs net/mlx5: Return correct EC_VF function ID net/mlx5: DR, Fix wrong allocation of modify hdr pattern net/mlx5e: TC, Fix internal port memory leak net/mlx5e: Take RTNL lock when needed before calling xdp_set_features() ==================== Link: https://lore.kernel.org/r/20230807212607.50883-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:33:19 -07:00
Will Hawkins	e546a11980	bpf, docs: Fix small typo and define semantics of sign extension Add additional precision on the semantics of the sign extension operations in BPF. In addition, fix a very minor typo. Signed-off-by: Will Hawkins <hawkinsw@obs.cr> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230808212503.197834-1-hawkinsw@obs.cr Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-08 16:29:20 -07:00
Jakub Kicinski	f5f502a3ea	mlx5-updates-2023-08-07 1) Few cleanups 2) Dynamic completion EQs The driver creates completion EQs for all vectors directly on driver load, even if those EQs will not be utilized later on. To allow more flexibility in managing completion EQs and to reduce the memory overhead of driver load, this series will adjust completion EQs creation to be dynamic. Meaning, completion EQs will be created only when needed. Patch #1 introduces a counter for tracking the current number of completion EQs. Patches #2-6 refactor the existing infrastructure of managing completion EQs and completion IRQs to be compatible with per-vector allocation/release requests. Patches #7-8 modify the CPU-to-IRQ affinity calculation to be resilient in case the affinity is requested but completion IRQ is not allocated yet. Patch #9 function rename. Patch #10 handles the corner case of SF performing an IRQ request when no SF IRQ pool is found, and no PF IRQ exists for the same vector. Patch #11 modify driver to use dynamically allocate completion EQs. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmTRL7AACgkQSD+KveBX +j69VggA0+5AO3WkMOy6055/HQoSiO/UqdSwyZhedu9nUFtQmmybIkkqghebTZgO +tLMjSzPjkhtnY70Fybb+MAH2QiOVO7MxMvg99punmhpB1RBYSo7mg2WG/la55A1 E+ocN6uL0Jv2HUXbECSM8OhGOOf6wJNXRk9kzc6AJWsmun356HveaW8n8hp0JCJr hnqbyPZJjLAYlcXH9jjvseCW7tHZH30WPAoYqVXRMao8LPbvjCWlinw3gntmet0f 746oxkNapZrHj+lHHrmGRr23+pOWZGW0nt8cxJlXHd0yDP116RDDf64gcop92AYX YWVe8whSb2yp6FcscNtb1Vt5vw/c9w== =e5aZ -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-08-07 1) Few cleanups 2) Dynamic completion EQs The driver creates completion EQs for all vectors directly on driver load, even if those EQs will not be utilized later on. To allow more flexibility in managing completion EQs and to reduce the memory overhead of driver load, this series will adjust completion EQs creation to be dynamic. Meaning, completion EQs will be created only when needed. Patch #1 introduces a counter for tracking the current number of completion EQs. Patches #2-6 refactor the existing infrastructure of managing completion EQs and completion IRQs to be compatible with per-vector allocation/release requests. Patches #7-8 modify the CPU-to-IRQ affinity calculation to be resilient in case the affinity is requested but completion IRQ is not allocated yet. Patch #9 function rename. Patch #10 handles the corner case of SF performing an IRQ request when no SF IRQ pool is found, and no PF IRQ exists for the same vector. Patch #11 modify driver to use dynamically allocate completion EQs. * tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Bridge, Only handle registered netdev bridge events net/mlx5: E-Switch, Remove redundant arg ignore_flow_lvl net/mlx5: Fix typo reminder -> remainder net/mlx5: remove many unnecessary NULL values net/mlx5: Allocate completion EQs dynamically net/mlx5: Handle SF IRQ request in the absence of SF IRQ pool net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max() net/mlx5: Add IRQ vector to CPU lookup function net/mlx5: Introduce mlx5_cpumask_default_spread net/mlx5: Implement single completion EQ create/destroy methods net/mlx5: Use xarray to store and manage completion EQs net/mlx5: Refactor completion IRQ request/release handlers in EQ layer net/mlx5: Use xarray to store and manage completion IRQs net/mlx5: Refactor completion IRQ request/release API net/mlx5: Track the current number of completion EQs ==================== Link: https://lore.kernel.org/r/20230807175642.20834-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:28:29 -07:00
Jakub Kicinski	81f3768d91	Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver' Jijie Shao says: ==================== There are some bugfix for the HNS3 ethernet driver There are some bugfix for the HNS3 ethernet driver v1: https://lore.kernel.org/all/20230728075840.4022760-2-shaojijie@huawei.com/ ==================== Link: https://lore.kernel.org/r/20230807113452.474224-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:13:07 -07:00
Yonglong Liu	ac6257a3ae	net: hns3: fix deadlock issue when externel_lb and reset are executed together When externel_lb and reset are executed together, a deadlock may occur: [ 3147.217009] INFO: task kworker/u321:0:7 blocked for more than 120 seconds. [ 3147.230483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3147.238999] task:kworker/u321:0 state:D stack: 0 pid: 7 ppid: 2 flags:0x00000008 [ 3147.248045] Workqueue: hclge hclge_service_task [hclge] [ 3147.253957] Call trace: [ 3147.257093] __switch_to+0x7c/0xbc [ 3147.261183] __schedule+0x338/0x6f0 [ 3147.265357] schedule+0x50/0xe0 [ 3147.269185] schedule_preempt_disabled+0x18/0x24 [ 3147.274488] __mutex_lock.constprop.0+0x1d4/0x5dc [ 3147.279880] __mutex_lock_slowpath+0x1c/0x30 [ 3147.284839] mutex_lock+0x50/0x60 [ 3147.288841] rtnl_lock+0x20/0x2c [ 3147.292759] hclge_reset_prepare+0x68/0x90 [hclge] [ 3147.298239] hclge_reset_subtask+0x88/0xe0 [hclge] [ 3147.303718] hclge_reset_service_task+0x84/0x120 [hclge] [ 3147.309718] hclge_service_task+0x2c/0x70 [hclge] [ 3147.315109] process_one_work+0x1d0/0x490 [ 3147.319805] worker_thread+0x158/0x3d0 [ 3147.324240] kthread+0x108/0x13c [ 3147.328154] ret_from_fork+0x10/0x18 In externel_lb process, the hns3 driver call napi_disable() first, then the reset happen, then the restore process of the externel_lb will fail, and will not call napi_enable(). When doing externel_lb again, napi_disable() will be double call, cause a deadlock of rtnl_lock(). This patch use the HNS3_NIC_STATE_DOWN state to protect the calling of napi_disable() and napi_enable() in externel_lb process, just as the usage in ndo_stop() and ndo_start(). Fixes: `04b6ba1435` ("net: hns3: add support for external loopback test") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230807113452.474224-5-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:13:05 -07:00
Jie Wang	6265e242f7	net: hns3: add wait until mac link down In some configure flow of hns3 driver, for example, change mtu, it will disable MAC through firmware before configuration. But firmware disables MAC asynchronously. The rx traffic may be not stopped in this case. So fixes it by waiting until mac link is down. Fixes: `a9775bb64a` ("net: hns3: fix set and get link ksettings issue") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230807113452.474224-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:13:05 -07:00
Jie Wang	08469dacfa	net: hns3: refactor hclge_mac_link_status_wait for interface reuse Some nic configurations could only be performed after link is down. So this patch refactor this API for reuse. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230807113452.474224-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:13:05 -07:00
Jian Shen	15159ec0c8	net: hns3: restore user pause configure when disable autoneg Restore the mac pause state to user configuration when autoneg is disabled Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230807113452.474224-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:13:05 -07:00
Jakub Kicinski	2c2b88748f	docs: net: page_pool: de-duplicate the intro comment In commit `82e896d992` ("docs: net: page_pool: use kdoc to avoid duplicating the information") I shied away from using the DOC: comments when moving to kdoc for documenting page_pool API, because I wasn't sure how familiar people are with it. Turns out there is already a DOC: comment for the intro, which is the same in both places, modulo what looks like minor rewording. Use the version from Documentation/ but keep the contents with the code. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230807210051.1014580-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:09:10 -07:00
Yue Haibing	b876b71a6a	devlink: Remove unused devlink_dpipe_table_resource_set() declaration Commit `f655dacb59` ("net: devlink: remove unused locked functions") removed this but leave the declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807143214.46648-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 16:07:43 -07:00
Yue Haibing	209bccbac9	net: fq: Remove unused typedef fq_flow_get_default_t Commitbf9009bf21b5 ("net/fq_impl: drop get_default_func, move default flow to fq_tin") remove its last user, so can remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807142111.33524-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:58:23 -07:00
David Rheinsberg	b6f79e826f	net/unix: use consistent error code in SO_PEERPIDFD Change the new (unreleased) SO_PEERPIDFD sockopt to return ENODATA rather than ESRCH if a socket type does not support remote peer-PID queries. Currently, SO_PEERPIDFD returns ESRCH when the socket in question is not an AF_UNIX socket. This is quite unexpected, given that one would assume ESRCH means the peer process already exited and thus cannot be found. However, in that case the sockopt actually returns EINVAL (via pidfd_prepare()). This is rather inconsistent with other syscalls, which usually return ESRCH if a given PID refers to a non-existant process. This changes SO_PEERPIDFD to return ENODATA instead. This is also what SO_PEERGROUPS returns, and thus keeps a consistent behavior across sockopts. Note that this code is returned in 2 cases: First, if the socket type is not AF_UNIX, and secondly if the socket was not yet connected. In both cases ENODATA seems suitable. Signed-off-by: David Rheinsberg <david@readahead.eu> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Luca Boccassi <bluca@debian.org> Fixes: `7b26952a91` ("net: core: add getsockopt SO_PEERPIDFD") Link: https://lore.kernel.org/r/20230807081225.816199-1-david@readahead.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:56:48 -07:00
Hannes Reinecke	ba4a734e1a	net/tls: avoid TCP window full during ->read_sock() When flushing the backlog after decoding a record we don't really know how much data the caller want us to evaluate, so use INT_MAX and 0 as arguments to tls_read_flush_backlog() to ensure we flush at 128k of data. Otherwise we might be reading too much data and trigger a TCP window full. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20230807071022.10091-1-hare@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:53:49 -07:00
Ziyang Xuan	794529c448	ipv6: exthdrs: Replace opencoded swap() implementation Get a coccinelle warning as follows: net/ipv6/exthdrs.c:800:29-30: WARNING opportunity for swap() Use swap() to replace opencoded implementation. Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230807020947.1991716-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:36:47 -07:00
xu xin	c67180efc5	net/ipv4: return the real errno instead of -EINVAL For now, No matter what error pointer ip_neigh_for_gw() returns, ip_finish_output2() always return -EINVAL, which may mislead the upper users. For exemple, an application uses sendto to send an UDP packet, but when the neighbor table overflows, sendto() will get a value of -EINVAL, and it will cause users to waste a lot of time checking parameters for errors. Return the real errno instead of -EINVAL. Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Si Hao <si.hao@zte.com.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://lore.kernel.org/r/20230807015408.248237-1-xu.xin16@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:35:51 -07:00
Jakub Kicinski	3337022bab	Merge branch 'net-renesas-rswitch-add-speed-change-support' Yoshihiro Shimoda says: ==================== net: renesas: rswitch: Add speed change support Add speed change support at runtime for the latest SoC version. Also, add ethtool .[gs]et_link_ksettings. v1: https://lore.kernel.org/all/20230803120621.1471440-1-yoshihiro.shimoda.uh@renesas.com/ ==================== Link: https://lore.kernel.org/r/20230807003231.1552062-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:31:53 -07:00
Yoshihiro Shimoda	20f8be6b24	net: renesas: rswitch: Add .[gs]et_link_ksettings support Add .[gs]et_link_ksettings support by using phy_ethtool_[gs]et_link_ksettings() functions. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807003231.1552062-3-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:31:52 -07:00
Yoshihiro Shimoda	c009b903f8	net: renesas: rswitch: Add runtime speed change support The latest SoC version can support runtime speed change. So, add detect SoC version by using soc_device_match() and then reconfigure the hardware of this and SerDes if needed. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807003231.1552062-2-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:31:52 -07:00
Lin Ma	f1d152eb66	rtnetlink: remove redundant checks for nlattr IFLA_BRIDGE_MODE The commit `d73ef2d69c` ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length") added the nla_len check in rtnl_bridge_setlink, which is the only caller for ndo_bridge_setlink handlers defined in low-level driver codes. Hence, this patch cleanups the redundant checks in each ndo_bridge_setlink handler function. Suggested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807091347.3804523-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:10:37 -07:00
Jakub Kicinski	a6c1fd040d	Merge branch 'bnxt_en-fix-2-compile-warnings-in-bnxt_dcb-c' Michael Chan says: ==================== bnxt_en: Fix 2 compile warnings in bnxt_dcb.c Fix 2 similar warnings in bnxt_dcb.c related to the cos2bw interface definitions in bnxt_hsi.h. ==================== Link: https://lore.kernel.org/r/20230807145720.159645-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:07:20 -07:00
Michael Chan	3d5ecada04	bnxt_en: Fix W=stringop-overflow warning in bnxt_dcb.c Fix the following warning: drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c: In function ‘bnxt_hwrm_queue_cos2bw_cfg’: cc1: error: writing 12 bytes into a region of size 1 [-Werror=stringop-overflow ] In file included from drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:19: drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h:6045:17: note: destination object ‘unused_0’ of size 1 6045 \| u8 unused_0; Fix it by modifying struct hwrm_queue_cos2bw_cfg_input to use an array of sub struct similar to the previous patch. This will eliminate the pointer arithmetc to calculate the destination pointer passed to memcpy(). Link: https://lore.kernel.org/netdev/CACKFLinikvXmKcxr4kjWO9TPYxTd2cb5agT1j=w9Qyj5-24s5A@mail.gmail.com/ Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230807145720.159645-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:07:18 -07:00
Michael Chan	ac1b8c978a	bnxt_en: Fix W=1 warning in bnxt_dcb.c from fortify memcpy() Fix the following warning: inlined from ‘bnxt_hwrm_queue_cos2bw_qcfg’ at drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:165:3, ./include/linux/fortify-string.h:592:4: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror] __read_overflow2_field(q_size_field, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Modify the FW interface defintion of struct hwrm_queue_cos2bw_qcfg_output to use an array of sub struct for the queue1 to queue7 fields. Note that the layout of the queue0 fields are different and these are not part of the array. This makes the code much cleaner by removing the pointer arithmetic for memcpy(). Link: https://lore.kernel.org/netdev/20230727190726.1859515-2-kuba@kernel.org/ Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230807145720.159645-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:07:18 -07:00
Dan Carpenter	48d17c517a	net: bcmasp: Prevent array undereflow in bcmasp_netfilt_get_init() The "loc" value comes from the user and it can be negative leading to an an array underflow when we check "priv->net_filters[loc].claimed". Fix this by changing the type to u32. Fixes: `c5d511c495` ("net: bcmasp: Add support for wake on net filters") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Link: https://lore.kernel.org/r/b3b47b25-01fc-4d9f-a6c3-e037ad4d71d7@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:06:15 -07:00
Jakub Kicinski	c0256168d1	Merge branch 'team-do-some-cleanups-in-team-driver' Zhengchao Shao says: ==================== team: do some cleanups in team driver Do some cleanups in team driver. ==================== Link: https://lore.kernel.org/r/20230807012556.3146071-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:04:07 -07:00
Zhengchao Shao	7790eaeb68	team: remove unused input parameters in lb_htpm_select_tx_port and lb_hash_select_tx_port The input parameters "lb_priv" and "skb" in lb_htpm_select_tx_port and lb_hash_select_tx_port are unused, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807012556.3146071-6-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:04:05 -07:00
Zhengchao Shao	c3b41f4c7b	team: change the getter function in the team_option structure to void Because the getter function in the team_option structure always returns 0, so change the getter function to void and remove redundant code. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807012556.3146071-5-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:04:05 -07:00
Zhengchao Shao	de3ecc4fd8	team: change the init function in the team_option structure to void Because the init function in the team_option structure always returns 0, so change the init function to void and remove redundant code. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807012556.3146071-4-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-08 15:04:04 -07:00

... 2 3 4 5 6 ...

1202510 Commits All Branches Search

1202510 Commits

All Branches