OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jakub Kicinski	e61caf04b9	Merge branch 'page_pool-allow-caching-from-safely-localized-napi' Jakub Kicinski says: ==================== page_pool: allow caching from safely localized NAPI I went back to the explicit "are we in NAPI method", mostly because I don't like having both around :( (even tho I maintain that in_softirq() && !in_hardirq() is as safe, as softirqs do not nest). Still returning the skbs to a CPU, tho, not to the NAPI instance. I reckon we could create a small refcounted struct per NAPI instance which would allow sockets and other users so hold a persisent and safe reference. But that's a bigger change, and I get 90+% recycling thru the cache with just these patches (for RR and streaming tests with 100% CPU use it's almost 100%). Some numbers for streaming test with 100% CPU use (from previous version, but really they perform the same): HW-GRO page=page before after before after recycle: cached: 0 138669686 0 150197505 cache_full: 0 223391 0 74582 ring: 138551933 9997191 149299454 0 ring_full: 0 488 3154 127590 released_refcnt: 0 0 0 0 alloc: fast: 136491361 148615710 146969587 150322859 slow: 1772 1799 144 105 slow_high_order: 0 0 0 0 empty: 1772 1799 144 105 refill: 2165245 156302 2332880 2128 waive: 0 0 0 0 v1: https://lore.kernel.org/all/20230411201800.596103-1-kuba@kernel.org/ rfcv2: https://lore.kernel.org/all/20230405232100.103392-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230413042605.895677-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-14 18:56:14 -07:00
Jakub Kicinski	294e39e0d0	bnxt: hook NAPIs to page pools bnxt has 1:1 mapping of page pools and NAPIs, so it's safe to hoook them up together. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-14 18:56:12 -07:00
Jakub Kicinski	8c48eea3ad	page_pool: allow caching from safely localized NAPI Recent patches to mlx5 mentioned a regression when moving from driver local page pool to only using the generic page pool code. Page pool has two recycling paths (1) direct one, which runs in safe NAPI context (basically consumer context, so producing can be lockless); and (2) via a ptr_ring, which takes a spin lock because the freeing can happen from any CPU; producer and consumer may run concurrently. Since the page pool code was added, Eric introduced a revised version of deferred skb freeing. TCP skbs are now usually returned to the CPU which allocated them, and freed in softirq context. This places the freeing (producing of pages back to the pool) enticingly close to the allocation (consumer). If we can prove that we're freeing in the same softirq context in which the consumer NAPI will run - lockless use of the cache is perfectly fine, no need for the lock. Let drivers link the page pool to a NAPI instance. If the NAPI instance is scheduled on the same CPU on which we're freeing - place the pages in the direct cache. With that and patched bnxt (XDP enabled to engage the page pool, sigh, bnxt really needs page pool work :() I see a 2.6% perf boost with a TCP stream test (app on a different physical core than softirq). The CPU use of relevant functions decreases as expected: page_pool_refill_alloc_cache 1.17% -> 0% _raw_spin_lock 2.41% -> 0.98% Only consider lockless path to be safe when NAPI is scheduled - in practice this should cover majority if not all of steady state workloads. It's usually the NAPI kicking in that causes the skb flush. The main case we'll miss out on is when application runs on the same CPU as NAPI. In that case we don't use the deferred skb free path. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-14 18:56:12 -07:00
Jakub Kicinski	b07a2d97ba	net: skb: plumb napi state thru skb freeing paths We maintain a NAPI-local cache of skbs which is fed by napi_consume_skb(). Going forward we will also try to cache head and data pages. Plumb the "are we in a normal NAPI context" information thru deeper into the freeing path, up to skb_release_data() and skb_free_head()/skb_pp_recycle(). The "not normal NAPI context" comes from netpoll which passes budget of 0 to try to reap the Tx completions but not perform any Rx. Use "bool napi_safe" rather than bare "int budget", the further we get from NAPI the more confusing the budget argument may seem (particularly whether 0 or MAX is the correct value to pass in when not in NAPI). Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-14 18:56:12 -07:00
Yevgeny Kliteynik	220ae98783	net/mlx5: DR, Enable patterns and arguments for supporting devices Check if patterns and arguments for modify header action are supported and enable them accordingly. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	a21e52bb8f	net/mlx5: DR, Add support for the pattern/arg parameters in debug dump Support the pattern/args-based MODIFY_HDR and TNL_L3_TO_L2 actions in dbg dump Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	40ff097f25	net/mlx5: DR, Modify header action of size 1 optimization Set modify header action of size 1 directly on the STE for supporting devices, thus reducing number of hops and cache misses. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	947e258537	net/mlx5: DR, Support decap L3 action using pattern / arg mechanism Use the new accelerated action for decap L3 on RX side: use the mechanism of pattern and argument same as in modify-header action. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	62e40c8568	net/mlx5: DR, Apply new accelerated modify action and decapl3 If there is support for pattern/args, use the new accelerated modify header action for modify header and decap L3 actions. Otherwise fall back to the old modify-header implementation. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	0caebadda5	net/mlx5: DR, Add modify header argument pointer to actions attributes While building the actions, add the pointer of the arguments for accelerated modify list action into the action's attributes. This will be used later on while building the specific STE for this action. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:22 -07:00
Yevgeny Kliteynik	608d4f1769	net/mlx5: DR, Add modify header arg pool mechanism Added new mechanism for handling arguments for modify-header action. The new action "accelerated modify-header" asks for the arguments from separated area from the pattern, this area accessed via general objects. Handling of these object is done via the pool-manager struct. When the new header patterns are supported, while loading the domain, a few pools for argument creations will be created. The requests for allocating/deallocating arg objects are done via the pool manager API. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:21 -07:00
Yevgeny Kliteynik	17dc71c336	net/mlx5: DR, Fix QP continuous allocation When allocating a QP we allocate an RQ and an SQ, the RQ is stored first in memory and followed by the SQ. This allocation is not physically continiuos - it may span across different physical pages. SW Steering code always writes in pairs: 1BB write + 1BB read, or 2 continuous BBs of GTA WQE. This lead to an issue where RQ allocation was 4x16 which is equal to 1 WQE BB, causing 1 BB offset in the page and splitting the GTA WQE between different physical pages. The solution was to create the RQ with a even number of BBs and to have the RQ aligned to a page. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:21 -07:00
Yevgeny Kliteynik	7d7c9453d6	net/mlx5: DR, Read ICM memory into dedicated buffer Instead of using the write buffer for reading we will use a dedicated buffer only for reading ICM memory. Due to the new support for args, we can have a case with pending_wc being odd number, and with reading into the same write buffer, it is possible to overwrite next write on the same slot. For example: pending_wc is 17 so the buffer for write is: \| 1 \| 2 \| 3 \| 4 \| 5 \| 6 \| 7 \| 8 \| and we have requests as follows: r wr wr wr wr wr wr wr wr Now, the first read will be written into the last write because we use the same buffer for read and write, before it was written to the HW and we will have a wrong data in the ICM area. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:21 -07:00
Yevgeny Kliteynik	4605fc0a2b	net/mlx5: DR, Add support for writing modify header argument The accelerated modify header arguments are written in the HW area with special WQE and specific data format. New function was added to support writing of new argument type. Note that GTA WQE is larger than READ and WRITE, so the queue management logic was updated to support this. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:21 -07:00
Yevgeny Kliteynik	de69696b6e	net/mlx5: DR, Add create/destroy for modify-header-argument general object Add functions for creation/destruction of the new type of general object. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:21 -07:00
Yevgeny Kliteynik	b7ba743a2f	net/mlx5: DR, Check for modify_header_argument device capabilities Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:21 -07:00
Yevgeny Kliteynik	2533e726f4	net/mlx5: DR, Split chunk allocation to HW-dependent ways This way we are able to allocate chunk for modify_headers from 2 types: STEv0 that is allocated from the action area, and STEv1 that is allocating the chunks from the special area for patterns. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:20 -07:00
Yevgeny Kliteynik	da5d0027d6	net/mlx5: DR, Add cache for modify header pattern Starting with ConnectX-6 Dx, we use new design of modify_header FW object. The current modify_header object allows for having only limited number of FW objects, so the new design of pattern and argument allows pattern reuse, saving memory, and having a large number of modify_header objects. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:20 -07:00
Yevgeny Kliteynik	b47dddc624	net/mlx5: DR, Move ACTION_CACHE_LINE_SIZE macro to header Move ACTION_CACHE_LINE_SIZE macro to header to be used by the pattern functions as well. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-04-14 15:06:20 -07:00
Linus Torvalds	7a934f4bd7	RISC-V Fixes for 6.3-rc7 * A fix for a missing fence when generating the NOMMU sigreturn trampoline. * A set of fixes for early DTB handling of reserved memory nodes. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQ5ZsoTHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiUhED/9IsUdJfMdQHzrL7XZL7U9FgHhSCVl4 HgpcpngG6Op+F8lGnVhQV4ivoVYrk85lR2JqJeES9FpChSmZXCptB6K9NFCDLdW+ 9+Qnqa6NnmcJ9Aba7Ckr7zwApohtDGucegIKx3bqafsnBqh2SM/l7ljCWEuaxIbZ 5qFEbWGcILVGQrXJheLgkx7uGbBmEvrmLi9T5jZ1Vwltg1QS2STDjOCSR5cfBpx7 kqoWfSa1+fb5RF428KRvDTuIjUzkkKvUEo/yEzJMMp1s3vI2mKT8ytqQHw4GrQ9w 6X/wvabjDtz8iGRWsh4VVg7iJ+xImzpqRO4UXkd4wArCDdCvjSu4BpAs3cOqKVTl 4w+dTbQPjL22PYCdhBVmJH8K4TmJGzSkOPK8wcIlK1mQaRCAjAalSvCLQIN/+ttq 4AwkXLpfwLvJLhn78Y/7WS6dbedYjMbNxdz2Sy6yFm3qZj1dqfgS5yIuHpLVo7UW AK7nS3FGJRyii8w/lywxDgLTdMeJiZ8/Xy0gJkWps5sGolBIPqSwUcLDhFS68HoM 6x+IzOXDlHnw7i/sj6Q10IRvjFVLieNDYniD7CnOYDKFSgUOvehCMlky7l4naRdO vfVHsHPX38z5ZTeM/vZ1HKobKeTmi3Hy/C1MWUwMgj2nPkui777c9mAQLHY7ppeZ dr4Cn/4GNfZz3w== =1Tw7 -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for a missing fence when generating the NOMMU sigreturn trampoline - A set of fixes for early DTB handling of reserved memory nodes * tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: No need to relocate the dtb as it lies in the fixmap region riscv: Do not set initial_boot_params to the linear address of the dtb riscv: Move early dtb mapping into the fixmap region riscv: add icache flush for nommu sigreturn trampoline	2023-04-14 10:44:48 -07:00
Linus Torvalds	95abc817ab	ACPI fixes for 6.3-rc7 - Add a quirk to force StorageD3Enable on AMD Picasso systems (Mario Limonciello). - Add an ACPI IRQ override quirk for ASUS ExpertBook B1502CBA (Paul Menzel). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmQ5V/4SHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRx3Z8P/3LgmzvoNt+Xsfx3u3//vyu02XGHd48N xA9SQgNJeqwPQxpjnZnycuX5RPex4NGjzEQaIuXkxRVf3UnMzxFir4AYT22qJfea 7m1S59ym/LBCOIhGWuQ4de9vhmebR60nSMrR5geaQVwWedTUfMphIjh10volcnae IwWuaABCtL9vNoPdzy1syhg1nWFZnMIvDrGDdexBQ+68BQznAnEvL0ZlEUv4gls3 Dh2I4ZYe+95Me8qAaNthG3U+BfLDR+/7Sqo3H0urOoTDcLxq/n2Wu6YfeC1Y6dp2 LoBPWGVQpmuITZjTrr5BDTbzjePEA7l0vRdKkhKqmkQ6cfgelwL6LxK2ShIU8Ulx 5OFxlR/i3kZmdpc3n32WdCaVScNEjZ1NEhoGkqAOBXyIplDm83K+uscy6M5Gv5fM lCgwzWu5Vn+NLDzZNx2fIFHOs5sLXb8R9cnycazVUas/NMC96VYRqViK4jcooFHz +y/Z4Sl23gx9i8C85UYN/02jPcjjALWH1PlOyVXWx/0wOvukYLQ1z943+ZEZyfkb vtPwv/xFuLNiZIrnDBBMoi0eri5amdqchYC/JKzDlVs6oPtR2F7hM7nqaajJ1JOz WLQy4XH5lAueUeMneAY2Th7Ogc3lMHsAPOChLwXAvPRJyxPsBaiS0PUeTVcGdiQ/ vYUg1budlxa+ =SG+Z -----END PGP SIGNATURE----- Merge tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These add two ACPI-related quirks: - Add a quirk to force StorageD3Enable on AMD Picasso systems (Mario Limonciello) - Add an ACPI IRQ override quirk for ASUS ExpertBook B1502CBA (Paul Menzel)" * tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CBA ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable	2023-04-14 10:37:07 -07:00
Linus Torvalds	4b992ead33	Power management fix for 6.3-rc7 Make the amd-pstate cpufreq driver take all of the possible combinations of the "old" and "new" status values correctly while changing the operation mode via sysfs (Wyes Karny). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmQ5WF8SHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxGt0P/0NGMuOoDVS9zNhROphsUhBWw/XuVYQQ xrtAkouF9zPDRR7R3p/tbqHNeaSOGapmGbFK6Xm4Z+Cko09jlHpIssfrkCK9W+mh BI8CC6jdFavstzbifzLwVg9dkTeD2ANgCwkSM4CLxzyzktV3C4WOcrBombzpK6CI rTL039jabNaelSzB+dp2asX8GVQD+H57BXTPb33h6BLfH2tvnreVEfTOTlOvrCKD eClLvDw8QArp74qh4U0VwtJ1EIZsYqv84AxSfmKQi9dOrfwIK81D9dHhJXpKfj4V 4DSOuRRxG4wbxF816gXm1mbFiFs2+aK3SkrPXLP4LyphqMfDRP3ID/HojcrUBBtp YCiAQ1JHa9DLWuqOO8/pqUgIetwLuZYD/10JBgXX+ULmkC4Wuqd1Lz1GspdP9/Xy YJrMcwTAj5KCZw2ehdaeZMBa7Ox2flkeS1QdMY8uBposX2w/F8IhWTC4Xe5tbmk2 s1hYZL7yHAZWTyNtcTY7+gPte4qRzQJ/dhFgvbwclaL6z4cgLitkRMf1Rv+vceV+ pKpqzZWRUBqVCNSl8zZdYfyYy2qfAqMszvbDH5fa+26wqQNsKjAGrfQy4qOZGBje BQKsVA1MlAAEckmRH5wOoO0XieT9pn7hJjuTcocg4R4mdF+IfgmUsHgDiGnxG6X+ BkdRd1ec5NX+ =gU5P -----END PGP SIGNATURE----- Merge tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Make the amd-pstate cpufreq driver take all of the possible combinations of the 'old' and 'new' status values correctly while changing the operation mode via sysfs (Wyes Karny)" * tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: amd-pstate: Fix amd_pstate mode switch	2023-04-14 10:25:30 -07:00
Linus Torvalds	d0b85e7e60	Thermal control fix for 6.3-rc7 Modify the Intel thermal throttling code to avoid updating unsupported status clearing mask bits which causes the kernel to complain about unchecked MSR access (Srinivas Pandruvada). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmQ5WNoSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxN3sP/jxTTvyatqFDmrmjuf4rysrapgGhh9IF l1iTixZiHWN3KsbpGoXGvK6QmbyAB0S61FMzucnKktrdzA5tSLSms+UqQcTlqucP pjNzZikojs4pEML9GvlChC2V2aKhALNrzEfIuS24Pw/TEDTH2Mbm0Wrz+yRg5PPt pQhdBwe9Y/NTSVRt3JlZJEzGcWajRsx5YZMyud2zGUqXtNJnpSw5y3Klrbdd3urs 6501j/jLUcLRmLbnmf8oZGs0tCZK/FVhfgVpBemhJgMxeqflMnXBwYw43d0GLJNA bg7kq6eSCr7IP743AAuJaory9WCqAoa+6Km1BGS5TpTOdVtsJ5UDr7ARaCB7E+jg o0lYIs3O1q4WqFBV85R0JtRDopgdkFYxWSjNNa5lyAVwVbVA7Xi7jt48LUaSRO23 Ag76U7VWMXNKSpznPxhTboywE5MH4Gu+VeoVeitXNTG7hM7plB1UoFZ5br3+dPfT qyxHvMkG2n+V9a7JXxaGg2/LvG97QX9LPtCPMTyVAMgQWtg2JN3NblXzGjunZnVK kXScxvRYKKH2xz1ArdTXJj0z9CsnyHRebKXk+PcbFY0kKtQupKpNXSJsMv5zC4kP /uKVxxdAroLfoMkD8BFjhiDV7GHrhXpSBJz72dOIB0eG4jEV+82XCohdNIj9xzEV LtJcGOutktSu =8D2v -----END PGP SIGNATURE----- Merge tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Modify the Intel thermal throttling code to avoid updating unsupported status clearing mask bits which causes the kernel to complain about unchecked MSR access (Srinivas Pandruvada)" * tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits	2023-04-14 10:19:18 -07:00
Linus Torvalds	e251c42318	sound fixes for 6.3-rc7 A collection of small fixes. At this time, quite a few fixes for the old PCI drivers are found. Although they are no regression fixes, I took these as they are materials for stable kernels. In addition, a couple of regression fixes and another couple of HD-audio quirks are included. -----BEGIN PGP SIGNATURE----- iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmQ5RlAOHHRpd2FpQHN1 c2UuZGUACgkQLtJE4w1nLE+lzxAAzfsapwLW4pN5gDApDiAFxu9Lu9VQvk3e4k7W khQFb7o+2jdjyCTiGEQCfFbPu/Ru4KlrYXkCkHEXRR0/95NiRq9GQx6xJtU+S7/m WBTWPhIh2Hic90yYUTD62Pb7j40P8C+wsATwpQftQIdGixBdv7nirbbzBPi5Xfcs +4iu8TBEh9izFIAnXADl/O+WA3E4r6TOvDeuD2FZLQWPcJLeHF9h79BU0k7UmUYR ZgLg+0GJrwJR9A1SzGs5kc47Q0zmP62gRyExBSnskHFjCglbTCo0VhVpBoBi+o0y epHyOJrvs/AdpMz4VFvn4WP5IVtyxa0diUWmd/eRczzzxJThLpMQYx2+ukYkUJMc +fua+NraWqXwC+s7+C/N8MhFXbSrRm+4fzOPmBdE9dV/hnAQIOCWM6f9PhciUcLm kZfcCCwZXNR0TVmtlZxKq5s4xyoYVVkEctQU3QO8TeHXKmV8EYQO+zzYQjch2izD H6gJyT8/QcKhRQSkthLEiltnkuFY+nq+jDdlCRH2/9v5VjvY0XZlBpxuP0vPIYX4 iChCuCu5qkforCijetDkdArWdO4+WiFums4t+h1hekvhnN9IrrXUxU+dn+Hu63jJ oVtlxW1AVzEcYynUowI3jSo4z/5jxBF8a10XF4+ysCbUoTQ6pYTMVl4wYDEEEYJB 2RUdHPQ= =Gmhh -----END PGP SIGNATURE----- Merge tag 'sound-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes. At this time, quite a few fixes for the old PCI drivers are found. Although they are not regression fixes, I took these as they are materials for stable kernels. In addition, a couple of regression fixes and another couple of HD-audio quirks are included" * tag 'sound-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/hdmi: disable KAE for Intel DG2 ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2 ALSA: hda: patch_realtek: add quirk for Asus N7601ZM ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex() ALSA: emu10k1: don't create old pass-through playback device on Audigy ALSA: emu10k1: fix capture interrupt handler unlinking ALSA: hda/sigmatel: fix S/PDIF out on Intel D45 motherboards ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard ALSA: i2c/cs8427: fix iec958 mixer control deactivation	2023-04-14 10:13:54 -07:00
Linus Torvalds	aee3c14e86	v6.3 rc RDMA pull request Driver bug fixes: - irdma should not generate extra completions during flushing - Fix several memory leaks - Do not get confused in irdma's iwarp mode if IPv6 is present - Correct a link speed calculation in mlx5 - Increase the EQ/WQ limits on erdma as they are too small for big applications - Use the right math for erdma's inline mtt feature - Make erdma probing more robust to boot time ordering differences - Fix a KMSAN crash in CMA due to uninitialized qkey -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZDlBPgAKCRCFwuHvBreF YfOtAQDAX3rEL4T6xu4jIHAhInTYZCYVI7pJALTzHh62DmdoZAD+Owy2vTRTmkvJ OLfA++yDRWdrzeSeCWbTYpwEo+00TAA= =9XAm -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "We had a fairly slow cycle on the rc side this time, here are the accumulated fixes, mostly in drivers: - irdma should not generate extra completions during flushing - Fix several memory leaks - Do not get confused in irdma's iwarp mode if IPv6 is present - Correct a link speed calculation in mlx5 - Increase the EQ/WQ limits on erdma as they are too small for big applications - Use the right math for erdma's inline mtt feature - Make erdma probing more robust to boot time ordering differences - Fix a KMSAN crash in CMA due to uninitialized qkey" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/core: Fix GID entry ref leak when create_ah fails RDMA/cma: Allow UD qp_type to join multicast only RDMA/erdma: Defer probing if netdevice can not be found RDMA/erdma: Inline mtt entries into WQE if supported RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192 RDMA/erdma: Fix some typos IB/mlx5: Add support for 400G_8X lane speed RDMA/irdma: Add ipv4 check to irdma_find_listener() RDMA/irdma: Increase iWARP CM default rexmit count RDMA/irdma: Fix memory leak of PBLE objects RDMA/irdma: Do not generate SW completions for NOPs	2023-04-14 10:06:50 -07:00
Ilya Leoshkevich	c730fce7c7	s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL Thomas Richter reported a crash in linux-next with a backtrace similar to the following one: [<0000000000000000>] 0x0 ([<000000000031a182>] bpf_trace_run4+0xc2/0x218) [<00000000001d59f4>] __bpf_trace_sched_switch+0x1c/0x28 [<0000000000c44a3a>] __schedule+0x43a/0x890 [<0000000000c44ef8>] schedule+0x68/0x110 [<0000000000c4e5ca>] do_nanosleep+0xa2/0x168 [<000000000026e7fe>] hrtimer_nanosleep+0xf6/0x1c0 [<000000000026eb6e>] __s390x_sys_nanosleep+0xb6/0xf0 [<0000000000c3b81c>] __do_syscall+0x1e4/0x208 [<0000000000c50510>] system_call+0x70/0x98 Last Breaking-Event-Address: [<000003ff7fda1814>] bpf_prog_65e887c70a835bbf_on_switch+0x1a4/0x1f0 The problem is that bpf_arch_text_poke() with new_addr == NULL is susceptible to the following race condition: T1 T2 ----------------- ------------------- plt.target = NULL entry: brcl 0xf,plt entry.mask = 0 lgrl %r1,plt.target br %r1 Fix by setting PLT target to the instruction following `brcl 0xf,plt` instead of 0. This way T2 will simply resume the execution of the eBPF program, which is the desired effect of passing new_addr == NULL. Fixes: `f1d5df84cd` ("s390/bpf: Implement bpf_arch_text_poke()") Reported-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/bpf/20230414154755.184502-1-iii@linux.ibm.com	2023-04-14 18:28:52 +02:00
Rafael J. Wysocki	a3babdb7a8	Merge branch 'acpi-x86' Merge a quirk to force StorageD3Enable on AMD Picasso systems (Mario Limonciello). * acpi-x86: ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable	2023-04-14 15:15:32 +02:00
Ming Lei	860e1c7f8b	io_uring: complete request via task work in case of DEFER_TASKRUN So far io_req_complete_post() only covers DEFER_TASKRUN by completing request via task work when the request is completed from IOWQ. However, uring command could be completed from any context, and if io uring is setup with DEFER_TASKRUN, the command is required to be completed from current context, otherwise wait on IORING_ENTER_GETEVENTS can't be wakeup, and may hang forever. The issue can be observed on removing ublk device, but turns out it is one generic issue for uring command & DEFER_TASKRUN, so solve it in io_uring core code. Fixes: `e6aeb2721d` ("io_uring: complete all requests in task context") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/ Reported-by: Jens Axboe <axboe@kernel.dk> Cc: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-14 06:38:23 -06:00
Jens Axboe	f7ca1ae32b	Merge branch 'nvme-6.3' of git://git.infradead.org/nvme into block-6.3 Pull NVMe fix from Christoph. * 'nvme-6.3' of git://git.infradead.org/nvme: nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD	2023-04-14 06:29:00 -06:00
Arnd Bergmann	d75eecc3d1	A few more Qualcomm ARM64 DeviceTree fixes for 6.3 The GPIO polarity of the WSA881x shutdown GPIO was inconsistent and had to be corrected in the driver, this fixes the polarity in the DeviceTree for QRB5165 RB5, SM8250 MTP, Samsung Galaxy Book 2 and Lenovo Yoga C630. The recent rearrangement of nodes among the IPQ8074 accidentally enabled the PCIe PHYs, rather than the PCIe controllers. This is being corrected, to restore PCIe functionality. PMK8280 PON node has the wrong compatible, which recently caused the driver to stop probing. This is corrected and the required "pbs" region is added. With support for HBR3 introduced, it's noted that SC7280 Herobrine devices are having trouble running at this rate. This drops the claim that it's supported, until further analysis can be done. -----BEGIN PGP SIGNATURE----- iQJJBAABCAAzFiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmQ0LU8VHGFuZGVyc3Nv bkBrZXJuZWwub3JnAAoJEAsfOT8Nma3Fa+wP/23+62lBn+5sfsUMo5k/ajYel0Ug sfqMS7PDvTPG3AsVhk1R95NrKWb9eWmkpvkwVKsBHD75BeRkuDovIx9yxSLSDZm2 yWlV514AOLWlhgBu75XzOIDy/J9BQccZPmkBHPJY2I9vpnvlhWnQhaPx4tKgQM8T 5qKuQUf2Q1sQy/QDcbOrMEpHCBHl+Zm11ffmnFosXtVsF0xEzcGyHcKwCOMSaxg1 KVxeK0wS3Mt09D4SiPwVNaMA+c0vcexp2jpySoQBMbEmbYOZr3DEU+b5lyZDQqOi odMwF4EhMHaGvUCRwM+c4GgdcdXP8IT4eex9ZjdgYblzJsAUfdpFQ5NBB4PeoHPt 9GOp1n+s4+L3pVm9oyeqUDAUNhE3X6PinOfEScwtgFnQAob45k+6vznVgWt66Apg uZAXRE1kONxtJO7bR8hzH+NH395mtb18lHh1MF09KdIR48wl8bK2f91yP0NkSTY8 N4mKcdy9GwBAmTK3FpEmExOWfmwq0EXdeV2j2RdrVTGNBvTuX9nzpKHUdckk77Ay sOS5KfUANAHy/7UIOZHwerPKImRyo+y7gZ8kRcihtUgGOXaESeT1s1KPsTUE2ZqX xWuJWRxYizPHdn809iEr5jgAHOAZlKh5RsJ6QQijY1XKNwNLU5/8iLb61x55PbHv fnvqlKTO04pDSWD9 =3Uaf -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmQ5PpEACgkQYKtH/8kJ Uic7Vw/8CTiwm+gErIrED6Wm7eCNkzvENkruXboy5437tyrPsnd68sJ+KFs7lznz 0cXSA06bGKp6rj/i1nodfwO8A0PtEl+RAtCaRuesCI8hQRfkF9o6BOEjpmQ4mYbn 09nr0ehIatVdwl+RxFn9GZEBQT4yPIt8cTjjgT/zyxUsomYJjDv6dl9iODyX8TBY w3J2VInSUW1GRrlf+GD7XPxbIHhHl7tqWG1neBENQirWb7N0xjDuq2X7absWyzgt TaPaKYiaqr9X5cE1cfDff+RDoBVZRdtG5H7Gs1u28UaVvmziP04oCxgD8L5cHbX5 niT8cq0d0+f367o+B9WXfcqJXHk/0q3clWP+Ad5q6Bt7HhAkfxY5eezYUMgoDpN4 3+fRrdBJBVK7e2OobxK8+tUt5Zu97zIrNL1b0k9+RkauhBL1efuOQ/tAfh4AY4LR tguyJ6RBIneNYlAWsWPW7f4TpkR/2XmnFYityjFMLLj75rVK2oC+q1kYU/4/Un6a Z9G20Uotr3YXThGQJasZSwh+R0maymIEN7pugg9mrImC5zvGOUXA1DC+6MqdLmd1 ocqSr3a0p6jRe9t0xSBxS8YtFT33zU9pevOTnIIYrQhyQ1xL8DFXjmoEFbuQdvWG Ks2x83/F5OefLoGYXG5Q/K10eKmodaWsuyPsSePOkOTo4VWZEoE= =ikZD -----END PGP SIGNATURE----- Merge tag 'qcom-arm64-fixes-for-6.3-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes A few more Qualcomm ARM64 DeviceTree fixes for 6.3 The GPIO polarity of the WSA881x shutdown GPIO was inconsistent and had to be corrected in the driver, this fixes the polarity in the DeviceTree for QRB5165 RB5, SM8250 MTP, Samsung Galaxy Book 2 and Lenovo Yoga C630. The recent rearrangement of nodes among the IPQ8074 accidentally enabled the PCIe PHYs, rather than the PCIe controllers. This is being corrected, to restore PCIe functionality. PMK8280 PON node has the wrong compatible, which recently caused the driver to stop probing. This is corrected and the required "pbs" region is added. With support for HBR3 introduced, it's noted that SC7280 Herobrine devices are having trouble running at this rate. This drops the claim that it's supported, until further analysis can be done. * tag 'qcom-arm64-fixes-for-6.3-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers arm64: dts: qcom: ipq8074-hk10: enable QMP device, not the PHY node arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node arm64: dts: qcom: qrb5165-rb5: Use proper WSA881x shutdown GPIO polarity arm64: dts: qcom: sm8250-mtp: Use proper WSA881x shutdown GPIO polarity arm64: dts: qcom: sdm850-samsung-w737: Use proper WSA881x shutdown GPIO polarity arm64: dts: qcom: sdm850-lenovo-yoga-c630: Use proper WSA881x shutdown GPIO polarity Link: https://lore.kernel.org/r/20230410153850.4752-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2023-04-14 13:52:48 +02:00
Arnd Bergmann	43950556b7	Lower sd card speeds for two boards to make them run more reliable, missing 32k clock definition for Anbric xx3 devices, missing cache-levels for rk3588, fixed rk3326-board display supplies and more dt-schema fixes. -----BEGIN PGP SIGNATURE----- iQFEBAABCAAuFiEE7v+35S2Q1vLNA3Lx86Z5yZzRHYEFAmQx/6UQHGhlaWtvQHNu dGVjaC5kZQAKCRDzpnnJnNEdgYMiB/45skI/LcajSjuqIJpAC+cDeCs/IaQb6aev 4fnwBRYgb836JvThTGeOn4KRe2avnZMfyjn3IlkHhYBHtLukS8jcuNuU+T+oD1/n Yss63kUSbMCvNl13uCSz/dqsaFZJ1x8hHth0b19PlGWnnBgtzWWWScr371m+PsF6 2x2dnteRvQcUyWa8/5Xavgc8bseMds2SiKEc9Y58MTwRVL/pqfh9AFJhsWRJYFsN x0SU+Gwopg2qW+Mfdm6DOK1m08Vu6Uaw1zftF06/oAnDqvYMiVSWfsFIpW5eEi/T kb5ovzTTDwnLz/n8jkG1ldB7hQY362NQjyIc/GoUab/03yxyFE63 =JS4f -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmQ5PlAACgkQYKtH/8kJ UifHPRAAwF4jqjWejgzFnHsB7RPmjvUtvuobU0gmI25HIjKnWLdcuS4IFXWvFgLU gxc1oQS1m4C7eM8ceI/jGR3oDZfhKXCIrpif9Tg+LxtmwRaeo3tbpasPANwiO+Eq BI6LssLFStBblwwaaMzC1E/8fCdBsZYVbQe4LqsJmzb9hOvFCih4At1YFHKQXcHM L+1sCdpKMo/f/AQ3sjS3kEYnFwrIYdskxImHXmYkYe2e1OMkXSQcWjz958A7GPB1 kdm5yMPARR8FZxY9RZBf6QLg3xnIc/OqraeHYCjpFLs+gDGGoEMGwwBW5u3/JJmE SLTq2EYYf6szdVh156zjfMtCFCIM51BXoWwlB3swhWI80WD87lzrKzYYRbW5H6jp 5SIW74a8vR2OxOKyQu00Qb8S0SsTJF5Ord9EImYsm6rXpQnSs894kp18zNjmjUDh FLxNhMMuY8aiwbypYUm4YRFN9BJJth3SKLXIyP1xxBEpN3IIeQSIXeD9ZoA+sc6e OdEREqMwQV9xPq84JQLsxTqjcSCwO1MJECqwNvKj07w0qdaIMgDAgirLCTXSPJPh b8b2p07Dpb5wFKw/FwGSBuvi3DEzSLhK7zbJzky8zNhmWNAx0VS7gR/koFQDGxR0 gNcVU5vPB2ujvBdW76MegqynP2UowsGdD3Sy3KlB/FAZ+LTGMGY= =Yjl8 -----END PGP SIGNATURE----- Merge tag 'v6.3-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes Lower sd card speeds for two boards to make them run more reliable, missing 32k clock definition for Anbric xx3 devices, missing cache-levels for rk3588, fixed rk3326-board display supplies and more dt-schema fixes. * tag 'v6.3-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: correct panel supplies on some rk3326 boards arm64: dts: rockchip: use just "port" in panel on RockPro64 arm64: dts: rockchip: use just "port" in panel on Pinebook Pro arm64: dts: rockchip: Remove non-existing pwm-delay-us property arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices arm64: dts: rockchip: add rk3588 cache level information arm64: dts: rockchip: Lower SD card speed on rk3399 Pinebook Pro arm64: dts: rockchip: Lower sd speed on rk3566-soquartz ARM: dts: rockchip: fix a typo error for rk3288 spdif node arm64: dts: rockchip: Fix rk3399 GICv3 ITS node name Link: https://lore.kernel.org/r/10559306.CDJkKcVGEf@phil Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2023-04-14 13:51:43 +02:00
Johan Hovold	4323516879	firmware/psci: demote suspend-mode warning to info level On some Qualcomm platforms, like SC8280XP, the attempt to set PC mode during boot fails with PSCI_RET_DENIED and since commit `998fcd001f` ("firmware/psci: Print a warning if PSCI doesn't accept PC mode") this is now logged at warning level: psci: failed to set PC mode: -3 As there is nothing users can do about the firmware behaving this way, demote the warning to info level and clearly mark it as a firmware bug: psci: [Firmware Bug]: failed to set PC mode: -3 Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2023-04-14 13:48:56 +02:00
David S. Miller	c11d2e718c	Merge branch 'msg_control-split' Kevin Brodsky says: ==================== net: Finish up ->msg_control{,_user} split Commit `1f466e1f15` ("net: cleanly handle kernel vs user buffers for ->msg_control") introduced the msg_control_user and msg_control_is_user fields in struct msghdr, to ensure that user pointers are represented as such. It also took care of converting most users of struct msghdr::msg_control where user pointers are involved. It did however miss a number of cases, and some code using msg_control inappropriately has also appeared in the meantime. This series is attempting to complete the split, by eliminating the remaining cases where msg_control is used when in fact a user pointer is stored in the union (patch 1). It also addresses a couple of issues with msg_control_is_user: one where it is not updated as it should (patch 2), and one where it is not initialised (patch 3). v1..v2: * Split out the msg_control_is_user fixes into separate patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 11:09:27 +01:00
Kevin Brodsky	b6d85cf5bd	net/ipv6: Initialise msg_control_is_user do_ipv6_setsockopt() makes use of struct msghdr::msg_control in the IPV6_2292PKTOPTIONS case. Make sure to initialise msg_control_is_user accordingly. Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 11:09:27 +01:00
Kevin Brodsky	60daf8d40b	net/compat: Update msg_control_is_user when setting a kernel pointer cmsghdr_from_user_compat_to_kern() is an unusual case w.r.t. how the kmsg->msg_control* fields are used. The input struct msghdr holds a pointer to a user buffer, i.e. ksmg->msg_control_user is active. However, upon success, a kernel pointer is stored in kmsg->msg_control. kmsg->msg_control_is_user should therefore be updated accordingly. Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 11:09:27 +01:00
Kevin Brodsky	c39ef21304	net: Ensure ->msg_control_user is used for user buffers Since commit `1f466e1f15` ("net: cleanly handle kernel vs user buffers for ->msg_control"), pointers to user buffers should be stored in struct msghdr::msg_control_user, instead of the msg_control field. Most users of msg_control have already been converted (where user buffers are involved), but not all of them. This patch attempts to address the remaining cases. An exception is made for null checks, as it should be safe to use msg_control unconditionally for that purpose. Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 11:09:27 +01:00
Arseniy Krasnov	eaaa4e9239	vsock/loopback: don't disable irqs for queue access This replaces 'skb_queue_tail()' with 'virtio_vsock_skb_queue_tail()'. The first one uses 'spin_lock_irqsave()', second uses 'spin_lock_bh()'. There is no need to disable interrupts in the loopback transport as there is no access to the queue with skbs from interrupt context. Both virtio and vhost transports work in the same way. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 11:04:04 +01:00
Gwangun Jung	3037933448	net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device. The MTU of the loopback device can be set up to 2^31-1. As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX. Due to the invalid lmax value, an index is generated that exceeds the QFQ_MAX_INDEX(=24) value, causing out-of-bounds read/write errors. The following reports a oob access: [ 84.582666] BUG: KASAN: slab-out-of-bounds in qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313) [ 84.583267] Read of size 4 at addr ffff88810f676948 by task ping/301 [ 84.583686] [ 84.583797] CPU: 3 PID: 301 Comm: ping Not tainted 6.3.0-rc5 #1 [ 84.584164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 84.584644] Call Trace: [ 84.584787] <TASK> [ 84.584906] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) [ 84.585108] print_report (mm/kasan/report.c:320 mm/kasan/report.c:430) [ 84.585570] kasan_report (mm/kasan/report.c:538) [ 84.585988] qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313) [ 84.586599] qfq_enqueue (net/sched/sch_qfq.c:1255) [ 84.587607] dev_qdisc_enqueue (net/core/dev.c:3776) [ 84.587749] __dev_queue_xmit (./include/net/sch_generic.h:186 net/core/dev.c:3865 net/core/dev.c:4212) [ 84.588763] ip_finish_output2 (./include/net/neighbour.h:546 net/ipv4/ip_output.c:228) [ 84.589460] ip_output (net/ipv4/ip_output.c:430) [ 84.590132] ip_push_pending_frames (./include/net/dst.h:444 net/ipv4/ip_output.c:126 net/ipv4/ip_output.c:1586 net/ipv4/ip_output.c:1606) [ 84.590285] raw_sendmsg (net/ipv4/raw.c:649) [ 84.591960] sock_sendmsg (net/socket.c:724 net/socket.c:747) [ 84.592084] __sys_sendto (net/socket.c:2142) [ 84.593306] __x64_sys_sendto (net/socket.c:2150) [ 84.593779] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 84.593902] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 84.594070] RIP: 0033:0x7fe568032066 [ 84.594192] Code: 0e 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c09[ 84.594796] RSP: 002b:00007ffce388b4e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c Code starting with the faulting instruction =========================================== [ 84.595047] RAX: ffffffffffffffda RBX: 00007ffce388cc70 RCX: 00007fe568032066 [ 84.595281] RDX: 0000000000000040 RSI: 00005605fdad6d10 RDI: 0000000000000003 [ 84.595515] RBP: 00005605fdad6d10 R08: 00007ffce388eeec R09: 0000000000000010 [ 84.595749] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 84.595984] R13: 00007ffce388cc30 R14: 00007ffce388b4f0 R15: 0000001d00000001 [ 84.596218] </TASK> [ 84.596295] [ 84.596351] Allocated by task 291: [ 84.596467] kasan_save_stack (mm/kasan/common.c:46) [ 84.596597] kasan_set_track (mm/kasan/common.c:52) [ 84.596725] __kasan_kmalloc (mm/kasan/common.c:384) [ 84.596852] __kmalloc_node (./include/linux/kasan.h:196 mm/slab_common.c:967 mm/slab_common.c:974) [ 84.596979] qdisc_alloc (./include/linux/slab.h:610 ./include/linux/slab.h:731 net/sched/sch_generic.c:938) [ 84.597100] qdisc_create (net/sched/sch_api.c:1244) [ 84.597222] tc_modify_qdisc (net/sched/sch_api.c:1680) [ 84.597357] rtnetlink_rcv_msg (net/core/rtnetlink.c:6174) [ 84.597495] netlink_rcv_skb (net/netlink/af_netlink.c:2574) [ 84.597627] netlink_unicast (net/netlink/af_netlink.c:1340 net/netlink/af_netlink.c:1365) [ 84.597759] netlink_sendmsg (net/netlink/af_netlink.c:1942) [ 84.597891] sock_sendmsg (net/socket.c:724 net/socket.c:747) [ 84.598016] ____sys_sendmsg (net/socket.c:2501) [ 84.598147] ___sys_sendmsg (net/socket.c:2557) [ 84.598275] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2586) [ 84.598399] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 84.598520] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 84.598688] [ 84.598744] The buggy address belongs to the object at ffff88810f674000 [ 84.598744] which belongs to the cache kmalloc-8k of size 8192 [ 84.599135] The buggy address is located 2664 bytes to the right of [ 84.599135] allocated 7904-byte region [ffff88810f674000, ffff88810f675ee0) [ 84.599544] [ 84.599598] The buggy address belongs to the physical page: [ 84.599777] page:00000000e638567f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10f670 [ 84.600074] head:00000000e638567f order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 84.600330] flags: 0x200000000010200(slab\|head\|node=0\|zone=2) [ 84.600517] raw: 0200000000010200 ffff888100043180 dead000000000122 0000000000000000 [ 84.600764] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000 [ 84.601009] page dumped because: kasan: bad access detected [ 84.601187] [ 84.601241] Memory state around the buggy address: [ 84.601396] ffff88810f676800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.601620] ffff88810f676880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.601845] >ffff88810f676900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602069] ^ [ 84.602243] ffff88810f676980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602468] ffff88810f676a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602693] ================================================================== [ 84.602924] Disabling lock debugging due to kernel taint Fixes: `3015f3d2a3` ("pkt_sched: enable QFQ to support TSO/GSO") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Gwangun Jung <exsociety@gmail.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 10:59:26 +01:00
David S. Miller	c61fcc090f	Merge branch 'mana-jumbo-frames' Haiyang Zhang says: ==================== net: mana: Add support for jumbo frame The set adds support for jumbo frame, with some optimization for the RX path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:20 +01:00
Haiyang Zhang	80f6215b45	net: mana: Add support for jumbo frame During probe, get the hardware-allowed max MTU by querying the device configuration. Users can select MTU up to the device limit. When XDP is in use, limit MTU settings so the buffer size is within one page. And, when MTU is set to a too large value, XDP is not allowed to run. Also, to prevent changing MTU fails, and leaves the NIC in a bad state, pre-allocate all buffers before starting the change. So in low memory condition, it will return error, without affecting the NIC. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:19 +01:00
Haiyang Zhang	2fbbd712ba	net: mana: Enable RX path to handle various MTU sizes Update RX data path to allocate and use RX queue DMA buffers with proper size based on potentially various MTU sizes. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:19 +01:00
Haiyang Zhang	a2917b2349	net: mana: Refactor RX buffer allocation code to prepare for various MTU Move out common buffer allocation code from mana_process_rx_cqe() and mana_alloc_rx_wqe() to helper functions. Refactor related variables so they can be changed in one place, and buffer sizes are in sync. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:19 +01:00
Haiyang Zhang	ce518bc3e9	net: mana: Use napi_build_skb in RX path Use napi_build_skb() instead of build_skb() to take advantage of the NAPI percpu caches to obtain skbuff_head. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:19 +01:00
Kai Vehmanen	6ab6f98fcd	ALSA: hda/hdmi: disable KAE for Intel DG2 Use of keep-alive (KAE) has resulted in loss of audio on some A750/770 cards as the transition from keep-alive to stream playback is not working as expected. As there is limited benefit of the new KAE mode on discrete cards, revert back to older silent-stream implementation on these systems. Cc: stable@vger.kernel.org Fixes: `15175a4f2b` ("ALSA: hda/hdmi: add keep-alive support for ADL-P and DG2") Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8307 Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://lore.kernel.org/r/20230413191153.3692049-1-kai.vehmanen@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-04-14 07:50:52 +02:00
Jakub Kicinski	e473ea818b	mlx5-updates-2023-04-11 1) Vlad adds the support for linux bridge multicast offload support Patches #1 through #9 Synopsis Vlad Says: ============== Implement support of bridge multicast offload in mlx5. Handle port object attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast offload and bridge snooping support on bridge. Handle port object SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB. Steering architecture Existing offload infrastructure relies on two levels of flow tables - bridge ingress and egress. For multicast offload the architecture is extended with additional layer of per-port multicast replication tables. Such tables filter loopback traffic (so packets are not replicated to their source port) and pop VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in egress table. MDB egress rule can point to multiple per-port multicast tables, which causes matching multicast traffic to be replicated to all of them, and, consecutively, to several bridge ports: +--------+--+ +---------------------------------------> Port 1 \| \| \| +-^------+--+ \| \| \| \| +-----------------------------------------+ \| +---------------------------+ \| \| EGRESS table \| \| +--> PORT 1 multicast table \| \| +----------------------------------+ +-----------------------------------------+ \| \| +---------------------------+ \| \| INGRESS table \| \| \| \| \| \| \| \| +----------------------------------+ \| dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ \| \| FG0: \| \| \| \| \| dst_mac=P1,vlan=Y -> pop vlan, goto P1 \| \| \| src_port=dst_port -> drop \| \| \| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ \| \| FG1: \| \| \| ... \| \| dst_mac=P2,vlan=Y -> goto P2 \| \| \| \| VLAN X -> pop, goto port \| \| \| \| \| dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ \| ... \| \| +----------------------------------+ \| \| \| \| \| VLAN Y -> pop, goto port +-------+ +-----------------------------------------+ \| \| \| FG3: \| \| \| \| matchall -> goto port \| \| \| \| \| \| \| +---------------------------+ \| \| \| \| \| \| +--------+--+ +---------------------------------------> Port 2 \| \| \| +-^------+--+ \| \| \| \| \| +---------------------------+ \| +--> PORT 2 multicast table \| \| +---------------------------+ \| \| \| \| \| FG0: \| \| \| src_port=dst_port -> drop \| \| \| FG1: \| \| \| VLAN X -> pop, goto port \| \| \| ... \| \| \| \| \| \| FG3: \| \| \| matchall -> goto port +-------+ \| \| +---------------------------+ Patches overview: - Patch 1 adds hardware definition bits for capabilities required to replicate multicast packets to multiple per-port tables. These bits are used by following patches to only attempt multicast offload if firmware and hardware provide necessary support. - Pathces 2-4 patches are preparations and refactoring. - Patch 5 implements necessary infrastructure to toggle multicast offload via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification. This also enabled IGMP and MLD snooping. - Patch 6 implements per-port multicast replication tables. It only supports filtering of loopback packets. - Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged' VLANs. - Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It creates MDB replication rules in egress table that can replicate packets to multiple per-port multicast tables. - Patch 9 adds tracepoints for MDB events. ============== 2) Parav Create a new allocation profile for SFs, to save on memory 3) Yevgeny provides some initial patches for upcoming software steering support new pattern/arguments type of modify_header actions. Starting with ConnectX-6 DX, we use a new design of modify_header FW object. The current modify_header object allows for having only limited number of these FW objects, which means that we are limited in the number of offloaded flows that require modify_header action. As a preparation Yevgeny provides the following 4 patches: - Patch 1: Add required mlx5_ifc HW bits - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg support and adds appropriate support in dr_send.c - Patch 4: Add ICM pool for modify-header-pattern objects and implement patterns cache, allowing patterns reuse for different flows -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmQ2LDIACgkQSD+KveBX +j59UAf+NfLEq7i/gYOri1rLwXOrgPd13w64Oob79+pKVXzRqik/gBg4d0CKdc2X Qafntudus7IvVrGJCM0vurw7nC75H9BMFh3dQW2u3uaR9m6i0k4tSKIAoJUFHMdj dMgAJywYkCCXlBbGVhRE3yybal2EXfOSJ+Fr2tGeqpL4vRlg4kIaVuFGcGk2gMpq gOahX6wtuop3ABzY3sqKCIQ1Q2jWx9AWWz+lEmw7HJGmHGugscYrgSIloHDnu1fz Bt6uODn7GFr866rQr9+JemG0VyK8ahyisEFnX1oJ3642ZidPcDrbIrrtMWeEqo7X 1tcmY/wNti87xFbg3gL+DZekhHTHoQ== =oIN8 -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-04-11 1) Vlad adds the support for linux bridge multicast offload support Patches #1 through #9 Synopsis Vlad Says: ============== Implement support of bridge multicast offload in mlx5. Handle port object attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast offload and bridge snooping support on bridge. Handle port object SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB. Steering architecture Existing offload infrastructure relies on two levels of flow tables - bridge ingress and egress. For multicast offload the architecture is extended with additional layer of per-port multicast replication tables. Such tables filter loopback traffic (so packets are not replicated to their source port) and pop VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in egress table. MDB egress rule can point to multiple per-port multicast tables, which causes matching multicast traffic to be replicated to all of them, and, consecutively, to several bridge ports: +--------+--+ +---------------------------------------> Port 1 \| \| \| +-^------+--+ \| \| \| \| +-----------------------------------------+ \| +---------------------------+ \| \| EGRESS table \| \| +--> PORT 1 multicast table \| \| +----------------------------------+ +-----------------------------------------+ \| \| +---------------------------+ \| \| INGRESS table \| \| \| \| \| \| \| \| +----------------------------------+ \| dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ \| \| FG0: \| \| \| \| \| dst_mac=P1,vlan=Y -> pop vlan, goto P1 \| \| \| src_port=dst_port -> drop \| \| \| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ \| \| FG1: \| \| \| ... \| \| dst_mac=P2,vlan=Y -> goto P2 \| \| \| \| VLAN X -> pop, goto port \| \| \| \| \| dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ \| ... \| \| +----------------------------------+ \| \| \| \| \| VLAN Y -> pop, goto port +-------+ +-----------------------------------------+ \| \| \| FG3: \| \| \| \| matchall -> goto port \| \| \| \| \| \| \| +---------------------------+ \| \| \| \| \| \| +--------+--+ +---------------------------------------> Port 2 \| \| \| +-^------+--+ \| \| \| \| \| +---------------------------+ \| +--> PORT 2 multicast table \| \| +---------------------------+ \| \| \| \| \| FG0: \| \| \| src_port=dst_port -> drop \| \| \| FG1: \| \| \| VLAN X -> pop, goto port \| \| \| ... \| \| \| \| \| \| FG3: \| \| \| matchall -> goto port +-------+ \| \| +---------------------------+ Patches overview: - Patch 1 adds hardware definition bits for capabilities required to replicate multicast packets to multiple per-port tables. These bits are used by following patches to only attempt multicast offload if firmware and hardware provide necessary support. - Pathces 2-4 patches are preparations and refactoring. - Patch 5 implements necessary infrastructure to toggle multicast offload via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification. This also enabled IGMP and MLD snooping. - Patch 6 implements per-port multicast replication tables. It only supports filtering of loopback packets. - Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged' VLANs. - Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It creates MDB replication rules in egress table that can replicate packets to multiple per-port multicast tables. - Patch 9 adds tracepoints for MDB events. ============== 2) Parav Create a new allocation profile for SFs, to save on memory 3) Yevgeny provides some initial patches for upcoming software steering support new pattern/arguments type of modify_header actions. Starting with ConnectX-6 DX, we use a new design of modify_header FW object. The current modify_header object allows for having only limited number of these FW objects, which means that we are limited in the number of offloaded flows that require modify_header action. As a preparation Yevgeny provides the following 4 patches: - Patch 1: Add required mlx5_ifc HW bits - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg support and adds appropriate support in dr_send.c - Patch 4: Add ICM pool for modify-header-pattern objects and implement patterns cache, allowing patterns reuse for different flows * tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Add modify-header-pattern ICM pool net/mlx5: DR, Prepare sending new WQE type net/mlx5: Add new WQE for updating flow table net/mlx5: Add mlx5_ifc bits for modify header argument net/mlx5: DR, Set counter ID on the last STE for STEv1 TX net/mlx5: Create a new profile for SFs net/mlx5: Bridge, add tracepoints for multicast net/mlx5: Bridge, implement mdb offload net/mlx5: Bridge, support multicast VLAN pop net/mlx5: Bridge, add per-port multicast replication tables net/mlx5: Bridge, snoop igmp/mld packets net/mlx5: Bridge, extract code to lookup parent bridge of port net/mlx5: Bridge, move additional data structures to priv header net/mlx5: Bridge, increase bridge tables sizes net/mlx5: Add mlx5_ifc definitions for bridge multicast support ==================== Link: https://lore.kernel.org/r/20230412040752.14220-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:28:03 -07:00
Jakub Kicinski	f7d29571ab	Merge branch 'add-kernel-tc-mqprio-and-tc-taprio-support-for-preemptible-traffic-classes' Vladimir Oltean says: ==================== Add kernel tc-mqprio and tc-taprio support for preemptible traffic classes The last RFC in August 2022 contained a proposal for the UAPI of both TSN standards which together form Frame Preemption (802.1Q and 802.3): https://lore.kernel.org/netdev/20220816222920.1952936-1-vladimir.oltean@nxp.com/ It wasn't clear at the time whether the 802.1Q portion of Frame Preemption should be exposed via the tc qdisc (mqprio, taprio) or via some other layer (perhaps also ethtool like the 802.3 portion, or dcbnl), even though the options were discussed extensively, with pros and cons: https://lore.kernel.org/netdev/20220816222920.1952936-3-vladimir.oltean@nxp.com/ So the 802.3 portion got submitted separately and finally was accepted: https://lore.kernel.org/netdev/20230119122705.73054-1-vladimir.oltean@nxp.com/ leaving the only remaining question: how do we expose the 802.1Q bits? This series proposes that we use the Qdisc layer, through separate (albeit very similar) UAPI in mqprio and taprio, and that both these Qdiscs pass the information down to the offloading device driver through the common mqprio offload structure (which taprio also passes). An implementation is provided for the NXP LS1028A on-board Ethernet endpoint (enetc). Previous versions also contained support for its embedded switch (felix), but this needs more work and will be submitted separately. v4: https://lore.kernel.org/netdev/20230403103440.2895683-1-vladimir.oltean@nxp.com/ v2: https://lore.kernel.org/netdev/20230219135309.594188-1-vladimir.oltean@nxp.com/ v1: https://lore.kernel.org/netdev/20230216232126.3402975-1-vladimir.oltean@nxp.com/ ==================== Link: https://lore.kernel.org/r/20230411180157.1850527-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:12 -07:00
Vladimir Oltean	01e23b2b3b	net: enetc: add support for preemptible traffic classes PFs which support the MAC Merge layer also have a set of 8 registers called "Port traffic class N frame preemption register (PTC0FPR - PTC7FPR)". Through these, a traffic class (group of TX rings of same dequeue priority) can be mapped to the eMAC or to the pMAC. There's nothing particularly spectacular here. We should probably only commit the preemptible TCs to hardware once the MAC Merge layer became active, but unlike Felix, we don't have an IRQ that notifies us of that. We'd have to sleep for up to verifyTime (127 ms) to wait for a resolution coming from the verification state machine; not only from the ndo_setup_tc() code path, but also from enetc_mm_link_state_update(). Since it's relatively complicated and has a relatively small benefit, I'm not doing it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	50764da37c	net: enetc: rename "mqprio" to "qopt" To gain access to the larger encapsulating structure which has the type tc_mqprio_qopt_offload, rename just the "qopt" field as "qopt". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	a721c3e54b	net/sched: taprio: allow per-TC user input of FP adminStatus This is a duplication of the FP adminStatus logic introduced for tc-mqprio. Offloading is done through the tc_mqprio_qopt_offload structure embedded within tc_taprio_qopt_offload. So practically, if a device driver is written to treat the mqprio portion of taprio just like standalone mqprio, it gets unified handling of frame preemption. I would have reused more code with taprio, but this is mostly netlink attribute parsing, which is hard to transform into generic code without having something that stinks as a result. We have the same variables with the same semantics, just different nlattr type values (TCA_MQPRIO_TC_ENTRY=5 vs TCA_TAPRIO_ATTR_TC_ENTRY=12; TCA_MQPRIO_TC_ENTRY_FP=2 vs TCA_TAPRIO_TC_ENTRY_FP=3, etc) and consequently, different policies for the nest. Every time nla_parse_nested() is called, an on-stack table "tb" of nlattr pointers is allocated statically, up to the maximum understood nlattr type. That array size is hardcoded as a constant, but when transforming this into a common parsing function, it would become either a VLA (which the Linux kernel rightfully doesn't like) or a call to the allocator. Having FP adminStatus in tc-taprio can be seen as addressing the 802.1Q Annex S.3 "Scheduling and preemption used in combination, no HOLD/RELEASE" and S.4 "Scheduling and preemption used in combination with HOLD/RELEASE" use cases. HOLD and RELEASE events are emitted towards the underlying MAC Merge layer when the schedule hits a Set-And-Hold-MAC or a Set-And-Release-MAC gate operation. So within the tc-taprio UAPI space, one can distinguish between the 2 use cases by choosing whether to use the TC_TAPRIO_CMD_SET_AND_HOLD and TC_TAPRIO_CMD_SET_AND_RELEASE gate operations within the schedule, or just TC_TAPRIO_CMD_SET_GATES. A small part of the change is dedicated to refactoring the max_sdu nlattr parsing to put all logic under the "if" that tests for presence of that nlattr. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	f62af20bed	net/sched: mqprio: allow per-TC user input of FP adminStatus IEEE 802.1Q-2018 clause 6.7.2 Frame preemption specifies that each packet priority can be assigned to a "frame preemption status" value of either "express" or "preemptible". Express priorities are transmitted by the local device through the eMAC, and preemptible priorities through the pMAC (the concepts of eMAC and pMAC come from the 802.3 MAC Merge layer). The FP adminStatus is defined per packet priority, but 802.1Q clause 12.30.1.1.1 framePreemptionAdminStatus also says that: \| Priorities that all map to the same traffic class should be \| constrained to use the same value of preemption status. It is impossible to ignore the cognitive dissonance in the standard here, because it practically means that the FP adminStatus only takes distinct values per traffic class, even though it is defined per priority. I can see no valid use case which is prevented by having the kernel take the FP adminStatus as input per traffic class (what we do here). In addition, this also enforces the above constraint by construction. User space network managers which wish to expose FP adminStatus per priority are free to do so; they must only observe the prio_tc_map of the netdev (which presumably is also under their control, when constructing the mqprio netlink attributes). The reason for configuring frame preemption as a property of the Qdisc layer is that the information about "preemptible TCs" is closest to the place which handles the num_tc and prio_tc_map of the netdev. If the UAPI would have been any other layer, it would be unclear what to do with the FP information when num_tc collapses to 0. A key assumption is that only mqprio/taprio change the num_tc and prio_tc_map of the netdev. Not sure if that's a great assumption to make. Having FP in tc-mqprio can be seen as an implementation of the use case defined in 802.1Q Annex S.2 "Preemption used in isolation". There will be a separate implementation of FP in tc-taprio, for the other use cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00

... 4 5 6 7 8 ...

1172897 Commits All Branches Search

1172897 Commits

All Branches