OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Russell King (Oracle)	f8810ca758	net: sfp: ignore power level 3 prior to SFF-8472 Rev 11.4 Power level 3 was included in SFF-8472 revision 11.9, but this does not have a compliance code. Use revision 11.4 as the minimum compliance level instead. This should avoid any spurious indication of 2W modules. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 21:06:12 -07:00
Russell King (Oracle)	18cc659e95	net: sfp: ignore power level 2 prior to SFF-8472 Rev 10.2 Power level 2 was introduced by SFF-8472 revision 10.2. Ignore the power declaration bit for modules that are not compliant with at least this revision. This should remove any spurious indication of 1.5W modules. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 21:06:06 -07:00
Russell King (Oracle)	02eaf5a791	net: sfp: check firmware provided max power Check that the firmware provided maximum power is at least 1W, which is the minimum power level for any SFP module. Now that we enforce the minimum of 1W, we can exit early from sfp_module_parse_power() if the module power is 1W or less. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 21:05:54 -07:00
Russell King (Oracle)	a272bcb9e5	dt-bindings: net: sff,sfp: update binding Add a minimum and default for the maximum-power-milliwatt option; module power levels were originally up to 1W, so this is the default and the minimum power level we can have for a functional SFP cage. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 21:05:45 -07:00
Horatiu Vultur	4a4b6848d1	net: lan966x: Stop replacing tx dcbs and dcbs_buf when changing MTU When a frame is sent using FDMA, the skb is mapped and then the mapped address is given to an tx dcb that is different than the last used tx dcb. Once the HW finish with this frame, it would generate an interrupt and then the dcb can be reused and memory can be freed. For each dcb there is an dcb buf that contains some meta-data(is used by PTP, is it free). There is 1 to 1 relationship between dcb and dcb_buf. The following issue was observed. That sometimes after changing the MTU to allocate new tx dcbs and dcbs_buf, two frames were not transmitted. The frames were not transmitted because when reloading the tx dcbs, it was always presuming to use the first dcb but that was not always happening. Because it could be that the last tx dcb used before changing MTU was first dcb and then when it tried to get the next dcb it would take dcb 1 instead of 0. Because it is supposed to take a different dcb than the last used one. This can be fixed simply by changing tx->last_in_use to -1 when the fdma is disabled to reload the new dcb and dcbs_buff. But there could be a different issue. For example, right after the frame is sent, the MTU is changed. Now all the dcbs and dcbs_buf will be cleared. And now get the interrupt from HW that it finished with the frame. So when we try to clear the skb, it is not possible because we lost all the dcbs_buf. The solution here is to stop replacing the tx dcbs and dcbs_buf when changing MTU because the TX doesn't care what is the MTU size, it is only the RX that needs this information. Fixes: `2ea1cbac26` ("net: lan966x: Update FDMA to change MTU.") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20221021090711.3749009-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 21:04:51 -07:00
Jakub Kicinski	1b3d6ecd41	Merge branch 'bnxt_en-driver-updates' Michael Chan says: ==================== bnxt_en: Driver updates This patchset adds .get_module_eeprom_by_page() support and adds an NVRAM resize step to allow larger firmware images to be flashed to older firmware. ==================== Link: https://lore.kernel.org/r/1666334243-23866-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 19:24:20 -07:00
Vikas Gupta	4503422462	bnxt_en: check and resize NVRAM UPDATE entry before flashing Resize of the UPDATE entry is required if the image to be flashed is larger than the available space. Add this step, otherwise flashing larger firmware images by ethtool or devlink may fail. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 19:24:14 -07:00
Vikas Gupta	7ef3d3901b	bnxt_en: add .get_module_eeprom_by_page() support Add support for .get_module_eeprom_by_page() callback which implements generic solution for module`s eeprom access. v3: Add bnxt_get_module_status() to get a more specific extack error string. Return -EINVAL from bnxt_get_module_eeprom_by_page() when we don't want to fallback to old method. v2: Simplification suggested by Ido Schimmel Link: https://lore.kernel.org/netdev/YzVJ%2FvKJugoz15yV@shredder/ Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 19:24:14 -07:00
Michael Chan	84a911db83	bnxt_en: Update firmware interface to 1.10.2.118 The main changes are PTM timestamp support, CMIS EEPROM support, and asymmetric CoS queues support. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 19:24:14 -07:00
Jakub Kicinski	4fa86555d1	genetlink: piggy back on resv_op to default to a reject policy To keep backward compatibility we used to leave attribute parsing to the family if no policy is specified. This becomes tedious as we move to more strict validation. Families must define reject all policies if they don't want any attributes accepted. Piggy back on the resv_start_op field as the switchover point. AFAICT only ethtool has added new commands since the resv_start_op was defined, and it has per-op policies so this should be a no-op. Nonetheless the patch should still go into v6.1 for consistency. Link: https://lore.kernel.org/all/20221019125745.3f2e7659@kernel.org/ Link: https://lore.kernel.org/r/20221021193532.1511293-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 19:08:46 -07:00
Xin Long	9d9effca9d	ethtool: eeprom: fix null-deref on genl_info in dump The similar fix as commit `46cdedf2a0` ("ethtool: pse-pd: fix null-deref on genl_info in dump") is also needed for ethtool eeprom. Fixes: `c781ff12a2` ("ethtool: Allow network drivers to dump arbitrary EEPROM data") Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/5575919a2efc74cd9ad64021880afc3805c54166.1666362167.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 19:08:07 -07:00
Steven Rostedt (Google)	a970174d7a	x86/mm: Do not verify W^X at boot up Adding on the kernel command line "ftrace=function" triggered: CPA detected W^X violation: 8000000000000063 -> 0000000000000063 range: 0xffffffffc0013000 - 0xffffffffc0013fff PFN 10031b WARNING: CPU: 0 PID: 0 at arch/x86/mm/pat/set_memory.c:609 verify_rwx+0x61/0x6d Call Trace: __change_page_attr_set_clr+0x146/0x8a6 change_page_attr_set_clr+0x135/0x268 change_page_attr_clear.constprop.0+0x16/0x1c set_memory_x+0x2c/0x32 arch_ftrace_update_trampoline+0x218/0x2db ftrace_update_trampoline+0x16/0xa1 __register_ftrace_function+0x93/0xb2 ftrace_startup+0x21/0xf0 register_ftrace_function_nolock+0x26/0x40 register_ftrace_function+0x4e/0x143 function_trace_init+0x7d/0xc3 tracer_init+0x23/0x2c tracing_set_tracer+0x1d5/0x206 register_tracer+0x1c0/0x1e4 init_function_trace+0x90/0x96 early_trace_init+0x25c/0x352 start_kernel+0x424/0x6e4 x86_64_start_reservations+0x24/0x2a x86_64_start_kernel+0x8c/0x95 secondary_startup_64_no_verify+0xe0/0xeb This is because at boot up, kernel text is writable, and there's no reason to do tricks to updated it. But the verifier does not distinguish updates at boot up and at run time, and causes a warning at time of boot. Add a check for system_state == SYSTEM_BOOTING and allow it if that is the case. [ These SYSTEM_BOOTING special cases are all pretty horrid, but the x86 text_poke() code does some odd things at bootup, forcing this for now - Linus ] Link: https://lore.kernel.org/r/20221024112730.180916b3@gandalf.local.home Fixes: `652c5bf380` ("x86/mm: Refuse W^X violations") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-10-24 18:05:27 -07:00
Jakub Kicinski	96917bb3a3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/linux/net.h `a5ef058dc4` ("net: introduce and use custom sockopt socket flag") `e993ffe3da` ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 13:44:11 -07:00
Linus Torvalds	337a0a0b63	Including fixes from bpf. Current release - regressions: - eth: fman: re-expose location of the MAC address to userspace, apparently some udev scripts depended on the exact value Current release - new code bugs: - bpf: - wait for busy refill_work when destroying bpf memory allocator - allow bpf_user_ringbuf_drain() callbacks to return 1 - fix dispatcher patchable function entry to 5 bytes nop Previous releases - regressions: - net-memcg: avoid stalls when under memory pressure - tcp: fix indefinite deferral of RTO with SACK reneging - tipc: fix a null-ptr-deref in tipc_topsrv_accept - eth: macb: specify PHY PM management done by MAC - tcp: fix a signed-integer-overflow bug in tcp_add_backlog() Previous releases - always broken: - eth: amd-xgbe: SFP fixes and compatibility improvements Misc: - docs: netdev: offer performance feedback to contributors Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmNW024ACgkQMUZtbf5S IrvX7w//SP/zKZwgzC13zd2rrCP16TX2QvkHPmSLvcldQDXdCypmsoc5Vb8UNpkG jwAuy2pxqPy2oxTwTBQv9TNRT2oqEFOsFTK+w410whlL7g1wZ02aXU8qFhV2XumW o4gRtM+UISPUKFbOnawdK1XlrNdeLF3bjETvW2GP9zxCb0iqoQXtDDNKxv2B2iQA MSyTtzHA4n9GS7LKGtPgsP2Ose7h1Z+AjTIpQH1nvfEHJUf/wmxUdCK+fuwfeLjY PhmYaPG/333j1bfBk1Ms/nUYA5KRXlEj9A/7jDtxhxNEwaTNKyLB19a6oVxXxpSQ x/k+nZP1RColn5xeco5a1X9aHHQ46PJQ8wVAmxYDIeIA5XPMgShNmhAyjrq1ac+o 9vYeYpmnMGSTLdBMvGbWpynWHe7SddgF8LkbnYf2HLKbxe4bgkOnmxOUH4q9iinZ MfVSknjax4DP0C7X1kGgR6WyltWnkrahOdUkINsIUNxj0KxJa/eStpJIbJrfkxNV gHbOjB2/bF3SXENrS4A0IJCgsbO9YugN83Eyu0WDWQOw9wVgopzxOJx9R+H0wkVH XpGGP8qi1DZiTE3iQiq1LHj6f6kirFmtt9QFH5yzaqtKBaqXakHaXwUO4VtD+BI9 NPFKvFL6jrp8EAn0PTM/RrvhJZN+V0bFXiyiMe0TLx+aR0UMxGc= =dD6N -----END PGP SIGNATURE----- Merge tag 'net-6.1-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf. The net-memcg fix stands out, the rest is very run-off-the-mill. Maybe I'm biased. Current release - regressions: - eth: fman: re-expose location of the MAC address to userspace, apparently some udev scripts depended on the exact value Current release - new code bugs: - bpf: - wait for busy refill_work when destroying bpf memory allocator - allow bpf_user_ringbuf_drain() callbacks to return 1 - fix dispatcher patchable function entry to 5 bytes nop Previous releases - regressions: - net-memcg: avoid stalls when under memory pressure - tcp: fix indefinite deferral of RTO with SACK reneging - tipc: fix a null-ptr-deref in tipc_topsrv_accept - eth: macb: specify PHY PM management done by MAC - tcp: fix a signed-integer-overflow bug in tcp_add_backlog() Previous releases - always broken: - eth: amd-xgbe: SFP fixes and compatibility improvements Misc: - docs: netdev: offer performance feedback to contributors" * tag 'net-6.1-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits) net-memcg: avoid stalls when under memory pressure tcp: fix indefinite deferral of RTO with SACK reneging tcp: fix a signed-integer-overflow bug in tcp_add_backlog() net: lantiq_etop: don't free skb when returning NETDEV_TX_BUSY net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed docs: netdev: offer performance feedback to contributors kcm: annotate data-races around kcm->rx_wait kcm: annotate data-races around kcm->rx_psock net: fman: Use physical address for userspace interfaces net/mlx5e: Cleanup MACsec uninitialization routine atlantic: fix deadlock at aq_nic_stop nfp: only clean `sp_indiff` when application firmware is unloaded amd-xgbe: add the bit rate quirk for Molex cables amd-xgbe: fix the SFP compliance codes check for DAC cables amd-xgbe: enable PLL_CTL for fixed PHY modes only amd-xgbe: use enums for mailbox cmd and sub_cmds amd-xgbe: Yellow carp devices do not need rrc bpf: Use __llist_del_all() whenever possbile during memory draining bpf: Wait for busy refill_work when destroying bpf memory allocator MAINTAINERS: add keyword match on PTP ...	2022-10-24 12:43:51 -07:00
Linus Torvalds	f6602a97a1	Urgent RCU pull request for v6.1 This pull request contains a commit that fixes `bf95b2bc3e` ("rcu: Switch polled grace-period APIs to ->gp_seq_polled"), which could incorrectly leave interrupts enabled after an early-boot call to synchronize_rcu(). Such synchronize_rcu() calls must acquire leaf rcu_node locks in order to properly interact with polled grace periods, but the code did not take into account the possibility of synchronize_rcu() being invoked from the portion of the boot sequence during which interrupts are disabled. This commit therefore switches the lock acquisition and release from irq to irqsave/irqrestore. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmNS54gTHHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jI5pD/0aURo25lH/gxCxt1A5gi/UI+/rlz+O 7rV51ZXW3cAPY1O7EP/57D8sXtgcALxYyRZ2JVh1WH+ScP+Se9YENs+UBp5zwy75 NIOkJ2LRxMCeX+T7Vw8jtsYd4bsgSQ1q2IXiGHCrRxtUwXewYMxLEVx4Bd89P45F P49NJx66s5uXmKH2VV6UUcJT7yfyn8+USYxTaDmhhnEGE1frKBoYVqsqoaVfa0ON r8O50/06bj6VdSRooGGw9vUcuoUlsrmnnXDd2aITkWqtBkeBOA6S3Nx+LUiRRXaw m9e0s0yLulTlMsXVx/UAM+eBaXP3SDKK7wF1xXoyCqLdPKtZ4ABM0RQzXMsFBQQy xs04Ba/0CBe1wVv9BijXuR1WgmcRUtoqxAFKXR6bIwh7uPtFDINRO2hfi9vEEVSQ +4XbnoqsyHaul8tvBWV7O1Fo7+hNgFgJwAZq9I2xOJzh8wbVCc1+5E13K0Oktbq4 HERDAzeqA8Cr+6VxicrXt8/5xw7GsUJbYoXMpttB1WQBIps3/ayUxww1RyK9qegi DnpUSAUdd0bF1ZfVtkh7V59uk+BBsL9MKNowTRrT1nVWO2VcAbf9IVvONHBFlIw4 d2IKn4IbpGsQR9ZJUfo/6y7GkZYBLU46lknduW+n7vuP+7j2B3KkFQw7z71e1c7u LSEgmtPZS/c7SA== =o6hI -----END PGP SIGNATURE----- Merge tag 'rcu-urgent.2022.10.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "Fix a regression caused by commit `bf95b2bc3e` ("rcu: Switch polled grace-period APIs to ->gp_seq_polled"), which could incorrectly leave interrupts enabled after an early-boot call to synchronize_rcu(). Such synchronize_rcu() calls must acquire leaf rcu_node locks in order to properly interact with polled grace periods, but the code did not take into account the possibility of synchronize_rcu() being invoked from the portion of the boot sequence during which interrupts are disabled. This commit therefore switches the lock acquisition and release from irq to irqsave/irqrestore" * tag 'rcu-urgent.2022.10.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Keep synchronize_rcu() from enabling irqs in early boot	2022-10-24 12:33:30 -07:00
Linus Torvalds	2a91e897c0	linux-kselftest-kunit-fixes-6.1-rc3 This KUnit fixes update for Linux 6.1-rc3 consists of one single fix to update alloc_string_stream() callers to check for IS_ERR() instead of NULL to be in sync with alloc_string_stream() returning IS_ERR(). -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmNWwFwACgkQCwJExA0N QxzH8Q/+PdNr/qmF3eNWiz5u/UC+jBQOiNURb+DenFkS9i6iMp61HBSUTDFakR// NsPiOg4G8RUT7XI9kvhb22vC5b+RacHTwHrxh1TBd83j3WW2xsvCEOF228yiRM3m 8XM2pg+oLSkmcPmt9ym2hgiZBQ5rvzkyWmZKSZy+gzW9vylCA8BXDoVlTNbsiemh JbxFyPaq3ApSkNcLBPuKYWpz1PungGg6RybJ2ZlP0384Z7JJtpdpom/a1fEXTABi d4DCkvKntI6xyVkv62t3F2WVKnV78VJJX87RcwDV4qZ5W8V2MXXZwuPPrbeyC6tB TLkh4M1EdfSfr6zxkVOiPz8noCmiF9TzJauwjPV/tyOnDT8dDSpcty2iAgjyIdFg PSS1rNeFo6EkeouETkX+h8TI3VzzNTFBrWiAcqOlr1i5R2Pt20IqAQcUdhQVFg0C Y3BjycG83f7k2uYQivUSLSv3DYbrDIov29Ej9P4sZ9MZSO/DbkkhNb87A7IQQCI5 zRKRh8KZWUzjAHJge5M9a5wZdhnECpwoJNiBlhKxa1/TmrIS3Vc0KthC1URrgS4E IK/T3HGqlMlkV3Zi59R/bIBFCfz7Y6OJ0CjPt/Ey+xUpm8uUYwU4PgDfPbVWglzf 5KMEkEXmEWqZJviMbs+YF+FlqaIz692WTEYMQ/obrz2cvchPsUc= =rHRJ -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-kunit-fixes-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fixes from Shuah Khan: "One single fix to update alloc_string_stream() callers to check for IS_ERR() instead of NULL to be in sync with alloc_string_stream() returning an ERR_PTR()" * tag 'linux-kselftest-kunit-fixes-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: update NULL vs IS_ERR() tests	2022-10-24 12:19:34 -07:00
Linus Torvalds	21c92498e9	linux-kselftest-fixes-6.1-rc3 This Kselftest fixes update for Linux 6.1-rc3 consists of: - futex, intel_pstate, kexec build fixes - ftrace dynamic_events dependency check fix - memory-hotplug fix to remove redundant warning from test report -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmNWu/cACgkQCwJExA0N Qxz/nBAAhKn5l5G69tUcJIPT2XNNVnslJJKg8CBuFhfzw9U+BIJqbR9ZCqyzLAz2 ZVYTEbZtshbjgEJbe8KwDskK3H9F3RYkMEJ+EOH7N6GwcL9Ol7Ubzx6Yf9RM3V8S +Y8rUE3iZfC4grR4uQNK0edUb2mXZwMLB+ZYEetp9YlTS62vtpUUtaMWRAU4mvd5 2Ws+6YKV89kP4zoZB6SCJe9/ORvp0wvskoSCshw9BAaJrbJ6MkGToOZtFWzTCKNx x3he60mj4n5rE6PtxuWh1sAar1m/7ZF9DT9gGy6t99/zT0BUWTUQ8RECi2/IIe5m SqXJkWiNqO+ytFm17IqDPd/fk/XQlCdCRR423fkAjie6DGdZzu1KeJjg+Q9vPpeP qaCpFXwc3E4culDKDWBn2ZRQZFBoFbrCUEbqO0uZB5Ua9jB9FuTJhofanEmz3TUh XZlXUrPTVwE6SLur7f5x8ubj//0qGcbdM0qyYl6C9/i213axYdj3V6Xh1FStHJpK IvNcUqcxrSAPEKXmOY9hEZeAMHYevBwRXOqHSLNhrZbjGN0ASGfIhU5Ys1ffNS0c SuVsxfxzqR1Eu11JqRnSIcHj9hJBgroaBv+c5h2ivZD+MqZSrocFoRnnwjV3nW9S B+n6Whi1JuatLK4pfn8Ba+VoZ7hl9Bb3dxvALRdqDbqYBs5cp+I= =awHm -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-fixes-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: - futex, intel_pstate, kexec build fixes - ftrace dynamic_events dependency check fix - memory-hotplug fix to remove redundant warning from test report * tag 'linux-kselftest-fixes-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ftrace: fix dynamic_events dependency check selftests/memory-hotplug: Remove the redundant warning information selftests/kexec: fix build for ARCH=x86_64 selftests/intel_pstate: fix build for ARCH=x86_64 selftests/futex: fix build for clang	2022-10-24 12:10:55 -07:00
Linus Torvalds	74d5b415a5	Some pin control fixes for v6.1: - Fix typos in UART1 and MMC in the Ingenic driver. - A really well researched glitch bug fix to the Qualcomm driver that was tracked down and fixed by Dough Anderson from Chromium. Hats off for this one! - Revert two patches on the Xilinx ZynqMP driver: this needs a proper solution making use of firmware version information to adapt to different firmware releases. - Fix interrupt triggers in the Ocelot driver. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmNWPCQACgkQQRCzN7AZ XXOcbRAAyyqqUTiHc3v6OiU12WyNxTay3Fypq4hl2UKsi6BN+NIzliavDdSKYJ+y dR5h9YnXZ766OShGPoBgXAusY5Apx09IJkFePsc11IZcZwbsBRSmAwKkD9LHxb93 AIVHyn6zA1Ic2oD8ysgCC7kMAXUWibiSLgEqj3RSuwbD9lN2pYuUqZUK/Q7ydHpC 2yPRTlJig/9Ai79NlbFQXz8QUXKAR/niPPtaVtYEii+M87643kCHxKop52oWmF8V KV9WTxFqtW7TgNqunrBn0JJjxU/k0Dlbecj4Y6gDszg9V+7sR0u+LpZ/KVFakOI0 P8FcmQquAOG+jApGoe8XMwQw0xYuSTJxKZ3sBHzkj5k3f//MzB47UUMh34wbwo/N nt2lIzyV/Jlbhhj/1NjRMqECS00Ap+bo5BmDoyNFsPpQFqFfuyEmT27Mq/hJnmfm i896lGkvJvnNAofBzw0/QiB65iAhNt6xhy2L0VqyNRAeFfQ0K9ltf3wipg0g3KH4 rq0W2e7Fepl/vE+2qSyViDXbPG5GgQG3ljv/DE+8DdvMWYLLc2zMz0FenYj5k2bv XW9NtvEPRxVMCzTW5WEcDL21UQF7F5Vu8yojrjlv5XZ/QWafPjm/xpKhNBhNXuYf hmR4hL2k4YCBRqWmhn9B3S+7jotuTRF4E5CUIPkXIOjUWXKJE1Y= =w7Lm -----END PGP SIGNATURE----- Merge tag 'pinctrl-v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Fix typos in UART1 and MMC in the Ingenic driver - A really well researched glitch bug fix to the Qualcomm driver that was tracked down and fixed by Dough Anderson from Chromium. Hats off for this one! - Revert two patches on the Xilinx ZynqMP driver: this needs a proper solution making use of firmware version information to adapt to different firmware releases - Fix interrupt triggers in the Ocelot driver * tag 'pinctrl-v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: ocelot: Fix incorrect trigger of the interrupt. Revert "dt-bindings: pinctrl-zynqmp: Add output-enable configuration" Revert "pinctrl: pinctrl-zynqmp: Add support for output-enable and bias-high-impedance" pinctrl: qcom: Avoid glitching lines when we first mux to output pinctrl: Ingenic: JZ4755 bug fixes	2022-10-24 11:48:30 -07:00
Jakub Kicinski	720ca52bce	net-memcg: avoid stalls when under memory pressure As Shakeel explains the commit under Fixes had the unintended side-effect of no longer pre-loading the cached memory allowance. Even tho we previously dropped the first packet received when over memory limit - the consecutive ones would get thru by using the cache. The charging was happening in batches of 128kB, so we'd let in 128kB (truesize) worth of packets per one drop. After the change we no longer force charge, there will be no cache filling side effects. This causes significant drops and connection stalls for workloads which use a lot of page cache, since we can't reclaim page cache under GFP_NOWAIT. Some of the latency can be recovered by improving SACK reneg handling but nowhere near enough to get back to the pre-5.15 performance (the application I'm experimenting with still sees 5-10x worst latency). Apply the suggested workaround of using GFP_ATOMIC. We will now be more permissive than previously as we'll drop _no_ packets in softirq when under pressure. But I can't think of any good and simple way to address that within networking. Link: https://lore.kernel.org/all/20221012163300.795e7b86@kernel.org/ Suggested-by: Shakeel Butt <shakeelb@google.com> Fixes: `4b1327be9f` ("net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()") Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20221021160304.1362511-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 10:35:09 -07:00
Neal Cardwell	3d2af9cce3	tcp: fix indefinite deferral of RTO with SACK reneging This commit fixes a bug that can cause a TCP data sender to repeatedly defer RTOs when encountering SACK reneging. The bug is that when we're in fast recovery in a scenario with SACK reneging, every time we get an ACK we call tcp_check_sack_reneging() and it can note the apparent SACK reneging and rearm the RTO timer for srtt/2 into the future. In some SACK reneging scenarios that can happen repeatedly until the receive window fills up, at which point the sender can't send any more, the ACKs stop arriving, and the RTO fires at srtt/2 after the last ACK. But that can take far too long (O(10 secs)), since the connection is stuck in fast recovery with a low cwnd that cannot grow beyond ssthresh, even if more bandwidth is available. This fix changes the logic in tcp_check_sack_reneging() to only rearm the RTO timer if data is cumulatively ACKed, indicating forward progress. This avoids this kind of nearly infinite loop of RTO timer re-arming. In addition, this meets the goals of tcp_check_sack_reneging() in handling Windows TCP behavior that looks temporarily like SACK reneging but is not really. Many thanks to Jakub Kicinski and Neil Spring, who reported this issue and provided critical packet traces that enabled root-causing this issue. Also, many thanks to Jakub Kicinski for testing this fix. Fixes: `5ae344c949` ("tcp: reduce spurious retransmits due to transient SACK reneging") Reported-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Neil Spring <ntspring@fb.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Tested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20221021170821.1093930-1-ncardwell.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 10:34:48 -07:00
Jakub Kicinski	e28c44450b	bpf-for-netdev -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmNVkYkACgkQ6rmadz2v bTqzHw/+NYMwfLm5Ck+BK0+HiYU5VVLoG4jp8G7B3sJL/6nUDduajzoqa+nM19Xl +HEjbMza7CizmhkCRkzIs1VVtx8mtvKdTxbhvm77SU2+GBn+X1es+XhtFd4EOpok MINNHs+cOC/HlnPD/QbFgvxKiKkjyjWxInjUp6Y/mLMcKCn7l9KOkc07/la9Dj3j RI0gXCywq1pJaPuTCnt0/wcYLJvzn6QsZnKmmksQwt59GQqOd11HWid3rBWZhDp6 beEoHDIMGHROtu60vm4DB0p4l6tauZfeXyPCeu3Tx5ZSsypJIyU1iTdKiIUjG963 ilpy55nrX9bWxadB7LIKHyYfW3in4o+D1ZZaUvLIato/69CZJZ7Uc4kU1RF4Ay1F Df1Fmal2WeNAxxETPmQPvVeCePvQvwLHl4KNogdZZvd/67cyc1cDhnuTJp37iPak FALHaaw0VOrTdTvxsWym7yEbkhPbCHpPrKYFZFHgGrRTFk/GM2k38mM07lcLxFGw aKyooS+eoIZMEgtK5Hma2wpeIVSlkJiJk1d0K20OxdnIUyYEsMXmI+uV1gMxq/8z EHNi0+296xOoxy22I1Bd5Tu7fIeecHFN44q7YFmpGsB54UNLpFsP0vYUmYT/6hLI Y0KVZu4c3oQDX7ttifMvkeOCURDJBPrZx37bpNpNXF55fB5ehNk= =eV7W -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2022-10-23 We've added 7 non-merge commits during the last 18 day(s) which contain a total of 8 files changed, 69 insertions(+), 5 deletions(-). The main changes are: 1) Wait for busy refill_work when destroying bpf memory allocator, from Hou. 2) Allow bpf_user_ringbuf_drain() callbacks to return 1, from David. 3) Fix dispatcher patchable function entry to 5 bytes nop, from Jiri. 4) Prevent decl_tag from being referenced in func_proto, from Stanislav. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Use __llist_del_all() whenever possbile during memory draining bpf: Wait for busy refill_work when destroying bpf memory allocator bpf: Fix dispatcher patchable function entry to 5 bytes nop bpf: prevent decl_tag from being referenced in func_proto selftests/bpf: Add reproducer for decl_tag in func_proto return type selftests/bpf: Make bpf_user_ringbuf_drain() selftest callback return 1 bpf: Allow bpf_user_ringbuf_drain() callbacks to return 1 ==================== Link: https://lore.kernel.org/r/20221023192244.81137-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 10:32:01 -07:00
Dmitry Osipenko	e9cf4d9b9a	ACPI: video: Fix missing native backlight on Chromebooks Chromebooks don't have backlight in ACPI table, they suppose to use native backlight in this case. Check presence of the CrOS embedded controller ACPI device and prefer the native backlight if EC found. Suggested-by: Hans de Goede <hdegoede@redhat.com> Fixes: `2600bfa3df` ("ACPI: video: Add acpi_video_backlight_use_native() helper") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20221024141210.67784-1-dmitry.osipenko@collabora.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2022-10-24 17:02:13 +02:00
David S. Miller	86d6f77a3c	Merge branch 'ptp-ocxp-Oroli-ART-CARD' Vadim Fedorenko says: ==================== ptp: ocp: add support for Orolia ART-CARD Orolia company created alternative open source TimeCard. The hardware of the card provides similar to OCP's card functions, that's why the support is added to current driver. The first patch in the series changes the way to store information about serial ports and is more like preparation. The patches 2 to 4 introduces actual hardware support. The last patch removes fallback from devlink flashing interface to protect against flashing wrong image. This became actual now as we have 2 different boards supported and wrong image can ruin hardware easily. v2: Address comments from Jonathan Lemon v3: Fix issue reported by kernel test robot <lkp@intel.com> v4: Fix clang build issue v5: Fix warnings and per-patch build errors v6: Fix more style issues ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:10:40 +01:00
Vadim Fedorenko	c1fd463d57	ptp: ocp: remove flash image header check fallback Previously there was a fallback mode to flash firmware image without proper header. But now we have different supported vendors and flashing wrong image could destroy the hardware. Remove fallback mode and force header check. Both vendors have published firmware images with headers. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:10:40 +01:00
Vadim Fedorenko	ee6439aaad	ptp: ocp: expose config and temperature for ART card Orolia card has disciplining configuration and temperature table stored in EEPROM. This patch exposes them as binary attributes to have read and write access. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Co-developed-by: Charles Parent <charles.parent@orolia2s.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:10:40 +01:00
Vadim Fedorenko	9c44a7ac17	ptp: ocp: add serial port of mRO50 MAC on ART card ART card provides interface to access to serial port of miniature atomic clock found on the card. Add support for this device and configure it during init phase. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Co-developed-by: Charles Parent <charles.parent@orolia2s.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:10:40 +01:00
Vadim Fedorenko	69dbe1079c	ptp: ocp: add Orolia timecard support This brings in the Orolia timecard support from the GitHub repository. The card uses different drivers to provide access to i2c EEPROM and firmware SPI flash. And it also has a bit different EEPROM map, but other parts of the code are the same and could be reused. Co-developed-by: Charles Parent <charles.parent@orolia2s.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:10:40 +01:00
Vadim Fedorenko	895ac5a51f	ptp: ocp: upgrade serial line information Introduce structure to hold serial port line number and the baud rate it supports. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:10:40 +01:00
Lu Wei	ec791d8149	tcp: fix a signed-integer-overflow bug in tcp_add_backlog() The type of sk_rcvbuf and sk_sndbuf in struct sock is int, and in tcp_add_backlog(), the variable limit is caculated by adding sk_rcvbuf, sk_sndbuf and 64 * 1024, it may exceed the max value of int and overflow. This patch reduces the limit budget by halving the sndbuf to solve this issue since ACK packets are much smaller than the payload. Fixes: `c9c3321257` ("tcp: add tcp_add_backlog()") Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:04:25 +01:00
Yunsheng Lin	4727bab4e9	net: skb: move skb_pp_recycle() to skbuff.c skb_pp_recycle() is only used by skb_free_head() in skbuff.c, so move it to skbuff.c. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:03:43 +01:00
Zhang Changzhong	9c1eaa27ec	net: lantiq_etop: don't free skb when returning NETDEV_TX_BUSY The ndo_start_xmit() method must not free skb when returning NETDEV_TX_BUSY, since caller is going to requeue freed skb. Fixes: `504d4721ee` ("MIPS: Lantiq: Add ethernet driver") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:02:18 +01:00
Nick Child	127b7218bf	ibmveth: Always stop tx queues during close netif_stop_all_queues must be called before calling H_FREE_LOGICAL_LAN. As a result, we can remove the pool_config field from the ibmveth adapter structure. Some device configuration changes call ibmveth_close in order to free the current resources held by the device. These functions then make their changes and call ibmveth_open to reallocate and reserve resources for the device. Prior to this commit, the flag pool_config was used to tell ibmveth_close that it should not halt the transmit queue. pool_config was introduced in commit `860f242eb5` ("[PATCH] ibmveth change buffer pools dynamically") to avoid interrupting the tx flow when making rx config changes. Since then, other commits adopted this approach, even if making tx config changes. The issue with this approach was that the hypervisor freed all of the devices control structures after the hcall H_FREE_LOGICAL_LAN was performed but the transmit queues were never stopped. So the higher layers in the network stack would continue transmission but any H_SEND_LOGICAL_LAN hcall would fail with H_PARAMETER until the hypervisor's structures for the device were allocated with the H_REGISTER_LOGICAL_LAN hcall in ibmveth_open. This resulted in no real networking harm but did cause several of these error messages to be logged: "h_send_logical_lan failed with rc=-4" So, instead of trying to keep the transmit queues alive during network configuration changes, just stop the queues, make necessary changes then restart the queues. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 13:01:37 +01:00
xu xin	233baf9a1b	net: remove useless parameter of __sock_cmsg_send The parameter 'msg' has never been used by __sock_cmsg_send, so we can remove it safely. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 12:43:46 +01:00
Wei Fang	350749b909	net: fec: Add support for periodic output signal of PPS This patch adds the support for configuring periodic output signal of PPS. So the PPS can be output at a specified time and period. For developers or testers, they can use the command "echo <channel> <start.sec> <start.nsec> <period.sec> <period. nsec> > /sys/class/ptp/ptp0/period" to specify time and period to output PPS signal. Notice that, the channel can only be set to 0. In addtion, the start time must larger than the current PTP clock time. So users can use the command "phc_ctl /dev/ptp0 -- get" to get the current PTP clock time before. Signed-off-by: Wei Fang <wei.fang@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 12:40:58 +01:00
Zhengchao Shao	d266935ac4	net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed When the ops_init() interface is invoked to initialize the net, but ops->init() fails, data is released. However, the ptr pointer in net->gen is invalid. In this case, when nfqnl_nf_hook_drop() is invoked to release the net, invalid address access occurs. The process is as follows: setup_net() ops_init() data = kzalloc(...) ---> alloc "data" net_assign_generic() ---> assign "date" to ptr in net->gen ... ops->init() ---> failed ... kfree(data); ---> ptr in net->gen is invalid ... ops_exit_list() ... nfqnl_nf_hook_drop() *q = nfnl_queue_pernet(net) ---> q is invalid The following is the Call Trace information: BUG: KASAN: use-after-free in nfqnl_nf_hook_drop+0x264/0x280 Read of size 8 at addr ffff88810396b240 by task ip/15855 Call Trace: <TASK> dump_stack_lvl+0x8e/0xd1 print_report+0x155/0x454 kasan_report+0xba/0x1f0 nfqnl_nf_hook_drop+0x264/0x280 nf_queue_nf_hook_drop+0x8b/0x1b0 __nf_unregister_net_hook+0x1ae/0x5a0 nf_unregister_net_hooks+0xde/0x130 ops_exit_list+0xb0/0x170 setup_net+0x7ac/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> Allocated by task 15855: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0xa1/0xb0 __kmalloc+0x49/0xb0 ops_init+0xe7/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 15855: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x155/0x1b0 slab_free_freelist_hook+0x11b/0x220 __kmem_cache_free+0xa4/0x360 ops_init+0xb9/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: `f875bae065` ("net: Automatically allocate per namespace data.") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 12:40:06 +01:00
Eric Dumazet	0cafd77dcd	net: add a refcount tracker for kernel sockets Commit `ffa84b5ffb` ("net: add netns refcount tracker to struct sock") added a tracker to sockets, but did not track kernel sockets. We still have syzbot reports hinting about netns being destroyed while some kernel TCP sockets had not been dismantled. This patch tracks kernel sockets, and adds a ref_tracker_dir_print() call to net_free() right before the netns is freed. Normally, each layer is responsible for properly releasing its kernel sockets before last call to net_free(). This debugging facility is enabled with CONFIG_NET_NS_REFCNT_TRACKER=y Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 11:04:43 +01:00
Jakub Kicinski	c5884ef477	docs: netdev: offer performance feedback to contributors Some of us gotten used to producing large quantities of peer feedback at work, every 3 or 6 months. Extending the same courtesy to community members seems like a logical step. It may be hard for some folks to get validation of how important their work is internally, especially at smaller companies which don't employ many kernel experts. The concept of "peer feedback" may be a hyperscaler / silicon valley thing so YMMV. Hopefully we can build more context as we go. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 11:03:44 +01:00
David S. Miller	931ae86f8b	Merge branch 'kcm-data-races' Eric Dumazet says: ==================== kcm: annotate data-races This series address two different syzbot reports for KCM. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:57:56 +01:00
Eric Dumazet	0c745b5141	kcm: annotate data-races around kcm->rx_wait kcm->rx_psock can be read locklessly in kcm_rfree(). Annotate the read and writes accordingly. syzbot reported: BUG: KCSAN: data-race in kcm_rcv_strparser / kcm_rfree write to 0xffff88810784e3d0 of 1 bytes by task 1823 on cpu 1: reserve_rx_kcm net/kcm/kcmsock.c:283 [inline] kcm_rcv_strparser+0x250/0x3a0 net/kcm/kcmsock.c:363 __strp_recv+0x64c/0xd20 net/strparser/strparser.c:301 strp_recv+0x6d/0x80 net/strparser/strparser.c:335 tcp_read_sock+0x13e/0x5a0 net/ipv4/tcp.c:1703 strp_read_sock net/strparser/strparser.c:358 [inline] do_strp_work net/strparser/strparser.c:406 [inline] strp_work+0xe8/0x180 net/strparser/strparser.c:415 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 read to 0xffff88810784e3d0 of 1 bytes by task 17869 on cpu 0: kcm_rfree+0x121/0x220 net/kcm/kcmsock.c:181 skb_release_head_state+0x8e/0x160 net/core/skbuff.c:841 skb_release_all net/core/skbuff.c:852 [inline] __kfree_skb net/core/skbuff.c:868 [inline] kfree_skb_reason+0x5c/0x260 net/core/skbuff.c:891 kfree_skb include/linux/skbuff.h:1216 [inline] kcm_recvmsg+0x226/0x2b0 net/kcm/kcmsock.c:1161 ____sys_recvmsg+0x16c/0x2e0 ___sys_recvmsg net/socket.c:2743 [inline] do_recvmmsg+0x2f1/0x710 net/socket.c:2837 __sys_recvmmsg net/socket.c:2916 [inline] __do_sys_recvmmsg net/socket.c:2939 [inline] __se_sys_recvmmsg net/socket.c:2932 [inline] __x64_sys_recvmmsg+0xde/0x160 net/socket.c:2932 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 17869 Comm: syz-executor.2 Not tainted 6.1.0-rc1-syzkaller-00010-gbb1a1146467a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Fixes: `ab7ac4eb98` ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:57:55 +01:00
Eric Dumazet	15e4dabda1	kcm: annotate data-races around kcm->rx_psock kcm->rx_psock can be read locklessly in kcm_rfree(). Annotate the read and writes accordingly. We do the same for kcm->rx_wait in the following patch. syzbot reported: BUG: KCSAN: data-race in kcm_rfree / unreserve_rx_kcm write to 0xffff888123d827b8 of 8 bytes by task 2758 on cpu 1: unreserve_rx_kcm+0x72/0x1f0 net/kcm/kcmsock.c:313 kcm_rcv_strparser+0x2b5/0x3a0 net/kcm/kcmsock.c:373 __strp_recv+0x64c/0xd20 net/strparser/strparser.c:301 strp_recv+0x6d/0x80 net/strparser/strparser.c:335 tcp_read_sock+0x13e/0x5a0 net/ipv4/tcp.c:1703 strp_read_sock net/strparser/strparser.c:358 [inline] do_strp_work net/strparser/strparser.c:406 [inline] strp_work+0xe8/0x180 net/strparser/strparser.c:415 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 read to 0xffff888123d827b8 of 8 bytes by task 5859 on cpu 0: kcm_rfree+0x14c/0x220 net/kcm/kcmsock.c:181 skb_release_head_state+0x8e/0x160 net/core/skbuff.c:841 skb_release_all net/core/skbuff.c:852 [inline] __kfree_skb net/core/skbuff.c:868 [inline] kfree_skb_reason+0x5c/0x260 net/core/skbuff.c:891 kfree_skb include/linux/skbuff.h:1216 [inline] kcm_recvmsg+0x226/0x2b0 net/kcm/kcmsock.c:1161 ____sys_recvmsg+0x16c/0x2e0 ___sys_recvmsg net/socket.c:2743 [inline] do_recvmmsg+0x2f1/0x710 net/socket.c:2837 __sys_recvmmsg net/socket.c:2916 [inline] __do_sys_recvmmsg net/socket.c:2939 [inline] __se_sys_recvmmsg net/socket.c:2932 [inline] __x64_sys_recvmmsg+0xde/0x160 net/socket.c:2932 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffff88812971ce00 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 5859 Comm: syz-executor.3 Not tainted 6.0.0-syzkaller-12189-g19d17ab7c68b-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Fixes: `ab7ac4eb98` ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:57:55 +01:00
David S. Miller	b29e0dece4	Merge branch 'udp-false-sharing' Paolo Abeni says: ==================== udp: avoid false sharing on receive Under high UDP load, the BH processing and the user-space receiver can run on different cores. The UDP implementation does a lot of effort to avoid false sharing in the receive path, but recent changes to the struct sock layout moved the sk_forward_alloc and the sk_rcvbuf fields on the same cacheline: /* --- cacheline 4 boundary (256 bytes) --- / struct sk_buff tail; } sk_backlog; int sk_forward_alloc; unsigned int sk_reserved_mem; unsigned int sk_ll_usec; unsigned int sk_napi_id; int sk_rcvbuf; sk_forward_alloc is updated by the BH, while sk_rcvbuf is accessed by udp_recvmsg(), causing false sharing. A possible solution would be to re-order the struct sock fields to avoid the false sharing. Such change is subject to being invalidated by future changes and could have negative side effects on other workload. Instead this series uses a different approach, touching only the UDP socket layout. The first patch generalizes the custom setsockopt infrastructure, to allow UDP tracking the buffer size, and the second patch addresses the issue, copying the relevant buffer information into an already hot cacheline. Overall the above gives a 10% peek throughput increase under UDP flood. v1 -> v2: - introduce and use a common helper to initialize the UDP v4/v6 sockets (Kuniyuki) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:52:50 +01:00
Paolo Abeni	8a3854c7b8	udp: track the forward memory release threshold in an hot cacheline When the receiver process and the BH runs on different cores, udp_rmem_release() experience a cache miss while accessing sk_rcvbuf, as the latter shares the same cacheline with sk_forward_alloc, written by the BH. With this patch, UDP tracks the rcvbuf value and its update via custom SOL_SOCKET socket options, and copies the forward memory threshold value used by udp_rmem_release() in a different cacheline, already accessed by the above function and uncontended. Since the UDP socket init operation grown a bit, factor out the common code between v4 and v6 in a shared helper. Overall the above give a 10% peek throughput increase under UDP flood. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:52:50 +01:00
Paolo Abeni	a5ef058dc4	net: introduce and use custom sockopt socket flag We will soon introduce custom setsockopt for UDP sockets, too. Instead of doing even more complex arbitrary checks inside sock_use_custom_sol_socket(), add a new socket flag and set it for the relevant socket types (currently only MPTCP). Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:52:50 +01:00
Sean Anderson	c99f0f7e68	net: fman: Use physical address for userspace interfaces Before `262f2b782e` ("net: fman: Map the base address once"), the physical address of the MAC was exposed to userspace in two places: via sysfs and via SIOCGIFMAP. While this is not best practice, it is an external ABI which is in use by userspace software. The aforementioned commit inadvertently modified these addresses and made them virtual. This constitutes and ABI break. Additionally, it leaks the kernel's memory layout to userspace. Partially revert that commit, reintroducing the resource back into struct mac_device, while keeping the intended changes (the rework of the address mapping). Fixes: `262f2b782e` ("net: fman: Map the base address once") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sean Anderson <sean.anderson@seco.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:45:14 +01:00
David S. Miller	ea5ed0f00b	Merge branch 'net-800Gbps-support' Petr Machata says: ==================== net: Add support for 800Gbps speed Amit Cohen <amcohen@nvidia.com> writes: The next Nvidia Spectrum ASIC will support 800Gbps speed. The IEEE 802 LAN/MAN Standards Committee already published standards for 800Gbps, see the last update [1] and the list of approved changes [2]. As first phase, add support for 800Gbps over 8 lanes (100Gbps/lane). In the future 800Gbps over 4 lanes can be supported also. Extend ethtool to support the relevant PMDs and extend mlxsw and bonding drivers to support 800Gbps. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:43:39 +01:00
Amit Cohen	41305d3781	bonding: 3ad: Add support for 800G speed Add support for 800Gbps speed to allow using 3ad mode with 800G devices. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:43:39 +01:00
Amit Cohen	cceef209dd	mlxsw: Add support for 800Gbps link modes Add support for 800Gbps speed, link modes of 100Gbps per lane. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:43:39 +01:00
Amit Cohen	404c76783f	ethtool: Add support for 800Gbps link modes Add support for 800Gbps speed, link modes of 100Gbps per lane. As mentioned in slide 21 in IEEE documentation [1], all adopted 802.3df copper and optical PMDs baselines using 100G/lane will be supported. Add the relevant PMDs which are mentioned in slide 5 in IEEE documentation [1] and were approved on 10-2022 [2]: BP - KR8 Cu Cable - CR8 MMF 50m - VR8 MMF 100m - SR8 SMF 500m - DR8 SMF 2km - DR8-2 [1]: https://www.ieee802.org/3/df/public/22_10/22_1004/shrikhande_3df_01a_221004.pdf [2]: https://ieee802.org/3/df/KeyMotions_3df_221005.pdf Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:43:39 +01:00
Gayatri Kammela	555a68dd68	platform/x86/intel: pmc/core: Add Raptor Lake support to pmc core driver Add Raptor Lake client parts (both RPL and RPL_S) support to pmc core driver. Raptor Lake client parts reuse all the Alder Lake PCH IPs. Cc: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Box <david.e.box@intel.com> Acked-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@linux.intel.com> Link: https://lore.kernel.org/r/20220912233307.409954-2-gayatri.kammela@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2022-10-24 11:39:27 +02:00
David S. Miller	c1aa0a9078	Merge branch 'sparx5-IS2-VCAP' Steen Hegelund says: ==================== Add support for Sparx5 IS2 VCAP This provides initial support for the Sparx5 VCAP functionality via the 'tc' traffic control userspace tool and its flower filter. Overview: ========= The supported flower filter keys and actions are: - source and destination MAC address keys - trap action - pass action The supported Sparx5 VCAPs are: IS2 (see below for more info) The VCAP (Versatile Content-Aware Processor) feature is essentially a TCAM with rules consisting of: - Programmable key fields - Programmable action fields - A counter (which may be only one bit wide) Besides this each VCAP has: - A number of independent lookups - A keyset configuration typically per port per lookup VCAPs are used in many of the TSN features such as PSFP, PTP, FRER as well as the general shaping, policing and access control, so it is an important building block for these advanced features. Functionality: ============== When a frame is passed to a VCAP the VCAP will generate a set of keys (keyset) based on the traffic type. If there is a rule created with this keyset in the VCAP and the values of the keys matches the values in the keyset of the frame, the rule is said to match and the actions in the rule will be executed and the rule counter will be incremented. No more rules will be examined in this VCAP lookup. If there is no match in the current lookup the frame will be matched against the next lookup (some VCAPs do the processing of the lookups in parallel). The Sparx5 SoC has 6 different VCAP types: - IS0: Ingress Stage 0 (AKA CLM) mostly handles classification - IS2: Ingress Stage 2 mostly handles access control - IP6PFX: IPv6 prefix: Provides tables for IPV6 address management - LPM: Longest Path Match for IP guarding and routing - ES0: Egress Stage 0 is mostly used for CPU copying and multicast handling - ES2: Egress Stage 2 is known as the rewriter and mostly updates tags Design: ======= The VCAP implementation provides switchcore independent handling of rules and supports: - Creating and deleting rules - Updating and getting rules The platform specific API implementation as well as the platform specific model of the VCAP instances are attached to the VCAP API and a client can then access rules via the API in a platform independent way, with the limitations that each VCAP has in terms of is supported keys and actions. The VCAP model is generated from information delivered by the designers of the VCAP hardware. Here is an illustration of this: +------------------+ +------------------+ \| TC flower filter \| \| PTP client \| \| for Sparx5 \| \| for Sparx5 \| +-------------\----+ +---------/--------+ \ / \ / \ / \ / \ / +----v--------v----+ \| VCAP API \| +---------\|--------+ \| \| \| \| +---------v--------+ \| VCAP control \| \| instance \| +----/--------\|----+ / \| / \| / \| / \| +--------------v---+ +----v-------------+ \| Sparx5 VCAP \| \| Sparx5 VCAP API \| \| model \| \| Implementation \| +------------------+ +---------\|--------+ \| \| \| \| +---------v--------+ \| Sparx5 VCAP HW \| +------------------+ Delivery: ========= For now only the IS2 is supported but later the IS0, ES0 and ES2 will be added. There are currently no plans to support the IP6PFX and the LPM VCAPs. The IS2 VCAP has 4 lookups and they are accessible with a TC chain id: - chain 8000000: IS2 Lookup 0 - chain 8100000: IS2 Lookup 1 - chain 8200000: IS2 Lookup 2 - chain 8300000: IS2 Lookup 3 These lookups are executed in parallel by the IS2 VCAP but the actions are executed in series (the datasheet explains what happens if actions overlap). The functionality of TC flower as well as TC matchall filters will be expanded in later submissions as well as the number of VCAPs supported. This is current plan: - add support for more TC flower filter keys and extend the Sparx5 port keyset configuration - support for TC protocol all - debugfs support for inspecting rules - TC flower filter statistics - Sparx5 IS0 VCAP support and more TC keys and actions to support this - add TC policer and drop action support (depends on the Sparx5 QoS support upstreamed separately) - Sparx5 ES0 VCAP support and more TC actions to support this - TC flower template support - TC matchall filter support for mirroring and policing ports - TC flower filter mirror action support - Sparx5 ES2 VCAP support The LAN966x switchcore will also be updated to use the VCAP API as well as future Microchip switches. The LAN966x has 3 VCAPS (IS1, IS2 and ES0) and a slightly different keyset and actionset portfolio than Sparx5. Version History: ================ v3 Moved the sparx5_tc_flower_set_exterr function to the VCAP API and renamed it. Moved the sparx5_netbytes_copy function to the VCAP_API and renamed it (thanks Horatiu Vultur). Fixed indentation in the vcap_write_rule function. Added a comment mentioning the typegroup table terminator in the vcap_iter_skip_tg function. v2 Made the KUNIT test model a superset of the real model to fix a kernel robot build error. v1 Initial version ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-24 10:37:43 +01:00

... 3 4 5 6 7 ...

1136916 Commits All Branches Search

1136916 Commits

All Branches