-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmJGFgQACgkQSfxwEqXe
A64CbBAAmi1I+wOVtO8BAC/Two4yH9s9WC0nBc7c70ZIhVnNF+hi2KmJuVGnj8Id
Lj3yIVKDqfZuoqqqOTlDKwPPsNLHPX2h/XrhrYju/nJBY6Eh8cSbOHRA26Xnziq5
cGfOW85eQpKyxDTWH3R4SDs7ng+omPYtn54tDnUsN/obJYiSsX7yT7IFFJgCtRpA
9tboSO9Wb6u9+wR1TnxvLYDEXwrUjmz2UKNlKlMlgeAVCvmnfvzD47ez/vo9B44+
IOPa8QM5PCHIxBvWDyVlMHZs6lK6fDZF4TWAwe5etJda972eQWDb9mpQZ2ft9INX
9gBN6g7CLCSb9047ItaPqkgzdhRqnxww8Pd1ccxf/6tW/5+kVedaA7Eypy1UcuA/
WrQIqx6lh+Qx4YcWyO8ULUiky64zad7pahtaFXzjdEGjQuylqjPHCxxCmiltpSZ9
PTbR5r+2wEdVlm4I2u3cIVSLy+lgS5sgF5YA2UKOB32fqlB3y2Cykq4FfOiJZK6Z
9VdQqqhWs3zE5d6olfFiNewDLyKTfnJ1FBOOxMNLhOKEL0qDFcjd9UXmrkpZHdv2
yz4Ps4k+d3gqGpcIue97zEBA7mU9UyP9rzX6pMEMTb+i8WpZa8rrdxak1AmJBwfI
FINjZl4fe6ZmDPBTW9FZB2ibjRAt7wtzEsQjNI7sfT9hKnGZlYI=
=NfL4
-----END PGP SIGNATURE-----
Merge tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- If a hardware random number generator passes a sufficiently large
chunk of entropy to random.c during early boot, we now skip the
"fast_init" business and let it initialize the RNG.
This makes CONFIG_RANDOM_TRUST_BOOTLOADER=y actually useful.
- We already have the command line `random.trust_cpu=0/1` option for
RDRAND, which let distros enable CONFIG_RANDOM_TRUST_CPU=y while
placating concerns of more paranoid users.
Now we add `random.trust_bootloader=0/1` so that distros can
similarly enable CONFIG_RANDOM_TRUST_BOOTLOADER=y.
- Re-add a comment that got removed by accident in the recent revert.
- Add the spec-compliant ACPI CID for vmgenid, which Microsoft added to
the vmgenid spec at Ard's request during earlier review.
- Restore build-time randomness via the latent entropy plugin, which
was lost when we transitioned to using a hash function.
* tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: mix build-time latent entropy into pool at init
virt: vmgenid: recognize new CID added by Hyper-V
random: re-add removed comment about get_random_{u32,u64} reseeding
random: treat bootloader trust toggle the same way as cpu trust toggle
random: skip fast_init if hwrng provides large chunk of entropy
- Add per core DVFS support for QCom SoC (Bjorn Andersson), convert
to yaml binding (Manivannan Sadhasivam) and various other fixes
to the QCom drivers (Luca Weiss).
- Add OPP table for imx7s SoC (Denys Drozdov) and minor fixes (Stefan
Agner).
- Fix CPPC driver's freq/performance conversions (Pierre Gondois).
- Minor generic cleanups (Yury Norov).
- Introduce opp-microwatt property to the OPP core, bindings, etc
(Lukasz Luba).
- Convert DT bindings to schema format and various related fixes
(Yassine Oudjana).
- Expose OPP's OF node in debugfs (Viresh Kumar).
- Add Intel uncore frequency scaling documentation file to its
MAINTAINERS entry (Srinivas Pandruvada).
- Clean up the AMD P-state driver documentation (Jan Engelhardt).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmJDPHMSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxjYYQAJfa7TO+VtoB+0tSKKT0iVI52HVPyvr5
BTlDgCKmpt9/mfwfsa+9+RJ0k78pT8zWMl27VL1fAkF9SXbY13kbFIVyfWDoigiK
uAH8hoWHp50PvyZtLHoiIzp5hBl/CLer/66Ys/jo32B5ui2iA50hCJ6W5KIpynT2
yzZBsyGNfak78Y6S6TyeMHR+8ytmfpsuDGMmcs8DVb8Fn6tlcBIfskZcYrimp1Ss
IoiDvRvcavY0kZ5tBuzgikSzR4SJ8bx0hPzKnV/n6HCe2CAIg9o5HHnqPT8Su1ar
dqI/ghkO+5JvHL1oxlperCDjfXREuYbpx7IkatN8zR9/Xkho5iabW+Pn/E0ZN/e9
boLzAWaHsJxnJkhcwKzOWeYhsbzNuNY+n9LjHHiUH2kgMCUsvPBPIcV5LgUU8oeX
1TXvl36OI883rgymBmi6nBc7+v/lE8qAxFds4M5rsLR173rcSewpiov/uOPgzGhe
JyerIpuOU5EWbvJDUCLLgHVLlK+ciu/98wiuWQCWwwiNZBZGEMbxA8Die/0Pq3T5
Nvhh96EDqF6QPACAKTHTSMFDmHC9J78guFvqBSI3XiGDSTlTC+l3twNOT3mU1kUz
s9pEwWROBnSg5QsfB/logsjV4q6RcEdSaKSW3bCZ/lRO18/daHFoLH/NlQif626O
Jsx3k+vgd2Hv
=rsb/
-----END PGP SIGNATURE-----
Merge tag 'pm-5.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These update ARM cpufreq drivers, the OPP (Operating Performance
Points) library and the power management documentation.
Specifics:
- Add per core DVFS support for QCom SoC (Bjorn Andersson), convert
to yaml binding (Manivannan Sadhasivam) and various other fixes to
the QCom drivers (Luca Weiss).
- Add OPP table for imx7s SoC (Denys Drozdov) and minor fixes (Stefan
Agner).
- Fix CPPC driver's freq/performance conversions (Pierre Gondois).
- Minor generic cleanups (Yury Norov).
- Introduce opp-microwatt property to the OPP core, bindings, etc
(Lukasz Luba).
- Convert DT bindings to schema format and various related fixes
(Yassine Oudjana).
- Expose OPP's OF node in debugfs (Viresh Kumar).
- Add Intel uncore frequency scaling documentation file to its
MAINTAINERS entry (Srinivas Pandruvada).
- Clean up the AMD P-state driver documentation (Jan Engelhardt)"
* tag 'pm-5.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
Documentation: amd-pstate: grammar and sentence structure updates
dt-bindings: cpufreq: cpufreq-qcom-hw: Convert to YAML bindings
dt-bindings: dvfs: Use MediaTek CPUFREQ HW as an example
Documentation: EM: Describe new registration method using DT
OPP: Add support of "opp-microwatt" for EM registration
PM: EM: add macro to set .active_power() callback conditionally
OPP: Add "opp-microwatt" supporting code
dt-bindings: opp: Add "opp-microwatt" entry in the OPP
MAINTAINERS: Add additional file to uncore frequency control
cpufreq: blocklist Qualcomm sc8280xp and sa8540p in cpufreq-dt-platdev
cpufreq: qcom-hw: Add support for per-core-dcvs
dt-bindings: power: avs: qcom,cpr: Convert to DT schema
arm64: dts: qcom: qcs404: Rename CPU and CPR OPP tables
arm64: dts: qcom: msm8996: Rename cluster OPP tables
dt-bindings: opp: Convert qcom-nvmem-cpufreq to DT schema
dt-bindings: opp: qcom-opp: Convert to DT schema
arm64: dts: qcom: msm8996-mtp: Add msm8996 compatible
dt-bindings: arm: qcom: Add msm8996 and apq8096 compatibles
opp: Expose of-node's name in debugfs
cpufreq: CPPC: Fix performance/frequency conversion
...
Merge additional power management documentation udates for 5.18-rc1:
- Add Intel uncore frequency scaling documentation file to its
MAINTAINERS entry (Srinivas Pandruvada).
- Clean up the AMD P-state driver documentation (Jan Engelhardt).
* pm-docs:
Documentation: amd-pstate: grammar and sentence structure updates
MAINTAINERS: Add additional file to uncore frequency control
Highlights:
- new drivers:
- AMD Host System Management Port (HSMP)
- Intel Software Defined Silicon
- removed drivers (functionality folded into other drivers):
- intel_cht_int33fe_microb
- surface3_button
- amd-pmc:
- s2idle bug-fixes
- Support for AMD Spill to DRAM STB feature
- hp-wmi:
- Fix SW_TABLET_MODE detection method (and other fixes)
- Support omen thermal profile policy v1
- serial-multi-instantiate:
- Add SPI device support
- Add support for CS35L41 amplifiers used in new laptops
- think-lmi:
- syfs-class-firmware-attributes Certificate authentication support
- thinkpad_acpi:
- Fixes + quirks
- Add platform_profile support on AMD based ThinkPads
- x86-android-tablets
- Improve Asus ME176C / TF103C support
- Support Nextbook Ares 8, Lenovo Tab 2 830 and 1050 tablets
- Lots of various other small fixes and hardware-id additions
The following is an automated git shortlog grouped by driver:
ACPI / scan:
- Create platform device for CS35L41
ACPI / x86:
- Add support for LPS0 callback handler
ALSA:
- hda/realtek: Add support for HP Laptops
Add AMD system management interface:
- Add AMD system management interface
Add Intel Software Defined Silicon driver:
- Add Intel Software Defined Silicon driver
Documentation:
- syfs-class-firmware-attributes: Lenovo Certificate support
- Add x86/amd_hsmp driver
ISST:
- Fix possible circular locking dependency detected
Input:
- soc_button_array - add support for Microsoft Surface 3 (MSHW0028) buttons
Merge remote-tracking branch 'pdx86/platform-drivers-x86-pinctrl-pmu_clk' into review-hans-gcc12:
- Merge remote-tracking branch 'pdx86/platform-drivers-x86-pinctrl-pmu_clk' into review-hans-gcc12
Merge tag 'platform-drivers-x86-serial-multi-instantiate-1' into review-hans:
- Merge tag 'platform-drivers-x86-serial-multi-instantiate-1' into review-hans
Replace acpi_bus_get_device():
- Replace acpi_bus_get_device()
amd-pmc:
- Only report STB errors when STB enabled
- Drop CPU QoS workaround
- Output error codes in messages
- Move to later in the suspend process
- Validate entry into the deepest state on resume
- uninitialized variable in amd_pmc_s2d_init()
- Set QOS during suspend on CZN w/ timer wakeup
- Add support for AMD Spill to DRAM STB feature
- Correct usage of SMU version
- Make amd_pmc_stb_debugfs_fops static
asus-tf103c-dock:
- Make 2 global structs static
asus-wmi:
- Fix regression when probing for fan curve control
hp-wmi:
- support omen thermal profile policy v1
- Changing bios_args.data to be dynamically allocated
- Fix 0x05 error code reported by several WMI calls
- Fix SW_TABLET_MODE detection method
- Fix hp_wmi_read_int() reporting error (0x05)
huawei-wmi:
- check the return value of device_create_file()
i2c-multi-instantiate:
- Rename it for a generic serial driver name
int3472:
- Add terminator to gpiod_lookup_table
intel-uncore-freq:
- fix uncore_freq_common_init() error codes
intel_cht_int33fe:
- Move to intel directory
- Drop Lenovo Yogabook YB1-X9x code
- Switch to DMI modalias based loading
intel_crystal_cove_charger:
- Fix IRQ masking / unmasking
lg-laptop:
- Move setting of battery charge limit to common location
pinctrl:
- baytrail: Add pinconf group + function for the pmu_clk
platform/dcdbas:
- move EXPORT_SYMBOL after function
platform/surface:
- Remove Surface 3 Button driver
- surface3-wmi: Simplify resource management
- Replace acpi_bus_get_device()
- Reinstate platform dependency
platform/x86/intel-uncore-freq:
- Split common and enumeration part
platform/x86/intel/uncore-freq:
- Display uncore current frequency
- Use sysfs API to create attributes
- Move to uncore-frequency folder
selftests:
- sdsi: test sysfs setup
serial-multi-instantiate:
- Add SPI support
- Reorganize I2C functions
spi:
- Add API to count spi acpi resources
- Support selection of the index of the ACPI Spi Resource before alloc
- Create helper API to lookup ACPI info for spi device
- Make spi_alloc_device and spi_add_device public again
surface:
- surface3_power: Fix battery readings on batteries without a serial number
think-lmi:
- Certificate authentication support
thinkpad_acpi:
- consistently check fan_get_status return.
- Don't use test_bit on an integer
- Fix compiler warning about uninitialized err variable
- clean up dytc profile convert
- Add PSC mode support
- Add dual fan probe
- Add dual-fan quirk for T15g (2nd gen)
- Fix incorrect use of platform profile on AMD platforms
- Add quirk for ThinkPads without a fan
tools arch x86:
- Add Intel SDSi provisiong tool
touchscreen_dmi:
- Add info for the RWC NANOTE P8 AY07J 2-in-1
x86-android-tablets:
- Depend on EFI and SPI
- Lenovo Yoga Tablet 2 830/1050 sound support
- Workaround Lenovo Yoga Tablet 2 830/1050 poweroff hang
- Add Lenovo Yoga Tablet 2 830 / 1050 data
- Fix EBUSY error when requesting IOAPIC IRQs
- Minor charger / fuel-gauge improvements
- Add Nextbook Ares 8 data
- Add IRQ to Asus ME176C accelerometer info
- Add lid-switch gpio-keys pdev to Asus ME176C + TF103C
- Add x86_android_tablet_get_gpiod() helper
- Add Asus ME176C/TF103C charger and fuelgauge props
- Add battery swnode support
- Trivial typo fix for MODULE_AUTHOR
- Fix the buttons on CZC P10T tablet
- Constify the gpiod_lookup_tables arrays
- Add an init() callback to struct x86_dev_info
- Add support for disabling ACPI _AEI handlers
- Correct crystal_cove_charger module name
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmI8SjEUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9wYUwf/cdUMPFy5cwpHq1LuqGy+PxVCRHCe
71PFd2Ycj+HGOtrt66RxSiCC1Seb4tylr7FvudToDaqWjlBf5n6LhpDudg4ds7Qw
lCuRlaXTIrF7p3nOLIsWvJPRqacMG79KkRM62MLTS2evtRYjbnKvFzNPJPzr8827
1AhCakE92S8gkR5lUZYYHtsaz9rZ4z4TrEtjO6GdlbL2bDw0l18dNNwdMomfVpNS
bBIHIDLeufDuMJ4PxIHlE5MB3AuZAuc0HTJWihozyJX/h5FMGI6qVm0/s9RAfHgX
XdMCpADtS/JjHCmkFgLZYIzvXTxwQVZRo5VO0Wrv5Mis6gSpxJXCd0aKlA==
=1x9/
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"New drivers:
- AMD Host System Management Port (HSMP)
- Intel Software Defined Silicon
Removed drivers (functionality folded into other drivers):
- intel_cht_int33fe_microb
- surface3_button
amd-pmc:
- s2idle bug-fixes
- Support for AMD Spill to DRAM STB feature
hp-wmi:
- Fix SW_TABLET_MODE detection method (and other fixes)
- Support omen thermal profile policy v1
serial-multi-instantiate:
- Add SPI device support
- Add support for CS35L41 amplifiers used in new laptops
think-lmi:
- syfs-class-firmware-attributes Certificate authentication support
thinkpad_acpi:
- Fixes + quirks
- Add platform_profile support on AMD based ThinkPads
x86-android-tablets:
- Improve Asus ME176C / TF103C support
- Support Nextbook Ares 8, Lenovo Tab 2 830 and 1050 tablets
Lots of various other small fixes and hardware-id additions"
* tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (60 commits)
platform/x86: think-lmi: Certificate authentication support
Documentation: syfs-class-firmware-attributes: Lenovo Certificate support
platform/x86: amd-pmc: Only report STB errors when STB enabled
platform/x86: amd-pmc: Drop CPU QoS workaround
platform/x86: amd-pmc: Output error codes in messages
platform/x86: amd-pmc: Move to later in the suspend process
ACPI / x86: Add support for LPS0 callback handler
platform/x86: thinkpad_acpi: consistently check fan_get_status return.
platform/x86: hp-wmi: support omen thermal profile policy v1
platform/x86: hp-wmi: Changing bios_args.data to be dynamically allocated
platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls
platform/x86: hp-wmi: Fix SW_TABLET_MODE detection method
platform/x86: hp-wmi: Fix hp_wmi_read_int() reporting error (0x05)
platform/x86: amd-pmc: Validate entry into the deepest state on resume
platform/x86: thinkpad_acpi: Don't use test_bit on an integer
platform/x86: thinkpad_acpi: Fix compiler warning about uninitialized err variable
platform/x86: thinkpad_acpi: clean up dytc profile convert
platform/x86: x86-android-tablets: Depend on EFI and SPI
platform/x86: amd-pmc: uninitialized variable in amd_pmc_s2d_init()
platform/x86: intel-uncore-freq: fix uncore_freq_common_init() error codes
...
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If CONFIG_RANDOM_TRUST_CPU is set, the RNG initializes using RDRAND.
But, the user can disable (or enable) this behavior by setting
`random.trust_cpu=0/1` on the kernel command line. This allows system
builders to do reasonable things while avoiding howls from tinfoil
hatters. (Or vice versa.)
CONFIG_RANDOM_TRUST_BOOTLOADER is basically the same thing, but regards
the seed passed via EFI or device tree, which might come from RDRAND or
a TPM or somewhere else. In order to allow distros to more easily enable
this while avoiding those same howls (or vice versa), this commit adds
the corresponding `random.trust_bootloader=0/1` toggle.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Graham Christensen <graham@grahamc.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://github.com/NixOS/nixpkgs/pull/165355
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Merge more updates from Andrew Morton:
"Various misc subsystems, before getting into the post-linux-next
material.
41 patches.
Subsystems affected by this patch series: procfs, misc, core-kernel,
lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
taskstats, panic, kcov, resource, and ubsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
kernel/resource: fix kfree() of bootmem memory again
kcov: properly handle subsequent mmap calls
kcov: split ioctl handling into locked and unlocked parts
panic: move panic_print before kmsg dumpers
panic: add option to dump all CPUs backtraces in panic_print
docs: sysctl/kernel: add missing bit to panic_print
taskstats: remove unneeded dead assignment
kasan: no need to unset panic_on_warn in end_report()
ubsan: no need to unset panic_on_warn in ubsan_epilogue()
panic: unset panic_on_warn inside panic()
docs: kdump: add scp example to write out the dump file
docs: kdump: update description about sysfs file system support
arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
cgroup: use irqsave in cgroup_rstat_flush_locked().
fat: use pointer to simple type in put_user()
minix: fix bug when opening a file with O_DIRECT
...
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout
the stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower
iTLB pressure and prevents identity mapping huge pages from
getting split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop
the user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if called
from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling
of TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
/DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
=ELpe
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The sprinkling of SPI drivers is because we added a new one and Mark
sent us a SPI driver interface conversion pull request.
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the
stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower iTLB
pressure and prevents identity mapping huge pages from getting
split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the
user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if
called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of
TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
llc: fix netdevice reference leaks in llc_ui_bind()
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
ice: fix 'scheduling while atomic' on aux critical err interrupt
net/sched: fix incorrect vlan_push_eth dest field
net: bridge: mst: Restrict info size queries to bridge ports
net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
drivers: net: xgene: Fix regression in CRC stripping
net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
net: dsa: fix missing host-filtered multicast addresses
net/mlx5e: Fix build warning, detected write beyond size of field
iwlwifi: mvm: Don't fail if PPAG isn't supported
selftests/bpf: Fix kprobe_multi test.
Revert "rethook: x86: Add rethook x86 implementation"
Revert "arm64: rethook: Add arm64 rethook implementation"
Revert "powerpc: Add rethook support"
Revert "ARM: rethook: Add rethook arm implementation"
netdevice: add missing dm_private kdoc
net: bridge: mst: prevent NULL deref in br_mst_info_size()
selftests: forwarding: Use same VRF for port and VLAN upper
...
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical sections
that last too long for very large guests backed by 4 KiB SPTEs.
- Zap invalid and defunct roots asynchronously via concurrency-managed
work queue.
- Allowing yielding when zapping TDP MMU roots in response to the root's
last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the paging
structure now holds RCU as a proxy for all vCPUs running in the guest,
i.e. to prolongs the grace period on their behalf. It then kicks the
the vCPUs out of guest mode before doing rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that
need memcg accounting
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmI4fdwUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq8gf/WoeVHtw2QlL5Mmz6McvRRmPAYPLV
wLUIFNrRqRvd8Tw4kivzZoh/xTpwmnojv0YdK5SjKAiMjgv094YI1LrNp1JSPvmL
pitocMkA10RSJNWHeEMg9cMSKH0rKiqeYl6S1e2XsdB+UZZ2BINOCVtvglmjTAvJ
dFBdKdBkqjAUZbdXAGIvz4JEEER3N/LkFDKGaUGX+0QIQOzGBPIyLTxynxIDG6mt
RViCCFyXdy5NkVp5hZFm96vQ2qAlWL9B9+iKruQN++82+oqWbeTdSqPhdwF7GyFz
BfOv3gobQ2c4ef/aMLO5LswZ9joI1t/4kQbbAn6dNybpOAz/NXfDnbNefg==
=keox
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical
sections that last too long for very large guests backed by 4
KiB SPTEs.
- Zap invalid and defunct roots asynchronously via
concurrency-managed work queue.
- Allowing yielding when zapping TDP MMU roots in response to the
root's last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the
paging structure now holds RCU as a proxy for all vCPUs running
in the guest, i.e. to prolongs the grace period on their behalf.
It then kicks the the vCPUs out of guest mode before doing
rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that need
memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
KVM: use kvcalloc for array allocations
KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
kvm: x86: Require const tsc for RT
KVM: x86: synthesize CPUID leaf 0x80000021h if useful
KVM: x86: add support for CPUID leaf 0x80000021
KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
KVM: arm64: fix typos in comments
KVM: arm64: Generalise VM features into a set of flags
KVM: s390: selftests: Add error memop tests
KVM: s390: selftests: Add more copy memop tests
KVM: s390: selftests: Add named stages for memop test
KVM: s390: selftests: Add macro as abstraction for MEM_OP
KVM: s390: selftests: Split memop tests
KVM: s390x: fix SCK locking
RISC-V: KVM: Implement SBI HSM suspend call
RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
RISC-V: Add SBI HSM suspend related defines
RISC-V: KVM: Implement SBI v0.3 SRST extension
...
The panic_print setting allows users to collect more information in a
panic event, like memory stats, tasks, CPUs backtraces, etc. This is an
interesting debug mechanism, but currently the print event happens *after*
kmsg_dump(), meaning that pstore, for example, cannot collect a dmesg with
the panic_print extra information.
This patch changes that in 2 steps:
(a) The panic_print setting allows to replay the existing kernel log
buffer to the console (bit 5), besides the extra information dump.
This functionality makes sense only at the end of the panic()
function. So, we hereby allow to distinguish the two situations by a
new boolean parameter in the function panic_print_sys_info().
(b) With the above change, we can safely call panic_print_sys_info()
before kmsg_dump(), allowing to dump the extra information when using
pstore or other kmsg dumpers.
The additional messages from panic_print could overwrite the oldest
messages when the buffer is full. The only reasonable solution is to use
a large enough log buffer, hence we added an advice into the kernel
parameters documentation about that.
Link: https://lkml.kernel.org/r/20220214141308.841525-1-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the "panic_print" parameter/sysctl allows some interesting debug
information to be printed during a panic event. This is useful for
example in cases the user cannot kdump due to resource limits, or if the
user collects panic logs in a serial output (or pstore) and prefers a fast
reboot instead of a kdump.
Happens that currently there's no way to see all CPUs backtraces in a
panic using "panic_print" on architectures that support that. We do have
"oops_all_cpu_backtrace" sysctl, but although partially overlapping in the
functionality, they are orthogonal in nature: "panic_print" is a panic
tuning (and we have panics without oopses, like direct calls to panic() or
maybe other paths that don't go through oops_enter() function), and the
original purpose of "oops_all_cpu_backtrace" is to provide more
information on oopses for cases in which the users desire to continue
running the kernel even after an oops, i.e., used in non-panic scenarios.
So, we hereby introduce an additional bit for "panic_print" to allow
dumping the CPUs backtraces during a panic event.
Link: https://lkml.kernel.org/r/20211109202848.610874-3-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Some improvements on panic_print".
This is a mix of a documentation fix with some additions to the
"panic_print" syscall / parameter. The goal here is being able to collect
all CPUs backtraces during a panic event and also to enable "panic_print"
in a kdump event - details of the reasoning and design choices in the
patches.
This patch (of 3):
Commit de6da1e8bc ("panic: add an option to replay all the printk
message in buffer") added a new bit to the sysctl/kernel parameter
"panic_print", but the documentation was added only in
kernel-parameters.txt, not in the sysctl guide.
Fix it here by adding bit 5 to sysctl admin-guide documentation.
[rdunlap@infradead.org: fix table format warning]
Link: https://lkml.kernel.org/r/20220109055635.6999-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/20211109202848.610874-1-gpiccoli@igalia.com
Link: https://lkml.kernel.org/r/20211109202848.610874-2-gpiccoli@igalia.com
Fixes: de6da1e8bc ("panic: add an option to replay all the printk message in buffer")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Except cp and makedumpfile, add scp example to write out the dump file.
Link: https://lkml.kernel.org/r/1644324666-15947-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Update doc and fix some issues about kdump", v2.
This patch (of 5):
After commit 6a108a14fa ("kconfig: rename CONFIG_EMBEDDED to
CONFIG_EXPERT"), "Configure standard kernel features (for small
systems)" is not exist, we should use "Configure standard kernel
features (expert users)" now.
Link: https://lkml.kernel.org/r/1644324666-15947-1-git-send-email-yangtiezhu@loongson.cn
Link: https://lkml.kernel.org/r/1644324666-15947-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmI5jiEACgkQCF8+vY7k
4RWfnw/9FSBVrFgzoDwM4choQu997T6GSsEuqJFbLdDLPbKZifl9UsCPmenFp0aS
4D2EG4A1nF/HQTHJ6vPSWjgVP9zhCAX/DvHH+9DiSAWQoSIVmUZGoEhbAHlbE12K
PUs0MEIR8o8k3IBvMD6buH1FpnIgZO1ULi1Cx/5YH1GaRshdZrLcgz0YioXomLKE
KvNokrhLYzJFIWl34KZ+92RluPOy7DlEJpRNbCTYkaLYfSYqLs/FTisuEUt3gEso
tjgUaBxJ/k3AOgU4XXoeVlqTFuK1TY70aA0aqmVYPqZ7eCO2Btbm11h8WoYO/SgY
N3P57LP86WWUHNA13argVv/pQo0x8iX5RnYObLDMGGrUQyQT7BcjMGCrKIVyMRAz
06dZbnGnbsOOph9D7wwQ+xJQwUqyrllVVhRdMIWXJQjKqAP9mmgIB/dcwrrP5Ziw
y0fmuaXZ/ZmvD63yq2iWwV6niWvNa5XMnR3NxceOV60WOe9LS6aio/duwfaZ5ic1
qzTAtc/+3FuIgRD35eILrjymu53gW6pt6vS0pHP/+xvHq5Yp7u8Pc5+jFxLYRM8e
AOglA7ZxGGz1uL/LUJ4DD8BQ55wr0EH63Lm7Pfy4JmmzqI/TQwEQifT/H8mDNP+G
DCmod3ZyCsHH6vsN0afa4ZxqyCDToVHVwvko4mzOnl4hED5JteI=
=Bc0l
-----END PGP SIGNATURE-----
Merge tag 'media/v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- a major reorg at platform Kconfig/Makefile files, organizing them per
vendor. The other media Kconfig/Makefile files also sorted
- New sensor drivers: hi847, isl7998x, ov08d10
- New Amphion vpu decoder stateful driver
- New Atmel microchip csi2dc driver
- tegra-vde driver promoted from staging
- atomisp: some fixes for it to work on BYT
- imx7-mipi-csis driver promoted from staging and renamed
- camss driver got initial support for VFE hardware version Titan 480
- mtk-vcodec has gained support for MT8192
- lots of driver changes, fixes and improvements
* tag 'media/v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (417 commits)
media: nxp: Restrict VIDEO_IMX_MIPI_CSIS to ARCH_MXC or COMPILE_TEST
media: amphion: cleanup media device if register it fail
media: amphion: fix some issues to improve robust
media: amphion: fix some error related with undefined reference to __divdi3
media: amphion: fix an issue that using pm_runtime_get_sync incorrectly
media: vidtv: use vfree() for memory allocated with vzalloc()
media: m5mols/m5mols.h: document new reset field
media: pixfmt-yuv-planar.rst: fix PIX_FMT labels
media: platform: Remove unnecessary print function dev_err()
media: amphion: Add missing of_node_put() in vpu_core_parse_dt()
media: mtk-vcodec: Add missing of_node_put() in mtk_vdec_hw_prob_done()
media: platform: amphion: Fix build error without MAILBOX
media: spi: Kconfig: Place SPI drivers on a single menu
media: i2c: Kconfig: move camera drivers to the top
media: atomisp: fix bad usage at error handling logic
media: platform: rename mediatek/mtk-jpeg/ to mediatek/jpeg/
media: media/*/Kconfig: sort entries
media: Kconfig: cleanup VIDEO_DEV dependencies
media: platform/*/Kconfig: make manufacturer menus more uniform
media: platform: Create vendor/{Makefile,Kconfig} files
...
- New user_events interface. User space can register an event with the kernel
describing the format of the event. Then it will receive a byte in a page
mapping that it can check against. A privileged task can then enable that
event like any other event, which will change the mapped byte to true,
telling the user space application to start writing the event to the
tracing buffer.
- Add new "ftrace_boot_snapshot" kernel command line parameter. When set,
the tracing buffer will be saved in the snapshot buffer at boot up when
the kernel hands things over to user space. This will keep the traces that
happened at boot up available even if user space boot up has tracing as
well.
- Have TRACE_EVENT_ENUM() also update trace event field type descriptions.
Thus if a static array defines its size with an enum, the user space trace
event parsers can still know how to parse that array.
- Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
TRACE_EVENT() macro, but will attach to an existing tracepoint. This will
make one tracepoint be able to trace different content and not be stuck at
only what the original TRACE_EVENT() macro exports.
- Fixes to tracing error logging.
- Better saving of cmdlines to PIDs when tracing (use the wakeup events for
mapping).
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYjiO3RQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhQzAQDtek5p80p/zkMGymm14wSH6qq0NdgN
Kv7fTBwEewUa0gD/UCOVLw4Oj+JtHQhCa3sCGZopmRv0BT1+4UQANqosKQY=
=Au08
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- New user_events interface. User space can register an event with the
kernel describing the format of the event. Then it will receive a
byte in a page mapping that it can check against. A privileged task
can then enable that event like any other event, which will change
the mapped byte to true, telling the user space application to start
writing the event to the tracing buffer.
- Add new "ftrace_boot_snapshot" kernel command line parameter. When
set, the tracing buffer will be saved in the snapshot buffer at boot
up when the kernel hands things over to user space. This will keep
the traces that happened at boot up available even if user space boot
up has tracing as well.
- Have TRACE_EVENT_ENUM() also update trace event field type
descriptions. Thus if a static array defines its size with an enum,
the user space trace event parsers can still know how to parse that
array.
- Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
TRACE_EVENT() macro, but will attach to an existing tracepoint. This
will make one tracepoint be able to trace different content and not
be stuck at only what the original TRACE_EVENT() macro exports.
- Fixes to tracing error logging.
- Better saving of cmdlines to PIDs when tracing (use the wakeup events
for mapping).
* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
tracing: Have type enum modifications copy the strings
user_events: Add trace event call as root for low permission cases
tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
tracing: Add snapshot at end of kernel boot up
tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
tracing: Fix strncpy warning in trace_events_synth.c
user_events: Prevent dyn_event delete racing with ioctl add/delete
tracing: Add TRACE_CUSTOM_EVENT() macro
tracing: Move the defines to create TRACE_EVENTS into their own files
tracing: Add sample code for custom trace events
tracing: Allow custom events to be added to the tracefs directory
tracing: Fix last_cmd_set() string management in histogram code
user_events: Fix potential uninitialized pointer while parsing field
tracing: Fix allocation of last_cmd in last_cmd_set()
user_events: Add documentation file
user_events: Add sample code for typical usage
user_events: Add self-test for validator boundaries
user_events: Add self-test for perf_event integration
user_events: Add self-test for dynamic_events integration
user_events: Add self-test for ftrace integration
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmI4ggsACgkQUqAMR0iA
lPLrlA//R12HGCGSzdpdyynl+5wByIqcHe8RANHOAj9f9qxBtmYv2ZK69mzSvhHO
6kAGdb3vBtxo1NCHeqxlXpds9GP/zOGEWmEJP2P7pIZ8ci8QtwrXCtQ8XIW9UGhJ
WHzXpXkfzcIDsRZs6B1pxN5cRXuW2VVzfgxyu6L+hvNV0o0PPO4A48ptzNBZh8rj
URAid+n/aGs9SOXM0h8SRjjBYEqjiB2RZ3gLg5XGZmcATtitBO135LGZnBR2fwnX
RZKckbdA/fBzqS4Njsp2rV5Rqldwj7mHzQbcsQm4YDrxSdl8d78XxQdAA5sNyaCD
ToDw6/DeegXzgtPJpuBH/ymF9RczIu4l3eawO1FBMCB5EPq56zVHWErxry8qaTgi
yQFqhBgifNN5NqfQCn7dyF10usmsvImFczre7ZxJvL7vmzqDsYYqdZG5oouLudR4
iOphFwX71v4X+RsxbOXqEt+mS3AwqEJc1SZl5rrDc4TSUOE1qCd+ncLTAuAf3Wfm
1xaZ+siomahcZAKrgmSw6AcD5bU+JJpr6FktKAddiO7J1+nIdT1lYEbpUsfWZ/p8
Kx8A2M2ula+whJ6CgtnGTsbsacsFi+j/MioMTGZIU+Fubkig3XEeIp3QUO6sEN+9
/sUQ6Wj6c95miWdttff9o6ap8py9NbfuKIw/HMOesfLVKP82rmw=
=EUpJ
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Make %pK behave the same as %p for kptr_restrict == 0 also with
no_hash_pointers parameter
- Ignore the default console in the device tree also when console=null
or console="" is used on the command line
- Document console=null and console="" behavior
- Prevent a deadlock and a livelock caused by console_lock in panic()
- Make console_lock available for panicking CPU
- Fast query for the next to-be-used sequence number
- Use the expected return values in printk.devkmsg __setup handler
- Use the correct atomic operations in wake_up_klogd() irq_work handler
- Avoid possible unaligned access when handling %4cc printing format
* tag 'printk-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: fix return value of printk.devkmsg __setup handler
vsprintf: Fix %pK with kptr_restrict == 0
printk: make suppress_panic_printk static
printk: Set console_set_on_cmdline=1 when __add_preferred_console() is called with user_specified == true
Docs: printk: add 'console=null|""' to admin/kernel-parameters
printk: use atomic updates for klogd work
printk: Drop console_sem during panic
printk: Avoid livelock with heavy printk during panic
printk: disable optimistic spin during panic
printk: Add panic_in_progress helper
vsprintf: Move space out of string literals in fourcc_string()
vsprintf: Fix potential unaligned access
printk: ringbuffer: Improve prb_next_seq() performance
Before DAMON is merged in the mainline, the concept of 'regions update
interval' has generalized to be used as the time interval for update of
any monitoring operations related data structure, but the document has not
updated properly. This commit updates the document for better
consistency.
Link: https://lkml.kernel.org/r/20220222170100.17068-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A previous commit made init_regions debugfs file to use target index
instead of target id for specifying the target of the init regions. This
commit updates the usage document to reflect the change.
Link: https://lkml.kernel.org/r/20211230100723.2238-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zswap has an ability to efficiently store same-value filled pages, which
can be turned on and off using the "same_filled_pages_enabled"
parameter.
However, there is currently no way to enable just this (lightweight)
functionality, while not making use of the whole compressed page storage
machinery.
Add a "non_same_filled_pages_enabled" parameter which allows disabling
handling of pages that aren't same-value filled. This way zswap can be
run in such lightweight same-value filled pages only mode.
Link: https://lkml.kernel.org/r/7dbafa963e8bab43608189abbe2067f4b9287831.1641247624.git.maciej.szmigiero@oracle.com
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the advent of various new memory types, some machines will have
multiple types of memory, e.g. DRAM and PMEM (persistent memory). The
memory subsystem of these machines can be called memory tiering system,
because the performance of the different types of memory are usually
different.
In such system, because of the memory accessing pattern changing etc,
some pages in the slow memory may become hot globally. So in this
patch, the NUMA balancing mechanism is enhanced to optimize the page
placement among the different memory types according to hot/cold
dynamically.
In a typical memory tiering system, there are CPUs, fast memory and slow
memory in each physical NUMA node. The CPUs and the fast memory will be
put in one logical node (called fast memory node), while the slow memory
will be put in another (faked) logical node (called slow memory node).
That is, the fast memory is regarded as local while the slow memory is
regarded as remote. So it's possible for the recently accessed pages in
the slow memory node to be promoted to the fast memory node via the
existing NUMA balancing mechanism.
The original NUMA balancing mechanism will stop to migrate pages if the
free memory of the target node becomes below the high watermark. This
is a reasonable policy if there's only one memory type. But this makes
the original NUMA balancing mechanism almost do not work to optimize
page placement among different memory types. Details are as follows.
It's the common cases that the working-set size of the workload is
larger than the size of the fast memory nodes. Otherwise, it's
unnecessary to use the slow memory at all. So, there are almost always
no enough free pages in the fast memory nodes, so that the globally hot
pages in the slow memory node cannot be promoted to the fast memory
node. To solve the issue, we have 2 choices as follows,
a. Ignore the free pages watermark checking when promoting hot pages
from the slow memory node to the fast memory node. This will
create some memory pressure in the fast memory node, thus trigger
the memory reclaiming. So that, the cold pages in the fast memory
node will be demoted to the slow memory node.
b. Define a new watermark called wmark_promo which is higher than
wmark_high, and have kswapd reclaiming pages until free pages reach
such watermark. The scenario is as follows: when we want to promote
hot-pages from a slow memory to a fast memory, but fast memory's free
pages would go lower than high watermark with such promotion, we wake
up kswapd with wmark_promo watermark in order to demote cold pages and
free us up some space. So, next time we want to promote hot-pages we
might have a chance of doing so.
The choice "a" may create high memory pressure in the fast memory node.
If the memory pressure of the workload is high, the memory pressure
may become so high that the memory allocation latency of the workload
is influenced, e.g. the direct reclaiming may be triggered.
The choice "b" works much better at this aspect. If the memory
pressure of the workload is high, the hot pages promotion will stop
earlier because its allocation watermark is higher than that of the
normal memory allocation. So in this patch, choice "b" is implemented.
A new zone watermark (WMARK_PROMO) is added. Which is larger than the
high watermark and can be controlled via watermark_scale_factor.
In addition to the original page placement optimization among sockets,
the NUMA balancing mechanism is extended to be used to optimize page
placement according to hot/cold among different memory types. So the
sysctl user space interface (numa_balancing) is extended in a backward
compatible way as follow, so that the users can enable/disable these
functionality individually.
The sysctl is converted from a Boolean value to a bits field. The
definition of the flags is,
- 0: NUMA_BALANCING_DISABLED
- 1: NUMA_BALANCING_NORMAL
- 2: NUMA_BALANCING_MEMORY_TIERING
We have tested the patch with the pmbench memory accessing benchmark
with the 80:20 read/write ratio and the Gauss access address
distribution on a 2 socket Intel server with Optane DC Persistent
Memory Model. The test results shows that the pmbench score can
improve up to 95.9%.
Thanks Andrew Morton to help fix the document format error.
Link: https://lkml.kernel.org/r/20220221084529.1052339-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Free the 2nd vmemmap page associated with each HugeTLB
page", v7.
This series can minimize the overhead of struct page for 2MB HugeTLB
pages significantly. It further reduces the overhead of struct page by
12.5% for a 2MB HugeTLB compared to the previous approach, which means
2GB per 1TB HugeTLB. It is a nice gain. Comments and reviews are
welcome. Thanks.
The main implementation and details can refer to the commit log of patch
1. In this series, I have changed the following four helpers, the
following table shows the impact of the overhead of those helpers.
+------------------+-----------------------+
| APIs | head page | tail page |
+------------------+-----------+-----------+
| PageHead() | Y | N |
+------------------+-----------+-----------+
| PageTail() | Y | N |
+------------------+-----------+-----------+
| PageCompound() | N | N |
+------------------+-----------+-----------+
| compound_head() | Y | N |
+------------------+-----------+-----------+
Y: Overhead is increased.
N: Overhead is _NOT_ increased.
It shows that the overhead of those helpers on a tail page don't change
between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off". But the
overhead on a head page will be increased when "hugetlb_free_vmemmap=on"
(except PageCompound()). So I believe that Matthew Wilcox's folio series
will help with this.
The users of PageHead() and PageTail() are much less than compound_head()
and most users of PageTail() are VM_BUG_ON(), so I have done some tests
about the overhead of compound_head() on head pages.
I have tested the overhead of calling compound_head() on a head page,
which is 2.11ns (Measure the call time of 10 million times
compound_head(), and then average).
For a head page whose address is not aligned with PAGE_SIZE or a
non-compound page, the overhead of compound_head() is 2.54ns which is
increased by 20%. For a head page whose address is aligned with
PAGE_SIZE, the overhead of compound_head() is 2.97ns which is increased by
40%. Most pages are the former. I do not think the overhead is
significant since the overhead of compound_head() itself is low.
This patch (of 5):
This patch minimizes the overhead of struct page for 2MB HugeTLB pages
significantly. It further reduces the overhead of struct page by 12.5%
for a 2MB HugeTLB compared to the previous approach, which means 2GB per
1TB HugeTLB (2MB type).
After the feature of "Free sonme vmemmap pages of HugeTLB page" is
enabled, the mapping of the vmemmap addresses associated with a 2MB
HugeTLB page becomes the figure below.
HugeTLB struct pages(8 pages) page frame(8 pages)
+-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+---> PG_head
| | | 0 | -------------> | 0 |
| | +-----------+ +-----------+
| | | 1 | -------------> | 1 |
| | +-----------+ +-----------+
| | | 2 | ----------------^ ^ ^ ^ ^ ^
| | +-----------+ | | | | |
| | | 3 | ------------------+ | | | |
| | +-----------+ | | | |
| | | 4 | --------------------+ | | |
| 2MB | +-----------+ | | |
| | | 5 | ----------------------+ | |
| | +-----------+ | |
| | | 6 | ------------------------+ |
| | +-----------+ |
| | | 7 | --------------------------+
| | +-----------+
| |
| |
| |
+-----------+
As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remaped. However, the 2nd vmemmap page frame is also can be freed to
the buddy allocator, then we can change the mapping from the figure
above to the figure below.
HugeTLB struct pages(8 pages) page frame(8 pages)
+-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+---> PG_head
| | | 0 | -------------> | 0 |
| | +-----------+ +-----------+
| | | 1 | ---------------^ ^ ^ ^ ^ ^ ^
| | +-----------+ | | | | | |
| | | 2 | -----------------+ | | | | |
| | +-----------+ | | | | |
| | | 3 | -------------------+ | | | |
| | +-----------+ | | | |
| | | 4 | ---------------------+ | | |
| 2MB | +-----------+ | | |
| | | 5 | -----------------------+ | |
| | +-----------+ | |
| | | 6 | -------------------------+ |
| | +-----------+ |
| | | 7 | ---------------------------+
| | +-----------+
| |
| |
| |
+-----------+
After we do this, all tail vmemmap pages (1-7) are mapped to the head
vmemmap page frame (0). In other words, there are more than one page
struct with PG_head associated with each HugeTLB page. We __know__ that
there is only one head page struct, the tail page structs with PG_head are
fake head page structs. We need an approach to distinguish between those
two different types of page structs so that compound_head(), PageHead()
and PageTail() can work properly if the parameter is the tail page struct
but with PG_head.
The following code snippet describes how to distinguish between real and
fake head page struct.
if (test_bit(PG_head, &page->flags)) {
unsigned long head = READ_ONCE(page[1].compound_head);
if (head & 1) {
if (head == (unsigned long)page + 1)
==> head page struct
else
==> tail page struct
} else
==> head page struct
}
We can safely access the field of the @page[1] with PG_head because the
@page is a compound page composed with at least two contiguous pages.
[songmuchun@bytedance.com: restore lost comment changes]
Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During the integration of PREEMPT_RT support, the code flow around
memcg_check_events() resulted in `twisted code'. Moving the code around
and avoiding then would then lead to an additional local-irq-save
section within memcg_check_events(). While looking better, it adds a
local-irq-save section to code flow which is usually within an
local-irq-off block on non-PREEMPT_RT configurations.
The threshold event handler is a deprecated memcg v1 feature. Instead
of trying to get it to work under PREEMPT_RT just disable it. There
should be no users on PREEMPT_RT. From that perspective it makes even
less sense to get it to work under PREEMPT_RT while having zero users.
Make memory.soft_limit_in_bytes and cgroup.event_control return
-EOPNOTSUPP on PREEMPT_RT. Make an empty memcg_check_events() and
memcg_write_event_control() which return only -EOPNOTSUPP on PREEMPT_RT.
Document that the two knobs are disabled on PREEMPT_RT.
Link: https://lkml.kernel.org/r/20220226204144.1008339-3-bigeasy@linutronix.de
Suggested-by: Michal Hocko <mhocko@kernel.org>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently memcg stats show several types of kernel memory: kernel stack,
page tables, sock, vmalloc, and slab. However, there are other
allocations with __GFP_ACCOUNT (or supersets such as GFP_KERNEL_ACCOUNT)
that are not accounted in any of those stats, a few examples are:
- various kvm allocations (e.g. allocated pages to create vcpus)
- io_uring
- tmp_page in pipes during pipe_write()
- bpf ringbuffers
- unix sockets
Keeping track of the total kernel memory is essential for the ease of
migration from cgroup v1 to v2 as there are large discrepancies between
v1's kmem.usage_in_bytes and the sum of the available kernel memory
stats in v2. Adding separate memcg stats for all __GFP_ACCOUNT kernel
allocations is an impractical maintenance burden as there a lot of those
all over the kernel code, with more use cases likely to show up in the
future.
Therefore, add a "kernel" memcg stat that is analogous to kmem page
counter, with added benefits such as using rstat infrastructure which
aggregates stats more efficiently. Additionally, this provides a
lighter alternative in case the legacy kmem is deprecated in the future
[yosryahmed@google.com: v2]
Link: https://lkml.kernel.org/r/20220203193856.972500-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220201200823.3283171-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler build,
from the fast-headers tree - including placeholder *_api.h headers for
later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI5rg8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hGrw/+M3QOk6fH7G48wjlNnBvcOife6ls+Ni4k
ixOAcF4JKoixO8HieU5vv0A7yf/83tAa6fpeXeMf1hkCGc0NSlmLtuIux+WOmoAL
LzCyDEYfiP8KnVh0A1Tui/lK0+AkGo21O6ADhQE2gh8o2LpslOHQMzvtyekSzeeb
mVxMYQN+QH0m518xdO2D8IQv9ctOYK0eGjmkqdNfntOlytypPZHeNel/tCzwklP/
dElJUjNiSKDlUgTBPtL3DfpoLOI/0mHF2p6NEXvNyULxSOqJTu8pv9Z2ADb2kKo1
0D56iXBDngMi9MHIJLgvzsA8gKzHLFSuPbpODDqkTZCa28vaMB9NYGhJ643NtEie
IXTJEvF1rmNkcLcZlZxo0yjL0fjvPkczjw4Vj27gbrUQeEBfb4mfuI4BRmij63Ep
qEkgQTJhduCqqrQP1rVyhwWZRk1JNcVug+F6N42qWW3fg1xhj0YSrLai2c9nPez6
3Zt98H8YGS1Z/JQomSw48iGXVqfTp/ETI7uU7jqHK8QcjzQ4lFK5H4GZpwuqGBZi
NJJ1l97XMEas+rPHiwMEN7Z1DVhzJLCp8omEj12QU+tGLofxxwAuuOVat3CQWLRk
f80Oya3TLEgd22hGIKDRmHa22vdWnNQyS0S15wJotawBzQf+n3auS9Q3/rh979+t
ES/qvlGxTIs=
=Z8uT
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler
build, from the fast-headers tree - including placeholder *_api.h
headers for later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per
node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
sched/headers: ARM needs asm/paravirt_api_clock.h too
sched/numa: Fix boot crash on arm64 systems
headers/prep: Fix header to build standalone: <linux/psi.h>
sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
cgroup: Fix suspicious rcu_dereference_check() usage warning
sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
sched/deadline,rt: Remove unused functions for !CONFIG_SMP
sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
sched/deadline: Remove unused def_dl_bandwidth
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
sched/tracing: Don't re-read p->state when emitting sched_switch event
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
sched/cpuacct: Remove redundant RCU read lock
sched/cpuacct: Optimize away RCU read lock
sched/cpuacct: Fix charge percpu cpuusage
sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
...
New drivers
- Driver for Texas Instruments TMP464 and TMP468
- Driver for Vicor PLI1209BC Digital Supervisor
- Driver for ASUS EC
Improvements to existing drivers:
- adt7x10: Convert to use regmap, convert to use with_info API,
use hwmon_notify_event, and other cleanup
- aquacomputer_d5next: Add support for Aquacomputer Farbwerk 360
- asus_wmi_sensors: Add ASUS ROG STRIX B450-F GAMING II
- asus_wmi_ec_sensors: Support T_Sensor on Prime X570-Pro
Deprecate driver (replaced by new driver)
- axi-fan-control: Use hwmon_notify_event
- dell-smm: Clean up CONFIG_I8K, disable fan type support for
Inspiron 3505, various other cleanup
- hwmon core: Report attribute name with udev events,
Add "label" attribute to ABI,
Add support for pwm auto channels attribute
- max6639: Add regulator support
- lm70: Add support for TI TMP125
- lm83: Cleanup, convert to use with_info API
- mlxreg-fan: Use pwm attribute for setting fan speed low limit
- nct6775: Sdd ASUS ROG STRIX Z390/Z490/X570-* / PRIME X570-P,
PRIME B550-PLUS, ASUS Pro B550M-C/PRIME B550M-A,
and support for TSI temperature registers
- occ: Add various new sysfs attributes
- pmbus core: Handle VIN unit off status,
Add regulator supply into macro,
Add get_error_flags support to regulator ops
- pmbus/adm1275: Allow setting sample averaging
- pmbus/lm25066: Add regulator support
- pmbus/xdpe12284: Add support for xdpe11280 and register as regulator
- powr1220: Convert to with_info API,
Add support for Lattice's POWR1014 power manager IC
- sch56xx: Cleanup and minor improvements
- sch5627: Add pwmX_auto_channels_temp support
- tc654: Add thermal_cooling device support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAmI3tLEACgkQyx8mb86f
mYGztw//T9YdVs+oBImLAF2Q3NXzvCNWSdO7ZUzaxUTDw60lGHZpjMYNt/m+k3j/
vdjVFpyyvJW0D1iDxHd5EIO7IVhSsd01VYlSCMELF2NYug12XzAAofzbGsEJih65
DISIsPp1h+Ddc6XQsuuh7A3etDQhPu+YWnuIpSgEI33wK5+U7zxl5CIR9o5asfmf
ZkGjQoN5DjYxqB4MpdTisHz7JC8YAxdXk2ZxqFlZ8yoptB7kLMjbhIFx/PGwI1Os
TEoVEcp8n4KnirVpwwx5wFustX0Abd4Radm9iUTrRhJHHYsVP5RVwYpfs3FhgccW
k69wJ32Hx2fyRWT6/wo88+VJ/T1hPSTB1wMRHDLmJJzLBZsXIfFcf7Hxr/A+N0/U
D88JtWvu2GyJQt5k54IU1RwvN0cBz6J0X3PE7nldaR7lAE1tqF98KC57Esr0vtYu
TLxz/ISBva9mwwWH6Gar1X+kiODhTiHQRTDWhl6vIbAUCcpTjMyNxppAxSFNciGU
S7LDvuPY10S3DHJ0sqLVeANgF5Q/GLBGMI3iamVD08U8dgcEsrycUzs98LAYONyi
d3+3AUUl60OAWlrk43OOLzYllGrCy3OhKNWzmVXbM8Ue9Fb1cet7UD2fAg8WlOVX
kXhtNdXkPk//65NVIikI8owMcQg6iC+zFESPkFlaXQfnhA6KyqQ=
=4u33
-----END PGP SIGNATURE-----
Merge tag 'hwmon-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New drivers:
- Texas Instruments TMP464 and TMP468 driver
- Vicor PLI1209BC Digital Supervisor driver
- ASUS EC driver
Improvements to existing drivers:
- adt7x10:
- Convert to use regmap
- convert to use with_info API
- use hwmon_notify_event
- other cleanup
- aquacomputer_d5next:
- Add support for Aquacomputer Farbwerk 360
- asus_wmi_sensors:
- Add ASUS ROG STRIX B450-F GAMING II
- asus_wmi_ec_sensors:
- Support T_Sensor on Prime X570-Pro
- Deprecate driver (replaced by new driver)
- axi-fan-control:
- Use hwmon_notify_event
- dell-smm:
- Clean up CONFIG_I8K
- disable fan type support for Inspiron 3505
- various other cleanup
- hwmon core:
- Report attribute name with udev events
- Add "label" attribute to ABI,
- Add support for pwm auto channels attribute
- max6639:
- Add regulator support
- lm70:
- Add support for TI TMP125
- lm83:
- Cleanup, convert to use with_info API
- mlxreg-fan:
- Use pwm attribute for setting fan speed low limit
- nct6775:
- Add board ID's for ASUS ROG STRIX Z390/Z490/X570-* / PRIME
X570-P, PRIME B550-PLUS, ASUS Pro B550M-C/PRIME B550M-A
- Add support for TSI temperature registers
- occ:
- Add various new sysfs attributes
- pmbus core:
- Handle VIN unit off status
- Add regulator supply into macro
- Add get_error_flags support to regulator ops
- pmbus/adm1275:
- Allow setting sample averaging
- pmbus/lm25066:
- Add regulator support
- pmbus/xdpe12284:
- Add support for xdpe11280
- register as regulator
- powr1220:
- Convert to with_info API
- Add support for Lattice's POWR1014 power manager IC
- sch56xx:
- Cleanup and minor improvements
- sch5627:
- Add pwmX_auto_channels_temp support
- tc654:
- Add thermal_cooling device support"
* tag 'hwmon-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (86 commits)
hwmon: (dell-smm) Add Inspiron 3505 to fan type blacklist
hwmon: (pmbus) Add Vin unit off handling
hwmon: (scpi-hwmon): Use of_device_get_match_data()
hwmon: (axi-fan-control) Use hwmon_notify_event
hwmon: (vexpress-hwmon) Use of_device_get_match_data()
hwmon: Add driver for Texas Instruments TMP464 and TMP468
dt-bindings: hwmon: add tmp464.yaml
dt-bindings: hwmon: Add sample averaging properties for ADM1275
hwmon: (adm1275) Allow setting sample averaging
hwmon: (xdpe12284) Add regulator support
hwmon: (xdpe12284) Add support for xdpe11280
dt-bindings: trivial-devices: Add xdpe11280
hwmon: (aquacomputer_d5next) Add support for Aquacomputer Farbwerk 360
hwmon: (sch5627) Add pwmX_auto_channels_temp support
hwmon: (core) Add support for pwm auto channels attribute
hwmon: (lm70) Add ti,tmp125 support
dt-bindings: Add ti,tmp125 temperature sensor binding
hwmon: (pmbus/pli1209bc) Add regulator support
hwmon: (pmbus) Add support for pli1209bc
dt-bindings:trivial-devices: Add pli1209bc
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmIzwtEACgkQSfxwEqXe
A67NCBAA1+U01HXx4ethmmy1m2pXHAIwngI7PP0QzyZtmoloWockdN1lRfQ1C0uJ
Whk/9Hc9G7iujznsxOnCS+LeNwRzd7CjtFbTgK+yGIRKwL9GFcVwA5nrifP9TjqZ
FWmTIomjjmA06YRYsNOdNSQdN6DdpQz8xLw0EqVOZerI4ITFErYlW8lLqOOKY99N
f9glQK75kh41SUgo+K3JSn46fhB95HldL6dYSZzjQ6QsVKBQuQTDE9ryfrH2XZDw
xI2nf/ycXPUBv7Bb+0op+7ES++CoDigM2nIyxapEj3ZkpplxL4M+cCIHq3Juzfwm
jDdbZbs5SqDszOQM/dvCJSR+S/D3QIKdv3fwwWHDTigByZdgpudT3rr9k7dY60Z8
aNvOzNWOzGH9/0boLl55WysF6cBQnazbgtzeWpzeuWFhAyfxN/DJx2sf8U+TmN6n
3bDUafamAvmkkIOoHUzOXfjo2lhXxlmRZ40rWVNX5JvcJj5+5jRmTawrQj+9fn8/
MhiIZ6KBDV1OxPwJzG6jm++JP6rgXfXsxduomO7cIEWs10itf/cE8WD9qJrtZTtg
kfjYUguFOd/QyzY0A1w6FD865vy8YhATk71Ywgwj9AI+cfH8QUajpDkXOutjop8x
8HBxIGx6Itgzilfuo5jpJxlVhNO3G6v1fX/A+mUMAfHufkmnfiQ=
=cyDR
-----END PGP SIGNATURE-----
Merge tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"There have been a few important changes to the RNG's crypto, but the
intent for 5.18 has been to shore up the existing design as much as
possible with modern cryptographic functions and proven constructions,
rather than actually changing up anything fundamental to the RNG's
design.
So it's still the same old RNG at its core as before: it still counts
entropy bits, and collects from the various sources with the same
heuristics as before, and so forth. However, the cryptographic
algorithms that transform that entropic data into safe random numbers
have been modernized.
Just as important, if not more, is that the code has been cleaned up
and re-documented. As one of the first drivers in Linux, going back to
1.3.30, its general style and organization was showing its age and
becoming both a maintenance burden and an auditability impediment.
Hopefully this provides a more solid foundation to build on for the
future. I encourage you to open up the file in full, and maybe you'll
remark, "oh, that's what it's doing," and enjoy reading it. That, at
least, is the eventual goal, which this pull begins working toward.
Here's a summary of the various patches in this pull:
- /dev/urandom and /dev/random now do the same thing, per the patch
we discussed on the list. I think this is worth trying out. If it
does appear problematic, I've made sure to keep it standalone and
revertible without any conflicts.
- Fixes and cleanups for numerous integer type problems, locking
issues, and general code quality concerns.
- The input pool's LFSR has been replaced with a cryptographically
secure hash function, which has security and performance benefits
alike, and consequently allows us to count entropy bits linearly.
- The pre-init injection now uses a real hash function too, instead
of an LFSR or vanilla xor.
- The interrupt handler's fast_mix() function now uses one round of
SipHash, rather than the fake crypto that was there before.
- All additions of RDRAND and RDSEED now go through the input pool's
hash function, in part to mitigate ridiculous hypothetical CPU
backdoors, but more so to have a consistent interface for ingesting
entropy that's easy to analyze, making everything happen one way,
instead of a potpourri of different ways.
- The crng now works on per-cpu data, while also being in accordance
with the actual "fast key erasure RNG" design. This allows us to
fix several boot-time race complications associated with the prior
dynamically allocated model, eliminates much locking, and makes our
backtrack protection more robust.
- Batched entropy now erases doled out values so that it's backtrack
resistant.
- Working closely with Sebastian, the interrupt handler no longer
needs to take any locks at all, as we punt the
synchronized/expensive operations to a workqueue. This is
especially nice for PREEMPT_RT, where taking spinlocks in irq
context is problematic. It also makes the handler faster for the
rest of us.
- Also working with Sebastian, we now do the right thing on CPU
hotplug, so that we don't use stale entropy or fail to accumulate
new entropy when CPUs come back online.
- We handle virtual machines that fork / clone / snapshot, using the
"vmgenid" ACPI specification for retrieving a unique new RNG seed,
which we can use to also make WireGuard (and in the future, other
things) safe across VM forks.
- Around boot time, we now try to reseed more often if enough entropy
is available, before settling on the usual 5 minute schedule.
- Last, but certainly not least, the documentation in the file has
been updated considerably"
* tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (60 commits)
random: check for signal and try earlier when generating entropy
random: reseed more often immediately after booting
random: make consistent usage of crng_ready()
random: use SipHash as interrupt entropy accumulator
wireguard: device: clear keys on VM fork
random: provide notifier for VM fork
random: replace custom notifier chain with standard one
random: do not export add_vmfork_randomness() unless needed
virt: vmgenid: notify RNG of VM fork and supply generation ID
ACPI: allow longer device IDs
random: add mechanism for VM forks to reinitialize crng
random: don't let 644 read-only sysctls be written to
random: give sysctl_random_min_urandom_seed a more sensible value
random: block in /dev/urandom
random: do crng pre-init loading in worker rather than irq
random: unify cycles_t and jiffies usage and types
random: cleanup UUID handling
random: only wake up writers after zap if threshold was passed
random: round-robin registers as ulong, not u32
random: clear fast pool, crng, and batches in cpuhp bring up
...
- Allow device_pm_check_callbacks() to be called from interrupt
context without issues (Dmitry Baryshkov).
- Modify devm_pm_runtime_enable() to automatically handle
pm_runtime_dont_use_autosuspend() at driver exit time (Douglas
Anderson).
- Make the schedutil cpufreq governor use to_gov_attr_set() instead
of open coding it (Kevin Hao).
- Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
cpufreq longhaul driver (Rafael Wysocki).
- Unify show() and store() naming in cpufreq and make it use
__ATTR_XX (Lianjie Zhang).
- Make the intel_pstate driver use the EPP value set by the firmware
by default (Srinivas Pandruvada).
- Re-order the init checks in the powernow-k8 cpufreq driver (Mario
Limonciello).
- Make the ACPI processor idle driver check for architectural
support for LPI to avoid using it on x86 by mistake (Mario
Limonciello).
- Add Sapphire Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy).
- Add 'preferred_cstates' module argument to the intel_idle driver
to work around C1 and C1E handling issue on Sapphire Rapids (Artem
Bityutskiy).
- Add core C6 optimization on Sapphire Rapids to the intel_idle
driver (Artem Bityutskiy).
- Optimize the haltpoll cpuidle driver a bit (Li RongQing).
- Remove leftover text from intel_idle() kerneldoc comment and fix
up white space in intel_idle (Rafael Wysocki).
- Fix load_image_and_restore() error path (Ye Bin).
- Fix typos in comments in the system wakeup hadling code (Tom Rix).
- Clean up non-kernel-doc comments in hibernation code (Jiapeng
Chong).
- Fix __setup handler error handling in system-wide suspend and
hibernation core code (Randy Dunlap).
- Add device name to suspend_report_result() (Youngjin Jang).
- Make virtual guests honour ACPI S4 hardware signature by
default (David Woodhouse).
- Block power off of a parent PM domain unless child is in deepest
state (Ulf Hansson).
- Use dev_err_probe() to simplify error handling for generic PM
domains (Ahmad Fatoum).
- Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).
- Document Intel uncore frequency scaling (Srinivas Pandruvada).
- Add DTPM hierarchy description (Daniel Lezcano).
- Change the locking scheme in DTPM (Daniel Lezcano).
- Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer
release (Daniel Lezcano).
- Make dtpm_node_callback[] static (kernel test robot).
- Fix spelling mistake "initialze" -> "initialize" in
dtpm_create_hierarchy() (Colin Ian King).
- Add tracer tool for the amd-pstate driver (Jinzhou Su).
- Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).
- Add AMD P-State support to the cpupower utility (Huang Rui).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmI4pM4SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxh5wQAJEz3u55wIHzeov30obtXaD3SxxnvRzR
p96gRcmNoR2so/Q9D+h+JHZKQkVklbnbqExMXQn1qarceAUN7KPjVMRvagjZsC/f
J3LtQmx96yqGTCzOTu5n+Ol2ojKLMCMo++no/2873BYhd60TV6oQxRzkNiZx215n
tT6MKY5ZMX448VKWAWh9vt5rdvbBj9z6cfvpchK/3bziE21lfLz/1iXeFnwqjPGU
XuA7NYbVAHOfsdHZk19+4qAgm8EYkmjd4/J8HDlb7XouyLuUGy8KJZYhSrJKiQ1C
f9f2Zw0925/YpBmFXOwxuYWP9KjFKlq7Cdr3SSgVGDOvgyRtpeV4fU8Y6WPFCtEV
fQdKr9/4KQP6hwUpxJZucSf49wcnyh7hFDMxrwVVcL96yXZef1OqG3ITihJY/n4J
+wDnpR2VqBeiG5NyECjk3mPROZGFfUlHRsqMd3JOswMpGF5phpEI9nNFcayB262S
Rkgcb3MacFVsuo/ZBdzCUTZ6ECvjxZn4FGZPxumkp65SJO18gOPbqs8qfGCZ3Tgb
GDy0CWEOv/KuGnks1CkBGok2Z4q8s2GcZmaOp9BiPjxKJD71i4uPtiGA/5Ahb6cm
Cu0G7Ub/t2Vc93E7mnTE4hh2IuiAN73yB5teM4YNllHw6f+aqVGlvJktIMpShajo
eEBNFlkwljyz
=WlR9
-----END PGP SIGNATURE-----
Merge tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These are mostly fixes and cleanups all over the code and a new piece
of documentation for Intel uncore frequency scaling.
Functionality-wise, the intel_idle driver will support Sapphire Rapids
Xeons natively now (with some extra facilities for controlling
C-states more precisely on those systems), virtual guests will take
the ACPI S4 hardware signature into account by default, the
intel_pstate driver will take the defualt EPP value from the firmware,
cpupower utility will support the AMD P-state driver added in the
previous cycle, and there is a new tracer utility for that driver.
Specifics:
- Allow device_pm_check_callbacks() to be called from interrupt
context without issues (Dmitry Baryshkov).
- Modify devm_pm_runtime_enable() to automatically handle
pm_runtime_dont_use_autosuspend() at driver exit time (Douglas
Anderson).
- Make the schedutil cpufreq governor use to_gov_attr_set() instead
of open coding it (Kevin Hao).
- Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
cpufreq longhaul driver (Rafael Wysocki).
- Unify show() and store() naming in cpufreq and make it use
__ATTR_XX (Lianjie Zhang).
- Make the intel_pstate driver use the EPP value set by the firmware
by default (Srinivas Pandruvada).
- Re-order the init checks in the powernow-k8 cpufreq driver (Mario
Limonciello).
- Make the ACPI processor idle driver check for architectural support
for LPI to avoid using it on x86 by mistake (Mario Limonciello).
- Add Sapphire Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy).
- Add 'preferred_cstates' module argument to the intel_idle driver to
work around C1 and C1E handling issue on Sapphire Rapids (Artem
Bityutskiy).
- Add core C6 optimization on Sapphire Rapids to the intel_idle
driver (Artem Bityutskiy).
- Optimize the haltpoll cpuidle driver a bit (Li RongQing).
- Remove leftover text from intel_idle() kerneldoc comment and fix up
white space in intel_idle (Rafael Wysocki).
- Fix load_image_and_restore() error path (Ye Bin).
- Fix typos in comments in the system wakeup hadling code (Tom Rix).
- Clean up non-kernel-doc comments in hibernation code (Jiapeng
Chong).
- Fix __setup handler error handling in system-wide suspend and
hibernation core code (Randy Dunlap).
- Add device name to suspend_report_result() (Youngjin Jang).
- Make virtual guests honour ACPI S4 hardware signature by default
(David Woodhouse).
- Block power off of a parent PM domain unless child is in deepest
state (Ulf Hansson).
- Use dev_err_probe() to simplify error handling for generic PM
domains (Ahmad Fatoum).
- Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).
- Document Intel uncore frequency scaling (Srinivas Pandruvada).
- Add DTPM hierarchy description (Daniel Lezcano).
- Change the locking scheme in DTPM (Daniel Lezcano).
- Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer
release (Daniel Lezcano).
- Make dtpm_node_callback[] static (kernel test robot).
- Fix spelling mistake "initialze" -> "initialize" in
dtpm_create_hierarchy() (Colin Ian King).
- Add tracer tool for the amd-pstate driver (Jinzhou Su).
- Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).
- Add AMD P-State support to the cpupower utility (Huang Rui)"
* tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (58 commits)
cpufreq: powernow-k8: Re-order the init checks
cpuidle: intel_idle: Drop redundant backslash at line end
cpuidle: intel_idle: Update intel_idle() kerneldoc comment
PM: hibernate: Honour ACPI hardware signature by default for virtual guests
cpufreq: intel_pstate: Use firmware default EPP
cpufreq: unify show() and store() naming and use __ATTR_XX
PM: core: keep irq flags in device_pm_check_callbacks()
cpuidle: haltpoll: Call cpuidle_poll_state_init() later
Documentation: amd-pstate: add tracer tool introduction
tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
tools/power/x86/intel_pstate_tracer: make tracer as a module
cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
PM: sleep: Add device name to suspend_report_result()
turbostat: fix PC6 displaying on some systems
intel_idle: add core C6 optimization for SPR
intel_idle: add 'preferred_cstates' module argument
intel_idle: add SPR support
PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
ACPI: processor idle: Check for architectural support for LPI
cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
...
- Use uintptr_t and offsetof() in the ACPICA code to avoid compiler
warnings regarding NULL pointer arithmetic (Rafael Wysocki).
- Fix possible NULL pointer dereference in acpi_ns_walk_namespace()
when passed "acpi=off" in the command line (Rafael Wysocki).
- Fix and clean up acpi_os_read/write_port() (Rafael Wysocki).
- Introduce acpi_bus_for_each_dev() and use it for walking all ACPI
device objects in the Type C code (Rafael Wysocki).
- Fix the _OSC platform capabilities negotioation and prevent CPPC
from being used if the platform firmware indicates that it not
supported via _OSC (Rafael Wysocki).
- Use ida_alloc() instead of ida_simple_get() for ACPI enumeration
of devices (Rafael Wysocki).
- Add AGDI and CEDT to the list of known ACPI table signatures (Ilkka
Koskinen, Robert Kiraly).
- Add power management debug messages related to suspend-to-idle in
two places (Rafael Wysocki).
- Fix __acpi_node_get_property_reference() return value and clean up
that function (Andy Shevchenko, Sakari Ailus).
- Fix return value of the __setup handler in the ACPI PM timer clock
source driver (Randy Dunlap).
- Clean up double words in two comments (Tom Rix).
- Add "skip i2c clients" quirks for Lenovo Yoga Tablet 1050F/L and
Nextbook Ares 8 (Hans de Goede).
- Clean up frequency invariance handling on x86 in the ACPI CPPC
library (Huang Rui).
- Work around broken XSDT on the Advantech DAC-BJ01 board (Mark
Cilissen).
- Make wakeup events checks in the ACPI EC driver more
straightforward and clean up acpi_ec_submit_event() (Rafael
Wysocki).
- Make it possible to obtain the CPU capacity with the help of CPPC
information (Ionela Voinescu).
- Improve fine grained fan control in the ACPI fan driver and
document it (Srinivas Pandruvada).
- Add device HID and quirk for Microsoft Surface Go 3 to the ACPI
battery driver (Maximilian Luz).
- Make the ACPI driver for Intel SoCs (LPSS) let the SPI driver know
the exact type of the controller (Andy Shevchenko).
- Force native backlight mode on Clevo NL5xRU and NL5xNU (Werner
Sembach).
- Fix return value of __setup handlers in the APEI code (Randy
Dunlap).
- Add Arm Generic Diagnostic Dump and Reset device driver (Ilkka
Koskinen).
- Limit printable size of BERT table data (Darren Hart).
- Fix up HEST and GHES initialization (Shuai Xue).
- Update the ACPI device enumeration documentation and unify the ASL
style in GPIO-related examples (Andy Shevchenko).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmI4pF0SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxrPMP/A8kkgzJegS4CtUCtUpLcCufaggdpQTd
I9GQJeo73wGdmaelCQuXFJ9NUhuA1KHIU0WYqneWX+wifht+wl+KAZYvswPm0/wt
TiypiyRMf8Il0Q9tTTmWKSokK80O7ks8OZEe1HmiJimdEn+F1XUzLLgbQKFqhbbV
NHkVix3xR/7htgSb0ksaijH3XLyStuwPvc4WFueO14Pp5Bkr2Of33Xdd0UYeTCi4
RUqL3qJ4DT5gvgKipg43y6D2igRq/xMKx1bgnBjtwKChtjK23GGR6UB/jAIitIMv
XpxLw7kceY65zjJmmJ1+OKeM6CNAcIbTeyCyffSAH/MYRObj93XpMjnhxXILzjYB
Pz2U/lJy0kgw0PUkFzTdPkuuJlDn5GLY8F2cytvtlQAIhtFVFFcnHZYfhhLRWpoN
Sta2NHpGRejR/jixkQ4JtsjQ/Og02zQ9N344enaC64h3JYPBSyM8mpLH/YoXnuSx
jDPQK1KE/QVXRixKFjrPSXYq2p7w/CH7yZXX7TOo+ScnLhapiSUpyh7wiFslZ729
v11yzjsgBQk27qf1EGSImsh+YoRck9qOTb9tkVXGxcifTUPYzyXGn4T5i/ZwpN9v
nL6imYuiRJjFNAksbWo72hjYfhNwWAoCIXgUuxroCPLGGT394j5djisHYMjDNAsG
x43D1Fd4vEgT
=uB8P
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"From the new functionality perspective, the most significant items
here are the new driver for the 'ARM Generic Diagnostic Dump and
Reset' device, the extension of fine grain fan control in the ACPI fan
driver, and the change making it possible to use CPPC information to
obtain CPU capacity.
There are also a few new quirks, a bunch of fixes, including the
platform-level _OSC handling change to make it actually take the
platform firmware response into account, some code and documentation
cleanups, and a notable update of the ACPI device enumeration
documentation.
Specifics:
- Use uintptr_t and offsetof() in the ACPICA code to avoid compiler
warnings regarding NULL pointer arithmetic (Rafael Wysocki).
- Fix possible NULL pointer dereference in acpi_ns_walk_namespace()
when passed "acpi=off" in the command line (Rafael Wysocki).
- Fix and clean up acpi_os_read/write_port() (Rafael Wysocki).
- Introduce acpi_bus_for_each_dev() and use it for walking all ACPI
device objects in the Type C code (Rafael Wysocki).
- Fix the _OSC platform capabilities negotioation and prevent CPPC
from being used if the platform firmware indicates that it not
supported via _OSC (Rafael Wysocki).
- Use ida_alloc() instead of ida_simple_get() for ACPI enumeration of
devices (Rafael Wysocki).
- Add AGDI and CEDT to the list of known ACPI table signatures (Ilkka
Koskinen, Robert Kiraly).
- Add power management debug messages related to suspend-to-idle in
two places (Rafael Wysocki).
- Fix __acpi_node_get_property_reference() return value and clean up
that function (Andy Shevchenko, Sakari Ailus).
- Fix return value of the __setup handler in the ACPI PM timer clock
source driver (Randy Dunlap).
- Clean up double words in two comments (Tom Rix).
- Add "skip i2c clients" quirks for Lenovo Yoga Tablet 1050F/L and
Nextbook Ares 8 (Hans de Goede).
- Clean up frequency invariance handling on x86 in the ACPI CPPC
library (Huang Rui).
- Work around broken XSDT on the Advantech DAC-BJ01 board (Mark
Cilissen).
- Make wakeup events checks in the ACPI EC driver more
straightforward and clean up acpi_ec_submit_event() (Rafael
Wysocki).
- Make it possible to obtain the CPU capacity with the help of CPPC
information (Ionela Voinescu).
- Improve fine grained fan control in the ACPI fan driver and
document it (Srinivas Pandruvada).
- Add device HID and quirk for Microsoft Surface Go 3 to the ACPI
battery driver (Maximilian Luz).
- Make the ACPI driver for Intel SoCs (LPSS) let the SPI driver know
the exact type of the controller (Andy Shevchenko).
- Force native backlight mode on Clevo NL5xRU and NL5xNU (Werner
Sembach).
- Fix return value of __setup handlers in the APEI code (Randy
Dunlap).
- Add Arm Generic Diagnostic Dump and Reset device driver (Ilkka
Koskinen).
- Limit printable size of BERT table data (Darren Hart).
- Fix up HEST and GHES initialization (Shuai Xue).
- Update the ACPI device enumeration documentation and unify the ASL
style in GPIO-related examples (Andy Shevchenko)"
* tag 'acpi-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
clocksource: acpi_pm: fix return value of __setup handler
ACPI: bus: Avoid using CPPC if not supported by firmware
Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
arm64, topology: enable use of init_cpu_capacity_cppc()
arch_topology: obtain cpu capacity using information from CPPC
x86, ACPI: rename init_freq_invariance_cppc() to arch_init_invariance_cppc()
ACPI: AGDI: Add driver for Arm Generic Diagnostic Dump and Reset device
ACPI: tables: Add AGDI to the list of known table signatures
ACPI/APEI: Limit printable size of BERT table data
ACPI: docs: gpio-properties: Unify ASL style for GPIO examples
ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
ACPI: APEI: fix return value of __setup handlers
x86/ACPI: CPPC: Move init_freq_invariance_cppc() into x86 CPPC
x86: Expose init_freq_invariance() to topology header
x86/ACPI: CPPC: Move AMD maximum frequency ratio setting function into x86 CPPC
x86/ACPI: CPPC: Rename cppc_msr.c to cppc.c
ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L
ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8
ACPICA: Avoid walking the ACPI Namespace if it is not there
...
highlights are:
- Numerous PDF-generation improvements
- Kees's new document with guidelines for researchers studying the
development community.
- The ongoing stream of Chinese translations
- Thorsten's new document on regression handling
- A major reworking of the internal documentation for the kernel-doc
script.
Plus the usual stream of typo fixes and such.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmI4puIPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YUnoH/RubGxsXMBcpVk0szSN/c8VEhp+QxnL6Q8NV
BtOySou5lu204awb285m4HPvuVSViHKoDObFK3fYZUjCHP8rrSymnBV8N0E0pCYm
QAgUtNcGqFk41uEkr1v4wmGCj3hIvklycOtBAite4NulHoUzpMsssf6YbajZRIt9
/PyX30jC3dVPDCZ33lYIzJRdilhoKlS5r/x2Fk/c9uOLGCsJDufHlI6PB+RA7Gpf
6kDMriIKmEU9Pq2P+Gl+tVPnrQYSSxVP8QUweObQQll2Wq/tQR/1YtecAg0RvG+g
qc3ciEpVnNHRzSP1XY6Um1FyE338cBtZckdUMgVZ+5vY0300ooA=
=wRYm
-----END PGP SIGNATURE-----
Merge tag 'docs-5.18' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It has been a moderately busy cycle for documentation; some of the
highlights are:
- Numerous PDF-generation improvements
- Kees's new document with guidelines for researchers studying the
development community.
- The ongoing stream of Chinese translations
- Thorsten's new document on regression handling
- A major reworking of the internal documentation for the kernel-doc
script.
Plus the usual stream of typo fixes and such"
* tag 'docs-5.18' of git://git.lwn.net/linux: (80 commits)
docs/kernel-parameters: update description of mem=
docs/zh_CN: Add sched-nice-design Chinese translation
docs: scheduler: Convert schedutil.txt to ReST
Docs: ktap: add code-block type
docs: serial: fix a reference file name in driver.rst
docs: UML: Mention telnetd for port channel
docs/zh_CN: add damon reclaim translation
docs/zh_CN: add damon usage translation
docs/zh_CN: add admin-guide damon start translation
docs/zh_CN: add admin-guide damon index translation
docs/zh_CN: Refactoring the admin-guide directory index
zh_CN: Add translation for admin-guide/mm/index.rst
zh_CN: Add translations for admin-guide/mm/ksm.rst
Add Chinese translation for vm/ksm.rst
docs/zh_CN: Add sched-stats Chinese translation
docs/zh_CN: add devicetree of_unittest translation
docs/zh_CN: add devicetree usage-model translation
docs/zh_CN: add devicetree index translation
Documentation: describe how to apply incremental stable patches
docs/zh_CN: add peci subsystem translation
...
This pull request contains the following branches:
exp.2022.02.24a: Contains a fix for idle detection from Neeraj Upadhyay
and missing access marking detected by KCSAN.
fixes.2022.02.14a: Miscellaneous fixes.
rcu_barrier.2022.02.08a: Reduces coupling between rcu_barrier() and
CPU-hotplug operations, so that rcu_barrier() no longer needs
to do cpus_read_lock(). This may also someday allow system
boot to bring CPUs online concurrently.
rcu-tasks.2022.02.08a: Enable more aggressive movement to per-CPU
queueing when reacting to excessive lock contention due
to workloads placing heavy update-side stress on RCU tasks.
rt.2022.02.01b: Improvements to RCU priority boosting, including
changes from Neeraj Upadhyay, Zqiang, and Alison Chaiken.
torture.2022.02.01b: Various fixes improving test robustness and
debug information.
torturescript.2022.02.08a: Add tests for SRCU size transitions, further
compress torture.sh build products, and improve debug output.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmIusb0THHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jAklD/9VXLK7crcg2YeRXUIg1IOdnancsVCV
MNtTfxNYqYIis+W2UfuHKuQu2yEXF5fihdY0J9TQv0byHsprp6FIZT+i1An4Ukgd
0vyHjd/DaIKgs2txsB1DjhlatWlJUfQuBwhtNUkpYFLFwKdCI1l813bPbNlL+GiL
p0ZejVMpBC5HgE6sDOtaaQSAB+AEUp+Lgr+yaG/On8hfzwWFKO8KldxhiKY9n07v
SNDfKDgXB+80hx4RBVGbkuogV3s9brFULoNRXJy7Uf79DtiY09uazhhA3G0TjO34
zGwmF91dqsXDF/Uz8g4aZO0xYRXUchOrsQ5lgO/GhTVbM9I0wWlMHEk/8WHyBJkU
vlXOMuwzBc9/5uwZE3rnkA4a3nkXhPQjLlCr+/I7A/7Vsv9IBW9WSlgMvUN0Qf4S
XAwTnIqfErnR60a+L0+HRr5kIV5VoXcxqI/Nv0/4/BMLRubS/c7cYjOTxXNJL9SU
50pv5vty9xk3HSpuz0JAOyLf+PUT773uUQhFr5xCBSCVqbAm5WFg6hWPAgrN/tUS
wstBc0wlA73rKVJxeLDQwHc/oT1zTUEzswVZITQ5zLHK0t0GbeR6QHccsdeaJyTe
DisX+66A6YQrEuJmx5xUZqjYHqtYLDOBTbHA3ZwQmvjKu8ibWZ8Fg9ioURLCS4bF
+FVkp/5KdcAN9w==
=ljVY
-----END PGP SIGNATURE-----
Merge tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Fix idle detection (Neeraj Upadhyay) and missing access marking
detected by KCSAN.
- Reduce coupling between rcu_barrier() and CPU-hotplug operations, so
that rcu_barrier() no longer needs to do cpus_read_lock(). This may
also someday allow system boot to bring CPUs online concurrently.
- Enable more aggressive movement to per-CPU queueing when reacting to
excessive lock contention due to workloads placing heavy update-side
stress on RCU tasks.
- Improvements to RCU priority boosting, including changes from Neeraj
Upadhyay, Zqiang, and Alison Chaiken.
- Various fixes improving test robustness and debug information.
- Add tests for SRCU size transitions, further compress torture.sh
build products, and improve debug output.
- Miscellaneous fixes.
* tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
rcu: Replace cpumask_weight with cpumask_empty where appropriate
rcu: Remove __read_mostly annotations from rcu_scheduler_active externs
rcu: Uninline multi-use function: finish_rcuwait()
rcu: Mark writes to the rcu_segcblist structure's ->flags field
kasan: Record work creation stack trace with interrupts enabled
rcu: Inline __call_rcu() into call_rcu()
rcu: Add mutex for rcu boost kthread spawning and affinity setting
rcu: Fix description of kvfree_rcu()
MAINTAINERS: Add Frederic and Neeraj to their RCU files
rcutorture: Provide non-power-of-two Tasks RCU scenarios
rcutorture: Test SRCU size transitions
torture: Make torture.sh help message match reality
rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention
rcu-tasks: Use order_base_2() instead of ilog2()
rcu: Create and use an rcu_rdp_cpu_online()
rcu: Make rcu_barrier() no longer block CPU-hotplug operations
rcu: Rework rcu_barrier() and callback-migration logic
rcu: Refactor rcu_barrier() empty-list handling
rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
torture: Change KVM environment variable to RCUTORTURE
...
- Support for including MTE tags in ELF coredumps
- Instruction encoder updates, including fixes to 64-bit immediate
generation and support for the LSE atomic instructions
- Improvements to kselftests for MTE and fpsimd
- Symbol aliasing and linker script cleanups
- Reduce instruction cache maintenance performed for user mappings
created using contiguous PTEs
- Support for the new "asymmetric" MTE mode, where stores are checked
asynchronously but loads are checked synchronously
- Support for the latest pointer authentication algorithm ("QARMA3")
- Support for the DDR PMU present in the Marvell CN10K platform
- Support for the CPU PMU present in the Apple M1 platform
- Use the RNDR instruction for arch_get_random_{int,long}()
- Update our copy of the Arm optimised string routines for str{n}cmp()
- Fix signal frame generation for CPUs which have foolishly elected to
avoid building in support for the fpsimd instructions
- Workaround for Marvell GICv3 erratum #38545
- Clarification to our Documentation (booting reqs. and MTE prctl())
- Miscellanous cleanups and minor fixes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmIvta8QHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNAIhB/oDSva5FryAFExVuIB+mqRkbZO9kj6fy/5J
ctN9LEVO2GI/U1TVAUWop1lXmP8Kbq5UCZOAuY8sz7dAZs7NRUWkwTrXVhaTpi6L
oxCfu5Afu76d/TGgivNz+G7/ewIJRFj5zCPmHezLF9iiWPUkcAsP0XCp4a0iOjU4
04O4d7TL/ap9ujEes+U0oEXHnyDTPrVB2OVE316FKD1fgztcjVJ2U+TxX5O4xitT
PPIfeQCjQBq1B2OC1cptE3wpP+YEr9OZJbx+Ieweidy1CSInEy0nZ13tLoUnGPGU
KPhsvO9daUCbhbd5IDRBuXmTi/sHU4NIB8LNEVzT1mUPnU8pCizv
=ziGg
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
- Support for including MTE tags in ELF coredumps
- Instruction encoder updates, including fixes to 64-bit immediate
generation and support for the LSE atomic instructions
- Improvements to kselftests for MTE and fpsimd
- Symbol aliasing and linker script cleanups
- Reduce instruction cache maintenance performed for user mappings
created using contiguous PTEs
- Support for the new "asymmetric" MTE mode, where stores are checked
asynchronously but loads are checked synchronously
- Support for the latest pointer authentication algorithm ("QARMA3")
- Support for the DDR PMU present in the Marvell CN10K platform
- Support for the CPU PMU present in the Apple M1 platform
- Use the RNDR instruction for arch_get_random_{int,long}()
- Update our copy of the Arm optimised string routines for str{n}cmp()
- Fix signal frame generation for CPUs which have foolishly elected to
avoid building in support for the fpsimd instructions
- Workaround for Marvell GICv3 erratum #38545
- Clarification to our Documentation (booting reqs. and MTE prctl())
- Miscellanous cleanups and minor fixes
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
arm64/mte: Remove asymmetric mode from the prctl() interface
arm64: Add cavium_erratum_23154_cpus missing sentinel
perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
Documentation: vmcoreinfo: Fix htmldocs warning
kasan: fix a missing header include of static_keys.h
drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
drivers/perf: arm_pmu: Handle 47 bit counters
arm64: perf: Consistently make all event numbers as 16-bits
arm64: perf: Expose some Armv9 common events under sysfs
perf/marvell: cn10k DDR perf event core ownership
perf/marvell: cn10k DDR perfmon event overflow handling
perf/marvell: CN10k DDR performance monitor support
dt-bindings: perf: marvell: cn10k ddr performance monitor
arm64: clean up tools Makefile
perf/arm-cmn: Update watchpoint format
perf/arm-cmn: Hide XP PUB events for CMN-600
arm64: drop unused includes of <linux/personality.h>
arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
...
Merge power management utilities changes for 5.18-rc1:
- Add tracer tool for the amd-pstate driver (Jinzhou Su).
- Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).
- Add AMD P-State support to the cpupower utility (Huang Rui).
* pm-tools:
Documentation: amd-pstate: add tracer tool introduction
tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
tools/power/x86/intel_pstate_tracer: make tracer as a module
cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
turbostat: fix PC6 displaying on some systems
cpupower: Add "perf" option to print AMD P-State information
cpupower: Add function to print AMD P-State performance capabilities
cpupower: Move print_speed function into misc helper
cpupower: Enable boost state support for AMD P-State module
cpupower: Add AMD P-State sysfs definition and access helper
cpupower: Introduce ACPI CPPC library
cpupower: Add the function to get the sysfs value from specific table
cpupower: Initial AMD P-State capability
cpupower: Add the function to check AMD P-State enabled
cpupower: Add AMD P-State capability flag
tools/power/cpupower/{ToDo => TODO}: Rename the todo file
tools: cpupower: fix typo in cpupower-idle-set(1) manpage
Merge ACPI EC driver changes, CPPC-related changes, ACPI fan driver
changes and ACPI battery driver changes for 5.18-rc1:
- Make wakeup events checks in the ACPI EC driver more
straightforward and clean up acpi_ec_submit_event() (Rafael
Wysocki).
- Make it possible to obtain the CPU capacity with the help of CPPC
information (Ionela Voinescu).
- Improve fine grained fan control in the ACPI fan driver and
document it (Srinivas Pandruvada).
- Add device HID and quirk for Microsoft Surface Go 3 to the ACPI
battery driver (Maximilian Luz).
* acpi-ec:
ACPI: EC: Rearrange code in acpi_ec_submit_event()
ACPI: EC: Reduce indentation level in acpi_ec_submit_event()
ACPI: EC: Do not return result from advance_transaction()
* acpi-cppc:
arm64, topology: enable use of init_cpu_capacity_cppc()
arch_topology: obtain cpu capacity using information from CPPC
x86, ACPI: rename init_freq_invariance_cppc() to arch_init_invariance_cppc()
* acpi-fan:
Documentation/admin-guide/acpi: Add documentation for fine grain control
ACPI: fan: Add additional attributes for fine grain control
ACPI: fan: Properly handle fine grain control
ACPI: fan: Optimize struct acpi_fan_fif
ACPI: fan: Separate file for attributes creation
ACPI: fan: Fix error reporting to user space
* acpi-battery:
ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
As the end goal is to have platform drivers split by vendor,
rename omap3isp/ to ti/omap3isp/.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
As the end goal is to have platform drivers split by vendor,
rename exynos4-is/ to samsung/exynos4-is/.
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The existing description of mem= does not cover all the cases and
differences between how architectures treat it.
Extend the description to match the code.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220310082736.1346366-1-rppt@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This is the only tuner driver that has "tuner-" on its name.
Rename it, in order to match all the other tuner drivers.
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Add ftrace_boot_snapshot kernel parameter that will take a snapshot at the
end of boot up just before switching over to user space (it happens during
the kernel freeing of init memory).
This is useful when there's interesting data that can be collected from
kernel start up, but gets overridden by user space start up code. With
this option, the ring buffer content from the boot up traces gets saved in
the snapshot at the end of boot up. This trace can be read from:
/sys/kernel/tracing/snapshot
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Since commit 2369f171d5 ("arm64: crash_core: Export MODULES, VMALLOC,
and VMEMMAP ranges"), Stephen reports a warning when building htmldocs:
| Documentation/admin-guide/kdump/vmcoreinfo.rst:498: WARNING: Title underline too short.
Extend the underline to squash the warning.
Fixes: 2369f171d5 ("arm64: crash_core: Export MODULES, VMALLOC, and VMEMMAP ranges")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Will Deacon <will@kernel.org>
which support eIBRS, i.e., the hardware-assisted speculation restriction
after it has been shown that such machines are vulnerable even with the
hardware mitigation.
- Do not use the default LFENCE-based Spectre v2 mitigation on AMD as it
is insufficient to mitigate such attacks. Instead, switch to retpolines
on all AMD by default.
- Update the docs and add some warnings for the obviously vulnerable
cmdline configurations.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmIkktUACgkQEsHwGGHe
VUo7ZQ/+O4hzL/tHY0V/ekkDxCrJ3q3Hp+DcxUl2ee5PC3Qgxv1Z1waH6ppK8jQs
marAGr7FYbvzY039ON7irxhpSIckBCpx9tM2F43zsPxxY8EdxGojkHbmaqso5HtW
l3/O28AcZYoKN/fF8rRAIJy4hrTVascKrNJ2fOiYWYBT62ZIoPm0FusgXbKTZPD+
gT7iUMoyPjBnKdWDT9L6kKOxDF9TivX1Y6JdDHbnnBsgRkeFatkeq9BJ93M73q63
Ziq9c8ZcEXyKez+cGFCfXM7+pNYmfsiL48lilTyf+v+GXahDJQOkFw39j5zXEALm
Nk6yB3PRQ74pEwm5WbK7KO8iwPpblmnDB978mfUcpk+9xWJD8pyoUcItAmCBsXh1
LjIImYPqL6YihUb9udh+PEDISsfzWNzr4T+kgW9/yXXG4ZmGy3TLInhTK+rNAxJa
EshWZExEZj6yJvt83Vu08W9fppYJq976tJvl8LWOYthaxqY7IQz0q7mYd799yxk0
MLPqvZP1+4pHzqn2c9yeHgrwHwMmoqcyMx6B3EA5maYQPdlT7Fk9RCBeCdIA/ieF
OgGxy1WwMH+cvUa5MaBy3Y32LeYU3bUJh0yPFq/7BxEYGG9PJtLhg2xTo1Ui8F1d
fKrcSFcjZKVJ9UE5HaqOcp4ka+Q220I9IDGURXkAFQlnOU7X7CE=
=Athd
-----END PGP SIGNATURE-----
Merge tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 spectre fixes from Borislav Petkov:
- Mitigate Spectre v2-type Branch History Buffer attacks on machines
which support eIBRS, i.e., the hardware-assisted speculation
restriction after it has been shown that such machines are vulnerable
even with the hardware mitigation.
- Do not use the default LFENCE-based Spectre v2 mitigation on AMD as
it is insufficient to mitigate such attacks. Instead, switch to
retpolines on all AMD by default.
- Update the docs and add some warnings for the obviously vulnerable
cmdline configurations.
* tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
x86/speculation: Warn about Spectre v2 LFENCE mitigation
x86/speculation: Update link to AMD speculation whitepaper
x86/speculation: Use generic retpoline by default on AMD
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Documentation/hw-vuln: Update spectre doc
x86/speculation: Add eIBRS + Retpoline options
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
The following interrelated ranges are needed by the kdump crash tool:
MODULES_VADDR ~ MODULES_END,
VMALLOC_START ~ VMALLOC_END,
VMEMMAP_START ~ VMEMMAP_END
Since these values change from time to time, it is preferable to export
them via vmcoreinfo than to change the crash's code frequently.
Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20220209092642.9181-1-shijie@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
Since bit 57 was exported for uffd-wp write-protected (commit
fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"),
fixing it can reduce some unnecessary confusion.
Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com
Fixes: fb8e37f35a ("mm/pagemap: export uffd-wp protection information")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Colin Cross <ccross@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following 'make htmldocs' warnings:
./Documentation/admin-guide/perf/hisi-pcie-pmu.rst: WARNING:
document isn't included in any toctree
Fixes: c8602008e2 ("docs: perf: Add description for HiSilicon PCIe PMU driver")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20220228031700.1669086-1-wanjiabing@vivo.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Added documentation to configure uncore frequency limits in Intel
Xeon processors.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Clean up the document wording ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>