- More userfaultfs work from Peter Xu.
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying.
- Some filemap cleanups from Vishal Moola.
- David Hildenbrand added the ability to selftest anon memory COW handling.
- Some cpuset simplifications from Liu Shixin.
- Addition of vmalloc tracing support by Uladzislau Rezki.
- Some pagecache folioifications and simplifications from Matthew Wilcox.
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it.
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword. This series shold have been in the
non-MM tree, my bad.
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages.
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages.
- Peter Xu utilized the PTE marker code for handling swapin errors.
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient.
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand.
- zram support for multiple compression streams from Sergey Senozhatsky.
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway.
- Mel Gorman altered the page allocator so that local IRQs can remnain
enabled during per-cpu page allocations.
- Vishal Moola removed the try_to_release_page() wrapper.
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache.
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking.
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend.
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range().
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen.
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect.
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages().
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting.
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines.
- Many singleton patches, as usual.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA
jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml
CodAgiA51qwzId3GRytIo/tfWZSezgA=
=d19R
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- More userfaultfs work from Peter Xu
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying
- Some filemap cleanups from Vishal Moola
- David Hildenbrand added the ability to selftest anon memory COW
handling
- Some cpuset simplifications from Liu Shixin
- Addition of vmalloc tracing support by Uladzislau Rezki
- Some pagecache folioifications and simplifications from Matthew
Wilcox
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
it
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword.
This series should have been in the non-MM tree, my bad
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages
- Peter Xu utilized the PTE marker code for handling swapin errors
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand
- zram support for multiple compression streams from Sergey Senozhatsky
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway
- Mel Gorman altered the page allocator so that local IRQs can remnain
enabled during per-cpu page allocations
- Vishal Moola removed the try_to_release_page() wrapper
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range()
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages()
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines
- Many singleton patches, as usual
* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
mm: mmu_gather: allow more than one batch of delayed rmaps
mm: fix typo in struct pglist_data code comment
kmsan: fix memcpy tests
mm: add cond_resched() in swapin_walk_pmd_entry()
mm: do not show fs mm pc for VM_LOCKONFAULT pages
selftests/vm: ksm_functional_tests: fixes for 32bit
selftests/vm: cow: fix compile warning on 32bit
selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
mm,thp,rmap: fix races between updates of subpages_mapcount
mm: memcg: fix swapcached stat accounting
mm: add nodes= arg to memory.reclaim
mm: disable top-tier fallback to reclaim on proactive reclaim
selftests: cgroup: make sure reclaim target memcg is unprotected
selftests: cgroup: refactor proactive reclaim code to reclaim_until()
mm: memcg: fix stale protection of reclaim target memcg
mm/mmap: properly unaccount memory on mas_preallocate() failure
omfs: remove ->writepage
jfs: remove ->writepage
...
This patch implements ftrace trampolines through plt entry.
Tested by forcing ftrace_make_call() to use the module PLT, and then
loading up a module after setting up ftrace with:
| echo ":mod:<module-name>" > set_ftrace_filter;
| echo function > current_tracer;
| modprobe <module-name>
Since FTRACE_ADDR/FTRACE_REGS_ADDR is only defined when CONFIG_DYNAMIC_
FTRACE is selected, we wrap their usage in module_init_ftrace_plt() with
ifdeffery rather than using IS_ENABLED().
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
ftrace_graph_ret_addr() can be called by stack unwinding code to convert
a found stack return address ('ret') to its original value, in case the
function graph tracer has modified it to be 'return_to_handler'. If the
hasn't been modified, the unchanged value of 'ret' is returned.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Allow for arguments to be passed in to ftrace_regs by default. If this
is set, then arguments and stack can be found from the pt_regs.
1. HAVE_DYNAMIC_FTRACE_WITH_ARGS don't need special hook for graph
tracer entry point, but instead we can use graph_ops::func function to
install the return_hooker.
2. Livepatch requires this option in the future.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This patch implements CONFIG_DYNAMIC_FTRACE_WITH_REGS on LoongArch,
which allows a traced function's arguments (and some other registers)
to be captured into a struct pt_regs, allowing these to be inspected
and modified.
Co-developed-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The compiler has inserted 2 NOPs before the regular function prologue.
T series registers are available and safe because of LoongArch's psABI.
At runtime, we can replace nop with bl to enable ftrace call and replace
bl with nop to disable ftrace call. The bl instruction requires us to
save the original RA value, so it saves RA at t0 here.
Details are:
| Compiled | Disabled | Enabled |
+------------+------------------------+------------------------+
| nop | move t0, ra | move t0, ra |
| nop | nop | bl ftrace_caller |
| func_body | func_body | func_body |
The RA value will be recovered by ftrace_regs_entry, and restored into
RA before returning to the regular function prologue. When a function is
not being traced, the "move t0, ra" is not harmful.
1) ftrace_make_call, ftrace_make_nop (in kernel/ftrace.c)
The two functions turn each recorded call site of filtered functions
into a call to ftrace_caller or nops.
2) ftracce_update_ftrace_func (in kernel/ftrace.c)
turns the nops at ftrace_call into a call to a generic entry for
function tracers.
3) ftrace_caller (in kernel/mcount_dyn.S)
The entry where each _mcount call sites calls to once they are
filtered to be traced.
Co-developed-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This patch contains basic ftrace support for LoongArch. Specifically,
function tracer (HAVE_FUNCTION_TRACER), function graph tracer (HAVE_
FUNCTION_GRAPH_TRACER) are implemented following the instructions in
Documentation/trace/ftrace-design.txt.
Use `-pg` makes stub like a child function `void _mcount(void *ra)`.
Thus, it can be seen store RA and alloc stack before `call _mcount`.
Find `alloc stack` at first, and then find `store RA`.
Note that the functions in both inst.c and time.c should not be hooked
with the compiler's -pg option: to prevent infinite self-referencing for
the former, and to ignore early setup stuff for the latter.
Co-developed-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Instead of saving a pointer to the .got, .plt and .plt_idx sections to
apply {got,plt}-based relocations, save and use their section indices
instead.
The mod->arch.{core,init}.{got,plt} pointers were problematic for live-
patch because they pointed within temporary section headers (provided by
the module loader via info->sechdrs) that would be freed after module
load. Since livepatch modules may need to apply relocations post-module-
load (for example, to patch a module that is loaded later), using section
indices to offset into the section headers (instead of accessing them
through a saved pointer) allows livepatch modules on LoongArch to pass
in their own copy of the section headers to apply_relocate_add() to
apply delayed relocations.
The method used is same as commit c8ebf64eab ("arm64/module: use plt
section indices for relocations").
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add basic stack protector support similar to other architectures. A
constant canary value is set at boot time, and with help of compiler's
-fstack-protector we can detect stack corruption.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Since commit 40cd01a9c324("efi/loongarch: libstub: remove dependency on
flattened DT"), we can parse the FDT from efi system table.
And now, LoongArch is coming to support booting with FDT, so we add the
relevant booting support as well as parameter parsing.
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Introduce the "alternative" mechanism from ARM64 and x86 for LoongArch
to apply runtime patching. The main purpose of this patch is to provide
a framework. In future we can use this mechanism (i.e., the ALTERNATIVE
and ALTERNATIVE_2 macros) to optimize hotspot functions according to cpu
features.
Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Loongson-2 series (Loongson-2K500, Loongson-2K1000) don't support
unaligned access in hardware, while Loongson-3 series (Loongson-3A5000,
Loongson-3C5000) are configurable whether support unaligned access in
hardware. This patch add unaligned access emulation for those LoongArch
processors without hardware support.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Inspired by commit 800834285361("bpf, arm64: Add BPF exception tables"),
do similar to LoongArch to add BPF exception tables.
When a tracing BPF program attempts to read memory without using the
bpf_probe_read() helper, the verifier marks the load instruction with
the BPF_PROBE_MEM flag. Since the LoongArch JIT does not currently
recognize this flag it falls back to the interpreter.
Add support for BPF_PROBE_MEM, by appending an exception table to the
BPF program. If the load instruction causes a data abort, the fixup
infrastructure finds the exception table and fixes up the fault, by
clearing the destination register and jumping over the faulting
instruction.
To keep the compact exception table entry format, inspect the pc in
fixup_exception(). A more generic solution would add a "handler" field
to the table entry, like on x86, s390 and arm64, etc.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Inspired by commit 2e77a62cb3a6("arm64: extable: add a dedicated uaccess
handler"), do similar to LoongArch to add a dedicated uaccess exception
handler to update registers in exception context and subsequently return
back into the function which faulted, so we remove the need for fixups
specialized to each faulting instruction.
Add gpr-num.h here because we need to map the same GPR names to integer
constants, so that we can use this to build meta-data for the exception
fixups.
The compiler treats gpr 0 as zero rather than $r0, so set it separately
to .L__gpr_num_zero, otherwise the following assembly error will occurs:
{standard input}: Assembler messages:
{standard input}:1074: Error: invalid operands (*UND* and *ABS* sections) for `<<'
{standard input}:1160: Error: invalid operands (*UND* and *ABS* sections) for `<<'
make[1]: *** [scripts/Makefile.build:249: fs/fcntl.o] Error 1
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This is a LoongArch port of commit d6e2cc5647 ("arm64: extable: add
`type` and `data` fields").
Subsequent patches will add specialized handlers for fixups, in addition
to the simple PC fixup we have today. In preparation, this patch adds a
new `type` field to struct exception_table_entry, and uses this to
distinguish the fixup and other cases. A `data` field is also added so
that subsequent patches can associate data specific to each exception
site (e.g. register numbers).
Handlers are named ex_handler_*() for consistency, following the example
of x86. At the same time, get_ex_fixup() is split out into a helper so
that it can be used by other ex_handler_*() functions in the subsequent
patches.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Similar to other architectures such as arm64, x86, riscv and so on, use
offsets relative to the exception table entry values rather than their
absolute addresses for both the exception location and the fixup.
However, LoongArch label difference because it will actually produce two
relocations, a pair of R_LARCH_ADD32 and R_LARCH_SUB32. Take simple code
below for example:
$ cat test_ex_table.S
.section .text
1:
nop
.section __ex_table,"a"
.balign 4
.long (1b - .)
.previous
$ loongarch64-unknown-linux-gnu-gcc -c test_ex_table.S
$ loongarch64-unknown-linux-gnu-readelf -Wr test_ex_table.o
Relocation section '.rela__ex_table' at offset 0x100 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000600000032 R_LARCH_ADD32 0000000000000000 .L1^B1 + 0
0000000000000000 0000000500000037 R_LARCH_SUB32 0000000000000000 L0^A + 0
The modpost will complain the R_LARCH_SUB32 relocation, so we need to
patch modpost.c to skip this relocation for .rela__ex_table section.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Consolidate all the __ex_table constuction code with a _ASM_EXTABLE or
_asm_extable helper.
There should be no functional change as a result of this patch.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
- Refactor the zboot code so that it incorporates all the EFI stub
logic, rather than calling the decompressed kernel as a EFI app.
- Add support for initrd= command line option to x86 mixed mode.
- Allow initrd= to be used with arbitrary EFI accessible file systems
instead of just the one the kernel itself was loaded from.
- Move some x86-only handling and manipulation of the EFI memory map
into arch/x86, as it is not used anywhere else.
- More flexible handling of any random seeds provided by the boot
environment (i.e., systemd-boot) so that it becomes available much
earlier during the boot.
- Allow improved arch-agnostic EFI support in loaders, by setting a
uniform baseline of supported features, and adding a generic magic
number to the DOS/PE header. This should allow loaders such as GRUB or
systemd-boot to reduce the amount of arch-specific handling
substantially.
- (arm64) Run EFI runtime services from a dedicated stack, and use it to
recover from synchronous exceptions that might occur in the firmware
code.
- (arm64) Ensure that we don't allocate memory outside of the 48-bit
addressable physical range.
- Make EFI pstore record size configurable
- Add support for decoding CXL specific CPER records
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmOTQ1cACgkQw08iOZLZ
jyQRkAv+LqaZFWeVwhAQHiw/N3RnRM0nZHea6++D2p1y/ZbCpwv3pdLl2YHQ1KmW
wDG9Nr4C1ITLtfy1YZKeYpwloQtq9S1GZDWnFpVv/hdo7L924eRAwIlxowWn1OnP
ruxv2PaYXyb0plh1YD1f6E1BqrfUOtajET55Kxs9ZsxmnMtDpIX3NiYy4LKMBIZC
+Eywt41M3uBX+wgmSujFBMVVJjhOX60WhUYXqy0RXwDKOyrz/oW5td+eotSCreB6
FVbjvwQvUdtzn4s1FayOMlTrkxxLw4vLhsaUGAdDOHd3rg3sZT9Xh1HqFFD6nss6
ZAzAYQ6BzdiV/5WSB9meJe+BeG1hjTNKjJI6JPO2lctzYJqlnJJzI6JzBuH9vzQ0
dffLB8NITeEW2rphIh+q+PAKFFNbXWkJtV4BMRpqmzZ/w7HwupZbUXAzbWE8/5km
qlFpr0kmq8GlVcbXNOFjmnQVrJ8jPYn+O3AwmEiVAXKZJOsMH0sjlXHKsonme9oV
Sk71c6Em
=JEXz
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Another fairly sizable pull request, by EFI subsystem standards.
Most of the work was done by me, some of it in collaboration with the
distro and bootloader folks (GRUB, systemd-boot), where the main focus
has been on removing pointless per-arch differences in the way EFI
boots a Linux kernel.
- Refactor the zboot code so that it incorporates all the EFI stub
logic, rather than calling the decompressed kernel as a EFI app.
- Add support for initrd= command line option to x86 mixed mode.
- Allow initrd= to be used with arbitrary EFI accessible file systems
instead of just the one the kernel itself was loaded from.
- Move some x86-only handling and manipulation of the EFI memory map
into arch/x86, as it is not used anywhere else.
- More flexible handling of any random seeds provided by the boot
environment (i.e., systemd-boot) so that it becomes available much
earlier during the boot.
- Allow improved arch-agnostic EFI support in loaders, by setting a
uniform baseline of supported features, and adding a generic magic
number to the DOS/PE header. This should allow loaders such as GRUB
or systemd-boot to reduce the amount of arch-specific handling
substantially.
- (arm64) Run EFI runtime services from a dedicated stack, and use it
to recover from synchronous exceptions that might occur in the
firmware code.
- (arm64) Ensure that we don't allocate memory outside of the 48-bit
addressable physical range.
- Make EFI pstore record size configurable
- Add support for decoding CXL specific CPER records"
* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
arm64: efi: Recover from synchronous exceptions occurring in firmware
arm64: efi: Execute runtime services from a dedicated stack
arm64: efi: Limit allocations to 48-bit addressable physical region
efi: Put Linux specific magic number in the DOS header
efi: libstub: Always enable initrd command line loader and bump version
efi: stub: use random seed from EFI variable
efi: vars: prohibit reading random seed variables
efi: random: combine bootloader provided RNG seed with RNG protocol output
efi/cper, cxl: Decode CXL Error Log
efi/cper, cxl: Decode CXL Protocol Error Section
efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
efi: x86: Move EFI runtime map sysfs code to arch/x86
efi: runtime-maps: Clarify purpose and enable by default for kexec
efi: pstore: Add module parameter for setting the record size
efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
efi: memmap: Move manipulation routines into x86 arch tree
efi: memmap: Move EFI fake memmap support into x86 arch tree
efi: libstub: Undeprecate the command line initrd loader
efi: libstub: Add mixed mode support to command line initrd loader
efi: libstub: Permit mixed mode return types other than efi_status_t
...
- Update the ACPICA code in the kernel to the 20221020 upstream
version and fix a couple of issues in it:
* Make acpi_ex_load_op() match upstream implementation (Rafael
Wysocki).
* Add support for loong_arch-specific APICs in MADT (Huacai Chen).
* Add support for fixed PCIe wake event (Huacai Chen).
* Add EBDA pointer sanity checks (Vit Kabele).
* Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele).
* Add CCEL table support to both compiler/disassembler (Kuppuswamy
Sathyanarayanan).
* Add a couple of new UUIDs to the known UUID list (Bob Moore).
* Add support for FFH Opregion special context data (Sudeep Holla).
* Improve warning message for "invalid ACPI name" (Bob Moore).
* Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT
table (Alison Schofield).
* Prepare IORT support for revision E.e (Robin Murphy).
* Finish support for the CDAT table (Bob Moore).
* Fix error code path in acpi_ds_call_control_method() (Rafael
Wysocki).
* Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li
Zetao).
* Update the version of the ACPICA code in the kernel (Bob Moore).
- Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
enumeration code (Giulio Benetti).
- Change the return type of the ACPI driver remove callback to void and
update its users accordingly (Dawei Li).
- Add general support for FFH address space type and implement the low-
level part of it for ARM64 (Sudeep Holla).
- Fix stale comments in the ACPI tables parsing code and make it print
more messages related to MADT (Hanjun Guo, Huacai Chen).
- Replace invocations of generic library functions with more kernel-
specific counterparts in the ACPI sysfs interface (Christophe JAILLET,
Xu Panda).
- Print full name paths of ACPI power resource objects during
enumeration (Kane Chen).
- Eliminate a compiler warning regarding a missing function prototype
in the ACPI power management code (Sudeep Holla).
- Fix and clean up the ACPI processor driver (Rafael Wysocki, Li Zhong,
Colin Ian King, Sudeep Holla).
- Add quirk for the HP Pavilion Gaming 15-cx0041ur to the ACPI EC
driver (Mia Kanashi).
- Add some mew ACPI backlight handling quirks and update some existing
ones (Hans de Goede).
- Make the ACPI backlight driver prefer the native backlight control
over vendor backlight control when possible (Hans de Goede).
- Drop unsetting ACPI APEI driver data on remove (Uwe Kleine-König).
- Use xchg_release() instead of cmpxchg() for updating new GHES cache
slots (Ard Biesheuvel).
- Clean up the ACPI APEI code (Sudeep Holla, Christophe JAILLET, Jay Lu).
- Add new I2C device enumeration quirks for Medion Lifetab S10346 and
Lenovo Yoga Tab 3 Pro (YT3-X90F) (Hans de Goede).
- Make the ACPI battery driver notify user space about adding new
battery hooks and removing the existing ones (Armin Wolf).
- Modify the pfr_update and pfr_telemetry drivers to use ACPI_FREE()
for freeing acpi_object structures to help diagnostics (Wang ShaoBo).
- Make the ACPI fan driver use sysfs_emit_at() in its sysfs interface
code (ye xingchen).
- Fix the _FIF package extraction failure handling in the ACPI fan
driver (Hanjun Guo).
- Fix the PCC mailbox handling error code path (Huisong Li).
- Avoid using PCC Opregions if there is no platform interrupt allocated
for this purpose (Huisong Li).
- Use sysfs_emit() instead of scnprintf() in the ACPI PAD driver and
CPPC library (ye xingchen).
- Fix some kernel-doc issues in the ACPI GSI processing code (Xiongfeng
Wang).
- Fix name memory leak in pnp_alloc_dev() (Yang Yingliang).
- Do not disable PNP devices on suspend when they cannot be re-enabled
on resume (Hans de Goede).
- Clean up the ACPI thermal driver a bit (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmOXV10SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxuOwP/2zew6val2Jf7I/Yxf1iQLlRyGmhFnaH
wpltJvBjlHjAUKnPQ/kLYK9fjuUY5HVgjOE03WpwhFUpmhftYTrSkhoVkJ1Mw9Zl
RNOAEgCG484ThHiTIVp/dMPxrtfuqpdbamhWX3Q51IfXjGW8Vc/lDxIa3k/JQxyq
ko8GFPCoebJrSCfuwaAf2+xSQaf6dq4jpL/rlIk+nYMMB9mQmXhNEhc+l97NaCe8
MyCIGynyNbhGsIlwdHRvTp04EIe8h0Z1+Dyns7g/TrzHj3Aezy7QVZbn8sKdZWa1
W/Ck9QST5tfpDWyr+hUXxUJjEn4Yy+GXjM2xON0EMx5q+JD9XsOpwWOVwTR7CS5s
FwEd6I89SC8OZM86AgMtnGxygjpK24R/kGzHjhG15IQCsypc8Rvzoxl0L0YVoon/
UTkE57GzNWVzu0pY/oXJc2aT7lVqFXMFZ6ft/zHnBRnQmrcIi+xgDO5ni5KxctFN
TVFwbAMCuwVx6IOcVQCZM2g4aJw426KpUn19fKnXvPwR5UIufBaCzSKWMiYrtdXr
O5BM8ElYuyKCWGYEE0GSMjZygyDpyY6ENLH7s7P1IEmFyigBzaaGBbKm108JJq4V
eCWJYTAx8pAptsU/vfuMvEQ1ErfhZ3TTokA5Lv0uPf53VcAnWDb7EAbW6ZGMwFSI
IaV6cv6ILoqO
=GVzp
-----END PGP SIGNATURE-----
mergetag object 6132a490f9
type commit
tag irq-core-2022-12-10
tagger Thomas Gleixner <tglx@linutronix.de> 1670689576 +0100
Updates for the interrupt core and driver subsystem:
- Core:
The bulk is the rework of the MSI subsystem to support per device MSI
interrupt domains. This solves conceptual problems of the current
PCI/MSI design which are in the way of providing support for PCI/MSI[-X]
and the upcoming PCI/IMS mechanism on the same device.
IMS (Interrupt Message Store] is a new specification which allows device
manufactures to provide implementation defined storage for MSI messages
contrary to the uniform and specification defined storage mechanisms for
PCI/MSI and PCI/MSI-X. IMS not only allows to overcome the size limitations
of the MSI-X table, but also gives the device manufacturer the freedom to
store the message in arbitrary places, even in host memory which is shared
with the device.
There have been several attempts to glue this into the current MSI code,
but after lengthy discussions it turned out that there is a fundamental
design problem in the current PCI/MSI-X implementation. This needs some
historical background.
When PCI/MSI[-X] support was added around 2003, interrupt management was
completely different from what we have today in the actively developed
architectures. Interrupt management was completely architecture specific
and while there were attempts to create common infrastructure the
commonalities were rudimentary and just providing shared data structures and
interfaces so that drivers could be written in an architecture agnostic
way.
The initial PCI/MSI[-X] support obviously plugged into this model which
resulted in some basic shared infrastructure in the PCI core code for
setting up MSI descriptors, which are a pure software construct for holding
data relevant for a particular MSI interrupt, but the actual association to
Linux interrupts was completely architecture specific. This model is still
supported today to keep museum architectures and notorious stranglers
alive.
In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel,
which was creating yet another architecture specific mechanism and resulted
in an unholy mess on top of the existing horrors of x86 interrupt handling.
The x86 interrupt management code was already an incomprehensible maze of
indirections between the CPU vector management, interrupt remapping and the
actual IO/APIC and PCI/MSI[-X] implementation.
At roughly the same time ARM struggled with the ever growing SoC specific
extensions which were glued on top of the architected GIC interrupt
controller.
This resulted in a fundamental redesign of interrupt management and
provided the today prevailing concept of hierarchical interrupt
domains. This allowed to disentangle the interactions between x86 vector
domain and interrupt remapping and also allowed ARM to handle the zoo of
SoC specific interrupt components in a sane way.
The concept of hierarchical interrupt domains aims to encapsulate the
functionality of particular IP blocks which are involved in interrupt
delivery so that they become extensible and pluggable. The X86
encapsulation looks like this:
|--- device 1
[Vector]---[Remapping]---[PCI/MSI]--|...
|--- device N
where the remapping domain is an optional component and in case that it is
not available the PCI/MSI[-X] domains have the vector domain as their
parent. This reduced the required interaction between the domains pretty
much to the initialization phase where it is obviously required to
establish the proper parent relation ship in the components of the
hierarchy.
While in most cases the model is strictly representing the chain of IP
blocks and abstracting them so they can be plugged together to form a
hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware
it's clear that the actual PCI/MSI[-X] interrupt controller is not a global
entity, but strict a per PCI device entity.
Here we took a short cut on the hierarchical model and went for the easy
solution of providing "global" PCI/MSI domains which was possible because
the PCI/MSI[-X] handling is uniform across the devices. This also allowed
to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in
turn made it simple to keep the existing architecture specific management
alive.
A similar problem was created in the ARM world with support for IP block
specific message storage. Instead of going all the way to stack a IP block
specific domain on top of the generic MSI domain this ended in a construct
which provides a "global" platform MSI domain which allows overriding the
irq_write_msi_msg() callback per allocation.
In course of the lengthy discussions we identified other abuse of the MSI
infrastructure in wireless drivers, NTB etc. where support for
implementation specific message storage was just mindlessly glued into the
existing infrastructure. Some of this just works by chance on particular
platforms but will fail in hard to diagnose ways when the driver is used
on platforms where the underlying MSI interrupt management code does not
expect the creative abuse.
Another shortcoming of today's PCI/MSI-X support is the inability to
allocate or free individual vectors after the initial enablement of
MSI-X. This results in an works by chance implementation of VFIO (PCI
pass-through) where interrupts on the host side are not set up upfront to
avoid resource exhaustion. They are expanded at run-time when the guest
actually tries to use them. The way how this is implemented is that the
host disables MSI-X and then re-enables it with a larger number of
vectors again. That works by chance because most device drivers set up
all interrupts before the device actually will utilize them. But that's
not universally true because some drivers allocate a large enough number
of vectors but do not utilize them until it's actually required,
e.g. for acceleration support. But at that point other interrupts of the
device might be in active use and the MSI-X disable/enable dance can
just result in losing interrupts and therefore hard to diagnose subtle
problems.
Last but not least the "global" PCI/MSI-X domain approach prevents to
utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS
is not longer providing a uniform storage and configuration model.
The solution to this is to implement the missing step and switch from
global PCI/MSI domains to per device PCI/MSI domains. The resulting
hierarchy then looks like this:
|--- [PCI/MSI] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
which in turn allows to provide support for multiple domains per device:
|--- [PCI/MSI] device 1
|--- [PCI/IMS] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
|--- [PCI/IMS] device N
This work converts the MSI and PCI/MSI core and the x86 interrupt
domains to the new model, provides new interfaces for post-enable
allocation/free of MSI-X interrupts and the base framework for PCI/IMS.
PCI/IMS has been verified with the work in progress IDXD driver.
There is work in progress to convert ARM over which will replace the
platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
"solutions" are in the works as well.
- Drivers:
- Updates for the LoongArch interrupt chip drivers
- Support for MTK CIRQv2
- The usual small fixes and updates all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUsygTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYXiD/40tXKzCzf0qFIqUlZLia1N3RRrwrNC
DVTixuLtR9MrjwE+jWLQILa85SHInV8syXHSd35SzhsGDxkURFGi+HBgVWmysODf
br9VSh3Gi+kt7iXtIwAg8WNWviGNmS3kPksxCko54F0YnJhMY5r5bhQVUBQkwFG2
wES1C9Uzd4pdV2bl24Z+WKL85cSmZ+pHunyKw1n401lBABXnTF9c4f13zC14jd+y
wDxNrmOxeL3mEH4Pg6VyrDuTOURSf3TjJjeEq3EYqvUo0FyLt9I/cKX0AELcZQX7
fkRjrQQAvXNj39RJfeSkojDfllEPUHp7XSluhdBu5aIovSamdYGCDnuEoZ+l4MJ+
CojIErp3Dwj/uSaf5c7C3OaDAqH2CpOFWIcrUebShJE60hVKLEpUwd6W8juplaoT
gxyXRb1Y+BeJvO8VhMN4i7f3232+sj8wuj+HTRTTbqMhkElnin94tAx8rgwR1sgR
BiOGMJi4K2Y8s9Rqqp0Dvs01CW4guIYvSR4YY+WDbbi1xgiev89OYs6zZTJCJe4Y
NUwwpqYSyP1brmtdDdBOZLqegjQm+TwUb6oOaasFem4vT1swgawgLcDnPOx45bk5
/FWt3EmnZxMz99x9jdDn1+BCqAZsKyEbEY1avvhPVMTwoVIuSX2ceTBMLseGq+jM
03JfvdxnueM3gw==
=9erA
-----END PGP SIGNATURE-----
Merge tags 'acpi-6.2-rc1' and 'irq-core-2022-12-10' into loongarch-next
LoongArch architecture changes for 6.2 depend on the acpi and irqchip
changes to work, so merge them to create a base.
- Update the ACPICA code in the kernel to the 20221020 upstream
version and fix a couple of issues in it:
* Make acpi_ex_load_op() match upstream implementation (Rafael
Wysocki).
* Add support for loong_arch-specific APICs in MADT (Huacai Chen).
* Add support for fixed PCIe wake event (Huacai Chen).
* Add EBDA pointer sanity checks (Vit Kabele).
* Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele).
* Add CCEL table support to both compiler/disassembler (Kuppuswamy
Sathyanarayanan).
* Add a couple of new UUIDs to the known UUID list (Bob Moore).
* Add support for FFH Opregion special context data (Sudeep Holla).
* Improve warning message for "invalid ACPI name" (Bob Moore).
* Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT
table (Alison Schofield).
* Prepare IORT support for revision E.e (Robin Murphy).
* Finish support for the CDAT table (Bob Moore).
* Fix error code path in acpi_ds_call_control_method() (Rafael
Wysocki).
* Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li
Zetao).
* Update the version of the ACPICA code in the kernel (Bob Moore).
- Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
enumeration code (Giulio Benetti).
- Change the return type of the ACPI driver remove callback to void and
update its users accordingly (Dawei Li).
- Add general support for FFH address space type and implement the low-
level part of it for ARM64 (Sudeep Holla).
- Fix stale comments in the ACPI tables parsing code and make it print
more messages related to MADT (Hanjun Guo, Huacai Chen).
- Replace invocations of generic library functions with more kernel-
specific counterparts in the ACPI sysfs interface (Christophe JAILLET,
Xu Panda).
- Print full name paths of ACPI power resource objects during
enumeration (Kane Chen).
- Eliminate a compiler warning regarding a missing function prototype
in the ACPI power management code (Sudeep Holla).
- Fix and clean up the ACPI processor driver (Rafael Wysocki, Li Zhong,
Colin Ian King, Sudeep Holla).
- Add quirk for the HP Pavilion Gaming 15-cx0041ur to the ACPI EC
driver (Mia Kanashi).
- Add some mew ACPI backlight handling quirks and update some existing
ones (Hans de Goede).
- Make the ACPI backlight driver prefer the native backlight control
over vendor backlight control when possible (Hans de Goede).
- Drop unsetting ACPI APEI driver data on remove (Uwe Kleine-König).
- Use xchg_release() instead of cmpxchg() for updating new GHES cache
slots (Ard Biesheuvel).
- Clean up the ACPI APEI code (Sudeep Holla, Christophe JAILLET, Jay Lu).
- Add new I2C device enumeration quirks for Medion Lifetab S10346 and
Lenovo Yoga Tab 3 Pro (YT3-X90F) (Hans de Goede).
- Make the ACPI battery driver notify user space about adding new
battery hooks and removing the existing ones (Armin Wolf).
- Modify the pfr_update and pfr_telemetry drivers to use ACPI_FREE()
for freeing acpi_object structures to help diagnostics (Wang ShaoBo).
- Make the ACPI fan driver use sysfs_emit_at() in its sysfs interface
code (ye xingchen).
- Fix the _FIF package extraction failure handling in the ACPI fan
driver (Hanjun Guo).
- Fix the PCC mailbox handling error code path (Huisong Li).
- Avoid using PCC Opregions if there is no platform interrupt allocated
for this purpose (Huisong Li).
- Use sysfs_emit() instead of scnprintf() in the ACPI PAD driver and
CPPC library (ye xingchen).
- Fix some kernel-doc issues in the ACPI GSI processing code (Xiongfeng
Wang).
- Fix name memory leak in pnp_alloc_dev() (Yang Yingliang).
- Do not disable PNP devices on suspend when they cannot be re-enabled
on resume (Hans de Goede).
- Clean up the ACPI thermal driver a bit (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmOXV10SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxuOwP/2zew6val2Jf7I/Yxf1iQLlRyGmhFnaH
wpltJvBjlHjAUKnPQ/kLYK9fjuUY5HVgjOE03WpwhFUpmhftYTrSkhoVkJ1Mw9Zl
RNOAEgCG484ThHiTIVp/dMPxrtfuqpdbamhWX3Q51IfXjGW8Vc/lDxIa3k/JQxyq
ko8GFPCoebJrSCfuwaAf2+xSQaf6dq4jpL/rlIk+nYMMB9mQmXhNEhc+l97NaCe8
MyCIGynyNbhGsIlwdHRvTp04EIe8h0Z1+Dyns7g/TrzHj3Aezy7QVZbn8sKdZWa1
W/Ck9QST5tfpDWyr+hUXxUJjEn4Yy+GXjM2xON0EMx5q+JD9XsOpwWOVwTR7CS5s
FwEd6I89SC8OZM86AgMtnGxygjpK24R/kGzHjhG15IQCsypc8Rvzoxl0L0YVoon/
UTkE57GzNWVzu0pY/oXJc2aT7lVqFXMFZ6ft/zHnBRnQmrcIi+xgDO5ni5KxctFN
TVFwbAMCuwVx6IOcVQCZM2g4aJw426KpUn19fKnXvPwR5UIufBaCzSKWMiYrtdXr
O5BM8ElYuyKCWGYEE0GSMjZygyDpyY6ENLH7s7P1IEmFyigBzaaGBbKm108JJq4V
eCWJYTAx8pAptsU/vfuMvEQ1ErfhZ3TTokA5Lv0uPf53VcAnWDb7EAbW6ZGMwFSI
IaV6cv6ILoqO
=GVzp
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and PNP updates from Rafael Wysocki:
"These include new code (for instance, support for the FFH address
space type and support for new firmware data structures in ACPICA),
some new quirks (mostly related to backlight handling and I2C
enumeration), a number of fixes and a fair amount of cleanups all
over.
Specifics:
- Update the ACPICA code in the kernel to the 20221020 upstream
version and fix a couple of issues in it:
- Make acpi_ex_load_op() match upstream implementation (Rafael
Wysocki)
- Add support for loong_arch-specific APICs in MADT (Huacai Chen)
- Add support for fixed PCIe wake event (Huacai Chen)
- Add EBDA pointer sanity checks (Vit Kabele)
- Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele)
- Add CCEL table support to both compiler/disassembler (Kuppuswamy
Sathyanarayanan)
- Add a couple of new UUIDs to the known UUID list (Bob Moore)
- Add support for FFH Opregion special context data (Sudeep
Holla)
- Improve warning message for "invalid ACPI name" (Bob Moore)
- Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT
table (Alison Schofield)
- Prepare IORT support for revision E.e (Robin Murphy)
- Finish support for the CDAT table (Bob Moore)
- Fix error code path in acpi_ds_call_control_method() (Rafael
Wysocki)
- Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li
Zetao)
- Update the version of the ACPICA code in the kernel (Bob Moore)
- Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
enumeration code (Giulio Benetti)
- Change the return type of the ACPI driver remove callback to void
and update its users accordingly (Dawei Li)
- Add general support for FFH address space type and implement the
low- level part of it for ARM64 (Sudeep Holla)
- Fix stale comments in the ACPI tables parsing code and make it
print more messages related to MADT (Hanjun Guo, Huacai Chen)
- Replace invocations of generic library functions with more kernel-
specific counterparts in the ACPI sysfs interface (Christophe
JAILLET, Xu Panda)
- Print full name paths of ACPI power resource objects during
enumeration (Kane Chen)
- Eliminate a compiler warning regarding a missing function prototype
in the ACPI power management code (Sudeep Holla)
- Fix and clean up the ACPI processor driver (Rafael Wysocki, Li
Zhong, Colin Ian King, Sudeep Holla)
- Add quirk for the HP Pavilion Gaming 15-cx0041ur to the ACPI EC
driver (Mia Kanashi)
- Add some mew ACPI backlight handling quirks and update some
existing ones (Hans de Goede)
- Make the ACPI backlight driver prefer the native backlight control
over vendor backlight control when possible (Hans de Goede)
- Drop unsetting ACPI APEI driver data on remove (Uwe Kleine-König)
- Use xchg_release() instead of cmpxchg() for updating new GHES cache
slots (Ard Biesheuvel)
- Clean up the ACPI APEI code (Sudeep Holla, Christophe JAILLET, Jay
Lu)
- Add new I2C device enumeration quirks for Medion Lifetab S10346 and
Lenovo Yoga Tab 3 Pro (YT3-X90F) (Hans de Goede)
- Make the ACPI battery driver notify user space about adding new
battery hooks and removing the existing ones (Armin Wolf)
- Modify the pfr_update and pfr_telemetry drivers to use ACPI_FREE()
for freeing acpi_object structures to help diagnostics (Wang
ShaoBo)
- Make the ACPI fan driver use sysfs_emit_at() in its sysfs interface
code (ye xingchen)
- Fix the _FIF package extraction failure handling in the ACPI fan
driver (Hanjun Guo)
- Fix the PCC mailbox handling error code path (Huisong Li)
- Avoid using PCC Opregions if there is no platform interrupt
allocated for this purpose (Huisong Li)
- Use sysfs_emit() instead of scnprintf() in the ACPI PAD driver and
CPPC library (ye xingchen)
- Fix some kernel-doc issues in the ACPI GSI processing code
(Xiongfeng Wang)
- Fix name memory leak in pnp_alloc_dev() (Yang Yingliang)
- Do not disable PNP devices on suspend when they cannot be
re-enabled on resume (Hans de Goede)
- Clean up the ACPI thermal driver a bit (Rafael Wysocki)"
* tag 'acpi-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (67 commits)
ACPI: x86: Add skip i2c clients quirk for Medion Lifetab S10346
ACPI: APEI: EINJ: Refactor available_error_type_show()
ACPI: APEI: EINJ: Fix formatting errors
ACPI: processor: perflib: Adjust acpi_processor_notify_smm() return value
ACPI: processor: perflib: Rearrange acpi_processor_notify_smm()
ACPI: processor: perflib: Rearrange unregistration routine
ACPI: processor: perflib: Drop redundant parentheses
ACPI: processor: perflib: Adjust white space
ACPI: processor: idle: Drop unnecessary statements and parens
ACPI: thermal: Adjust critical.flags.valid check
ACPI: fan: Convert to use sysfs_emit_at() API
ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage()
ACPI: battery: Call power_supply_changed() when adding hooks
ACPI: use sysfs_emit() instead of scnprintf()
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Tab 3 Pro (YT3-X90F)
ACPI: APEI: Remove a useless include
PNP: Do not disable devices on suspend when they cannot be re-enabled on resume
ACPI: processor: Silence missing prototype warnings
ACPI: processor_idle: Silence missing prototype warnings
ACPI: PM: Silence missing prototype warning
...
- Core:
The bulk is the rework of the MSI subsystem to support per device MSI
interrupt domains. This solves conceptual problems of the current
PCI/MSI design which are in the way of providing support for PCI/MSI[-X]
and the upcoming PCI/IMS mechanism on the same device.
IMS (Interrupt Message Store] is a new specification which allows device
manufactures to provide implementation defined storage for MSI messages
contrary to the uniform and specification defined storage mechanisms for
PCI/MSI and PCI/MSI-X. IMS not only allows to overcome the size limitations
of the MSI-X table, but also gives the device manufacturer the freedom to
store the message in arbitrary places, even in host memory which is shared
with the device.
There have been several attempts to glue this into the current MSI code,
but after lengthy discussions it turned out that there is a fundamental
design problem in the current PCI/MSI-X implementation. This needs some
historical background.
When PCI/MSI[-X] support was added around 2003, interrupt management was
completely different from what we have today in the actively developed
architectures. Interrupt management was completely architecture specific
and while there were attempts to create common infrastructure the
commonalities were rudimentary and just providing shared data structures and
interfaces so that drivers could be written in an architecture agnostic
way.
The initial PCI/MSI[-X] support obviously plugged into this model which
resulted in some basic shared infrastructure in the PCI core code for
setting up MSI descriptors, which are a pure software construct for holding
data relevant for a particular MSI interrupt, but the actual association to
Linux interrupts was completely architecture specific. This model is still
supported today to keep museum architectures and notorious stranglers
alive.
In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel,
which was creating yet another architecture specific mechanism and resulted
in an unholy mess on top of the existing horrors of x86 interrupt handling.
The x86 interrupt management code was already an incomprehensible maze of
indirections between the CPU vector management, interrupt remapping and the
actual IO/APIC and PCI/MSI[-X] implementation.
At roughly the same time ARM struggled with the ever growing SoC specific
extensions which were glued on top of the architected GIC interrupt
controller.
This resulted in a fundamental redesign of interrupt management and
provided the today prevailing concept of hierarchical interrupt
domains. This allowed to disentangle the interactions between x86 vector
domain and interrupt remapping and also allowed ARM to handle the zoo of
SoC specific interrupt components in a sane way.
The concept of hierarchical interrupt domains aims to encapsulate the
functionality of particular IP blocks which are involved in interrupt
delivery so that they become extensible and pluggable. The X86
encapsulation looks like this:
|--- device 1
[Vector]---[Remapping]---[PCI/MSI]--|...
|--- device N
where the remapping domain is an optional component and in case that it is
not available the PCI/MSI[-X] domains have the vector domain as their
parent. This reduced the required interaction between the domains pretty
much to the initialization phase where it is obviously required to
establish the proper parent relation ship in the components of the
hierarchy.
While in most cases the model is strictly representing the chain of IP
blocks and abstracting them so they can be plugged together to form a
hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware
it's clear that the actual PCI/MSI[-X] interrupt controller is not a global
entity, but strict a per PCI device entity.
Here we took a short cut on the hierarchical model and went for the easy
solution of providing "global" PCI/MSI domains which was possible because
the PCI/MSI[-X] handling is uniform across the devices. This also allowed
to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in
turn made it simple to keep the existing architecture specific management
alive.
A similar problem was created in the ARM world with support for IP block
specific message storage. Instead of going all the way to stack a IP block
specific domain on top of the generic MSI domain this ended in a construct
which provides a "global" platform MSI domain which allows overriding the
irq_write_msi_msg() callback per allocation.
In course of the lengthy discussions we identified other abuse of the MSI
infrastructure in wireless drivers, NTB etc. where support for
implementation specific message storage was just mindlessly glued into the
existing infrastructure. Some of this just works by chance on particular
platforms but will fail in hard to diagnose ways when the driver is used
on platforms where the underlying MSI interrupt management code does not
expect the creative abuse.
Another shortcoming of today's PCI/MSI-X support is the inability to
allocate or free individual vectors after the initial enablement of
MSI-X. This results in an works by chance implementation of VFIO (PCI
pass-through) where interrupts on the host side are not set up upfront to
avoid resource exhaustion. They are expanded at run-time when the guest
actually tries to use them. The way how this is implemented is that the
host disables MSI-X and then re-enables it with a larger number of
vectors again. That works by chance because most device drivers set up
all interrupts before the device actually will utilize them. But that's
not universally true because some drivers allocate a large enough number
of vectors but do not utilize them until it's actually required,
e.g. for acceleration support. But at that point other interrupts of the
device might be in active use and the MSI-X disable/enable dance can
just result in losing interrupts and therefore hard to diagnose subtle
problems.
Last but not least the "global" PCI/MSI-X domain approach prevents to
utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS
is not longer providing a uniform storage and configuration model.
The solution to this is to implement the missing step and switch from
global PCI/MSI domains to per device PCI/MSI domains. The resulting
hierarchy then looks like this:
|--- [PCI/MSI] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
which in turn allows to provide support for multiple domains per device:
|--- [PCI/MSI] device 1
|--- [PCI/IMS] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
|--- [PCI/IMS] device N
This work converts the MSI and PCI/MSI core and the x86 interrupt
domains to the new model, provides new interfaces for post-enable
allocation/free of MSI-X interrupts and the base framework for PCI/IMS.
PCI/IMS has been verified with the work in progress IDXD driver.
There is work in progress to convert ARM over which will replace the
platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
"solutions" are in the works as well.
- Drivers:
- Updates for the LoongArch interrupt chip drivers
- Support for MTK CIRQv2
- The usual small fixes and updates all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUsygTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYXiD/40tXKzCzf0qFIqUlZLia1N3RRrwrNC
DVTixuLtR9MrjwE+jWLQILa85SHInV8syXHSd35SzhsGDxkURFGi+HBgVWmysODf
br9VSh3Gi+kt7iXtIwAg8WNWviGNmS3kPksxCko54F0YnJhMY5r5bhQVUBQkwFG2
wES1C9Uzd4pdV2bl24Z+WKL85cSmZ+pHunyKw1n401lBABXnTF9c4f13zC14jd+y
wDxNrmOxeL3mEH4Pg6VyrDuTOURSf3TjJjeEq3EYqvUo0FyLt9I/cKX0AELcZQX7
fkRjrQQAvXNj39RJfeSkojDfllEPUHp7XSluhdBu5aIovSamdYGCDnuEoZ+l4MJ+
CojIErp3Dwj/uSaf5c7C3OaDAqH2CpOFWIcrUebShJE60hVKLEpUwd6W8juplaoT
gxyXRb1Y+BeJvO8VhMN4i7f3232+sj8wuj+HTRTTbqMhkElnin94tAx8rgwR1sgR
BiOGMJi4K2Y8s9Rqqp0Dvs01CW4guIYvSR4YY+WDbbi1xgiev89OYs6zZTJCJe4Y
NUwwpqYSyP1brmtdDdBOZLqegjQm+TwUb6oOaasFem4vT1swgawgLcDnPOx45bk5
/FWt3EmnZxMz99x9jdDn1+BCqAZsKyEbEY1avvhPVMTwoVIuSX2ceTBMLseGq+jM
03JfvdxnueM3gw==
=9erA
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt core and driver subsystem:
The bulk is the rework of the MSI subsystem to support per device MSI
interrupt domains. This solves conceptual problems of the current
PCI/MSI design which are in the way of providing support for
PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device.
IMS (Interrupt Message Store] is a new specification which allows
device manufactures to provide implementation defined storage for MSI
messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified
message store which is uniform accross all devices). The PCI/MSI[-X]
uniformity allowed us to get away with "global" PCI/MSI domains.
IMS not only allows to overcome the size limitations of the MSI-X
table, but also gives the device manufacturer the freedom to store the
message in arbitrary places, even in host memory which is shared with
the device.
There have been several attempts to glue this into the current MSI
code, but after lengthy discussions it turned out that there is a
fundamental design problem in the current PCI/MSI-X implementation.
This needs some historical background.
When PCI/MSI[-X] support was added around 2003, interrupt management
was completely different from what we have today in the actively
developed architectures. Interrupt management was completely
architecture specific and while there were attempts to create common
infrastructure the commonalities were rudimentary and just providing
shared data structures and interfaces so that drivers could be written
in an architecture agnostic way.
The initial PCI/MSI[-X] support obviously plugged into this model
which resulted in some basic shared infrastructure in the PCI core
code for setting up MSI descriptors, which are a pure software
construct for holding data relevant for a particular MSI interrupt,
but the actual association to Linux interrupts was completely
architecture specific. This model is still supported today to keep
museum architectures and notorious stragglers alive.
In 2013 Intel tried to add support for hot-pluggable IO/APICs to the
kernel, which was creating yet another architecture specific mechanism
and resulted in an unholy mess on top of the existing horrors of x86
interrupt handling. The x86 interrupt management code was already an
incomprehensible maze of indirections between the CPU vector
management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X]
implementation.
At roughly the same time ARM struggled with the ever growing SoC
specific extensions which were glued on top of the architected GIC
interrupt controller.
This resulted in a fundamental redesign of interrupt management and
provided the today prevailing concept of hierarchical interrupt
domains. This allowed to disentangle the interactions between x86
vector domain and interrupt remapping and also allowed ARM to handle
the zoo of SoC specific interrupt components in a sane way.
The concept of hierarchical interrupt domains aims to encapsulate the
functionality of particular IP blocks which are involved in interrupt
delivery so that they become extensible and pluggable. The X86
encapsulation looks like this:
|--- device 1
[Vector]---[Remapping]---[PCI/MSI]--|...
|--- device N
where the remapping domain is an optional component and in case that
it is not available the PCI/MSI[-X] domains have the vector domain as
their parent. This reduced the required interaction between the
domains pretty much to the initialization phase where it is obviously
required to establish the proper parent relation ship in the
components of the hierarchy.
While in most cases the model is strictly representing the chain of IP
blocks and abstracting them so they can be plugged together to form a
hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the
hardware it's clear that the actual PCI/MSI[-X] interrupt controller
is not a global entity, but strict a per PCI device entity.
Here we took a short cut on the hierarchical model and went for the
easy solution of providing "global" PCI/MSI domains which was possible
because the PCI/MSI[-X] handling is uniform across the devices. This
also allowed to keep the existing PCI/MSI[-X] infrastructure mostly
unchanged which in turn made it simple to keep the existing
architecture specific management alive.
A similar problem was created in the ARM world with support for IP
block specific message storage. Instead of going all the way to stack
a IP block specific domain on top of the generic MSI domain this ended
in a construct which provides a "global" platform MSI domain which
allows overriding the irq_write_msi_msg() callback per allocation.
In course of the lengthy discussions we identified other abuse of the
MSI infrastructure in wireless drivers, NTB etc. where support for
implementation specific message storage was just mindlessly glued into
the existing infrastructure. Some of this just works by chance on
particular platforms but will fail in hard to diagnose ways when the
driver is used on platforms where the underlying MSI interrupt
management code does not expect the creative abuse.
Another shortcoming of today's PCI/MSI-X support is the inability to
allocate or free individual vectors after the initial enablement of
MSI-X. This results in an works by chance implementation of VFIO (PCI
pass-through) where interrupts on the host side are not set up upfront
to avoid resource exhaustion. They are expanded at run-time when the
guest actually tries to use them. The way how this is implemented is
that the host disables MSI-X and then re-enables it with a larger
number of vectors again. That works by chance because most device
drivers set up all interrupts before the device actually will utilize
them. But that's not universally true because some drivers allocate a
large enough number of vectors but do not utilize them until it's
actually required, e.g. for acceleration support. But at that point
other interrupts of the device might be in active use and the MSI-X
disable/enable dance can just result in losing interrupts and
therefore hard to diagnose subtle problems.
Last but not least the "global" PCI/MSI-X domain approach prevents to
utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact
that IMS is not longer providing a uniform storage and configuration
model.
The solution to this is to implement the missing step and switch from
global PCI/MSI domains to per device PCI/MSI domains. The resulting
hierarchy then looks like this:
|--- [PCI/MSI] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
which in turn allows to provide support for multiple domains per
device:
|--- [PCI/MSI] device 1
|--- [PCI/IMS] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
|--- [PCI/IMS] device N
This work converts the MSI and PCI/MSI core and the x86 interrupt
domains to the new model, provides new interfaces for post-enable
allocation/free of MSI-X interrupts and the base framework for
PCI/IMS. PCI/IMS has been verified with the work in progress IDXD
driver.
There is work in progress to convert ARM over which will replace the
platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
"solutions" are in the works as well.
Drivers:
- Updates for the LoongArch interrupt chip drivers
- Support for MTK CIRQv2
- The usual small fixes and updates all over the place"
* tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits)
irqchip/ti-sci-inta: Fix kernel doc
irqchip/gic-v2m: Mark a few functions __init
irqchip/gic-v2m: Include arm-gic-common.h
irqchip/irq-mvebu-icu: Fix works by chance pointer assignment
iommu/amd: Enable PCI/IMS
iommu/vt-d: Enable PCI/IMS
x86/apic/msi: Enable PCI/IMS
PCI/MSI: Provide pci_ims_alloc/free_irq()
PCI/MSI: Provide IMS (Interrupt Message Store) support
genirq/msi: Provide constants for PCI/IMS support
x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X
PCI/MSI: Provide prepare_desc() MSI domain op
PCI/MSI: Split MSI-X descriptor setup
genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
genirq/msi: Provide msi_domain_alloc_irq_at()
genirq/msi: Provide msi_domain_ops:: Prepare_desc()
genirq/msi: Provide msi_desc:: Msi_data
genirq/msi: Provide struct msi_map
x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
...
Merge ACPICA changes, including bug fixes and cleanups as well as support
for some recently defined data structures, for 6.2-rc1:
- Make acpi_ex_load_op() match upstream implementation (Rafael Wysocki).
- Add support for loong_arch-specific APICs in MADT (Huacai Chen).
- Add support for fixed PCIe wake event (Huacai Chen).
- Add EBDA pointer sanity checks (Vit Kabele).
- Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele).
- Add CCEL table support to both compiler/disassembler (Kuppuswamy
Sathyanarayanan).
- Add a couple of new UUIDs to the known UUID list (Bob Moore).
- Add support for FFH Opregion special context data (Sudeep Holla).
- Improve warning message for "invalid ACPI name" (Bob Moore).
- Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT table
(Alison Schofield).
- Prepare IORT support for revision E.e (Robin Murphy).
- Finish support for the CDAT table (Bob Moore).
- Fix error code path in acpi_ds_call_control_method() (Rafael Wysocki).
- Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li Zetao).
- Update the version of the ACPICA code in the kernel (Bob Moore).
* acpica:
ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage()
ACPICA: Fix error code path in acpi_ds_call_control_method()
ACPICA: Update version to 20221020
ACPICA: Add utcksum.o to the acpidump Makefile
Revert "LoongArch: Provisionally add ACPICA data structures"
ACPICA: Finish support for the CDAT table
ACPICA: IORT: Update for revision E.e
ACPICA: Add CXL 3.0 structures (CXIMS & RDPAS) to the CEDT table
ACPICA: Improve warning message for "invalid ACPI name"
ACPICA: Add support for FFH Opregion special context data
ACPICA: Add a couple of new UUIDs to the known UUID list
ACPICA: iASL: Add CCEL table to both compiler/disassembler
ACPICA: Do not touch VGA memory when EBDA < 1ki_b
ACPICA: Check that EBDA pointer is in valid memory
ACPICA: Events: Support fixed PCIe wake event
ACPICA: MADT: Add loong_arch-specific APICs support
ACPICA: Make acpi_ex_load_op() match upstream
Add sparse memory vmemmap support for LoongArch. SPARSEMEM_VMEMMAP uses a
virtually mapped memmap to optimise pfn_to_page and page_to_pfn
operations. This is the most efficient option when sufficient kernel
resources are available.
Link: https://lkml.kernel.org/r/20221027125253.3458989-3-chenhuacai@loongson.cn
Signed-off-by: Min Zhou <zhoumin@loongson.cn>
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/sparse-vmemmap: Generalise helpers and enable for
LoongArch", v14.
This series is in order to enable sparse-vmemmap for LoongArch. But
LoongArch cannot use generic helpers directly because MIPS&LoongArch need
to call pgd_init()/pud_init()/pmd_init() when populating page tables. So
we adjust the prototypes of p?d_init() to make generic helpers can call
them, then enable sparse-vmemmap with generic helpers, and to be further,
generalise vmemmap_populate_hugepages() for ARM64, X86 and LoongArch.
This patch (of 4):
We are preparing to add sparse vmemmap support to LoongArch. MIPS and
LoongArch need to call pgd_init()/pud_init()/pmd_init() when populating
page tables, so adjust their prototypes to make generic helpers can call
them.
NIOS2 declares pmd_init() but doesn't use, just remove it to avoid build
errors.
Link: https://lkml.kernel.org/r/20221027125253.3458989-1-chenhuacai@loongson.cn
Link: https://lkml.kernel.org/r/20221027125253.3458989-2-chenhuacai@loongson.cn
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Min Zhou <zhoumin@loongson.cn>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Function smp_send_reschedule() is standard kernel API, which is defined
in header file include/linux/smp.h. However, on LoongArch it is defined
as an inline function, this is confusing and kernel modules can not use
this function.
Now we define smp_send_reschedule() as a general function, and add a
EXPORT_SYMBOL_GPL on this function, so that kernel modules can use it.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
- More APCI fixes and improvements for the LoongArch architecture,
adding support for the HTVEC irqchip, suspend-resume, and some
PCI INTx workarounds
- Initial DT support for LoongArch. I'm not even kidding.
- Support for the MTK CIRQv2, a minor deviation from the original version
- Error handling fixes for wpcm450, GIC...
- BE detection for a FSL controller
- Declare the Sifive PLIC as wake-up agnostic
- Simplify fishing out the device data for the ST irqchip
- Mark some data structures as __initconst in the apple-aic driver
- Switch over from strtobool to kstrtobool
- COMPILE_TEST fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOQsZ8PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDQMwQAJWBLgkSHnPSJizG+KLpDSljRmkUif7j3PBk
TVCwfj/xXiCUzP7PBhq2DUZWLvrKIiYl79Z6j/Jz4xlavPAVp6Cs+afxmJF2n/wy
OwWF8uevHvFhKIFJUFxrSDaODWGI6afmDT7iflE3/Uz2ahVYoLkz5a10ji78N3JU
peeX81i/kZE19w1aCs0sdMgzJufjK98hj55917GGv/IRBAjj2qi2tOZnPdQuE6d4
+vDiRtKDjDZYp9toaaH7DGBlBMG/yQzBhPaXtMnxH9wGK5858sYmHZroeM4FFbPR
sJ/rCiaBEuvC6F1JkShKsRqT2DW1BgXYjSPxh+qhzwCOSG32DDjkj6LqVkygSoKQ
MogCIkC3C4D0BIiokSdJByN1PWjoynwtYJrUkYPk4fMg7JxHYwee9dCptT81irMM
+3L1mqx9CAll7YkzPVwrCbFyJF9K4ax9kjEvSH3E8xZxLKGnS9iqjQBfE9NKILCl
tT6dmCJXTXsoGFjK4zz4o2N8i8aOeUz/B3GX5eYD+x0s0riXhg4kYIWFVXsxwJlX
VSxBphr5fAQrdgBdOOESmiqNxPv9HuLRyR2mZ4blE4ylDa9NRUd+4UTy8x0bRmxD
BO7qPof0CnrplBNEQi7kDtc91q4UxQoOV/G8LoFp454d9imMlAuPMbizTAULpWZu
JxPXyrJl
=fhi8
-----END PGP SIGNATURE-----
Merge tag 'irqchip-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates frim Marc Zyngier:
- More APCI fixes and improvements for the LoongArch architecture,
adding support for the HTVEC irqchip, suspend-resume, and some
PCI INTx workarounds
- Initial DT support for LoongArch. I'm not even kidding.
- Support for the MTK CIRQv2, a minor deviation from the original version
- Error handling fixes for wpcm450, GIC...
- BE detection for a FSL controller
- Declare the Sifive PLIC as wake-up agnostic
- Simplify fishing out the device data for the ST irqchip
- Mark some data structures as __initconst in the apple-aic driver
- Switch over from strtobool to kstrtobool
- COMPILE_TEST fixes
address post-6.0 issues, which is hopefully a sign that things are
converging.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY4pQpQAKCRDdBJ7gKXxA
jquxAP9Lqif7CGDgdq8uWY2hHS/Ujc3k7Ohgyzs37olnCuU8KwEA6/J7SpjsBgtY
OfzvnwxpCTh8Kfzu/oNckIHo/EEiIA8=
=o6qT
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"15 hotfixes, 11 marked cc:stable.
Only three or four of the latter address post-6.0 issues, which is
hopefully a sign that things are converging"
* tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
mm/khugepaged: fix GUP-fast interaction by sending IPI
mm/khugepaged: take the right locks for page table retraction
mm: migrate: fix THP's mapcount on isolation
mm: introduce arch_has_hw_nonleaf_pmd_young()
mm: add dummy pmd_young() for architectures not having it
mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
madvise: use zap_page_range_single for madvise dontneed
mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
In order to avoid #ifdeffery add a dummy pmd_young() implementation as a
fallback. This is required for the later patch "mm: introduce
arch_has_hw_nonleaf_pmd_young()".
Link: https://lkml.kernel.org/r/fd3ac3cd-7349-6bbd-890a-71a9454ca0b3@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
HTVECINTC stands for "HyperTransport Interrupts" that described in
Section 14.3 of "Loongson 3A5000 Processor Reference Manual". For more
information please refer Documentation/loongarch/irq-chip-model.rst.
Though the extended model is the recommended one, there are still some
legacy model machines. So we add ACPI init support for HTVECINTC.
Co-developed-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020142535.1725573-1-chenhuacai@loongson.cn
Set _PAGE_DIRTY only if _PAGE_MODIFIED is set in {pmd,pte}_mkwrite().
Otherwise, _PAGE_DIRTY silences the TLB modify exception and make us
have no chance to mark a pmd/pte dirty (_PAGE_MODIFIED) for software.
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Now {pmd,pte}_mkdirty() set _PAGE_DIRTY bit unconditionally, this causes
random segmentation fault after commit 0ccf7f168e ("mm/thp: carry
over dirty bit when thp splits on pmd").
The reason is: when fork(), parent process use pmd_wrprotect() to clear
huge page's _PAGE_WRITE and _PAGE_DIRTY (for COW); then pte_mkdirty() set
_PAGE_DIRTY as well as _PAGE_MODIFIED while splitting dirty huge pages;
once _PAGE_DIRTY is set, there will be no tlb modify exception so the COW
machanism fails; and at last memory corruption occurred between parent
and child processes.
So, we should set _PAGE_DIRTY only when _PAGE_WRITE is set in {pmd,pte}_
mkdirty().
Cc: stable@vger.kernel.org
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
SMP operations can be shared by Loongson-2 series and Loongson-3 series,
so we change the prefix from loongson3 to loongson for all functions and
data structures.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Currently, the EFI entry code for LoongArch is set up to copy the
executable image to the preferred offset, but instead of branching
directly into that image, it branches to the local copy of kernel_entry,
and relies on the logic in that function to switch to the link time
address instead.
This is a bit sloppy, and not something we can support once we merge the
EFI decompressor with the EFI stub. So let's clean this up a bit, by
adding a helper that computes the offset of kernel_entry from the start
of the image, and simply adding the result to VMLINUX_LOAD_ADDRESS.
And considering that we cannot execute from anywhere else anyway, let's
avoid efi_relocate_kernel() and just allocate the pages instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Factor out the expressions that describe the preferred placement of the
loaded image as well as the minimum alignment so we can reuse them in
the decompressor.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Currently, arm64, RISC-V and LoongArch rely on the fact that struct
screen_info can be accessed directly, due to the fact that the EFI stub
and the core kernel are part of the same image. This will change after a
future patch, so let's ensure that the screen_info handling is able to
deal with this, by adopting the arm32 approach of passing it as a
configuration table. While at it, switch to ACPI reclaim memory to hold
the screen_info data, which is more appropriate for this kind of
allocation.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(), and it calls
copy_from_kernel_nofault() which could check whether the address is a
valid kernel address. So as there is no need for kern_addr_valid(), let's
remove it.
Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This reverts commit af6a1cfa68 ("LoongArch: Provisionally add
ACPICA data structures") to fix build error for linux-next on LoongArch,
since acpica is merged to linux-pm.git now.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Eliminate the following coccicheck warning:
./arch/loongarch/include/asm/ptrace.h:32:15-21: WARNING use flexible-array member instead
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The current LoongArch kernel stack is padded as if obeying the MIPS o32
calling convention (32 bytes), signifying the port's MIPS lineage but no
longer making sense. Remove the padding for clarity.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
refcounting errors in ZONE_DEVICE pages.
- Peter Xu fixes some userfaultfd test harness instability.
- Various other patches in MM, mainly fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0j6igAKCRDdBJ7gKXxA
jnGxAP99bV39ZtOsoY4OHdZlWU16BUjKuf/cb3bZlC2G849vEwD+OKlij86SG20j
MGJQ6TfULJ8f1dnQDd6wvDfl3FMl7Qc=
=tbdp
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- fix a race which causes page refcounting errors in ZONE_DEVICE pages
(Alistair Popple)
- fix userfaultfd test harness instability (Peter Xu)
- various other patches in MM, mainly fixes
* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
highmem: fix kmap_to_page() for kmap_local_page() addresses
mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
mm/selftest: uffd: explain the write missing fault check
mm/hugetlb: use hugetlb_pte_stable in migration race check
mm/hugetlb: fix race condition of uffd missing/minor handling
zram: always expose rw_page
LoongArch: update local TLB if PTE entry exists
mm: use update_mmu_tlb() on the second thread
kasan: fix array-bounds warnings in tests
hmm-tests: add test for migrate_device_range()
nouveau/dmem: evict device private memory during release
nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
mm/migrate_device.c: add migrate_device_range()
mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
mm/memremap.c: take a pgmap reference on page allocation
mm: free device private pages have zero refcount
mm/memory.c: fix race when faulting a device private page
mm/damon: use damon_sz_region() in appropriate place
mm/damon: move sz_damon_region to damon_sz_region
lib/test_meminit: add checks for the allocation functions
...
Currently, the implementation of update_mmu_tlb() is empty if
__HAVE_ARCH_UPDATE_MMU_TLB is not defined. Then if two threads
concurrently fault at the same page, the second thread that did not win
the race will give up and do nothing. In the LoongArch architecture, this
second thread will trigger another fault, and only updates its local TLB.
Instead of triggering another fault, it's better to implement
update_mmu_tlb() to directly update the local TLB of the second thread.
Just do it.
Link: https://lkml.kernel.org/r/20220929112318.32393-3-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Valentin Schneider makes crash-kexec work properly when invoked from
an NMI-time panic.
- ntfs bugfixes from Hawkins Jiawei
- Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
percpu counters.
- nilfs2 cleanups from Minghao Chi
- lots of other single patches all over the tree!
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
=lUQY
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
BPF programs are normally handled by a BPF interpreter, add BPF JIT
support for LoongArch to allow the kernel to generate native code when
a program is loaded into the kernel. This will significantly speed-up
processing of BPF programs.
Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>