lru_gen_migrate_mm() assumes lru_gen_add_mm() runs prior to itself. This
isn't true for the following scenario:
CPU 1 CPU 2
clone()
cgroup_can_fork()
cgroup_procs_write()
cgroup_post_fork()
task_lock()
lru_gen_migrate_mm()
task_unlock()
task_lock()
lru_gen_add_mm()
task_unlock()
And when the above happens, kernel crashes because of linked list
corruption (mm_struct->lru_gen.list).
Link: https://lore.kernel.org/r/20230115134651.30028-1-msizanoen@qtmlabs.xyz/
Link: https://lkml.kernel.org/r/20230116034405.2960276-1-yuzhao@google.com
Fixes: bd74fdaea1 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: msizanoen <msizanoen@qtmlabs.xyz>
Tested-by: msizanoen <msizanoen@qtmlabs.xyz>
Cc: <stable@vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This reverts commit 12a5d39552.
Although it is recognized that a finer grained pro-active reclaim is
something we need and want the semantic of this implementation is really
ambiguous.
In a follow up discussion it became clear that there are two essential
usecases here. One is to use memory.reclaim to pro-actively reclaim
memory and expectation is that the requested and reported amount of memory
is uncharged from the memcg. Another usecase focuses on pro-active
demotion when the memory is merely shuffled around to demotion targets
while the overall charged memory stays unchanged.
The current implementation considers demoted pages as reclaimed and that
break both usecases. [1] has tried to address the reporting part but
there are more issues with that summarized in [2] and follow up emails.
Let's revert the nodemask based extension of the memcg pro-active
reclaim for now until we settle with a more robust semantic.
[1] http://lkml.kernel.org/r/http://lkml.kernel.org/r/20221206023406.3182800-1-almasrymina@google.com
[2] http://lkml.kernel.org/r/Y5bsmpCyeryu3Zz1@dhcp22.suse.cz
Link: https://lkml.kernel.org/r/Y5xASNe1x8cusiTx@dhcp22.suse.cz
Fixes: 12a5d39552 ("mm: add nodes= arg to memory.reclaim")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: zefan li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, there is a race between zs_free() and zs_reclaim_page():
zs_reclaim_page() finds a handle to an allocated object, but before the
eviction happens, an independent zs_free() call to the same handle could
come in and overwrite the object value stored at the handle with the last
deferred handle. When zs_reclaim_page() finally gets to call the eviction
handler, it will see an invalid object value (i.e the previous deferred
handle instead of the original object value).
This race happens quite infrequently. We only managed to produce it with
out-of-tree developmental code that triggers zsmalloc writeback with a
much higher frequency than usual.
This patch fixes this race by storing the deferred handle in the object
header instead. We differentiate the deferred handle from the other two
cases (handle for allocated object, and linkage for free object) with a
new tag. If zspage reclamation succeeds, we will free these deferred
handles by walking through the zspage objects. On the other hand, if
zspage reclamation fails, we reconstruct the zspage freelist (with the
deferred handle tag and allocated tag) before trying again with the
reclamation.
[arnd@arndb.de: avoid unused-function warning]
Link: https://lkml.kernel.org/r/20230117170507.2651972-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20230110231701.326724-1-nphamcs@gmail.com
Fixes: 9997bc0175 ("zsmalloc: implement writeback mechanism for zsmalloc")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.
Page table traversal is allowed under any one of the mmap lock, the
anon_vma lock (if the VMA is associated with an anon_vma), and the
mapping lock (if the VMA is associated with a mapping); and so to be
able to remove page tables, we must hold all three of them.
retract_page_tables() bails out if an ->anon_vma is attached, but does
this check before holding the mmap lock (as the comment above the check
explains).
If we racily merged an existing ->anon_vma (shared with a child
process) from a neighboring VMA, subsequent rmap traversals on pages
belonging to the child will be able to see the page tables that we are
concurrently removing while assuming that nothing else can access them.
Repeat the ->anon_vma check once we hold the mmap lock to ensure that
there really is no concurrent page table access.
Hitting this bug causes a lockdep warning in collapse_and_free_pmd(),
in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)".
It can also lead to use-after-free access.
Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/
Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh@google.com>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@intel.linux.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mas_empty_area_rev() was not correctly validating the start of a gap
against the lower limit. This could lead to the range starting lower than
the requested minimum.
Fix the issue by better validating a gap once one is found.
This commit also adds tests to the maple tree test suite for this issue
and tests the mas_empty_area() function for similar bound checking.
Link: https://lkml.kernel.org/r/20230111200136.1851322-1-Liam.Howlett@oracle.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216911
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: <amanieu@gmail.com>
Link: https://lore.kernel.org/linux-mm/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/
Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When use tools/testing/selftests/kselftest_install.sh to make the
kselftest-list.txt under tools/testing/selftests/kselftest_install.
Then use tools/testing/selftests/kselftest_install/run_kselftest.sh to run
all the kselftests in kselftest-list.txt, it will be blocked by case
"filesystems/fat: run_fat_tests.sh" with "Warning: file run_fat_tests.sh
is not executable", so grant executable permission to run_fat_tests.sh to
fix this issue.
Link: https://lkml.kernel.org/r/dfdbba6df8a1ab34bb1e81cd8bd7ca3f9ed5c369.1673424747.git.pengfei.xu@intel.com
Fixes: dd7c9be330 ("selftests/filesystems: add a vfat RENAME_EXCHANGE test")
Signed-off-by: Pengfei Xu <pengfei.xu@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
__USE_GNU should be an internal macro only used inside glibc. Either
memfd_create() or fallocate() requires _GNU_SOURCE per man page, where
__USE_GNU will further be defined by glibc headers include/features.h:
#ifdef _GNU_SOURCE
# define __USE_GNU 1
#endif
This fixes:
>> hugetlb-madvise.c:20: warning: "__USE_GNU" redefined
20 | #define __USE_GNU
|
In file included from /usr/include/x86_64-linux-gnu/bits/libc-header-start.h:33,
from /usr/include/stdlib.h:26,
from hugetlb-madvise.c:16:
/usr/include/features.h:407: note: this is the location of the previous definition
407 | # define __USE_GNU 1
|
Link: https://lkml.kernel.org/r/Y8V9z+z6Tk7NetI3@x1n
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch should harden commit 15520a3f04 ("mm: use pte markers for
swap errors") on using pte markers for swapin errors on a few corner
cases.
1. Propagate swapin errors across fork()s: if there're swapin errors in
the parent mm, after fork()s the child should sigbus too when an error
page is accessed.
2. Fix a rare condition race in pte_marker_clear() where a uffd-wp pte
marker can be quickly switched to a swapin error.
3. Explicitly ignore swapin error pte markers in change_protection().
I mostly don't worry on (2) or (3) at all, but we should still have them.
Case (1) is special because it can potentially cause silent data corrupt
on child when parent has swapin error triggered with swapoff, but since
swapin error is rare itself already it's probably not easy to trigger
either.
Currently there is a priority difference between the uffd-wp bit and the
swapin error entry, in which the swapin error always has higher priority
(e.g. we don't need to wr-protect a swapin error pte marker).
If there will be a 3rd bit introduced, we'll probably need to consider a
more involved approach so we may need to start operate on the bits. Let's
leave that for later.
This patch is tested with case (1) explicitly where we'll get corrupted
data before in the child if there's existing swapin error pte markers, and
after patch applied the child can be rightfully killed.
We don't need to copy stable for this one since 15520a3f04 just landed
as part of v6.2-rc1, only "Fixes" applied.
Link: https://lkml.kernel.org/r/20221214200453.1772655-3-peterx@redhat.com
Fixes: 15520a3f04 ("mm: use pte markers for swap errors")
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm: Fixes on pte markers".
Patch 1 resolves the syzkiller report from Pengfei.
Patch 2 further harden pte markers when used with the recent swapin error
markers. The major case is we should persist a swapin error marker after
fork(), so child shouldn't read a corrupted page.
This patch (of 2):
When fork(), dst_vma is not guaranteed to have VM_UFFD_WP even if src may
have it and has pte marker installed. The warning is improper along with
the comment. The right thing is to inherit the pte marker when needed, or
keep the dst pte empty.
A vague guess is this happened by an accident when there's the prior patch
to introduce src/dst vma into this helper during the uffd-wp feature got
developed and I probably messed up in the rebase, since if we replace
dst_vma with src_vma the warning & comment it all makes sense too.
Hugetlb did exactly the right here (copy_hugetlb_page_range()). Fix the
general path.
Reproducer:
https://github.com/xupengfe/syzkaller_logs/blob/main/221208_115556_copy_page_range/repro.c
Bugzilla report: https://bugzilla.kernel.org/show_bug.cgi?id=216808
Link: https://lkml.kernel.org/r/20221214200453.1772655-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20221214200453.1772655-2-peterx@redhat.com
Fixes: c56d1b62cc ("mm/shmem: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org> # 5.19+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Fixes for two resctrl races when moving a task or creating a new monitoring
group
- Fix SEV-SNP guests running under HyperV where MTRRs are disabled to not return
a UC- type mapping type on memremap() and thus cause a serious slowdown
- Fix insn mnemonics in bioscall.S now that binutils is starting to fix
confusing insn suffixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPD5xsACgkQEsHwGGHe
VUr65g/8CkfKQKIQ/kPn1B+M/PI4S8DBmz7CdufQTbB66GSfDwRpGnxIKJKZM1UG
pyOmP1kHVXGGCsvFQimalxnBtFx6t3wFS+R+c/l5pRtgU63bwQpNjJ+vBcBpb4xs
J7f06VgF6jB8qk0NqXBKNJt4kauZA50wiErYm8PU/yf0tmHratjrvDfKQmus9pI4
AMcQEudRhDDo8xqn8fvIC/S7xm9/TgNzKxYP+HciPa74HZm5vrjS0hIyZkIsh5d7
Q4MsP4VaBH12W6MbpF9TBw5VTDomwY8oS4xTNfkhyOTcHi8uLrMjA66VmaJwWXpe
EZjOk4+KhtxYigaI5oQO9M5e9IINK6ZfbzU65P74wTvP3orszsBPG7QvodIzLc76
YI1GEea/bXgxYgPuMR6spZoGMS58K7BXgViVMVRwZ9bZk17+5pfbLkzKqZsfw/s/
nj3n8z4ayoc+ffDPllh0471Dn16ugKMf+EvW2Su1Q1QoaA7icNNkZrKaM6eSuoFr
bClsrHeglQFadwy4kmY0fi7BUfiKTp+c5Ur7lM7VojrjH0XcoZreThVg0tNrk3ZR
IAyBjtCdtg1rzrRfb7qBf6B+WNGdxYWGuDgOPiH1Xh+/plvyCqQ2/3AGWQAbu/qP
to+IYmZg09mRpVBDYpCY4z6IzRbMJ9rRho4rNXCQUeAzSVMCrZE=
=W2zR
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure the poking PGD is pinned for Xen PV as it requires it this
way
- Fixes for two resctrl races when moving a task or creating a new
monitoring group
- Fix SEV-SNP guests running under HyperV where MTRRs are disabled to
not return a UC- type mapping type on memremap() and thus cause a
serious slowdown
- Fix insn mnemonics in bioscall.S now that binutils is starting to fix
confusing insn suffixes
* tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: fix poking_init() for Xen PV guests
x86/resctrl: Fix event counts regression in reused RMIDs
x86/resctrl: Fix task CLOSID/RMID update race
x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
- Fix a memory leak in highbank's probing function
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPD41kACgkQEsHwGGHe
VUqYOQ//bsKMgAbPDbvgSm8OkFvtLd+r38RE+dcg84QrBaeQ7z2x269i1bUhqkQV
baatecdz+VoNQK3zmrIRshwgCGuwDEpRWAuPKJxoCfSU1eNzEipxHCoctNcp2q+Y
7nP65JtkQ6y1nEnBYh5h1snwbM3yuLIvmmuhYhPpd204k00L16Tzsu1uGiz7nDsZ
9ohBtS6laHztJnVd9gKIHOcgBCylPjNYpRecsgZk2COK/O9uTo8drQ46gKxwHQ4l
P4RsvDligoaFO8rRLAzg2sfsOt9O2runkdWhYnVKSLdXOOA+wQ+eacOuzWs5dNpQ
BQfWZB3GwxQ3+G1c2WOVrq/15YNrxpqMA9/060vPzGqW9VDCnOnoS3w5nHCDw5Ep
YB+3OVn31Z4Hw+nP/WCR/0QYKDSBmqvGd4wKmHcUnL22rRZSl6UZTSqnBw3yrhV9
YzmZ+r3B9b7AtvDs3qWMjQwFYnxmeU7DSXsxjbUpDcMDuy/Y8/P9CSzE2ruC26L1
TdK/Vy6igoDCThzv0RMFEUfmpojIYoTtGNuyMrA90ZNKwXKVOcf9mM3TQm4vLH7m
P3Q9UvBlApoaKbXSZaU0cV+klnba2prAnu4LTVeBWoyjPlmFZkv9ZxDHId8n2rie
dTe/KZJRlScfwVtp83xh428ixdYWNs03R6P5tQAMKoZNLnixqXQ=
=TiMH
-----END PGP SIGNATURE-----
Merge tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix the EDAC device's confusion in the polling setting units
- Fix a memory leak in highbank's probing function
* tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/highbank: Fix memory leak in highbank_mc_probe()
EDAC/device: Fix period calculation in edac_device_reset_delay_period()
- Fix a build failure with some versions of ld that have an odd version string.
- Fix incorrect use of mutex in the IMC PMU driver.
Thanks to: Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, Yang Yingliang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmPD2ZoTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIJTD/9HeqFTpviUPd0QCW2HWVv2jwLA97Qy
BGsMVz6eSk+f3YBFDFJ0n5KqYR4Bry5mgwPm1nQe86cKr+ZCdtnF+10vPL0TEdK5
M4a+u9PA+xqq9ukIN96ZP7QGi80YY0RRIrfkOwK6iQJVLSxT+TUcl1ko/iPalbI7
/5c2BCnkIoDbwU1ux3/6+uJCDFE3oLy5v1nt/mbzuekFTRvPGRCt+DWBywrYKYi8
XfBBVTz6F+PoBZ7vZa6Kv5IQANiuU4COTjpm9AjVvjp0oKfYskBmtZvHQhr3v6v4
HZsms49w0r3D7sZgEB2hmKFM2/QijSDeyBxnmt/hHeNYMMiFg+0lhoxoNoykzcN9
UfFU7NID3uqruYMkhAxiCIvyul9Vzcr+pe3GgooY+AtuokhuMUEUXJjEDIyXtoci
2VnEsdbl0/gihdHedfhLRXlEn8xz6fQvxDcpYZClSIDeS/nL7cuPd/9+JHn1hilq
aJ0MX6VUKnwkSBb9Gkd1bt09jS6lqDUQS5+88IMvoJo53xHVlHF+5kqXAJRkpt1w
XsDMBuKqT/aEC+rI5GyHXNglGuBYqMvEmbdEGtIVFCbUkcI35Xe8RXKmqDbO2U7N
qqfYj53dgaptJF/sllcGuCTj6JrdJvKEVnuzsIwoE+XyXzr1e9UBmyZ+I//VjTjL
Mhy9rXp7tJYnkw==
=BaFo
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a build failure with some versions of ld that have an odd version
string
- Fix incorrect use of mutex in the IMC PMU driver
Thanks to Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, and
Yang Yingliang.
* tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/hash: Make stress_hpt_timer_fn() static
powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
powerpc/boot: Fix incorrect version calculation issue in ld_version
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.
One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmPCrI8QHHJwcHRAa2Vy
bmVsLm9yZwAKCRA5A4Ymyw79kT1lB/wPbLpePLzZfDGyV/NR9gi4FuJiaRfhlklV
rbxnJce050GERbSQoF/r4zrxn2pzvIWGMh1xWZBGi/q8mT2rOIYtVqUahY9YuL/Z
7+xqdCOALIxEj+cXqYocqp8/NFgUWLGuMoomc9lWvEkUs+zOvkD8Z/bRecfPYvOa
BftPALmtXgx46Ecce0gZvvh4YULpVLNdDPPiwZTabV+47Cl8+cJ0Y+iEHsUfOesU
hQG0unWJH77O3IU4QxiirLekLP/6a5O5f0W7u3PZmNNv7N+UdwE+De+QF0aamfgA
LZDO1qOakflegFZvK0JchCzS4hc6dtRKqIvNM3cCBMXLvV4REHKP
=geNh
-----END PGP SIGNATURE-----
Merge tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"memblock: always release pages to the buddy allocator in
memblock_free_late()
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the
deferred init process.
memblock_free_pages() is called by memblock_free_late(), which is used
to free reserved ranges after memblock_free_all() has run. All pages
in reserved ranges have been initialized at that point, and
accordingly, those pages are not touched by the deferred init process.
This means that currently, if the pages that memblock_free_late()
intends to release are in the deferred range, they will never be
released to the buddy allocator. They will forever be reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call
__free_pages_core() directly instead.
One case where this issue can occur in the wild is EFI boot on x86_64.
The x86 EFI code reserves all EFI boot services memory ranges via
memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively).
If any of those ranges happens to fall within the deferred init range,
the pages will not be released and that memory will be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages"
* tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm: Always release pages to the buddy allocator in memblock_free_late().
Just one fix for modules by Nick.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmPB5S4SHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoin0OUQALx+uch7cZvl3aznH4CAkF1U2mKuESLT
unQsnSlxIF0BETTAdPbhcCVEtKi102Xq4R6ffCSilB33b0IGM6qY6ZUQb/GMGhg5
qIur7EDiFksjCBmBuE1iNaAknkiKeirkJTEC4lEOl8OLXprmnuMeeR+E0BDi5sH3
YIxexLuhyKeF9Ke4bjAJ7A4WKvh2R7yamISCUcP6GL3w7U9urSjo9FHKcUmFH9nr
qCX/1zE0fz7iJzb9YtBNVhdgKNOYNOxa7TDNXQCVuLcZQfmQqDMBVgKdOkj6TAWX
6L2CHT4N9IjPt9+DrlEUf6bSSwP4N4aFdyMAo6UlVXcgvEbTS/kdoMSqFOAQSAdb
G/lzsvS3fD76VcCdZkwAFYnEhUTJ4xWTS0oaI++tu0EFX5lvRuQv2DRt0WULlD/u
L0paUwmjtVajcSIATxRZkjoMiVD4btDRz30kaIUU/xoc1Gg/EADrSLHESaZ9eZVL
EJ40aqLLIRBXGZrVEzvf97HIzuQiKfaPzywNvbMpxG3m0tV2pn3Z4ts/A8aO7c+O
mBDnTURiZN6pT+xsnJBvqWrlXwPRUGwI+NjRcdPZhUyfgj5MHpEI8PAcUWy6TTUn
H2P6x2iC3/nypqhnwjoixSptjaUWcak3R6UgwVS2YqfjePCqaq0wg9DAFDQH0Yx3
awAOoum0Ubin
=aa3U
-----END PGP SIGNATURE-----
Merge tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module fix from Luis Chamberlain:
"Just one fix for modules by Nick"
* tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
kallsyms: Fix scheduling with interrupts disabled in self-test
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmPB/t0ACgkQiiy9cAdy
T1Gvbwv+LIXF5dHNGHDuezecbD8T9sRF2v15Mh7i6SSg7BWeXXebYY4dyrSQ5SHu
KqsN2Y3P3A0ZQ3ApzFryImM6BwSpOeyCsVRhl7VCWnMgXcroqc/O6F6/YRFVkUAi
iWWZLXM7WFBQGXUbPXaiPc+wCRARrnul9p+48Teyy0CJWiWormQmkznVxeihErDX
/pWdQdvJeFcUrIj1H3e4cyJF2hVzRiUGI/eZmBGlDyaK192vYgGYO2AhHnTfd7fU
dUJ+/trVw0koyC5/86veHRqCcXzFD44ORkAB46NaCic1K8t+RPhmxgtriiNLcCrQ
kEmeub6ayPkuniV88NBEPXaDy0S/cEYHr7GEuZTn4sq+hw//y5KbKN3sa6aLHsMH
46BIHcyTXU59eNJ4lWOjoqD2NiqP2GYFY4PZftB85H1EW8Fchwcsw4WzW0ENpzmi
qcWslXDKYjJIZ6NHnBiR/FYI1VdsmoGbDzMhrHreCrWCuIuVaqfrFPMnvE8f5dqY
fkfnkqT4
=SdZA
-----END PGP SIGNATURE-----
Merge tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
- memory leak and double free fix
- two symlink fixes
- minor cleanup fix
- two smb1 fixes
* tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix uninitialized memory read for smb311 posix symlink create
cifs: fix potential memory leaks in session setup
cifs: do not query ifaces on smb1 mounts
cifs: fix double free on failed kerberos auth
cifs: remove redundant assignment to the variable match
cifs: fix file info setting in cifs_open_file()
cifs: fix file info setting in cifs_query_path_info()
Two minor fixes in the hisi_sas driver which only impact enterprise
style multi-expander and shared disk situations and no core changes.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCY8KxpSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishRAlAP9cQmiq
M8WGimymqulRFEpkWpDM1R06eqHDI2K0h5/rfAD+KlgS0cOVvx0nenWyhmprYvDX
2Z239J+WXOjSJq8UJL4=
=gMQX
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two minor fixes in the hisi_sas driver which only impact enterprise
style multi-expander and shared disk situations and no core changes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id
scsi: hisi_sas: Use abort task set to reset SAS disks when discovered
A single fix for rc4 to prevent building the pata_cs5535 driver with
user mode linux as it uses msr operations that are not defined with UML.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCY8HnowAKCRDdoc3SxdoY
dmDWAPwKMUiDzFSkeD7zfKGwCd72HWUa1298yL+XnD8Y7vLBtgEAqhYi9fAnVnN0
dUHm12rEwFOay+lVwxWuQowFaVmzGAs=
=RXSo
-----END PGP SIGNATURE-----
Merge tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single fix to prevent building the pata_cs5535 driver with user mode
linux as it uses msr operations that are not defined with UML"
* tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_cs5535: Don't build on UML
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPBsFAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqgnEAC0OqxnMsOPNbkLO7k6FsSrG7ZoENkOIMCt
Grk3D1cPkM13I0xc+WiaOBezMriPzfdXvt5AGDn9fd53Ih47qpSY4eU6pCqoCk5y
HWdn8KXZvhJGZsSy0Nz+cfPDW/8diJON8YBpJwWM/DfDdP8XibtjlIMTVTtJab6h
aGWjmy3leNfghOJ0cZ1wjL6maWFoowQASs52PZfajSc0mQ5X0i8BgQb1WOHNu89C
vEir9PYlTmdMnYlAKLsyEL3KoGUPm++zSLtJeyWYavlCMGK5WTyNkzmeXqsQhAGf
b1LjovQASe//1t2wvCzQviRf4cae0pE9JhiaYt2oxoDdHrfQj/WPndVS4yE9c+0O
BnLVTCFHNv86TRXNCbEUzI+Ftj6m9qt4MrHz8YpstX7FxGxYC+T5RqTwYClWZQ0j
llBuJUHj+kkAv6kBMJCHTyat6pxIDgcb52QMJr5mFWuEaTloraBIJC70hMtxBQV/
j5mrBYqCngCHVs+hAl9UQ4zqQVSvkeT11QFvwFolxIfs7qtfLqeGzYxvaeomqO3V
sA+H5NY50OEuPfFFmCpcNUJXeUKg7wP39iNHdz6P5cCDBCfUwbNbgKKKNmBovaC+
KhPd8Xo1MmzDuF+cylvTcjOBDte4425GN7PBj4vP1xbuHYcjg6AEFLawgqE9Y4XX
xyNlgJXPOg==
=ujiw
-----END PGP SIGNATURE-----
Merge tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Nothing major in here, just a collection of NVMe fixes and dropping a
wrong might_sleep() that static checkers tripped over but which isn't
valid"
* tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux:
MAINTAINERS: stop nvme matching for nvmem files
nvme: don't allow unprivileged passthrough on partitions
nvme: replace the "bool vec" arguments with flags in the ioctl path
nvme: remove __nvme_ioctl
nvme-pci: fix error handling in nvme_pci_enable()
nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers
nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
block: Drop spurious might_sleep() from blk_put_queue()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPBsL8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgplLXD/9kmbwSseR8sm8s6rB3mZcfgN4vvkRDa5Kg
r5bj32sOl2o37szObGI53MneFcCpdOszA2R3AHqUoj7ZQsyPlt59OIpKtdBlp6xv
ACX1weBnxMTtdMZ90Mp/GxGLZuvSifbfL4z5YgbRRKnGorz3prmfRkKuO51DcJes
mhoPGTDUsAmxoU261LzuHI7DtD69We8yBzj58p21NF8DvBI3NtuWWhOI+EGs2CDr
aqilG5LxOYBKuEY660KhWKMAXwVbVkyLak4eIh4iax0R0Um/SKXnRMIRI3L1QZoP
eqGtF4zGJGM+47RUnl6GF59/IsyRZR04GJlZ5ma8aqqZNh4oj8E6A+hpgczJ0025
QM4hG+NwqXeNpMjN0PwDypo3WoYjyIhaMoEyCT7dmVzj62y3pm3DDaZDnzUlQF5w
8YvoSPwyCzWHWFDGEZp6WRYn2YfoiE+L48euknADmTL5FwUOs1J5OUK0v8BcAYLO
tqJ8Q8emx15ZTzHeI+Z9lDYIuYKBU9XuO8ugfbss/Xx2tQMDeHe6BIaujhwam6c4
BWyAIViXgWKpIIDB3Emsu3lStc3PJ1WLbBdw4ja0nwhCRB7IeclzZf1IZHTT4xsQ
eg5EK2QORLrlY9keCoFhqfT4guSYQptBlwPuxhHM2gcrxXegR6294hBoPVSd1be5
g5NK0rrSWA==
=9C0J
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"A fix for a regression that happened last week, rest is fixes that
will be headed to stable as well. In detail:
- Fix for a regression added with the leak fix from last week (me)
- In writing a test case for that leak, inadvertently discovered a
case where we a poll request can race. So fix that up and mark it
for stable, and also ensure that fdinfo covers both the poll tables
that we have. The latter was an oversight when the split poll table
were added (me)
- Fix for a lockdep reported issue with IOPOLL (Pavel)"
* tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux:
io_uring: lock overflowing for IOPOLL
io_uring/poll: attempt request issue after racy poll wakeup
io_uring/fdinfo: include locked hash table in fdinfo output
io_uring/poll: add hash if ready poll request can't complete inline
io_uring/io-wq: only free worker if it was allocated for creation
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmPBniAUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwOjRAAhjyRAgyiZV2rWS4pyvpQpqcpZWD9
796ZSqnzLJjVYCymGvUTX23FEA48n59+bCM/WpfEGUPrBf8LZQxC9YOCm6ltuM8+
FoSBykW/tHPq5IWaLzgrWpHeDOgEnZu/WFGGvrV3tl1mLpM1SJT8bGDsjHXlo+FM
qkTEiA3nUEKQs5x9r2TTLCeUWGPNTIHNd2VfuxOqM3qC/nVCOfTTxU8nm6Lk7Eix
nboAugAIADJIjs/+ZGekLBuzZYPkLYuDTyMYJ5hdo1p7wWCLc9gArEqvXKwVgmD3
ptenZeOlQi9Ay45HmkfIgfgKeeQ7REJj3dx04vf67neAianyUrB0EZDqDjR7LmgM
ozlNt0XjyoeEhu6AQS0s1LZtbDiED1R/00P6Gb+YEjUCVipW2lEYYwP0v9dsnNoh
6wblgnkQoxLFM+5CAXRmCmpaoQn0Uam7okfVeohtsz8/kNQF2St0hjzr4Dmws+O3
k9PUqnnUl4ByElzpEDesVGZMJ3pxFVH15ufu8VnRqN60pLTvNrsPyU4cVnG176Rc
3RSDN3zMtPxnHJVy4r3bTNEZsX/7RUrOb4xScOXMmRDBMUc8QdscF8Oj1ucKlj5j
mp7vB/7+VjU96uRarRyqUxGeQc77DCTcvOa1IGh/cuYom8ZJ6vpSCpKy6f6SFGuf
i8iTTUcQKCdqVW4=
=Fv2v
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Work around apparent firmware issue that made Linux reject MMCONFIG
space, which broke PCI extended config space (Bjorn Helgaas)
- Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a
PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas
Bulwahn)
* tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
x86/pci: Simplify is_mmconf_reserved() messages
PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN
Clang emits a asan.module_ctor constructor to each object file
when KASAN is enabled, and these functions are indirectly called
in do_ctors. With CONFIG_CFI_CLANG, the compiler also emits a CFI
type hash before each address-taken global function so they can
pass indirect call checks.
However, in commit 0c3e806ec0 ("x86/cfi: Add boot time hash
randomization"), x86 implemented boot time hash randomization,
which relies on the .cfi_sites section generated by objtool. As
objtool is run against vmlinux.o instead of individual object
files with X86_KERNEL_IBT (enabled by default), CFI types in
object files that are not part of vmlinux.o end up not being
included in .cfi_sites, and thus won't get randomized and trip
CFI when called.
Only .vmlinux.export.o and init/version-timestamp.o are linked
into vmlinux separately from vmlinux.o. As these files don't
contain any functions, disable KASAN for both of them to avoid
breaking hash randomization.
Link: https://github.com/ClangBuiltLinux/linux/issues/1742
Fixes: 0c3e806ec0 ("x86/cfi: Add boot time hash randomization")
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230112224948.1479453-2-samitolvanen@google.com
The memcpy() of the data following a coreboot_table_entry couldn't
be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it
easier to reason about, add an explicit flexible array member to struct
coreboot_device so the entire entry can be copied at once. Additionally,
validate the sizes before copying. Avoids this run-time false positive
warning:
memcpy: detected field-spanning write (size 168) of single field "&device->entry" at drivers/firmware/google/coreboot_table.c:103 (size 8)
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/all/03ae2704-8c30-f9f0-215b-7cdf4ad35a9a@molgen.mpg.de/
Cc: Jack Rosenthal <jrosenth@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Julius Werner <jwerner@chromium.org>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Link: https://lore.kernel.org/r/20230107031406.gonna.761-kees@kernel.org
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Jack Rosenthal <jrosenth@chromium.org>
Link: https://lore.kernel.org/r/20230112230312.give.446-kees@kernel.org
kallsyms_on_each* may schedule so must not be called with interrupts
disabled. The iteration function could disable interrupts, but this
also changes lookup_symbol() to match the change to the other timing
code.
Reported-by: Erhard F. <erhard_f@mailbox.org>
Link: https://lore.kernel.org/all/bug-216902-206035@https.bugzilla.kernel.org%2F/
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.sang@intel.com
Fixes: 30f3bb0977 ("kallsyms: Add self-test facility")
Tested-by: "Erhard F." <erhard_f@mailbox.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
This driver uses MSR functions that aren't implemented under UML.
Avoid building it to prevent tripping up allyesconfig.
e.g.
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3a3): undefined reference to `__tracepoint_read_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3d2): undefined reference to `__tracepoint_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x457): undefined reference to `__tracepoint_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x481): undefined reference to `do_trace_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4d5): undefined reference to `do_trace_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4f5): undefined reference to `do_trace_read_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x51c): undefined reference to `do_trace_write_msr'
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
* Fix the PMCR_EL0 reset value after the PMU rework
* Correctly handle S2 fault triggered by a S1 page table walk
by not always classifying it as a write, as this breaks on
R/O memslots
* Document why we cannot exit with KVM_EXIT_MMIO when taking
a write fault from a S1 PTW on a R/O memslot
* Put the Apple M2 on the naughty list for not being able to
correctly implement the vgic SEIS feature, just like the M1
before it
* Reviewer updates: Alex is stepping down, replaced by Zenghui
x86:
* Fix various rare locking issues in Xen emulation and teach lockdep
to detect them
* Documentation improvements
* Do not return host topology information from KVM_GET_SUPPORTED_CPUID
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmPAT3EUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPmDAf+ICCVMwgm+PjAc6NuXzaUk6BFGWKF
1lzMvnKb6ARnhMKwyjl/Sf5EgnTuucnSTBHuE1kjaLkPUDNJvi4oRXVdDwKjtXnZ
Zxk4dpsNLWVfALHTk1KweIkR5KNif0kugUh9RNp6zOBnoTVRh8XdCHpeDv73tJaG
R1gCAreVTDbp+wNrVpiImUfYAZ4GrGpwwWRH/xLAGDWoTL9Z9J5tQygf+0C429n/
eJoTrToLjESbYadDgCNDD+TUkHbeDVg8aeio2JZga9SvH3RBhwriLqz26v9yvikL
UoY96AySMaiox4pgCUYUl8nng8MR8AG4C4vpNnLalj7tfHxRfhtAwD0EYw==
=gDOV
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the PMCR_EL0 reset value after the PMU rework
- Correctly handle S2 fault triggered by a S1 page table walk by not
always classifying it as a write, as this breaks on R/O memslots
- Document why we cannot exit with KVM_EXIT_MMIO when taking a write
fault from a S1 PTW on a R/O memslot
- Put the Apple M2 on the naughty list for not being able to
correctly implement the vgic SEIS feature, just like the M1 before
it
- Reviewer updates: Alex is stepping down, replaced by Zenghui
x86:
- Fix various rare locking issues in Xen emulation and teach lockdep
to detect them
- Documentation improvements
- Do not return host topology information from KVM_GET_SUPPORTED_CPUID"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
Documentation: kvm: fix SRCU locking order docs
KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
KVM: nSVM: clarify recalc_intercepts() wrt CR8
MAINTAINERS: Remove myself as a KVM/arm64 reviewer
MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
KVM: arm64: Fix S1PTW handling on RO memslots
KVM: arm64: PMU: Fix PMCR_EL0 reset value
On the x86-64 architecture even a failing cmpxchg grants exclusive
access to the cacheline, making it preferable to retry the failed op
immediately instead of stalling with the pause instruction.
To illustrate the impact, below are benchmark results obtained by
running various will-it-scale tests on top of the 6.2-rc3 kernel and
Cascade Lake (2 sockets * 24 cores * 2 threads) CPU.
All results in ops/s. Note there is some variance in re-runs, but the
code is consistently faster when contention is present.
open3 ("Same file open/close"):
proc stock no-pause
1 805603 814942 (+%1)
2 1054980 1054781 (-0%)
8 1544802 1822858 (+18%)
24 1191064 2199665 (+84%)
48 851582 1469860 (+72%)
96 609481 1427170 (+134%)
fstat2 ("Same file fstat"):
proc stock no-pause
1 3013872 3047636 (+1%)
2 4284687 4400421 (+2%)
8 3257721 5530156 (+69%)
24 2239819 5466127 (+144%)
48 1701072 5256609 (+209%)
96 1269157 6649326 (+423%)
Additionally, a kernel with a private patch to help access() scalability:
access2 ("Same file access"):
proc stock patched patched
+nopause
24 2378041 2005501 5370335 (-15% / +125%)
That is, fixing the problems in access itself *reduces* scalability
after the cacheline ping-pong only happens in lockref with the pause
instruction.
Note that fstat and access benchmarks are not currently integrated into
will-it-scale, but interested parties can find them in pull requests to
said project.
Code at hand has a rather tortured history. First modification showed
up in commit d472d9d98b ("lockref: Relax in cmpxchg loop"), written
with Itanium in mind. Later it got patched up to use an arch-dependent
macro to stop doing it on s390 where it caused a significant regression.
Said macro had undergone revisions and was ultimately eliminated later,
going back to cpu_relax.
While I intended to only remove cpu_relax for x86-64, I got the
following comment from Linus:
I would actually prefer just removing it entirely and see if
somebody else hollers. You have the numbers to prove it hurts on
real hardware, and I don't think we have any numbers to the
contrary.
So I think it's better to trust the numbers and remove it as a
failure, than say "let's just remove it on x86-64 and leave
everybody else with the potentially broken code"
Additionally, Will Deacon (maintainer of the arm64 port, one of the
architectures previously benchmarked):
So, from the arm64 side of the fence, I'm perfectly happy just
removing the cpu_relax() calls from lockref.
As such, come back full circle in history and whack it altogether.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/
Acked-by: Tony Luck <tony.luck@intel.com> # ia64
Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc
Acked-by: Will Deacon <will@kernel.org> # arm64
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2).
07eab0901e ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.
Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub. After
07eab0901e, that E820 entry is removed, so we reject this ECAM space,
which makes PCI extended config space (offsets 0x100-0xfff) inaccessible.
The lack of extended config space breaks anything that relies on it,
including perf, VSEC telemetry, EDAC, QAT, SR-IOV, etc.
Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.
Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216891
Fixes: 07eab0901e ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/20230110180243.1590045-3-helgaas@kernel.org
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reported-by: Yunying Sun <yunying.sun@intel.com>
Reported-by: Baowen Zheng <baowen.zheng@corigine.com>
Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reported-by: Yang Lixiao <lixiao.yang@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
- avoid a potential crash on the efi_subsys_init() error path
- use more appropriate error code for runtime services calls issued
after a crash in the firmware occurred
- avoid READ_ONCE() for accessing firmware tables that may appear
misaligned in memory
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPBg68ACgkQw08iOZLZ
jyQs5Qv+PVg06BhEqN+vwNQy6vd4ezTxmDAy7yx751mo3HIw0qT0ohsCIpRydq0c
+qlCXa+Uu/yr/IQplfDT9vY+MEwD9iuwJha8ltGRWM3++yEF4uQXowHDoEKsO84l
5PaC37EfOvHmV6UdFdIF0OYDOcRvX2FsIbmUKRyvIav1e+QRLvUWWKKEmAh04c7G
yNc0837kmoOpjKrYPc8j2n3dVUbhrFUW5eLIFmd8yrR+GRu6Ae5RH3J7iF7Nqtrq
oReYYq3XpmYg8c00WV0NKVuB0DK7fhGY7jcbDfLmTrPwqVzLjxQGecxsQPYnqrJd
mZywkm2fM8KIJy2LQDJOVOZaDAzaC2SkrpELHX/MnPK1UrP561AIv/sXK+3+UBEm
b6m5dHbJgaifKP3kkbc9Cy4f9avLJOdjdXH5f5zPe7it54yHLsacEvjT6M2oiunx
zIvTd/MXi24J+tzgxr08KM5wHXgLGh+fUM7BfZTvEVQmUjY8TnIPjsaAJhTS3jzV
TN3/XAWi
=4LbF
-----END PGP SIGNATURE-----
Merge tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- avoid a potential crash on the efi_subsys_init() error path
- use more appropriate error code for runtime services calls issued
after a crash in the firmware occurred
- avoid READ_ONCE() for accessing firmware tables that may appear
misaligned in memory
* tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: tpm: Avoid READ_ONCE() for accessing the event log
efi: rt-wrapper: Add missing include
efi: fix userspace infinite retry read efivars after EFI runtime services page fault
efi: fix NULL-deref in init error path
- Sphinx 6.0 broke our configuration mechanism, so fix it.
- I broke our configuration for non-Alabaster themes; Akira fixed it.
- Deprecate Sphinx < 2.4 with an eye toward future removal
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmPBhWwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Y+i8IAJCd8qgopxIcmzif8ncsrZFIdk3FBd4INCU3
gEr5IBQN10Fm3es8FcWQPhX8nqFzlyG9GyjNSEfZpKYF9y3zWx9l5xOD0f6Ki6F4
HEaBcP11zQCSbrdZMR2in7fW+SNqIjJ+srDLrLkG2d4il6IbbSwx121pjPxJgHkK
Y4Sj4Aa3fm5m5JzqArc8/IQRl6ewrfCuGXGh2RdunMzdf22Q2vMdIzEfhyilV1Cg
FSXttLDTq5huRSBv8PYaMJnpx2mMn+si8c5mFcNV6oDP+VG4m2rBw4kYQk6q0rU2
xJFnbh7oThKyQ955k+sxJYoSxq9Fd5lXX/3d+HtqSvvC/WAP8gY=
=ttzZ
-----END PGP SIGNATURE-----
Merge tag 'docs-6.2-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Three documentation fixes (or rather two and one warning):
- Sphinx 6.0 broke our configuration mechanism, so fix it
- I broke our configuration for non-Alabaster themes; Akira fixed it
- Deprecate Sphinx < 2.4 with an eye toward future removal"
* tag 'docs-6.2-fixes' of git://git.lwn.net/linux:
docs/conf.py: Use about.html only in sidebar of alabaster theme
docs: Deprecate use of Sphinx < 2.4.x
docs: Fix the docs build with Sphinx 6.0
Nathan reports that recent kernels built with LTO will crash when doing
EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
misaligned load from the TPM event log, which is annotated with
READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
which does not tolerate misaligned accesses.
Interestingly, this does not happen when booting the same kernel
straight from the UEFI shell, and so the fact that the event log may
appear misaligned in memory may be caused by a bug in GRUB or SHIM.
However, using READ_ONCE() to access firmware tables is slightly unusual
in any case, and here, we only need to ensure that 'event' is not
dereferenced again after it gets unmapped, but this is already taken
care of by the implicit barrier() semantics of the early_memunmap()
call.
Cc: <stable@vger.kernel.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1782
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
syzbot reports an issue with overflow filling for IOPOLL:
WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0
Workqueue: events_unbound io_ring_exit_work
Call trace:
io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773
io_fill_cqe_req io_uring/io_uring.h:168 [inline]
io_do_iopoll+0x474/0x62c io_uring/rw.c:1065
io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513
io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056
io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869
process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
worker_thread+0x340/0x610 kernel/workqueue.c:2436
kthread+0x12c/0x158 kernel/kthread.c:376
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863
There is no real problem for normal IOPOLL as flush is also called with
uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL,
for which __io_cqring_overflow_flush() happens from the CQ waiting path.
Reported-and-tested-by: syzbot+6805087452d72929404e@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This became a slightly big update, but it's more or less expected,
as the first batch after holidays.
All changes (but for the last two last-minute fixes) have been
stewed in linux-next long enough, so it's fairly safe to take.
- PCM UAF fix in 32bit compat layer
- ASoC board-specific fixes for Intel, AMD, Medathek, Qualcomm
- SOF power management fixes
- ASoC Intel link failure fixes
- A series of fixes for USB-audio regressions
- CS35L41 HD-audio codec regression fixes
- HD-audio device-specific fixes / quirks
Note that one SPI patch has been taken in ASoC subtree mistakenly,
and the same fix is found in spi tree, but it should be OK to apply.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmPBZIsOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/LEQ//XLbjI0EnjdpBgYRMocykaI2Y3n+72rncfTwQ
FX8mtGtVpBsOp6LCQhJX/0A5B+NcBRk1atyJRsuaAuWxC298jZPMQaSBjlHiP56f
3IcV67gBM5Ve3JwttjF3AtveQPHe+pYh0RRy9HHd4qEEPpO+JdTwHbex+5Xnf3Qq
jLDPwtrC5fd/Q38mfs881yoRLmnq3OppHJ630SklJBmdjuvVjHhXinOaDY5KBR42
/ewf946RKJFUpz0w72Hl4Kaw6kQ1HWb2kef+RZmdSp49EmUXyVmi9mxSSBJlDV0f
QueZlvcZj+Hm6cAw1Kt6pSpShDJO/LwuCuRTaiprBYq+SbCSxaq2mWz0vzyGx8un
U9QmLK4FRtPsZm3SPfVfDpROXc7ovydMOvOzFF6uRou/UdJZLnQYXmFTV5gphREa
jSaekWqQhrV53LTYBFn38PDfq+xLEeBx7VsYBOLUHyR7QX2rdpRpLc9XJkgLy5tJ
xozwrQFNdAvynfPGwBPBUAuWQOvRnuj0W9QIUmA4U2mxwvCbNGJETkI46/Wzixx0
C7n6JA6bc8rBsWHWXefGaabUaCqnBrPKxVqThYLJy05FUh58SP85Xo+k780CRQT7
tLMw2P7sJ1KKlvrVrmCqA2DlsraRT2pBcQEghKmfgded31/3PxAylhPuNO+E1QxP
Eq2uo2g=
=PKWN
-----END PGP SIGNATURE-----
Merge tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became a slightly big update, but it's more or less expected, as
the first batch after holidays.
All changes (but for the last two last-minute fixes) have been stewed
in linux-next long enough, so it's fairly safe to take:
- PCM UAF fix in 32bit compat layer
- ASoC board-specific fixes for Intel, AMD, Medathek, Qualcomm
- SOF power management fixes
- ASoC Intel link failure fixes
- A series of fixes for USB-audio regressions
- CS35L41 HD-audio codec regression fixes
- HD-audio device-specific fixes / quirks
Note that one SPI patch has been taken in ASoC subtree mistakenly, and
the same fix is found in spi tree, but it should be OK to apply"
* tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF
ALSA: usb-audio: Fix possible NULL pointer dereference in snd_usb_pcm_has_fixed_rate()
ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx
ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets
ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC
ALSA: hda/hdmi: Add a HP device 0x8715 to force connect list
ALSA: control-led: use strscpy in set_led_id()
ALSA: usb-audio: Always initialize fixed_rate in snd_usb_find_implicit_fb_sync_format()
ASoC: dt-bindings: qcom,lpass-tx-macro: correct clocks on SC7280
ASoC: dt-bindings: qcom,lpass-wsa-macro: correct clocks on SM8250
ASoC: qcom: Fix building APQ8016 machine driver without SOUNDWIRE
ALSA: hda: cs35l41: Check runtime suspend capability at runtime_idle
ALSA: hda: cs35l41: Don't return -EINVAL from system suspend/resume
ASoC: fsl_micfil: Correct the number of steps on SX controls
ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform
Revert "ALSA: usb-audio: Drop superfluous interface setup at parsing"
ALSA: usb-audio: More refactoring of hw constraint rules
ALSA: usb-audio: Relax hw constraints for implicit fb sync
ALSA: usb-audio: Make sure to stop endpoints before closing EPs
ALSA: hda - Enable headset mic on another Dell laptop with ALC3254
...
- Fix cpufreq policy reference counting in amd-pstate to prevent it
from crashing on removal (Perry Yuan).
- Fix double initialization and set suspend-freq for Apple's cpufreq
driver (Arnd Bergmann, Hector Martin).
- Fix reading of "reg" property, update cpufreq-dt's blocklist and
update DT documentation for Qualcomm's cpufreq driver (Konrad Dybcio,
Krzysztof Kozlowski).
- Replace 0 with NULL in the Armada cpufreq driver (Miles Chen).
- Fix potential overflows in the CPPC cpufreq driver (Pierre Gondois).
- Update blocklist for the Tegra234 Soc cpufreq driver (Sumit Gupta).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPBNrMSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxk9kP+waNWD/1Nuh8owqkTJduMV6uh46rOMdH
EO1siHGFxyfQn6IW6ch4wOwcMEjdk0VnRgcq5/sI1EVIE66aERBM0og3Se+g3RCg
nhYm3ZAhHOisHB3adhUVvdkHvD43iwuE1DedHSKI+TWDO5DmZeDD92/iCmXJ0esK
4mBAGHYX4Kr3aJNDvpMX6Q9VfLrBJ5+o3XM6PrKCk+tx1Aq7dH1M/iV/oLlM+tPz
OJus5KAwJGR0U+VXlIwu1d9CFyvynND15mfSl57G8RCxluhJl59+W4Vfs2ti/Pqo
Vf553jcftEBJB/aey5QwdEgJa3zkJYJZRg698NadF+WmlEZR1xGA0n4R7VSFVU88
xE7iSed4fIGfWTP01G72ZruF9Z7A6KIyvzq3vyhcl01yb7vsF1WC21qG/YGsPkRQ
i7JrLpbKYU+94rsfNEY2dTxYAL5Qs2cWld0S5iXxZVPixPeOeOtm56ci/dwPcuSA
FEeXGolStd1hwM70i4LYlcQ2RkcvJcEI/SF6jJpaiawXWdZihlQYHEq8/uNUsG5y
6gEfMUd1H7aershUBylqlcgMCpQIYHctI1M4hSePjhNy91Q1Ajuy+RCzL2YEtuwE
/NdPsVPHmbrHvHdQkbH3ntdj+S40Tnn9o8ytZit4iSGW1OIqqNexhqEb6NR82YGb
lx98QCiCw48e
=j/pc
-----END PGP SIGNATURE-----
Merge tag 'pm-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix assorted issues in the ARM cpufreq drivers and in the AMD
P-state driver.
Specifics:
- Fix cpufreq policy reference counting in amd-pstate to prevent it
from crashing on removal (Perry Yuan)
- Fix double initialization and set suspend-freq for Apple's cpufreq
driver (Arnd Bergmann, Hector Martin)
- Fix reading of "reg" property, update cpufreq-dt's blocklist and
update DT documentation for Qualcomm's cpufreq driver (Konrad
Dybcio, Krzysztof Kozlowski)
- Replace 0 with NULL in the Armada cpufreq driver (Miles Chen)
- Fix potential overflows in the CPPC cpufreq driver (Pierre Gondois)
- Update blocklist for the Tegra234 Soc cpufreq driver (Sumit Gupta)"
* tag 'pm-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering
cpufreq: armada-37xx: stop using 0 as NULL pointer
cpufreq: apple-soc: Switch to the lowest frequency on suspend
dt-bindings: cpufreq: cpufreq-qcom-hw: document interrupts
cpufreq: Add SM6375 to cpufreq-dt-platdev blocklist
cpufreq: Add Tegra234 to cpufreq-dt-platdev blocklist
cpufreq: qcom-hw: Fix reading "reg" with address/size-cells != 2
cpufreq: CPPC: Add u64 casts to avoid overflowing
cpufreq: apple: remove duplicate intializer
- Improve ACPI companion lookup for backlight devices in the cases when
there is more than one candidate ACPI device object (Hans de Goede).
- Add missing support for manual selection of NVidia-WMI-EC or Apple
GMUX backlight in the kernel command line to the ACPI backlight
driver (Hans de Goede).
- Skip ACPI IRQ override on Asus Expertbook B2402CBA (Tamim Khan).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPBNbISHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxixkP/3aIp3UZCjMwRqv0KtFGB3+bLiIcrVVb
WILER6FwaC9zy1fts2oa23nDT6l+DRdr6nqYA4K4557gK0pETJr/UK6uAFNOVFbA
fiVe3saGeHTOd+Oq8xcchX/sMSMneYwJwkByVaYwixLVvXSb4XNHAew5y0L3diod
xig1OreiPtEX8FvLCg6GNWBf9ULq1xiRPz2NMfx/FmkmvpQuFNVwlkrO7RJQMiCD
0uIwNqvM4wCgHyi7dwV6XsPebgpEJmwz4PaDkJ8FhpzARqes00Ua6wiBnhptuOsd
J1G1MEFeH095+xeSivnLesYhwrwiFGvLAq9pKIJYnuiRz80m2Y7ekhUuP1APPN3Z
5e+o2KqlDYOBZxD1tfDuJiaBwP+nzqKNseaYDeBR+Anj4lPnolCq/XQ4tdR1P+0o
hL8Xo8JWzN17r8AewLYN+pLwUceFrv7IL8pkSb+JF4UyZsLqmpYbdsulVSh8BDLt
g36lrABcZYqeFLiQD/ep3F39MTMNtkN3F1N6GUq0KHdyh/2f8LUWIpBQD35udaeb
GSamDfCIevSBDc6lk9kIvIrsh+uaB+Tk9ziMMZLIm4wB9ZD5ugYov7GrKo9cQuhi
3tIhjwScsGAfRMcvzo4zKkjxq1+nffHL6HJs4y15xsDnxgd09VjNNseBU+8D1y1F
2lcIUduQWUyd
=bDXc
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These add one more ACPI IRQ override quirk, improve ACPI companion
lookup for backlight devices and add missing kernel command line
option values for backlight detection.
Specifics:
- Improve ACPI companion lookup for backlight devices in the cases
when there is more than one candidate ACPI device object (Hans de
Goede)
- Add missing support for manual selection of NVidia-WMI-EC or Apple
GMUX backlight in the kernel command line to the ACPI backlight
driver (Hans de Goede)
- Skip ACPI IRQ override on Asus Expertbook B2402CBA (Tamim Khan)"
* tag 'acpi-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: Fix selecting wrong ACPI fwnode for the iGPU on some Dell laptops
ACPI: video: Allow selecting NVidia-WMI-EC or Apple GMUX backlight from the cmdline
ACPI: resource: Skip IRQ override on Asus Expertbook B2402CBA
A small set of assorted fixes and hardware-id additions for 6.2.
The following is an automated git shortlog grouped by driver:
asus-nb-wmi:
- Add alternate mapping for KEY_SCREENLOCK
- Add alternate mapping for KEY_CAMERA
asus-wmi:
- Don't load fan curves without fan
- Ignore fan on E410MA
- Add quirk wmi_ignore_fan
dell-privacy:
- Only register SW_CAMERA_LENS_COVER if present
- Fix SW_CAMERA_LENS_COVER reporting
ideapad-laptop:
- Add Legion 5 15ARH05 DMI id to set_fn_lock_led_list[]
int3472/discrete:
- Ensure the clk/power enable pins are in output mode
intel/pmc/core:
- Add Meteor Lake mobile support
platform/surface:
- aggregator: Add missing call to ssam_request_sync_free()
- aggregator: Ignore command messages not intended for us
platform/x86/amd:
- Fix refcount leak in amd_pmc_probe
simatic-ipc:
- add another model
- correct name of a model
sony-laptop:
- Don't turn off 0x153 keyboard backlight during probe
thinkpad_acpi:
- Fix profile mode display in AMT mode
touchscreen_dmi:
- Add info for the CSL Panther Tab HD
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmPBN78UHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yQ7AgAuKK+TDlru+rup5PSvUBiRddYX8VI
U+cJokT9sp748zau+S7zy+1PDYtAnaXbV6wf6/YwANq6Pw9aI9MCMFyc2iXzIDCW
fp6d8xvow5XuWG/cK3rggl3WxzInyE2rcSI5epQPV9ylZSOPSPI8CKug/68I2L7W
kohws/18ujOU4J5Y8ATH1jY3t8Zx+uA7sdU/Oo6hiA4Xen1qrABCSgcGgWNqxfqb
C6tk1kF5agLmvR5I7Y0bDh1EHeN1CALPjl8MibEyYFldASLxmCYogx4bGDQBf0Qm
XFZ5MxLdFbHDFXiyaKh+RNW2uHzbJV3rXYVOyUy2eXahBRGj+yoFwDK8Zw==
=tB+M
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"A set of assorted fixes and hardware-id additions"
* tag 'platform-drivers-x86-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode
platform/x86: int3472/discrete: Ensure the clk/power enable pins are in output mode
platform/x86/amd: Fix refcount leak in amd_pmc_probe
platform/x86: intel/pmc/core: Add Meteor Lake mobile support
platform/x86: simatic-ipc: add another model
platform/x86: simatic-ipc: correct name of a model
platform/x86: dell-privacy: Only register SW_CAMERA_LENS_COVER if present
platform/x86: dell-privacy: Fix SW_CAMERA_LENS_COVER reporting
platform/x86: asus-wmi: Don't load fan curves without fan
platform/x86: asus-wmi: Ignore fan on E410MA
platform/x86: asus-wmi: Add quirk wmi_ignore_fan
platform/x86: asus-nb-wmi: Add alternate mapping for KEY_SCREENLOCK
platform/x86: asus-nb-wmi: Add alternate mapping for KEY_CAMERA
platform/surface: aggregator: Add missing call to ssam_request_sync_free()
platform/surface: aggregator: Ignore command messages not intended for us
platform/x86: touchscreen_dmi: Add info for the CSL Panther Tab HD
platform/x86: ideapad-laptop: Add Legion 5 15ARH05 DMI id to set_fn_lock_led_list[]
platform/x86: sony-laptop: Don't turn off 0x153 keyboard backlight during probe
buddy:
- benchmark regression fix for top-down buddy allocation
panel:
- add Lenovo panel orientation quirk
ttm:
- fix kernel oops regression
amdgpu:
- fix missing fence references
- fix missing pipeline sync fencing
- SMU13 fan speed fix
- SMU13 fix power cap handling
- SMU13 BACO fix
- Fix a possible segfault in bo validation error case
- Delay removal of firmware framebuffer
- Fix error when unloading
amdkfd:
- SVM fix when clearing vram
- GC11 fix for multi-GPU
i915:
- Reserve enough fence slot for i915_vma_unbind_vsync
- Fix potential use after free
- Reset engines twice in case of reset failure
- Use multi-cast registers for SVG Unit registers
msm:
- display:
- doc warning fixes
- dt attribs cleanups
- memory leak fix
- error handing in hdmi probe fix
- dp_aux_isr incorrect signalling fix
- shutdown path fix
- accel:
- a5xx: fix quirks to be a bitmask
- a6xx: fix gx halt to avoid 1s hang
- kexec shutdown fix
- fix potential double free
vmwgfx:
- drop rcu usage to make code more robust
virtio:
- fix use-after-free in gem handle code
nouveau:
- drop unused nouveau_fbcon.c
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmPA5skACgkQDHTzWXnE
hr69GA/8CgYwN/4vWDiQ+p4rX6muw0gicxKmfJZLxt8BFnQSSjESWZ6/L201JeZT
dl34SdZ++rx8yYLeoJKHKocyvMIj9goHeGdNkeyhaoZvlfPVf0sAMNjYfysIvcGk
s4HoLoXgzH8SCO2dp2MgOstJNncZNSZrH13b3UkgqQuB+VrL+pGx3qJM7z9Khe1j
vtgpBStgFIlkvDYHuTJDsn0X4E543EBs54U4g/Jc3WcnQwtRycUhXmkFDOtwkM/d
bwghisms6P+OvSAMdU2JWDwYLe/87zeXklKZqJzWpbrcB1iZPF/L2B40CuSXidj+
cXjNbWlwm0Yn6ytHMXp7+3bV6VTvDFmI+1uabXVH0wn8UIxX9WoKJeW7JgYnveXU
FG4Un/PhILeRxZa3jRNJVhPPq4JWjoINJvVmwSMMZdKT9x5MvdfHy+gsCpP6Ojjy
++MjslROZE0ciYPmwG2WPsmwylV00aztdIcNHzXZp4tX79hGw6cOfFjH9rfUUqJv
W52WVQnJ+JHv3BgFCyqXReUdSkXT39J3c54L1E9rK+OpVvc1i2gEN+eTVoNp1Vwn
4Gyb8MPKj//NxaUNwpEDfBRp6scd553xz5K+0SPBNB+G/XnRxoPT8jz0/ivpYGDd
WB73KZbvxRz2vzyy+biuEctyVTDlKDNM3UADW83eFspxHNthX28=
=OCT2
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2023-01-13' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"There is a bit of a post-holiday build up here I expect, small fixes
across the board, amdgpu and msm being the main leaders, with others
having a few. One code removal patch for nouveau:
buddy:
- benchmark regression fix for top-down buddy allocation
panel:
- add Lenovo panel orientation quirk
ttm:
- fix kernel oops regression
amdgpu:
- fix missing fence references
- fix missing pipeline sync fencing
- SMU13 fan speed fix
- SMU13 fix power cap handling
- SMU13 BACO fix
- Fix a possible segfault in bo validation error case
- Delay removal of firmware framebuffer
- Fix error when unloading
amdkfd:
- SVM fix when clearing vram
- GC11 fix for multi-GPU
i915:
- Reserve enough fence slot for i915_vma_unbind_vsync
- Fix potential use after free
- Reset engines twice in case of reset failure
- Use multi-cast registers for SVG Unit registers
msm:
- display:
- doc warning fixes
- dt attribs cleanups
- memory leak fix
- error handing in hdmi probe fix
- dp_aux_isr incorrect signalling fix
- shutdown path fix
- accel:
- a5xx: fix quirks to be a bitmask
- a6xx: fix gx halt to avoid 1s hang
- kexec shutdown fix
- fix potential double free
vmwgfx:
- drop rcu usage to make code more robust
virtio:
- fix use-after-free in gem handle code
nouveau:
- drop unused nouveau_fbcon.c"
* tag 'drm-fixes-2023-01-13' of git://anongit.freedesktop.org/drm/drm: (35 commits)
drm: Optimize drm buddy top-down allocation method
drm/ttm: Fix a regression causing kernel oops'es
drm/i915/gt: Cover rest of SVG unit MCR registers
drm/nouveau: Remove file nouveau_fbcon.c
drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
drm/amd/pm/smu13: BACO is supported when it's in BACO state
drm/amdkfd: Add sync after creating vram bo
drm/i915/gt: Reset twice
drm/amdgpu: fix pipeline sync v2
drm/vmwgfx: Remove rcu locks from user resources
drm/virtio: Fix GEM handle creation UAF
drm/amdgpu: Fixed bug on error when unloading amdgpu
drm/amd: Delay removal of the firmware framebuffer
drm/amdgpu: Fix potential NULL dereference
drm/i915: Fix potential context UAFs
drm/i915: Reserve enough fence slot for i915_vma_unbind_async
drm: Add orientation quirk for Lenovo ideapad D330-10IGL
drm/msm/a6xx: Avoid gx gbit halt during rpm suspend
drm/msm/adreno: Make adreno quirks not overwrite each other
drm/msm: another fix for the headless Adreno GPU
...
Takes rwsem lock inside snd_ctl_elem_read instead of snd_ctl_elem_read_user
like it was done for write in commit 1fa4445f9a ("ALSA: control - introduce
snd_ctl_notify_one() helper"). Doing this way we are also fixing the following
locking issue happening in the compat path which can be easily triggered and
turned into an use-after-free.
64-bits:
snd_ctl_ioctl
snd_ctl_elem_read_user
[takes controls_rwsem]
snd_ctl_elem_read [lock properly held, all good]
[drops controls_rwsem]
32-bits:
snd_ctl_ioctl_compat
snd_ctl_elem_write_read_compat
ctl_elem_write_read
snd_ctl_elem_read [missing lock, not good]
CVE-2023-0266 was assigned for this issue.
Cc: stable@kernel.org # 5.13+
Signed-off-by: Clement Lecigne <clecigne@google.com>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20230113120745.25464-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
- Fix PAGE_TABLE_CHECK failures on hugepage splitting path
- Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header
- Fix NULL deref when accessing debugfs node if PSCI is not present
- Fix MTE core dumping when VMA list is being updated concurrently
- Fix SME signal frame handling when SVE is not implemented by the CPU
- Fix asm constraints for cmpxchg_double() to hazard both words
- Fix build failure with stack tracer and older versions of Clang
- Bring back workaround for Cortex-A715 erratum 2645198
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmO9SzwQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNLdYB/9pX4El38TX4Y4M6sR2yl+m1rkGRiU4nV3N
MKJ3ZVjrx87QZ8CKVYmJbnHzolN0Art9WvqFnyxtPMBlZyWzHjtsrQnad3VwLDOu
4qmqjDCXvPod1EncCxBiGu28FZ88HoLqhnwWB6O2Su6TlczD0kJTfzincdyzqvi2
r0uUlBd9gtFt3sjV+sLPjE6NqMf9MfhoOLLafijz7ZMElQL+2/BjZxhpHLaWhUz1
aHIp4w841TJOuSlCwstX20Nc6Q9+6ta07bw+TD/flyQ+IGUptgDEoIrpjdSO5b2t
zFFHHN5IXovAJPDfhAdXGAbC2SDFyYJtURCpv6hVt/SSsilGEbYg
=241k
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here's a sizeable batch of Friday the 13th arm64 fixes for -rc4. What
could possibly go wrong?
The obvious reason we have so much here is because of the holiday
season right after the merge window, but we've also brought back an
erratum workaround that was previously dropped at the last minute and
there's an MTE coredumping fix that strays outside of the arch/arm64
directory.
Summary:
- Fix PAGE_TABLE_CHECK failures on hugepage splitting path
- Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header
- Fix NULL deref when accessing debugfs node if PSCI is not present
- Fix MTE core dumping when VMA list is being updated concurrently
- Fix SME signal frame handling when SVE is not implemented by the
CPU
- Fix asm constraints for cmpxchg_double() to hazard both words
- Fix build failure with stack tracer and older versions of Clang
- Bring back workaround for Cortex-A715 erratum 2645198"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y
arm64/mm: Define dummy pud_user_exec() when using 2-level page-table
arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
firmware/psci: Don't register with debugfs if PSCI isn't available
firmware/psci: Fix MEM_PROTECT_RANGE function numbers
arm64/signal: Always allocate SVE signal frames on SME only systems
arm64/signal: Always accept SVE signal frames on SME only systems
arm64/sme: Fix context switch for SME only systems
arm64: cmpxchg_double*: hazard against entire exchange variable
arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning
arm64: mte: Avoid the racy walk of the vma list during core dump
elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
arm64: mte: Fix double-freeing of the temporary tag storage during coredump
arm64: ptrace: Use ARM64_SME to guard the SME register enumerations
arm64/mm: add pud_user_exec() check in pud_user_accessible_page()
arm64/mm: fix incorrect file_map_count for invalid pmd
A clk, prepared and enabled in mtk_iommu_v1_hw_init(), is not released in
the error handling path of mtk_iommu_v1_probe().
Add the corresponding clk_disable_unprepare(), as already done in the
remove function.
Fixes: b17336c55d ("iommu/mediatek: add support for mtk iommu generation one HW")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/593e7b7d97c6e064b29716b091a9d4fd122241fb.1671473163.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In __alloc_and_insert_iova_range, there is an issue that retry_pfn
overflows. The value of iovad->anchor.pfn_hi is ~0UL, then when
iovad->cached_node is iovad->anchor, curr_iova->pfn_hi + 1 will
overflow. As a result, if the retry logic is executed, low_pfn is
updated to 0, and then new_pfn < low_pfn returns false to make the
allocation successful.
This issue occurs in the following two situations:
1. The first iova size exceeds the domain size. When initializing
iova domain, iovad->cached_node is assigned as iovad->anchor. For
example, the iova domain size is 10M, start_pfn is 0x1_F000_0000,
and the iova size allocated for the first time is 11M. The
following is the log information, new->pfn_lo is smaller than
iovad->cached_node.
Example log as follows:
[ 223.798112][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range
start_pfn:0x1f0000,retry_pfn:0x0,size:0xb00,limit_pfn:0x1f0a00
[ 223.799590][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range
success start_pfn:0x1f0000,new->pfn_lo:0x1efe00,new->pfn_hi:0x1f08ff
2. The node with the largest iova->pfn_lo value in the iova domain
is deleted, iovad->cached_node will be updated to iovad->anchor,
and then the alloc iova size exceeds the maximum iova size that can
be allocated in the domain.
After judging that retry_pfn is less than limit_pfn, call retry_pfn+1
to fix the overflow issue.
Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com>
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Cc: <stable@vger.kernel.org> # 5.15.*
Fixes: 4e89dce725 ("iommu/iova: Retry from last rb tree node if iova search fails")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20230111063801.25107-1-yf.wang@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Similar to SMMUv2, this driver calls iommu_device_unregister() from the
shutdown path, which removes the IOMMU groups with no coordination
whatsoever with their users - shutdown methods are optional in device
drivers. This can lead to NULL pointer dereferences in those drivers'
DMA API calls, or worse.
Instead of calling the full arm_smmu_device_remove() from
arm_smmu_device_shutdown(), let's pick only the relevant function call -
arm_smmu_device_disable() - more or less the reverse of
arm_smmu_device_reset() - and call just that from the shutdown path.
Fixes: 57365a04c9 ("iommu: Move bus setup to IOMMU device registration")
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20221215141251.3688780-2-vladimir.oltean@nxp.com
Signed-off-by: Will Deacon <will@kernel.org>
Michael Walle says he noticed the following stack trace while performing
a shutdown with "reboot -f". He suggests he got "lucky" and just hit the
correct spot for the reboot while there was a packet transmission in
flight.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000098
CPU: 0 PID: 23 Comm: kworker/0:1 Not tainted 6.1.0-rc5-00088-gf3600ff8e322 #1930
Hardware name: Kontron KBox A-230-LS (DT)
pc : iommu_get_dma_domain+0x14/0x20
lr : iommu_dma_map_page+0x9c/0x254
Call trace:
iommu_get_dma_domain+0x14/0x20
dma_map_page_attrs+0x1ec/0x250
enetc_start_xmit+0x14c/0x10b0
enetc_xmit+0x60/0xdc
dev_hard_start_xmit+0xb8/0x210
sch_direct_xmit+0x11c/0x420
__dev_queue_xmit+0x354/0xb20
ip6_finish_output2+0x280/0x5b0
__ip6_finish_output+0x15c/0x270
ip6_output+0x78/0x15c
NF_HOOK.constprop.0+0x50/0xd0
mld_sendpack+0x1bc/0x320
mld_ifc_work+0x1d8/0x4dc
process_one_work+0x1e8/0x460
worker_thread+0x178/0x534
kthread+0xe0/0xe4
ret_from_fork+0x10/0x20
Code: d503201f f9416800 d503233f d50323bf (f9404c00)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
This appears to be reproducible when the board has a fixed IP address,
is ping flooded from another host, and "reboot -f" is used.
The following is one more manifestation of the issue:
$ reboot -f
kvm: exiting hardware virtualization
cfg80211: failed to load regulatory.db
arm-smmu 5000000.iommu: disabling translation
sdhci-esdhc 2140000.mmc: Removing from iommu group 11
sdhci-esdhc 2150000.mmc: Removing from iommu group 12
fsl-edma 22c0000.dma-controller: Removing from iommu group 17
dwc3 3100000.usb: Removing from iommu group 9
dwc3 3110000.usb: Removing from iommu group 10
ahci-qoriq 3200000.sata: Removing from iommu group 2
fsl-qdma 8380000.dma-controller: Removing from iommu group 20
platform f080000.display: Removing from iommu group 0
etnaviv-gpu f0c0000.gpu: Removing from iommu group 1
etnaviv etnaviv: Removing from iommu group 1
caam_jr 8010000.jr: Removing from iommu group 13
caam_jr 8020000.jr: Removing from iommu group 14
caam_jr 8030000.jr: Removing from iommu group 15
caam_jr 8040000.jr: Removing from iommu group 16
fsl_enetc 0000:00:00.0: Removing from iommu group 4
arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000429, GFSYNR2 0x00000000
fsl_enetc 0000:00:00.1: Removing from iommu group 5
arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000429, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000429, GFSYNR2 0x00000000
fsl_enetc 0000:00:00.2: Removing from iommu group 6
fsl_enetc_mdio 0000:00:00.3: Removing from iommu group 8
mscc_felix 0000:00:00.5: Removing from iommu group 3
fsl_enetc 0000:00:00.6: Removing from iommu group 7
pcieport 0001:00:00.0: Removing from iommu group 18
arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
arm-smmu 5000000.iommu: GFSR 0x00000002, GFSYNR0 0x00000000, GFSYNR1 0x00000429, GFSYNR2 0x00000000
pcieport 0002:00:00.0: Removing from iommu group 19
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
pc : iommu_get_dma_domain+0x14/0x20
lr : iommu_dma_unmap_page+0x38/0xe0
Call trace:
iommu_get_dma_domain+0x14/0x20
dma_unmap_page_attrs+0x38/0x1d0
enetc_unmap_tx_buff.isra.0+0x6c/0x80
enetc_poll+0x170/0x910
__napi_poll+0x40/0x1e0
net_rx_action+0x164/0x37c
__do_softirq+0x128/0x368
run_ksoftirqd+0x68/0x90
smpboot_thread_fn+0x14c/0x190
Code: d503201f f9416800 d503233f d50323bf (f9405400)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
The problem seems to be that iommu_group_remove_device() is allowed to
run with no coordination whatsoever with the shutdown procedure of the
enetc PCI device. In fact, it almost seems as if it implies that the
pci_driver :: shutdown() method is mandatory if DMA is used with an
IOMMU, otherwise this is inevitable. That was never the case; shutdown
methods are optional in device drivers.
This is the call stack that leads to iommu_group_remove_device() during
reboot:
kernel_restart
-> device_shutdown
-> platform_shutdown
-> arm_smmu_device_shutdown
-> arm_smmu_device_remove
-> iommu_device_unregister
-> bus_for_each_dev
-> remove_iommu_group
-> iommu_release_device
-> iommu_group_remove_device
I don't know much about the arm_smmu driver, but
arm_smmu_device_shutdown() invoking arm_smmu_device_remove() looks
suspicious, since it causes the IOMMU device to unregister and that's
where everything starts to unravel. It forces all other devices which
depend on IOMMU groups to also point their ->shutdown() to ->remove(),
which will make reboot slower overall.
There are 2 moments relevant to this behavior. First was commit
b06c076ea9 ("Revert "iommu/arm-smmu: Make arm-smmu explicitly
non-modular"") when arm_smmu_device_shutdown() was made to run the exact
same thing as arm_smmu_device_remove(). Prior to that, there was no
iommu_device_unregister() call in arm_smmu_device_shutdown(). However,
that was benign until commit 57365a04c9 ("iommu: Move bus setup to
IOMMU device registration"), which made iommu_device_unregister() call
remove_iommu_group().
Restore the old shutdown behavior by making remove() call shutdown(),
but shutdown() does not call the remove() specific bits.
Fixes: 57365a04c9 ("iommu: Move bus setup to IOMMU device registration")
Reported-by: Michael Walle <michael@walle.cc>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20221215141251.3688780-1-vladimir.oltean@nxp.com
Signed-off-by: Will Deacon <will@kernel.org>
Although it's vanishingly unlikely that anyone would integrate an SMMU
within a coherent interconnect without also making the pagetable walk
interface coherent, the same effect happens if a coherent SMMU fails to
advertise CTTW correctly. This turns out to be the case on some popular
NXP SoCs, where VFIO started failing the IOMMU_CAP_CACHE_COHERENCY test,
even though IOMMU_CACHE *was* previously achieving the desired effect
anyway thanks to the underlying integration.
While those SoCs stand to gain some more general benefits from a
firmware update to override CTTW correctly in DT/ACPI, it's also easy
to work around this in Linux as well, to avoid imposing too much on
affected users - since the upstream client devices *are* correctly
marked as coherent, we can trivially infer their coherent paths through
the SMMU as well.
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Fixes: df198b37e7 ("iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY better")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/d6dc41952961e5c7b21acac08a8bf1eb0f69e124.1671123115.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Recently AMT mode was enabled (somewhat unexpectedly) on the Lenovo
Z13 platform. The FW is advertising it is available and the driver tries
to use it - unfortunately it reports the profile mode incorrectly.
Note, there is also some extra work needed to enable the dynamic aspect
of AMT support that I will be following up with; but more testing is
needed first. This patch just fixes things so the profiles are reported
correctly.
Link: https://gitlab.freedesktop.org/hadess/power-profiles-daemon/-/issues/115
Fixes: 46dcbc61b7 ("platform/x86: thinkpad-acpi: Add support for automatic mode transitions")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20230112221228.490946-1-mpearson-lenovo@squebb.ca
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Merge an ACPI resource management quirk and an ACPI backlight driver fix
for 6.2-rc4:
- Skip ACPI IRQ override on Asus Expertbook B2402CBA (Tamim Khan).
- Add missing support for manual selection of NVidia-WMI-EC or Apple
GMUX backlight in the kernel command line to the ACPI backlight
driver (Hans de Goede).
* acpi-resource:
ACPI: resource: Skip IRQ override on Asus Expertbook B2402CBA
* acpi-video:
ACPI: video: Allow selecting NVidia-WMI-EC or Apple GMUX backlight from the cmdline