David Hildenbrand 9e89821170 mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly
[ Upstream commit 631426ba1d45a8672b177ee85ad4cabe760dd131 ]

Darrick reports that in some cases where pread() would fail with -EIO and
mmap()+access would generate a SIGBUS signal, MADV_POPULATE_READ /
MADV_POPULATE_WRITE will keep retrying forever and not fail with -EFAULT.

While the madvise() call can be interrupted by a signal, looping until
that happens is not the desired behavior.  MADV_POPULATE_READ /
MADV_POPULATE_WRITE should behave like page faults in that case: fail and
not retry forever.

A reproducer can be found at [1].

The reason is that __get_user_pages(), as called by
faultin_vma_page_range(), will not handle VM_FAULT_RETRY in a proper way:
it will simply return 0 when VM_FAULT_RETRY happened, making
madvise_populate()->faultin_vma_page_range() retry again and again, never
setting FOLL_TRIED->FAULT_FLAG_TRIED for __get_user_pages().

__get_user_pages_locked() does what we want, but duplicating that logic in
faultin_vma_page_range() feels wrong.

So let's use __get_user_pages_locked() instead, which will detect
VM_FAULT_RETRY and set FOLL_TRIED when retrying, making the fault handler
return VM_FAULT_SIGBUS (VM_FAULT_ERROR) at some point and propagating
-EFAULT from faultin_page() to __get_user_pages(), all the way to
madvise_populate().

But there is an issue: __get_user_pages_locked() will end up re-taking
the MM lock and then __get_user_pages() will do another VMA lookup.  In
the meantime, the VMA layout could have changed and we'd fail with
different error codes than the ones we want.

As __get_user_pages() will currently do a new VMA lookup either way, let
it do the VMA handling in a different way, controlled by a new
FOLL_MADV_POPULATE flag, effectively moving these checks from
madvise_populate() + faultin_page_range() into __get_user_pages().

With this change, Darrick's reproducer properly fails with -EFAULT, as
documented for MADV_POPULATE_READ / MADV_POPULATE_WRITE.

[1] https://lore.kernel.org/all/20240313171936.GN1927156@frogsfrogsfrogs/

Link: https://lkml.kernel.org/r/20240314161300.382526-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240314161300.382526-2-david@redhat.com
Fixes: 4ca9b3859d ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Darrick J. Wong <djwong@kernel.org>
Closes: https://lore.kernel.org/all/20240311223815.GW1927156@frogsfrogsfrogs/
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02 16:32:40 +02:00
Documentation net: make SK_MEMORY_PCPU_RESERV tunable 2024-05-02 16:32:36 +02:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
arch KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET" 2024-05-02 16:32:40 +02:00
block block: fix q->blkg_list corruption during disk rebind 2024-04-17 11:19:28 +02:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto crypto: jitter - fix CRYPTO_JITTERENTROPY help text 2024-03-26 18:19:52 -04:00
drivers net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets 2024-05-02 16:32:40 +02:00
fs cifs: reinstate original behavior again for forceuid/forcegid 2024-05-02 16:32:30 +02:00
include af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc(). 2024-05-02 16:32:40 +02:00
init init/main.c: Fix potential static_command_line memory overflow 2024-04-27 17:11:41 +02:00
io_uring io_uring: Fix io_cqring_wait() not restoring sigmask on get_timespec64() failure 2024-04-27 17:11:30 +02:00
ipc Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
kernel sched: Add missing memory barrier in switch_mm_cid 2024-04-27 17:11:41 +02:00
lib bootconfig: use memblock_free_late to free xbc memory to buddy 2024-04-27 17:11:43 +02:00
mm mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly 2024-05-02 16:32:40 +02:00
net af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc(). 2024-05-02 16:32:40 +02:00
rust rust: upgrade to Rust 1.73.0 2024-02-16 19:10:43 +01:00
samples work around gcc bugs with 'asm goto' with outputs 2024-02-23 09:24:47 +01:00
scripts gcc-plugins/stackleak: Avoid .head.text section 2024-04-13 13:07:40 +02:00
security selinux: avoid dereference of garbage after mount failure 2024-04-10 16:35:48 +02:00
sound ALSA: hda/realtek - Enable audio jacks of Haier Boyue G42 with ALC269VC 2024-04-27 17:11:38 +02:00
tools tools: ynl: don't ignore errors in NLMSG_DONE messages 2024-05-02 16:32:36 +02:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt KVM: Always flush async #PF workqueue when vCPU is being destroyed 2024-04-03 15:28:18 +02:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: rpm-pkg: rename binkernel.spec to kernel.spec 2023-07-25 00:59:33 +09:00
.mailmap 20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues 2023-10-24 09:52:16 -10:00
.rustfmt.toml rust: add `.rustfmt.toml` 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS ALSA: scarlett2: Rename scarlett_gen2 to scarlett2 2024-04-27 17:11:36 +02:00
Makefile Linux 6.6.29 2024-04-27 17:11:44 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.