Go to file
yangge c5978b9962 mm/page_alloc: Separate THP PCP into movable and non-movable categories
commit bf14ed81f571f8dba31cd72ab2e50fbcc877cc31 upstream.

Since commit 5d0a661d80 ("mm/page_alloc: use only one PCP list for
THP-sized allocations") no longer differentiates the migration type of
pages in THP-sized PCP list, it's possible that non-movable allocation
requests may get a CMA page from the list, in some cases, it's not
acceptable.

If a large number of CMA memory are configured in system (for example, the
CMA memory accounts for 50% of the system memory), starting a virtual
machine with device passthrough will get stuck.  During starting the
virtual machine, it will call pin_user_pages_remote(..., FOLL_LONGTERM,
...) to pin memory.  Normally if a page is present and in CMA area,
pin_user_pages_remote() will migrate the page from CMA area to non-CMA
area because of FOLL_LONGTERM flag.  But if non-movable allocation
requests return CMA memory, migrate_longterm_unpinnable_pages() will
migrate a CMA page to another CMA page, which will fail to pass the check
in check_and_migrate_movable_pages() and cause migration endless.

Call trace:
pin_user_pages_remote
--__gup_longterm_locked // endless loops in this function
----_get_user_pages_locked
----check_and_migrate_movable_pages
------migrate_longterm_unpinnable_pages
--------alloc_migration_target

This problem will also have a negative impact on CMA itself.  For example,
when CMA is borrowed by THP, and we need to reclaim it through cma_alloc()
or dma_alloc_coherent(), we must move those pages out to ensure CMA's
users can retrieve that contigous memory.  Currently, CMA's memory is
occupied by non-movable pages, meaning we can't relocate them.  As a
result, cma_alloc() is more likely to fail.

To fix the problem above, we add one PCP list for THP, which will not
introduce a new cacheline for struct per_cpu_pages.  THP will have 2 PCP
lists, one PCP list is used by MOVABLE allocation, and the other PCP list
is used by UNMOVABLE allocation.  MOVABLE allocation contains GPF_MOVABLE,
and UNMOVABLE allocation contains GFP_UNMOVABLE and GFP_RECLAIMABLE.

Link: https://lkml.kernel.org/r/1718845190-4456-1-git-send-email-yangge1116@126.com
Fixes: 5d0a661d80 ("mm/page_alloc: use only one PCP list for THP-sized allocations")
Signed-off-by: yangge <yangge1116@126.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-05 09:34:05 +02:00
Documentation kbuild: doc: Update default INSTALL_MOD_DIR from extra to updates 2024-07-05 09:33:56 +02:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
arch syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:34:04 +02:00
block block/ioctl: prefer different overflow check 2024-06-27 13:49:01 +02:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto crypto: ecdh - explicitly zeroize private_key 2024-07-05 09:33:52 +02:00
drivers Revert "cpufreq: amd-pstate: Fix the inconsistency in max frequency units" 2024-07-05 09:34:05 +02:00
fs erofs: fix NULL dereference of dif->bdev_handle in fscache mode 2024-07-05 09:34:05 +02:00
include mm/page_alloc: Separate THP PCP into movable and non-movable categories 2024-07-05 09:34:05 +02:00
init smp: Provide 'setup_max_cpus' definition on UP too 2024-06-16 13:47:49 +02:00
io_uring io_uring/rsrc: fix incorrect assignment of iter->nr_segs in io_import_fixed 2024-06-27 13:49:10 +02:00
ipc Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
kernel syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:34:04 +02:00
lib lib/test_hmm.c: handle src_pfns and dst_pfns allocation failure 2024-06-12 11:12:08 +02:00
mm mm/page_alloc: Separate THP PCP into movable and non-movable categories 2024-07-05 09:34:05 +02:00
net batman-adv: Don't accept TT entries for out-of-spec VIDs 2024-07-05 09:34:04 +02:00
rust rust: kernel: require `Send` for `Module` implementations 2024-05-17 12:01:56 +02:00
samples work around gcc bugs with 'asm goto' with outputs 2024-02-23 09:24:47 +01:00
scripts kbuild: Install dtb files as 0644 in Makefile.dtbinst 2024-07-05 09:34:02 +02:00
security ima: Fix use-after-free on a dentry's dname.name 2024-06-21 14:38:48 +02:00
sound ALSA: hda/realtek: fix mute/micmute LEDs don't work for EliteBook 645/665 G11. 2024-07-05 09:34:00 +02:00
tools selftests: mptcp: userspace_pm: fixed subtest names 2024-07-05 09:33:44 +02:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin() 2024-06-27 13:49:11 +02:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: rpm-pkg: rename binkernel.spec to kernel.spec 2023-07-25 00:59:33 +09:00
.mailmap 20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues 2023-10-24 09:52:16 -10:00
.rustfmt.toml rust: add `.rustfmt.toml` 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS pwm: Rename pwm_apply_state() to pwm_apply_might_sleep() 2024-06-12 11:12:24 +02:00
Makefile Linux 6.6.36 2024-06-27 13:49:15 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.