Commit Graph

21374 Commits

frankjpliu 24dfafc175 Merge branch 'honglin/release' into 'release' (merge request !220)
rue: fix some sparse warnings of RUE net/mm reported by lore kernel test robot
The upstream kernel test robot reported a few issues caused by RUE functions.
Fix the issues related to the net/mm functions.

Note:
This MR is not related to the real RUE function, so no need to re-trigger the RUE regression test already done for [1].

[1] https://git.woa.com/tlinux/tkernel5/-/merge_requests/197
2024-10-28 06:46:29 +00:00
Jianping Liu 754fb4ce83 Merge linux 6.6.58
Conflicts:
	fs/xfs/libxfs/xfs_inode_buf.c

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-10-28 10:55:21 +08:00
Honglin Li 83c77dd970 rue/mm: fix suspicious RCU usage in mem_cgroup_account_oom_skip
It was found that a "suspicious RCU usage" lockdep warning
was issued due to a missing rcu_read_lock() call, which
resulted in a splat as follows:

=============================
WARNING: suspicious RCU usage
6.6.47-01324-g0d35c4c63934 #1 Not tainted
-----------------------------
include/linux/cgroup.h:444 suspicious rcu_dereference_check() usage!

other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
stack backtrace:
CPU: 1 PID: 4377 Comm: test_memcontrol Not tainted 6.6.47-01324-g0d35c4c63934 #1
Call Trace:
 dump_stack_lvl (lib/dump_stack.c:108)
 dump_stack (lib/dump_stack.c:115)
 lockdep_rcu_suspicious
 mem_cgroup_account_oom_skip
 oom_evaluate_task
 ? oom_badness
 mem_cgroup_scan_tasks
 mem_cgroup_select_bad_process
 out_of_memory
 ? oom_killer_disable
 mem_cgroup_out_of_memory
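
A minimal sketch of the fix pattern, with a simplified stand-in for the
out-of-tree mem_cgroup_account_oom_skip() body (the accounting field is
hypothetical); the point is only the rcu_read_lock()/rcu_read_unlock()
pairing around the task-to-memcg lookup:

	/* Sketch only: simplified stand-in for the RUE helper. */
	void mem_cgroup_account_oom_skip(struct task_struct *task,
					 struct oom_control *oc)
	{
		struct mem_cgroup *memcg;

		rcu_read_lock();	/* mem_cgroup_from_task() needs the RCU read side */
		memcg = mem_cgroup_from_task(task);
		if (memcg)
			oc->num_skipped++;	/* hypothetical accounting, for illustration */
		rcu_read_unlock();
	}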

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202410091610.3345ae6c-oliver.sang@intel.com
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-10-25 18:01:28 +08:00
Honglin Li 203c0f92e6 rue/mm: fix some sparse warnings of incorrect type in argument
Since __user is no longer used in the proc handler, drop it.
Some symbols are not declared; make them static.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410051837.9XzQcFvd-lkp@intel.com/
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-10-25 18:00:12 +08:00
Honglin Li 59bef3a756 rue/mm: fix compile error of dereferencing pointer to incomplete type
Fix a compile error in collapse_file() caused by CONFIG_MEMCG being disabled.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409300602.m8qpsPwl-lkp@intel.com/
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-10-25 11:52:07 +08:00
Honglin Li 711f0a3fc0 rue/mm: fix some sparse warnings due to no previous prototype
Some functions have no previous prototype; fix their declarations.
The variable pre_used is set but not used, so remove it.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409300256.RAybhLQs-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202409300458.ikC3QeVe-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202410051113.ARgT0WJv-lkp@intel.com/
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-10-25 11:50:47 +08:00
Liu Shixin bed2b90378 mm/swapfile: skip HugeTLB pages for unuse_vma
commit 7528c4fb1237512ee18049f852f014eba80bbe8d upstream.

I got a bad pud error and lost a 1GB HugeTLB when calling swapoff.  The
problem can be reproduced by the following steps:

 1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
 2. Swapout the above anonymous memory.
 3. run swapoff and we will get a bad pud error in kernel message:

  mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)

Using ftrace, we can tell that pud_clear_bad() is called by
pud_none_or_clear_bad() in unuse_pud_range().  The HugeTLB pages will
therefore never be freed because we lost them from the page table.  We
can skip HugeTLB pages in unuse_vma() to fix it.
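
A minimal sketch of the early return, assuming the unuse_vma() walk in
mm/swapfile.c (the rest of the function is unchanged):

	static int unuse_vma(struct vm_area_struct *vma, unsigned int type)
	{
		pgd_t *pgd;
		unsigned long addr, end, next;
		int ret;

		/* hugetlb mappings are not handled by the regular swap-entry walk */
		if (is_vm_hugetlb_page(vma))
			return 0;

		addr = vma->vm_start;
		end = vma->vm_end;

		pgd = pgd_offset(vma->vm_mm, addr);
		do {
			next = pgd_addr_end(addr, end);
			if (pgd_none_or_clear_bad(pgd))
				continue;
			ret = unuse_p4d_range(vma, pgd, addr, next, type);
			if (ret)
				return ret;
		} while (pgd++, addr = next, addr != end);
		return 0;
	}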

Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
Fixes: 0fe6e20b9c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-22 15:46:21 +02:00
Wei Xu a0035fc555 mm/mglru: only clear kswapd_failures if reclaimable
commit b130ba4a6259f6b64d8af15e9e7ab1e912bcb7ad upstream.

lru_gen_shrink_node() unconditionally clears kswapd_failures, which can
prevent kswapd from sleeping and cause 100% kswapd cpu usage even when
kswapd repeatedly fails to make progress in reclaim.

Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes some
progress, similar to shrink_node().

I happened to run into this problem in one of my tests recently.  It
requires a combination of several conditions: The allocator needs to
allocate a right amount of pages such that it can wake up kswapd
without itself being OOM killed; there is no memory for kswapd to
reclaim (My test disables swap and cleans page cache first); no other
process frees enough memory at the same time.
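
A minimal sketch of the change, assuming the tail of lru_gen_shrink_node()
in mm/vmscan.c (aging and eviction elided):

	static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
	{
		unsigned long reclaimed = sc->nr_reclaimed;	/* snapshot on entry */

		/* ... generation aging and eviction elided ... */

		/* was an unconditional pgdat->kswapd_failures = 0 */
		if (sc->nr_reclaimed > reclaimed)
			pgdat->kswapd_failures = 0;
	}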

Link: https://lkml.kernel.org/r/20241014221211.832591-1-weixugc@google.com
Fixes: e4dde56cd2 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Wei Xu <weixugc@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Jan Alexander Steffens <heftig@archlinux.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-22 15:46:21 +02:00
Jann Horn 17396e32f9 mm/mremap: fix move_normal_pmd/retract_page_tables race
commit 6fa1066fc5d00cb9f1b0e83b7ff6ef98d26ba2aa upstream.

In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of
page table entries should be moved.

At that point, the mmap_lock is held in write mode, but no rmap locks are
held yet.  For PMD entries that point to page tables and are fully covered
by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
which first takes rmap locks, then does move_normal_pmd().
move_normal_pmd() takes the necessary page table locks at source and
destination, then moves an entire page table from the source to the
destination.

The problem is: The rmap locks, which protect against concurrent page
table removal by retract_page_tables() in the THP code, are only taken
after the PMD entry has been read and it has been decided how to move it.
So we can race as follows (with two processes that have mappings of the
same tmpfs file that is stored on a tmpfs mount with huge=advise); note
that process A accesses page tables through the MM while process B does it
through the file rmap:

process A                      process B
=========                      =========
mremap
  mremap_to
    move_vma
      move_page_tables
        get_old_pmd
        alloc_new_pmd
                      *** PREEMPT ***
                               madvise(MADV_COLLAPSE)
                                 do_madvise
                                   madvise_walk_vmas
                                     madvise_vma_behavior
                                       madvise_collapse
                                         hpage_collapse_scan_file
                                           collapse_file
                                             retract_page_tables
                                               i_mmap_lock_read(mapping)
                                               pmdp_collapse_flush
                                               i_mmap_unlock_read(mapping)
        move_pgt_entry(NORMAL_PMD, ...)
          take_rmap_locks
          move_normal_pmd
          drop_rmap_locks

When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`.  The effect
depends on arch-specific and machine-specific details; on x86, you can end
up with physical page 0 mapped as a page table, which is likely
exploitable for user->kernel privilege escalation.

Fix the race by letting process A recheck that the PMD still points to a
page table after the rmap locks have been taken.  Otherwise, we bail and
let the caller fall back to the PTE-level copying path, which will then
bail immediately at the pmd_none() check.

Bug reachability: Reaching this bug requires that you can create
shmem/file THP mappings - anonymous THP uses different code that doesn't
zap stuff under rmap locks.  File THP is gated on an experimental config
flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need
shmem THP to hit this bug.  As far as I know, getting shmem THP normally
requires that you can mount your own tmpfs with the right mount flags,
which would require creating your own user+mount namespace; though I don't
know if some distros maybe enable shmem THP by default or something like
that.

Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.
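
A minimal sketch of the recheck, assuming the locking sequence in
move_normal_pmd() in mm/mremap.c; the bail-out label and result handling
are illustrative:

	/* inside move_normal_pmd(), after old_ptl/new_ptl have been taken */
	pmd_t pmd = *old_pmd;

	/* Re-validate under the locks: retract_page_tables() may have removed
	 * the page table since the entry was first read without rmap locks. */
	if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
		goto out_unlock;	/* return false; caller falls back to the PTE path */

	/* Clear the pmd and re-install it at the destination as before. */
	pmd_clear(old_pmd);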

Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com
Fixes: 1d65b771bc ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Closes: https://project-zero.issues.chromium.org/371047675
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-22 15:46:21 +02:00
Jianping Liu ba131a89d7 Merge linux 6.6.57
Conflicts:
	drivers/scsi/sd.c

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-10-21 14:40:59 +08:00
Patrick Roy 7caf966390 secretmem: disable memfd_secret() if arch cannot set direct map
commit 532b53cebe58f34ce1c0f34d866f5c0e335c53c6 upstream.

Return -ENOSYS from memfd_secret() syscall if !can_set_direct_map().  This
is the case for example on some arm64 configurations, where marking 4k
PTEs in the direct map not present can only be done if the direct map is
set up at 4k granularity in the first place (as ARM's break-before-make
semantics do not easily allow breaking apart large/gigantic pages).

More precisely, on arm64 systems with !can_set_direct_map(),
set_direct_map_invalid_noflush() is a no-op, however it returns success
(0) instead of an error.  This means that memfd_secret will seemingly
"work" (e.g.  syscall succeeds, you can mmap the fd and fault in pages),
but it does not actually achieve its goal of removing its memory from the
direct map.

Note that with this patch, memfd_secret() will start erroring on systems
where can_set_direct_map() returns false (arm64 with
CONFIG_RODATA_FULL_DEFAULT_ENABLED=n, CONFIG_DEBUG_PAGEALLOC=n and
CONFIG_KFENCE=n), but that still seems better than the current silent
failure.  Since CONFIG_RODATA_FULL_DEFAULT_ENABLED defaults to 'y', most
arm64 systems actually have a working memfd_secret() and aren't
affected.

From going through the iterations of the original memfd_secret patch
series, it seems that disabling the syscall in these scenarios was the
intended behavior [1] (preferred over having
set_direct_map_invalid_noflush return an error as that would result in
SIGBUSes at page-fault time), however the check for it got dropped between
v16 [2] and v17 [3], when secretmem moved away from CMA allocations.

[1]: https://lore.kernel.org/lkml/20201124164930.GK8537@kernel.org/
[2]: https://lore.kernel.org/lkml/20210121122723.3446-11-rppt@kernel.org/#t
[3]: https://lore.kernel.org/lkml/20201125092208.12544-10-rppt@kernel.org/
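
A minimal sketch of the check, assuming the memfd_secret() syscall entry
in mm/secretmem.c:

	SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
	{
		/* ... flag validation elided ... */

		/* Refuse the syscall when the direct map cannot actually be
		 * modified, instead of silently leaving pages in the direct map. */
		if (!secretmem_enable || !can_set_direct_map())
			return -ENOSYS;

		/* ... anon inode and fd setup elided ... */
	}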

Link: https://lkml.kernel.org/r/20241001080056.784735-1-roypat@amazon.co.uk
Fixes: 1507f51255 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17 15:24:37 +02:00
frankjpliu c17649fea4 Merge branch 'haisu/master-fix-rue-lore-issues' into 'master' (merge request !207)
rue/io: Fix RUE IO kernel CONFIG issues reported by lore kernel test robot
The upstream kernel test robot reported a few issues caused by RUE IO functions.
Most of them are caused by disabled config options, such as in allnoconfig builds.

Fix the IO functions related issues.

Note:
This MR is not related to the real RUE IO function, so no need to re-trigger the RUE regression test already done for [1].

[1] https://git.woa.com/tlinux/tkernel5/-/merge_requests/197
2024-10-16 06:32:46 +00:00
Jianping Liu b03e997445 Merge linux 6.6.55
Conflicts:
	arch/loongarch/configs/loongson3_defconfig
	drivers/block/null_blk/main.c
	kernel/sched/psi.c

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-10-11 18:06:09 +08:00
Jianping Liu 51d3da3439 Merge branch '6.6.54'
Conflicts:
	drivers/md/dm.c
	kernel/sched/fair.c

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-10-10 21:47:18 +08:00
Yosry Ahmed 54ad9c7608 mm: z3fold: deprecate CONFIG_Z3FOLD
[ Upstream commit 7a2369b74abf76cd3e54c45b30f6addb497f831b ]

The z3fold compressed pages allocator is rarely used, most users use
zsmalloc.  The only disadvantage of zsmalloc in comparison is the
dependency on MMU, and zbud is a more common option for !MMU as it was the
default zswap allocator for a long time.

Historically, zsmalloc had worse latency than zbud and z3fold but offered
better memory savings.  This is no longer the case as shown by a simple
recent analysis [1].  That analysis showed that z3fold does not have any
advantage over zsmalloc or zbud considering both performance and memory
usage.  In a kernel build test on tmpfs in a limited cgroup, z3fold took
3% more time and used 1.8% more memory.  The latency of zswap_load() was
7% higher, and that of zswap_store() was 10% higher.  Zsmalloc is better
in all metrics.

Moreover, z3fold apparently has latent bugs, which was made noticeable by
a recent soft lockup bug report with z3fold [2].  Switching to zsmalloc
not only fixed the problem, but also reduced the swap usage from 6~8G to
1~2G.  Other users have also reported being bitten by mistakenly enabling
z3fold.

Other than hurting users, z3fold is repeatedly causing wasted engineering
effort.  Apart from investigating the above bug, it came up in multiple
development discussions (e.g.  [3]) as something we need to handle, when
there aren't any legit users (at least not intentionally).

The natural course of action is to deprecate z3fold, and remove in a few
cycles if no objections are raised from active users.  Next on the list
should be zbud, as it offers marginal latency gains at the cost of huge
memory waste when compared to zsmalloc.  That one will need to wait until
zsmalloc does not depend on MMU.

Rename the user-visible config option from CONFIG_Z3FOLD to
CONFIG_Z3FOLD_DEPRECATED so that users with CONFIG_Z3FOLD=y get a new
prompt with explanation during make oldconfig.  Also, remove
CONFIG_Z3FOLD=y from defconfigs.

[1]https://lore.kernel.org/lkml/CAJD7tkbRF6od-2x_L8-A1QL3=2Ww13sCj4S3i4bNndqF+3+_Vg@mail.gmail.com/
[2]https://lore.kernel.org/lkml/EF0ABD3E-A239-4111-A8AB-5C442E759CF3@gmail.com/
[3]https://lore.kernel.org/lkml/CAJD7tkbnmeVugfunffSovJf9FAgy9rhBVt_tx=nxUveLUfqVsA@mail.gmail.com/

[arnd@arndb.de: deprecate ZSWAP_ZPOOL_DEFAULT_Z3FOLD as well]
  Link: https://lkml.kernel.org/r/20240909202625.1054880-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20240904233343.933462-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7a2369b74abf76cd3e54c45b30f6addb497f831b)
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10 11:58:02 +02:00
Danilo Krummrich e3a9fc1520 mm: krealloc: consider spare memory for __GFP_ZERO
commit 1a83a716ec233990e1fd5b6fbb1200ade63bf450 upstream.

As long as krealloc() is called with __GFP_ZERO consistently, starting
with the initial memory allocation, __GFP_ZERO should be fully honored.

However, if for an existing allocation krealloc() is called with a
decreased size, it is not ensured that the spare portion of the allocation is
zeroed.  Thus, if krealloc() is subsequently called with a larger size
again, __GFP_ZERO can't be fully honored, since we don't know the previous
size, but only the bucket size.

Example:

	buf = kzalloc(64, GFP_KERNEL);
	memset(buf, 0xff, 64);

	buf = krealloc(buf, 48, GFP_KERNEL | __GFP_ZERO);

	/* After this call the last 16 bytes are still 0xff. */
	buf = krealloc(buf, 64, GFP_KERNEL | __GFP_ZERO);

Fix this, by explicitly setting spare memory to zero, when shrinking an
allocation with __GFP_ZERO flag set or init_on_alloc enabled.
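
A minimal sketch of the shrink path, assuming the __do_krealloc() helper
in mm/slab_common.c:

	/* inside __do_krealloc(), when the existing object already fits */
	if (ks >= new_size) {
		/* Zero the now-unused tail so a later grow with __GFP_ZERO
		 * (or init_on_alloc) still returns fully zeroed spare memory. */
		if (want_init_on_alloc(flags))
			memset((void *)p + new_size, 0, ks - new_size);

		p = kasan_krealloc((void *)p, new_size, flags);
		return (void *)p;
	}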

Link: https://lkml.kernel.org/r/20240812223707.32049-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10 11:57:50 +02:00
Haisu Wang 148ec2e1cd rue/io: mem_cgroup_bind_blkio_write also require CONFIG_SWAP
The failure in mem_cgroup_bind_blkio_write() is caused by CONFIG_SWAP
being disabled.  Partially fix the reported issue.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409300839.3XARZneM-lkp@intel.com/
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-10-10 15:44:23 +08:00
Liam R. Howlett b35a42bdaf mm/damon/vaddr: protect vma traversal in __damon_va_three_regions() with rcu read lock
commit fb497d6db7c19c797cbd694b52d1af87c4eebcc6 upstream.

Traversing VMAs of a given maple tree should be protected by rcu read
lock.  However, __damon_va_three_regions() is not doing the protection.
Hold the lock.
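
A minimal sketch of the locking, assuming the VMA iteration in
__damon_va_three_regions() in mm/damon/vaddr.c:

	struct vm_area_struct *vma;
	VMA_ITERATOR(vmi, mm, 0);

	rcu_read_lock();	/* the maple tree traversal below requires the RCU read lock */
	for_each_vma(vmi, vma) {
		/* ... existing gap bookkeeping to pick the three regions ... */
	}
	rcu_read_unlock();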

Link: https://lkml.kernel.org/r/20240905001204.1481-1-sj@kernel.org
Fixes: d0cf3dd47f ("damon: convert __damon_va_three_regions to use the VMA iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/b83651a0-5b24-4206-b860-cb54ffdf209b@roeck-us.net
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04 16:30:03 +02:00
David Gow 9347605691 mm: only enforce minimum stack gap size if it's sensible
commit 69b50d4351ed924f29e3d46b159e28f70dfc707f upstream.

The generic mmap_base code tries to leave a gap between the top of the
stack and the mmap base address, but enforces a minimum gap size (MIN_GAP)
of 128MB, which is too large on some setups.  In particular, on arm tasks
without ADDR_LIMIT_32BIT, the STACK_TOP value is less than 128MB, so it's
impossible to fit such a gap in.

Only enforce this minimum if MIN_GAP < MAX_GAP, as we'd prefer to honour
MAX_GAP, which is defined proportionally, so scales better and always
leaves us with both _some_ stack space and some room for mmap.

This fixes the usercopy KUnit test suite on 32-bit arm, as it doesn't set
any personality flags so gets the default (in this case 26-bit) task size.
This test can be run with: ./tools/testing/kunit/kunit.py run --arch arm
usercopy --make_options LLVM=1
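
A minimal sketch of the clamp, assuming the generic mmap_base() in mm/util.c:

	static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
	{
		unsigned long gap = rlim_stack->rlim_cur;
		unsigned long pad = stack_guard_gap;

		/* Account for stack randomization if necessary */
		if (current->flags & PF_RANDOMIZE)
			pad += (STACK_RND_MASK << PAGE_SHIFT);

		/* Values close to RLIM_INFINITY can overflow. */
		if (gap + pad > gap)
			gap += pad;

		/* Only enforce MIN_GAP when it can actually fit below MAX_GAP;
		 * on small task sizes (e.g. 26-bit arm) honour MAX_GAP instead. */
		if (gap < MIN_GAP && MIN_GAP < MAX_GAP)
			gap = MIN_GAP;
		else if (gap > MAX_GAP)
			gap = MAX_GAP;

		return PAGE_ALIGN(STACK_TOP - gap - rnd);
	}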

Link: https://lkml.kernel.org/r/20240803074642.1849623-2-davidgow@google.com
Fixes: dba79c3df4 ("arm: use generic mmap top-down layout and brk randomization")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04 16:30:02 +02:00
Kairui Song 722e9e5acc mm/filemap: optimize filemap folio adding
commit 6758c1128ceb45d1a35298912b974eb4895b7dd9 upstream.

Instead of doing multiple tree walks, do one optimistic range check with
lock hold, and exit if raced with another insertion.  If a shadow exists,
check it with a new xas_get_order helper before releasing the lock to
avoid redundant tree walks for getting its order.

Drop the lock and do the allocation only if a split is needed.

In the best case, it only needs to walk the tree once.  If it needs to
alloc and split, 3 walks are issued (One for first ranged conflict check
and order retrieving, one for the second check after allocation, one for
the insert after split).

Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap --rw=randread --time_based \
    --ramp_time=30s --runtime=5m --group_reporting

Before:
bw (  MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
iops        : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691

After (+7.3%):
bw (  MiB/s): min=  493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
iops        : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651

Test result with THP (do a THP randread then switch to 4K page in hope it
issues a lot of splitting):

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
      --buffered=1 --ioengine=mmap -thp=1 --readonly \
      --rw=randread --time_based --ramp_time=30s --runtime=10m \
      --group_reporting

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
      --buffered=1 --ioengine=mmap \
      --rw=randread --time_based --runtime=5s --group_reporting

Before:
bw (  KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
iops        : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976

READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec

After (+12.5%):
bw (  KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
iops        : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146

READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec

The performance is better for both 4K (+7.5%) and THP (+12.5%) cached read.

Link: https://lkml.kernel.org/r/20240415171857.19244-5-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Closes: https://lore.kernel.org/linux-mm/A5A976CB-DB57-4513-A700-656580488AB6@flyingcircus.io/
[ kasong@tencent.com: minor adjustment of variable declarations ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04 16:30:02 +02:00
Kairui Song ff3c557fa9 mm/filemap: return early if failed to allocate memory for split
commit de60fd8ddeda2b41fbe11df11733838c5f684616 upstream.

xas_split_alloc() could fail with -ENOMEM, and in such a case it should
abort early instead of continuing and failing the xas_split() below.
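
A minimal sketch of the early bail-out, assuming the split pre-allocation
step in __filemap_add_folio() in mm/filemap.c; the error label is the
function's existing unwind path:

	if (order > folio_order(folio)) {
		xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
				order, gfp);
		/* -ENOMEM from the pre-allocation: abort now instead of
		 * proceeding and failing at the later xas_split(). */
		if (xas_error(&xas))
			goto error;
	}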

Link: https://lkml.kernel.org/r/20240416071722.45997-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20240415171857.19244-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20240415171857.19244-2-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 6758c1128ceb ("mm/filemap: optimize filemap folio adding")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04 16:30:02 +02:00
Shu Han 49d3a4ad57 mm: call the security_mmap_file() LSM hook in remap_file_pages()
commit ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 upstream.

The remap_file_pages syscall handler calls do_mmap() directly, which
doesn't contain the LSM security check. And if the process has called
personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for
RW pages, this will actually result in remapping the pages to RWX,
bypassing a W^X policy enforced by SELinux.

So we should check prot by security_mmap_file LSM hook in the
remap_file_pages syscall handler before do_mmap() is called. Otherwise, it
potentially permits an attacker to bypass a W^X policy enforced by
SELinux.

The bypass is similar to CVE-2016-10044, which bypassed the same protection via
AIO and can be found in [1].

The PoC:

$ cat > test.c

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/personality.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
	size_t pagesz = sysconf(_SC_PAGE_SIZE);
	int mfd = syscall(SYS_memfd_create, "test", 0);
	const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE,
		MAP_SHARED, mfd, 0);
	unsigned int old = syscall(SYS_personality, 0xffffffff);
	syscall(SYS_personality, READ_IMPLIES_EXEC | old);
	syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0);
	syscall(SYS_personality, old);
	// show the RWX page exists even if W^X policy is enforced
	int fd = open("/proc/self/maps", O_RDONLY);
	unsigned char buf2[1024];
	while (1) {
		int ret = read(fd, buf2, 1024);
		if (ret <= 0) break;
		write(1, buf2, ret);
	}
	close(fd);
}

$ gcc test.c -o test
$ ./test | grep rwx
7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted)
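
A minimal sketch of the added check, assuming the remap_file_pages()
syscall handler in mm/mmap.c; the error label is illustrative:

	/* before the existing do_mmap() call that redoes the mapping */
	ret = security_mmap_file(vma->vm_file, prot, flags);
	if (ret)
		goto out_fput;	/* drop the file reference and return the LSM error */

	/* the existing do_mmap(vma->vm_file, start, size, prot, flags, ...) follows */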

Link: https://project-zero.issues.chromium.org/issues/42452389 [1]
Cc: stable@vger.kernel.org
Signed-off-by: Shu Han <ebpqwerty472123@gmail.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04 16:29:43 +02:00
Jianping Liu 6c7ba3824d Merge linux 6.6.52 2024-09-29 19:06:44 +08:00
Jianping Liu bfe2e2a287 Merge linux 6.6.51 2024-09-29 19:06:17 +08:00
Jianping Liu 962cf19785 Merge linux 6.6.49 2024-09-29 16:02:40 +08:00
Jianping Liu 60113f62b0 Merge linux 6.6.48
Conflicts:
	fs/nfsd/nfssvc.c

Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
2024-09-29 16:01:50 +08:00
frankjpliu 90f53db9f8 Merge branch 'haisu/master-tryrue-20240905-merge-honglin' into 'master' (merge request !197)
rue: Port MM/IO/NET functions for tkernel5

This MR including RUE MM/IO/NET functions for tkernel5.
Could be built as module or built-in.
The corresponding module code existed in [1].

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-29 01:56:56 +00:00
Haisu Wang f630af7168 rue/io: buffered_write_bps hierarchy support
Support the hierarchical setting of buffered_write_bps.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang fed4a7c8be rue/io: add io_cgv1_buff_wb to enable buffer IO counting in cgroup v1
Add a sysctl switch to control buffer IO counting in the
memcg of cgroup v1. If this switch is turned on, removing a
memory cgroup may leave zombie slabs until writeback is finished.

Both io_qos and io_cgv1_buff_wb need to be turned on in cgroup v1.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: Bin Lai <robinlai@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang 826a0366a1 rue/io: introduce per mem_cgroup sync interface
Introduce the per-cgroup sync interface, so that we can ensure
that the dirty pages of the cgroup are actually written to the
disk without considering the dirty pages generated elsewhere.
This avoids the large cgroup exit delay caused by a system-level
sync and avoids IO jitter.

Note:
struct wb_writeback_work moved from fs/fs-writeback.c to
include/linux/writeback.h

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang a12bb1a43d rue/io: add bufio isolation based for cgroup v1
Add buffer IO isolation (bind_blkio) to v1 based on the v2 infrastructure,
so we can unify the interface for dio and bufio.

Add a sysctl switch to allow migrating an already bound cgroup.

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Lenny Chen <lennychen@tencent.com>
2024-09-28 15:42:22 +08:00
Haisu Wang 1860b51781 rue/io: add buffer IO writeback throtl for cgroup v1
Add buffer IO throttling for cgroup v1 based on dirty throttling.
Since the actual IO speed is not considered, this solution
may cause continuous accumulation of dirty pages in IO
performance bottleneck scenarios, which will degrade the
isolation effect.

Note:
struct blkcg moved from block/blk-cgroup.h to
include/linux/blk-cgroup.h

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Lenny Chen <lennychen@tencent.com>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-28 15:41:58 +08:00
Haojie Ning a1574c433d rue/mm: add sysctl_vm_use_priority_oom to enable priority oom for all cgroups
Add sysctl_vm_use_priority_oom as a global setting to enable the
priority_oom setting for all cgroups without the need to manually
set it for each cgroup. This global setting has no effect when it
is turned off.

Signed-off-by: Haojie Ning <paulning@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:32 +08:00
Honglin Li 7c45f9b01f rue/mm: compatible with mglru for pagecache limit
The system-level and per-cgroup pagecache limits will
cause processes to get stuck when mglru is enabled.
Use lru_gen_enabled() to check whether mglru is
enabled in the system.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
2024-09-27 11:13:32 +08:00
Xin Hao 4e6f350b03 rue/mm: fix file page_counter 'memcg->pagecache' error when THP enabled
When the CONFIG_MEM_QPS feature is enabled, the __mod_lruvec_state function is
called to increase the per-memcg page_counter 'pagecache' value for 'NR_FILE_PAGES'.
This is not a problem if THP is disabled, but if THP is enabled, the CONFIG_MEM_QPS
feature forgets to increase the page_counter 'pagecache' value, because THP pagecache
is accounted as 'NR_FILE_THPS'. The page_counter 'pagecache' value then becomes
negative when these THP pagecache pages are released, which results in the following
warning:

[55530.397796] ------------[ cut here ]------------
[55530.398854] page_counter underflow: -512 nr_pages=512
[55530.399864] WARNING: CPU: 1 PID: 3026157 at mm/page_counter.c:63 page_counter_cancel+0x55/0x60
[55530.412193] CPU: 1 PID: 3026157 Comm: bash Kdump: loaded Tainted: G
[55530.416075] RIP: 0010:page_counter_cancel+0x55/0x60
[55530.421353] RAX: 0000000000000000 RBX: ffff8888161a8270 RCX: 0000000000000006
[55530.422680] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff88881f85bb60
[55530.424008] RBP: ffffc90004ceba58 R08: 0000000000009617 R09: ffff88881584c820
[55530.425330] R10: 0000000000000000 R11: ffffffffa00d60b0 R12: 0000000000000200
[55530.426663] R13: ffff8888194f7000 R14: 0000000000000000 R15: 0000000000000000
[55530.427999] FS:  00007fe2932d1740(0000) GS:ffff88881f840000(0000) knlGS:0000000000000000
[55530.429447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55530.430645] CR2: 00007f97c4e00000 CR3: 00000007e7256004 CR4: 00000000003706e0
[55530.432007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55530.433360] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[55530.434711] Call Trace:
[55530.435541]  page_counter_uncharge+0x22/0x40
[55530.436571]  __mod_memcg_state.part.80+0x79/0xe0
[55530.437645]  __mod_memcg_lruvec_state+0x27/0x110
[55530.438712]  __mod_lruvec_state+0x39/0x40
[55530.439712]  unaccount_page_cache_page+0xd0/0x210
[55530.440803]  __delete_from_page_cache+0x3d/0x1d0
[55530.441877]  __remove_mapping+0xeb/0x220
[55530.442871]  remove_mapping+0x16/0x30
[55530.443836]  invalidate_inode_page+0x84/0x90
[55530.444869]  invalidate_mapping_pages+0x162/0x3e0
[55530.445957]  ? pick_next_task_fair+0x1f2/0x520
[55530.446996]  drop_pagecache_sb+0xac/0x130
[55530.447972]  iterate_supers+0xa2/0x110
[55530.448907]  ? do_coredump+0xb20/0xb20
[55530.449840]  drop_caches_sysctl_handler+0x5d/0x90
[55530.450893]  proc_sys_call_handler+0x1d0/0x290
[55530.451906]  proc_sys_write+0x14/0x20
[55530.452830]  __vfs_write+0x1b/0x40
[55530.453722]  vfs_write+0xab/0x1b0
[55530.454598]  ksys_write+0x61/0xe0
[55530.455471]  __x64_sys_write+0x1a/0x20
[55530.456392]  do_syscall_64+0x4d/0x120
[55530.457296]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[55530.458346] RIP: 0033:0x7fe292836bc8

Fixes: a0d7d9851512 ("rue/mm: pagecache limit per cgroup support")
Signed-off-by: Xin Hao <vernhao@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li b82ababba6 rue/mm: introduce new feature to async clean dying memcgs
When a memcg is removed, page caches and slab pages still
reference this memcg, which can cause a very large number
of dying memcgs in the system. This feature can asynchronously
clean dying memcgs in the system.

1) sysctl -w vm.clean_dying_memcg_async=1
   #start a kthread to async clean dying memcgs, default
   #value is 0.

2) sysctl -w vm.clean_dying_memcg_threshold=10
   #Whenever 10 dying memcgs are generated in the system,
   #wakeup a kthread to async clean dying memcgs, default
   #value is 100.

Signed-off-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 200560da23 rue/mm: introduce memcg page cache hit & miss ratio tool
A new memory.page_cache_hit control file is added
under each memory cgroup directory. Reading this file
prints the page cache hit and miss ratio at the memory
cgroup level.

Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 8de07be077 rue/mm: introduce memory allocation latency for per-cgroup tool
A new memory.latency_histogram control file is added
under each memory cgroup directory. Reading this file
prints the memory access latency at the memory cgroup level.

Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 75ad2bae3d rue/mm: pagecache limit per cgroup support
Functional test:
http://tapd.oa.com/TencentOS_QoS/prong/stories/view/
1020426664867405667?jump_count=1

Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Jingxiang Zeng <linuszeng@tencent.com>
Signed-off-by: Xuan Liu <benxliu@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 56d80c4ea2 rue/mm: add memory cgroup async page reclaim mechanism
Introduce a background page reclaim mechanism for memcg; it can
be configured according to the cgroup priorities for different
reclaim strategies.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
Signed-off-by: Mengmeng Chen <bauerchen@tencent.com>
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li 0d35c4c639 rue/mm: introduce memcg priority oom
Under memory pressure, reclaim and OOM can happen. With
multiple cgroups in one system, we might want some of their
memory or tasks to survive the reclaim and OOM while there
are other candidates.

When OOM happens, the victim is always chosen from a
low-priority memcg. This works for both memcg OOM and
global OOM; it can be enabled/disabled through
@memory.use_priority_oom (for global OOM through the root
memcg's @memory.use_priority_oom) and is disabled by default.

Signed-off-by: Haiwei Li <gerryhwli@tencent.com>
Signed-off-by: Mengmeng Chen <bauerchen@tencent.com>
Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:31 +08:00
Honglin Li db44c11cdd rue/mm: add priority reclaim support
Introduce the sync && async priority reclaim mechanism.

Signed-off-by: Yu Liu <allanyuliu@tencent.com>
Signed-off-by: Xiaoguang Chen <xiaoggchen@tencent.com>
Signed-off-by: Honglin Li <honglinli@tencent.com>
2024-09-27 11:13:30 +08:00
Honglin Li 04f49a445c pagecachelimit: set an initial value for may_deactivate in shrink page cache
The global pagecache limit function fails due to the backport of
an upstream commit. In the scenario where the active file list
needs to be reclaimed, it cannot reclaim the LRU_ACTIVE_FILE
list, making the pagecache limit inaccurate.

When shrinking page cache, we set an initial value for
may_deactivate in scan_control to DEACTIVATE_FILE, allowing
the active file list to be scanned in shrink_list.

Signed-off-by: Honglin Li <honglinli@tencent.com>
Reviewed-by: Hongbo Li <herberthbli@tencent.com>
2024-09-27 11:13:30 +08:00
Haisu Wang 2fc4b0e9c0 mm: set default watermark_boost_factor value to 0
Upstream: no

Watermark boost factor controls the level of reclaim when memory is
being fragmented. The intent is that compaction has less work to do in the
future and to increase the success rate of future high-order allocations
such as SLUB allocations, THP and hugetlbfs pages.
However, it wakes up kswapd to do defragmentation, which caused
performance jitter in many cases without enough gain.

Some distributions, such as Debian, also set the default boost
factor to 0 to disable the feature.

WXG story of compaction causing performance jitter:
https://doc.weixin.qq.com/doc/w3_AIAAcwacAAYudo6ERcUQMiNUbmvzb?scode=AJEAIQdfAAoeO7AbqSAYQATQaYAJg
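
A minimal sketch of the default change, assuming the tunable definition in
mm/page_alloc.c; the runtime equivalent is the vm.watermark_boost_factor sysctl:

	/* was 15000 upstream; 0 disables the boosted-reclaim kswapd wakeups */
	int watermark_boost_factor __read_mostly;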

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Zeng Jingxiang <linuszeng@tencent.com>
Reviewed-by:  Jianping Liu <frankjpliu@tencent.com>
2024-09-27 11:13:28 +08:00
Haisu Wang b03afc0d33 Revert "io/tqos: merge buffer io limit series patch from brookxu, and rework some function."
This reverts commit 538ec11bed.

Revert due to refactoring of the buffer IO function.
In TK5, it is unnecessary to keep kabi compatibility by using
the "nodeinfo" in "struct mem_cgroup {}".

Original tapd and MR:
  https://tapd.woa.com/tapd_fe/20422414/story/detail/1020422414117471502
  https://git.woa.com/tlinux/tkernel5/-/merge_requests/117

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:24 +08:00
Haisu Wang 3231efb956 Revert "io/tqos: add sysctl_buffer_io_limit switch for buffer io limit."
This reverts commit 4d87de6bb4.

Revert due to refactoring of the buffer IO function.
In TK5, it is unnecessary to keep kabi compatibility by using
the "nodeinfo" in "struct mem_cgroup {}".

Original tapd and MR:
  https://tapd.woa.com/tapd_fe/20422414/story/detail/1020422414117471502
  https://git.woa.com/tlinux/tkernel5/-/merge_requests/117

Signed-off-by: Haisu Wang <haisuwang@tencent.com>
2024-09-27 11:13:21 +08:00
Linus Torvalds a95a24fcae mm: avoid leaving partial pfn mappings around in error case
commit 79a61cc3fc0466ad2b7b89618a6157785f0293b3 upstream.

As Jann points out, PFN mappings are special, because unlike normal
memory mappings, there is no lifetime information associated with the
mapping - it is just a raw mapping of PFNs with no reference counting of
a 'struct page'.

That's all very much intentional, but it does mean that it's easy to
mess up the cleanup in case of errors.  Yes, a failed mmap() will always
eventually clean up any partial mappings, but without any explicit
lifetime in the page table mapping itself, it's very easy to do the
error handling in the wrong order.

In particular, it's easy to mistakenly free the physical backing store
before the page tables are actually cleaned up and (temporarily) have
stale dangling PTE entries.

To make this situation less error-prone, just make sure that any partial
pfn mapping is torn down early, before any other error handling.
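
A minimal sketch of the early teardown, assuming remap_pfn_range_notrack()
in mm/memory.c wraps the existing mapping loop (here called
remap_pfn_range_internal() for illustration):

	int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
			unsigned long pfn, unsigned long size, pgprot_t prot)
	{
		int error = remap_pfn_range_internal(vma, addr, pfn, size, prot);

		if (!error)
			return 0;

		/* A partial pfn mapping carries no lifetime information, so tear
		 * it down right away, before callers free the backing store. */
		zap_page_range_single(vma, addr, size, NULL);
		return error;
	}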

Reported-and-tested-by: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-18 19:24:07 +02:00
Kairui Song 51254ba2b2 emm: fix cgroup initialization check
Should check for cgroup switch instead of root cgroup status.

Signed-off-by: Kairui Song <kasong@tencent.com>
2024-09-13 17:55:43 +08:00
Usama Arif 0eceaa9d05 Revert "mm: skip CMA pages when they are not available"
[ Upstream commit bfe0857c20c663fcc1592fa4e3a61ca12b07dac9 ]

This reverts commit 5da226dbfc ("mm: skip CMA pages when they are not
available") and b7108d6631 ("Multi-gen LRU: skip CMA pages when they are
not eligible").

lruvec->lru_lock is highly contended and is held when calling
isolate_lru_folios.  If the lru has a large number of CMA folios
consecutively, while the allocation type requested is not MIGRATE_MOVABLE,
isolate_lru_folios can hold the lock for a very long time while it skips
those.  For FIO workload, ~150million order=0 folios were skipped to
isolate a few ZONE_DMA folios [1].  This can cause lockups [1] and high
memory pressure for extended periods of time [2].

Remove skipping CMA for MGLRU as well, as it was introduced in sort_folio
for the same reason as 5da226dbfc.

[1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@mail.gmail.com/
[2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@gmail.com/

[usamaarif642@gmail.com: also revert b7108d6631, per Johannes]
  Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
  Link: https://lkml.kernel.org/r/357ac325-4c61-497a-92a3-bdbd230d5ec9@gmail.com
Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
Fixes: 5da226dbfc ("mm: skip CMA pages when they are not available")
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zhaoyang Huang <huangzhaoyang@gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12 11:11:42 +02:00
Vern Hao 9a9974713d mm/vmscan: use folio_migratetype() instead of get_pageblock_migratetype()
[ Upstream commit 97144ce008f918249fa7275ee1d29f6f27665c34 ]

In skip_cma(), we can use folio_migratetype() to replace
get_pageblock_migratetype().

Link: https://lkml.kernel.org/r/20230825075735.52436-1-user@VERNHAO-MC1
Signed-off-by: Vern Hao <vernhao@tencent.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: bfe0857c20c6 ("Revert "mm: skip CMA pages when they are not available"")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12 11:11:42 +02:00