OpenCloudOS-Kernel/mm
Yu Zhao 3e1e476361 mm/mglru: fix overshooting shrinker memory
commit 3f74e6bd3b84a8b6bb3cc51609c89e5b9d58eed7 upstream.

set_initial_priority() tries to jump-start global reclaim by estimating
the priority based on cold/hot LRU pages.  The estimation does not account
for shrinker objects, and it cannot do so because their sizes can be in
different units other than page.

If shrinker objects are the majority, e.g., on TrueNAS SCALE 24.04.0 where
ZFS ARC can use almost all system memory, set_initial_priority() can
vastly underestimate how much memory ARC shrinker can evict and assign
extreme low values to scan_control->priority, resulting in overshoots of
shrinker objects.

To reproduce the problem, using TrueNAS SCALE 24.04.0 with 32GB DRAM, a
test ZFS pool and the following commands:

  fio --name=mglru.file --numjobs=36 --ioengine=io_uring \
      --directory=/root/test-zfs-pool/ --size=1024m --buffered=1 \
      --rw=randread --random_distribution=random \
      --time_based --runtime=1h &

  for ((i = 0; i < 20; i++))
  do
    sleep 120
    fio --name=mglru.anon --numjobs=16 --ioengine=mmap \
      --filename=/dev/zero --size=1024m --fadvise_hint=0 \
      --rw=randrw --random_distribution=random \
      --time_based --runtime=1m
  done

To fix the problem:
1. Cap scan_control->priority at or above DEF_PRIORITY/2, to prevent
   the jump-start from being overly aggressive.
2. Account for the progress from mm_account_reclaimed_pages(), to
   prevent kswapd_shrink_node() from raising the priority
   unnecessarily.

Link: https://lkml.kernel.org/r/20240711191957.939105-2-yuzhao@google.com
Fixes: e4dde56cd2 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Alexander Motin <mav@ixsystems.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03 08:54:12 +02:00
..
damon mm/damon/core: merge regions aggressively when max_nr_regions is unmet 2024-07-18 13:21:24 +02:00
kasan kasan/test: avoid gcc warning for intentional overflow 2024-04-03 15:28:20 +02:00
kfence LoongArch changes for v6.6 2023-09-08 12:16:52 -07:00
kmsan kmsan: do not wipe out origin when doing partial unpoisoning 2024-06-16 13:47:41 +02:00
Kconfig - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-05-29 16:14:28 +01:00
Makefile - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
backing-dev.c blk-wbt: Fix detection of dirty-throttled tasks 2024-02-23 09:25:16 +01:00
balloon_compaction.c
bootmem_info.c
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-06-16 13:47:42 +02:00
cma.h
cma_debug.c
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
compaction.c mm, treewide: introduce NR_PAGE_ORDERS 2024-05-02 16:32:41 +02:00
debug.c mm: update validate_mm() to use vma iterator 2023-06-09 16:25:31 -07:00
debug_page_alloc.c mm: page_alloc: split out DEBUG_PAGEALLOC 2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: fix BUG_ON with pud advanced test 2024-03-06 14:48:41 +00:00
dmapool.c dmapool: create/destroy cleanup 2023-06-09 16:25:17 -07:00
dmapool_test.c dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c mm: page_alloc: split out FAIL_PAGE_ALLOC 2023-06-09 16:25:23 -07:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c mm: page_ref: remove folio_try_get_rcu() 2024-07-25 09:50:56 +02:00
folio-compat.c filemap: Add fgf_t typedef 2023-07-24 18:04:30 -04:00
gup.c mm: page_ref: remove folio_try_get_rcu() 2024-07-25 09:50:56 +02:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
highmem.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:47:40 +02:00
hugetlb.c mm/hugetlb: fix possible recursive locking detected warning 2024-08-03 08:54:11 +02:00
hugetlb_cgroup.c mm/hugetlb: increase use of folios in alloc_huge_page() 2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-08-18 10:12:14 -07:00
hugetlb_vmemmap.h
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: move dummy_vm_ops out of a header 2023-08-21 13:37:46 -07:00
internal.h mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly 2024-05-02 16:32:40 +02:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
khugepaged.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
kmemleak.c mm/kmemleak: move up cond_resched() call in page scanning loop 2023-09-02 15:17:34 -07:00
ksm.c mm/ksm: fix ksm_zero_pages accounting 2024-06-16 13:47:41 +02:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
madvise.c mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly 2024-05-02 16:32:40 +02:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c x86/numa: Fix the address overlap check in numa_fill_memblks() 2024-03-01 13:35:06 +01:00
memcontrol.c mm: memcontrol: clarify swapaccount=0 deprecation warning 2024-03-01 13:35:00 +01:00
memfd.c revert "memfd: improve userspace warnings for missing exec-related flags". 2023-09-05 11:11:52 -07:00
memory-failure.c mm/huge_memory: don't unpoison huge_zero_folio 2024-06-21 14:38:46 +02:00
memory-tiers.c memory tier: use helper macro __ATTR_RW() 2023-08-18 10:12:38 -07:00
memory.c x86/mm/pat: fix VM_PAT handling in COW mappings 2024-04-10 16:36:03 +02:00
memory_hotplug.c mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval 2024-01-20 11:51:49 +01:00
mempolicy.c numa: Generalize numa_map_to_online_node() 2023-11-20 11:58:51 +01:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-04-03 15:28:33 +02:00
migrate.c mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index 2024-03-15 10:48:14 -04:00
migrate_device.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
mm_init.c efi: disable mirror feature during crashkernel 2024-01-31 16:18:56 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap.c mm, mmap: fix vma_merge() case 7 with vma_ops->close 2024-04-03 15:28:40 +02:00
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-08-03 08:54:12 +02:00
mmu_gather.c mm: fix kernel-doc warning from tlb_flush_rmaps() 2023-08-24 16:20:30 -07:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c
mprotect.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
mremap.c vm: fix move_vma() memory accounting being off 2023-09-16 15:23:31 -07:00
msync.c
nommu.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
oom_kill.c mm: remove redundant K() macro definition 2023-08-21 13:37:44 -07:00
page-writeback.c Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again" 2024-07-11 12:49:16 +02:00
page_alloc.c mm/page_alloc: Separate THP PCP into movable and non-movable categories 2024-07-05 09:34:05 +02:00
page_counter.c
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c zswap: make zswap_load() take a folio 2023-08-21 13:37:27 -07:00
page_isolation.c mm/hugetlb: get rid of page_hstate() 2023-08-18 10:12:39 -07:00
page_owner.c mm/page_ext: use page_ext_data helper in page_owner 2023-08-21 13:37:27 -07:00
page_poison.c mm/page_poison: remove unused page_ext.h from page_poison 2023-08-21 13:37:30 -07:00
page_reporting.c mm, treewide: introduce NR_PAGE_ORDERS 2024-05-02 16:32:41 +02:00
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-27 13:49:13 +02:00
page_vma_mapped.c mm: correct stale comment of function check_pte 2023-08-18 10:12:13 -07:00
pagewalk.c mm/pagewalk: fix bootstopping regression from extra pte_unmap() 2023-09-02 08:39:21 -07:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: Introduce flush_cache_vmap_early() 2024-02-16 19:10:52 +01:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:47:40 +02:00
process_vm_access.c mm/gup: remove unused vmas parameter from pin_user_pages_remote() 2023-06-09 16:25:25 -07:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c mm: use memalloc_nofs_save() in page_cache_ra_order() 2024-05-17 12:02:36 +02:00
rmap.c mm: hugetlb: add huge page size param to set_huge_pte_at() 2023-09-29 17:20:47 -07:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem.c mm/shmem: disable PMD-sized page cache if needed 2024-07-18 13:21:24 +02:00
shmem_quota.c tmpfs: fix race on handling dquot rbtree 2024-04-03 15:28:54 +02:00
show_mem.c mm, treewide: introduce NR_PAGE_ORDERS 2024-05-02 16:32:41 +02:00
shrinker_debug.c Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless" 2023-06-19 13:19:34 -07:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
slab.c Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slab.h Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slab_common.c mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign() 2023-10-11 15:24:49 +02:00
slub.c mm/slub: remove freelist_dereference() 2023-07-14 09:57:21 +02:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2024-01-31 16:18:56 -08:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h mm/swap: fix race when skipping swapcache 2024-03-01 13:35:00 +01:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
swapfile.c mm: swap: fix race between free_swap_and_cache() and swapoff() 2024-04-03 15:28:27 +02:00
truncate.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c mm/userfaultfd: Do not place zeropages when zeropages are disallowed 2024-06-12 11:11:33 +02:00
util.c parisc: fix mmap_base calculation when stack grows upwards 2023-11-28 17:20:08 +00:00
vmalloc.c mm: vmalloc: check if a hash-index is in cpu_possible_mask 2024-07-18 13:21:20 +02:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-08-16 12:21:32 +01:00
vmscan.c mm/mglru: fix overshooting shrinker memory 2024-08-03 08:54:12 +02:00
vmstat.c mm, treewide: introduce NR_PAGE_ORDERS 2024-05-02 16:32:41 +02:00
workingset.c mm: ratelimit stat flush from workingset shrinker 2024-06-16 13:47:31 +02:00
z3fold.c mm/z3fold: remove obsolete comment for struct z3fold_pool 2023-08-21 13:37:51 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
zswap.c mm: zswap: fix missing folio cleanup in writeback race path 2024-03-01 13:35:10 +01:00