OpenCloudOS-Kernel/mm
Mike Kravetz 7118fc2906 hugetlb: address ref count racing in prep_compound_gigantic_page
In [1], Jann Horn points out a possible race between
prep_compound_gigantic_page and __page_cache_add_speculative.  The root
cause of the possible race is prep_compound_gigantic_page uncondittionally
setting the ref count of pages to zero.  It does this because
prep_compound_gigantic_page is handed a 'group' of pages from an allocator
and needs to convert that group of pages to a compound page.  The ref
count of each page in this 'group' is one as set by the allocator.
However, the ref count of compound page tail pages must be zero.

The potential race comes about when ref counted pages are returned from
the allocator.  When this happens, other mm code could also take a
reference on the page.  __page_cache_add_speculative is one such example.
Therefore, prep_compound_gigantic_page can not just set the ref count of
pages to zero as it does today.  Doing so would lose the reference taken
by any other code.  This would lead to BUGs in code checking ref counts
and could possibly even lead to memory corruption.

There are two possible ways to address this issue.

1) Make all allocators of gigantic groups of pages be able to return a
   properly constructed compound page.

2) Make prep_compound_gigantic_page be more careful when constructing a
   compound page.

This patch takes approach 2.

In prep_compound_gigantic_page, use cmpxchg to only set ref count to zero
if it is one.  If the cmpxchg fails, call synchronize_rcu() in the hope
that the extra ref count will be driopped during a rcu grace period.  This
is not a performance critical code path and the wait should be
accceptable.  If the ref count is still inflated after the grace period,
then undo any modifications made and return an error.

Currently prep_compound_gigantic_page is type void and does not return
errors.  Modify the two callers to check for and handle error returns.  On
error, the caller must free the 'group' of pages as they can not be used
to form a gigantic page.  After freeing pages, the runtime caller
(alloc_fresh_huge_page) will retry the allocation once.  Boot time
allocations can not be retried.

The routine prep_compound_page also unconditionally sets the ref count of
compound page tail pages to zero.  However, in this case the buddy
allocator is constructing a compound page from freshly allocated pages.
The ref count on those freshly allocated pages is already zero, so the
set_page_count(p, 0) is unnecessary and could lead to confusion.  Just
remove it.

[1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/

Link: https://lkml.kernel.org/r/20210622021423.154662-3-mike.kravetz@oracle.com
Fixes: 58a84aa927 ("thp: set compound tail page _count to zero")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Youquan Song <youquan.song@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:27 -07:00
..
kasan kasan: add memory corruption identification support for hardware tag-based mode 2021-06-29 10:53:53 -07:00
kfence mm, slub: change run-time assertion in kmalloc_index() to compile-time 2021-06-29 10:53:46 -07:00
Kconfig mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM 2021-06-29 10:53:55 -07:00
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
Makefile mm: hugetlb: free the vmemmap pages associated with each HugeTLB page 2021-06-30 20:47:25 -07:00
backing-dev.c writeback, cgroup: release dying cgwbs by switching attached inodes 2021-06-29 10:53:48 -07:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
bootmem_info.c mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c 2021-06-30 20:47:25 -07:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma.c mm: use proper type for cma_[alloc|release] 2021-05-05 11:27:24 -07:00
cma.h mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
compaction.c mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec 2021-06-29 10:53:50 -07:00
debug.c mm/debug: factor PagePoisoned out of __dump_page 2021-06-29 10:53:53 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove redundant pfn_{pmd/pte}() and fix one comment mistake 2021-06-30 20:47:26 -07:00
dmapool.c mm/dmapool: use DEVICE_ATTR_RO macro 2021-06-29 10:53:52 -07:00
early_ioremap.c mm/early_ioremap.c: use __func__ instead of function name 2021-02-26 09:41:02 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm: charge active memcg when no mm is set 2021-06-29 10:53:50 -07:00
frontswap.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
gup.c mm: gup: pack has_pinned in MMF_HAS_PINNED 2021-06-29 10:53:48 -07:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
highmem.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm/huge_memory.c: don't discard hugepage if other processes are mapping it 2021-06-30 20:47:26 -07:00
hugetlb.c hugetlb: address ref count racing in prep_compound_gigantic_page 2021-06-30 20:47:27 -07:00
hugetlb_cgroup.c hugetlb: make free_huge_page irq safe 2021-05-05 11:27:22 -07:00
hugetlb_vmemmap.c mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mm/page_alloc: allow high-order pages to be stored on the per-cpu lists 2021-06-29 10:53:55 -07:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm/ioremap: fix iomap_max_page_shift 2021-05-14 19:41:32 -07:00
khugepaged.c mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled() 2021-06-30 20:47:26 -07:00
kmemleak.c mm/kmemleak: fix possible wrong memory scanning period 2021-06-29 10:53:47 -07:00
ksm.c mm/ksm: use vma_lookup() in find_mergeable_vma() 2021-06-29 10:53:52 -07:00
list_lru.c mm: vmscan: consolidate shrinker_maps handling code 2021-05-05 11:27:23 -07:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: guard hugepage pud's usage 2021-04-16 16:10:37 -07:00
memblock.c mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA 2021-06-29 10:53:55 -07:00
memcontrol.c loop: charge i/o to mem and blk cg 2021-06-29 10:53:50 -07:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory-failure.c mm,hwpoison: make get_hwpoison_page() call get_any_page() 2021-06-29 10:53:56 -07:00
memory.c mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA 2021-06-29 10:53:55 -07:00
memory_hotplug.c mm: sparsemem: use huge PMD mapping for vmemmap pages 2021-06-30 20:47:26 -07:00
mempolicy.c mm/vmstat: convert NUMA statistics to basic NUMA counters 2021-06-29 10:53:54 -07:00
mempool.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
memremap.c mm/memremap.c: fix improper SPDX comment style 2021-04-30 11:20:37 -07:00
memtest.c
migrate.c mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY 2021-06-30 20:47:26 -07:00
mincore.c inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
mlock.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap.c mm/mmap: use find_vma_intersection() in do_mmap() for overlap 2021-06-29 10:53:51 -07:00
mmap_lock.c mm: mmap_lock: use local locks instead of disabling preemption 2021-06-29 10:53:47 -07:00
mmu_gather.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
mprotect.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
mremap.c mm/mremap: use vma_lookup() in vma_to_resize() 2021-06-29 10:53:52 -07:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c mm: ignore MAP_EXECUTABLE in ksys_mmap_pgoff() 2021-06-29 10:53:50 -07:00
oom_kill.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
page-writeback.c fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
page_alloc.c hugetlb: address ref count racing in prep_compound_gigantic_page 2021-06-30 20:47:27 -07:00
page_counter.c mm: page_counter: mitigate consequences of a page_counter underflow 2021-04-30 11:20:38 -07:00
page_ext.c mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM 2021-06-29 10:53:55 -07:00
page_idle.c mm: page_idle_get_page() does not need lru_lock 2020-12-15 14:48:03 -08:00
page_io.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
page_isolation.c mm/page_isolation: do not isolate the max order page 2020-12-15 12:13:45 -08:00
page_owner.c mm/page_owner: constify dump_page_owner 2021-06-29 10:53:53 -07:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-24 19:40:53 -07:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-09 13:58:38 +00:00
percpu-vm.c mm/vmalloc: remove unmap_kernel_range 2021-04-30 11:20:40 -07:00
percpu.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-16 09:24:42 -07:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: ptdump: fix build failure 2021-04-16 16:10:37 -07:00
readahead.c mm: Implement readahead_control pageset expansion 2021-04-23 10:14:29 +01:00
rmap.c mm/thp: fix page_address_in_vma() on file THP tails 2021-06-16 09:24:42 -07:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled() 2021-06-30 20:47:26 -07:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
slab.h mm: memcg/slab: properly set up gfp flags for objcg pointer array 2021-06-29 10:53:49 -07:00
slab_common.c mm: memcg/slab: disable cache merging for KMALLOC_NORMAL caches 2021-06-29 10:53:49 -07:00
slob.c mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels 2021-03-08 14:18:46 -08:00
slub.c mm/slub: add taint after the errors are printed 2021-06-29 10:53:47 -07:00
sparse-vmemmap.c mm: sparsemem: split the huge PMD mapping of vmemmap pages 2021-06-30 20:47:26 -07:00
sparse.c mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c 2021-06-30 20:47:25 -07:00
swap.c mm/page_alloc: allow high-order pages to be stored on the per-cpu lists 2021-06-29 10:53:55 -07:00
swap_cgroup.c mm: memcontrol: make swap tracking an integral part of memory control 2020-06-03 20:09:48 -07:00
swap_slots.c mm/swap_slots.c: delete meaningless forward declarations 2021-06-29 10:53:49 -07:00
swap_state.c swap: check mapping_empty() for swap cache before being freed 2021-06-29 10:53:49 -07:00
swapfile.c mm, swap: remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-06-29 10:53:49 -07:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-16 09:24:42 -07:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY 2021-06-30 20:47:26 -07:00
util.c mm/util.c: fix typo 2021-05-05 11:27:25 -07:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c mm/vmalloc: enable mapping of huge pages at pte level in vmalloc 2021-06-30 20:47:26 -07:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm/page_alloc: limit the number of pages on PCP lists when reclaim is active 2021-06-29 10:53:54 -07:00
vmstat.c mm/vmstat: inline NUMA event counter updates 2021-06-29 10:53:54 -07:00
workingset.c mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec 2021-06-29 10:53:50 -07:00
z3fold.c mm: fix some typos and code style problems 2021-05-07 00:26:33 -07:00
zbud.c mm: set the sleep_mapped to true for zbud and z3fold 2021-02-26 09:41:01 -08:00
zpool.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zsmalloc.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zswap.c mm/zswap.c: switch from strlcpy to strscpy 2021-05-05 11:27:27 -07:00