OpenCloudOS-Kernel/mm
Linus Torvalds 6cce9b22fc mm: make wait_on_page_writeback() wait for multiple pending writebacks
commit c2407cf7d2 upstream.

Ever since commit 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common()
logic") we've had some very occasional reports of BUG_ON(PageWriteback)
in write_cache_pages(), which we thought we already fixed in commit
073861ed77 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").

But syzbot just reported another one, even with that commit in place.

And it turns out that there's a simpler way to trigger the BUG_ON() than
the one Hugh found with page re-use.  It all boils down to the fact that
the page writeback is ostensibly serialized by the page lock, but that
isn't actually true.

Yes, the people _setting_ writeback all do so under the page lock, but
the actual clearing of the bit - and waking up any waiters - happens
without any page lock.

This gives us this fairly simple race condition:

  CPU1 = end previous writeback
  CPU2 = start new writeback under page lock
  CPU3 = write_cache_pages()

  CPU1          CPU2            CPU3
  ----          ----            ----

  end_page_writeback()
    test_clear_page_writeback(page)
    ... delayed...

                lock_page();
                set_page_writeback()
                unlock_page()

                                lock_page()
                                wait_on_page_writeback();

    wake_up_page(page, PG_writeback);
    .. wakes up CPU3 ..

                                BUG_ON(PageWriteback(page));

where the BUG_ON() happens because CPU3 was woken on the PG_writeback bit
because of the _previous_ writeback, but a new one had already been
started: the clearing of the bit is neither atomic wrt the actual wakeup
nor serialized by the page lock.

The reason this didn't use to happen was that the old logic in waiting
on a page bit would just loop if it ever saw the bit set again.

The nice proper fix would probably be to get rid of the whole "wait for
writeback to clear, and then set it" logic in the writeback path, and
replace it with an atomic "wait-to-set" (ie the same as we have for page
locking: we set the page lock bit with a single "lock_page()", not with
"wait for lock bit to clear and then set it").
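
To illustrate the "wait-to-set" idea in isolation, here is a minimal
user-space sketch using C11 atomics. It is not kernel code: the toy_*
names are hypothetical, and the busy loop stands in for sleeping on a
wait queue. The point is only that testing and setting the bit happen
in one atomic step, so no other setter can slip in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: atomically set the bit and report whether this
 * caller was the one that changed it from clear to set.
 */
static bool toy_try_set(atomic_bool *bit)
{
	/* atomic_exchange() returns the previous value of the bit. */
	return !atomic_exchange(bit, true);
}

/*
 * "wait-to-set": retry until this caller owns the bit. A real
 * implementation would sleep instead of spinning.
 */
static void toy_wait_to_set(atomic_bool *bit)
{
	while (!toy_try_set(bit))
		; /* spin; assumption made for the sake of a short example */
}

/* Clear the bit again so that the next setter can proceed. */
static void toy_clear(atomic_bool *bit)
{
	atomic_store(bit, false);
}
```

With this shape, a second caller cannot observe the bit as clear and
then find it set by someone else before its own set, which is the
window the two-step "wait for clear, then set" model leaves open.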

However, our current model for writeback is that the waiting for the
writeback bit is done by the generic VFS code (ie write_cache_pages()),
but the actual setting of the writeback bit is done much later by the
filesystem ".writepages()" function.

IOW, to make the writeback bit have that same kind of "wait-to-set"
behavior as we have for page locking, we'd have to change our roughly
50 different writeback functions.  Painful.

Instead, just make "wait_on_page_writeback()" loop on the very unlikely
situation that the PG_writeback bit is still set, basically re-instating
the old behavior.  This is very non-optimal in case of contention, but
since we only ever set the bit under the page lock, that situation is
controlled.
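
The difference between waiting once and looping can be replayed with a
small deterministic user-space model of the interleaving in the diagram
above. This is only a sketch under stated assumptions: the toy_* names
are hypothetical, a counter stands in for the wait queue, and the three
CPUs are replayed sequentially in the problematic order. It is not the
kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one page: the writeback bit plus undelivered wakeups. */
struct toy_page {
	bool writeback;       /* models PG_writeback */
	int  pending_wakeups; /* wakeups issued but not yet consumed */
};

static void toy_clear_writeback(struct toy_page *p) { p->writeback = false; }
static void toy_set_writeback(struct toy_page *p)   { p->writeback = true; }
static void toy_wake(struct toy_page *p)            { p->pending_wakeups++; }

/* Consume one wakeup if available; does not look at the bit at all. */
static bool toy_wait_once(struct toy_page *p)
{
	if (p->pending_wakeups == 0)
		return false; /* the waiter would keep sleeping */
	p->pending_wakeups--;
	return true;
}

/*
 * Replay the race. Returns the state of the writeback bit at the moment
 * the waiter (CPU3) proceeds; true means BUG_ON(PageWriteback) would fire.
 * With recheck set, the waiter loops while the bit is still set.
 */
static bool toy_replay_race(bool recheck)
{
	struct toy_page p = { .writeback = true, .pending_wakeups = 0 };

	toy_clear_writeback(&p); /* CPU1: bit cleared, wakeup delayed */
	toy_set_writeback(&p);   /* CPU2: new writeback under page lock */
	/* CPU3: sees the bit set and starts waiting */
	toy_wake(&p);            /* CPU1: stale wakeup finally arrives */

	for (;;) {
		if (!toy_wait_once(&p)) {
			/* Nothing pending: let the second writeback finish. */
			toy_clear_writeback(&p);
			toy_wake(&p);
			continue;
		}
		if (!recheck || !p.writeback)
			break;
	}
	return p.writeback;
}
```

In this model toy_replay_race(false) returns with the bit still set,
matching the BUG_ON() above, while toy_replay_race(true) only returns
once the second writeback has completed.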

Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
Fixes: 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
2024-06-11 21:18:41 +08:00
kasan ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
Kconfig ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
Kconfig.debug mm, page_owner, debug_pagealloc: save and dump freeing stack trace 2019-09-24 15:54:08 -07:00
Makefile mm: silence -Woverride-init/initializer-overrides 2019-09-24 15:54:10 -07:00
backing-dev.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
compaction.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
debug.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2017-12-11 14:54:44 +01:00
fadvise.c fs: Export generic_fadvise() 2019-08-30 22:43:58 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c Intel: generic_perform_write()/iomap_write_actor(): saner logics for short copy 2024-06-11 21:18:20 +08:00
frame_vector.c mm: untag user pointers in get_vaddr_frames 2019-09-25 17:51:41 -07:00
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
gup_benchmark.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c pagewalk: separate function pointers from iterator data 2019-09-07 04:28:04 -03:00
huge_memory.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
hugetlb.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-15 18:34:00 -08:00
hwpoison-inject.c hwpoison-inject: no need to check return value of debugfs_create functions 2019-06-03 15:39:40 +02:00
init-mm.c kernel/fork: Initialize mm's PASID 2024-06-11 21:12:36 +08:00
internal.h tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
khugepaged.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
ksm.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
list_lru.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
maccess.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
madvise.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
memblock.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
memcontrol.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory-failure.c mm/hwpoison: fix error page recovered but reported "not recovered" 2024-06-11 21:18:21 +08:00
memory.c x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid() 2024-06-11 21:05:59 +08:00
memory_hotplug.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
mempolicy.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memremap.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
mincore.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mlock.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
mmu_context.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
mmu_gather.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
mmu_notifier.c mm/mmu_notifiers: use the right return code for WARN_ON 2019-11-06 08:47:50 -08:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
mremap.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
msync.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
nommu.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
oom_kill.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2024-06-11 21:18:41 +08:00
page_alloc.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
page_counter.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
page_ext.c mm, page_owner: fix off-by-one error in __set_page_owner_handle() 2019-10-14 15:04:00 -07:00
page_idle.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
page_io.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
page_isolation.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
page_owner.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
page_poison.c mm/page_poison.c: fix a typo in a comment 2019-09-24 15:54:08 -07:00
page_vma_mapped.c mm: introduce page_size() 2019-09-24 15:54:08 -07:00
pagewalk.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c bitmap: genericize percpu bitmap region iterators 2024-06-11 21:16:24 +08:00
pgtable-generic.c x86/mm: Page size aware flush_tlb_mm_range() 2018-10-09 16:51:11 +02:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm: include <linux/huge_mm.h> for is_vma_temporary_stack 2019-10-19 06:32:32 -04:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
shuffle.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
slab.h mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properly 2019-11-06 08:47:50 -08:00
slab_common.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
slob.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slub.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
swap.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
swapfile.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
truncate.c mm/thp: allow dropping THP from page cache 2019-10-19 06:32:33 -04:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-09-17 15:20:17 -07:00
userfaultfd.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
util.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-07 15:47:19 -07:00
vmscan.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
vmstat.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
workingset.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
z3fold.c mm/z3fold.c: claim page in the beginning of free 2019-10-07 15:47:19 -07:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c zpool: add malloc_support_movable to zpool_driver 2019-09-24 15:54:12 -07:00
zsmalloc.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
zswap.c zswap: do not map same object twice 2019-09-24 15:54:12 -07:00