Go to file
Linus Torvalds 980a335360 mm: make wait_on_page_writeback() wait for multiple pending writebacks
upstream commit: c2407cf7d2

Ever since commit 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common()
logic") we've had some very occasional reports of BUG_ON(PageWriteback)
in write_cache_pages(), which we thought we already fixed in commit
073861ed77 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").

But syzbot just reported another one, even with that commit in place.

And it turns out that there's a simpler way to trigger the BUG_ON() than
the one Hugh found with page re-use.  It all boils down to the fact that
the page writeback is ostensibly serialized by the page lock, but that
isn't actually really true.

Yes, the people _setting_ writeback all do so under the page lock, but
the actual clearing of the bit - and waking up any waiters - happens
without any page lock.

This gives us this fairly simple race condition:

  CPU1 = end previous writeback
  CPU2 = start new writeback under page lock
  CPU3 = write_cache_pages()

  CPU1          CPU2            CPU3
  ----          ----            ----

  end_page_writeback()
    test_clear_page_writeback(page)
    ... delayed...

                lock_page();
                set_page_writeback()
                unlock_page()

                                lock_page()
                                wait_on_page_writeback();

    wake_up_page(page, PG_writeback);
    .. wakes up CPU3 ..

                                BUG_ON(PageWriteback(page));

where the BUG_ON() happens because we woke up the PG_writeback bit
becasue of the _previous_ writeback, but a new one had already been
started because the clearing of the bit wasn't actually atomic wrt the
actual wakeup or serialized by the page lock.

The reason this didn't use to happen was that the old logic in waiting
on a page bit would just loop if it ever saw the bit set again.

The nice proper fix would probably be to get rid of the whole "wait for
writeback to clear, and then set it" logic in the writeback path, and
replace it with an atomic "wait-to-set" (ie the same as we have for page
locking: we set the page lock bit with a single "lock_page()", not with
"wait for lock bit to clear and then set it").

However, out current model for writeback is that the waiting for the
writeback bit is done by the generic VFS code (ie write_cache_pages()),
but the actual setting of the writeback bit is done much later by the
filesystem ".writepages()" function.

IOW, to make the writeback bit have that same kind of "wait-to-set"
behavior as we have for page locking, we'd have to change our roughly
~50 different writeback functions.  Painful.

Instead, just make "wait_on_page_writeback()" loop on the very unlikely
situation that the PG_writeback bit is still set, basically re-instating
the old behavior.  This is very non-optimal in case of contention, but
since we only ever set the bit under the page lock, that situation is
controlled.

Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
Fixes: 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Bin Lai <robinlai@tencent.com>
2024-06-11 20:51:18 +08:00
Documentation kdump: update Documentation about crashkernel 2024-06-11 20:51:01 +08:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch kdump: the capture kernel can't use dma memory 2024-06-11 20:51:11 +08:00
block block: fix the incorrect spin_lock_irq to spin_lock 2024-06-11 20:44:39 +08:00
certs ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
crypto ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
dist dist: remove leds from filter out directory 2024-06-11 20:41:06 +08:00
drivers driver: update e1000e to 3.8.4 2024-06-11 20:51:17 +08:00
fs jfs: prevent NULL deref in diFree 2024-06-11 20:43:56 +08:00
include ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1 2024-06-11 20:51:18 +08:00
init irqlatency: add irq latency monitor support 2024-06-11 20:40:51 +08:00
ipc shm: extend forced shm destroy to support objects from several IPC nses 2024-06-11 20:44:39 +08:00
kernel x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config 2024-06-11 20:51:00 +08:00
lib libfdt: include fdt_addresses.c 2024-06-11 20:51:02 +08:00
mm mm: make wait_on_page_writeback() wait for multiple pending writebacks 2024-06-11 20:51:18 +08:00
net ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1 2024-06-11 20:51:18 +08:00
package config: enable BFQ io scheduler 2024-06-11 20:50:53 +08:00
samples ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
scripts ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
security security: Fix hook iteration for secid_to_secctx 2024-06-11 20:50:15 +08:00
sound ALSA: pcm: Fix races among concurrent hw_params and hw_free calls 2024-06-11 20:41:27 +08:00
tools ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
usr tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
virt KVM: Do not leak memory for duplicate debugfs directories 2024-06-11 20:51:18 +08:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
.gitignore ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer 2019-10-10 08:12:51 -07:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS Phytium JPEG Encoder driver 2024-06-11 20:41:01 +08:00
Makefile ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
README.md tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
backport_remove_lists.txt tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
tools_key.pub tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00

README.md

Tencent Linux Kernel 4.0