OpenCloudOS-Kernel

History

Omar Sandoval 4c5b123ab2 blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race

commit e972b08b91ef48488bae9789f03cfedb148667fb upstream.

We're seeing crashes from rq_qos_wake_function that look like this:

  BUG: unable to handle page fault for address: ffffafe180a40084
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 100000067 P4D 100000067 PUD 10027c067 PMD 10115d067 PTE 0
  Oops: Oops: 0002 [#1] PREEMPT SMP PTI
  CPU: 17 UID: 0 PID: 0 Comm: swapper/17 Not tainted 6.12.0-rc3-00013-geca631b8fe80 #11
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:_raw_spin_lock_irqsave+0x1d/0x40
  Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 54 9c 41 5c fa 65 ff 05 62 97 30 4c 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 0a 4c 89 e0 41 5c c3 cc cc cc cc 89 c6 e8 2c 0b 00
  RSP: 0018:ffffafe180580ca0 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffffafe180a3f7a8 RCX: 0000000000000011
  RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffafe180a40084
  RBP: 0000000000000000 R08: 00000000001e7240 R09: 0000000000000011
  R10: 0000000000000028 R11: 0000000000000888 R12: 0000000000000002
  R13: ffffafe180a40084 R14: 0000000000000000 R15: 0000000000000003
  FS: 0000000000000000(0000) GS:ffff9aaf1f280000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffafe180a40084 CR3: 000000010e428002 CR4: 0000000000770ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   <IRQ>
   try_to_wake_up+0x5a/0x6a0
   rq_qos_wake_function+0x71/0x80
   __wake_up_common+0x75/0xa0
   __wake_up+0x36/0x60
   scale_up.part.0+0x50/0x110
   wb_timer_fn+0x227/0x450
   ...

So rq_qos_wake_function() calls wake_up_process(data->task), which calls
try_to_wake_up(), which faults in raw_spin_lock_irqsave(&p->pi_lock).

p comes from data->task, and data comes from the waitqueue entry, which
is stored on the waiter's stack in rq_qos_wait(). Analyzing the core
dump with drgn, I found that the waiter had already woken up and moved
on to a completely unrelated code path, clobbering what was previously
data->task. Meanwhile, the waker was passing the clobbered garbage in
data->task to wake_up_process(), leading to the crash.

What's happening is that in between rq_qos_wake_function() deleting the
waitqueue entry and calling wake_up_process(), rq_qos_wait() is finding
that it already got a token and returning. The race looks like this:

rq_qos_wait()                           rq_qos_wake_function()
==============================================================
prepare_to_wait_exclusive()
                                        data->got_token = true;
                                        list_del_init(&curr->entry);
if (data.got_token)
        break;
finish_wait(&rqw->wait, &data.wq);
  ^- returns immediately because
     list_empty_careful(&wq_entry->entry)
     is true
... return, go do something else ...
                                        wake_up_process(data->task)
                                          (NO LONGER VALID!)-^

Normally, finish_wait() is supposed to synchronize against the waker.
But, as noted above, it is returning immediately because the waitqueue
entry has already been removed from the waitqueue.

The bug is that rq_qos_wake_function() is accessing the waitqueue entry
AFTER deleting it. Note that autoremove_wake_function() wakes the waiter
and THEN deletes the waitqueue entry, which is the proper order.

Fix it by swapping the order. We also need to use
list_del_init_careful() to match the list_empty_careful() in
finish_wait().

Fixes: `38cfb5a45e` ("blk-wbt: improve waking of tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/d3bee2463a67b1ee597211823bf7ad3721c26e41.1729014591.git.osandov@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2024-10-22 15:46:27 +02:00
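
The ordering rule described in the commit message (wake the waiter first, then unlink the waitqueue entry as the very last access to it) can be sketched in plain userspace C. This is only an illustrative model, not the kernel patch: `struct task`, `struct wait_data`, `wake_up_process_stub()`, `unlink_careful_stub()`, `waiter_may_return()` and the event trace are all hypothetical stand-ins invented for this sketch. A single atomic flag plays the role of the list linkage, with a release store standing in for list_del_init_careful() and an acquire load standing in for list_empty_careful().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Instrumentation only: records the order in which the waker acts. */
enum event { EV_GOT_TOKEN, EV_WAKE, EV_UNLINK };
static enum event trace[8];
static int ntrace;

static void record(enum event e)
{
    if (ntrace < 8)
        trace[ntrace++] = e;
}

/* Hypothetical stand-in for the task being woken. */
struct task {
    int pid;
    bool woken;
};

/*
 * Hypothetical stand-in for the per-waiter data that lives on the
 * waiter's stack. 'linked' models "the entry is still on the waitqueue".
 */
struct wait_data {
    atomic_bool linked;
    struct task *task;
    bool got_token;
};

static void wait_data_init(struct wait_data *data, struct task *task)
{
    atomic_init(&data->linked, true);
    data->task = task;
    data->got_token = false;
}

/* Models wake_up_process(data->task). */
static void wake_up_process_stub(struct task *p)
{
    p->woken = true;
    record(EV_WAKE);
}

/*
 * Models list_del_init_careful(): the release store is the waker's
 * final access to *data. Once the waiter observes linked == false,
 * it may return and reuse the stack memory.
 */
static void unlink_careful_stub(struct wait_data *data)
{
    record(EV_UNLINK);
    atomic_store_explicit(&data->linked, false, memory_order_release);
}

/* Models the list_empty_careful() check made by the waiter. */
static bool waiter_may_return(struct wait_data *data)
{
    return !atomic_load_explicit(&data->linked, memory_order_acquire);
}

/* Fixed ordering: hand over the token, wake, and only then unlink. */
static int wake_function_fixed(struct wait_data *data)
{
    data->got_token = true;
    record(EV_GOT_TOKEN);
    wake_up_process_stub(data->task);  /* still safe: entry is linked */
    unlink_careful_stub(data);         /* nothing touches *data after this */
    return 1;
}
```

In this model the waiter is allowed to leave only once `waiter_may_return()` is true, so every read of `data->task` by the waker happens while the waiter's stack frame is still guaranteed to be alive. The buggy ordering from the diagram would correspond to calling `unlink_careful_stub()` before `wake_up_process_stub()`.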
..
partitions	block: fix potential invalid pointer dereference in blk_add_partition	2024-10-04 16:29:01 +02:00
Kconfig	block: sed-opal: keyring support for SED keys	2023-08-22 11:10:26 -06:00
Kconfig.iosched	block: Default to use cgroup support for BFQ	2023-01-30 09:42:42 -07:00
Makefile	block: move the code to do early boot lookup of block devices to block/	2023-06-05 10:57:40 -06:00
badblocks.c	block/badblocks: Remove redundant assignments	2022-04-23 07:15:26 -06:00
bdev.c	block: Provide bdev_open_* functions	2024-03-26 18:19:40 -04:00
bfq-cgroup.c	blkcg: Restructure blkg_conf_prep() and friends	2023-04-13 06:46:49 -06:00
bfq-iosched.c	block, bfq: fix procress reference leakage for bfqq in merge chain	2024-10-04 16:29:00 +02:00
bfq-iosched.h	block, bfq: remove BFQ_WEIGHT_LEGACY_DFL	2023-04-06 16:17:32 -06:00
bfq-wf2q.c	block, bfq: inject I/O to underutilized actuators	2023-01-29 15:18:33 -07:00
bio-integrity.c	block: initialize integrity buffer to zero before writing it to media	2024-08-03 08:53:20 +02:00
bio.c	block: Fix page refcounts for unaligned buffers in __bio_release_pages()	2024-04-03 15:28:27 +02:00
blk-cgroup-fc-appid.c	block: Replace all non-returning strlcpy with strscpy	2023-06-01 09:13:31 -06:00
blk-cgroup-rwstat.c	Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"	2023-02-14 14:24:09 -07:00
blk-cgroup-rwstat.h	block: Use the new blk_opf_t type	2022-07-14 12:14:30 -06:00
blk-cgroup.c	blk-cgroup: Properly propagate the iostat update up the hierarchy	2024-06-12 11:12:46 +02:00
blk-cgroup.h	block: fix q->blkg_list corruption during disk rebind	2024-04-17 11:19:28 +02:00
blk-core.c	block: Fix where bio IO priority gets set	2024-09-30 16:25:12 +02:00
blk-crypto-fallback.c	blk-crypto: dynamically allocate fallback profile	2023-08-18 15:00:39 -06:00
blk-crypto-internal.h	blk-crypto: remove blk_crypto_insert_cloned_request()	2023-03-16 09:35:09 -06:00
blk-crypto-profile.c	blk-crypto: use dynamic lock class for blk_crypto_profile::lock	2023-07-05 16:36:12 -06:00
blk-crypto-sysfs.c	block: make kobj_type structures constant	2023-02-09 09:38:16 -07:00
blk-crypto.c	blk-crypto: make blk_crypto_evict_key() more robust	2023-03-16 09:35:09 -06:00
blk-flush.c	block: fix request.queuelist usage in flush	2024-06-21 14:38:35 +02:00
blk-ia-ranges.c	block: make kobj_type structures constant	2023-02-09 09:38:16 -07:00
blk-integrity.c	block: remove the blk_flush_integrity call in blk_integrity_unregister	2024-09-08 07:54:47 +02:00
blk-ioc.c	blk-ioc: fix recursive spin_lock/unlock_irq() in ioc_clear_queue()	2023-06-07 07:51:00 -06:00
blk-iocost.c	blk_iocost: fix more out of bound shifts	2024-10-10 11:57:24 +02:00
blk-iolatency.c	block: fix bad lockdep annotation in blk-iolatency	2023-08-10 17:24:53 -06:00
blk-ioprio.c	blk-ioprio: Introduce promote-to-rt policy	2023-06-06 22:26:26 -06:00
blk-ioprio.h	blk-ioprio: pass a gendisk to blk_ioprio_init and blk_ioprio_exit	2022-09-26 19:09:31 -06:00
blk-lib.c	blk-lib: fix blkdev_issue_secure_erase	2022-09-15 00:25:17 -06:00
blk-map.c	block: Fix WARNING in _copy_from_iter	2024-03-01 13:34:49 +01:00
blk-merge.c	block: support to account io_ticks precisely	2024-06-12 11:11:35 +02:00
blk-mq-cpumap.c	blk-mq: include <linux/blk-mq.h> in block/blk-mq.h	2023-04-13 06:52:29 -06:00
blk-mq-debugfs-zoned.c	block: move zone related fields to struct gendisk	2022-07-06 06:46:26 -06:00
blk-mq-debugfs.c	blk-mq: fix potential io hang by wrong 'wake_batch'	2023-06-12 09:55:53 -06:00
blk-mq-debugfs.h	block: remove per-disk debugfs files in blk_unregister_queue	2022-06-17 07:31:05 -06:00
blk-mq-pci.c	blk-mq: include <linux/blk-mq.h> in block/blk-mq.h	2023-04-13 06:52:29 -06:00
blk-mq-sched.c	blk-mq: cleanup __blk_mq_sched_dispatch_requests	2023-04-13 06:57:18 -06:00
blk-mq-sched.h	blk-mq: make sure elevator callbacks aren't called for passthrough request	2023-05-18 19:42:54 -06:00
blk-mq-sysfs.c	blk-mq: include <linux/blk-mq.h> in block/blk-mq.h	2023-04-13 06:52:29 -06:00
blk-mq-tag.c	block: Fix lockdep warning in blk_mq_mark_tag_wait	2024-08-29 17:33:30 +02:00
blk-mq-virtio.c	blk-mq: include <linux/blk-mq.h> in block/blk-mq.h	2023-04-13 06:52:29 -06:00
blk-mq.c	block: Fix where bio IO priority gets set	2024-09-30 16:25:12 +02:00
blk-mq.h	blk-mq: fix potential io hang by wrong 'wake_batch'	2023-06-12 09:55:53 -06:00
blk-pm.c	blk-mq: include <linux/blk-mq.h> in block/blk-mq.h	2023-04-13 06:52:29 -06:00
blk-pm.h	block: Remove unused blk_pm_*() function definitions	2021-02-22 06:33:48 -07:00
blk-rq-qos.c	blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race	2024-10-22 15:46:27 +02:00
blk-rq-qos.h	blk-iolatency: s/blkcg_rq_qos/iolat_rq_qos/	2023-04-13 06:46:49 -06:00
blk-settings.c	block: Clear zone limits for a non-zoned stacked queue	2024-04-03 15:28:20 +02:00
blk-stat.c	block: prevent division by zero in blk_rq_stat_sum()	2024-04-13 13:07:37 +02:00
blk-stat.h	block: make queue stat accounting a reference	2021-12-14 17:23:05 -07:00
blk-sysfs.c	block: don't allow enabling a cache on devices that don't support it	2023-07-17 08:18:18 -06:00
blk-throttle.c	blk-throttle: fix lockdep warning of "cgroup_mutex or RCU read lock required!"	2023-12-20 17:01:55 +01:00
blk-throttle.h	blk-throttle: print signed value 'carryover_bytes/ios' for user	2023-08-30 10:15:01 -06:00
blk-timeout.c	block: blk-timeout: delete duplicated word	2020-07-31 16:29:47 -06:00
blk-wbt.c	blk-wbt: Fix detection of dirty-throttled tasks	2024-02-23 09:25:16 +01:00
blk-wbt.h	blk-wbt: don't create wbt sysfs entry if CONFIG_BLK_WBT is disabled	2023-06-26 09:53:36 -06:00
blk-zoned.c	Merge branch '6.5/scsi-staging' into 6.5/scsi-fixes	2023-07-11 12:15:15 -04:00
blk.h	block: support to account io_ticks precisely	2024-06-12 11:11:35 +02:00
bounce.c	block: change the blk_queue_bounce calling convention	2022-08-02 17:22:54 -06:00
bsg-lib.c	scsi: replace the fmode_t argument to ->sg_io_fn with a simple bool	2023-06-12 08:04:04 -06:00
bsg.c	SCSI misc on 20230629	2023-06-30 11:57:07 -07:00
disk-events.c	block: fix kernel-doc for disk_force_media_change()	2023-09-26 00:43:34 -06:00
early-lookup.c	block: don't return -EINVAL for not found names in devt_from_devname	2023-06-22 09:09:33 -06:00
elevator.c	blk-mq: release scheduler resource when request completes	2023-08-19 07:47:17 -06:00
elevator.h	blk-mq: pass a flags argument to elevator_type->insert_requests	2023-04-13 06:52:30 -06:00
fops.c	block: refine the EOF check in blkdev_iomap_begin	2024-06-12 11:11:35 +02:00
genhd.c	block: fix deadlock between sd_remove & sd_release	2024-08-03 08:54:24 +02:00
holder.c	block: don't allow a disk link holder to itself	2022-11-16 15:19:56 -07:00
ioctl.c	block/ioctl: prefer different overflow check	2024-06-27 13:49:01 +02:00
ioprio.c	scsi: block: Improve ioprio value validity checks	2023-06-16 12:04:30 -04:00
kyber-iosched.c	blk-mq: pass a flags argument to elevator_type->insert_requests	2023-04-13 06:52:30 -06:00
mq-deadline.c	block/mq-deadline: Fix the tag reservation code	2024-08-03 08:53:22 +02:00
opal_proto.h	block: sed-opal: handle empty atoms when parsing response	2024-03-26 18:19:12 -04:00
sed-opal.c	block: sed-opal: avoid possible wrong address reference in read_sed_opal_key()	2024-06-21 14:38:35 +02:00
t10-pi.c	block: add pi for extended integrity	2022-03-07 12:48:35 -07:00