OpenCloudOS-Kernel


Tejun Heo 8a177a36da blk-iolatency: Fix inflight count imbalances and IO hangs on offline

iolatency needs to track the number of inflight IOs per cgroup. As this tracking can be expensive, it is disabled when no cgroup has iolatency configured for the device. To ensure that the inflight counters stay balanced, iolatency_set_limit() freezes the request_queue while manipulating the enabled counter, which ensures that no IO is in flight and thus all counters are zero.

Unfortunately, iolatency_set_limit() isn't the only place where the enabled counter is manipulated. iolatency_pd_offline() can also dec the counter and trigger disabling. As this disabling happens without freezing the q, this can easily happen while some IOs are in flight and thus leak the counts.

This can be easily demonstrated by turning on iolatency on one empty cgroup while IOs are in flight in other cgroups and then removing the cgroup. Note that iolatency shouldn't have been enabled elsewhere in the system to ensure that removing the cgroup disables iolatency for the whole device.

The following keeps flipping on and off iolatency on sda:

  echo +io > /sys/fs/cgroup/cgroup.subtree_control
  while true; do
      mkdir -p /sys/fs/cgroup/test
      echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
      sleep 1
      rmdir /sys/fs/cgroup/test
      sleep 1
  done

and there's concurrent fio generating direct rand reads:

  fio --name test --filename=/dev/sda --direct=1 --rw=randread \
      --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k

while monitoring with the following drgn script:

  while True:
      for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
          for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
              blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
              pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
              if pd.value_() == 0:
                  continue
              iolat = container_of(pd, 'struct iolatency_grp', 'pd')
              inflight = iolat.rq_wait.inflight.counter.value_()
              if inflight:
                  print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                        f'{cgroup_path(css.cgroup).decode("utf-8")}')
      time.sleep(1)

The monitoring output looks like the following:

  inflight=1 sda /user.slice
  inflight=1 sda /user.slice
  ...
  inflight=14 sda /user.slice
  inflight=13 sda /user.slice
  inflight=17 sda /user.slice
  inflight=15 sda /user.slice
  inflight=18 sda /user.slice
  inflight=17 sda /user.slice
  inflight=20 sda /user.slice
  inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
  inflight=19 sda /user.slice
  inflight=19 sda /user.slice

If a cgroup with stuck inflight ends up getting throttled, the throttled IOs will never get issued as there's no completion event to wake it up, leading to an indefinite hang.

This patch fixes the bug by unifying enable handling into a work item which is automatically kicked off from iolatency_set_min_lat_nsec() which is called from both iolatency_set_limit() and iolatency_pd_offline() paths.
Punting to a work item is necessary as iolatency_pd_offline() is called under spinlocks while freezing a request_queue requires a sleepable context.

This also simplifies the code reducing LOC sans the comments and avoids the unnecessary freezes which were happening whenever a cgroup's latency target is newly set or cleared.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Fixes: 8c772a9bfc ("blk-iolatency: fix IO hang due to negative inflight counter")
Cc: stable@vger.kernel.org # v5.0+
Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2022-05-26 11:43:00 -06:00
partitions	block/partitions/ldm: Remove redundant assignments	2022-04-23 07:15:26 -06:00
Kconfig	block: add pi for extended integrity	2022-03-07 12:48:35 -07:00
Kconfig.iosched	block: only build the icq tracking code when needed	2021-12-16 10:59:02 -07:00
Makefile	blk-cgroup: move blkcg_{get,set}_fc_appid out of line	2022-05-02 14:06:20 -06:00
badblocks.c	block/badblocks: Remove redundant assignments	2022-04-23 07:15:26 -06:00
bdev.c	Merge branch 'akpm' (patches from Andrew)	2022-03-22 16:11:53 -07:00
bfq-cgroup.c	bfq: Make sure bfqg for which we are queueing requests is online	2022-04-17 19:34:32 -06:00
bfq-iosched.c	bfq: Remove bfq_requeue_request_body()	2022-05-19 06:52:36 -06:00
bfq-iosched.h	bfq: Relax waker detection for shared queues	2022-05-19 06:52:33 -06:00
bfq-wf2q.c	block, bfq: cleanup bfq_bfqq_to_bfqg()	2022-02-18 06:13:00 -07:00
bio-integrity.c	for-5.18/block-2022-03-18	2022-03-21 16:48:55 -07:00
bio.c	block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone	2022-05-04 18:29:52 -06:00
blk-cgroup-fc-appid.c	blk-cgroup: move blkcg_{get,set}_fc_appid out of line	2022-05-02 14:06:20 -06:00
blk-cgroup-rwstat.c	blk-cgroup: Fix the recursive blkg rwstat	2021-03-05 11:32:15 -07:00
blk-cgroup-rwstat.h	block: partition include/linux/blk-cgroup.h	2022-02-11 10:02:41 -07:00
blk-cgroup.c	blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()	2022-05-18 16:32:00 -06:00
blk-cgroup.h	blk-cgroup: always terminate io.stat lines	2022-05-17 06:11:17 -06:00
blk-core.c	block: cleanup the VM accounting in submit_bio	2022-05-16 11:37:50 -06:00
blk-crypto-fallback.c	block: remove superfluous calls to blkcg_bio_issue_init	2022-05-04 18:29:52 -06:00
blk-crypto-internal.h	blk-crypto: show crypto capabilities in sysfs	2022-02-28 06:40:23 -07:00
blk-crypto-profile.c	blk-crypto: remove blk_crypto_unregister()	2021-11-29 06:38:51 -07:00
blk-crypto-sysfs.c	blk-crypto: show crypto capabilities in sysfs	2022-02-28 06:40:23 -07:00
blk-crypto.c	blk-crypto: show crypto capabilities in sysfs	2022-02-28 06:40:23 -07:00
blk-flush.c	block: pass a block_device and opf to bio_init	2022-02-02 07:49:59 -07:00
blk-ia-ranges.c	block: fix memory leak in disk_register_independent_access_ranges	2022-01-23 09:13:09 -07:00
blk-integrity.c	blk-crypto: remove blk_crypto_unregister()	2021-11-29 06:38:51 -07:00
blk-ioc.c	block: restore the old set_task_ioprio() behaviour wrt PF_EXITING	2022-03-28 06:34:11 -06:00
blk-iocost.c	blk-cgroup: always terminate io.stat lines	2022-05-17 06:11:17 -06:00
blk-iolatency.c	blk-iolatency: Fix inflight count imbalances and IO hangs on offline	2022-05-26 11:43:00 -06:00
blk-ioprio.c	block: partition include/linux/blk-cgroup.h	2022-02-11 10:02:41 -07:00
blk-ioprio.h	block: Introduce the ioprio rq-qos policy	2021-06-21 15:03:40 -06:00
blk-lib.c	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD	2022-04-17 19:49:59 -06:00
blk-map.c	block/blk-map: Remove redundant assignment	2022-04-23 07:15:26 -06:00
blk-merge.c	for-5.18/write-streams-2022-03-18	2022-03-26 11:51:46 -07:00
blk-mq-cpumap.c	blk-mq: remove the calling of local_memory_node()	2020-10-20 07:08:17 -06:00
blk-mq-debugfs-zoned.c	block: Cleanup license notice	2019-01-17 21:21:40 -07:00
blk-mq-debugfs.c	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD	2022-04-17 19:49:59 -06:00
blk-mq-debugfs.h	blk-mq: manage hctx map via xarray	2022-03-08 19:39:38 -07:00
blk-mq-pci.c	block: Fix blk_mq_*_map_queues() kernel-doc headers	2019-05-31 15:12:34 -06:00
blk-mq-rdma.c	block: Fix blk_mq_*_map_queues() kernel-doc headers	2019-05-31 15:12:34 -06:00
blk-mq-sched.c	block: limit request dispatch loop duration	2022-03-17 20:31:43 -06:00
blk-mq-sched.h	block: move blk_mq_sched_assign_ioc to blk-ioc.c	2021-11-29 06:41:29 -07:00
blk-mq-sysfs.c	blk-mq: prepare for implementing hctx table via xarray	2022-03-08 17:57:19 -07:00
blk-mq-tag.c	blk-mq: manage hctx map via xarray	2022-03-08 19:39:38 -07:00
blk-mq-tag.h	blk-mq: Delete busy_iter_fn	2021-12-06 13:18:47 -07:00
blk-mq-virtio.c	blk-mq: Fix typo in comment	2020-03-17 20:55:21 +01:00
blk-mq.c	blk-mq: don't touch ->tagset in blk_mq_get_sq_hctx	2022-05-23 06:28:28 -06:00
blk-mq.h	blk-mq: manage hctx map via xarray	2022-03-08 19:39:38 -07:00
blk-pm.c	scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()	2021-12-22 23:38:29 -05:00
blk-pm.h	block: Remove unused blk_pm_*() function definitions	2021-02-22 06:33:48 -07:00
blk-rq-qos.c	rq-qos: fix missed wake-ups in rq_qos_throttle try two	2021-06-08 15:12:57 -06:00
blk-rq-qos.h	block: fix rq-qos breakage from skipping rq_qos_done_bio()	2022-03-14 14:23:13 -06:00
blk-settings.c	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD	2022-04-17 19:49:59 -06:00
blk-stat.c	block: make queue stat accounting a reference	2021-12-14 17:23:05 -07:00
blk-stat.h	block: make queue stat accounting a reference	2021-12-14 17:23:05 -07:00
blk-sysfs.c	SCSI misc on 20220324	2022-03-24 19:37:53 -07:00
blk-throttle.c	blk-throttle: Set BIO_THROTTLED when bio has been throttled	2022-05-17 19:32:10 -06:00
blk-throttle.h	block: cancel all throttled bios in del_gendisk()	2022-03-18 09:57:56 -06:00
blk-timeout.c	block: blk-timeout: delete duplicated word	2020-07-31 16:29:47 -06:00
blk-wbt.c	blk-wbt: prevent NULL pointer dereference in wb_timer_fn	2021-10-19 06:13:41 -06:00
blk-wbt.h	blk-wbt: remove wbt_track stub	2022-03-31 12:58:38 -06:00
blk-zoned.c	SCSI misc on 20220324	2022-03-24 19:37:53 -07:00
blk.h	block: refactor discard bio size limiting	2022-04-17 19:49:59 -06:00
bounce.c	block: remove superfluous calls to blkcg_bio_issue_init	2022-05-04 18:29:52 -06:00
bsg-lib.c	block: remove the gendisk argument to blk_execute_rq	2021-11-29 06:41:29 -07:00
bsg.c	scsi: bsg: Fix device unregistration	2021-09-14 00:22:15 -04:00
disk-events.c	block: remove genhd.h	2022-02-02 07:49:59 -07:00
elevator.c	for-5.18/block-2022-03-18	2022-03-21 16:48:55 -07:00
elevator.h	block: move elevator.h to block/	2021-10-18 06:17:01 -06:00
fops.c	block: ignore RWF_HIPRI hint for sync dio	2022-05-02 10:07:42 -06:00
genhd.c	block: remove queue_discard_alignment	2022-04-17 19:49:59 -06:00
holder.c	block: remove genhd.h	2022-02-02 07:49:59 -07:00
ioctl.c	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD	2022-04-17 19:49:59 -06:00
ioprio.c	for-5.17/block-2022-01-11	2022-01-12 10:26:52 -08:00
kyber-iosched.c	block: make queue stat accounting a reference	2021-12-14 17:23:05 -07:00
mq-deadline.c	block: fix async_depth sysfs interface for mq-deadline	2022-01-20 10:54:02 -07:00
opal_proto.h	block: sed-opal: Change the check condition for regular session validity	2020-03-12 08:00:10 -06:00
sed-opal.c	block: remove genhd.h	2022-02-02 07:49:59 -07:00
t10-pi.c	block: add pi for extended integrity	2022-03-07 12:48:35 -07:00