OpenCloudOS-Kernel/block
Yang Yang 5a5625a83e block: fix deadlock between sd_remove & sd_release
commit 7e04da2dc7013af50ed3a2beb698d5168d1e594b upstream.

Our test report the following hung task:

[ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
[ 2538.459427] Call trace:
[ 2538.459430]  __switch_to+0x174/0x338
[ 2538.459436]  __schedule+0x628/0x9c4
[ 2538.459442]  schedule+0x7c/0xe8
[ 2538.459447]  schedule_preempt_disabled+0x24/0x40
[ 2538.459453]  __mutex_lock+0x3ec/0xf04
[ 2538.459456]  __mutex_lock_slowpath+0x14/0x24
[ 2538.459459]  mutex_lock+0x30/0xd8
[ 2538.459462]  del_gendisk+0xdc/0x350
[ 2538.459466]  sd_remove+0x30/0x60
[ 2538.459470]  device_release_driver_internal+0x1c4/0x2c4
[ 2538.459474]  device_release_driver+0x18/0x28
[ 2538.459478]  bus_remove_device+0x15c/0x174
[ 2538.459483]  device_del+0x1d0/0x358
[ 2538.459488]  __scsi_remove_device+0xa8/0x198
[ 2538.459493]  scsi_forget_host+0x50/0x70
[ 2538.459497]  scsi_remove_host+0x80/0x180
[ 2538.459502]  usb_stor_disconnect+0x68/0xf4
[ 2538.459506]  usb_unbind_interface+0xd4/0x280
[ 2538.459510]  device_release_driver_internal+0x1c4/0x2c4
[ 2538.459514]  device_release_driver+0x18/0x28
[ 2538.459518]  bus_remove_device+0x15c/0x174
[ 2538.459523]  device_del+0x1d0/0x358
[ 2538.459528]  usb_disable_device+0x84/0x194
[ 2538.459532]  usb_disconnect+0xec/0x300
[ 2538.459537]  hub_event+0xb80/0x1870
[ 2538.459541]  process_scheduled_works+0x248/0x4dc
[ 2538.459545]  worker_thread+0x244/0x334
[ 2538.459549]  kthread+0x114/0x1bc

[ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
[ 2538.461014] Call trace:
[ 2538.461016]  __switch_to+0x174/0x338
[ 2538.461021]  __schedule+0x628/0x9c4
[ 2538.461025]  schedule+0x7c/0xe8
[ 2538.461030]  blk_queue_enter+0xc4/0x160
[ 2538.461034]  blk_mq_alloc_request+0x120/0x1d4
[ 2538.461037]  scsi_execute_cmd+0x7c/0x23c
[ 2538.461040]  ioctl_internal_command+0x5c/0x164
[ 2538.461046]  scsi_set_medium_removal+0x5c/0xb0
[ 2538.461051]  sd_release+0x50/0x94
[ 2538.461054]  blkdev_put+0x190/0x28c
[ 2538.461058]  blkdev_release+0x28/0x40
[ 2538.461063]  __fput+0xf8/0x2a8
[ 2538.461066]  __fput_sync+0x28/0x5c
[ 2538.461070]  __arm64_sys_close+0x84/0xe8
[ 2538.461073]  invoke_syscall+0x58/0x114
[ 2538.461078]  el0_svc_common+0xac/0xe0
[ 2538.461082]  do_el0_svc+0x1c/0x28
[ 2538.461087]  el0_svc+0x38/0x68
[ 2538.461090]  el0t_64_sync_handler+0x68/0xbc
[ 2538.461093]  el0t_64_sync+0x1a8/0x1ac

  T1:				T2:
  sd_remove
  del_gendisk
  __blk_mark_disk_dead
  blk_freeze_queue_start
  ++q->mq_freeze_depth
  				bdev_release
 				mutex_lock(&disk->open_mutex)
  				sd_release
 				scsi_execute_cmd
 				blk_queue_enter
 				wait_event(!q->mq_freeze_depth)
  mutex_lock(&disk->open_mutex)

SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
this scenario. This is a classic ABBA deadlock. To fix the deadlock,
make sure we don't try to acquire disk->open_mutex after freezing
the queue.

Cc: stable@vger.kernel.org
Fixes: eec1be4c30 ("block: delete partitions later in del_gendisk")
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: and Cc: stable tags are missing. Otherwise this patch looks fine
Link: https://lore.kernel.org/r/20240724070412.22521-1-yang.yang@vivo.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03 08:54:24 +02:00
..
partitions block: fix and simplify blkdevparts= cmdline parsing 2024-06-12 11:11:35 +02:00
Kconfig block: sed-opal: keyring support for SED keys 2023-08-22 11:10:26 -06:00
Kconfig.iosched
Makefile
badblocks.c
bdev.c block: Provide bdev_open_* functions 2024-03-26 18:19:40 -04:00
bfq-cgroup.c
bfq-iosched.c SCSI misc on 20230629 2023-06-30 11:57:07 -07:00
bfq-iosched.h
bfq-wf2q.c
bio-integrity.c block: initialize integrity buffer to zero before writing it to media 2024-08-03 08:53:20 +02:00
bio.c block: Fix page refcounts for unaligned buffers in __bio_release_pages() 2024-04-03 15:28:27 +02:00
blk-cgroup-fc-appid.c
blk-cgroup-rwstat.c
blk-cgroup-rwstat.h
blk-cgroup.c blk-cgroup: Properly propagate the iostat update up the hierarchy 2024-06-12 11:12:46 +02:00
blk-cgroup.h block: fix q->blkg_list corruption during disk rebind 2024-04-17 11:19:28 +02:00
blk-core.c block: support to account io_ticks precisely 2024-06-12 11:11:35 +02:00
blk-crypto-fallback.c blk-crypto: dynamically allocate fallback profile 2023-08-18 15:00:39 -06:00
blk-crypto-internal.h
blk-crypto-profile.c blk-crypto: use dynamic lock class for blk_crypto_profile::lock 2023-07-05 16:36:12 -06:00
blk-crypto-sysfs.c
blk-crypto.c
blk-flush.c block: fix request.queuelist usage in flush 2024-06-21 14:38:35 +02:00
blk-ia-ranges.c
blk-integrity.c
blk-ioc.c
blk-iocost.c blk-iocost: do not WARN if iocg was already offlined 2024-05-17 12:02:20 +02:00
blk-iolatency.c block: fix bad lockdep annotation in blk-iolatency 2023-08-10 17:24:53 -06:00
blk-ioprio.c
blk-ioprio.h
blk-lib.c
blk-map.c block: Fix WARNING in _copy_from_iter 2024-03-01 13:34:49 +01:00
blk-merge.c block: support to account io_ticks precisely 2024-06-12 11:11:35 +02:00
blk-mq-cpumap.c
blk-mq-debugfs-zoned.c
blk-mq-debugfs.c
blk-mq-debugfs.h
blk-mq-pci.c
blk-mq-sched.c
blk-mq-sched.h
blk-mq-sysfs.c
blk-mq-tag.c
blk-mq-virtio.c
blk-mq.c block: Call .limit_depth() after .hctx has been set 2024-08-03 08:53:22 +02:00
blk-mq.h
blk-pm.c
blk-pm.h
blk-rq-qos.c block: correct stale comment in rq_qos_wait 2023-09-18 14:15:28 -06:00
blk-rq-qos.h
blk-settings.c block: Clear zone limits for a non-zoned stacked queue 2024-04-03 15:28:20 +02:00
blk-stat.c block: prevent division by zero in blk_rq_stat_sum() 2024-04-13 13:07:37 +02:00
blk-stat.h
blk-sysfs.c block: don't allow enabling a cache on devices that don't support it 2023-07-17 08:18:18 -06:00
blk-throttle.c blk-throttle: fix lockdep warning of "cgroup_mutex or RCU read lock required!" 2023-12-20 17:01:55 +01:00
blk-throttle.h blk-throttle: print signed value 'carryover_bytes/ios' for user 2023-08-30 10:15:01 -06:00
blk-timeout.c
blk-wbt.c blk-wbt: Fix detection of dirty-throttled tasks 2024-02-23 09:25:16 +01:00
blk-wbt.h
blk-zoned.c Merge branch '6.5/scsi-staging' into 6.5/scsi-fixes 2023-07-11 12:15:15 -04:00
blk.h block: support to account io_ticks precisely 2024-06-12 11:11:35 +02:00
bounce.c
bsg-lib.c
bsg.c SCSI misc on 20230629 2023-06-30 11:57:07 -07:00
disk-events.c block: fix kernel-doc for disk_force_media_change() 2023-09-26 00:43:34 -06:00
early-lookup.c
elevator.c blk-mq: release scheduler resource when request completes 2023-08-19 07:47:17 -06:00
elevator.h
fops.c block: refine the EOF check in blkdev_iomap_begin 2024-06-12 11:11:35 +02:00
genhd.c block: fix deadlock between sd_remove & sd_release 2024-08-03 08:54:24 +02:00
holder.c
ioctl.c block/ioctl: prefer different overflow check 2024-06-27 13:49:01 +02:00
ioprio.c
kyber-iosched.c
mq-deadline.c block/mq-deadline: Fix the tag reservation code 2024-08-03 08:53:22 +02:00
opal_proto.h block: sed-opal: handle empty atoms when parsing response 2024-03-26 18:19:12 -04:00
sed-opal.c block: sed-opal: avoid possible wrong address reference in read_sed_opal_key() 2024-06-21 14:38:35 +02:00
t10-pi.c