OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Greg Kroah-Hartman	4a248f85b3	Merge 5.17-rc6 into driver-core-next We need the driver core fix in here as well for future changes. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-28 07:45:41 +01:00
Christoph Hellwig	a773187e37	scsi: dm: Remove WRITE_SAME support There are no more end-users of REQ_OP_WRITE_SAME left, so we can start deleting it. Link: https://lore.kernel.org/r/20220209082828.2629273-7-hch@lst.de Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-22 21:11:08 -05:00
Christoph Hellwig	10fa225c33	scsi: md: Remove WRITE_SAME support There are no more end-users of REQ_OP_WRITE_SAME left, so we can start deleting it. Link: https://lore.kernel.org/r/20220209082828.2629273-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-22 21:11:08 -05:00
Mike Snitzer	f5b4aee10c	dm: remove unnecessary local variables in __bind Also remove empty newline before 'out:' label at end of __bind. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 13:55:53 -05:00
Mike Snitzer	fa247089de	dm: requeue IO if mapping table not yet available Update both bio-based and request-based DM to requeue IO if the mapping table not available. This race of IO being submitted before the DM device ready is so narrow, yet possible for initial table load given that the DM device's request_queue is created prior, that it best to requeue IO to handle this unlikely case. Reported-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 13:55:52 -05:00
Barry Song	a6a4901a5e	dm io: remove stale comment block for dm_io() Commit `7eaceaccab` ("block: remove per-queue plugging") dropped unplug_delay and blk_unplug(). Plus, the current kernel has no fundamental difference between sync_io() and async_io() except sync_io() uses sync_io_complete() as the notify.fn and explicitly calls wait_for_completion_io() to sync. The comment isn't valid any more. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 13:55:51 -05:00
Zhiqiang Liu	75274a4bf2	dm thin metadata: remove unused dm_thin_remove_block and __remove Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 13:55:50 -05:00
Wang Qing	8ca8b1e147	dm thin: use time_is_before_jiffies instead of open coding it Use time_is_before_jiffies() to improve code readability. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 11:30:50 -05:00
Aashish Sharma	6fc5150438	dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS Explicitly convert unsigned int in the right of the conditional expression to int to match the left side operand and the return type, fixing the following compiler warning: drivers/md/dm-crypt.c:2593:43: warning: signed and unsigned type in conditional expression [-Wsign-compare] Fixes: `c538f6ec9f` ("dm crypt: add ability to use keys from the kernel key retention service") Signed-off-by: Aashish Sharma <shraash@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 11:27:56 -05:00
Kirill Tkhai	588b7f5df0	dm: fix use-after-free in dm_cleanup_zoned_dev() dm_cleanup_zoned_dev() uses queue, so it must be called before blk_cleanup_disk() starts its killing: blk_cleanup_disk->blk_cleanup_queue()->kobject_put()->blk_release_queue()-> ->...RCU...->blk_free_queue_rcu()->kmem_cache_free() Otherwise, RCU callback may be executed first and dm_cleanup_zoned_dev() will touch free'd memory: BUG: KASAN: use-after-free in dm_cleanup_zoned_dev+0x33/0xd0 Read of size 8 at addr ffff88805ac6e430 by task dmsetup/681 CPU: 4 PID: 681 Comm: dmsetup Not tainted 5.17.0-rc2+ #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1f/0x150 ? dm_cleanup_zoned_dev+0x33/0xd0 kasan_report.cold+0x7f/0x11b ? dm_cleanup_zoned_dev+0x33/0xd0 dm_cleanup_zoned_dev+0x33/0xd0 __dm_destroy+0x26a/0x400 ? dm_blk_ioctl+0x230/0x230 ? up_write+0xd8/0x270 dev_remove+0x156/0x1d0 ctl_ioctl+0x269/0x530 ? table_clear+0x140/0x140 ? lock_release+0xb2/0x750 ? remove_all+0x40/0x40 ? rcu_read_lock_sched_held+0x12/0x70 ? lock_downgrade+0x3c0/0x3c0 ? rcu_read_lock_sched_held+0x12/0x70 dm_ctl_ioctl+0xa/0x10 __x64_sys_ioctl+0xb9/0xf0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fb6dfa95c27 Fixes: `bb37d77239` ("dm: introduce zone append emulation") Cc: stable@vger.kernel.org Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 11:13:28 -05:00
Jordy Zomer	cd9c88da17	dm ioctl: prevent potential spectre v1 gadget It appears like cmd could be a Spectre v1 gadget as it's supplied by a user and used as an array index. Prevent the contents of kernel memory from being leaked to userspace via speculative execution by using array_index_nospec. Signed-off-by: Jordy Zomer <jordy@pwning.systems> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 11:04:41 -05:00
Thore Sommer	118f31b496	dm ima: fix wrong length calculation for no_data string All entries measured by dm ima are prefixed by a version string (dm_version=N.N.N). When there is no data to measure, the entire buffer is overwritten with a string containing the version string again and the length of that string is added to the length of the version string. The new length is now wrong because it contains the version string twice. This caused entries like this: dm_version=4.45.0;name=test,uuid=test;table_clear=no_data; \ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \ current_device_capacity=204808; Signed-off-by: Thore Sommer <public@thson.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 10:42:41 -05:00
Colin Ian King	302f035141	dm cache policy smq: make static read-only array table const The 'table' static array is read-only so it make sense to make it const. Add in the int type to clean up checkpatch warning. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-22 10:35:53 -05:00
Mike Snitzer	c357342186	dm delay: use dm_submit_bio_remap Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:34 -05:00
Mike Snitzer	e5524e128f	dm crypt: use dm_submit_bio_remap Care was taken to support kcryptd_io_read being called from crypt_map or workqueue. Use of an intermediate CRYPT_MAP_READ_GFP gfp_t (defined as GFP_NOWAIT) should protect from maintenance burden if that flag were to change for some reason. Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:34 -05:00
Mike Snitzer	0fbb4d93b3	dm: add dm_submit_bio_remap interface Where possible, switch from early bio-based IO accounting (at the time DM clones each incoming bio) to late IO accounting just before each remapped bio is issued to underlying device via submit_bio_noacct(). Allows more precise bio-based IO accounting for DM targets that use their own workqueues to perform additional processing of each bio in conjunction with their DM_MAPIO_SUBMITTED return from their map function. When a target is updated to use dm_submit_bio_remap() they must also set ti->accounts_remapped_io to true. Use xchg() in start_io_acct(), as suggested by Mikulas, to ensure each IO is only started once. The xchg race only happens if __send_duplicate_bios() sends multiple bios -- that case is reflected via tio->is_duplicate_bio. Given the niche nature of this race, it is best to avoid any xchg performance penalty for normal IO. For IO that was never submitted with dm_bio_submit_remap(), but the target completes the clone with bio_endio, accounting is started then ended and pending_io counter decremented. Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:33 -05:00
Mike Snitzer	e6fc9f62ce	dm: flag clones created by __send_duplicate_bios Formally disallow dm_accept_partial_bio() on clones created by __send_duplicate_bios() because their len_ptr points to a shared unsigned int. __send_duplicate_bios() is only used for flush bios and other "abnormal" bios (discards, writezeroes, etc). And dm_accept_partial_bio() already didn't support flush bios. Also refactor __send_changing_extent_only() to reflect it cannot fail. As such __send_changing_extent_only() can update the clone_info before __send_duplicate_bios() is called to fan-out __map_bio() calls. Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:32 -05:00
Mike Snitzer	300432f58b	dm: reduce dm_io and dm_target_io struct sizes Remove one 4 byte hole in dm_io struct. Remove two 4 byte holes in dm_target_io struct. Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:31 -05:00
Mike Snitzer	018b05ebbf	dm: move duplicate code from callers of alloc_tio into alloc_tio Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:30 -05:00
Mike Snitzer	743598f049	dm: record old_sector in dm_target_io before calling map function Prep for being able to defer trace_block_bio_remap() until when the bio is remapped and submitted by the DM target. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:29 -05:00
Mike Snitzer	77c11720a4	dm: remove legacy code only needed before submit_bio recursion Commit `8615cb65bd` ("dm: remove useless loop in __split_and_process_bio") showcased that we no longer loop. Remove the bio_advance() in __split_and_process_bio() that was only needed when looping was possible. Similarly there is no need to advance the bio, using ci->sector cursor, in __send_duplicate_bios(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:28 -05:00
Mike Snitzer	0119ab14c3	dm: remove unused mapped_device argument from free_tio Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:27 -05:00
Mike Snitzer	5b27b8ddbf	dm: remove impossible BUG_ON in __send_empty_flush The flush_bio in question was just initialized to be empty, so there is no way bio_has_data() will return true. So remove stale BUG_ON(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:26 -05:00
Mike Snitzer	90a2326ede	dm: reduce code duplication in __map_bio Error path code (for handling DM_MAPIO_REQUEUE and DM_MAPIO_KILL) is effectively identical. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:25 -05:00
Mike Snitzer	d41e077ab6	dm: refactor dm_split_and_process_bio a bit Remove needless branching and indentation. Leaves code to catch malformed op_is_zone_mgmt bios (they shouldn't have a payload). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:24 -05:00
Mike Snitzer	66bdaa4302	dm: fold __clone_and_map_data_bio into __split_and_process_bio Fold __clone_and_map_data_bio into its only caller. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:23 -05:00
Mike Snitzer	96c9865cb6	dm: rename split functions Rename __split_and_process_bio to dm_split_and_process_bio. Rename __split_and_process_non_flush to __split_and_process_bio. Also fix a stale comment and whitespace. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:22 -05:00
Mike Snitzer	205649d84c	dm: reorder members in mapped_device struct Improves alignment and groups related members relative to cachelines. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:21 -05:00
Mike Snitzer	0ab30b4079	dm: eliminate copying of dm_io fields in dm_io_dec_pending There is no need for dm_io_dec_pending() to copy dm_io fields anymore now that DM provides its own pending_io counters again. The race documented in commit `d208b89401` ("dm: fix mempool NULL pointer race when completing IO") no longer exists now that block core's in_flight counters aren't used to signal all dm_io is complete. Also, rename {start,end}_io_acct to dm_{start,end}_io_acct. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:36:18 -05:00
Mike Snitzer	0cdb90f0f3	dm stats: fix too short end duration_ns when using precise_timestamps dm_stats_account_io()'s STAT_PRECISE_TIMESTAMPS support doesn't handle the fact that with commit `b879f915bc` ("dm: properly fix redundant bio-based IO accounting") io->start_time _may_ be in the past (meaning the start_io_acct() was deferred until later). Add a new dm_stats_recalc_precise_timestamps() helper that will set/clear a new 'precise_timestamps' flag in the dm_stats struct based on whether any configured stats enable STAT_PRECISE_TIMESTAMPS. And update DM core's alloc_io() to use dm_stats_record_start() to set stats_aux.duration_ns if stats->precise_timestamps is true. Also, remove unused 'last_sector' and 'last_rw' members from the dm_stats struct. Fixes: `b879f915bc` ("dm: properly fix redundant bio-based IO accounting") Cc: stable@vger.kernel.org Co-developed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:35:39 -05:00
Mike Snitzer	8d394bc4ad	dm: fix double accounting of flush with data DM handles a flush with data by first issuing an empty flush and then once it completes the REQ_PREFLUSH flag is removed and the payload is issued. The problem fixed by this commit is that both the empty flush bio and the data payload will account the full extent of the data payload. Fix this by factoring out dm_io_acct() and having it wrap all IO accounting to set the size of bio with REQ_PREFLUSH to 0, account the IO, and then restore the original size. Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:35:25 -05:00
Mike Snitzer	9f6dc63376	dm: interlock pending dm_io and dm_wait_for_bios_completion Commit `d208b89401` ("dm: fix mempool NULL pointer race when completing IO") didn't go far enough. When bio_end_io_acct ends the count of in-flight I/Os may reach zero and the DM device may be suspended. There is a possibility that the suspend races with dm_stats_account_io. Fix this by adding percpu "pending_io" counters to track outstanding dm_io. Move kicking of suspend queue to dm_io_dec_pending(). Also, rename md_in_flight_bios() to dm_in_flight_bios() and update it to iterate all pending_io counters. Fixes: `d208b89401` ("dm: fix mempool NULL pointer race when completing IO") Cc: stable@vger.kernel.org Co-developed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-02-21 15:19:08 -05:00
Christoph Hellwig	7a5428dcb7	block: fix surprise removal for drivers calling blk_set_queue_dying Various block drivers call blk_set_queue_dying to mark a disk as dead due to surprise removal events, but since commit `8e141f9eb8` that doesn't work given that the GD_DEAD flag needs to be set to stop I/O. Replace the driver calls to blk_set_queue_dying with a new (and properly documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the only remaining caller. Fixes: `8e141f9eb8` ("block: drain file system I/O on del_gendisk") Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-17 07:54:03 -07:00
Christoph Hellwig	9f9adea718	dm: remove dm_dispatch_clone_request Fold dm_dispatch_clone_request into it's only caller, and use a switch statement to single dispatch for the handling of the different return values from blk_insert_cloned_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220215100540.3892965-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-16 19:39:10 -07:00
Christoph Hellwig	8803c89f36	dm: remove useless code from dm_dispatch_clone_request Both ->start_time_ns and the RQF_IO_STAT are set when the request is allocated using blk_mq_alloc_request by dm-mpath in blk_mq_rq_ctx_init. The block layer also ensures ->start_time_ns is only set when actually needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220215100540.3892965-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-16 19:39:10 -07:00
Christoph Hellwig	28db4711bf	blk-mq: remove the request_queue argument to blk_insert_cloned_request The request must be submitted to the queue it was allocated for, so remove the extra request_queue argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220215100540.3892965-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-16 19:39:10 -07:00
Christoph Hellwig	248c793359	blk-mq: make the blk-mq stacking code optional The code to stack blk-mq drivers is only used by dm-multipath, and will preferably stay that way. Make it optional and only selected by device mapper, so that the buildbots more easily catch abuses like the one that slipped in in the ufs driver in the last merged window. Another positive side effects is that kernel builds without device mapper shrink a little bit as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220215100540.3892965-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-16 19:39:09 -07:00
Christoph Hellwig	abfc426d1b	block: pass a block_device to bio_clone_fast Pass a block_device to bio_clone_fast and __bio_clone_fast and give the functions more suitable names. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:18 -07:00
Christoph Hellwig	a0e8de798d	block: initialize the target bio in __bio_clone_fast All callers of __bio_clone_fast initialize the bio first. Move that initialization into __bio_clone_fast instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:18 -07:00
Christoph Hellwig	92986f6b4c	dm: use bio_clone_fast in alloc_io/alloc_tio Replace open coded bio_clone_fast implementations with the actual helper. Note that the bio allocated as part of the dm_io structure in alloc_io will only actually be used later in alloc_tio, making this earlier cloning of the information safe. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:18 -07:00
Christoph Hellwig	56b4b5abcd	block: clone crypto and integrity data in __bio_clone_fast __bio_clone_fast should also clone integrity and crypto data, as a clone without those is incomplete. Right now the only caller that can actually support crypto and integrity data (dm) does it manually for the one callchain that supports these, but we better do it properly in the core. Note that all callers except for the above mentioned one also don't need to handle failure at all, given that the integrity and crypto clones are based on mempool allocations that won't fail for sleeping allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:18 -07:00
Christoph Hellwig	3c4b455ef8	dm-cache: remove __remap_to_origin_clear_discard Fold __remap_to_origin_clear_discard into the two callers to prepare for bio cloning refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:18 -07:00
Christoph Hellwig	891fced644	dm: simplify the single bio fast path in __send_duplicate_bios Most targets just need a single flush bio. Open code that case in __send_duplicate_bios without the need to add the bio to a list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:17 -07:00
Christoph Hellwig	1d1068cecf	dm: retun the clone bio from alloc_tio Return the clone bio embedded into the tio as that is what the callers actually want. Similar for the free side. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:17 -07:00
Christoph Hellwig	1561b39610	dm: pass the bio instead of tio to __map_bio This simplifies the callers a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:17 -07:00
Christoph Hellwig	dc8e2021da	dm: move cloning the bio into alloc_tio Move the call to __bio_clone_fast and the assignment of ->len_ptr from the callers into alloc_tio to prepare for changes to the bio clone API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:17 -07:00
Christoph Hellwig	8eabf5d0a7	dm: fold __send_duplicate_bios into __clone_and_map_simple_bio Fold __send_duplicate_bios into its only caller to prepare for refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:17 -07:00
Christoph Hellwig	b1bee79237	dm: fold clone_bio into __clone_and_map_data_bio Fold clone_bio into its only caller to prepare for refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:17 -07:00
Christoph Hellwig	6c23f0bd7f	dm: add a clone_to_tio helper Add a helper to stop open coding the container_of operations to get from the clone bio to the tio structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-04 07:43:17 -07:00
Song Liu	0f9650bd83	md: fix NULL pointer deref with nowait but no mddev->queue Leon reported NULL pointer deref with nowait support: [ 15.123761] device-mapper: raid: Loading target version 1.15.1 [ 15.124185] device-mapper: raid: Ignoring chunk size parameter for RAID 1 [ 15.124192] device-mapper: raid: Choosing default region size of 4MiB [ 15.129524] BUG: kernel NULL pointer dereference, address: 0000000000000060 [ 15.129530] #PF: supervisor write access in kernel mode [ 15.129533] #PF: error_code(0x0002) - not-present page [ 15.129535] PGD 0 P4D 0 [ 15.129538] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 15.129541] CPU: 5 PID: 494 Comm: ldmtool Not tainted 5.17.0-rc2-1-mainline #1 9fe89d43dfcb215d2731e6f8851740520778615e [ 15.129546] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F36e 10/14/2021 [ 15.129549] RIP: 0010:blk_queue_flag_set+0x7/0x20 [ 15.129555] Code: 00 00 00 0f 1f 44 00 00 48 8b 35 e4 e0 04 02 48 8d 57 28 bf 40 01 \ 00 00 e9 16 c1 be ff 66 0f 1f 44 00 00 0f 1f 44 00 00 89 ff <f0> 48 0f ab 7e 60 \ 31 f6 89 f7 c3 66 66 2e 0f 1f 84 00 00 00 00 00 [ 15.129559] RSP: 0018:ffff966b81987a88 EFLAGS: 00010202 [ 15.129562] RAX: ffff8b11c363a0d0 RBX: ffff8b11e294b070 RCX: 0000000000000000 [ 15.129564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d [ 15.129566] RBP: ffff8b11e294b058 R08: 0000000000000000 R09: 0000000000000000 [ 15.129568] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11e294b070 [ 15.129570] R13: 0000000000000000 R14: ffff8b11e294b000 R15: 0000000000000001 [ 15.129572] FS: 00007fa96e826780(0000) GS:ffff8b18deb40000(0000) knlGS:0000000000000000 [ 15.129575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.129577] CR2: 0000000000000060 CR3: 000000010b8ce000 CR4: 00000000003506e0 [ 15.129580] Call Trace: [ 15.129582] <TASK> [ 15.129584] md_run+0x67c/0xc70 [md_mod 1e470c1b6bcf1114198109f42682f5a2740e9531] [ 15.129597] raid_ctr+0x134a/0x28ea [dm_raid 6a645dd7519e72834bd7e98c23497eeade14cd63] [ 15.129604] ? dm_split_args+0x63/0x150 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129615] dm_table_add_target+0x188/0x380 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129625] table_load+0x13b/0x370 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129635] ? dev_suspend+0x2d0/0x2d0 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129644] ctl_ioctl+0x1bd/0x460 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129655] dm_ctl_ioctl+0xa/0x20 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e] [ 15.129663] __x64_sys_ioctl+0x8e/0xd0 [ 15.129667] do_syscall_64+0x5c/0x90 [ 15.129672] ? syscall_exit_to_user_mode+0x23/0x50 [ 15.129675] ? do_syscall_64+0x69/0x90 [ 15.129677] ? do_syscall_64+0x69/0x90 [ 15.129679] ? syscall_exit_to_user_mode+0x23/0x50 [ 15.129682] ? do_syscall_64+0x69/0x90 [ 15.129684] ? do_syscall_64+0x69/0x90 [ 15.129686] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 15.129689] RIP: 0033:0x7fa96ecd559b [ 15.129692] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c \ c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff \ ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48 [ 15.129696] RSP: 002b:00007ffcaf85c258 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [ 15.129699] RAX: ffffffffffffffda RBX: 00007fa96f1b48f0 RCX: 00007fa96ecd559b [ 15.129701] RDX: 00007fa97017e610 RSI: 00000000c138fd09 RDI: 0000000000000003 [ 15.129702] RBP: 00007fa96ebab583 R08: 00007fa97017c9e0 R09: 00007ffcaf85bf27 [ 15.129704] R10: 0000000000000001 R11: 0000000000000206 R12: 00007fa97017e610 [ 15.129706] R13: 00007fa97017e640 R14: 00007fa97017e6c0 R15: 00007fa97017e530 [ 15.129709] </TASK> This is caused by missing mddev->queue check for setting QUEUE_FLAG_NOWAIT Fix this by moving the QUEUE_FLAG_NOWAIT logic to under mddev->queue check. Fixes: `f51d46d0e7` ("md: add support for REQ_NOWAIT") Reported-by: Leon Möller <jkhsjdhjs@totally.rip> Tested-by: Leon Möller <jkhsjdhjs@totally.rip> Cc: Vishal Verma <vverma@digitalocean.com> Signed-off-by: Song Liu <song@kernel.org>	2022-02-02 10:14:07 -08:00
Christoph Hellwig	a7c50c9404	block: pass a block_device and opf to bio_reset Pass the block_device that we plan to use this bio for and the operation to bio_reset to optimize the assigment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:50:00 -07:00
Christoph Hellwig	49add4966d	block: pass a block_device and opf to bio_init Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Christoph Hellwig	07888c665b	block: pass a block_device and opf to bio_alloc Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Christoph Hellwig	609be10667	block: pass a block_device and opf to bio_alloc_bioset Pass the block_device and operation that we plan to use this bio for to bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Christoph Hellwig	28d7d128aa	dm-thin: use blkdev_issue_flush instead of open coding it Use blkdev_issue_flush, which uses an on-stack bio instead of an opencoded version with a bio embedded into struct pool. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Christoph Hellwig	eba33b8ef1	dm-snap: use blkdev_issue_flush instead of open coding it Use blkdev_issue_flush, which uses an on-stack bio instead of an opencoded version with a bio embedded into struct dm_snapshot. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Christoph Hellwig	3f868c09ea	dm-crypt: remove clone_init Just open code it next to the bio allocations, which saves a few lines of code, prepares for future changes and allows to remove the duplicate bi_opf assignment for the bio_clone_fast case in kcryptd_io_read. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Christoph Hellwig	53db984e00	dm: bio_alloc can't fail if it is allowed to sleep Remove handling of NULL returns from sleeping bio_alloc calls given that those can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Christoph Hellwig	322cbb50de	block: remove genhd.h There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Mike Snitzer	b879f915bc	dm: properly fix redundant bio-based IO accounting Record the start_time for a bio but defer the starting block core's IO accounting until after IO is submitted using bio_start_io_acct_time(). This approach avoids the need to mess around with any of the individual IO stats in response to a bio_split() that follows bio submission. Reported-by: Bud Brown <bubrown@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Depends-on: `e45c47d1f9` ("block: add bio_start_io_acct_time() to control start_time") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-4-snitzer@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-28 12:28:15 -07:00
Mike Snitzer	f524d9c95f	dm: revert partial fix for redundant bio-based IO accounting Reverts `a1e1cb72d9` ("dm: fix redundant IO accounting for bios that need splitting") because it was too narrow in scope (only addressed redundant 'sectors[]' accounting and not ios, nsecs[], etc). Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-28 12:28:15 -07:00
Greg Kroah-Hartman	fa97cb843c	bcache: use default_groups in kobj_type There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the bcache sysfs code to use default_groups field which has been the preferred way since `aa30f47cf6` ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: linux-bcache@vger.kernel.org Acked-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220106100004.3277439-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-26 15:56:18 +01:00
Linus Torvalds	3acbdbf42e	dax + libnvdimm for v5.17 - Simplify the dax_operations API - Eliminate bdev_dax_pgoff() in favor of the filesystem maintaining and applying a partition offset to all its DAX iomap operations. - Remove wrappers and device-mapper stacked callbacks for ->copy_from_iter() and ->copy_to_iter() in favor of moving block_device relative offset responsibility to the dax_direct_access() caller. - Remove the need for an @bdev in filesystem-DAX infrastructure - Remove unused uio helpers copy_from_iter_flushcache() and copy_mc_to_iter() as only the non-check_copy_size() versions are used for DAX. - Prepare XFS for the pending (next merge window) DAX+reflink support - Remove deprecated DEV_DAX_PMEM_COMPAT support - Cleanup a straggling misuse of the GUID api Tags offered after the branch was cut: Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/Ydb/3P+8nvjCjYfO@redhat.com -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYd3dTAAKCRDfioYZHlFs Z//UAP9zetoTE+O7zJG7CXja4jSopSadbdbh6QKSXaqfKBPvQQD+N4US3wA2bGv8 f/qCY62j2Hj3hUTGHs9RvTyw3JsSYAA= =QvDs -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax and libnvdimm updates from Dan Williams: "The bulk of this is a rework of the dax_operations API after discovering the obstacles it posed to the work-in-progress DAX+reflink support for XFS and other copy-on-write filesystem mechanics. Primarily the need to plumb a block_device through the API to handle partition offsets was a sticking point and Christoph untangled that dependency in addition to other cleanups to make landing the DAX+reflink support easier. The DAX_PMEM_COMPAT option has been around for 4 years and not only are distributions shipping userspace that understand the current configuration API, but some are not even bothering to turn this option on anymore, so it seems a good time to remove it per the deprecation schedule. Recall that this was added after the device-dax subsystem moved from /sys/class/dax to /sys/bus/dax for its sysfs organization. All recent functionality depends on /sys/bus/dax. Some other miscellaneous cleanups and reflink prep patches are included as well. Summary: - Simplify the dax_operations API: - Eliminate bdev_dax_pgoff() in favor of the filesystem maintaining and applying a partition offset to all its DAX iomap operations. - Remove wrappers and device-mapper stacked callbacks for ->copy_from_iter() and ->copy_to_iter() in favor of moving block_device relative offset responsibility to the dax_direct_access() caller. - Remove the need for an @bdev in filesystem-DAX infrastructure - Remove unused uio helpers copy_from_iter_flushcache() and copy_mc_to_iter() as only the non-check_copy_size() versions are used for DAX. - Prepare XFS for the pending (next merge window) DAX+reflink support - Remove deprecated DEV_DAX_PMEM_COMPAT support - Cleanup a straggling misuse of the GUID api" * tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits) iomap: Fix error handling in iomap_zero_iter() ACPI: NFIT: Import GUID before use dax: remove the copy_from_iter and copy_to_iter methods dax: remove the DAXDEV_F_SYNC flag dax: simplify dax_synchronous and set_dax_synchronous uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() iomap: turn the byte variable in iomap_zero_iter into a ssize_t memremap: remove support for external pgmap refcounts fsdax: don't require CONFIG_BLOCK iomap: build the block based code conditionally dax: fix up some of the block device related ifdefs fsdax: shift partition offset handling into the file systems dax: return the partition offset from fs_dax_get_by_bdev iomap: add a IOMAP_DAX flag xfs: pass the mapping flags to xfs_bmbt_to_iomap xfs: use xfs_direct_write_iomap_ops for DAX zeroing xfs: move dax device handling into xfs_{alloc,free}_buftarg ext4: cleanup the dax handling in ext4_fill_super ext2: cleanup the dax handling in ext2_fill_super fsdax: decouple zeroing from the iomap buffered I/O code ...	2022-01-12 15:46:11 -08:00
Linus Torvalds	49008f0cc1	- Fixes and improvements to dm btree and dm space map code in persistent-data library used by thinp and cache. - Update DM integrity to use struct_group() to zero struct journal_sector. - Update DM sysfs to use default_groups in kobj_type. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmHe/s4ACgkQxSPxCi2d A1rlwggA4esorVQSM86KcoHjRqatof1NBPoAdTPt1BykO9wl/HBCn4YGOPJVWkoP Dwe1PUvlH0ZbpnW/J7IrMpdT+IyXJVzQSmFJsQTFjMWXQM2OCDKz5YJc4KQzFBda Hl30kChoot58kxtZ29FLz4t3cpXG8lkB3tz3O/Z2McGQTymNM6Ekbv7GXwX5jfDo QpRsXEJ2L2jyVFxwE6kCIpDIiOI0a3+3OvHQneTGunCnhqZ6N3ii8llKbw/OAzHI jE+q+tJ6NeNMVMdETenrFXSBcbxo07VG87hWj62XbGzHaeYE5vWu1UCWPjx2kqS3 +Co7+q02v925cThrRGftc5xQlB88+Q== =YgBt -----END PGP SIGNATURE----- Merge tag 'for-5.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Fixes and improvements to dm btree and dm space map code in persistent-data library used by thinp and cache. - Update DM integrity to use struct_group() to zero struct journal_sector. - Update DM sysfs to use default_groups in kobj_type. * tag 'for-5.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm sysfs: use default_groups in kobj_type dm integrity: Use struct_group() to zero struct journal_sector dm space map common: add bounds check to sm_ll_lookup_bitmap() dm btree: add a defensive bounds check to insert_at() dm btree remove: change a bunch of BUG_ON() calls to proper errors dm btree spine: eliminate duplicate le32_to_cpu() in node_check() dm btree spine: remove extra node_check function declaration	2022-01-12 10:40:11 -08:00
Linus Torvalds	c9193f48e9	for-5.17/drivers-2022-01-11 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8EIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnOKEADGpxp+Vntbm8nZI/PFP5fA2gUTZWgSVB4l axVTYW21pjSrsrAhGg2FIgBgL0tNkgxQnIPRn50YL8jT3pTkCEcR7kLbhEU7W/Ln 7hrsBgFnsCBoCs38LvzXHZD69jtEtNRk1ijPMLo5iCcHkAyUVKa1glfeMwefuI5/ Rl8SoueRXppvCfwNPptaAKiDsYVN8KCJPvvhlMNoKP5n1iTsNYJ/HVsLqfRnP0oc CR6eHaYceWGLER8tWtBlG2Qp40+cd/A320thkIlEpEKJPWE/ce5AUp0PYxVJbwjU qvO1tMYSya7gPiaVWRJcUeAgRFiivM/kTdDrGwiY9hpv/BQG7EAW5D9Xecz/M4UG BgNLfhe0aR9QssjPxITgyiy9sRpwwpnpoVONTu3slgXVTUVlOq0QT6LOTPR1B9A4 ZjbHVCuI3eyrAOqD4IjYSqjHa6GjFLiKTh8Q0ZB/KJGX1eItLVLVdJfcfV4RkBIf 6RZg9+7/mXaDxU74DZ2tfUhHT0sC5RS+5VFxpkhThVk9qRbVdZGGWAHcVOkMjk9B L4PCpJeuaR+rzXvCDOCOI5sHraa5F/IRhMaTu5sHj/MIuEpq1fqjaB7tWRvfm6HO 4tepUtb++rS3/zFFQlZCLyjVk2o0p2b0viwPLjvsRqsBp1bVoO9mJIiyp6POmM3G UjxQS0vEDw== =k0IZ -----END PGP SIGNATURE----- Merge tag 'for-5.17/drivers-2022-01-11' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: - mtip32xx pci cleanups (Bjorn) - mtip32xx conversion to generic power management (Vaibhav) - rsxx pci powermanagement cleanups (Bjorn) - Remove the rsxx driver. This hardware never saw much adoption, and it's been end of lifed for a while. (Christoph) - MD pull request from Song: - REQ_NOWAIT support (Vishal Verma) - raid6 benchmark optimization (Dirk Müller) - Fix for acct bioset (Xiao Ni) - Clean up max_queued_requests (Mariusz Tkaczyk) - PREEMPT_RT optimization (Davidlohr Bueso) - Use default_groups in kobj_type (Greg Kroah-Hartman) - Use attribute groups in pktcdvd and rnbd (Greg) - NVMe pull request from Christoph: - increment request genctr on completion (Keith Busch, Geliang Tang) - add a 'iopolicy' module parameter (Hannes Reinecke) - print out valid arguments when reading from /dev/nvme-fabrics (Hannes Reinecke) - Use struct_group() in drbd (Kees) - null_blk fixes (Ming) - Get rid of congestion logic in pktcdvd (Neil) - Floppy ejection hang fix (Tasos) - Floppy max user request size fix (Xiongwei) - Loop locking fix (Tetsuo) * tag 'for-5.17/drivers-2022-01-11' of git://git.kernel.dk/linux-block: (32 commits) md: use default_groups in kobj_type md: Move alloc/free acct bioset in to personality lib/raid6: Use strict priority ranking for pq gen() benchmarking lib/raid6: skip benchmark of non-chosen xor_syndrome functions md: fix spelling of "its" md: raid456 add nowait support md: raid10 add nowait support md: raid1 add nowait support md: add support for REQ_NOWAIT md: drop queue limitation for RAID1 and RAID10 md/raid5: play nice with PREEMPT_RT block/rnbd-clt-sysfs: use default_groups in kobj_type pktcdvd: convert to use attribute groups block: null_blk: only set set->nr_maps as 3 if active poll_queues is > 0 nvme: add 'iopolicy' module parameter nvme: drop unused variable ctrl in nvme_setup_cmd nvme: increment request genctr on completion nvme-fabrics: print out valid arguments when reading from /dev/nvme-fabrics block: remove the rsxx driver rsxx: Drop PCI legacy power management ...	2022-01-12 10:35:23 -08:00
Linus Torvalds	d3c8108035	for-5.17/block-2022-01-11 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8DAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnhRD/wMAjsNO65PCA+o/bPpVi4ulx9EejAzrJnB 5vHFvREAoOOGKvRpYGe4w3TcKyW+zPb+GtlXFjPfK+wuVzWhrQtW/+vkjKlBt8wK o7rzeMwTKJ9ZGvYaaQpp1yC0WURBB3qnCRQhb8dOQzhJgEXinhIOznZsut4mniLv fTqcDmKAb/+G6K6CQCCqnH0I/+OJZyUeSFo1kk2i4ZqCBepQpBkOL6H2rBOtGxUg bt1jiGHbbhCRYEE3u2kV0HP10qAChNaMQC705jV4Qpf4+3EntSxs+6nSb74dvMkX 3+Wmp8Ctq6lpPnDL1nrAFGz3jZnB0Y+GdgOclQn3ViQd1FCXZzuYWQ3fTaBfURCZ /RE5nc047SqpwCFLOynM++OkaeQZ1zSxeyoFTtzDaPF4tLuaX3JHswvTzNGPw8SN BnexseNnNBCjJliZSEE7fOkjJDcev2dvRxPtI8/wkF4lHUgETc5IW563C53xo/Tx 32yFjZwCVIpNWk21su/0H3iEq80wZ7PnriiN/E3JA6XbnevlRPu0NPMb0D258GCm yCcdPVDNZsQCB8hluqZcu0g6LSgZRo90Yg1oqKqEpAllJJMBaEAPPPuUIJh998mo iKGxZzgr7d9jrbGJTInp0F8b3B3/oV/hxgzy0Hu/mHP3AsnaAk9o/oEQZ7rX4Khr 6biloqkIMA== =RWnJ -----END PGP SIGNATURE----- Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Unify where the struct request handling code is located in the blk-mq code (Christoph) - Header cleanups (Christoph) - Clean up the io_context handling code (Christoph, me) - Get rid of ->rq_disk in struct request (Christoph) - Error handling fix for add_disk() (Christoph) - request allocation cleanusp (Christoph) - Documentation updates (Eric, Matthew) - Remove trivial crypto unregister helper (Eric) - Reduce shared tag overhead (John) - Reduce poll_stats memory overhead (me) - Known indirect function call for dio (me) - Use atomic references for struct request (me) - Support request list issue for block and NVMe (me) - Improve queue dispatch pinning (Ming) - Improve the direct list issue code (Keith) - BFQ improvements (Jan) - Direct completion helper and use it in mmc block (Sebastian) - Use raw spinlock for the blktrace code (Wander) - fsync error handling fix (Ye) - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me) * tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits) MAINTAINERS: add entries for block layer documentation docs: block: remove queue-sysfs.rst docs: sysfs-block: document virt_boundary_mask docs: sysfs-block: document stable_writes docs: sysfs-block: fill in missing documentation from queue-sysfs.rst docs: sysfs-block: add contact for nomerges docs: sysfs-block: sort alphabetically docs: sysfs-block: move to stable directory block: don't protect submit_bio_checks by q_usage_counter block: fix old-style declaration nvme-pci: fix queue_rqs list splitting block: introduce rq_list_move block: introduce rq_list_for_each_safe macro block: move rq_list macros to blk-mq.h block: drop needless assignment in set_task_ioprio() block: remove unnecessary trailing '\' bio.h: fix kernel-doc warnings block: check minor range in device_add_disk() block: use "unsigned long" for blk_validate_block_size(). block: fix error unwinding in device_add_disk ...	2022-01-12 10:26:52 -08:00
Linus Torvalds	35632d92ef	block-5.16-2022-01-07 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHYRcAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplYyEACwEaL/JVb1fYf5fv7nQiVQPQxDhIFwMB66 1bBV5y3SoQmq0ZO0mEkb3SIProbEVadnTOjcF8+8SB6K0qLefhhBfTRB9jnIfQqL qTGbIMcWv0q2wnBGnRIv1ADzu6KkRT+pXLW9RgLs8ZbQDxH2w9tMSvRYeBKV+xHp J6PPNp5OjuA+Obd8fi7ILNTDh5EIkWQBpZNoByyAUmz9iH8InsALhM3aae2jPbO+ uWdPTxK7QKUvGIS+mfnTF/F+pURyQDwrPLirxq8yNfiAe++ZaryLb6jQcF/m3yKP s1FlGZrYsbRhroRJxa1uRZzVkenHbWTSea/iqFQdf+rJs4RASOr2/lHimorWKt0H HfV+xAcVmZR775KINh+zwImuAvB5ymOyUePUMeAvDyaAeiPsQGQ+VuylNFBvcHNg JvEmSaQfKw5H0qAUyIN70v3bqeOaPF7b50Kyg35a9RbELAvMeVIASokIjmJJ2ndb uVdPo+9FozwDFQ4iMbdJ4fjl9XHKxSKxJKaPRfzz+Tfbx1QSKXkcepT8b8WJ0S7O yweuLRwrlrUW1kFMbAS9sspL94z0o1rF61FjRJdd24D5YnGTqUK9Ciok2mpYqokk jBQX05+nyiyKwrPYJfblDxnQGLzv5VCkhkGmpARla0J1si9OrK9Oxp1lq9Q1XeWY EdO4WmGMDA== =Liaw -----END PGP SIGNATURE----- Merge tag 'block-5.16-2022-01-07' of git://git.kernel.dk/linux-block Pull block fix from Jens Axboe: "Just the md bitmap regression this time" * tag 'block-5.16-2022-01-07' of git://git.kernel.dk/linux-block: md/raid1: fix missing bitmap update w/o WriteMostly devices	2022-01-07 13:28:20 -08:00
Greg Kroah-Hartman	1745e857e7	md: use default_groups in kobj_type There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the md rdev sysfs code to use default_groups field which has been the preferred way since commit `aa30f47cf6` ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Cc: Song Liu <song@kernel.org> Cc: linux-raid@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Song Liu <song@kernel.org>	2022-01-06 10:42:50 -08:00
Xiao Ni	0c031fd37f	md: Move alloc/free acct bioset in to personality bioset acct is only needed for raid0 and raid5. Therefore, md_run only allocates it for raid0 and raid5. However, this does not cover personality takeover, which may cause uninitialized bioset. For example, the following repro steps: mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1 mdadm --wait /dev/md0 mkfs.xfs /dev/md0 mdadm /dev/md0 --grow -l5 mount /dev/md0 /mnt causes panic like: [ 225.933939] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 225.934903] #PF: supervisor instruction fetch in kernel mode [ 225.935639] #PF: error_code(0x0010) - not-present page [ 225.936361] PGD 0 P4D 0 [ 225.936677] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI [ 225.937525] CPU: 27 PID: 1133 Comm: mount Not tainted 5.16.0-rc3+ #706 [ 225.938416] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.module_el8.4.0+547+a85d02ba 04/01/2014 [ 225.939922] RIP: 0010:0x0 [ 225.940289] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [ 225.941196] RSP: 0018:ffff88815897eff0 EFLAGS: 00010246 [ 225.941897] RAX: 0000000000000000 RBX: 0000000000092800 RCX: ffffffff81370a39 [ 225.942813] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000092800 [ 225.943772] RBP: 1ffff1102b12fe04 R08: fffffbfff0b43c01 R09: fffffbfff0b43c01 [ 225.944807] R10: ffffffff85a1e007 R11: fffffbfff0b43c00 R12: ffff88810eaaaf58 [ 225.945757] R13: 0000000000000000 R14: ffff88810eaaafb8 R15: ffff88815897f040 [ 225.946709] FS: 00007ff3f2505080(0000) GS:ffff888fb5e00000(0000) knlGS:0000000000000000 [ 225.947814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 225.948556] CR2: ffffffffffffffd6 CR3: 000000015aa5a006 CR4: 0000000000370ee0 [ 225.949537] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 225.950455] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 225.951414] Call Trace: [ 225.951787] <TASK> [ 225.952120] mempool_alloc+0xe5/0x250 [ 225.952625] ? mempool_resize+0x370/0x370 [ 225.953187] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 225.953862] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 225.954464] ? sched_clock_cpu+0x15/0x120 [ 225.955019] ? find_held_lock+0xac/0xd0 [ 225.955564] bio_alloc_bioset+0x1ed/0x2a0 [ 225.956080] ? lock_downgrade+0x3a0/0x3a0 [ 225.956644] ? bvec_alloc+0xc0/0xc0 [ 225.957135] bio_clone_fast+0x19/0x80 [ 225.957651] raid5_make_request+0x1370/0x1b70 [ 225.958286] ? sched_clock_cpu+0x15/0x120 [ 225.958797] ? __lock_acquire+0x8b2/0x3510 [ 225.959339] ? raid5_get_active_stripe+0xce0/0xce0 [ 225.959986] ? lock_is_held_type+0xd8/0x130 [ 225.960528] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 225.961135] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 225.961703] ? sched_clock_cpu+0x15/0x120 [ 225.962232] ? lock_release+0x27a/0x6c0 [ 225.962746] ? do_wait_intr_irq+0x130/0x130 [ 225.963302] ? lock_downgrade+0x3a0/0x3a0 [ 225.963815] ? lock_release+0x6c0/0x6c0 [ 225.964348] md_handle_request+0x342/0x530 [ 225.964888] ? set_in_sync+0x170/0x170 [ 225.965397] ? blk_queue_split+0x133/0x150 [ 225.965988] ? __blk_queue_split+0x8b0/0x8b0 [ 225.966524] ? submit_bio_checks+0x3b2/0x9d0 [ 225.967069] md_submit_bio+0x127/0x1c0 [...] Fix this by moving alloc/free of acct bioset to pers->run and pers->free. While we are on this, properly handle md_integrity_register() error in raid0_run(). Fixes: `daee202471` (md: check level before create and exit io_acct_set) Cc: stable@vger.kernel.org Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org>	2022-01-06 08:37:03 -08:00
Randy Dunlap	dd3dc5f416	md: fix spelling of "its" Use the possessive "its" instead of the contraction "it's" in printed messages. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Song Liu <song@kernel.org> Cc: linux-raid@vger.kernel.org Signed-off-by: Song Liu <song@kernel.org>	2022-01-06 08:37:03 -08:00
Vishal Verma	bf2c411bb1	md: raid456 add nowait support Returns EAGAIN in case the raid456 driver would block waiting for reshape. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Vishal Verma <vverma@digitalocean.com> Signed-off-by: Song Liu <song@kernel.org>	2022-01-06 08:37:02 -08:00
Vishal Verma	c9aa889b03	md: raid10 add nowait support This adds nowait support to the RAID10 driver. Very similar to raid1 driver changes. It makes RAID10 driver return with EAGAIN for situations where it could wait for eg: - Waiting for the barrier, - Reshape operation, - Discard operation. wait_barrier() and regular_request_wait() fn are modified to return bool to support error for wait barriers. They returns true in case of wait or if wait is not required and returns false if wait was required but not performed to support nowait. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Vishal Verma <vverma@digitalocean.com> Signed-off-by: Song Liu <song@kernel.org>	2022-01-06 08:37:02 -08:00
Vishal Verma	5aa705039c	md: raid1 add nowait support This adds nowait support to the RAID1 driver. It makes RAID1 driver return with EAGAIN for situations where it could wait for eg: - Waiting for the barrier, wait_barrier() fn is modified to return bool to support error for wait barriers. It returns true in case of wait or if wait is not required and returns false if wait was required but not performed to support nowait. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Vishal Verma <vverma@digitalocean.com> Signed-off-by: Song Liu <song@kernel.org>	2022-01-06 08:37:02 -08:00
Vishal Verma	f51d46d0e7	md: add support for REQ_NOWAIT commit `021a24460d` ("block: add QUEUE_FLAG_NOWAIT") added support for checking whether a given bdev supports handling of REQ_NOWAIT or not. Since then commit `6abc49468e` ("dm: add support for REQ_NOWAIT and enable it for linear target") added support for REQ_NOWAIT for dm. This uses a similar approach to incorporate REQ_NOWAIT for md based bios. This patch was tested using t/io_uring tool within FIO. A nvme drive was partitioned into 2 partitions and a simple raid 0 configuration /dev/md0 was created. md0 : active raid0 nvme4n1p1[1] nvme4n1p2[0] 937423872 blocks super 1.2 512k chunks Before patch: $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100 Running top while the above runs: $ ps -eL \| grep $(pidof io_uring) 38396 38396 pts/2 00:00:00 io_uring 38396 38397 pts/2 00:00:15 io_uring 38396 38398 pts/2 00:00:13 iou-wrk-38397 We can see iou-wrk-38397 io worker thread created which gets created when io_uring sees that the underlying device (/dev/md0 in this case) doesn't support nowait. After patch: $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100 Running top while the above runs: $ ps -eL \| grep $(pidof io_uring) 38341 38341 pts/2 00:10:22 io_uring 38341 38342 pts/2 00:10:37 io_uring After running this patch, we don't see any io worker thread being created which indicated that io_uring saw that the underlying device does support nowait. This is the exact behaviour noticed on a dm device which also supports nowait. For all the other raid personalities except raid0, we would need to train pieces which involves make_request fn in order for them to correctly handle REQ_NOWAIT. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Vishal Verma <vverma@digitalocean.com> Signed-off-by: Song Liu <song@kernel.org>	2022-01-06 08:37:02 -08:00
Mariusz Tkaczyk	a92ce0feff	md: drop queue limitation for RAID1 and RAID10 As suggested by Neil Brown[1], this limitation seems to be deprecated. With plugging in use, writes are processed behind the raid thread and conf->pending_count is not increased. This limitation occurs only if caller doesn't use plugs. It can be avoided and often it is (with plugging). There are no reports that queue is growing to enormous size so remove queue limitation for non-plugged IOs too. [1] https://lore.kernel.org/linux-raid/162496301481.7211.18031090130574610495@noble.neil.brown.name Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Signed-off-by: Song Liu <song@kernel.org>	2022-01-06 08:37:02 -08:00
Davidlohr Bueso	770b1d216d	md/raid5: play nice with PREEMPT_RT raid_run_ops() relies on the implicitly disabled preemption for its percpu ops, although this is really about CPU locality. This breaks RT semantics as it can take regular (and thus sleeping) spinlocks, such as stripe_lock. Add a local_lock such that non-RT does not change and continues to be just map to preempt_disable/enable, but makes RT happy as the region will use a per-CPU spinlock and thus be preemptible and still guarantee CPU locality. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Song Liu <songliubraving@fb.com>	2022-01-06 08:37:02 -08:00
Greg Kroah-Hartman	eaac0b590a	dm sysfs: use default_groups in kobj_type There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the dm sysfs code to use default_groups field which has been the preferred way since `aa30f47cf6` ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-01-06 09:48:55 -05:00
Kees Cook	f069c7ab6c	dm integrity: Use struct_group() to zero struct journal_sector In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add struct_group() to mark region of struct journal_sector that should be initialized to zero. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-01-06 09:48:33 -05:00
Joe Thornber	cba23ac158	dm space map common: add bounds check to sm_ll_lookup_bitmap() Corrupted metadata could warrant returning error from sm_ll_lookup_bitmap(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-01-04 13:58:19 -05:00
Joe Thornber	85bca3c05b	dm btree: add a defensive bounds check to insert_at() Corrupt metadata could trigger an out of bounds write. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-01-04 13:56:03 -05:00
Joe Thornber	c671ffa55d	dm btree remove: change a bunch of BUG_ON() calls to proper errors Abuse of BUG_ON() is never appropriate, best to propagate errors to fail gracefully (rather than take the entire system down). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-01-04 13:48:12 -05:00
Joe Thornber	e36649b648	dm btree spine: eliminate duplicate le32_to_cpu() in node_check() Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-01-04 12:59:55 -05:00
Joe Thornber	851a8cd3f0	dm btree spine: remove extra node_check function declaration Should have been removed as part of commit `f73e2e70ec` ("dm btree spine: remove paranoid node_check call in node_prep_for_write()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2022-01-04 12:52:08 -05:00
Song Liu	46669e8616	md/raid1: fix missing bitmap update w/o WriteMostly devices commit [1] causes missing bitmap updates when there isn't any WriteMostly devices. Detailed steps to reproduce by Norbert (which somehow didn't make to lore): # setup md10 (raid1) with two drives (1 GByte sparse files) dd if=/dev/zero of=disk1 bs=1024k seek=1024 count=0 dd if=/dev/zero of=disk2 bs=1024k seek=1024 count=0 losetup /dev/loop11 disk1 losetup /dev/loop12 disk2 mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/loop11 /dev/loop12 # add bitmap (aka write-intent log) mdadm /dev/md10 --grow --bitmap=internal echo check > /sys/block/md10/md/sync_action root:# cat /sys/block/md10/md/mismatch_cnt 0 root:# # remove member drive disk2 (loop12) mdadm /dev/md10 -f loop12 ; mdadm /dev/md10 -r loop12 # modify degraded md device dd if=/dev/urandom of=/dev/md10 bs=512 count=1 # no blocks recorded as out of sync on the remaining member disk1/loop11 root:# mdadm -X /dev/loop11 \| grep Bitmap Bitmap : 16 bits (chunks), 0 dirty (0.0%) root:# # re-add disk2, nothing synced because of empty bitmap mdadm /dev/md10 --re-add /dev/loop12 # check integrity again echo check > /sys/block/md10/md/sync_action # disk1 and disk2 are no longer in sync, reads return differend data root:# cat /sys/block/md10/md/mismatch_cnt 128 root:# # clean up mdadm -S /dev/md10 losetup -d /dev/loop11 losetup -d /dev/loop12 rm disk1 disk2 Fix this by moving the WriteMostly check to the if condition for alloc_behind_master_bio(). [1] commit `fd3b6975e9` ("md/raid1: only allocate write behind bio for WriteMostly device") Fixes: `fd3b6975e9` ("md/raid1: only allocate write behind bio for WriteMostly device") Cc: stable@vger.kernel.org # v5.12+ Cc: Guoqing Jiang <guoqing.jiang@linux.dev> Cc: Jens Axboe <axboe@kernel.dk> Reported-by: Norbert Warmuth <nwarmuth@t-online.de> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Song Liu <song@kernel.org>	2022-01-03 14:13:48 -08:00
Christoph Hellwig	7ac5360cd4	dax: remove the copy_from_iter and copy_to_iter methods These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants and fuse and dcssblk use plain ones while device mapper picks redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-18 08:04:53 -08:00
Christoph Hellwig	30c6828a17	dax: remove the DAXDEV_F_SYNC flag Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-18 08:04:52 -08:00
Linus Torvalds	fa09ca5ebc	block-5.16-2021-12-17 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG8wJ4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpp+aD/9C6yo/8I2rbP0iJqIWKaADKoU9WGo4E4Q5 NFPsEF17IY2mxZGfvpjMMx2fWPbgfvMuXnWPyfHnFv/00Bm3sr79eQ4Y7Ak8pZWQ r0YzNz1lbgHn8S+EsqWGxWbDZisptaYN2d0skT81dTE62s2dcZc7WYHBJnCW5JZb OTwx68e5QoHrWnoHVnCx8toUszFr0OMiQVRVwbrRtJKVh+2rQODIKz2qs2ajceHo IN2Bk35W1xsv6TWGLKCK4zk9k+/Nz5Cqx+FF59bKAIs3kPy3dt10PDjN30jQGR17 DuGZe08wZ+lCMQGXd1XNcjz3GS9qWlc+mbzVpqUATNMQSGbqwW8JyheiNZwfNClw SzWasQD0KipdBiU0hOxBbDko2n4TVFHbnISfBwz0vHEz54HHHtltNd4iBowJuaq4 VG/zuD2CyWaj7p9Xpymrir1xMonVNTRFRL3mkkU8feTxqTEhy2zIix9CbKdGkU5e rEhAAb5i1+/58A8dcCh9X+zULvXjFL6FfbnCGSMrLEq0ymyysc/+w0BPNzq93Jva F8qSx2nUGq0250u+Boc+wL9uB7AXWwR8addyWWlwaAZ52nqVhciWus4gh1jn8CAV EhO+NFbx/9K1rmPkvbJau1/fgnz4+gJS8iBC0pxlyhvt6rNzMFCJB719jcdr05tG JO0131r51g== =Huz8 -----END PGP SIGNATURE----- Merge tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Fix for hammering on the delayed run queue timer (me) - bcache regression fix for this merge window (Lin) - Fix a divide-by-zero in the blk-iocost code (Tejun) * tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block: bcache: fix NULL pointer reference in cached_dev_detach_finish block: reduce kblockd_mod_delayed_work_on() CPU consumption iocost: Fix divide-by-zero on donation from low hweight cgroup	2021-12-17 11:46:07 -08:00
Linus Torvalds	81eebd5405	- Fix use after free in DM btree remove's rebalance_children(). - Fix DM integrity data corruption, introduced during 5.16 merge, due to improper use of bvec_kmap_local(). -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmG7cKcACgkQxSPxCi2d A1qgxggAn4k9cGzxLTT+xgkufQKwsc+WvegR7Amwb/jzKh5XE9KaXDnMkcyz/4GX nHCfynHRgWjPU6V3cESRz/MApG/7sQ6UGtLgIXkZGbeSHIE4aRYf3AECUhpD/uB+ XPX4kTFz0ZvIHYpk4HielOHVA31DQl+GkYXddDXCijXYmG80rpUgUg2fm0+O+TtQ eCQjbQV173KSbi4vlzeDyK9cp2rIGvk/UfmY9cIw1b3Gd5vpCVStW9r+P8MEpSNA ar5exvN9c3AR/VIVfBS/9rw0T+l56M8L0efPrSXEV9/pdiXHFzEx+sGEnDUE5F3o g9K2VwkLtuk3kubiSV/kjNBIB4cZyA== =razj -----END PGP SIGNATURE----- Merge tag 'for-5.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix use after free in DM btree remove's rebalance_children() - Fix DM integrity data corruption, introduced during 5.16 merge, due to improper use of bvec_kmap_local() * tag 'for-5.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm integrity: fix data corruption due to improper use of bvec_kmap_local dm btree remove: fix use after free in rebalance_children()	2021-12-16 10:05:49 -08:00
Mike Snitzer	1cef171abd	dm integrity: fix data corruption due to improper use of bvec_kmap_local Commit `25058d1c72` ("dm integrity: use bvec_kmap_local in __journal_read_write") didn't account for __journal_read_write() later adding the biovec's bv_offset. As such using bvec_kmap_local() caused the start of the biovec to be skipped. Trivial test that illustrates data corruption: # integritysetup format /dev/pmem0 # integritysetup open /dev/pmem0 integrityroot # mkfs.xfs /dev/mapper/integrityroot ... bad magic number bad magic number Metadata corruption detected at xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 releasing dirty buffer (bulk) to free list! Fix this by using kmap_local_page() instead of bvec_kmap_local() in __journal_read_write(). Fixes: `25058d1c72` ("dm integrity: use bvec_kmap_local in __journal_read_write") Reported-by: Tony Asleson <tasleson@redhat.com> Reviewed-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-12-15 14:16:35 -05:00
Lin Feng	aa97f6cdb7	bcache: fix NULL pointer reference in cached_dev_detach_finish Commit `0259d4498b` ("bcache: move calc_cached_dev_sectors to proper place on backing device detach") tries to fix calc_cached_dev_sectors when bcache device detaches, but now we have: cached_dev_detach_finish ... bcache_device_detach(&dc->disk); ... closure_put(&d->c->caching); d->c = NULL; [explicitly set dc->disk.c to NULL] list_move(&dc->list, &uncached_devices); calc_cached_dev_sectors(dc->disk.c); [passing a NULL pointer] ... Upper codeflows shows how bug happens, this patch fix the problem by caching dc->disk.c beforehand, and cache_set won't be freed under us because c->caching closure at least holds a reference count and closure callback __cache_set_unregister only being called by bch_cache_set_stop which using closure_queue(&c->caching), that means c->caching closure callback for destroying cache_set won't be trigger by previous closure_put(&d->c->caching). So at this stage(while cached_dev_detach_finish is calling) it's safe to access cache_set dc->disk.c. Fixes: `0259d4498b` ("bcache: move calc_cached_dev_sectors to proper place on backing device detach") Signed-off-by: Lin Feng <linf@wangsu.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20211112053629.3437-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-14 20:32:54 -07:00
zhangyue	07641b5f32	md: fix double free of mddev->private in autorun_array() In driver/md/md.c, if the function autorun_array() is called, the problem of double free may occur. In function autorun_array(), when the function do_md_run() returns an error, the function do_md_stop() will be called. The function do_md_run() called function md_run(), but in function md_run(), the pointer mddev->private may be freed. The function do_md_stop() called the function __md_stop(), but in function __md_stop(), the pointer mddev->private also will be freed without judging null. At this time, the pointer mddev->private will be double free, so it needs to be judged null or not. Signed-off-by: zhangyue <zhangyue1@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>	2021-12-10 09:11:07 -08:00
Markus Hochholdinger	55df1ce0d4	md: fix update super 1.0 on rdev size change The superblock of version 1.0 doesn't get moved to the new position on a device size change. This leads to a rdev without a superblock on a known position, the raid can't be re-assembled. The line was removed by mistake and is re-added by this patch. Fixes: `d9c0fa509e` ("md: fix max sectors calculation for super 1.0") Cc: stable@vger.kernel.org Signed-off-by: Markus Hochholdinger <markus@hochholdinger.net> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2021-12-10 09:11:07 -08:00
Christoph Hellwig	cd913c76f4	dax: return the partition offset from fs_dax_get_by_bdev Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-04 08:58:54 -08:00
Christoph Hellwig	2a68553e8a	dm-stripe: add a stripe_dax_pgoff helper Add a helper to perform the entire remapping for DAX accesses. This helper open codes bdev_dax_pgoff given that the alignment checks have already been done by the submitting file system and don't need to be repeated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-12-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-04 08:58:52 -08:00
Christoph Hellwig	d19bd6756e	dm-log-writes: add a log_writes_dax_pgoff helper Add a helper to perform the entire remapping for DAX accesses. This helper open codes bdev_dax_pgoff given that the alignment checks have already been done by the submitting file system and don't need to be repeated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-11-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-04 08:58:52 -08:00
Christoph Hellwig	f43e0065c2	dm-linear: add a linear_dax_pgoff helper Add a helper to perform the entire remapping for DAX accesses. This helper open codes bdev_dax_pgoff given that the alignment checks have already been done by the submitting file system and don't need to be repeated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-10-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-04 08:58:51 -08:00
Christoph Hellwig	7b0800d00d	dax: remove dax_capable Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-04 08:58:51 -08:00
Christoph Hellwig	fb08a1908c	dax: simplify the dax_device <-> gendisk association Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicitly calls from the block drivers that want to associate their gendisk with a dax_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-04 08:58:51 -08:00
Christoph Hellwig	5d2a228b9e	dm: make the DAX support depend on CONFIG_FS_DAX The device mapper DAX support is all hanging off a block device and thus can't be used with device dax. Make it depend on CONFIG_FS_DAX instead of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to be built under CONFIG_FS_DAX now. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-04 08:58:50 -08:00
Christoph Hellwig	d751939235	dm: fix alloc_dax error handling in alloc_dev Make sure ->dax_dev is NULL on error so that the cleanup path doesn't trip over an ERR_PTR. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211129102203.2243509-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-12-04 08:58:50 -08:00

1 2 3 4 5 ...

7012 Commits