OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Christoph Hellwig	14ef1e4826	dm mpath: merge do_end_io_bio into multipath_end_io_bio This simplifies the code and especially the error passing a bit and will help with the next patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	f98e0eb680	dm mpath: multipath_clone_and_map must not return -EIO Since `412445ac` ("dm: introduce a new DM_MAPIO_KILL return value"), the clone_and_map_rq methods must not return errno values, so fix it up to properly return DM_MAPIO_KILL, instead of the -EIO value that snuck in due to a conflict between two patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:53 -04:00
Christoph Hellwig	18a482f524	dm mpath: don't return -EIO from dm_report_EIO Instead just turn the macro into a helper for the warning message. This removes an unnecessary assignment and will allow the next commit to fix a place where -EIO is the wrong return value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:52 -04:00
Christoph Hellwig	7ed8578a96	dm rq: change ->rq_end_io calling conventions Instead of returning either a DM_ENDIO_* constant or an error code, add a new DM_ENDIO_DONE value that means keep errno as is. This allows us to easily keep the existing error code in case where we can't push back, and it also preparares for the new block level status codes with strict type checking. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-01 18:19:03 -04:00
Christoph Hellwig	b79f10eefd	dm mpath: merge do_end_io into multipath_end_io This simplifies the I/O completion path a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-01 18:19:02 -04:00
Mike Snitzer	7e25a76061	Merge branch 'dm-4.12' into dm-4.12-post-merge	2017-05-01 18:18:04 -04:00
Bart Van Assche	86331f39a5	dm mpath: make it easier to detect unintended I/O request flushes I/O errors triggered by multipathd incorrectly not enabling the no-flush flag for DM_DEVICE_SUSPEND or DM_DEVICE_RESUME are hard to debug. Add more logging to make it easier to debug this. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-27 17:08:47 -04:00
Bart Van Assche	9a8ac3ae68	dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit() No functional change but makes the code easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-27 17:08:46 -04:00
Bart Van Assche	ca5beb76c3	dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH Instead of checking MPATHF_QUEUE_IF_NO_PATH, MPATHF_SAVED_QUEUE_IF_NO_PATH and the no_flush flag to decide whether or not to push back a request (or bio) if there are no paths available, only clear MPATHF_QUEUE_IF_NO_PATH in queue_if_no_path() if no_flush has not been set. The result is that only a single bit has to be tested in the hot path to decide whether or not a request must be pushed back and also that m->lock does not have to be taken in the hot path. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-27 17:08:45 -04:00
Bart Van Assche	7e0d574f26	dm: introduce enum dm_queue_mode to cleanup related code Introduce an enumeration type for the queue mode. This patch does not change any functionality but makes the DM code easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-27 17:08:44 -04:00
Bart Van Assche	b194679fac	dm mpath: verify __pg_init_all_paths locking assumptions at runtime Verify at runtime that __pg_init_all_paths() is called with multipath.lock held if lockdep is enabled. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-27 17:08:43 -04:00
Bart Van Assche	c1d7ecf7ca	dm mpath: delay requeuing while path initialization is in progress Requeuing a request immediately while path initialization is ongoing causes high CPU usage, something that is undesired. Hence delay requeuing while path initialization is in progress. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-27 17:08:01 -04:00
Bart Van Assche	7083abbbfc	dm mpath: avoid that path removal can trigger an infinite loop If blk_get_request() fails, check whether the failure is due to a path being removed. If that is the case, fail the path by triggering a call to fail_path(). This avoids that the following scenario can be encountered while removing paths: * CPU usage of a kworker thread jumps to 100%. * Removing the DM device becomes impossible. Delay requeueing if blk_get_request() returns -EBUSY or -EWOULDBLOCK, and the queue is not dying, because in these cases immediate requeuing is inappropriate. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-27 17:04:27 -04:00
Bart Van Assche	89bfce763e	dm mpath: split and rename activate_path() to prepare for its expanded use activate_path() is renamed to activate_path_work() which now calls activate_or_offline_path(). activate_or_offline_path() will be used by the next commit. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-27 17:00:35 -04:00
Bart Van Assche	06eb061f48	dm mpath: requeue after a small delay if blk_get_request() fails If blk_get_request() returns ENODEV then multipath_clone_and_map() causes a request to be requeued immediately. This can cause a kworker thread to spend 100% of the CPU time of a single core in __blk_mq_run_hw_queue() and also can cause device removal to never finish. Avoid this by only requeuing after a delay if blk_get_request() fails. Additionally, reduce the requeue delay. Cc: stable@vger.kernel.org # 4.9+ Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-24 15:06:19 -04:00
Christoph Hellwig	8fc7798058	dm mpath: don't check for req->errors We'll get all proper errors reported through ->end_io and ->errors will go away soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-20 12:16:10 -06:00
Christoph Hellwig	ac62d6208a	dm: support REQ_OP_WRITE_ZEROES Copy & paste from the REQ_OP_WRITE_SAME code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-08 11:25:38 -06:00
Jens Axboe	818551e2b2	Merge branch 'for-4.11/next' into for-4.11/linus-merge Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-17 14:08:19 -07:00
Mike Snitzer	d19a55ccad	dm mpath: cleanup -Wbool-operation warning in choose_pgpath() Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-03 10:18:37 -05:00
Christoph Hellwig	eb8db831be	dm: always defer request allocation to the owner of the request_queue DM already calls blk_mq_alloc_request on the request_queue of the underlying device if it is a blk-mq device. But now that we allow drivers to allocate additional data and initialize it ahead of time we need to do the same for all drivers. Doing so and using the new cmd_size infrastructure in the block layer greatly simplifies the dm-rq and mpath code, and should also make arbitrary combinations of SQ and MQ devices with SQ or MQ device mapper tables easily possible as a further step. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-27 15:08:35 -07:00
tang.junhui	54cd640d20	dm mpath: use hw_handler_params if attached hw_handler is same as requested Let the requested m->hw_handler_params be used if the attached hardware handler is the same handler as requested with m->hw_handler_name. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-12-08 14:13:10 -05:00
Bart Van Assche	6599c84e4c	dm mpath: do not modify __clone if blk_mq_alloc_request() fails Purely cleanup, avoids potential for strange coding bugs. But in reality if __multipath_map() fails the caller has no business looking at __clone. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-11-21 09:52:10 -05:00
Bart Van Assche	4813577f93	dm mpath: change return type of pg_init_all_paths() from int to void None of the callers of pg_init_all_paths() check its return value. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-11-21 09:52:09 -05:00
tang.junhui	cc5bd925f1	dm mpath: add checks for priority group count to avoid invalid memory access This avoids the potential for invalid memory access, if/when there are no priority groups, in response to invalid arguments being sent by the user via DM message (e.g. "switch_group", "disable_group" or "enable_group"). Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-11-21 09:52:08 -05:00
tang.junhui	f97dc42128	dm mpath: add m->hw_handler_name NULL pointer check in parse_hw_handler() Avoids false positive of no hardware handler being specified (which is implied by a NULL m->hw_handler_name). Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-11-21 09:52:07 -05:00
Hannes Reinecke	8ff232c1a8	dm mpath: always return reservation conflict without failing over If dm-mpath encounters an reservation conflict it should not fail the path (as communication with the target is not affected) but should rather retry on another path. However, in doing so we might be inducing a ping-pong between paths, with no guarantee of any forward progress. And arguably a reservation conflict is an unexpected error, so we should be passing it upwards to allow the application to take appropriate steps. This change resolves a show-stopper problem seen with the pNFS SCSI layout because it is trivial to hit reservation conflict based failover loops without it. Doubts were raised about the implications of this change relative to products like IBM's SVC. But there is little point withholding a fix for Linux because a proprietary product may or may not have some issues in its implementation of how it interfaces with Linux. In the future, if there is glaring evidence that this change is certainly problematic we can revisit it. Signed-off-by: Hannes Reinecke <hare@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> # tweaked header	2016-09-29 10:57:07 -04:00
Mike Snitzer	b88efd43f9	dm mpath: delay the requeue of blk-mq requests while all paths down Return DM_MAPIO_DELAY_REQUEUE from .clone_and_map_rq. Also, return false from .busy, if all paths are down, so that blk-mq requests get mapped via .clone_and_map_rq -- which results in DM_MAPIO_DELAY_REQUEUE being returned to dm-rq. This change allows for a noticeable reduction in cpu utilization (reduced kworker load) while all paths are down, e.g.: system CPU idleness (as measured by fio's --idle-prof=system): before: system: 86.58% after: system: 98.60% Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>	2016-09-15 11:16:17 -04:00
Mike Snitzer	7e48c768f4	dm mpath: use dm_mq_kick_requeue_list() When reinstating a path the blk-mq request_queue's requeue_list should get kicked. It makes sense to kick the requeue_list as part of the existing hook (previously only used by bio-based support). Rename process_queued_bios_list to process_queued_io_list. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>	2016-09-15 11:16:11 -04:00
Bart Van Assche	9f4c3f874a	dm: convert wait loops to use autoremove_wake_function() Use autoremove_wake_function() instead of default_wake_function() to make the dm wait loops more similar to other wait loops in the kernel. This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-09-14 13:56:38 -04:00
Mike Snitzer	f10e06b744	dm mpath: check if path's request_queue is dying in activate_path() If pg_init_retries is set and a request is queued against a multipath device with all underlying block device request_queues in the "dying" state then an infinite loop is triggered because activate_path() never succeeds and hence never calls pg_init_done(). This change avoids that device removal triggers an infinite loop by failing the activate_path() which causes the "dying" path to be failed. Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2016-09-14 13:56:38 -04:00
Jens Axboe	1eff9d322a	block: rename bio bi_rw to bi_opf Since commit `63a4cc2486`, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-07 14:41:02 -06:00
Mike Snitzer	1814f2e3fb	dm mpath: add locking to multipath_resume and must_push_back Multiple flags were being tested without locking. Protect against non-atomic bit changes in m->flags by holding m->lock (while testing or setting the queue_if_no_path related flags). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-08-02 16:21:34 -04:00
Mike Snitzer	e83068a5fa	dm mpath: add optional "queue_mode" feature Allow a user to specify an optional feature 'queue_mode <mode>' where <mode> may be "bio", "rq" or "mq" -- which corresponds to bio-based, request_fn rq-based, and blk-mq rq-based respectively. If the queue_mode feature isn't specified the default for the "multipath" target is still "rq" but if dm_mod.use_blk_mq is set to Y it'll default to mode "mq". This new queue_mode feature introduces the ability for each multipath device to have its own queue_mode (whereas before this feature all multipath devices effectively had to have the same queue_mode). This commit also goes a long way to eliminate the awkward (ab)use of DM_TYPE_*, the associated filter_md_type() and other relatively fragile and difficult to maintain code. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-06-10 15:16:02 -04:00
Mike Snitzer	bf661be1fc	dm mpath: remove bio-based bloat from struct dm_mpath_io Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-06-10 15:15:48 -04:00
Mike Snitzer	76e33fe4e2	dm mpath: reinstate bio-based support Add "multipath-bio" target that offers a bio-based multipath target as an alternative to the request-based "multipath" target -- but in a following commit "multipath-bio" will immediately be replaced by a new "queue_mode" feature for the "multipath" target which will allow bio-based mode to be selected. When DM multipath was originally converted from bio-based to request-based the motivation for the change was better dynamic load balancing (by leveraging block core's request-based IO schedulers, for merging and sorting, _before_ DM multipath would make the decision on where to steer the IO -- based on path load and/or availability). More background is available in this "Request-based Device-mapper multipath and Dynamic load balancing" paper: https://www.kernel.org/doc/ols/2007/ols2007v2-pages-235-244.pdf But we've now come full circle where significantly faster storage devices no longer need IOs to be made larger to drive optimal IO performance. And even if they do there have been changes to the block and filesystem layers that help ensure upper layers are constructing larger IOs. In addition, SCSI's differentiated IO errors will propagate through to bio-based IO completion hooks -- so that eliminates another historic justiciation for request-based DM multipath. Lastly, the block layer's immutable biovec changes have made bio cloning cheaper than it has ever been; whereas request cloning is still relatively expensive (both on a CPU usage and memory footprint level). As such, bio-based DM multipath offers the promise of a more efficient IO path for high IOPs devices that are, or will be, emerging. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-06-10 15:15:47 -04:00
Mike Snitzer	4cc96131af	dm: move request-based code out to dm-rq.[hc] Add some seperation between bio-based and request-based DM core code. 'struct mapped_device' and other DM core only structures and functions have been moved to dm-core.h and all relevant DM core .c files have been updated to include dm-core.h rather than dm.h DM targets should _never_ include dm-core.h! [block core merge conflict resolution from Stephen Rothwell] Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>	2016-06-10 15:15:44 -04:00
Mike Snitzer	2da1610ae2	dm mpath: eliminate use of spinlock in IO fast-paths The primary motivation of this commit is to improve the scalability of DM multipath on large NUMA systems where m->lock spinlock contention has been proven to be a serious bottleneck on really fast storage. The ability to atomically read a pointer, using lockless_dereference(), is leveraged in this commit. But all pointer writes are still protected by the m->lock spinlock (which is fine since these all now occur in the slow-path). The following functions no longer require the m->lock spinlock in their fast-path: multipath_busy(), __multipath_map(), and do_end_io() And choose_pgpath() is modified to _not_ update m->current_pgpath unless it also switches the path-group. This is done to avoid needing to take the m->lock everytime __multipath_map() calls choose_pgpath(). But m->current_pgpath will be reset if it is failed via fail_path(). Suggested-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-05-05 15:25:52 -04:00
Mike Snitzer	20800cb345	dm mpath: move trigger_event member to the end of 'struct multipath' Allows the 'work_mutex' member to no longer cross a cacheline. Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-05-05 15:25:52 -04:00
Mike Snitzer	91e968aa60	dm mpath: use atomic_t for counting members of 'struct multipath' The use of atomic_t for nr_valid_paths, pg_init_in_progress and pg_init_count will allow relaxing the use of the m->lock spinlock. Suggested-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-05-05 15:25:51 -04:00
Mike Snitzer	518257b132	dm mpath: switch to using bitops for state flags Mechanical change that doesn't make any real effort to reduce the use of m->lock; that will come later (once atomics are used for counters, etc). Suggested-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-05-05 15:25:50 -04:00
Mike Snitzer	ec31f3f78a	dm mpath: cleanup reinstate_path() et al based on code review fail_path() will print a "Failing path ..." message but reinstate_path() doesn't print a "Reinstating path ...". Add that message to reinstate_path() to add symmetry and aid system debugging. Remove reinstate_path()'s check for the path_selector providing .reinstate_path hook. All path selectors provide this and any future ones must too. activate_path() calls pg_init_done() with SCSI_DH_DEV_OFFLINED but pg_init_done() doesn't expicitly handle it in its swicth statement. Add SCSI_DH_DEV_OFFLINED to the default case. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:04 -05:00
Mike Snitzer	9f54cec553	dm mpath: remove __pgpath_busy forward declaration, rename to pgpath_busy Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:44 -05:00
Mike Snitzer	be7d31cca8	dm mpath: switch from 'unsigned' to 'bool' for flags where appropriate Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:43 -05:00
Mike Snitzer	90a4323ccf	dm path selector: remove 'repeat_count' return from .select_path hook If a path selector has any use for a repeat_count it should be handled locally and not depend on the dm-mpath core to be concerned with it. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:42 -05:00
Mike Snitzer	21136f89d7	dm mpath: remove repeat_count support from multipath core Preparation for making __multipath_map() avoid taking the m->lock spinlock -- in favor of using RCU locking. repeat_count was primarily for bio-based DM multipath's benefit. There is really no need for it anymore now that DM multipath is request-based. As such, repeat_count > 1 is no longer honored and a warning is displayed if the user attempts to use a value > 1. This is a temporary change for the round-robin path-selector (as a later commit will restore its support for repeat_count > 1). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:40 -05:00
Mike Snitzer	7943bd6dd3	dm mpath: remove unnecessary casts in front of ti->private Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:40 -05:00
Mike Snitzer	78ce23b518	dm mpath: use blk_mq_alloc_request() and blk_mq_free_request() directly There isn't any need to support both old .request_fn and blk-mq paths in the blk-mq specific portion of __multipath_map(). Call blk_mq_alloc_request() directly rather than use blk_get_request(). Similarly, call blk_mq_free_request(), rather than blk_put_request(), in multipath_release_clone(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:39 -05:00
Mike Snitzer	2eff1924e1	dm mpath: cleanup 'struct dm_mpath_io' management code Refactor and rename existing interfaces to be more specific and self-documenting. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:39 -05:00
Mike Snitzer	8637a6bf14	dm mpath: use blk-mq pdu for per-request 'struct dm_mpath_io' Allow the multipath target to avoid making small allocations for each 'struct dm_mpath_io' that is needed for each request. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:38 -05:00
Mike Snitzer	eca7ee6dc0	dm: distinquish old .request_fn (dm-old) vs dm-mq request-based DM Rename various methods to have either a "dm_old" or "dm_mq" prefix. Improve code comments to assist with understanding the duality of code that handles both "dm_old" and "dm_mq" cases. It is no much easier to quickly look at the code and _know_ that a given method is either 1) "dm_old" only 2) "dm_mq" only 3) common to both. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:34:33 -05:00
Mike Snitzer	c5248f79f3	dm: remove support for stacking dm-mq on .request_fn device(s) Remove all fiddley code that propped up this support for a blk-mq request-queue ontop of all .request_fn devices. Testing has proven this niche request-based dm-mq mode to be buggy, when testing fault tolerance with DM multipath, and there is no point trying to preserve it. Should help improve efficiency of pure dm-mq code and make code maintenance less delicate. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 22:33:46 -05:00
Mike Snitzer	16f122661d	dm: optimize dm_mq_queue_rq() DM multipath is the only dm-mq target. But that aside, request-based DM only supports tables with a single target that is immutable. Leverage this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in the mapped_device when the table was made active. This saves the need to even take the read-side of the SRCU via dm_{get,put}_live_table. If the active DM table does not have an immutable target (e.g. "error" target was swapped in) then fallback to the slow-path where the target is looked up from the live table. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-02-22 11:06:22 -05:00
Junichi Nomura	43e43c9ea6	dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path In multipath_prepare_ioctl(), - pgpath is a path selected from available paths - m->queue_io is true if we cannot send a request immediately to paths, either because: * there is no available path * the path group needs activation (pg_init) - pg_init is not started - pg_init is still running - m->queue_if_no_path is true if the device is configured to queue I/O if there are no available paths If !pgpath && !m->queue_if_no_path, the handler should return -EIO. However in the course of refactoring the condition check has broken and returns success in that case. Since bdev points to the dm device itself, dm_blk_ioctl() calls __blk_dev_driver_ioctl() for itself and recurses until crash. You could reproduce the problem like this: # dmsetup create mp --table '0 1024 multipath 0 0 0 0' # sg_inq /dev/mapper/mp <crash> [ 172.648615] BUG: unable to handle kernel paging request at fffffffc81b10268 [ 172.662843] PGD 19dd067 PUD 0 [ 172.666269] Thread overran stack, or stack corrupted [ 172.671808] Oops: 0000 [#1] SMP ... Fix the condition check with some clarifications. Fixes: `e56f81e0b0` ("dm: refactor ioctl handling") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-11-17 14:19:00 -05:00
Junichi Nomura	5bbbfdf685	dm: fix ioctl retry termination with signal dm-mpath retries ioctl, when no path is readily available and the device is configured to queue I/O in such a case. If you want to stop the retry before multipathd decides to turn off queueing mode, you could send signal for the process to exit from the loop. However the check of fatal signal has not carried along when commit `6c182cd88d` ("dm mpath: fix ioctl deadlock when no paths") moved the loop from dm-mpath to dm core. As a result, we can't terminate such a process in the retry loop. Easy reproducer of the situation is: # dmsetup create mp --table '0 1024 multipath 0 0 0 0' # dmsetup message mp 0 'queue_if_no_path' # sg_inq /dev/mapper/mp then you should be able to terminate sg_inq by pressing Ctrl+C. Fixes: `6c182cd88d` ("dm mpath: fix ioctl deadlock when no paths") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-11-17 14:04:32 -05:00
Christoph Hellwig	71cdb6978a	dm: add support for passing through persistent reservations This adds support to pass through persistent reservation requests similar to the existing ioctl handling, and with the same limitations, e.g. devices may only have a single target attached. This is mostly intended for multipathing. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-10-31 19:05:59 -04:00
Christoph Hellwig	e56f81e0b0	dm: refactor ioctl handling This moves the call to blkdev_ioctl and the argument checking to DM core code, and only leaves a callout to find the block device to operate on in the targets. This simplifies the code and allows us to pass through ioctl-like command using other methods in the next patch. Also split out a helper around calling the prepare_ioctl method that will be reused for persistent reservation handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-10-31 19:05:59 -04:00
Mauricio Faria de Oliveira	47796938c4	Revert "dm mpath: fix stalls when handling invalid ioctls" This reverts commit `a1989b3300`. That commit introduced a regression at least for the case of the SG_IO ioctl() running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there are no active paths: the ioctl() fails with the ENOTTY errno immediately rather than blocking due to queue_if_no_path until a path becomes active, for example. That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]) from multipath devices; which leads to SCSI/filesystem errors in such a guest. More general scenarios can hit that regression too. The following demonstration employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective (some output & user changes omitted for brevity and comments added for clarity). Reverting that commit restores normal operation (queueing) in failing scenarios; tested on linux-next (next-20151022). 1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM) $ cat sg_simple0.c ... see [3] ... $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c $ gcc sgio_inquiry.c -o sgio_inquiry 2) The ioctl() works fine with active paths present. # multipath -l 85ag56 85ag56 (...) dm-19 IBM ,2145 size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw \|-+- policy='service-time 0' prio=0 status=active \| \|- 8:0:11:0 sdz 65:144 active undef running \| `- 9:0:9:0 sdbf 67:144 active undef running `-+- policy='service-time 0' prio=0 status=enabled \|- 8:0:12:0 sdae 65:224 active undef running `- 9:0:12:0 sdbo 68:32 active undef running $ ./sgio_inquiry /dev/mapper/85ag56 Some of the INQUIRY command's response: IBM 2145 0000 INQUIRY duration=0 millisecs, resid=0 3) The ioctl() fails with ENOTTY errno with _no_ active paths present, for unprivileged users (rather than blocking due to queue_if_no_path). # for path in $(multipath -l 85ag56 \| grep -o 'sd[a-z]\+'); \ do multipathd -k"fail path $path"; done # multipath -l 85ag56 85ag56 (...) dm-19 IBM ,2145 size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw \|-+- policy='service-time 0' prio=0 status=enabled \| \|- 8:0:11:0 sdz 65:144 failed undef running \| `- 9:0:9:0 sdbf 67:144 failed undef running `-+- policy='service-time 0' prio=0 status=enabled \|- 8:0:12:0 sdae 65:224 failed undef running `- 9:0:12:0 sdbo 68:32 failed undef running $ ./sgio_inquiry /dev/mapper/85ag56 sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device 4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285); it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl(). $ dmesg <...> [] device-mapper: multipath: Failing path 65:144. [] device-mapper: multipath: Failing path 67:144. [] device-mapper: multipath: Failing path 65:224. [] device-mapper: multipath: Failing path 68:32. [] sgio_inquiry: sending ioctl 2285 to a partition! 5) The ioctl() only works if the SYS_CAP_RAWIO capability is present (then queueing happens -- in this example, queue_if_no_path is set); this is due to a conditional check in scsi_verify_blk_ioctl(). # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56' sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device # ./sgio_inquiry /dev/mapper/85ag56 & [1] 72830 # cat /proc/72830/stack [<c00000171c0df700>] 0xc00000171c0df700 [<c000000000015934>] __switch_to+0x204/0x350 [<c000000000152d4c>] msleep+0x5c/0x80 [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170 [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0 [<c0000000003128e4>] block_ioctl+0x64/0xd0 [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780 [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0 [<c000000000009358>] system_call+0x38/0xd0 6) This is the function call chain exercised in this analysis: SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c -> do_vfs_ioctl() -> vfs_ioctl() ... error = filp->f_op->unlocked_ioctl(filp, cmd, arg); ... -> dm_blk_ioctl() @ drivers/md/dm.c -> multipath_ioctl() @ drivers/md/dm-mpath.c ... (bdev = NULL, due to no active paths) ... if (!bdev \|\| <...>) { int err = scsi_verify_blk_ioctl(NULL, cmd); if (err) r = err; } ... -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c ... if (bd && bd == bd->bd_contains) // not taken (bd = NULL) return 0; ... if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user) return 0; ... printk_ratelimited(KERN_WARNING "%s: sending ioctl %x to a partition!\n" <...>); return -ENOIOCTLCMD; <- ... return r ? : <...> <- ... if (error == -ENOIOCTLCMD) error = -ENOTTY; out: return error; ... Links: [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52 [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device') [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03) Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-10-31 18:53:51 -04:00
Christoph Hellwig	566079c849	dm-mpath, scsi_dh: request scsi_dh modules in scsi_dh, not dm-mpath This way we can reused the same code any attachment method, not just those requested from dm-mpath. [jejb: fixup checkpatch error] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-08-28 13:14:55 -07:00
Christoph Hellwig	1bab0de027	dm-mpath, scsi_dh: don't let dm detach device handlers While allowing dm-mpath to attach device handlers is a functionality we need for backwards compatibility reason there is no reason to reference count them and detach them if dm-mpath stops using the device for some reason. If the device handler works for the given device it can just stay attached, and we can take the retain_hw_handler codepath. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Hannes Reinecke <hare@Suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-08-28 13:14:54 -07:00
Mike Snitzer	4c6dd53dd3	dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path Otherwise kmemleak reported: unreferenced object 0xffff88009b14e2b0 (size 16): comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s) hex dump (first 16 bytes): 40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00 @............... backtrace: [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0 [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160 [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20 [<ffffffff8111cb37>] mempool_alloc+0x57/0x150 [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath] [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath] [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod] [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod] [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380 [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100 [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300 [<ffffffff812c0f30>] generic_make_request+0xc0/0x110 [<ffffffff812c0ff2>] submit_bio+0x72/0x150 [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0 [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40 [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390 Fixes: `e5863d9ad` ("dm: allocate requests in target when stacking on blk-mq devices") Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.0+	2015-05-27 17:37:22 -04:00
Mike Snitzer	022333427a	dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq dm_mq_queue_rq() is in atomic context so care must be taken to not sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations and dm-mpath's call to blk_get_request(). In the future the bioset allocations will hopefully go away (by removing support for partial completions of bios in a cloned request). Also prepare for supporting DM blk-mq ontop of old-style request_fn device(s) if a new dm-mod 'use_blk_mq' parameter is set. The kthread will still be used to queue work if blk-mq is used ontop of old-style request_fn device(s). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:17 -04:00
Mike Snitzer	bfebd1cdb4	dm: add full blk-mq support to request-based DM Commit `e5863d9ad` ("dm: allocate requests in target when stacking on blk-mq devices") served as the first step toward fully utilizing blk-mq in request-based DM -- it enabled stacking an old-style (request_fn) request_queue ontop of the underlying blk-mq device(s). That first step didn't improve performance of DM multipath ontop of fast blk-mq devices (e.g. NVMe) because the top-level old-style request_queue was severely limited by the queue_lock. The second step offered here enables stacking a blk-mq request_queue ontop of the underlying blk-mq device(s). This unlocks significant performance gains on fast blk-mq devices, Keith Busch tested on his NVMe testbed and offered this really positive news: "Just providing a performance update. All my fio tests are getting roughly equal performance whether accessed through the raw block device or the multipath device mapper (~470k IOPS). I could only push ~20% of the raw iops through dm before this conversion, so this latest tree is looking really solid from a performance standpoint." Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Keith Busch <keith.busch@intel.com>	2015-04-15 12:10:16 -04:00
Mike Snitzer	52b09914af	dm: remove unnecessary wrapper around blk_lld_busy There is no need for DM to export a wrapper around the already exported blk_lld_busy(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:49 -04:00
Johannes Thumshirn	ff658e9c1a	dm mpath: simplify failure path of dm_multipath_init() Currently the cleanup of all error cases are open-coded. Introduce a common exit path and labels. Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-02-09 13:06:49 -05:00
Mike Snitzer	e5863d9ad7	dm: allocate requests in target when stacking on blk-mq devices For blk-mq request-based DM the responsibility of allocating a cloned request is transfered from DM core to the target type. Doing so enables the cloned request to be allocated from the appropriate blk-mq request_queue's pool (only the DM target, e.g. multipath, can know which block device to send a given cloned request to). Care was taken to preserve compatibility with old-style block request completion that requires request-based DM _not_ acquire the clone request's queue lock in the completion path. As such, there are now 2 different request-based DM target_type interfaces: 1) the original .map_rq() interface will continue to be used for non-blk-mq devices -- the preallocated clone request is passed in from DM core. 2) a new .clone_and_map_rq() and .release_clone_rq() will be used for blk-mq devices -- blk_get_request() and blk_put_request() are used respectively from these hooks. dm_table_set_type() was updated to detect if the request-based target is being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set. DM core disallows switching the DM table's type after it is set. This means that there is no mixing of non-blk-mq and blk-mq devices within the same request-based DM table. [This patch was started by Keith and later heavily modified by Mike] Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-02-09 13:06:47 -05:00
Keith Busch	2eb6e1e3aa	dm: submit stacked requests in irq enabled context Switch to having request-based DM enqueue all prep'ed requests into work processed by another thread. This allows request-based DM to invoke block APIs that assume interrupt enabled context (e.g. blk_get_request) and is a prerequisite for adding blk-mq support to request-based DM. The new kernel thread is only initialized for request-based DM devices. multipath_map() is now always in irq enabled context so change multipath spinlock (m->lock) locking to always disable interrupts. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-02-09 13:06:47 -05:00
Benjamin Marzinski	1f27197247	dm mpath: stop queueing IO when no valid paths exist 'queue_io' is set so that IO is queued while paths are being initialized. Clear queue_io in __choose_pgpath if there are no valid paths, since there are obviously no paths that can be initialized. Otherwise IOs to the device will back up. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-10-05 20:03:35 -04:00
Mike Snitzer	6afbc01d75	dm mpath: eliminate pg_ready() wrapper pg_ready() is not comprehensive in its logic and only serves to obfuscate code. Replace pg_ready() with the appropriate logic in multipath_map(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-08-01 12:30:31 -04:00
Jun'ichi Nomura	7a7a3b45fe	dm mpath: fix IO hang due to logic bug in multipath_busy Commit `e80991773` ("dm mpath: push back requests instead of queueing") modified multipath_busy() to return true if !pg_ready(). pg_ready() checks the current state of the multipath device and may return false even if a new IO is needed to change the state. Bart Van Assche reported that he had multipath IO lockup when he was performing cable pull tests. Analysis showed that the multipath device had a single path group with both paths active, but that the path group itself was not active. During the multipath device state transitions 'queue_io' got set but nothing could clear it. Clearing 'queue_io' only happens in __choose_pgpath(), but it won't be called if multipath_busy() returns true due to pg_ready() returning false when 'queue_io' is set. As such the !pg_ready() check in multipath_busy() is wrong because new IO will not be sent to multipath target and the multipath state change won't happen. That results in multipath IO lockup. The intent of multipath_busy() is to avoid unnecessary cycles of dequeue + request_fn + requeue if it is known that the multipath device will requeue. Such "busy" situations would be: - path group is being activated - there is no path and the multipath is setup to requeue if no path Fix multipath_busy() to return "busy" early only for these specific situations. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.15	2014-07-10 16:44:15 -04:00
Mike Snitzer	7eee4ae2db	dm: disable WRITE SAME if it fails Add DM core support for disabling WRITE SAME on first failure to both request-based and bio-based targets. The need to disable WRITE SAME stems from SCSI enabling it by default but then disabling it when it fails. When SCSI does this it returns "permanent target failure, do not retry" using -EREMOTEIO. Update DM core to only disable WRITE SAME on failure if the returned error is -EREMOTEIO. Commit `f84cb8a4` ("dm mpath: disable WRITE SAME if it fails") implemented multipath specific disabling of WRITE SAME if it fails. However, as that commit detailed, the multipath-only solution doesn't go far enough if bio-based DM targets are stacked ontop of the request-based dm-multipath target (as is commonly done using dm-linear to support partitions on multipath devices, via kpartx). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Alex Chen <alex.chen@huawei.com>	2014-06-04 09:45:52 -04:00
Hannes Reinecke	63d832c301	dm mpath: really fix lockdep warning lockdep complains about a circular locking. And indeed, we need to release the lock before calling dm_table_run_md_queue_async(). As such, commit `4cdd2ad` ("dm mpath: fix lock order inconsistency in multipath_ioctl") must also be reverted in addition to fixing the lock order in the other dm_table_run_md_queue_async() callers. Reported-by: Bart van Assche <bvanassche@acm.org> Tested-by: Bart van Assche <bvanassche@acm.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-05-27 10:46:01 -04:00
Mike Snitzer	4cdd2ad780	dm mpath: fix lock order inconsistency in multipath_ioctl Commit `3e9f1be1b4` ("dm mpath: remove process_queued_ios()") did not consistently take the multipath device's spinlock (m->lock) before calling dm_table_run_md_queue_async() -- which takes the q->queue_lock. Found with code inspection using hint from reported lockdep warning. Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-05-14 16:12:17 -04:00
Jose Castillo	a356e42620	dm mpath: print more useful warnings in multipath_message() The warning message "Unrecognised multipath message received" is displayed in two different situations in multipath_message(): when the number of arguments passed is invalid and when the string passed in argv[0] is not recognized. Make it easier to identify where the problem is by making these warnings more specific with additional context for each case. Signed-off-by: Jose Castillo <jcastillo@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-03-27 16:56:25 -04:00
Hannes Reinecke	3a01750964	dm-mpath: do not activate failed paths activate_path() is run without a lock, so the path might be set to failed before activate_path() had a chance to run. This patch add a check for ->active in activate_path() to avoid unnecessary overhead by calling functions which are known to be failing. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-03-27 16:56:25 -04:00
Mike Snitzer	9bf59a611a	dm mpath: remove extra nesting in map function Return early for case when no path exists, and when the pathgroup isn't ready. This eliminates the need for extra nesting for the the common case. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de>	2014-03-27 16:56:25 -04:00
Hannes Reinecke	36fcffcc65	dm mpath: remove map_io() multipath_map() is now just a wrapper around map_io(), so we can rename map_io() to multipath_map(). Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>	2014-03-27 16:56:25 -04:00
Hannes Reinecke	e3bde04f1e	dm mpath: reduce memory pressure when requeuing When multipath needs to requeue I/O in the block layer the per-request context shouldn't be allocated, as it will be freed immediately afterwards anyway. Avoiding this memory allocation will reduce memory pressure during requeuing. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>	2014-03-27 16:56:25 -04:00
Hannes Reinecke	3e9f1be1b4	dm mpath: remove process_queued_ios() process_queued_ios() has served 3 functions: 1) select pg and pgpath if none is selected 2) start pg_init if requested 3) dispatch queued IOs when pg is ready Basically, a call to queue_work(process_queued_ios) can be replaced by dm_table_run_md_queue_async(), which runs request queue and ends up calling map_io(), which does 1), 2) and 3). Exception is when !pg_ready() (which means either pg_init is running or requested), then multipath_busy() prevents map_io() being called from request_fn. If pg_init is running, it should be ok as long as pg_init_done() does the right thing when pg_init is completed, I.e.: restart pg_init if !pg_ready() or call dm_table_run_md_queue_async() to kick map_io(). If pg_init is requested, we have to make sure the request is detected and pg_init will be started. pg_init is requested in 3 places: a) __choose_pgpath() in map_io() b) __choose_pgpath() in multipath_ioctl() c) pg_init retry in pg_init_done() a) is ok because map_io() calls __pg_init_all_paths(), which does 2). b) needs a call to __pg_init_all_paths(), which does 2). c) needs a call to __pg_init_all_paths(), which does 2). So this patch removes process_queued_ios() and ensures that __pg_init_all_paths() is called at the appropriate locations. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>	2014-03-27 16:56:24 -04:00
Hannes Reinecke	e809917735	dm mpath: push back requests instead of queueing There is no reason why multipath needs to queue requests internally for queue_if_no_path or pg_init; we should rather push them back onto the request queue. And while we're at it we can simplify the conditional statement in map_io() to make it easier to read. Since mpath no longer does internal queuing of I/O the table info no longer emits the internal queue_size. Instead it displays 1 if queuing is being used or 0 if it is not. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>	2014-03-27 16:56:24 -04:00
Hannes Reinecke	17f4ff45b5	dm mpath: do not call pg_init when it is already running This patch moves condition checks as a preparation of following patches and has no effect on behaviour. process_queued_ios() is the only caller of __pg_init_all_paths() and 2 condition checks are moved from outside to inside without side effects. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>	2014-03-27 16:56:24 -04:00
Hannes Reinecke	a1989b3300	dm mpath: fix stalls when handling invalid ioctls An invalid ioctl will never be valid, irrespective of whether multipath has active paths or not. So for invalid ioctls we do not have to wait for multipath to activate any paths, but can rather return an error code immediately. This fix resolves numerous instances of: udevd[]: worker [] unexpectedly returned with status 0x0100 that have been seen during testing. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2014-02-26 09:44:44 -05:00
Hannes Reinecke	b63349a7a5	dm mpath: requeue I/O during pg_init When pg_init is running no I/O can be submitted to the underlying devices, as the path priority etc might change. When using queue_io for this, requests will be piling up within multipath as the block I/O scheduler just sees a _very fast_ device. All of this queued I/O has to be resubmitted from within multipathing once pg_init is done. This approach has the problem that it's virtually impossible to abort I/O when pg_init is running, and we're adding heavy load to the devices after pg_init since all of the queued I/O needs to be resubmitted _before_ any requests can be pulled off of the request queue and normal operation continues. This patch will requeue the I/O that triggers the pg_init call, and return 'busy' when pg_init is in progress. With these changes the block I/O scheduler will stop submitting I/O during pg_init, resulting in a quicker path switch and less I/O pressure (and memory consumption) after pg_init. Signed-off-by: Hannes Reinecke <hare@suse.de> [patch header edited for clarity and typos by Mike Snitzer] Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-05 11:20:34 -05:00
Shiva Krishna Merla	954a73d5d3	dm mpath: fix race condition between multipath_dtr and pg_init_done Whenever multipath_dtr() is happening we must prevent queueing any further path activation work. Implement this by adding a new 'pg_init_disabled' flag to the multipath structure that denotes future path activation work should be skipped if it is set. By disabling pg_init and then re-enabling in flush_multipath_work() we also avoid the potential for pg_init to be initiated while suspending an mpath device. Without this patch a race condition exists that may result in a kernel panic: 1) If after pg_init_done() decrements pg_init_in_progress to 0, a call to wait_for_pg_init_completion() assumes there are no more pending path management commands. 2) If pg_init_required is set by pg_init_done(), due to retryable mode_select errors, then process_queued_ios() will again queue the path activation work. 3) If free_multipath() completes before activate_path() work is called a NULL pointer dereference like the following can be seen when accessing members of the recently destructed multipath: BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 RIP: 0010:[<ffffffffa003db1b>] [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath] [<ffffffff81090ac0>] worker_thread+0x170/0x2a0 [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40 [switch to disabling pg_init in flush_multipath_work & header edits by Mike Snitzer] Signed-off-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com> Reviewed-by: Krishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com> Tested-by: Speagle Andy <Andy.Speagle@netapp.com> Acked-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2013-10-31 21:39:47 -04:00
Mike Snitzer	f47908269f	dm: add reserved_rq_based_ios module parameter Allow user to change the number of IOs that are reserved by request-based DM's mempools by writing to this file: /sys/module/dm_mod/parameters/reserved_rq_based_ios The default value is RESERVED_REQUEST_BASED_IOS (256). The maximum allowed value is RESERVED_MAX_IOS (1024). Export dm_get_reserved_rq_based_ios() for use by DM targets and core code. Switch to sizing dm-mpath's mempool using DM core's configurable 'reserved_rq_based_ios'. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Frank Mayhar <fmayhar@google.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com>	2013-09-23 10:42:24 -04:00
Mike Snitzer	f84cb8a46a	dm mpath: disable WRITE SAME if it fails Workaround the SCSI layer's problematic WRITE SAME heuristics by disabling WRITE SAME in the DM multipath device's queue_limits if an underlying device disabled it. The WRITE SAME heuristics, with both the original commit `5db44863b6` ("[SCSI] sd: Implement support for WRITE SAME") and the updated commit `66c28f971` ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling WRITE SAME(10) even without successfully determining it is supported. After the first failed WRITE SAME the SCSI layer will disable WRITE SAME for the device (by setting sdkp->device->no_write_same which results in 'max_write_same_sectors' in device's queue_limits to be set to 0). When a device is stacked ontop of such a SCSI device any changes to that SCSI device's queue_limits do not automatically propagate up the stack. As such, a DM multipath device will not have its WRITE SAME support disabled. This causes the block layer to continue to issue WRITE SAME requests to the mpath device which causes paths to fail and (if mpath IO isn't configured to queue when no paths are available) it will result in actual IO errors to the upper layers. This fix doesn't help configurations that have additional devices stacked ontop of the mpath device (e.g. LVM created linear DM devices ontop). A proper fix that restacks all the queue_limits from the bottom of the device stack up will need to be explored if SCSI will continue to use this model of optimistically allowing op codes and then disabling them after they fail for the first time. Before this patch: EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null) device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121) device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121 end_request: critical target error, dev dm-6, sector 528 dm-6: WRITE SAME failed. Manually zeroing. device-mapper: multipath: Failing path 8:112. end_request: I/O error, dev dm-6, sector 4616 dm-6: WRITE SAME failed. Manually zeroing. end_request: I/O error, dev dm-6, sector 4616 end_request: I/O error, dev dm-6, sector 5640 end_request: I/O error, dev dm-6, sector 6664 end_request: I/O error, dev dm-6, sector 7688 end_request: I/O error, dev dm-6, sector 524288 Buffer I/O error on device dm-6, logical block 65536 lost page write due to I/O error on dm-6 JBD2: Error -5 detected when updating journal superblock for dm-6-8. end_request: I/O error, dev dm-6, sector 524296 Aborting journal on device dm-6-8. end_request: I/O error, dev dm-6, sector 524288 Buffer I/O error on device dm-6, logical block 65536 lost page write due to I/O error on dm-6 JBD2: Error -5 detected when updating journal superblock for dm-6-8. # cat /sys/block/sdh/queue/write_same_max_bytes 0 # cat /sys/block/dm-6/queue/write_same_max_bytes 33553920 After this patch: EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null) device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121) device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121 end_request: critical target error, dev dm-6, sector 528 dm-6: WRITE SAME failed. Manually zeroing. # cat /sys/block/sdh/queue/write_same_max_bytes 0 # cat /sys/block/dm-6/queue/write_same_max_bytes 0 It should be noted that WRITE SAME support wasn't enabled in DM multipath until v3.10. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Cc: stable@vger.kernel.org # 3.10+	2013-09-20 10:36:34 -04:00
Jun'ichi Nomura	cc9d3c382b	dm mpath: do not fail path on -ENOSPC Since ENOSPC is a target-side error, dm-mpath should just pass the error information to upper layer instead of retrying itself with path failover. Otherwise it will end up failing all paths down while path checkers find all paths ok. ENOSPC can now be returned from SCSI device after commit `a9d6ceb8` ("[SCSI] return ENOSPC on thin provisioning failure"). Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-09-18 14:41:06 -04:00
Hannes Reinecke	7e782af576	[SCSI] Return ENODATA on medium error When a medium error is detected the SCSI stack should return ENODATA to the upper layers. [jejb: fix whitespace error] Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-08-23 12:54:53 -04:00
Hannes Reinecke	6c182cd88d	dm mpath: fix ioctl deadlock when no paths When multipath needs to retry an ioctl the reference to the current live table needs to be dropped. Otherwise a deadlock occurs when all paths are down: - dm_blk_ioctl takes a reference to the current table and spins in multipath_ioctl(). - A new table is being loaded, but upon resume the process hangs in dm_table_destroy() waiting for references to drop to zero. With this patch the reference to the old table is dropped prior to retry, thereby avoiding the deadlock. Signed-off-by: Hannes Reinecke <hare@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-07-10 23:41:15 +01:00
Mike Snitzer	042bcef889	dm mpath: enable WRITE SAME support Enable WRITE SAME support in dm multipath. As far as multipath is concerned it is just another write request. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Bharata B Rao <bharata.rao@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-05-10 14:37:16 +01:00
Alasdair G Kergon	55a62eef8d	dm: rename request variables to bios Use 'bio' in the name of variables and functions that deal with bios rather than 'request' to avoid confusion with the normal block layer use of 'request'. No functional changes. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-03-01 22:45:47 +00:00
Mikulas Patocka	fd7c092e71	dm: fix truncated status strings Avoid returning a truncated table or status string instead of setting the DM_BUFFER_FULL_FLAG when the last target of a table fills the buffer. When processing a table or status request, the function retrieve_status calls ti->type->status. If ti->type->status returns non-zero, retrieve_status assumes that the buffer overflowed and sets DM_BUFFER_FULL_FLAG. However, targets don't return non-zero values from their status method on overflow. Most targets returns always zero. If a buffer overflow happens in a target that is not the last in the table, it gets noticed during the next iteration of the loop in retrieve_status; but if a buffer overflow happens in the last target, it goes unnoticed and erroneously truncated data is returned. In the current code, the targets behave in the following way: * dm-crypt returns -ENOMEM if there is not enough space to store the key, but it returns 0 on all other overflows. * dm-thin returns errors from the status method if a disk error happened. This is incorrect because retrieve_status doesn't check the error code, it assumes that all non-zero values mean buffer overflow. * all the other targets always return 0. This patch changes the ti->type->status function to return void (because most targets don't use the return code). Overflow is detected in retrieve_status: if the status method fills up the remaining space completely, it is assumed that buffer overflow happened. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-03-01 22:45:44 +00:00
Wei Yongjun	a71a261f5c	dm mpath: fix check for null mpio in end_io fn The mpio dereference should be moved below the BUG_ON NULL test in multipath_end_io(). spatch with a semantic match was used to found this. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-10-12 16:59:42 +01:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Mike Snitzer	7ba10aa6fb	dm mpath: only retry ioctl when no paths if queue_if_no_path set When there are no paths and multipath receives an ioctl, it waits until a path becomes available. This behaviour is incorrect if the "queue_if_no_path" setting was not specified, as then the ioctl should be rejected immediately, which this patch now does. commit `35991652b` ("dm mpath: allow ioctls to trigger pg init") should have checked if queue_if_no_path was configured before queueing IO. Checking for the queue_if_no_path feature, like is done in map_io(), allows the following table load to work without blocking in the multipath_ioctl retry loop: echo "0 1024 multipath 0 0 0 0" \| dmsetup create mpath_nodevs Without this fix the multipath_ioctl will block with the following stack trace: blkid D 0000000000000002 0 23936 1 0x00000000 ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440 ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440 ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040 Call Trace: [<ffffffff814ce099>] schedule+0x29/0x70 [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0 [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70 [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20 [<ffffffff8104f840>] msleep+0x20/0x30 [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath] [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod] [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30 [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730 [<ffffffff811970ac>] block_ioctl+0x3c/0x40 [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340 [<ffffffff81166293>] ? sys_newfstat+0x33/0x40 [<ffffffff81173571>] sys_ioctl+0xa1/0xb0 [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.5+ Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-09-26 23:45:41 +01:00
Tejun Heo	43829731dd	workqueue: deprecate flush[_delayed]_work_sync() flush[_delayed]_work_sync() are now spurious. Mark them deprecated and convert all users to flush[_delayed]_work(). If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant and the regular flushes guarantee that the work item is not pending or running on any CPU on return, so there's no reason to use the sync flushes at all and they're going away. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mattia Dongili <malattia@linux.it> Cc: Kent Yoder <key@linux.vnet.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Bryan Wu <bryan.wu@canonical.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-wireless@vger.kernel.org Cc: Anton Vorontsov <cbou@mail.ru> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Avi Kivity <avi@redhat.com>	2012-08-20 14:51:24 -07:00
Alasdair G Kergon	1f4e0ff079	dm thin: commit before gathering status Commit outstanding metadata before returning the status for a dm thin pool so that the numbers reported are as up-to-date as possible. The commit is not performed if the device is suspended or if the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target through a new 'status_flags' parameter in the target's dm_status_fn. The userspace dmsetup tool will support the --noflush flag with the 'dmsetup status' and 'dmsetup wait' commands from version 1.02.76 onwards. Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:16 +01:00
Mike Snitzer	a58a935d5a	dm mpath: add retain_attached_hw_handler feature A SCSI device handler might get attached to a device during the initial device scan. We do not necessarily want to override this when loading a multipath table, so this patch adds a new multipath feature argument "retain_attached_hw_handler". During SCSI device scan all loaded SCSI device handlers will be consulted for a match (via scsi_dh's provided .match). If a match is found that device handler will be attached. We need a way to have userspace multipathd's provided 'hw_handler' not override the already attached hardware handler. When specifying the new feature 'retain_attached_hw_handler' multipath will use the currently attached hardware handler instead of trying to attach the one specified during table load. If no hardware handler is attached the specified hardware handler will still be used. Leverages scsi_dh_attach's ability to increment the scsi_dh's reference count if the same scsi_dh name is provided when attaching - currently attached scsi_dh name is determined with scsi_dh_attached_handler_name. Depends upon commit `7e8a74b177` ("[SCSI] scsi_dh: add scsi_dh_attached_handler_name"). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Babu Moger <babu.moger@netapp.com> Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:04 +01:00
Mikulas Patocka	35991652ba	dm mpath: allow ioctls to trigger pg init After the failure of a group of paths, any alternative paths that need initialising do not become available until further I/O is sent to the device. Until this has happened, ioctls return -EAGAIN. With this patch, new paths are made available in response to an ioctl too. The processing of the ioctl gets delayed until this has happened. Instead of returning an error, we submit a work item to kmultipathd (that will potentially activate the new path) and retry in ten milliseconds. Note that the patch doesn't retry an ioctl if the ioctl itself fails due to a path failure. Such retries should be handled intelligently by the code that generated the ioctl in the first place, noting that some SCSI commands should not be retried because they are not idempotent (XOR write commands). For commands that could be retried, there is a danger that if the device rejected the SCSI command, the path could be errorneously marked as failed, and the request would be retried on another path which might fail too. It can be determined if the failure happens on the device or on the SCSI controller, but there is no guarantee that all SCSI drivers set these flags correctly. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-06-03 00:29:58 +01:00
Mike Christie	f220fd4efb	dm mpath: delay retry of bypassed pg If I/O needs retrying and only bypassed priority groups are available, set the pg_init_delay_retry flag to wait before retrying. If, for example, the reason for the bypass is that the controller is getting reset or there is a firmware upgrade happening, retrying right away would cause a flood of log messages and retries for what could be a few seconds or even several minutes. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-06-03 00:29:45 +01:00
Mike Snitzer	1fbdd2b3a3	dm mpath: reduce size of struct multipath Move multipath structure's 'lock' and 'queue_size' members to eliminate two 4-byte holes. Also use a bit within a single unsigned int for each existing flag (saves 8-bytes). This allows future flags to be added without each consuming an unsigned int. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-06-03 00:29:43 +01:00

1 2 3 4 5

244 Commits