Commit Graph

99 Commits

Author SHA1 Message Date
Alasdair G Kergon 1f4e0ff079 dm thin: commit before gathering status
Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.
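
The rule above can be pictured with a small userspace sketch. It is not the dm-thin code: struct toy_pool, toy_pool_status() and TOY_STATUS_NOFLUSH_FLAG are hypothetical names invented here, and "commit" is reduced to moving a counter.

```c
/* Hypothetical flag bit standing in for the no-flush request from userspace. */
#define TOY_STATUS_NOFLUSH_FLAG (1u << 0)

/* Toy pool: 'uncommitted' models outstanding metadata changes. */
struct toy_pool {
	int suspended;
	unsigned uncommitted;
	unsigned committed;
};

/*
 * Report the committed count.  Outstanding changes are committed first,
 * unless the pool is suspended or the caller asked for no flush.
 */
unsigned toy_pool_status(struct toy_pool *pool, unsigned status_flags)
{
	if (!pool->suspended && !(status_flags & TOY_STATUS_NOFLUSH_FLAG)) {
		pool->committed += pool->uncommitted;
		pool->uncommitted = 0;
	}
	return pool->committed;
}
```

With the flag set, or while suspended, the caller sees the last committed numbers rather than the newest ones.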

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.

Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:16 +01:00
Mike Snitzer a58a935d5a dm mpath: add retain_attached_hw_handler feature
A SCSI device handler might get attached to a device during the
initial device scan.  We do not necessarily want to override
this when loading a multipath table, so this patch adds a new
multipath feature argument "retain_attached_hw_handler".

During SCSI device scan all loaded SCSI device handlers will be
consulted for a match (via scsi_dh's provided .match).  If a match is
found that device handler will be attached.  We need a way to have
userspace multipathd's provided 'hw_handler' not override the already
attached hardware handler.

When specifying the new feature 'retain_attached_hw_handler' multipath
will use the currently attached hardware handler instead of trying to
attach the one specified during table load.  If no hardware handler is
attached the specified hardware handler will still be used.
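
A minimal sketch of the selection rule described above, assuming the handler is identified by name. choose_hw_handler() is a hypothetical helper written for illustration; the real code works through scsi_dh_attach and scsi_dh_attached_handler_name as noted below.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decide which handler name to use.
 *   attached        - name of the handler already attached, or NULL if none
 *   requested       - name given in the multipath table
 *   retain_attached - non-zero if 'retain_attached_hw_handler' was specified
 */
const char *choose_hw_handler(const char *attached, const char *requested,
			      int retain_attached)
{
	/* Keep the existing handler only if the feature is set and one exists. */
	if (retain_attached && attached != NULL)
		return attached;

	/* Otherwise the handler from the table is used, as before. */
	return requested;
}
```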

Leverages scsi_dh_attach's ability to increment the scsi_dh's reference
count if the same scsi_dh name is provided when attaching - currently
attached scsi_dh name is determined with scsi_dh_attached_handler_name.

Depends upon commit 7e8a74b177
("[SCSI] scsi_dh: add scsi_dh_attached_handler_name").

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Babu Moger <babu.moger@netapp.com>
Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:04 +01:00
Mikulas Patocka 35991652ba dm mpath: allow ioctls to trigger pg init
After the failure of a group of paths, any alternative paths that
need initialising do not become available until further I/O is sent to
the device.  Until this has happened, ioctls return -EAGAIN.

With this patch, new paths are made available in response to an ioctl
too.  The processing of the ioctl gets delayed until this has happened.

Instead of returning an error, we submit a work item to kmultipathd
(that will potentially activate the new path) and retry in ten
milliseconds.

Note that the patch doesn't retry an ioctl if the ioctl itself fails due
to a path failure.  Such retries should be handled intelligently by the
code that generated the ioctl in the first place, noting that some SCSI
commands should not be retried because they are not idempotent (XOR write
commands).  For commands that could be retried, there is a danger that
if the device rejected the SCSI command, the path could be erroneously
marked as failed, and the request would be retried on another path which
might fail too.  It can be determined if the failure happens on the
device or on the SCSI controller, but there is no guarantee that all
SCSI drivers set these flags correctly.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-06-03 00:29:58 +01:00
Mike Christie f220fd4efb dm mpath: delay retry of bypassed pg
If I/O needs retrying and only bypassed priority groups are available,
set the pg_init_delay_retry flag to wait before retrying.

If, for example, the reason for the bypass is that the controller is
getting reset or there is a firmware upgrade happening, retrying right
away would cause a flood of log messages and retries for what could be a
few seconds or even several minutes.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-06-03 00:29:45 +01:00
Mike Snitzer 1fbdd2b3a3 dm mpath: reduce size of struct multipath
Move multipath structure's 'lock' and 'queue_size' members to eliminate
two 4-byte holes.  Also use a bit within a single unsigned int for each
existing flag (saves 8 bytes).  This allows future flags to be added
without each consuming an unsigned int.
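
The two techniques can be illustrated with a pair of toy structures. The member names below are invented for this sketch and are not the real layout of struct multipath; the exact sizes depend on the ABI.

```c
/* Before (illustrative): each flag is a full unsigned int, and the
 * alternation of pointers and ints leaves padding holes on 64-bit ABIs. */
struct mp_loose {
	void *list;
	unsigned queue_io;                /* flag */
	void *current_path;
	unsigned queue_if_no_path;        /* flag */
	void *work;
	unsigned saved_queue_if_no_path;  /* flag */
	unsigned queue_size;
};

/* After (illustrative): pointers grouped together, flags folded into
 * single bits that share one unsigned int. */
struct mp_packed {
	void *list;
	void *current_path;
	void *work;
	unsigned queue_size;
	unsigned queue_io:1;
	unsigned queue_if_no_path:1;
	unsigned saved_queue_if_no_path:1;
};
```

On a typical LP64 system sizeof(struct mp_loose) is 48 and sizeof(struct mp_packed) is 32, and further one-bit flags fit in the packed version without growing it.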

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-06-03 00:29:43 +01:00
Mike Snitzer 510193a2d3 dm mpath: check if scsi_dh module already loaded before trying to load
If the requested scsi_dh module is already loaded then skip
request_module().

Multipath table loads can hang in an unnecessary __request_module.

Reported-by: Ben Marzinski <bmarzins@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-05-12 01:43:21 +01:00
Mikulas Patocka 31998ef193 dm: reject trailing characters in sscanf input
Device mapper uses sscanf to convert arguments to numbers. The problem is that
the way we use it ignores additional unmatched characters in the scanned string.

For example, `if (sscanf(string, "%d", &number) == 1)' will match a number,
but it will also match a number with some garbage appended, like "123abc".

As a result, device mapper accepts garbage after some numbers. For example
the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
will pass without an error.

This patch fixes all sscanf uses in device mapper. It appends "%c" with
a pointer to a dummy character variable to every sscanf statement.

The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
only if string is a null-terminated number (optionally preceded by some
whitespace characters). If there is some character appended after the number,
sscanf matches "%c", writes the character to the dummy variable and returns 2.
We check the return value for 1 and consequently reject numbers with some
garbage appended.
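
The idiom can be tried in userspace with the standard C library. The two helpers below are hypothetical and written only for this illustration; they are not taken from the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse a decimal integer and reject trailing
 * characters using the "%d%c" idiom.  Returns 1 on success, 0 otherwise.
 */
int parse_strict_int(const char *s, int *out)
{
	char dummy;

	/* sscanf returns 2 if any character follows the number, 1 if none. */
	return sscanf(s, "%d%c", out, &dummy) == 1;
}

/* The older, lenient form for comparison: it accepts "123abc". */
int parse_lenient_int(const char *s, int *out)
{
	return sscanf(s, "%d", out) == 1;
}
```

Note that the strict form also rejects trailing whitespace such as "123 ", because "%c" matches a space.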

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-28 18:41:26 +01:00
Jun'ichi Nomura 466891f995 dm mpath: detect invalid map_context
The map_context pointer should always be set. However, we have reports
that upon requeuing it is not set correctly.  So add set and clear
functions with a BUG_ON() to track the issue properly.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-28 18:41:25 +01:00
Paolo Bonzini ec8013bedd dm: do not forward ioctls from logical volumes to the underlying device
A logical volume can map to just part of the underlying physical volume.
In this case, it must be treated like a partition.

Based on a patch from Alasdair G Kergon.

Cc: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-14 15:07:24 -08:00
Mike Snitzer 498f0103ea dm table: share target argument parsing functions
Move multipath target argument parsing code into dm-table so other
targets can share it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:04 +01:00
Mike Snitzer 286f367dad dm mpath: fix potential NULL pointer in feature arg processing
Avoid dereferencing a NULL pointer if the number of feature arguments
supplied is fewer than indicated.
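
A sketch of the kind of check involved, assuming a "<count> <arg>..." layout. struct arg_set, shift_arg() and read_arg_group() are simplified stand-ins written for this example; they only loosely resemble the shared parsing helpers in dm-table.

```c
#include <stdio.h>
#include <stddef.h>

/* Simplified argument cursor (hypothetical). */
struct arg_set {
	unsigned argc;
	char **argv;
};

/* Take the next argument, or return NULL if none remain. */
char *shift_arg(struct arg_set *as)
{
	if (as->argc == 0)
		return NULL;
	as->argc--;
	return *as->argv++;
}

/*
 * Read the declared argument count and verify that at least that many
 * arguments follow.  Returns 0 on success, -1 on a malformed or
 * overstated count, so callers never shift past the end.
 */
int read_arg_group(struct arg_set *as, unsigned *count)
{
	const char *s = shift_arg(as);
	char dummy;

	if (s == NULL || sscanf(s, "%u%c", count, &dummy) != 1)
		return -1;
	if (*count > as->argc)
		return -1;
	return 0;
}
```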

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
2011-08-02 12:32:00 +01:00
Arun Sharma 60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Martin K. Petersen 6f13f6fba7 dm mpath: do not fail paths after integrity errors
Integrity errors need to be passed to the owner of the integrity
metadata for processing. Consequently EILSEQ should be passed up the
stack.

Cc: stable@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:02:55 +01:00
Mike Snitzer a490a07a67 dm mpath: allow table load with no priority groups
This patch adjusts the multipath target to allow a table with both 0
priority groups and 0 for the initial priority group number.

If any mpath device is held open when all paths in the last priority
group have failed, userspace multipathd will attempt to reload the
associated DM table to reflect the fact that the device no longer has
any priority groups.  But the reload attempt always failed because the
multipath target did not allow 0 priority groups.

All multipath target messages that take a priority group (enable_group,
disable_group, switch_group) now handle a priority group number of 0 by
returning an error.

When reloading a multipath table with 0 priority groups, userspace
multipathd must be updated to specify an initial priority group number
of 0 (rather than 1).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Babu Moger <babu.moger@lsi.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-03-24 13:54:33 +00:00
Mike Snitzer 19040c0bc8 dm mpath: fail message ioctl if specified path is not valid
Fail the reinstate_path and fail_path message ioctl if the specified
path is not valid.

The message ioctl would succeed for the 'reinstate_path' and
'fail_path' messages even if no action was taken because the
specified device was not a valid path of the multipath device.

Before, when /dev/vdb is not a path of mpathb:
$ dmsetup message mpathb 0 reinstate_path /dev/vdb
$ echo $?
0

After:
$ dmsetup message mpathb 0 reinstate_path /dev/vdb
device-mapper: message ioctl failed: Invalid argument
Command failed
$ echo $?
1

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-03-24 13:54:31 +00:00
Hannes Reinecke 751b2a7d62 [SCSI] dm mpath: propagate target errors immediately
DM now has more information about the nature of the underlying storage
failure.  Path failure is avoided if a request failed due to a target
error.  Instead the target error is immediately passed up the stack.

Discard requests that fail due to non-target errors may now be retried.

Errors restricted to the path will be retried or returned if no
paths are available, regardless of the no_path_retry setting.
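
The decision can be sketched as a small classifier. Which error codes count as target errors is an assumption made for this example (EILSEQ is taken from the integrity-error entry in this log, EOPNOTSUPP is a guess); the enum and function names are hypothetical.

```c
#include <errno.h>

/* What to do with a completed request (hypothetical names). */
enum io_action {
	IO_DONE,                 /* success */
	IO_RETURN_ERROR,         /* target error: pass up the stack at once */
	IO_FAIL_PATH_AND_RETRY   /* path error: fail the path, try another */
};

/* Assumed set of target errors; the real list may differ. */
int is_target_error(int error)
{
	return error == -EILSEQ || error == -EOPNOTSUPP;
}

enum io_action end_io_action(int error)
{
	if (error == 0)
		return IO_DONE;
	if (is_target_error(error))
		return IO_RETURN_ERROR;
	return IO_FAIL_PATH_AND_RETRY;
}
```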

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-12 10:33:29 -06:00
Chandra Seetharaman 4e2d19e46b dm mpath: delay activate_path retry on SCSI_DH_RETRY
This patch adds a user-configurable 'pg_init_delay_msecs' feature.  Use
this feature to specify the number of milliseconds to delay before
retrying scsi_dh_activate, when SCSI_DH_RETRY is returned.

SCSI Device Handlers return SCSI_DH_IMM_RETRY if we could retry
activation immediately and SCSI_DH_RETRY in cases where it is better to
retry after some delay.

Currently we immediately retry scsi_dh_activate irrespective of
SCSI_DH_IMM_RETRY and SCSI_DH_RETRY.

The 'pg_init_delay_msecs' feature may be provided during table create or
load, e.g.:
    dmsetup create --table "0 20971520 multipath 3 queue_if_no_path \
	pg_init_delay_msecs 2500 ..." mpatha

The default for 'pg_init_delay_msecs' is 2000 milliseconds.
Maximum configurable delay is 60000 milliseconds.  Specifying a
'pg_init_delay_msecs' of 0 will cause immediate retry.
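
The limits above can be expressed as a small validation routine. This is a userspace sketch, not the parser from the patch; parse_pg_init_delay() and the macro names are made up, and only the numeric values come from the text.

```c
#include <stdio.h>
#include <stddef.h>

#define PG_INIT_DELAY_DEFAULT_MSECS 2000u   /* used when nothing is given */
#define PG_INIT_DELAY_MAX_MSECS     60000u  /* largest accepted value */

/*
 * Hypothetical helper: 'arg' is the value following pg_init_delay_msecs,
 * or NULL if the feature was not specified.  Returns 0 and stores the
 * delay on success, -1 if the value is malformed or too large.
 * A stored delay of 0 means "retry immediately".
 */
int parse_pg_init_delay(const char *arg, unsigned *delay_msecs)
{
	unsigned value;
	char dummy;

	if (arg == NULL) {
		*delay_msecs = PG_INIT_DELAY_DEFAULT_MSECS;
		return 0;
	}
	if (sscanf(arg, "%u%c", &value, &dummy) != 1)
		return -1;
	if (value > PG_INIT_DELAY_MAX_MSECS)
		return -1;

	*delay_msecs = value;
	return 0;
}
```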

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-01-13 20:00:01 +00:00
Tejun Heo 4d4d66ab53 dm: convert workqueues to alloc_ordered
Convert all create[_singlethread]_workqueue() users to the new
alloc[_ordered]_workqueue().  This conversion is mechanical and
doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-01-13 19:59:57 +00:00
Tejun Heo d5ffa387e2 dm: don't use flush_scheduled_work
flush_scheduled_work() is being deprecated.  Flush the used work
directly instead.  In all dm targets, the only work which uses
system_wq is ->trigger_event.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-01-13 19:59:56 +00:00
Mike Snitzer 09c9d4c9b6 dm mpath: disable blk_abort_queue
Revert commit 224cb3e981
  dm: Call blk_abort_queue on failed paths

Multipath began to use blk_abort_queue() to allow for
lower latency path deactivation.  This was found to
cause list corruption:

   the cmd gets blk_abort_queued/timedout run on it and the scsi eh
   somehow is able to complete and run scsi_queue_insert while
   scsi_request_fn is still trying to process the request.

   https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@kernel.org
2011-01-13 19:59:46 +00:00
Mike Snitzer 959eb4e559 dm mpath: support discard
Enable discard support in the DM multipath target.

This discard support depends on a few discard-specific fixes to the
block layer's request stacking driver methods.

Discard requests are optional so don't allow a failed discard to trigger
path failures.  If there is a real problem with a given path the
barriers associated with the discard (either before or after the
discard) will cause path failure.  That said, unconditionally passing
discard failures up the stack is not ideal.  This must be fixed once DM
has more information about the nature of the underlying storage failure.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
2010-08-12 04:14:32 +01:00
Alasdair G Kergon 6bbf79a140 dm mpath: fix NULL pointer dereference when path parameters missing
multipath_ctr() forgets to return an error after detecting
missing path parameters.  Fix this.

Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-08-12 04:13:49 +01:00
Nikanth Karthikesan 8215d6ec5f dm table: remove unused dm_get_device range parameters
Remove unused parameters (start and len) of dm_get_device()
and fix the callers.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:32:27 +00:00
Kiyoshi Ueda fb61264297 dm mpath: refactor pg_init
This patch pulls the pg_init path activation code out of
process_queued_ios() into a new function.

No functional change.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:32:18 +00:00
Kiyoshi Ueda 2bded7bd7e dm mpath: wait for pg_init completion when suspending
When suspending the device we must wait for all I/O to complete, but
pg-init may be still in progress even after flushing the workqueue
for kmpath_handlerd in multipath_postsuspend.

This patch waits for pg-init completion correctly in
multipath_postsuspend().

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:32:13 +00:00
Kiyoshi Ueda d0259bf0ee dm mpath: hold io until all pg_inits completed
m->queue_io is set to block processing I/Os, and it needs to be kept
while pg-init, which issues multiple path activations, is in progress.
But m->queue_io is cleared when a path activation completes without error
in pg_init_done(), even while other path activations are in progress.
That may cause undesired -EIO on paths which have not completed activation.

This patch fixes that by not clearing m->queue_io until all path
activations complete.
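
One way to picture the fix is a counter of outstanding activations. The sketch below is a simplified model with hypothetical names; it ignores locking and activation errors, which the real pg_init_done() has to handle.

```c
/* Toy state (hypothetical): I/O is held while queue_io is non-zero. */
struct mp_state {
	unsigned pg_init_in_progress;  /* outstanding path activations */
	unsigned queue_io;
};

/* Begin pg-init: one activation is issued per path in the group. */
void pg_init_start(struct mp_state *m, unsigned nr_paths)
{
	m->queue_io = 1;
	m->pg_init_in_progress = nr_paths;
}

/* Called once for each activation that completes without error. */
void pg_init_one_done(struct mp_state *m)
{
	if (m->pg_init_in_progress == 0)
		return;  /* nothing outstanding; ignore a stray completion */

	if (--m->pg_init_in_progress > 0)
		return;  /* others still running: keep holding I/O */

	m->queue_io = 0;  /* last one finished: release queued I/O */
}
```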

(Before the hardware handlers were moved into the SCSI layer, pg_init
only used one path.)

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:30:02 +00:00
Kiyoshi Ueda fce323dd68 dm mpath: avoid storing private suspended state
'suspended' flag in struct multipath was introduced to check whether
the multipath target is in suspended state, but the same check is
done through dm_suspended() now, so remove the flag and related code.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:29:59 +00:00
Moger, Babu f7b934c812 dm mpath: skip activate_path for failed paths
This patch adds two minor fixes while processing device mapper path activation.

Skip failed paths while calling activate_path.  If the path has already failed
then activate_path is certain to fail, so there is no need to call it. In
some cases calling it might cause unnecessarily prolonged retries.

Change the misleading message if the path being activated fails with SCSI_DH_NOSYS.

Signed-off-by: Babu Moger <babu.moger@lsi.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:29:49 +00:00
Moger, Babu 83c0d5d538 dm mpath: pass struct pgpath to pg init done
This patch removes some unnecessary argument casting. There is no
functional change with this patch.

Passes 'struct pgpath' through to pg_init_done() instead of the enclosed
'struct dm_path'.

Tested the changes with LSI storage.

CC: Chandra Seetharaman <chandra.seetharaman@us.ibm.com>
Signed-off-by: Babu Moger <babu.moger@lsi.com>
Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:29:45 +00:00
Kiyoshi Ueda c2f3d24b78 dm mpath: reject messages when device is suspended
This patch rejects messages that can generate I/O while the device
itself is suspended.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:27 +00:00
Mike Anderson 67a46dad25 dm mpath: prevent io from work queue while suspended
Reject messages that can generate I/O while the device itself
is suspended.

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:21 +00:00
Mike Anderson 6380f26f04 dm mpath: add mutex to synchronize adding and flushing work
Add a mutex to allow possible creators of new work to synchronize with
flushing work queues.

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:21 +00:00
Kiyoshi Ueda 6df400ab64 dm mpath: flush workqueues before suspend completes
This patch stops the remaining dm-mpath activity during the suspend
sequence by flushing workqueues in postsuspend function.

The current dm-mpath target may not be quiet even after suspend completes
because some workqueues (e.g. device_handler's work, event handling)
are not flushed during the suspend sequence, even though suspended
devices/targets are supposed to be quiet in this state.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:19 +00:00
Chandra Seetharaman 3ae31f6a7b [SCSI] scsi_dh: Change the scsi_dh_activate interface to be asynchronous
Make scsi_dh_activate() function asynchronous, by taking in two additional
parameters, one is the callback function and the other is the data to call
the callback function with.
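
The shape of such an interface can be shown with a userspace mock. The names below (fake_handler, fake_activate, fake_activate_finish, record_result) are invented for this sketch and are not the scsi_dh API; the point is only the callback-plus-data pattern.

```c
/* Completion callback: receives the caller's data and an error code. */
typedef void (*activate_complete_fn)(void *data, int err);

/* Mock handler that remembers who to notify (hypothetical). */
struct fake_handler {
	activate_complete_fn fn;
	void *data;
	int pending;
};

/* Start an activation: record the callback and return immediately. */
int fake_activate(struct fake_handler *h, activate_complete_fn fn, void *data)
{
	h->fn = fn;
	h->data = data;
	h->pending = 1;
	return 0;
}

/* Later, when the device has responded, deliver the result. */
void fake_activate_finish(struct fake_handler *h, int err)
{
	if (!h->pending)
		return;
	h->pending = 0;
	h->fn(h->data, err);
}

/* Example callback: store the result in the int that 'data' points to. */
void record_result(void *data, int err)
{
	*(int *)data = err;
}
```

The caller is not blocked between fake_activate() and the callback, which is the property an asynchronous interface provides.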

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-04 12:00:46 -06:00
Chandra Seetharaman 2bfd2e1337 [SCSI] scsi_dh: Use scsi_dh_set_params() in multipath.
Use scsi_dh_set_params() to set the parameters provided. Save the parameters in
parse_hw_handler() and use it in parse_path().

Reported-by: Eddie Williams <Eddie.Williams@steeleye.com>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Tested-by: Eddie Williams <Eddie.Williams@steeleye.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:15 -05:00
Mike Snitzer 5dea271b6d dm table: pass correct dev area size to device_area_is_valid
Incorrect device area lengths are being passed to device_area_is_valid().

The regression appeared in 2.6.31-rc1 through commit
754c5fc7eb.

With the dm-stripe target, the size of the target (ti->len) was used
instead of the stripe_width (ti->len/#stripes).  An example of a
consequent incorrect error message is:

  device-mapper: table: 254:0: sdb too small for target

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-07-23 20:30:42 +01:00
Kiyoshi Ueda f40c67f0f7 dm mpath: change to be request based
This patch converts dm-multipath target to request-based from bio-based.

Basically, the patch just converts the I/O unit from struct bio
to struct request.
In the course of the conversion, it also changes the I/O queueing
mechanism.  The change in the I/O queueing is described in detail
below.

I/O queueing mechanism change
-----------------------------
In I/O submission, map_io(), there is no mechanism change from
bio-based, since the clone request is ready for retry as it is.
However, in I/O completion, do_end_io(), there is a mechanism change
from bio-based, since the clone request is not ready for retry.

In do_end_io() of bio-based, the clone bio has all needed memory
for resubmission.  So the target driver can queue it and resubmit
it later without memory allocations.
The mechanism has almost no overhead.

On the other hand, in do_end_io() of request-based, the clone request
doesn't have clone bios, so the target driver can't resubmit it
as it is.  To resubmit the clone request, memory must be allocated for
clone bios, which adds overhead.
To avoid that overhead just for queueing, the target driver doesn't
queue the clone request itself.
Instead, the target driver asks dm core to queue and remap
the original request of the clone request, since the only overhead of
queueing is then freeing the memory for the clone request.

As a result, the target driver doesn't need to record/restore
the information of the original request for resubmitting
the clone request.  So dm_bio_details in dm_mpath_io is removed.

multipath_busy()
---------------------
The target driver returns "busy" only when both of the following hold:
  o The target driver would map I/Os if the map() function were called
  and
  o The mapped I/Os would wait on the underlying device's queue due to
    congestion if the map() function were called now.

In other cases, the target driver doesn't return "busy".
Otherwise, dm core will keep the I/Os and the target driver can't
do what it wants.
(e.g. the target driver can't map I/Os now, so wants to kill I/Os.)

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:37 +01:00
Mike Snitzer af4874e03e dm targets: introduce iterate devices fn
Add .iterate_devices to 'struct target_type' to allow a function to be
called for all devices in a DM target.  Implemented it for all targets
except those in dm-snap.c (origin and snapshot).
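
The idea of a per-target iterator can be sketched as follows. All types and names here are simplified inventions for illustration (devices are plain strings, the callback signature is not the kernel's).

```c
/* Callback invoked once per device; a non-zero return stops the walk. */
typedef int (*iterate_fn)(const char *dev_name, unsigned long long start,
			  unsigned long long len, void *data);

/* Toy target holding up to four underlying devices (hypothetical). */
struct toy_target {
	const char *devs[4];
	unsigned nr_devs;
	unsigned long long len;
};

/* Call 'fn' for every device in the target, stopping at the first error. */
int toy_iterate_devices(const struct toy_target *t, iterate_fn fn, void *data)
{
	int ret = 0;
	unsigned i;

	for (i = 0; i < t->nr_devs; i++) {
		ret = fn(t->devs[i], 0, t->len, data);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback: count the devices visited. */
int count_devices(const char *dev_name, unsigned long long start,
		  unsigned long long len, void *data)
{
	(void)dev_name;
	(void)start;
	(void)len;
	(*(unsigned *)data)++;
	return 0;
}
```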

(The raid1 version number jumps to 1.12 because we originally reserved
1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors'
instead.)

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: martin.petersen@oracle.com
2009-06-22 10:12:33 +01:00
Kiyoshi Ueda 02ab823fd1 dm mpath: add start_io and nr_bytes to path selectors
This patch makes two additions to the dm path selector interface for
dynamic load balancers:
  o a new hook, start_io()
  o a new parameter 'nr_bytes' to select_path()/start_io()/end_io()
    to pass the size of the I/O

start_io() is called when a target driver actually submits I/O
to the selected path.
Path selectors can use it to start accounting of the I/O.
(e.g. counting the number of in-flight I/Os.)
The start_io hook is based on the patch posted by Stefan Bader:
https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html

nr_bytes, the size of the I/O, is so path selectors can take the
size of the I/O into account when deciding which path to use.
dm-service-time uses it to estimate service time, for example.
(Added the nr_bytes member to dm_mpath_io instead of using existing
 details.bi_size, since request-based dm patch deletes it.)
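
A toy path selector shows how the hooks fit together. It is only a sketch with hypothetical names: it tracks in-flight bytes per path and picks the least loaded one, and for brevity its select function does not take nr_bytes, unlike the interface described above.

```c
#include <stddef.h>

/* Per-path accounting kept by the toy selector (hypothetical). */
struct toy_path {
	size_t in_flight_bytes;
};

/* start_io hook: the target has submitted nr_bytes to this path. */
void toy_start_io(struct toy_path *p, size_t nr_bytes)
{
	p->in_flight_bytes += nr_bytes;
}

/* end_io hook: that I/O has completed. */
void toy_end_io(struct toy_path *p, size_t nr_bytes)
{
	p->in_flight_bytes -= nr_bytes;
}

/* Pick the path with the fewest bytes in flight; NULL if there are none. */
struct toy_path *toy_select_path(struct toy_path *paths, size_t nr_paths)
{
	struct toy_path *best = NULL;
	size_t i;

	for (i = 0; i < nr_paths; i++) {
		if (best == NULL ||
		    paths[i].in_flight_bytes < best->in_flight_bytes)
			best = &paths[i];
	}
	return best;
}
```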

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:27 +01:00
Mikulas Patocka 8627921fa2 dm mpath: support barriers
Flush support for dm-multipath target.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:24 +01:00
Mikulas Patocka 53b351f972 dm mpath: flush keventd queue in destructor
The commit fe9cf30eb8 moves dm table event
submission from kmultipath queue to kernel kevent queue to avoid a
deadlock.

There is a possibility of a race condition because the kevent queue is not
flushed in the multipath destructor. The scenario is:
- some event happens and is queued to keventd
- keventd thread is delayed due to scheduling latency or some other work
- multipath device is destroyed
- keventd now attempts to process work_struct that is residing in already
  released memory.

The patch flushes the keventd queue in the multipath destructor.
I've already fixed a similar bug in dm-raid1.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
2009-06-22 10:12:13 +01:00
Chandra Seetharaman e54f77ddda dm mpath: call activate fn for each path in pg_init
Fixed a problem affecting reinstatement of passive paths.

Before we moved the hardware handler from dm to SCSI, it performed a pg_init
for a path group and didn't maintain any state about each path in hardware
handler code.

But in SCSI dh, such state is now maintained, as we want to fail I/O early on a
path if it is not the active path.

All the hardware handlers now have a state, which is set to active or some form of
inactive.  They have a prep_fn() which uses this state to fail the I/O without
it ever being sent to the device.

So in effect when dm-multipath calls scsi_dh_activate(), activate is
sent to only one path and the "state" of that path is changed appropriately
to "active" while other paths in the same path group are never changed
as they never got an "activate".

In order to make sure all the paths in a path group get their state set
properly when a pg_init happens, we need to call scsi_dh_activate() on
all paths in a path group.

Doing this at the hardware handler layer is not a good option as we
want the multipath layer to define the relationship between path and path
groups and not the hardware handler.

Attached patch sends an "activate" on each path in a path group when a
path group is switched. It also sends an activate when a path is reinstated.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:12 +01:00
Hannes Reinecke a0cf7ea954 dm mpath: change attached scsi_dh
When specifying a different hardware handler via multipath
features we should be able to override the built-in defaults.

The problem here is the hardware table from scsi_dh is compiled
in and cannot be changed from userland. The multipath.conf OTOH
is purely user-defined and, what's more, the user might have a valid
reason for modifying it.
(e.g. EMC Clariion can well be run in PNR mode even though ALUA is
active, or the user might want to try ALUA on any as-of-yet unknown
devices)

So _not_ allowing multipath to override the device handler setting
will just add to the confusion and make error tracking even more
difficult.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:11 +01:00
Mikulas Patocka e094f4f15f dm mpath: validate hw_handler argument count
Fix arg count parsing error in hw handlers.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:10 +01:00
Mikulas Patocka 0e0497c0c0 dm mpath: validate table argument count
The parser reads the argument count as a number but doesn't check that
sufficient arguments are supplied. This command triggers the bug:

dmsetup create mpath --table "0 `blockdev --getsize /dev/mapper/cr0`
    multipath 0 0 2 1 round-robin 1000 0 1 1 /dev/mapper/cr0
    round-robin 0 1 1 /dev/mapper/cr1 1000"
kernel BUG at drivers/md/dm-mpath.c:530!

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:08:02 +01:00
Christoph Hellwig 8f3d8ba20e block: move bio list helpers into bio.h
It's used by DM and MD and generally useful, so move the bio list
helpers into bio.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-04-15 08:28:09 +02:00
Alasdair G Kergon fe9cf30eb8 dm mpath: move trigger_event to system workqueue
The same workqueue is used both for sending uevents and processing queued I/O.
Deadlock has been reported in RHEL5 when sending a uevent was blocked waiting
for the queued I/O to be processed.  Use schedule_work() for the asynchronous
uevents instead.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:13 +00:00
Mikulas Patocka 10d3bd09a3 dm: consolidate target deregistration error handling
Change dm_unregister_target to return void and use BUG() for error
reporting.

dm_unregister_target can only fail because of a programming bug in the
target driver. It can't fail because of the user's behavior or disk errors.

This patch changes unregister_target to return void and use BUG if
someone tries to unregister a non-registered target or unregister a target
that is in use.

This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:58 +00:00
Chandra Seetharaman 14e98c5ca8 dm mpath: warn if args ignored
Currently dm ignores the parameters provided to hardware handlers
without providing any notifications to the user.

This patch just prints a warning message so that the user knows that
the arguments are ignored.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-11-13 23:39:06 +00:00
Chandra Seetharaman b81aa1c792 dm mpath: avoid attempting to activate null path
Path activation code is called even when the pgpath is NULL. This could
lead to a panic in activate_path(). Such a panic has been seen in the -rt kernel.

This problem has existed since before pg_init() was moved to a
workqueue.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-11-13 23:39:00 +00:00