#
# Block device driver configuration
#

menuconfig MD
	bool "Multiple devices driver support (RAID and LVM)"
	depends on BLOCK
	help
	  Support multiple physical spindles through a single logical device.
	  Required for RAID and logical volume management.

if MD

config BLK_DEV_MD
	tristate "RAID support"
	---help---
	  This driver lets you combine several hard disk partitions into one
	  logical block device. This can be used to simply append one
	  partition to another one or to combine several redundant hard disks
	  into a RAID1/4/5 device so as to provide protection against hard
	  disk failures. This is called "Software RAID" since the combining of
	  the partitions is done by the kernel. "Hardware RAID" means that the
	  combining is done by a dedicated controller; if you have such a
	  controller, you do not need to say Y here.

	  More information about Software RAID on Linux is contained in the
	  Software RAID mini-HOWTO, available from
	  <http://www.tldp.org/docs.html#howto>. There you will also learn
	  where to get the supporting user space utilities raidtools.

	  If unsure, say N.

config MD_LINEAR
	tristate "Linear (append) mode"
	depends on BLK_DEV_MD
	---help---
	  If you say Y here, then your multiple devices driver will be able to
	  use the so-called linear mode, i.e. it will combine the hard disk
	  partitions by simply appending one to the other.

	  To compile this as a module, choose M here: the module
	  will be called linear.

	  If unsure, say Y.

config MD_RAID0
	tristate "RAID-0 (striping) mode"
	depends on BLK_DEV_MD
	---help---
	  If you say Y here, then your multiple devices driver will be able to
	  use the so-called raid0 mode, i.e. it will combine the hard disk
	  partitions into one logical device in such a fashion as to fill them
	  up evenly, one chunk here and one chunk there. This will increase
	  the throughput rate if the partitions reside on distinct disks.

	  Information about Software RAID on Linux is contained in the
	  Software-RAID mini-HOWTO, available from
	  <http://www.tldp.org/docs.html#howto>. There you will also
	  learn where to get the supporting user space utilities raidtools.

	  To compile this as a module, choose M here: the module
	  will be called raid0.

	  If unsure, say Y.

config MD_RAID1
	tristate "RAID-1 (mirroring) mode"
	depends on BLK_DEV_MD
	---help---
	  A RAID-1 set consists of several disk drives which are exact copies
	  of each other. In the event of a mirror failure, the RAID driver
	  will continue to use the operational mirrors in the set, providing
	  an error free MD (multiple device) to the higher levels of the
	  kernel. In a set with N drives, the available space is the capacity
	  of a single drive, and the set protects against a failure of (N - 1)
	  drives.

	  Information about Software RAID on Linux is contained in the
	  Software-RAID mini-HOWTO, available from
	  <http://www.tldp.org/docs.html#howto>. There you will also
	  learn where to get the supporting user space utilities raidtools.

	  If you want to use such a RAID-1 set, say Y. To compile this code
	  as a module, choose M here: the module will be called raid1.

	  If unsure, say Y.

config MD_RAID10
	tristate "RAID-10 (mirrored striping) mode (EXPERIMENTAL)"
	depends on BLK_DEV_MD && EXPERIMENTAL
	---help---
	  RAID-10 provides a combination of striping (RAID-0) and
	  mirroring (RAID-1) with easier configuration and more flexible
	  layout.
	  Unlike RAID-0, but like RAID-1, RAID-10 requires all devices to
	  be the same size (or at least, only as much as the smallest device
	  will be used).
	  RAID-10 provides a variety of layouts that provide different levels
	  of redundancy and performance.

	  RAID-10 requires mdadm-1.7.0 or later, available at:

	  ftp://ftp.kernel.org/pub/linux/utils/raid/mdadm/

	  If unsure, say Y.

config MD_RAID456
	tristate "RAID-4/RAID-5/RAID-6 mode"
	depends on BLK_DEV_MD
	select ASYNC_MEMCPY
	select ASYNC_XOR
	---help---
	  A RAID-5 set of N drives with a capacity of C MB per drive provides
	  the capacity of C * (N - 1) MB, and protects against a failure
	  of a single drive. For a given sector (row) number, (N - 1) drives
	  contain data sectors, and one drive contains the parity protection.
	  For a RAID-4 set, the parity blocks are present on a single drive,
	  while a RAID-5 set distributes the parity across the drives in one
	  of the available parity distribution methods.

	  A RAID-6 set of N drives with a capacity of C MB per drive
	  provides the capacity of C * (N - 2) MB, and protects
	  against a failure of any two drives. For a given sector
	  (row) number, (N - 2) drives contain data sectors, and two
	  drives contain two independent redundancy syndromes. Like
	  RAID-5, RAID-6 distributes the syndromes across the drives
	  in one of the available parity distribution methods.

	  Information about Software RAID on Linux is contained in the
	  Software-RAID mini-HOWTO, available from
	  <http://www.tldp.org/docs.html#howto>. There you will also
	  learn where to get the supporting user space utilities raidtools.

	  If you want to use such a RAID-4/RAID-5/RAID-6 set, say Y. To
	  compile this code as a module, choose M here: the module
	  will be called raid456.

	  If unsure, say Y.

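# Worked example for the capacity formulas in the MD_RAID1 and MD_RAID456
# help texts above. The figures are illustrative assumptions, not taken
# from the help texts: N = 6 drives, C = 1000 MB per drive.
#
#   RAID-1:  capacity = C           = 1000 MB, survives N - 1 = 5 failures
#   RAID-5:  capacity = C * (N - 1) = 5000 MB, survives 1 failure
#   RAID-6:  capacity = C * (N - 2) = 4000 MB, survives any 2 failures
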
config MD_RAID5_RESHAPE
	bool "Support adding drives to a raid-5 array"
	depends on MD_RAID456
	default y
	---help---
	  A RAID-5 set can be expanded by adding extra drives. This
	  requires "restriping" the array which means (almost) every
	  block must be written to a different place.

	  This option allows such restriping to be done while the array
	  is online.

	  You will need mdadm version 2.4.1 or later to use this
	  feature safely. During the early stage of reshape there is
	  a critical section where live data is being over-written. A
	  crash during this time needs extra care for recovery. The
	  newer mdadm takes a copy of the data in the critical section
	  and will restore it, if necessary, after a crash.

	  The mdadm usage is e.g.
	       mdadm --grow /dev/md1 --raid-disks=6
	  to grow '/dev/md1' to 6 disks.

	  Note: The array can only be expanded, not contracted.
	  There should be enough spares already present to make the new
	  array workable.

	  If unsure, say Y.

config MD_MULTIPATH
	tristate "Multipath I/O support"
	depends on BLK_DEV_MD
	help
	  Multipath-IO is the ability of certain devices to address the same
	  physical disk over multiple 'IO paths'. The code ensures that such
	  paths can be defined and handled at runtime, and ensures that a
	  transparent failover to the backup path(s) happens if an IO error
	  arrives on the primary path.

	  If unsure, say N.

config MD_FAULTY
	tristate "Faulty test module for MD"
	depends on BLK_DEV_MD
	help
	  The "faulty" module allows for a block device that occasionally returns
	  read or write errors. It is useful for testing.

	  If unsure, say N.

config BLK_DEV_DM
	tristate "Device mapper support"
	---help---
	  Device-mapper is a low level volume manager. It works by allowing
	  people to specify mappings for ranges of logical sectors. Various
	  mapping types are available; in addition, people may write their own
	  modules containing custom mappings if they wish.

	  Higher level volume managers such as LVM2 use this driver.

	  To compile this as a module, choose M here: the module will be
	  called dm-mod.

	  If unsure, say N.

config DM_DEBUG
	boolean "Device mapper debugging support"
	depends on BLK_DEV_DM && EXPERIMENTAL
	---help---
	  Enable this for messages that may help debug device-mapper problems.

	  If unsure, say N.

config DM_CRYPT
	tristate "Crypt target support"
	depends on BLK_DEV_DM && EXPERIMENTAL
	select CRYPTO
	select CRYPTO_CBC
	---help---
	  This device-mapper target allows you to create a device that
	  transparently encrypts the data on it. You'll need to activate
	  the ciphers you're going to use in the cryptoapi configuration.

	  Information on how to use dm-crypt can be found on

	  <http://www.saout.de/misc/dm-crypt/>

	  To compile this code as a module, choose M here: the module will
	  be called dm-crypt.

	  If unsure, say N.

config DM_SNAPSHOT
	tristate "Snapshot target (EXPERIMENTAL)"
	depends on BLK_DEV_DM && EXPERIMENTAL
	---help---
	  Allow volume managers to take writable snapshots of a device.

config DM_MIRROR
	tristate "Mirror target (EXPERIMENTAL)"
	depends on BLK_DEV_DM && EXPERIMENTAL
	---help---
	  Allow volume managers to mirror logical volumes, also
	  needed for live data migration tools such as 'pvmove'.

config DM_ZERO
	tristate "Zero target (EXPERIMENTAL)"
	depends on BLK_DEV_DM && EXPERIMENTAL
	---help---
	  A target that discards writes, and returns all zeroes for
	  reads. Useful in some recovery situations.

config DM_MULTIPATH
	tristate "Multipath target (EXPERIMENTAL)"
	depends on BLK_DEV_DM && EXPERIMENTAL
	---help---
	  Allow volume managers to support multipath hardware.

config DM_MULTIPATH_EMC
	tristate "EMC CX/AX multipath support (EXPERIMENTAL)"
	depends on DM_MULTIPATH && BLK_DEV_DM && EXPERIMENTAL
	---help---
	  Multipath support for EMC CX/AX series hardware.

config DM_MULTIPATH_RDAC
	tristate "LSI/Engenio RDAC multipath support (EXPERIMENTAL)"
	depends on DM_MULTIPATH && BLK_DEV_DM && SCSI && EXPERIMENTAL
	---help---
	  Multipath support for LSI/Engenio RDAC.

config DM_MULTIPATH_HP
	tristate "HP MSA multipath support (EXPERIMENTAL)"
	depends on DM_MULTIPATH && BLK_DEV_DM && EXPERIMENTAL
	---help---
	  Multipath support for HP MSA (Active/Passive) series hardware.

config DM_DELAY
	tristate "I/O delaying target (EXPERIMENTAL)"
	depends on BLK_DEV_DM && EXPERIMENTAL
	---help---
	  A target that delays reads and/or writes and can send
	  them to different devices. Useful for testing.

	  If unsure, say N.

endif # MD
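
# Illustrative .config fragment, a sketch only: it shows how the symbols
# defined above might be set for a kernel with software RAID-1 and
# RAID-4/5/6 built as modules. The particular y/m choices are assumptions
# made for this example, not recommendations from the help texts.
#
#   CONFIG_MD=y                  (bool: opens this menu)
#   CONFIG_BLK_DEV_MD=y          (tristate: core RAID support)
#   CONFIG_MD_RAID1=m            (module raid1)
#   CONFIG_MD_RAID456=m          (module raid456; selects ASYNC_MEMCPY
#                                 and ASYNC_XOR)
#   CONFIG_MD_RAID5_RESHAPE=y    (bool: depends on MD_RAID456)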