OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Mike Snitzer	f6d16d3279	dm thin: fix Documentation for held metadata root feature The Documentation for the thin provisioning target's held metadata root feature was incorrect. It is now available and the value for the held metadata root is in block units (not 512b sectors). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-03-06 14:23:35 -05:00
Mike Snitzer	07f2b6e038	dm thin: ensure user takes action to validate data and metadata consistency If a thin metadata operation fails the current transaction will abort, whereby causing potential for IO layers up the stack (e.g. filesystems) to have data loss. As such, set THIN_METADATA_NEEDS_CHECK_FLAG in the thin metadata's superblock which: 1) requires the user verify the thin metadata is consistent (e.g. use thin_check, etc) 2) suggests the user verify the thin data is consistent (e.g. use fsck) The only way to clear the superblock's THIN_METADATA_NEEDS_CHECK_FLAG is to run thin_repair. On metadata operation failure: abort current metadata transaction, set pool in read-only mode, and now set the needs_check flag. As part of this change, constraints are introduced or relaxed: * don't allow a pool to transition to write mode if needs_check is set * don't allow data or metadata space to be resized if needs_check is set * if a thin pool's metadata space is exhausted: the kernel will now force the user to take the pool offline for repair before the kernel will allow the metadata space to be extended. Also, update Documentation to include information about when the thin provisioning target commits metadata, how it handles metadata failures and running out of space. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com>	2014-03-05 15:25:35 -05:00
Mike Snitzer	2e68c4e6ca	dm cache: add policy name to status output The cache's policy may have been established using the "default" alias, which is currently the "mq" policy but the default policy may change in the future. It is useful to know exactly which policy is being used. Add a 'real' member to the dm_cache_policy_type structure and have the "default" dm_cache_policy_type point to the real "mq" dm_cache_policy_type. Update dm_cache_policy_get_name() to check if real is set, if so report the name of the real policy (not the alias). Requested-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-01-16 13:44:11 -05:00
Mike Snitzer	6a388618f1	dm cache: add block sizes and total cache blocks to status output Improve cache_status to emit: <metadata block size> <#used metadata blocks>/<#total metadata blocks> <cache block size> <#used cache blocks>/<#total cache blocks> ... Adding the block sizes allows for easier calculation of the overall size of both the metadata and cache devices. Adding <#total cache blocks> provides useful context for how much of the cache is used. Unfortunately these additions to the status will require updates to users' scripts that monitor the cache status. But these changes help provide more comprehensive information about the cache device and will simplify tools that are being developed to manage dm-cache devices -- because they won't need to issue 3 operations to cobble together the information that we can easily provide via a single status ioctl. While updating the status documentation in cache.txt spaces were tabify'd. Requested-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>	2014-01-10 10:24:33 -05:00
Joe Thornber	78e03d6973	dm cache policy mq: introduce three promotion threshold tunables Internally the mq policy maintains a promotion threshold variable. If the hit count of a block not in the cache goes above this threshold it gets promoted to the cache. This patch introduces three new tunables that allow you to tweak the promotion threshold by adding a small value. These adjustments depend on the io type: read_promote_adjustment: READ io, default 4 write_promote_adjustment: WRITE io, default 8 discard_promote_adjustment: READ/WRITE io to a discarded block, default 1 If you're trying to quickly warm a new cache device you may wish to reduce these to encourage promotion. Remember to switch them back to their defaults after the cache fills though. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-01-07 10:14:33 -05:00
Mike Snitzer	787a996cb2	dm thin: add error_if_no_space feature If the pool runs out of data or metadata space, the pool can either queue or error the IO destined to the data device. The default is to queue the IO until more space is added. An admin may now configure the pool to error IO when no space is available by setting the 'error_if_no_space' feature when loading the thin-pool table. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>	2014-01-07 10:14:30 -05:00
Mike Snitzer	83f539e1a4	dm cache: update Documentation for invalidate_cblocks's range syntax The cache target's invalidate_cblocks message allows cache block (cblock) ranges to be expressed with: <cblock start>-<cblock end> The range's <cblock end> value is "one past the end", so the range includes <cblock start> through <cblock end>-1. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>	2013-12-10 16:35:15 -05:00
Mike Snitzer	7b6b2bc98c	dm cache: resolve small nits and improve Documentation Document passthrough mode, cache shrinking, and cache invalidation. Also, use strcasecmp() and hlist_unhashed(). Reported-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-12 13:11:09 -05:00
Joe Thornber	65790ff919	dm cache: add cache block invalidation support Cache block invalidation is removing an entry from the cache without writing it back. Cache blocks can be invalidated via the 'invalidate_cblocks' message, which takes an arbitrary number of cblock ranges: invalidate_cblocks [<cblock>\|<cblock begin>-<cblock end>]* E.g. dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-11 11:37:51 -05:00
Joe Thornber	2ee57d5873	dm cache: add passthrough mode "Passthrough" is a dm-cache operating mode (like writethrough or writeback) which is intended to be used when the cache contents are not known to be coherent with the origin device. It behaves as follows: * All reads are served from the origin device (all reads miss the cache) * All writes are forwarded to the origin device; additionally, write hits cause cache block invalidates This mode decouples cache coherency checks from cache device creation, largely to avoid having to perform coherency checks while booting. Boot scripts can create cache devices in passthrough mode and put them into service (mount cached filesystems, for example) without having to worry about coherency. Coherency that exists is maintained, although the cache will gradually cool as writes take place. Later, applications can perform coherency checks, the nature of which will depend on the type of the underlying storage. If coherency can be verified, the cache device can be transitioned to writethrough or writeback mode while still warm; otherwise, the cache contents can be discarded prior to transitioning to the desired operating mode. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-11 11:37:49 -05:00
Joe Thornber	01911c19be	dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty() There are now two multiqueues for in cache blocks. A clean one and a dirty one. writeback_work comes from the dirty one. Demotions come from the clean one. There are two benefits: - Performance improvement, since demoting a clean block is a noop. - The cache cleans itself when io load is light. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:25 -05:00
Milan Broz	ed04d98169	dm crypt: add TCW IV mode for old CBC TCRYPT containers dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers in LRW or XTS block encryption mode. TCRYPT containers prior to version 4.1 use CBC mode with some additional tweaks, this patch adds support for these containers. This new mode is implemented using special IV generator named TCW (TrueCrypt IV with whitening). TCW IV only supports containers that are encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and TripleDES). While this mode is legacy and is known to be vulnerable to some watermarking attacks (e.g. revealing of hidden disk existence) it can still be useful to activate old containers without using 3rd party software or for independent forensic analysis of such containers. (Both the userspace and kernel code is an independent implementation based on the format documentation and it completely avoids use of original source code.) The TCW IV generator uses two additional keys: Kw (whitening seed, size is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is always the IV size of the selected cipher). These keys are concatenated at the end of the main encryption key provided in mapping table. While whitening is completely independent from IV, it is implemented inside IV generator for simplification. The whitening value is always 16 bytes long and is calculated per sector from provided Kw as initial seed, xored with sector number and mixed with CRC32 algorithm. Resulting value is xored with ciphertext sector content. IV is calculated from the provided Kiv as initial IV seed and xored with sector number. Detailed calculation can be found in the Truecrypt documentation for version < 4.1 and will also be described on dm-crypt site, see: http://code.google.com/p/cryptsetup/wiki/DMCrypt The experimental support for activation of these containers is already present in git devel brach of cryptsetup. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:20 -05:00
Mikulas Patocka	fd2ed4d252	dm: add statistics support Support the collection of I/O statistics on user-defined regions of a DM device. If no regions are defined no statistics are collected so there isn't any performance impact. Only bio-based DM devices are currently supported. Each user-defined region specifies a starting sector, length and step. Individual statistics will be collected for each step-sized area within the range specified. The I/O statistics counters for each step-sized area of a region are in the same format as /sys/block/*/stat or /proc/diskstats but extra counters (12 and 13) are provided: total time spent reading and writing in milliseconds. All these counters may be accessed by sending the @stats_print message to the appropriate DM device via dmsetup. The creation of DM statistics will allocate memory via kmalloc or fallback to using vmalloc space. At most, 1/4 of the overall system memory may be allocated by DM statistics. The admin can see how much memory is used by reading /sys/module/dm_mod/parameters/stats_current_allocated_bytes See Documentation/device-mapper/statistics.txt for more details. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-09-05 20:46:06 -04:00
Carlos Maiolino	a561ddbec1	dm thin: add data block size limits to Documentation Inform users that the data block size can't be any arbitrary number, i.e. its value must be between 64KB and 1GB. Also, it should be a multiple of 64KB. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>	2013-08-23 09:02:14 -04:00
Mike Snitzer	0547304463	dm cache: add data block size limits to code and Documentation Place upper bound on the cache's data block size (1GB). Inform users that the data block size can't be any arbitrary number, i.e. its value must be between 32KB and 1GB. Also, it should be a multiple of 32KB. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>	2013-08-23 09:02:13 -04:00
Mike Snitzer	66bb2644ca	dm cache: document metadata device is exclussive to a cache A cache's metadata device may not be shared by multiple cache devices. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>	2013-08-23 09:02:13 -04:00
Linus Torvalds	9903883f1d	Add a device-mapper target called dm-switch to provide a multipath framework for storage arrays that dynamically reconfigure their preferred paths for different device regions. Fix a bug in the verity target that prevented its use with some specific sizes of devices. Improve some locking mechanisms in the device-mapper core and bufio. Add Mike Snitzer as a device-mapper maintainer. A few more clean-ups and fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJR3ehdAAoJEK2W1qbAHj1nseUP+gPgoX2YTBiKW/fQnbixb11c 0BExXiHtHgVnxQP4aJo8BJRFW9/DAN740UvKb2XjjbNChIQ47j6vOLCCzJ+97wW+ FCJ48pltsacgywvm5e3BbnwmcmpQXKk1Wd+1/9beWbcib9IzVB2B06Esv3HRtQZj cQbIkeeTGbrSnsiAWSQh2xsNqjv1YObUohs43uG+Pa0WmdE1KebAYfkgEvi0b+E6 ehSsvAMqYRgkLvYdYTxRNJtC+H3pkucS6r42Q/tZj2YciU3tc0v6rsFW9Ey+l0E7 c5KaUAKk5e3HAhFvJ4ydlj7r1cu7G49rixIBJ60lX86QBwmZ8js5EEPliw0ZoWI+ av1P+9gLsxaQTH/Cw8jJW4xK7hYAZAvn//iNVBAATATd65nmQImHNWWMjr205Kw9 9XOeFUxAdnM7ITKXJkFf3vH2tFrRAKgXiR57im5ZuLMOFYWjR6EYE870+GCWSya8 Dhzj0Mb8IFHrelEbRWicNbD5IaAxvfQ6/sTvXBiV642jImkQIyIj+PBiIvsq8fTH LKNL1l545R5aOHSU4TXnseq3TcIqElx0KsPTJuZq+q/2UfvMe9Lv9g+ld5CywfH1 1HkEB75yWPvEfOtIac9tzQSt3KnF01fC2QMYZE4rSiYs8KPgln9pxo+UulUaZzId 8Gch3/C5cBBCHjMJtv/b =s5m4 -----END PGP SIGNATURE----- Merge tag 'dm-3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull device-mapper changes from Alasdair G Kergon: "Add a device-mapper target called dm-switch to provide a multipath framework for storage arrays that dynamically reconfigure their preferred paths for different device regions. Fix a bug in the verity target that prevented its use with some specific sizes of devices. Improve some locking mechanisms in the device-mapper core and bufio. Add Mike Snitzer as a device-mapper maintainer. A few more clean-ups and fixes" * tag 'dm-3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm: add switch target dm: update maintainers dm: optimize reorder structure dm: optimize use SRCU and RCU dm bufio: submit writes outside lock dm cache: fix arm link errors with inline dm verity: use __ffs and __fls dm flakey: correct ctr alloc failure mesg dm verity: remove pointless comparison dm: use __GFP_HIGHMEM in __vmalloc dm verity: fix inability to use a few specific devices sizes dm ioctl: set noio flag to avoid __vmalloc deadlock dm mpath: fix ioctl deadlock when no paths	2013-07-11 13:05:40 -07:00
Jim Ramsay	9d0eb0ab43	dm: add switch target dm-switch is a new target that maps IO to underlying block devices efficiently when there is a large number of fixed-sized address regions but there is no simple pattern to allow for a compact mapping representation such as dm-stripe. Though we have developed this target for a specific storage device, Dell EqualLogic, we have made an effort to keep it as general purpose as possible in the hope that others may benefit. Originally developed by Jim Ramsay. Simplified by Mikulas Patocka. Signed-off-by: Jim Ramsay <jim_ramsay@dell.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-07-10 23:41:19 +01:00
Linus Torvalds	80cc38b163	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "The usual stuff from trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) treewide: relase -> release Documentation/cgroups/memory.txt: fix stat file documentation sysctl/net.txt: delete reference to obsolete 2.4.x kernel spinlock_api_smp.h: fix preprocessor comments treewide: Fix typo in printk doc: device tree: clarify stuff in usage-model.txt. open firmware: "/aliasas" -> "/aliases" md: bcache: Fixed a typo with the word 'arithmetic' irq/generic-chip: fix a few kernel-doc entries frv: Convert use of typedef ctl_table to struct ctl_table sgi: xpc: Convert use of typedef ctl_table to struct ctl_table doc: clk: Fix incorrect wording Documentation/arm/IXP4xx fix a typo Documentation/networking/ieee802154 fix a typo Documentation/DocBook/media/v4l fix a typo Documentation/video4linux/si476x.txt fix a typo Documentation/virtual/kvm/api.txt fix a typo Documentation/early-userspace/README fix a typo Documentation/video4linux/soc-camera.txt fix a typo lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment ...	2013-07-04 11:40:58 -07:00
Jonathan Brassow	c4a3955145	MD: Remember the last sync operation that was performed MD: Remember the last sync operation that was performed This patch adds a field to the mddev structure to track the last sync operation that was performed. This is especially useful when it comes to what is recorded in mismatch_cnt in sysfs. If the last operation was "data-check", then it reports the number of descrepancies found by the user-initiated check. If it was a "repair" operation, then it is reporting the number of descrepancies repaired. etc. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-26 12:38:24 +10:00
Jonathan Brassow	9092c02d94	DM RAID: Add ability to restore transiently failed devices on resume DM RAID: Add ability to restore transiently failed devices on resume This patch adds code to the resume function to check over the devices in the RAID array. If any are found to be marked as failed and their superblocks can be read, an attempt is made to reintegrate them into the array. This allows the user to refresh the array with a simple suspend and resume of the array - rather than having to load a completely new table, allocate and initialize all the structures and throw away the old instantiation. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-14 08:10:24 +10:00
Anatol Pomozov	f884ab15af	doc: fix misspellings with 'codespell' tool Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-05-28 12:02:12 +02:00
Jonathan Brassow	be83651f00	DM RAID: Add message/status support for changing sync action DM RAID: Add message/status support for changing sync action This patch adds a message interface to dm-raid to allow the user to more finely control the sync actions being performed by the MD driver. This gives the user the ability to initiate "check" and "repair" (i.e. scrubbing). Two additional fields have been appended to the status output to provide more information about the type of sync action occurring and the results of those actions, specifically: <sync_action> and <mismatch_cnt>. These new fields will always be populated. This is essentially the device-mapper way of doing what MD controls through the 'sync_action' sysfs file and shows through the 'mismatch_cnt' sysfs file. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-04-24 11:42:43 +10:00
Linus Torvalds	a5e0d73163	md updates for 3.9 mostly little bugfixes. Only "feature" is a new RAID10 layout which slightly improves the number of sets of devices that can concurrently fail, without data loss. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIVAwUAUTPm+znsnt1WYoG5AQLLsw/+PMqr8roC4twgxTWV1NRbU8NtOcRi9Rj9 uvBS63uYAaLdi/D3UBKFYczmNCu9knuXbcp9SgFDxH7LlthQsWN/GYnif06pPo3w 9Agu5M8c062TJEG1vrnX6FhPO6pNgrWFr3h+CKkTiD3179i9DoQpP8LXQToeyMtI YRMQf/zCkxYtDvWAP0iwsEWtw8cf+q9I/uGPhQ1L+DnZapXYdbtnqWBRz9q6mrDt orcGrP41aZHvnOHUaTbwmaorCKkf/Ys4SMaGenrSFpnpQMypt7VgNuwHC59LxvJT 5eiFG/26zIsv7Wk0jv/TvFP5qzUPo0/PFkd5ug0ArvbVRiXS2cMJDwQvMdO1toxD i5Bb+P9DptadvoWhOTgIpxnG77yRH45wJvyJOk+ZfS1/IO87nCRa3d0yiNOU5e2/ o0VdXPZRr72sdKKTK6kQuYfwCPb+Z2Pz6Q8BJdk6GxlmTXyP6sKhIgwUX86534fE LrOxfK8qV+GetVu3X02RoX2CyJJRQHXyXmbHuSzXuo/JiOYtDigAydwNZChvf+tf OoMY9K8vgNbhnGsUG6la7XPvZ+6dZMjdnxp2HB99Ml5A3PWZd75i5T6IHHxIQFbD C3z9PWTWP+hK4k15DEyjlELtsE9WduGTXG4kUcf328xJ/7lj4VIImVugdCz+1B6z +HlI6BiLwzY= =YdVD -----END PGP SIGNATURE----- Merge tag 'md-3.9' of git://neil.brown.name/md Pull md updates from NeilBrown: "Mostly little bugfixes. Only "feature" is a new RAID10 layout which slightly improves the number of sets of devices that can concurrently fail, without data loss." * tag 'md-3.9' of git://neil.brown.name/md: md: expedite metadata update when switching read-auto -> active md: remove CONFIG_MULTICORE_RAID456 md/raid1,raid10: fix deadlock with freeze_array() md/raid0: improve error message when converting RAID4-with-spares to RAID0 md: raid0: fix error return from create_stripe_zones. md: fix two bugs when attempting to resize RAID0 array. DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2) MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) MD RAID10: Minor non-functional code changes md: raid1,10: Handle REQ_WRITE_SAME flag in write bios md: protect against crash upon fsync on ro array	2013-03-05 17:22:08 -08:00
Heinz Mauelshagen	8735a81347	dm cache: add cleaner policy A simple cache policy that writes back all data to the origin. This is used to decommission a dm cache by emptying it. Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-03-01 22:45:52 +00:00
Joe Thornber	f283635281	dm cache: add mq policy A cache policy that uses a multiqueue ordered by recent hit count to select which blocks should be promoted and demoted. This is meant to be a general purpose policy. It prioritises reads over writes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-03-01 22:45:51 +00:00
Joe Thornber	c6b4fcbad0	dm: add cache target Add a target that allows a fast device such as an SSD to be used as a cache for a slower device such as a disk. A plug-in architecture was chosen so that the decisions about which data to migrate and when are delegated to interchangeable tunable policy modules. The first general purpose module we have developed, called "mq" (multiqueue), follows in the next patch. Other modules are under development. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2013-03-01 22:45:51 +00:00
Jonathan Brassow	fe5d2f4a15	DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms Until now, dm-raid.c only supported the "near" algorthm of MD's RAID10 implementation. This patch adds support for the "far" and "offset" algorithms, but only with the improved redundancy that is brought with the introduction of the 'use_far_sets' bit, which shifts copied stripes according to smaller sets vs the entire array. That is, the 17th bit of the 'layout' variable that defines the RAID10 implementation will always be set. (More information on how the 'layout' variable selects the RAID10 algorithm can be found in the opening comments of drivers/md/raid10.c.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-02-26 11:55:36 +11:00
Jonathan Brassow	55ebbb59c1	DM-RAID: Fix RAID10's check for sufficient redundancy Before attempting to activate a RAID array, it is checked for sufficient redundancy. That is, we make sure that there are not too many failed devices - or devices specified for rebuild - to undermine our ability to activate the array. The current code performs this check twice - once to ensure there were not too many devices specified for rebuild by the user ('validate_rebuild_devices') and again after possibly experiencing a failure to read the superblock ('analyse_superblocks'). Neither of these checks are sufficient. The first check is done properly but with insufficient information about the possible failure state of the devices to make a good determination if the array can be activated. The second check is simply done wrong in the case of RAID10 because it doesn't account for the independence of the stripes (i.e. mirror sets). The solution is to use the properly written check ('validate_rebuild_devices'), but perform the check after the superblocks have been read and we know which devices have failed. This gives us one check instead of two and performs it in a location where it can be done right. Only RAID10 was affected and it was affected in the following ways: - the code did not properly catch the condition where a user specified a device for rebuild that already had a failed device in the same mirror set. (This condition would, however, be caught at a deeper level in MD.) - the code triggers a false positive and denies activation when devices in independent mirror sets have failed - counting the failures as though they were all in the same set. The most likely place this error was introduced (or this patch should have been included) is in commit `4ec1e369` - first introduced in v3.7-rc1. Consequently this fix should also go in v3.7.y, however there is a small conflict on the .version in raid_target, so I'll submit a separate patch to -stable. Cc: stable@vger.kernel.org Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-01-24 12:02:36 +11:00
Jonathan Brassow	4ec1e369af	DM RAID: Add rebuild capability for RAID10 DM RAID: Add code to validate replacement slots for RAID10 arrays RAID10 can handle 'copies - 1' failures for each mirror group. This code ensures the user has provided a valid array - one whose devices specified for rebuild do not exceed the amount of redundancy available. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-10-11 13:40:24 +11:00
Linus Torvalds	fcff06c438	Merge branch 'for-next' of git://neil.brown.name/md Pull md updates from NeilBrown. * 'for-next' of git://neil.brown.name/md: DM RAID: Add support for MD RAID10 md/RAID1: Add missing case for attempting to repair known bad blocks. md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE. md/raid1: don't abort a resync on the first badblock. md: remove duplicated test on ->openers when calling do_md_stop() raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer md/raid1: prevent merging too large request md/raid1: read balance chooses idlest disk for SSD md/raid1: make sequential read detection per disk based MD RAID10: Export md_raid10_congested MD: Move macros from raid1.h to raid1.c MD RAID1: rename mirror_info structure MD RAID10: rename mirror_info structure MD RAID10: Fix compiler warning. raid5: add a per-stripe lock raid5: remove unnecessary bitmap write optimization raid5: lockless access raid5 overrided bi_phys_segments raid5: reduce chance release_stripe() taking device_lock	2012-08-01 09:02:01 -07:00
Jonathan Brassow	63f33b8dda	DM RAID: Add support for MD RAID10 Support the MD RAID10 personality through dm-raid.c Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-08-01 20:41:20 +10:00
Joe Thornber	e49e582965	dm thin: add read only and fail io modes Add read-only and fail-io modes to thin provisioning. If a transaction commit fails the pool's metadata device will transition to "read-only" mode. If a commit fails once already in read-only mode the transition to "fail-io" mode occurs. Once in fail-io mode the pool and all associated thin devices will report a status of "Fail". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:16 +01:00
Mike Snitzer	eb850de608	dm stripe: support for non power of 2 chunksize Support non-power-of-2 chunk sizes with dm striping for proper alignment of stripe IO on storage that has non-power-of-2 optimal IO sizes (e.g. RAID6 10+2). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:01 +01:00
Mikulas Patocka	f14fa693c9	dm stripe: fix size test dm-stripe is supposed to ensure that all the space allocated to the stripes is fully used and that all stripes are the same size. This patch fixes the test. It checks that device length is divisible by the chunk size and checks that the resulting quotient is divisible by the number of stripes (which is equivalent to testing if device length is divisible by chunk_size * stripes). Previously, the code only tested that the number of sectors in the target was divisible by each of the chunk size and the number of stripes separately, which could leave entire stripes unused. (A setup that genuinely needs some stripes to be shorter than others can be created by concatenating striped targets.) Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:00 +01:00
Milan Broz	18068bdd5f	dm: verity fix documentation Veritysetup is now part of cryptsetup package. Remove on-disk header description (which is not parsed in kernel) and point users to cryptsetup where it the format is documented. Mention units for block size paramaters. Fix target line specification and dmsetup parameters. Signed-off-by: Milan Broz <mbroz@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-03 12:55:41 +01:00
Joe Thornber	cc8394d86f	dm thin: provide userspace access to pool metadata This patch implements two new messages that can be sent to the thin pool target allowing it to take a snapshot of the _metadata_. This, read-only snapshot can be accessed by userland, concurrently with the live target. Only one metadata snapshot can be held at a time. The pool's status line will give the block location for the current msnap. Since version 0.1.5 of the userland thin provisioning tools, the thin_dump program displays the msnap as follows: thin_dump -m <msnap root> <metadata dev> Available here: https://github.com/jthornber/thin-provisioning-tools Now that userland can access the metadata we can do various things that have traditionally been kernel side tasks: i) Incremental backups. By using metadata snapshots we can work out what blocks have changed over time. Combined with data snapshots we can ensure the data doesn't change while we back it up. A short proof of concept script can be found here: https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb ii) Migration of thin devices from one pool to another. iii) Merging snapshots back into an external origin. iv) Asyncronous replication. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-06-03 00:30:01 +01:00
Mikulas Patocka	a4ffc15219	dm: add verity target This device-mapper target creates a read-only device that transparently validates the data on one underlying device against a pre-generated tree of cryptographic checksums stored on a second device. Two checksum device formats are supported: version 0 which is already shipping in Chromium OS and version 1 which incorporates some improvements. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Elly Jones <ellyjones@chromium.org> Cc: Milan Broz <mbroz@redhat.com> Cc: Olof Johansson <olofj@chromium.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:43:38 +01:00
Joe Thornber	67e2e2b281	dm thin: add pool target flags to control discard Add dm thin target arguments to control discard support. ignore_discard: Disables discard support no_discard_passdown: Don't pass discards down to the underlying data device, but just remove the mapping within the thin provisioning target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:29 +01:00
Joe Thornber	2dd9c257fb	dm thin: support read only external snapshot origins Support the use of an external _read only_ device as an origin for a thin device. Any read to an unprovisioned area of the thin device will be passed through to the origin. Writes trigger allocation of new blocks as usual. One possible use case for this would be VM hosts that want to run guests on thinly-provisioned volumes but have the base image on another device (possibly shared between many VMs). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:28 +01:00
Mike Snitzer	c4a69ecdb4	dm thin: relax hard limit on the maximum size of a metadata device The thin metadata format can only make use of a device that is <= THIN_METADATA_MAX_SECTORS (currently 15.9375 GB). Therefore, there is no practical benefit to using a larger device. However, it may be that other factors impose a certain granularity for the space that is allocated to a device (E.g. lvm2 can impose a coarse granularity through the use of large, >= 1 GB, physical extents). Rather than reject a larger metadata device, during thin-pool device construction, switch to allowing it but issue a warning if a device larger than THIN_METADATA_MAX_SECTORS_WARNING (16 GB) is provided. Any space over 15.9375 GB will not be used. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:28 +01:00
Joe Thornber	fe878f34df	dm thin: correct comments Remove documentation for unimplemented 'trim' message. I'd planned a 'trim' target message for shrinking thin devices, but this is better handled via the discard ioctl. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:24 +01:00
Masanari Iida	40e47125e6	Documentation: Fix multiple typo in Documentation Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-03-07 16:08:24 +01:00
Masanari Iida	4998d8ed54	Documentation: Fix typo in thin-provisioning.txt Correct spelling "descibes" to "describes" in Documentation/device-mapper/thin-provisioning.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-02-21 11:40:37 +01:00
Jonathan Brassow	b89544575d	dm log userspace: fix comment hyphens Fix comments: clustered-disk needs a hyphen not an underscore. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:22 +00:00
Joe Thornber	991d9fa02d	dm: add thin provisioning target Initial EXPERIMENTAL implementation of device-mapper thin provisioning with snapshot support. The 'thin' target is used to create instances of the virtual devices that are hosted in the 'thin-pool' target. The thin-pool target provides data sharing among devices. This sharing is made possible using the persistent-data library in the previous patch. The main highlight of this implementation, compared to the previous implementation of snapshots, is that it allows many virtual devices to be stored on the same data volume, simplifying administration and allowing sharing of data between volumes (thus reducing disk usage). Another big feature is support for arbitrary depth of recursive snapshots (snapshots of snapshots of snapshots ...). The previous implementation of snapshots did this by chaining together lookup tables, and so performance was O(depth). This new implementation uses a single data structure so we don't get this degradation with depth. For further information and examples of how to use this, please read Documentation/device-mapper/thin-provisioning.txt Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:18 +00:00
Joe Thornber	3241b1d3e0	dm: add persistent data library The persistent-data library offers a re-usable framework for the storage and management of on-disk metadata in device-mapper targets. It's used by the thin-provisioning target in the next patch and in an upcoming hierarchical storage target. For further information, please read Documentation/device-mapper/persistent-data.txt Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:11 +00:00
Milan Broz	772ae5f54d	dm crypt: optionally support discard requests Add optional parameter field to dmcrypt table and support "allow_discards" option. Discard requests bypass crypt queue processing. Bio is simple remapped to underlying device. Note that discard will be never enabled by default because of security consequences. It is up to the administrator to enable it for encrypted devices. (Note that userspace cryptsetup does not understand new optional parameters yet. Support for this will come later. Until then, you should use 'dmsetup' to enable and disable this.) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:08 +01:00
Jonathan Brassow	b12d437b73	dm raid: support metadata devices Add the ability to parse and use metadata devices to dm-raid. Although not strictly required, without the metadata devices, many features of RAID are unavailable. They are used to store a superblock and bitmap. The role, or position in the array, of each device must be recorded in its superblock. This is to help with fault handling, array reshaping, and sanity checks. RAID 4/5/6 devices must be loaded in a specific order: in this way, the 'array_position' field helps validate the correctness of the mapping when it is loaded. It can be used during reshaping to identify which devices are added/removed. Fault handling is impossible without this field. For example, when a device fails it is recorded in the superblock. If this is a RAID1 device and the offending device is removed from the array, there must be a way during subsequent array assembly to determine that the failed device was the one removed. This is done by correlating the 'array_position' field and the bit-field variable 'failed_devices'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00
Jonathan Brassow	46bed2b5c1	dm raid: add write_mostly parameter Add the write_mostly parameter to RAID1 dm-raid tables. This allows the user to set the WriteMostly flag on a RAID1 device that should normally be avoided for read I/O. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00

1 2

72 Commits