OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Heinz Mauelshagen	6e53636fe8	dm raid: add raid4/5/6 journal write-back support via journal_mode option Commit `63c32ed4af` ("dm raid: add raid4/5/6 journaling support") added journal support to close the raid4/5/6 "write hole" -- in terms of writethrough caching. Introduce a "journal_mode" feature and use the new r5c_journal_mode_set() API to add support for switching the journal device's cache mode between write-through (the current default) and write-back. NOTE: If the journal device is not layered on resilent storage and it fails, write-through mode will cause the "write hole" to reoccur. But if the journal fails while in write-back mode it will cause data loss for any dirty cache entries unless resilent storage is used for the journal. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-27 12:08:07 -04:00
Heinz Mauelshagen	4464e36e06	dm raid: fix table line argument order in status Commit `3a1c1ef2f` ("dm raid: enhance status interface and fixup takeover/raid0") added new table line arguments and introduced an ordering flaw. The sequence of the raid10_copies and raid10_format raid parameters got reversed which causes lvm2 userspace to fail by falsely assuming a changed table line. Sequence those 2 parameters as before so that old lvm2 can function properly with new kernels by adjusting the table line output as documented in Documentation/device-mapper/dm-raid.txt. Also, add missing version 1.10.1 highlight to the documention. Fixes: `3a1c1ef2f` ("dm raid: enhance status interface and fixup takeover/raid0") Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-27 11:45:26 -04:00
Masahiro Yamada	34dcaf40c1	scripts/spelling.txt: add "explictely" pattern and fix typo instances Fix typos and add the following to the scripts/spelling.txt: explictely\|\|explicitly Link: http://lkml.kernel.org/r/1481573103-11329-25-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-27 18:43:47 -08:00
Heinz Mauelshagen	63c32ed4af	dm raid: add raid4/5/6 journaling support Add md raid4/5/6 journaling support (upstream commit `bac624f3f8` started the implementation) which closes the write hole (i.e. non-atomic updates to stripes) using a dedicated journal device. Background: raid4/5/6 stripes hold N data payloads per stripe plus one parity raid4/5 or two raid6 P/Q syndrome payloads in an in-memory stripe cache. Parity or P/Q syndromes used to recover any data payloads in case of a disk failure are calculated from the N data payloads and need to be updated on the different component devices of the raid device. Those are non-atomic, persistent updates. Hence a crash can cause failure to update all stripe payloads persistently and thus cause data loss during stripe recovery. This problem gets addressed by writing whole stripe cache entries (together with journal metadata) to a persistent journal entry on a dedicated journal device. Only if that journal entry is written successfully, the stripe cache entry is updated on the component devices of the raid device (i.e. writethrough type). In case of a crash, the entry can be recovered from the journal and be written again thus ensuring consistent stripe payload suitable to data recovery. Future dependencies: once writeback caching being worked on to compensate for the throughput implictions involved with writethrough overhead is supported with journaling in upstream, an additional patch based on this one will support it in dm-raid. Journal resilience related remarks: because stripes are recovered from the journal in case of a crash, the journal device better be resilient. Resilience becomes mandatory with future writeback support, because loosing the working set in the log means data loss as oposed to writethrough, were the loss of the journal device 'only' reintroduces the write hole. Fix comment on data offsets in parse_dev_params() and initialize new_data_offset as well. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-01-25 12:49:06 +01:00
Heinz Mauelshagen	c63ede3b42	dm raid: fix transient device failure processing This fix addresses the following 3 failure scenarios: 1) If a (transiently) inaccessible metadata device is being passed into the constructor (e.g. a device tuple '254:4 254:5'), it is processed as if '- -' was given. This erroneously results in a status table line containing '- -', which mistakenly differs from what has been passed in. As a result, userspace libdevmapper puts the device tuple seperate from the RAID device thus not processing the dependencies properly. 2) False health status char 'A' instead of 'D' is emitted on the status status info line for the meta/data device tuple in this metadata device failure case. 3) If the metadata device is accessible when passed into the constructor but the data device (partially) isn't, that leg may be set faulty by the raid personality on access to the (partially) unavailable leg. Restore tried in a second raid device resume on such failed leg (status char 'D') fails after the (partial) leg returned. Fixes for aforementioned failure scenarios: - don't release passed in devices in the constructor thus allowing the status table line to e.g. contain '254:4 254:5' rather than '- -' - emit device status char 'D' rather than 'A' for the device tuple with the failed metadata device on the status info line - when attempting to restore faulty devices in a second resume, allow the device hot remove function to succeed by setting the device to not in-sync In case userspace intentionally passes '- -' into the constructor to avoid that device tuple (e.g. to split off a raid1 leg temporarily for later re-addition), the status table line will correctly show '- -' and the status info line will provide a '-' device health character for the non-defined device tuple. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-01-25 12:49:06 +01:00
Heinz Mauelshagen	58fc4fedee	Documentation: dm raid: define data_offset status field Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-12-08 14:13:13 -05:00
Masanari Iida	bb1423a96f	dm raid: fix typos in Documentation/device-mapper/dm-raid.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-11-21 09:52:04 -05:00
Heinz Mauelshagen	b052b07c39	dm raid: fix activation of existing raid4/10 devices dm-raid 1.9.0 fails to activate existing RAID4/10 devices that have the old superblock format (which does not have takeover/reshaping support that was added via commit `33e53f0685`). Fix validation path for old superblocks by reverting to the old raid4 layout and basing checks on mddev->new_{level,layout,...} members in super_init_validation(). Cc: stable@vger.kernel.org # 4.8 Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-10-17 16:41:31 -04:00
Heinz Mauelshagen	d41bfed091	dm raid: update Documentation about reshaping/takeover/additonal RAID types Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-06-14 18:52:12 -04:00
Heinz Mauelshagen	f15f4d7200	dm raid: document RAID 4/5/6 discard support For RAID 4/5/6 data integrity reasons 'discard_zeroes_data' must work properly. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-08-31 15:05:31 -04:00
Heinz Mauelshagen	0cf4503174	dm raid: add support for the MD RAID0 personality Add dm-raid access to the MD RAID0 personality to enable single zone striping. The following changes enable that access: - add type definition to raid_types array - make bitmap creation conditonal in super_validate(), because bitmaps are not allowed in raid0 - set rdev->sectors to the data image size in super_validate() to allow the raid0 personality to calculate the MD array size properly - use mdddev(un)lock() functions instead of direct mutex_(un)lock() (wrapped in here because it's a trivial change) - enhance raid_status() to always report full sync for raid0 so that userspace checks for 100% sync will succeed and allow for resize (and takeover/reshape once added in future paches) - enhance raid_resume() to not load bitmap in case of raid0 - add merge function to avoid data corruption (seen with readahead) that resulted from bio payloads that grew too large. This problem did not occur with the other raid levels because it either did not apply without striping (raid1) or was avoided via stripe caching. - raise version to 1.7.0 because of the raid0 API change Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Reviewed-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-05-29 14:19:00 -04:00
Heinz Mauelshagen	0f4106b32f	dm raid: fixup documentation for discard support Remove comment above parse_raid_params() that claims "devices_handle_discard_safely" is a table line argument when it is actually is a module parameter. Also, backfill dm-raid target version 1.6.0 documentation. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Reviewed-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-05-29 14:18:59 -04:00
Jonathan Brassow	c4a3955145	MD: Remember the last sync operation that was performed MD: Remember the last sync operation that was performed This patch adds a field to the mddev structure to track the last sync operation that was performed. This is especially useful when it comes to what is recorded in mismatch_cnt in sysfs. If the last operation was "data-check", then it reports the number of descrepancies found by the user-initiated check. If it was a "repair" operation, then it is reporting the number of descrepancies repaired. etc. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-26 12:38:24 +10:00
Jonathan Brassow	9092c02d94	DM RAID: Add ability to restore transiently failed devices on resume DM RAID: Add ability to restore transiently failed devices on resume This patch adds code to the resume function to check over the devices in the RAID array. If any are found to be marked as failed and their superblocks can be read, an attempt is made to reintegrate them into the array. This allows the user to refresh the array with a simple suspend and resume of the array - rather than having to load a completely new table, allocate and initialize all the structures and throw away the old instantiation. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-14 08:10:24 +10:00
Jonathan Brassow	be83651f00	DM RAID: Add message/status support for changing sync action DM RAID: Add message/status support for changing sync action This patch adds a message interface to dm-raid to allow the user to more finely control the sync actions being performed by the MD driver. This gives the user the ability to initiate "check" and "repair" (i.e. scrubbing). Two additional fields have been appended to the status output to provide more information about the type of sync action occurring and the results of those actions, specifically: <sync_action> and <mismatch_cnt>. These new fields will always be populated. This is essentially the device-mapper way of doing what MD controls through the 'sync_action' sysfs file and shows through the 'mismatch_cnt' sysfs file. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-04-24 11:42:43 +10:00
Jonathan Brassow	fe5d2f4a15	DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms Until now, dm-raid.c only supported the "near" algorthm of MD's RAID10 implementation. This patch adds support for the "far" and "offset" algorithms, but only with the improved redundancy that is brought with the introduction of the 'use_far_sets' bit, which shifts copied stripes according to smaller sets vs the entire array. That is, the 17th bit of the 'layout' variable that defines the RAID10 implementation will always be set. (More information on how the 'layout' variable selects the RAID10 algorithm can be found in the opening comments of drivers/md/raid10.c.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-02-26 11:55:36 +11:00
Jonathan Brassow	55ebbb59c1	DM-RAID: Fix RAID10's check for sufficient redundancy Before attempting to activate a RAID array, it is checked for sufficient redundancy. That is, we make sure that there are not too many failed devices - or devices specified for rebuild - to undermine our ability to activate the array. The current code performs this check twice - once to ensure there were not too many devices specified for rebuild by the user ('validate_rebuild_devices') and again after possibly experiencing a failure to read the superblock ('analyse_superblocks'). Neither of these checks are sufficient. The first check is done properly but with insufficient information about the possible failure state of the devices to make a good determination if the array can be activated. The second check is simply done wrong in the case of RAID10 because it doesn't account for the independence of the stripes (i.e. mirror sets). The solution is to use the properly written check ('validate_rebuild_devices'), but perform the check after the superblocks have been read and we know which devices have failed. This gives us one check instead of two and performs it in a location where it can be done right. Only RAID10 was affected and it was affected in the following ways: - the code did not properly catch the condition where a user specified a device for rebuild that already had a failed device in the same mirror set. (This condition would, however, be caught at a deeper level in MD.) - the code triggers a false positive and denies activation when devices in independent mirror sets have failed - counting the failures as though they were all in the same set. The most likely place this error was introduced (or this patch should have been included) is in commit `4ec1e369` - first introduced in v3.7-rc1. Consequently this fix should also go in v3.7.y, however there is a small conflict on the .version in raid_target, so I'll submit a separate patch to -stable. Cc: stable@vger.kernel.org Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-01-24 12:02:36 +11:00
Jonathan Brassow	4ec1e369af	DM RAID: Add rebuild capability for RAID10 DM RAID: Add code to validate replacement slots for RAID10 arrays RAID10 can handle 'copies - 1' failures for each mirror group. This code ensures the user has provided a valid array - one whose devices specified for rebuild do not exceed the amount of redundancy available. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-10-11 13:40:24 +11:00
Jonathan Brassow	63f33b8dda	DM RAID: Add support for MD RAID10 Support the MD RAID10 personality through dm-raid.c Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-08-01 20:41:20 +10:00
Masanari Iida	40e47125e6	Documentation: Fix multiple typo in Documentation Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-03-07 16:08:24 +01:00
Jonathan Brassow	b12d437b73	dm raid: support metadata devices Add the ability to parse and use metadata devices to dm-raid. Although not strictly required, without the metadata devices, many features of RAID are unavailable. They are used to store a superblock and bitmap. The role, or position in the array, of each device must be recorded in its superblock. This is to help with fault handling, array reshaping, and sanity checks. RAID 4/5/6 devices must be loaded in a specific order: in this way, the 'array_position' field helps validate the correctness of the mapping when it is loaded. It can be used during reshaping to identify which devices are added/removed. Fault handling is impossible without this field. For example, when a device fails it is recorded in the superblock. If this is a RAID1 device and the offending device is removed from the array, there must be a way during subsequent array assembly to determine that the failed device was the one removed. This is done by correlating the 'array_position' field and the bit-field variable 'failed_devices'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00
Jonathan Brassow	46bed2b5c1	dm raid: add write_mostly parameter Add the write_mostly parameter to RAID1 dm-raid tables. This allows the user to set the WriteMostly flag on a RAID1 device that should normally be avoided for read I/O. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00
Jonathan Brassow	c1084561bb	dm raid: add region_size parameter Allow the user to specify the region_size. Ensures that the supplied value meets md's constraints, viz. the number of regions does not exceed 2^21. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00
Jonathan Brassow	c0a2fa1ef1	dm raid: improve table parameters documentation Add more information about some dm-raid table parameters and clarify how parameters are printed when 'dmsetup table' is issued. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:06 +01:00
NeilBrown	9d09e663d5	dm: raid456 basic support This patch is the skeleton for the DM target that will be the bridge from DM to MD (initially RAID456 and later RAID1). It provides a way to use device-mapper interfaces to the MD RAID456 drivers. As with all device-mapper targets, the nominal public interfaces are the constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO and STATUSTYPE_TABLE). The CTR table looks like the following: 1: <s> <l> raid \ 2: <raid_type> <#raid_params> <raid_params> \ 3: <#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN> Line 1 contains the standard first three arguments to any device-mapper target - the start, length, and target type fields. The target type in this case is "raid". Line 2 contains the arguments that define the particular raid type/personality/level, the required arguments for that raid type, and any optional arguments. Possible raid types include: raid4, raid5_la, raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc. (again, raid1 is planned for the future.) The list of required and optional parameters is the same for all the current raid types. The required parameters are positional, while the optional parameters are given as key/value pairs. The possible parameters are as follows: <chunk_size> Chunk size in sectors. [[no]sync] Force/Prevent RAID initialization [rebuild <idx>] Rebuild the drive indicated by the index [daemon_sleep <ms>] Time between bitmap daemon work to clear bits [min_recovery_rate <kB/sec/disk>] Throttle RAID initialization [max_recovery_rate <kB/sec/disk>] Throttle RAID initialization [max_write_behind <value>] See '-write-behind=' (man mdadm) [stripe_cache <sectors>] Stripe cache size for higher RAIDs Line 3 contains the list of devices that compose the array in metadata/data device pairs. If the metadata is stored separately, a '-' is given for the metadata device position. If a drive has failed or is missing at creation time, a '-' can be given for both the metadata and data drives for a given position. Examples: # RAID4 - 4 data drives, 1 parity # No metadata devices specified to hold superblock/bitmap info # Chunk size of 1MiB # (Lines separated for easy reading) 0 1960893648 raid \ raid4 1 2048 \ 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81 # RAID4 - 4 data drives, 1 parity (no metadata devices) # Chunk size of 1MiB, force RAID initialization, # min recovery rate at 20 kiB/sec/disk 0 1960893648 raid \ raid4 4 2048 min_recovery_rate 20 sync\ 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81 Performing a 'dmsetup table' should display the CTR table used to construct the mapping (with possible reordering of optional parameters). Performing a 'dmsetup status' will yield information on the state and health of the array. The output is as follows: 1: <s> <l> raid \ 2: <raid_type> <#devices> <1 health char for each dev> <resync_ratio> Line 1 is standard DM output. Line 2 is best shown by example: 0 1960893648 raid raid4 5 AAAAA 2/490221568 Here we can see the RAID type is raid4, there are 5 devices - all of which are 'A'live, and the array is 2/490221568 complete with recovery. Cc: linux-raid@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-01-13 20:00:02 +00:00

25 Commits