OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	01aa9d518e	This is a fairly typical cycle for documentation. There's some welcome readability improvements for the formatted output, some LICENSES updates including the addition of the ISC license, the removal of the unloved and unmaintained 00-INDEX files, the deprecated APIs document from Kees, more MM docs from Mike Rapoport, and the usual pile of typo fixes and corrections. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbztcuAAoJEI3ONVYwIuV6nTAP/0Be+5dNPGJmSnb/RbkwBuBV zAFVUj2sx4lZlRmWRZ0r7AOef2eSw3IvwBix/vnmllYCVahjp+BdRbhXQAijjyeb FWWjOH50/J+BaxSthAINiLRLvuoe0D/M08OpmXQfRl5q0S8RufeV3BDtEABx9j2n IICPGTl8LpPUgSMA4cw8zPhHdauhZpbmL2mGE9LXZ27SJT4S8lcHMwyPU1n5S+Jd ChEz5g9dYr3GNxFp712pkI5GcVL3tP2nfoVwK7EuGf1tvSnEnn2kzac8QgMqorIh xB2+Sh4XIUCbHYpGHpxIniD+WI4voNr/E7STQioJK5o2G4HTuxLjktvTezNF8paa hgNHWjPQBq0OOCdM/rsffONFF2J/v/r7E3B+kaRg8pE0uZWTFaDMs6MVaL2fL4Ls DrFhi90NJI/Fs7uB4sriiviShAhwboiSIRXJi4VlY/5oFJKHFgqes+R7miU+zTX3 2qv0k4mWZXWDV9w1piPxSCZSdRzaoYSoxEihX+tnYpCyEcYd9ovW/X1Uhl/wCWPl Ft+Op6rkHXRXVfZzTLuF6PspZ4Udpw2PUcnA5zj5FRDDBsjSMFR31c19IFbCeiNY kbTIcqejJG1WbVrAK4LCcFyVSGxbrr281eth4rE06cYmmsz3kJy1DB6Lhyg/2vI0 I8K9ZJ99n1RhPJIcburB =C0wt -----END PGP SIGNATURE----- Merge tag 'docs-4.20' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "This is a fairly typical cycle for documentation. There's some welcome readability improvements for the formatted output, some LICENSES updates including the addition of the ISC license, the removal of the unloved and unmaintained 00-INDEX files, the deprecated APIs document from Kees, more MM docs from Mike Rapoport, and the usual pile of typo fixes and corrections" * tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits) docs: Fix typos in histogram.rst docs: Introduce deprecated APIs list kernel-doc: fix declaration type determination doc: fix a typo in adding-syscalls.rst docs/admin-guide: memory-hotplug: remove table of contents doc: printk-formats: Remove bogus kobject references for device nodes Documentation: preempt-locking: Use better example dm flakey: Document "error_writes" feature docs/completion.txt: Fix a couple of punctuation nits LICENSES: Add ISC license text LICENSES: Add note to CDDL-1.0 license that it should not be used docs/core-api: memory-hotplug: add some details about locking internals docs/core-api: rename memory-hotplug-notifier to memory-hotplug docs: improve readability for people with poorer eyesight yama: clarify ptrace_scope=2 in Yama documentation docs/vm: split memory hotplug notifier description to Documentation/core-api docs: move memory hotplug description into admin-guide/mm doc: Fix acronym "FEKEK" in ecryptfs docs: fix some broken documentation references iommu: Fix passthrough option documentation ...	2018-10-24 18:01:11 +01:00
Nikolay Borisov	0c6c987f37	dm flakey: Document "error_writes" feature Commit `ef548c551e` ("dm flakey: introduce "error_writes" feature") added the ability to dm flakey to error out writes in contrast to silently dropping it with 'drop_writes'. Unfortunately this feature is not currently documented and one has to be either familiar with the source code of dm flakey or check out xfstests sources to know of this parameter. So document it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2018-10-12 11:31:39 -06:00
Bart Van Assche	9305455acf	block: Finish renaming REQ_DISCARD into REQ_OP_DISCARD Some time ago REQ_DISCARD was renamed into REQ_OP_DISCARD. Some comments and documentation files were not updated however. Update these comments and documentation files. See also commit `4e1b2d52a8` ("block, fs, drivers: remove REQ_OP compat defs and related code"). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-03 16:12:28 -06:00
Heinz Mauelshagen	5380c05b68	dm raid: bump target version, update comments and documentation Bump target version to reflect the documented fixes are available. Also fix some code comments (typos and clarity). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-09-06 17:07:58 -04:00
Andy Grover	63c8ecb626	dm thin: include metadata_low_watermark threshold in pool status The metadata low watermark threshold is set by the kernel. But the kernel depends on userspace to extend the thinpool metadata device when the threshold is crossed. Since the metadata low watermark threshold is not visible to userspace, upon receiving an event, userspace cannot tell that the kernel wants the metadata device extended, instead of some other eventing condition. Making it visible (but not settable) enables userspace to affirmatively know the kernel is asking for a metadata device extension, by comparing metadata_low_watermark against nr_free_blocks_metadata, also reported in status. Current solutions like dmeventd have their own thresholds for extending the data and metadata devices, and both devices are checked against their thresholds on each event. This lessens the value of the kernel-set threshold, since userspace will either extend the metadata device sooner, when receiving another event; or will receive the metadata lowater event and do nothing, if dmeventd's threshold is less than the kernel's. (This second case is dangerous. The metadata lowater event will not be re-sent, so no further event will be generated before the metadata device is out if space, unless some other event causes userspace to recheck its thresholds.) Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-07-30 11:49:08 -04:00
Mikulas Patocka	a3fcf72531	dm integrity: recalculate checksums on creation When using external metadata device and internal hash, recalculate the checksums when the device is created - so that dm-integrity doesn't have to overwrite the device. The superblock stores the last position when the recalculation ended, so that it is properly restarted. Integrity tags that haven't been recalculated yet are ignored. Also bump the target version. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-07-27 15:24:27 -04:00
Mikulas Patocka	cda6b5ab7f	dm delay: add flush as a third class of IO Add a new class for dm-delay that delays flush requests. Previously, flushes were delayed as writes, but it caused problems if the user needed to create a device with one or a few slow sectors for the purpose of testing - all flushes would be forwarded to this device and delayed, and that skews the test results. Fix this by allowing to select 0 delay for flushes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-07-27 15:24:19 -04:00
Mike Snitzer	6c7413c0f5	dm thin: update stale "Status" Documentation Documentation/device-mapper-/thin-provisioning.txt's "Status" section no longer reflected the current fitness level of DM thin-provisioning. That is, DM thinp is no longer "EXPERIMENTAL". It has since seen considerable improvement, has been fairly widely deployed and has performed in a robust manner. Update Documentation to dispel concern raised by potential DM thinp users. Reported-by: Drew Hastings <dhastings@crucialwebhost.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-07-27 15:24:03 -04:00
Mikulas Patocka	d284f8248c	dm writecache: support optional offset for start of device Add an optional parameter "start_sector" to allow the start of the device to be offset by the specified number of 512-byte sectors. The sectors below this offset are not used by the writecache device and are left to be used for disk labels and/or userspace metadata (e.g. lvm). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-07-02 16:14:02 -04:00
Mikulas Patocka	48debafe4f	dm: add writecache target The writecache target caches writes on persistent memory or SSD. It is intended for databases or other programs that need extremely low commit latency. The writecache target doesn't cache reads because reads are supposed to be cached in page cache in normal RAM. If persistent memory isn't available this target can still be used in SSD mode. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> # fix missing goto Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> # fix compilation issue with !DAX Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # use msecs_to_jiffies Acked-by: Dan Williams <dan.j.williams@intel.com> # reworks to unify ARM and x86 flushing Signed-off-by: Mike Snitzer <msnitzer@redhat.com>	2018-06-08 11:59:51 -04:00
Mike Snitzer	28700a3623	dm thin: update Documentation to clarify when "read_only" is valid Due to user confusion, clarify that it doesn't make sense to try to create a thin-pool with "read_only" mode enabled. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-05-10 11:18:49 -04:00
Patrik Torstensson	843f38d382	dm verity: add 'check_at_most_once' option to only validate hashes once This allows platforms that are CPU/memory contrained to verify data blocks only the first time they are read from the data device, rather than every time. As such, it provides a reduced level of security because only offline tampering of the data device's content will be detected, not online tampering. Hash blocks are still verified each time they are read from the hash device, since verification of hash blocks is less performance critical than data blocks, and a hash block will not be verified any more after all the data blocks it covers have been verified anyway. This option introduces a bitset that is used to check if a block has been validated before or not. A block can be validated more than once as there is no thread protection for the bitset. These changes were developed and tested on entry-level Android Go devices. Signed-off-by: Patrik Torstensson <totte@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-04-03 15:04:29 -04:00
John Pittman	9614e2ba91	dm cache: Documentation: update default migration_throttling value In commit `f8350daf7a` ("dm cache: tune migration throttling") the value for DEFAULT_MIGRATION_THRESHOLD was decreased from 204800 to 2048. Edit device-mapper/cache.txt to reflect the correct default value for migration_threshold. Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-30 16:55:47 -05:00
mulhern	7efd5fed6f	dm thin: extend thinpool status format string with omitted fields Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:12 -05:00
mulhern	cc3ff0af19	dm thin: fixes in thin-provisioning.txt Make the format string for thinpool status more correct. Swap the order of two items to correspond with reality. Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:12 -05:00
mulhern	2bc8a61c69	dm thin: document representation of <highest mapped sector> when there is none Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:11 -05:00
mulhern	9b28a1102e	dm thin: fix documentation relative to low water mark threshold Fixes: 1. The use of "exceeds" when the opposite of exceeds, falls below, was meant. 2. Properly speaking, a table can not exceed a threshold. It emphasizes the important point, which is that it is the userspace daemon's responsibility to check for low free space when a device is resumed, since it won't get a special event indicating low free space in that situation. Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:10 -05:00
mulhern	1346638e5f	dm cache: be consistent in specifying sectors and SI units in cache.txt Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:09 -05:00
mulhern	3716e20af5	dm cache: delete obsoleted paragraph in cache.txt The 'mq' policy is no longer the default policy, and the default policy, 'smq', does not store hit counts. Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:08 -05:00
mulhern	677210462d	dm cache: fix grammar in cache-policies.txt Use possessive pronoun where appropriate, instead of contraction. Signed-off-by: mulhern <amulhern@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:07 -05:00
Mikulas Patocka	424da29c5a	dm snapshot: improve documentation relative to origin suspend requirements Add a note to snapshot.txt that the origin target must be suspended when loading or unloading the snapshot target. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:06 -05:00
Scott Bauer	18a5bf2705	dm: add unstriped target This device mapper "unstriped" target remaps and unstripes I/O so it is issued solely on a single drive in a HW RAID0 or dm-striped target. In a 4 drive HW RAID0 the striped target exposes 1/4th of the LBA range as a virtual drive. Each I/O to that virtual drive will only be issued to the 1 drive that was selected of the 4 drives in the HW RAID0. This unstriped target is most useful for Intel NVMe drives that have multiple cores but that do not have firmware control to pin separate LBA ranges to each discrete cpu core. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-17 09:16:00 -05:00
Heinz Mauelshagen	11e4723206	dm raid: stop keeping raid set frozen altogether In order to avoid redoing synchronization/recovery/reshape partially, the raid set got frozen until after all passed in table line flags had been cleared. The related table reload sequence had to be precisely followed, or reshaping may lead to data corruption caused by the active mapping carrying on with a reshape when the inactive mapping already had retrieved a stale reshape position. Harden by retrieving the actual resync/recovery/reshape position during resume whilst the active table is suspended thus avoiding to keep the raid set frozen altogether. This prevents superfluous redoing of an already resynchronized or recovered segment and, most importantly, potential for redoing of an already reshaped segment causing data corruption. Fixes: `d39f0010e` ("dm raid: fix raid_resume() to keep raid set frozen as needed") Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-12-13 11:52:02 -05:00
Mike Snitzer	b84cf26924	dm raid: bump target version to reflect numerous fixes Also update Documentation accordingly. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-12-08 10:59:58 -05:00
Jonathan Brassow	41dcf197ad	dm raid: fix incorrect status output at the end of a "recover" process There are three important fields that indicate the overall health and status of an array: dev_health, sync_ratio, and sync_action. They tell us the condition of the devices in the array, and the degree to which the array is synchronized. This commit fixes a condition that is reported incorrectly. When a member of the array is being rebuilt or a new device is added, the "recover" process is used to synchronize it with the rest of the array. When the process is complete, but the sync thread hasn't yet been reaped, it is possible for the state of MD to be: mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ] curr_resync_completed = <max dev size> (but not MaxSector) and all rdevs to be In_sync. This causes the 'array_in_sync' output parameter that is passed to rs_get_progress() to be computed incorrectly and reported as 'false' -- or not in-sync. This in turn causes the dev_health status characters to be reported as all 'a', rather than the proper 'A'. This can cause erroneous output for several seconds at a time when tools will want to be checking the condition due to events that are raised at the end of a sync process. Fix this by properly calculating the 'array_in_sync' return parameter in rs_get_progress(). Also, remove an unnecessary intermediate 'recovery_cp' variable in rs_get_progress(). Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-10-05 16:21:30 -04:00
Heinz Mauelshagen	ac6a318888	dm raid: bump target version Bumo dm-raid target version to 1.12.1 to reflect that commit `cc27b0c78c` ("md: fix deadlock between mddev_suspend() and md_write_start()") is available. This version change allows userspace to detect that MD fix is available. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-07-25 14:54:20 -04:00
Damien Le Moal	3b1a94c88b	dm zoned: drive-managed zoned block device target The dm-zoned device mapper target provides transparent write access to zoned block devices (ZBC and ZAC compliant block devices). dm-zoned hides to the device user (a file system or an application doing raw block device accesses) any constraint imposed on write requests by the device, equivalent to a drive-managed zoned block device model. Write requests are processed using a combination of on-disk buffering using the device conventional zones and direct in-place processing for requests aligned to a zone sequential write pointer position. A background reclaim process implemented using dm_kcopyd_copy ensures that conventional zones are always available for executing unaligned write requests. The reclaim process overhead is minimized by managing buffer zones in a least-recently-written order and first targeting the oldest buffer zones. Doing so, blocks under regular write access (such as metadata blocks of a file system) remain stored in conventional zones, resulting in no apparent overhead. dm-zoned implementation focus on simplicity and on minimizing overhead (CPU, memory and storage overhead). For a 14TB host-managed disk with 256 MB zones, dm-zoned memory usage per disk instance is at most about 3 MB and as little as 5 zones will be used internally for storing metadata and performing buffer zone reclaim operations. This is achieved using zone level indirection rather than a full block indirection system for managing block movement between zones. dm-zoned primary target is host-managed zoned block devices but it can also be used with host-aware device models to mitigate potential device-side performance degradation due to excessive random writing. Zoned block devices can be formatted and checked for use with the dm-zoned target using the dmzadm utility available at: https://github.com/hgst/dm-zoned-tools Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [Mike Snitzer partly refactored Damien's original work to cleanup the code] Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-06-19 11:05:20 -04:00
Linus Torvalds	d35a878ae1	- A major update for DM cache that reduces the latency for deciding whether blocks should migrate to/from the cache. The bio-prison-v2 interface supports this improvement by enabling direct dispatch of work to workqueues rather than having to delay the actual work dispatch to the DM cache core. So the dm-cache policies are much more nimble by being able to drive IO as they see fit. One immediate benefit from the improved latency is a cache that should be much more adaptive to changing workloads. - Add a new DM integrity target that emulates a block device that has additional per-sector tags that can be used for storing integrity information. - Add a new authenticated encryption feature to the DM crypt target that builds on the capabilities provided by the DM integrity target. - Add MD interface for switching the raid4/5/6 journal mode and update the DM raid target to use it to enable aid4/5/6 journal write-back support. - Switch the DM verity target over to using the asynchronous hash crypto API (this helps work better with architectures that have access to off-CPU algorithm providers, which should reduce CPU utilization). - Various request-based DM and DM multipath fixes and improvements from Bart and Christoph. - A DM thinp target fix for a bio structure leak that occurs for each discard IFF discard passdown is enabled. - A fix for a possible deadlock in DM bufio and a fix to re-check the new buffer allocation watermark in the face of competing admin changes to the 'max_cache_size_bytes' tunable. - A couple DM core cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJZB6vtAAoJEMUj8QotnQNaoicIALuZTLElgAzxzA28cfk1+1Ea Gd09CfJ3M6cvk/YGUU7WwiSYIwu16yOJALG4sLcYnEmUCzvKfFPcl/RpeSJHPpYM 0aVXa6NIJw7K2r3C17toiK2DRMHYw6QU843WeWI93vBW13lDJklNJL9fM7GBEOLH NMSNw2mAq9ajtLlnJhM3ZfhloA7/u/jektvlBO1AA3RQ5Kx1cXVXFPqN7FdRfcqp 4RuEMe9faAadlXLsj3bia5IBmF/W0Qza6JilP+NLKLWB4fm7LZDjN/k+TsHWMa9e cGR73TgUGLMBJX+sDJy8R3oeBG9JZkFVkD7I30eCjzyhSOs/54XNYQ23EkqHJU0= =9Ryi -----END PGP SIGNATURE----- Merge tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - A major update for DM cache that reduces the latency for deciding whether blocks should migrate to/from the cache. The bio-prison-v2 interface supports this improvement by enabling direct dispatch of work to workqueues rather than having to delay the actual work dispatch to the DM cache core. So the dm-cache policies are much more nimble by being able to drive IO as they see fit. One immediate benefit from the improved latency is a cache that should be much more adaptive to changing workloads. - Add a new DM integrity target that emulates a block device that has additional per-sector tags that can be used for storing integrity information. - Add a new authenticated encryption feature to the DM crypt target that builds on the capabilities provided by the DM integrity target. - Add MD interface for switching the raid4/5/6 journal mode and update the DM raid target to use it to enable aid4/5/6 journal write-back support. - Switch the DM verity target over to using the asynchronous hash crypto API (this helps work better with architectures that have access to off-CPU algorithm providers, which should reduce CPU utilization). - Various request-based DM and DM multipath fixes and improvements from Bart and Christoph. - A DM thinp target fix for a bio structure leak that occurs for each discard IFF discard passdown is enabled. - A fix for a possible deadlock in DM bufio and a fix to re-check the new buffer allocation watermark in the face of competing admin changes to the 'max_cache_size_bytes' tunable. - A couple DM core cleanups. * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits) dm bufio: check new buffer allocation watermark every 30 seconds dm bufio: avoid a possible ABBA deadlock dm mpath: make it easier to detect unintended I/O request flushes dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit() dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH dm: introduce enum dm_queue_mode to cleanup related code dm mpath: verify __pg_init_all_paths locking assumptions at runtime dm: verify suspend_locking assumptions at runtime dm block manager: remove an unused argument from dm_block_manager_create() dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue() dm mpath: delay requeuing while path initialization is in progress dm mpath: avoid that path removal can trigger an infinite loop dm mpath: split and rename activate_path() to prepare for its expanded use dm ioctl: prevent stack leak in dm ioctl call dm integrity: use previously calculated log2 of sectors_per_block dm integrity: use hex2bin instead of open-coded variant dm crypt: replace custom implementation of hex2bin() dm crypt: remove obsolete references to per-CPU state dm verity: switch to using asynchronous hash crypto API dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues ...	2017-05-03 10:31:20 -07:00
Mikulas Patocka	9d609f85b7	dm integrity: support larger block sizes The DM integrity block size can now be 512, 1k, 2k or 4k. Using larger blocks reduces metadata handling overhead. The block size can be configured at table load time using the "block_size:<value>" option; where <value> is expressed in bytes (defult is still 512 bytes). It is safe to use larger block sizes with DM integrity, because the DM integrity journal makes sure that the whole block is updated atomically even if the underlying device doesn't support atomic writes of that size (e.g. 4k block ontop of a 512b device). Depends-on: `2859323e` ("block: fix blk_integrity_register to use template's interval_exp if not 0") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-24 12:04:33 -04:00
Mikulas Patocka	56b67a4f29	dm integrity: various small changes and cleanups Some coding style changes. Fix a bug that the array test_tag has insufficient size if the digest size of internal has is bigger than the tag size. The function __fls is undefined for zero argument, this patch fixes undefined behavior if the user sets zero interleave_sectors. Fix the limit of optional arguments to 8. Don't allocate crypt_data on the stack to avoid a BUG with debug kernel. Rename all optional argument names to have underscores rather than dashes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-04-24 12:04:32 -04:00
Heinz Mauelshagen	6e53636fe8	dm raid: add raid4/5/6 journal write-back support via journal_mode option Commit `63c32ed4af` ("dm raid: add raid4/5/6 journaling support") added journal support to close the raid4/5/6 "write hole" -- in terms of writethrough caching. Introduce a "journal_mode" feature and use the new r5c_journal_mode_set() API to add support for switching the journal device's cache mode between write-through (the current default) and write-back. NOTE: If the journal device is not layered on resilent storage and it fails, write-through mode will cause the "write hole" to reoccur. But if the journal fails while in write-back mode it will cause data loss for any dirty cache entries unless resilent storage is used for the journal. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-27 12:08:07 -04:00
Heinz Mauelshagen	4464e36e06	dm raid: fix table line argument order in status Commit `3a1c1ef2f` ("dm raid: enhance status interface and fixup takeover/raid0") added new table line arguments and introduced an ordering flaw. The sequence of the raid10_copies and raid10_format raid parameters got reversed which causes lvm2 userspace to fail by falsely assuming a changed table line. Sequence those 2 parameters as before so that old lvm2 can function properly with new kernels by adjusting the table line output as documented in Documentation/device-mapper/dm-raid.txt. Also, add missing version 1.10.1 highlight to the documention. Fixes: `3a1c1ef2f` ("dm raid: enhance status interface and fixup takeover/raid0") Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-27 11:45:26 -04:00
Mikulas Patocka	c2bcb2b702	dm integrity: add recovery mode In recovery mode, we don't: - replay the journal - check checksums - allow writes to the device This mode can be used as a last resort for data recovery. The motivation for recovery mode is that when there is a single error in the journal, the user should not lose access to the whole device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-24 15:54:23 -04:00
Milan Broz	8f0009a225	dm crypt: optionally support larger encryption sector size Add optional "sector_size" parameter that specifies encryption sector size (atomic unit of block device encryption). Parameter can be in range 512 - 4096 bytes and must be power of two. For compatibility reasons, the maximal IO must fit into the page limit, so the limit is set to the minimal page size possible (4096 bytes). NOTE: this device cannot yet be handled by cryptsetup if this parameter is set. IV for the sector is calculated from the 512 bytes sector offset unless the iv_large_sectors option is used. Test script using dmsetup: DEV="/dev/sdb" DEV_SIZE=$(blockdev --getsz $DEV) KEY="9c1185a5c5e9fc54612808977ee8f548b2258d31ddadef707ba62c166051b9e3cd0294c27515f2bccee924e8823ca6e124b8fc3167ed478bca702babe4e130ac" BLOCK_SIZE=4096 # dmsetup create test_crypt --table "0 $DEV_SIZE crypt aes-xts-plain64 $KEY 0 $DEV 0 1 sector_size:$BLOCK_SIZE" # dmsetup table --showkeys test_crypt Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-24 15:54:21 -04:00
Milan Broz	33d2f09fcb	dm crypt: introduce new format of cipher with "capi:" prefix For the new authenticated encryption we have to support generic composed modes (combination of encryption algorithm and authenticator) because this is how the kernel crypto API accesses such algorithms. To simplify the interface, we accept an algorithm directly in crypto API format. The new format is recognised by the "capi:" prefix. The dmcrypt internal IV specification is the same as for the old format. The crypto API cipher specifications format is: capi:cipher_api_spec-ivmode[:ivopts] Examples: capi:cbc(aes)-essiv:sha256 (equivalent to old aes-cbc-essiv:sha256) capi:xts(aes)-plain64 (equivalent to old aes-xts-plain64) Examples of authenticated modes: capi:gcm(aes)-random capi:authenc(hmac(sha256),xts(aes))-random capi:rfc7539(chacha20,poly1305)-random Authenticated modes can only be configured using the new cipher format. Note that this format allows user to specify arbitrary combinations that can be insecure. (Policy decision is done in cryptsetup userspace.) Authenticated encryption algorithms can be of two types, either native modes (like GCM) that performs both encryption and authentication internally, or composed modes where user can compose AEAD with separate specification of encryption algorithm and authenticator. For composed mode with HMAC (length-preserving encryption mode like an XTS and HMAC as an authenticator) we have to calculate HMAC digest size (the separate authentication key is the same size as the HMAC digest). Introduce crypt_ctr_auth_cipher() to parse the crypto API string to get HMAC algorithm and retrieve digest size from it. Also, for HMAC composed mode we need to parse the crypto API string to get the cipher mode nested in the specification. For native AEAD mode (like GCM), we can use crypto_tfm_alg_name() API to get the cipher specification. Because the HMAC composed mode is not processed the same as the native AEAD mode, the CRYPT_MODE_INTEGRITY_HMAC flag is no longer needed and "hmac" specification for the table integrity argument is removed. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-24 15:54:20 -04:00
Milan Broz	ef43aa3806	dm crypt: add cryptographic data integrity protection (authenticated encryption) Allow the use of per-sector metadata, provided by the dm-integrity module, for integrity protection and persistently stored per-sector Initialization Vector (IV). The underlying device must support the "DM-DIF-EXT-TAG" dm-integrity profile. The per-bio integrity metadata is allocated by dm-crypt for every bio. Example of low-level mapping table for various types of use: DEV=/dev/sdb SIZE=417792 # Additional HMAC with CBC-ESSIV, key is concatenated encryption key + HMAC key SIZE_INT=389952 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 32 J 0" dmsetup create y --table "0 $SIZE_INT crypt aes-cbc-essiv:sha256 \ 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \ 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \ 0 /dev/mapper/x 0 1 integrity:32:hmac(sha256)" # AEAD (Authenticated Encryption with Additional Data) - GCM with random IVs # GCM in kernel uses 96bits IV and we store 128bits auth tag (so 28 bytes metadata space) SIZE_INT=393024 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 28 J 0" dmsetup create y --table "0 $SIZE_INT crypt aes-gcm-random \ 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \ 0 /dev/mapper/x 0 1 integrity:28:aead" # Random IV only for XTS mode (no integrity protection but provides atomic random sector change) SIZE_INT=401272 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 16 J 0" dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \ 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \ 0 /dev/mapper/x 0 1 integrity:16:none" # Random IV with XTS + HMAC integrity protection SIZE_INT=377656 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 48 J 0" dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \ 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \ 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \ 0 /dev/mapper/x 0 1 integrity:48:hmac(sha256)" Both AEAD and HMAC protection authenticates not only data but also sector metadata. HMAC protection is implemented through autenc wrapper (so it is processed the same way as an authenticated mode). In HMAC mode there are two keys (concatenated in dm-crypt mapping table). First is the encryption key and the second is the key for authentication (HMAC). (It is userspace decision if these keys are independent or somehow derived.) The sector request for AEAD/HMAC authenticated encryption looks like this: \|----- AAD -------\|------ DATA -------\|-- AUTH TAG --\| \| (authenticated) \| (auth+encryption) \| \| \| sector_LE \| IV \| sector in/out \| tag in/out \| For writes, the integrity fields are calculated during AEAD encryption of every sector and stored in bio integrity fields and sent to underlying dm-integrity target for storage. For reads, the integrity metadata is verified during AEAD decryption of every sector (they are filled in by dm-integrity, but the integrity fields are pre-allocated in dm-crypt). There is also an experimental support in cryptsetup utility for more friendly configuration (part of LUKS2 format). Because the integrity fields are not valid on initial creation, the device must be "formatted". This can be done by direct-io writes to the device (e.g. dd in direct-io mode). For now, there is available trivial tool to do this, see: https://github.com/mbroz/dm_int_tools Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Vashek Matyas <matyas@fi.muni.cz> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-24 15:49:41 -04:00
Mikulas Patocka	7eada909bf	dm: add integrity target The dm-integrity target emulates a block device that has additional per-sector tags that can be used for storing integrity information. A general problem with storing integrity tags with every sector is that writing the sector and the integrity tag must be atomic - i.e. in case of crash, either both sector and integrity tag or none of them is written. To guarantee write atomicity the dm-integrity target uses a journal. It writes sector data and integrity tags into a journal, commits the journal and then copies the data and integrity tags to their respective location. The dm-integrity target can be used with the dm-crypt target - in this situation the dm-crypt target creates the integrity data and passes them to the dm-integrity target via bio_integrity_payload attached to the bio. In this mode, the dm-crypt and dm-integrity targets provide authenticated disk encryption - if the attacker modifies the encrypted device, an I/O error is returned instead of random data. The dm-integrity target can also be used as a standalone target, in this mode it calculates and verifies the integrity tag internally. In this mode, the dm-integrity target can be used to detect silent data corruption on the disk or in the I/O path. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-03-24 15:49:07 -04:00
sayli karnik	3f816bac24	Documentation: device-mapper: cache.txt: Fix typos Fix a spelling error (hexidecimal->hexadecimal). Signed-off-by: sayli karnik <karniksayli1995@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2017-03-19 09:16:07 -06:00
Masahiro Yamada	34dcaf40c1	scripts/spelling.txt: add "explictely" pattern and fix typo instances Fix typos and add the following to the scripts/spelling.txt: explictely\|\|explicitly Link: http://lkml.kernel.org/r/1481573103-11329-25-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-27 18:43:47 -08:00
Joe Thornber	629d0a8a1a	dm cache metadata: add "metadata2" feature If "metadata2" is provided as a table argument when creating/loading a cache target a more compact metadata format, with separate dirty bits, is used. "metadata2" improves speed of shutting down a cache target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-16 13:12:47 -05:00
Heinz Mauelshagen	63c32ed4af	dm raid: add raid4/5/6 journaling support Add md raid4/5/6 journaling support (upstream commit `bac624f3f8` started the implementation) which closes the write hole (i.e. non-atomic updates to stripes) using a dedicated journal device. Background: raid4/5/6 stripes hold N data payloads per stripe plus one parity raid4/5 or two raid6 P/Q syndrome payloads in an in-memory stripe cache. Parity or P/Q syndromes used to recover any data payloads in case of a disk failure are calculated from the N data payloads and need to be updated on the different component devices of the raid device. Those are non-atomic, persistent updates. Hence a crash can cause failure to update all stripe payloads persistently and thus cause data loss during stripe recovery. This problem gets addressed by writing whole stripe cache entries (together with journal metadata) to a persistent journal entry on a dedicated journal device. Only if that journal entry is written successfully, the stripe cache entry is updated on the component devices of the raid device (i.e. writethrough type). In case of a crash, the entry can be recovered from the journal and be written again thus ensuring consistent stripe payload suitable to data recovery. Future dependencies: once writeback caching being worked on to compensate for the throughput implictions involved with writethrough overhead is supported with journaling in upstream, an additional patch based on this one will support it in dm-raid. Journal resilience related remarks: because stripes are recovered from the journal in case of a crash, the journal device better be resilient. Resilience becomes mandatory with future writeback support, because loosing the working set in the log means data loss as oposed to writethrough, were the loss of the journal device 'only' reintroduces the write hole. Fix comment on data offsets in parse_dev_params() and initialize new_data_offset as well. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-01-25 12:49:06 +01:00
Heinz Mauelshagen	c63ede3b42	dm raid: fix transient device failure processing This fix addresses the following 3 failure scenarios: 1) If a (transiently) inaccessible metadata device is being passed into the constructor (e.g. a device tuple '254:4 254:5'), it is processed as if '- -' was given. This erroneously results in a status table line containing '- -', which mistakenly differs from what has been passed in. As a result, userspace libdevmapper puts the device tuple seperate from the RAID device thus not processing the dependencies properly. 2) False health status char 'A' instead of 'D' is emitted on the status status info line for the meta/data device tuple in this metadata device failure case. 3) If the metadata device is accessible when passed into the constructor but the data device (partially) isn't, that leg may be set faulty by the raid personality on access to the (partially) unavailable leg. Restore tried in a second raid device resume on such failed leg (status char 'D') fails after the (partial) leg returned. Fixes for aforementioned failure scenarios: - don't release passed in devices in the constructor thus allowing the status table line to e.g. contain '254:4 254:5' rather than '- -' - emit device status char 'D' rather than 'A' for the device tuple with the failed metadata device on the status info line - when attempting to restore faulty devices in a second resume, allow the device hot remove function to succeed by setting the device to not in-sync In case userspace intentionally passes '- -' into the constructor to avoid that device tuple (e.g. to split off a raid1 leg temporarily for later re-addition), the status table line will correctly show '- -' and the status info line will provide a '-' device health character for the non-defined device tuple. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-01-25 12:49:06 +01:00
Linus Torvalds	a9042defa2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: NTB: correct ntb_spad_count comment typo misc: ibmasm: fix typo in error message Remove references to dead make variable LINUX_INCLUDE Remove last traces of ikconfig.h treewide: Fix printk() message errors Documentation/device-mapper: s/getsize/getsz/	2016-12-14 11:12:25 -08:00
Michael Witten	95f21c5c6d	Documentation/device-mapper: s/getsize/getsz/ According to `man blockdev': --getsize Print device size (32-bit!) in sectors. Deprecated in favor of the --getsz option. ... --getsz Get size in 512-byte sectors. Hence, occurrences of `--getsize' should be replaced with `--getsz', which this commit has achieved as follows: $ cd "$repo" $ git grep -l -e --getsz Documentation/device-mapper/delay.txt Documentation/device-mapper/dm-crypt.txt Documentation/device-mapper/linear.txt Documentation/device-mapper/log-writes.txt Documentation/device-mapper/striped.txt Documentation/device-mapper/switch.txt $ cd Documentation/device-mapper $ sed -i s/getsize/getsz/g * Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-12-14 10:54:27 +01:00
Heinz Mauelshagen	58fc4fedee	Documentation: dm raid: define data_offset status field Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-12-08 14:13:13 -05:00
Ondrej Kozina	c538f6ec9f	dm crypt: add ability to use keys from the kernel key retention service The kernel key service is a generic way to store keys for the use of other subsystems. Currently there is no way to use kernel keys in dm-crypt. This patch aims to fix that. Instead of key userspace may pass a key description with preceding ':'. So message that constructs encryption mapping now looks like this: <cipher> [<key>\|:<key_string>] <iv_offset> <dev_path> <start> [<#opt_params> <opt_params>] where <key_string> is in format: <key_size>:<key_type>:<key_description> Currently we only support two elementary key types: 'user' and 'logon'. Keys may be loaded in dm-crypt either via <key_string> or using classical method and pass the key in hex representation directly. dm-crypt device initialised with a key passed in hex representation may be replaced with key passed in key_string format and vice versa. (Based on original work by Andrey Ryabinin) Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-12-08 14:13:09 -05:00
Masanari Iida	bb1423a96f	dm raid: fix typos in Documentation/device-mapper/dm-raid.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-11-21 09:52:04 -05:00
Heinz Mauelshagen	b052b07c39	dm raid: fix activation of existing raid4/10 devices dm-raid 1.9.0 fails to activate existing RAID4/10 devices that have the old superblock format (which does not have takeover/reshaping support that was added via commit `33e53f0685`). Fix validation path for old superblocks by reverting to the old raid4 layout and basing checks on mddev->new_{level,layout,...} members in super_init_validation(). Cc: stable@vger.kernel.org # 4.8 Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-10-17 16:41:31 -04:00
Jens Axboe	1eff9d322a	block: rename bio bi_rw to bi_opf Since commit `63a4cc2486`, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-07 14:41:02 -06:00
Heinz Mauelshagen	d41bfed091	dm raid: update Documentation about reshaping/takeover/additonal RAID types Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-06-14 18:52:12 -04:00

1 2 3 4

153 Commits