Commit Graph

125 Commits

Author SHA1 Message Date
Michael Weiß 82bb85998c dm integrity: log audit events for dm-integrity target
dm-integrity signals integrity violations by returning I/O errors
to user space. To identify integrity violations by a controlling
instance, the kernel audit subsystem can be used to emit audit
events to user space. We use the new dm-audit submodule allowing
to emit audit events on relevant I/O errors.

The construction and destruction of integrity device mappings are
also relevant for auditing a system. Thus, those events are also
logged as audit events.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-10-27 16:54:36 -04:00
Michael Weiß 2cc1ae4878 dm: introduce audit event module for device mapper
To be able to send auditing events to user space, we introduce a
generic dm-audit module. It provides helper functions to emit audit
events through the kernel audit subsystem. We claim the
AUDIT_DM_CTRL type=1336 and AUDIT_DM_EVENT type=1337 out of the
audit event messages range in the corresponding userspace api in
'include/uapi/linux/audit.h' for those events.

AUDIT_DM_CTRL is used to provide information about creation and
destruction of device mapper targets which are triggered by user space
admin control actions.
AUDIT_DM_EVENT is used to provide information about actual errors
during operation of the mapped device, showing e.g. integrity
violations in audit log.

Following commits to device mapper targets actually will make use of
this to emit those events in relevant cases.

The audit logs look like this if executing the following simple test:

 # dd if=/dev/zero of=test.img bs=1M count=1024
 # losetup -f test.img
 # integritysetup -vD format --integrity sha256 -t 32 /dev/loop0
 # integritysetup open -D /dev/loop0 --integrity sha256 integritytest
 # integritysetup status integritytest
 # integritysetup close integritytest
 # integritysetup open -D /dev/loop0 --integrity sha256 integritytest
 # integritysetup status integritytest
 # dd if=/dev/urandom of=/dev/loop0 bs=512 count=1 seek=100000
 # dd if=/dev/mapper/integritytest of=/dev/null

-------------------------
audit.log from auditd

type=UNKNOWN[1336] msg=audit(1630425039.363:184): module=integrity
op=ctr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
exe="/sbin/integritysetup" subj==unconfined dev=254:3
error_msg='success' res=1
type=UNKNOWN[1336] msg=audit(1630425039.471:185): module=integrity
op=dtr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
exe="/sbin/integritysetup" subj==unconfined dev=254:3
error_msg='success' res=1
type=UNKNOWN[1336] msg=audit(1630425039.611:186): module=integrity
op=ctr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
exe="/sbin/integritysetup" subj==unconfined dev=254:3
error_msg='success' res=1
type=UNKNOWN[1336] msg=audit(1630425054.475:187): module=integrity
op=dtr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
exe="/sbin/integritysetup" subj==unconfined dev=254:3
error_msg='success' res=1

type=UNKNOWN[1336] msg=audit(1630425073.171:191): module=integrity
op=ctr ppid=3807 pid=3883 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
exe="/sbin/integritysetup" subj==unconfined dev=254:3
error_msg='success' res=1

type=UNKNOWN[1336] msg=audit(1630425087.239:192): module=integrity
op=dtr ppid=3807 pid=3902 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
exe="/sbin/integritysetup" subj==unconfined dev=254:3
error_msg='success' res=1

type=UNKNOWN[1336] msg=audit(1630425093.755:193): module=integrity
op=ctr ppid=3807 pid=3906 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
exe="/sbin/integritysetup" subj==unconfined dev=254:3
error_msg='success' res=1

type=UNKNOWN[1337] msg=audit(1630425112.119:194): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:195): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:196): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:197): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:198): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:199): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:200): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:201): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:202): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1337] msg=audit(1630425112.119:203): module=integrity
op=integrity-checksum dev=254:3 sector=77480 res=0

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Paul Moore <paul@paul-moore.com> # fix audit.h numbering
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-10-27 16:53:47 -04:00
Christoph Hellwig 1c277e5013 dm: make EBS depend on !HIGHMEM
__ebs_rw_bvec use page_address on the submitted bios data, and thus
can't deal with highmem.  Disable the target on highmem configs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210804095634.460779-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16 10:50:32 -06:00
Christoph Hellwig c66fd01971 block: make the block holder code optional
Move the block holder code into a separate file as it is not in any way
related to the other block_dev.c code, and add a new selectable config
option for it so that we don't have to build it without any remapped
drivers selected.

The Kconfig symbol contains a _DEPRECATED suffix to match the comments
added in commit 49731baa41
("block: restore multiple bd_link_disk_holder() support").

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09 11:50:42 -06:00
Guoqing Jiang 608f52e30a md: mark some personalities as deprecated
Mark the three personalities (linear, fault and multipath) as deprecated
because:

1. people can use dm multipath or nvme multipath.
2. linear is already deprecated in MODULE_ALIAS.
3. no one actively using fault.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>
2021-06-14 22:32:07 -07:00
Ahmad Fatoum 363880c4eb dm crypt: support using trusted keys
Commit 27f5411a71 ("dm crypt: support using encrypted keys") extended
dm-crypt to allow use of "encrypted" keys along with "user" and "logon".

Along the same lines, teach dm-crypt to support "trusted" keys as well.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:13:00 -05:00
Arnd Bergmann b690bd546b dm zoned: select CONFIG_CRC32
Without crc32 support, this driver fails to link:

arm-linux-gnueabi-ld: drivers/md/dm-zoned-metadata.o: in function `dmz_write_sb':
dm-zoned-metadata.c:(.text+0xe98): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/md/dm-zoned-metadata.o: in function `dmz_check_sb':
dm-zoned-metadata.c:(.text+0x7978): undefined reference to `crc32_le'

Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-04 15:02:32 -05:00
Anthony Iliopoulos f7b347acb5 dm integrity: select CRYPTO_SKCIPHER
The integrity target relies on skcipher for encryption/decryption, but
certain kernel configurations may not enable CRYPTO_SKCIPHER, leading to
compilation errors due to unresolved symbols. Explicitly select
CRYPTO_SKCIPHER for DM_INTEGRITY, since it is unconditionally dependent
on it.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-04 15:02:31 -05:00
Mike Christie e4d2e82b23 dm mpath: add IO affinity path selector
This patch adds a path selector that selects paths based on a CPU to
path mapping the user passes in and what CPU we are executing on. The
primary user for this PS is where the app is optimized to use specific
CPUs so other PSs undo the apps handy work, and the storage and it's
transport are not a bottlneck.

For these io-affinity PS setups a path's transport/interconnect
perf is not going to flucuate a lot and there is no major differences
between paths, so QL/HST smarts do not help and RR always messes up
what the app is trying to do.

On a system with 16 cores, where you have a job per CPU:

fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=128 --numjobs=16

and a dm-multipath device setup where each CPU is mapped to one path:

// When in mq mode I had to set dm_mq_nr_hw_queues=$NUM_PATHS.
// Bio mode also showed similar results.
0 16777216 multipath 0 0 1 1 io-affinity 0 16 1 8:16 1 8:32 2 8:64 4
8:48 8 8:80 10 8:96 20 8:112 40 8:128 80 8:144 100 8:160 200 8:176
400 8:192 800 8:208 1000 8:224 2000 8:240 4000 65:0 8000

we can see a IOPs increase of 25%.

The percent increase depends on the device and interconnect. For a
slower/medium speed path/device that can do around 180K IOPs a path
if you ran that fio command to it directly we saw a 25% increase like
above. Slower path'd devices that could do around 90K per path showed
maybe around a 2 - 5% increase. If you use something like null_blk or
scsi_debug which can multi-million IOPs and hack it up so each device
they export shows up as a path then you see 50%+ increases.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Mickaël Salaün 4da8f8c8a1 dm verity: Add support for signature verification with 2nd keyring
Add a new configuration DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING
to enable dm-verity signatures to be verified against the secondary
trusted keyring.  Instead of relying on the builtin trusted keyring
(with hard-coded certificates), the second trusted keyring can include
certificate authorities from the builtin trusted keyring and child
certificates loaded at run time.  Using the secondary trusted keyring
enables to use dm-verity disks (e.g. loop devices) signed by keys which
did not exist at kernel build time, leveraging the certificate chain of
trust model.  In practice, this makes it possible to update certificates
without kernel update and reboot, aligning with module and kernel
(kexec) signature verification which already use the secondary trusted
keyring.

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Alexander A. Klimov 6f3bc22bf5 Replace HTTP links with HTTPS ones: LVM
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If both the HTTP and HTTPS versions
          return 200 OK and serve the same content:
            Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Link: https://lore.kernel.org/r/20200627103138.71885-1-grandmaster@al2klimov.de
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-07-05 14:28:27 -06:00
Masahiro Yamada a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Khazhismel Kumykov 2613eab119 dm mpath: add Historical Service Time Path Selector
This new selector keeps an exponential moving average of the service
time for each path (losely defined as delta between start_io and
end_io), and uses this along with the number of inflight requests to
estimate future service time for a path.  Since we don't have a prober
to account for temporally slow paths, re-try "slow" paths every once in
a while (num_paths * historical_service_time). To account for fast paths
transitioning to slow, if a path has not completed any request within
(num_paths * historical_service_time), limit the number of outstanding
requests.  To account for low volume situations where number of
inflight IOs would be zero, the last finish time of each path is
factored in.

Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Co-developed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15 10:29:36 -04:00
Heinz Mauelshagen d3c7b35c20 dm: add emulated block size target
This new target is similar to the linear target except that it emulates
a smaller logical block size on a device with a larger logical block
size.  Its main purpose is to emulate 512 byte sectors on 4K native
disks (i.e. 512e).

See Documentation/admin-guide/device-mapper/dm-ebs.rst for details.

Reviewed-by: Damien Le Moal <DamienLeMoal@wdc.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> [Kconfig fixes]
Signed-off-by: Zheng Bin <zhengbin13@huawei.com> [static fixes]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15 10:29:35 -04:00
Dmitry Baryshkov 27f5411a71 dm crypt: support using encrypted keys
Allow one to use "encrypted" in addition to "user" and "logon" key
types for device encryption.

Signed-off-by: Dmitry Baryshkov <dmitry_baryshkov@mentor.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15 10:29:34 -04:00
Krzysztof Kozlowski 443633225e dm: Fix Kconfig indentation
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
	$ sed -e 's/^        /\t/' -i */Kconfig

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-20 10:35:31 -05:00
Nikos Tsironis 7431b7835f dm: add clone target
Add the dm-clone target, which allows cloning of arbitrary block
devices.

dm-clone produces a one-to-one copy of an existing, read-only source
device into a writable destination device: It presents a virtual block
device which makes all data appear immediately, and redirects reads and
writes accordingly.

The main use case of dm-clone is to clone a potentially remote,
high-latency, read-only, archival-type block device into a writable,
fast, primary-type device for fast, low-latency I/O. The cloned device
is visible/mountable immediately and the copy of the source device to
the destination device happens in the background, in parallel with user
I/O.

When the cloning completes, the dm-clone table can be removed altogether
and be replaced, e.g., by a linear table, mapping directly to the
destination device.

For further information and examples of how to use dm-clone, please read
Documentation/admin-guide/device-mapper/dm-clone.rst

Suggested-by: Vangelis Koukis <vkoukis@arrikto.com>
Co-developed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-09-12 09:32:31 -04:00
Ard Biesheuvel a1a262b66e dm crypt: switch to ESSIV crypto API template
Replace the explicit ESSIV handling in the dm-crypt driver with calls
into the crypto API, which now possesses the capability to perform
this processing within the crypto subsystem.

Note that we reorder the AEAD cipher_api string parsing with the TFM
instantiation: this is needed because cipher_api is mangled by the
ESSIV handling, and throws off the parsing of "authenc(" otherwise.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-09-03 16:45:54 -04:00
Jaskaran Khurana 88cd3e6cfa dm verity: add root hash pkcs#7 signature verification
The verification is to support cases where the root hash is not secured
by Trusted Boot, UEFI Secureboot or similar technologies.

One of the use cases for this is for dm-verity volumes mounted after
boot, the root hash provided during the creation of the dm-verity volume
has to be secure and thus in-kernel validation implemented here will be
used before we trust the root hash and allow the block device to be
created.

The signature being provided for verification must verify the root hash
and must be trusted by the builtin keyring for verification to succeed.

The hash is added as a key of type "user" and the description is passed
to the kernel so it can look it up and use it for verification.

Adds CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG which can be turned on if root
hash verification is needed.

Kernel commandline dm_verity module parameter 'require_signatures' will
indicate whether to force root hash signature verification (for all dm
verity volumes).

Signed-off-by: Jaskaran Khurana <jaskarankhurana@linux.microsoft.com>
Tested-and-Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-23 10:13:14 -04:00
Mauro Carvalho Chehab 6cf2a73cb2 docs: device-mapper: move it to the admin-guide
The DM support describes lots of aspects related to mapped
disk partitions from the userspace PoV.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15 11:03:01 -03:00
Mauro Carvalho Chehab f0ba43774c docs: convert docs to ReST and rename to *.rst
The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-06-14 14:21:04 -06:00
Thomas Gleixner ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Bryan Gurney e4f3fabd67 dm: add dust target
Add the dm-dust target, which simulates the behavior of bad sectors
at arbitrary locations, and the ability to enable the emulation of
the read failures at an arbitrary time.

This target behaves similarly to a linear target.  At a given time,
the user can send a message to the target to start failing read
requests on specific blocks.  When the failure behavior is enabled,
reads of blocks configured "bad" will fail with EIO.

Writes of blocks configured "bad" will result in the following:

1. Remove the block from the "bad block list".
2. Successfully complete the write.

After this point, the block will successfully contain the written
data, and will service reads and writes normally.  This emulates the
behavior of a "remapped sector" on a hard disk drive.

dm-dust provides logging of which blocks have been added or removed
to the "bad block list", as well as logging when a block has been
removed from the bad block list.  These messages can be used
alongside the messages from the driver using a dm-dust device to
analyze the driver's behavior when a read fails at a given time.

(This logging can be reduced via a "quiet" mode, if desired.)

NOTE: If the block size is larger than 512 bytes, only the first sector
of each "dust block" is detected.  Placing a limiting layer above a dust
target, to limit the minimum I/O size to the dust block size, will
ensure proper emulation of the given large block size.

Signed-off-by: Bryan Gurney <bgurney@redhat.com>
Co-developed-by: Joe Shimkus <jshimkus@redhat.com>
Co-developed-by: John Dorminy <jdorminy@redhat.com>
Co-developed-by: John Pittman <jpittman@redhat.com>
Co-developed-by: Thomas Jaskiewicz <tjaskiew@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-30 16:37:19 -04:00
Helen Koike 6bbc923dfc dm: add support to directly boot to a mapped device
Add a "create" module parameter, which allows device-mapper targets to
be configured at boot time. This enables early use of DM targets in the
boot process (as the root device or otherwise) without the need of an
initramfs.

The syntax used in the boot param is based on the concise format from
the dmsetup tool to follow the rule of least surprise:

	dmsetup table --concise /dev/mapper/lroot

Which is:
	dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]

Where,
	<name>		::= The device name.
	<uuid>		::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
	<minor>		::= The device minor number | ""
	<flags>		::= "ro" | "rw"
	<table>		::= <start_sector> <num_sectors> <target_type> <target_args>
	<target_type>	::= "verity" | "linear" | ...

For example, the following could be added in the boot parameters:
dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0

Only the targets that were tested are allowed and the ones that don't
change any block device when the device is create as read-only. For
example, mirror and cache targets are not allowed. The rationale behind
this is that if the user makes a mistake, choosing the wrong device to
be the mirror or the cache can corrupt data.

The only targets initially allowed are:
* crypt
* delay
* linear
* snapshot-origin
* striped
* verity

Co-developed-by: Will Drewry <wad@chromium.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-03-05 14:53:50 -05:00
Jens Axboe 6a23e05c2f dm: remove legacy request-based IO path
dm supports both, and since we're killing off the legacy path in
general, get rid of it in dm.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-11 11:36:09 -04:00
Mikulas Patocka 48debafe4f dm: add writecache target
The writecache target caches writes on persistent memory or SSD.
It is intended for databases or other programs that need extremely low
commit latency.

The writecache target doesn't cache reads because reads are supposed to
be cached in page cache in normal RAM.

If persistent memory isn't available this target can still be used in
SSD mode.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # fix missing goto
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> # fix compilation issue with !DAX
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # use msecs_to_jiffies
Acked-by: Dan Williams <dan.j.williams@intel.com> # reworks to unify ARM and x86 flushing
Signed-off-by: Mike Snitzer <msnitzer@redhat.com>
2018-06-08 11:59:51 -04:00
Dan Williams 976431b02c dax, dm: allow device-mapper to operate without dax support
Change device-mapper's DAX dependency to require the presence of at
least one DAX_DRIVER. This allows device-mapper to be built without
bringing the DAX core along which is especially wasteful when there are
no DAX drivers, like BLK_DEV_PMEM, configured.

Cc: Alasdair Kergon <agk@redhat.com>
Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-03 05:41:19 -07:00
Scott Bauer 18a5bf2705 dm: add unstriped target
This device mapper "unstriped" target remaps and unstripes I/O so it
is issued solely on a single drive in a HW RAID0 or dm-striped target.

In a 4 drive HW RAID0 the striped target exposes 1/4th of the LBA range
as a virtual drive.  Each I/O to that virtual drive will only be issued
to the 1 drive that was selected of the 4 drives in the HW RAID0.

This unstriped target is most useful for Intel NVMe drives that have
multiple cores but that do not have firmware control to pin separate LBA
ranges to each discrete cpu core.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17 09:16:00 -05:00
Guoqing Jiang f0e230ad87 md-cluster: update document for raid10
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01 21:32:25 -07:00
Damien Le Moal 3b1a94c88b dm zoned: drive-managed zoned block device target
The dm-zoned device mapper target provides transparent write access
to zoned block devices (ZBC and ZAC compliant block devices).
dm-zoned hides to the device user (a file system or an application
doing raw block device accesses) any constraint imposed on write
requests by the device, equivalent to a drive-managed zoned block
device model.

Write requests are processed using a combination of on-disk buffering
using the device conventional zones and direct in-place processing for
requests aligned to a zone sequential write pointer position.
A background reclaim process implemented using dm_kcopyd_copy ensures
that conventional zones are always available for executing unaligned
write requests. The reclaim process overhead is minimized by managing
buffer zones in a least-recently-written order and first targeting the
oldest buffer zones. Doing so, blocks under regular write access (such
as metadata blocks of a file system) remain stored in conventional
zones, resulting in no apparent overhead.

dm-zoned implementation focus on simplicity and on minimizing overhead
(CPU, memory and storage overhead). For a 14TB host-managed disk with
256 MB zones, dm-zoned memory usage per disk instance is at most about
3 MB and as little as 5 zones will be used internally for storing metadata
and performing buffer zone reclaim operations. This is achieved using
zone level indirection rather than a full block indirection system for
managing block movement between zones.

dm-zoned primary target is host-managed zoned block devices but it can
also be used with host-aware device models to mitigate potential
device-side performance degradation due to excessive random writing.

Zoned block devices can be formatted and checked for use with the dm-zoned
target using the dmzadm utility available at:

https://github.com/hgst/dm-zoned-tools

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
[Mike Snitzer partly refactored Damien's original work to cleanup the code]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:05:20 -04:00
Linus Torvalds 2eecf3a49f - DM cache metadata fixes to short-circuit operations that require the
metadata not be in 'fail_io' mode.  Otherwise crashes are possible.
 
 - a DM cache fix to address the inability to adapt to continuous IO that
   happened to also reflect a changing working set (which required old
   blocks be demoted before the new working set could be promoted)
 
 - a DM cache smq policy cleanup that fell out from reviewing the above
 
 - fix the Kconfig help text for CONFIG_DM_INTEGRITY
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZDMmrAAoJEMUj8QotnQNaALIH/3YJj9gZOQly+I6Rk9157nGX
 0mjXTd9SV6IT95ulX/DywBt3pbStXim15DYMQG1BxHTqHbrFmTRxR+K/AtbnEXCI
 Ww8tJB3Adz4ETVd6IJ2ptCFxBLZwgPDkY6RJlPe8ZG/mBvVLXjKHvNQ5siy7sXvr
 gAqn2XrSr5y4ZB06xJXrhfMxW4QHkgOGLcn5TUeXZumU7cAnNBoCWHAqtJtwxxog
 Iwaz342PCM81W7rXvnuIJm6PkEDbfNGHbjPZo4vAOHAD/Hok8LZFc89vTRZHO/EB
 gElsj9fIMyiLeyhBX/OXxguGBNL8hsyKZ8GdBJ/9Q9FcJvRm/LfkFiRx3agc7Bc=
 =mK65
 -----END PGP SIGNATURE-----

Merge tag 'for-4.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - DM cache metadata fixes to short-circuit operations that require the
   metadata not be in 'fail_io' mode. Otherwise crashes are possible.

 - a DM cache fix to address the inability to adapt to continuous IO
   that happened to also reflect a changing working set (which required
   old blocks be demoted before the new working set could be promoted)

 - a DM cache smq policy cleanup that fell out from reviewing the above

 - fix the Kconfig help text for CONFIG_DM_INTEGRITY

* tag 'for-4.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache metadata: fail operations if fail_io mode has been established
  dm integrity: improve the Kconfig help text for DM_INTEGRITY
  dm cache policy smq: cleanup free_target_met() and clean_target_met()
  dm cache policy smq: allow demotions to happen even during continuous IO
2017-05-05 19:31:06 -07:00
Linus Torvalds 53ef7d0e20 libnvdimm for 4.12
* Region media error reporting: A libnvdimm region device is the parent
 to one or more namespaces. To date, media errors have been reported via
 the "badblocks" attribute attached to pmem block devices for namespaces
 in "raw" or "memory" mode. Given that namespaces can be in "device-dax"
 or "btt-sector" mode this new interface reports media errors
 generically, i.e. independent of namespace modes or state. This
 subsequently allows userspace tooling to craft "ACPI 6.1 Section
 9.20.7.6 Function Index 4 - Clear Uncorrectable Error" requests and
 submit them via the ioctl path for NVDIMM root bus devices.
 
 * Introduce 'struct dax_device' and 'struct dax_operations': Prompted by
 a request from Linus and feedback from Christoph this allows for dax
 capable drivers to publish their own custom dax operations. This fixes
 the broken assumption that all dax operations are related to a
 persistent memory device, and makes it easier for other architectures
 and platforms to add customized persistent memory support.
 
 * 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
 available for storage appliance applications to manually trigger memory
 controllers to drain write-pending buffers that would otherwise be
 flushed automatically by the platform ADR (asynchronous-DRAM-refresh)
 mechanism at a power loss event. Support for "locked" DIMMs is included
 to prevent namespaces from surfacing when the namespace label data area
 is locked. Finally, fixes for various reported deadlocks and crashes,
 also tagged for -stable.
 
 * ACPI / nfit driver updates: General updates of the nfit driver to add
 DSM command overrides, ACPI 6.1 health state flags support, DSM payload
 debug available by default, and various fixes.
 
 Acknowledgements that came after the branch was pushed:
 
 commmit 565851c972 "device-dax: fix sysfs attribute deadlock"
 Tested-by: Yi Zhang <yizhan@redhat.com>
 
 commit 23f4984483 "libnvdimm: rework region badblocks clearing"
 Tested-by: Toshi Kani <toshi.kani@hpe.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZDONJAAoJEB7SkWpmfYgC3SsP/2KrLvTUcz646ViuPOgZ2cC4
 W6wAx6cvDSt+H52kLnFEsYoFt7WAj20ggPirb/Bc5jkGlvwE0lT9Xtmso9GpVkYT
 J9ZJ9pP/4YaAD3II1gmTwaUjYi0FxoOdx3Eb92yuWkO/8ylz4b2Nu3cBpYwyziGQ
 nIfEVwDXRLE86u6x0bWuf6TlVuvsbdiAI55CDqDMVQC6xIOLbSez7b8QIHlpiKEb
 Mw+xqdQva0esoreZEOXEhWNO+qtfILx8/ceBEGTNMp4e/JjZ2FbrSNplM+9bH5k7
 ywqP8lW+mBEw0fmBBkYoVG/xyesiiBb55JLnbi8Ew+7IUxw8a3iV7wftRi62lHcK
 zAjsHe4L+MansgtZsCL8wluvIPaktAdtB4xr7l9VNLKRYRUG73jEWU0gcUNryHIL
 BkQJ52pUS1PkClyAsWbBBHl1I/CvzVPd21VW0YELmLR4OywKy1c+eKw2bcYgjrb4
 59HZSv6S6EoKaQC+2qvVNpePil7cdfg5V2ubH/ki9HoYVyoxDptEWHnvf0NNatIH
 Y7mNcOPvhOksJmnKSyHbDjtRur7WoHIlC9D7UjEFkSBWsKPjxJHoidN4SnCMRtjQ
 WKQU0seoaKj04b68Bs/Qm9NozVgnsPFIUDZeLMikLFX2Jt7YSPu+Jmi2s4re6WLh
 TmJQ3Ly9t3o3/weHSzmn
 =Ox0s
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "The bulk of this has been in multiple -next releases. There were a few
  late breaking fixes and small features that got added in the last
  couple days, but the whole set has received a build success
  notification from the kbuild robot.

  Change summary:

   - Region media error reporting: A libnvdimm region device is the
     parent to one or more namespaces. To date, media errors have been
     reported via the "badblocks" attribute attached to pmem block
     devices for namespaces in "raw" or "memory" mode. Given that
     namespaces can be in "device-dax" or "btt-sector" mode this new
     interface reports media errors generically, i.e. independent of
     namespace modes or state.

     This subsequently allows userspace tooling to craft "ACPI 6.1
     Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
     requests and submit them via the ioctl path for NVDIMM root bus
     devices.

   - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
     by a request from Linus and feedback from Christoph this allows for
     dax capable drivers to publish their own custom dax operations.
     This fixes the broken assumption that all dax operations are
     related to a persistent memory device, and makes it easier for
     other architectures and platforms to add customized persistent
     memory support.

   - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
     available for storage appliance applications to manually trigger
     memory controllers to drain write-pending buffers that would
     otherwise be flushed automatically by the platform ADR
     (asynchronous-DRAM-refresh) mechanism at a power loss event.
     Support for "locked" DIMMs is included to prevent namespaces from
     surfacing when the namespace label data area is locked. Finally,
     fixes for various reported deadlocks and crashes, also tagged for
     -stable.

   - ACPI / nfit driver updates: General updates of the nfit driver to
     add DSM command overrides, ACPI 6.1 health state flags support, DSM
     payload debug available by default, and various fixes.

  Acknowledgements that came after the branch was pushed:

   - commmit 565851c972 "device-dax: fix sysfs attribute deadlock":
     Tested-by: Yi Zhang <yizhan@redhat.com>

   - commit 23f4984483 "libnvdimm: rework region badblocks clearing"
     Tested-by: Toshi Kani <toshi.kani@hpe.com>"

* tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
  libnvdimm, pfn: fix 'npfns' vs section alignment
  libnvdimm: handle locked label storage areas
  libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
  brd: fix uninitialized use of brd->dax_dev
  block, dax: use correct format string in bdev_dax_supported
  device-dax: fix sysfs attribute deadlock
  libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
  libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
  libnvdimm: rework region badblocks clearing
  acpi, nfit: kill ACPI_NFIT_DEBUG
  libnvdimm: fix clear length of nvdimm_forget_poison()
  libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
  libnvdimm, region: sysfs trigger for nvdimm_flush()
  libnvdimm: fix phys_addr for nvdimm_clear_poison
  x86, dax, pmem: remove indirection around memcpy_from_pmem()
  block: remove block_device_operations ->direct_access()
  block, dax: convert bdev_dax_supported() to dax_direct_access()
  filesystem-dax: convert to dax_direct_access()
  Revert "block: use DAX for partition table reads"
  ext2, ext4, xfs: retrieve dax_device for iomap operations
  ...
2017-05-05 18:49:20 -07:00
Mike Snitzer 7ab84db64f dm integrity: improve the Kconfig help text for DM_INTEGRITY
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
2017-05-04 10:58:55 -04:00
Linus Torvalds d35a878ae1 - A major update for DM cache that reduces the latency for deciding
whether blocks should migrate to/from the cache.  The bio-prison-v2
   interface supports this improvement by enabling direct dispatch of
   work to workqueues rather than having to delay the actual work
   dispatch to the DM cache core.  So the dm-cache policies are much more
   nimble by being able to drive IO as they see fit.  One immediate
   benefit from the improved latency is a cache that should be much more
   adaptive to changing workloads.
 
 - Add a new DM integrity target that emulates a block device that has
   additional per-sector tags that can be used for storing integrity
   information.
 
 - Add a new authenticated encryption feature to the DM crypt target that
   builds on the capabilities provided by the DM integrity target.
 
 - Add MD interface for switching the raid4/5/6 journal mode and update
   the DM raid target to use it to enable aid4/5/6 journal write-back
   support.
 
 - Switch the DM verity target over to using the asynchronous hash crypto
   API (this helps work better with architectures that have access to
   off-CPU algorithm providers, which should reduce CPU utilization).
 
 - Various request-based DM and DM multipath fixes and improvements from
   Bart and Christoph.
 
 - A DM thinp target fix for a bio structure leak that occurs for each
   discard IFF discard passdown is enabled.
 
 - A fix for a possible deadlock in DM bufio and a fix to re-check the
   new buffer allocation watermark in the face of competing admin changes
   to the 'max_cache_size_bytes' tunable.
 
 - A couple DM core cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZB6vtAAoJEMUj8QotnQNaoicIALuZTLElgAzxzA28cfk1+1Ea
 Gd09CfJ3M6cvk/YGUU7WwiSYIwu16yOJALG4sLcYnEmUCzvKfFPcl/RpeSJHPpYM
 0aVXa6NIJw7K2r3C17toiK2DRMHYw6QU843WeWI93vBW13lDJklNJL9fM7GBEOLH
 NMSNw2mAq9ajtLlnJhM3ZfhloA7/u/jektvlBO1AA3RQ5Kx1cXVXFPqN7FdRfcqp
 4RuEMe9faAadlXLsj3bia5IBmF/W0Qza6JilP+NLKLWB4fm7LZDjN/k+TsHWMa9e
 cGR73TgUGLMBJX+sDJy8R3oeBG9JZkFVkD7I30eCjzyhSOs/54XNYQ23EkqHJU0=
 =9Ryi
 -----END PGP SIGNATURE-----

Merge tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - A major update for DM cache that reduces the latency for deciding
   whether blocks should migrate to/from the cache. The bio-prison-v2
   interface supports this improvement by enabling direct dispatch of
   work to workqueues rather than having to delay the actual work
   dispatch to the DM cache core. So the dm-cache policies are much more
   nimble by being able to drive IO as they see fit. One immediate
   benefit from the improved latency is a cache that should be much more
   adaptive to changing workloads.

 - Add a new DM integrity target that emulates a block device that has
   additional per-sector tags that can be used for storing integrity
   information.

 - Add a new authenticated encryption feature to the DM crypt target
   that builds on the capabilities provided by the DM integrity target.

 - Add MD interface for switching the raid4/5/6 journal mode and update
   the DM raid target to use it to enable aid4/5/6 journal write-back
   support.

 - Switch the DM verity target over to using the asynchronous hash
   crypto API (this helps work better with architectures that have
   access to off-CPU algorithm providers, which should reduce CPU
   utilization).

 - Various request-based DM and DM multipath fixes and improvements from
   Bart and Christoph.

 - A DM thinp target fix for a bio structure leak that occurs for each
   discard IFF discard passdown is enabled.

 - A fix for a possible deadlock in DM bufio and a fix to re-check the
   new buffer allocation watermark in the face of competing admin
   changes to the 'max_cache_size_bytes' tunable.

 - A couple DM core cleanups.

* tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
  dm bufio: check new buffer allocation watermark every 30 seconds
  dm bufio: avoid a possible ABBA deadlock
  dm mpath: make it easier to detect unintended I/O request flushes
  dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
  dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
  dm: introduce enum dm_queue_mode to cleanup related code
  dm mpath: verify __pg_init_all_paths locking assumptions at runtime
  dm: verify suspend_locking assumptions at runtime
  dm block manager: remove an unused argument from dm_block_manager_create()
  dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
  dm mpath: delay requeuing while path initialization is in progress
  dm mpath: avoid that path removal can trigger an infinite loop
  dm mpath: split and rename activate_path() to prepare for its expanded use
  dm ioctl: prevent stack leak in dm ioctl call
  dm integrity: use previously calculated log2 of sectors_per_block
  dm integrity: use hex2bin instead of open-coded variant
  dm crypt: replace custom implementation of hex2bin()
  dm crypt: remove obsolete references to per-CPU state
  dm verity: switch to using asynchronous hash crypto API
  dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
  ...
2017-05-03 10:31:20 -07:00
Dan Williams f26c5719b2 dm: add dax_device and dax_operations support
Allocate a dax_device to represent the capacity of a device-mapper
instance. Provide a ->direct_access() method via the new dax_operations
indirection that mirrors the functionality of the current direct_access
support via block_device_operations.  Once fs/dax.c has been converted
to use dax_operations the old dm_blk_direct_access() will be removed.

A new helper dm_dax_get_live_target() is introduced to separate some of
the dm-specifics from the direct_access implementation.

This enabling is only for the top-level dm representation to upper
layers. Converting target direct_access implementations is deferred to a
separate patch.

Cc: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-20 11:57:52 -07:00
Mikulas Patocka 7b81ef8b14 dm raid: select the Kconfig option CONFIG_MD_RAID0
Since the commit 0cf4503174 ("dm raid: add support for the MD RAID0
personality"), the dm-raid subsystem can activate a RAID-0 array.
Therefore, add MD_RAID0 to the dependencies of DM_RAID, so that MD_RAID0
will be selected when DM_RAID is selected.

Fixes: 0cf4503174 ("dm raid: add support for the MD RAID0 personality")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-30 11:17:08 -04:00
SeongJae Park 4f6cce3910 Fix dead URLs to ftp.kernel.org
URLs to ftp.kernel.org are still exist though the service is closed [0].
This commit fixes the URLs to use www.kernel.org instead.

[0] https://www.kernel.org/shutting-down-ftp-services.html

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-28 16:16:52 +02:00
Mikulas Patocka 7eada909bf dm: add integrity target
The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.

A general problem with storing integrity tags with every sector is that
writing the sector and the integrity tag must be atomic - i.e. in case of
crash, either both sector and integrity tag or none of them is written.

To guarantee write atomicity the dm-integrity target uses a journal. It
writes sector data and integrity tags into a journal, commits the journal
and then copies the data and integrity tags to their respective location.

The dm-integrity target can be used with the dm-crypt target - in this
situation the dm-crypt target creates the integrity data and passes them
to the dm-integrity target via bio_integrity_payload attached to the bio.
In this mode, the dm-crypt and dm-integrity targets provide authenticated
disk encryption - if the attacker modifies the encrypted device, an I/O
error is returned instead of random data.

The dm-integrity target can also be used as a standalone target, in this
mode it calculates and verifies the integrity tag internally. In this
mode, the dm-integrity target can be used to detect silent data
corruption on the disk or in the I/O path.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-24 15:49:07 -04:00
Joe Thornber b29d4986d0 dm cache: significant rework to leverage dm-bio-prison-v2
The cache policy interfaces have been updated to work well with the new
bio-prison v2 interface's ability to queue work immediately (promotion,
demotion, etc) -- overriding benefit being reduced latency on processing
IO through the cache.  Previously such work would be left for the DM
cache core to queue on various lists and then process in batches later
-- this caused a serious delay in latency for IO driven by the cache.

The background tracker code was factored out so that all cache policies
can make use of it.

Also, the "cleaner" policy has been removed and is now a variant of the
smq policy that simply disallows migrations.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-07 13:28:31 -05:00
Joe Thornber 2e8ed71102 dm block manager: make block locking optional
The block manager's locking is useful for catching cycles that may
result from certain btree metadata corruption.  But in general it serves
as a developer tool to catch bugs in code.  Unless you're finding that
DM thin provisioning is hanging due to infinite loops within the block
manager's access to btree nodes you can safely disable this feature.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # do/while(0) macro fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-11-14 15:17:47 -05:00
Mike Snitzer 3f0680402c dm: add missing newline between DM_DEBUG_BLOCK_STACK_TRACING and DM_BUFIO
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-10 17:12:11 -05:00
Joe Thornber 9ed84698fd dm cache: make the 'mq' policy an alias for 'smq'
smq seems to be performing better than the old mq policy in all
situations, as well as using a quarter of the memory.

Make 'mq' an alias for 'smq' when choosing a cache policy.  The tunables
that were present for the old mq are faked, and have no effect.  mq
should be considered deprecated now.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-10 17:12:08 -05:00
Sami Tolvanen a739ff3f54 dm verity: add support for forward error correction
Add support for correcting corrupted blocks using Reed-Solomon.

This code uses RS(255, N) interleaved across data and hash
blocks. Each error-correcting block covers N bytes evenly
distributed across the combined total data, so that each byte is a
maximum distance away from the others. This makes it possible to
recover from several consecutive corrupted blocks with relatively
small space overhead.

In addition, using verity hashes to locate erasures nearly doubles
the effectiveness of error correction. Being able to detect
corrupted blocks also improves performance, because only corrupted
blocks need to corrected.

For a 2 GiB partition, RS(255, 253) (two parity bytes for each
253-byte block) can correct up to 16 MiB of consecutive corrupted
blocks if erasures can be located, and 8 MiB if they cannot, with
16 MiB space overhead.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:39:03 -05:00
Mikulas Patocka 86bad0c707 dm bufio: store stacktrace in buffers to help find buffer leaks
The option DM_DEBUG_BLOCK_STACK_TRACING is moved from persistent-data
directory to device mapper directory because it will now be used by
persistent-data and bufio.  When the option is enabled, each bufio buffer
stores the stacktrace of the last dm_bufio_get(), dm_bufio_read() or
dm_bufio_new() call that increased the hold count to 1.  The buffer's
stacktrace is printed if the buffer was not released before the bufio
client is destroyed.

When DM_DEBUG_BLOCK_STACK_TRACING is enabled, any bufio buffer leaks are
considered warnings - i.e. the kernel continues afterwards.  If not
enabled, buffer leaks are considered BUGs and the kernel with crash.
Reasoning on this disposition is: if we only ever warned on buffer leaks
users would generally ignore them and the problematic code would never
get fixed.

Successfully used to find source of bufio leaks fixed with commit
fce079f63c3 ("dm btree: fix bufio buffer leaks in dm_btree_del() error
path").

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:58 -05:00
Arnd Bergmann 14f09e2f9b raid5-cache: add crc32c Kconfig dependency
The recent change of the raid5-cache code to use crc32c instead
of crc32 causes link errors when CONFIG_LIBCRC32C is disabled:

drivers/built-in.o: In function crc32c'
core.c:(.text+0x1c6060): undefined reference to `crc32c'

This adds an explicit 'select' statement like all other users
of this function do.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5cb2fbd6ea ("raid5-cache: use crc32c checksum")
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-09 09:09:52 +11:00
Linus Torvalds 8e78b7dc93 SCSI misc on 20150911
The major pieces of this patch are a set patches facilitating better
 integration between scsi and scsi_dh (the device handling layer used by
 multi-path; all the dm parts are acked by Mike Snitzer).  It also includes
 driver updates for mp3sas, scsi_debug and an assortment of bug fixes.
 
 Signed-off-by: James Bottomley <JBottomley@Odin.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJV8yt5AAoJEDeqqVYsXL0MBsQH+wXvlx3o0BGuz5ZXfIs/RxzI
 MwGnu1J0LSA9FPakkMUVOBtsxIG+pCV+4eKorQMkfGCKAZ8daaYsyYvSEM2mcqIX
 1Y/srEnbzfE94JHbsI2pbiMPkB7QdtW27WjTSjQGgD9igAyVmmITiQJrXbpAlSLF
 F6n++9avng+GhjXQ5TF8/y13OYgabIoAPM1j4B/ut/Ok8ReruBvMBnOla5w5RMKR
 rBZKTZfUwvX5S0cuREwj8tFsRVUgdBNSrcGswFJrZo5x9WAsSHLC6+SOLZuUy1vC
 ua0tNtEiyXiuR0/jSP9qv7hJ/j0BW+EGdnW6GZEzKpeMK5PxfVspOsbNunUDRsY=
 =Y9G1
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull second round of SCSI updates from James Bottomley:
 "There's one late arriving patch here (added today), fixing a build
  issue which the scsi_dh patch set in here uncovered.  Other than that,
  everything has been incubated in -next and the checkers for a week.

  The major pieces of this patch are a set patches facilitating better
  integration between scsi and scsi_dh (the device handling layer used
  by multi-path; all the dm parts are acked by Mike Snitzer).

  This also includes driver updates for mp3sas, scsi_debug and an
  assortment of bug fixes"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (50 commits)
  scsi_dh: fix randconfig build error
  scsi: fix scsi_error_handler vs. scsi_host_dev_release race
  fcoe: Convert use of __constant_htons to htons
  mpt2sas: setpci reset kernel oops fix
  pm80xx: Don't override ts->stat on IO_OPEN_CNX_ERROR_HW_RESOURCE_BUSY
  lpfc: Fix possible use-after-free and double free in lpfc_mbx_cmpl_rdp_page_a2()
  bfa: Fix incorrect de-reference of pointer
  bfa: Fix indentation
  scsi_transport_sas: Remove check for SAS expander when querying bay/enclosure IDs.
  scsi_debug: resp_request: remove unused variable
  scsi_debug: fix REPORT LUNS Well Known LU
  scsi_debug: schedule_resp fix input variable check
  scsi_debug: make dump_sector static
  scsi_debug: vfree is null safe so drop the check
  scsi_debug: use SCSI_W_LUN_REPORT_LUNS instead of SAM2_WLUN_REPORT_LUNS;
  scsi_debug: define pr_fmt() for consistent logging
  mpt2sas: Refcount fw_events and fix unsafe list usage
  mpt2sas: Refcount sas_device objects and fix unsafe list usage
  scsi_dh: return SCSI_DH_NOTCONN in scsi_dh_activate()
  scsi_dh: don't allow to detach device handlers at runtime
  ...
2015-09-11 18:15:18 -07:00
Christoph Hellwig 294ab783ad scsi_dh: fix randconfig build error
It looks like the Kconfig check that was meant to fix this (commit
fe9233fb69 [SCSI] scsi_dh: fix kconfig related
build errors) was actually reversed, but no-one noticed until the new set of
patches which separated DM and SCSI_DH).

Fixes: fe9233fb69
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-09-11 11:54:37 -07:00
Geert Uytterhoeven 57d4248731 dm: Spelling s/consitent/consistent/
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2015-08-07 14:36:43 +02:00
Baruch Siach 6ed443c07f dm crypt: update wiki page URL
Cryptsetup moved to gitlab.  This is a leftover from commit e44f23b32d
(dm crypt: update URLs to new cryptsetup project page, 2015-04-05).

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-27 07:58:16 -04:00
Joe Thornber 66a6363566 dm cache: add stochastic-multi-queue (smq) policy
The stochastic-multi-queue (smq) policy addresses some of the problems
with the current multiqueue (mq) policy.

Memory usage
------------

The mq policy uses a lot of memory; 88 bytes per cache block on a 64
bit machine.

SMQ uses 28bit indexes to implement it's data structures rather than
pointers.  It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue rather than a pre cache which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All these mean smq uses ~25bytes per cache block.  Still a lot of
memory, but a substantial improvement nontheless.

Level balancing
---------------

MQ places entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This means the bottom
levels generally have the most entries, and the top ones have very
few.  Having unbalanced levels like this reduces the efficacy of the
multiqueue.

SMQ does not maintain a hit count, instead it swaps hit entries with
the least recently used entry from the level above.  The over all
ordering being a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.

Adaptability
------------

The MQ policy maintains a hit count for each cache block.  For a
different block to get promoted to the cache it's hit count has to
exceed the lowest currently in the cache.  This means it can take a
long time for the cache to adapt between varying IO patterns.
Periodically degrading the hit counts could help with this, but I
haven't found a nice general solution.

SMQ doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote.  If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.

Performance
-----------

In my tests SMQ shows substantially better performance than MQ.  Once
this matures a bit more I'm sure it'll become the default policy.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-11 17:12:59 -04:00