Commit Graph

162 Commits

Author SHA1 Message Date
Jérémy Lefaure fbc61291d7 dm space map metadata: use ARRAY_SIZE
Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:52 -05:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Joe Thornber 0377a07c7a dm space map disk: fix some book keeping in the disk space map
When decrementing the reference count for a block, the free count wasn't
being updated if the reference count went to zero.

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-15 15:09:50 -04:00
Bart Van Assche 73cbca6a63 dm block manager: remove an unused argument from dm_block_manager_create()
The 'cache_size' argument of dm_block_manager_create() has never been
used.  Remove it along with the definitions of the constants passed as
the 'cache_size' argument.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27 17:08:41 -04:00
Vinothkumar Raja 7d1fedb6e9 dm btree: fix for dm_btree_find_lowest_key()
dm_btree_find_lowest_key() is giving incorrect results.  find_key()
traverses the btree correctly for finding the highest key, but there is
an error in the way it traverses the btree for retrieving the lowest
key.  dm_btree_find_lowest_key() fetches the first key of the rightmost
block of the btree instead of fetching the first key from the leftmost
block.

Fix this by conditionally passing the correct parameter to value64()
based on the @find_highest flag.

Cc: stable@vger.kernel.org
Signed-off-by: Erez Zadok <ezk@fsl.cs.sunysb.edu>
Signed-off-by: Vinothkumar Raja <vinraja@cs.stonybrook.edu>
Signed-off-by: Nidhi Panpalia <npanpalia@cs.stonybrook.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24 14:47:49 -04:00
Ingo Molnar 0881e7bd34 sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h>
But first update usage sites with the new header dependency.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Linus Torvalds 7a771ceac7 - Fix dm-raid transient device failure processing and other smaller
tweaks.
 
 - Add journal support to the DM raid target to close the 'write hole' on
   raid 4/5/6.
 
 - Fix dm-cache corruption, due to rounding bug, when cache exceeds 2TB.
 
 - Add 'metadata2' feature to dm-cache to separate the dirty bitset out
   from other cache metadata.  This improves speed of shutting down
   a large cache device (which implies writing out dirty bits).
 
 - Fix a memory leak during dm-stats data structure destruction.
 
 - Fix a DM multipath round-robin path selector performance regression
   that was caused by less precise balancing across all paths.
 
 - Lastly, introduce a DM core fix for a long-standing DM snapshot
   deadlock that is rooted in the complexity of the device stack used in
   conjunction with block core maintaining bios on current->bio_list to
   manage recursion in generic_make_request().  A more comprehensive fix
   to block core (and its hook in the cpu scheduler) would be wonderful
   but this DM-specific fix is pragmatic considering how difficult it has
   been to make progress on a generic fix.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYrJJeAAoJEMUj8QotnQNaDskIAIJeMX3Dc8Skt00tZ6vEj3p6
 9juDpOrBKH3RYdqPmrYy9lVhhpFs6OoDfTQZaW/SmjDjHboJ3skKMjO+/NWav4nN
 39LoDfxLbDi06fC7Y4H7FHUPjb5sKSzw4W5IttFEKmHOwkz+iwVFL1R0dihBqv7G
 Lq0Ta6xffW8jHrzpmmSDY1I6FSmZ9LlHPCL00qQ5Z7WkMS5oDk0GzZoLFasdNfvm
 fP9N13+uel2/R7hclpxE6J+IZPN5ARG3HAQ5POS+2gMlIzaH4AlMh7yf5q0sSGwq
 uQsmdps8c+LOtAakOzVScykEZvwBh+ci8VqE1X1zol+fl8ijeWqgWtz4XXYECC0=
 =saD8
 -----END PGP SIGNATURE-----

Merge tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Fix dm-raid transient device failure processing and other smaller
   tweaks.

 - Add journal support to the DM raid target to close the 'write hole'
   on raid 4/5/6.

 - Fix dm-cache corruption, due to rounding bug, when cache exceeds 2TB.

 - Add 'metadata2' feature to dm-cache to separate the dirty bitset out
   from other cache metadata. This improves speed of shutting down a
   large cache device (which implies writing out dirty bits).

 - Fix a memory leak during dm-stats data structure destruction.

 - Fix a DM multipath round-robin path selector performance regression
   that was caused by less precise balancing across all paths.

 - Lastly, introduce a DM core fix for a long-standing DM snapshot
   deadlock that is rooted in the complexity of the device stack used in
   conjunction with block core maintaining bios on current->bio_list to
   manage recursion in generic_make_request(). A more comprehensive fix
   to block core (and its hook in the cpu scheduler) would be wonderful
   but this DM-specific fix is pragmatic considering how difficult it
   has been to make progress on a generic fix.

* tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
  dm: flush queued bios when process blocks to avoid deadlock
  dm round robin: revert "use percpu 'repeat_count' and 'current_path'"
  dm stats: fix a leaked s->histogram_boundaries array
  dm space map metadata: constify dm_space_map structures
  dm cache metadata: use cursor api in blocks_are_clean_separate_dirty()
  dm persistent data: add cursor skip functions to the cursor APIs
  dm cache metadata: use dm_bitset_new() to create the dirty bitset in format 2
  dm bitset: add dm_bitset_new()
  dm cache metadata: name the cache block that couldn't be loaded
  dm cache metadata: add "metadata2" feature
  dm cache metadata: use bitset cursor api to load discard bitset
  dm bitset: introduce cursor api
  dm btree: use GFP_NOFS in dm_btree_del()
  dm space map common: memcpy the disk root to ensure it's arch aligned
  dm block manager: add unlikely() annotations on dm_bufio error paths
  dm cache: fix corruption seen when using cache > 2TB
  dm raid: cleanup awkward branching in raid_message() option processing
  dm raid: use mddev rather than rdev->mddev
  dm raid: use read_disk_sb() throughout
  dm raid: add raid4/5/6 journaling support
  ...
2017-02-21 12:11:41 -08:00
Bhumika Goyal b79af13efd dm space map metadata: constify dm_space_map structures
Declare dm_space_map structures as const as they are only passed as an
argument to the function memcpy. This argument is of type const void *,
so dm_space_map structures having this property can be declared as
const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   4889	    240	      0	   5129	   1409 dm-space-map-metadata.o

File size after:
   text	   data	    bss	    dec	    hex	filename
   5139	      0	      0	   5139	   1413 dm-space-map-metadata.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16 14:14:36 -05:00
Joe Thornber 9b696229aa dm persistent data: add cursor skip functions to the cursor APIs
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16 13:12:50 -05:00
Joe Thornber 2151249eaa dm bitset: add dm_bitset_new()
A more efficient way of creating a populated bitset.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16 13:12:48 -05:00
Joe Thornber 6fe28dbf05 dm bitset: introduce cursor api
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16 13:12:45 -05:00
Joe Thornber 9f9ef0657d dm btree: use GFP_NOFS in dm_btree_del()
dm_btree_del() is called from an ioctl so don't recurse into FS.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16 13:09:10 -05:00
Joe Thornber 3ba3ba1e84 dm space map common: memcpy the disk root to ensure it's arch aligned
The metadata_space_map_root passed to sm_ll_open_metadata() may or may
not be arch aligned, use memcpy to ensure it is.  This is not a fast
path so the extra memcpy doesn't hurt us.

Long-term it'd be better to use the kernel's alignment infrastructure to
remove the memcpy()s that are littered across persistent-data (btree,
array, space-maps, etc).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16 13:02:54 -05:00
Joe Thornber 602548bdd5 dm block manager: add unlikely() annotations on dm_bufio error paths
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-16 13:01:07 -05:00
Davidlohr Bueso 642fa448ae sched/core: Remove set_task_state()
This is a nasty interface and setting the state of a foreign task must
not be done. As of the following commit:

  be628be095 ("bcache: Make gc wakeup sane, remove set_task_state()")

... everyone in the kernel calls set_task_state() with current, allowing
the helper to be removed.

However, as the comment indicates, it is still around for those archs
where computing current is more expensive than using a pointer, at least
in theory. An important arch that is affected is arm64, however this has
been addressed now [1] and performance is up to par making no difference
with either calls.

Of all the callers, if any, it's the locking bits that would care most
about this -- ie: we end up passing a tsk pointer to a lot of the lock
slowpath, and setting ->state on that. The following numbers are based
on two tests: a custom ad-hoc microbenchmark that just measures
latencies (for ~65 million calls) between get_task_state() vs
get_current_state().

Secondly for a higher overview, an unlink microbenchmark was used,
which pounds on a single file with open, close,unlink combos with
increasing thread counts (up to 4x ncpus). While the workload is quite
unrealistic, it does contend a lot on the inode mutex or now rwsem.

[1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com

== 1. x86-64 ==

Avg runtime set_task_state():    601 msecs
Avg runtime set_current_state(): 552 msecs

                                            vanilla                 dirty
Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)

== 2. ppc64le ==

Avg runtime set_task_state():  938 msecs
Avg runtime set_current_state: 940 msecs

                                            vanilla                 dirty
Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)

x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
and ppc64 (with paca) show similar improvements in the unlink microbenches.
The small delta for ppc64 (2ms), does not represent the gains on the unlink
runs. In the case of x86, there was a decent amount of variation in the
latency runs, but always within a 20 to 50ms increase), ppc was more constant.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: mark.rutland@arm.com
Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:14:16 +01:00
Benjamin Marzinski b446396b74 dm space map: always set ev if sm_ll_mutate() succeeds
If no block was allocated or freed, sm_ll_mutate() wasn't setting
*ev, leaving the variable unitialized. sm_ll_insert(),
sm_disk_inc_block(), and sm_disk_new_block() all check ev to see
if there was an allocation event in sm_ll_mutate(), possibly
reading unitialized data.

If no allocation event occured, sm_ll_mutate() should set *ev
to SM_NONE.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-12-08 14:13:15 -05:00
Benjamin Marzinski 0c79ce0b75 dm space map metadata: skip useless memcpy in metadata_ll_init_index()
When metadata_ll_init_index() is called by sm_ll_new_metadata(),
ll->mi_le hasn't been initialized yet. So, when
metadata_ll_init_index() copies the contents of ll->mi_le into the
newly allocated bitmap_root, it is just copying garbage. ll->mi_le
will be allocated later in sm_ll_extend() and copied into the
bitmap_root, in sm_ll_commit().

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-12-08 14:13:15 -05:00
Benjamin Marzinski 314c25c56c dm space map metadata: fix 'struct sm_metadata' leak on failed create
In dm_sm_metadata_create() we temporarily change the dm_space_map
operations from 'ops' (whose .destroy function deallocates the
sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).

If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
dm_sm_destroy() with the intention of freeing the sm_metadata, but it
doesn't (because the dm_space_map operations is still set to
'bootstrap_ops').

Fix this by setting the dm_space_map operations back to 'ops' if
dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2016-12-08 14:13:14 -05:00
Bart Van Assche 0637018dff dm array: remove a dead assignment in populate_ablock_with_values()
A value is assigned to 'nr_entries' but is never used, remove it.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-12-08 14:13:09 -05:00
Joe Thornber 2e8ed71102 dm block manager: make block locking optional
The block manager's locking is useful for catching cycles that may
result from certain btree metadata corruption.  But in general it serves
as a developer tool to catch bugs in code.  Unless you're finding that
DM thin provisioning is hanging due to infinite loops within the block
manager's access to btree nodes you can safely disable this feature.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # do/while(0) macro fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-11-14 15:17:47 -05:00
Joe Thornber fdd1315aa5 dm array: introduce cursor api
More efficient way to iterate an array due to prefetching (makes use of
the new dm_btree_cursor_* api).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22 11:15:04 -04:00
Joe Thornber 7d111c81fa dm btree: introduce cursor api
This uses prefetching to speed up iteration through a btree.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22 11:15:04 -04:00
Joe Thornber dd6a77d998 dm array: add dm_array_new()
dm_array_new() creates a new, populated array more efficiently than
starting with an empty one and resizing.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22 11:12:23 -04:00
Joe Thornber e7e0f73047 dm btree: fix a bug in dm_btree_find_next_single()
dm_btree_find_next_single() can short-circuit the search for a block
with a return of -ENODATA if all entries are higher than the search key
passed to lower_bound().

This hasn't been a problem because of the way the btree has been used by
DM thinp.  But it must be fixed now in preparation for fixing the race
in DM thinp's handling of simultaneous block discard vs allocation.
Otherwise, once that fix is in place, some of the blocks in a discard
would not be unmapped as expected.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 12:43:34 -04:00
Mike Snitzer 512167788a dm space map metadata: remove unused variable in brb_pop()
Remove the unused struct block_op pointer that was inadvertantly
introduced, via cut-and-paste of previous brb_op() code, as part of
commit 50dd842ad.

(Cc'ing stable@ because commit 50dd842ad did)

Fixes: 50dd842ad ("dm space map metadata: fix ref counting bug when bootstrapping a new space map")
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-14 09:26:01 -05:00
Mike Snitzer ba503835ad dm btree: factor out need_insert() helper
Eliminates code duplication within insert().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:59 -05:00
Mikulas Patocka 86bad0c707 dm bufio: store stacktrace in buffers to help find buffer leaks
The option DM_DEBUG_BLOCK_STACK_TRACING is moved from persistent-data
directory to device mapper directory because it will now be used by
persistent-data and bufio.  When the option is enabled, each bufio buffer
stores the stacktrace of the last dm_bufio_get(), dm_bufio_read() or
dm_bufio_new() call that increased the hold count to 1.  The buffer's
stacktrace is printed if the buffer was not released before the bufio
client is destroyed.

When DM_DEBUG_BLOCK_STACK_TRACING is enabled, any bufio buffer leaks are
considered warnings - i.e. the kernel continues afterwards.  If not
enabled, buffer leaks are considered BUGs and the kernel with crash.
Reasoning on this disposition is: if we only ever warned on buffer leaks
users would generally ignore them and the problematic code would never
get fixed.

Successfully used to find source of bufio leaks fixed with commit
fce079f63c3 ("dm btree: fix bufio buffer leaks in dm_btree_del() error
path").

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:58 -05:00
Mikulas Patocka 313c9b9736 dm block manager: cleanup code that prints stacktrace
There is no need to record stack trace and immediately print it.  Just
use dump_stack() to print the current stack.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:56 -05:00
Joe Thornber ed8b45a367 dm btree: fix bufio buffer leaks in dm_btree_del() error path
If dm_btree_del()'s call to push_frame() fails, e.g. due to
btree_node_validator finding invalid metadata, the dm_btree_del() error
path must unlock all frames (which have active dm-bufio buffers) that
were pushed onto the del_stack.

Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
buffers have leaked, e.g.:
  device-mapper: bufio: leaked buffer 3, hold count 1, list 0

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-10 10:30:18 -05:00
Joe Thornber 50dd842ad8 dm space map metadata: fix ref counting bug when bootstrapping a new space map
When applying block operations (BOPs) do not remove them from the
uncommitted BOP ring-buffer until after they've been applied -- in case
we recurse.

Also, perform BOP_INC operation, in dm_sm_metadata_create() and
sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather
than using direct calls to sm_ll_inc().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-09 13:27:25 -05:00
Joe Thornber 993ceab919 dm thin metadata: fix bug in dm_thin_remove_range()
dm_btree_remove_leaves() only unmaps a contiguous region so we need a
loop, in __remove_range(), to handle ranges that contain multiple
regions.

A new btree function, dm_btree_lookup_next(), is introduced which is
more efficiently able to skip over regions of the thin device which
aren't mapped.  __remove_range() uses dm_btree_lookup_next() for each
iteration of __remove_range()'s loop.

Also, improve description of dm_btree_remove_leaves().

Fixes: 6550f075 ("dm thin metadata: add dm_thin_remove_range()")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.1+
2015-12-02 13:26:49 -05:00
Mike Snitzer 30ce6e1cc5 dm btree: fix leak of bufio-backed block in btree_split_sibling error path
The block allocated at the start of btree_split_sibling() is never
released if later insert_at() fails.

Fix this by releasing the previously allocated bufio block using
unlock_block().

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-02 13:20:34 -05:00
Linus Torvalds e0700ce709 - Revert a dm-multipath change that caused a regression for unprivledged
users (e.g. kvm guests) that issued ioctls when a multipath device had
   no available paths.
 
 - Include Christoph's refactoring of DM's ioctl handling and add support
   for passing through persistent reservations with DM multipath.
 
 - All other changes are very simple cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOp04AAoJEMUj8QotnQNaFLsH/AhMEH/jI1ObOfy4J1Wy4rOx
 ujJT91uS/s0H3pc9cGKQYnuGpFkX6WWU4wMiabIyiTn4sAsoXaflfIGutivLiDJr
 HfecrMrGZgnP4ZlpPPB02BmlxFbcPW8yzAU4ma38xBgQ+Pu30RO/HkvX/2vKOppG
 qwPop/XsNxq3KXgFGM44ToytM6c/MPGluhuvOwbaacAO1HviMuen9qsVjk4kwcf3
 jGYTbEPHATxyu5/6oKDTkQTYhzdwg3B2qHCiKMGw3l1kXhaQLFcaOivOLV8Sf3xh
 bj1070pkGe9OpqaVzMnwDtJ8rnsBl/Nt4wj9oiQPxbX71GYZAmcMIYn9WEkcKFI=
 =AR2D
 -----END PGP SIGNATURE-----

Merge tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:
 "Smaller set of DM changes for this merge.  I've based these changes on
  Jens' for-4.4/reservations branch because the associated DM changes
  required it.

   - Revert a dm-multipath change that caused a regression for
     unprivledged users (e.g. kvm guests) that issued ioctls when a
     multipath device had no available paths.

   - Include Christoph's refactoring of DM's ioctl handling and add
     support for passing through persistent reservations with DM
     multipath.

   - All other changes are very simple cleanups"

* tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm switch: simplify conditional in alloc_region_table()
  dm delay: document that offsets are specified in sectors
  dm delay: capitalize the start of an delay_ctr() error message
  dm delay: Use DM_MAPIO macros instead of open-coded equivalents
  dm linear: remove redundant target name from error messages
  dm persistent data: eliminate unnecessary return values
  dm: eliminate unused "bioset" process for each bio-based DM device
  dm: convert ffs to __ffs
  dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()
  dm: add support for passing through persistent reservations
  dm: refactor ioctl handling
  Revert "dm mpath: fix stalls when handling invalid ioctls"
  dm: initialize non-blk-mq queue data before queue is used
2015-11-04 21:19:53 -08:00
Mikulas Patocka 4c7da06f5a dm persistent data: eliminate unnecessary return values
dm_bm_unlock and dm_tm_unlock return an integer value but the returned
value is always 0.  The calling code sometimes checks the return value
and sometimes doesn't.

Eliminate these unnecessary return values and also the checks for them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-10-31 19:06:02 -04:00
Mike Snitzer 4dcb8b57df dm btree: fix leak of bufio-backed block in btree_split_beneath error path
btree_split_beneath()'s error path had an outstanding FIXME that speaks
directly to the potential for _not_ cleaning up a previously allocated
bufio-backed block.

Fix this by releasing the previously allocated bufio block using
unlock_block().

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <thornber@redhat.com>
Cc: stable@vger.kernel.org
2015-10-23 14:02:55 -04:00
Joe Thornber 2871c69e02 dm btree remove: fix a bug when rebalancing nodes after removal
Commit 4c7e309340 ("dm btree remove: fix bug in redistribute3") wasn't
a complete fix for redistribute3().

The redistribute3 function takes 3 btree nodes and shares out the entries
evenly between them.  If the three nodes in total contained
(MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
the center.

Fix this issue by being more careful about calculating the target number
of entries for the left and right nodes.

Unit tested in userspace using this program:
https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-10-23 14:02:55 -04:00
viresh kumar fc0a446152 dm: remove unlikely() before IS_ERR()
IS_ERR() already contains an 'unlikely' compiler flag so there is no
need to do that again from IS_ERR() callers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12 11:32:21 -04:00
Vivek Goyal 8c747fd0c3 dm btree remove: remove unused function get_nr_entries()
rebalance_children() calls get_nr_entries() and assigns the result to an
unused local 'child_entries' variable.  Remove get_nr_entries() and
cleanup rebalance_children() accordingly.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12 11:32:19 -04:00
Vivek Goyal 0a8d4c3ef8 dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12 11:32:19 -04:00
Joe Thornber b0dc3c8bc1 dm btree: add ref counting ops for the leaves of top level btrees
When using nested btrees, the top leaves of the top levels contain
block addresses for the root of the next tree down.  If we shadow a
shared leaf node the leaf values (sub tree roots) should be incremented
accordingly.

This is only an issue if there is metadata sharing in the top levels.
Which only occurs if metadata snapshots are being used (as is possible
with dm-thinp).  And could result in a block from the thinp metadata
snap being reused early, thus corrupting the thinp metadata snap.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-08-12 10:50:37 -04:00
Joe Thornber aa0cd28d05 dm btree remove: fix bug in remove_one()
remove_one() was not incrementing the key for the beginning of the
range, so not all entries were being removed.  This resulted in
discards that were not unmapping all blocks.

Fixes: 4ec331c3ea ("dm btree: add dm_btree_remove_leaves()")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-07 11:56:43 -04:00
Joe Thornber 1c7518794a dm btree: silence lockdep lock inversion in dm_btree_del()
Allocate memory using GFP_NOIO when deleting a btree.  dm_btree_del()
can be called via an ioctl and we don't want to recurse into the FS or
block layer.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-07-06 10:45:02 -04:00
Dennis Yang 4c7e309340 dm btree remove: fix bug in redistribute3
redistribute3() shares entries out across 3 nodes.  Some entries were
being moved the wrong way, breaking the ordering.  This manifested as a
BUG() in dm-btree-remove.c:shift() when entries were removed from the
btree.

For additional context see:
https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html

Signed-off-by: Dennis Yang <shinrairis@gmail.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-07-05 17:43:58 -04:00
Joe Thornber 6096d91af0 dm space map metadata: fix occasional leak of a metadata block on resize
The metadata space map has a simplified 'bootstrap' mode that is
operational when extending the space maps.  Whilst in this mode it's
possible for some refcount decrement operations to become queued (eg, as
a result of shadowing one of the bitmap indexes).  These decrements were
not being applied when switching out of bootstrap mode.

The effect of this bug was the leaking of a 4k metadata block.  This is
detected by the latest version of thin_check as a non fatal error.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-06-17 10:09:23 -04:00
Joe Thornber 4ec331c3ea dm btree: add dm_btree_remove_leaves()
Removes a range of leaf values from the tree.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-11 17:13:03 -04:00
Mike Snitzer 49f154c732 dm thin metadata: remove in-core 'read_only' flag
Leverage the block manager's read_only flag instead of duplicating it;
access with new dm_bm_is_read_only() method.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:18:59 -04:00
Linus Torvalds a911dcdba1 - Significant dm-crypt CPU scalability performance improvements thanks
to changes that enable effective use of an unbound workqueue across
   all available CPUs.  A large battery of tests were performed to
   validate these changes, summary of results is available here:
   https://www.redhat.com/archives/dm-devel/2015-February/msg00106.html
 
 - A few additional stable fixes (to DM core, dm-snapshot and dm-mirror)
   and a small fix to the dm-space-map-disk.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU51MsAAoJEMUj8QotnQNan8wH/3Oa2/ofbX8zNYtdDxFBK7W0
 pHwVA2CH9yzDDi90X9GsQODmXerMrYf+SHo/RsQTpSnt5WI/4ZVP/1EoNGu6T2Yr
 XiaKc/Jwo+ScxdhoHSNtyGL5vKeipX6clgz1wcd/4/UBcBAr0Vxj25s8Ta7naK2V
 hZyWnt3MCGEDQ45NVBmLOU78Cl68LGf8JOUY0l3cd8ehQjGyR9a12GXwRktepslp
 hw9xu8uYmE93fD+MIe/fUs7W/WvIVgFSskhswlvD3kAFpEAsMczjLGVSRQlBE90L
 OK7v1HKxiIUJa17g3LmiCFa59ZjrnSHGpOOv9IPAFU/LzFIU47lcFwRFjgfB8po=
 =8vFi
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.20-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull more device mapper changes from Mike Snitzer:

- Significant dm-crypt CPU scalability performance improvements thanks
  to changes that enable effective use of an unbound workqueue across
  all available CPUs.  A large battery of tests were performed to
  validate these changes, summary of results is available here:
  https://www.redhat.com/archives/dm-devel/2015-February/msg00106.html

- A few additional stable fixes (to DM core, dm-snapshot and dm-mirror)
  and a small fix to the dm-space-map-disk.

* tag 'dm-3.20-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm snapshot: fix a possible invalid memory access on unload
  dm: fix a race condition in dm_get_md
  dm crypt: sort writes
  dm crypt: add 'submit_from_crypt_cpus' option
  dm crypt: offload writes to thread
  dm crypt: remove unused io_pool and _crypt_io_pool
  dm crypt: avoid deadlock in mempools
  dm crypt: don't allocate pages for a partial request
  dm crypt: use unbound workqueue for request processing
  dm io: reject unsupported DISCARD requests with EOPNOTSUPP
  dm mirror: do not degrade the mirror on discard error
  dm space map disk: fix sm_disk_count_is_more_than_one()
2015-02-21 13:28:45 -08:00
Mike Snitzer 145b9006a0 dm space map disk: fix sm_disk_count_is_more_than_one()
dm_tm_shadow_block() is the only caller of
dm_sm_count_is_more_than_one() which only ever operates on a metadata
space-map.  So in practice, sm_disk_count_is_more_than_one() isn't
actually used (which explains why this bug never amounted to anything).

But fix sm_disk_count_is_more_than_one() to properly set *result and
return 0.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-13 19:32:58 -05:00
Christoph Jaeger 6341e62b21 kconfig: use bool instead of boolean for type definition attributes
Support for keyword 'boolean' will be dropped later on.

No functional change.

Reference: http://lkml.kernel.org/r/cover.1418003065.git.cj@linux.com
Signed-off-by: Christoph Jaeger <cj@linux.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2015-01-07 13:08:04 +01:00
Joe Thornber 02717d9855 dm space map metadata: fix sm_bootstrap_get_count()
Must set 'result' accordingly rather than return it.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-12-02 10:25:06 -05:00
Dan Carpenter c1c6156fe4 dm space map metadata: fix sm_bootstrap_get_nr_blocks()
This function isn't right and it causes a static checker warning:

	drivers/md/dm-thin.c:3016 maybe_resize_data_dev()
	error: potentially using uninitialized 'sb_data_size'.

It should set "*count" and return zero on success the same as the
sm_metadata_get_nr_blocks() function does earlier.

Fixes: 3241b1d3e0 ('dm: add persistent data library')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-12-01 11:31:58 -05:00
Joe Thornber 8001e87d0e dm array: if resizing the array is a noop set the new root to the old one
This could've been quite bad (to return success but not update the new
root to point at the old) but in practice the only known consumer of the
dm array code is the DM cache target.  And the DM cache target passes in
the same old root to array_resize() anyway.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-12-01 11:30:07 -05:00
Joe Thornber 4646015d7e dm transaction manager: add support for prefetching blocks of metadata
Introduce the dm_tm_issue_prefetches interface.  If you're using a
non-blocking clone the tm will build up a list of requested blocks that
weren't in core.  dm_tm_issue_prefetches will request those blocks to be
prefetched.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10 15:25:26 -05:00
Joe Thornber 9b460d3699 dm btree: fix a recursion depth bug in btree walking code
The walk code was using a 'ro_spine' to hold it's locked btree nodes.
But this data structure is designed for the rolling lock scheme, and
as such automatically unlocks blocks that are two steps up the call
chain.  This is not suitable for the simple recursive walk algorithm,
which retraces its steps.

This code is only used by the persistent array code, which in turn is
only used by dm-cache.  In order to trigger it you need to have a
mapping tree that is more than 2 levels deep; which equates to 8-16
million cache blocks.  For instance a 4T ssd with a very small block
size of 32k only just triggers this bug.

The fix just places the locked blocks on the stack, and stops using
the ro_spine altogether.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-11-10 15:23:58 -05:00
Joe Thornber a9d45396f5 dm transaction manager: fix corruption due to non-atomic transaction commit
The persistent-data library used by dm-thin, dm-cache, etc is
transactional.  If anything goes wrong, such as an io error when writing
new metadata or a power failure, then we roll back to the last
transaction.

Atomicity when committing a transaction is achieved by:

a) Never overwriting data from the previous transaction.
b) Writing the superblock last, after all other metadata has hit the
   disk.

This commit and the following commit ("dm: take care to copy the space
map roots before locking the superblock") fix a bug associated with (b).
When committing it was possible for the superblock to still be written
in spite of an io error occurring during the preceeding metadata flush.
With these commits we're careful not to take the write lock out on the
superblock until after the metadata flush has completed.

Change the transaction manager's semantics for dm_tm_commit() to assume
all data has been flushed _before_ the single superblock that is passed
in.

As a prerequisite, split the block manager's block unlocking and
flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush().  Now
the unlocking must be done separately.

This issue was discovered by forcing io errors at the crucial time
using dm-flakey.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-03-27 16:56:23 -04:00
Joe Thornber 428e469864 dm bitset: only flush the current word if it has been dirtied
This change offers a big performance boost for dm-era.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-03-27 16:56:23 -04:00
Joe Thornber cebc2de44d dm space map metadata: fix refcount decrement below 0 which caused corruption
This has been a relatively long-standing issue that wasn't nailed down
until Teng-Feng Yang's meticulous bug report to dm-devel on 3/7/2014,
see: http://www.redhat.com/archives/dm-devel/2014-March/msg00021.html

From that report:
  "When decreasing the reference count of a metadata block with its
  reference count equals 3, we will call dm_btree_remove() to remove
  this enrty from the B+tree which keeps the reference count info in
  metadata device.

  The B+tree will try to rebalance the entry of the child nodes in each
  node it traversed, and the rebalance process contains the following
  steps.

  (1) Finding the corresponding children in current node (shadow_current(s))
  (2) Shadow the children block (issue BOP_INC)
  (3) redistribute keys among children, and free children if necessary (issue BOP_DEC)

  Since the update of a metadata block's reference count could be
  recursive, we will stash these reference count update operations in
  smm->uncommitted and then process them in a FILO fashion.

  The problem is that step(3) could free the children which is created
  in step(2), so the BOP_DEC issued in step(3) will be carried out
  before the BOP_INC issued in step(2) since these BOPs will be
  processed in FILO fashion. Once the BOP_DEC from step(3) tries to
  decrease the reference count of newly shadow block, it will report
  failure for its reference equals 0 before decreasing. It looks like we
  can solve this issue by processing these BOPs in a FIFO fashion
  instead of FILO."

Commit 5b564d80 ("dm space map: disallow decrementing a reference count
below zero") changed the code to report an error for this temporary
refcount decrement below zero.  So what was previously a harmless
invalid refcount became a hard failure due to the new error path:

 device-mapper: space map common: unable to decrement a reference count below 0
 device-mapper: thin: 253:6: dm_thin_insert_block() failed: error = -22
 device-mapper: thin: 253:6: switching pool to read-only mode

This bug is in dm persistent-data code that is common to the DM thin and
cache targets.  So any users of those targets should apply this fix.

Fix this by applying recursive space map operations in FIFO order rather
than FILO.

Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=68801

Reported-by: Apollon Oikonomopoulos <apoikos@debian.org>
Reported-by: edwillam1007@gmail.com
Reported-by: Teng-Feng Yang <shinrairis@gmail.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.13+
2014-03-07 12:02:47 -05:00
Mike Snitzer c64d240df3 dm: fix Kconfig indentation
Since DM_DEBUG_BLOCK_STACK_TRACING is a DM_PERSISTENT_DATA config option
move it from drivers/md/Kconfig to drivers/md/persistent-data/Kconfig.

Doing so fixes indentation for other DM config options.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-03-03 17:31:07 -05:00
Mike Snitzer 7d48935eff dm thin: allow metadata space larger than supported to go unused
It was always intended that a user could provide a thin metadata device
that is larger than the max supported by the on-disk format.  The extra
space would just go unused.

Unfortunately that never worked.  If the user attempted to use a larger
metadata device on creation they would get an error like the following:

 device-mapper: space map common: space map too large
 device-mapper: transaction manager: couldn't create metadata space map
 device-mapper: thin metadata: tm_create_with_sm failed
 device-mapper: table: 252:17: thin-pool: Error creating metadata object
 device-mapper: ioctl: error adding target to table

Fix this by allowing the initial metadata space map creation to cap its
size at the max number of blocks supported (DM_SM_METADATA_MAX_BLOCKS).
get_metadata_dev_size() must also impose DM_SM_METADATA_MAX_BLOCKS (via
THIN_METADATA_MAX_SECTORS), otherwise extending metadata would cap at
THIN_METADATA_MAX_SECTORS_WARNING (which is larger than supported).

Also, the calculation for THIN_METADATA_MAX_SECTORS didn't account for
the sizeof the disk_bitmap_header.  So the supported maximum metadata
size is a bit smaller (reduced from 33423360 to 33292800 sectors).

Lastly, remove the "excess space will not be used" warning message from
get_metadata_dev_size(); it resulted in printing the warning multiple
times.  Factor out warn_if_metadata_device_too_big(), call it from
pool_ctr() and maybe_resize_metadata_dev().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2014-02-27 11:49:08 -05:00
Joe Thornber fca028438f dm space map metadata: fix bug in resizing of thin metadata
This bug was introduced in commit 7e664b3dec ("dm space map metadata:
fix extending the space map").

When extending a dm-thin metadata volume we:

- Switch the space map into a simple bootstrap mode, which allocates
  all space linearly from the newly added space.
- Add new bitmap entries for the new space
- Increment the reference counts for those newly allocated bitmap
  entries
- Commit changes to disk
- Switch back out of bootstrap mode.

But, the disk commit may allocate space itself, if so this fact will be
lost when switching out of bootstrap mode.

The bug exhibited itself as an error when the bitmap_root, with an
erroneous ref count of 0, was subsequently decremented as part of a
later disk commit.  This would cause the disk commit to fail, and thinp
to enter read_only mode.  The metadata was not damaged (thin_check
passed).

The fix is to put the increments + commit into a loop, running until
the commit has not allocated extra space.  In practise this loop only
runs twice.

With this fix the following device mapper testsuite test passes:
 dmtest run --suite thin-provisioning -n thin_remove_works_after_resize

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # depends on commit 7e664b3dec
2014-01-21 12:15:01 -05:00
Joe Thornber f164e6900f dm btree: add dm_btree_find_lowest_key
dm_btree_find_lowest_key is the reciprocal of dm_btree_find_highest_key.
Factor out common code for dm_btree_find_{highest,lowest}_key.

dm_btree_find_lowest_key is needed for an upcoming DM target, as such it
is best to get this interface in place.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-01-09 16:29:17 -05:00
Joe Thornber 7e664b3dec dm space map metadata: fix extending the space map
When extending a metadata space map we should do the first commit whilst
still in bootstrap mode -- a mode where all blocks get allocated in the
new area.

That way the commit overhead is allocated from the newly added space.
Otherwise we risk running out of space.

With this fix, and the previous commit "dm space map common: make sure
new space is used during extend", the following device mapper testsuite
test passes:
 dmtest run --suite thin-provisioning -n /resize_metadata_no_io/

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-01-07 21:05:18 -05:00
Joe Thornber 12c91a5c2d dm space map common: make sure new space is used during extend
When extending a low level space map we should update nr_blocks at
the start so the new space is used for the index entries.

Otherwise extend can fail, e.g.: sm_metadata_extend call sequence
that fails:
 -> sm_ll_extend
    -> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block
    => returns -ENOSPC because smm->begin == smm->ll.nr_blocks

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-01-07 21:05:17 -05:00
Mike Snitzer 10343180f5 dm persistent data: cleanup dm-thin specific references in text
DM's persistent-data library is now used my multiple targets so
exclusive references to "pool" or "thin provisioning" need to be
cleaned up.  Adjust Kconfig's DM_DEBUG_BLOCK_STACK_TRACING text
and remove "pool" from a block manager error message.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2014-01-07 10:11:54 -05:00
Mike Snitzer c46985e211 dm space map metadata: limit errors in sm_metadata_new_block
The "unable to allocate new metadata block" error can be a particularly
verbose error if there is a systemic issue with the metadata device.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2014-01-07 10:11:46 -05:00
Joe Thornber ed9571f0cf dm array: fix a reference counting bug in shadow_ablock
An old array block could have its reference count decremented below
zero when it is being replaced in the btree by a new array block.

The fix is to increment the old ablock's reference count just before
inserting a new ablock into the btree.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+
2013-12-13 14:22:10 -05:00
Joe Thornber 5b564d80f8 dm space map: disallow decrementing a reference count below zero
The old behaviour, returning -EINVAL if a ref_count of 0 would be
decremented, was removed in commit f722063 ("dm space map: optimise
sm_ll_dec and sm_ll_inc").  To fix this regression we return an error
code from the mutator function pointer passed to sm_ll_mutate() and have
dec_ref_count() return -EINVAL if the old ref_count is 0.

Add a DMERR to reflect the potential seriousness of this error.

Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path.

With this fix the following dmts regression test now passes:
 dmtest run --suite cache -n /metadata_use_kernel/

The next patch fixes the higher-level dm-array code that exposed this
regression.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.12+
2013-12-13 14:22:09 -05:00
Joe Thornber 9b7aaa64f9 dm thin: allow pool in read-only mode to transition to read-write mode
A thin-pool may be in read-only mode because the pool's data or metadata
space was exhausted.  To allow for recovery, by adding more space to the
pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE
mode.  Otherwise, running out of space will render the pool permanently
read-only.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2013-12-10 16:35:13 -05:00
Mike Snitzer f62b6b8f49 dm space map metadata: return on failure in sm_metadata_new_block
Commit 2fc48021f4 ("dm persistent
metadata: add space map threshold callback") introduced a regression
to the metadata block allocation path that resulted in errors being
ignored.  This regression was uncovered by running the following
device-mapper-test-suite test:
dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/

The ignored error codes in sm_metadata_new_block() could crash the
kernel through use of either the dm-thin or dm-cache targets, e.g.:

device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
device-mapper: space map metadata: unable to allocate new metadata block
general protection fault: 0000 [#1] SMP
...
Workqueue: dm-thin do_worker [dm_thin_pool]
task: ffff880035ce2ab0 ti: ffff88021a054000 task.ti: ffff88021a054000
RIP: 0010:[<ffffffffa0331385>]  [<ffffffffa0331385>] metadata_ll_load_ie+0x15/0x30 [dm_persistent_data]
RSP: 0018:ffff88021a055a68  EFLAGS: 00010202
RAX: 003fc8243d212ba0 RBX: ffff88021a780070 RCX: ffff88021a055a78
RDX: ffff88021a055a78 RSI: 0040402222a92a80 RDI: ffff88021a780070
RBP: ffff88021a055a68 R08: ffff88021a055ba4 R09: 0000000000000010
R10: 0000000000000000 R11: 00000002a02e1000 R12: ffff88021a055ad4
R13: 0000000000000598 R14: ffffffffa0338470 R15: ffff88021a055ba4
FS:  0000000000000000(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f467c0291b8 CR3: 0000000001a0b000 CR4: 00000000000007e0
Stack:
 ffff88021a055ab8 ffffffffa0332020 ffff88021a055b30 0000000000000001
 ffff88021a055b30 0000000000000000 ffff88021a055b18 0000000000000000
 ffff88021a055ba4 ffff88021a055b98 ffff88021a055ae8 ffffffffa033304c
Call Trace:
 [<ffffffffa0332020>] sm_ll_lookup_bitmap+0x40/0xa0 [dm_persistent_data]
 [<ffffffffa033304c>] sm_metadata_count_is_more_than_one+0x8c/0xc0 [dm_persistent_data]
 [<ffffffffa0333825>] dm_tm_shadow_block+0x65/0x110 [dm_persistent_data]
 [<ffffffffa0331b00>] sm_ll_mutate+0x80/0x300 [dm_persistent_data]
 [<ffffffffa0330e60>] ? set_ref_count+0x10/0x10 [dm_persistent_data]
 [<ffffffffa0331dba>] sm_ll_inc+0x1a/0x20 [dm_persistent_data]
 [<ffffffffa0332270>] sm_disk_new_block+0x60/0x80 [dm_persistent_data]
 [<ffffffff81520036>] ? down_write+0x16/0x40
 [<ffffffffa001e5c4>] dm_pool_alloc_data_block+0x54/0x80 [dm_thin_pool]
 [<ffffffffa001b23c>] alloc_data_block+0x9c/0x130 [dm_thin_pool]
 [<ffffffffa001c27e>] provision_block+0x4e/0x180 [dm_thin_pool]
 [<ffffffffa001fe9a>] ? dm_thin_find_block+0x6a/0x110 [dm_thin_pool]
 [<ffffffffa001c57a>] process_bio+0x1ca/0x1f0 [dm_thin_pool]
 [<ffffffff8111e2ed>] ? mempool_free+0x8d/0xa0
 [<ffffffffa001d755>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
 [<ffffffffa001d911>] do_worker+0x51/0x60 [dm_thin_pool]
 [<ffffffff81067872>] process_one_work+0x182/0x3b0
 [<ffffffff81068c90>] worker_thread+0x120/0x3a0
 [<ffffffff81068b70>] ? manage_workers+0x160/0x160
 [<ffffffff8106eb2e>] kthread+0xce/0xe0
 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org # v3.10+
2013-12-10 16:34:28 -05:00
Joe Thornber 40c57f475f dm space map disk: optimise sm_disk_dec_block
Don't waste time spotting blocks that have been allocated and then freed
in the same transaction.

The extra lookup is expensive, and I don't think it really gives us much.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:24 -05:00
Joe Thornber 9c1d4de560 dm array: fix bug in growing array
Entries would be lost if the old tail block was partially filled.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+
2013-11-05 11:20:50 -05:00
Joe Thornber f722063ee0 dm space map: optimise sm_ll_dec and sm_ll_inc
Prior to this patch these methods did a lookup followed by an insert.
Instead they now call a common mutate function that adjusts the value
according to a callback function.  This avoids traversing the data
structures twice and hence improves performance.

Also factor out sm_ll_lookup_big_ref_count() for use by both
sm_ll_lookup() and sm_ll_mutate().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-08-23 09:02:14 -04:00
Joe Thornber 04f17c802f dm btree: prefetch child nodes when walking tree for a dm_btree_del
dm-btree now takes advantage of dm-bufio's ability to prefetch data via
dm_bm_prefetch().  Prior to this change many btree node visits were
causing a synchronous read.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-08-23 09:02:14 -04:00
Joe Thornber cd5acf0b44 dm btree: use pop_frame in dm_btree_del to cleanup code
Remove a visited leaf straight away from the stack, rather than
marking all it's children as visited and letting it get removed on the
next iteration.  May also offer a micro optimisation in dm_btree_del.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-08-23 09:02:14 -04:00
Joe Thornber 2fc48021f4 dm persistent metadata: add space map threshold callback
Add a threshold callback to dm persistent data space maps.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10 14:37:20 +01:00
Joe Thornber 7c3d3f2a87 dm persistent data: add threshold callback to space map
Add a threshold callback function to the persistent data space map
interface for a subsequent patch to use.

dm-thin and dm-cache are interested in knowing when they're getting
low on metadata or data blocks.  This patch introduces a new method
for registering a callback against a threshold.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10 14:37:20 +01:00
Joe Thornber 1921c56d95 dm persistent data: support space map resizing
Support extending a dm persistent data metadata space map.

The extend itself is implemented by switching back to the boostrap
allocator and pointing to the new space.  The extra bitmap indexes are
then allocated from the new space, and finally we switch back to the
proper space map ops and tweak the reference counts.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10 14:37:19 +01:00
Joe Thornber 88a488f624 dm persistent data: fix error message typos
Fix some typos in dm-space-map-metadata.c error messages.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10 14:37:17 +01:00
Joe Thornber f046f89a99 dm thin: fix discard corruption
Fix a bug in dm_btree_remove that could leave leaf values with incorrect
reference counts.  The effect of this was that removal of a shared block
could result in the space maps thinking the block was no longer used.
More concretely, if you have a thin device and a snapshot of it, sending
a discard to a shared region of the thin could corrupt the snapshot.

Thinp uses a 2-level nested btree to store it's mappings.  This first
level is indexed by thin device, and the second level by logical
block.

Often when we're removing an entry in this mapping tree we need to
rebalance nodes, which can involve shadowing them, possibly creating a
copy if the block is shared.  If we do create a copy then children of
that node need to have their reference counts incremented.  In this
way reference counts percolate down the tree as shared trees diverge.

The rebalance functions were incrementing the children at the
appropriate time, but they were always assuming the children were
internal nodes.  This meant the leaf values (in our case packed
block/flags entries) were not being incremented.

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:24 +00:00
Joe Thornber c6b4fcbad0 dm: add cache target
Add a target that allows a fast device such as an SSD to be used as a
cache for a slower device such as a disk.

A plug-in architecture was chosen so that the decisions about which data
to migrate and when are delegated to interchangeable tunable policy
modules.  The first general purpose module we have developed, called
"mq" (multiqueue), follows in the next patch.  Other modules are
under development.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:51 +00:00
Joe Thornber 7a87edfee7 dm persistent data: add bitset
Add a persistent bitset as a wrapper around dm-array.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:51 +00:00
Joe Thornber 6513c29f44 dm persistent data: add transactional array
Add a transactional array.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:51 +00:00
Joe Thornber 4e7f1f9089 dm persistent data: add btree_walk
Add dm_btree_walk to iterate through the contents of a btree.
This will be used by the dm cache target.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:50 +00:00
Mike Snitzer 018cede93c dm persistent data: set some btree fn parms const
Mark some constant parameters constant in some dm-btree functions.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Kees Cook 88ae4c5294 dm persistent data: remove CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:46 +00:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Andrew Morton df8557982f drivers/md/persistent-data/dm-transaction-manager.c: rename HASH_SIZE
Fix the warning:

  drivers/md/persistent-data/dm-transaction-manager.c:28:1: warning: "HASH_SIZE" redefined
  In file included from include/linux/elevator.h:5,
                   from include/linux/blkdev.h:216,
                   from drivers/md/persistent-data/dm-block-manager.h:11,
                   from drivers/md/persistent-data/dm-transaction-manager.h:10,
                   from drivers/md/persistent-data/dm-transaction-manager.c:6:
  include/linux/hashtable.h:22:1: warning: this is the location of the previous definition

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:08 -08:00
Linus Torvalds b49249d103 Miscellaneous device-mapper fixes, cleanups and performance improvements.
Of particular note:
 - Disable broken WRITE SAME support in all targets except linear and striped.
   Use it when kcopyd is zeroing blocks.
 - Remove several mempools from targets by moving the data into the bio's new
   front_pad area(which dm calls 'per_bio_data').
 - Fix a race in thin provisioning if discards are misused.
 - Prevent userspace from interfering with the ioctl parameters and
   use kmalloc for the data buffer if it's small instead of vmalloc.
 - Throttle some annoying error messages when I/O fails.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ1M5OAAoJEK2W1qbAHj1nrQcP/itnnAw8RNsSHBrFMrL9wVnB
 5dmZ1BXPZmEbG+ViU4wzVmRUPHuSHTwhIqH7UFPjyCgbWaz1jaXfpyIxBsxlJi4E
 zuGjv46akANMwH0o/aJDRuEIrCnMtjLrMiY2Oq00lJFvATurwYAKSIgmnwRVdAYy
 gDehJhaymNtHVjhymu33xEn/hqqkQtUbMDj9o+IZppmAw1aQyNuYnwQu3HvcETuz
 /JBcs8isXKIQMJdMLFdGg7lZjLO241UvSwCAeGycKkupHLaYfycumPywgdiNFVUg
 L6pQP9RtAQ+H2VBQ1OIVMJxqiXxQ0xHhyxUYIe3reTar+RXoMA0yK+FiJTwSY1cE
 Xk0s8x2DXwUyu3Vx7UmvgUXnMgd4TIPITYBYiOAanEF/8Xt0voZn8mzNyyzsyFXy
 0u1vMRK+ZK7+QPio9LRh7bgHNK1g5ZyShvwqTMDmtlp+uskaP4iHDDGtVUjFA+Wf
 r9Ms0CXPbXIN6laUIT/4L3LJZtyRWB6e8wuCrUWIWWRbjrMPaPnB+/NlckGJ0CHa
 P/5r1rmLdneTEZ8Vx/2g3fBJ+H2uNQKhYujjnE0HqtHP+tvjt7ernibyU2QhNBeE
 Zy0PXRatY0Xn7UFpn44uJ2qxkWaO5Dloaa4HkWdlWFdR3f/u5MzVjy5mDXLUxkGq
 wj2Z3YkjYjy948MViBhD
 =yzhS
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull dm update from Alasdair G Kergon:
 "Miscellaneous device-mapper fixes, cleanups and performance
  improvements.

  Of particular note:
   - Disable broken WRITE SAME support in all targets except linear and
     striped.  Use it when kcopyd is zeroing blocks.
   - Remove several mempools from targets by moving the data into the
     bio's new front_pad area(which dm calls 'per_bio_data').
   - Fix a race in thin provisioning if discards are misused.
   - Prevent userspace from interfering with the ioctl parameters and
     use kmalloc for the data buffer if it's small instead of vmalloc.
   - Throttle some annoying error messages when I/O fails."

* tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits)
  dm stripe: add WRITE SAME support
  dm: remove map_info
  dm snapshot: do not use map_context
  dm thin: dont use map_context
  dm raid1: dont use map_context
  dm flakey: dont use map_context
  dm raid1: rename read_record to bio_record
  dm: move target request nr to dm_target_io
  dm snapshot: use per_bio_data
  dm verity: use per_bio_data
  dm raid1: use per_bio_data
  dm: introduce per_bio_data
  dm kcopyd: add WRITE SAME support to dm_kcopyd_zero
  dm linear: add WRITE SAME support
  dm: add WRITE SAME support
  dm: prepare to support WRITE SAME
  dm ioctl: use kmalloc if possible
  dm ioctl: remove PF_MEMALLOC
  dm persistent data: improve improve space map block alloc failure message
  dm thin: use DMERR_LIMIT for errors
  ...
2012-12-21 17:08:06 -08:00
Joe Thornber 7960123f2d dm persistent data: improve improve space map block alloc failure message
Improve space map error message when unable to allocate a new
metadata block.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:36 +00:00
Mike Snitzer 89ddeb8cb1 dm persistent data: use DMERR_LIMIT for errors
Nearly all of persistent-data is in the IO path so throttle error
messages with DMERR_LIMIT to limit the amount logged when
something has gone wrong.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:34 +00:00
Mike Snitzer a5bd968aeb dm block manager: reinstate message when validator fails
Reinstate a useful error message when the block manager buffer validator fails.
This was mistakenly eliminated when the block manager was converted to use
dm-bufio.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:34 +00:00
Joe Thornber e3cbf94513 dm persistent data: fix nested btree deletion
When deleting nested btrees, the code forgets to delete the innermost
btree.  The thin-metadata code serendipitously compensates for this by
claiming there is one extra layer in the tree.

This patch corrects both problems.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:32 +00:00
Mikulas Patocka 550929faf8 dm persistent data: rename node to btree_node
This patch fixes a compilation failure on sparc32 by renaming struct node.

struct node is already defined in include/linux/node.h. On sparc32, it
happens to be included through other dependencies and persistent-data
doesn't compile because of conflicting declarations.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:30 +00:00
Masanari Iida 83f0d77a7f md: Fix typo in drivers/md
Correct spelling typo in drivers/md.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-10-29 22:57:50 +01:00
Wei Yongjun 0bcf08798e dm persistent data: convert to use le32_add_cpu
Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-10-12 16:59:47 +01:00
Joe Thornber 310975573b dm persistent data: introduce dm_bm_set_read_only
Introduce dm_bm_set_read_only to switch the block manager into a
read-only mode.  To be used when dm-thin degrades due to io errors on
the metadata device.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:15 +01:00
Joe Thornber 3c9ad9bd87 dm persistent data: stop using dm_bm_unlock_move when shadowing blocks in tm
Stop using dm_bm_unlock_move when shadowing blocks in the transaction
manager as an optimisation and remove the function as it is then no
longer used.

Some code, such as the space maps, keeps using on-disk data structures
from the previous transaction.  It can do this because blocks won't
be reallocated until the subsequent transaction.  Using
dm_bm_unlock_move to copy blocks sounds like a win, but it forces a
synchronous read should the old block be accessed.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:09 +01:00
Joe Thornber 384ef0e62e dm persistent data: tidy transaction manager creation fns
Tidy the transaction manager creation functions.

They no longer lock the superblock.  Superblock locking is pulled out to
the caller.

Also export dm_bm_write_lock_zero.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:09 +01:00
Joe Thornber 51a0f659c0 dm persistent data: create new dm_block_manager struct
This patch introduces a separate struct for the block_manager.
It also uses IS_ERR to check the return value of dm_bufio_client_create
instead of testing incorrectly for NULL.

Prior to this patch a struct dm_block_manager was really an alias for
a struct dm_bufio_client.  We want to add some functionality to the
block manager that will require extra fields, so this one to one
mapping is no longer valid.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:08 +01:00
Joe Thornber f4b90369d3 dm persistent data: only commit space map if index changed
Introduce bitmap_index_changed to track whether or not the index changed
then only commit a space map if it did.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:06 +01:00