OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ingo Molnar	0881e7bd34	sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h> But first update usage sites with the new header dependency. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:40 +01:00
Linus Torvalds	7a771ceac7	- Fix dm-raid transient device failure processing and other smaller tweaks. - Add journal support to the DM raid target to close the 'write hole' on raid 4/5/6. - Fix dm-cache corruption, due to rounding bug, when cache exceeds 2TB. - Add 'metadata2' feature to dm-cache to separate the dirty bitset out from other cache metadata. This improves speed of shutting down a large cache device (which implies writing out dirty bits). - Fix a memory leak during dm-stats data structure destruction. - Fix a DM multipath round-robin path selector performance regression that was caused by less precise balancing across all paths. - Lastly, introduce a DM core fix for a long-standing DM snapshot deadlock that is rooted in the complexity of the device stack used in conjunction with block core maintaining bios on current->bio_list to manage recursion in generic_make_request(). A more comprehensive fix to block core (and its hook in the cpu scheduler) would be wonderful but this DM-specific fix is pragmatic considering how difficult it has been to make progress on a generic fix. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJYrJJeAAoJEMUj8QotnQNaDskIAIJeMX3Dc8Skt00tZ6vEj3p6 9juDpOrBKH3RYdqPmrYy9lVhhpFs6OoDfTQZaW/SmjDjHboJ3skKMjO+/NWav4nN 39LoDfxLbDi06fC7Y4H7FHUPjb5sKSzw4W5IttFEKmHOwkz+iwVFL1R0dihBqv7G Lq0Ta6xffW8jHrzpmmSDY1I6FSmZ9LlHPCL00qQ5Z7WkMS5oDk0GzZoLFasdNfvm fP9N13+uel2/R7hclpxE6J+IZPN5ARG3HAQ5POS+2gMlIzaH4AlMh7yf5q0sSGwq uQsmdps8c+LOtAakOzVScykEZvwBh+ci8VqE1X1zol+fl8ijeWqgWtz4XXYECC0= =saD8 -----END PGP SIGNATURE----- Merge tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Fix dm-raid transient device failure processing and other smaller tweaks. - Add journal support to the DM raid target to close the 'write hole' on raid 4/5/6. - Fix dm-cache corruption, due to rounding bug, when cache exceeds 2TB. - Add 'metadata2' feature to dm-cache to separate the dirty bitset out from other cache metadata. This improves speed of shutting down a large cache device (which implies writing out dirty bits). - Fix a memory leak during dm-stats data structure destruction. - Fix a DM multipath round-robin path selector performance regression that was caused by less precise balancing across all paths. - Lastly, introduce a DM core fix for a long-standing DM snapshot deadlock that is rooted in the complexity of the device stack used in conjunction with block core maintaining bios on current->bio_list to manage recursion in generic_make_request(). A more comprehensive fix to block core (and its hook in the cpu scheduler) would be wonderful but this DM-specific fix is pragmatic considering how difficult it has been to make progress on a generic fix. * tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits) dm: flush queued bios when process blocks to avoid deadlock dm round robin: revert "use percpu 'repeat_count' and 'current_path'" dm stats: fix a leaked s->histogram_boundaries array dm space map metadata: constify dm_space_map structures dm cache metadata: use cursor api in blocks_are_clean_separate_dirty() dm persistent data: add cursor skip functions to the cursor APIs dm cache metadata: use dm_bitset_new() to create the dirty bitset in format 2 dm bitset: add dm_bitset_new() dm cache metadata: name the cache block that couldn't be loaded dm cache metadata: add "metadata2" feature dm cache metadata: use bitset cursor api to load discard bitset dm bitset: introduce cursor api dm btree: use GFP_NOFS in dm_btree_del() dm space map common: memcpy the disk root to ensure it's arch aligned dm block manager: add unlikely() annotations on dm_bufio error paths dm cache: fix corruption seen when using cache > 2TB dm raid: cleanup awkward branching in raid_message() option processing dm raid: use mddev rather than rdev->mddev dm raid: use read_disk_sb() throughout dm raid: add raid4/5/6 journaling support ...	2017-02-21 12:11:41 -08:00
Bhumika Goyal	b79af13efd	dm space map metadata: constify dm_space_map structures Declare dm_space_map structures as const as they are only passed as an argument to the function memcpy. This argument is of type const void *, so dm_space_map structures having this property can be declared as const. File size before: text data bss dec hex filename 4889 240 0 5129 1409 dm-space-map-metadata.o File size after: text data bss dec hex filename 5139 0 0 5139 1413 dm-space-map-metadata.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-16 14:14:36 -05:00
Joe Thornber	9b696229aa	dm persistent data: add cursor skip functions to the cursor APIs Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-16 13:12:50 -05:00
Joe Thornber	2151249eaa	dm bitset: add dm_bitset_new() A more efficient way of creating a populated bitset. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-16 13:12:48 -05:00
Joe Thornber	6fe28dbf05	dm bitset: introduce cursor api Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-16 13:12:45 -05:00
Joe Thornber	9f9ef0657d	dm btree: use GFP_NOFS in dm_btree_del() dm_btree_del() is called from an ioctl so don't recurse into FS. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-16 13:09:10 -05:00
Joe Thornber	3ba3ba1e84	dm space map common: memcpy the disk root to ensure it's arch aligned The metadata_space_map_root passed to sm_ll_open_metadata() may or may not be arch aligned, use memcpy to ensure it is. This is not a fast path so the extra memcpy doesn't hurt us. Long-term it'd be better to use the kernel's alignment infrastructure to remove the memcpy()s that are littered across persistent-data (btree, array, space-maps, etc). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-16 13:02:54 -05:00
Joe Thornber	602548bdd5	dm block manager: add unlikely() annotations on dm_bufio error paths Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-02-16 13:01:07 -05:00
Davidlohr Bueso	642fa448ae	sched/core: Remove set_task_state() This is a nasty interface and setting the state of a foreign task must not be done. As of the following commit: `be628be095` ("bcache: Make gc wakeup sane, remove set_task_state()") ... everyone in the kernel calls set_task_state() with current, allowing the helper to be removed. However, as the comment indicates, it is still around for those archs where computing current is more expensive than using a pointer, at least in theory. An important arch that is affected is arm64, however this has been addressed now [1] and performance is up to par making no difference with either calls. Of all the callers, if any, it's the locking bits that would care most about this -- ie: we end up passing a tsk pointer to a lot of the lock slowpath, and setting ->state on that. The following numbers are based on two tests: a custom ad-hoc microbenchmark that just measures latencies (for ~65 million calls) between get_task_state() vs get_current_state(). Secondly for a higher overview, an unlink microbenchmark was used, which pounds on a single file with open, close,unlink combos with increasing thread counts (up to 4x ncpus). While the workload is quite unrealistic, it does contend a lot on the inode mutex or now rwsem. [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com == 1. x86-64 == Avg runtime set_task_state(): 601 msecs Avg runtime set_current_state(): 552 msecs vanilla dirty Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%) Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%) Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%) Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%) Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%) Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%) Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%) Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%) Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%) Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%) Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%) Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%) Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%) == 2. ppc64le == Avg runtime set_task_state(): 938 msecs Avg runtime set_current_state: 940 msecs vanilla dirty Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%) Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%) Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%) Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%) Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%) Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%) Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%) Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%) Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%) Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%) Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%) Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%) Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%) Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%) Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%) Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%) x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching) and ppc64 (with paca) show similar improvements in the unlink microbenches. The small delta for ppc64 (2ms), does not represent the gains on the unlink runs. In the case of x86, there was a decent amount of variation in the latency runs, but always within a 20 to 50ms increase), ppc was more constant. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 11:14:16 +01:00
Benjamin Marzinski	b446396b74	dm space map: always set ev if sm_ll_mutate() succeeds If no block was allocated or freed, sm_ll_mutate() wasn't setting ev, leaving the variable unitialized. sm_ll_insert(), sm_disk_inc_block(), and sm_disk_new_block() all check ev to see if there was an allocation event in sm_ll_mutate(), possibly reading unitialized data. If no allocation event occured, sm_ll_mutate() should set ev to SM_NONE. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-12-08 14:13:15 -05:00
Benjamin Marzinski	0c79ce0b75	dm space map metadata: skip useless memcpy in metadata_ll_init_index() When metadata_ll_init_index() is called by sm_ll_new_metadata(), ll->mi_le hasn't been initialized yet. So, when metadata_ll_init_index() copies the contents of ll->mi_le into the newly allocated bitmap_root, it is just copying garbage. ll->mi_le will be allocated later in sm_ll_extend() and copied into the bitmap_root, in sm_ll_commit(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-12-08 14:13:15 -05:00
Benjamin Marzinski	314c25c56c	dm space map metadata: fix 'struct sm_metadata' leak on failed create In dm_sm_metadata_create() we temporarily change the dm_space_map operations from 'ops' (whose .destroy function deallocates the sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't). If dm_sm_metadata_create() fails in sm_ll_new_metadata() or sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls dm_sm_destroy() with the intention of freeing the sm_metadata, but it doesn't (because the dm_space_map operations is still set to 'bootstrap_ops'). Fix this by setting the dm_space_map operations back to 'ops' if dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2016-12-08 14:13:14 -05:00
Bart Van Assche	0637018dff	dm array: remove a dead assignment in populate_ablock_with_values() A value is assigned to 'nr_entries' but is never used, remove it. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-12-08 14:13:09 -05:00
Joe Thornber	2e8ed71102	dm block manager: make block locking optional The block manager's locking is useful for catching cycles that may result from certain btree metadata corruption. But in general it serves as a developer tool to catch bugs in code. Unless you're finding that DM thin provisioning is hanging due to infinite loops within the block manager's access to btree nodes you can safely disable this feature. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> # do/while(0) macro fix Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-11-14 15:17:47 -05:00
Joe Thornber	fdd1315aa5	dm array: introduce cursor api More efficient way to iterate an array due to prefetching (makes use of the new dm_btree_cursor_* api). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-09-22 11:15:04 -04:00
Joe Thornber	7d111c81fa	dm btree: introduce cursor api This uses prefetching to speed up iteration through a btree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-09-22 11:15:04 -04:00
Joe Thornber	dd6a77d998	dm array: add dm_array_new() dm_array_new() creates a new, populated array more efficiently than starting with an empty one and resizing. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-09-22 11:12:23 -04:00
Joe Thornber	e7e0f73047	dm btree: fix a bug in dm_btree_find_next_single() dm_btree_find_next_single() can short-circuit the search for a block with a return of -ENODATA if all entries are higher than the search key passed to lower_bound(). This hasn't been a problem because of the way the btree has been used by DM thinp. But it must be fixed now in preparation for fixing the race in DM thinp's handling of simultaneous block discard vs allocation. Otherwise, once that fix is in place, some of the blocks in a discard would not be unmapped as expected. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-07-20 12:43:34 -04:00
Mike Snitzer	512167788a	dm space map metadata: remove unused variable in brb_pop() Remove the unused struct block_op pointer that was inadvertantly introduced, via cut-and-paste of previous brb_op() code, as part of commit `50dd842ad`. (Cc'ing stable@ because commit `50dd842ad` did) Fixes: `50dd842ad` ("dm space map metadata: fix ref counting bug when bootstrapping a new space map") Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-12-14 09:26:01 -05:00
Mike Snitzer	ba503835ad	dm btree: factor out need_insert() helper Eliminates code duplication within insert(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-12-10 10:38:59 -05:00
Mikulas Patocka	86bad0c707	dm bufio: store stacktrace in buffers to help find buffer leaks The option DM_DEBUG_BLOCK_STACK_TRACING is moved from persistent-data directory to device mapper directory because it will now be used by persistent-data and bufio. When the option is enabled, each bufio buffer stores the stacktrace of the last dm_bufio_get(), dm_bufio_read() or dm_bufio_new() call that increased the hold count to 1. The buffer's stacktrace is printed if the buffer was not released before the bufio client is destroyed. When DM_DEBUG_BLOCK_STACK_TRACING is enabled, any bufio buffer leaks are considered warnings - i.e. the kernel continues afterwards. If not enabled, buffer leaks are considered BUGs and the kernel with crash. Reasoning on this disposition is: if we only ever warned on buffer leaks users would generally ignore them and the problematic code would never get fixed. Successfully used to find source of bufio leaks fixed with commit fce079f63c3 ("dm btree: fix bufio buffer leaks in dm_btree_del() error path"). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-12-10 10:38:58 -05:00
Mikulas Patocka	313c9b9736	dm block manager: cleanup code that prints stacktrace There is no need to record stack trace and immediately print it. Just use dump_stack() to print the current stack. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-12-10 10:38:56 -05:00
Joe Thornber	ed8b45a367	dm btree: fix bufio buffer leaks in dm_btree_del() error path If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-12-10 10:30:18 -05:00
Joe Thornber	50dd842ad8	dm space map metadata: fix ref counting bug when bootstrapping a new space map When applying block operations (BOPs) do not remove them from the uncommitted BOP ring-buffer until after they've been applied -- in case we recurse. Also, perform BOP_INC operation, in dm_sm_metadata_create() and sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather than using direct calls to sm_ll_inc(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-12-09 13:27:25 -05:00
Joe Thornber	993ceab919	dm thin metadata: fix bug in dm_thin_remove_range() dm_btree_remove_leaves() only unmaps a contiguous region so we need a loop, in __remove_range(), to handle ranges that contain multiple regions. A new btree function, dm_btree_lookup_next(), is introduced which is more efficiently able to skip over regions of the thin device which aren't mapped. __remove_range() uses dm_btree_lookup_next() for each iteration of __remove_range()'s loop. Also, improve description of dm_btree_remove_leaves(). Fixes: `6550f075` ("dm thin metadata: add dm_thin_remove_range()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+	2015-12-02 13:26:49 -05:00
Mike Snitzer	30ce6e1cc5	dm btree: fix leak of bufio-backed block in btree_split_sibling error path The block allocated at the start of btree_split_sibling() is never released if later insert_at() fails. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-12-02 13:20:34 -05:00
Linus Torvalds	e0700ce709	- Revert a dm-multipath change that caused a regression for unprivledged users (e.g. kvm guests) that issued ioctls when a multipath device had no available paths. - Include Christoph's refactoring of DM's ioctl handling and add support for passing through persistent reservations with DM multipath. - All other changes are very simple cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWOp04AAoJEMUj8QotnQNaFLsH/AhMEH/jI1ObOfy4J1Wy4rOx ujJT91uS/s0H3pc9cGKQYnuGpFkX6WWU4wMiabIyiTn4sAsoXaflfIGutivLiDJr HfecrMrGZgnP4ZlpPPB02BmlxFbcPW8yzAU4ma38xBgQ+Pu30RO/HkvX/2vKOppG qwPop/XsNxq3KXgFGM44ToytM6c/MPGluhuvOwbaacAO1HviMuen9qsVjk4kwcf3 jGYTbEPHATxyu5/6oKDTkQTYhzdwg3B2qHCiKMGw3l1kXhaQLFcaOivOLV8Sf3xh bj1070pkGe9OpqaVzMnwDtJ8rnsBl/Nt4wj9oiQPxbX71GYZAmcMIYn9WEkcKFI= =AR2D -----END PGP SIGNATURE----- Merge tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: "Smaller set of DM changes for this merge. I've based these changes on Jens' for-4.4/reservations branch because the associated DM changes required it. - Revert a dm-multipath change that caused a regression for unprivledged users (e.g. kvm guests) that issued ioctls when a multipath device had no available paths. - Include Christoph's refactoring of DM's ioctl handling and add support for passing through persistent reservations with DM multipath. - All other changes are very simple cleanups" * tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm switch: simplify conditional in alloc_region_table() dm delay: document that offsets are specified in sectors dm delay: capitalize the start of an delay_ctr() error message dm delay: Use DM_MAPIO macros instead of open-coded equivalents dm linear: remove redundant target name from error messages dm persistent data: eliminate unnecessary return values dm: eliminate unused "bioset" process for each bio-based DM device dm: convert ffs to __ffs dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() dm: add support for passing through persistent reservations dm: refactor ioctl handling Revert "dm mpath: fix stalls when handling invalid ioctls" dm: initialize non-blk-mq queue data before queue is used	2015-11-04 21:19:53 -08:00
Mikulas Patocka	4c7da06f5a	dm persistent data: eliminate unnecessary return values dm_bm_unlock and dm_tm_unlock return an integer value but the returned value is always 0. The calling code sometimes checks the return value and sometimes doesn't. Eliminate these unnecessary return values and also the checks for them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-10-31 19:06:02 -04:00
Mike Snitzer	4dcb8b57df	dm btree: fix leak of bufio-backed block in btree_split_beneath error path btree_split_beneath()'s error path had an outstanding FIXME that speaks directly to the potential for _not_ cleaning up a previously allocated bufio-backed block. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com> Cc: stable@vger.kernel.org	2015-10-23 14:02:55 -04:00
Joe Thornber	2871c69e02	dm btree remove: fix a bug when rebalancing nodes after removal Commit `4c7e309340` ("dm btree remove: fix bug in redistribute3") wasn't a complete fix for redistribute3(). The redistribute3 function takes 3 btree nodes and shares out the entries evenly between them. If the three nodes in total contained (MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in the center. Fix this issue by being more careful about calculating the target number of entries for the left and right nodes. Unit tested in userspace using this program: https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-10-23 14:02:55 -04:00
viresh kumar	fc0a446152	dm: remove unlikely() before IS_ERR() IS_ERR() already contains an 'unlikely' compiler flag so there is no need to do that again from IS_ERR() callers. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-08-12 11:32:21 -04:00
Vivek Goyal	8c747fd0c3	dm btree remove: remove unused function get_nr_entries() rebalance_children() calls get_nr_entries() and assigns the result to an unused local 'child_entries' variable. Remove get_nr_entries() and cleanup rebalance_children() accordingly. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-08-12 11:32:19 -04:00
Vivek Goyal	0a8d4c3ef8	dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling() Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-08-12 11:32:19 -04:00
Joe Thornber	b0dc3c8bc1	dm btree: add ref counting ops for the leaves of top level btrees When using nested btrees, the top leaves of the top levels contain block addresses for the root of the next tree down. If we shadow a shared leaf node the leaf values (sub tree roots) should be incremented accordingly. This is only an issue if there is metadata sharing in the top levels. Which only occurs if metadata snapshots are being used (as is possible with dm-thinp). And could result in a block from the thinp metadata snap being reused early, thus corrupting the thinp metadata snap. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-08-12 10:50:37 -04:00
Joe Thornber	aa0cd28d05	dm btree remove: fix bug in remove_one() remove_one() was not incrementing the key for the beginning of the range, so not all entries were being removed. This resulted in discards that were not unmapping all blocks. Fixes: `4ec331c3ea` ("dm btree: add dm_btree_remove_leaves()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-08-07 11:56:43 -04:00
Joe Thornber	1c7518794a	dm btree: silence lockdep lock inversion in dm_btree_del() Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del() can be called via an ioctl and we don't want to recurse into the FS or block layer. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-07-06 10:45:02 -04:00
Dennis Yang	4c7e309340	dm btree remove: fix bug in redistribute3 redistribute3() shares entries out across 3 nodes. Some entries were being moved the wrong way, breaking the ordering. This manifested as a BUG() in dm-btree-remove.c:shift() when entries were removed from the btree. For additional context see: https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html Signed-off-by: Dennis Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-07-05 17:43:58 -04:00
Joe Thornber	6096d91af0	dm space map metadata: fix occasional leak of a metadata block on resize The metadata space map has a simplified 'bootstrap' mode that is operational when extending the space maps. Whilst in this mode it's possible for some refcount decrement operations to become queued (eg, as a result of shadowing one of the bitmap indexes). These decrements were not being applied when switching out of bootstrap mode. The effect of this bug was the leaking of a 4k metadata block. This is detected by the latest version of thin_check as a non fatal error. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2015-06-17 10:09:23 -04:00
Joe Thornber	4ec331c3ea	dm btree: add dm_btree_remove_leaves() Removes a range of leaf values from the tree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-06-11 17:13:03 -04:00
Mike Snitzer	49f154c732	dm thin metadata: remove in-core 'read_only' flag Leverage the block manager's read_only flag instead of duplicating it; access with new dm_bm_is_read_only() method. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-05-29 14:18:59 -04:00
Linus Torvalds	a911dcdba1	- Significant dm-crypt CPU scalability performance improvements thanks to changes that enable effective use of an unbound workqueue across all available CPUs. A large battery of tests were performed to validate these changes, summary of results is available here: https://www.redhat.com/archives/dm-devel/2015-February/msg00106.html - A few additional stable fixes (to DM core, dm-snapshot and dm-mirror) and a small fix to the dm-space-map-disk. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU51MsAAoJEMUj8QotnQNan8wH/3Oa2/ofbX8zNYtdDxFBK7W0 pHwVA2CH9yzDDi90X9GsQODmXerMrYf+SHo/RsQTpSnt5WI/4ZVP/1EoNGu6T2Yr XiaKc/Jwo+ScxdhoHSNtyGL5vKeipX6clgz1wcd/4/UBcBAr0Vxj25s8Ta7naK2V hZyWnt3MCGEDQ45NVBmLOU78Cl68LGf8JOUY0l3cd8ehQjGyR9a12GXwRktepslp hw9xu8uYmE93fD+MIe/fUs7W/WvIVgFSskhswlvD3kAFpEAsMczjLGVSRQlBE90L OK7v1HKxiIUJa17g3LmiCFa59ZjrnSHGpOOv9IPAFU/LzFIU47lcFwRFjgfB8po= =8vFi -----END PGP SIGNATURE----- Merge tag 'dm-3.20-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull more device mapper changes from Mike Snitzer: - Significant dm-crypt CPU scalability performance improvements thanks to changes that enable effective use of an unbound workqueue across all available CPUs. A large battery of tests were performed to validate these changes, summary of results is available here: https://www.redhat.com/archives/dm-devel/2015-February/msg00106.html - A few additional stable fixes (to DM core, dm-snapshot and dm-mirror) and a small fix to the dm-space-map-disk. * tag 'dm-3.20-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm snapshot: fix a possible invalid memory access on unload dm: fix a race condition in dm_get_md dm crypt: sort writes dm crypt: add 'submit_from_crypt_cpus' option dm crypt: offload writes to thread dm crypt: remove unused io_pool and _crypt_io_pool dm crypt: avoid deadlock in mempools dm crypt: don't allocate pages for a partial request dm crypt: use unbound workqueue for request processing dm io: reject unsupported DISCARD requests with EOPNOTSUPP dm mirror: do not degrade the mirror on discard error dm space map disk: fix sm_disk_count_is_more_than_one()	2015-02-21 13:28:45 -08:00
Mike Snitzer	145b9006a0	dm space map disk: fix sm_disk_count_is_more_than_one() dm_tm_shadow_block() is the only caller of dm_sm_count_is_more_than_one() which only ever operates on a metadata space-map. So in practice, sm_disk_count_is_more_than_one() isn't actually used (which explains why this bug never amounted to anything). But fix sm_disk_count_is_more_than_one() to properly set *result and return 0. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-02-13 19:32:58 -05:00
Christoph Jaeger	6341e62b21	kconfig: use bool instead of boolean for type definition attributes Support for keyword 'boolean' will be dropped later on. No functional change. Reference: http://lkml.kernel.org/r/cover.1418003065.git.cj@linux.com Signed-off-by: Christoph Jaeger <cj@linux.com> Signed-off-by: Michal Marek <mmarek@suse.cz>	2015-01-07 13:08:04 +01:00
Joe Thornber	02717d9855	dm space map metadata: fix sm_bootstrap_get_count() Must set 'result' accordingly rather than return it. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-12-02 10:25:06 -05:00
Dan Carpenter	c1c6156fe4	dm space map metadata: fix sm_bootstrap_get_nr_blocks() This function isn't right and it causes a static checker warning: drivers/md/dm-thin.c:3016 maybe_resize_data_dev() error: potentially using uninitialized 'sb_data_size'. It should set "*count" and return zero on success the same as the sm_metadata_get_nr_blocks() function does earlier. Fixes: `3241b1d3e0` ('dm: add persistent data library') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-12-01 11:31:58 -05:00
Joe Thornber	8001e87d0e	dm array: if resizing the array is a noop set the new root to the old one This could've been quite bad (to return success but not update the new root to point at the old) but in practice the only known consumer of the dm array code is the DM cache target. And the DM cache target passes in the same old root to array_resize() anyway. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-12-01 11:30:07 -05:00
Joe Thornber	4646015d7e	dm transaction manager: add support for prefetching blocks of metadata Introduce the dm_tm_issue_prefetches interface. If you're using a non-blocking clone the tm will build up a list of requested blocks that weren't in core. dm_tm_issue_prefetches will request those blocks to be prefetched. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-11-10 15:25:26 -05:00
Joe Thornber	9b460d3699	dm btree: fix a recursion depth bug in btree walking code The walk code was using a 'ro_spine' to hold it's locked btree nodes. But this data structure is designed for the rolling lock scheme, and as such automatically unlocks blocks that are two steps up the call chain. This is not suitable for the simple recursive walk algorithm, which retraces its steps. This code is only used by the persistent array code, which in turn is only used by dm-cache. In order to trigger it you need to have a mapping tree that is more than 2 levels deep; which equates to 8-16 million cache blocks. For instance a 4T ssd with a very small block size of 32k only just triggers this bug. The fix just places the locked blocks on the stack, and stops using the ro_spine altogether. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2014-11-10 15:23:58 -05:00
Joe Thornber	a9d45396f5	dm transaction manager: fix corruption due to non-atomic transaction commit The persistent-data library used by dm-thin, dm-cache, etc is transactional. If anything goes wrong, such as an io error when writing new metadata or a power failure, then we roll back to the last transaction. Atomicity when committing a transaction is achieved by: a) Never overwriting data from the previous transaction. b) Writing the superblock last, after all other metadata has hit the disk. This commit and the following commit ("dm: take care to copy the space map roots before locking the superblock") fix a bug associated with (b). When committing it was possible for the superblock to still be written in spite of an io error occurring during the preceeding metadata flush. With these commits we're careful not to take the write lock out on the superblock until after the metadata flush has completed. Change the transaction manager's semantics for dm_tm_commit() to assume all data has been flushed _before_ the single superblock that is passed in. As a prerequisite, split the block manager's block unlocking and flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now the unlocking must be done separately. This issue was discovered by forcing io errors at the crucial time using dm-flakey. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2014-03-27 16:56:23 -04:00

1 2 3

107 Commits