OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Kent Overstreet	8835c1234d	bcache: Add make_btree_freeing_key() Refactoring, prep work for incremental garbage collection. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:37 -08:00
Kent Overstreet	f269af5a07	bcache: Add btree_node_write_sync() More refactoring - mostly making the interfaces more explicit about what we actually want to do. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:36 -08:00
Kent Overstreet	0eacac2203	bcache: PRECEDING_KEY() btree_insert_key() was open coding this, this is just refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:36 -08:00
Kent Overstreet	d5cc66e957	bcache: bch_(btree|extent)_ptr_invalid() Trying to treat btree pointers and leaf node pointers the same way was a mistake - going to start being more explicit about the type of key/pointer we're dealing with. This is the first part of that refactoring; this patch shouldn't change any actual behaviour. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:35 -08:00
Kent Overstreet	3a3b6a4e07	bcache: Don't bother with bucket refcount for btree node allocations The bucket refcount (dropped with bkey_put()) is only needed to prevent the newly allocated bucket from being garbage collected until we've added a pointer to it somewhere. But for btree node allocations, the fact that we have btree nodes locked is enough to guard against races with garbage collection. Eventually the per bucket refcount is going to be replaced with something specific to bch_alloc_sectors(). Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:34 -08:00
Kent Overstreet	280481d06c	bcache: Debug code improvements Couple changes: * Consolidate bch_check_keys() and bch_check_key_order(), and move the checks that only check_key_order() could do to bch_btree_iter_next(). * Get rid of CONFIG_BCACHE_EDEBUG - now, all that code is compiled in when CONFIG_BCACHE_DEBUG is enabled, and there's now a sysfs file to flip on the EDEBUG checks at runtime. * Dropped an old not terribly useful check in rw_unlock(), and refactored/improved some of the other debug code. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:34 -08:00
Kent Overstreet	e58ff15503	bcache: Fix bch_ptr_bad() Previously, bch_ptr_bad() could return false when there was a pointer to a nonexistent device... it only filtered out keys with PTR_CHECK_DEV pointers. This behaviour was intended for multiple cache device support; for that, just because the device for one of the pointers has gone away doesn't mean we want to filter out the rest of the pointers. But we don't yet explicitly filter/check individual pointers, so without that this behaviour was wrong - a corrupt bkey with a bad device pointer could cause us to deref a bad pointer. Doh. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:33 -08:00
Kent Overstreet	81ab4190ac	bcache: Pull on disk data structures out into a separate header Now, the on disk data structures are in a header that can be exported to userspace - and having them all centralized is nice too. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:33 -08:00
Kent Overstreet	2599b53b7b	bcache: Move sector allocator to alloc.c Just reorganizing things a bit. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:32 -08:00
Kent Overstreet	220bb38c21	bcache: Break up struct search With all the recent refactoring around struct btree_op, struct search has gotten rather large. But we can now easily break it up in a different way - we break out struct btree_insert_op which is for inserting data into the cache, and that's now what the copying gc code uses - struct search is now specific to request.c Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:32 -08:00
Kent Overstreet	cc7b881921	bcache: Convert bch_btree_insert() to bch_btree_map_leaf_nodes() Last of the btree_map() conversions. Main visible effect is bch_btree_insert() no longer takes a struct btree_op as an argument - there's no fancy state machine stuff going on, it's just a normal function. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:31 -08:00
Kent Overstreet	6054c6d4da	bcache: Don't use op->insert_collision When we convert bch_btree_insert() to bch_btree_map_leaf_nodes(), we won't be passing struct btree_op to bch_btree_insert() anymore - so we need a different way of returning whether there was a collision (really, a replace collision). Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:30 -08:00
Kent Overstreet	1b207d80d5	bcache: Kill op->replace This is prep work for converting bch_btree_insert to bch_btree_map_leaf_nodes() - we have to convert all its arguments to actual arguments. Bunch of churn, but should be straightforward. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:29 -08:00
Kent Overstreet	faadf0c965	bcache: Drop some closure stuff With the recent bcache refactoring, some of the closure code isn't needed anymore. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:10 -08:00
Kent Overstreet	b54d6934da	bcache: Kill op->cl This isn't used for waiting asynchronously anymore - so this is a fairly trivial refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:09 -08:00
Kent Overstreet	c18536a72d	bcache: Prune struct btree_op Eventual goal is for struct btree_op to contain only what is necessary for traversing the btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:08 -08:00
Kent Overstreet	cc23196631	bcache: Clean up cache_lookup_fn There was some looping in submit_partial_cache_hit() and submit_partial_cache_hit() that isn't needed anymore - originally, we wouldn't necessarily process the full hit or miss all at once because when splitting the bio, we took into account the restrictions of the device we were sending it to. But device bio size restrictions are now handled elsewhere, with a wrapper around generic_make_request() - so that looping has been unnecessary for a while now and we can now do quite a bit of cleanup. And if we trim the key we're reading from to match the subset we're actually reading, we don't have to explicitly calculate bi_sector anymore. Neat. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:08 -08:00
Kent Overstreet	2c1953e201	bcache: Convert bch_btree_read_async() to bch_btree_map_keys() This is a fairly straightforward conversion, mostly reshuffling - op->lookup_done goes away, replaced by MAP_DONE/MAP_CONTINUE. And the code for handling cache hits and misses wasn't really btree code, so it gets moved to request.c. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:07 -08:00
Kent Overstreet	df8e89701f	bcache: Move some stuff to btree.c With the new btree_map() functions, we don't need to export the stuff needed for traversing the btree anymore. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:07 -08:00
Kent Overstreet	48dad8baf9	bcache: Add btree_map() functions Lots of stuff has been open coding its own btree traversal - which is generally pretty simple code, but there are a few subtleties. This adds two new functions, bch_btree_map_nodes() and bch_btree_map_keys(), which do the traversal for you. Everything that's open coding btree traversal now (with the exception of garbage collection) is slowly going to be converted to these two functions; being able to write other code at a higher level of abstraction is a big improvement w.r.t. overall code quality. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:06 -08:00
Kent Overstreet	5e6926daac	bcache: Convert writeback to a kthread This simplifies the writeback flow control quite a bit - previously, it was conceptually two coroutines, refill_dirty() and read_dirty(). This makes the code quite a bit more straightforward. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:05 -08:00
Kent Overstreet	72a44517f3	bcache: Convert gc to a kthread We needed a dedicated rescuer workqueue for gc anyways... and gc was conceptually a dedicated thread, just one that wasn't running all the time. Switch it to a dedicated thread to make the code a bit more straightforward. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:04 -08:00
Kent Overstreet	35fcd848d7	bcache: Convert bucket_wait to wait_queue_head_t At one point we did do fancy asynchronous waiting stuff with bucket_wait, but that's all gone (and bucket_wait is used a lot less than it used to be). So use the standard primitives. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:04 -08:00
Kent Overstreet	e8e1d4682c	bcache: Convert try_wait to wait_queue_head_t We never waited on c->try_wait asynchronously, so just use the standard primitives. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:03 -08:00
Kent Overstreet	0b93207abb	bcache: Move keylist out of btree_op Slowly working on pruning struct btree_op - the aim is for it to only contain things that are actually necessary for traversing the btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:02 -08:00
Kent Overstreet	a34a8bfd4e	bcache: Refactor journalling flow control Making things less asynchronous that don't need to be - bch_journal() only has to block when the journal or journal entry is full, which is emphatically not a fast path. So make it a normal function that just returns when it finishes, to make the code and control flow easier to follow. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:02 -08:00
Kent Overstreet	cdd972b164	bcache: Refactor read request code a bit More refactoring, and renaming. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:01 -08:00
Kent Overstreet	84f0db03ea	bcache: Refactor request_write() Try to improve some of the naming a bit to be more consistent, and also improve the flow of control in request_write() a bit. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:00 -08:00
Kent Overstreet	c2f95ae2eb	bcache: Clean up keylist code More random refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:00 -08:00
Kent Overstreet	4f3d40147b	bcache: Add explicit keylist arg to btree_insert() Some refactoring - better to explicitly pass stuff around instead of having it all in the "big bag of state", struct btree_op. Going to prune struct btree_op quite a bit over time. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:59 -08:00
Kent Overstreet	e7c590eb63	bcache: Convert btree_insert_check_key() to btree_insert_node() This was the main point of all this refactoring - now, btree_insert_check_key() won't fail just because the leaf node happened to be full. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:59 -08:00
Kent Overstreet	403b6cdeb1	bcache: Insert multiple keys at a time We'll often end up with a list of adjacent keys to insert - because bch_data_insert() may have to fragment the data it writes. Originally, to simplify things and avoid having to deal with corner cases, bch_btree_insert() would pass keys from this list one at a time to btree_insert_recurse() - mainly because the list of keys might span leaf nodes, so it was easier this way. With the btree_insert_node() refactoring, it's now a lot easier to just pass down the whole list and have btree_insert_recurse() iterate over leaf nodes until it's done. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:58 -08:00
Kent Overstreet	26c949f806	bcache: Add btree_insert_node() The flow of control in the old btree insertion code was rather - backwards; we'd recurse down the btree (in btree_insert_recurse()), and then if we needed to split the keys to be inserted into the parent node would be effectively returned up to btree_insert_recurse(), which would notice there was more work to do and finish the insertion. The main problem with this was that the full logic for btree insertion could only be used by calling btree_insert_recurse; if you'd gotten to a btree leaf some other way and had a key to insert, if it turned out that node needed to be split you were SOL. This inverts the flow of control so btree_insert_node() does _full_ btree insertion, including splitting - and takes a (leaf) btree node to insert into as a parameter. This means we can now _correctly_ handle cache misses - for cache misses, we need to insert a fake "check" key into the btree when we discover we have a cache miss - while we still have the btree locked. Previously, if the btree node was full inserting a cache miss would just fail. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:57 -08:00
Kent Overstreet	d6fd3b11ce	bcache: Explicitly track btree node's parent This is prep work for the reworked btree insertion code. The way we set b->parent is ugly and hacky... the problem is, when btree_split() or garbage collection splits or rewrites a btree node, the parent changes for all its (potentially already cached) children. I may change this later and add some code to look through the btree node cache and find all our cached child nodes and change the parent pointer then... Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:57 -08:00
Kent Overstreet	8304ad4dc8	bcache: Remove unnecessary check in should_split() Checking i->seq was redundant, because since ages ago we always initialize the new bset when advancing b->written Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:56 -08:00
Kent Overstreet	2d679fc756	bcache: Stripe size isn't necessarily a power of two Originally I got this right... except that the divides didn't use do_div(), which broke 32 bit kernels. When I went to fix that, I forgot that the raid stripe size usually isn't a power of two... doh Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:55 -08:00
Kent Overstreet	77c320eb46	bcache: Add on error panic/unregister setting Works kind of like the ext4 setting, to panic or remount read only on errors. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:55 -08:00
Kent Overstreet	49b1212dfa	bcache: Use blkdev_issue_discard() The old asynchronous discard code was really a relic from when all the allocation code was asynchronous - now that allocation runs out of a dedicated thread there's no point in keeping around all that complicated machinery. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:54 -08:00
Kent Overstreet	dd9ec84da5	bcache: Fix a lockdep splat bch_keybuf_del() takes a spinlock that can't be taken in interrupt context - whoops. Fortunately, this code isn't enabled by default (you have to toggle a sysfs thing). Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:54 -08:00
Kent Overstreet	7857d5d470	bcache: Fix a journalling performance bug	2013-11-10 21:55:53 -08:00
Kent Overstreet	1fa8455deb	bcache: Fix dirty_data accounting Dirty data accounting wasn't quite right - firstly, we were adding the key we're inserting after it could have merged with another dirty key already in the btree, and secondly we could sometimes pass the wrong offset to bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is important when tracking dirty data by stripe. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10	2013-11-10 21:55:27 -08:00
Kent Overstreet	d4eddd42f5	bcache: Fixed incorrect order of arguments to bio_alloc_bioset() Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-23 07:55:36 +01:00
Kent Overstreet	2fe80d3bbf	bcache: Fix a null ptr deref regression Commit `c0f04d88e4` ("bcache: Fix flushes in writeback mode") was fixing a reported data corruption bug, but it seems some last minute refactoring or rebasing introduced a null pointer deref. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Reported-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-10 18:17:39 -07:00
Kent Overstreet	c0f04d88e4	bcache: Fix flushes in writeback mode In writeback mode, when we get a cache flush we need to make sure we issue a flush to the backing device. The code for sending down an extra flush was wrong - by cloning the bio we were probably getting flags that didn't make sense for a bare flush, and also the old code was firing for FUA bios, for which we don't need to send a flush to the backing device. This was causing data corruption somehow - the mechanism was never determined, but this patch fixes it for the users that were seeing it. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Kent Overstreet	84786438ed	bcache: Fix for handling overlapping extents when reading in a btree node btree_sort_fixup() was overly clever, because it was trying to avoid pulling a key off the btree iterator in more than one place. This led to a really obscure bug where we'd break early from the loop in btree_sort_fixup() if the current key overlapped with keys in more than one older set, and the next key it overlapped with was zero size. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Kent Overstreet	a698e08c82	bcache: Fix a shrinker deadlock GFP_NOIO means we could be getting called recursively - mca_alloc() -> mca_data_alloc() - definitely can't use mutex_lock(bucket_lock) then. Whoops. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Kent Overstreet	79e3dab90d	bcache: Fix a dumb CPU spinning bug in writeback schedule_timeout() != schedule_timeout_uninterruptible() Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Kent Overstreet	1394d6761b	bcache: Fix a flush/fua performance bug bch_journal_meta() was missing the flush to make the journal write actually go down (instead of waiting up to journal_delay_ms)... Whoops Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Kent Overstreet	c2a4f3183a	bcache: Fix a writeback performance regression Background writeback works by scanning the btree for dirty data and adding those keys into a fixed size buffer, then for each dirty key in the keybuf writing it to the backing device. When read_dirty() finishes and it's time to scan for more dirty data, we need to wait for the outstanding writeback IO to finish - they still take up slots in the keybuf (so that foreground writes can check for them to avoid races) - without that wait, we'll continually rescan when we'll be able to add at most a key or two to the keybuf, and that takes locks that starves foreground IO. Doh. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Geert Uytterhoeven	61cbd250f8	bcache: Correct printf()-style format length modifier Fix drivers/md/bcache/btree.c: In function ‘bch_btree_node_read’: drivers/md/bcache/btree.c:259: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘size_t’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Kent Overstreet	c426c4fd46	bcache: Fix for when no journal entries are found The journal replay code didn't handle this case, causing it to go into an infinite loop... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Gabriel de Perthuis	aee6f1cfff	bcache: Strip endline when writing the label through sysfs sysfs attributes with unusual characters have crappy failure modes in Squeeze (udev 164); later versions of udev are unaffected. This should make these characters more unusual. Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Kent Overstreet	6d9d21e35f	bcache: Fix a dumb journal discard bug That switch statement was obviously wrong, leading to some sort of weird spinning on rare occasion with discards enabled... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-24 14:41:43 -07:00
Dave Chinner	7dc19d5aff	drivers: convert shrinkers to new count/scan API Convert the driver shrinkers to the new API. Most changes are compile tested only because I either don't have the hardware or it's staging stuff. FWIW, the md and android code is pretty good, but the rest of it makes me want to claw my eyes out. The amount of broken code I just encountered is mind boggling. I've added comments explaining what is broken, but I fear that some of the code would be best dealt with by being dragged behind the bike shed, burying in mud up to its neck and then run over repeatedly with a blunt lawn mower. Special mention goes to the zcache/zcache2 drivers. They can't co-exist in the build at the same time, they are under different menu options in menuconfig, they only show up when you've got the right set of mm subsystem options configured and so even compile testing is an exercise in pulling teeth. And that doesn't even take into account the horrible, broken code... [glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Kent Overstreet <koverstreet@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: David Rientjes <rientjes@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-09-10 18:56:32 -04:00
Linus Torvalds	d4c90b1b9f	Merge branch 'for-3.11/drivers' of git://git.kernel.dk/linux-block Pull block IO driver bits from Jens Axboe: "As I mentioned in the core block pull request, due to real life circumstances the driver pull request would be late. Now it looks like -rc2 late... On the plus side, apart from the rsxx update, these are all things that I could argue could go in later in the cycle as they are fixes and not features. So even though things are late, it's not ALL bad. The pull request contains: - Updates to bcache, all bug fixes, from Kent. - A pile of drbd bug fixes (no big features this time!). - xen blk front/back fixes. - rsxx driver updates, some of them deferred from 3.10. So should be well cooked by now" * 'for-3.11/drivers' of git://git.kernel.dk/linux-block: (63 commits) bcache: Allocation kthread fixes bcache: Fix GC_SECTORS_USED() calculation bcache: Journal replay fix bcache: Shutdown fix bcache: Fix a sysfs splat on shutdown bcache: Advertise that flushes are supported bcache: check for allocation failures bcache: Fix a dumb race bcache: Use standard utility code bcache: Update email address bcache: Delete fuzz tester bcache: Document shrinker reserve better bcache: FUA fixes drbd: Allow online change of al-stripes and al-stripe-size drbd: Constants should be UPPERCASE drbd: Ignore the exit code of a fence-peer handler if it returns too late drbd: Fix rcu_read_lock balance on error path drbd: fix error return code in drbd_init() drbd: Do not sleep inside rcu bcache: Refresh usage docs ...	2013-07-22 19:02:52 -07:00
Kent Overstreet	79826c35eb	bcache: Allocation kthread fixes The alloc kthread should've been using try_to_freeze() - and also there was the potential for the alloc kthread to get woken up after it had shut down, which would have been bad. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-07-12 00:22:49 -07:00
Kent Overstreet	29ebf465b9	bcache: Fix GC_SECTORS_USED() calculation Part of the job of garbage collection is to add up however many sectors of live data it finds in each bucket, but that doesn't work very well if it doesn't reset GC_SECTORS_USED() when it starts. Whoops. This wouldn't have broken anything horribly, but allocation tries to preferentially reclaim buckets that are mostly empty and that's not gonna work with an incorrect GC_SECTORS_USED() value. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10	2013-07-12 00:22:48 -07:00
Kent Overstreet	faa5673617	bcache: Journal replay fix The journal replay code starts by finding something that looks like a valid journal entry, then it does a binary search over the unchecked region of the journal for the journal entries with the highest sequence numbers. Trouble is, the logic was wrong - journal_read_bucket() returns true if it found journal entries we need, but if the range of journal entries we're looking for loops around the end of the journal - in that case journal_read_bucket() could return true when it hadn't found the highest sequence number we'd seen yet, and in that case the binary search did the wrong thing. Whoops. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10	2013-07-12 00:22:48 -07:00
Kent Overstreet	5caa52afc5	bcache: Shutdown fix Stopping a cache set is supposed to make it stop attached backing devices, but somewhere along the way that code got lost. Fixing this mainly has the effect of fixing our reboot notifier. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10	2013-07-12 00:22:47 -07:00
Kent Overstreet	c9502ea442	bcache: Fix a sysfs splat on shutdown If we stopped a bcache device when we were already detaching (or something like that), bcache_device_unlink() would try to remove a symlink from sysfs that was already gone because the bcache dev kobject had already been removed from sysfs. So keep track of whether we've removed stuff from sysfs. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10	2013-07-12 00:22:47 -07:00
Kent Overstreet	54d12f2b4f	bcache: Advertise that flushes are supported Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered out unless we say we support them... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10	2013-07-12 00:22:46 -07:00
Dan Carpenter	d2a65ce2ac	bcache: check for allocation failures There is a missing NULL check after the kzalloc(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2013-07-12 00:22:46 -07:00
Kent Overstreet	6aa8f1a6ca	bcache: Fix a dumb race In the far-too-complicated closure code - closures can have destructors, for probably dubious reasons; they get run after the closure is no longer waiting on anything but before dropping the parent ref, intended just for freeing whatever memory the closure is embedded in. Trouble is, when remaining goes to 0 and we've got nothing more to run - we also have to unlock the closure, setting remaining to -1. If there's a destructor, that unlock isn't doing anything - nobody could be trying to lock it if we're about to free it - but if the unlock _is_ needed... that check for a destructor was racy. Argh. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10	2013-07-12 00:22:33 -07:00
Linus Torvalds	80cc38b163	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "The usual stuff from trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) treewide: relase -> release Documentation/cgroups/memory.txt: fix stat file documentation sysctl/net.txt: delete reference to obsolete 2.4.x kernel spinlock_api_smp.h: fix preprocessor comments treewide: Fix typo in printk doc: device tree: clarify stuff in usage-model.txt. open firmware: "/aliasas" -> "/aliases" md: bcache: Fixed a typo with the word 'arithmetic' irq/generic-chip: fix a few kernel-doc entries frv: Convert use of typedef ctl_table to struct ctl_table sgi: xpc: Convert use of typedef ctl_table to struct ctl_table doc: clk: Fix incorrect wording Documentation/arm/IXP4xx fix a typo Documentation/networking/ieee802154 fix a typo Documentation/DocBook/media/v4l fix a typo Documentation/video4linux/si476x.txt fix a typo Documentation/virtual/kvm/api.txt fix a typo Documentation/early-userspace/README fix a typo Documentation/video4linux/soc-camera.txt fix a typo lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment ...	2013-07-04 11:40:58 -07:00
Kent Overstreet	8e51e414a3	bcache: Use standard utility code Some of bcache's utility code has made it into the rest of the kernel, so drop the bcache versions. Bcache used to have a workaround for allocating from a bio set under generic_make_request() (if you allocated more than once, the bios you already allocated would get stuck on current->bio_list when you submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT when allocating bios under generic_make_request() so that allocation could fail and it could retry from workqueue. But bio_alloc_bioset() has a workaround now, so we can drop this hack and the associated error handling. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-07-01 14:43:53 -07:00
Kent Overstreet	f3059a5461	bcache: Delete fuzz tester This code has rotted and it hasn't been used in ages anyways. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-07-01 14:43:48 -07:00
Kent Overstreet	36c9ea9837	bcache: Document shrinker reserve better Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-07-01 14:42:48 -07:00
Kent Overstreet	e49c7c374e	bcache: FUA fixes Journal writes need to be marked FUA, not just REQ_FLUSH. And btree node writes have... weird ordering requirements. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-07-01 14:42:47 -07:00
Gabriel de Perthuis	ab9e14002e	bcache: Send label uevents Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 21:58:06 -07:00
Gabriel de Perthuis	a25c32bede	bcache: Send a uevent with a cached device's UUID Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com>	2013-06-26 21:58:05 -07:00
Kent Overstreet	72c270612b	bcache: Write out full stripes Now that we're tracking dirty data per stripe, we can add two optimizations for raid5/6: * If a stripe is already dirty, force writes to that stripe to writeback mode - to help build up full stripes of dirty data * When flushing dirty data, preferentially write out full stripes first if there are any. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 21:58:04 -07:00
Kent Overstreet	279afbad4e	bcache: Track dirty data by stripe To make background writeback aware of raid5/6 stripes, we first need to track the amount of dirty data within each stripe - we do this by breaking up the existing sectors_dirty into per stripe atomic_ts Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 21:57:23 -07:00
Kent Overstreet	444fc0b6b1	bcache: Initialize sectors_dirty when attaching Previously, dirty_data wouldn't get initialized until the first garbage collection... which was a bit of a problem for background writeback (as the PD controller keys off of it) and also confusing for users. This is also prep work for making background writeback aware of raid5/6 stripes. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:09:16 -07:00
Kent Overstreet	6ded34d1a5	bcache: Improve lazy sorting The old lazy sorting code was kind of hacky - rewrite in a way that mathematically makes more sense; the idea is that the size of the sets of keys in a btree node should increase by a more or less fixed ratio from smallest to biggest. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:09:16 -07:00
Kent Overstreet	85b1492ee1	bcache: Rip out pkey()/pbtree() Old gcc doesn't like the struct hack, and it is kind of ugly. So finish off the work to convert pr_debug() statements to tracepoints, and delete pkey()/pbtree(). Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:09:15 -07:00
Kent Overstreet	c37511b863	bcache: Fix/revamp tracepoints The tracepoints were reworked to be more sensible, and fixed a null pointer deref in one of the tracepoints. Converted some of the pr_debug()s to tracepoints - this is partly a performance optimization; it used to be that with DEBUG or CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it was changed to an empty inline function. Some of the pr_debug() statements had rather expensive function calls as part of the arguments, so this code was getting run unnecessarily even on non debug kernels - in some fast paths, too. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:09:15 -07:00
Kent Overstreet	5794351146	bcache: Refactor btree io The most significant change is that btree reads are now done synchronously, instead of asynchronously and doing the post read stuff from a workqueue. This was originally done because we can't block on IO under generic_make_request(). But - we already have a mechanism to punt cache lookups to workqueue if needed, so if we just use that we don't have to deal with the complexity of doing things asynchronously. The main benefit is this makes the locking situation saner; we can hold our write lock on the btree node until we're finished reading it, and we don't need that btree_node_read_done() flag anymore. Also, for writes, btree_write() was broken out into btree_node_write() and btree_leaf_dirty() - the old code with the boolean argument was dumb and confusing. The prio_blocked mechanism was improved a bit too, now the only counter is in struct btree_write, we don't mess with transferring a count from struct btree anymore. This required changing garbage collection to block prios at the start and unblock when it finishes, which is cleaner than what it was doing anyways (the old code had mostly the same effect, but was doing it in a convoluted way). And the btree iter btree_node_read_done() uses was converted to a real mempool. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:09:14 -07:00
Kent Overstreet	119ba0f828	bcache: Convert allocator thread to kthread Using a workqueue when we just want a single thread is a bit silly. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:09:13 -07:00
Gabriel de Perthuis	a9dd53adbb	bcache: Warn when a device is already registered. Signed-off-by: Gabriel de Perthuis <g2p.code+bcache@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:08:52 -07:00
Kent Overstreet	bbc77aa7fb	bcache: fix a spurious gcc complaint, use scnprintf An old version of gcc was complaining about using a const int as the size of a stack allocated array. Which should be fine - but using ARRAY_SIZE() is better, anyways. Also, refactor the code to use scnprintf(). Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:06:33 -07:00
Kumar Amit Mehta	5c694129c8	md: bcache: io.c: fix a potential NULL pointer dereference bio_alloc_bioset returns NULL on failure. This fix adds a missing check for potential NULL pointer dereferencing. Signed-off-by: Kumar Amit Mehta <gmate.amit@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-06-26 17:06:19 -07:00
Phil Viana	48a73025cb	md: bcache: Fixed a typo with the word 'arithmetic' The word 'arithmetic' was typed as 'arithmatic' Signed-off-by: Phil Viana <phillip.l.viana@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-06-18 13:41:16 +02:00
Kent Overstreet	f59fce847f	bcache: Fix error handling in init code This code appears to have rotted... fix various bugs and do some refactoring. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-05-15 00:48:14 -07:00
Paul Bolle	bbb1c3b5ae	bcache: drop "select CLOSURES" The Kconfig entry for BCACHE selects CLOSURES. But there's no Kconfig symbol CLOSURES. That symbol was used in development versions of bcache, but was removed when the closures code was no longer provided as a kernel library. It can safely be dropped. Signed-off-by: Paul Bolle <pebolle@tiscali.nl>	2013-05-15 00:42:51 -07:00
Emil Goode	867e116206	bcache: Fix incompatible pointer type warning The function pointer release in struct block_device_operations should point to functions declared as void. Sparse warnings: drivers/md/bcache/super.c:656:27: warning: incorrect type in initializer (different base types) drivers/md/bcache/super.c:656:27: expected void ( *release )( ... ) drivers/md/bcache/super.c:656:27: got int ( static [toplevel] *<noident> )( ... ) drivers/md/bcache/super.c:656:2: warning: initialization from incompatible pointer type [enabled by default] drivers/md/bcache/super.c:656:2: warning: (near initialization for ‘bcache_ops.release’) [enabled by default] Signed-off-by: Emil Goode <emilgoode@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-05-15 00:42:50 -07:00
Kent Overstreet	ee66850642	bcache: Use bd_link_disk_holder() Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-30 19:14:43 -07:00
Kent Overstreet	86b26b824c	bcache: Allocator cleanup/fixes The main fix is that bch_allocator_thread() wasn't waiting on garbage collection to finish (if invalidate_buckets had set ca->invalidate_needs_gc); we need that to make sure the allocator doesn't spin and potentially block gc from finishing. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-30 19:14:40 -07:00
Kent Overstreet	8abb2a5dba	bcache: Make sure blocksize isn't smaller than device blocksize Sanity check to make sure we don't end up doing IO the device doesn't support. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-24 13:07:39 -07:00
Kent Overstreet	a09ded8edf	bcache: Fix merge_bvec_fn usage for when it modifies the bvm Stacked md devices reuse the bvm for the subordinate device, causing problems... Reported-by: Michael Balser <michael.balser@profitbricks.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-22 14:44:24 -07:00
Kent Overstreet	1545f13730	bcache: Correctly check against BIO_MAX_PAGES bch_bio_max_sectors() was checking against BIO_MAX_PAGES as if the limit was for the total bytes in the bio, not the number of segments. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-20 17:57:42 -07:00
Kent Overstreet	bca97adaf5	bcache: Hack around stuff that clones up to bi_max_vecs Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-20 17:57:41 -07:00
Kent Overstreet	4f0fd955cd	bcache: Set ra_pages based on backing device's ra_pages Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-20 17:57:26 -07:00
Kent Overstreet	2903381fce	bcache: Take data offset from the bdev superblock. Add a new superblock version, and consolidate related defines. Signed-off-by: Gabriel de Perthuis <g2p.code+bcache@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-20 17:56:12 -07:00
Kent Overstreet	cef5279735	bcache: Disable broken btree fuzz tester Reported-by: <sasha.levin@oracle.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-08 13:33:49 -07:00
Kent Overstreet	91bbcfc361	bcache: Fix a format string overflow Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-08 13:33:49 -07:00
Kent Overstreet	8ef747909c	bcache: Fix a minor memory leak on device teardown Reported-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-08 13:33:48 -07:00
Kent Overstreet	cc0f4eaa61	bcache: Use WARN_ONCE() instead of __WARN() Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-08 13:33:48 -07:00
Geert Uytterhoeven	cd953ed036	bcache: Add missing #include <linux/prefetch.h> m68k/allmodconfig: drivers/md/bcache/bset.c: In function ‘bset_search_tree’: drivers/md/bcache/bset.c:727: error: implicit declaration of function ‘prefetch’ drivers/md/bcache/btree.c: In function ‘bch_btree_node_get’: drivers/md/bcache/btree.c:933: error: implicit declaration of function ‘prefetch’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-08 13:33:48 -07:00
Kent Overstreet	c19ed23a0b	bcache: Sparse fixes Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-04-08 13:33:48 -07:00
Kent Overstreet	169ef1cf61	bcache: Don't export utility code, prefix with bch_ Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-03-28 12:50:55 -06:00
Kent Overstreet	29177b8966	bcache: Fix for the build fixes Commit 82a84eaf7e51ba3da0c36cbc401034a4e943492d left a return 0 in closure_debug_init(). Whoops. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-03-25 19:36:39 -06:00
Kent Overstreet	b1a67b0f4c	bcache: Style/checkpatch fixes Took out some nested functions, and fixed some more checkpatch complaints. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-03-25 13:06:13 -06:00
Kent Overstreet	07e86ccb54	bcache: Build fixes from test robot config: make ARCH=i386 allmodconfig All error/warnings: drivers/md/bcache/bset.c: In function 'bch_ptr_bad': >> drivers/md/bcache/bset.c:164:2: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/debug.c: In function 'bch_pbtree': >> drivers/md/bcache/debug.c:86:4: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/btree.c: In function 'bch_btree_read_done': >> drivers/md/bcache/btree.c:245:8: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/closure.o: In function `closure_debug_init': >> (.init.text+0x0): multiple definition of `init_module' >> drivers/md/bcache/super.o:super.c:(.init.text+0x0): first defined here Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-03-25 13:06:13 -06:00
Kent Overstreet	cafe563591	bcache: A block layer cache Does writethrough and writeback caching, handles unclean shutdown, and has a bunch of other nifty features motivated by real world usage. See the wiki at http://bcache.evilpiepirate.org for more. Signed-off-by: Kent Overstreet <koverstreet@google.com>	2013-03-23 16:11:31 -07:00


254 Commits