The list_lru infrastructure already keeps per-node LRU lists in its
node-specific list_lru_node arrays and provide us with a per-node API, and
the shrinkers are properly equiped with node information. This means that
we can now focus our shrinking effort in a single node, but the work that
is deferred from one run to another is kept global at nr_in_batch. Work
can be deferred, for instance, during direct reclaim under a GFP_NOFS
allocation, where situation, all the filesystem shrinkers will be
prevented from running and accumulate in nr_in_batch the amount of work
they should have done, but could not.
This creates an impedance problem, where upon node pressure, work deferred
will accumulate and end up being flushed in other nodes. The problem we
describe is particularly harmful in big machines, where many nodes can
accumulate at the same time, all adding to the global counter nr_in_batch.
As we accumulate more and more, we start to ask for the caches to flush
even bigger numbers. The result is that the caches are depleted and do
not stabilize. To achieve stable steady state behavior, we need to tackle
it differently.
In this patch we keep the deferred count per-node, in the new array
nr_deferred[] (the name is also a bit more descriptive) and will never
accumulate that to other nodes.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pass the node of the current zone being reclaimed to shrink_slab(),
allowing the shrinker control nodemask to be set appropriately for node
aware shrinkers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The list_lru implementation has one function, list_lru_dispose_all, with
only one user (the dentry code). At first, such function appears to make
sense because we are really not interested in the result of isolating each
dentry separately - all of them are going away anyway. However, it's
implementation is buggy in the following way:
When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries
marking them with DCACHE_SHRINK_LIST. However, this is done without the
nlru->lock taken. The imediate result of that is that someone else may
add or remove the dentry from the LRU at the same time. When list_lru_del
happens in that scenario we will see an element that is not yet marked
with DCACHE_SHRINK_LIST (even though it will be in the future) and
obviously remove it from an lru where the element no longer is. Since
list_lru_dispose_all will in effect count down nlru's nr_items and
list_lru_del will do the same, this will lead to an imbalance.
The solution for this would not be so simple: we can obviously just keep
the lru_lock taken, but then we have no guarantees that we will be able to
acquire the dentry lock (dentry->d_lock). To properly solve this, we need
a communication mechanism between the lru and dentry code, so they can
coordinate this with each other.
Such mechanism already exists in the form of the list_lru_walk_cb
callback. So it is possible to construct a dcache-side prune function
that does the right thing only by calling list_lru_walk in a loop until no
more dentries are available.
With only one user, plus the fact that a sane solution for the problem
would involve boucing between dcache and list_lru anyway, I see little
justification to keep the special case list_lru_dispose_all in tree.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adapts the list_lru API to accept an optional node argument, to
be used by NUMA aware shrinking functions. Code that does not care about
the NUMA placement of objects can still call into the very same functions
as before. They will simply iterate over all nodes.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The LRU_RETRY code assumes that the list traversal status after we have
dropped and regained the list lock. Unfortunately, this is not a valid
assumption, and that can lead to racing traversals isolating objects that
the other traversal expects to be the next item on the list.
This is causing problems with the inode cache shrinker isolation, with
races resulting in an inode on a dispose list being "isolated" because a
racing traversal still thinks it is on the LRU. The inode is then never
reclaimed and that causes hangs if a subsequent lookup on that inode
occurs.
Fix it by always restarting the list walk on a LRU_RETRY return from the
isolate callback. Avoid the possibility of livelocks the current code was
trying to avoid by always decrementing the nr_to_walk counter on retries
so that even if we keep hitting the same item on the list we'll eventually
stop trying to walk and exit out of the situation causing the problem.
Reported-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we have an LRU list API, we can start to enhance the
implementation. This splits the single LRU list into per-node lists and
locks to enhance scalability. Items are placed on lists according to the
node the memory belongs to. To make scanning the lists efficient, also
track whether the per-node lists have entries in them in a active
nodemask.
Note: We use a fixed-size array for the node LRU, this struct can be very
big if MAX_NUMNODES is big. If this becomes a problem this is fixable by
turning this into a pointer and dynamically allocating this to
nr_node_ids. This quantity is firwmare-provided, and still would provide
room for all nodes at the cost of a pointer lookup and an extra
allocation. Because that allocation will most likely come from a may very
well fail.
[glommer@openvz.org: fix warnings, added note about node lru]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When removing an element from the lru, this will be done today after the lock
is released. This is a clear mistake, although we are not sure if the bugs we
are seeing are related to this. All list manipulations are done inside the
lock, and so should this one.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Tested-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Several subsystems use the same construct for LRU lists - a list head, a
spin lock and and item count. They also use exactly the same code for
adding and removing items from the LRU. Create a generic type for these
LRU lists.
This is the beginning of generic, node aware LRUs for shrinkers to work
with.
[glommer@openvz.org: enum defined constants for lru. Suggested by gthelen, don't relock over retry]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert superblock shrinker to use the new count/scan API, and propagate
the API changes through to the filesystem callouts. The filesystem
callouts already use a count/scan API, so it's just changing counters to
longs to match the VM API.
This requires the dentry and inode shrinker callouts to be converted to
the count/scan API. This is mainly a mechanical change.
[glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The current shrinker callout API uses an a single shrinker call for
multiple functions. To determine the function, a special magical value is
passed in a parameter to change the behaviour. This complicates the
implementation and return value specification for the different
behaviours.
Separate the two different behaviours into separate operations, one to
return a count of freeable objects in the cache, and another to scan a
certain number of objects in the cache for freeing. In defining these new
operations, ensure the return values and resultant behaviours are clearly
defined and documented.
Modify shrink_slab() to use the new API and implement the callouts for all
the existing shrinkers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
One of the big problems with modifying the way the dcache shrinker and LRU
implementation works is that the LRU is abused in several ways. One of
these is shrink_dentry_list().
Basically, we can move a dentry off the LRU onto a different list without
doing any accounting changes, and then use dentry_lru_prune() to remove it
from what-ever list it is now on to do the LRU accounting at that point.
This makes it -really hard- to change the LRU implementation. The use of
the per-sb LRU lock serialises movement of the dentries between the
different lists and the removal of them, and this is the only reason that
it works. If we want to break up the dentry LRU lock and lists into, say,
per-node lists, we remove the only serialisation that allows this lru
list/dispose list abuse to work.
To make this work effectively, the dispose list has to be isolated from
the LRU list - dentries have to be removed from the LRU *before* being
placed on the dispose list. This means that the LRU accounting and
isolation is completed before disposal is started, and that means we can
change the LRU implementation freely in future.
This means that dentries *must* be marked with DCACHE_SHRINK_LIST when
they are placed on the dispose list so that we don't think that parent
dentries found in try_prune_one_dentry() are on the LRU when the are
actually on the dispose list. This would result in accounting the dentry
to the LRU a second time. Hence dentry_lru_del() has to handle the
DCACHE_SHRINK_LIST case
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With the dentry LRUs being per-sb structures, there is no real need for
a global dentry_lru_lock. The locking can be made more fine-grained by
moving to a per-sb LRU lock, isolating the LRU operations of different
filesytsems completely from each other. The need for this is independent
of any performance consideration that may arise: in the interest of
abstracting the lru operations away, it is mandatory that each lru works
around its own lock instead of a global lock for all of them.
[glommer@openvz.org: updated changelog ]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Before we split up the dcache_lru_lock, the unused dentry counter needs to
be made independent of the global dcache_lru_lock. Convert it to per-cpu
counters to do this.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The sysctl knob sysctl_vfs_cache_pressure is used to determine which
percentage of the shrinkable objects in our cache we should actively try
to shrink.
It works great in situations in which we have many objects (at least more
than 100), because the aproximation errors will be negligible. But if
this is not the case, specially when total_objects < 100, we may end up
concluding that we have no objects at all (total / 100 = 0, if total <
100).
This is certainly not the biggest killer in the world, but may matter in
very low kernel memory situations.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This series reworks our current object cache shrinking infrastructure in
two main ways:
* Noticing that a lot of users copy and paste their own version of LRU
lists for objects, we put some effort in providing a generic version.
It is modeled after the filesystem users: dentries, inodes, and xfs
(for various tasks), but we expect that other users could benefit in
the near future with little or no modification. Let us know if you
have any issues.
* The underlying list_lru being proposed automatically and
transparently keeps the elements in per-node lists, and is able to
manipulate the node lists individually. Given this infrastructure, we
are able to modify the up-to-now hammer called shrink_slab to proceed
with node-reclaim instead of always searching memory from all over like
it has been doing.
Per-node lru lists are also expected to lead to less contention in the lru
locks on multi-node scans, since we are now no longer fighting for a
global lock. The locks usually disappear from the profilers with this
change.
Although we have no official benchmarks for this version - be our guest to
independently evaluate this - earlier versions of this series were
performance tested (details at
http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
visible performance regressions while yielding a better qualitative
behavior in NUMA machines.
With this infrastructure in place, we can use the list_lru entry point to
provide memcg isolation and per-memcg targeted reclaim. Historically,
those two pieces of work have been posted together. This version presents
only the infrastructure work, deferring the memcg work for a later time,
so we can focus on getting this part tested. You can see more about the
history of such work at http://lwn.net/Articles/552769/
Dave Chinner (18):
dcache: convert dentry_stat.nr_unused to per-cpu counters
dentry: move to per-sb LRU locks
dcache: remove dentries from LRU before putting on dispose list
mm: new shrinker API
shrinker: convert superblock shrinkers to new API
list: add a new LRU list type
inode: convert inode lru list to generic lru list code.
dcache: convert to use new lru list infrastructure
list_lru: per-node list infrastructure
shrinker: add node awareness
fs: convert inode and dentry shrinking to be node aware
xfs: convert buftarg LRU to generic code
xfs: rework buffer dispose list tracking
xfs: convert dquot cache lru to list_lru
fs: convert fs shrinkers to new scan/count API
drivers: convert shrinkers to new count/scan API
shrinker: convert remaining shrinkers to count/scan API
shrinker: Kill old ->shrink API.
Glauber Costa (7):
fs: bump inode and dentry counters to long
super: fix calculation of shrinkable objects for small numbers
list_lru: per-node API
vmscan: per-node deferred work
i915: bail out earlier when shrinker cannot acquire mutex
hugepage: convert huge zero page shrinker to new shrinker API
list_lru: dynamically adjust node arrays
This patch:
There are situations in very large machines in which we can have a large
quantity of dirty inodes, unused dentries, etc. This is particularly true
when umounting a filesystem, where eventually since every live object will
eventually be discarded.
Dave Chinner reported a problem with this while experimenting with the
shrinker revamp patchset. So we believe it is time for a change. This
patch just moves int to longs. Machines where it matters should have a
big long anyway.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For a long time no filesystem has been using vfs_follow_link, and as seen
by recent filesystem submissions any new use is accidental as well.
Remove vfs_follow_link, document the replacement in
Documentation/filesystems/porting and also rename __vfs_follow_link
to match its only caller better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs pile 3 (of many) from Al Viro:
"Waiman's conversion of d_path() and bits related to it,
kern_path_mountpoint(), several cleanups and fixes (exportfs
one is -stable fodder, IMO).
There definitely will be more... ;-/"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
split read_seqretry_or_unlock(), convert d_walk() to resulting primitives
dcache: Translating dentry into pathname without taking rename_lock
autofs4 - fix device ioctl mount lookup
introduce kern_path_mountpoint()
rename user_path_umountat() to user_path_mountpoint_at()
take unlazy_walk() into umount_lookup_last()
Kill indirect include of file.h from eventfd.h, use fdget() in cgroup.c
prune_super(): sb->s_op is never NULL
exportfs: don't assume that ->iterate() won't feed us too long entries
afs: get rid of redundant ->d_name.len checks
When I moved the RCU walk termination into unlazy_walk(), I didn't copy
quite all of it: for the successful RCU termination we properly add the
necessary reference counts to our temporary copy of the root path, but
for the failure case we need to make sure that any temporary root path
information is cleared out (since it does _not_ have the proper
reference counts from the RCU lookup).
We could clean up this mess by just always dropping the temporary root
information, but Al points out that that would mean that a single lookup
through symlinks could see multiple different root entries if it races
with another thread doing chroot. Not that I think we should really
care (we had that before too, back before we had a copy of the root path
in the nameidata).
Al says he has a cunning plan. In the meantime, this is the minimal fix
for the problem, even if it's not all that pretty.
Reported-by: Mace Moneta <moneta.mace@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Collection of random updates to the core and some end-driver fixups for
ioatdma and mv_xor:
* NUMA aware channel allocation
* Cleanup dmatest debugfs interface
* ioat: make raid-support Atom only
* mv_xor: big endian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSLmLbAAoJEB7SkWpmfYgCIHoQAK/RmD0fnK+NsqoPpKVqeo/q
4YL/+DNBwhC1edx5yD0+cyGydcd4Z74RdaoMJJg4naDhY4jjsBetNGJA7qH4ynbu
LrPesxF1g9Ayhgk5sFzGU66Bg1/SpaQA37/wCVbJI0fO84ZZy+snSBv6wEKCYdRo
Ot/65pB5zdbQd1zdESi3v9w9TFIRVYmMlKMoxfxjDdoTG+yI7R3ShICbSa2AS2zs
yOJA2kgMpQvwdufZCLqwfWzrHD9fwLEv8IxNKSdbUD63JUZwCb7kQ88D1S6VqeUV
EurcPiFjDckGpyZX+4rgB6/8iym4x41t1gFcsR65FlKxHtdcdsjTeFGsSbcIrzAi
FM5S+C5eJMGkEpWZWAQXHatkc5R7fr2eljSWeGomEECWhcPUBNmqJ2p55jCDj2Ex
4SV8wZRID1pcqOwMec/B7nSNzALBXMXh5sSGp9T74+dwZ0OLoHQ1aOmJzl2tDiL7
OF6si7XYgecYOnINDck4AAb6JfQKRIz+zFt18tIJbYL2F5yTumJD0Bw4DaeU0Zfy
muUMzWJJKuoQtYMs4CeD0HzsbhZH0jieTgrXXrHpCXFLxVTeVmVcxerfvXWanqrP
glMwDLcC4Dg+7JPiz8IGp9wcPppQqzOk7PS4ibPG5X0pm+SWUxL1Qd/0v5xnRcjN
HK17pMXS0oTFfcTl20qe
=ldij
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine
Pull dmaengine update from Dan Williams:
"Collection of random updates to the core and some end-driver fixups
for ioatdma and mv_xor:
- NUMA aware channel allocation
- Cleanup dmatest debugfs interface
- ioat: make raid-support Atom only
- mv_xor: big endian
Aside from the top three commits these have all had some soak time in
-next. The top commit fixes a recent build breakage.
It has been a long while since my last pull request, hopefully it does
not show. Thanks to Vinod for keeping an eye on drivers/dma/ this
past year"
* tag 'dmaengine-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
dmaengine: dma_sync_wait and dma_find_channel undefined
MAINTAINERS: update email for Dan Williams
dma: mv_xor: Fix incorrect error path
ioatdma: silence GCC warnings
dmaengine: make dma_channel_rebalance() NUMA aware
dmaengine: make dma_submit_error() return an error code
ioatdma: disable RAID on non-Atom platforms and reenable unaligned copies
mv_xor: support big endian systems using descriptor swap feature
mv_xor: use {readl, writel}_relaxed instead of __raw_{readl, writel}
dmatest: print message on debug level in case of no error
dmatest: remove IS_ERR_OR_NULL checks of debugfs calls
dmatest: make module parameters writable
dma_sync_wait and dma_find_channel are declared regardless of whether
CONFIG_DMA_ENGINE is enabled, but calling the function without
CONFIG_DMA_ENGINE enabled results "undefined reference" errors.
To get around this, declare dma_sync_wait and dma_find_channel as inline
functions if CONFIG_DMA_ENGINE is undefined.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
These are changes that arrived a little late before the merge window,
or had dependencies on previous branches.
Highlights:
- ux500: misc. cleanup, fixup I2C devices
- exynos: DT updates for RTC; PM updates
- at91: DT updates for NAND; new platforms added to generic defconfig
- sunxi: DT updates: cubieboard2, pinctrl driver, gated clocks
- highbank: LPAE fixes, select necessary ARM errata
- omap: PM fixes and improvements; OMAP5 mailbox support
- omap: basic support for new DRA7xx SoCs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJSLkf1AAoJEFk3GJrT+8ZlF7oP/AyxrdRFyC1YmuOqzFH0/JTQ
EVBmMBiH+f1IKBT6YRkWCzX4JI5oOi+2DhrM6d/UPfbpr6pwd8dptuPiyLuBBUEm
byNbiJEYHidm23oFpKM+89tTHXbBrrz8XQN2xLwYhNr24QkVAsLTxyOjVA7KJM59
tk1tPQzO1ORyiFd485eQa3V4z98JgcE3QFNthbS7Y72wEXBzMZQDc9nFaoIJ5mHW
nzJSZyV24ibeEJeM2nsc7a3OvCyUfAQaO5Cio2UvdkGzZcmtxjxc1LjHa4VjIL6h
hwz+gqIOfl3hXotbjJxTp9+Ezt4TGU5bB3NUweE1btHE/KIEu0bx4hSsOz/kooA9
2JL8BCCTx+KiGiNHmNCcT679n9q11iOwqOWvxxhcJFkiV/6+mkjwTD9TNwR1q+RG
+LtOZr9tMcu2v/DbAivDYKiROmNCZhxpn35DoUKpBy73SOvJOiTLtSYitVN/tyM3
nWLEP5aTf3NwrWr8nFFws6ycwhgTCX0ITbdFD/fMlLMamHYPkckJ/0NXXOxfGiLk
kCMbdrCX4YTbCftmAQhrbdaPJVnE/SZI3CTJfutj8eX6NC2fm/U7Hcf5PI+W0Igd
moN/PaUULpVZI5hUrADyU1HCQnA97pv0biYVwzW5pBIt2u9tzUritabuERxPt9fa
SdHj0+u+xq9d3y35Oq46
=NIZZ
-----END PGP SIGNATURE-----
Merge tag 'late-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC late changes from Kevin Hilman:
"These are changes that arrived a little late before the merge window,
or had dependencies on previous branches.
Highlights:
- ux500: misc. cleanup, fixup I2C devices
- exynos: DT updates for RTC; PM updates
- at91: DT updates for NAND; new platforms added to generic defconfig
- sunxi: DT updates: cubieboard2, pinctrl driver, gated clocks
- highbank: LPAE fixes, select necessary ARM errata
- omap: PM fixes and improvements; OMAP5 mailbox support
- omap: basic support for new DRA7xx SoCs"
* tag 'late-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (60 commits)
ARM: dts: vexpress: Add CCI node to TC2 device-tree
ARM: EXYNOS: Skip C1 cpuidle state for exynos5440
ARM: EXYNOS: always enable PM domains support for EXYNOS4X12
ARM: highbank: clean-up some unused includes
ARM: sun7i: Enable the A20 clocks in the DTSI
ARM: sun6i: Enable clock support in the DTSI
ARM: sun5i: dt: Use the A10s gates in the DTSI
ARM: at91: at91_dt_defconfig: enable rm9200 support
ARM: dts: add ADC device tree node for exynos5420/5250
ARM: dts: Add RTC DT node to Exynos5420 SoC
ARM: dts: Update the "status" property of RTC DT node for Exynos5250 SoC
ARM: dts: Fix the RTC DT node name for Exynos5250
irqchip: mmp: avoid to include irqs head file
ARM: mmp: avoid to include head file in mach-mmp
irqchip: mmp: support irqchip
irqchip: move mmp irq driver
ARM: OMAP: AM33xx: clock: Add RNG clock data
ARM: OMAP: TI81XX: add always-on powerdomain for TI81XX
ARM: OMAP4: clock: Lock PLLs in the right sequence
ARM: OMAP: AM33XX: hwmod: Add hwmod data for debugSS
...
Lots of cleanup and refactoring and some SMP additions for Renesas
platforms. Due to some inter-dependencies with other arm-soc
branches, this Renesas stuff was separated out for sending after the
other branches were merged.
Highlights:
- remove unused board support and cleanup of unused headers
- refactoring of init and device registration
- simplify IRQ initialization
Conflicts: Too many. Most of these are because Simon chose to send
some board updates through the V4L tree that ends up colliding with
the main platform changes. We'll work with him on sorting out his
workflow:
- arch/arm/boot/dts/r8a7740.dtsi:
- Add/add conflict in a devicetree file (keep both)
- arch/arm/mach-shmobile/Makefile:
- Splitting out of clock files collides with intc move to DT.
Keep HEAD version but remove intc-* files for R8A7740 and R8A7779.
- arch/arm/mach-shmobile/board-bockw.c:
- Keep HEAD but remove i2c, hspi and mmc device init calls
- arch/arm/mach-shmobile/board-marzen.c
- Remove mach/hardware.h include and r8a7779_add_usb_phy_device() call,
everything else stays.
- arch/arm/mach-shmobile/include/mach/r8a7778.h:
- From HEAD, Keep camera-rcar.h include and r8a7778_add_vin_device()
- From branch, keep everything
- arch/arm/mach-shmobile/include/mach/r8a7779.h:
- From HEAD, Keep only camera-rcar.h include and r8a7779_add_vin_device()
- arch/arm/mach-shmobile/setup-r8a7778.c
- Keep HEAD, but drop the MMC section (struct resource + add_mmc_device())
- take the new function name from our side (r8a7778_add_dt_devices())
- arch/arm/mach-shmobile/setup-r8a7779.c
- Keep HEAD, but drop r8a7779_add_usb_phy_device()
I've also pushed a test-merge2 branch where you can see how I resolved
them.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJSLkaWAAoJEFk3GJrT+8Zl5l8P/3oJ83VJHcjD2fMpAb8Dz5b6
pdRLxFLJrLxN+WTC1LylPom3EsSAJuuaG3Z8Cr9Xa6yoWuUYLy/A6MwsyLXBOGLC
3tVWa96xt1cHhd3p/NAOQwvRz/CFdMLM7MStd0mgSihj/pq3jtc2V697+dRtmJih
J0mIc8+jnig+uwVl1DMCmBqdEmasccaDZeX30PcjaPL9ZDyZBeSXI8brdDx8A21e
5RiAsqn9HCxrLZjedL9TWA23BJ7NccsI3aVGpQVtCa9N/MHKp8gZft3v8FrWzFjk
cOeaZY55Xq8hbbbmkF1LoezLrlQDF5aJcj6tl85lyuSJfR5d5BXmLswI7bglw8Qy
ZU7/aF28pbhUT3oozNuRx5yl8oqpxmUoCwfP5hfnFf590OJ3noIELbOoIZQkJQUy
LsCf3GMUQaWzrvs0IenM1lMJmw5zfDXbrUWUti95OAd5bbTdBE30z7EouejoKRVh
1/Wg4keBdtem4CpU+C4fUVtLL4XJhe/uadbjKteA7DRpTRMvrLYNutQgyOAuQjRM
RiLvDPnsIZEGV+YMsj+IemN0hanae4kR3v8At+HwvVK1hROWEQWyL6cGxXH9n9jB
VToxoYoyiAOk01X2BnPVMVTXEl5XMAVgZ1IgZNhnhaIUxi3HKrUfNG4oXE5Jx8Lw
XruUAUKHknTUJ2Q/3y4D
=Fagt
-----END PGP SIGNATURE-----
Merge tag 'renesas-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM Renesas SoC cleanup, refactoring and more SMP support from Kevin Hilman:
"Lots of cleanup and refactoring and some SMP additions for Renesas
platforms. Due to some inter-dependencies with other arm-soc
branches, this Renesas stuff was separated out for sending after the
other branches were merged.
Highlights:
- remove unused board support and cleanup of unused headers
- refactoring of init and device registration
- simplify IRQ initialization"
* tag 'renesas-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (68 commits)
ARM: shmobile: Per-CPU SMP boot / sleep code for SCU SoCs
ARM: shmobile: Introduce per-CPU SMP boot / sleep code
ARM: shmobile: Use shared SCU CPU Hotplug code on r8a7779
ARM: shmobile: Use shared SCU CPU Hotplug code on sh73a0
ARM: shmobile: Add shared SCU CPU Hotplug code
ARM: shmobile: Use shared SCU SMP boot code on emev2
ARM: shmobile: Use shared SCU SMP boot code on r8a7779
ARM: shmobile: Use shared SCU SMP boot code on sh73a0
ARM: shmobile: Introduce shared SCU SMP boot code
ARM: shmobile: sh73a0: Remove global GPIO_NR definition
ARM: shmobile: kzm9d: remove nfsroot settings from bootargs
ARM: shmobile: armadillo800eva: remove nfsroot settings from bootargs
ARM: shmobile: r8a7779: move r8a7779_init_irq_xxx() to setup
ARM: shmobile: r8a7740: move r8a7740_init_irq_of() to setup
ARM: shmobile: bockw: add missing __initdata
ARM: shmobile: r8a7790: add missing __initdata
ARM: shmobile: r8a7779: add missing __initdata
ARM: shmobile: Remove unused shmobile_init_time()
ARM: shmobile: Use clocksource_of_init() on r8a7790
ARM: shmobile: Use default ->init_time() on KZM9G DT ref
...
This branch contains ARM SoC related driver updates for v3.12. The
only thing this cycle are core PM updates and CPUidle support for
ARM's TC2 big.LITTLE development platform.
Conflicts:
One cleanup/reorg conflict with a new entry in
drivers/cpuidle/Makefile. Append the new entry after the existing
ones. A follow up patch for v3.12-rc will make the new entry conform
to the cleanup/reorg.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJSLjatAAoJEFk3GJrT+8Zl32sP/Aw2iEXd/5DUvcp6y/qZoAjO
oLhCPviEnQCpz4smFFySBLvvKyVyA7oOMet8nelIJhwHCTNMBpJZHIfcvpIP5uBY
6LLpFUw4m7TqOISwpVXlwc/3CuG76QCrITLJmButq6tHF4udHeAur+pAnNHoaoys
O5arRMLvl5C4rREeiZctTv5JARICCxIcHpweQdtt+MZ03yG78fEfSB9XxvyOlhh0
OJnGcqU07fIXw9kT/9KAnR3Ql7JJsdzlXqLq6/wFWPe5a1KtgxHNXPbtWaxl8JWW
cPSQci+n9iWgxKzoQTGyQO6sfkDHcol3izMeCScMwlx05SMPwofXpYitaPHLF1cy
PtJosSMVQvJPrHyGlY4vhD9mtCIcyOmlwSlZ6dOf7oqXMhT9CPJe2UD/8JZWgXBi
imY/vpU8mgZT315rQmc/Khg721VNKcSuIvP6xUS9PuaSMUrPSCJFbbkckHGnzdC7
XVFCui9gFxa7vMN+CzrZRqfZnjJ7ujuiFDauMzltu0iBiPNXkAfyoqbxMqUP1HJ5
pdU84vuEVjsUdWt9ivJs6I6cqIwroeji9HZzZnWkWyoDgtAjxhDFVXydqlhrZsuJ
O3uErP8fjRtloFa2iLDZfawPpHDFsY4F+Nm09rZLO7RE4ELlYlQGfYEwuIh+kZ16
nLPE/V5DYrBVyNGDouKx
=FvQD
-----END PGP SIGNATURE-----
Merge tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver update from Kevin Hilman:
"This contains the ARM SoC related driver updates for v3.12. The only
thing this cycle are core PM updates and CPUidle support for ARM's TC2
big.LITTLE development platform"
* tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
cpuidle: big.LITTLE: vexpress-TC2 CPU idle driver
ARM: vexpress: tc2: disable GIC CPU IF in tc2_pm_suspend
drivers: irq-chip: irq-gic: introduce gic_cpu_if_down()
patches, both new drivers and fixes to existing. A high percentage of
these are for Samsung platforms like Exynos. Core framework fixes and
some new features like automagical clock re-parenting round out the
patches.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSLkImAAoJEDqPOy9afJhJOjsP/Ri26AW7XB9pPWJRSU9REBZA
31wxcFo2T+PNir9duwDwjFBFycC3MisaKFlg7D134M+7txbYqm1TRvfu9OEDxpSP
4b/Yl6TarN4dhCN2R+BREO8PnxCBVpspDcsdh6Esuwuet2xUom3UtN8yvSjhPP/u
qGNmXQYXyQy4fom5r+GsDVW+HIhLkaX9b0fYc9EN/bqfgv94PMZAxAxsK9CroAGZ
0m0g9ZXw9iSvVfz+iQEqPINtvpTLHk0FGyimoSR7kvW4o4o47tVtLEWp7VjG6mr5
zvBsycaQq6NgxPu96iUWWhsO9Uj2I7/7JgidXF7r+wvEFs1mcgZtkkirSA/n4zUN
C8a87rvQrZRLr+xXhVuqiVHCgCY8vXoHqkWg6SrZ62ORL8C7uYRpog5SEe2ZzLJX
l5uGAsDM6el+Uc/YviCPoZbeFr3h3CQvvFo8+i2eN0v/Phf30rq4lotBvpQj894G
ngEIMj+D8wshdYSF2dNJ0rLnkLHTgCbiA28L6Cl5TRzRMj3Uaj9aT3cmoLUnimZu
7F7nWU4Iu/vzQKCTQ+eTvwxXJqIlE0JeVbJilqH1f2a68JdXP1LOId+2w/CP8gqQ
i2odj6JHMgBzM9rNs+y0Ir9X/bXIVi6F341c19Nl15srEiLLl8xQIpcPDaI/Kvzs
pefYgF2yS5AZAW3ac90r
=5GfA
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus-3.12' of git://git.linaro.org/people/mturquette/linux
Pull clock framework changes from Michael Turquette:
"The common clk framework changes for 3.12 are dominated by clock
driver patches, both new drivers and fixes to existing. A high
percentage of these are for Samsung platforms like Exynos. Core
framework fixes and some new features like automagical clock
re-parenting round out the patches"
* tag 'clk-for-linus-3.12' of git://git.linaro.org/people/mturquette/linux: (102 commits)
clk: only call get_parent if there is one
clk: samsung: exynos5250: Simplify registration of PLL rate tables
clk: samsung: exynos4: Register PLL rate tables for Exynos4x12
clk: samsung: exynos4: Register PLL rate tables for Exynos4210
clk: samsung: exynos4: Reorder registration of mout_vpllsrc
clk: samsung: pll: Add support for rate configuration of PLL46xx
clk: samsung: pll: Use new registration method for PLL46xx
clk: samsung: pll: Add support for rate configuration of PLL45xx
clk: samsung: pll: Use new registration method for PLL45xx
clk: samsung: exynos4: Rename exynos4_plls to exynos4x12_plls
clk: samsung: exynos4: Remove checks for DT node
clk: samsung: exynos4: Remove unused static clkdev aliases
clk: samsung: Modify _get_rate() helper to use __clk_lookup()
clk: samsung: exynos4: Use separate aliases for cpufreq related clocks
clocksource: samsung_pwm_timer: Get clock from device tree
ARM: dts: exynos4: Specify PWM clocks in PWM node
pwm: samsung: Update DT bindings documentation to cover clocks
clk: Move symbol export to proper location
clk: fix new_parent dereference before null check
clk: wm831x: Initialise wm831x pointer on init
...
are still in flux, and will have to wait for 3.13.
The changes for 3.12 are mostly clean ups and minor fixes.
H. Peter Anvin added a check to x86_32 static function tracing that
helps a small segment of the kernel community.
Oleg Nesterov had a few changes from 3.11, but were mostly clean ups
and not worth pushing in the -rc time frame.
Li Zefan had small clean up with annotating a raw_init with __init.
I fixed a slight race in updating function callbacks, but the race
is so small and the bug that happens when it occurs is so minor it's
not even worth pushing to stable.
The only real enhancement is from Alexander Z Lam that made the
tracing_cpumask work for trace buffer instances, instead of them all
sharing a global cpumask.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSLJm1AAoJEOdOSU1xswtMSu0H/0/Uuh0D5VhANZRcTATY4gUO
n3WH6sm3atOxH+cbeYQcFXxOcvRcR2n90tvCMpiFlPiC0NiNR1yjro3VLS4zWb77
twq7gABdJf+Tdq7sOBmSzmY5vRKQVHIXvAfC27mBez38nCWZz0BjJGEsPBwoly25
ZaiCbKlusw/QKIEy40tuKUL/rXF6yEWnQrMujhBbyNm0w7sJVdfnd+HHmCvy15H2
IQE1g83d/dAMBjFY2BYg77J+oV6qmJxql2itvDivQWXHqFb52Jw3ZTwHwWLZlPYU
AZcHtYGs2lSUscQLF56LejB7zZyE8taUufExFEVexXxZS5u7nNPXsPrA2LOOK70=
=JWO6
-----END PGP SIGNATURE-----
Merge tag 'trace-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Not much changes for the 3.12 merge window. The major tracing changes
are still in flux, and will have to wait for 3.13.
The changes for 3.12 are mostly clean ups and minor fixes.
H Peter Anvin added a check to x86_32 static function tracing that
helps a small segment of the kernel community.
Oleg Nesterov had a few changes from 3.11, but were mostly clean ups
and not worth pushing in the -rc time frame.
Li Zefan had small clean up with annotating a raw_init with __init.
I fixed a slight race in updating function callbacks, but the race is
so small and the bug that happens when it occurs is so minor it's not
even worth pushing to stable.
The only real enhancement is from Alexander Z Lam that made the
tracing_cpumask work for trace buffer instances, instead of them all
sharing a global cpumask"
* tag 'trace-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/rcu: Do not trace debug_lockdep_rcu_enabled()
x86-32, ftrace: Fix static ftrace when early microcode is enabled
ftrace: Fix a slight race in modifying what function callback gets traced
tracing: Make tracing_cpumask available for all instances
tracing: Kill the !CONFIG_MODULES code in trace_events.c
tracing: Don't pass file_operations array to event_create_dir()
tracing: Kill trace_create_file_ops() and friends
tracing/syscalls: Annotate raw_init function with __init
In __clk_init(), after a clock is mostly initialized, a scan is done
of the orphan clocks to see if the clock being registered is the
parent of any of them.
This code assumes that any clock that provides a get_parent method
actually has at least one parent, and that's not a valid assumption.
As a result, an orphan clock with no parent can return *something*
as the parent index, and that value is blindly used to dereference
the orphan's parent_names[] array (which will be ZERO_SIZE_PTR or
NULL).
Fix this by ensuring get_parent is only called for orphans with at
least one parent.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
Separate "check if we need to retry" from "unlock if we are done and
had seq_writelock"; that allows to use these guys in d_walk(), where
we need to recheck every time we ascend back to parent, but do *not*
want to unlock until the very end. Lift rcu_read_lock/rcu_read_unlock
out into callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For 3.12-rc1 there are a number of bugfixes in addition to work to ease usage
of shared code between libxfs and the kernel, the rest of the work to enable
project and group quotas to be used simultaneously, performance optimisations
in the log and the CIL, directory entry file type support, fixes for log space
reservations, some spelling/grammar cleanups, and the addition of user
namespace support.
- introduce readahead to log recovery
- add directory entry file type support
- fix a number of spelling errors in comments
- introduce new Q_XGETQSTATV quotactl for project quotas
- add USER_NS support
- log space reservation rework
- CIL optimisations
- kernel/userspace libxfs rework
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJSLeikAAoJENaLyazVq6ZOciEP/3tc850sQsPlNwP9aqd1l2Wk
S1RJ8i+MUQ2W/PlbswCXvdUCT8DIwXWxL31tGvi8vtaLhh6t8ICSZwqNil+/GCIJ
BErVvY4oXhEMHhlbIRRvpxblTfJGiYy3puUEz9VI0yDdUVnC33+DuEeLTQ/0mibo
/UUqKFmM3KYpOc8vIQvH5K5i8PkjtMt9yge0k4l9COD30gtY2okkaD4b1voOsKc+
5YFqulq7zcXBUYti+EFCQeV8aUBTGEPN4PJRdcS12/ylzsTzZivAOO+QREu7qBW8
x+Gj8fOC+yYWCttmJlfa1n8taxge3ndEuzKN97nvvfQgjvvunMvwJ499skryYVdB
EcPnBnpDUQuz/y7exKBT9uROK817vZBtfHzSova29ayQSWC+qDpNE4xXeDIqeCtT
CPxdHuWMOvIdZg41E4x7je0elaZl8EAZ8hycc2WuRhtukEkIdE1O8aD7IVrMYee8
kg+aVHG5nmYRInO1WuMinbtiCzwvVoBJToWM3y4cbfgW0dILASRyL53HDd+eCr1j
kOpPIVgXlBZgiPMmdYahWxyVVWcE7zyex0w4frzWVlJMZ4lP5brppD6qfQg1JwOB
z21Y95F5C2GxSyN/Lwps0G6jujHrpe6GVeYK7uKCtnqTD83nSShv5Naln7pQ3AUs
qUMsqmJob4+bwt94Xgbx
=V4s4
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs
Pull xfs updates from Ben Myers:
"For 3.12-rc1 there are a number of bugfixes in addition to work to
ease usage of shared code between libxfs and the kernel, the rest of
the work to enable project and group quotas to be used simultaneously,
performance optimisations in the log and the CIL, directory entry file
type support, fixes for log space reservations, some spelling/grammar
cleanups, and the addition of user namespace support.
- introduce readahead to log recovery
- add directory entry file type support
- fix a number of spelling errors in comments
- introduce new Q_XGETQSTATV quotactl for project quotas
- add USER_NS support
- log space reservation rework
- CIL optimisations
- kernel/userspace libxfs rework"
* tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
xfs: finish removing IOP_* macros.
xfs: inode log reservations are too small
xfs: check correct status variable for xfs_inobt_get_rec() call
xfs: inode buffers may not be valid during recovery readahead
xfs: check LSN ordering for v5 superblocks during recovery
xfs: btree block LSN escaping to disk uninitialised
XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
xfs: fix bad dquot buffer size in log recovery readahead
xfs: don't account buffer cancellation during log recovery readahead
xfs: check for underflow in xfs_iformat_fork()
xfs: xfs_dir3_sfe_put_ino can be static
xfs: introduce object readahead to log recovery
xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
xfs: Register hotcpu notifier after initialization
xfs: add xfs sb v4 support for dirent filetype field
xfs: Add write support for dirent filetype field
xfs: Add read-only support for dirent filetype field
...
Not using the return value can in the generic case be racy, so it's
in general good practice to check the return value instead.
This also resolved the warning caused on ARM and other architectures:
fs/direct-io.c: In function 'sb_init_dio_done_wq':
fs/direct-io.c:557:2: warning: value computed is not used [-Wunused-value]
Signed-off-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: H Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When running the AIM7's short workload, Linus' lockref patch eliminated
most of the spinlock contention. However, there were still some left:
8.46% reaim [kernel.kallsyms] [k] _raw_spin_lock
|--42.21%-- d_path
| proc_pid_readlink
| SyS_readlinkat
| SyS_readlink
| system_call
| __GI___readlink
|
|--40.97%-- sys_getcwd
| system_call
| __getcwd
The big one here is the rename_lock (seqlock) contention in d_path()
and the getcwd system call. This patch will eliminate the need to take
the rename_lock while translating dentries into the full pathnames.
The need to take the rename_lock is to make sure that no rename
operation can be ongoing while the translation is in progress. However,
only one thread can take the rename_lock thus blocking all the other
threads that need it even though the translation process won't make
any change to the dentries.
This patch will replace the writer's write_seqlock/write_sequnlock
sequence of the rename_lock of the callers of the prepend_path() and
__dentry_path() functions with the reader's read_seqbegin/read_seqretry
sequence within these 2 functions. As a result, the code will have to
retry if one or more rename operations had been performed. In addition,
RCU read lock will be taken during the translation process to make sure
that no dentries will go away. To prevent live-lock from happening,
the code will switch back to take the rename_lock if read_seqretry()
fails for three times.
To further reduce spinlock contention, this patch does not take the
dentry's d_lock when copying the filename from the dentries. Instead,
it treats the name pointer and length as unreliable and just copy
the string byte-by-byte over until it hits a null byte or the end of
string as specified by the length. This should avoid stepping into
invalid memory address. The error cases are left to be handled by
the sequence number check.
The following code re-factoring are also made:
1. Move prepend('/') into prepend_name() to remove one conditional
check.
2. Move the global root check in prepend_path() back to the top of
the while loop.
With this patch, the _raw_spin_lock will now account for only 1.2%
of the total CPU cycles for the short workload. This patch also has
the effect of reducing the effect of running perf on its profile
since the perf command itself can be a heavy user of the d_path()
function depending on the complexity of the workload.
When taking the perf profile of the high-systime workload, the amount
of spinlock contention contributed by running perf without this patch
was about 16%. With this patch, the spinlock contention caused by
the running of perf will go away and we will have a more accurate
perf profile.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- nand-gpio cleanup and portability to non-ARM
- m25p80 support for 4-byte addressing chips, other new chips
- pxa3xx cleanup and support for new platforms
- remove obsolete alauda, octagon-5066 drivers
- erase/write support for bcm47xxsflash
- improve detection of ECC requirements for NAND, controller setup
- NFC acceleration support for atmel-nand, read/write via SRAM
- etc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iEYEABECAAYFAlIt3VsACgkQdwG7hYl686NKBQCgqFzV1S+TiXZSjbqanK0374Hu
lMIAoI2IOQb1bgQFSqaOzgSaBmVhLtMc
=xWBt
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20130909' of git://git.infradead.org/linux-mtd
Pull mtd updates from David Woodhouse:
- factor out common code from MTD tests
- nand-gpio cleanup and portability to non-ARM
- m25p80 support for 4-byte addressing chips, other new chips
- pxa3xx cleanup and support for new platforms
- remove obsolete alauda, octagon-5066 drivers
- erase/write support for bcm47xxsflash
- improve detection of ECC requirements for NAND, controller setup
- NFC acceleration support for atmel-nand, read/write via SRAM
- etc
* tag 'for-linus-20130909' of git://git.infradead.org/linux-mtd: (184 commits)
mtd: chips: Add support for PMC SPI Flash chips in m25p80.c
mtd: ofpart: use for_each_child_of_node() macro
mtd: mtdswap: replace strict_strtoul() with kstrtoul()
mtd cs553x_nand: use kzalloc() instead of memset
mtd: atmel_nand: fix error return code in atmel_nand_probe()
mtd: bcm47xxsflash: writing support
mtd: bcm47xxsflash: implement erasing support
mtd: bcm47xxsflash: convert to module_platform_driver instead of init/exit
mtd: bcm47xxsflash: convert kzalloc to avoid invalid access
mtd: remove alauda driver
mtd: nand: mxc_nand: mark 'const' properly
mtd: maps: cfi_flagadm: add missing __iomem annotation
mtd: spear_smi: add missing __iomem annotation
mtd: r852: Staticize local symbols
mtd: nandsim: Staticize local symbols
mtd: impa7: add missing __iomem annotation
mtd: sm_ftl: Staticize local symbols
mtd: m25p80: add support for mr25h10
mtd: m25p80: make CONFIG_M25PXX_USE_FAST_READ safe to enable
mtd: m25p80: Pass flags through CAT25_INFO macro
...
- Fix a regression since 3.2 inclusive: The subsystem workqueue
deadlocked between transaction completion handling and bus reset
handling if the worker pool could not be increased in time.
- janitorial updates
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iQIcBAABAgAGBQJSLKFKAAoJEHnzb7JUXXnQ0FYQAIVhbOdQHNrBIhaLurutW7KX
1O8NIta5rnvjRotPOM4SPP0l6NJaQOzSSo2/8LW/00mByjyB55SF71yepUnFyaHo
jpHq8mF3ogdZYG92pOa/+E/0apCle2FCEWn5oJOGMFOIaYI4+h2yDb64Ejekg3LZ
egrSp/XWGR/L+i3FCahEtX4UDeTuVu7OpxRKov+dhWnEwHhzfwqfM6v8hW1TsthI
XmQrJFOvbgvLUAh2Jph15wAtY0oIgkfgAEvPMkkE3MD3nEK0kWPq7L2F51+rpRgf
4ZEWbXAc3Kv7IMrGW5UnIIm9k5cHEXklyVdQTCPdHp7htaXMmKaQwsBNvsPgy+D2
eBF8lG9DCyvK2G1h3iVtbJUA27lWxs8VR6VtuL8xpLjs97k6bd2FI3+SesKS3FdP
TghbQwV0OyNPYEt7s8+9sKPcv47qXg3ARvQWJTqfVo1j5fS62m/WfbcM4b4yD79w
G/7auGsX/gdY8AB78sC6LBEiwW6RDlCsRfgH50Ll7Vjy9xhks5H0Q8MR+qMP5EOl
ybs4lioho1S5AK8Qqu2t1JMHbknNDR9NKFD7tD3cSwqurOyP0l2cHyku+rKeNNmR
B5W1nMbhMpaQ8OT00hnMXzT5+QobPPVZIzOoFxF4qA28lUjacC+nmZfpCRDTaxqQ
5GNJ/9EBpH1pvg9kw3pJ
=0rDy
-----END PGP SIGNATURE-----
Merge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire updates from Stefan Richter:
- Fix a regression since 3.2 inclusive: The subsystem workqueue
deadlocked between transaction completion handling and bus reset
handling if the worker pool could not be increased in time.
- janitorial updates
* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: Fix deadlock at bus reset
firewire: ohci: Change module_pci_driver to module_init/module_exit
firewire: ohci: beautify some macro definitions
firewire: ohci: change confusing name of a struct member
firewire: core: typecast from gfp_t to bool more safely
firewire: WQ_NON_REENTRANT is meaningless and going away
Returned to intel.com
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Jon Mason <jon.mason@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull DMA mapping update from Marek Szyprowski:
"This contains an addition of Device Tree support for reserved memory
regions (Contiguous Memory Allocator is one of the drivers for it) and
changes required by the KVM extensions for PowerPC architectue"
* 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
ARM: init: add support for reserved memory defined by device tree
drivers: of: add initialization code for dma reserved memory
drivers: of: add function to scan fdt nodes given by path
drivers: dma-contiguous: clean source code and prepare for device tree
Return directly if memory allocation fails. There is no need
of dma_free_coherent().
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CC:stable. But for that reason, I did a merge with master partway
through to avoid an unnecessary conflict.
Also: a fun lguest bug turns out if you don't clear the TF flag when trapping
Bad Things happen to the guest kernel as the stack overflows...
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSLVvbAAoJENkgDmzRrbjxLp0P/ivSWpHo6qPvOGG/xBCSCUX0
lHb8murQOy48g8i5vcZAkkLZIxoxonMwpst91zLawH6c0iifwIqPSvwqzQhcjuxj
5766Ntbdz5n5H1AEKrFnANAGMJ25f0b5irixzCSAADlDOYelaoGWfjsxTf7ZgU4i
qYd0x3RuIUDU5dk5LRHJTpml5yai372xvV3AOyYc6l0lxCgfRt87hYrQmBpgxrfC
KtHIYS3pw3582bMlKn77WHRjC4Ca6w0X36HfeyyToOh27B89TFw2D53Fb9EDzEN/
do8RxGGmXHSkg6gh1TYUGvNs21Lv0jmzn/Zejzyn7i0UscnjFAksoARi6Eh00xri
ZkkBoq8HZVEDu/uh3/J5HRgC6/HNyKd9C4uZ8zflMqV8VHgtdyciDH4uqoFXCesv
vRrfBNadmvTjFOhQ1bCi/GUOWdaCNoVfE+3WZ+0FvFhS2o5RPusVuSGLXQDm7frV
UejvK8gZ8LnQ1SS3KZ/7UZPeym9WXh7/62VI3bIQb97a/sBRQZYR/JOg+lOQcLhu
bVfP2KO2zk+L8NKYJ3Jg+JJXswzIgtGx4WLqvfqY4TOR8Gt/VMT5zCmGXoMI+te5
JcHLLC713kVSa4jgbdoErk+WxvN5LsDfIPKWxl4t/AfIVSS2ZxvwTQKAbMZkyDwx
qcLpnfANeqS8bzXgRtxG
=MFZ1
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio update from Rusty Russell:
"More console fixes; these are the theoretical ones which didn't get
CC:stable. But for that reason, I did a merge with master partway
through to avoid an unnecessary conflict.
Also: a fun lguest bug turns out if you don't clear the TF flag when
trapping Bad Things happen to the guest kernel as the stack
overflows..."
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_pci: pm: Use CONFIG_PM_SLEEP instead of CONFIG_PM
lguest: fix GPF in guest when using gdb.
lguest: fix guest kernel stack overflow when TF bit set.
lguest: fix BUG_ON() in invalid guest page table.
virtio: console: prevent use-after-free of port name in port unplug
virtio: console: cleanup an error message
virtio: console: fix locking around send_sigio_to_port()
virtio: console: add locking in port unplug path
virtio: console: add locks around buffer removal in port unplug path
tools/lguest: offer VIRTIO_F_ANY_LAYOUT for net device.
virtio tools: add .gitignore
lguest: Point to the right directory for the lguest launcher
an external user interface exported to allow other modules to hold
references to VFIO groups, a fix to test for extended config space
on PCIe and PCI-x, and new hot reset interfaces for PCI devices
which allows the user to do PCI bus/slot resets when all of the
devices affected by the reset are owned by the user.
For this last feature, the PCI bus reset interface, I depend on
changes already merged from Bjorn's PCI pull request. I therefore
merged my tree up to commit cb3e433, which I think was the correct
action, but as Stephen Rothwell noted, I failed to provide a commit
message indicating why the merge was required. Sorry for that.
Thanks,
Alex
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJSLKr0AAoJECObm247sIsiouMP/1rym/JDjSsjIONvjnQEH5hk
aZy3gGjqvSBBSN2aWqYkEgsSmgh4lSdDAbSBpY6vfz4Q0h7OXhSZcZ5oG2eGECia
rN6GvfI/g2mYBCHJbS30bEkpljx9ssMZi7n9dnWOWEkzXSTC/uLHx9DCUiHdczPF
SEeEHbxAnCTL5S5SQgUwhjPI813eVPjm94KDZK6pHcd14GGtM/piGjJNkyfzymRZ
0l7lu95YaY70qL1Bk1EZ2TlsqWtYeKfDXsp465U3uBj5eXslMKpxXBn/XndRWu25
ZnjRitF/izMZdcBgIM4uVK/yDhdAI2qav1rtkVirqp9Qrs9BgJXxUpO3p0e/GhFl
KZgq3DnE1aUfJlyu+YGmjIKWDT+CBrkkxidRwYoaBCs3if/1E19zUH5Gfuuimy/D
rpET3HPuJlPhxyE2vH9u6EVTybBx4wlvDP43FyhetriQjwv+SuPP2oNWD96L/qBY
BHcSMHfkyq90Lp6tjCeTISkwOb66qqkRUlzycPiB67BgrIea8yn14lKpE7GLcgmO
XSLLRvv6tFD2VcqEWQZNLoLo8XqAngvgStbOXxiTFpSQD4h2Z0TbpSc9qdVZlgIt
JBMA0BuDj+gzCDePUlsuM5L1EVOaFuKSHiZA6MUewDBqiiKoWPPsDKsFpKt2jFv4
Mp47JcWkDBg1DCC8nQ7f
=FAuJ
-----END PGP SIGNATURE-----
Merge tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio
Pull VFIO update from Alex Williamson:
"VFIO updates include safer default file flags for VFIO device fds, an
external user interface exported to allow other modules to hold
references to VFIO groups, a fix to test for extended config space on
PCIe and PCI-x, and new hot reset interfaces for PCI devices which
allows the user to do PCI bus/slot resets when all of the devices
affected by the reset are owned by the user.
For this last feature, the PCI bus reset interface, I depend on
changes already merged from Bjorn's PCI pull request. I therefore
merged my tree up to commit cb3e433, which I think was the correct
action, but as Stephen Rothwell noted, I failed to provide a commit
message indicating why the merge was required. Sorry for that.
Thanks, Alex"
* tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio:
vfio: fix documentation
vfio-pci: PCI hot reset interface
vfio-pci: Test for extended config space
vfio-pci: Use fdget() rather than eventfd_fget()
vfio: Add O_CLOEXEC flag to vfio device fd
vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
vfio: add external user support
Highlights include:
- Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as
lease loss due to a network partition, where doing so may result in data
corruption. Add a kernel parameter to control choice of legacy behaviour
or not.
- Performance improvements when 2 processes are writing to the same file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
NFS clients from being able to manipulate our lease and file lockingr
state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients
- Fix the broken NFSv4 security auto-negotiation between client and server
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration support.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSLelVAAoJEGcL54qWCgDyo2IQAKOfRJyZVnf4ipxi3xLNl1QF
w/70DVSIF1S1djWN7G3vgkxj/R8KCvJ8CcvkAD2BEgRDeZJ9TtyKAdM/jYLZ+W05
7k2QKk8fkwZmc1Y2qDqFwKHzP5ZgP5L2nGx7FNhi/99wEAe47yFG3qd3rUWKrcOf
mnd863zgGDE2Q10slhoq/bywwMJo6tKZNeaIE8kPjgFbBEh/jslpAWr8dSA4QgvJ
nZ8VB5XU8L+XJ0GpHHdjYm9LvQ51DbQ6omOF+0P4fI093azKmf4ZsrjMDWT8+iu3
XkXlnQmKLGTi7yB43hHtn2NiRqwGzCcZ1Amo9PpCFaHUt1RP9cc37UhG1T+x1xWJ
STEKDbvCdQ3FU9FvbgrGEwBR0e8fNS4fZY3ToDBflIcfwre0aWs5RCodZMUD0nUI
4wY5J9NsQR/bL+v8KeUR4V4cXK8YrgL0zB4u4WYzH5Npxr5KD0NEKDNqRPhrB9l2
LLF9Haql8j76Ff0ek6UGFIZjDE0h6Fs71wLBpLj+ZWArOJ7vBuLMBSOVqNpld9+9
f2fEG7qoGF4FGTY4myH/eakMPaWnk9Ol4Ls/svSIapJ9+rePD+a93e/qnmdofIMf
4TuEYk6ERib1qXgaeDRQuCsm2YE1Co5skGMaOsRFWgReE1c12QoJQVst2nMtEKp3
uV2w8LgX18aZOZXJVkCM
=ZuW+
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- Fix NFSv4 recovery so that it doesn't recover lost locks in cases
such as lease loss due to a network partition, where doing so may
result in data corruption. Add a kernel parameter to control
choice of legacy behaviour or not.
- Performance improvements when 2 processes are writing to the same
file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
NFS clients from being able to manipulate our lease and file
locking state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients.
- Fix the broken NFSv4 security auto-negotiation between client and
server.
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery
issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration
support"
* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
NFSv4: Allow security autonegotiation for submounts
NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
NFSv4: Fix security auto-negotiation
NFS: Clean up nfs_parse_security_flavors()
NFS: Clean up the auth flavour array mess
NFSv4.1 Use MDS auth flavor for data server connection
NFS: Don't check lock owner compatability unless file is locked (part 2)
NFS: Don't check lock owner compatibility in writes unless file is locked
nfs4: Map NFS4ERR_WRONG_CRED to EPERM
nfs4.1: Add SP4_MACH_CRED write and commit support
nfs4.1: Add SP4_MACH_CRED stateid support
nfs4.1: Add SP4_MACH_CRED secinfo support
nfs4.1: Add SP4_MACH_CRED cleanup support
nfs4.1: Add state protection handler
nfs4.1: Minimal SP4_MACH_CRED implementation
SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
SUNRPC: Add an identifier for struct rpc_clnt
SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
...
Pull fuse bugfixes from Miklos Szeredi:
"Just a bunch of bugfixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: use list_for_each_entry() for list traversing
fuse: readdir: check for slash in names
fuse: hotfix truncate_pagecache() issue
fuse: invalidate inode attributes on xattr modification
fuse: postpone end_page_writeback() in fuse_writepage_locked()
window. Also, most of them are bug fixes this time. Two of my
three patches (moving gfs2_sync_meta and merging the two writepage
implementations) are clean ups with the third (taking the glock ref
in examine_bucket) being a fix for a difficult to hit race condition.
The removal of an unused memory barrier is a clean up from Bob Peterson,
and the "spectator" relates to a rarely used mount option. Ben
Marzinski's patch fixes a corner case where the incorrect inode
flags were being set, resulting in incorrect behaviour on fsync.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJSLYWjAAoJEMrg3m4a/8jSHBUQAIBXyrLypCJNJNpYreRlg4Te
YZOqXMxqrfsWD5jkfWvxV/vZyVu6FEJRJRRwKoO8foZhnVIy2HiGdpFvwk4NMzGs
Ah5cCaySgDhFIKXNq1CnhVZNpRU8N+Lf+8U2MwkpPUrT+KnZlDdG8COuHbWJ3/+t
prqHy0N1oa8hgj3P0oDf3kyQKiB2MQVhORTBVOWaas1mzw8vsKRjvO7k5g0VPcfV
VnIYzvQ033V6pW6ymWGcbz6us7hXeFJmXo4grAdLIclK6QDt1zPLVKlvk7X/Me52
PTxO5AP/Nw6AlJABTNy8UZ3uJO3QaiqhcIz+OzIXlYpsICqma30qkHHrWzOh0Q5F
HrDqh03A2zuqigamVmsJr2+tbdLWxz72D6N3eFlIBLAw2JxILosCPcGc4UNzG/7g
mFr6+LDnZXzFpjT6OBdepjVH06ZqfcV+nr3KhvmKQT+tkzTizArbt8JutOdKjahH
3+pXx/bA0B7LRVYymK5/xrfbjnQp629GG3QnGiI23gtDZ6eEzm663bQz6MTgjR88
H0nigLi8d2KmM8UnrZqq5nI+LirFD/IZ5DO47Vv6w3NCZW/ZpNOodlWkd1igvhHR
ugTvKqyw2O9hImM8jPoRWbqL01RGTKTMMn/3cCCO8Uww7DpI4GLb48a39buYh4+2
kijIxCtntfQAOiR6MCvA
=rsQv
-----END PGP SIGNATURE-----
Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 updates from Steven Whitehouse:
"This is possibly the smallest ever set of GFS2 patches for a merge
window. Also, most of them are bug fixes this time.
Two of my three patches (moving gfs2_sync_meta and merging the two
writepage implementations) are clean ups with the third (taking the
glock ref in examine_bucket) being a fix for a difficult to hit race
condition.
The removal of an unused memory barrier is a clean up from Bob
Peterson, and the "spectator" relates to a rarely used mount option.
Ben Marzinski's patch fixes a corner case where the incorrect inode
flags were being set, resulting in incorrect behaviour on fsync"
* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: dirty inode correctly in gfs2_write_end
GFS2: Don't flag consistency error if first mounter is a spectator
GFS2: Remove unnecessary memory barrier
GFS2: Merge ordered and writeback writepage
GFS2: Take glock reference in examine_bucket()
GFS2: Move gfs2_sync_meta to lops.c
Pull ceph updates from Sage Weil:
"This includes both the first pile of Ceph patches (which I sent to
torvalds@vger, sigh) and a few new patches that add support for
fscache for Ceph. That includes a few fscache core fixes that David
Howells asked go through the Ceph tree. (Thanks go to Milosz Tanski
for putting this feature together)
This first batch of patches (included here) had (has) several
important RBD bug fixes, hole punch support, several different
cleanups in the page cache interactions, improvements in the truncate
code (new truncate mutex to avoid shenanigans with i_mutex), and a
series of fixes in the synchronous striping read/write code.
On top of that is a random collection of small fixes all across the
tree (error code checks and error path cleanup, obsolete wq flags,
etc)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
ceph: use d_invalidate() to invalidate aliases
ceph: remove ceph_lookup_inode()
ceph: trivial buildbot warnings fix
ceph: Do not do invalidate if the filesystem is mounted nofsc
ceph: page still marked private_2
ceph: ceph_readpage_to_fscache didn't check if marked
ceph: clean PgPrivate2 on returning from readpages
ceph: use fscache as a local presisent cache
fscache: Netfs function for cleanup post readpages
FS-Cache: Fix heading in documentation
CacheFiles: Implement interface to check cache consistency
FS-Cache: Add interface to check consistency of a cached object
rbd: fix null dereference in dout
rbd: fix buffer size for writes to images with snapshots
libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
rbd: fix I/O error propagation for reads
ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
ceph: allow sync_read/write return partial successed size of read/write.
ceph: fix bugs about handling short-read for sync read mode.
ceph: remove useless variable revoked_rdcache
...
- Device tree updates for TZ1090 GPIO drivers merged via GPIO tree.
- Add driver for ImgTec PDC irqchip as found in TZ1090 SoC.
- Add linux-metag mailing list to MAINTAINERS file.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJSLeXAAAoJEKHZs+irPybf72QP/2IPeDqMEcBWA0SXmMV2gb9r
BAr2MVYxbFwvFrUbFWtYPW/d+u5YHSFe071dYVfZj31djddj+JeVlK1QiQ7QfN43
OtBFfyuKaTKb5I5QNs4dVJW1pcEXQFcyb3KYnKdkVWammhqdwGqpYIuo+7MwlCV8
COC6q4BHK17Fa6NLl3WXIV3DdMu7j6IZPzBRiJDXhIUIFxMH34qsVmmZQcXPD43G
BGAM6ztyHVEbCt2SVSOS6WB7Yk2w4fW4fReqenOBTinZU6HS7uI7MXoCkifMLCWD
ipQnd2TOVSA4zCElS1xVRc+n7c91zOprC8rijuDG5rMT3ml6famvU28OSEHGcMiM
4kX00ZxjqTk9DPg5xzusGgEuvHF0CwXRWf5QVjUY6yZwPyf22PmN7d1oJ2G2I2Ge
2zyJok7x9VdLtEdNIWFXuNyHVwFNte0evciwFdZk16yBXdikTHXapVqUggmrf0GR
dPCSrZS342i5JlS4fTtqscuuw9GDPLDGDytjD184GlHWhrKDP6JsYSdNvDXBF+9w
sE6eMkYkf8QqhW41hmdBSpI9RLH3VPPRa1hQH1wZS98Rl1arUc9QnHxNhoiao1Y2
838qvRSpvdvwE9fO+WfU/X7MTQs7qWA6t/vdO60KIxVAaPzTLLIHR62CoxSsWHFB
IyVsf4+7BnDjciSJSqOJ
=WHe5
-----END PGP SIGNATURE-----
Merge tag 'metag-for-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull metag architecture changes from James Hogan:
- Device tree updates for TZ1090 GPIO drivers merged via GPIO tree.
- Add driver for ImgTec PDC irqchip as found in TZ1090 SoC.
- Add linux-metag mailing list to MAINTAINERS file.
* tag 'metag-for-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
irq-imgpdc: add ImgTec PDC irqchip driver
MAINTAINERS: add linux-metag mailing list
metag: tz1090: instantiate gpio-tz1090-pdc
metag: tz1090: select and instantiate gpio-tz1090
metag: tz1090: select and instantiate irq-imgpdc
Pull m68knommu fixes from Greg Ungerer:
"Just a small collection of cleanups and fixes this time, no big
changes. The most interresting are to make the m68k and m68knommu
consistently use CONFIG_IOMAP, clean out some unused board config
options and flush the cache on signal stack creation"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: remove 16 unused boards in Kconfig.machine
m68k: define 'VM_DATA_DEFAULT_FLAGS' no matter whether has 'NOMMU' or not
m68knommu: user generic iomap to support ioread*/iowrite*
m68k/coldfire: flush cache when creating the signal stack frame
m68knommu: Mark functions only called from setup_arch() __init
Pull UML updates from Richard Weinberger:
"This pile contains mostly fixes and improvements for issues identified
by Richard W M Jones while adding UML as backend to libguestfs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Add irq chip um/mask handlers
um: prctl: Do not include linux/ptrace.h
um: Run UML in it's own session.
um: Cleanup SIGTERM handling
um: ubd: Introduce submit_request()
um: ubd: Add REQ_FLUSH suppport
um: Implement probe_kernel_read()
um: hostfs: Fix writeback
When reconnecting to automounts at startup an autofs ioctl is used
to find the device and inode of existing mounts so they can be used
to open a file descriptor of possibly covered mounts.
At this time the the caller might not yet "own" the mount so it can
trigger calling ->d_automount(). This causes automount to hang when
trying to reconnect to direct or offset mount types.
Consequently kern_path() can't be used but kern_path_mountpoint() can be.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>