Presently, memory cgroup's direct reclaim frees memory from the current
node. But this causes some problems. Usually when a set of threads works in
a cooperative way, they tend to operate on the same node. So if they hit
limits under memcg they will reclaim memory from themselves, damaging the
active working set.
For example, assume a 2-node system with Node 0 and Node 1, and a memcg
with a 1G limit. After some work, file cache remains and the usages
are
  Node 0: 1M
  Node 1: 998M.
If an application then runs on Node 0, it will reclaim its own working set
before freeing the unnecessary file caches.
This patch adds round-robin for NUMA and adds equal pressure to each node.
When using cpuset's spread memory feature, this will work very well.
But yes, a better algorithm is needed.
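A minimal sketch of the round-robin idea, written as plain userspace C.
The struct, the function name and the fixed MAX_NODES are assumptions made
for illustration only; they are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical node count for this sketch */

/* Hypothetical per-memcg state: the node that was scanned last time. */
struct memcg_sketch {
	int last_scanned_node;
};

/*
 * Pick the next node in round-robin order, so that successive reclaim
 * passes spread pressure over all nodes instead of always hitting the
 * node the caller happens to run on.
 */
static int select_victim_node(struct memcg_sketch *mem)
{
	int node;

	assert(mem != NULL);
	node = (mem->last_scanned_node + 1) % MAX_NODES;
	mem->last_scanned_node = node;
	return node;
}
```

Each call returns the node after the previous one and wraps around at
MAX_NODES, so every node receives the same number of reclaim passes.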
[akpm@linux-foundation.org: comment editing]
[kamezawa.hiroyu@jp.fujitsu.com: fix time comparisons]
Signed-off-by: Ying Han <yinghan@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
AFAICS mm/page_cgroup.c is for memcg subsystem, but it was directed only
to generic cgroup maintainers. Fix it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move page-freeing code out of swap_cgroup_mutex in the hope that it could
reduce a few theoretical contentions between swapons and/or swapoffs.
This is just a cleanup, no functional changes.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It allocated one more page than necessary if @max_pages was a multiple of
SC_PER_PAGE.
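The off-by-one can be illustrated with a small sketch. It assumes the old
code computed "quotient plus one" and that the fix rounds up instead; the
value of SC_PER_PAGE and both function names are made up for the example.

```c
#include <assert.h>
#include <limits.h>

#define SC_PER_PAGE 2048UL	/* assumed value, for illustration only */

/* Assumed old behaviour: always adds one page, even on exact multiples. */
static unsigned long pages_quotient_plus_one(unsigned long max_pages)
{
	return max_pages / SC_PER_PAGE + 1;
}

/* Round-up division: an extra page only when there is a remainder. */
static unsigned long pages_round_up(unsigned long max_pages)
{
	/* Guard against overflow in the addition below. */
	assert(max_pages <= ULONG_MAX - (SC_PER_PAGE - 1));
	return (max_pages + SC_PER_PAGE - 1) / SC_PER_PAGE;
}
```

With max_pages = 4096 the first variant yields 3 pages and the second
yields 2; for 4097 both yield 3.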
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit ca371c0d7e ("memcg: fix page_cgroup fatal error in FLATMEM")
removed the call to alloc_bootmem() in the function, so it can now be
marked as __meminit to reduce memory usage when MEMORY_HOTPLUG=n.
Also, as the new helper function alloc_page_cgroup() is called only from
that function, it should be marked too.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
next_mz is assigned to NULL if __mem_cgroup_largest_soft_limit_node
selects the same mz. This doesn't make much sense as we assign to the
variable right in the next loop.
The compiler will probably optimize this out, but it is a little confusing
when reading the code.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We recently added the change in global background reclaim which counts the
return value of soft_limit reclaim. This patch adds similar logic
to global direct reclaim.
We should skip scanning global LRU on shrink_zone if soft_limit reclaim
does enough work. This is the first step where we start with counting the
nr_scanned and nr_reclaimed from soft_limit reclaim into global
scan_control.
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The global kswapd scans per-zone LRU and reclaims pages regardless of the
cgroup. It breaks memory isolation since one cgroup can end up reclaiming
pages from another cgroup. Instead we should rely on memcg-aware target
reclaim including per-memcg kswapd and soft_limit hierarchical reclaim under
memory pressure.
In the global background reclaim, we do soft reclaim before scanning the
per-zone LRU. However, the return value is ignored. This patch is the first
step to skip shrink_zone() if soft_limit reclaim does enough work.
This is part of the effort to reduce reclaiming pages from the global
LRU in memcg. The per-memcg background reclaim patchset further enhances
per-cgroup targeted reclaim; I should have V4 posted shortly.
Try running multiple memory-intensive workloads within separate memcgs. Watch
the counters of soft_steal in memory.stat.
$ cat /dev/cgroup/A/memory.stat | grep 'soft'
soft_steal 240000
soft_scan 240000
total_soft_steal 240000
total_soft_scan 240000
This patch:
In the global background reclaim, we do soft reclaim before scanning the
per-zone LRU. However, the return value is ignored.
We would like to skip shrink_zone() if soft_limit reclaim does enough
work. Also, we need to make the memory pressure balanced across per-memcg
zones, like the logic in vm-core. This patch is the first step where we
start with counting the nr_scanned and nr_reclaimed from soft_limit
reclaim into the global scan_control.
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
enums are problematic because they cannot be forward-declared:
akpm2:/home/akpm> cat t.c
enum foo;
static inline void bar(enum foo f)
{
}
akpm2:/home/akpm> gcc -c t.c
t.c:4: error: parameter 1 ('f') has incomplete type
So move the enum's definition into a standalone header file which can be used
wherever its definition is needed.
Cc: Ying Han <yinghan@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:
* cgroup creation is out-of-control
* cgroup name can conflict when pids are looping
* it is not possible to have a single process handling a lot of
namespaces without falling into an exponential creation time
* we may want to create a namespace without creating a cgroup
The ns_cgroup was replaced by a compatibility flag 'clone_children',
where a newly created cgroup will copy the parent cgroup values.
The userspace has to manually create a cgroup and add a task to
the 'tasks' file.
This patch removes the ns_cgroup as suggested in the following thread:
https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
The 'cgroup_clone' function is removed because it is no longer used.
This is a userspace-visible change. Commit 45531757b4 ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal. Since that
time we have heard from XXX users who were affected by this.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cgroup_attach_proc to use flex_array.
The cgroup_attach_proc implementation requires a pre-allocated array to
store task pointers to atomically move a thread-group, but asking for a
monolithic array with kmalloc() may be unreliable for very large groups.
Using flex_array provides the same functionality with less risk of
failure.
This is a post-patch for cgroup-procs-write.patch.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make procs file writable to move all threads by tgid at once.
Add functionality that enables users to move all threads in a threadgroup
at once to a cgroup by writing the tgid to the 'cgroup.procs' file. This
current implementation makes use of a per-threadgroup rwsem that's taken
for reading in the fork() path to prevent newly forking threads within the
threadgroup from "escaping" while the move is in progress.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add cgroup subsystem callbacks for per-thread attachment in atomic contexts
Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
for the cgroup subsystem interface. Unlike can_attach and attach, these
are for per-thread operations, to be called potentially many times when
attaching an entire threadgroup.
Also, the old "bool threadgroup" interface is removed, since the new
callbacks replace it. All subsystems are modified for the new interface - of note is
cpuset, which requires from/to nodemasks for attach to be globally scoped
(though per-cpuset would work too) to persist from its pre_attach to
attach_task and attach.
This is a pre-patch for cgroup-procs-writable.patch.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds functionality to read/write lock CLONE_THREAD fork()ing per-threadgroup
Add an rwsem that lives in a threadgroup's signal_struct that's taken for
reading in the fork path, under CONFIG_CGROUPS. If another part of the
kernel later wants to use such a locking mechanism, the CONFIG_CGROUPS
ifdefs should be changed to a higher-up flag that CGROUPS and the other
system would both depend on.
This is a pre-patch for cgroup-procs-write.patch.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When configfs_register_subsystem() fails, we unregister too many
subsystems in configfs_example_init. Decrement i by one so that we do not
unregister a subsystem that was never registered.
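The unwind pattern can be sketched in userspace C as follows. The
register/unregister helpers, the fail_at parameter and NSUBSYS are all
hypothetical stand-ins, not the configfs API.

```c
#include <assert.h>

#define NSUBSYS 3	/* hypothetical number of subsystems */

static int registered[NSUBSYS];

/* Stand-in for a registration call; fails for index fail_at. */
static int register_subsys(int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	registered[idx] = 1;
	return 0;
}

/* Stand-in for unregistration; aborts if idx was never registered. */
static void unregister_subsys(int idx)
{
	assert(registered[idx]);
	registered[idx] = 0;
}

static int init_all(int fail_at)
{
	int i, ret;

	for (i = 0; i < NSUBSYS; i++) {
		ret = register_subsys(i, fail_at);
		if (ret)
			goto out_unregister;
	}
	return 0;

out_unregister:
	/* Entry i failed, so only entries 0..i-1 have to be undone. */
	for (i--; i >= 0; i--)
		unregister_subsys(i);
	return ret;
}
```

Starting the unwind loop at i instead of i - 1 would trip the assertion in
unregister_subsys(), which is the userspace analogue of the bug described
above.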
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I find it very handy to show the average delays in milliseconds.
Example output (on 100 concurrent dd reading sparse files):
CPU             count     real total  virtual total    delay total  delay average
                  986     3223509952     3207643301    38863410579       39.415ms
IO              count    delay total  delay average
                    0              0              0ms
SWAP            count    delay total  delay average
                    0              0              0ms
RECLAIM         count    delay total  delay average
                 1059     5131834899              4ms
dd: read=0, write=0, cancelled_write=0
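The averages in the example output can be reproduced with a sketch like
the one below. It assumes the delay totals are in nanoseconds and that a
zero count should yield a zero average; the function names are
illustrative and need not match getdelays.c.

```c
/* Integer average in milliseconds; a zero count is treated as one. */
static unsigned long long average_ms(unsigned long long total_ns,
				     unsigned long long count)
{
	return total_ns / 1000000ULL / (count ? count : 1);
}

/* Same average with a fractional part, as shown in the CPU row. */
static double average_ms_double(unsigned long long total_ns,
				unsigned long long count)
{
	return (double)total_ns / 1000000.0 / (double)(count ? count : 1);
}
```

For the RECLAIM row, 5131834899 ns over 1059 events gives 4 ms with the
integer version; for the CPU row, 38863410579 ns over 986 events gives
roughly 39.415 ms with the floating-point version.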
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Mel Gorman <mel@linux.vnet.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Reviewed-by: Satoru Moriya <satoru.moriya@hds.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes
Documentation/accounting/getdelays.c: In function `get_family_id':
Documentation/accounting/getdelays.c:172:14: warning: variable `rc' set but not used [-Wunused-but-set-variable]
Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes
Documentation/accounting/getdelays.c: In function `main':
Documentation/accounting/getdelays.c:436:7: warning: variable `i' set but not used [-Wunused-but-set-variable]
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As declaring counter as volatile is discouraged, it is best not to use it
in sample code as well.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally i_lastfrag was 32 bits but then we added support for handling
64 bit metadata and it became a 64 bit variable. That was during 2007, in
54fb996ac1 "[PATCH] ufs2 write: block allocation update". Unfortunately
these casts got left behind so the value got truncated to 32 bit again.
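A small sketch of how a leftover 32-bit cast loses information. The helper
names are hypothetical and only mimic the effect of comparing through a
32-bit type; they are not the ufs code.

```c
#include <stdint.h>

/*
 * Compare through a 32-bit type: both operands are cast first, so the
 * upper 32 bits of a 64-bit value are silently dropped.
 */
static uint64_t min_via_u32(uint64_t a, uint64_t b)
{
	uint32_t x = (uint32_t)a;
	uint32_t y = (uint32_t)b;

	return x < y ? x : y;
}

/* Compare at full width: no truncation. */
static uint64_t min_via_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}
```

With a = 0x100000005 and b = 100, the 32-bit version returns 5 (the low
bits of a), while the 64-bit version correctly returns 100.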
[akpm@linux-foundation.org: remove now-unneeded min_t/max_t casting]
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 51ba60c5 ("RTC: Cleanup rtc_class_ops->update_irq_enable()")
removed the only user of the update IRQ, so there is no need to manage it
any more.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memory allocated using request_mem_region should be released using
release_mem_region, not release_region.
The semantic patch that fixes part of this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression E1,E2,E3;
@@
request_mem_region(E1,E2,E3)
...
?- release_region(E1,E2)
+ release_mem_region(E1,E2)
// </smpl>
[akpm@linux-foundation.org: use resource_size()]
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add basic support for ST m41t93 SPI RTCs. Tested with factory-new and
with "run-in" species with and without backup batteries.
Signed-off-by: Nikolaus Voss <n.voss@weinmann.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for EM Microelectronic EM3027 RTC chip.
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a driver for the RTC devices in VIA and WonderMedia
Systems-on-Chip. Alarm, 1Hz interrupts, reading and setting time are
supported.
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Alexey Charkov <alchark@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On most architectures division is an expensive operation and accessing an
element currently requires four of them. This performance penalty
effectively precludes flex arrays from being used on any kind of fast
path. However, two of these divisions can be handled at creation time and
the others can be replaced by a reciprocal divide, completely avoiding
real divisions on access.
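The reciprocal-divide technique can be sketched in userspace C as below.
This is an illustration under stated assumptions, not the flex_array code:
the reciprocal is computed once as ceil(2^32 / k), the divisor must be at
least 2, and the result is only guaranteed to match a real division while
a * k stays below 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Precompute R = ceil(2^32 / k) once, e.g. at creation time. */
static uint32_t reciprocal_value(uint32_t k)
{
	uint64_t val;

	/* For k == 1 the reciprocal would not fit in 32 bits. */
	assert(k >= 2);
	val = (1ULL << 32) + (k - 1);
	return (uint32_t)(val / k);
}

/* a / k using one multiply and one shift instead of a division. */
static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
{
	return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

The one real division happens in reciprocal_value(); every later lookup
pays only for a 64-bit multiply and a shift.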
[eparis@redhat.com: rebase on top of changes to support 0 len elements]
[eparis@redhat.com: initialize part_nr when array fits entirely in base]
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They have no meaning.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We plan to remove the old cpus_xx() cpumask APIs later. Also, we plan to
change the mm_cpu_mask() implementation to allocate only nr_cpu_ids, which
makes dereferencing *mm_cpu_mask() a dangerous operation.
So this patch converts them.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alpha allmodconfig:
drivers/bcma/host_pci.c: In function 'bcma_host_pci_probe':
drivers/bcma/host_pci.c:102: error: implicit declaration of function 'kzalloc'
drivers/bcma/host_pci.c:102: warning: assignment makes pointer from integer without a cast
Cc: <zajec5@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alpha allmodconfig:
drivers/video/mb862xx/mb862xxfbdrv.c: In function 'mb862xxfb_ioctl':
drivers/video/mb862xx/mb862xxfbdrv.c:323: error: implicit declaration of function 'copy_to_user'
drivers/video/mb862xx/mb862xxfbdrv.c:327: error: implicit declaration of function 'copy_from_user'
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make pm_qos_power_write() accept values passed to it in the ASCII hex
format either with or without an ending newline.
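A userspace sketch of accepting an ASCII hex value with or without a
trailing newline. The function name, the 11-character limit and the error
convention are assumptions for illustration; the kernel handler itself is
not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 0 on success, -EINVAL on bad input. */
static int parse_hex_value(const char *buf, size_t count, unsigned long *out)
{
	char tmp[12];
	char *end;

	assert(buf != NULL && out != NULL);
	if (count == 0 || count >= sizeof(tmp))
		return -EINVAL;
	memcpy(tmp, buf, count);
	tmp[count] = '\0';

	/* Drop a single trailing newline, as written by `echo`. */
	if (tmp[count - 1] == '\n')
		tmp[--count] = '\0';
	if (count == 0)
		return -EINVAL;

	errno = 0;
	*out = strtoul(tmp, &end, 16);
	if (errno || end == tmp || *end != '\0')
		return -EINVAL;
	return 0;
}
```

Both "0x1f" and "0x1f\n" parse to 31 here, since strtoul() with base 16
accepts an optional 0x prefix; anything with trailing garbage is rejected.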
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Mark Gross <markgross@thegnar.org>
For a filesystem that has lots of files in it, the first time we mount
it with free ino caching support, it can take quite a long time to
set up the caching before we can create new files.
Here we fill the cache with [highest_ino, BTRFS_LAST_FREE_OBJECTID]
before we start the caching thread to search through the extent tree.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
scrub_page collects several pages into one bio as long as they are physically
contiguous. As we only save one logical address for the whole bio, don't
collect pages that are physically contiguous but logically discontiguous.
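The merge condition can be sketched as follows. The struct layout, the
page size constant and the function name are hypothetical; the sketch only
shows that a page may join a bio when it continues both the physical and
the logical range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* Hypothetical bio descriptor: one start address pair for all pages. */
struct bio_sketch {
	uint64_t logical;	/* logical address of the first page */
	uint64_t physical;	/* physical address of the first page */
	unsigned int count;	/* pages collected so far */
};

/* A page may be appended only if it extends both address ranges. */
static bool can_append(const struct bio_sketch *b,
		       uint64_t logical, uint64_t physical)
{
	uint64_t len;

	/* An empty bio has nothing to be contiguous with. */
	assert(b->count > 0);
	len = (uint64_t)b->count * SKETCH_PAGE_SIZE;
	return b->physical + len == physical &&
	       b->logical + len == logical;
}
```

Checking only the physical address would allow a page from a different
logical range to be merged, and the single saved logical address would
then be wrong for that page.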
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This will detect small random writes into files and
queue them up for an auto defrag process. It isn't well suited to
database workloads yet, but works for smaller files such as rpm, sqlite
or bdb databases.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Since this cred was not created with copy_creds(), it needs to get
initialized. Otherwise use of syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
can lead to a NULL deref. Thanks to Robert for finding this.
Bug introduced by commit 47a150edc2 ("Cache user_ns in struct cred").
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Reported-by: Robert Święcki <robert@swiecki.net>
Cc: David Howells <dhowells@redhat.com>
Cc: stable@kernel.org (2.6.39)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
gfs2: Drop __TIME__ usage
isdn/diva: Drop __TIME__ usage
atm: Drop __TIME__ usage
dlm: Drop __TIME__ usage
wan/pc300: Drop __TIME__ usage
parport: Drop __TIME__ usage
hdlcdrv: Drop __TIME__ usage
baycom: Drop __TIME__ usage
pmcraid: Drop __DATE__ usage
edac: Drop __DATE__ usage
rio: Drop __DATE__ usage
scsi/wd33c93: Drop __TIME__ usage
scsi/in2000: Drop __TIME__ usage
aacraid: Drop __TIME__ usage
media/cx231xx: Drop __TIME__ usage
media/radio-maxiradio: Drop __TIME__ usage
nozomi: Drop __TIME__ usage
cyclades: Drop __TIME__ usage
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: vdso: Remove unused variable
x86-64: Optimize vDSO time()
x86-64: Add time to vDSO
x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO
x86-64: Move vread_tsc into a new file with sensible options
x86-64: Vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
x86-64: Don't generate cmov in vread_tsc
x86-64: Remove unnecessary barrier in vread_tsc
x86-64: Clean up vdso/kernel shared variables
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
seqlock: Get rid of SEQLOCK_UNLOCKED
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
irq: Remove smp_affinity_list when unregister irq proc
* 'gpio/next' of git://git.secretlab.ca/git/linux-2.6:
gpio/via: rename VIA local config struct
basic_mmio_gpio: split into a gpio library and platform device
gpio: remove some legacy comments in build files
gpio: add trace events for setting direction and value
gpio/pca953x: Use handle_simple_irq instead of handle_edge_irq
gpiolib: export gpiochip_find
gpio: remove redundant Kconfig depends on GPIOLIB
basic_mmio_gpio: convert to non-__raw* accessors
basic_mmio_gpio: support direction registers
basic_mmio_gpio: support different input/output registers
basic_mmio_gpio: detect output method at probe time
basic_mmio_gpio: request register regions
basic_mmio_gpio: allow overriding number of gpio
basic_mmio_gpio: convert to platform_{get,set}_drvdata()
basic_mmio_gpio: remove runtime width/endianness evaluation
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (57 commits)
regulator: Fix 88pm8607.c printk format warning
input: Add support for Qualcomm PMIC8XXX power key
input: Add Qualcomm pm8xxx keypad controller driver
mfd: Add omap-usbhs runtime PM support
mfd: Fix ASIC3 SD Host Controller Configuration size
mfd: Fix omap_usbhs_alloc_children error handling
mfd: Fix omap usbhs crash when rmmoding ehci or ohci
mfd: Add ASIC3 LED support
leds: Add ASIC3 LED support
mfd: Update twl4030-code maintainer e-mail address
mfd: Correct the name and bitmask for ab8500-gpadc BTempPullUp
mfd: Add manual ab8500-gpadc batt temp activation for AB8500 3.0
mfd: Provide ab8500-core enumerators for chip cuts
mfd: Check twl4030-power remove script error condition after i2cwrite
mfd: Fix twl6030 irq definitions
mfd: Add phoenix lite (twl6025) support to twl6030
mfd: Avoid to use constraint name in 88pm860x regulator driver
mfd: Remove checking on max8925 regulator[0]
mfd: Remove unused parameter from 88pm860x API
mfd: Avoid to allocate 88pm860x static platform data
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/cma: Save PID of ID's owner
RDMA/cma: Add support for netlink statistics export
RDMA/cma: Pass QP type into rdma_create_id()
RDMA: Update exported headers list
RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
RDMA/nes: Add a check for strict_strtoul()
RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
RDMA/cxgb4: Use completion objects for event blocking
IB/srp: Fix integer -> pointer cast warnings
IB: Add devnode methods to cm_class and umad_class
IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
IB/uverbs: Add devnode method to set path/mode
RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
RDMA: Add netlink infrastructure
RDMA: Add error handling to ib_core_init()
* 'spi/next' of git://git.secretlab.ca/git/linux-2.6:
spi/amba-pl022: work in polling or interrupt mode if pl022_dma_probe fails
spi/spi_s3c24xx: Use spi_bitbang_stop instead of spi_unregister_master in s3c24xx_spi_remove
spi/spi_nuc900: Use spi_bitbang_stop instead of spi_unregister_master in nuc900_spi_remove
spi/spi_tegra: use spi_unregister_master() instead of spi_master_put()
spi/spi_sh: use spi_unregister_master instead of spi_master_put in remove path
spi: Use void pointers for data in simple SPI I/O operations
spi/pl022: use cpu_relax in the busy loop
spi/pl022: mark driver non-experimental
spi/pl022: timeout on polled transfer v2
spi/dw_spi: improve the interrupt mode with the batch ops
spi/dw_spi: change poll mode transfer from byte ops to batch ops
spi/dw_spi: remove the un-necessary flush()
spi/dw_spi: unify the low level read/write routines
* 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (33 commits)
OMAP3: PM: Boot message is not an error, and not helpful, remove it
OMAP3: cpuidle: change the power domains modes determination logic
OMAP3: cpuidle: code rework for improved readability
OMAP3: cpuidle: re-organize the C-states data
OMAP3: clean-up mach specific cpuidle data structures
OMAP3 cpuidle: remove useless SDP specific timings
usb: otg: OMAP4430: Powerdown the internal PHY when USB is disabled
usb: otg: OMAP4430: Fixing the omap4430_phy_init function
usb: musb: am35x: fix compile error when building am35x
usb: musb: OMAP4430: Power down the PHY during board init
omap: drop board-igep0030.c
omap: igep0020: add support for IGEP3
omap: igep0020: minor refactoring
omap: igep0020: name refactoring for future merge with IGEP3
omap: Remove support for omap2evm
arm: omap2plus: GPIO cleanup
omap: musb: introduce default board config
omap: move detection of NAND CS to common-board-devices
omap: use common initialization for PMIC i2c bus
omap: consolidate touch screen initialization among different boards
...