linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Lee Schermerhorn	4e25b2576e	hugetlb: add generic definition of NUMA_NO_NODE Move definition of NUMA_NO_NODE from ia64 and x86_64 arch specific headers to generic header 'linux/numa.h' for use in generic code. NUMA_NO_NODE replaces bare '-1' where it's used in this series to indicate "no node id specified". Ultimately, it can be used to replace the -1 elsewhere where it is used similarly. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:12 -08:00
Lee Schermerhorn	06808b0827	hugetlb: derive huge pages nodes allowed from task mempolicy This patch derives a "nodes_allowed" node mask from the numa mempolicy of the task modifying the number of persistent huge pages to control the allocation, freeing and adjusting of surplus huge pages when the pool page count is modified via the new sysctl or sysfs attribute "nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows: * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer is produced. This will cause the hugetlb subsystem to use node_online_map as the "nodes_allowed". This preserves the behavior before this patch. * For "preferred" mempolicy, including explicit local allocation, a nodemask with the single preferred node will be produced. "local" policy will NOT track any internode migrations of the task adjusting nr_hugepages. * For "bind" and "interleave" policy, the mempolicy's nodemask will be used. * Other than to inform the construction of the nodes_allowed node mask, the actual mempolicy mode is ignored. That is, all modes behave like interleave over the resulting nodes_allowed mask with no "fallback". See the updated documentation [next patch] for more information about the implications of this patch. Examples: Starting with: Node 0 HugePages_Total: 0 Node 1 HugePages_Total: 0 Node 2 HugePages_Total: 0 Node 3 HugePages_Total: 0 Default behavior [with or without this patch] balances persistent hugepage allocation across nodes [with sufficient contiguous memory]: sysctl vm.nr_hugepages[_mempolicy]=32 yields: Node 0 HugePages_Total: 8 Node 1 HugePages_Total: 8 Node 2 HugePages_Total: 8 Node 3 HugePages_Total: 8 Of course, we only have nr_hugepages_mempolicy with the patch, but with default mempolicy, nr_hugepages_mempolicy behaves the same as nr_hugepages. Applying mempolicy--e.g., with numactl [using '-m' a.k.a. '--membind' because it allows multiple nodes to be specified and it's easy to type]--we can allocate huge pages on individual nodes or sets of nodes. So, starting from the condition above, with 8 huge pages per node, add 8 more to node 2 using: numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40 This yields: Node 0 HugePages_Total: 8 Node 1 HugePages_Total: 8 Node 2 HugePages_Total: 16 Node 3 HugePages_Total: 8 The incremental 8 huge pages were restricted to node 2 by the specified mempolicy. Similarly, we can use mempolicy to free persistent huge pages from specified nodes: numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32 yields: Node 0 HugePages_Total: 4 Node 1 HugePages_Total: 4 Node 2 HugePages_Total: 16 Node 3 HugePages_Total: 8 The 8 huge pages freed were balanced over nodes 0 and 1. [rientjes@google.com: accomodate reworked NODEMASK_ALLOC] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:12 -08:00
Lee Schermerhorn	c1e6c8d074	hugetlb: factor init_nodemask_of_node() Factor init_nodemask_of_node() out of the nodemask_of_node() macro. This will be used to populate the huge pages "nodes_allowed" nodemask for a single node when basing nodes_allowed on a preferred/local mempolicy or when a persistent huge page pool page count is modified via a per node sysfs attribute. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:12 -08:00
Lee Schermerhorn	6ae11b278b	hugetlb: add nodemask arg to huge page alloc, free and surplus adjust functions In preparation for constraining huge page allocation and freeing by the controlling task's numa mempolicy, add a "nodes_allowed" nodemask pointer to the allocate, free and surplus adjustment functions. For now, pass NULL to indicate default behavior--i.e., use node_online_map. A subsqeuent patch will derive a non-default mask from the controlling task's numa mempolicy. Note that this method of updating the global hstate nr_hugepages under the constraint of a nodemask simplifies keeping the global state consistent--especially the number of persistent and surplus pages relative to reservations and overcommit limits. There are undoubtedly other ways to do this, but this works for both interfaces: mempolicy and per node attributes. [rientjes@google.com: fix HIGHMEM compile error] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:12 -08:00
Lee Schermerhorn	9a76db0997	hugetlb: rework hstate_next_node_* functions Modify the hstate_next_node* functions to allow them to be called to obtain the "start_nid". Then, whereas prior to this patch we unconditionally called hstate_next_node_to_{alloc\|free}(), whether or not we successfully allocated/freed a huge page on the node, now we only call these functions on failure to alloc/free to advance to next allowed node. Factor out the next_node_allowed() function to handle wrap at end of node_online_map. In this version, the allowed nodes include all of the online nodes. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:12 -08:00
David Rientjes	4e7b8a6cef	nodemask: make NODEMASK_ALLOC more general This is a series of patches to provide control over the location of the allocation and freeing of persistent huge pages on a NUMA platform. Please consider for merging into mmotm. This series uses two mechanisms to constrain the nodes from which persistent huge pages are allocated: 1) the task NUMA mempolicy of the task modifying a new sysctl "nr_hugepages_mempolicy", based on a suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs attributes have been added [in V4] to each node system device under: /sys/devices/node/node[0-9]*/hugepages The per node attibutes allow direct assignment of a huge page count on a specific node, regardless of the task's mempolicy or cpuset constraints. This patch: NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary. It's perfectly reasonable to use this macro to allocate a nodemask_t, which is anonymous, either dynamically or on the stack depending on NODES_SHIFT. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:12 -08:00
KOSAKI Motohiro	6d9c285a63	mm: move inc_zone_page_state(NR_ISOLATED) to just isolated place Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed in right after isolate_page(). This patch does it. Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:12 -08:00
Wu Fengguang	ee32398fda	/dev/mem: remove redundant parameter from do_write_kmem() Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:12 -08:00
Wu Fengguang	80ad89a0ce	/dev/mem: remove the "written" variable in write_kmem() Also rename "len" to "sz". No behavior change. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Wu Fengguang	7fabaddd09	/dev/mem: make size_inside_page() logic straight Also convert more size_inside_page() users. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Wu Fengguang	fa29e97bb8	/dev/mem: cleanup unxlate_dev_mem_ptr() calls No behaviour change. [akpm@linux-foundation.org: cleanuplets] [akpm@linux-foundation.org: remove unused `ret'] Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Wu Fengguang	f222318e9c	/dev/mem: introduce size_inside_page() Introduce size_inside_page() to replace duplicate /dev/mem code. Also apply it to /dev/kmem, whose alignment logic was buggy. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Wu Fengguang	4ea2f43f28	/dev/mem: remove redundant test on len The len test in write_kmem() is always true, so can be reduced. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
KOSAKI Motohiro	659ace584e	mmap: don't return ENOMEM when mapcount is temporarily exceeded in munmap() On ia64, the following test program exit abnormally, because glibc thread library called abort(). ======================================================== (gdb) bt #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000000003208e0 in raise () from /lib/libc.so.6.1 #2 0x2000000000324090 in abort () from /lib/libc.so.6.1 #3 0x200000000027c3e0 in __deallocate_stack () from /lib/libpthread.so.0 #4 0x200000000027f7c0 in start_thread () from /lib/libpthread.so.0 #5 0x200000000047ef60 in __clone2 () from /lib/libc.so.6.1 ======================================================== The fact is, glibc call munmap() when thread exitng time for freeing stack, and it assume munlock() never fail. However, munmap() often make vma splitting and it with many mapcount make -ENOMEM. Oh well, that's crazy, because stack unmapping never increase mapcount. The maxcount exceeding is only temporary. internal temporary exceeding shouldn't make ENOMEM. This patch does it. test_max_mapcount.c ================================================================== #include<stdio.h> #include<stdlib.h> #include<string.h> #include<pthread.h> #include<errno.h> #include<unistd.h> #define THREAD_NUM 30000 #define MAL_SIZE (810241024) void wait_thread(void args) { void addr; addr = malloc(MAL_SIZE); sleep(10); return NULL; } void wait_thread2(void args) { sleep(60); return NULL; } int main(int argc, char argv[]) { int i; pthread_t thread[THREAD_NUM], th; int ret, count = 0; pthread_attr_t attr; ret = pthread_attr_init(&attr); if(ret) { perror("pthread_attr_init"); } ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); if(ret) { perror("pthread_attr_setdetachstate"); } for (i = 0; i < THREAD_NUM; i++) { ret = pthread_create(&th, &attr, wait_thread, NULL); if(ret) { fprintf(stderr, "[%d] ", count); perror("pthread_create"); } else { printf("[%d] create OK.\n", count); } count++; ret = pthread_create(&thread[i], &attr, wait_thread2, NULL); if(ret) { fprintf(stderr, "[%d] ", count); perror("pthread_create"); } else { printf("[%d] create OK.\n", count); } count++; } sleep(3600); return 0; } ================================================================== [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Alex Chiang	bb86a7338b	page-types: exit early when invoked with -d\|--describe On a system with large amount of memory (256GB), invoking page-types can take quite a long time, which is unreasonable considering the user only wants a description of the flags: # time ./page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty real 0m34.285s user 0m1.966s sys 0m32.313s This is because we still walk the entire address range. Exiting early seems like a reasonble solution: # time ./page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty real 0m0.007s user 0m0.001s sys 0m0.005s Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Haicheng Li <haicheng.li@intel.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Alex Chiang	9fdcd886ab	page-types: whitespace alignment Align the output when page-type -h is invoked. Signed-off-by: Alex Chiang <achiang@hp.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Alex Chiang	dcfe730c60	page-types: learn to describe flags directly from command line Teach page-types to describe page flags directly from the command line. Why is this useful? For instance, if you're using memory hotplug and see this in /var/log/messages: kernel: removing from LRU failed 3836dd0/1/1e00000000000010 It would be nice to decode those page flags without staring at the source. Example usage and output: # Documentation/vm/page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty # Documentation/vm/page-types -d anon 0x0000000000001000 ____________a_____________________ anonymous # Documentation/vm/page-types -d anon,0x10 0x0000000000001010 ____D_______a_____________________ dirty,anonymous [achiang@hp.com: documentation] Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Roel Kluin	f1327bf18c	page-types: unsigned cannot be less than 0 in add_page() If not signed, testing of the read() return value in this function will not work. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:11 -08:00
Tommi Rantala	3428838d8e	page-types: constify read only arrays Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:10 -08:00
David Rientjes	1b604d75bb	oom: dump stack and VM state when oom killer panics The oom killer header, including information such as the allocation order and gfp mask, current's cpuset and memory controller, call trace, and VM state information is currently only shown when the oom killer has selected a task to kill. This information is omitted, however, when the oom killer panics either because of panic_on_oom sysctl settings or when no killable task was found. It is still relevant to know crucial pieces of information such as the allocation order and VM state when diagnosing such issues, especially at boot. This patch displays the oom killer header whenever it panics so that bug reports can include pertinent information to debug the issue, if possible. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:10 -08:00
Michal Marek	5ce45962b2	MAINTAINERS: new kbuild maintainer Sam was fine with handing over kbuild maintainership to me. The git trees are already in linux-next, a merge request will follow shortly. Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:10 -08:00
Amerigo Wang	ec81aecb29	hfs: fix a potential buffer overflow A specially-crafted Hierarchical File System (HFS) filesystem could cause a buffer overflow to occur in a process's kernel stack during a memcpy() call within the hfs_bnode_read() function (at fs/hfs/bnode.c:24). The attacker can provide the source buffer and length, and the destination buffer is a local variable of a fixed length. This local variable (passed as "&entry" from fs/hfs/dir.c:112 and allocated on line 60) is stored in the stack frame of hfs_bnode_read()'s caller, which is hfs_readdir(). Because the hfs_readdir() function executes upon any attempt to read a directory on the filesystem, it gets called whenever a user attempts to inspect any filesystem contents. [amwang@redhat.com: modify this patch and fix coding style problems] Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Eugene Teo <eteo@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Anderson <anderson@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:10 -08:00
Alexey Dobriyan	4b731d50ff	bsdacct: fix uid/gid misreporting commit `d8e180dcd5` "bsdacct: switch credentials for writing to the accounting file" introduced credential switching during final acct data collecting. However, uid/gid pair continued to be collected from current which became credentials of who created acct file, not who exits. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14676 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: Juho K. Juopperi <jkj@kapsi.fi> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:10 -08:00
Linus Torvalds	5443040754	Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: i2c-core: i2c bus should support PM entries in struct dev_pm_ops i2c: Get rid of I2C_CLIENT_MODULE_PARM i2c: Drop I2C_CLIENT_INSMOD_2 to 8 i2c: Drop I2C_CLIENT_INSMOD_1 i2c: Get rid of struct i2c_client_address_data i2c: Drop the kind parameter from detect callbacks	2009-12-14 14:11:56 -08:00
Linus Torvalds	3ea6b3d0e6	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: Avoid IO in udf_clear_inode udf: Try harder when looking for VAT inode udf: Fix compilation with UDFFS_DEBUG enabled	2009-12-14 12:50:25 -08:00
Jan Kara	2c948b3f86	udf: Avoid IO in udf_clear_inode It is not very good to do IO in udf_clear_inode. First, VFS does not really expect inode to become dirty there and thus we have to write it ourselves, second, memory reclaim gets blocked waiting for IO when it does not really expect it, third, the IO pattern (e.g. on umount) resulting from writes in udf_clear_inode is bad and it slows down writing a lot. The reason why UDF needed to do IO in udf_clear_inode is that UDF standard mandates extent length to exactly match inode size. But when we allocate extents to a file or directory, we don't really know what exactly the final file size will be and thus temporarily set it to block boundary and later truncate it to exact length in udf_clear_inode. Now, this is changed to truncate to final file size in udf_release_file for regular files. For directories and symlinks, we do the truncation at the moment when learn what the final file size will be. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-14 21:40:04 +01:00
Jan Kara	e971b0b9e0	udf: Try harder when looking for VAT inode Some disks do not contain VAT inode in the last recorded block as required by the standard but a few blocks earlier (or the number of recorded blocks is wrong). So look for the VAT inode a bit before the end of the media. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-14 21:40:04 +01:00
Jan Kara	1fefd086df	udf: Fix compilation with UDFFS_DEBUG enabled Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-14 21:40:04 +01:00
Linus Torvalds	75b08038ce	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mce: Clean up thermal init by introducing intel_thermal_supported() x86, mce: Thermal monitoring depends on APIC being enabled x86: Gart: fix breakage due to IOMMU initialization cleanup x86: Move swiotlb initialization before dma32_free_bootmem x86: Fix build warning in arch/x86/mm/mmio-mod.c x86: Remove usedac in feature-removal-schedule.txt x86: Fix duplicated UV BAU interrupt vector nvram: Fix write beyond end condition; prove to gcc copy is safe mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe x86: Limit the number of processor bootup messages x86: Remove enabling x2apic message for every CPU doc: Add documentation for bootloader_{type,version} x86, msr: Add support for non-contiguous cpumasks x86: Use find_e820() instead of hard coded trampoline address x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks Trivial percpu-naming-introduced conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c	2009-12-14 12:36:46 -08:00
Linus Torvalds	fb1beb29b5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6: pcmcia: CodingStyle fixes pcmcia: remove unused IRQ_FIRST_SHARED	2009-12-14 12:33:02 -08:00
sonic zhang	54067ee206	i2c-core: i2c bus should support PM entries in struct dev_pm_ops Struct dev_pm_ops is not configured in current i2c bus type. i2c drivers only depends on suspend/resume entries in struct dev_pm_ops are not informed of PM suspend and resume events by i2c framework. Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2009-12-14 21:17:30 +01:00
Jean Delvare	7f508118b1	i2c: Get rid of I2C_CLIENT_MODULE_PARM There is no user left of I2C_CLIENT_MODULE_PARM, so we can finally get rid of this ugly macro. Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Wolfram Sang <w.sang@pengutronix.de>	2009-12-14 21:17:29 +01:00
Jean Delvare	e5e9f44c24	i2c: Drop I2C_CLIENT_INSMOD_2 to 8 These macros simply declare an enum, so drivers might as well declare it themselves. This puts an end to the arbitrary limit of 8 chip types per i2c driver. Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Wolfram Sang <w.sang@pengutronix.de>	2009-12-14 21:17:27 +01:00
Jean Delvare	1f86df49dd	i2c: Drop I2C_CLIENT_INSMOD_1 This macro simply declares an enum, so drivers might as well declare it themselves. Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Wolfram Sang <w.sang@pengutronix.de>	2009-12-14 21:17:26 +01:00
Jean Delvare	c3813d6af1	i2c: Get rid of struct i2c_client_address_data Struct i2c_client_address_data only contains one field at this point, which makes its usefulness questionable. Get rid of it and pass simple address lists around instead. Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Wolfram Sang <w.sang@pengutronix.de>	2009-12-14 21:17:25 +01:00
Jean Delvare	310ec79210	i2c: Drop the kind parameter from detect callbacks The "kind" parameter always has value -1, and nobody is using it any longer, so we can remove it. Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Wolfram Sang <w.sang@pengutronix.de>	2009-12-14 21:17:23 +01:00
Linus Torvalds	478e4e9d7a	Merge branch 'next-spi' of git://git.secretlab.ca/git/linux-2.6 * 'next-spi' of git://git.secretlab.ca/git/linux-2.6: (23 commits) spi: fix probe/remove section markings Add OMAP spi100k driver spi-imx: don't access struct device directly but use dev_get_platdata spi-imx: Add mx25 support spi-imx: use positive logic to distinguish cpu variants spi-imx: correct check for platform_get_irq failing ARM: NUC900: Add spi driver support for nuc900 spi: SuperH MSIOF SPI Master driver V2 spi: fix spidev compilation failure when VERBOSE is defined spi/au1550_spi: fix setupxfer not to override cfg with zeros spi/mpc8xxx: don't use __exit_p to wrap plat_mpc8xxx_spi_remove spi/i.MX: fix broken error handling for gpio_request spi/i.mx: drain MXC SPI transfer buffer when probing device MAINTAINERS: add SPI co-maintainer. spi/xilinx_spi: fix incorrect casting spi/mpc52xx-spi: minor cleanups xilinx_spi: add a platform driver using the xilinx_spi common module. xilinx_spi: add support for the DS570 IP. xilinx_spi: Switch to iomem functions and support little endian. xilinx_spi: Split into of driver and generic part. ...	2009-12-14 10:22:11 -08:00
Linus Torvalds	2205afa7d1	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf sched: Fix build failure on sparc perf bench: Add "all" pseudo subsystem and "all" pseudo suite perf tools: Introduce perf_session class perf symbols: Ditch dso->find_symbol perf symbols: Allow lookups by symbol name too perf symbols: Add missing "Variables" entry to map_type__name perf symbols: Add support for 'variable' symtabs perf symbols: Introduce ELF counterparts to symbol_type__is_a perf symbols: Introduce symbol_type__is_a perf symbols: Rename kthreads to kmaps, using another abstraction for it perf tools: Allow building for ARM hw-breakpoints: Handle bad modify_user_hw_breakpoint off-case return value perf tools: Allow cross compiling tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACE tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACING Trivial conflict due to different fixes to modify_user_hw_breakpoint() in include/linux/hw_breakpoint.h	2009-12-14 10:13:22 -08:00
David Howells	491424c0f4	PCI: Global variable decls must match the defs in section attributes Global variable declarations must match the definitions in section attributes as the compiler is at liberty to vary the method it uses to access a variable, depending on the section it is in. When building the FRV arch, I now see: drivers/built-in.o: In function `pci_apply_final_quirks': drivers/pci/quirks.c:2606: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o drivers/pci/quirks.c:2623: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o drivers/pci/quirks.c:2630: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o because the declaration of pci_dfl_cache_line_size in linux/pci.h does not match the definition in drivers/pci/pci.c. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-14 10:11:34 -08:00
David Howells	5185fb0699	FRV: Fix no-hardware-breakpoint case If there is no hardware breakpoint support, modify_user_hw_breakpoint() tries to return a NULL pointer through as an 'int' return value: In file included from kernel/exit.c:53: include/linux/hw_breakpoint.h: In function 'modify_user_hw_breakpoint': include/linux/hw_breakpoint.h:96: warning: return makes integer from pointer without a cast Return 0 instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-14 10:10:55 -08:00
Linus Torvalds	464480f72e	Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze * 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (46 commits) microblaze: Remove rt_sigsuspend wrapper microblaze: nommu: Don't clobber R11 on syscalls microblaze: Remove show_tmem function microblaze: Support for WB cache microblaze: Add PVR for Microblaze v7.30.a microblaze: Remove ancient and fake microblaze version from cpu_ver table microblaze: Remove panic_timeout init value microblaze: Do not count system calls in default microblaze: Enable DTC compilation microblaze: Core oprofile configs and hooks microblaze: Fix level interrupt ACKing microblaze: Enable futimesat syscall microblaze: Checking DTS against PVR for write-back cache microblaze: Remove duplicity from pgalloc.h microblaze: Futex support microblaze: Adding dev_arch_data functions microblaze: Fix the heartbeat gpio to be more robust microblaze: Simple __copy_tofrom_user for noMMU microblaze: Export memory_start for modules microblaze: Use lowest-common-denominator default CPU settings ...	2009-12-14 10:04:04 -08:00
Linus Torvalds	37222e1c9e	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: (27 commits) md: add 'recovery_start' per-device sysfs attribute md: rcu_read_lock() walk of mddev->disks in md_do_sync() md: integrate spares into array at earliest opportunity. md: move compat_ioctl handling into md.c md: revise Kconfig help for MD_MULTIPATH md: add MODULE_DESCRIPTION for all md related modules. raid: improve MD/raid10 handling of correctable read errors. md/raid10: print more useful messages on device failure. md/bitmap: update dirty flag when bitmap bits are explicitly set. md: Support write-intent bitmaps with externally managed metadata. md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb md: support updating bitmap parameters via sysfs. md: factor out parsing of fixed-point numbers md: support bitmap offset appropriate for external-metadata arrays. md: remove needless setting of thread->timeout in raid10_quiesce md: change daemon_sleep to be in 'jiffies' rather than 'seconds'. md: move offset, daemon_sleep and chunksize out of bitmap structure md: collect bitmap-specific fields into one structure. md/raid1: add takeover support for raid5->raid1 md: add honouring of suspend_{lo,hi} to raid1. ...	2009-12-14 10:03:36 -08:00
Linus Torvalds	76b8f82cde	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (58 commits) mfd: Add twl6030 regulator subdevices regulator: Add support for twl6030 regulators rtc: Add twl6030 RTC support mfd: Add support for twl6030 irq framework mfd: Rename twl4030_ routines in twl-regulator.c mfd: Rename twl4030_ routines in rtc-twl.c mfd: Rename all twl4030_i2c* mfd: Rename twl4030* driver files to enable re-use mfd: Clarify twl4030 return value for read and write mfd: Add all twl4030 regulators to the twl4030 mfd driver mfd: Don't set mc13783 ADREFMODE for touch conversions mfd: Remove ezx-pcap defines for custom led gpio encoding mfd: Near complete mc13783 rewrite mfd: Remove build time warning for WM835x register default tables mfd: Force I2C to be built in when building WM831x mfd: Don't allow wm831x to be built as a module mfd: Fix incorrect error check for wm8350-core mfd: Fix twl4030 warning gpiolib: Implement gpio_to_irq() for wm831x mfd: Remove default selection of AB4500 ...	2009-12-14 10:02:35 -08:00
Linus Torvalds	af853e631d	Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm * 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: ARM: fix lh7a40x build ARM: fix sa1100 build ARM: fix clps711x, footbridge, integrator, ixp2000, ixp2300 and s3c build bug ARM: VFP: fix vfp thread init bug and document vfp notifier entry conditions ARM: pxa: fix now incorrect reference of skt->irq by using skt->socket.pci_irq [ARM] pxa/zeus: default configuration for Arcom Zeus SBC. [ARM] pxa/zeus: make Viper pcmcia support more generic to support Zeus [ARM] pxa/zeus: basic support for Arcom Zeus SBC [ARM] pxa/em-x270: fix usb hub power up/reset sequence PCMCIA: fix pxa2xx_lubbock modular build error ARM: RealView: Fix typo in the RealView/PBX Kconfig entry ARM: Do not allow the probing of the local timer ARM: Add an earlyprintk debug console	2009-12-14 10:01:15 -08:00
Linus Torvalds	3f86ce72cf	Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (75 commits) NFS: Fix nfs_migrate_page() rpc: remove unneeded function parameter in gss_add_msg() nfs41: Invoke RECLAIM_COMPLETE on all new client ids SUNRPC: IS_ERR/PTR_ERR confusion NFSv41: Fix a potential state leakage when restarting nfs4_close_prepare nfs41: Handle NFSv4.1 session errors in the delegation recall code nfs41: Retry delegation return if it failed with session error nfs41: Handle session errors during delegation return nfs41: Mark stateids in need of reclaim if state manager gets stale clientid NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured nfs41: Don't clear DRAINING flag on NFS4ERR_STALE_CLIENTID nfs41: nfs41_setup_state_renewal NFSv41: More cleanups NFSv41: Fix up some bugs in the NFS4CLNT_SESSION_DRAINING code NFSv41: Clean up slot table management NFSv41: Fix nfs4_proc_create_session nfs41: Invoke RECLAIM_COMPLETE nfs41: RECLAIM_COMPLETE functionality nfs41: RECLAIM_COMPLETE XDR functionality Cleanup some NFSv4 XDR decode comments ...	2009-12-14 10:00:24 -08:00
Linus Torvalds	d0316554d3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c	2009-12-14 09:58:24 -08:00
William Allen Simpson	fb0bbb92d4	Documentation: rw_lock lessons learned In recent months, two different network projects erroneously strayed down the rw_lock path. Update the Documentation based upon comments by Eric Dumazet and Paul E. McKenney in those threads. Further updates await somebody else with more expertise. Changes: - Merged with extensive content by Stephen Hemminger. - Fix one of the comments by Linus Torvalds. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-14 09:46:56 -08:00
Hidetoshi Seto	70fe440718	x86, mce: Clean up thermal init by introducing intel_thermal_supported() It looks better to have a common function. No change in functionality. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <4B25FDDC.407@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Cyrill Gorcunov <gorcunov@openvz.org>	2009-12-14 10:38:41 +01:00
Cyrill Gorcunov	485a2e1973	x86, mce: Thermal monitoring depends on APIC being enabled Add check if APIC is not disabled since thermal monitoring depends on it. As only apic gets disabled we should not try to install "thermal monitor" vector, print out that thermal monitoring is enabled and etc... Note that "Intel Correct Machine Check Interrupts" already has such a check. Also I decided to not add cpu_has_apic check into mcheck_intel_therm_init since even if it'll call apic_read on disabled apic -- it's safe here and allow us to save a few code bytes. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> LKML-Reference: <4B25FDC2.3020401@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-14 10:38:41 +01:00
David Miller	2cd9046cc5	perf sched: Fix build failure on sparc Here, tvec->tv_usec is "unsigned int" not "unsigned long". Since the type is different on every platform, it's probably best to just use long printf formats and cast. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091213.235622.53363059.davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-14 08:59:12 +01:00

1 2 3 4 5 ...

176245 Commits All Branches Search

176245 Commits

All Branches