OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Balbir Singh	cf475ad28a	cgroups: add an owner to the mm_struct Remove the mem_cgroup member from mm_struct and instead adds an owner. This approach was suggested by Paul Menage. The advantage of this approach is that, once the mm->owner is known, using the subsystem id, the cgroup can be determined. It also allows several control groups that are virtually grouped by mm_struct, to exist independent of the memory controller i.e., without adding mem_cgroup's for each controller, to mm_struct. A new config option CONFIG_MM_OWNER is added and the memory resource controller selects this config option. This patch also adds cgroup callbacks to notify subsystems when mm->owner changes. The mm_cgroup_changed callback is called with the task_lock() of the new task held and is called just prior to changing the mm->owner. I am indebted to Paul Menage for the several reviews of this patchset and helping me make it lighter and simpler. This patch was tested on a powerpc box, it was compiled with both the MM_OWNER config turned on and off. After the thread group leader exits, it's moved to init_css_state by cgroup_exit(), thus all future charges from runnings threads would be redirected to the init_css_set's subsystem. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Hirokazu Takahashi <taka@valinux.co.jp> Cc: David Rientjes <rientjes@google.com>, Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Paul Menage <menage@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:10 -07:00
Serge E. Hallyn	29486df325	cgroups: introduce cft->read_seq() Introduce a read_seq() helper in cftype, which uses seq_file to print out lists. Use it in the devices cgroup. Also split devices.allow into two files, so now devices.deny and devices.allow are the ones to use to manipulate the whitelist, while devices.list outputs the cgroup's current whitelist. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:10 -07:00
Li Zefan	28fd5dfc12	cgroups: remove the css_set linked-list Now we can run through the hash table instead of running through the linked-list. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:10 -07:00
Li Zefan	472b1053f3	cgroups: use a hash table for css_set finding When we attach a process to a different cgroup, the css_set linked-list will be run through to find a suitable existing css_set to use. This patch implements a hash table for better performance. The following benchmarks have been tested: For N in 1, 5, 10, 50, 100, 500, 1000, create N cgroups with one sleeping task in each, and then move an additional task through each cgroup in turn. Here is a test result: N Loop orig - Time(s) hash - Time(s) ---------------------------------------------- 1 10000 1.201231728 1.196311177 5 2000 1.065743872 1.040566424 10 1000 0.991054735 0.986876440 50 200 0.976554203 0.969608733 100 100 0.998504680 0.969218270 500 20 1.157347764 0.962602963 1000 10 1.619521852 1.085140172 Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:09 -07:00
Serge E. Hallyn	08ce5f16ee	cgroups: implement device whitelist Implement a cgroup to track and enforce open and mknod restrictions on device files. A device cgroup associates a device access whitelist with each cgroup. A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it applies to all types and all major and minor numbers. Major and minor are either an integer or * for all. Access is a composition of r (read), w (write), and m (mknod). The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of the parent. Admins can then remove devices from the whitelist or add new entries. A child cgroup can never receive a device access which is denied its parent. However when a device access is removed from a parent it will not also be removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance echo 'c 1:3 mr' > /cgroups/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing echo a > /cgroups/1/devices.deny will remove the default 'a : mrw' entry. CAP_SYS_ADMIN is needed to change permissions or move another task to a new cgroup. A cgroup may not be granted more permissions than the cgroup's parent has. Any task can move itself between cgroups. This won't be sufficient, but we can decide the best way to adequately restrict movement later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix may-be-used-uninitialized warning] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Looks-good-to: Pavel Emelyanov <xemul@openvz.org> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:09 -07:00
Pavel Emelyanov	d447ea2f30	cgroups: add the trigger callback to struct cftype Trigger callback can be used to receive a kick-up from the user space. The string written is ignored. The cftype->private is used for multiplexing events. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Paul Menage <menage@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:09 -07:00
Paul Menage	e73d2c61d1	CGroups _s64 files: add cgroups read_s64/write_s64 file methods These patches add cgroups read_s64 and write_s64 control file methods (the signed equivalent of read_u64/write_u64) and use them to implement the cpu.rt_runtime_us control file in the CFS cgroup subsystem. This patch: These are the signed equivalents of the read_u64/write_u64 methods Signed-off-by: Paul Menage <menage@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:09 -07:00
Paul Menage	3116f0e3df	CGroup API files: move "releasable" to cgroup_debug subsystem The "releasable" control file provided by the cgroup framework exports the state of a per-cgroup flag that's related to the notify-on-release feature. This isn't really generally useful, unless you're trying to debug this particular feature of cgroups. This patch moves the "releasable" file to the cgroup_debug subsystem. Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:09 -07:00
Paul Menage	9179656961	CGroup API files: add cgroup map data type Adds a new type of supported control file representation, a map from strings to u64 values. Each map entry is printed as a line in a similar format to /proc/vmstat, i.e. "$key $value\n" Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:08 -07:00
Paul Menage	2c7eabf376	CGroup API files: add res_counter_read_u64() Adds a function for returning the value of a resource counter member, in a form suitable for use in a cgroup read_u64 control file method. Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:08 -07:00
Paul Menage	f4c753b7ea	CGroup API files: rename read/write_uint methods to read_write_u64 Several people have justifiably complained that the "_uint" suffix is inappropriate for functions that handle u64 values, so this patch just renames all these functions and their users to have the suffic _u64. [peterz@infradead.org: build fix] Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:07 -07:00
Jan Engelhardt	c9e587abfd	vt: fix background color on line feed A command that causes a line feed while a background color is active, such as perl -e 'print "x" x 60, "\e[44m", "x" x 40, "\e[0m\n"' and perl -e 'print "x" x 40, "\e[44m\n", "x" x 40, "\e[0m\n"' causes the line that was started as a result of the line feed to be completely filled with the currently active background color instead of the default color. When scrolling, part of the current screen is memcpy'd/memmove'd to the new region, and the new line(s) that will appear as a result are cleared using memset. However, the lines are cleared with vc->vc_video_erase_char, causing them to be colored with the currently active background color. This is different from X11 terminal emulators which always paint the new lines with the default background color (e.g. `xterm -bg black`). The clear operation (\e[1J and \e[2J) also use vc_video_erase_char, so a new vc->vc_scrl_erase_char is introduced with contains the erase character used for scrolling, which is built from vc->vc_def_color instead of vc->vc_color. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:06 -07:00
Dave Young	5f97a5a879	isolate ratelimit from printk.c for other use Due to the rcupreempt.h WARN_ON trigged, I got 2G syslog file. For some serious complaining of kernel, we need repeat the warnings, so here I isolate the ratelimit part of printk.c to a standalone file. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:06 -07:00
David Howells	8f0cfa52a1	xattr: add missing consts to function arguments Add missing consts to xattr function arguments. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:06 -07:00
Ilpo Järvinen	76308da189	smb.h: uses struct timespec but didn't include linux/time.h Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:05 -07:00
Robert P. J. Day	95d8c365b2	lists: add "const" qualifier to first arg of list_splice() operations Since neither the list_splice() nor __list_splice() routines modify their first argument, might as well declare them "const". [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:04 -07:00
Robert P. J. Day	8673511845	kbuild: move files that don't check __KERNEL__ Move files that don't check __KERNEL__ from unifdef-y to header-y. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:04 -07:00
Robert P. J. Day	1a6924f93d	kbuild: remove duplicate, conflicting entry for oom.h oom.h is already tagged for unifdef'ing, so its entry as a simple exportable header should be deleted. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:04 -07:00
Robert P. J. Day	aab3c3b01d	Remove superfluous include of string.h from percpu.h There's nothing in percpu.h that requires an explicit inclusion of string.h. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:04 -07:00
Pavel Emelyanov	3a2e7f47d7	binfmt_misc.c: avoid potential kernel stack overflow This can be triggered with root help only, but... Register the ":text:E::txt::/root/cat.txt:' rule in binfmt_misc (by root) and try launching the cat.txt file (by anyone) :) The result is - the endless recursion in the load_misc_binary -> open_exec -> load_misc_binary chain and stack overflow. There's a similar problem with binfmt_script, and there's a sh_bang memner on linux_binprm structure to handle this, but simply raising this in binfmt_misc may break some setups when the interpreter of some misc binaries is a script. So the proposal is to turn sh_bang into a bit, add a new one (the misc_bang) and raise it in load_misc_binary. After this, even if we set up the misc -> script -> misc loop for binfmts one of them will step on its own bang and exit. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:04 -07:00
Adrian Bunk	7d195a5409	proper extern for late_time_init Add a proper extern for late_time_init in include/linux/init.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:03 -07:00
Tetsuo Handa	175a06ae30	exec: remove argv_len from struct linux_binprm I noticed that 2.6.24.2 calculates bprm->argv_len at do_execve(). But it doesn't update bprm->argv_len after "remove_arg_zero() + copy_strings_kernel()" at load_script() etc. audit_bprm() is called from search_binary_handler() and search_binary_handler() is called from load_script() etc. Thus, I think the condition check if (bprm->argv_len > (audit_argv_kb << 10)) return -E2BIG; in audit_bprm() might return wrong result when strlen(removed_arg) != strlen(spliced_args). Why not update bprm->argv_len at load_script() etc. ? By the way, 2.6.25-rc3 seems to not doing the condition check. Is the field bprm->argv_len no longer needed? Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Ollie Wild <aaw@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:03 -07:00
WANG Cong	ecd0fa9825	Remove the macro get_personality Remove the macro get_personality, use ->personality instead. Cc: Christoph Hellwig <hch@infradead.org Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Bryan Wu <bryan.wu@analog.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:02 -07:00
jan sonnek	6e5e8c5085	Misc: phantom, consistent whitespace Make it consistent with the rest of the header. Signed-off-by: jan sonnek <xsonnek@gmail.com> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:02 -07:00
Jiri Slaby	7e4e8e689f	Misc: phantom, add compat ioctl Openhaptics uses pointers in _IOC() macros, implement compat for them. Also add _IOC alternatives which are not 32/64 bit dependent (structures passed through aren't yet) -- libphantom will use them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:02 -07:00
Adrian Bunk	eb0f1c442d	proper __do_softirq() prototype Add a proper prototype for __do_softirq() in include/linux/interrupt.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:02 -07:00
Adrian Bunk	58b250daff	remove mca_is_adapter_used() Remove the no longer used mca_is_adapter_used(). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:01 -07:00
Adrian Bunk	946a57b526	remove generic_commit_write() Remove the obsolete and no longer used generic_commit_write(). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:01 -07:00
Adrian Bunk	d5470b596a	fs/aio.c: make 3 functions static Make the following needlessly global functions static: - __put_ioctx() - lookup_ioctx() - io_submit_one() Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:00 -07:00
Adrian Bunk	07d45da616	fs/drop_caches.c: make 2 functions static Make the following needlessly global functions static: - drop_pagecache() - drop_slab() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:00 -07:00
Adrian Bunk	f11b00f3bd	fs/fs-writeback.c: make 2 functions static Make the following needlessly global functions static: - writeback_acquire() - writeback_release() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:00 -07:00
Adrian Bunk	67cde59537	make vfs_ioctl() static Make the needlessly global vfs_ioctl() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:00 -07:00
Adrian Bunk	6b09ae6692	make __put_super() static Make the needlessly global __put_super() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:00 -07:00
Sam Ravnborg	f718e31819	cpu: fix section mismatch warnings in hotcpu_register Fix following warnings: WARNING: vmlinux.o(.data+0x5020): Section mismatch in reference from the variable cpu_vsyscall_notifier_nb.12876 to the function .cpuinit.text:cpu_vsyscall_notifier() WARNING: vmlinux.o(.data+0x9ce0): Section mismatch in reference from the variable profile_cpu_callback_nb.17654 to the function .devinit.text:profile_cpu_callback() WARNING: vmlinux.o(.data+0xd380): Section mismatch in reference from the variable workqueue_cpu_callback_nb.15004 to the function .devinit.text:workqueue_cpu_callback() WARNING: vmlinux.o(.data+0x11d00): Section mismatch in reference from the variable relay_hotcpu_callback_nb.19626 to the function .cpuinit.text:relay_hotcpu_callback() WARNING: vmlinux.o(.data+0x12970): Section mismatch in reference from the variable cpu_callback_nb.24694 to the function .devinit.text:cpu_callback() WARNING: vmlinux.o(.data+0x3fee0): Section mismatch in reference from the variable percpu_counter_hotcpu_callback_nb.10903 to the function .cpuinit.text:percpu_counter_hotcpu_callback() WARNING: vmlinux.o(.data+0x74ce0): Section mismatch in reference from the variable topology_cpu_callback_nb.12506 to the function .cpuinit.text:topology_cpu_callback() Functions used as argument are by definition only used in HOTPLUG_CPU situations so thay are annotated __cpuinit. Annotate the static variable used by hotcpu_register with __cpuinitdata to match this definition. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:05:59 -07:00
Sripathi Kodi	679c9cd4ac	add RUSAGE_THREAD Add the RUSAGE_THREAD option for the getrusage system call. This is essentially Roland's patch from http://lkml.org/lkml/2008/1/18/589, but the line about RUSAGE_LWP line has been removed, as suggested by Ulrich and Christoph. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:05:59 -07:00
Nur Hussein	95b570c9ce	Taint kernel after WARN_ON(condition) The kernel is sent to tainted within the warn_on_slowpath() function, and whenever a warning occurs the new taint flag 'W' is set. This is useful to know if a warning occurred before a BUG by preserving the warning as a flag in the taint state. This does not work on architectures where WARN_ON has its own definition. These archs are: 1. s390 2. superh 3. avr32 4. parisc The maintainers of these architectures have been added in the Cc: list in this email to alert them to the situation. The documentation in oops-tracing.txt has been updated to include the new flag. Signed-off-by: Nur Hussein <nurhussein@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:05:59 -07:00
Ilpo Järvinen	bd3feb13e1	fs/coda: remove static inline forward declarations They're defined later on in the same file with bodies and nothing in between needs them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:05:59 -07:00
Eric Dumazet	ede9c697bc	Avoid divides in BITS_TO_LONGS BITS_PER_LONG is a signed value (32 or 64) DIV_ROUND_UP(nr, BITS_PER_LONG) performs signed arithmetic if "nr" is signed too. Converting BITS_TO_LONGS(nr) to DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) makes sure compiler can perform a right shift, even if "nr" is a signed value, instead of an expensive integer divide. Applying this patch saves 141 bytes on x86 when CONFIG_CC_OPTIMIZE_FOR_SIZE=y and speedup bitmap operations. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:05:59 -07:00
Nishanth Aravamudan	ab857d0938	mm: fix misleading __GFP_REPEAT related comments The definition and use of __GFP_REPEAT, __GFP_NOFAIL and __GFP_NORETRY in the core VM have somewhat differing comments as to their actual semantics. Annoyingly, the flags definition has inline and header comments, which might be interpreted as not being equivalent. Just add references to the header comments in the inline ones so they don't go out of sync in the future. In their use in __alloc_pages() clarify that the current implementation treats low-order allocations and __GFP_REPEAT allocations as distinct cases. To clarify, the flags' semantics are: __GFP_NORETRY means try no harder than one run through __alloc_pages __GFP_REPEAT means __GFP_NOFAIL __GFP_NOFAIL means repeat forever order <= PAGE_ALLOC_COSTLY_ORDER means __GFP_NOFAIL Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:05:58 -07:00
Alan D. Brunelle	ac9fafa124	block: Skip I/O merges when disabled The block I/O + elevator + I/O scheduler code spend a lot of time trying to merge I/Os -- rightfully so under "normal" circumstances. However, if one were to know that the incoming I/O stream was /very/ random in nature, the cycles are wasted. This patch adds a per-request_queue tunable that (when set) disables merge attempts (beyond the simple one-hit cache check), thus freeing up a non-trivial amount of CPU cycles. Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:55 +02:00
FUJITA Tomonori	d7e3c3249e	block: add large command support This patch changes rq->cmd from the static array to a pointer to support large commands. We rarely handle large commands. So for optimization, a struct request still has a static array for a command. rq_init sets rq->cmd pointer to the static array. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:55 +02:00
FUJITA Tomonori	2a4aa30c5f	block: rename and export rq_init() This rename rq_init() blk_rq_init() and export it. Any path that hands the request to the block layer needs to call it to initialize the request. This is a preparation for large command support, which needs to initialize the request in a proper way (that is, just doing a memset() will not work). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:55 +02:00
Nick Piggin	75ad23bc0f	block: make queue flags non-atomic We can save some atomic ops in the IO path, if we clearly define the rules of how to modify the queue flags. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:33 +02:00
Zhang Wei	61b269179d	[RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641 Signed-off-by: Zhang Wei <wei.zhang@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-29 19:40:29 +10:00
Zhang Wei	e042323607	[RAPIDIO] Auto-probe the RapidIO system size The RapidIO system size will auto probe in RIO setup. The route table and rionet_active in rionet.c are changed to be allocated dynamically according to the size of the system. Signed-off-by: Zhang Wei <wei.zhang@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-29 19:40:28 +10:00
Zhang Wei	ad1e9380b1	[RAPIDIO] Add RapidIO multi mport support The original RapidIO driver suppose there is only one mpc85xx RIO controller in system. So, some data structures are defined as mpc85xx_rio global, such as 'regs_win', 'dbell_ring', 'msg_tx_ring'. Now, I changed them to mport's private members. And you can define multi RIO OF-nodes in dts file for multi RapidIO controller in one processor, such as PCI/PCI-Ex host controllers in Freescale's silicon. And the mport operation function declaration should be changed to know which RapidIO controller is target. Signed-off-by: Zhang Wei <wei.zhang@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-29 19:40:28 +10:00
FUJITA Tomonori	68154e90c9	block: add dma alignment and padding support to blk_rq_map_kern This patch adds bio_copy_kern similar to bio_copy_user. blk_rq_map_kern uses bio_copy_kern instead of bio_map_kern if necessary. bio_copy_kern uses temporary pages and the bi_end_io callback frees these pages. bio_copy_kern saves the original kernel buffer at bio->bi_private it doesn't use something like struct bio_map_data to store the information about the caller. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 09:50:34 +02:00
Badari Pulavarty	9d88a2eb6e	[POWERPC] Provide walk_memory_resource() for powerpc Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains logical memory region mapping in the lmb.memory structure. Walk through these structures and do the callbacks for the contiguous chunks. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-29 15:57:53 +10:00
Badari Pulavarty	98d5c21c81	[POWERPC] Update lmb data structures for hotplug memory add/remove The powerpc kernel maintains information about logical memory blocks in the lmb.memory structure, which is initialized and updated at boot time, but not when memory is added or removed while the kernel is running. This adds a hotplug memory notifier which updates lmb.memory when memory is added or removed. This information is useful for eHEA driver to find out the memory layout and holes. NOTE: No special locking is needed for lmb_add() and lmb_remove(). Calls to these are serialized by caller. (pSeries_reconfig_chain). Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-29 15:57:53 +10:00
Linus Torvalds	8ab68ab420	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (35 commits) siimage: coding style cleanup (take 2) ide-cd: clean up cdrom_analyze_sense_data() ide-cd: fix test unsigned var < 0 ide: add TSSTcorp CDDVDW SH-S202H to ivb_list[] piix: add Asus Eee 701 controller to short cable list ARM: always select HAVE_IDE remove the broken ETRAX_IDE driver ide: remove ->dma_prdtable field from ide_hwif_t ide: remove ->dma_vendor{1,3} fields from ide_hwif_t scc_pata: add ->dma_host_set and ->dma_start methods ide: skip "VLB sync" if host uses MMIO ide: add ide_pad_transfer() helper ide: remove ->INW and ->OUTW methods ide: use IDE I/O helpers directly in ide_tf_{load,read}() ns87415: add ->tf_read method scc_pata: add ->tf_{load,read} methods ide-h8300: add ->tf_{load,read} methods ide-cris: add ->tf_{load,read} methods ide: add ->tf_load and ->tf_read methods ide: move ide_tf_{load,read} to ide-iops.c ...	2008-04-28 17:30:26 -07:00
Bartlomiej Zolnierkiewicz	55224bc86a	ide: remove ->dma_prdtable field from ide_hwif_t * Use 'hwif->dma_base + {4,8}' instead of hwif->dma_prdtable in {ide,scc}_dma_setup(). * Remove no longer needed ->dma_prdtable field from ide_hwif_t. While at it: * Use ATA_DMA_TABLE_OFS define. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:42 +02:00
Bartlomiej Zolnierkiewicz	41051a141d	ide: remove ->dma_vendor{1,3} fields from ide_hwif_t * Use 'hwif->dma_base + {1,3}' instead of hwif->dma_vendor{1,3} in pdc202xx_new host driver. * Remove no longer needed ->dma_vendor{1,3} fields from ide_hwif_t. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:42 +02:00
Bartlomiej Zolnierkiewicz	9f87abe892	ide: add ide_pad_transfer() helper * Add ide_pad_transfer() helper (which uses ->{in,out}put_data methods internally so the transfer is also padded to drive+host requirements) and use it instead of ide_atapi_{write_zeros,discard_data}(). * Remove no longer needed ide_atapi_{write_zeros,discard_data}(). Cc: Borislav Petkov <petkovbb@gmail.com> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:41 +02:00
Bartlomiej Zolnierkiewicz	7c0daf2681	ide: remove ->INW and ->OUTW methods * Remove no longer used ->INW and ->OUTW methods. While at it: * scc_pata.c: scc_ide_{out,in}w() is called only in scc_tf_{load,read}() so inline it there. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:41 +02:00
Bartlomiej Zolnierkiewicz	94cd5b62ff	ide: add ->tf_load and ->tf_read methods * Add ->tf_load and ->tf_read methods to ide_hwif_t and set the default methods in default_hwif_transport(). * Use ->tf_{load,read} instead o calling ide_tf_{load,read}() directly. * Make ide_tf_{load,read}() static. There should be no functional changes caused by this patch. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:40 +02:00
Bartlomiej Zolnierkiewicz	089c5c7e00	ide: factor out debugging code from ide_tf_load() Factor out debugging code from ide_tf_load() to ide_tf_dump() helper and update ide_tf_load() users accordingly. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:39 +02:00
Bartlomiej Zolnierkiewicz	1fc142589e	ide: add ide_execute_pkt_cmd() helper Add ide_execute_pkt_cmd() helper for executing PACKET command, then convert ATAPI device drivers to use it. As a nice side-effect this fixes ide-{floppy,tape,scsi} w.r.t. ide_lock taking (ide-cd was OK). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:39 +02:00
Bartlomiej Zolnierkiewicz	16bb69c14a	ide: remove ->INS{W,L} and ->OUTS{W,L} methods * Use ins{w,l}()/outs{w,l}() and __ide_mm_ins{w,l}()/__ide_mm_outs{w,l}() directly in ata_{in,out}put_data() (by using IDE_HFLAG_MMIO host flag to decide which I/O ops are required). * Remove no longer needed ->INS{W,L} and ->OUTS{W,L} methods (ide-h8300, au1xxx-ide and scc_pata implement their own ->{in,out}put_data methods). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:37 +02:00
Bartlomiej Zolnierkiewicz	c5dd43ec65	ide: add IDE_HFLAG_MMIO host flag (take 2) * Add IDE_HFLAG_MMIO host flag and set it for hosts which use default_hwif_mmiops(). v2: * Fix kernel panic in pmac host driver (',' should be '\|'). Thanks to Kamalesh for reporting it + testing the fix and to Andrew for hinting me about the source of the issue. Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:37 +02:00
Bartlomiej Zolnierkiewicz	9567b349f7	ide: merge ->atapi_put_bytes and ->ata_put_data methods * Merge ->atapi_{in,out}put_bytes and ->ata_{in,out}put_data methods into new ->{in,out}put_data methods which take number of bytes to transfer as an argument and always do padding. While at it: * Use 'hwif' or 'drive->hwif' instead of 'HWIF(drive)'. There should be no functional changes caused by this patch (all users of ->ata_{in,out}put_data methods were using multiply-of-4 word counts). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:36 +02:00
Bartlomiej Zolnierkiewicz	92d3ab27e8	falconide/q40ide: add ->atapi_put_bytes and ->ata_put_data methods (take 2) * Add ->atapi_{in,out}put_bytes and ->ata_{in,out}put_data methods to falconide and q40ide host drivers (->ata_* methods are implemented on top of ->atapi_* methods so they also do byte-swapping now). * Cleanup atapi_{in,out}put_bytes(). v2: * Add 'struct request *rq' argument to ->ata_{in,out}put_data methods and don't byte-swap disk fs requests (we shouldn't un-swap fs requests because fs itself is stored byte-swapped on the disk) - this is how things were done before the patch (ideally device mapper should be used instead but it would break existing setups and would have some performance impact). Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michael Schmitz <schmitz@debian.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Richard Zidlicky <rz@linux-m68k.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-28 23:44:36 +02:00
Linus Torvalds	e97e386b12	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: pack objects denser slub: Calculate min_objects based on number of processors. slub: Drop DEFAULT_MAX_ORDER / DEFAULT_MIN_OBJECTS slub: Simplify any_slab_object checks slub: Make the order configurable for each slab cache slub: Drop fallback to page allocator method slub: Fallback to minimal order during slab page allocation slub: Update statistics handling for variable order slabs slub: Add kmem_cache_order_objects struct slub: for_each_object must be passed the number of objects in a slab slub: Store max number of objects in the page struct. slub: Dump list of objects not freed on kmem_cache_close() slub: free_list() cleanup slub: improve kmem_cache_destroy() error message slob: fix bug - when slob allocates "struct kmem_cache", it does not force alignment.	2008-04-28 14:08:56 -07:00
Alessandro Guido	30d221db44	[CPUFREQ] allow use of the powersave governor as the default one Allow use of the powersave cpufreq governor as the default one for EMBEDDED configs. Signed-off-by: Alessandro Guido <alessandro.guido@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Jones <davej@redhat.com>	2008-04-28 16:27:08 -04:00
Darrick J. Wong	e8628dd06d	[CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism Currently, affected_cpus shows which CPUs need to have their frequency coordinated in software. When hardware coordination is in use, the contents of this file appear the same as when no coordination is required. This can lead to some confusion among user-space programs, for example, that do not know that extra coordination is required to force a CPU core to a particular speed to control power consumption. To fix this, create a "related_cpus" attribute that always displays the coordination map regardless of whatever coordination strategy the cpufreq driver uses (sw or hw). If the cpufreq driver does not provide a value, fall back to policy->cpus. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Jones <davej@redhat.com>	2008-04-28 16:27:08 -04:00
Jesse Barnes	ee69439cc1	PCI: don't expose struct pci_vpd to userspace We just need to forward declare it for struct pci_dev, not expose it outside of __KERNEL__. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-04-28 12:30:35 -07:00
Linus Torvalds	cfd299dffe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: SELinux: Fix a RCU free problem with the netport cache SELinux: Made netnode cache adds faster SELinux: include/security.h whitespace, syntax, and other cleanups SELinux: policydb.h whitespace, syntax, and other cleanups SELinux: mls_types.h whitespace, syntax, and other cleanups SELinux: mls.h whitespace, syntax, and other cleanups SELinux: hashtab.h whitespace, syntax, and other cleanups SELinux: context.h whitespace, syntax, and other cleanups SELinux: ss/conditional.h whitespace, syntax, and other cleanups SELinux: selinux/include/security.h whitespace, syntax, and other cleanups SELinux: objsec.h whitespace, syntax, and other cleanups SELinux: netlabel.h whitespace, syntax, and other cleanups SELinux: avc_ss.h whitespace, syntax, and other cleanups Fixed up conflict in include/linux/security.h manually	2008-04-28 10:08:49 -07:00
Al Viro	01d7b36988	usbhid endianness annotations and fixes usb_control_msg() converts arguments to little-endian itself, doing that in caller means breakage on big-endian boxen. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 10:03:31 -07:00
Andrew Morton	73f20e58b1	FAT_VALID_MEDIA(): remove pointless test The on-disk media specification field in FAT is only 8-bits, so testing for <=0xff is pointless, and can generate a "comparison is always true due to limited range of data type" warning. While we're there, convert FAT_VALID_MEDIA() into a C function - the present implementation is buggy: it generates either one or two references to its argument. Cc: Frank Seidel <fseidel@suse.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:47 -07:00
OGAWA Hirofumi	606e423e43	fat: Update free_clusters even if it is untrusted Currently, free_clusters is not updated until it is trusted, because Windows doesn't update it correctly. But if user is using FAT driver of Linux, it updates free_clusters correctly. Instead, this updates it even if it's untrusted, so if free_clustes is correct, now keep correct value. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:47 -07:00
OGAWA Hirofumi	1ae43f826b	fat: Add allow_utime option Normally utime(2) checks current process is owner of the file, or it has CAP_FOWNER capability. But FAT filesystem doesn't have uid/gid as on disk info, so normal check is too unflexible. With this option you can relax it. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:47 -07:00
OGAWA Hirofumi	1278fdd34b	fat: fat_notify_change() and check_mode() cleanup - Rename fat_notify_change() to fat_setattr() - check_mode() cleanup - Change layout of code Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:47 -07:00
Jan Kara	d5dee5c395	reiserfs: unpack tails on quota files Quota files cannot have tails because quota_write and quota_read functions do not support them. So far when quota files did have tail, we just refused to turn quotas on it. Sadly this check has been wrong and so there are now plenty installations where quota files don't have NOTAIL flag set and so now after fixing the check, they suddently fail to turn quotas on. Since it's easy to unpack the tail from kernel, do this from reiserfs_quota_on() which solves the problem and is generally nicer to users anyway. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: <urhausen@urifabi.net> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:46 -07:00
Dan Williams	8b3e6cdc53	md: introduce get_priority_stripe() to improve raid456 write performance Improve write performance by preventing the delayed_list from dumping all its stripes onto the handle_list in one shot. Delayed stripes are now further delayed by being held on the 'hold_list'. The 'hold_list' is bypassed when: * a STRIPE_IO_STARTED stripe is found at the head of 'handle_list' * 'handle_list' is empty and i/o is being done to satisfy full stripe-width write requests * 'bypass_count' is less than 'bypass_threshold'. By default the threshold is 1, i.e. every other stripe handled is a preread stripe provided the top two conditions are false. Benchmark data: System: 2x Xeon 5150, 4x SATA, mem=1GB Baseline: 2.6.24-rc7 Configuration: mdadm --create /dev/md0 /dev/sd[b-e] -n 4 -l 5 --assume-clean Test1: dd if=/dev/zero of=/dev/md0 bs=1024k count=2048 * patched: +33% (stripe_cache_size = 256), +25% (stripe_cache_size = 512) Test2: tiobench --size 2048 --numruns 5 --block 4096 --block 131072 (XFS) * patched: +13% * patched + preread_bypass_threshold = 0: +37% Changes since v1: * reduce bypass_threshold from (chunk_size / sectors_per_chunk) to (1) and make it configurable. This defaults to fairness and modest performance gains out of the box. Changes since v2: * [neilb@suse.de]: kill STRIPE_PRIO_HI and preread_needed as they are not necessary, the important change was clearing STRIPE_DELAYED in add_stripe_bio and this has been moved out to make_request for the hang fix. * [neilb@suse.de]: simplify get_priority_stripe * [dan.j.williams@intel.com]: reset the bypass_count when ->hold_list is sampled empty (+11%) * [dan.j.williams@intel.com]: decrement the bypass_count at the detection of stripes being naturally promoted off of hold_list +2%. Note, resetting bypass_count instead of decrementing on these events yields +4% but that is probably too aggressive. Changes since v3: * cosmetic fixups Tested-by: James W. Laferriere <babydr@baby-dragons.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:42 -07:00
Andres Salomon	b6f448e99c	PM/gxfb: add hook to PM console layer that allows disabling of suspend VT switch Prior to suspend, we allocate and switch to a new VT; after suspend, we switch back to the original VT. This can be slow, and is completely unnecessary if the framebuffer we're using can restore video properly. This adds a hook that allows drivers to select whether or not to do this vt switch, and changes the gxfb driver to call this hook. It also adds a module param to gxfb to allow controlling of the vt switch (defaulting to no switch). (Note: I'm not convinced that console_sem is the best way to protect this, but we should probably have some form of locking..) [akpm@linux-foundation.org: build fix] Signed-off-by: Andres Salomon <dilinger@debian.org> Cc: Jordan Crouse <jordan.crouse@amd.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:36 -07:00
Anton Vorontsov	e4c690e061	fb: add support for foreign endianness Add support for the framebuffers with non-native endianness. This is done via FBINFO_FOREIGN_ENDIAN flag that will be used by the drivers. Depending on the host endianness this flag will be overwritten by FBINFO_BE_MATH internal flag, or cleared. Tested to work on MPC8360E-RDK (BE) + Fujitsu MINT framebuffer (LE). Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: <Valdis.Kletnieks@vt.edu> Cc: Clemens Koller <clemens.koller@anagramm.de> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:35 -07:00
Ilpo Järvinen	73fcdc9e15	i2o: remove static inline forward declarations Nothing in between of them and the later declaration with body needs them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:34 -07:00
Andrew Morton	50f8c370e7	quota: convert stub functions from macros into inlines Fixes things like this: fs/super.c: In function `deactivate_super': fs/super.c:182: warning: statement with no effect fs/super.c: In function `do_remount_sb': fs/super.c:644: warning: statement with no effect Cc: Jan Kara <jack@ucw.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:33 -07:00
Jan Kara	0ff5af8340	quota: quota core changes for quotaon on remount Currently, we just turn quotas off on remount of filesystem to read-only state. The patch below adds necessary framework so that we can turn quotas off on remount RO but we are able to automatically reenable them again when filesystem is remounted to RW state. All we need to do is to keep references to inodes of quota files when remounting RO and using these references to reenable quotas when remounting RW. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:33 -07:00
Jan Kara	03f6e92bdd	quota: various style cleanups Cleanups in quota code: Change __inline__ to inline. Change some macros to inline functions. Remove vfs_quota_off_mount() macro. DQUOT_OFF() should be (0) is CONFIG_QUOTA is disabled. Move declaration of mark_dquot_dirty and dirty_dquot from quota.h to dquot.c [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:33 -07:00
Andrew Perepechko	338bf9afda	quota: do not allow setting of quota limits to too high values We should check whether quota limits set via Q_SETQUOTA are not exceeding limits which quota format is able to handle. Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:32 -07:00
Masami Hiramatsu	26b31c1908	kprobes: add (un)register_jprobes for batch registration Introduce unregister_/register_jprobes() for jprobe batch registration. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: David Miller <davem@davemloft.net> Cc: "Frank Ch. Eigler" <fche@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:32 -07:00
Masami Hiramatsu	4a296e07c3	kprobes: add (un)register_kretprobes for batch registration Introduce unregister_/register_kretprobes() for kretprobe batch registration. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: David Miller <davem@davemloft.net> Cc: "Frank Ch. Eigler" <fche@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:32 -07:00
Masami Hiramatsu	9861668f74	kprobes: add (un)register_kprobes for batch registration Introduce unregister_/register_kprobes() for kprobe batch registration. This can reduce waiting time for synchronized_sched() when a lot of probes have to be unregistered at once. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: David Miller <davem@davemloft.net> Cc: "Frank Ch. Eigler" <fche@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:32 -07:00
Masami Hiramatsu	9960257281	list.h: add list_is_singular() Add list_is_singular() to check a list has just one entry. list_is_singular() is useful to check whether a list_head which have been temporarily allocated for listing objects can be released or not. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:32 -07:00
Srinivasa Ds	3d8d996e0c	kprobes: prevent probing of preempt_schedule() Prohibit users from probing preempt_schedule(). One way of prohibiting the user from probing functions is by marking such functions with __kprobes. But this method doesn't work for those functions, which are already marked to different section like preempt_schedule() (belongs to __sched section). So we use blacklist approach to refuse user from probing these functions. In blacklist approach we populate the blacklisted function's starting address and its size in kprobe_blacklist structure. Then we verify the user specified address against start and end of the blacklisted function. So any attempt to register probe on blacklisted functions will be rejected. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:32 -07:00
Karl Dahlke	0341a4d0fd	VT notifier extension for accessibility Some accessibility modules need to be able to catch the output on the console before the VT interpretation, and possibly swallow it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:32 -07:00
Magnus Damm	61711f8fd8	sm501: add uart support This patch extends the sm501 mfd with 8250 uart support. We're currently doing this in the board specific r2d-1 code already, but it would be nice to do move things into the mfd since it's more chip specific than board specific. Signed-off-by: Magnus Damm <damm@igel.co.jp> Cc: Ben Dooks <ben-linux@fluff.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:32 -07:00
Thomas Petazzoni	7ae9392c0a	x86: configurable DMI scanning code Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in order to be able to remove the DMI table scanning code if it's not needed, and then reduce the kernel code size. With CONFIG_DMI (i.e before) : text data bss dec hex filename 1076076 128656 98304 1303036 13e1fc vmlinux Without CONFIG_DMI (i.e after) : text data bss dec hex filename 1068092 126308 98304 1292704 13b9a0 vmlinux Result: text data bss dec hex filename -7984 -2348 0 -10332 -285c vmlinux The new option appears in "Processor type and features", only when CONFIG_EMBEDDED is defined. This patch is part of the Linux Tiny project, and is based on previous work done by Matt Mackall <mpm@selenic.com>. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Anvin" <hpa@zytor.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:30 -07:00
Joe Perches	0fab6de09c	synclink drivers bool conversion Remove more TRUE/FALSE defines and uses Remove == TRUE tests Convert BOOLEAN to bool Convert int to bool where appropriate Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:29 -07:00
Harvey Harrison	cdf8803768	ncpfs: add prototypes to ncp_fs.h Removes some externs from C files, noticed from the sparse warnings: fs/ncpfs/dir.c:90:26: warning: symbol 'ncp_root_dentry_operations' was not declared. Should it be static? fs/ncpfs/symlink.c:107:5: warning: symbol 'ncp_symlink' was not declared. Should it be static? fs/ncpfs/symlink.c:101:39: warning: symbol 'ncp_symlink_aops' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: Petr Vandrovec <VANDROVE@vc.cvut.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:29 -07:00
Andrew G. Morgan	3898b1b4eb	capabilities: implement per-process securebits Filesystem capability support makes it possible to do away with (set)uid-0 based privilege and use capabilities instead. That is, with filesystem support for capabilities but without this present patch, it is (conceptually) possible to manage a system with capabilities alone and never need to obtain privilege via (set)uid-0. Of course, conceptually isn't quite the same as currently possible since few user applications, certainly not enough to run a viable system, are currently prepared to leverage capabilities to exercise privilege. Further, many applications exist that may never get upgraded in this way, and the kernel will continue to want to support their setuid-0 base privilege needs. Where pure-capability applications evolve and replace setuid-0 binaries, it is desirable that there be a mechanisms by which they can contain their privilege. In addition to leveraging the per-process bounding and inheritable sets, this should include suppressing the privilege of the uid-0 superuser from the process' tree of children. The feature added by this patch can be leveraged to suppress the privilege associated with (set)uid-0. This suppression requires CAP_SETPCAP to initiate, and only immediately affects the 'current' process (it is inherited through fork()/exec()). This reimplementation differs significantly from the historical support for securebits which was system-wide, unwieldy and which has ultimately withered to a dead relic in the source of the modern kernel. With this patch applied a process, that is capable(CAP_SETPCAP), can now drop all legacy privilege (through uid=0) for itself and all subsequently fork()'d/exec()'d children with: prctl(PR_SET_SECUREBITS, 0x2f); This patch represents a no-op unless CONFIG_SECURITY_FILE_CAPABILITIES is enabled at configure time. [akpm@linux-foundation.org: fix uninitialised var warning] [serue@us.ibm.com: capabilities: use cap_task_prctl when !CONFIG_SECURITY] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Paul Moore <paul.moore@hp.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:26 -07:00
KAMEZAWA Hiroyuki	8cece85ec7	mm: fix broken gfp_zone with __GFP_THISNODE This hack, "base = MAX_NR_ZONES", at __GFP_THISNODE was used for old zonliests. Now, new zonelist[] have a list for __GFP_THISNODE and this hack is incorrect. Should be removed. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:26 -07:00
Yasunori Goto	e70260aabe	memory hotplug: make alloc_bootmem_section() alloc_bootmem_section() can allocate specified section's area. This is used for usemap to keep same section with pgdat by later patch. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:25 -07:00
Yasunori Goto	0475327876	memory hotplug: register section/node id to free This patch set is to free pages which is allocated by bootmem for memory-hotremove. Some structures of memory management are allocated by bootmem. ex) memmap, etc. To remove memory physically, some of them must be freed according to circumstance. This patch set makes basis to free those pages, and free memmaps. Basic my idea is using remain members of struct page to remember information of users of bootmem (section number or node id). When the section is removing, kernel can confirm it. By this information, some issues can be solved. 1) When the memmap of removing section is allocated on other section by bootmem, it should/can be free. 2) When the memmap of removing section is allocated on the same section, it shouldn't be freed. Because the section has to be logical memory offlined already and all pages must be isolated against page allocater. If it is freed, page allocator may use it which will be removed physically soon. 3) When removing section has other section's memmap, kernel will be able to show easily which section should be removed before it for user. (Not implemented yet) 4) When the above case 2), the page isolation will be able to check and skip memmap's page when logical memory offline (offline_pages()). Current page isolation code fails in this case because this page is just reserved page and it can't distinguish this pages can be removed or not. But, it will be able to do by this patch. (Not implemented yet.) 5) The node information like pgdat has similar issues. But, this will be able to be solved too by this. (Not implemented yet, but, remembering node id in the pages.) Fortunately, current bootmem allocator just keeps PageReserved flags, and doesn't use any other members of page struct. The users of bootmem doesn't use them too. This patch: This is to register information which is node or section's id. Kernel can distinguish which node/section uses the pages allcated by bootmem. This is basis for hot-remove sections or nodes. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:25 -07:00
Gerald Schaefer	6d779079bf	hugetlbfs: architecture header cleanup This patch moves all architecture functions for hugetlb to architecture header files (include/asm-foo/hugetlb.h) and converts all macros to inline functions. It also removes (!) ARCH_HAS_HUGEPAGE_ONLY_RANGE, ARCH_HAS_HUGETLB_FREE_PGD_RANGE, ARCH_HAS_PREPARE_HUGEPAGE_RANGE, ARCH_HAS_SETCLEAR_HUGE_PTE and ARCH_HAS_HUGETLB_PREFAULT_HOOK. Getting rid of the ARCH_HAS_xxx #ifdef and macro fugliness should increase readability and maintainability, at the price of some code duplication. An asm-generic common part would have reduced the loc, but we would end up with new ARCH_HAS_xxx defines eventually. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:25 -07:00
Lee Schermerhorn	71fe804b6d	mempolicy: use struct mempolicy pointer in shmem_sb_info This patch replaces the mempolicy mode, mode_flags, and nodemask in the shmem_sb_info struct with a struct mempolicy pointer, initialized to NULL. This removes dependency on the details of mempolicy from shmem.c and hugetlbfs inode.c and simplifies the interfaces. mpol_parse_str() in mempolicy.c is changed to return, via a pointer to a pointer arg, a struct mempolicy pointer on success. For MPOL_DEFAULT, the returned pointer is NULL. Further, mpol_parse_str() now takes a 'no_context' argument that causes the input nodemask to be stored in the w.user_nodemask of the created mempolicy for use when the mempolicy is installed in a tmpfs inode shared policy tree. At that time, any cpuset contextualization is applied to the original input nodemask. This preserves the previous behavior where the input nodemask was stored in the superblock. We can think of the returned mempolicy as "context free". Because mpol_parse_str() is now calling mpol_new(), we can remove from mpol_to_str() the semantic checks that mpol_new() already performs. Add 'no_context' parameter to mpol_to_str() to specify that it should format the nodemask in w.user_nodemask for 'bind' and 'interleave' policies. Change mpol_shared_policy_init() to take a pointer to a "context free" struct mempolicy and to create a new, "contextualized" mempolicy using the mode, mode_flags and user_nodemask from the input mempolicy. Note: we know that the mempolicy passed to mpol_to_str() or mpol_shared_policy_init() from a tmpfs superblock is "context free". This is currently the only instance thereof. However, if we found more uses for this concept, and introduced any ambiguity as to whether a mempolicy was context free or not, we could add another internal mode flag to identify context free mempolicies. Then, we could remove the 'no_context' argument from mpol_to_str(). Added shmem_get_sbmpol() to return a reference counted superblock mempolicy, if one exists, to pass to mpol_shared_policy_init(). We must add the reference under the sb stat_lock to prevent races with replacement of the mpol by remount. This reference is removed in mpol_shared_policy_init(). [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: another build fix] [akpm@linux-foundation.org: yet another build fix] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:25 -07:00
Lee Schermerhorn	095f1fc4eb	mempolicy: rework shmem mpol parsing and display mm/shmem.c currently contains functions to parse and display memory policy strings for the tmpfs 'mpol' mount option. Move this to mm/mempolicy.c with the rest of the mempolicy support. With subsequent patches, we'll be able to remove knowledge of the details [mode, flags, policy, ...] completely from shmem.c 1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in mm/mempolicy.c. Rework to use the policy_types[] array [used by mpol_to_str()] to look up mode by name. 2) use mpol_to_str() to format policy for shmem_show_mpol(). mpol_to_str() expects a pointer to a struct mempolicy, so temporarily construct one. This will be replaced with a reference to a struct mempolicy in the tmpfs superblock in a subsequent patch. NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal sign '=' as the nodemask delimiter to match mpol_parse_str() and the tmpfs/shmem mpol mount option formatting that now uses mpol_to_str(). This is a user visible change to numa_maps, but then the addition of the mode flags already changed the display. It makes sense to me to have the mounts and numa_maps display the policy in the same format. However, if anyone objects strongly, I can pass the desired nodemask delimeter as an arg to mpol_to_str(). Note 2: Like show_numa_map(), I don't check the return code from mpol_to_str(). I do use a longer buffer than the one provided by show_numa_map(), which seems to have sufficed so far. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:24 -07:00
Lee Schermerhorn	fc36b8d3d8	mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local Policy Now that we're using "preferred local" policy for system default, we need to make this as fast as possible. Because of the variable size of the mempolicy structure [based on size of nodemasks], the preferred_node may be in a different cacheline from the mode. This can result in accessing an extra cacheline in the normal case of system default policy. Suspect this is the cause of an observed 2-3% slowdown in page fault testing relative to kernel without this patch series. To alleviate this, use an internal mode flag, MPOL_F_LOCAL in the mempolicy flags member which is guaranteed [?] to be in the same cacheline as the mode itself. Verified that reworked mempolicy now performs slightly better on 25-rc8-mm1 for both anon and shmem segments with system default and vma [preferred local] policy. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:24 -07:00
Lee Schermerhorn	52cd3b0740	mempolicy: rework mempolicy Reference Counting [yet again] After further discussion with Christoph Lameter, it has become clear that my earlier attempts to clean up the mempolicy reference counting were a bit of overkill in some areas, resulting in superflous ref/unref in what are usually fast paths. In other areas, further inspection reveals that I botched the unref for interleave policies. A separate patch, suitable for upstream/stable trees, fixes up the known errors in the previous attempt to fix reference counting. This patch reworks the memory policy referencing counting and, one hopes, simplifies the code. Maybe I'll get it right this time. See the update to the numa_memory_policy.txt document for a discussion of memory policy reference counting that motivates this patch. Summary: Lookup of mempolicy, based on (vma, address) need only add a reference for shared policy, and we need only unref the policy when finished for shared policies. So, this patch backs out all of the unneeded extra reference counting added by my previous attempt. It then unrefs only shared policies when we're finished with them, using the mpol_cond_put() [conditional put] helper function introduced by this patch. Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma containing just the policy. read_swap_cache_async() can call alloc_page_vma() multiple times, so we can't let alloc_page_vma() unref the shared policy in this case. To avoid this, we make a copy of any non-null shared policy and remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading a page [or multiple pages] from swap, so the overhead should not be an issue here. I introduced a new static inline function "mpol_cond_copy()" to copy the shared policy to an on-stack policy and remove the flags that would require a conditional free. The current implementation of mpol_cond_copy() assumes that the struct mempolicy contains no pointers to dynamically allocated structures that must be duplicated or reference counted during copy. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:24 -07:00
Lee Schermerhorn	a6020ed759	mempolicy: document {set\|get}_policy() vm_ops APIs Document mempolicy return value reference semantics assumed by the rest of the mempolicy code for the set_ and get_policy vm_ops in <linux/mm.h>--where the prototypes are defined--to inform any future mempolicy vm_op writers what the rest of the subsystem expects of them. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:24 -07:00

1 2 3 4 5 ...

10446 Commits