OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jiri Slaby	e64646946e	exit_thread: accept a task parameter to be exited We need to call exit_thread from copy_process in a fail path. So make it accept task_struct as a parameter. [v2] * s390: exit_thread_runtime_instr doesn't make sense to be called for non-current tasks. * arm: fix the comment in vfp_thread_copy * change 'me' to 'tsk' for task_struct * now we can change only archs that actually have exit_thread [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Miao <realmz6@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-20 17:58:30 -07:00
Konstantin Khlebnikov	8463833590	mm: rework virtual memory accounting When inspecting a vague code inside prctl(PR_SET_MM_MEM) call (which testing the RLIMIT_DATA value to figure out if we're allowed to assign new @start_brk, @brk, @start_data, @end_data from mm_struct) it's been commited that RLIMIT_DATA in a form it's implemented now doesn't do anything useful because most of user-space libraries use mmap() syscall for dynamic memory allocations. Linus suggested to convert RLIMIT_DATA rlimit into something suitable for anonymous memory accounting. But in this patch we go further, and the changes are bundled together as: * keep vma counting if CONFIG_PROC_FS=n, will be used for limits * replace mm->shared_vm with better defined mm->data_vm * account anonymous executable areas as executable * account file-backed growsdown/up areas as stack * drop struct file* argument from vm_stat_account * enforce RLIMIT_DATA for size of data areas This way code looks cleaner: now code/stack/data classification depends only on vm_flags state: VM_EXEC & ~VM_WRITE -> code (VmExe + VmLib in proc) VM_GROWSUP \| VM_GROWSDOWN -> stack (VmStk) VM_WRITE & ~VM_SHARED & !stack -> data (VmData) The rest (VmSize - VmData - VmStk - VmExe - VmLib) could be called "shared", but that might be strange beast like readonly-private or VM_IO area. - RLIMIT_AS limits whole address space "VmSize" - RLIMIT_STACK limits stack "VmStk" (but each vma individually) - RLIMIT_DATA now limits "VmData" Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Willy Tarreau <w@1wt.eu> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Kees Cook <keescook@google.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-14 16:00:49 -08:00
David Howells	75c3cfa855	VFS: assorted weird filesystems: d_inode() annotations Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-15 15:06:58 -04:00
Linus Torvalds	603ba7e41b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile #2 from Al Viro: "Next pile (and there'll be one or two more). The large piece in this one is getting rid of /proc//ns/ weirdness; among other things, it allows to (finally) make nameidata completely opaque outside of fs/namei.c, making for easier further cleanups in there" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coda_venus_readdir(): use file_inode() fs/namei.c: fold link_path_walk() call into path_init() path_init(): don't bother with LOOKUP_PARENT in argument fs/namei.c: new helper (path_cleanup()) path_init(): store the "base" pointer to file in nameidata itself make default ->i_fop have ->open() fail with ENXIO make nameidata completely opaque outside of fs/namei.c kill proc_ns completely take the targets of /proc//ns/ symlinks to separate fs bury struct proc_ns in fs/proc copy address of proc_ns_ops into ns_common new helpers: ns_alloc_inum/ns_free_inum make proc_ns_operations work with struct ns_common * instead of void * switch the rest of proc_ns_operations to working with &...->ns netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns common object embedded into various struct ....ns	2014-12-16 15:53:03 -08:00
Al Viro	bd9b51e79c	make default ->i_fop have ->open() fail with ENXIO As it is, default ->i_fop has NULL ->open() (along with all other methods). The only case where it matters is reopening (via procfs symlink) a file that didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned to something sane (default would fail on read/write/ioctl/etc.). Unfortunately, such case exists - alloc_file() users, especially anon_get_file() ones. There we have tons of opened files of very different kinds sharing the same inode. As the result, attempt to reopen those via procfs succeeds and you get a descriptor you can't do anything with. Moreover, in case of sockets we set ->i_fop that will only be used on such reopen attempts - and put a failing ->open() into it to make sure those do not succeed. It would be simpler to put such ->open() into default ->i_fop and leave it unchanged both for anon inode (as we do anyway) and for socket ones. Result: * everything going through do_dentry_open() works as it used to * sock_no_open() kludge is gone * attempts to reopen anon-inode files fail as they really ought to * ditto for aio_private_file() * ditto for perfmon - this one actually tried to imitate sock_no_open() trick, but failed to set ->i_fop, so in the current tree reopens succeed and yield completely useless descriptor. Intent clearly had been to fail with -ENXIO on such reopens; now it actually does. * everything else that used alloc_file() keeps working - it has ->i_fop set for its inodes anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-10 21:32:15 -05:00
Yann Droneaud	aeb682dd18	ia64: replace get_unused_fd() with get_unused_fd_flags(0) This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:10 -08:00
Joe Perches	2841efa636	ia64: convert use of typedef ctl_table to struct ctl_table This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-06 16:08:15 -07:00
Michael Opdenacker	2958a489d7	ia64: Remove deprecated IRQF_DISABLED This patch removes the IRQF_DISABLED flag from ia64 architecture code. It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: paul.gortmaker@windriver.com Cc: viro@zeniv.linux.org.uk Cc: srivatsa.bhat@linux.vnet.ibm.com Cc: andriy.shevchenko@linux.intel.com Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1393964953-17002-1-git-send-email-michael.opdenacker@free-electrons.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-03-04 21:47:51 +01:00
Al Viro	b26d4cd385	consolidate simple ->d_delete() instances Rename simple_delete_dentry() to always_delete_dentry() and export it. Export simple_dentry_operations, while we are at it, and get rid of their duplicates Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-15 22:04:17 -05:00
Andy Shevchenko	7451adc516	[IA64] perfmon: Use %phD specifier to dump small buffers pfm_uuid_t value is defined as unsigned char [16]. Thus, we may dump its value as byte buffer using %phD specifier. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-06-03 16:03:17 -07:00
Thomas Gleixner	91d591c387	ia64: Use generic idle loop Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20130321215234.406851909@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-04-08 17:39:25 +02:00
Eric W. Biederman	7f78e03513	fs: Limit sys_mount to only request filesystem modules. Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-03-03 19:36:31 -08:00
Anatol Pomozov	39b6525274	fs: Preserve error code in get_empty_filp(), part 2 Allocating a file structure in function get_empty_filp() might fail because of several reasons: - not enough memory for file structures - operation is not allowed - user is over its limit Currently the function returns NULL in all cases and we loose the exact reason of the error. All callers of get_empty_filp() assume that the function can fail with ENFILE only. Return error through pointer. Change all callers to preserve this error code. [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit (things remaining here deal with alloc_file()), removed pipe(2) behaviour change] Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:32 -05:00
Konstantin Khlebnikov	314e51b985	mm: kill vma flag VM_RESERVED and mm->reserved_vm counter A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: \| effect \| alternative flags -+------------------------+--------------------------------------------- 1\| account as reserved_vm \| VM_IO 2\| skip in core dump \| VM_IO, VM_DONTDUMP 3\| do not merge or expand \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4\| do not mlock \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND \| VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO\|VM_DONTEXPAND\|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND \| VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:19 +09:00
Linus Torvalds	aab174f0df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...	2012-10-02 20:25:04 -07:00
Al Viro	2903ff019b	switch simple cases of fget_light to fdget Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 22:20:08 -04:00
Al Viro	cb0942b812	make get_file() return its argument simplifies a bunch of callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:10:25 -04:00
Al Viro	7456a29bcf	switch itanic perfmonctl(2) to fget_light() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:10:05 -04:00
Eric W. Biederman	6c1ee03359	userns: On ia64 deal with current_uid and current_gid being kuid and kgid These ia64 uses of current_uid and current_gid slipped through the cracks when I was converting everything to kuids and kgids convert them now. Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-09-21 03:13:29 -07:00
Huang Shijie	44de9d0cad	mm: account the total_vm in the vm_stat_account() vm_stat_account() accounts the shared_vm, stack_vm and reserved_vm now. But we can also account for total_vm in the vm_stat_account() which makes the code tidy. Even for mprotect_fixup(), we can get the right result in the end. Signed-off-by: Huang Shijie <shijie8@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:39 -07:00
Al Viro	4ad310b836	ia64 perfmon: fix get_unmapped_area() use there get_unmapped_area() returns -E... on failure, not 0. Moreover, the wrapper around it is completely pointless. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:57 -04:00
Al Viro	bfce281c28	kill mm argument of vm_munmap() it's always current->mm Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-04-21 01:58:20 -04:00
Al Viro	9f3a4afb27	perfmon: kill some helpers and arguments pfm_vm_munmap() is simply vm_munmap() and pfm_remove_smpl_mapping() always get current as the first argument. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-04-21 01:58:18 -04:00
Linus Torvalds	a46ef99d80	VM: add "vm_munmap()" helper function Like the vm_brk() function, this is the same as "do_munmap()", except it does the VM locking for the caller. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-04-20 17:29:13 -07:00
David Howells	c140d87995	Disintegrate asm/system.h for IA64 Disintegrate asm/system.h for IA64. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> cc: linux-ia64@vger.kernel.org	2012-03-28 18:30:02 +01:00
Al Viro	4c1d5a64f1	vfs: for usbfs, etc. internal vfsmounts ->mnt_sb->s_root == ->mnt_root Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:41 -05:00
Tony Luck	09579770dc	[IA64] fix build error - arch/ia64/kernel/perfmon.c arch/ia64/kernel/perfmon.c:621: error: duplicate 'static' Introduced by commit `c74a1cbb3c` pass default dentry_operations to mount_pseudo() Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-01-13 14:49:56 -08:00
Al Viro	c74a1cbb3c	pass default dentry_operations to mount_pseudo() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:43 -05:00
Linus Torvalds	ecacc6c70c	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Avoid array overflow if there are too many cpus in SRAT table [IA64] Remove unlikely from cpu_is_offline [IA64] irq_ia64, use set_irq_chip [IA64] perfmon: Change vmalloc to vzalloc and drop memset. [IA64] eliminate race condition in smp_flush_tlb_mm	2011-01-10 14:52:44 -08:00
Nick Piggin	b3e19d924b	fs: scale mntget/mntput The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, which often go to the same mount point. The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. To do that requires communication with all other CPUs that may have taken a reference count. We can make refcounts more scalable in a couple of ways, involving keeping distributed counters, and checking for the global-zero condition less frequently. - check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts). - keep a local count and only taking the global sum when local reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU which requires more locking). - keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long lasting references, and only take the global sum of local counters when the long refcount is 0. This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference. This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them. This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. However code is otherwise bigger and heavier, so single threaded performance is basically a wash. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:33 +11:00
Nick Piggin	fb045adb99	fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/\([^\t ]\)->d_op = \(.\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]\)\.d_op = \(.\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	fe15ce446b	fs: change d_delete semantics Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:18 +11:00
Jesper Juhl	e21763dbce	[IA64] perfmon: Change vmalloc to vzalloc and drop memset. vzalloc() nicely zeroes memory for us, so we don't have to do a vmalloc() and then manually memset() the returned memory when all we want is for it to be zero. Patch changes this for pfm_rvmalloc(). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Tony Luck <tony.luck@intel.com>	2010-12-28 14:13:36 -08:00
Al Viro	51139adac9	convert get_sb_pseudo() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:33 -04:00
Joe Perches	df0a59a14c	[IA64] Remove unnecessary casts of private_data in perfmon.c Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2010-09-23 14:34:56 -07:00
Miklos Szeredi	7ae6bdbd9b	ia64: perfmon: add d_dname method Switch ia64/perfmon to using the d_dname() instead of relying on __d_path() to prepend the name of the root dentry to the path. CC: Tony Luck <tony.luck@intel.com> CC: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:28:21 -04:00
Arnd Bergmann	ba58aebf56	[IA64] perfmon: convert to unlocked_ioctl The ioctl function in this driver does not do anything that requires the BKL, so make it use unlocked_ioctl. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Tony Luck <tony.luck@intel.com>	2010-07-06 11:38:59 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Rik van Riel	5beb493052	mm: change anon_vma linking to fix multi-process server scalability issue The old anon_vma code can lead to scalability issues with heavily forking workloads. Specifically, each anon_vma will be shared between the parent process and all its child processes. In a workload with 1000 child processes and a VMA with 1000 anonymous pages per process that get COWed, this leads to a system with a million anonymous pages in the same anon_vma, each of which is mapped in just one of the 1000 processes. However, the current rmap code needs to walk them all, leading to O(N) scanning complexity for each page. This can result in systems where one CPU is walking the page tables of 1000 processes in page_referenced_one, while all other CPUs are stuck on the anon_vma lock. This leads to catastrophic failure for a benchmark like AIM7, where the total number of processes can reach in the tens of thousands. Real workloads are still a factor 10 less process intensive than AIM7, but they are catching up. This patch changes the way anon_vmas and VMAs are linked, which allows us to associate multiple anon_vmas with a VMA. At fork time, each child process gets its own anon_vmas, in which its COWed pages will be instantiated. The parents' anon_vma is also linked to the VMA, because non-COWed pages could be present in any of the children. This reduces rmap scanning complexity to O(1) for the pages of the 1000 child processes, with O(N) complexity for at most 1/N pages in the system. This reduces the average scanning cost in heavily forking workloads from O(N) to 2. The only real complexity in this patch stems from the fact that linking a VMA to anon_vmas now involves memory allocations. This means vma_adjust can fail, if it needs to attach a VMA to anon_vma structures. This in turn means error handling needs to be added to the calling functions. A second source of complexity is that, because there can be multiple anon_vmas, the anon_vma linking in vma_adjust can no longer be done under "the" anon_vma lock. To prevent the rmap code from walking up an incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h to make sure it is impossible to compile a kernel that needs both symbolic values for the same bitflag. Some test results: Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test box with 16GB RAM and not quite enough IO), the system ends up running >99% in system time, with every CPU on the same anon_vma lock in the pageout code. With these changes, AIM7 hits the cross-over point around 29.7k users. This happens with ~99% IO wait time, there never seems to be any spike in system time. The anon_vma lock contention appears to be resolved. [akpm@linux-foundation.org: cleanups] Signed-off-by: Rik van Riel <riel@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-06 11:26:26 -08:00
Frans Pop	04157e4c06	[IA64] remove trailing space in messages ia64 parts of system wide cleanup to drop trailing whitespace from lines in message strings. Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: Tony Luck <tony.luck@intel.com>	2010-02-26 11:18:52 -08:00
Jiri Slaby	02b763b8cc	[IA64] use helpers for rlimits Make sure compiler won't do weird things with limits. E.g. fetching them twice may return 2 different values after writable limits are implemented. I.e. either use rlimit helpers added in `3e10e716ab` or ACCESS_ONCE if not applicable. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2010-01-06 15:49:06 -08:00
Al Viro	2c48b9c455	switch alloc_file() to passing struct path ... and have the caller grab both mnt and dentry; kill leak in infiniband, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:42 -05:00
Linus Torvalds	4ef58d4e2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...	2009-12-09 19:43:33 -08:00
André Goddard Rosa	af901ca181	tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:55 +01:00
Eric W. Biederman	6d4561110a	sysctl: Drop & in front of every proc_handler. For consistency drop & in front of every proc_handler. Explicity taking the address is unnecessary and it prevents optimizations like stubbing the proc_handlers to NULL. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2009-11-18 08:37:40 -08:00
Eric W. Biederman	d00faf81af	sysctl ia64: Remove dead binary sysctl support Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name and .strategy members of sysctl tables are dead code. Remove them. Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2009-11-12 02:05:01 -08:00
Jan Beulich	fa276f36f3	[IA64] address compiler warnings perfmon.c/salinfo.c perfmon.c has a dubious cast directly from "int" to "void *". Add an intermediate cast to "long" to keep gcc happy. salinfo.c uses "down_trylock()" in a highly creative way (explained in the comments in the file) ... but it does kick out this warning: arch/ia64/kernel/salinfo.c:195: warning: ignoring return value of 'down_trylock' which people occasionally try to "fix" in ways that do not work. Use some casts to keep gcc quiet. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2009-06-30 14:26:34 -07:00
Tony Luck	27f70c3117	Pull for-2.6.31 into release	2009-06-17 09:35:24 -07:00
Matthew Wilcox	e088a4ad7f	[IA64] Convert ia64 to use int-ll64.h It is generally agreed that it would be beneficial for u64 to be an unsigned long long on all architectures. ia64 (in common with several other 64-bit architectures) currently uses unsigned long. Migrating piecemeal is too painful; this giant patch fixes all compilation warnings and errors that come as a result of switching to use int-ll64.h. Note that userspace will still see __u64 defined as unsigned long. This is important as it affects C++ name mangling. [Updated by Tony Luck to change efi.h:efi_freemem_callback_t to use u64 for start/end rather than unsigned long] Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2009-06-17 09:33:49 -07:00
Thomas Gleixner	8b0b1db013	remove put_cpu_no_resched() put_cpu_no_resched() is an optimization of put_cpu() which unfortunately can cause high latencies. The nfs iostats code uses put_cpu_no_resched() in a code sequence where a reschedule request caused by an interrupt between the get_cpu() and the put_cpu_no_resched() can delay the reschedule for at least HZ. The other users of put_cpu_no_resched() optimize correctly in interrupt code, but there is no real harm in using the put_cpu() function which is an alias for preempt_enable(). The extra check of the preemmpt count is not as critical as the potential source of missing a reschedule. Debugged in the preempt-rt tree and verified in mainline. Impact: remove a high latency source [akpm@linux-foundation.org: build fix] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:48 -07:00

1 2 3

118 Commits