OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Johannes Weiner	759496ba64	arch: mm: pass userspace fault flag to generic fault handler Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occuring in kernel context - which could be handled more gracefully - from user-triggered faults. Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-12 15:38:01 -07:00
David Rientjes	c2d23f919b	mm, oom: remove statically defined arch functions of same name out_of_memory() is a globally defined function to call the oom killer. x86, sh, and powerpc all use a function of the same name within file scope in their respective fault.c unnecessarily. Inline the functions into the pagefault handlers to clean the code up. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-12 17:38:34 -08:00
Shaohua Li	45cac65b0f	readahead: fault retry breaks mmap file read random detection .fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: Shaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:47 +09:00
Paul Mundt	90eed7d87b	sh: Fix up recursive fault in oops with unset TTB. Presently the oops code looks for the pgd either from the mm context or the cached TTB value. There are presently cases where the TTB can be unset or otherwise cleared by hardware, which we weren't handling, resulting in recursive faults on the NULL pgd. In these cases we can simply reload from swapper_pg_dir and continue on as normal. Cc: stable@vger.kernel.org Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2012-07-25 13:11:13 +09:00
Paul Mundt	d8fd35fc58	sh64: Fix up vmalloc fault range check. With the previous attempt reverted this switches to conditionalizing the end address. Nominally VMALLOC_END, but extended for P3_ADDR_MAX in the store queue case. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2012-05-18 20:01:16 +09:00
Paul Mundt	c3e0af9879	Revert "sh: Ensure fixmap and store queue space can co-exist." This reverts commit `20e7c297ef`. With store queues enabled the area above P4SEG has special properties from the MMU's point of view, which was causing fixmap failure. We'll have to do something else to satisfy the vmalloc range check. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2012-05-18 19:30:05 +09:00
Paul Mundt	28080329ed	sh: Enable shared page fault handler for _32/_64. This moves the now generic _32 page fault handling code to a shared place and adapts the _64 implementation to make use of it. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2012-05-14 15:33:28 +09:00
Paul Mundt	811d50cb43	sh: Move in the SH-5 TLB miss. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2008-01-28 13:18:50 +09:00
Paul Mundt	0f1a394ba6	sh: lockless UTLB miss fast-path. With the refactored update_mmu_cache() introduced in older kernels, there's no longer any need to take the page_table_lock in this path, so simply drop it completely. Without this, performance degradation is seen on SMP on heavily threaded workloads that don't use the split ptlock, and ultimately we have no reason to contend for the lock in the first place. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-11-19 13:05:18 +09:00
Paul Mundt	1c6b2ca5e0	sh: Kill off UTLB flush in fast-path. The __do_page_fault() fast-path contains a UTLB flush in order to force an ITLB reload, this isn't needed in practice as the ITLB is auto-reloaded from the UTLB anyways, which is already displaced by the manual 'ldtlb' in the update_mmu_cache() path. This provides a measurable speed up in the TLB miss fast-path. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-11-19 13:00:32 +09:00
Serge E. Hallyn	b460cbc581	pid namespaces: define is_global_init() and is_container_init() is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Will Schmidt	dcca2bde4f	During VM oom condition, kill all threads in process group We have had complaints where a threaded application is left in a bad state after one of it's threads is killed when we hit a VM: out_of_memory condition. Killing just one of the process threads can leave the application in a bad state, whereas killing the entire process group would allow for the application to restart, or be otherwise handled, and makes it very obvious that something has gone wrong. This change allows the entire process group to be taken down, rather than just the one thread. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
Paul Mundt	06f862c8ce	sh: Fix pgd mismatch from cached TTB in unhandled fault. When reading the cached TTB value and extracting the pgd, we accidentally applied a __va() to it and bumped it off in to bogus space which ended up causing multiple faults in the error path. Fix it up so unhandled faults don't do strange and highly unorthodox things when oopsing. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-08-01 16:39:51 +09:00
Nick Piggin	83c54070ee	mm: fault feedback #2 This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Paul Mundt	0630e45c88	sh: Check oops_may_print() in unhandled fault. Only print out pgd/pte data in the oops path if oops_may_print() holds true. Follows the i386 implementation. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-06-18 19:02:47 +09:00
Christoph Hellwig	fce692e798	sh: revert addition of page fault notifiers Just at the time you added them on sh we're removing them from other architectures. As there's no user yet this patch just removes them completely. Once you actually have a kprobes patch it should follow the direct call to kprobes_fault_handler model that powerpc, s390 and sparc64 employ in 2.6.22-rc1 and that I'm updating other architectures to. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-21 14:32:10 +09:00
Paul Mundt	b8947444a7	sh: Shut up compiler warnings in __do_page_fault(). GCC doesn't seem to be able to figure this one out for itself, so just shut it up.. CC arch/sh/mm/fault.o arch/sh/mm/fault.c: In function '__do_page_fault': arch/sh/mm/fault.c:288: warning: 'ptl' may be used uninitialized in this function Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-14 09:18:34 +09:00
Paul Mundt	b118ca572d	sh: Convert to common die chain. This went in immediately after SH added the die chain notifiers, so move over to that instead.. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-09 10:55:38 +09:00
Paul Mundt	3a2e117e22	sh: Add die chain notifiers. Add the atomic die chains in, kprobes needs these. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:57 +00:00
Paul Mundt	db2e1fa3f0	sh: Revert TLB miss fast-path changes that broke PTEA parts. This ended up causing problems for older parts (particularly ones using PTEA). Revert this for now, it can be added back in once it's had some more testing. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-02-14 14:13:10 +09:00
Paul Mundt	afbfb52e47	sh: stacktrace/lockdep/irqflags tracing support. Wire up all of the essentials for lockdep.. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-12-06 10:45:40 +09:00
Paul Mundt	bca7c20764	sh: Get the PGD right in oops case with 64-bit PTEs. Previously this was using a static pgd shift in the reporting code, simply flip this to PGDIR_SHIFT which does the right thing depending on varying PTE magnitudes on the SH-X2 MMU. While we're at it, and since it's been recently added, use get_TTB() for fetching the TTB, rather than the open coded instructions. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-12-06 10:45:39 +09:00
Stuart Menefy	9b3a53ab76	sh: TLB miss fast-path optimizations. Handle simple TLB miss faults which can be resolved completely from the page table in assembler. Signed-off-by: Stuart Menefy <stuart.menefy@st.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-12-06 10:45:38 +09:00
Stuart Menefy	99a596f93b	sh: pmd rework. Remove extra bits from the pmd structure and store a kernel logical address rather than a physical address. This allows it to be directly dereferenced. Another piece of wierdness inherited from x86. Signed-off-by: Stuart Menefy <stuart.menefy@st.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-12-06 10:45:38 +09:00
Stuart Menefy	b5a1bcbee4	sh: Set up correct siginfo structures for page faults. Remove the previous saving of fault codes into the thread_struct as they are never used, and appeared to be inherited from x86. Signed-off-by: Stuart Menefy <stuart.menefy@st.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-12-06 10:45:38 +09:00
Sukadev Bhattiprolu	f400e198b2	[PATCH] pidspace: is_init() This is an updated version of Eric Biederman's is_init() patch. (http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and replaces a few more instances of ->pid == 1 with is_init(). Further, is_init() checks pid and thus removes dependency on Eric's other patches for now. Eric's original description: There are a lot of places in the kernel where we test for init because we give it special properties. Most significantly init must not die. This results in code all over the kernel test ->pid == 1. Introduce is_init to capture this case. With multiple pid spaces for all of the cases affected we are looking for only the first process on the system, not some other process that has pid == 1. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: <lxc-devel@lists.sourceforge.net> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:12 -07:00
Jason Baron	df67b3daea	[PATCH] make PROT_WRITE imply PROT_READ Make PROT_WRITE imply PROT_READ for a number of architectures which don't support write only in hardware. While looking at this, I noticed that some architectures which do not support write only mappings already take the exact same approach. For example, in arch/alpha/mm/fault.c: " if (cause < 0) { if (!(vma->vm_flags & VM_EXEC)) goto bad_area; } else if (!cause) { /* Allow reads even for write-only mappings */ if (!(vma->vm_flags & (VM_READ \| VM_WRITE))) goto bad_area; } else { if (!(vma->vm_flags & VM_WRITE)) goto bad_area; } " Thus, this patch brings other architectures which do not support write only mappings in-line and consistent with the rest. I've verified the patch on ia64, x86_64 and x86. Additional discussion: Several architectures, including x86, can not support write-only mappings. The pte for x86 reserves a single bit for protection and its two states are read only or read/write. Thus, write only is not supported in h/w. Currently, if i 'mmap' a page write-only, the first read attempt on that page creates a page fault and will SEGV. That check is enforced in arch/blah/mm/fault.c. However, if i first write that page it will fault in and the pte will be set to read/write. Thus, any subsequent reads to the page will succeed. It is this inconsistency in behavior that this patch is attempting to address. Furthermore, if the page is swapped out, and then brought back the first read will also cause a SEGV. Thus, any arbitrary read on a page can potentially result in a SEGV. According to the SuSv3 spec, "if the application requests only PROT_WRITE, the implementation may also allow read access." Also as mentioned, some archtectures, such as alpha, shown above already take the approach that i am suggesting. The counter-argument to this raised by Arjan, is that the kernel is enforcing the write only mapping the best it can given the h/w limitations. This is true, however Alan Cox, and myself would argue that the inconsitency in behavior, that is applications can sometimes work/sometimes fails is highly undesireable. If you read through the thread, i think people, came to an agreement on the last patch i posted, as nobody has objected to it... Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Andi Kleen <ak@muc.de> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:05 -07:00
Paul Mundt	0f08f33808	sh: More cosmetic cleanups and trivial fixes. Nothing exciting here, just trivial fixes.. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-09-27 17:03:56 +09:00
Paul Mundt	f647d33f87	sh: Fix split ptlock for user mappings in __do_page_fault(). There was a bug that got introduced when the split ptlock changes went in where mm could be unintialized for user mappings, this fixes it up.. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-09-27 15:30:24 +09:00
Paul Mundt	26ff6c11ef	sh: page table alloc cleanups and page fault optimizations. Cleanup of page table allocators, using generic folded PMD and PUD helpers. TLB flushing operations are moved to a more sensible spot. The page fault handler is also optimized slightly, we no longer waste cycles on IRQ disabling for flushing of the page from the ITLB, since we're already under CLI protection by the initial exception handler. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-09-27 15:13:36 +09:00
Paul Mundt	298476220d	sh: Add control register barriers. Currently when making changes to control registers, we typically need some time for changes to take effect (8 nops, generally). However, for sh4a we simply need to do an icbi.. This is a simple patch for implementing a general purpose ctrl_barrier() which functions as a control register write barrier. There's some additional documentation in the patch itself, but it's pretty self explanatory. There were also some places where we were not doing the barrier, which didn't seem to have any adverse effects on legacy parts, but certainly did on sh4a. It's safer to have the barrier in place for legacy parts as well in these cases, though this does make flush_tlb_all() more expensive (by an order of 8 nops). We can ifdef around the flush_tlb_all() case for now if it's clear that all legacy parts won't have a problem with this. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2006-09-27 14:57:44 +09:00
Hugh Dickins	60ec558549	[PATCH] mm: i386 sh sh64 ready for split ptlock Use pte_offset_map_lock, instead of pte_offset_map (or inappropriate pte_offset_kernel) and mm-wide page_table_lock, in sundry arch places. The i386 vm86 mark_screen_rdonly: yes, there was and is an assumption that the screen fits inside the one page table, as indeed it does. The sh __do_page_fault: which handles both kernel faults (without lock) and user mm faults (locked - though it set_pte without locking before). The sh64 flush_cache_range and helpers: which wrongly thought callers held page_table_lock before (only its tlb_start_vma did, and no longer does so); moved the flush loop down, and adjusted the large versus small range decision to consider a range which spans page tables as large. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-29 21:40:41 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

33 Commits