OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	e7fd3b4669	Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix binutils-2.21 symbol related build failures x86-64, trampoline: Remove unused variable x86, reboot: Fix the use of passed arguments in 32-bit BIOS reboot x86, reboot: Move the real-mode reboot code to an assembly file x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly x86, trampoline: Use the unified trampoline setup for ACPI wakeup x86, trampoline: Common infrastructure for low memory trampolines Fix up trivial conflicts in arch/x86/kernel/Makefile	2011-03-16 10:10:02 -07:00
Linus Torvalds	da849abeb8	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, binutils, xen: Fix another wrong size directive x86: Remove dead config option X86_CPU x86: Really print supported CPUs if PROCESSOR_SELECT=y x86: Fix a bogus unwind annotation in lib/semaphore_32.S um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S x86: Remove unused bits from lib/thunk_*.S x86: Use {push,pop}_cfi in more places x86-64: Add CFI annotations to lib/rwsem_64.S x86, asm: Cleanup unnecssary macros in asm-offsets.c x86, system.h: Drop unused __SAVE/__RESTORE macros x86: Use bitmap library functions x86: Partly unify asm-offsets_{32,64}.c x86: Reduce back the alignment of the per-CPU data section	2011-03-15 18:59:56 -07:00
Sedat Dilek	2ae9d293b1	x86: Fix binutils-2.21 symbol related build failures New binutils version 2.21.0.20110302-1 started checking that the symbol parameter to the .size directive matches the entry name's symbol parameter, unearthing two mismatches: AS arch/x86/kernel/acpi/wakeup_rm.o arch/x86/kernel/acpi/wakeup_rm.S: Assembler messages: arch/x86/kernel/acpi/wakeup_rm.S:12: Error: .size expression with symbol `wakeup_code_start' does not evaluate to a constant arch/x86/kernel/entry_32.S: Assembler messages: arch/x86/kernel/entry_32.S:1421: Error: .size expression with symbol `apf_page_fault' does not evaluate to a constant The problem was discovered while using Debian's binutils (2.21.0.20110302-1) and experimenting with binutils from upstream. Thanks Alexander and H.J. for the vital help. Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1299620364-21644-1-git-send-email-sedat.dilek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-09 10:25:45 +01:00
Jiri Olsa	ea7145477a	x86: Separate out entry text section Put x86 entry code into a separate link section: .entry.text. Separating the entry text section seems to have performance benefits - caused by more efficient instruction cache usage. Running hackbench with perf stat --repeat showed that the change compresses the icache footprint. The icache load miss rate went down by about 15%: before patch: 19417627 L1-icache-load-misses ( +- 0.147% ) after patch: 16490788 L1-icache-load-misses ( +- 0.180% ) The motivation of the patch was to fix a particular kprobes bug that relates to the entry text section, the performance advantage was discovered accidentally. Whole perf output follows: - results for current tip tree: Performance counter stats for './hackbench/hackbench 10' (500 runs): 19417627 L1-icache-load-misses ( +- 0.147% ) 2676914223 instructions # 0.497 IPC ( +- 0.079% ) 5389516026 cycles ( +- 0.144% ) 0.206267711 seconds time elapsed ( +- 0.138% ) - results for current tip tree with the patch applied: Performance counter stats for './hackbench/hackbench 10' (500 runs): 16490788 L1-icache-load-misses ( +- 0.180% ) 2717734941 instructions # 0.502 IPC ( +- 0.079% ) 5414756975 cycles ( +- 0.148% ) 0.206747566 seconds time elapsed ( +- 0.137% ) Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: masami.hiramatsu.pt@hitachi.com Cc: ananth@in.ibm.com Cc: davem@davemloft.net Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-08 17:22:11 +01:00
Jan Beulich	60cf637a13	x86: Use {push,pop}_cfi in more places Cleaning up and shortening code... Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4D6BD35002000078000341DA@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-28 18:06:22 +01:00
Stratos Psomadakis	7bf04be8f4	x86, asm: Cleanup unnecssary macros in asm-offsets.c PAGE_SIZE_asm, PAGE_SHIFT_asm, THREAD_SIZE_asm can be safely removed from asm-offsets.c, and be replaced by their non-'_asm' counterparts in the code that uses them, since the _AC macro defined in include/linux/const.h makes PAGE_SIZE/PAGE_SHIFT/THREAD_SIZE work with as. Signed-off-by: Stratos Psomadakis <psomas@cslab.ece.ntua.gr> LKML-Reference: <1298666774-17646-2-git-send-email-psomas@cslab.ece.ntua.gr> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-02-25 16:37:32 -08:00
Gleb Natapov	631bc48782	KVM: Handle async PF in a guest. When async PF capability is detected hook up special page fault handler that will handle async page fault events and bypass other page faults to regular page fault handler. Also add async PF handling to nested SVM emulation. Async PF always generates exit to L1 where vcpu thread will be scheduled out until page is available. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:16 +02:00
Tetsuo Handa	96e612ffc3	x86, asm: Fix binutils 2.15 build failure Add parentheses around one pushl_cfi argument. Commit `df5d1874` "x86: Use {push,pop}{l,q}_cfi in more places" caused GNU assembler 2.15 (Debian Sarge) to fail. It is still failing as of commit `07bd8516` "x86, asm: Restore parentheses around one pushl_cfi argument". This patch solves build failure with GNU assembler 2.15. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Jan Beulich <jbeulich@novell.com> Cc: heukelum@fastmail.fm Cc: hpa@linux.intel.com LKML-Reference: <201011160445.oAG4jGif079860@www262.sakura.ne.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:25:11 +01:00
Jan Beulich	07bd8516a2	x86, asm: Restore parentheses around one pushl_cfi argument These were (intentionally) stripped by "fix CFI macro invocations to deal with shortcomings in gas" to expose problems with unexpected splitting of arguments by older gas also on newer versions, but as it turns out there is at least one distro (Ubuntu 6.06) where even not having any spaces in a macro argument doesn't reliably prevent splitting into multiple arguments. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4CC157DB020000780001E8A2@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 10:51:44 +02:00
Jan Beulich	3234282f33	x86, asm: Fix CFI macro invocations to deal with shortcomings in gas gas prior to (perhaps) 2.16.90 has problems with passing non- parenthesized expressions containing spaces to macros. Spaces, however, get inserted by cpp between any macro expanding to a number and a subsequent + or -. For the +, current x86 gas then removes the space again (future gas may not do so), but for the - the space gets retained and is then considered a separator between macro arguments. Fix the respective definitions for both the - and + cases, so that they neither contain spaces nor make cpp insert any (the latter by adding seemingly redundant parentheses). Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CBDBEBA020000780001E05A@vpn.id2.novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 14:28:02 -07:00
Jan Beulich	df5d1874ce	x86: Use {push,pop}{l,q}_cfi in more places ... plus additionally introduce {push,pop}f{l,q}_cfi. All in the hope that the code becomes better readable this way (it gets quite a bit smaller in any case). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBDA40200007800013FAF@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:14:11 +02:00
Jan Beulich	a34107b557	i386: Add unwind directives to syscall ptregs stubs When these stubs are actual functions (i.e. having a return instruction) and have stack manipulation instructions in them, they should also be annotated to allow unwinding through them. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBCF00200007800013F99@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:14:10 +02:00
Linus Torvalds	66cd55d2b9	Merge branch 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, alternatives: BUG on encountering an invalid CPU feature number x86, alternatives: Fix one more open-coded 8-bit alternative number x86, alternatives: Use 16-bit numbers for cpufeature index	2010-08-06 16:24:17 -07:00
Linus Torvalds	d9a73c0016	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: um, x86: Cast to (u64 *) inside set_64bit() x86-32, asm: Directly access per-cpu GDT x86-64, asm: Directly access per-cpu IST x86, asm: Merge cmpxchg_486_u64() and cmpxchg8b_emu() x86, asm: Move cmpxchg emulation code to arch/x86/lib x86, asm: Clean up and simplify <asm/cmpxchg.h> x86, asm: Clean up and simplify set_64bit() x86: Add memory modify constraints to xchg() and cmpxchg() x86-64: Simplify loading initial_gs x86: Use symbolic MSR names x86: Remove redundant K6 MSRs	2010-08-06 10:07:34 -07:00
Brian Gerst	72c511dd59	x86-32, asm: Directly access per-cpu GDT Use a direct per-cpu reference for the GDT instead of using a scratch register. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1280594903-6341-2-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-01 16:05:23 -07:00
Sheng Yang	38e20b07ef	x86/xen: event channels delivery on HVM. Set the callback to receive evtchns from Xen, using the callback vector delivery mechanism. The traditional way for receiving event channel notifications from Xen is via the interrupts from the platform PCI device. The callback vector is a newer alternative that allow us to receive notifications on any vcpu and doesn't need any PCI support: we allocate a vector exclusively to receive events, in the vector handler we don't need to interact with the vlapic, therefore we avoid a VMEXIT. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-07-22 16:45:59 -07:00
H. Peter Anvin	83a7a2ad2a	x86, alternatives: Use 16-bit numbers for cpufeature index We already have cpufeature indicies above 255, so use a 16-bit number for the alternatives index. This consumes a padding field and so doesn't add any size, but it means that abusing the padding field to create assembly errors on overflow no longer works. We can retain the test simply by redirecting it to the .discard section, however. [ v3: updated to include open-coded locations ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-f88731e3068f9d1392ba71cc9f50f035d26a0d4f@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-07-07 10:36:28 -07:00
Brian Gerst	40d2e76315	x86-32: Rework cache flush denied handler The cache flush denied error is an erratum on some AMD 486 clones. If an invd instruction is executed in userspace, the processor calls exception 19 (13 hex) instead of #GP (13 decimal). On cpus where XMM is not supported, redirect exception 19 to do_general_protection(). Also, remove die_if_kernel(), since this was the last user. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1269176446-2489-2-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-05-03 13:39:26 -07:00
Brian Gerst	e840227c14	x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper The arg should be in %eax, but that is clobbered by the return value of clone. The function pointer can be in any register. Also, don't push args onto the stack, since regparm(3) is the normal calling convention now. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260380084-3707-4-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-10 15:55:36 -08:00
H. Peter Anvin	ce9119ad90	x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2 In the PTREGSCALL1 and 2 macros, we can trivially avoid an unnecessary pipeline serialization, so do so. In PTREGSCALLS3 this is much less clear-cut since we have to push a new value to the stack. Leave it alone for now assuming it is as good as it is going to be; may want to check on Atom or another in-order x86 to see if we can do better. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-2-git-send-email-brgerst@gmail.com>	2009-12-09 16:33:44 -08:00
Brian Gerst	f839bbc5c8	x86: Merge sys_clone Change 32-bit sys_clone to new PTREGSCALL stub, and merge with 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-7-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-09 16:29:42 -08:00
Brian Gerst	f1382f157f	x86, 32-bit: Convert sys_vm86 & sys_vm86old Convert these to new PTREGSCALL stubs. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-6-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-09 16:29:23 -08:00
Brian Gerst	052acad48a	x86: Merge sys_sigaltstack Change 32-bit sys_sigaltstack to PTREGSCALL2, and merge with 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-5-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-09 16:28:59 -08:00
Brian Gerst	11cf88bd0b	x86: Merge sys_execve Change 32-bit sys_execve to PTREGSCALL3, and merge with 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-4-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-09 16:28:34 -08:00
Brian Gerst	27f59559d6	x86: Merge sys_iopl Change 32-bit sys_iopl to PTREGSCALL1, and merge with 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-3-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-09 16:28:10 -08:00
Brian Gerst	e258e4e0b4	x86-32: Add new pt_regs stubs Add new stubs which add the pt_regs pointer as the last arg, matching 64-bit. This will allow these syscalls to be easily merged. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-2-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-09 16:27:49 -08:00
Ingo Molnar	4331595650	Merge branch 'perf/core' into perf/probes Conflicts: tools/perf/Makefile Merge reason: - fix the conflict - pick up the pr_*() infrastructure to queue up dependent patch Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-23 08:23:20 +02:00
Steven Rostedt	194ec34184	function-graph/x86: Replace unbalanced ret with jmp The function graph tracer replaces the return address with a hook to trace the exit of the function call. This hook will finish by returning to the real location the function should return to. But the current implementation uses a ret to jump to the real return location. This causes a imbalance between calls and ret. That is the original function does a call, the ret goes to the handler and then the handler does a ret without a matching call. Although the function graph tracer itself still breaks the branch predictor by replacing the original ret, by using a second ret and causing an imbalance, it breaks the predictor even more. This patch replaces the ret with a jmp to keep the calls and ret balanced. I tested this on one box and it showed a 1.7% increase in performance. Another box only showed a small 0.3% increase. But no box that I tested this on showed a decrease in performance by making this change. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091013203425.042034383@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-14 08:13:53 +02:00
Masami Hiramatsu	a00e817f42	kprobes/x86-32: Move irq-exit functions to kprobes section Move irq-exit functions to .kprobes.text section to protect against kprobes recursion. When I ran kprobe stress test on x86-32, I found below symbols cause unrecoverable recursive probing: ret_from_exception ret_from_intr check_userspace restore_all restore_all_notrace restore_nocheck irq_return And also, I found some interrupt/exception entry points that cause similar problems. This patch moves those symbols (including their container functions) to .kprobes.text section to prevent any kprobes probing. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20090908164755.24050.81182.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-11 03:59:35 +02:00
Linus Torvalds	b0b7065b64	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) tracing/urgent: warn in case of ftrace_start_up inbalance tracing/urgent: fix unbalanced ftrace_start_up function-graph: add stack frame test function-graph: disable when both x86_32 and optimize for size are configured ring-buffer: have benchmark test print to trace buffer ring-buffer: do not grab locks in nmi ring-buffer: add locks around rb_per_cpu_empty ring-buffer: check for less than two in size allocation ring-buffer: remove useless compile check for buffer_page size ring-buffer: remove useless warn on check ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index tracing: update sample event documentation tracing/filters: fix race between filter setting and module unload tracing/filters: free filter_string in destroy_preds() ring-buffer: use commit counters for commit pointer accounting ring-buffer: remove unused variable ring-buffer: have benchmark test handle discarded events ring-buffer: prevent adding write in discarded area tracing/filters: strloc should be unsigned short tracing/filters: operand can be negative ... Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually	2009-06-20 10:56:46 -07:00
Steven Rostedt	71e308a239	function-graph: add stack frame test In case gcc does something funny with the stack frames, or the return from function code, we would like to detect that. An arch may implement passing of a variable that is unique to the function and can be saved on entering a function and can be tested when exiting the function. Usually the frame pointer can be used for this purpose. This patch also implements this for x86. Where it passes in the stack frame of the parent function, and will test that frame on exit. There was a case in x86_32 with optimize for size (-Os) where, for a few functions, gcc would align the stack frame and place a copy of the return address into it. The function graph tracer modified the copy and not the actual return address. On return from the funtion, it did not go to the tracer hook, but returned to the parent. This broke the function graph tracer, because the return of the parent (where gcc did not do this funky manipulation) returned to the location that the child function was suppose to. This caused strange kernel crashes. This test detected the problem and pointed out where the issue was. This modifies the parameters of one of the functions that the arch specific code calls, so it includes changes to arch code to accommodate the new prototype. Note, I notice that the parsic arch implements its own push_return_trace. This is now a generic function and the ftrace_push_return_trace should be used instead. This patch does not touch that code. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-18 18:40:18 -04:00
Alexander van Heukelum	bc3f5d3dbd	x86: de-assembler-ize asm/desc.h asm/desc.h is included in three assembly files, but the only macro it defines, GET_DESC_BASE, is never used. This patch removes the includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard from asm/desc.h. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-17 21:35:10 -07:00
Alexander van Heukelum	dc4c2a0aed	i386: fix/simplify espfix stack switching, move it into assembly The espfix code triggers if we have a protected mode userspace application with a 16-bit stack. On returning to userspace, with iret, the CPU doesn't restore the high word of the stack pointer. This is an "official" bug, and the work-around used in the kernel is to temporarily switch to a 32-bit stack segment/pointer pair where the high word of the pointer is equal to the high word of the userspace stackpointer. The current implementation uses THREAD_SIZE to determine the cut-off, but there is no good reason not to use the more natural 64kb... However, implementing this by simply substituting THREAD_SIZE with 65536 in patch_espfix_desc crashed the test application. patch_espfix_desc tries to do what is described above, but gets it subtly wrong if the userspace stack pointer is just below a multiple of THREAD_SIZE: an overflow occurs to bit 13... With a bit of luck, when the kernelspace stackpointer is just below a 64kb-boundary, the overflow then ripples trough to bit 16 and userspace will see its stack pointer changed by 65536. This patch moves all espfix code into entry_32.S. Selecting a 16-bit cut-off simplifies the code. The game with changing the limit dynamically is removed too. It complicates matters and I see no value in it. Changing only the top 16-bit word of ESP is one instruction and it also implies that only two bytes of the ESPFIX GDT entry need to be changed and this can be implemented in just a handful simple to understand instructions. As a side effect, the operation to compute the original ESP from the ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining three instructions have been expanded inline in entry_32.S. impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit stack segment Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Stas Sergeev <stsp@aknet.ru> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-17 21:35:09 -07:00
Alexander van Heukelum	2e04bc7656	i386: fix return to 16-bit stack from NMI handler Returning to a task with a 16-bit stack requires special care: the iret instruction does not restore the high word of esp in that case. The espfix code fixes this, but currently is not invoked on NMIs. This means that a running task gets the upper word of esp clobbered due intervening NMIs. To reproduce, compile and run the following program with the nmi watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can see that the high bits of esp contain garbage, while the low bits are still correct. This patch puts the espfix code back into the NMI code path. The patch is slightly complicated due to the irqtrace infrastructure not being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET. Otherwise, the tail of the normal iret-code is correct for the nmi code path too. To be able to share this code-path, the TRACE_IRQS_IRET was move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but this code explicitly disables interrupts. This short interrupts-off section is now not traced anymore. The return-to-kernel path now always includes the preliminary test to decide if the espfix code should be called. This is never the case, but doing it this way keeps the patch as simple as possible and the few extra instructions should not affect timing in any significant way. #define _GNU_SOURCE #include <stdio.h> #include <sys/types.h> #include <sys/mman.h> #include <unistd.h> #include <sys/syscall.h> #include <asm/ldt.h> int modify_ldt(int func, void ptr, unsigned long bytecount) { return syscall(SYS_modify_ldt, func, ptr, bytecount); } / this is assumed to be usable / #define SEGBASEADDR 0x10000 #define SEGLIMIT 0x20000 / 16-bit segment / struct user_desc desc = { .entry_number = 0, .base_addr = SEGBASEADDR, .limit = SEGLIMIT, .seg_32bit = 0, .contents = 0, / ??? / .read_exec_only = 0, .limit_in_pages = 0, .seg_not_present = 0, .useable = 1 }; int main(void) { setvbuf(stdout, NULL, _IONBF, 0); / map a 64 kb segment / char pointer = mmap((void )SEGBASEADDR, SEGLIMIT+1, PROT_EXEC\|PROT_READ\|PROT_WRITE, MAP_SHARED\|MAP_ANONYMOUS, -1, 0); if (pointer == NULL) { printf("could not map space\n"); return 0; } / write ldt, new mode / int err = modify_ldt(0x11, &desc, sizeof(desc)); if (err) { printf("error modifying ldt: %i\n", err); return 0; } for (int i=0; i<1000; i++) { asm volatile ( "pusha\n\t" "mov %ss, %eax\n\t" / preserve ss:esp / "mov %esp, %ebp\n\t" "push $7\n\t" / index 0, ldt, user mode / "push $65536-4096\n\t" / esp / "lss (%esp), %esp\n\t" / switch to new stack / "push %eax\n\t" / save old ss:esp on new stack / "push %ebp\n\t" "add $1765536, %esp\n\t" /* set high bits / "mov %esp, %edx\n\t" "mov $10000000, %ecx\n\t" / wait... / "1: loop 1b\n\t" / ... a bit / "cmp %esp, %edx\n\t" "je 1f\n\t" "ud2\n\t" / esp changed inexplicably! / "1:\n\t" "sub $1765536, %esp\n\t" /* restore high bits / "lss (%esp), %esp\n\t" / restore old ss:esp */ "popa\n\t"); printf("\rx%ix", i); } return 0; } Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Stas Sergeev <stsp@aknet.ru> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-17 21:35:09 -07:00
Jaswinder Singh Rajput	88200bc28d	x86: entry_32.S fix compile warnings - fix work mask bit width Fix: arch/x86/kernel/entry_32.S:446: Warning: 00000000080001d1 shortened to 00000000000001d1 arch/x86/kernel/entry_32.S:457: Warning: 000000000800feff shortened to 000000000000feff arch/x86/kernel/entry_32.S:527: Warning: 00000000080001d1 shortened to 00000000000001d1 arch/x86/kernel/entry_32.S:541: Warning: 000000000800feff shortened to 000000000000feff arch/x86/kernel/entry_32.S:676: Warning: 0000000008000091 shortened to 0000000000000091 TIF_SYSCALL_FTRACE is 0x08000000 and until now we checked the first 16 bits of the work mask - bit 27 falls outside of that. Update the entry_32.S code to check the full 32-bit mask. [ %cx => %ecx fix from Cyrill Gorcunov <gorcunov@gmail.com> ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "H. Peter Anvin" <hpa@kernel.org> LKML-Reference: <1237012693.18733.3.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-14 09:42:51 +01:00
Stas Sergeev	bda3a89745	x86: minor cleanup in the espfix code Impact: Cleanup Checkin `be44d2aabc` eliminates the use of a 16-bit stack for espfix. However, at least one instruction remained that only operated on the low 16 bits of %esp. This is not a bug per se because the kernel stack is always an aligned 4K or 8K block. Therefore it cannot cross 64K boundaries; this code, in fact, relies strictly on that fact. However, it's a lot cleaner (and, for that matter, smaller) to operate on the entire 32-bit register. Signed-off-by: Stas Sergeev <stsp@aknet.ru> CC: Zachary Amsden <zach@vmware.com> CC: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2009-02-23 11:34:04 -08:00
Jeremy Fitzhardinge	0341c14da4	x86: use _types.h headers in asm where available In general, the only definitions that assembly files can use are in _types.S headers (where available), so convert them. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-02-13 11:35:01 -08:00
Ingo Molnar	ab639f3593	Merge branch 'core/percpu' into x86/core	2009-02-13 09:45:09 +01:00
Brian Gerst	253f29a4ae	x86: pass in pt_regs pointer for syscalls that need it Some syscalls need to access the pt_regs structure, either to copy user register state or to modifiy it. This patch adds stubs to load the address of the pt_regs struct into the %eax register, and changes the syscalls to regparm(1) to receive the pt_regs pointer as the first argument. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 12:40:45 +01:00
Tejun Heo	60a5317ff0	x86: implement x86_32 stack protector Impact: stack protector for x86_32 Implement stack protector for x86_32. GDT entry 28 is used for it. It's set to point to stack_canary-20 and have the length of 24 bytes. CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs to the stack canary segment on entry. As %gs is otherwise unused by the kernel, the canary can be anywhere. It's defined as a percpu variable. x86_32 exception handlers take register frame on stack directly as struct pt_regs. With -fstack-protector turned on, gcc copies the whole structure after the stack canary and (of course) doesn't copy back on return thus losing all changed. For now, -fno-stack-protector is added to all files which contain those functions. We definitely need something better. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-10 00:42:01 +01:00
Tejun Heo	ccbeed3a05	x86: make lazy %gs optional on x86_32 Impact: pt_regs changed, lazy gs handling made optional, add slight overhead to SAVE_ALL, simplifies error_code path a bit On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs doesn't have place for it and gs is saved/loaded only when necessary. In preparation for stack protector support, this patch makes lazy %gs handling optional by doing the followings. * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs. * Save and restore %gs along with other registers in entry_32.S unless LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL even when LAZY_GS. However, it adds no overhead to common exit path and simplifies entry path with error code. * Define different user_gs accessors depending on LAZY_GS and add lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The lazy__gs() ops are used to save, load and clear %gs lazily. Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly. xen and lguest changes need to be verified. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-10 00:42:00 +01:00
Tejun Heo	f0d96110f9	x86: use asm .macro instead of cpp #define in entry_32.S Impact: cleanup Use .macro instead of cpp #define where approriate. This cleans up code and will ease future changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-10 00:41:57 +01:00
Ingo Molnar	1164dd0099	x86: move mach-default/*.h files to asm/ We are getting rid of subarchitecture support - move the hook files to asm/. (These are now stale and should be replaced with more explicit runtime mechanisms - but the transition is simpler this way.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-29 14:16:51 +01:00
Tejun Heo	02cf94c370	x86: make x86_32 use tlb_64.c Impact: less contention when issuing invalidate IPI, cleanup Make x86_32 use the same tlb code as 64bit. The 64bit code uses multiple IPI vectors for tlb shootdown to reduce contention. This patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the code paths. Note that the usage of asmlinkage is inconsistent for x86_32 and 64 and calls for further cleanup. This has been noted with a FIXME comment in tlb_64.c. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-01-21 17:26:06 +09:00
Ingo Molnar	e8cea892df	Revert "i386: add TRACE_IRQS_OFF for the nmi" This reverts commit `e0c7317557`. This patch was wrong, as lockdep (and thus the irq state tracer) aren't nmi safe. People are already seeing lockdep warnings due to this. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-12 19:36:59 +01:00
Linus Torvalds	b0f4b285d7	Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits) sched, trace: update trace_sched_wakeup() tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3 Revert "x86: disable X86_PTRACE_BTS" ring-buffer: prevent false positive warning ring-buffer: fix dangling commit race ftrace: enable format arguments checking x86, bts: memory accounting x86, bts: add fork and exit handling ftrace: introduce tracing_reset_online_cpus() helper tracing: fix warnings in kernel/trace/trace_sched_switch.c tracing: fix warning in kernel/trace/trace.c tracing/ring-buffer: remove unused ring_buffer size trace: fix task state printout ftrace: add not to regex on filtering functions trace: better use of stack_trace_enabled for boot up code trace: add a way to enable or disable the stack tracer x86: entry_64 - introduce FTRACE_ frame macro v2 tracing/ftrace: add the printk-msg-only option tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp() x86, bts: correctly report invalid bts records ... Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits being already partly merged by the SH merge.	2008-12-28 12:21:10 -08:00
Steven Rostedt	e49dc19c6a	ftrace: function graph return for function entry Impact: feature, let entry function decide to trace or not This patch lets the graph tracer entry function decide if the tracing should be done at the end as well. This requires all function graph entry functions return 1 if it should trace, or 0 if the return should not be traced. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-03 08:56:26 +01:00
Steven Rostedt	bb4304c71c	ftrace: have function graph use mcount caller address Impact: consistency change for function graph This patch makes function graph record the mcount caller address the same way the function tracer does. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-03 08:56:22 +01:00
Ingo Molnar	3bdae4f464	Merge branch 'x86/debug' into x86/irq We merge this branch because x86/debug touches code that we started cleaning up in x86/irq. The two branches started out independent, but as unexpected amount of activity went into x86/irq, they became dependent. Resolve that by this cross-merge.	2008-11-28 15:00:48 +01:00
Alexander van Heukelum	d211af055d	i386: get rid of the use of KPROBE_ENTRY / KPROBE_END entry_32.S is now the only user of KPROBE_ENTRY / KPROBE_END, treewide. This patch reorders entry_64.S and explicitly generates a separate section for functions that need the protection. The generated code before and after the patch is equal. The KPROBE_ENTRY and KPROBE_END macro's are removed too. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-27 12:37:54 +01:00

1 2 3

106 Commits