Commit Graph

20950 Commits

Author SHA1 Message Date
Andy Lutomirski a7fcf28d43 x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32
I broke 32-bit kernels.  The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized.  It has the following issues:

 - Init's sp0 is inconsistent with everything else's: non-init tasks
   are offset by 8 bytes.  (I have no idea why, and the comment is unhelpful.)

 - vm86 does crazy things to sp0.

Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07 09:34:03 +01:00
Andy Lutomirski b27559a433 x86/asm/entry: Delay loading sp0 slightly on task switch
The change:

  75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")

had the unintended side effect of changing the return value of
current_thread_info() during part of the context switch process.
Change it back.

This has no effect as far as I can tell -- it's just for
consistency.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9fcaa47dd8487db59eed7a3911b6ae409476763e.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07 09:34:03 +01:00
Andy Lutomirski 9b47668843 x86/asm/entry: Rename 'INIT_TSS_IST' to 'CPU_TSS_IST'
This has nothing to do with the init thread or the initial
anything. It's just the CPU's TSS.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0bd5e26b32a2e1f08ff99017d0997118fbb2485.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski d0a0de21f8 x86/asm/entry: Remove INIT_TSS and fold the definitions into 'cpu_tss'
The INIT_TSS is unnecessary.  Just define the initial TSS where
'cpu_tss' is defined.

While we're at it, merge the 32-bit and 64-bit definitions.  The
only syntactic change is that 32-bit kernels were computing sp0
as long, but now they compute it as unsigned long.

Verified by objdump: the contents and relocations of
.data..percpu..shared_aligned are unchanged on 32-bit and 64-bit
kernels.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8fc39fa3f6c5d635e93afbdd1a0fe0678a6d7913.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski 24933b82c0 x86/asm/entry: Rename 'init_tss' to 'cpu_tss'
It has nothing to do with init -- there's only one TSS per cpu.

Other names considered include:

 - current_tss: Confusing because we never switch the tss.
 - singleton_tss: Too long.

This patch was generated with 's/init_tss/cpu_tss/g'.  Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski 9d0c914c60 x86/asm/entry/64/compat: Change the 32-bit sysenter code to use sp0
The ia32 sysenter code loaded the top of the kernel stack into
rsp by loading kernel_stack and then adjusting it.  It can be
simplified to just read sp0 directly.

This requires the addition of a new asm-offsets entry for sp0.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/88ff9006163d296a0665338585c36d9bfb85235d.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski 75182b1632 x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()
This will make modifying the semantics of kernel_stack easier.

The change to ist_begin_non_atomic() is necessary because sp0 no
longer points to the same THREAD_SIZE-aligned region as RSP;
it's one byte too high for that.  At Denys' suggestion, rather
than offsetting it, just check explicitly that we're in the
correct range ending at sp0.  This has the added benefit that we
no longer assume that the thread stack is aligned to
THREAD_SIZE.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ef8254ad414cbb8034c9a56396eeb24f5dd5b0de.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:57 +01:00
Andy Lutomirski 8ef46a672a x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu
We currently store references to the top of the kernel stack in
multiple places: kernel_stack (with an offset) and
init_tss.x86_tss.sp0 (no offset).  The latter is defined by
hardware and is a clean canonical way to find the top of the
stack.  Add an accessor so we can start using it.

This needs minor paravirt tweaks.  On native, sp0 defines the
top of the kernel stack and is therefore always correct.  On Xen
and lguest, the hypervisor tracks the top of the stack, but we
want to start reading sp0 in the kernel.  Fixing this is simple:
just update our local copy of sp0 as well as the hypervisor's
copy on task switches.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:57 +01:00
Wang Nan 5eca7453d6 x86/traps: Separate set_intr_gate() and clean up early_trap_init()
As early_trap_init() doesn't use IST, replace
set_intr_gate_ist() and set_system_intr_gate_ist() with their
standard counterparts.

set_intr_gate() requires a trace_debug symbol which we don't
have and won't use. This patch separates set_intr_gate() into two
parts, and uses base version in early_trap_init().

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: <dave.hansen@linux.intel.com>
Cc: <lizefan@huawei.com>
Cc: <masami.hiramatsu.pt@hitachi.com>
Cc: <oleg@redhat.com>
Cc: <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425010789-13714-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-05 00:47:29 +01:00
Andy Lutomirski 1e3fbb8a1d x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization
'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'.  This is
entirely the wrong check.  TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.

This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:53 +01:00
Denys Vlasenko d441c1f2b7 x86/asm/entry/64: Simplify optimistic SYSRET
Avoid redundant load of %r11 (it is already loaded a few
instructions before).

Also simplify %rsp restoration, instead of two steps:

         add $0x80, %rsp
         mov 0x18(%rsp), %rsp

we can do a simplified single step to restore user-space RSP:

         mov 0x98(%rsp), %rsp

and get the same result.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[ Clarified the changelog. ]
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1aef69b346a6db0d99cdfb0f5ba83e8c985e27d7.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:52 +01:00
Denys Vlasenko b3ab90b333 x86/asm/entry/64/compat: Use more readable constant
The last instance of "mysterious" SS+8 constant is replaced by
SIZEOF_PTREGS.

Message-Id: <1424822419-10267-1-git-send-email-dvlasenk@redhat.com>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/d35aeba3059407ac54f472ddcfbea767ff8916ac.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:52 +01:00
Denys Vlasenko 911d2bb5cc x86/asm/entry/64: Use more readable constants
Constants such as SS+8 or SS+8-RIP are mysterious.
In most cases, SS+8 is just meant to be SIZEOF_PTREGS,
SS+8-RIP is RIP's offset in the iret frame.

This patch changes some of these constants to be less
mysterious.

No code changes (verified with objdump).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1d20491384773bd606e23a382fac23ddb49b5178.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:52 +01:00
Denys Vlasenko 14f6e9532d x86/asm/entry/64/compat: Fold the IA32_ARG_FIXUP macro into its callers
Use of a small macro - one with conditional expansion - does
more harm than good. It obfuscates code, with minimal code
reuse.

For example, because of obfuscation it's not obvious that
in 'ia32_sysenter_target', we can optimize loading of r9 -
currently it is loaded with a detour through ebp.

This patch folds the IA32_ARG_FIXUP macro into its callers.

No code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/4da092094cd78734384ac31e0d4ec1d8f69145a2.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:52 +01:00
Denys Vlasenko ebfc453e27 x86/asm/entry/64: Clean up and document various entry code details
This patch does a lot of cleanup in comments and formatting,
but it does not change any code:

 - Rename 'save_paranoid' to 'paranoid_entry': this makes naming
   similar to its "non-paranoid" sibling, 'error_entry',
   and to its counterpart, 'paranoid_exit'.

 - Use the same CFI annotation atop 'paranoid_entry' and 'error_entry'.

 - Fix irregular indentation of assembler operands.

 - Add/fix comments on top of 'paranoid_entry' and 'error_entry'.

 - Remove stale comment about "oldrax".

 - Make comments about "no swapgs" flag in ebx more prominent.

 - Deindent wrongly indented top-level comment atop 'paranoid_exit'.

 - Indent wrongly deindented comment inside 'error_entry'.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/4640f9fcd5ea46eb299b1cd6d3f5da3167d2f78d.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:51 +01:00
Denys Vlasenko 1eeb207f87 x86/asm/entry/64: Move 'save_paranoid' and 'ret_from_fork' closer to their users
For some odd reason, these two functions are at the very top of
the file. "save_paranoid"'s caller is approximately in the middle
of it, move it there. Move 'ret_from_fork' to be right after
fork/exec helpers.

This is a pure block move, nothing is changed in the function
bodies.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/6446bbfe4094532623a5b83779b7015fec167a9d.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:51 +01:00
Denys Vlasenko b87cf63e2a x86/asm/entry: Add comments about various syscall instructions
SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics.
Moreover, they differ in 32- and 64-bit mode.

What is saved? What is not? Is rsp set? Are interrupts disabled?
People tend to not remember these details well enough.

This patch adds comments which explain in detail
what registers are modified by each of these instructions.

The comments are placed immediately before corresponding
entry and exit points.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/a94b98b63527797c871a81402ff5060b18fa880a.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:51 +01:00
Andy Lutomirski 050273d19b x86/asm/entry/64: Remove 'int_check_syscall_exit_work'
Nothing references it anymore.

Reported-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 96b6352c12 ("x86_64, entry: Remove the syscall exit audit and schedule optimizations")
Link: http://lkml.kernel.org/r/dd2a4d26ecc7a5db61b476727175cd99ae2b32a4.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:50 +01:00
Denys Vlasenko f2db9382c1 x86/asm/entry: Do mass removal of 'ARGOFFSET'
ARGOFFSET is zero now, removing it changes no code.

A few macros lost "offset" parameter, since it is always zero
now too.

No code changes - verified with objdump.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/8689f937622d9d2db0ab8be82331fa15e4ed4713.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:50 +01:00
Denys Vlasenko 0d55083698 x86/asm/entry/64: Shrink code in 'paranoid_exit'
RESTORE_EXTRA_REGS + RESTORE_C_REGS looks small, but it's
a lot of instructions (fourteen). Let's reuse them.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
[ Cleaned up the labels. ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1421272101-16847-2-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/59d71848cee3ec9eb48c0252e602efd6bd560e3c.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:50 +01:00
Denys Vlasenko e90e147cbc x86/asm/entry/64: Fix comments
- Misleading and slightly incorrect comments in "struct pt_regs" are
   fixed (four instances).

 - Fix incorrect comment atop EMPTY_FRAME macro.

 - Explain in more detail what we do with stack layout during hw interrupt.

 - Correct comments about "partial stack frame" which are no longer
   true.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-3-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/e1f4429c491fe6ceeddb879dea2786e0f8920f9c.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:49 +01:00
Denys Vlasenko 76f5df43ca x86/asm/entry/64: Always allocate a complete "struct pt_regs" on the kernel stack
The 64-bit entry code was using six stack slots less by not
saving/restoring registers which are callee-preserved according
to the C ABI, and was not allocating space for them.

Only when syscalls needed a complete "struct pt_regs" was
the complete area allocated and filled in.

As an additional twist, on interrupt entry a "slightly less
truncated pt_regs" trick is used, to make nested interrupt
stacks easier to unwind.

This proved to be a source of significant obfuscation and subtle
bugs. For example, 'stub_fork' had to pop the return address,
extend the struct, save registers, and push return address back.
Ugly. 'ia32_ptregs_common' pops return address and "returns" via
jmp insn, throwing a wrench into CPU return stack cache.

This patch changes the code to always allocate a complete
"struct pt_regs" on the kernel stack. The saving of registers
is still done lazily.

"Partial pt_regs" trick on interrupt stack is retained.

Macros which manipulate "struct pt_regs" on stack are reworked:

 - ALLOC_PT_GPREGS_ON_STACK allocates the structure.

 - SAVE_C_REGS saves to it those registers which are clobbered
   by C code.

 - SAVE_EXTRA_REGS saves to it all other registers.

 - Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros
   reverse it.

'ia32_ptregs_common', 'stub_fork' and friends lost their ugly dance
with the return pointer.

LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
instead of magic numbers.

'error_entry' and 'save_paranoid' now use SAVE_C_REGS +
SAVE_EXTRA_REGS instead of having it open-coded yet again.

Patch was run-tested: 64-bit executables, 32-bit executables,
strace works.

Timing tests did not show measurable difference in 32-bit
and 64-bit syscalls.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-2-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/b89763d354aa23e670b9bdf3a40ae320320a7c2e.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:49 +01:00
Denys Vlasenko 6e1327bd2b x86/asm/entry/64: Fix incorrect symbolic constant usage: R11->ARGOFFSET
Since the last fix of this nature, a few more instances have crept
in. Fix them up. No object code changes (constants have the same
value).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-1-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/f5e1c4084319a42e5f14d41e2d638949ce66bc08.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:49 +01:00
Denys Vlasenko 49db46a67b x86/asm: Introduce push/pop macros which generate CFI_REL_OFFSET and CFI_RESTORE
Sequences:

        pushl_cfi %reg
        CFI_REL_OFFSET reg, 0

and:

        popl_cfi %reg
        CFI_RESTORE reg

happen quite often. This patch adds macros which generate them.

No assembly changes (verified with objdump -dr vmlinux.o).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1421017655-25561-1-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/2202eb90f175cf45d1b2d1c64dbb5676a8ad07ad.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:49 +01:00
Denys Vlasenko 69e8544cd0 x86/asm/64: Open-code register save/restore in trace_hardirqs*() thunks
This is a preparatory patch for change in "struct pt_regs"
handling in entry_64.S.

trace_hardirqs*() thunks were (ab)using a part of the
'pt_regs' handling code, namely the SAVE_ARGS/RESTORE_ARGS
macros, to save/restore registers across C function calls.

Since SAVE_ARGS is going to be changed, open-code
register saving/restoring here.

Incidentally, this removes a bit of dead code:
one SAVE_ARGS was used just to emit a CFI annotation,
but it also generated unreachable assembly instructions.

Take a page from thunk_32.S and use push/pop instructions
instead of movq, they are far shorter:
1 or 2 bytes versus 5, and no need for instructions to adjust %rsp:

   text	   data	    bss	    dec	    hex	filename
    333	     40	      0	    373	    175	thunk_64_movq.o
    104	     40	      0	    144	     90	thunk_64_push_pop.o

[ This is ugly as sin, but we'll fix up the ugliness in the next
  patch. I see no point in reordering patches just to avoid an
  ugly intermediate state.  --Andy ]

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1420927210-19738-4-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/4c979ad604f0f02c5ade3b3da308b53eabd5e198.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:48 +01:00
Ingo Molnar f8e92fb4b0 A more involved rework of the alternatives framework to be able to
pad instructions and thus make using the alternatives macros more
 straightforward and without having to figure out old and new instruction
 sizes but have the toolchain figure that out for us.
 
 Furthermore, it optimizes JMPs used so that fetch and decode can be
 relieved with smaller versions of the JMPs, where possible.
 
 Some stats:
 
 x86_64 defconfig:
 
 Alternatives sites total:               2478
 Total padding added (in Bytes):         6051
 
 The padding is currently done for:
 
 X86_FEATURE_ALWAYS
 X86_FEATURE_ERMS
 X86_FEATURE_LFENCE_RDTSC
 X86_FEATURE_MFENCE_RDTSC
 X86_FEATURE_SMAP
 
 This is with the latest version of the patchset. Of course, on each
 machine the alternatives sites actually being patched are a proper
 subset of the total number.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU9ekpAAoJEBLB8Bhh3lVKyjYP/AiHEiHkkjnpwTt49kUtUMI6
 GIlGfJVNjp5LLnSRD/fkL/wdkBgQtMzr9O1g8Qi/lbFqxsOFteU9f1OtLx34ZwZw
 MhtdiHcrKGMsaIxTJh4FaqPHBT5ussm2yn1jlAX+LgILd3dpqe3oytsO8JihcK9j
 t2u9V/Lq92TV7zXxGgWJsPc86WhhgdldlU3X96S++Di18bnDaKbGkzthU6WzZG/H
 qtFZ5bfK8TlVHYduft+D9ZPzFYGp1WCOa03qU4+Djaxw02HDB6Ltysend9zg0lB1
 RT/BP0PwHD3mOL11qpgtV1ChCbR8FJMN/z5+YdSNJgzDQA0H5Sf0UueTweosfAz+
 /iC5t/wkegdYtqtA0nKVypYOJCS+UdfMZXenYgtSUJl6drB6I5BCW4mVft3AuWo+
 EilPGpblvmjWRx1HiF4/Q/5zrSWHzmKQDyXuyxI9m0OUxAGAM0+8CY6wOqRA5pX+
 /f5MjZ1hXELQGhl5Qdj4nqJacICGevJ8WYdZ53B+uYVxz7fbXk9hSYcZKT94UshD
 qSdaV4XJSuC7pDKqiWoNWXp5N1g+D2BgfwoQEr/RnodFZRlfc+cmOv/visak0OLr
 E/pp1vJvCi3+T3ImX1MCDiXmflQtFctiL3hNgMXYK2IGhJb2RDC2bFeZkksOHuAE
 BGgrn+usQDjVlikEnfI3
 =0KXp
 -----END PGP SIGNATURE-----

Merge tag 'alternatives_padding' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm

Pull alternative instructions framework improvements from Borislav Petkov:

 "A more involved rework of the alternatives framework to be able to
  pad instructions and thus make using the alternatives macros more
  straightforward and without having to figure out old and new instruction
  sizes but have the toolchain figure that out for us.

  Furthermore, it optimizes JMPs used so that fetch and decode can be
  relieved with smaller versions of the JMPs, where possible.

  Some stats:

    x86_64 defconfig:

    Alternatives sites total:               2478
    Total padding added (in Bytes):         6051

  The padding is currently done for:

    X86_FEATURE_ALWAYS
    X86_FEATURE_ERMS
    X86_FEATURE_LFENCE_RDTSC
    X86_FEATURE_MFENCE_RDTSC
    X86_FEATURE_SMAP

  This is with the latest version of the patchset. Of course, on each
  machine the alternatives sites actually being patched are a proper
  subset of the total number."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:36:15 +01:00
Ingo Molnar d2c032e3dc Linux 4.0-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU9enEAAoJEHm+PkMAQRiG/ewIAJ4MW4tcAhaVj6ndCF3+uL/b
 RaVm1apUjsTloe5Fl0TT9J5CO3zdOetmMNToy2sf0W4MJDIyHf21o83l7eniV/6q
 al/c3fQ6HVtNjiSUNghTtzVlL+gUD1F60b9BGYi1V5h2Mp8u0NG1alTGLQfCB8sE
 ArB+v2aWEdSPn7mZDA0Yuc1In+8bkpht3oy+OLD/8JNkqqLnml9YOyPjM1cuRpBr
 NxKCLcPzSHH9/nR3T6XtkxXYV5xD3+CDm9roJhfHukoFmfT/G3C65Zcp2KEed/Cw
 QQpu+ox7fpUs10F/Fbfm8AE+tRB4o2sGh97sprXrO5oaFdx6FPIBo4WN8i/Vy68=
 =qpY+
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc2' into x86/asm, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:35:43 +01:00
Brian Gerst 7e8e385aaf x86/compat: Remove sys32_vm86_warning
The check against lastcomm is racy, and the message it produces
isn't necessary.  vm86 support can be disabled on a 32-bit
kernel also, and doesn't have this message.  Switch to
sys_ni_syscall instead.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:16:21 +01:00
Brian Gerst 2aa4a71092 x86/compat: Merge native and compat 32-bit syscall tables
Combine the 32-bit syscall tables into one file.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:16:21 +01:00
Brian Gerst 29a5ff97fa x86/compat: Remove compat_ni_syscall()
compat_ni_syscall() does the same thing as sys_ni_syscall().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:16:21 +01:00
Linus Torvalds a38ecbbd0b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A CR4-shadow 32-bit init fix, plus two typo fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too
  x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup()
  x86/cpu/intel: Fix trivial typo in intel_tlb_table[]
2015-03-01 12:22:44 -08:00
Linus Torvalds d7b48fec35 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two kprobes fixes and a handful of tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Make sparc64 arch point to sparc
  perf symbols: Define EM_AARCH64 for older OSes
  perf top: Fix SIGBUS on sparc64
  perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag
  perf tools: Fix pthread_attr_setaffinity_np build error
  perf tools: Define _GNU_SOURCE on pthread_attr_setaffinity_np feature check
  perf bench: Fix order of arguments to memcpy_alloc_mem
  kprobes/x86: Check for invalid ftrace location in __recover_probed_insn()
  kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace
2015-03-01 11:56:13 -08:00
Steven Rostedt 5b2bdbc845 x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too
Commit:

   1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")

added a shadow CR4 such that reads and writes that do not
modify the CR4 execute much faster than always reading the
register itself.

The change modified cpu_init() in common.c, so that the
shadow CR4 gets initialized before anything uses it.

Unfortunately, there's two cpu_init()s in common.c. There's
one for 64-bit and one for 32-bit. The commit only added
the shadow init to the 64-bit path, but the 32-bit path
needs the init too.

Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home Fixes: 1e02ce4ccc "x86: Store a per-cpu shadow copy of CR4"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-28 08:04:20 +01:00
Ingo Molnar 5838d18955 Merge branch 'linus' into x86/urgent, to merge dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-28 08:03:10 +01:00
Wang Nan b4d8327024 x86/traps: Enable DEBUG_STACK after cpu_init() for TRAP_DB/BP
Before this patch early_trap_init() installs DEBUG_STACK for
X86_TRAP_BP and X86_TRAP_DB. However, DEBUG_STACK doesn't work
correctly until cpu_init() <-- trap_init().

This patch passes 0 to set_intr_gate_ist() and
set_system_intr_gate_ist() instead of DEBUG_STACK to let it use
same stack as kernel, and installs DEBUG_STACK for them in
trap_init().

As core runs at ring 0 between early_trap_init() and
trap_init(), there is no chance to get a bad stack before
trap_init().

As NMI is also enabled in trap_init(), we don't need to care
about is_debug_stack() and related things used in
arch/x86/kernel/nmi.c.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <dave.hansen@linux.intel.com>
Cc: <lizefan@huawei.com>
Cc: <luto@amacapital.net>
Cc: <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1424929779-13174-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-26 12:29:20 +01:00
Andy Lutomirski 72c6fb4f74 x86/ia32-compat: Fix CLONE_SETTLS bitness of copy_thread()
CLONE_SETTLS is expected to write a TLS entry in the GDT for
32-bit callers and to set FSBASE for 64-bit callers.

The correct check is is_ia32_task(), which returns true in the
context of a 32-bit syscall.  TIF_IA32 is set if the task itself
has a 32-bit personality, which is not the same thing.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/45e2d0d695393d76406a0c7225b82c76223e0cc5.1424822291.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 08:27:50 +01:00
Andy Lutomirski 08571f1ae3 x86/ptrace: Remove checks for TIF_IA32 when changing CS and SS
The ability for modified CS and/or SS to be useful has nothing
to do with TIF_IA32.  Similarly, if there's an exploit involving
changing CS or SS, it's exploitable with or without a TIF_IA32
check.

So just delete the check.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/71c7ab36456855d11ae07edd4945a7dfe80f9915.1424822291.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 08:27:49 +01:00
Linus Torvalds b24e2bdde4 xen: regression and bug fixes for 4.0-rc1
- Fix two regression introduced in 4.0-rc1 affecting PV/PVH guests in
   certain configurations.
 - Prevent pvscsi frontends bypassing backend checks.
 - Allow privcmd hypercalls to be preempted even on kernel with
   voluntary preemption.  This fixes soft-lockups with long running
   toolstack hypercalls (e.g., when creating/destroying large domains).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJU7GDMAAoJEFxbo/MsZsTRj7AIAIFD2Dgu9p56nx6rUA6JOWP9
 2O1VlINEkggUFNCekTJ3HkAjHreC2Ir67nk1PTTfL3ZN+PqxEv+IoUF1XKJxE/G3
 0KchszK0MVnc7UBrUJWRUEpGo92iIC9QBc8MyDN6dGbfV5lT4aHB5ug/aSZqG8HV
 8ITOftfwnVE8ukOK9bSRPZRejnoRMbGUmqpr0X17sE4Zb8+LEf0iO5Yh88NEjzxM
 OD5hL3TGN5uXh6rYHS+U82M1g83G4Q7v10WIN5O96QhA6e5rVbVNsohHpL8IHY5q
 kazw7wBY//SKGCGniACYnh+vZDeNddwCz8chj6S49h272zanMleyolrUjlcMGzQ=
 =xmcy
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-4.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bugfixes from David Vrabel:
 "Xen regression and bug fixes for 4.0-rc1

   - Fix two regressions introduced in 4.0-rc1 affecting PV/PVH guests
     in certain configurations.

   - Prevent pvscsi frontends bypassing backend checks.

   - Allow privcmd hypercalls to be preempted even on kernel with
     voluntary preemption.  This fixes soft-lockups with long running
     toolstack hypercalls (e.g., when creating/destroying large
     domains)"

* tag 'stable/for-linus-4.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests
  xen-scsiback: mark pvscsi frontend request consumed only after last read
  x86/xen: allow privcmd hypercalls to be preempted
  x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set
2015-02-24 09:18:48 -08:00
Linus Torvalds 84257ce6b7 Lguest weird config build fix, and update to the documentation.
Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU69NiAAoJENkgDmzRrbjxKIEQAJe/Us6fBCToMFt71DNMtwHf
 3SsyDCUciO/lu4H2/hVRI0vK0gYEAmhPZHGTKMKvsg46NJ5Y+BXvg2ZICEbn52Oh
 1guQdNJhJbjNtmHhZQbNzOLJfEz0/0IJmcCKZoJ3hVGyQ54qTMila26FUJ3qWAzO
 s+Wlc7sw1FScSy6RMYA1UQ7foTxpIQZZPc9QKZCMU16ZnK8OutQemQOPJp/pAmm9
 LKuOLAIeF18AbuiC0BChG+wm4U7v8Tci5kEI2gRb77VUjGLcccpZPoJyaP1ZaJP/
 cvkraLvERrWe+PPF/9CCB8yMDsOMfGDMPo0FrdSpSWZ6JnfZmLzKZLmtMlyGjVVA
 y7zn8eBGIJP519Y+WK9/ySO/FL+ibbep13wjwTYOv6AQHkwpH0VvtKOo7EDYAWht
 Prk+xB7u0l1SBX5LPzwN9QAu8WVeECjgYpjY7Oa3Pss/RLoZlNkV6Gk7Bhe9b7Zy
 NOW8nIlKRCQWS5KsM2XYRTkjB0BNoYJyHULIFSrIOBRgRVxnF1vOqRSGnIEzmrBI
 O2ewyUirFmTQI2RZzUL9UPidBo1DbdcUSYNV1Ex4vW+y/++v2VS7RaKDDvqU9JKG
 TahKm2a9Xi/0dDkKu6IwFPCtEjdu5wqRawvryz8AztjpNQbucO1qBpxbc1AmAM3g
 jlBf1S7OtPXf5bOk8QOy
 =pJ4p
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull lguest fixes from Rusty Russell:
 "Lguest weird config build fix, and update to the documentation"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  lguest: update help text.
  lguest: now depends on PCI
2015-02-24 09:14:43 -08:00
Yannick Guerrini 579deee571 x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup()
Change 'Uknown' to 'Unknown'

Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1424710358-10140-1-git-send-email-yguerrini@tomshardware.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-24 08:52:37 +01:00
Boris Ostrovsky 5054daa285 x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests
Commit 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
introduced CR4 shadows.

These shadows are initialized in early boot code. The commit missed
initialization for 64-bit PV(H) guests that this patch adds.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-02-23 16:30:26 +00:00
David Vrabel fdfd811ddd x86/xen: allow privcmd hypercalls to be preempted
Hypercalls submitted by user space tools via the privcmd driver can
take a long time (potentially many 10s of seconds) if the hypercall
has many sub-operations.

A fully preemptible kernel may deschedule such as task in any upcall
called from a hypercall continuation.

However, in a kernel with voluntary or no preemption, hypercall
continuations in Xen allow event handlers to be run but the task
issuing the hypercall will not be descheduled until the hypercall is
complete and the ioctl returns to user space.  These long running
tasks may also trigger the kernel's soft lockup detection.

Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to
bracket hypercalls that may be preempted.  Use these in the privcmd
driver.

When returning from an upcall, call xen_maybe_preempt_hcall() which
adds a schedule point if if the current task was within a preemptible
hypercall.

Since _cond_resched() can move the task to a different CPU, clear and
set xen_in_preemptible_hcall around the call.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-02-23 16:30:24 +00:00
Boris Ostrovsky 31795b470b x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set
Commit d524165cb8 ("x86/apic: Check x2apic early") tests X2APIC_ENABLE
bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics
the kernel when this bit is set.

Xen's PV guests will pass this MSR read to the hypervisor which will
return its version of the MSR, where this bit might be set. Make sure
we clear it before returning MSR value to the caller.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-02-23 16:30:23 +00:00
Borislav Petkov e0bc8d179e x86/lib/memcpy_64.S: Convert memcpy to ALTERNATIVE_2 macro
Make REP_GOOD variant the default after alternatives have run.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:55:52 +01:00
Borislav Petkov a77600cd03 x86/lib/memmove_64.S: Convert memmove() to ALTERNATIVE macro
Make it execute the ERMS version if support is present and we're in the
forward memmove() part and remove the unfolded alternatives section
definition.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:54:14 +01:00
Borislav Petkov 84d95ad4cb x86/lib/memset_64.S: Convert to ALTERNATIVE_2 macro
Make alternatives replace single JMPs instead of whole memset functions,
thus decreasing the amount of instructions copied during patching time
at boot.

While at it, make it use the REP_GOOD version by default which means
alternatives NOP out the JMP to the other versions, as REP_GOOD is set
by default on the majority of relevant x86 processors.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:50:59 +01:00
Borislav Petkov a930dc4543 x86/asm: Cleanup prefetch primitives
This is based on a patch originally by hpa.

With the current improvements to the alternatives, we can simply use %P1
as a mem8 operand constraint and rely on the toolchain to generate the
proper instruction sizes. For example, on 32-bit, where we use an empty
old instruction we get:

  apply_alternatives: feat: 6*32+8, old: (c104648b, len: 4), repl: (c195566c, len: 4)
  c104648b: alt_insn: 90 90 90 90
  c195566c: rpl_insn: 0f 0d 4b 5c

  ...

  apply_alternatives: feat: 6*32+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3)
  c18e09b4: alt_insn: 90 90 90
  c1955948: rpl_insn: 0f 0d 08

  ...

  apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7)
  c1190cf9: alt_insn: 90 90 90 90 90 90 90
  c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1

all with the proper padding done depending on the size of the
replacement instruction the compiler generates.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2015-02-23 13:44:17 +01:00
Borislav Petkov c70e1b475f x86/asm: Use alternative_2() in rdtsc_barrier()
... now that we have it.

Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:44:17 +01:00
Borislav Petkov 6620ef28c8 x86/lib/clear_page_64.S: Convert to ALTERNATIVE_2 macro
Move clear_page() up so that we can get 2-byte forward JMPs when
patching:

  apply_alternatives: feat: 3*32+16, old: (ffffffff8130adb0, len: 5), repl: (ffffffff81d0b859, len: 5)
  ffffffff8130adb0: alt_insn: 90 90 90 90 90
  recompute_jump: new_displ: 0x0000003e
  ffffffff81d0b859: rpl_insn: eb 3e 66 66 90

even though the compiler generated 5-byte JMPs which we padded with 5
NOPs.

Also, make the REP_GOOD version be the default as the majority of
machines set REP_GOOD. This way we get to save ourselves the JMP:

  old insn VA: 0xffffffff813038b0, CPU feat: X86_FEATURE_REP_GOOD, size: 5, padlen: 0
  clear_page:

  ffffffff813038b0 <clear_page>:
  ffffffff813038b0:       e9 0b 00 00 00          jmpq ffffffff813038c0
  repl insn: 0xffffffff81cf0e92, size: 0

  old insn VA: 0xffffffff813038b0, CPU feat: X86_FEATURE_ERMS, size: 5, padlen: 0
  clear_page:

  ffffffff813038b0 <clear_page>:
  ffffffff813038b0:       e9 0b 00 00 00          jmpq ffffffff813038c0
  repl insn: 0xffffffff81cf0e92, size: 5
   ffffffff81cf0e92:      e9 69 2a 61 ff          jmpq ffffffff81303900

  ffffffff813038b0 <clear_page>:
  ffffffff813038b0:       e9 69 2a 61 ff          jmpq ffffffff8091631e

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:44:16 +01:00
Borislav Petkov 8e65f6e03a x86/entry_32: Convert X86_INVD_BUG to ALTERNATIVE macro
Booting a 486 kernel on an AMD guest with this patch applied, says:

  apply_alternatives: feat: 0*32+25, old: (c160a475, len: 5), repl: (c19557d4, len: 5)
  c160a475: alt_insn: 68 10 35 00 c1
  c19557d4: rpl_insn: 68 80 39 00 c1

which is:

  old insn VA: 0xc160a475, CPU feat: X86_FEATURE_XMM, size: 5
  simd_coprocessor_error:
           c160a475:      68 10 35 00 c1          push $0xc1003510 <do_general_protection>
  repl insn: 0xc19557d4, size: 5
           c160a475:      68 80 39 00 c1          push $0xc1003980 <do_simd_coprocessor_error>

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:44:15 +01:00