Commit Graph

26666 Commits

Author SHA1 Message Date
Masami Hiramatsu a8d11cd071 kprobes/x86: Consolidate insn decoder users for copying code
Consolidate x86 instruction decoder users on the path of
copying original code for kprobes.

Kprobes decodes the same instruction a maximum of 3 times when
preparing the instruction buffer:

 - The first time for getting the length of the instruction,
 - the 2nd for adjusting displacement,
 - and the 3rd for checking whether the instruction is boostable or not.

For each time, the actual decoding target address is slightly
different (1st is original address or recovered instruction buffer,
2nd and 3rd are pointing to the copied buffer), but all have
the same instruction.

Thus, this patch also changes the target address to the copied
buffer at first and reuses the decoded "insn" for displacement
adjusting and checking boostability.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076389643.22469.13151892839998777373.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:47 +02:00
Masami Hiramatsu ea1e34fc36 kprobes/x86: Use probe_kernel_read() instead of memcpy()
Use probe_kernel_read() for avoiding unexpected faults while
copying kernel text in __recover_probed_insn(),
__recover_optprobed_insn() and __copy_instruction().

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076382624.22469.10091613887942958518.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:47 +02:00
Masami Hiramatsu d0381c81c2 kprobes/x86: Set kprobes pages read-only
Set the pages which is used for kprobes' singlestep buffer
and optprobe's trampoline instruction buffer to readonly.
This can prevent unexpected (or unintended) instruction
modification.

This also passes rodata_test as below.

Without this patch, rodata_test shows a warning:

  WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x7a9/0xa20
  x86/mm: Found insecure W+X mapping at address ffffffffa0000000/0xffffffffa0000000

With this fix, no W+X pages are found:

  x86/mm: Checked W+X mappings: passed, no W+X pages found.
  rodata_test: all tests were successful

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076375592.22469.14174394514338612247.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:46 +02:00
Masami Hiramatsu 490154bc68 kprobes/x86: Make boostable flag boolean
Make arch_specific_insn.boostable to boolean, since it has
only 2 states, boostable or not. So it is better to use
boolean from the viewpoint of code readability.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076368566.22469.6322906866458231844.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:46 +02:00
Masami Hiramatsu 804dec5bda kprobes/x86: Do not modify singlestep buffer while resuming
Do not modify singlestep execution buffer (kprobe.ainsn.insn)
while resuming from single-stepping, instead, modifies
the buffer to add a jump back instruction at preparing
buffer.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076361560.22469.1610155860343077495.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:45 +02:00
Masami Hiramatsu 17880e4d57 kprobes/x86: Use instruction decoder for booster
Use x86 instruction decoder for checking whether the probed
instruction is able to boost or not, instead of hand-written
code.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076354563.22469.13379472209338986858.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:45 +02:00
Masami Hiramatsu 129d17e8e8 kprobes/x86: Fix the description of __copy_instruction()
Fix the description comment of __copy_instruction() function
since it has already been changed to return the length of the
copied instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076347582.22469.3775133607244923462.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:45 +02:00
Masami Hiramatsu bd0b90676c kprobes/x86: Fix kprobe-booster not to boost far call instructions
Fix the kprobe-booster not to boost far call instruction,
because a call may store the address in the single-step
execution buffer to the stack, which should be modified
after single stepping.

Currently, this instruction will be filtered as not
boostable in resume_execution(), so this is not a
critical issue.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076340615.22469.14066273186134229909.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 09:23:44 +02:00
Joerg Roedel 5ed386ec09 x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
When this function fails it just sends a SIGSEGV signal to
user-space using force_sig(). This signal is missing
essential information about the cause, e.g. the trap_nr or
an error code.

Fix this by propagating the error to the only caller of
mpx_handle_bd_fault(), do_bounds(), which sends the correct
SIGSEGV signal to the process.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fe3d197f84 ('x86, mpx: On-demand kernel allocation of bounds tables')
Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-12 08:40:58 +02:00
Ingo Molnar b6466d53af Merge branch 'x86/urgent' into x86/cpu, to resolve conflict
Conflicts:
	arch/x86/kernel/cpu/intel_rdt_schemata.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 10:47:28 +02:00
Jiri Olsa 7f00f38871 x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
The schemata lock is released before freeing the resource's temporary
tmp_cbms allocation. That's racy versus another write which allocates and
uses new temporary storage, resulting in memory leaks, freeing in use
memory, double a free or any combination of those.

Move the unlock after the release code.

Fixes: 60ec2440c6 ("x86/intel_rdt: Add schemata file")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170411071446.15241-1-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-11 09:48:12 +02:00
Markus Trippelsdorf 1c99a68741 x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection()
Since commit:

  4bcc595ccd "printk: reinstate KERN_CONT for printing"

... the debug output of signal_fault(), do_trap() and do_general_protection()
looks garbled, e.g.:

 traps: conftest[9335] trap invalid opcode ip:400428 sp:7ffeaba1b0d8 error:0
  in conftest[400000+1000]

(note the unintended line break.)

Fix the bug by adding KERN_CONTs.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 09:11:13 +02:00
Ingo Molnar e5185a76a2 Merge branch 'x86/boot' into x86/mm, to avoid conflict
There's a conflict between ongoing level-5 paging support and
the E820 rewrite. Since the E820 rewrite is essentially ready,
merge it into x86/mm to reduce tree conflicts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:56:05 +02:00
Ingo Molnar 4729277156 Merge branch 'WIP.x86/boot' into x86/boot, to pick up ready branch
The E820 rework in WIP.x86/boot has gone through a couple of weeks
of exposure in -tip, merge it in a wider fashion.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:49:31 +02:00
Borislav Petkov 9df9078ef2 perf/amd/uncore: Fix pr_fmt() prefix
Make it "perf/amd/uncore: ", i.e., something more specific than "perf: ".

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-4-bp@alien8.de
[ Changed it to perf/amd/uncore/ ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:44:59 +02:00
Borislav Petkov 68e8038048 perf/amd/uncore: Clean up per-family setup
Fam16h is the same as the default one, remove it. Turn the switch-case
into a simple if-else.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:44:59 +02:00
Borislav Petkov c2628f90c9 perf/amd/uncore: Do feature check first, before assignments
... and save some unnecessary work. Remove now unused label while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:44:59 +02:00
Ingo Molnar 84b1e36a6a Linux 4.11-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJY6mY1AAoJEHm+PkMAQRiGB14IAImsH28JPjxJVDasMIRPBxVc
 euPPlZgoBieu7sNt+kEsEqdkXuu0MLk6gln0IGxWLeoB2S+u3Tz5LMa2YArVqV9Z
 tWzOnI9auE73P2Pz/tUMOdyMs5tO0PolQxX3uljbULBozOHjHRh13fsXchX2yQvl
 mFeFCDqpPV0KhWRH/ciA8uIHdvYPhMpkKgRtmR8jXL0yzqLp6+2J+Bs8nHG4NNng
 HMVxZPC8jOE/TgWq6k/GmXgxh3H/AideFdHFbLKYnIFJW41ZGOI8a262zq3NmjPd
 lywpVU7O7RMhSITY5PnuR3LpNV8ftw1hz2y6t35unyFK1P02adOSj5GJ3hGdhaQ=
 =Xz5O
 -----END PGP SIGNATURE-----

Merge tag 'v4.11-rc6' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:42:47 +02:00
Jiri Olsa 4ffa3c977b x86/intel_rdt: Add cpus_list rdtgroup file
The resource control filesystem provides only a bitmask based cpus file for
assigning CPUs to a resource group. That's cumbersome with large cpumasks
and non-intuitive when modifying the file from the command line.

Range based cpu lists are commonly used along with bitmask based cpu files
in various subsystems throughout the kernel.

Add 'cpus_list' file which is CPU range based.

  # cd /sys/fs/resctrl/
  # echo 1-10 > krava/cpus_list
  # cat krava/cpus_list
  1-10
  # cat krava/cpus
  0007fe
  # cat cpus
  fffff9
  # cat cpus_list
  0,3-23

[ tglx: Massaged changelog and replaced "bitmask lists" by "CPU ranges" ]

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20170410145232.GF25354@krava
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-10 19:10:25 +02:00
Thomas Gleixner 17f8ba1dca x86/intel_rdt: Cleanup kernel-doc
The kernel-doc is inconsistently formatted. Fix it up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
2017-04-10 18:35:42 +02:00
Thomas Gleixner 6fdc6dd902 x86/vdso: Plug race between mapping and ELF header setup
The vsyscall32 sysctl can racy against a concurrent fork when it switches
from disabled to enabled:

    arch_setup_additional_pages()
	if (vdso32_enabled)
           --> No mapping
                                        sysctl.vsysscall32()
                                          --> vdso32_enabled = true
    create_elf_tables()
      ARCH_DLINFO_IA32
        if (vdso32_enabled) {
           --> Add VDSO entry with NULL pointer

Make ARCH_DLINFO_IA32 check whether the VDSO mapping has been set up for
the newly forked process or not.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170410151723.602367196@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-10 18:31:41 +02:00
Mathias Krause c06989da39 x86/vdso: Ensure vdso32_enabled gets set to valid values only
vdso_enabled can be set to arbitrary integer values via the kernel command
line 'vdso32=' parameter or via 'sysctl abi.vsyscall32'.

load_vdso32() only maps VDSO if vdso_enabled == 1, but ARCH_DLINFO_IA32
merily checks for vdso_enabled != 0. As a consequence the AT_SYSINFO_EHDR
auxiliary vector for the VDSO_ENTRY is emitted with a NULL pointer which
causes a segfault when the application tries to use the VDSO.

Restrict the valid arguments on the command line and the sysctl to 0 and 1.

Fixes: b0b49f2673 ("x86, vdso: Remove compat vdso support")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: Roland McGrath <roland@redhat.com>
Link: http://lkml.kernel.org/r/1491424561-7187-1-git-send-email-minipli@googlemail.com
Link: http://lkml.kernel.org/r/20170410151723.518412863@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-10 18:31:41 +02:00
Borislav Petkov db47d5f856 x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI
Apparently, some machines used to report DRAM errors through a PCI SERR
NMI. This is why we have a call into EDAC in the NMI handler. See

  c0d1217202 ("drivers/edac: add new nmi rescan").

From looking at the patch above, that's two drivers: e752x_edac.c and
e7xxx_edac.c. Now, I wanna say those are old machines which are probably
decommissioned already.

Tony says that "[t]the newest CPU supported by either of those drivers
is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
folks are still using these ... but people that hold onto h/w for 7
years generally cling to old s/w too ... so I'd guess it unlikely that
we will get complaints for breaking these in upstream."

So even if there is a small number still in use, we did load EDAC with
edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
which means a default EDAC setup without any parameters supplied on the
command line or otherwise would never even log the error in the NMI
handler because we're polling by default:

  inline int edac_handler_set(void)
  {
         if (edac_op_state == EDAC_OPSTATE_POLL)
                 return 0;

         return atomic_read(&edac_handlers);
  }

So, long story short, I'd like to get rid of that nastiness called
edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
we ever have to do stuff like that again, it should be notifiers we're
using and not some insanity like this one.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
2017-04-10 17:13:48 +02:00
Linus Torvalds 542380a208 KVM fixes for v4.11-rc6
ARM:
  - Fix a problem with GICv3 userspace save/restore
  - Clarify GICv2 userspace save/restore ABI
  - Be more careful in clearing GIC LRs
  - Add missing synchronization primitive to our MMU handling code
 
 PPC:
  - Check for a NULL return from kzalloc
 
 s390:
  - Prevent translation exception errors on valid page tables for the
    instruction-exection-protection support
 
 x86:
  - Fix Page-Modification Logging when running a nested guest
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJY5/X8AAoJEED/6hsPKofo8hQH/As3CbihZMysaK6JJTx5oMZw
 b3W8p8xVXVu4dKM8WnXa6m5xBDFmOa7eBB+CtT3gP68XnFvMpr/vPmDv6v6i9p8q
 7VyALDqqk2fxDmgHEwuETw9XZyuhdyCz/GaINCdnAJs25wTFOA7r0WEW5W8qRJpA
 9nQirapdJcknymIch1JqeWlYYmbIaFzT8jItfA9QQ7F9mG4pxC8D1k2D56lNYwTf
 FJIgXgkMPe7CPDXmgc/KqT5+iVsc/+SgzP/WdH6bX/007TV71sksxxfz6fIrao0X
 RtcL2WIZTXBdSNrvXflHhCfYgogPgCnYp8AsYTIa+IEijcfteJx7UiET47Ne0Ow=
 =/SPG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "ARM:
   - Fix a problem with GICv3 userspace save/restore
   - Clarify GICv2 userspace save/restore ABI
   - Be more careful in clearing GIC LRs
   - Add missing synchronization primitive to our MMU handling code

  PPC:
   - Check for a NULL return from kzalloc

  s390:
   - Prevent translation exception errors on valid page tables for the
     instruction-exection-protection support

  x86:
   - Fix Page-Modification Logging when running a nested guest"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
  KVM: nVMX: initialize PML fields in vmcs02
  KVM: nVMX: do not leak PML full vmexit to L1
  KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
  KVM: arm64: Ensure LRs are clear when they should be
  kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
  KVM: s390: remove change-recording override support
  arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
  arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
2017-04-08 01:39:43 -07:00
Thomas Gleixner b678c91aef Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
This reverts commit 474aeffd88 due to testing
failures.

Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20170406124459.dwn5zhpr2xqg3lqm@node.shutemov.name
2017-04-08 00:00:53 +02:00
Al Viro fccfb99508 Merge commit 'b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e' into uaccess.ia64
backmerge of mainline ia64 fix
2017-04-06 19:35:03 -04:00
Al Viro 054838bc01 Merge commit 'fc69910f329d' into uaccess.mips
backmerge of a build fix from mainline
2017-04-06 02:07:33 -04:00
Vikas Shivappa de016df88f x86/intel_rdt: Update schemata read to show data in tabular format
The schemata file displays data from different resources on all
domains. Its cumbersome to read since they are not tabular and data/names
could be of different widths.  Make the schemata file to display data in a
tabular format thereby making it nice and simple to read.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: vikas.shivappa@intel.com
Cc: h.peter.anvin@intel.com
Link: http://lkml.kernel.org/r/1491255857-17213-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-05 17:22:31 +02:00
Tony Luck c4026b7b95 x86/intel_rdt: Implement "update" mode when writing schemata file
The schemata file can have multiple lines and it is cumbersome to update
all lines.

Remove code that requires that the user provides values for every resource
(in the right order).  If the user provides values for just a few
resources, update them and leave the rest unchanged.

Side benefit: we now check which values were updated and only send IPIs to
cpus that actually have updates.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: ravi.v.shankar@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: vikas.shivappa@intel.com
Cc: h.peter.anvin@intel.com
Link: http://lkml.kernel.org/r/1491255857-17213-3-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-05 17:22:31 +02:00
Baoquan He b1d1776139 x86/efi: Clean up a minor mistake in comment
EFI allocates runtime services regions from EFI_VA_START, -4G, down
to -68G, EFI_VA_END - 64G altogether, top-down.

The mechanism was introduced in commit:

  d2f7cbe7b2 ("x86/efi: Runtime services virtual mapping")

Fix the comment that still says bottom-up.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170404160245.27812-10-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-05 12:27:26 +02:00
Bhupesh Sharma 6e7300cff1 efi/bgrt: Enable ACPI BGRT handling on arm64
Now that the ACPI BGRT handling code has been made generic, we can
enable it for arm64.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
[ Updated commit log to reflect that BGRT is only enabled for arm64, and added
  missing 'return' statement to the dummy acpi_parse_bgrt() function. ]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170404160245.27812-8-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-05 12:27:25 +02:00
Bhupesh Sharma 75def552bb x86/efi/bgrt: Move efi-bgrt handling out of arch/x86
Now with open-source boot firmware (EDK2) supporting ACPI BGRT table
addition even for architectures like AARCH64, it makes sense to move
out the 'efi-bgrt.c' file and supporting infrastructure from 'arch/x86'
directory and house it inside 'drivers/firmware/efi', so that this common
code can be used across architectures.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170404160245.27812-7-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-05 12:27:24 +02:00
Joerg Roedel cfac6dfa42 x86/signals: Fix lower/upper bound reporting in compat siginfo
Put the right values from the original siginfo into the
userspace compat-siginfo.

This fixes the 32-bit MPX "tabletest" testcase on 64-bit kernels.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.8+
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a4455082dc ('x86/signals: Add missing signal_compat code for x86 features')
Link: http://lkml.kernel.org/r/1491322501-5054-1-git-send-email-joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-05 10:16:43 +02:00
Ladi Prosek 1fb883bb82 KVM: nVMX: initialize PML fields in vmcs02
L2 was running with uninitialized PML fields which led to incomplete
dirty bitmap logging. This manifested as all kinds of subtle erratic
behavior of the nested guest.

Fixes: 843e433057 ("KVM: VMX: Add PML support in VMX")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-04 16:24:43 +02:00
Ladi Prosek ab007cc94f KVM: nVMX: do not leak PML full vmexit to L1
The PML feature is not exposed to guests so we should not be forwarding
the vmexit either.

This commit fixes BSOD 0x20001 (HYPERVISOR_ERROR) when running Hyper-V
enabled Windows Server 2016 in L1 on hardware that supports PML.

Fixes: 843e433057 ("KVM: VMX: Add PML support in VMX")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-04 16:11:06 +02:00
Kirill A. Shutemov 1d33b21956 x86/espfix: Add support for 5-level paging
We don't need extra virtual address space for ESPFIX, so it stays within
one PUD page table for both 4- and 5-level paging.

Redefining ESPFIX_BASE_ADDR using P4D_SHIFT instead of PGDIR_SHIFT would
make it stay in the same place regarding of paging mode.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-8-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04 08:22:34 +02:00
Kirill A. Shutemov 5480bb61cf x86/kasan: Extend KASAN to support 5-level paging
This patch bring support for a non-folded additional page table level.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04 08:22:34 +02:00
Kirill A. Shutemov b8504058a0 x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
Extends pagetable headers to support the new paging mode.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04 08:22:34 +02:00
Kirill A. Shutemov 335437fbf7 x86/paravirt: Add 5-level support to the paravirt code
Add operations to allocate/release p4ds.

Xen requires more work. We will need to come back to it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04 08:22:34 +02:00
Kirill A. Shutemov 4c7c44837b x86/mm: Define virtual memory map for 5-level paging
The first part of memory map (up to %esp fixup) simply scales existing
map for 4-level paging by factor of 9 -- number of bits addressed by
the additional page table level.

The rest of the map is unchanged.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04 08:22:33 +02:00
Kirill A. Shutemov 361b4b58ec x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
We don't need the assert anymore, as:

  17be0aec74 ("x86/asm/entry/64: Implement better check for canonical addresses")

made canonical address checks generic wrt. address width.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04 08:22:33 +02:00
Kirill A. Shutemov 3677d4c6a2 x86/boot: Detect 5-level paging support
In this initial implementation we force-require 5-level paging support
from the hardware, when compiled with CONFIG_X86_5LEVEL=y. (The kernel
will panic during boot on CPUs that don't support 5-level paging.)

We will implement boot-time switch between 4- and 5-level paging later.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-04 08:22:33 +02:00
Linus Torvalds 3ccfcdc9ef Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fix from Thomas Gleixner:
 "Prevent dmesg from being spammed when MCE logging is active"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Don't print MCEs when mcelog is active
2017-04-03 08:36:24 -07:00
Ingo Molnar 7f75540ff2 Linux 4.11-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJY4ZYkAAoJEHm+PkMAQRiGsq4H/R4PMXDoe2XhSSk7IoT97pXV
 /A8np/scAPjzEgYUidbb54OSqWwsPRuPGWONTFeSrE2u0L4wln/REI91jg7QetLq
 IisncExlYeJ/XQ+iO0ZZh9fLbqwIlEJFdSXmyIFr3m/TBxe8a61C8j93oNgM1tHT
 yuwzlq7c3sLq2hsmUG2HyL2kJsEfRasv4Rk0yhFuti12zVsBoTW4qmZuMauq+gdf
 f7cSYgiHhPTdb2o+azg5O7uYNHaQQBxdUMlIuhhYtVOUq+pFDO23SLHSFIW2NwOm
 Zn5R6CFSrLsCw0Bx0v8Xlc151QUbaRK4h9lhUhkBr6d3uNShU1NQ9JojpSvYwBo=
 =vP6E
 -----END PGP SIGNATURE-----

Merge tag 'v4.11-rc5' into x86/mm, to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03 16:36:32 +02:00
Wei Yang 474aeffd88 x86/mm/numa: Remove numa_nodemask_from_meminfo()
numa_nodemask_from_meminfo() generates a nodemask of nodes which have
memory according to a meminfo descriptor.

The two callsites of that function both set bits in copies of the
numa_nodes_parsed nodemask. In both cases, the information in supplied
numa_meminfo is a subset of numa_nodes_parsed. So setting those bits
again is not really necessary.

Here are the three call paths which show that the supplied numa_meminfo
argument describes memory regions in nodes which are already in
numa_nodes_parsed:

    x86_numa_init()
        numa_init()
            Case 1:
            acpi_numa_init()
	    acpi_parse_memory_affinity()
                    numa_add_memblk()
                    node_set(numa_nodes_parsed)
                acpi_parse_slit()
		 acpi_numa_slit_init()
		  numa_set_distance()
		   numa_alloc_distance()
                    numa_nodemask_from_meminfo()

            Case 2:
            amd_numa_init()
                numa_add_memblk()
                node_set(numa_nodes_parsed)

            Case 3
            dummy_numa_init()
                node_set(numa_nodes_parsed)
                numa_add_memblk()

            numa_register_memblks()
                numa_nodemask_from_meminfo()

Thus, in all three cases, the respective bit in numa_nodes_parsed is
set, which means it is not necessary to set it again in a copy of
numa_nodes_parsed.

So remove that function.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20170314030801.13656-2-richard.weiyang@gmail.com
[ Heavily massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-03 11:54:37 +02:00
Wei Yang 43dac8f6a7 x86/mm/numa: Improve alloc_node_data() error path message
alloc_node_data() tries to allocate from the local node first and, if
that attempt fails, falls back to any node. Improve the error message to
issue the initial node for ease during debugging.

Fix a typo in the comments, while at it.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Link: http://lkml.kernel.org/r/20170314030801.13656-1-richard.weiyang@gmail.com
[ Masssage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-03 11:54:37 +02:00
Peter Zijlstra b5effd3815 debug: Fix __bug_table[] in arch linker scripts
The kbuild test robot reported this build failure on a number
of architectures:

 >         make.cross ARCH=arm
 >    lib/lib.a(bug.o): In function `find_bug':
 > >> lib/bug.c:135: undefined reference to `__start___bug_table'
 > >> lib/bug.c:135: undefined reference to `__stop___bug_table'

Caused by:

  19d436268d ("debug: Add _ONCE() logic to report_bug()")

Which moved the BUG_TABLE from RO_DATA_SECTION() to RW_DATA_SECTION(),
but a number of architectures don't use RW_DATA_SECTION(), so they
ended up with no __bug_table[] ...

Ideally all those would use RW_DATA_SECTION() in their linker scripts,
but that's for another day.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Link: http://lkml.kernel.org/r/20170330154927.o6qmgfp4bdhrajbm@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03 10:22:40 +02:00
Linus Torvalds 496dcc5091 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This update provides:

   - prevent KASLR from randomizing EFI regions

   - restrict the usage of -maccumulate-outgoing-args and document when
     and why it is required.

   - make the Global Physical Address calculation for UV4 systems work
     correctly.

   - address a copy->paste->forgot-edit problem in the MCE exception
     table entries.

   - assign a name to AMD MCA bank 3, so the sysfs file registration
     works.

   - add a missing include in the boot code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Include missing header file
  x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
  x86/build: Mostly disable '-maccumulate-outgoing-args'
  x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
  x86/mce: Fix copy/paste error in exception table entries
  x86/platform/uv: Fix calculation of Global Physical Address
2017-04-02 09:27:02 -07:00
Linus Torvalds 128c434a70 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "This update provides:

   - make the scheduler clock switch to unstable mode smooth so the
     timestamps stay at microseconds granularity instead of switching to
     tick granularity.

   - unbreak perf test tsc by taking the new offset into account which
     was added in order to proveide better sched clock continuity

   - switching sched clock to unstable mode runs all clock related
     computations which affect the sched clock output itself from a work
     queue. In case of preemption sched clock uses half updated data and
     provides wrong timestamps. Keep the math in the protected context
     and delegate only the static key switch to workqueue context.

   - remove a duplicate header include"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers: Remove duplicate #include <linux/sched/debug.h> line
  sched/clock: Fix broken stable to unstable transfer
  sched/clock, x86/perf: Fix "perf test tsc"
  sched/clock: Fix clear_sched_clock_stable() preempt wobbly
2017-04-02 09:25:10 -07:00
Al Viro bee3f412d6 Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc 2017-04-02 10:33:48 -04:00
Mike Galbraith 13a6798e4a kasan: do not sanitize kexec purgatory
Fixes this:

  kexec: Undefined symbol: __asan_load8_noabort
  kexec-bzImage64: Loading purgatory failed

Link: http://lkml.kernel.org/r/1489672155.4458.7.camel@gmx.de
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31 17:13:30 -07:00
Dmitry Safonov ada26481df x86/mm: Make in_compat_syscall() work during exec
The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and
for 32bit mm->mmap_compat_base.

On execve the registers of the task invoking exec() are copied to the child
pt_regs. So child->pt_regs->orig_ax contains the execve syscall number of the
parent.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32	  child->thread.status & TS_COMPAT
   x32	  child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64	  !ia32 && !x32 

child->thread.status is corretly set up in set_personality_*(), but the
syscall number in child->pt_regs.orig_ax is left unmodified.

Therefore the parent/child combinations work or fail in the following way:

Parent Child Child->thread_status  child->pt_regs.orig_ax  in_compat()  Works
ia64    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       Y
ia64    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 0     true        Y
ia64     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       N
ia32    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       Y
ia32    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 0     true        Y
ia32     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       N
 x32    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 1     true        N
 x32    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 1     true        Y
 x32     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 1     true        Y

Make set_personality_*() store the syscall number incl. __X32_SYSCALL_BIT
which corresponds to the newly started ELF executable in the childs
pt_regs, i.e. pretend that the exec was invoked from a task with the same
executable format.

So both thread.status and pt_regs.orig_ax correspond to the new ELF format
and in_compat_syscall() returns the correct result.

[ tglx: Rewrote changelog ]

Fixes: commit 1b028f784e ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Reported-by: Adam Borowski <kilobyte@angband.pl>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: 0x7f454c46@gmail.com
Cc: linux-mm@kvack.org
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170331111137.28170-1-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-31 16:53:02 +02:00
Zhengyi Shen 6b1cc946dd x86/boot: Include missing header file
Sparse complains about missing forward declarations:

arch/x86/boot/compressed/error.c:8:6:
	warning: symbol 'warn' was not declared. Should it be static?
arch/x86/boot/compressed/error.c:15:6:
	warning: symbol 'error' was not declared. Should it be static?

Include the missing header file.

Signed-off-by: Zhengyi Shen <shenzhengyi@gmail.com>
Acked-by: Kess Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-31 10:43:42 +02:00
Geliang Tang 352ef03ca0 x86/pci-calgary: Use setup_timer() instead of open coding it.
Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: iommu@lists.linux-foundation.org
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Link: http://lkml.kernel.org/r/e4f1888b9e4a87f6a6345f86ed23071483763b22.1490340972.git.geliangtang@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-31 10:21:04 +02:00
Yazen Ghannam 29f72ce3e4 x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name.
However, MCA bank 3 is defined on Fam17h systems and can be accessed
using legacy MSRs. Without a name we get a stack trace on Fam17h systems
when trying to register sysfs files for bank 3 on kernels that don't
recognize Scalable MCA.

Call MCA bank 3 "decode_unit" since this is what it represents on
Fam17h. This will allow kernels without SMCA support to see this bank on
Fam17h+ and prevent the stack trace. This will not affect older systems
since this bank is reserved on them, i.e. it'll be ignored.

Tested on AMD Fam15h and Fam17h systems.

  WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal
  kobject: (ffff88085bb256c0): attempted to be registered with empty name!
  ...
  Call Trace:
   kobject_add_internal
   kobject_add
   kobject_create_and_add
   threshold_create_device
   threshold_init_device

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1490102285-3659-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-31 10:09:44 +02:00
Zhengyi Shen 5af218439f x86/boot: Fix Sparse warning by including required header file
Include declarations for various symbols defined in the error.h header file
to fix the following Sparse warnings:

  arch/x86/boot/compressed/error.c:8:6:
	warning: symbol 'warn' was not declared. Should it be static?
  arch/x86/boot/compressed/error.c:15:6:
	warning: symbol 'error' was not declared. Should it be static?

Signed-off-by: Zhengyi Shen <shenzhengyi@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.com
[ Fixed/enhanced the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-31 08:13:54 +02:00
Borislav Petkov 952a6c2c09 x86/boot/32: Flip the logic in test_wp_bit()
... to have a natural "likely()" in the code flow and thus have the
success case with a branch 99.999% of the times non-taken and function
return code following it instead of jumping to it each time.

This puts the panic() call at the end of the function - it is going to
be practically unreachable anyway.

The C code is a bit more readable too.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: thgarnie@google.com
Link: http://lkml.kernel.org/r/20170330080101.ywsf5rg6ilzu4itk@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-31 08:08:31 +02:00
Andy Shevchenko d4d969909b x86/platform/intel-mid: Enable Bluetooth support on Intel Edison
Intel Edison has Wi-Fi + BT module attached and, since it's an SFI-enumerated
platform, needs platform data. Here we add bits to enable the Bluetooth device.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170330100443.22701-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-31 08:06:29 +02:00
Josh Poimboeuf 3f135e57a4 x86/build: Mostly disable '-maccumulate-outgoing-args'
The GCC '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant.  For most
configs, and with most recent versions of GCC, it's no longer needed.

Clarify which cases need it, and only enable it for those cases.  Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.

The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks.  But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.

Here are the object size savings for 32-bit and 64-bit defconfig
kernels:

      text	   data	    bss	     dec	    hex	filename
  10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
   9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after

      text	   data	    bss	     dec	    hex	filename
  10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
  10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after

That comes out to a 3% text size improvement on x86-32 and a 0.1% text
size improvement on x86-64.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170316193133.zrj6gug53766m6nn@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 11:53:04 +02:00
Suravee Suthikulpanit 25df39f2cf x86/events/amd/iommu: Enable support for multiple IOMMUs
Add support for multiple IOMMUs to perf by exposing an AMD IOMMU PMU for
each IOMMU found in the system via:

  /bus/event_source/devices/amd_iommu_x

where x is the IOMMU index. This allows users to specify different
events to be programmed into the performance counters of each IOMMU.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[ Improve readability, shorten names. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1490166162-10002-11-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:55:36 +02:00
Suravee Suthikulpanit cf25f904ef x86/events/amd/iommu: Add IOMMU-specific hw_perf_event struct
Current AMD IOMMU perf PMU inappropriately uses the hardware struct
inside the union in struct hw_perf_event, extra_reg in particular.

Instead, introduce an AMD IOMMU-specific struct with required parameters
to be programmed into the IOMMU performance counter control register.

Update the pasid field from 16 to 20 bits while at it.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[ Fixup macros, shorten get_next_avail_iommu_bnk_cntr() local vars, massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-10-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:55:35 +02:00
Suravee Suthikulpanit 5168654630 x86/events/amd/iommu: Fix sysfs perf attribute groups
Introduce static amd_iommu_attr_groups to simplify the
sysfs attributes initialization code.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-9-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:55:34 +02:00
Suravee Suthikulpanit 1650dfd1a9 x86/events, drivers/amd/iommu: Prepare for multiple IOMMUs support
Currently, amd_iommu_pc_get_set_reg_val() cannot support multiple
IOMMUs. Modify it to allow callers to specify an IOMMU. This is in
preparation for supporting multiple IOMMUs.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-8-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:53:55 +02:00
Suravee Suthikulpanit f5863a00e7 x86/events/amd/iommu.c: Modify functions to query max banks and counters
Currently, amd_iommu_pc_get_max_[banks|counters]() use end-point device
ID to locate an IOMMU and check the reported max banks/counters. The
logic assumes that the IOMMU_BASE_DEVID belongs to the first IOMMU, and
uses it to acquire a reference to the first IOMMU, which does not work
on certain systems. Instead, modify the function to take an IOMMU index,
and use it to query the corresponding AMD IOMMU instance.

Currently, hardcode the IOMMU index to 0 since the current AMD IOMMU
perf implementation supports only a single IOMMU. A subsequent patch
will add support for multiple IOMMUs, and will use a proper IOMMU index.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-7-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:53:54 +02:00
Suravee Suthikulpanit 6b9376e30f x86/events, drivers/iommu/amd: Introduce amd_iommu_get_num_iommus()
Introduce amd_iommu_get_num_iommus(), which returns the value of
amd_iommus_present. The function is used to replace direct access to the
variable, which is now declared as static.

This function will also be used by AMD IOMMU perf driver.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-6-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:53:53 +02:00
Suravee Suthikulpanit dc6ca5e47d x86/events/amd/iommu: Clean up perf_iommu_read()
Fix coding style and use GENMASK_ULL().

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-4-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:53:52 +02:00
Suravee Suthikulpanit 6aad0c6269 x86/events/amd/iommu: Clean up bitwise operations
Clean up register initialization and make use of BIT_ULL(x) where
appropriate. This should not affect logic and functionality.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-3-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:53:51 +02:00
Suravee Suthikulpanit f9573e53f1 x86/events/amd/iommu: Declare pr_fmt() format
Declare pr_fmt() format for perf/amd_iommu and remove unnecessary
pr_debug() calls.

Also check return value when _init_events_attrs() fails and issue an
error message.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-2-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:53:51 +02:00
Alexander Shishkin d35869ba34 perf/x86/intel/pt: Allow the disabling of branch tracing
Now that Intel PT supports more types of trace content than just branch
tracing, it may be useful to allow the user to disable branch tracing
when it is not needed.

The special case is BDW, where not setting BranchEn is not supported.

This is slightly trickier than necessary, because up to this moment
the driver has been setting BranchEn automatically and the userspace
assumes as much. Instead of reversing the semantics of BranchEn, we
introduce a 'passthrough' bit, which will forego the default and allow
the user to set BranchEn to their heart's content.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170206144140.14402-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:53:49 +02:00
Ingo Molnar c69f203df3 Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:48:58 +02:00
Peter Zijlstra 19d436268d debug: Add _ONCE() logic to report_bug()
Josh suggested moving the _ONCE logic inside the trap handler, using a
bit in the bug_entry::flags field, avoiding the need for the extra
variable.

Sadly this only works for WARN_ON_ONCE(), since the others have
printk() statements prior to triggering the trap.

Still, this saves a fair amount of text and some data:

  text         data       filename
  10682460     4530992    defconfig-build/vmlinux.orig
  10665111     4530096    defconfig-build/vmlinux.patched

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:37:20 +02:00
Peter Zijlstra 44fe84459f locking/atomic: Fix atomic_try_cmpxchg() semantics
Dmitry noted that the new atomic_try_cmpxchg() primitive is broken when
the old pointer doesn't point to the local stack.

He writes:

  "Consider a classical lock-free stack push:

    node->next = atomic_read(&head);
    do {
    } while (!atomic_try_cmpxchg(&head, &node->next, node));

  This code is broken with the current implementation, the problem is
  with unconditional update of *__po.

  In case of success it writes the same value back into *__po, but in
  case of cmpxchg success we might have lose ownership of some memory
  locations and potentially over what __po has pointed to. The same
  holds for the re-read of *__po. "

He also points out that this makes it surprisingly different from the
similar C/C++ atomic operation.

After investigating the code-gen differences caused by this patch; and
a number of alternatives (Linus dislikes this interface lots), we
arrived at these results (size x86_64-defconfig/vmlinux):

  GCC-6.3.0:

  10735757        cmpxchg
  10726413        try_cmpxchg
  10730509        try_cmpxchg + patch
  10730445        try_cmpxchg-linus

  GCC-7 (20170327):

  10709514        cmpxchg
  10704266        try_cmpxchg
  10704266        try_cmpxchg + patch
  10704394        try_cmpxchg-linus

From this we see that the patch has the advantage of better code-gen
on GCC-7 and keeps the interface roughly consistent with the C
language variant.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: a9ebf306f5 ("locking/atomic: Introduce atomic_try_cmpxchg()")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:35:54 +02:00
Arnd Bergmann 70579a86e3 x86/debug: Define BUG() again for !CONFIG_BUG
The latest change to the BUG() macro inadvertently reverted the earlier
commit:

  b06dd879f5 ("x86: always define BUG() and HAVE_ARCH_BUG, even with !CONFIG_BUG")

... that sanitized the behavior with CONFIG_BUG=n.

I noticed this as some warnings have appeared again that were previously
fixed as a side effect of that patch:

  kernel/seccomp.c: In function '__seccomp_filter':
  kernel/seccomp.c:670:1: error: no return statement in function returning non-void [-Werror=return-type]
  ...

This combines the two patches and uses the ud2 macro to define BUG()
in case of CONFIG_BUG=n.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9a93848fe7 ("x86/debug: Implement __WARN() using UD0")
Link: http://lkml.kernel.org/r/20170329211646.2707365-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:12:10 +02:00
Andy Lutomirski 4af1711051 x86/boot/32: Rewrite test_wp_bit()
This code seems to be very old and has gotten only minor updates.
It's overcomplicated and has a bunch of comments that are, at best,
of purely historical interest.  Nowadays we have a shiny function
probe_kernel_write() that does more or less exactly what we need.
Use it.

I switched the page that we test from swapper_pg_dir to
empty_zero_page because writing zero to empty_zero_page is more
obviously safe than writing to the paging structures.  (It's
extremely unlikely that any of this would cause problems in practice
because the write will fail on any supported CPU.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0b9e64ab0236de30e7572213cea77bf95ae2e990.1490831211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:08:33 +02:00
Ingo Molnar 73fa1362a7 Merge branch 'x86/cpu' into x86/mm, before applying dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:07:54 +02:00
Kirill A. Shutemov fdd3d8ce0e x86/dump_pagetables: Add support for 5-level paging
Simple extension to support one more page table level.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170328104806.41711-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 08:20:17 +02:00
Al Viro beba3a20bf x86: switch to RAW_COPY_USER
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-29 12:06:28 -04:00
Al Viro a41e0d7542 x86: don't wank with magical size in __copy_in_user()
... especially since copy_in_user() doesn't

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-29 12:04:35 -04:00
Al Viro 3f763453e6 kill __copy_from_user_nocache()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28 18:24:05 -04:00
Al Viro 122b05ddf5 amd64: get rid of zeroing
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28 18:24:04 -04:00
Paolo Bonzini 2beb6dad2e KVM: x86: cleanup the page tracking SRCU instance
SRCU uses a delayed work item.  Skip cleaning it up, and
the result is use-after-free in the work item callbacks.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 0eb05bf290
Reviewed-by: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-28 14:08:02 +02:00
Ladi Prosek 7ad658b693 KVM: nVMX: fix nested EPT detection
The nested_ept_enabled flag introduced in commit 7ca29de213 was not
computed correctly. We are interested only in L1's EPT state, not the
the combined L0+L1 value.

In particular, if L0 uses EPT but L1 does not, nested_ept_enabled must
be false to make sure that PDPSTRs are loaded based on CR3 as usual,
because the special case described in 26.3.2.4 Loading Page-Directory-
Pointer-Table Entries does not apply.

Fixes: 7ca29de213 ("KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT")
Cc: qemu-stable@nongnu.org
Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-28 10:10:15 +02:00
Borislav Petkov 32b40a82e8 x86/mce: Do not register notifiers with invalid prio
This is just a defensive precaution: do not register notifiers with a
priority which would disrupt the error handling in the notifiers with
prio higher than MCE_PRIO_EDAC.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:55:15 +02:00
Tony Luck 5de97c9f6d x86/mce: Factor out and deprecate the /dev/mcelog driver
Move all code relating to /dev/mcelog to a separate source file.
/dev/mcelog driver can now operate from the machine check notifier with
lowest prio.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Move the mce_helper and trigger functionality behind CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-6-bp@alien8.de
[ Renamed CONFIG_X86_MCELOG to CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:55:01 +02:00
Borislav Petkov 011d826111 RAS: Add a Corrected Errors Collector
Introduce a simple data structure for collecting correctable errors
along with accessors. More detailed description in the code itself.

The error decoding is done with the decoding chain now and
mce_first_notifier() gets to see the error first and the CEC decides
whether to log it and then the rest of the chain doesn't hear about it -
basically the main reason for the CE collector - or to continue running
the notifiers.

When the CEC hits the action threshold, it will try to soft-offine the
page containing the ECC and then the whole decoding chain gets to see
the error.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:54:48 +02:00
Borislav Petkov e64edfcce9 x86/mce: Rename mce_log to mce_log_buffer
It is confusing when staring at "struct mce_log mcelog" and then there's
also a function called mce_log(). So call the buffer what it is.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:54:42 +02:00
Borislav Petkov fe3ed20fdd x86/mce: Rename mce_log()'s argument
We call it everywhere "struct mce *m". Adjust that here too to avoid
confusion.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:54:38 +02:00
Ingo Molnar 4a96d1a5f0 Merge branch 'ras/urgent' into ras/core, to pick up fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:54:25 +02:00
Andi Kleen cc66afea58 x86/mce: Don't print MCEs when mcelog is active
Since:

  cd9c57cad3 ("x86/MCE: Dump MCE to dmesg if no consumers")

all MCEs are printed even when mcelog is running. Fix the regression to
not print to dmesg when mcelog is running as it is a consumer too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.10..
Fixes: cd9c57cad3 ("x86/MCE: Dump MCE to dmesg if no consumers")
Link: http://lkml.kernel.org/r/20170327093304.10683-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:53:52 +02:00
Ingo Molnar d652f4bbca Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 07:44:25 +02:00
Peter Zijlstra 9a93848fe7 x86/debug: Implement __WARN() using UD0
By using "UD0" for WARN()s we remove the function call and its possible
__FILE__ and __LINE__ immediate arguments from the instruction stream.

Total image size will not change much, what we win in the instruction
stream we'll lose because of the __bug_table entries. Still, saves on
I$ footprint and the total image size does go down a bit.

      text    data       filename
  10702123    4530992    defconfig-build/vmlinux.orig
  10682460    4530992    defconfig-build/vmlinux.patched

(UML didn't seem to use GENERIC_BUG at all, so remove it)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27 10:20:28 +02:00
Kirill A. Shutemov f2a6a70501 x86: Convert the rest of the code to support p4d_t
This patch converts x86 to use proper folding of a new (fifth) page table level
with <asm-generic/pgtable-nop4d.h>.

That's a bit of a kitchen sink patch, but I don't see how to split it further
without hurting bisectability.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27 08:56:58 +02:00
Xiong Zhang 907cd43902 x86/xen: Change __xen_pgd_walk() and xen_cleanmfnmap() to support p4d
Split these helpers into a couple of per-level functions and add support for
an additional page table level.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[ Split off into separate patch ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27 08:56:49 +02:00
Kirill A. Shutemov d691a3cf80 x86/kasan: Prepare clear_pgds() to switch to <asm-generic/pgtable-nop4d.h>
With folded p4d, pgd_clear() is a nop. Change clear_pgds() to use
p4d_clear() instead.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27 08:56:41 +02:00
Kirill A. Shutemov 4547833602 x86/mm/pat: Add 5-level paging support
Straight-forward extension of existing code to support additional page
table level.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27 08:56:34 +02:00
Kirill A. Shutemov e981316f56 x86/efi: Add 5-level paging support
Allocate additional page table level and ajdust efi_sync_low_kernel_mappings()
to work with additional page table level.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27 08:56:25 +02:00
Kirill A. Shutemov 7f68904182 x86/kexec: Add 5-level paging support
Handle additional page table level in the kexec code.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27 08:56:13 +02:00
Steven Rostedt (VMware) 1fa9d67a2f x86/ftrace: Use Makefile logic instead of #ifdef for compiling ftrace_*.o
Currently ftrace_32.S and ftrace_64.S are compiled even when
CONFIG_FUNCTION_TRACER is not set. This means there's an unnecessary #ifdef
to protect the code. Instead of using preprocessor directives, only compile
those files when FUNCTION_TRACER is defined.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170316210043.peycxdxktwwn6cid@treble
Link: http://lkml.kernel.org/r/20170323143446.217684991@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:08 +01:00
Steven Rostedt (VMware) 644e0e8dc7 x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set
x86_64 has had fentry support for some time. I did not add support to x86_32
as I was unsure if it will be used much in the future. It is still very much
used, and there's issues with function graph tracing with gcc playing around
with the mcount frames, causing function graph to panic. The fentry code
does not have this issue, and is able to cope as there is no frame to mess
up.

Note, this only adds support for fentry when DYNAMIC_FTRACE is set. There's
really no reason to not have that set, because the performance of the
machine drops significantly when it's not enabled.

Keep !DYNAMIC_FTRACE around to test it off, as there's still some archs
that have FTRACE but not DYNAMIC_FTRACE.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143446.052202377@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware) ff04b440d2 x86/ftrace: Clean up ftrace_regs_caller
When ftrace_regs_caller was created, it was designed to preserve flags as
much as possible as it needed to act just like a breakpoint triggered on the
same location. But the design is over complicated as it treated all
operations as modifying flags. But push, mov and lea do not modify flags.
This means the code can become more simplified by allowing flags to be
stored further down.

Making ftrace_regs_caller simpler will also be useful in implementing fentry
logic.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170316135328.36123c3e@gandalf.local.home
Link: http://lkml.kernel.org/r/20170323143445.917292592@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware) e6928e58d4 x86/ftrace: Add stack frame pointer to ftrace_caller
The function hook ftrace_caller does not create its own stack frame, and
this causes the ftrace stack trace to miss the first function when doing
stack traces.

 # echo schedule:stacktrace > /sys/kernel/tracing/set_ftrace_filter

Before:
         <idle>-0     [002] .N..    29.865807: <stack trace>
 => cpu_startup_entry
 => start_secondary
 => startup_32_smp
           <...>-7     [001] ....    29.866509: <stack trace>
 => kthread
 => ret_from_fork
           <...>-1     [000] ....    29.865377: <stack trace>
 => poll_schedule_timeout
 => do_select
 => core_sys_select
 => SyS_select
 => do_fast_syscall_32
 => entry_SYSENTER_32

After:
          <idle>-0     [002] .N..    31.234853: <stack trace>
 => do_idle
 => cpu_startup_entry
 => start_secondary
 => startup_32_smp
           <...>-7     [003] ....    31.235140: <stack trace>
 => rcu_gp_kthread
 => kthread
 => ret_from_fork
           <...>-1819  [000] ....    31.264172: <stack trace>
 => schedule_hrtimeout_range
 => poll_schedule_timeout
 => do_sys_poll
 => SyS_ppoll
 => do_fast_syscall_32
 => entry_SYSENTER_32

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143445.771707773@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware) 3d82c59c6e x86/ftrace: Move the ftrace specific code out of entry_32.S
The function tracing hook code for ftrace is not an entry point from
userspace and does not belong in the entry_*.S files. It has already been
moved out of entry_64.S.

Move it out of entry_32.S into its own ftrace_32.S file.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143445.645218946@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware) db65d7b6dc x86/ftrace: Rename mcount_64.S to ftrace_64.S
With the advent of -mfentry that uses the new "fentry" hook over mcount,
the mcount name is obsolete. Having the code file that ftrace hooks into
called "mcount*.S" is rather misleading. Rename it to ftrace_64.S and
remove the file name reference.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143445.490601451@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24 10:14:06 +01:00
Baoquan He a46f60d760 x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
Currently KASLR is enabled on three regions: the direct mapping of physical
memory, vamlloc and vmemmap. However the EFI region is also mistakenly
included for VA space randomization because of misusing EFI_VA_START macro
and assuming EFI_VA_START < EFI_VA_END.

(This breaks kexec and possibly other things that rely on stable addresses.)

The EFI region is reserved for EFI runtime services virtual mapping which
should not be included in KASLR ranges. In Documentation/x86/x86_64/mm.txt,
we can see:

  ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space

EFI uses the space from -4G to -64G thus EFI_VA_START > EFI_VA_END,
Here EFI_VA_START = -4G, and EFI_VA_END = -64G.

Changing EFI_VA_START to EFI_VA_END in mm/kaslr.c fixes this problem.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Thomas Garnier <thgarnie@google.com>
Cc: <stable@vger.kernel.org> #4.8+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1490331592-31860-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-24 09:04:27 +01:00
Wanpeng Li 08d839c4b1 KVM: VMX: Fix enable VPID conditions
This can be reproduced by running L2 on L1, and disable VPID on L0
if w/o commit "KVM: nVMX: Fix nested VPID vmx exec control", the L2
crash as below:

KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306c3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000

Reference SDM 30.3 INVVPID:

Protected Mode Exceptions
- #UD
  - If not in VMX operation.
  - If the logical processor does not support VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=0).
  - If the logical processor supports VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=1) but does
    not support the INVVPID instruction (IA32_VMX_EPT_VPID_CAP[32]=0).

So we should check both VPID enable bit in vmx exec control and INVVPID support bit
in vmx capability MSRs to enable VPID. This patch adds the guarantee to not enable
VPID if either INVVPID or single-context/all-context invalidation is not exposed in
vmx capability MSRs.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23 19:02:22 +01:00
Wanpeng Li 63cb6d5f00 KVM: nVMX: Fix nested VPID vmx exec control
This can be reproduced by running kvm-unit-tests/vmx.flat on L0 w/ vpid disabled.

Test suite: VPID
Unhandled exception 6 #UD at ip 00000000004051a6
error_code=0000      rflags=00010047      cs=00000008
rax=0000000000000000 rcx=0000000000000001 rdx=0000000000000047 rbx=0000000000402f79
rbp=0000000000456240 rsi=0000000000000001 rdi=0000000000000000
r8=000000000000000a  r9=00000000000003f8 r10=0000000080010011 r11=0000000000000000
r12=0000000000000003 r13=0000000000000708 r14=0000000000000000 r15=0000000000000000
cr0=0000000080010031 cr2=0000000000000000 cr3=0000000007fff000 cr4=0000000000002020
cr8=0000000000000000
STACK: @4051a6 40523e 400f7f 402059 40028f

We should hide and forbid VPID in L1 if it is disabled on L0. However, nested VPID
enable bit is set unconditionally during setup nested vmx exec controls though VPID
is not exposed through nested VMX capablity. This patch fixes it by don't set nested
VPID enable bit if it is disabled on L0.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 5c614b3583 (KVM: nVMX: nested VPID emulation)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23 19:02:14 +01:00
Wanpeng Li 24dccf83a1 KVM: x86: correct async page present tracepoint
After async pf setup successfully, there is a broadcast wakeup w/ special
token 0xffffffff which tells vCPU that it should wake up all processes
waiting for APFs though there is no real process waiting at the moment.

The async page present tracepoint print prematurely and fails to catch the
special token setup. This patch fixes it by moving the async page present
tracepoint after the special token setup.

Before patch:

qemu-system-x86-8499  [006] ...1  5973.473292: kvm_async_pf_ready: token 0x0 gva 0x0

After patch:

qemu-system-x86-8499  [006] ...1  5973.473292: kvm_async_pf_ready: token 0xffffffff gva 0x0

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23 19:02:07 +01:00
Jim Mattson fb6c819843 kvm: vmx: Flush TLB when the APIC-access address changes
Quoting from the Intel SDM, volume 3, section 28.3.3.4: Guidelines for
Use of the INVEPT Instruction:

If EPT was in use on a logical processor at one time with EPTP X, it
is recommended that software use the INVEPT instruction with the
"single-context" INVEPT type and with EPTP X in the INVEPT descriptor
before a VM entry on the same logical processor that enables EPT with
EPTP X and either (a) the "virtualize APIC accesses" VM-execution
control was changed from 0 to 1; or (b) the value of the APIC-access
address was changed.

In the nested case, the burden falls on L1, unless L0 enables EPT in
vmcs02 when L1 doesn't enable EPT in vmcs12.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-23 19:02:06 +01:00
Peter Xu c761159cf8 KVM: x86: use pic/ioapic destructor when destroy vm
We have specific destructors for pic/ioapic, we'd better use them when
destroying the VM as well.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-23 19:02:06 +01:00
Peter Xu 950712eb8e KVM: x86: check existance before destroy
Mostly used for split irqchip mode. In that case, these two things are
not inited at all, so no need to release.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-23 19:02:03 +01:00
Peter Zijlstra e6790e4b5d locking/atomic/x86: Use atomic_try_cmpxchg()
Better code generation:

      text           data  bss        name
  10665111        4530096  843776     defconfig-build/vmlinux.3
  10655703        4530096  843776     defconfig-build/vmlinux.4

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:54:41 +01:00
Peter Zijlstra a9ebf306f5 locking/atomic: Introduce atomic_try_cmpxchg()
Add a new cmpxchg interface:

  bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);

Where the boolean returns the result of the compare; and thus if the
exchange happened; and in case of failure, the new value of *ptr is
returned in *val.

This allows simplification/improvement of loops like:

	for (;;) {
		new = val $op $imm;
		old = cmpxchg(ptr, val, new);
		if (old == val)
			break;
		val = old;
	}

into:

	do {
	} while (!try_cmpxchg(ptr, &val, val $op $imm));

while also generating better code (GCC6 and onwards).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:54:40 +01:00
Ingo Molnar 1f9ca18404 Merge branch 'x86/process' into x86/mm, to create new base for further patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:28:19 +01:00
Andy Lutomirski b23adb7d3f x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals for the GDT fixup
Xen imposes special requirements on the GDT.  Rather than using a
global variable for the pgprot, just use an explicit special case
for Xen -- this makes it clearer what's going on.  It also debloats
64-bit kernels very slightly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e9ea96abbfd6a8c87753849171bb5987ecfeb523.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:08 +01:00
Andy Lutomirski 59c58ceb29 x86/gdt: Get rid of the get_*_gdt_*_vaddr() helpers
There's a single caller that is only there because it's passing a
pointer into a function (vmcs_writel()) that takes an unsigned long.
Let's just cast it in place rather than having a bunch of trivial
helpers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/46108fb35e1699252b1b6a85039303ff562c9836.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:08 +01:00
Andy Lutomirski 23b2a4ddeb x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up
The x86 smpboot trampoline expects initial_page_table to have the
GDT mapped.  If the GDT ends up in a virtually mapped per-cpu page,
then it won't be in the page tables at all until perc-pu areas are
set up.  The result will be a triple fault the first time that the
CPU attempts to access the GDT after LGDT loads the perc-pu GDT.

This appears to be an old bug, but somehow the GDT fixmap rework
is triggering it.  This seems to have something to do with the
memory layout.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/a553264a5972c6a86f9b5caac237470a0c74a720.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:08 +01:00
Andy Lutomirski 3fa1cabbc3 x86/efi/32: Fix EFI on systems where the per-cpu GDT is virtually mapped
__pa() on a per-cpu pointer is invalid.  This bug appears to go *waaay*
back, and I guess it's just never been triggered.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/5ba1d3ffca85e1a5b3ac99265ebe55df4cf0dbe4.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:07 +01:00
Andy Lutomirski aa4ea67552 x86/gdt: Fix setup_fixmap_gdt() to use the correct PA
__pa() cannot be used on percpu pointers because they may be
virtually mapped.  Use per_cpu_ptr_to_phys() instead.

This fixes a boot crash on a some 32-bit configurations.  I assume
this is related to which allocation strategy is chosen by the percpu
core.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 69218e4799 x86: ("Remap GDT tables in the fixmap section")
Link: http://lkml.kernel.org/r/22e0069c29fba31998f193201e359eebfdac4960.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:07 +01:00
Ingo Molnar 4ccb6aea4b Merge branch 'linus' into x86/asm, to refresh the topic tree with fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 07:35:15 +01:00
Peter Zijlstra 698eff6355 sched/clock, x86/perf: Fix "perf test tsc"
People reported that commit:

  5680d8094f ("sched/clock: Provide better clock continuity")

broke "perf test tsc".

That commit added another offset to the reported clock value; so
take that into account when computing the provided offset values.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 5680d8094f ("sched/clock: Provide better clock continuity")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 07:31:49 +01:00
Tony Luck 26a37ab319 x86/mce: Fix copy/paste error in exception table entries
Back in commit:

  92b0729c34 ("x86/mm, x86/mce: Add memcpy_mcsafe()")

... I made a copy/paste error setting up the exception table entries
and ended up with two for label .L_cache_w3 and none for .L_cache_w2.

This means that if we take a machine check on:

  .L_cache_w2: movq 2*8(%rsi), %r10

then we don't have an exception table entry for this instruction
and we can't recover.

Fix: s/3/2/

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 92b0729c34 ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Link: http://lkml.kernel.org/r/1490046030-25862-1-git-send-email-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-22 08:43:25 +01:00
Mike Travis ad4830051a x86/platform/uv: Fix calculation of Global Physical Address
The calculation of the global physical address (GPA) on UV4 is
incorrect.  The gnode_extra/upper global offset should only be
applied for fixed address space systems (UV1..3).

Tested-by: John Estabrook <john.estabrook@hpe.com>
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: John Estabrook <estabrook@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170321231646.667689538@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-22 07:41:10 +01:00
Lu Baolu 1b5aeebf3a x86/earlyprintk: Add support for earlyprintk via USB3 debug port
Add support for earlyprintk by writing debug messages to the
USB3 debug port. Users can use this type of early printk by
specifying the kernel parameter of "earlyprintk=xdbc". This
gives users a chance of providing debugging output.

The hardware for USB3 debug port requires DMA memory blocks.
This requires to delay setting up debugging hardware and
registering boot console until the memblocks are filled.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-usb@vger.kernel.org
Link: http://lkml.kernel.org/r/1490083293-3792-4-git-send-email-baolu.lu@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-21 12:30:16 +01:00
Lu Baolu aeb9dd1de9 usb/early: Add driver for xhci debug capability
XHCI debug capability (DbC) is an optional but standalone
functionality provided by an xHCI host controller. Software
learns this capability by walking through the extended
capability list of the host. XHCI specification describes
DbC in section 7.6.

This patch introduces the code to probe and initialize the
debug capability hardware during early boot. With hardware
initialized, the debug target (system on which this code is
running) will present a debug device through the debug port
(normally the first USB3 port). The debug device is fully
compliant with the USB framework and provides the equivalent
of a very high performance (USB3) full-duplex serial link
between the debug host and target. The DbC functionality is
independent of the xHCI host. There isn't any precondition
from the xHCI host side for the DbC to work.

One use for this feature is kernel debugging, for example
when your machine crashes very early before the regular
console code is initialized. Other uses include simpler,
lockless logging instead of a full-blown printk console
driver and klogd.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-usb@vger.kernel.org
Link: http://lkml.kernel.org/r/1490083293-3792-3-git-send-email-baolu.lu@linux.intel.com
[ Small fix to the Kconfig help text. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-21 12:30:05 +01:00
Lu Baolu dd759d93f4 x86/timers: Add simple udelay calibration
Add a simple udelay calibration in x86 architecture-specific
boot-time initializations. This will get a workable estimate
for loops_per_jiffy. Hence, udelay() could be used after this
initialization.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-usb@vger.kernel.org
Link: http://lkml.kernel.org/r/1490083293-3792-2-git-send-email-baolu.lu@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-21 12:28:45 +01:00
Kyle Huey d582799fe5 um/arch_prctl: Fix fallout from x86 arch_prctl() rework
The recent arch_prctl rework added a bracket instead of a comma. Fix it.

Fixes: 17a6e1b8e8 ("x86/arch_prctl/64: Rename do_arch_prctl() to do_arch_prctl_64()")
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: kbuild-all@01.org
Link: http://lkml.kernel.org/r/20170320230535.11281-1-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-21 10:08:29 +01:00
Thomas Garnier ef37bc3614 x86/headers: Simplify asm/fixmap.h inclusion into asm/pgtable*.h
Instead of including fixmap.h twice in pgtable_32.h and pgtable_64.h,
include it only once, in the common asm/pgtable.h header.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: linux-mm@kvack.org
Cc: richard.weiyang@gmail.com
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170321071725.GA15782@gmail.com
[ Generated this patch from two other patches and wrote changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-21 08:21:17 +01:00
Wanpeng Li 6d1b3ad2cd KVM: nVMX: don't reset kvm mmu twice
kvm mmu is reset once successfully loading CR3 as part of emulating vmentry
in nested_vmx_load_cr3(). We should not reset kvm mmu twice.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-20 16:25:06 +01:00
Dmitry Vyukov 3863dff0c3 kvm: fix usage of uninit spinlock in avic_vm_destroy()
If avic is not enabled, avic_vm_init() does nothing and returns early.
However, avic_vm_destroy() still tries to destroy what hasn't been created.
The only bad consequence of this now is that avic_vm_destroy() uses
svm_vm_data_hash_lock that hasn't been initialized (and is not meant
to be used at all if avic is not enabled).

Return early from avic_vm_destroy() if avic is not enabled.
It has nothing to destroy.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kvm@vger.kernel.org
Cc: syzkaller@googlegroups.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-20 16:25:05 +01:00
Radim Krčmář 6c6c5e0311 KVM: VMX: downgrade warning on unexpected exit code
We never needed the call trace and we better rate-limit if it can be
triggered by a guest.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-20 16:25:05 +01:00
Kyle Huey e9ea1e7f53 x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. Exposing this feature to userspace will allow a
ptracer to trap and emulate the CPUID instruction.

When supported, this feature is controlled by toggling bit 0 of
MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991

Implement a new pair of arch_prctls, available on both x86-32 and x86-64.

ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
    is enabled (and thus the CPUID instruction is not available) or 1 if
    CPUID faulting is not enabled.

ARCH_SET_CPUID: Set the CPUID state to the second argument. If
    cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
    be deactivated. Returns ENODEV if CPUID faulting is not supported on
    this system.

The state of the CPUID faulting flag is propagated across forks, but reset
upon exec.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:34 +01:00
Kyle Huey 90218ac77d x86/cpufeature: Detect CPUID faulting support
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. This will allow a ptracer to emulate the CPUID
instruction.

Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is
documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991

Detect support for this feature and expose it as X86_FEATURE_CPUID_FAULT.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-8-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:34 +01:00
Kyle Huey 79170fda31 x86/syscalls/32: Wire up arch_prctl on x86-32
Hook up arch_prctl to call do_arch_prctl() on x86-32, and in 32 bit compat
mode on x86-64. This allows to have arch_prctls that are not specific to 64
bits.

On UML, simply stub out this syscall.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-7-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:33 +01:00
Kyle Huey b0b9b01401 x86/arch_prctl: Add do_arch_prctl_common()
Add do_arch_prctl_common() to handle arch_prctls that are not specific to 64
bit mode. Call it from the syscall entry point, but not any of the other
callsites in the kernel, which all want one of the existing 64 bit only
arch_prctls.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-6-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:33 +01:00
Kyle Huey 17a6e1b8e8 x86/arch_prctl/64: Rename do_arch_prctl() to do_arch_prctl_64()
In order to introduce new arch_prctls that are not 64 bit only, rename the
existing 64 bit implementation to do_arch_prctl_64(). Also rename the
second argument of that function from 'addr' to 'arg2', because it will no
longer always be an address.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-5-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:32 +01:00
Kyle Huey ff3f097eef x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl()
Use the SYSCALL_DEFINE2 macro instead of manually defining it.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-4-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:32 +01:00
Kyle Huey dd93938a92 x86/arch_prctl: Rename 'code' argument to 'option'
The x86 specific arch_prctl() arbitrarily changed prctl's 'option' to
'code'. Before adding new options, rename it.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-3-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:32 +01:00
Kyle Huey ab6d946863 x86/msr: Rename MISC_FEATURE_ENABLES to MISC_FEATURES_ENABLES
This matches the only public Intel documentation of this MSR, in the
"Virtualization Technology FlexMigration Application Note"
(preserved at https://bugzilla.kernel.org/attachment.cgi?id=243991)

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-2-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:32 +01:00
Andy Lutomirski 5b781c7e31 x86/tls: Forcibly set the accessed bit in TLS segments
For mysterious historical reasons, struct user_desc doesn't indicate
whether segments are accessed.  set_thread_area() has always programmed
segments as non-accessed, so the first write will set the accessed bit.
This will fault if the GDT is read-only.

Fix it by making TLS segments start out accessed.

If this ends up breaking something, we could, in principle, leave TLS
segments non-accessed and fix them up when we get the page fault.  I'd be
surprised, though -- AFAIK all the nasty legacy segmented programs (DOSEMU,
Wine, things that run on DOSEMU and Wine, etc.) do their nasty segmented
things using the LDT and not the GDT.  I assume this is mainly because old
OSes (Linux and otherwise) didn't historically provide APIs to do nasty
things in the GDT.

Fixes: 45fc8757d1 ("x86: Make the GDT remapping read-only on 64-bit")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Garnier <thgarnie@google.com>
Link: http://lkml.kernel.org/r/62b7748542df0164af7e0a5231283b9b13858c45.1489900519.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-19 12:14:35 +01:00
Yazen Ghannam 5204bf1703 x86/mce: Init some CPU features early
When the MCA banks in __mcheck_cpu_init_generic() are polled for leftover
errors logged during boot or from the previous boot, its required to have
CPU features detected sufficiently so that the reading out and handling of
those early errors is done correctly.

If those features are not available, the decoding may miss some information
and get incomplete errors logged. For example, on SMCA systems the MCA_IPID
and MCA_SYND registers are not logged and MCA_ADDR is not masked
appropriately.

To cure that, do a subset of the basic feature detection early while the
rest happens in its usual place in __mcheck_cpu_init_vendor().

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1489599055-20756-1-git-send-email-Yazen.Ghannam@amd.com
[ Massage commit message and simplify. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-18 13:03:44 +01:00
Kirill A. Shutemov 2947ba054a x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
This patch provides all required callbacks required by the generic
get_user_pages_fast() code and switches x86 over - and removes
the platform specific implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dann Frazier <dann.frazier@canonical.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shutemov@linux.intel.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18 09:48:03 +01:00
Kirill A. Shutemov 9a804fecee mm/gup: Drop the arch_pte_access_permitted() MMU callback
The only arch that defines it to something meaningful is x86.
But x86 doesn't use the generic GUP_fast() implementation -- the
only place where the callback is called.

Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dann Frazier <dann.frazier@canonical.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170316152655.37789-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18 09:48:01 +01:00
Thomas Garnier f991376e44 x86/mm: Correct fixmap header usage on adaptable MODULES_END
This patch removes fixmap header usage on non-x86 code that was
introduced by the adaptable MODULE_END change.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317175034.4701-1-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18 09:48:00 +01:00
Linus Torvalds eab60d4e5b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "An assorted pile of fixes along with some hardware enablement:

   - a fix for a KASAN / branch profiling related boot failure

   - some more fallout of the PUD rework

   - a fix for the Always Running Timer which is not initialized when
     the TSC frequency is known at boot time (via MSR/CPUID)

   - a resource leak fix for the RDT filesystem

   - another unwinder corner case fixup

   - removal of the warning for duplicate NMI handlers because there are
     legitimate cases where more than one handler can be registered at
     the last level

   - make a function static - found by sparse

   - a set of updates for the Intel MID platform which got delayed due
     to merge ordering constraints. It's hardware enablement for a non
     mainstream platform, so there is no risk"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mpx: Make unnecessarily global function static
  x86/intel_rdt: Put group node in rdtgroup_kn_unlock
  x86/unwind: Fix last frame check for aligned function stacks
  mm, x86: Fix native_pud_clear build error
  x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
  x86/platform/intel-mid: Add power button support for Merrifield
  x86/platform/intel-mid: Use common power off sequence
  x86/platform: Remove warning message for duplicate NMI handlers
  x86/tsc: Fix ART for TSC_KNOWN_FREQ
  x86/platform/intel-mid: Correct MSI IRQ line for watchdog device
2017-03-17 14:05:03 -07:00
Linus Torvalds ae13373319 Merge branch 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 acpi fixes from Thomas Gleixner:
 "This update deals with the fallout of the recent work to make
  cpuid/node mappings persistent.

  It turned out that the boot time ACPI based mapping tripped over ACPI
  inconsistencies and caused regressions. It's partially reverted and
  the fragile part replaced by an implementation which makes the mapping
  persistent when a CPU goes online for the first time"

* 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  acpi/processor: Check for duplicate processor ids at hotplug time
  acpi/processor: Implement DEVICE operator for processor enumeration
  x86/acpi: Restore the order of CPU IDs
  Revert"x86/acpi: Enable MADT APIs to return disabled apicids"
  Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
2017-03-17 14:01:40 -07:00
Linus Torvalds a7fc726bb2 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A set of perf related fixes:

   - fix a CR4.PCE propagation issue caused by usage of mm instead of
     active_mm and therefore propagated the wrong value.

   - perf core fixes, which plug a use-after-free issue and make the
     event inheritance on fork more robust.

   - a tooling fix for symbol handling"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf symbols: Fix symbols__fixup_end heuristic for corner cases
  x86/perf: Clarify why x86_pmu_event_mapped() isn't racy
  x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
  perf/core: Better explain the inherit magic
  perf/core: Simplify perf_event_free_task()
  perf/core: Fix event inheritance on fork()
  perf/core: Fix use-after-free in perf_release()
2017-03-17 13:59:52 -07:00
Andy Lutomirski 4b07372a32 x86/perf: Clarify why x86_pmu_event_mapped() isn't racy
Naively, it looks racy, but ->mmap_sem saves it.  Add a comment and a
lockdep assertion.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/03a1e629063899168dfc4707f3bb6e581e21f5c6.1489694270.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-17 08:28:26 +01:00
Andy Lutomirski 5dc855d44c x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
If one thread mmaps a perf event while another thread in the same mm
is in some context where active_mm != mm (which can happen in the
scheduler, for example), refresh_pce() would write the wrong value
to CR4.PCE.  This broke some PAPI tests.

Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 7911d3f7af ("perf/x86: Only allow rdpmc if a perf_event is mapped")
Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-17 08:28:26 +01:00
Arnd Bergmann d0f33ac9ae mm, x86: fix native_pud_clear build error
We still get a build error in random configurations, after this has been
modified a few times:

  In file included from include/linux/mm.h:68:0,
                   from include/linux/suspend.h:8,
                   from arch/x86/kernel/asm-offsets.c:12:
  arch/x86/include/asm/pgtable.h:66:26: error: redefinition of 'native_pud_clear'
   #define pud_clear(pud)   native_pud_clear(pud)

My interpretation is that the build error comes from a typo in
__PAGETABLE_PUD_FOLDED, so fix that typo now, and remove the incorrect
#ifdef around the native_pud_clear definition.

Fixes: 3e761a42e1 ("mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()")
Fixes: a00cc7d9dd ("mm, x86: add support for PUD-sized transparent hugepages")
Link: http://lkml.kernel.org/r/20170314121330.182155-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Ackedy-by: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-16 16:56:18 -07:00
Alexander Shishkin ee368428aa perf/x86/intel/pt: Handle VMX better
Since commit:

  1c5ac21a0e ("perf/x86/intel/pt: Don't die on VMXON")

... PT events depend on re-scheduling to get enabled after a VMX session
has taken place. This is, in particular, a problem for CPU context events,
which don't normally get re-scheduled, unless there is a reason.

This patch changes the VMX handling so that PT event gets re-enabled
when VMX root mode exits.

Also, notify the user when there's a gap in PT data due to VMX root
mode by flagging AUX records as partial.

In combination with vmm_exclusive=0 parameter of the kvm_intel driver,
this will result in trace gaps only for the duration of the guest's
timeslices.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170220133352.17995-5-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16 09:51:11 +01:00