Commit Graph

30505 Commits

Author SHA1 Message Date
Linus Torvalds 899ba79553 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Speculation:

   - Make the microcode check more robust

   - Make the L1TF memory limit depend on the internal cache physical
     address space and not on the CPUID advertised physical address
     space, which might be significantly smaller. This avoids disabling
     L1TF on machines which utilize the full physical address space.

   - Fix the GDT mapping for EFI calls on 32bit PTI

   - Fix the MCE nospec implementation to prevent #GP

  Fixes and robustness:

   - Use the proper operand order for LSL in the VDSO

   - Prevent NMI uaccess race against CR3 switching

   - Add a lockdep check to verify that text_mutex is held in
     text_poke() functions

   - Repair the fallout of giving native_restore_fl() a prototype

   - Prevent kernel memory dumps based on usermode RIP

   - Wipe KASAN shadow stack before rewinding the stack to prevent false
     positives

   - Move the AMS GOTO enforcement to the actual build stage to allow
     user API header extraction without a compiler

   - Fix a section mismatch introduced by the on demand VDSO mapping
     change

  Miscellaneous:

   - Trivial typo, GCC quirk removal and CC_SET/OUT() cleanups"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pti: Fix section mismatch warning/error
  x86/vdso: Fix lsl operand order
  x86/mce: Fix set_mce_nospec() to avoid #GP fault
  x86/efi: Load fixmap GDT in efi_call_phys_epilog()
  x86/nmi: Fix NMI uaccess race against CR3 switching
  x86: Allow generating user-space headers without a compiler
  x86/dumpstack: Don't dump kernel memory based on usermode RIP
  x86/asm: Use CC_SET()/CC_OUT() in __gen_sigismember()
  x86/alternatives: Lockdep-enforce text_mutex in text_poke*()
  x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit()
  x86/irqflags: Mark native_restore_fl extern inline
  x86/build: Remove jump label quirk for GCC older than 4.5.2
  x86/Kconfig: Fix trivial typo
  x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
  x86/spectre: Add missing family 6 check to microcode check
2018-09-02 10:11:30 -07:00
Randy Dunlap ff924c5a1e x86/pti: Fix section mismatch warning/error
Fix the section mismatch warning in arch/x86/mm/pti.c:

WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte()
The function pti_clone_pgtable() references
the function __init pti_user_pagetable_walk_pte().
This is often because pti_clone_pgtable lacks a __init
annotation or the annotation of pti_user_pagetable_walk_pte is wrong.
FATAL: modpost: Section mismatches detected.

Fixes: 85900ea515 ("x86/pti: Map the vsyscall page if needed")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org
2018-09-02 11:24:41 +02:00
Samuel Neves e78e5a9145 x86/vdso: Fix lsl operand order
In the __getcpu function, lsl is using the wrong target and destination
registers. Luckily, the compiler tends to choose %eax for both variables,
so it has been working so far.

Fixes: a582c540ac ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
2018-09-01 23:01:56 +02:00
LuckTony c7486104a5 x86/mce: Fix set_mce_nospec() to avoid #GP fault
The trick with flipping bit 63 to avoid loading the address of the 1:1
mapping of the poisoned page while the 1:1 map is updated used to work when
unmapping the page. But it falls down horribly when attempting to directly
set the page as uncacheable.

The problem is that when the cache mode is changed to uncachable, the pages
needs to be flushed from the cache first. But the decoy address is
non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a
#GP fault.

Add code to change_page_attr_set_clr() to fix the address before calling
flush.

Fixes: 284ce4011b ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
2018-09-01 14:59:19 +02:00
Joerg Roedel eeb89e2bb1 x86/efi: Load fixmap GDT in efi_call_phys_epilog()
When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap
for the simple reason that this address is also mapped for user-space.

The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT
to call EFI runtime services and switch back to the kernel GDT when they
return. But the switch-back uses the writable GDT, not the fixmap GDT.

When that happened and and the CPU returns to user-space it switches to the
user %cr3 and tries to restore user segment registers. This fails because
the writable GDT is not mapped in the user page-table, and without a GDT
the fault handlers also can't be launched. The result is a triple fault and
reboot of the machine.

Fix that by restoring the GDT back to the fixmap GDT which is also mapped
in the user page-table.

Fixes: 7757d607c6 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: hpa@zytor.com
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
2018-08-31 17:45:54 +02:00
Linus Torvalds 4290d5b9ca xen: fixes for 4.19-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW4lM6AAKCRCAXGG7T9hj
 vs8AAQDysFccg97UdopW3B7yklIaRqkfEIAsxe65f191MXsH2AEAp5SKxZqRPqBP
 a9WHDj8ShB3BhZ/IxpdO9Y59U3Jo4wA=
 =Gt4c
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - minor cleanup avoiding a warning when building with new gcc

 - a patch to add a new sysfs node for Xen frontend/backend drivers to
   make it easier to obtain the state of a pv device

 - two fixes for 32-bit pv-guests to avoid intermediate L1TF vulnerable
   PTEs

* tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: remove redundant variable save_pud
  xen: export device state to sysfs
  x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear
  x86/xen: don't write ptes directly in 32-bit PV guests
2018-08-31 08:45:16 -07:00
Andy Lutomirski 4012e77a90 x86/nmi: Fix NMI uaccess race against CR3 switching
A NMI can hit in the middle of context switching or in the middle of
switch_mm_irqs_off().  In either case, CR3 might not match current->mm,
which could cause copy_from_user_nmi() and friends to read the wrong
memory.

Fix it by adding a new nmi_uaccess_okay() helper and checking it in
copy_from_user_nmi() and in __copy_from_user_nmi()'s callers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
2018-08-31 17:08:22 +02:00
Ben Hutchings 829fe4aa9a x86: Allow generating user-space headers without a compiler
When bootstrapping an architecture, it's usual to generate the kernel's
user-space headers (make headers_install) before building a compiler.  Move
the compiler check (for asm goto support) to the archprepare target so that
it is only done when building code for the target.

Fixes: e501ce957a ("x86: Force asm-goto")
Reported-by: Helmut Grohne <helmutg@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
2018-08-31 17:08:22 +02:00
Jann Horn 342db04ae7 x86/dumpstack: Don't dump kernel memory based on usermode RIP
show_opcodes() is used both for dumping kernel instructions and for dumping
user instructions. If userspace causes #PF by jumping to a kernel address,
show_opcodes() can be reached with regs->ip controlled by the user,
pointing to kernel code. Make sure that userspace can't trick us into
dumping kernel memory into dmesg.

Fixes: 7cccf0725c ("x86/dumpstack: Add a show_ip() function")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: security@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
2018-08-31 17:08:22 +02:00
Uros Bizjak 26e609eccd x86/asm: Use CC_SET()/CC_OUT() in __gen_sigismember()
Replace open-coded set instructions with CC_SET()/CC_OUT().

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180814165951.13538-1-ubizjak@gmail.com
2018-08-30 13:02:31 +02:00
Jiri Kosina 9222f60650 x86/alternatives: Lockdep-enforce text_mutex in text_poke*()
text_poke() and text_poke_bp() must be called with text_mutex held.

Put proper lockdep anotation in place instead of just mentioning the
requirement in a comment.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1808280853520.25787@cbobk.fhfr.pm
2018-08-30 13:02:30 +02:00
Jann Horn f12d11c5c1 x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit()
Reset the KASAN shadow state of the task stack before rewinding RSP.
Without this, a kernel oops will leave parts of the stack poisoned, and
code running under do_exit() can trip over such poisoned regions and cause
nonsensical false-positive KASAN reports about stack-out-of-bounds bugs.

This does not wipe the exception stacks; if an oops happens on an exception
stack, it might result in random KASAN false-positives from other tasks
afterwards. This is probably relatively uninteresting, since if the kernel
oopses on an exception stack, there are most likely bigger things to worry
about. It'd be more interesting if vmapped stacks and KASAN were
compatible, since then handle_stack_overflow() would oops from exception
stack context.

Fixes: 2deb4be280 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kasan-dev@googlegroups.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com
2018-08-30 11:37:09 +02:00
Nick Desaulniers 1f59a4581b x86/irqflags: Mark native_restore_fl extern inline
This should have been marked extern inline in order to pick up the out
of line definition in arch/x86/kernel/irqflags.S.

Fixes: 208cbb3255 ("x86/irqflags: Provide a declaration for native_save_fl")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180827214011.55428-1-ndesaulniers@google.com
2018-08-30 11:37:09 +02:00
Masahiro Yamada 36bf9da291 x86/build: Remove jump label quirk for GCC older than 4.5.2
Commit cafa0010cd ("Raise the minimum required gcc version to 4.6")
bumped the minimum GCC version to 4.6 for all architectures.

Remove the workaround code.

It was the only user of cc-if-fullversion.  Remove the macro as well.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: linux-kbuild@vger.kernel.org
Link: https://lkml.kernel.org/r/1535348714-25457-1-git-send-email-yamada.masahiro@socionext.com
2018-08-30 11:37:08 +02:00
Linus Torvalds b4df50de6a Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - Check for the right CPU feature bit in sm4-ce on arm64.

 - Fix scatterwalk WARN_ON in aes-gcm-ce on arm64.

 - Fix unaligned fault in aesni on x86.

 - Fix potential NULL pointer dereference on exit in chtls.

 - Fix DMA mapping direction for RSA in caam.

 - Fix error path return value for xts setkey in caam.

 - Fix address endianness when DMA unmapping in caam.

 - Fix sleep-in-atomic in vmx.

 - Fix command corruption when queue is full in cavium/nitrox.

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: cavium/nitrox - fix for command corruption in queue full case with backlog submissions.
  crypto: vmx - Fix sleep-in-atomic bugs
  crypto: arm64/aes-gcm-ce - fix scatterwalk API violation
  crypto: aesni - Use unaligned loads from gcm_context_data
  crypto: chtls - fix null dereference chtls_free_uld()
  crypto: arm64/sm4-ce - check for the right CPU feature bit
  crypto: caam - fix DMA mapping direction for RSA forms 2 & 3
  crypto: caam/qi - fix error path in xts setkey
  crypto: caam/jr - fix descriptor DMA unmapping
2018-08-29 13:38:39 -07:00
Colin Ian King 6d3c8ce012 x86/xen: remove redundant variable save_pud
Variable save_pud is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
variable 'save_pud' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-28 17:37:40 -04:00
Juergen Gross b2d7a075a1 x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear
Using only 32-bit writes for the pte will result in an intermediate
L1TF vulnerable PTE. When running as a Xen PV guest this will at once
switch the guest to shadow mode resulting in a loss of performance.

Use arch_atomic64_xchg() instead which will perform the requested
operation atomically with all 64 bits.

Some performance considerations according to:

https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf

The main number should be the latency, as there is no tight loop around
native_ptep_get_and_clear().

"lock cmpxchg8b" has a latency of 20 cycles, while "lock xchg" (with a
memory operand) isn't mentioned in that document. "lock xadd" (with xadd
having 3 cycles less latency than xchg) has a latency of 11, so we can
assume a latency of 14 for "lock xchg".

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-27 14:20:49 -04:00
Juergen Gross f7c90c2aa4 x86/xen: don't write ptes directly in 32-bit PV guests
In some cases 32-bit PAE PV guests still write PTEs directly instead of
using hypercalls. This is especially bad when clearing a PTE as this is
done via 32-bit writes which will produce intermediate L1TF attackable
PTEs.

Change the code to use hypercalls instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-27 09:31:26 -04:00
Nikolas Nyby e3a5dc0871 x86/Kconfig: Fix trivial typo
Fix a typo in the Kconfig help text: adverticed -> advertised.

Signed-off-by: Nikolas Nyby <nikolas@gnu.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Cc: tglx@linutronix.de
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20180825231054.23813-1-nikolas@gnu.org
2018-08-27 10:29:14 +02:00
Andi Kleen cc51e5428e x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
On Nehalem and newer core CPUs the CPU cache internally uses 44 bits
physical address space. The L1TF workaround is limited by this internal
cache address width, and needs to have one bit free there for the
mitigation to work.

Older client systems report only 36bit physical address space so the range
check decides that L1TF is not mitigated for a 36bit phys/32GB system with
some memory holes.

But since these actually have the larger internal cache width this warning
is bogus because it would only really be needed if the system had more than
43bits of memory.

Add a new internal x86_cache_bits field. Normally it is the same as the
physical bits field reported by CPUID, but for Nehalem and newerforce it to
be at least 44bits.

Change the L1TF memory size warning to use the new cache_bits field to
avoid bogus warnings and remove the bogus comment about memory size.

Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev <studio@anchev.net>
Reported-by: Christopher Snowhill <kode54@gmail.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Michael Hocko <mhocko@suse.com>
Cc: vbabka@suse.cz
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org
2018-08-27 10:29:14 +02:00
Andi Kleen 1ab534e85c x86/spectre: Add missing family 6 check to microcode check
The check for Spectre microcodes does not check for family 6, only the
model numbers.

Add a family 6 check to avoid ambiguity with other families.

Fixes: a5b2966364 ("x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-2-andi@firstfloor.org
2018-08-27 10:29:14 +02:00
Linus Torvalds d207ea8e74 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
 "Kernel:
   - Improve kallsyms coverage
   - Add x86 entry trampolines to kcore
   - Fix ARM SPE handling
   - Correct PPC event post processing

  Tools:
   - Make the build system more robust
   - Small fixes and enhancements all over the place
   - Update kernel ABI header copies
   - Preparatory work for converting libtraceevnt to a shared library
   - License cleanups"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
  tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
  tools arch x86: Update tools's copy of cpufeatures.h
  perf python: Fix pyrf_evlist__read_on_cpu() interface
  perf mmap: Store real cpu number in 'struct perf_mmap'
  perf tools: Remove ext from struct kmod_path
  perf tools: Add gzip_is_compressed function
  perf tools: Add lzma_is_compressed function
  perf tools: Add is_compressed callback to compressions array
  perf tools: Move the temp file processing into decompress_kmodule
  perf tools: Use compression id in decompress_kmodule()
  perf tools: Store compression id into struct dso
  perf tools: Add compression id into 'struct kmod_path'
  perf tools: Make is_supported_compression() static
  perf tools: Make decompress_to_file() function static
  perf tools: Get rid of dso__needs_decompress() call in __open_dso()
  perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble()
  perf tools: Get rid of dso__needs_decompress() call in read_object_code()
  tools lib traceevent: Change to SPDX License format
  perf llvm: Allow passing options to llc in addition to clang
  perf parser: Improve error message for PMU address filters
  ...
2018-08-26 11:25:21 -07:00
Linus Torvalds 2a8a2b7c49 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - Correct the L1TF fallout on 32bit and the off by one in the 'too much
   RAM for protection' calculation.

 - Add a helpful kernel message for the 'too much RAM' case

 - Unbreak the VDSO in case that the compiler desides to use indirect
   jumps/calls and emits retpolines which cannot be resolved because the
   kernel uses its own thunks, which does not work for the VDSO. Make it
   use the builtin thunks.

 - Re-export start_thread() which was unexported when the 32/64bit
   implementation was unified. start_thread() is required by modular
   binfmt handlers.

 - Trivial cleanups

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/l1tf: Suggest what to do on systems with too much RAM
  x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM
  x86/kvm/vmx: Remove duplicate l1d flush definitions
  x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit
  x86/process: Re-export start_thread()
  x86/mce: Add notifier_block forward declaration
  x86/vdso: Fix vDSO build if a retpoline is emitted
2018-08-26 10:13:21 -07:00
Linus Torvalds 2923b27e54 libnvdimm-for-4.19_dax-memory-failure
* memory_failure() gets confused by dev_pagemap backed mappings. The
   recovery code has specific enabling for several possible page states
   that needs new enabling to handle poison in dax mappings. Teach
   memory_failure() about ZONE_DEVICE pages.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAlt9ui8ACgkQYGjFFmlT
 OEpNRw//XGj9s7sezfJFeol4psJlRUd935yii/gmJRgi/yPf2VxxQG9qyM6SMBUc
 75jASfOL6FSsfxHz0kplyWzMDNdrTkNNAD+9rv80FmY7GqWgcas9DaJX7jZ994vI
 5SRO7pfvNZcXlo7IhqZippDw3yxkIU9Ufi0YQKaEUm7GFieptvCZ0p9x3VYfdvwM
 BExrxQe0X1XUF4xErp5P78+WUbKxP47DLcucRDig8Q7dmHELUdyNzo3E1SVoc7m+
 3CmvyTj6XuFQgOZw7ZKun1BJYfx/eD5ZlRJLZbx6wJHRtTXv/Uea8mZ8mJ31ykN9
 F7QVd0Pmlyxys8lcXfK+nvpL09QBE0/PhwWKjmZBoU8AdgP/ZvBXLDL/D6YuMTg6
 T4wwtPNJorfV4lVD06OliFkVI4qbKbmNsfRq43Ns7PCaLueu4U/eMaSwSH99UMaZ
 MGbO140XW2RZsHiU9yTRUmZq73AplePEjxtzR8oHmnjo45nPDPy8mucWPlkT9kXA
 oUFMhgiviK7dOo19H4eaPJGqLmHM93+x5tpYxGqTr0dUOXUadKWxMsTnkID+8Yi7
 /kzQWCFvySz3VhiEHGuWkW08GZT6aCcpkREDomnRh4MEnETlZI8bblcuXYOCLs6c
 nNf1SIMtLdlsl7U1fEX89PNeQQ2y237vEDhFQZftaalPeu/JJV0=
 =Ftop
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm memory-failure update from Dave Jiang:
 "As it stands, memory_failure() gets thoroughly confused by dev_pagemap
  backed mappings. The recovery code has specific enabling for several
  possible page states and needs new enabling to handle poison in dax
  mappings.

  In order to support reliable reverse mapping of user space addresses:

   1/ Add new locking in the memory_failure() rmap path to prevent races
      that would typically be handled by the page lock.

   2/ Since dev_pagemap pages are hidden from the page allocator and the
      "compound page" accounting machinery, add a mechanism to determine
      the size of the mapping that encompasses a given poisoned pfn.

   3/ Given pmem errors can be repaired, change the speculatively
      accessed poison protection, mce_unmap_kpfn(), to be reversible and
      otherwise allow ongoing access from the kernel.

  A side effect of this enabling is that MADV_HWPOISON becomes usable
  for dax mappings, however the primary motivation is to allow the
  system to survive userspace consumption of hardware-poison via dax.
  Specifically the current behavior is:

     mce: Uncorrected hardware memory error in user-access at af34214200
     {1}[Hardware Error]: It has been corrected by h/w and requires no further action
     mce: [Hardware Error]: Machine check events logged
     {1}[Hardware Error]: event severity: corrected
     Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
     [..]
     Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
     mce: Memory error not recovered
     <reboot>

  ...and with these changes:

     Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000
     Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption
     Memory failure: 0x20cb00: recovery action for dax page: Recovered

  Given all the cross dependencies I propose taking this through
  nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax
  folks"

* tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, pmem: Restore page attributes when clearing errors
  x86/memory_failure: Introduce {set, clear}_mce_nospec()
  x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses
  mm, memory_failure: Teach memory_failure() about dev_pagemap pages
  filesystem-dax: Introduce dax_lock_mapping_entry()
  mm, memory_failure: Collect mapping size in collect_procs()
  mm, madvise_inject_error: Let memory_failure() optionally take a page reference
  mm, dev_pagemap: Do not clear ->mapping on final put
  mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages
  filesystem-dax: Set page->index
  device-dax: Set page->index
  device-dax: Enable page_mapping()
  device-dax: Convert to vmf_insert_mixed and vm_fault_t
2018-08-25 18:43:59 -07:00
Linus Torvalds 1bc276775d Kbuild updates for v4.19 (2nd)
- add build_{menu,n,g,x}config targets for compile-testing Kconfig
 
  - fix and improve recursive dependency detection in Kconfig
 
  - fix parallel building of menuconfig/nconfig
 
  - fix syntax error in clang-version.sh
 
  - suppress distracting log from syncconfig
 
  - remove obsolete "rpm" target
 
  - remove VMLINUX_SYMBOL(_STR) macro entirely
 
  - fix microblaze build with CONFIG_DYNAMIC_FTRACE
 
  - move compiler test for dead code/data elimination to Kconfig
 
  - rename well-known LDFLAGS variable to KBUILD_LDFLAGS
 
  - misc fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbgYhCAAoJED2LAQed4NsGErAP/jt7gt76+N0PZmADBZqyVR/H
 4k286g3OiT7DIcdvwqE5BRvu+zNOamDujnnXw63/jwu2RjrkLX/JnhzTbC0IZleZ
 KeO4bU4ZH0WFa0Ny9pp0LAnzbXGMnQjDXygcUd5BFoEd5JSLKW2PISEEjRh6b5B7
 swJRdgySFaMrUBRNf13FwH5EvX/D0xZQe/wFhFCOv6L4gJZFMmpGUIepgTjTUmxZ
 wcNN6xxXg+ulLHVcPdPQ9EYssNHN5xNys02+IdIrhhXuNHji/TFm4dGYuU+dDGeE
 Eu4O6Qs7pg0PFGrZ5gLxXDJEp75W+uaTNOqV+jcjq8MRxJuWxyy2biUeelKRT/KH
 0iv4ZQJVOMOhl8fZgLtQaXHyQ++5uwd6kvPPf+XFdkogGAIXK0wKWLoALFEOXwb6
 z1BBnFx09LrKPGt0ZlKX624OEczedv/UAFiSh3Ic2S3PFEpq4oHrEGhTnyKRobPv
 OEcF3RqKjmAdK7PLy4kVpTLhkutkWWhw6Giy9qXUkXYJWonJR7NTQ1mIan2LoGZC
 sGi+qKae/8xgO2Nerx59tZpkiHYTMfYeAo8frzWurOxm3YzEfaxNNGPl+IMW7VKz
 cNPzQZ5tMUy4i4PAhk/gIWibnUTPfjDbWsZSMtIbO0GFcao56EvllwD8/awuy7lO
 QkaAeZHFcF+qgU3muaYK
 =Vsb2
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - add build_{menu,n,g,x}config targets for compile-testing Kconfig

 - fix and improve recursive dependency detection in Kconfig

 - fix parallel building of menuconfig/nconfig

 - fix syntax error in clang-version.sh

 - suppress distracting log from syncconfig

 - remove obsolete "rpm" target

 - remove VMLINUX_SYMBOL(_STR) macro entirely

 - fix microblaze build with CONFIG_DYNAMIC_FTRACE

 - move compiler test for dead code/data elimination to Kconfig

 - rename well-known LDFLAGS variable to KBUILD_LDFLAGS

 - misc fixes and cleanups

* tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: rename LDFLAGS to KBUILD_LDFLAGS
  kbuild: pass LDFLAGS to recordmcount.pl
  kbuild: test dead code/data elimination support in Kconfig
  initramfs: move gen_initramfs_list.sh from scripts/ to usr/
  vmlinux.lds.h: remove stale <linux/export.h> include
  export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR()
  Coccinelle: remove pci_alloc_consistent semantic to detect in zalloc-simple.cocci
  kbuild: make sorting initramfs contents independent of locale
  kbuild: remove "rpm" target, which is alias of "rpm-pkg"
  kbuild: Fix LOADLIBES rename in Documentation/kbuild/makefiles.txt
  kconfig: suppress "configuration written to .config" for syncconfig
  kconfig: fix "Can't open ..." in parallel build
  kbuild: Add a space after `!` to prevent parsing as file pattern
  scripts: modpost: check memory allocation results
  kconfig: improve the recursive dependency report
  kconfig: report recursive dependency involving 'imply'
  kconfig: error out when seeing recursive dependency
  kconfig: add build-only configurator targets
  scripts/dtc: consolidate include path options in Makefile
2018-08-25 13:40:38 -07:00
Dave Watson e5b954e8d1 crypto: aesni - Use unaligned loads from gcm_context_data
A regression was reported bisecting to 1476db2d12
"Move HashKey computation from stack to gcm_context".  That diff
moved HashKey computation from the stack, which was explicitly aligned
in the asm, to a struct provided from the C code, depending on
AESNI_ALIGN_ATTR for alignment.   It appears some compilers may not
align this struct correctly, resulting in a crash on the movdqa
instruction when attempting to encrypt or decrypt data.

Fix by using unaligned loads for the HashKeys.  On modern
hardware there is no perf difference between the unaligned and
aligned loads.  All other accesses to gcm_context_data already use
unaligned loads.

Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Fixes: 1476db2d12 ("Move HashKey computation from stack to gcm_context")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-25 19:50:42 +08:00
Linus Torvalds 18b8bfdfba IOMMU Update for Linux v4.19
Including:
 
 	- PASID table handling updates for the Intel VT-d driver. It
 	  implements a global PASID space now so that applications
 	  usings multiple devices will just have one PASID.
 
 	- A new config option to make iommu passthroug mode the default.
 
 	- New sysfs attribute for iommu groups to export the type of the
 	  default domain.
 
 	- A debugfs interface (for debug only) usable by IOMMU drivers
 	  to export internals to user-space.
 
 	- R-Car Gen3 SoCs support for the ipmmu-vmsa driver
 
 	- The ARM-SMMU now aborts transactions from unknown devices and
 	  devices not attached to any domain.
 
 	- Various cleanups and smaller fixes all over the place.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJbf/9wAAoJECvwRC2XARrjcuYP/3dIsOFN7Xb4sTOB5wxk4wmD
 2Rm5o/18cFekEy4M8fwIBCYkzH/McohgKbOFcH6XiCxIwJ5RdXzITLAwmp4PbvIO
 KtwppXSp+MQtboip/bp6NDNBhABErgUtgdXawwENCCrFivXDsB8W4wnXESAOkLv9
 4fLXrUgDFCAquLZpLqQobXHhajtGAkSekaasphlhejXFulFyF1YcEUcliU7eXZ0R
 rZjL4Zqcyyi5kv6d3WhL+tvmmhr7wfMsMPaW18eRf9tXvMpWRM2GOAj65coI2AWs
 1T1kW/jvvrxnewOsmo1nYlw7R07uiRkUfHmJ9tY65xW4120HJFhdFLPUQZXfrX/b
 wcGbheYIh6cwAaZBtPJ35bPeW6pREkDOShohbzt45T62Q837cBkr3zyHhNsoOXHS
 13YVtTd2vtPa4iLdu2qmEOC1OuhQnMvqHqX0iN8U74QbDxEYYvMfAdx0JL3hmPp/
 uynY3QmXIKCeZg+vH2qcWHm07nfaAr5y8WSPA0crnqeznD5zJ4kvJf5dFGmDyTKr
 pyTkhidkifm6ZejrJsDZveoZdLpHrOatrqKaoLFh2crMUG3d807NYqQ3JmA3NDjg
 zPbYyU4joFGNVjd3XkSnRTGxR6YvLIwNbkQ3b/K/B5AqWJ6VrTbbTCOa4GSms6rF
 Qm8wRrmYaycKxkcMqtls
 =TeYQ
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

 - PASID table handling updates for the Intel VT-d driver. It implements
   a global PASID space now so that applications usings multiple devices
   will just have one PASID.

 - A new config option to make iommu passthroug mode the default.

 - New sysfs attribute for iommu groups to export the type of the
   default domain.

 - A debugfs interface (for debug only) usable by IOMMU drivers to
   export internals to user-space.

 - R-Car Gen3 SoCs support for the ipmmu-vmsa driver

 - The ARM-SMMU now aborts transactions from unknown devices and devices
   not attached to any domain.

 - Various cleanups and smaller fixes all over the place.

* tag 'iommu-updates-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (42 commits)
  iommu/omap: Fix cache flushes on L2 table entries
  iommu: Remove the ->map_sg indirection
  iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel
  iommu/arm-smmu-v3: Prevent any devices access to memory without registration
  iommu/ipmmu-vmsa: Don't register as BUS IOMMU if machine doesn't have IPMMU-VMSA
  iommu/ipmmu-vmsa: Clarify supported platforms
  iommu/ipmmu-vmsa: Fix allocation in atomic context
  iommu: Add config option to set passthrough as default
  iommu: Add sysfs attribyte for domain type
  iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register
  iommu/arm-smmu: Error out only if not enough context interrupts
  iommu/io-pgtable-arm-v7s: Abort allocation when table address overflows the PTE
  iommu/io-pgtable-arm: Fix pgtable allocation in selftest
  iommu/vt-d: Remove the obsolete per iommu pasid tables
  iommu/vt-d: Apply per pci device pasid table in SVA
  iommu/vt-d: Allocate and free pasid table
  iommu/vt-d: Per PCI device pasid table interfaces
  iommu/vt-d: Add for_each_device_domain() helper
  iommu/vt-d: Move device_domain_info to header
  iommu/vt-d: Apply global PASID in SVA
  ...
2018-08-24 13:10:38 -07:00
Vlastimil Babka 6a012288d6 x86/speculation/l1tf: Suggest what to do on systems with too much RAM
Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective.

Make the warning more helpful by suggesting the proper mem=X kernel boot
parameter to make it effective and a link to the L1TF document to help
decide if the mitigation is worth the unusable RAM.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz
2018-08-24 15:55:17 +02:00
Vlastimil Babka b0a182f875 x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM
Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective. In
fact it's a CPU with 36bits phys limit (64GB) and 32GB memory, but due to
holes in the e820 map, the main region is almost 500MB over the 32GB limit:

[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000081effffff] usable

Suggestions to use 'mem=32G' to enable the L1TF mitigation while losing the
500MB revealed, that there's an off-by-one error in the check in
l1tf_select_mitigation().

l1tf_pfn_limit() returns the last usable pfn (inclusive) and the range
check in the mitigation path does not take this into account.

Instead of amending the range check, make l1tf_pfn_limit() return the first
PFN which is over the limit which is less error prone. Adjust the other
users accordingly.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev <studio@anchev.net>
Reported-by: Christopher Snowhill <kode54@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180823134418.17008-1-vbabka@suse.cz
2018-08-24 09:51:14 +02:00
Masahiro Yamada d503ac531a kbuild: rename LDFLAGS to KBUILD_LDFLAGS
Commit a0f97e06a4 ("kbuild: enable 'make CFLAGS=...' to add
additional options to CC") renamed CFLAGS to KBUILD_CFLAGS.

Commit 222d394d30 ("kbuild: enable 'make AFLAGS=...' to add
additional options to AS") renamed AFLAGS to KBUILD_AFLAGS.

Commit 06c5040cdb ("kbuild: enable 'make CPPFLAGS=...' to add
additional options to CPP") renamed CPPFLAGS to KBUILD_CPPFLAGS.

For some reason, LDFLAGS was not renamed.

Using a well-known variable like LDFLAGS may result in accidental
override of the variable.

Kbuild generally uses KBUILD_ prefixed variables for the internally
appended options, so here is one more conversion to sanitize the
naming convention.

I did not touch Makefiles under tools/ since the tools build system
is a different world.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
2018-08-24 08:22:08 +09:00
Linus Torvalds 706a1ea65e Merge branch 'tlb-fixes'
Merge fixes for missing TLB shootdowns.

This fixes a couple of cases that involved us possibly freeing page
table structures before the required TLB shootdown had been done.

There are a few cleanup patches to make the code easier to follow, and
to avoid some of the more problematic cases entirely when not necessary.

To make this easier for backports, it undoes the recent lazy TLB
patches, because the cleanups and fixes are more important, and Rik is
ok with re-doing them later when things have calmed down.

The missing TLB flush was only delayed, and the wrong ordering only
happened under memory pressure (and in theory under a couple of other
fairly theoretical situations), so this may have been all very unlikely
to have hit people in practice.

But getting the TLB shootdown wrong is _so_ hard to debug and see that I
consider this a crticial fix.

Many thanks to Jann Horn for having debugged this.

* tlb-fixes:
  x86/mm: Only use tlb_remove_table() for paravirt
  mm: mmu_notifier fix for tlb_end_vma
  mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
  mm/tlb: Remove tlb_remove_table() non-concurrent condition
  mm: move tlb_table_flush to tlb_flush_mmu_free
  x86/mm/tlb: Revert the recent lazy TLB patches
2018-08-23 14:55:01 -07:00
Linus Torvalds d40acad1f1 xen: fixes and cleanups for 4.19-rc1, second round
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW36rRgAKCRCAXGG7T9hj
 vkrcAQC8F+ljGO5PtYUkKcMy17vqvcq/BdetJuUVfk+G1WmLxQEAiaNiqqJGsOyJ
 Msa0HHDT31uBYGg/iq7yAWk23tcTZwE=
 =Px4D
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.19b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes and cleanups from Juergen Gross:
 "Some cleanups, some minor fixes and a fix for a bug introduced in this
  merge window hitting 32-bit PV guests"

* tag 'for-linus-4.19b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: enable early use of set_fixmap in 32-bit Xen PV guest
  xen: remove unused hypercall functions
  x86/xen: remove unused function xen_auto_xlated_memory_setup()
  xen/ACPI: don't upload Px/Cx data for disabled processors
  x86/Xen: further refine add_preferred_console() invocations
  xen/mcelog: eliminate redundant setting of interface version
  x86/Xen: mark xen_setup_gdt() __init
2018-08-23 14:52:23 -07:00
Peter Zijlstra 48a8b97cfd x86/mm: Only use tlb_remove_table() for paravirt
If we don't use paravirt; don't play unnecessary and complicated games
to free page-tables.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23 11:56:31 -07:00
Peter Zijlstra d86564a2f0 mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
Jann reported that x86 was missing required TLB invalidates when he
hit the !*batch slow path in tlb_remove_table().

This is indeed the case; RCU_TABLE_FREE does not provide TLB (cache)
invalidates, the PowerPC-hash where this code originated and the
Sparc-hash where this was subsequently used did not need that. ARM
which later used this put an explicit TLB invalidate in their
__p*_free_tlb() functions, and PowerPC-radix followed that example.

But when we hooked up x86 we failed to consider this. Fix this by
(optionally) hooking tlb_remove_table() into the TLB invalidate code.

NOTE: s390 was also needing something like this and might now
      be able to use the generic code again.

[ Modified to be on top of Nick's cleanups, which simplified this patch
  now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ]

Fixes: 9e52fc2b50 ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23 11:55:58 -07:00
Peter Zijlstra 52a288c736 x86/mm/tlb: Revert the recent lazy TLB patches
Revert commits:

  95b0e6357d x86/mm/tlb: Always use lazy TLB mode
  64482aafe5 x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
  ac03158969 x86/mm/tlb: Make lazy TLB mode lazier
  61d0beb579 x86/mm/tlb: Restructure switch_mm_irqs_off()
  2ff6ddf19c x86/mm/tlb: Leave lazy TLB mode at page table free time

In order to simplify the TLB invalidate fixes for x86 and unify the
parts that need backporting.  We'll try again later.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 18:22:04 -07:00
Linus Torvalds b372115311 ARM: Support for Group0 interrupts in guests, Cache management
optimizations for ARMv8.4 systems, Userspace interface for RAS, Fault
 path optimization, Emulated physical timer fixes, Random cleanups
 
 x86: fixes for L1TF, a new test case, non-support for SGX (inject the
 right exception in the guest), a lockdep false positive
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJbfXfZAAoJEL/70l94x66DL2QH/RnQZW4OaqVdE3pNvRvaNJGQ
 41yk9aErbqPcK25aIKnhs9e3S+e32BhArA1YBwdHXwwuanANYv5W+o3HNTL0UFj7
 UG6APKm5DR6kJeUZ3vCfyeZ/ZKxDW0uqf5DXQyHUiAhwLGw2wWYJ9Ttv0m0Q4Fxl
 x9HEnK/s+komG93QT+2hIXtZdPiB026yBBqDDPyYiWrweyBagYUHz65p6qaPiOEY
 HqOyLYKsgrqCv9U0NLTD9U54IWGFIaxMGgjyRdZTMCIQeGj6dAH7vyfURGOeDHvw
 C0OZeEKRbMsHLwzXRBDEZp279pYgS7zafe/hMkr/znaac+j6xNwxpWwqg5Sm0UE=
 =5yTH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second set of KVM updates from Paolo Bonzini:
 "ARM:
   - Support for Group0 interrupts in guests
   - Cache management optimizations for ARMv8.4 systems
   - Userspace interface for RAS
   - Fault path optimization
   - Emulated physical timer fixes
   - Random cleanups

  x86:
   - fixes for L1TF
   - a new test case
   - non-support for SGX (inject the right exception in the guest)
   - fix lockdep false positive"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits)
  KVM: VMX: fixes for vmentry_l1d_flush module parameter
  kvm: selftest: add dirty logging test
  kvm: selftest: pass in extra memory when create vm
  kvm: selftest: include the tools headers
  kvm: selftest: unify the guest port macros
  tools: introduce test_and_clear_bit
  KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
  KVM: vmx: Inject #UD for SGX ENCLS instruction in guest
  KVM: vmx: Add defines for SGX ENCLS exiting
  x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush()
  x86: kvm: avoid unused variable warning
  KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESR
  KVM: arm/arm64: Skip updating PTE entry if no change
  KVM: arm/arm64: Skip updating PMD entry if no change
  KVM: arm: Use true and false for boolean values
  KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled
  KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h
  KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses
  KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses
  KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs
  ...
2018-08-22 13:52:44 -07:00
Linus Torvalds cd9b44f907 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - the rest of MM

 - procfs updates

 - various misc things

 - more y2038 fixes

 - get_maintainer updates

 - lib/ updates

 - checkpatch updates

 - various epoll updates

 - autofs updates

 - hfsplus

 - some reiserfs work

 - fatfs updates

 - signal.c cleanups

 - ipc/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits)
  ipc/util.c: update return value of ipc_getref from int to bool
  ipc/util.c: further variable name cleanups
  ipc: simplify ipc initialization
  ipc: get rid of ids->tables_initialized hack
  lib/rhashtable: guarantee initial hashtable allocation
  lib/rhashtable: simplify bucket_table_alloc()
  ipc: drop ipc_lock()
  ipc/util.c: correct comment in ipc_obtain_object_check
  ipc: rename ipcctl_pre_down_nolock()
  ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
  ipc: reorganize initialization of kern_ipc_perm.seq
  ipc: compute kern_ipc_perm.id under the ipc lock
  init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
  fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
  adfs: use timespec64 for time conversion
  kernel/sysctl.c: fix typos in comments
  drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
  fork: don't copy inconsistent signal handler state to child
  signal: make get_signal() return bool
  signal: make sigkill_pending() return bool
  ...
2018-08-22 12:34:08 -07:00
Ard Biesheuvel 7290d58095 module: use relative references for __ksymtab entries
An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab entries,
each consisting of two 64-bit fields containing absolute references, to
the symbol itself and to a char array containing its name, respectively.

When we build the same configuration with KASLR enabled, we end up with an
additional ~192 KB of relocations in the .init section, i.e., one 24 byte
entry for each absolute reference, which all need to be processed at boot
time.

Given how the struct kernel_symbol that describes each entry is completely
local to module.c (except for the references emitted by EXPORT_SYMBOL()
itself), we can easily modify it to contain two 32-bit relative references
instead.  This reduces the size of the __ksymtab section by 50% for all
64-bit architectures, and gets rid of the runtime relocations entirely for
architectures implementing KASLR, either via standard PIE linking (arm64)
or using custom host tools (x86).

Note that the binary search involving __ksymtab contents relies on each
section being sorted by symbol name.  This is implemented based on the
input section names, not the names in the ksymtab entries, so this patch
does not interfere with that.

Given that the use of place-relative relocations requires support both in
the toolchain and in the module loader, we cannot enable this feature for
all architectures.  So make it dependent on whether
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.

Link: http://lkml.kernel.org/r/20180704083651.24360-4-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:47 -07:00
Ard Biesheuvel f922c4abdf module: allow symbol exports to be disabled
To allow existing C code to be incorporated into the decompressor or the
UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
declarations into nops, and #define it in places where such exports are
undesirable.  Note that this gets rid of a rather dodgy redefine of
linux/export.h's header guard.

Link: http://lkml.kernel.org/r/20180704083651.24360-3-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:47 -07:00
Ard Biesheuvel 271ca78877 arch: enable relative relocations for arm64, power and x86
Patch series "add support for relative references in special sections", v10.

This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references.  This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata for
these sections in relocatable kernels (e.g., for KASLR) that needs to be
fixed up at boot time.  On arm64, this reduces the vmlinux footprint of
such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
4 byte relative reference)

Patch #3 was sent out before as a single patch.  This series supersedes
the previous submission.  This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which
architectures it should be blacklisted.

Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.

Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
ksymtab/kcrctab sections in decompressor and EFI stub objects when
rebuilding existing C files to run in a different context.

Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections with order
~1000 entries on an arm64 defconfig kernel with tracing enabled.  This
means we save about 28 KB of vmlinux space for each of these patches.

[From the v7 series blurb, which included the jump_label patches as well]:

  For the arm64 kernel, all patches combined reduce the memory footprint
  of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
  KASLR enabled), of which ~1 MB is the size reduction of the RELA section
  in .init, and the remaining 300 KB is reduction of .text/.data.

This patch (of 6):

Before updating certain subsystems to use place relative 32-bit
relocations in special sections, to save space and reduce the number of
absolute relocations that need to be processed at runtime by relocatable
kernels, introduce the Kconfig symbol and define it for some architectures
that should be able to support and benefit from it.

Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: James Morris <jmorris@namei.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Cc: James Morris <james.morris@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:47 -07:00
Michal Hocko 93065ac753 mm, oom: distinguish blockable mode for mmu notifiers
There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep.  That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.

We can do much better though.  Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.  Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range.  Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.

This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false.  This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.

I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that.  The first
part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode.  A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.

The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom.  This can be done e.g.  after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small.  Then we are looking for a proper process tear down.

[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
Reported-by: David Rientjes <rientjes@google.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:44 -07:00
Paolo Bonzini 0027ff2a75 KVM: VMX: fixes for vmentry_l1d_flush module parameter
Two bug fixes:

1) missing entries in the l1d_param array; this can cause a host crash
if an access attempts to reach the missing entry. Future-proof the get
function against any overflows as well.  However, the two entries
VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
not be accepted by the parse function, so disable them there.

2) invalid values must be rejected even if the CPU does not have the
bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)

... and a small refactoring, since the .cmd field is redundant with
the index in the array.

Reported-by: Bandan Das <bsd@redhat.com>
Cc: stable@vger.kernel.org
Fixes: a7b9020b06
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22 16:48:39 +02:00
Thomas Gleixner 024d83cadc KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
Mikhail reported the following lockdep splat:

WARNING: possible irq lock inversion dependency detected
CPU 0/KVM/10284 just changed the state of lock:
  000000000d538a88 (&st->lock){+...}, at:
  speculative_store_bypass_update+0x10b/0x170

but this lock was taken by another, HARDIRQ-safe lock
in the past:

(&(&sighand->siglock)->rlock){-.-.}

   and interrupts could create inverse lock ordering between them.

Possible interrupt unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
   lock(&st->lock);
                           local_irq_disable();
                           lock(&(&sighand->siglock)->rlock);
                           lock(&st->lock);
    <Interrupt>
     lock(&(&sighand->siglock)->rlock);
     *** DEADLOCK ***

The code path which connects those locks is:

   speculative_store_bypass_update()
   ssb_prctl_set()
   do_seccomp()
   do_syscall_64()

In svm_vcpu_run() speculative_store_bypass_update() is called with
interupts enabled via x86_virt_spec_ctrl_set_guest/host().

This is actually a false positive, because GIF=0 so interrupts are
disabled even if IF=1; however, we can easily move the invocations of
x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to
cure it, and it's a good idea to keep the GIF=0/IF=1 area as small
and self-contained as possible.

Fixes: 1f50ddb4f4 ("x86/speculation: Handle HT correctly on AMD")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22 16:48:36 +02:00
Sean Christopherson 0b665d3040 KVM: vmx: Inject #UD for SGX ENCLS instruction in guest
Virtualization of Intel SGX depends on Enclave Page Cache (EPC)
management that is not yet available in the kernel, i.e. KVM support
for exposing SGX to a guest cannot be added until basic support
for SGX is upstreamed, which is a WIP[1].

Until SGX is properly supported in KVM, ensure a guest sees expected
behavior for ENCLS, i.e. all ENCLS #UD.  Because SGX does not have a
true software enable bit, e.g. there is no CR4.SGXE bit, the ENCLS
instruction can be executed[1] by the guest if SGX is supported by the
system.  Intercept all ENCLS leafs (via the ENCLS- exiting control and
field) and unconditionally inject #UD.

[1] https://www.spinics.net/lists/kvm/msg171333.html or
    https://lkml.org/lkml/2018/7/3/879

[2] A guest can execute ENCLS in the sense that ENCLS will not take
    an immediate #UD, but no ENCLS will ever succeed in a guest
    without explicit support from KVM (map EPC memory into the guest),
    unless KVM has a *very* egregious bug, e.g. accidentally mapped
    EPC memory into the guest SPTEs.  In other words this patch is
    needed only to prevent the guest from seeing inconsistent behavior,
    e.g. #GP (SGX not enabled in Feature Control MSR) or #PF (leaf
    operand(s) does not point at EPC memory) instead of #UD on ENCLS.
    Intercepting ENCLS is not required to prevent the guest from truly
    utilizing SGX.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20180814163334.25724-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22 16:48:35 +02:00
Sean Christopherson 802ec46167 KVM: vmx: Add defines for SGX ENCLS exiting
Hardware support for basic SGX virtualization adds a new execution
control (ENCLS_EXITING), VMCS field (ENCLS_EXITING_BITMAP) and exit
reason (ENCLS), that enables a VMM to intercept specific ENCLS leaf
functions, e.g. to inject faults when the VMM isn't exposing SGX to
a VM.  When ENCLS_EXITING is enabled, the VMM can set/clear bits in
the bitmap to intercept/allow ENCLS leaf functions in non-root, e.g.
setting bit 2 in the ENCLS_EXITING_BITMAP will cause ENCLS[EINIT]
to VMExit(ENCLS).

Note: EXIT_REASON_ENCLS was previously added by commit 1f51999270
("KVM: VMX: add missing exit reasons").

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20180814163334.25724-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22 16:48:35 +02:00
Yi Wang d806afa495 x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush()
Substitute spaces with tab. No functional changes.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Message-Id: <1534398159-48509-1-git-send-email-wang.yi59@zte.com.cn>
Cc: stable@vger.kernel.org # L1TF
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22 16:48:34 +02:00
Arnd Bergmann 7288bde1f9 x86: kvm: avoid unused variable warning
Removing one of the two accesses of the maxphyaddr variable led to
a harmless warning:

arch/x86/kvm/x86.c: In function 'kvm_set_mmio_spte_mask':
arch/x86/kvm/x86.c:6563:6: error: unused variable 'maxphyaddr' [-Werror=unused-variable]

Removing the #ifdef seems to be the nicest workaround, as it
makes the code look cleaner than adding another #ifdef.

Fixes: 28a1f3ac1d ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org # L1TF
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22 16:48:26 +02:00
Linus Torvalds dfec4a8478 More power management updates for 4.19-rc1
- Make the idle loop handle stopped scheduler tick correctly (Rafael
    Wysocki).
 
  - Prevent the menu cpuidle governor from letting CPUs spend too much
    time in shallow idle states when it is invoked with scheduler tick
    stopped and clean it up somewhat (Rafael Wysocki).
 
  - Avoid invoking the platform firmware to make the platform enter
    the ACPI S3 sleep state with suspended PCIe root ports which may
    confuse the firmware and cause it to crash (Rafael Wysocki).
 
  - Fix sysfs-related race in the ondemand and conservative cpufreq
    governors which may cause the system to crash if the governor
    module is removed during an update of CPU frequency limits (Henry
    Willard).
 
  - Select SRCU when building the system wakeup framework to avoid a
    build issue in it (zhangyi).
 
  - Make the descriptions of ACPI C-states vendor-neutral to avoid
    confusion (Prarit Bhargava).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJbfSnVAAoJEILEb/54YlRxBn4QAKQ8PqkSYkBby+1hb90ET4dk
 VaLkbCYXuzLK5rIDvnbYOALhVKo4B29Ex5GdCLN7cWkZMkrVKe7oX8QQTnp3/7lF
 URjTKgTNec5uJG652PrE3ESAa3X/kYggj6aeQOxDR4iYKzcpJEQ92ekFW+SoJTNp
 Jc2kZh3qkC2On64GB3ibsZaKnmHfPvLg0t4agwzuYq/Gff8NRJFk7kMwAPzqGzZo
 b2UVRcYFWIRkJjgmU9iInoeHIY8mBdT3IiKwTemZP1dOhb5T1AHOXwGTk6/cS+RH
 A9qx4eg7I3R00KmnYvO8WytYJeOu2qb83GIUx4fIJGOqfvevm5xkxB9F+nfE+ouj
 ROBqO4+X4XfQGPw8slayg0rJjI9JSkXLnLdl0Qw2WRlbc4/fVWntra1C57EeKFBR
 EG9UAF9+7nUUx0bOCLsfFF3+r9R3SDUjk7b4thyhYncyQRsYC+FL7ztlxnMzVtzW
 M5SF2sPrpcQzqmcszdUdbESI10n5X8m/crJW4rsbTxBpAM+coO+uLcvHWOY4MpkW
 BgBsR6bMDAlG/VlTFgeeP/tkCRd5zNlJi7yBFItXuOoVKXpnHCJuxq2WkZ1Rb74M
 Gk1d3TduekHJm8VsLEdCJR/tEk1cMc0zVUD/a1yzI4Z21QxvXUCqMDdws4/Ey184
 qmKgNR9R94vSC5xIPRhM
 =9GrU
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These fix the main idle loop and the menu cpuidle governor, clean up
  the latter, fix a mistake in the PCI bus type's support for system
  suspend and resume, fix the ondemand and conservative cpufreq
  governors, address a build issue in the system wakeup framework and
  make the ACPI C-states desciptions less confusing.

  Specifics:

   - Make the idle loop handle stopped scheduler tick correctly (Rafael
     Wysocki).

   - Prevent the menu cpuidle governor from letting CPUs spend too much
     time in shallow idle states when it is invoked with scheduler tick
     stopped and clean it up somewhat (Rafael Wysocki).

   - Avoid invoking the platform firmware to make the platform enter the
     ACPI S3 sleep state with suspended PCIe root ports which may
     confuse the firmware and cause it to crash (Rafael Wysocki).

   - Fix sysfs-related race in the ondemand and conservative cpufreq
     governors which may cause the system to crash if the governor
     module is removed during an update of CPU frequency limits (Henry
     Willard).

   - Select SRCU when building the system wakeup framework to avoid a
     build issue in it (zhangyi).

   - Make the descriptions of ACPI C-states vendor-neutral to avoid
     confusion (Prarit Bhargava)"

* tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: menu: Handle stopped tick more aggressively
  sched: idle: Avoid retaining the tick when it has been stopped
  PCI / ACPI / PM: Resume all bridges on suspend-to-RAM
  cpuidle: menu: Update stale polling override comment
  cpufreq: governor: Avoid accessing invalid governor_data
  x86/ACPI/cstate: Make APCI C1 FFH MWAIT C-state description vendor-neutral
  cpuidle: menu: Fix white space
  PM / sleep: wakeup: Fix build error caused by missing SRCU support
2018-08-22 07:42:36 -07:00
Juergen Gross 75f2d3a0ce x86/xen: enable early use of set_fixmap in 32-bit Xen PV guest
Commit 7b25b9cb0d ("x86/xen/time: Initialize pv xen time in
init_hypervisor_platform()") moved the mapping of the shared info area
before pagetable_init(). This breaks booting as 32-bit PV guest as the
use of set_fixmap isn't possible at this time on 32-bit.

This can be worked around by populating the needed PMD on 32-bit
kernel earlier.

In order not to reimplement populate_extra_pte() using extend_brk()
for allocating new page tables extend alloc_low_pages() to do that in
case the early page table pool is not yet available.

Fixes: 7b25b9cb0d ("x86/xen/time: Initialize pv xen time in init_hypervisor_platform()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-20 14:46:26 -04:00
Juergen Gross 00f53f758d xen: remove unused hypercall functions
Remove Xen hypercall functions which are used nowhere in the kernel.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-20 14:46:18 -04:00