Commit Graph

30535 Commits

Author SHA1 Message Date
Linus Torvalds 27c5a778df Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingol Molnar:
 "Misc fixes:

   - EFI crash fix

   - Xen PV fixes

   - do not allow PTI on 2-level 32-bit kernels for now

   - documentation fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/APM: Fix build warning when PROC_FS is not enabled
  Revert "x86/mm/legacy: Populate the user page-table with user pgd's"
  x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3
  x86/xen: Disable CPU0 hotplug for Xen PV
  x86/EISA: Don't probe EISA bus for Xen PV guests
  x86/doc: Fix Documentation/x86/earlyprintk.txt
2018-09-15 08:02:46 -10:00
Randy Dunlap 002b87d2aa x86/APM: Fix build warning when PROC_FS is not enabled
Fix build warning in apm_32.c when CONFIG_PROC_FS is not enabled:

../arch/x86/kernel/apm_32.c:1643:12: warning: 'proc_apm_show' defined but not used [-Wunused-function]
 static int proc_apm_show(struct seq_file *m, void *v)

Fixes: 3f3942aca6 ("proc: introduce proc_create_single{,_data}")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jiri Kosina <jikos@kernel.org>
Link: https://lkml.kernel.org/r/be39ac12-44c2-4715-247f-4dcc3c525b8b@infradead.org
2018-09-15 10:16:25 +02:00
Joerg Roedel 61a6bd83ab Revert "x86/mm/legacy: Populate the user page-table with user pgd's"
This reverts commit 1f40a46cf4.

It turned out that this patch is not sufficient to enable PTI on 32 bit
systems with legacy 2-level page-tables. In this paging mode the huge-page
PTEs are in the top-level page-table directory, where also the mirroring to
the user-space page-table happens. So every huge PTE exits twice, in the
kernel and in the user page-table.

That means that accessed/dirty bits need to be fetched from two PTEs in
this mode to be safe, but this is not trivial to implement because it needs
changes to generic code just for the sake of enabling PTI with 32-bit
legacy paging. As all systems that need PTI should support PAE anyway,
remove support for PTI when 32-bit legacy paging is used.

Fixes: 7757d607c6 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org
2018-09-14 17:08:45 +02:00
Guenter Roeck cf40361ede x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3
Commit eeb89e2bb1 ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()")
moved loading the fixmap in efi_call_phys_epilog() after load_cr3() since
it was assumed to be more logical.

Turns out this is incorrect: In efi_call_phys_prolog(), the gdt with its
physical address is loaded first, and when the %cr3 is reloaded in _epilog
from initial_page_table to swapper_pg_dir again the gdt is no longer
mapped.  This results in a triple fault if an interrupt occurs after
load_cr3() and before load_fixmap_gdt(0). Calling load_fixmap_gdt(0) first
restores the execution order prior to commit eeb89e2bb1 and fixes the
problem.

Fixes: eeb89e2bb1 ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: linux-efi@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1536689892-21538-1-git-send-email-linux@roeck-us.net
2018-09-12 21:53:34 +02:00
Juergen Gross 999696752d x86/xen: Disable CPU0 hotplug for Xen PV
Xen PV guests don't allow CPU0 hotplug, so disable it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20180912174122.24282-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-12 21:15:02 +02:00
Boris Ostrovsky 6a92b11169 x86/EISA: Don't probe EISA bus for Xen PV guests
For unprivileged Xen PV guests this is normal memory and ioremap will
not be able to properly map it.

While at it, since ioremap may return NULL, add a test for pointer's
validity.

Reported-by: Andy Smith <andy@strugglers.net>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: xen-devel@lists.xenproject.org
Cc: jgross@suse.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180911195538.23289-1-boris.ostrovsky@oracle.com
2018-09-11 23:36:50 +02:00
Jacek Tomaka 16160c1946 perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs
Problem: perf did not show branch predicted/mispredicted bit in brstack.

Output of perf -F brstack for profile collected

Before:

 0x4fdbcd/0x4fdc03/-/-/-/0
 0x45f4c1/0x4fdba0/-/-/-/0
 0x45f544/0x45f4bb/-/-/-/0
 0x45f555/0x45f53c/-/-/-/0
 0x7f66901cc24b/0x45f555/-/-/-/0
 0x7f66901cc22e/0x7f66901cc23d/-/-/-/0
 0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0
 0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0

After:

 0x4fdbcd/0x4fdc03/P/-/-/0
 0x45f4c1/0x4fdba0/P/-/-/0
 0x45f544/0x45f4bb/P/-/-/0
 0x45f555/0x45f53c/P/-/-/0
 0x7f66901cc24b/0x45f555/P/-/-/0
 0x7f66901cc22e/0x7f66901cc23d/P/-/-/0
 0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0
 0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0

Cause:

As mentioned in Software Development Manual vol 3, 17.4.8.1,
IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is
stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as
its format. Despite that, registers containing FROM address of the branch,
do have MISPREDICT bit but because of the format indicated in
IA32_PERF_CAPABILITIES[5:0], LBR did not read MISPREDICT bit.

Solution:

Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit.

Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-10 10:03:01 +02:00
Linus Torvalds 9a5682765a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Prevent multiplication result truncation on 32bit. Introduced with
     the early timestamp reworrk.

   - Ensure microcode revision storage to be consistent under all
     circumstances

   - Prevent write tearing of PTEs

   - Prevent confusion of user and kernel reegisters when dumping fatal
     signals verbosely

   - Make an error return value in a failure path of the vector
     allocation negative. Returning EINVAL might the caller assume
     success and causes further wreckage.

   - A trivial kernel doc warning fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Use WRITE_ONCE() when setting PTEs
  x86/apic/vector: Make error return value negative
  x86/process: Don't mix user/kernel regs in 64bit __show_regs()
  x86/tsc: Prevent result truncation on 32bit
  x86: Fix kernel-doc atomic.h warnings
  x86/microcode: Update the new microcode revision unconditionally
  x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
2018-09-09 07:05:15 -07:00
Nadav Amit 9bc4f28af7 x86/mm: Use WRITE_ONCE() when setting PTEs
When page-table entries are set, the compiler might optimize their
assignment by using multiple instructions to set the PTE. This might
turn into a security hazard if the user somehow manages to use the
interim PTE. L1TF does not make our lives easier, making even an interim
non-present PTE a security hazard.

Using WRITE_ONCE() to set PTEs and friends should prevent this potential
security hazard.

I skimmed the differences in the binary with and without this patch. The
differences are (obviously) greater when CONFIG_PARAVIRT=n as more
code optimizations are possible. For better and worse, the impact on the
binary with this patch is pretty small. Skimming the code did not cause
anything to jump out as a security hazard, but it seems that at least
move_soft_dirty_pte() caused set_pte_at() to use multiple writes.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
2018-09-08 12:30:36 +02:00
Thomas Gleixner 47b7360ce5 x86/apic/vector: Make error return value negative
activate_managed() returns EINVAL instead of -EINVAL in case of
error. While this is unlikely to happen, the positive return value would
cause further malfunction at the call site.

Fixes: 2db1f959d9 ("x86/vector: Handle managed interrupts proper")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2018-09-08 12:12:40 +02:00
Wanpeng Li bdf7ffc899 KVM: LAPIC: Fix pv ipis out-of-bounds access
Dan Carpenter reported that the untrusted data returns from kvm_register_read()
results in the following static checker warning:
  arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi()
  error: buffer underflow 'map->phys_map' 's32min-s32max'

KVM guest can easily trigger this by executing the following assembly sequence
in Ring0:

mov $10, %rax
mov $0xFFFFFFFF, %rbx
mov $0xFFFFFFFF, %rdx
mov $0, %rsi
vmcall

As this will cause KVM to execute the following code-path:
vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi()
which will reach out-of-bounds access.

This patch fixes it by adding a check to kvm_pv_send_ipi() against map->max_apic_id,
ignoring destinations that are not present and delivering the rest. We also check
whether or not map->phys_map[min + i] is NULL since the max_apic_id is set to the
max apic id, some phys_map maybe NULL when apic id is sparse, especially kvm
unconditionally set max_apic_id to 255 to reserve enough space for any xAPIC ID.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add second "if (min > map->max_apic_id)" to complete the fix. -Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-09-07 18:38:43 +02:00
Liran Alon b5861e5cf2 KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2
Consider the case L1 had a IRQ/NMI event until it executed
VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
(e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
L0 needs to evaluate if this pending event should cause an exit from
L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
EXTERNAL_INTERRUPT).

Usually this would be handled by L0 requesting a IRQ/NMI window
by setting VMCS accordingly. However, this setting was done on
VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
requesting a KVM_REQ_EVENT.

Note that above scenario exists when L1 KVM is about to enter L2 but
requests an "immediate-exit". As in this case, L1 will
disable-interrupts and then send a self-IPI before entering L2.

Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-09-07 18:38:42 +02:00
Radim Krčmář 564ad0aa85 Fixes for KVM/ARM for Linux v4.19 v2:
- Fix a VFP corruption in 32-bit guest
  - Add missing cache invalidation for CoW pages
  - Two small cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJbkngmAAoJEEtpOizt6ddyeaoH/15bbGHlwWf23tGjSoDzhyD4
 zAXfy+SJdm4cR8K7jEkVrNffkEMAby7Zl28hTHKB9jsY1K8DD+EuCE3Nd4kkVAsc
 iHJwV4aiHil/zC5SyE0MqMzELeS8UhsxESYebG6yNF0ElQDQ0SG+QAFr47/OBN9S
 u4I7x0rhyJP6Kg8z9U4KtEX0hM6C7VVunGWu44/xZSAecTaMuJnItCIM4UMdEkSs
 xpAoI59lwM6BWrXLvEunekAkxEXoR7AVpQER2PDINoLK2I0i0oavhPim9Xdt2ZXs
 rqQqfmwmPOVvYbexDp97JtfWo3/psGLqvgoK1tq9bzF3u6Y3ylnUK5IspyVYwuQ=
 =TK8A
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-fixes-for-v4.19-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm

Fixes for KVM/ARM for Linux v4.19 v2:

 - Fix a VFP corruption in 32-bit guest
 - Add missing cache invalidation for CoW pages
 - Two small cleanups
2018-09-07 18:38:25 +02:00
Radim Krčmář ed2ef29100 KVM: s390: Fixes for 4.19
- Fallout from the hugetlbfs support: pfmf interpretion and locking
 - VSIE: fix keywrapping for nested guests
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbj40sAAoJEBF7vIC1phx8MIYQAK6TtogzCUok4nvRJZGl34Ac
 HvJP2OTSNcJO8MA/DkmXk6LNVgrjgLqc4Y0MCMqaz9EzM1FVM0A5cQ4Tiiwk6dlG
 395Q5SbkrmVIpmxG7dSQbrj3HlMTUCz7jtAUrDS57zaWYdKhqX+AUuW45u+TPfAo
 DL00wS+WJxiTWB06cr0gHpHcXyctn5hK0cYUZQokMn2a1pAjLrS4TEpvoGOcu2d6
 lULY6uYWCwCnma8eieiC8ssLzB8opDPedLrewBnaZFziEZZrPybYvT8uMffNfygA
 tj7og1/+iqnUmyAG20Fb8oM0MMcjRWhLGHVFpv1W1ph7624oDUb3Tzd7rV8bzTMC
 NoqHeIv+oQyhRJCsuPTe2jUcpKc/eJzA8o3ZUdu3LeDBXxNzNOIh08iRHvyFC9iM
 91/YkyYcDW2cukxqYjIwPf+y/dVHRqNAmcs9+hvu8AiNeUJPGUYsmlTBABEg0V9H
 gubV7m/Gl5Yx95UyrlQ4UkuvkOzmtwFYsnFKE0KnqT99bbFFf2na3CZyYBJFBVOj
 knSl3lS9W5LLrZ3s2VaJ/4/bPc4oGjW1ADEamQCYa4K3XQoMrnqGdL0VVuALJ2dZ
 RVIz2DP+P6HBCoRWD0cOA0Q+MvP5hl6TrGDdpCbza3ASSF1f/eSASvHs4P4JQPqY
 dWQ3uIByc3wDXuErkcT5
 =kgjR
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux

KVM: s390: Fixes for 4.19

- Fallout from the hugetlbfs support: pfmf interpretion and locking
- VSIE: fix keywrapping for nested guests
2018-09-07 18:30:47 +02:00
Marc Zyngier a35381e10d KVM: Remove obsolete kvm_unmap_hva notifier backend
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to
deal with. Drop the now obsolete code.

Fixes: fb1522e099 ("KVM: update to new mmu_notifier semantic v2")
Cc: James Hogan <jhogan@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-09-07 15:06:02 +02:00
Jann Horn 9fe6299dde x86/process: Don't mix user/kernel regs in 64bit __show_regs()
When the kernel.print-fatal-signals sysctl has been enabled, a simple
userspace crash will cause the kernel to write a crash dump that contains,
among other things, the kernel gsbase into dmesg.

As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE
in this case.

This also moves the bitness-specific logic from show_regs() into
process_{32,64}.c.

Fixes: 45807a1df9 ("vdso: print fatal signals")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180831194151.123586-1-jannh@google.com
2018-09-06 14:33:12 +02:00
Chuanhua Lei 17f6bac224 x86/tsc: Prevent result truncation on 32bit
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then
dividing it by HZ.

Both tsc_khz and the temporary variable holding the multiplication result
are of type unsigned long, so on 32bit the result is truncated to the lower
32bit.

Use u64 as type for the temporary variable and cast tsc_khz to it before
multiplying.

[ tglx: Massaged changelog and removed pointless braces ]

Fixes: cf7a63ef4e ("x86/tsc: Calibrate tsc only once")
Signed-off-by: Chuanhua Lei <chuanhua.lei@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yixin.zhu@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@microsoft.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com
2018-09-06 14:22:01 +02:00
Randy Dunlap 4331f4d5ad x86: Fix kernel-doc atomic.h warnings
Fix kernel-doc warnings in arch/x86/include/asm/atomic.h that are caused by
having a #define macro between the kernel-doc notation and the function
name.  Fixed by moving the #define macro to after the function
implementation.

Make the same change for atomic64_{32,64}.h for consistency even though
there were no kernel-doc warnings found in these header files, but there
would be if they were used in generation of documentation.

Fixes these kernel-doc warnings:

../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'i' description in 'arch_atomic_sub_and_test'
../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'v' description in 'arch_atomic_sub_and_test'
../arch/x86/include/asm/atomic.h:96: warning: Excess function parameter 'v' description in 'arch_atomic_inc'
../arch/x86/include/asm/atomic.h:109: warning: Excess function parameter 'v' description in 'arch_atomic_dec'
../arch/x86/include/asm/atomic.h:124: warning: Excess function parameter 'v' description in 'arch_atomic_dec_and_test'
../arch/x86/include/asm/atomic.h:138: warning: Excess function parameter 'v' description in 'arch_atomic_inc_and_test'
../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'i' description in 'arch_atomic_add_negative'
../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'v' description in 'arch_atomic_add_negative'

Fixes: 18cc1814d4 ("atomics/treewide: Make test ops optional")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/0a1e678d-c8c5-b32c-2640-ed4e94d399d2@infradead.org
2018-09-03 12:41:41 +02:00
Linus Torvalds 899ba79553 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Speculation:

   - Make the microcode check more robust

   - Make the L1TF memory limit depend on the internal cache physical
     address space and not on the CPUID advertised physical address
     space, which might be significantly smaller. This avoids disabling
     L1TF on machines which utilize the full physical address space.

   - Fix the GDT mapping for EFI calls on 32bit PTI

   - Fix the MCE nospec implementation to prevent #GP

  Fixes and robustness:

   - Use the proper operand order for LSL in the VDSO

   - Prevent NMI uaccess race against CR3 switching

   - Add a lockdep check to verify that text_mutex is held in
     text_poke() functions

   - Repair the fallout of giving native_restore_fl() a prototype

   - Prevent kernel memory dumps based on usermode RIP

   - Wipe KASAN shadow stack before rewinding the stack to prevent false
     positives

   - Move the AMS GOTO enforcement to the actual build stage to allow
     user API header extraction without a compiler

   - Fix a section mismatch introduced by the on demand VDSO mapping
     change

  Miscellaneous:

   - Trivial typo, GCC quirk removal and CC_SET/OUT() cleanups"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pti: Fix section mismatch warning/error
  x86/vdso: Fix lsl operand order
  x86/mce: Fix set_mce_nospec() to avoid #GP fault
  x86/efi: Load fixmap GDT in efi_call_phys_epilog()
  x86/nmi: Fix NMI uaccess race against CR3 switching
  x86: Allow generating user-space headers without a compiler
  x86/dumpstack: Don't dump kernel memory based on usermode RIP
  x86/asm: Use CC_SET()/CC_OUT() in __gen_sigismember()
  x86/alternatives: Lockdep-enforce text_mutex in text_poke*()
  x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit()
  x86/irqflags: Mark native_restore_fl extern inline
  x86/build: Remove jump label quirk for GCC older than 4.5.2
  x86/Kconfig: Fix trivial typo
  x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
  x86/spectre: Add missing family 6 check to microcode check
2018-09-02 10:11:30 -07:00
Filippo Sironi 8da38ebaad x86/microcode: Update the new microcode revision unconditionally
Handle the case where microcode gets loaded on the BSP's hyperthread
sibling first and the boot_cpu_data's microcode revision doesn't get
updated because of early exit due to the siblings sharing a microcode
engine.

For that, simply write the updated revision on all CPUs unconditionally.

Signed-off-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: prarit@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1533050970-14385-1-git-send-email-sironi@amazon.de
2018-09-02 14:10:54 +02:00
Prarit Bhargava 370a132bb2 x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
When preparing an MCE record for logging, boot_cpu_data.microcode is used
to read out the microcode revision on the box.

However, on systems where late microcode update has happened, the microcode
revision output in a MCE log record is wrong because
boot_cpu_data.microcode is not updated when the microcode gets updated.

But, the microcode revision saved in boot_cpu_data's microcode member
should be kept up-to-date, regardless, for consistency.

Make it so.

Fixes: fa94d0c6e0 ("x86/MCE: Save microcode revision in machine check records")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: sironi@amazon.de
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180731112739.32338-1-prarit@redhat.com
2018-09-02 14:10:54 +02:00
Randy Dunlap ff924c5a1e x86/pti: Fix section mismatch warning/error
Fix the section mismatch warning in arch/x86/mm/pti.c:

WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte()
The function pti_clone_pgtable() references
the function __init pti_user_pagetable_walk_pte().
This is often because pti_clone_pgtable lacks a __init
annotation or the annotation of pti_user_pagetable_walk_pte is wrong.
FATAL: modpost: Section mismatches detected.

Fixes: 85900ea515 ("x86/pti: Map the vsyscall page if needed")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org
2018-09-02 11:24:41 +02:00
Samuel Neves e78e5a9145 x86/vdso: Fix lsl operand order
In the __getcpu function, lsl is using the wrong target and destination
registers. Luckily, the compiler tends to choose %eax for both variables,
so it has been working so far.

Fixes: a582c540ac ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
2018-09-01 23:01:56 +02:00
LuckTony c7486104a5 x86/mce: Fix set_mce_nospec() to avoid #GP fault
The trick with flipping bit 63 to avoid loading the address of the 1:1
mapping of the poisoned page while the 1:1 map is updated used to work when
unmapping the page. But it falls down horribly when attempting to directly
set the page as uncacheable.

The problem is that when the cache mode is changed to uncachable, the pages
needs to be flushed from the cache first. But the decoy address is
non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a
#GP fault.

Add code to change_page_attr_set_clr() to fix the address before calling
flush.

Fixes: 284ce4011b ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
2018-09-01 14:59:19 +02:00
Joerg Roedel eeb89e2bb1 x86/efi: Load fixmap GDT in efi_call_phys_epilog()
When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap
for the simple reason that this address is also mapped for user-space.

The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT
to call EFI runtime services and switch back to the kernel GDT when they
return. But the switch-back uses the writable GDT, not the fixmap GDT.

When that happened and and the CPU returns to user-space it switches to the
user %cr3 and tries to restore user segment registers. This fails because
the writable GDT is not mapped in the user page-table, and without a GDT
the fault handlers also can't be launched. The result is a triple fault and
reboot of the machine.

Fix that by restoring the GDT back to the fixmap GDT which is also mapped
in the user page-table.

Fixes: 7757d607c6 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: hpa@zytor.com
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
2018-08-31 17:45:54 +02:00
Linus Torvalds 4290d5b9ca xen: fixes for 4.19-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW4lM6AAKCRCAXGG7T9hj
 vs8AAQDysFccg97UdopW3B7yklIaRqkfEIAsxe65f191MXsH2AEAp5SKxZqRPqBP
 a9WHDj8ShB3BhZ/IxpdO9Y59U3Jo4wA=
 =Gt4c
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - minor cleanup avoiding a warning when building with new gcc

 - a patch to add a new sysfs node for Xen frontend/backend drivers to
   make it easier to obtain the state of a pv device

 - two fixes for 32-bit pv-guests to avoid intermediate L1TF vulnerable
   PTEs

* tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: remove redundant variable save_pud
  xen: export device state to sysfs
  x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear
  x86/xen: don't write ptes directly in 32-bit PV guests
2018-08-31 08:45:16 -07:00
Andy Lutomirski 4012e77a90 x86/nmi: Fix NMI uaccess race against CR3 switching
A NMI can hit in the middle of context switching or in the middle of
switch_mm_irqs_off().  In either case, CR3 might not match current->mm,
which could cause copy_from_user_nmi() and friends to read the wrong
memory.

Fix it by adding a new nmi_uaccess_okay() helper and checking it in
copy_from_user_nmi() and in __copy_from_user_nmi()'s callers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
2018-08-31 17:08:22 +02:00
Ben Hutchings 829fe4aa9a x86: Allow generating user-space headers without a compiler
When bootstrapping an architecture, it's usual to generate the kernel's
user-space headers (make headers_install) before building a compiler.  Move
the compiler check (for asm goto support) to the archprepare target so that
it is only done when building code for the target.

Fixes: e501ce957a ("x86: Force asm-goto")
Reported-by: Helmut Grohne <helmutg@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
2018-08-31 17:08:22 +02:00
Jann Horn 342db04ae7 x86/dumpstack: Don't dump kernel memory based on usermode RIP
show_opcodes() is used both for dumping kernel instructions and for dumping
user instructions. If userspace causes #PF by jumping to a kernel address,
show_opcodes() can be reached with regs->ip controlled by the user,
pointing to kernel code. Make sure that userspace can't trick us into
dumping kernel memory into dmesg.

Fixes: 7cccf0725c ("x86/dumpstack: Add a show_ip() function")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: security@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
2018-08-31 17:08:22 +02:00
Sean Christopherson c60658d1d9 KVM: x86: Unexport x86_emulate_instruction()
Allowing x86_emulate_instruction() to be called directly has led to
subtle bugs being introduced, e.g. not setting EMULTYPE_NO_REEXECUTE
in the emulation type.  While most of the blame lies on re-execute
being opt-out, exporting x86_emulate_instruction() also exposes its
cr2 parameter, which may have contributed to commit d391f12070
("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO
when running nested") using x86_emulate_instruction() instead of
emulate_instruction() because "hey, I have a cr2!", which in turn
introduced its EMULTYPE_NO_REEXECUTE bug.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:44 +02:00
Sean Christopherson 0ce97a2b62 KVM: x86: Rename emulate_instruction() to kvm_emulate_instruction()
Lack of the kvm_ prefix gives the impression that it's a VMX or SVM
specific function, and there's no conflict that prevents adding the
kvm_ prefix.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:44 +02:00
Sean Christopherson 6c3dfeb6a4 KVM: x86: Do not re-{try,execute} after failed emulation in L2
Commit a6f177efaa ("KVM: Reenter guest after emulation failure if
due to access to non-mmio address") added reexecute_instruction() to
handle the scenario where two (or more) vCPUS race to write a shadowed
page, i.e. reexecute_instruction() is intended to return true if and
only if the instruction being emulated was accessing a shadowed page.
As L0 is only explicitly shadowing L1 tables, an emulation failure of
a nested VM instruction cannot be due to a race to write a shadowed
page and so should never be re-executed.

This fixes an issue where an "MMIO" emulation failure[1] in L2 is all
but guaranteed to result in an infinite loop when TDP is enabled.
Because "cr2" is actually an L2 GPA when TDP is enabled, calling
kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped
case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA
and cause reexecute_instruction() to immediately return true.  The
!mmio_info_in_cache() check in kvm_mmu_page_fault() doesn't catch this
case because mmio_info_in_cache() returns false for a nested MMU (the
MMIO caching currently handles L1 only, e.g. to cache nested guests'
GPAs we'd have to manually flush the cache when switching between
VMs and when L1 updated its page tables controlling the nested guest).

Way back when, commit 68be080345 ("KVM: x86: never re-execute
instruction with enabled tdp") changed reexecute_instruction() to
always return false when using TDP under the assumption that KVM would
only get into the emulator for MMIO.  Commit 95b3cf69bd ("KVM: x86:
let reexecute_instruction work for tdp") effectively reverted that
behavior in order to handle the scenario where emulation failed due to
an access from L1 to the shadow page tables for L2, but it didn't
account for the case where emulation failed in L2 with TDP enabled.

All of the above logic also applies to retry_instruction(), added by
commit 1cb3f3ae5a ("KVM: x86: retry non-page-table writing
instructions").  An indefinite loop in retry_instruction() should be
impossible as it protects against retrying the same instruction over
and over, but it's still correct to not retry an L2 instruction in
the first place.

Fix the immediate issue by adding a check for a nested guest when
determining whether or not to allow retry in kvm_mmu_page_fault().
In addition to fixing the immediate bug, add WARN_ON_ONCE in the
retry functions since they are not designed to handle nested cases,
i.e. they need to be modified even if there is some scenario in the
future where we want to allow retrying a nested guest.

[1] This issue was encountered after commit 3a2936dedd ("kvm: mmu:
    Don't expose private memslots to L2") changed the page fault path
    to return KVM_PFN_NOSLOT when translating an L2 access to a
    prive memslot.  Returning KVM_PFN_NOSLOT is semantically correct
    when we want to hide a memslot from L2, i.e. there effectively is
    no defined memory region for L2, but it has the unfortunate side
    effect of making KVM think the GFN is a MMIO page, thus triggering
    emulation.  The failure occurred with in-development code that
    deliberately exposed a private memslot to L2, which L2 accessed
    with an instruction that is not emulated by KVM.

Fixes: 95b3cf69bd ("KVM: x86: let reexecute_instruction work for tdp")
Fixes: 1cb3f3ae5a ("KVM: x86: retry non-page-table writing instructions")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Xiao Guangrong <xiaoguangrong@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:44 +02:00
Sean Christopherson 472faffacd KVM: x86: Default to not allowing emulation retry in kvm_mmu_page_fault
Effectively force kvm_mmu_page_fault() to opt-in to allowing retry to
make it more obvious when and why it allows emulation to be retried.
Previously this approach was less convenient due to retry and
re-execute behavior being controlled by separate flags that were also
inverted in their implementations (opt-in versus opt-out).

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:43 +02:00
Sean Christopherson 384bf2218e KVM: x86: Merge EMULTYPE_RETRY and EMULTYPE_ALLOW_REEXECUTE
retry_instruction() and reexecute_instruction() are a package deal,
i.e. there is no scenario where one is allowed and the other is not.
Merge their controlling emulation type flags to enforce this in code.
Name the combined flag EMULTYPE_ALLOW_RETRY to make it abundantly
clear that we are allowing re{try,execute} to occur, as opposed to
explicitly requesting retry of a previously failed instruction.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:43 +02:00
Sean Christopherson 8065dbd1ee KVM: x86: Invert emulation re-execute behavior to make it opt-in
Re-execution of an instruction after emulation decode failure is
intended to be used only when emulating shadow page accesses.  Invert
the flag to make allowing re-execution opt-in since that behavior is
by far in the minority.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:43 +02:00
Sean Christopherson 35be0aded7 KVM: x86: SVM: Set EMULTYPE_NO_REEXECUTE for RSM emulation
Re-execution after an emulation decode failure is only intended to
handle a case where two or vCPUs race to write a shadowed page, i.e.
we should never re-execute an instruction as part of RSM emulation.

Add a new helper, kvm_emulate_instruction_from_buffer(), to support
emulating from a pre-defined buffer.  This eliminates the last direct
call to x86_emulate_instruction() outside of kvm_mmu_page_fault(),
which means x86_emulate_instruction() can be unexported in a future
patch.

Fixes: 7607b71744 ("KVM: SVM: install RSM intercept")
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:43 +02:00
Sean Christopherson c4409905cd KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr
Re-execution after an emulation decode failure is only intended to
handle a case where two or vCPUs race to write a shadowed page, i.e.
we should never re-execute an instruction as part of MMIO emulation.
As handle_ept_misconfig() is only used for MMIO emulation, it should
pass EMULTYPE_NO_REEXECUTE when using the emulator to skip an instr
in the fast-MMIO case where VM_EXIT_INSTRUCTION_LEN is invalid.

And because the cr2 value passed to x86_emulate_instruction() is only
destined for use when retrying or reexecuting, we can simply call
emulate_instruction().

Fixes: d391f12070 ("x86/kvm/vmx: do not use vm-exit instruction length
                      for fast MMIO when running nested")
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:42 +02:00
Colin Ian King 0186ec8232 KVM: SVM: remove unused variable dst_vaddr_end
Variable dst_vaddr_end is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
variable 'dst_vaddr_end' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:42 +02:00
Vitaly Kuznetsov b871da4a77 KVM: nVMX: avoid redundant double assignment of nested_run_pending
nested_run_pending is set 20 lines above and check_vmentry_prereqs()/
check_vmentry_postreqs() don't seem to be resetting it (the later, however,
checks it).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Eduardo Valentin <eduval@amazon.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-08-30 16:20:03 +02:00
Uros Bizjak 26e609eccd x86/asm: Use CC_SET()/CC_OUT() in __gen_sigismember()
Replace open-coded set instructions with CC_SET()/CC_OUT().

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180814165951.13538-1-ubizjak@gmail.com
2018-08-30 13:02:31 +02:00
Jiri Kosina 9222f60650 x86/alternatives: Lockdep-enforce text_mutex in text_poke*()
text_poke() and text_poke_bp() must be called with text_mutex held.

Put proper lockdep anotation in place instead of just mentioning the
requirement in a comment.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1808280853520.25787@cbobk.fhfr.pm
2018-08-30 13:02:30 +02:00
Jann Horn f12d11c5c1 x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit()
Reset the KASAN shadow state of the task stack before rewinding RSP.
Without this, a kernel oops will leave parts of the stack poisoned, and
code running under do_exit() can trip over such poisoned regions and cause
nonsensical false-positive KASAN reports about stack-out-of-bounds bugs.

This does not wipe the exception stacks; if an oops happens on an exception
stack, it might result in random KASAN false-positives from other tasks
afterwards. This is probably relatively uninteresting, since if the kernel
oopses on an exception stack, there are most likely bigger things to worry
about. It'd be more interesting if vmapped stacks and KASAN were
compatible, since then handle_stack_overflow() would oops from exception
stack context.

Fixes: 2deb4be280 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kasan-dev@googlegroups.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com
2018-08-30 11:37:09 +02:00
Nick Desaulniers 1f59a4581b x86/irqflags: Mark native_restore_fl extern inline
This should have been marked extern inline in order to pick up the out
of line definition in arch/x86/kernel/irqflags.S.

Fixes: 208cbb3255 ("x86/irqflags: Provide a declaration for native_save_fl")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180827214011.55428-1-ndesaulniers@google.com
2018-08-30 11:37:09 +02:00
Masahiro Yamada 36bf9da291 x86/build: Remove jump label quirk for GCC older than 4.5.2
Commit cafa0010cd ("Raise the minimum required gcc version to 4.6")
bumped the minimum GCC version to 4.6 for all architectures.

Remove the workaround code.

It was the only user of cc-if-fullversion.  Remove the macro as well.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: linux-kbuild@vger.kernel.org
Link: https://lkml.kernel.org/r/1535348714-25457-1-git-send-email-yamada.masahiro@socionext.com
2018-08-30 11:37:08 +02:00
Linus Torvalds b4df50de6a Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - Check for the right CPU feature bit in sm4-ce on arm64.

 - Fix scatterwalk WARN_ON in aes-gcm-ce on arm64.

 - Fix unaligned fault in aesni on x86.

 - Fix potential NULL pointer dereference on exit in chtls.

 - Fix DMA mapping direction for RSA in caam.

 - Fix error path return value for xts setkey in caam.

 - Fix address endianness when DMA unmapping in caam.

 - Fix sleep-in-atomic in vmx.

 - Fix command corruption when queue is full in cavium/nitrox.

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: cavium/nitrox - fix for command corruption in queue full case with backlog submissions.
  crypto: vmx - Fix sleep-in-atomic bugs
  crypto: arm64/aes-gcm-ce - fix scatterwalk API violation
  crypto: aesni - Use unaligned loads from gcm_context_data
  crypto: chtls - fix null dereference chtls_free_uld()
  crypto: arm64/sm4-ce - check for the right CPU feature bit
  crypto: caam - fix DMA mapping direction for RSA forms 2 & 3
  crypto: caam/qi - fix error path in xts setkey
  crypto: caam/jr - fix descriptor DMA unmapping
2018-08-29 13:38:39 -07:00
Colin Ian King 6d3c8ce012 x86/xen: remove redundant variable save_pud
Variable save_pud is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
variable 'save_pud' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-28 17:37:40 -04:00
Juergen Gross b2d7a075a1 x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear
Using only 32-bit writes for the pte will result in an intermediate
L1TF vulnerable PTE. When running as a Xen PV guest this will at once
switch the guest to shadow mode resulting in a loss of performance.

Use arch_atomic64_xchg() instead which will perform the requested
operation atomically with all 64 bits.

Some performance considerations according to:

https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf

The main number should be the latency, as there is no tight loop around
native_ptep_get_and_clear().

"lock cmpxchg8b" has a latency of 20 cycles, while "lock xchg" (with a
memory operand) isn't mentioned in that document. "lock xadd" (with xadd
having 3 cycles less latency than xchg) has a latency of 11, so we can
assume a latency of 14 for "lock xchg".

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-27 14:20:49 -04:00
Juergen Gross f7c90c2aa4 x86/xen: don't write ptes directly in 32-bit PV guests
In some cases 32-bit PAE PV guests still write PTEs directly instead of
using hypercalls. This is especially bad when clearing a PTE as this is
done via 32-bit writes which will produce intermediate L1TF attackable
PTEs.

Change the code to use hypercalls instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-27 09:31:26 -04:00
Nikolas Nyby e3a5dc0871 x86/Kconfig: Fix trivial typo
Fix a typo in the Kconfig help text: adverticed -> advertised.

Signed-off-by: Nikolas Nyby <nikolas@gnu.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Cc: tglx@linutronix.de
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20180825231054.23813-1-nikolas@gnu.org
2018-08-27 10:29:14 +02:00
Andi Kleen cc51e5428e x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
On Nehalem and newer core CPUs the CPU cache internally uses 44 bits
physical address space. The L1TF workaround is limited by this internal
cache address width, and needs to have one bit free there for the
mitigation to work.

Older client systems report only 36bit physical address space so the range
check decides that L1TF is not mitigated for a 36bit phys/32GB system with
some memory holes.

But since these actually have the larger internal cache width this warning
is bogus because it would only really be needed if the system had more than
43bits of memory.

Add a new internal x86_cache_bits field. Normally it is the same as the
physical bits field reported by CPUID, but for Nehalem and newerforce it to
be at least 44bits.

Change the L1TF memory size warning to use the new cache_bits field to
avoid bogus warnings and remove the bogus comment about memory size.

Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev <studio@anchev.net>
Reported-by: Christopher Snowhill <kode54@gmail.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Michael Hocko <mhocko@suse.com>
Cc: vbabka@suse.cz
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org
2018-08-27 10:29:14 +02:00