Pull x86 fixes from Ingo Molnar:
"Misc fixes all across the map:
- /proc/kcore vsyscall related fixes
- LTO fix
- build warning fix
- CPU hotplug fix
- Kconfig NR_CPUS cleanups
- cpu_has() cleanups/robustification
- .gitignore fix
- memory-failure unmapping fix
- UV platform fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
x86/error_inject: Make just_return_func() globally visible
x86/platform/UV: Fix GAM Range Table entries less than 1GB
x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
x86/Kconfig: Further simplify the NR_CPUS config
x86/Kconfig: Simplify NR_CPUS config
x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
x86/cpufeature: Update _static_cpu_has() to use all named variables
x86/cpufeature: Reindent _static_cpu_has()
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
Currently, x86_cache_size is of type int, which makes no sense as we
will never have a valid cache size equal or less than 0. So instead of
initializing this variable to -1, it can perfectly be initialized to 0
and use it as an unsigned variable instead.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Addresses-Coverity-ID: 1464429
Link: http://lkml.kernel.org/r/20180213192208.GA26414@embeddedor.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If i == ARRAY_SIZE(mitigation_options) then we accidentally print
garbage from one space beyond the end of the mitigation_options[] array.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Fixes: 9005c6834c ("x86/spectre: Simplify spectre_v2 command line parsing")
Link: http://lkml.kernel.org/r/20180214071416.GA26677@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
x86_mask is a confusing name which is hard to associate with the
processor's stepping.
Additionally, correct an indent issue in lib/cpu.c.
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For boot-time switching between 4- and 5-level paging we need to be able
to fold p4d page table level at runtime. It requires variable
PGDIR_SHIFT and PTRS_PER_P4D.
The change doesn't affect the kernel image size much:
text data bss dec hex filename
8628091 4734304 1368064 14730459 e0c4db vmlinux.before
8628393 4734340 1368064 14730797 e0c62d vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the following commit:
ce0fa3e56a ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
... we added code to memory_failure() to unmap the page from the
kernel 1:1 virtual address space to avoid speculative access to the
page logging additional errors.
But memory_failure() may not always succeed in taking the page offline,
especially if the page belongs to the kernel. This can happen if
there are too many corrected errors on a page and either mcelog(8)
or drivers/ras/cec.c asks to take a page offline.
Since we remove the 1:1 mapping early in memory_failure(), we can
end up with the page unmapped, but still in use. On the next access
the kernel crashes :-(
There are also various debug paths that call memory_failure() to simulate
occurrence of an error. Since there is no actual error in memory, we
don't need to map out the page for those cases.
Revert most of the previous attempt and keep the solution local to
arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
1) there is a real error
2) memory_failure() succeeds.
All of this only applies to 64-bit systems. 32-bit kernel doesn't map
all of memory into kernel space. It isn't worth adding the code to unmap
the piece that is mapped because nobody would run a 32-bit kernel on a
machine that has recoverable machine checks.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert (Persistent Memory) <elliott@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org #v4.14
Fixes: ce0fa3e56a ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Harmonize all the Spectre messages so that a:
dmesg | grep -i spectre
... gives us most Spectre related kernel boot messages.
Also fix a few other details:
- clarify a comment about firmware speculation control
- s/KPTI/PTI
- remove various line-breaks that made the code uglier
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit 64e16720ea.
We cannot call C functions like that, without marking all the
call-clobbered registers as, well, clobbered. We might have got away
with it for now because the __ibp_barrier() function was *fairly*
unlikely to actually use any other registers. But no. Just no.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: sironi@amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arjan points out that the Intel document only clears the 0xc2 microcode
on *some* parts with CPUID 506E3 (INTEL_FAM6_SKYLAKE_DESKTOP stepping 3).
For the Skylake H/S platform it's OK but for Skylake E3 which has the
same CPUID it isn't (yet) cleared.
So removing it from the blacklist was premature. Put it back for now.
Also, Arjan assures me that the 0x84 microcode for Kaby Lake which was
featured in one of the early revisions of the Intel document was never
released to the public, and won't be until/unless it is also validated
as safe. So those can change to 0x80 which is what all *other* versions
of the doc have identified.
Once the retrospective testing of existing public microcodes is done, we
should be back into a mode where new microcodes are only released in
batches and we shouldn't even need to update the blacklist for those
anyway, so this tweaking of the list isn't expected to be a thing which
keeps happening.
Requested-by: Arjan van de Ven <arjan.van.de.ven@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: dave.hansen@intel.com
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1518449255-2182-1-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following commit:
7b6061627e ("x86: do not use print_symbol()")
... introduced a new build warning on 32-bit x86:
arch/x86/kernel/cpu/mcheck/mce.c:237:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
pr_cont("{%pS}", (void *)m->ip);
^
Fix the type mismatch between the 'void *' expected by %pS and the mce->ip
field which is u64 by casting to long.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-kernel@vger.kernel.org
Fixes: 7b6061627e ("x86: do not use print_symbol()")
Link: http://lkml.kernel.org/r/20180210145314.22174-1-bp@alien8.de
[ Cleaned up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
too. We blacklisted the latter purely because it was present with all
the other problematic ones in the 2018-01-08 release, but now it's
explicitly listed as OK.
We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
that appeared in one version of the blacklist and then reverted to
0x80 again. We can change it if 0x84 is actually announced to be safe.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arjan.van.de.ven@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: sironi@amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-2-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ARM:
- Include icache invalidation optimizations, improving VM startup time
- Support for forwarded level-triggered interrupts, improving
performance for timers and passthrough platform devices
- A small fix for power-management notifiers, and some cosmetic changes
PPC:
- Add MMIO emulation for vector loads and stores
- Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
requiring the complex thread synchronization of older CPU versions
- Improve the handling of escalation interrupts with the XIVE interrupt
controller
- Support decrement register migration
- Various cleanups and bugfixes.
s390:
- Cornelia Huck passed maintainership to Janosch Frank
- Exitless interrupts for emulated devices
- Cleanup of cpuflag handling
- kvm_stat counter improvements
- VSIE improvements
- mm cleanup
x86:
- Hypervisor part of SEV
- UMIP, RDPID, and MSR_SMI_COUNT emulation
- Paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
- Allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512
features
- Show vcpu id in its anonymous inode name
- Many fixes and cleanups
- Per-VCPU MSR bitmaps (already merged through x86/pti branch)
- Stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJafvMtAAoJEED/6hsPKofo6YcH/Rzf2RmshrWaC3q82yfIV0Qz
Z8N8yJHSaSdc3Jo6cmiVj0zelwAxdQcyjwlT7vxt5SL2yML+/Q0st9Hc3EgGGXPm
Il99eJEl+2MYpZgYZqV8ff3mHS5s5Jms+7BITAeh6Rgt+DyNbykEAvzt+MCHK9cP
xtsIZQlvRF7HIrpOlaRzOPp3sK2/MDZJ1RBE7wYItK3CUAmsHim/LVYKzZkRTij3
/9b4LP1yMMbziG+Yxt1o682EwJB5YIat6fmDG9uFeEVI5rWWN7WFubqs8gCjYy/p
FX+BjpOdgTRnX+1m9GIj0Jlc/HKMXryDfSZS07Zy4FbGEwSiI5SfKECub4mDhuE=
=C/uD
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"ARM:
- icache invalidation optimizations, improving VM startup time
- support for forwarded level-triggered interrupts, improving
performance for timers and passthrough platform devices
- a small fix for power-management notifiers, and some cosmetic
changes
PPC:
- add MMIO emulation for vector loads and stores
- allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
requiring the complex thread synchronization of older CPU versions
- improve the handling of escalation interrupts with the XIVE
interrupt controller
- support decrement register migration
- various cleanups and bugfixes.
s390:
- Cornelia Huck passed maintainership to Janosch Frank
- exitless interrupts for emulated devices
- cleanup of cpuflag handling
- kvm_stat counter improvements
- VSIE improvements
- mm cleanup
x86:
- hypervisor part of SEV
- UMIP, RDPID, and MSR_SMI_COUNT emulation
- paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
- allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
AVX512 features
- show vcpu id in its anonymous inode name
- many fixes and cleanups
- per-VCPU MSR bitmaps (already merged through x86/pti branch)
- stable KVM clock when nesting on Hyper-V (merged through
x86/hyperv)"
* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
KVM: PPC: Book3S HV: Branch inside feature section
KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
KVM: PPC: Book3S PR: Fix broken select due to misspelling
KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
KVM: PPC: Book3S HV: Drop locks before reading guest memory
kvm: x86: remove efer_reload entry in kvm_vcpu_stat
KVM: x86: AMD Processor Topology Information
x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
kvm: embed vcpu id to dentry of vcpu anon inode
kvm: Map PFN-type memory regions as writable (if possible)
x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
KVM: arm/arm64: Fixup userspace irqchip static key optimization
KVM: arm/arm64: Fix userspace_irqchip_in_use counting
KVM: arm/arm64: Fix incorrect timer_is_pending logic
MAINTAINERS: update KVM/s390 maintainers
MAINTAINERS: add Halil as additional vfio-ccw maintainer
MAINTAINERS: add David as a reviewer for KVM/s390
...
Pull spectre/meltdown updates from Thomas Gleixner:
"The next round of updates related to melted spectrum:
- The initial set of spectre V1 mitigations:
- Array index speculation blocker and its usage for syscall,
fdtable and the n180211 driver.
- Speculation barrier and its usage in user access functions
- Make indirect calls in KVM speculation safe
- Blacklisting of known to be broken microcodes so IPBP/IBSR are not
touched.
- The initial IBPB support and its usage in context switch
- The exposure of the new speculation MSRs to KVM guests.
- A fix for a regression in x86/32 related to the cpu entry area
- Proper whitelisting for known to be safe CPUs from the mitigations.
- objtool fixes to deal proper with retpolines and alternatives
- Exclude __init functions from retpolines which speeds up the boot
process.
- Removal of the syscall64 fast path and related cleanups and
simplifications
- Removal of the unpatched paravirt mode which is yet another source
of indirect unproteced calls.
- A new and undisputed version of the module mismatch warning
- A couple of cleanup and correctness fixes all over the place
Yet another step towards full mitigation. There are a few things still
missing like the RBS underflow mitigation for Skylake and other small
details, but that's being worked on.
That said, I'm taking a belated christmas vacation for a week and hope
that everything is magically solved when I'm back on Feb 12th"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
KVM/x86: Add IBPB support
KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
x86/pti: Mark constant arrays as __initconst
x86/spectre: Simplify spectre_v2 command line parsing
x86/retpoline: Avoid retpolines for built-in __init functions
x86/kvm: Update spectre-v1 mitigation
KVM: VMX: make MSR bitmaps per-VCPU
x86/paravirt: Remove 'noreplace-paravirt' cmdline option
x86/speculation: Use Indirect Branch Prediction Barrier in context switch
x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
x86/spectre: Report get_user mitigation for spectre_v1
nl80211: Sanitize array index in parse_txq_params
vfs, fdtable: Prevent bounds-check bypass via speculative execution
x86/syscall: Sanitize syscall table de-references under speculation
x86/get_user: Use pointer masking to limit speculation
...
I'm seeing build failures from the two newly introduced arrays that
are marked 'const' and '__initdata', which are mutually exclusive:
arch/x86/kernel/cpu/common.c:882:43: error: 'cpu_no_speculation' causes a section type conflict with 'e820_table_firmware_init'
arch/x86/kernel/cpu/common.c:895:43: error: 'cpu_no_meltdown' causes a section type conflict with 'e820_table_firmware_init'
The correct annotation is __initconst.
Fixes: fec9434a12 ("x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180202213959.611210-1-arnd@arndb.de
Pull printk updates from Petr Mladek:
- Add a console_msg_format command line option:
The value "default" keeps the old "[time stamp] text\n" format. The
value "syslog" allows to see the syslog-like "<log
level>[timestamp] text" format.
This feature was requested by people doing regression tests, for
example, 0day robot. They want to have both filtered and full logs
at hands.
- Reduce the risk of softlockup:
Pass the console owner in a busy loop.
This is a new approach to the old problem. It was first proposed by
Steven Rostedt on Kernel Summit 2017. It marks a context in which
the console_lock owner calls console drivers and could not sleep.
On the other side, printk() callers could detect this state and use
a busy wait instead of a simple console_trylock(). Finally, the
console_lock owner checks if there is a busy waiter at the end of
the special context and eventually passes the console_lock to the
waiter.
The hand-off works surprisingly well and helps in many situations.
Well, there is still a possibility of the softlockup, for example,
when the flood of messages stops and the last owner still has too
much to flush.
There is increasing number of people having problems with
printk-related softlockups. We might eventually need to get better
solution. Anyway, this looks like a good start and promising
direction.
- Do not allow to schedule in console_unlock() called from printk():
This reverts an older controversial commit. The reschedule helped
to avoid softlockups. But it also slowed down the console output.
This patch is obsoleted by the new console waiter logic described
above. In fact, the reschedule made the hand-off less effective.
- Deprecate "%pf" and "%pF" format specifier:
It was needed on ia64, ppc64 and parisc64 to dereference function
descriptors and show the real function address. It is done
transparently by "%ps" and "pS" format specifier now.
Sergey Senozhatsky found that all the function descriptors were in
a special elf section and could be easily detected.
- Remove printk_symbol() API:
It has been obsoleted by "%pS" format specifier, and this change
helped to remove few continuous lines and a less intuitive old API.
- Remove redundant memsets:
Sergey removed unnecessary memset when processing printk.devkmsg
command line option.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
printk: drop redundant devkmsg_log_str memsets
printk: Never set console_may_schedule in console_trylock()
printk: Hide console waiter logic into helpers
printk: Add console owner and waiter logic to load balance console writes
kallsyms: remove print_symbol() function
checkpatch: add pF/pf deprecation warning
symbol lookup: introduce dereference_symbol_descriptor()
parisc64: Add .opd based function descriptor dereference
powerpc64: Add .opd based function descriptor dereference
ia64: Add .opd based function descriptor dereference
sections: split dereference_function_descriptor()
openrisc: Fix conflicting types for _exext and _stext
lib: do not use print_symbol()
irq debug: do not use print_symbol()
sysfs: do not use print_symbol()
drivers: do not use print_symbol()
x86: do not use print_symbol()
unicore32: do not use print_symbol()
sh: do not use print_symbol()
mn10300: do not use print_symbol()
...
Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with reworks
to try to attempt to make the code easier to handle in the long run, but
no functional change. There's also some tree-wide sysfs attribute
fixups with lots of acks from the various subsystem maintainers, as well
as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLvPw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynNzACgkzjPoBytJWbpWFt6SR6L33/u4kEAnRFvVCGL
s6ygQPQhZIjKk2Lxa2hC
=Zihy
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with
reworks to try to attempt to make the code easier to handle in the
long run, but no functional change. There's also some tree-wide sysfs
attribute fixups with lots of acks from the various subsystem
maintainers, as well as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
device property: Define type of PROPERTY_ENRTY_*() macros
device property: Reuse property_entry_free_data()
device property: Move property_entry_free_data() upper
firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
USB: serial: keyspan: Drop firmware Kconfig options
sysfs: remove DEBUG defines
sysfs: use SPDX identifiers
drivers: base: add coredump driver ops
sysfs: add attribute specification for /sysfs/devices/.../coredump
test_firmware: fix missing unlock on error in config_num_requests_store()
test_firmware: make local symbol test_fw_config static
sysfs: turn WARN() into pr_warn()
firmware: Fix a typo in fallback-mechanisms.rst
treewide: Use DEVICE_ATTR_WO
treewide: Use DEVICE_ATTR_RO
treewide: Use DEVICE_ATTR_RW
sysfs.h: Use octal permissions
component: add debugfs support
bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
...
Pull poll annotations from Al Viro:
"This introduces a __bitwise type for POLL### bitmap, and propagates
the annotations through the tree. Most of that stuff is as simple as
'make ->poll() instances return __poll_t and do the same to local
variables used to hold the future return value'.
Some of the obvious brainos found in process are fixed (e.g. POLLIN
misspelled as POLL_IN). At that point the amount of sparse warnings is
low and most of them are for genuine bugs - e.g. ->poll() instance
deciding to return -EINVAL instead of a bitmap. I hadn't touched those
in this series - it's large enough as it is.
Another problem it has caught was eventpoll() ABI mess; select.c and
eventpoll.c assumed that corresponding POLL### and EPOLL### were
equal. That's true for some, but not all of them - EPOLL### are
arch-independent, but POLL### are not.
The last commit in this series separates userland POLL### values from
the (now arch-independent) kernel-side ones, converting between them
in the few places where they are copied to/from userland. AFAICS, this
is the least disruptive fix preserving poll(2) ABI and making epoll()
work on all architectures.
As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
it will trigger only on what would've triggered EPOLLWRBAND on other
architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
at all on sparc. With this patch they should work consistently on all
architectures"
* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
make kernel-side POLL... arch-independent
eventpoll: no need to mask the result of epi_item_poll() again
eventpoll: constify struct epoll_event pointers
debugging printk in sg_poll() uses %x to print POLL... bitmap
annotate poll(2) guts
9p: untangle ->poll() mess
->si_band gets POLL... bitmap stored into a user-visible long field
ring_buffer_poll_wait() return value used as return value of ->poll()
the rest of drivers/*: annotate ->poll() instances
media: annotate ->poll() instances
fs: annotate ->poll() instances
ipc, kernel, mm: annotate ->poll() instances
net: annotate ->poll() instances
apparmor: annotate ->poll() instances
tomoyo: annotate ->poll() instances
sound: annotate ->poll() instances
acpi: annotate ->poll() instances
crypto: annotate ->poll() instances
block: annotate ->poll() instances
x86: annotate ->poll() instances
...
Hyper-V supports Live Migration notification. This is supposed to be used
in conjunction with TSC emulation: when a VM is migrated to a host with
different TSC frequency for some short period the host emulates the
accesses to TSC and sends an interrupt to notify about the event. When the
guest is done updating everything it can disable TSC emulation and
everything will start working fast again.
These notifications weren't required until now as Hyper-V guests are not
supposed to use TSC as a clocksource: in Linux the TSC is even marked as
unstable on boot. Guests normally use 'tsc page' clocksource and host
updates its values on migrations automatically.
Things change when with nested virtualization: even when the PV
clocksources (kvm-clock or tsc page) are passed through to the nested
guests the TSC frequency and frequency changes need to be know..
Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies
EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The
right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that
the fix in on the way.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
Pull siginfo cleanups from Eric Biederman:
"Long ago when 2.4 was just a testing release copy_siginfo_to_user was
made to copy individual fields to userspace, possibly for efficiency
and to ensure initialized values were not copied to userspace.
Unfortunately the design was complex, it's assumptions unstated, and
humans are fallible and so while it worked much of the time that
design failed to ensure unitialized memory is not copied to userspace.
This set of changes is part of a new design to clean up siginfo and
simplify things, and hopefully make the siginfo handling robust enough
that a simple inspection of the code can be made to ensure we don't
copy any unitializied fields to userspace.
The design is to unify struct siginfo and struct compat_siginfo into a
single definition that is shared between all architectures so that
anyone adding to the set of information shared with struct siginfo can
see the whole picture. Hopefully ensuring all future si_code
assignments are arch independent.
The design is to unify copy_siginfo_to_user32 and
copy_siginfo_from_user32 so that those function are complete and cope
with all of the different cases documented in signinfo_layout. I don't
think there was a single implementation of either of those functions
that was complete and correct before my changes unified them.
The design is to introduce a series of helpers including
force_siginfo_fault that take the values that are needed in struct
siginfo and build the siginfo structure for their callers. Ensuring
struct siginfo is built correctly.
The remaining work for 4.17 (unless someone thinks it is post -rc1
material) is to push usage of those helpers down into the
architectures so that architecture specific code will not need to deal
with the fiddly work of intializing struct siginfo, and then when
struct siginfo is guaranteed to be fully initialized change copy
siginfo_to_user into a simple wrapper around copy_to_user.
Further there is work in progress on the issues that have been
documented requires arch specific knowledge to sort out.
The changes below fix or at least document all of the issues that have
been found with siginfo generation. Then proceed to unify struct
siginfo the 32 bit helpers that copy siginfo to and from userspace,
and generally clean up anything that is not arch specific with regards
to siginfo generation.
It is a lot but with the unification you can of siginfo you can
already see the code reduction in the kernel"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits)
signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
mm/memory_failure: Remove unused trapno from memory_failure
signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap
signal: Helpers for faults with specialized siginfo layouts
signal: Add send_sig_fault and force_sig_fault
signal: Replace memset(info,...) with clear_siginfo for clarity
signal: Don't use structure initializers for struct siginfo
signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered
ptrace: Use copy_siginfo in setsiginfo and getsiginfo
signal: Unify and correct copy_siginfo_to_user32
signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
signal: Unify and correct copy_siginfo_from_user32
signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
signal: Move addr_lsb into the _sigfault union for clarity
...
Despite the fact that all the other code there seems to be doing it, just
using set_cpu_cap() in early_intel_init() doesn't actually work.
For CPUs with PKU support, setup_pku() calls get_cpu_cap() after
c->c_init() has set those feature bits. That resets those bits back to what
was queried from the hardware.
Turning the bits off for bad microcode is easy to fix. That can just use
setup_clear_cpu_cap() to force them off for all CPUs.
I was less keen on forcing the feature bits *on* that way, just in case
of inconsistencies. I appreciate that the kernel is going to get this
utterly wrong if CPU features are not consistent, because it has already
applied alternatives by the time secondary CPUs are brought up.
But at least if setup_force_cpu_cap() isn't being used, we might have a
chance of *detecting* the lack of the corresponding bit and either
panicking or refusing to bring the offending CPU online.
So ensure that the appropriate feature bits are set within get_cpu_cap()
regardless of how many extra times it's called.
Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: karahmed@amazon.de
Cc: peterz@infradead.org
Cc: bp@alien8.de
Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.uk
Pull x86 RAS updates from Ingo Molnar:
- various AMD SMCA error parsing/reporting improvements (Yazen Ghannam)
- extend Intel CMCI error reporting to more cases (Xie XiuQi)
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE: Make correctable error detection look at the Deferred bit
x86/MCE: Report only DRAM ECC as memory errors on AMD systems
x86/MCE/AMD: Define a function to get SMCA bank type
x86/mce/AMD: Don't set DEF_INT_TYPE in MSR_CU_DEF_ERR on SMCA systems
x86/MCE: Extend table to report action optional errors through CMCI too
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJabj6pAAoJEHm+PkMAQRiGs8cIAJQFkCWnbz86e3vG4DuWhyA8
CMGHCQdUOxxFGa/ixhIiuetbC0x+JVHAjV2FwVYbAQfaZB3pfw2iR1ncQxpAP1AI
oLU9vBEqTmwKMPc9CM5rRfnLFWpGcGwUNzgPdxD5yYqGDtcM8K840mF6NdkYe5AN
xU8rv1wlcFPF4A5pvHCH0pvVmK4VxlVFk/2H67TFdxBs4PyJOnSBnf+bcGWgsKO6
hC8XIVtcKCH2GfFxt5d0Vgc5QXJEpX1zn2mtCa1MwYRjN2plgYfD84ha0xE7J0B0
oqV/wnjKXDsmrgVpncr3txd4+zKJFNkdNRE4eLAIupHo2XHTG4HvDJ5dBY2NhGU=
=sOml
-----END PGP SIGNATURE-----
Merge tag 'v4.15' into x86/pti, to be able to merge dependent changes
Time has come to switch PTI development over to a v4.15 base - we'll still
try to make sure that all PTI fixes backport cleanly to v4.14 and earlier.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86/pti updates from Thomas Gleixner:
"Another set of melted spectrum related changes:
- Code simplifications and cleanups for RSB and retpolines.
- Make the indirect calls in KVM speculation safe.
- Whitelist CPUs which are known not to speculate from Meltdown and
prepare for the new CPUID flag which tells the kernel that a CPU is
not affected.
- A less rigorous variant of the module retpoline check which merily
warns when a non-retpoline protected module is loaded and reflects
that fact in the sysfs file.
- Prepare for Indirect Branch Prediction Barrier support.
- Prepare for exposure of the Speculation Control MSRs to guests, so
guest OSes which depend on those "features" can use them. Includes
a blacklist of the broken microcodes. The actual exposure of the
MSRs through KVM is still being worked on"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Simplify indirect_branch_prediction_barrier()
x86/retpoline: Simplify vmexit_fill_RSB()
x86/cpufeatures: Clean up Spectre v2 related CPUID flags
x86/cpu/bugs: Make retpoline module warning conditional
x86/bugs: Drop one "mitigation" from dmesg
x86/nospec: Fix header guards names
x86/alternative: Print unadorned pointers
x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
x86/msr: Add definitions for new speculation control MSRs
x86/cpufeatures: Add AMD feature bits for Speculation Control
x86/cpufeatures: Add Intel feature bits for Speculation Control
x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
module/retpoline: Warn about missing retpoline in module
KVM: VMX: Make indirect call speculation safe
KVM: x86: Make indirect calls in emulator speculation safe
Pull x86 timer updates from Thomas Gleixner:
"A small set of updates for x86 specific timers:
- Mark TSC invariant on a subset of Centaur CPUs
- Allow TSC calibration without PIT on mobile platforms which lack
legacy devices"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/centaur: Mark TSC invariant
x86/tsc: Introduce early tsc clocksource
x86/time: Unconditionally register legacy timer interrupt
x86/tsc: Allow TSC calibration without PIT
Pull x86 platform updates from Thomas Gleixner:
"The platform support for x86 contains the following updates:
- A set of updates for the UV platform to support new CPUs and to fix
some of the UV4A BAU MRRs
- The initial platform support for the jailhouse hypervisor to allow
native Linux guests (inmates) in non-root cells.
- A fix for the PCI initialization on Intel MID platforms"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/jailhouse: Respect pci=lastbus command line settings
x86/jailhouse: Set X86_FEATURE_TSC_KNOWN_FREQ
x86/platform/intel-mid: Move PCI initialization to arch_init()
x86/platform/uv/BAU: Replace hard-coded values with MMR definitions
x86/platform/UV: Fix UV4A BAU MMRs
x86/platform/UV: Fix GAM MMR references in the UV x2apic code
x86/platform/UV: Fix GAM MMR changes in UV4A
x86/platform/UV: Add references to access fixed UV4A HUB MMRs
x86/platform/UV: Fix UV4A support on new Intel Processors
x86/platform/UV: Update uv_mmrs.h to prepare for UV4A fixes
x86/jailhouse: Add PCI dependency
x86/jailhouse: Hide x2apic code when CONFIG_X86_X2APIC=n
x86/jailhouse: Initialize PCI support
x86/jailhouse: Wire up IOAPIC for legacy UART ports
x86/jailhouse: Halt instead of failing to restart
x86/jailhouse: Silence ACPI warning
x86/jailhouse: Avoid access of unsupported platform resources
x86/jailhouse: Set up timekeeping
x86/jailhouse: Enable PMTIMER
x86/jailhouse: Enable APIC and SMP support
...
Pull x86/cache updates from Thomas Gleixner:
"A set of patches which add support for L2 cache partitioning to the
Intel RDT facility"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Add command line parameter to control L2_CDP
x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG
x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)
x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature
x86/intel_rdt: Add L2CDP support in documentation
x86/intel_rdt: Update documentation
We want to expose the hardware features simply in /proc/cpuinfo as "ibrs",
"ibpb" and "stibp". Since AMD has separate CPUID bits for those, use them
as the user-visible bits.
When the Intel SPEC_CTRL bit is set which indicates both IBRS and IBPB
capability, set those (AMD) bits accordingly. Likewise if the Intel STIBP
bit is set, set the AMD STIBP that's used for the generic hardware
capability.
Hide the rest from /proc/cpuinfo by putting "" in the comments. Including
RETPOLINE and RETPOLINE_AMD which shouldn't be visible there. There are
patches to make the sysfs vulnerabilities information non-readable by
non-root, and the same should apply to all information about which
mitigations are actually in use. Those *shouldn't* appear in /proc/cpuinfo.
The feature bit for whether IBPB is actually used, which is needed for
ALTERNATIVEs, is renamed to X86_FEATURE_USE_IBPB.
Originally-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-2-git-send-email-dwmw@amazon.co.uk
If sysfs is disabled and RETPOLINE not defined:
arch/x86/kernel/cpu/bugs.c:97:13: warning: ‘spectre_v2_bad_module’ defined but not used
[-Wunused-variable]
static bool spectre_v2_bad_module;
Hide it.
Fixes: caf7501a1b ("module/retpoline: Warn about missing retpoline in module")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
There's a risk that a kernel which has full retpoline mitigations becomes
vulnerable when a module gets loaded that hasn't been compiled with the
right compiler or the right option.
To enable detection of that mismatch at module load time, add a module info
string "retpoline" at build time when the module was compiled with
retpoline support. This only covers compiled C source, but assembler source
or prebuilt object files are not checked.
If a retpoline enabled kernel detects a non retpoline protected module at
load time, print a warning and report it in the sysfs vulnerability file.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: gregkh@linuxfoundation.org
Cc: torvalds@linux-foundation.org
Cc: jeyu@kernel.org
Cc: arjan@linux.intel.com
Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org
Centaur CPU has a constant frequency TSC and that TSC does not stop in
C-States. But because the corresponding TSC feature flags are not set for
that CPU, the TSC is treated as not constant frequency and assumed to stop
in C-States, which makes it an unreliable and unusable clock source.
Setting those flags tells the kernel that the TSC is usable, so it will
select it over HPET. The effect of this is that reading time stamps (from
kernel or user space) will be faster and more efficent.
Signed-off-by: davidwang <davidwang@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: qiyuanwang@zhaoxin.com
Cc: linux-pm@vger.kernel.org
Cc: brucechang@via-alliance.com
Cc: cooperyan@zhaoxin.com
Cc: benjaminpan@viatech.com
Link: https://lkml.kernel.org/r/1516616057-5158-1-git-send-email-davidwang@zhaoxin.com
Commit 24c2503255 ("x86/microcode: Do not access the initrd after it has
been freed") fixed attempts to access initrd from the microcode loader
after it has been freed. However, a similar KASAN warning was reported
(stack trace edited):
smpboot: Booting Node 0 Processor 1 APIC 0x11
==================================================================
BUG: KASAN: use-after-free in find_cpio_data+0x9b5/0xa50
Read of size 1 at addr ffff880035ffd000 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.8-slack #7
Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 3003 03/10/2016
Call Trace:
dump_stack
print_address_description
kasan_report
? find_cpio_data
__asan_report_load1_noabort
find_cpio_data
find_microcode_in_initrd
__load_ucode_amd
load_ucode_amd_ap
load_ucode_ap
After some investigation, it turned out that a merge was done using the
wrong side to resolve, leading to picking up the previous state, before
the 24c2503255 fix. Therefore the Fixes tag below contains a merge
commit.
Revert the mismerge by catching the save_microcode_in_initrd_amd()
retval and thus letting the function exit with the last return statement
so that initrd_gone can be set to true.
Fixes: f26483eaed ("Merge branch 'x86/urgent' into x86/microcode, to resolve conflicts")
Reported-by: <higuita@gmx.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198295
Link: https://lkml.kernel.org/r/20180123104133.918-2-bp@alien8.de
Commit b94b737331 ("x86/microcode/intel: Extend BDW late-loading with a
revision check") reduced the impact of erratum BDF90 for Broadwell model
79.
The impact can be reduced further by checking the size of the last level
cache portion per core.
Tony: "The erratum says the problem only occurs on the large-cache SKUs.
So we only need to avoid the update if we are on a big cache SKU that is
also running old microcode."
For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.
Fixes: b94b737331 ("x86/microcode/intel: Extend BDW late-loading with a revision check")
Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1516321542-31161-1-git-send-email-zhang.jia@linux.alibaba.com
Today 4 architectures set ARCH_SUPPORTS_MEMORY_FAILURE (arm64, parisc,
powerpc, and x86), while 4 other architectures set __ARCH_SI_TRAPNO
(alpha, metag, sparc, and tile). These two sets of architectures do
not interesect so remove the trapno paramater to remove confusion.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull x86 pti fixes from Thomas Gleixner:
"A small set of fixes for the meltdown/spectre mitigations:
- Make kprobes aware of retpolines to prevent probes in the retpoline
thunks.
- Make the machine check exception speculation protected. MCE used to
issue an indirect call directly from the ASM entry code. Convert
that to a direct call into a C-function and issue the indirect call
from there so the compiler can add the retpoline protection,
- Make the vmexit_fill_RSB() assembly less stupid
- Fix a typo in the PTI documentation"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
x86/pti: Document fix wrong index
kprobes/x86: Disable optimizing on the function jumps to indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
retpoline: Introduce start/end markers of indirect thunk
x86/mce: Make machine check speculation protected
The machine check idtentry uses an indirect branch directly from the low
level code. This evades the speculation protection.
Replace it by a direct call into C code and issue the indirect call there
so the compiler can apply the proper speculation protection.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by:Borislav Petkov <bp@alien8.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Niced-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
L2 CDP can be controlled by kernel parameter "rdt=".
If "rdt=l2cdp", L2 CDP is turned on.
If "rdt=!l2cdp", L2 CDP is turned off.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-7-git-send-email-fenghua.yu@intel.com
Bit 0 in MSR IA32_L2_QOS_CFG (0xc82) is L2 CDP enable bit. By default,
the bit is zero, i.e. L2 CAT is enabled, and L2 CDP is disabled. When
the resctrl mount parameter "cdpl2" is given, the bit is set to 1 and L2
CDP is enabled.
In L2 CDP mode, the L2 CAT mask MSRs are re-mapped into interleaved pairs
of mask MSRs for code (referenced by an odd CLOSID) and data (referenced by
an even CLOSID).
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-6-git-send-email-fenghua.yu@intel.com
L2 data and L2 code are added as new resources in rdt_resources_all[]
and data in the resources are configured.
When L2 CDP is enabled, the schemata will have the two resources in
this format:
L2DATA:l2id0=xxxx;l2id1=xxxx;....
L2CODE:l2id0=xxxx;l2id1=xxxx;....
xxxx represent CBM (Cache Bit Mask) values in the schemata, similar to all
others (L2 CAT/L3 CAT/L3 CDP).
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-5-git-send-email-fenghua.yu@intel.com
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- A rather involved set of memory hardware encryption fixes to
support the early loading of microcode files via the initrd. These
are larger than what we normally take at such a late -rc stage, but
there are two mitigating factors: 1) much of the changes are
limited to the SME code itself 2) being able to early load
microcode has increased importance in the post-Meltdown/Spectre
era.
- An IRQ vector allocator fix
- An Intel RDT driver use-after-free fix
- An APIC driver bug fix/revert to make certain older systems boot
again
- A pkeys ABI fix
- TSC calibration fixes
- A kdump fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/vector: Fix off by one in error path
x86/intel_rdt/cqm: Prevent use after free
x86/mm: Encrypt the initrd earlier for BSP microcode update
x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
x86/mm: Centralize PMD flags in sme_encrypt_kernel()
x86/mm: Use a struct to reduce parameters for SME PGD mapping
x86/mm: Clean up register saving in the __enc_copy() assembly code
x86/idt: Mark IDT tables __initconst
Revert "x86/apic: Remove init_bsp_APIC()"
x86/mm/pkeys: Fix fill_sig_info_pkey
x86/tsc: Print tsc_khz, when it differs from cpu_khz
x86/tsc: Fix erroneous TSC rate on Skylake Xeon
x86/tsc: Future-proof native_calibrate_tsc()
kdump: Write the correct address of mem_section into vmcoreinfo
Pull x86 pti bits and fixes from Thomas Gleixner:
"This last update contains:
- An objtool fix to prevent a segfault with the gold linker by
changing the invocation order. That's not just for gold, it's a
general robustness improvement.
- An improved error message for objtool which spares tearing hairs.
- Make KASAN fail loudly if there is not enough memory instead of
oopsing at some random place later
- RSB fill on context switch to prevent RSB underflow and speculation
through other units.
- Make the retpoline/RSB functionality work reliably for both Intel
and AMD
- Add retpoline to the module version magic so mismatch can be
detected
- A small (non-fix) update for cpufeatures which prevents cpu feature
clashing for the upcoming extra mitigation bits to ease
backporting"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
module: Add retpoline tag to VERMAGIC
x86/cpufeature: Move processor tracing out of scattered features
objtool: Improve error message for bad file argument
objtool: Fix seg fault with gold linker
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
x86/retpoline: Fill RSB on context switch for affected CPUs
x86/kasan: Panic if there is not enough memory to boot
intel_rdt_iffline_cpu() -> domain_remove_cpu() frees memory first and then
proceeds accessing it.
BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
find_first_bit+0x1f/0x80
has_busy_rmid+0x47/0x70
intel_rdt_offline_cpu+0x4b4/0x510
Freed by task 195:
kfree+0x94/0x1a0
intel_rdt_offline_cpu+0x17d/0x510
Do the teardown first and then free memory.
Fixes: 24247aeeab ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "Roderick W. Smith" <rod.smith@canonical.com>
Cc: 1733662@bugs.launchpad.net
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos
Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
so do not duplicate it in the scattered features word.
Besides being more tidy, this will be useful for KVM when it presents
processor tracing to the guests. KVM selects host features that are
supported by both the host kernel (depending on command line options,
CPU errata, or whatever) and KVM. Whenever a full feature word exists,
KVM's code is written in the expectation that the CPUID bit number
matches the X86_FEATURE_* bit number, but this is not the case for
X86_FEATURE_INTEL_PT.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1516117345-34561-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This part of Secure Encrypted Virtualization (SEV) patch series focuses on KVM
changes required to create and manage SEV guests.
SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machine (VMs) under the control of a hypervisor. Encrypted VMs have their
pages (code and data) secured such that only the guest itself has access to
unencrypted version. Each encrypted VM is associated with a unique encryption key;
if its data is accessed to a different entity using a different key the encrypted
guest's data will be incorrectly decrypted, leading to unintelligible data.
This security model ensures that hypervisor will no longer able to inspect or
alter any guest code or data.
The key management of this feature is handled by a separate processor known as
the AMD Secure Processor (AMD-SP) which is present on AMD SOCs. The SEV Key
Management Specification (see below) provides a set of commands which can be
used by hypervisor to load virtual machine keys through the AMD-SP driver.
The patch series adds a new ioctl in KVM driver (KVM_MEMORY_ENCRYPT_OP). The
ioctl will be used by qemu to issue SEV guest-specific commands defined in Key
Management Specification.
The following links provide additional details:
AMD Memory Encryption white paper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34
SEV Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf
KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
SEV Guest BIOS support:
SEV support has been add to EDKII/OVMF BIOS
https://github.com/tianocore/edk2
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On context switch from a shallow call stack to a deeper one, as the CPU
does 'ret' up the deeper side it may encounter RSB entries (predictions for
where the 'ret' goes to) which were populated in userspace.
This is problematic if neither SMEP nor KPTI (the latter of which marks
userspace pages as NX for the kernel) are active, as malicious code in
userspace may then be executed speculatively.
Overwrite the CPU's return prediction stack with calls which are predicted
to return to an infinite loop, to "capture" speculation if this
happens. This is required both for retpoline, and also in conjunction with
IBRS for !SMEP && !KPTI.
On Skylake+ the problem is slightly different, and an *underflow* of the
RSB may cause errant branch predictions to occur. So there it's not so much
overwrite, as *filling* the RSB to attempt to prevent it getting
empty. This is only a partial solution for Skylake+ since there are many
other conditions which may result in the RSB becoming empty. The full
solution on Skylake+ is to use IBRS, which will prevent the problem even
when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
required on context switch.
[ tglx: Added missing vendor check and slighty massaged comments and
changelog ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
The Jailhouse hypervisor is able to statically partition a multicore
system into multiple so-called cells. Linux is used as boot loader and
continues to run in the root cell after Jailhouse is enabled. Linux can
also run in non-root cells.
Jailhouse does not emulate usual x86 devices. It also provides no
complex ACPI but basic platform information that the boot loader
forwards via setup data. This adds the infrastructure to detect when
running in a non-root cell so that the platform can be configured as
required in succeeding steps.
Support is limited to x86-64 so far, primarily because no boot loader
stub exists for i386 and, thus, we wouldn't be able to test the 32-bit
path.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/7f823d077b38b1a70c526b40b403f85688c137d3.1511770314.git.jan.kiszka@siemens.com
Pull x86 pti updates from Thomas Gleixner:
"This contains:
- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.
- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.
- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared
- removal of dead code in the PTI pagetable setup functions
- add PTI documentation
- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.
- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.
- the initial spectre_v2 mitigations, aka retpoline:
+ The necessary ASM thunk and compiler support
+ The ASM variants of retpoline and the conversion of affected ASM
code
+ Make LFENCE serializing on AMD so it can be used as speculation
trap
+ The RSB fill after vmexit
- initial objtool support for retpoline
As I said in the status mail this is the most of the set of patches
which should go into 4.15 except two straight forward patches still on
hold:
- the retpoline add on of LFENCE which waits for ACKs
- the RSB fill after context switch
Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...
Add a spectre_v2= option to select the mitigation used for the indirect
branch speculation vulnerability.
Currently, the only option available is retpoline, in its various forms.
This will be expanded to cover the new IBRS/IBPB microcode features.
The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
serializing instruction, which is indicated by the LFENCE_RDTSC feature.
[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
integration becomes simple ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.
This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.
On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.
Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.
[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference
to MFENCE_RDTSC. However, since the kernel could be running under a
hypervisor that does not support writing that MSR, read the MSR back and
verify that the bit has been set successfully. If the MSR can be read
and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the
MFENCE_RDTSC feature.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdoffice.net
To aid in speculation control, make LFENCE a serializing instruction
since it has less overhead than MFENCE. This is done by setting bit 1
of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not
have this MSR. For these families, the LFENCE instruction is already
serializing.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdoffice.net
Implement the CPU vulnerabilty show functions for meltdown, spectre_v1 and
spectre_v2.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de
Add the bug bits for spectre v1/2 and force them unconditionally for all
cpus.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1515239374-23361-2-git-send-email-dwmw@amazon.co.uk
Instead of blacklisting all model 79 CPUs when attempting a late
microcode loading, limit that only to CPUs with microcode revisions <
0x0b000021 because only on those late loading may cause a system hang.
For such processors either:
a) a BIOS update which might contain a newer microcode revision
or
b) the early microcode loading method
should be considered.
Processors with revisions 0x0b000021 or higher will not experience such
hangs.
For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.
[ bp: Heavily massage commit message and pr_* statements. ]
Fixes: 723f2828a9 ("x86/microcode/intel: Disable late loading on model 79")
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org> # v4.14
Link: http://lkml.kernel.org/r/1514772287-92959-1-git-send-email-qianyue.zj@alibaba-inc.com
Pull more x86 pti fixes from Thomas Gleixner:
"Another small stash of fixes for fallout from the PTI work:
- Fix the modules vs. KASAN breakage which was caused by making
MODULES_END depend of the fixmap size. That was done when the cpu
entry area moved into the fixmap, but now that we have a separate
map space for that this is causing more issues than it solves.
- Use the proper cache flush methods for the debugstore buffers as
they are mapped/unmapped during runtime and not statically mapped
at boot time like the rest of the cpu entry area.
- Make the map layout of the cpu_entry_area consistent for 4 and 5
level paging and fix the KASLR vaddr_end wreckage.
- Use PER_CPU_EXPORT for per cpu variable and while at it unbreak
nvidia gfx drivers by dropping the GPL export. The subject line of
the commit tells it the other way around, but I noticed that too
late.
- Fix the ASM alternative macros so they can be used in the middle of
an inline asm block.
- Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack
vector is properly identified. The Spectre mitigations will come
with their own bug bits later"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
x86/tlb: Drop the _GPL from the cpu_tlbstate export
x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
x86/kaslr: Fix the vaddr_end mess
x86/mm: Map cpu_entry_area at the same place on 4/5 level
x86/mm: Set MODULES_END to 0xffffffffff000000
Use the name associated with the particular attack which needs page table
isolation for mitigation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jiri Koshina <jikos@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Greg KH <gregkh@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
print_symbol() is a very old API that has been obsoleted by %pS format
specifier in a normal printk() call.
Replace print_symbol() with a direct printk("%pS") call and correctly
handle continuous lines.
Link: http://lkml.kernel.org/r/20171211125025.2270-9-sergey.senozhatsky@gmail.com
To: Andrew Morton <akpm@linux-foundation.org>
To: Russell King <linux@armlinux.org.uk>
To: Catalin Marinas <catalin.marinas@arm.com>
To: Mark Salter <msalter@redhat.com>
To: Tony Luck <tony.luck@intel.com>
To: David Howells <dhowells@redhat.com>
To: Yoshinori Sato <ysato@users.sourceforge.jp>
To: Guan Xuetao <gxt@mprc.pku.edu.cn>
To: Borislav Petkov <bp@alien8.de>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Thomas Gleixner <tglx@linutronix.de>
To: Peter Zijlstra <peterz@infradead.org>
To: Vineet Gupta <vgupta@synopsys.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-sh@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Borislav Petkov <bp@suse.de> # mce.c part
[pmladek@suse.com: updated commit message]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Pull x86 page table isolation fixes from Thomas Gleixner:
"A couple of urgent fixes for PTI:
- Fix a PTE mismatch between user and kernel visible mapping of the
cpu entry area (differs vs. the GLB bit) and causes a TLB mismatch
MCE on older AMD K8 machines
- Fix the misplaced CR3 switch in the SYSCALL compat entry code which
causes access to unmapped kernel memory resulting in double faults.
- Fix the section mismatch of the cpu_tss_rw percpu storage caused by
using a different mechanism for declaration and definition.
- Two fixes for dumpstack which help to decode entry stack issues
better
- Enable PTI by default in Kconfig. We should have done that earlier,
but it slipped through the cracks.
- Exclude AMD from the PTI enforcement. Not necessarily a fix, but if
AMD is so confident that they are not affected, then we should not
burden users with the overhead"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/process: Define cpu_tss_rw in same section as declaration
x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()
x86/dumpstack: Print registers for first stack frame
x86/dumpstack: Fix partial register dumps
x86/pti: Make sure the user/kernel PTEs match
x86/cpu, x86/pti: Do not enable PTI on AMD processors
x86/pti: Enable PTI by default
AMD processors are not subject to the types of attacks that the kernel
page table isolation feature protects against. The AMD microarchitecture
does not allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode
when that access would result in a page fault.
Disable page table isolation by default on AMD processors by not setting
the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI
is set.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171227054354.20369.94587.stgit@tlendack-t1.amdoffice.net
Pull x86 page table isolation updates from Thomas Gleixner:
"This is the final set of enabling page table isolation on x86:
- Infrastructure patches for handling the extra page tables.
- Patches which map the various bits and pieces which are required to
get in and out of user space into the user space visible page
tables.
- The required changes to have CR3 switching in the entry/exit code.
- Optimizations for the CR3 switching along with documentation how
the ASID/PCID mechanism works.
- Updates to dump pagetables to cover the user space page tables for
W+X scans and extra debugfs files to analyze both the kernel and
the user space visible page tables
The whole functionality is compile time controlled via a config switch
and can be turned on/off on the command line as well"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
x86/ldt: Make the LDT mapping RO
x86/mm/dump_pagetables: Allow dumping current pagetables
x86/mm/dump_pagetables: Check user space page table for WX pages
x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
x86/mm/pti: Add Kconfig
x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
x86/mm: Use INVPCID for __native_flush_tlb_single()
x86/mm: Optimize RESTORE_CR3
x86/mm: Use/Fix PCID to optimize user/kernel switches
x86/mm: Abstract switching CR3
x86/mm: Allow flushing for future ASID switches
x86/pti: Map the vsyscall page if needed
x86/pti: Put the LDT in its own PGD if PTI is on
x86/mm/64: Make a full PGD-entry size hole in the memory map
x86/events/intel/ds: Map debug buffers in cpu_entry_area
x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
x86/mm/pti: Map ESPFIX into user space
x86/mm/pti: Share entry text PMD
x86/entry: Align entry text section to PMD boundary
...
Force the entry through the trampoline only when PTI is active. Otherwise
go through the normal entry code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many x86 CPUs leak information to user space due to missing isolation of
user space and kernel space page tables. There are many well documented
ways to exploit that.
The upcoming software migitation of isolating the user and kernel space
page tables needs a misfeature flag so code can be made runtime
conditional.
Add the BUG bits which indicates that the CPU is affected and add a feature
bit which indicates that the software migitation is enabled.
Assume for now that _ALL_ x86 CPUs are affected by this. Exceptions can be
made later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 PTI preparatory patches from Thomas Gleixner:
"Todays Advent calendar window contains twentyfour easy to digest
patches. The original plan was to have twenty three matching the date,
but a late fixup made that moot.
- Move the cpu_entry_area mapping out of the fixmap into a separate
address space. That's necessary because the fixmap becomes too big
with NRCPUS=8192 and this caused already subtle and hard to
diagnose failures.
The top most patch is fresh from today and cures a brain slip of
that tall grumpy german greybeard, who ignored the intricacies of
32bit wraparounds.
- Limit the number of CPUs on 32bit to 64. That's insane big already,
but at least it's small enough to prevent address space issues with
the cpu_entry_area map, which have been observed and debugged with
the fixmap code
- A few TLB flush fixes in various places plus documentation which of
the TLB functions should be used for what.
- Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
more than sysenter now and keeping the name makes backtraces
confusing.
- Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
which is only invoked on fork().
- Make vysycall more robust.
- A few fixes and cleanups of the debug_pagetables code. Check
PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
C89 initialization of the address hint array which already was out
of sync with the index enums.
- Move the ESPFIX init to a different place to prepare for PTI.
- Several code moves with no functional change to make PTI
integration simpler and header files less convoluted.
- Documentation fixes and clarifications"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
init: Invoke init_espfix_bsp() from mm_init()
x86/cpu_entry_area: Move it out of the fixmap
x86/cpu_entry_area: Move it to a separate unit
x86/mm: Create asm/invpcid.h
x86/mm: Put MMU to hardware ASID translation in one place
x86/mm: Remove hard-coded ASID limit checks
x86/mm: Move the CR3 construction functions to tlbflush.h
x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
x86/mm: Remove superfluous barriers
x86/mm: Use __flush_tlb_one() for kernel memory
x86/microcode: Dont abuse the TLB-flush interface
x86/uv: Use the right TLB-flush API
x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
x86/mm/64: Improve the memory map documentation
x86/ldt: Prevent LDT inheritance on exec
x86/ldt: Rework locking
arch, mm: Allow arch_dup_mmap() to fail
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
...
Separate the cpu_entry_area code out of cpu/common.c and the fixmap.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
ec400ddeff ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")
... grubbed into tlbflush internals without coherent explanation.
Since it says its a precaution and the SDM doesn't mention anything like
this, take it out back.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: fenghua.yu@intel.com
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the kernel oopses while on the trampoline stack, it will print
"<SYSENTER>" even if SYSENTER is not involved. That is rather confusing.
The "SYSENTER" stack is used for a lot more than SYSENTER now. Give it a
better string to display in stack dumps, and rename the kernel code to
match.
Also move the 32-bit code over to the new naming even though it still uses
the entry stack only for SYSENTER.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 syscall entry code changes for PTI from Ingo Molnar:
"The main changes here are Andy Lutomirski's changes to switch the
x86-64 entry code to use the 'per CPU entry trampoline stack'. This,
besides helping fix KASLR leaks (the pending Page Table Isolation
(PTI) work), also robustifies the x86 entry code"
* 'WIP.x86-pti.entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
x86/cpufeatures: Make CPU bugs sticky
x86/paravirt: Provide a way to check for hypervisors
x86/paravirt: Dont patch flush_tlb_single
x86/entry/64: Make cpu_entry_area.tss read-only
x86/entry: Clean up the SYSENTER_stack code
x86/entry/64: Remove the SYSENTER stack canary
x86/entry/64: Move the IST stacks into struct cpu_entry_area
x86/entry/64: Create a per-CPU SYSCALL entry trampoline
x86/entry/64: Return to userspace from the trampoline stack
x86/entry/64: Use a per-CPU trampoline stack for IDT entries
x86/espfix/64: Stop assuming that pt_regs is on the entry stack
x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0
x86/entry: Remap the TSS into the CPU entry area
x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct
x86/dumpstack: Handle stack overflow on all stacks
x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
x86/kasan/64: Teach KASAN about the cpu_entry_area
x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce struct cpu_entry_area
x86/entry/gdt: Put per-CPU GDT remaps in ascending order
x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
...
AMD systems may log Deferred errors. These are errors that are uncorrected
but which do not need immediate action. The MCA_STATUS[UC] bit may not be
set for Deferred errors.
Flag the error as not correctable when MCA_STATUS[Deferred] is set and
do not feed it into the Correctable Errors Collector.
[ bp: Massage commit message. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171212165143.27475-1-Yazen.Ghannam@amd.com
The MCA_STATUS[ErrorCodeExt] field is very bank type specific.
We currently check if the ErrorCodeExt value is 0x0 or 0x8 in
mce_is_memory_error(), but we don't check the bank number. This means
that we could flag non-memory errors as memory errors.
We know that we want to flag DRAM ECC errors as memory errors, so let's do
those cases first. We can add more cases later when needed.
Define a wrapper function in mce_amd.c so we can use SMCA enums.
[ bp: Remove brackets around return statements. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171207203955.118171-2-Yazen.Ghannam@amd.com
Scalable MCA systems have various types of banks. The bank's type
can determine how we handle errors from it. For example, if a bank
represents a UMC (Unified Memory Controller) then we will need to
convert its address from a normalized address to a system physical
address before handling the error.
[ bp: Verify m->bank is within range and use bank pointer. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171207203955.118171-1-Yazen.Ghannam@amd.com
There is currently no way to force CPU bug bits like CPU feature bits. That
makes it impossible to set a bug bit once at boot and have it stick for all
upcoming CPUs.
Extend the force set/clear arrays to handle bug bits as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.992156574@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The TSS is a fairly juicy target for exploits, and, now that the TSS
is in the cpu_entry_area, it's no longer protected by kASLR. Make it
read-only on x86_64.
On x86_32, it can't be RO because it's written by the CPU during task
switches, and we use a task gate for double faults. I'd also be
nervous about errata if we tried to make it RO even on configurations
without double fault handling.
[ tglx: AMD confirmed that there is no problem on 64-bit with TSS RO. So
it's probably safe to assume that it's a non issue, though Intel
might have been creative in that area. Still waiting for
confirmation. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.733700132@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The existing code was a mess, mainly because C arrays are nasty. Turn
SYSENTER_stack into a struct, add a helper to find it, and do all the
obvious cleanups this enables.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.653244723@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The IST stacks are needed when an IST exception occurs and are accessed
before any kernel code at all runs. Move them into struct cpu_entry_area.
The IST stacks are unlike the rest of cpu_entry_area: they're used even for
entries from kernel mode. This means that they should be set up before we
load the final IDT. Move cpu_entry_area setup to trap_init() for the boot
CPU and set it up for all possible CPUs at once in native_smp_prepare_cpus().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.480598743@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Handling SYSCALL is tricky: the SYSCALL handler is entered with every
single register (except FLAGS), including RSP, live. It somehow needs
to set RSP to point to a valid stack, which means it needs to save the
user RSP somewhere and find its own stack pointer. The canonical way
to do this is with SWAPGS, which lets us access percpu data using the
%gs prefix.
With PAGE_TABLE_ISOLATION-like pagetable switching, this is
problematic. Without a scratch register, switching CR3 is impossible, so
%gs-based percpu memory would need to be mapped in the user pagetables.
Doing that without information leaks is difficult or impossible.
Instead, use a different sneaky trick. Map a copy of the first part
of the SYSCALL asm at a different address for each CPU. Now RIP
varies depending on the CPU, so we can use RIP-relative memory access
to access percpu memory. By putting the relevant information (one
scratch slot and the stack address) at a constant offset relative to
RIP, we can make SYSCALL work without relying on %gs.
A nice thing about this approach is that we can easily switch it on
and off if we want pagetable switching to be configurable.
The compat variant of SYSCALL doesn't have this problem in the first
place -- there are plenty of scratch registers, since we don't care
about preserving r8-r15. This patch therefore doesn't touch SYSCALL32
at all.
This patch actually seems to be a small speedup. With this patch,
SYSCALL touches an extra cache line and an extra virtual page, but
the pipeline no longer stalls waiting for SWAPGS. It seems that, at
least in a tight loop, the latter outweights the former.
Thanks to David Laight for an optimization tip.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.403607157@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Historically, IDT entries from usermode have always gone directly
to the running task's kernel stack. Rearrange it so that we enter on
a per-CPU trampoline stack and then manually switch to the task's stack.
This touches a couple of extra cachelines, but it gives us a chance
to run some code before we touch the kernel stack.
The asm isn't exactly beautiful, but I think that fully refactoring
it can wait.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.225330557@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This has a secondary purpose: it puts the entry stack into a region
with a well-controlled layout. A subsequent patch will take
advantage of this to streamline the SYSCALL entry code to be able to
find it more easily.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.962042855@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
SYSENTER_stack should have reliable overflow detection, which
means that it needs to be at the bottom of a page, not the top.
Move it to the beginning of struct tss_struct and page-align it.
Also add an assertion to make sure that the fixed hardware TSS
doesn't cross a page boundary.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.881827433@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A future patch will move SYSENTER_stack to the beginning of cpu_tss
to help detect overflow. Before this can happen, fix several code
paths that hardcode assumptions about the old layout.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.722425540@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, the GDT is an ad-hoc array of pages, one per CPU, in the
fixmap. Generalize it to be an array of a new 'struct cpu_entry_area'
so that we can cleanly add new things to it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.563271721@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This will simplify future changes that want scratch variables early in
the SYSENTER handler -- they'll be able to spill registers to the
stack. It also lets us get rid of a SWAPGS_UNSAFE_STACK user.
This does not depend on CONFIG_IA32_EMULATION=y because we'll want the
stack space even without IA32 emulation.
As far as I can tell, the reason that this wasn't done from day 1 is
that we use IST for #DB and #BP, which is IMO rather nasty and causes
a lot more problems than it solves. But, since #DB uses IST, we don't
actually need a real stack for SYSENTER (because SYSENTER with TF set
will invoke #DB on the IST stack rather than the SYSENTER stack).
I want to remove IST usage from these vectors some day, and this patch
is a prerequisite for that as well.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.312726423@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ Note, this is a Git cherry-pick of the following commit:
2b67799bdf25 ("x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD")
... for easier x86 PTI code testing and back-porting. ]
The latest AMD AMD64 Architecture Programmer's Manual
adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]).
If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES
/ FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers,
thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.
Signed-Off-By: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull misc x86 fixes from Ingo Molnar:
- make CR4 handling irq-safe, which bug vmware guests ran into
- don't crash on early IRQs in Xen guests
- don't crash secondary CPU bringup if #UD assisted WARN()ings are
triggered
- make X86_BUG_FXSAVE_LEAK optional on newer AMD CPUs that have the fix
- fix AMD Fam17h microcode loading
- fix broadcom_postcore_init() if ACPI is disabled
- fix resume regression in __restore_processor_context()
- fix Sparse warnings
- fix a GCC-8 warning
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Change time() prototype to match __vdso_time()
x86: Fix Sparse warnings about non-static functions
x86/power: Fix some ordering bugs in __restore_processor_context()
x86/PCI: Make broadcom_postcore_init() check acpi_disabled
x86/microcode/AMD: Add support for fam17h microcode loading
x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
x86/idt: Load idt early in start_secondary
x86/xen: Support early interrupts in xen pv guests
x86/tlb: Disable interrupts when changing CR4
x86/tlb: Refactor CR4 setting and shadow write
The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes. Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The latest AMD AMD64 Architecture Programmer's Manual
adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]).
If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES
/ FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers,
thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.
Signed-off-by: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The McaIntrCfg register (MSRC000_0410), previously known as CU_DEFER_ERR,
is used on SMCA systems to set the LVT offset for the Threshold and
Deferred error interrupts.
This register was used on non-SMCA systems to also set the Deferred
interrupt type in bits 2:1. However, these bits are reserved on SMCA
systems.
Only set MSRC000_0410[2:1] on non-SMCA systems.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20171120162646.5210-1-Yazen.Ghannam@amd.com
According to the Intel SDM Volume 3B (253669-063US, July 2017), action
optional (SRAO) errors can be reported either via MCE or CMC:
In cases when SRAO is signaled via CMCI the error signature is
indicated via UC=1, PCC=0, S=0.
Type(*1) UC EN PCC S AR Signaling
---------------------------------------------------------------
UC 1 1 1 x x MCE
SRAR 1 1 0 1 1 MCE
SRAO 1 x(*2) 0 x(*2) 0 MCE/CMC
UCNA 1 x 0 0 0 CMC
CE 0 x x x x CMC
NOTES:
1. SRAR, SRAO and UCNA errors are supported by the processor only
when IA32_MCG_CAP[24] (MCG_SER_P) is set.
2. EN=1, S=1 when signaled via MCE. EN=x, S=0 when signaled via CMC.
And there is a description in 15.6.2 UCR Error Reporting and Logging, for
bit S:
S (Signaling) flag, bit 56 - Indicates (when set) that a machine check
exception was generated for the UCR error reported in this MC bank...
When the S flag in the IA32_MCi_STATUS register is clear, this UCR error
was not signaled via a machine check exception and instead was reported
as a corrected machine check (CMC).
So merge the two cases and just remove the S=0 check for SRAO in
mce_severity().
[ Borislav: Massage commit message.]
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Chen Wei <chenwei68@huawei.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1511575548-41992-1-git-send-email-xiexiuqi@huawei.com
Update the CPU features to include identifying and reporting on the
Secure Encrypted Virtualization (SEV) feature. SEV is identified by
CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR). Only show the SEV feature
as available if reported by CPUID and enabled by BIOS.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Pull misc x86 fixes from Ingo Molnar:
- topology enumeration fixes
- KASAN fix
- two entry fixes (not yet the big series related to KASLR)
- remove obsolete code
- instruction decoder fix
- better /dev/mem sanity checks, hopefully working better this time
- pkeys fixes
- two ACPI fixes
- 5-level paging related fixes
- UMIP fixes that should make application visible faults more debuggable
- boot fix for weird virtualization environment
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/decoder: Add new TEST instruction pattern
x86/PCI: Remove unused HyperTransport interrupt support
x86/umip: Fix insn_get_code_seg_params()'s return value
x86/boot/KASLR: Remove unused variable
x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
x86/pkeys/selftests: Fix protection keys write() warning
x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
x86/mpx/selftests: Fix up weird arrays
x86/pkeys: Update documentation about availability
x86/umip: Print a warning into the syslog if UMIP-protected instructions are used
x86/smpboot: Fix __max_logical_packages estimate
x86/topology: Avoid wasting 128k for package id array
perf/x86/intel/uncore: Cache logical pkg id in uncore driver
x86/acpi: Reduce code duplication in mp_override_legacy_irq()
x86/acpi: Handle SCI interrupts above legacy space gracefully
x86/boot: Fix boot failure when SMP MP-table is based at 0
x86/mm: Limit mmap() of /dev/mem to valid physical addresses
x86/selftests: Add test for mapping placement for 5-level paging
...
This is the change making /proc/cpuinfo on x86 report current
CPU frequency in "cpu MHz" again in all cases and an additional
one dealing with an overzealous check in one of the helper
routines in the runtime PM framework.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJaDvBIAAoJEILEb/54YlRxZ58QAJP6p53XDcml8Risw9CrpnZV
6kBdFTYn6JSJiE4cALTER14ScqHQdTP2M6QJPDDLV5LwiQFa5fJYsSNP7F1Dpg4r
8V3QNZbBjpyc8rSGRUkjY7+WsvUUb2UWzEkLIUjOWIT4mfC969JxV/fBYEL7ZDn9
Wg7q79qI5Tss9PU2GUmaFtdkR0lqUIdNrrWe+qyLl0XHkrmU8DGL4XkPykdkwX0L
gn0i/RrK+5DBUVPR1qQTU2CO3751IdIDktpK3RLmWl/yb4TqlM4WKIhIZvvglc2g
S+OWGg/E4CNU6/EcGllNCPENAH7v0FNvvLMslPs6ao+wGQBcgO4R5d70dzobph/i
P1ns6iJbd+lgRlGSQBReVo/FWcwi4HrINRxAB4W88dBBxchHdt+G3/Juq6GiGEJi
mOh3ZHWd0J3mQEIWLKEcm5nHwIeY9yhCFJIpr5azte7JIz1fDuMnnp2gYl1SOVCK
CHv0uD8Mw7hQFC0Dzje8T0Hr29MBwpEJiXE4Eh+Fp4zWiI7BYd1TNtp5WPDtchhv
weqFqgDArN5gpkrZuSsxxg8eeRRwPeQR/mCyxofmsQ5lplCVJi8Ieqcf/KZrCy/c
1vHGJsn9ec2dNeQKTFFT5luznQSSSXoZCXprumFuTp2804E3Hpkf/UnAldc4EYSn
SwzAOO3gNA76eaFikvTK
=h6Ux
-----END PGP SIGNATURE-----
Merge tag 'pm-fixes-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull two power management fixes from Rafael Wysocki:
"This is the change making /proc/cpuinfo on x86 report current CPU
frequency in "cpu MHz" again in all cases and an additional one
dealing with an overzealous check in one of the helper routines in the
runtime PM framework"
* tag 'pm-fixes-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / runtime: Drop children check from __pm_runtime_set_status()
x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
After commit 890da9cf09 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration. That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".
However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.
Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.
Fixes: 890da9cf09 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Even though aperfmperf_snapshot_khz() caches the samples.khz value to
return if called again in a sufficiently short time, its caller,
arch_freq_get_on_cpu(), still uses smp_call_function_single() to run it
which may allow user space to trigger an IPI storm by reading from the
scaling_cur_freq cpufreq sysfs file in a tight loop.
To avoid that, move the decision on whether or not to return the cached
samples.khz value to arch_freq_get_on_cpu().
This change was part of commit 941f5f0f6e ("x86: CPU: Fix up "cpu MHz"
in /proc/cpuinfo"), but it was not the reason for the revert and it
remains applicable.
Fixes: 4815d3c56d (cpufreq: x86: Make scaling_cur_freq behave more as expected)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 cache resource updates from Thomas Gleixner:
"This update provides updates to RDT:
- A diagnostic framework for the Resource Director Technology (RDT)
user interface (sysfs). The failure modes of the user interface are
hard to diagnose from the error codes. An extra last command status
file provides now sensible textual information about the failure so
its simpler to use.
- A few minor cleanups and updates in the RDT code"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Fix a silent failure when writing zero value schemata
x86/intel_rdt: Fix potential deadlock during resctrl mount
x86/intel_rdt: Fix potential deadlock during resctrl unmount
x86/intel_rdt: Initialize bitmask of shareable resource if CDP enabled
x86/intel_rdt: Remove redundant assignment
x86/intel_rdt/cqm: Make integer rmid_limbo_count static
x86/intel_rdt: Add documentation for "info/last_cmd_status"
x86/intel_rdt: Add diagnostics when making directories
x86/intel_rdt: Add diagnostics when writing the cpus file
x86/intel_rdt: Add diagnostics when writing the tasks file
x86/intel_rdt: Add diagnostics when writing the schemata file
x86/intel_rdt: Add framework for better RDT UI diagnostics
Pull x86 platform updates from Ingo Molnar:
"The main changes in this cycle were:
- a refactoring of the early virt init code by merging 'struct
x86_hyper' into 'struct x86_platform' and 'struct x86_init', which
allows simplifications and also the addition of a new
->guest_late_init() callback. (Juergen Gross)
- timer_setup() conversion of the UV code (Kees Cook)"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/virt/xen: Use guest_late_init to detect Xen PVH guest
x86/virt, x86/platform: Add ->guest_late_init() callback to hypervisor_x86 structure
x86/virt, x86/acpi: Add test for ACPI_FADT_NO_VGA
x86/virt: Add enum for hypervisors to replace x86_hyper
x86/virt, x86/platform: Merge 'struct x86_hyper' into 'struct x86_platform' and 'struct x86_init'
x86/platform/UV: Convert timers to use timer_setup()
Pull x86 core updates from Ingo Molnar:
"Note that in this cycle most of the x86 topics interacted at a level
that caused them to be merged into tip:x86/asm - but this should be a
temporary phenomenon, hopefully we'll back to the usual patterns in
the next merge window.
The main changes in this cycle were:
Hardware enablement:
- Add support for the Intel UMIP (User Mode Instruction Prevention)
CPU feature. This is a security feature that disables certain
instructions such as SGDT, SLDT, SIDT, SMSW and STR. (Ricardo Neri)
[ Note that this is disabled by default for now, there are some
smaller enhancements in the pipeline that I'll follow up with in
the next 1-2 days, which allows this to be enabled by default.]
- Add support for the AMD SEV (Secure Encrypted Virtualization) CPU
feature, on top of SME (Secure Memory Encryption) support that was
added in v4.14. (Tom Lendacky, Brijesh Singh)
- Enable new SSE/AVX/AVX512 CPU features: AVX512_VBMI2, GFNI, VAES,
VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG. (Gayatri Kammela)
Other changes:
- A big series of entry code simplifications and enhancements (Andy
Lutomirski)
- Make the ORC unwinder default on x86 and various objtool
enhancements. (Josh Poimboeuf)
- 5-level paging enhancements (Kirill A. Shutemov)
- Micro-optimize the entry code a bit (Borislav Petkov)
- Improve the handling of interdependent CPU features in the early
FPU init code (Andi Kleen)
- Build system enhancements (Changbin Du, Masahiro Yamada)
- ... plus misc enhancements, fixes and cleanups"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits)
x86/build: Make the boot image generation less verbose
selftests/x86: Add tests for the STR and SLDT instructions
selftests/x86: Add tests for User-Mode Instruction Prevention
x86/traps: Fix up general protection faults caused by UMIP
x86/umip: Enable User-Mode Instruction Prevention at runtime
x86/umip: Force a page fault when unable to copy emulated result to user
x86/umip: Add emulation code for UMIP instructions
x86/cpufeature: Add User-Mode Instruction Prevention definitions
x86/insn-eval: Add support to resolve 16-bit address encodings
x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
x86/insn-eval: Add wrapper function for 32 and 64-bit addresses
x86/insn-eval: Add support to resolve 32-bit address encodings
x86/insn-eval: Compute linear address in several utility functions
resource: Fix resource_size.cocci warnings
X86/KVM: Clear encryption attribute when SEV is active
X86/KVM: Decrypt shared per-cpu variables when SEV is active
percpu: Introduce DEFINE_PER_CPU_DECRYPTED
x86: Add support for changing memory encryption attribute in early boot
x86/io: Unroll string I/O when SEV is active
x86/boot: Add early boot support when running with SEV active
...
Pull RAS updates from Ingo Molnar:
"Two minor updates to AMD SMCA support, plus a timer_setup() conversion"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE/AMD: Fix mce_severity_amd_smca() signature
x86/MCE/AMD: Always give panic severity for UC errors in kernel context
x86/mce: Convert timers to use timer_setup()
Writing an invalid schemata with no domain values (e.g., "(L3|MB):"),
results in a silent failure, i.e. the last_cmd_status returns OK,
Check for an empty value and set the result string with a proper error
message and return -EINVAL.
Before the fix:
# mkdir /sys/fs/resctrl/p1
# echo "L3:" > /sys/fs/resctrl/p1/schemata
(silent failure)
# cat /sys/fs/resctrl/info/last_cmd_status
ok
# echo "MB:" > /sys/fs/resctrl/p1/schemata
(silent failure)
# cat /sys/fs/resctrl/info/last_cmd_status
ok
After the fix:
# mkdir /sys/fs/resctrl/p1
# echo "L3:" > /sys/fs/resctrl/p1/schemata
-bash: echo: write error: Invalid argument
# cat /sys/fs/resctrl/info/last_cmd_status
Missing 'L3' value
# echo "MB:" > /sys/fs/resctrl/p1/schemata
-bash: echo: write error: Invalid argument
# cat /sys/fs/resctrl/info/last_cmd_status
Missing 'MB' value
[ Tony: This is an unintended side effect of the patch earlier to allow the
user to just write the value they want to change. While allowing
user to specify less than all of the values, it also allows an
empty value. ]
Fixes: c4026b7b95 ("x86/intel_rdt: Implement "update" mode when writing schemata file")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lkml.kernel.org/r/20171110191624.20280-1-tony.luck@intel.com
This reverts commit 941f5f0f6e.
Sadly, it turns out that we really can't just do the cross-CPU IPI to
all CPU's to get their proper frequencies, because it's much too
expensive on systems with lots of cores.
So we'll have to revert this for now, and revisit it using a smarter
model (probably doing one system-wide IPI at open time, and doing all
the frequency calculations in parallel).
Reported-by: WANG Chao <chao.wang@ucloud.cn>
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.
It makes sense to enable UMIP at some point while booting, before user
spaces come up. Like SMAP and SMEP, is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a user space to be protected from. Given these similarities, UMIP can be
enabled along with SMAP and SMEP.
At the moment, UMIP is disabled by default at build time. It can be enabled
at build time by selecting CONFIG_X86_INTEL_UMIP. If enabled at build time,
it can be disabled at run time by adding clearcpuid=514 to the kernel
parameters.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1509935277-22138-10-git-send-email-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change the err_ctx type to "enum context" to match the type passed in.
No functionality change.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20171106174633.13576-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The AMD severity grading function was introduced in kernel 4.1. The
current logic can possibly give MCE_AR_SEVERITY for uncorrectable
errors in kernel context. The system may then get stuck in a loop as
memory_failure() will try to handle the bad kernel memory and find it
busy.
Return MCE_PANIC_SEVERITY for all UC errors IN_KERNEL context on AMD
systems.
After:
b2f9d678e2 ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")
was accepted in v4.6, this issue was masked because of the tail-end attempt
at kernel mode recovery in the #MC handler.
However, uncorrectable errors IN_KERNEL context should always be considered
unrecoverable and cause a panic.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9.x
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: bf80bbd7dc (x86/mce: Add an AMD severities-grading function)
Link: http://lkml.kernel.org/r/20171106174633.13576-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull RAS fix from Ingo Molnar:
"Fix an RCU warning that triggers when /dev/mcelog is used"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mcelog: Get rid of RCU remnants
Commit 890da9cf09 (Revert "x86: do not use cpufreq_quick_get() for
/proc/cpuinfo "cpu MHz"") is not sufficient to restore the previous
behavior of "cpu MHz" in /proc/cpuinfo on x86 due to some changes
made after the commit it has reverted.
To address this, make the code in question use arch_freq_get_on_cpu()
which also is used by cpufreq for reporting the current frequency of
CPUs and since that function doesn't really depend on cpufreq in any
way, drop the CONFIG_CPU_FREQ dependency for the object file
containing it.
Also refactor arch_freq_get_on_cpu() somewhat to avoid IPIs and
return cached values right away if it is called very often over a
short time (to prevent user space from triggering IPI storms through
it).
Fixes: 890da9cf09 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Cc: stable@kernel.org # 4.13 - together with 890da9cf09
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 51204e0639.
There wasn't really any good reason for it, and people are complaining
(rightly) that it broke existing practice.
Cc: Len Brown <len.brown@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter pointed out that the set/clear_bit32() variants are broken in various
aspects.
Replace them with open coded set/clear_bit() and type cast
cpu_info::x86_capability as it's done in all other places throughout x86.
Fixes: 0b00de857a ("x86/cpuid: Add generic table for CPUID dependencies")
Reported-by: Peter Ziljstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In my quest to get rid of thread_struct::sp0, I want to clean up or
remove all of its readers. Two of them are in cpu_init() (32-bit and
64-bit), and they aren't needed. This is because we never enter
userspace at all on the threads that CPUs are initialized in.
Poison the initial TSS.sp0 and stop initializing it on CPU init.
The comment text mostly comes from Dave Hansen. Thanks!
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ee4a00540ad28c6cff475fbcc7769a4460acc861.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
load_sp0() had an odd signature:
void load_sp0(struct tss_struct *tss, struct thread_struct *thread);
Simplify it to:
void load_sp0(unsigned long sp0);
Also simplify a few get_cpu()/put_cpu() sequences to
preempt_disable()/preempt_enable().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2655d8b42ed940aa384fe18ee1129bbbcf730a08.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are about to commit complex rework of various x86 entry code details - create
a unified base tree (with FPU commits included) before doing that.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jeremy reported a suspicious RCU usage warning in mcelog.
/dev/mcelog is called in process context now as part of the notifier
chain and doesn't need any of the fancy RCU and lockless accesses which
it did in atomic context.
Axe it all in favor of a simple mutex synchronization which cures the
problem reported.
Fixes: 5de97c9f6d ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-and-tested-by: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: linux-edac@vger.kernel.org
Cc: Laura Abbott <labbott@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171101164754.xzzmskl4ngrqc5br@pd.tnic
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1498969
Add a few new SSE/AVX/AVX512 instruction groups/features for enumeration
in /proc/cpuinfo: AVX512_VBMI2, GFNI, VAES, VPCLMULQDQ, AVX512_VNNI,
AVX512_BITALG.
CPUID.(EAX=7,ECX=0):ECX[bit 6] AVX512_VBMI2
CPUID.(EAX=7,ECX=0):ECX[bit 8] GFNI
CPUID.(EAX=7,ECX=0):ECX[bit 9] VAES
CPUID.(EAX=7,ECX=0):ECX[bit 10] VPCLMULQDQ
CPUID.(EAX=7,ECX=0):ECX[bit 11] AVX512_VNNI
CPUID.(EAX=7,ECX=0):ECX[bit 12] AVX512_BITALG
Detailed information of CPUID bits for these features can be found
in the Intel Architecture Instruction Set Extensions and Future Features
Programming Interface document (refer to Table 1-1. and Table 1-2.).
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=197239
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Yang Zhong <yang.zhong@intel.com>
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/1509412829-23380-1-git-send-email-gayatri.kammela@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The platform informs via CPUID.(EAX=0x10, ECX=res#):EBX[31:0] (valid res#
are only 1 for L3 and 2 for L2) which unit of the allocation may be used by
other entities in the platform. This information is valid whether CDP (Code
and Data Prioritization) is enabled or not.
Ensure that the bitmask of shareable resource is initialized when CDP is
enabled.
Fixes: 0dd2d7494c ("x86/intel_rdt: Show bitmask of shareable resource with other executing units"
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/815747bddc820ca221a8924edaf4d1a7324547e4.1508490116.git.reinette.chatre@intel.com
do_clear_cpu_cap() allocates a bitmap to keep track of disabled feature
dependencies. That bitmap is sized NCAPINTS * BITS_PER_INIT. The possible
'features' which can be handed in are larger than this, because after the
capabilities the bug 'feature' bits occupy another 32bit. Not really
obvious...
So clearing any of the misfeature bits, as 32bit does for the F00F bug,
accesses that bitmap out of bounds thereby corrupting the stack.
Size the bitmap proper and add a sanity check to catch accidental out of
bound access.
Fixes: 0b00de857a ("x86/cpuid: Add generic table for CPUID dependencies")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171018022023.GA12058@yexl-desktop
Blacklist Broadwell X model 79 for late loading due to an erratum.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171018111225.25635-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With a followon patch we want to make clearcpuid affect the XSAVE
configuration. But xsave is currently initialized before arguments
are parsed. Move the clearcpuid= parsing into the special
early xsave argument parsing code.
Since clearcpuid= contains a = we need to keep the old __setup
around as a dummy, otherwise it would end up as a environment
variable in init's environment.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171013215645.23166-4-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some CPUID features depend on other features. Currently it's
possible to to clear dependent features, but not clear the base features,
which can cause various interesting problems.
This patch implements a generic table to describe dependencies
between CPUID features, to be used by all code that clears
CPUID.
Some subsystems (like XSAVE) had an own implementation of this,
but it's better to do it all in a single place for everyone.
Then clear_cpu_cap and setup_clear_cpu_cap always look up
this table and clear all dependencies too.
This is intended to be a practical table: only for features
that make sense to clear. If someone for example clears FPU,
or other features that are essentially part of the required
base feature set, not much is going to work. Handling
that is right now out of scope. We're only handling
features which can be usefully cleared.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan McDowell <noodles@earth.li>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171013215645.23166-3-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'this_leaf' variable is assigned a value that is never
read and it is updated a little later with a newer value,
hence we can remove the redundant assignment.
Cleans up the following Clang warning:
Value stored to 'this_leaf' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20171015160203.12332-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"A landry list of fixes:
- fix reboot breakage on some PCID-enabled system
- fix crashes/hangs on some PCID-enabled systems
- fix microcode loading on certain older CPUs
- various unwinder fixes
- extend an APIC quirk to more hardware systems and disable APIC
related warning on virtualized systems
- various Hyper-V fixes
- a macro definition robustness fix
- remove jprobes IRQ disabling
- various mem-encryption fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Do the family check first
x86/mm: Flush more aggressively in lazy TLB mode
x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping
x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors
x86/mm: Disable various instrumentations of mm/mem_encrypt.c and mm/tlb.c
x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing
x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures
x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs
x86/unwind: Disable unwinder warnings on 32-bit
x86/unwind: Align stack pointer in unwinder dump
x86/unwind: Use MSB for frame pointer encoding on 32-bit
x86/unwind: Fix dereference of untrusted pointer
x86/alternatives: Fix alt_max_short macro to really be a max()
x86/mm/64: Fix reboot interaction with CR4.PCIDE
kprobes/x86: Remove IRQ disabling from jprobe handlers
kprobes/x86: Set up frame pointer in kprobe trampoline
On CPUs like AMD's Geode, for example, we shouldn't even try to load
microcode because they do not support the modern microcode loading
interface.
However, we do the family check *after* the other checks whether the
loader has been disabled on the command line or whether we're running in
a guest.
So move the family checks first in order to exit early if we're being
loaded on an unsupported family.
Reported-and-tested-by: Sven Glodowski <glodi1@arcor.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.11..
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adjust sanity-check WARN to make sure
the triggering timer matches the current CPU timer.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20171005005425.GA23950@beast
Now that lguest is gone, put it in the internal header which should be
used only by MCA/RAS code.
Add missing header guards while at it.
No functional change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20171002092836.22971-3-bp@alien8.de
The assignment to the 'files' variable is immediately overwritten
in the following line. Remove the older assignment, which was meant
specifially for creating control groups files.
Fixes: c7d9aac613 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: tony.luck@intel.com
Cc: vikas.shivappa@intel.com
Link: https://lkml.kernel.org/r/1507157337-18118-1-git-send-email-jithu.joseph@intel.com
rmid_limbo_count is local to the source and does not need to be in global
scope, so make it static.
Cleans up sparse warning:
symbol 'rmid_limbo_count' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20171002145931.27479-1-colin.king@canonical.com