Commit c6202ca764 ("sparc64: Add auxiliary vectors to report
platform ADI properties") adds auxiliary vectors to report ADI
capabilities on sparc64 platform only. This needs to be uniform
across 64-bit and 32-bit. This patch makes the same vectors
available on 32-bit as well.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f8ec66014f ("signal: Add send_sig_fault and force_sig_fault")
added new helper functions to streamline signal delivery. This patch
updates signal delivery for new/updated handlers for ADI related
exceptions to use the helper function.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ADI is a new feature supported on SPARC M7 and newer processors to allow
hardware to catch rogue accesses to memory. ADI is supported for data
fetches only and not instruction fetches. An app can enable ADI on its
data pages, set version tags on them and use versioned addresses to
access the data pages. Upper bits of the address contain the version
tag. On M7 processors, upper four bits (bits 63-60) contain the version
tag. If a rogue app attempts to access ADI enabled data pages, its
access is blocked and processor generates an exception. Please see
Documentation/sparc/adi.txt for further details.
This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
MCD (Memory Corruption Detection) on selected memory ranges, enable
TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
version tags on page swap out/in or migration. ADI is not enabled by
default for any task. A task must explicitly enable ADI on a memory
range and set version tag for ADI to be effective for the task.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A protection flag may not be valid across entire address space and
hence arch_validate_prot() might need the address a protection bit is
being set on to ensure it is a valid protection flag. For example, sparc
processors support memory corruption detection (as part of ADI feature)
flag on memory addresses mapped on to physical RAM but not on PFN mapped
pages or addresses mapped on to devices. This patch adds address to the
parameters being passed to arch_validate_prot() so protection bits can
be validated in the relevant context.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ADI feature on M7 and newer processors has three important properties
relevant to userspace apps using ADI capabilities - (1) Size of block of
memory an ADI version tag applies to, (2) Number of uppermost bits in
virtual address used to encode ADI tag, and (3) The value M7 processor
will force the ADI tags to if it detects uncorrectable error in an ADI
tagged cacheline. Kernel can retrieve these properties for a platform
through machine description provided by the firmware. This patch adds
code to retrieve these properties and report them to userspace through
auxiliary vectors.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
M7 and newer processors add a "Memory corruption Detected" trap with
the addition of ADI feature. This trap is vectored into kernel by HV
through resumable error trap with error attribute for the resumable
error set to 0x00000800.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ADI (Application Data Integrity) feature on M7 and newer processors
adds new fault types for hypervisor - Invalid ASI and MCD disabled.
This patch expands data access exception handler to handle these
faults.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPARC M7 processor adds new control register fields, ASIs and a new
trap to support the ADI (Application Data Integrity) feature. This
patch adds definitions for these register fields, ASIs and a handler
for the new precise memory corruption detected trap.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPARC M7 processor introduces a new feature - Application Data
Integrity (ADI). ADI allows MMU to catch rogue accesses to memory.
When a rogue access occurs, MMU blocks the access and raises an
exception. In response to the exception, kernel sends the offending
task a SIGSEGV with si_code that indicates the nature of exception.
This patch adds three new signal codes specific to ADI feature:
1. ADI is not enabled for the address and task attempted to access
memory using ADI
2. Task attempted to access memory using wrong ADI tag and caused
a deferred exception.
3. Task attempted to access memory using wrong ADI tag and caused
a precise exception.
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Microblaze doesn't set CONFIG_NO_BOOTMEM and so memblock_virt_alloc()
doesn't work for CONFIG_HAVE_MEMBLOCK && !CONFIG_NO_BOOTMEM.
Similar change was already done by others architectures
"ARM: mm: Remove bootmem code and switch to NO_BOOTMEM"
(sha1: 84f452b1e8)
or
"openrisc: Consolidate setup to use memblock instead of bootmem"
(sha1: 266c7fad15)
or
"parisc: Drop bootmem and switch to memblock"
(sha1: 4fe9e1d957)
or
"powerpc: Remove bootmem allocator"
(sha1: 10239733ee)
or
"s390/mm: Convert bootmem to memblock"
(sha1: 50be634507)
or
"sparc64: Convert over to NO_BOOTMEM."
(sha1: 625d693e97)
or
"xtensa: drop sysmem and switch to memblock"
(sha1: 0e46c1115f)
Issue was introduced by:
"of/fdt: use memblock_virt_alloc for early alloc"
(sha1: 0fa1c57934)
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Alvaro Gamez Machado <alvaro.gamez@hazent.com>
Tested-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
The patch:
"microblaze: Setup proper dependency for optimized lib functions"
(sha1: 7b6ce52be3)
didn't setup all dependencies properly.
Optimized lib functions in C are also present for little endian
and optimized library functions in assembler are implemented only for
big endian version.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
pmdp_invalidate() was changed to update the pmd atomically
(to not lose dirty/access bits) and return the original pmd
value.
However, in doing so, we lost a lot of the essential work that
set_pmd_at() does, namely to update hugepage mapping counts and
queuing up the batched TLB flush entry.
Thus we were not flushing entries out of the TLB when making
such PMD changes.
Fix this by abstracting the accounting work of set_pmd_at() out into a
separate function, and call it from pmdp_establish().
Fixes: a8e654f01c ("sparc64: update pmdp_invalidate() to return old pmd value")
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86/pti updates from Thomas Gleixner:
"Yet another pile of melted spectrum related updates:
- Drop native vsyscall support finally as it causes more trouble than
benefit.
- Make microcode loading more robust. There were a few issues
especially related to late loading which are now surfacing because
late loading of the IB* microcodes addressing spectre issues has
become more widely used.
- Simplify and robustify the syscall handling in the entry code
- Prevent kprobes on the entry trampoline code which lead to kernel
crashes when the probe hits before CR3 is updated
- Don't check microcode versions when running on hypervisors as they
are considered as lying anyway.
- Fix the 32bit objtool build and a coment typo"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Fix kernel crash when probing .entry_trampoline code
x86/pti: Fix a comment typo
x86/microcode: Synchronize late microcode loading
x86/microcode: Request microcode on the BSP
x86/microcode/intel: Look into the patch cache first
x86/microcode: Do not upload microcode if CPUs are offline
x86/microcode/intel: Writeback and invalidate caches before updating microcode
x86/microcode/intel: Check microcode revision before updating sibling threads
x86/microcode: Get rid of struct apply_microcode_ctx
x86/spectre_v2: Don't check microcode versions when running under hypervisors
x86/vsyscall/64: Drop "native" vsyscalls
x86/entry/64/compat: Save one instruction in entry_INT80_compat()
x86/entry: Do not special-case clone(2) in compat entry
x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscalls
x86/syscalls: Use proper syscall definition for sys_ioperm()
x86/entry: Remove stale syscall prototype
x86/syscalls/32: Simplify $entry == $compat entries
objtool: Fix 32-bit build
Pull RAS fixes from Thomas Gleixner:
"Two small fixes for RAS/MCE:
- Serialize sysfs changes to avoid concurrent modificaiton of
underlying data
- Add microcode revision to Machine Check records. This should have
been there forever, but now with the broken microcode versions in
the wild it has become important"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE: Serialize sysfs changes
x86/MCE: Save microcode revision in machine check records
Pull perf updates from Thomas Gleixner:
"Another set of perf updates:
- Fix a Skylake Uncore event format declaration
- Prevent perf pipe mode from crahsing which was caused by a missing
buffer allocation
- Make the perf top popup message which tells the user that it uses
fallback mode on older kernels a debug message.
- Make perf context rescheduling work correcctly
- Robustify the jump error drawing in perf browser mode so it does
not try to create references to NULL initialized offset entries
- Make trigger_on() robust so it does not enable the trigger before
everything is set up correctly to handle it
- Make perf auxtrace respect the --no-itrace option so it does not
try to queue AUX data for decoding.
- Prevent having different number of field separators in CVS output
lines when a counter is not supported.
- Make the perf kallsyms man page usage behave like it does for all
other perf commands.
- Synchronize the kernel headers"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix ctx_event_type in ctx_resched()
perf tools: Fix trigger class trigger_on()
perf auxtrace: Prevent decoding when --no-itrace
perf stat: Fix CVS output format for non-supported counters
tools headers: Sync x86's cpufeatures.h
tools headers: Sync copy of kvm UAPI headers
perf record: Fix crash in pipe mode
perf annotate browser: Be more robust when drawing jump arrows
perf top: Fix annoying fallback message on older kernels
perf kallsyms: Fix the usage on the man page
perf/x86/intel/uncore: Fix Skylake UPI event format
PPC:
- Fix guest time accounting in the host
- Fix large-page backing for radix guests on POWER9
- Fix HPT guests on POWER9 backed by 2M or 1G pages
- Compile fixes for some configs and gcc versions
s390:
- Fix random memory corruption when running as guest2 (e.g. KVM in
LPAR) and starting guest3 (e.g. nested KVM) with many CPUs
- Export forgotten io interrupt delivery statistics counter
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJaoupNAAoJEED/6hsPKofoT2gH/1P6cNF3Gz3T7fetuan5Tyhw
o5zduUxkM2AmTxzim9GPsKc5nsnPMnDraHCpQ9O2WW6VHvRZ6pwgtbtEtF2cx2Hv
70o5gQzap/odA8eoV98xRbZ+aZHrZgs/z2Ql5eXk32BIs57TOQZVQ/mW+BA4Ixos
8lOLYXuNY0lSL7Cp0MBY76ed8100ZTp7sfFByg3AmWDy7qhYOk9C3wPYNqPNfkuB
eqZ9XPKV3vJuqc2xSJvsW4DjBcqZTxqDbw+EOA3MuUMH/JRkt7OuPgQHl/G5BqAW
JcLb8ENfXNbamiakvarVenvnAJKJTLs8s9K8Z8ADXolY6NlGysGcrJo5dnGWN5A=
=UxpG
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"PPC:
- Fix guest time accounting in the host
- Fix large-page backing for radix guests on POWER9
- Fix HPT guests on POWER9 backed by 2M or 1G pages
- Compile fixes for some configs and gcc versions
s390:
- Fix random memory corruption when running as guest2 (e.g. KVM in
LPAR) and starting guest3 (e.g. nested KVM) with many CPUs
- Export forgotten io interrupt delivery statistics counter"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: fix memory overwrites when not using SCA entries
KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN
KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
KVM: s390: provide io interrupt kvm_stat
KVM: PPC: Book3S: Fix compile error that occurs with some gcc versions
KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n
- The SMCCC firmware interface for the spectre variant 2 mitigation has
been updated to allow the discovery of whether the CPU needs the
workaround. This pull request relaxes the kernel check on the return
value from firmware.
- Fix the commit allowing changing from global to non-global page table
entries which inadvertently disallowed other safe attribute changes.
- Fix sleeping in atomic during the arm_perf_teardown_cpu() code.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlqi040ACgkQa9axLQDI
XvFnJQ//YTCYifVu7pBY50czqDjBZ8BONQJFtMCsz/id4fBeELrciN5jNklWXA/y
yYg+9Rb4UAEomqCRJWRU6MdIx52UagWlJ2Cn0G5q48uMdY9YFCJ4V8M6IFikvSUp
o0p6Ldhee4r2yv6iBs125c7vIW/4c3nrTb03nsEJrjesKjcW1JSrzuJ0Py+x6ZIP
AMuZocGlUOZ3NlKTPTQqY//fFCBp/hjvYzgUmPpcSZE/3E5pLHoxAIkkLMsaXaLH
eWAbT9/E3NfQoBX2xisp7fyfd5nXZZ5IfEFJC90Dtl+yMb4I3DPgmBXclGFC8Rxd
YOyabVAx9vpyBPGa9h4EtwMSRmiNwLwKxfCcXii8gAV7lPDqOyzduQTeepNCv6iY
ioPHnx3mEEpfEF8TCV0lXzcsPdQnkfQcciJGxoz31KQe3TIp1keGASfwbn/Q575S
i8/pHg9PS1r18tQIrrm/0lnBvkiyBFiKxPgOaWk4GXFYNh34GS9+xnTOsTuGOgGg
vjQ0gRIkseqOeVuZSwD6kkj0f70NsjreTOaXF8eCA4cpGIia+cGUAOPR1SKTF3o6
XkDjCRpde0KZoon95qye0+mVVJHOPgLs5VXFEngF7HCbI6spXxMSKuKoRYUbXZQj
ddXQeaPY0wisMWmerDM9jkbhaprNsKp7b9CGmZKWAYXaa6+Y93w=
=jVvu
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- The SMCCC firmware interface for the spectre variant 2 mitigation has
been updated to allow the discovery of whether the CPU needs the
workaround. This pull request relaxes the kernel check on the return
value from firmware.
- Fix the commit allowing changing from global to non-global page table
entries which inadvertently disallowed other safe attribute changes.
- Fix sleeping in atomic during the arm_perf_teardown_cpu() code.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
arm_pmu: Use disable_irq_nosync when disabling SPI in CPU teardown hook
arm64: mm: fix thinko in non-global page table attribute check
A recent update to the ARM SMCCC ARCH_WORKAROUND_1 specification
allows firmware to return a non zero, positive value to describe
that although the mitigation is implemented at the higher exception
level, the CPU on which the call is made is not affected.
Let's relax the check on the return value from ARCH_WORKAROUND_1
so that we only error out if the returned value is negative.
Fixes: b092201e00 ("arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
One notable fix to properly advertise our support for a new firmware feature,
caused by two series conflicting semantically but not textually.
There's a new ioctl for the new ocxl driver, which is not a fix, but needed to
complete the userspace API and good to have before the driver is in a released
kernel.
Finally three minor selftest fixes, and a fix for intermittent build failures
for some obscure platforms, caused by a missing make dependency.
Thanks to:
Alastair D'Silva, Bharata B Rao, Guenter Roeck.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJaomOTExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBV
LQ/+MyHQH7d/lhqRLWa3BVgy9mn2QfFTkRNMi46eqIWyk6wD5Qf//gzO+vTDzcEO
rzkBu3XwhuhDCpCInLtJH17X8iKVy65Uk2JQ808pZTm0WnNTBv0ag251QARDg//B
S1QmCQGJkpoUJ85+IDqQzV90K2o13BPWkAOrVh40G5GASvFGZRdxuTuo4DKeIe/j
0yri2LRuL+cpcfmZqaJVco8l2tqJEi6zJRhL/ORrLr4XSy0clxF1q1MYshNWPDsB
3l4r/yKz2aMltplknVVrpdMThUTD311kknIHoQuuEYvd6GxDz52d3B1lvE5Xb8EB
sNJQX6q8ydab0u79/tTsXUm/EyfmMd6HMKOeVnQim19tnEe0wW6VkaPDNWWXa66U
hW0qx7rX+zMSHzEcgLF7HKirzCQi2oS5ZMqXvhHlOPBb8Iy0O5za670AtvMGgm7/
NKXKML31opgXmTcU2ZxBMrtL0S8ft3wHCKRLkB6H8GHE+6//Ps87bQaUo4v2KsOr
2T/2w6TVtAwxLrASpCVonfcmIjjjIK4WILaOKp24Yzyv0eqen6Z3/kkga37sLgj8
f11HzTfFWO5ckroHrJSVCOC48eWb+O1CWRwS4rofL1jLfucpv6VFTPK6F4G47pAj
pZyIJMTem0AEgsxVBJlsw6TibGH5W6rzDE/2WpEqxb7iZh0=
=ajRr
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One notable fix to properly advertise our support for a new firmware
feature, caused by two series conflicting semantically but not
textually.
There's a new ioctl for the new ocxl driver, which is not a fix, but
needed to complete the userspace API and good to have before the
driver is in a released kernel.
Finally three minor selftest fixes, and a fix for intermittent build
failures for some obscure platforms, caused by a missing make
dependency.
Thanks to: Alastair D'Silva, Bharata B Rao, Guenter Roeck"
* tag 'powerpc-4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Fix vector5 in ibm architecture vector table
ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL
ocxl: Add get_metadata IOCTL to share OCXL information to userspace
selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
selftests/powerpc: Fix missing clean of pmu/lib.o
powerpc/boot: Fix random libfdt related build errors
selftests/powerpc: Skip tm-trap if transactional memory is not enabled
Disable the kprobe probing of the entry trampoline:
.entry_trampoline is a code area that is used to ensure page table
isolation between userspace and kernelspace.
At the beginning of the execution of the trampoline, we load the
kernel's CR3 register. This has the effect of enabling the translation
of the kernel virtual addresses to physical addresses. Before this
happens most kernel addresses can not be translated because the running
process' CR3 is still used.
If a kprobe is placed on the trampoline code before that change of the
CR3 register happens the kernel crashes because int3 handling pages are
not accessible.
To fix this, add the .entry_trampoline section to the kprobe blacklist
to prohibit the probing of code before all the kernel pages are
accessible.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: mathieu.desnoyers@efficios.com
Cc: mhiramat@kernel.org
Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The check_interval file in
/sys/devices/system/machinecheck/machinecheck<cpu number>
directory is a global timer value for MCE polling. If it is changed by one
CPU, mce_restart() broadcasts the event to other CPUs to delete and restart
the MCE polling timer and __mcheck_cpu_init_timer() reinitializes the
mce_timer variable.
If more than one CPU writes a specific value to the check_interval file
concurrently, mce_timer is not protected from such concurrent accesses and
all kinds of explosions happen. Since only root can write to those sysfs
variables, the issue is not a big deal security-wise.
However, concurrent writes to these configuration variables is void of
reason so the proper thing to do is to serialize the access with a mutex.
Boris:
- Make store_int_with_restart() use device_store_ulong() to filter out
negative intervals
- Limit min interval to 1 second
- Correct locking
- Massage commit message
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180302202706.9434-1-kkamagui@gmail.com
Updating microcode used to be relatively rare. Now that it has become
more common we should save the microcode version in a machine check
record to make sure that those people looking at the error have this
important information bundled with the rest of the logged information.
[ Borislav: Simplify a bit. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180301233449.24311-1-tony.luck@intel.com
Original idea by Ashok, completely rewritten by Borislav.
Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.
Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.
[ Borislav: Rewrite completely. ]
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: https://lkml.kernel.org/r/20180228102846.13447-8-bp@alien8.de
... so that any newer version can land in the cache and can later be
fished out by the application functions. Do that before grabbing the
hotplug lock.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: https://lkml.kernel.org/r/20180228102846.13447-7-bp@alien8.de
The cache might contain a newer patch - look in there first.
A follow-on change will make sure newest patches are loaded into the
cache of microcode patches.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: https://lkml.kernel.org/r/20180228102846.13447-6-bp@alien8.de
Avoid loading microcode if any of the CPUs are offline, and issue a
warning. Having different microcode revisions on the system at any time
is outright dangerous.
[ Borislav: Massage changelog. ]
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: http://lkml.kernel.org/r/1519352533-15992-4-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/20180228102846.13447-5-bp@alien8.de
Updating microcode is less error prone when caches have been flushed and
depending on what exactly the microcode is updating. For example, some
of the issues around certain Broadwell parts can be addressed by doing a
full cache flush.
[ Borislav: Massage it and use native_wbinvd() in both cases. ]
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: http://lkml.kernel.org/r/1519352533-15992-3-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/20180228102846.13447-4-bp@alien8.de
After updating microcode on one of the threads of a core, the other
thread sibling automatically gets the update since the microcode
resources on a hyperthreaded core are shared between the two threads.
Check the microcode revision on the CPU before performing a microcode
update and thus save us the WRMSR 0x79 because it is a particularly
expensive operation.
[ Borislav: Massage changelog and coding style. ]
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: http://lkml.kernel.org/r/1519352533-15992-2-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/20180228102846.13447-3-bp@alien8.de
It is a useless remnant from earlier times. Use the ucode_state enum
directly.
No functional change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Link: https://lkml.kernel.org/r/20180228102846.13447-2-bp@alien8.de
As:
1) It's known that hypervisors lie about the environment anyhow (host
mismatch)
2) Even if the hypervisor (Xen, KVM, VMWare, etc) provided a valid
"correct" value, it all gets to be very murky when migration happens
(do you provide the "new" microcode of the machine?).
And in reality the cloud vendors are the ones that should make sure that
the microcode that is running is correct and we should just sing lalalala
and trust them.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: kvm <kvm@vger.kernel.org>
Cc: Krčmář <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180226213019.GE9497@char.us.oracle.com
Since Linux v3.2, vsyscalls have been deprecated and slow. From v3.2
on, Linux had three vsyscall modes: "native", "emulate", and "none".
"emulate" is the default. All known user programs work correctly in
emulate mode, but vsyscalls turn into page faults and are emulated.
This is very slow. In "native" mode, the vsyscall page is easily
usable as an exploit gadget, but vsyscalls are a bit faster -- they
turn into normal syscalls. (This is in contrast to vDSO functions,
which can be much faster than syscalls.) In "none" mode, there are
no vsyscalls.
For all practical purposes, "native" was really just a chicken bit
in case something went wrong with the emulation. It's been over six
years, and nothing has gone wrong. Delete it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/519fee5268faea09ae550776ce969fa6e88668b0.1520449896.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull s390 fixes from Martin Schwidefsky:
"Nine bug fixes for s390:
- Three fixes for the expoline code, one of them is strictly speaking
a cleanup but as it relates to code added with 4.16 I would like to
include the patch.
- Three timer related fixes in the common I/O layer
- A fix for the handling of internal DASD request which could cause
panics.
- One correction in regard to the accounting of pud page tables vs.
compat tasks.
- The register scrubbing in entry.S caused spurious crashes, this is
fixed now as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/entry.S: fix spurious zeroing of r0
s390: Fix runtime warning about negative pgtables_bytes
s390: do not bypass BPENTER for interrupt system calls
s390/cio: clear timer when terminating driver I/O
s390/cio: fix return code after missing interrupt
s390/cio: fix ccw_device_start_timeout API
s390/clean-up: use CFI_* macros in entry.S
s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)
s390/dasd: fix handling of internal requests
As %rdi is never user except in the following push, there is no
need to restore %rdi to the original value.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@amacapital.net
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the CPU renaming registers on its own, and all the overhead of the
syscall entry/exit, it is doubtful whether the compiled output of
mov %r8, %rax
mov %rcx, %r8
mov %rax, %rcx
jmpq sys_clone
is measurably slower than the hand-crafted version of
xchg %r8, %rcx
So get rid of this special case.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@amacapital.net
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Using SYSCALL_DEFINEx() is recommended, so use it also here.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@amacapital.net
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the compat entry point is equivalent to the native entry point, it
does not need to be specified explicitly.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@amacapital.net
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull sigingo fix from Eric Biederman:
"The kbuild test robot found that I accidentally moved si_pkey when I
was cleaning up siginfo_t. A short followed by an int with the int
having 8 byte alignment. Sheesh siginfo_t is a weird structure.
I have now corrected it and added build time checks that with a little
luck will catch any similar future mistakes. The build time checks
were sufficient for me to verify the bug and to verify my fix. So they
are at least useful this once."
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal/x86: Include the field offsets in the build time checks
signal: Correct the offset of si_pkey in struct siginfo
- Fix random memory corruption when running as guest2 (e.g. KVM in
LPAR) and starting guest3 (nested KVM) with many CPUs (e.g. a nested
guest with 200 vcpu)
- io interrupt delivery counter was not exported
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJanpm7AAoJEBF7vIC1phx8B1QP/2Lw/Buca3aYfDEHMwGMmp1x
kKOaXGxTtYzKI3JarPmp9mE98c8MOZkiYIxTCaPfapf9AiRhFW2B2idBJHgCXqcQ
ghrcyYnTaV9kL8gvB/7qZEhvOjZB2pPhmqK1OeVW4cYCQhhNiy8n68r5dD2kyf/U
bIL4TqXhG4AftWU9iY28HbiPrJ6GGJ0ELJGeiPzwIn4nJCFRCw97qEhw7PeOQ7I2
gOuyk+k9+cXmU5ftNXpwsBEDuDSoMUAMKiSiqJ/EgUA0uuatn+EeL0ho70MUlQ6M
xsLRIlGfWl7po9PUq0UIXlF5s4MmpUNYZ9S/MjJBgYJzosGdn26scaVXEP+D8cJ6
teJgmcBiCx44LinI4yu7hDls/EUSDhKP5cjwEtBRR4ck8q6n+hY7gjAPwzEumA07
jDfJbB0S09SXASYrjL2c7+Lt0zGXB8pdWPLYPQazh7MqIWo8iiWk4fRYZYja2nmt
ZDagh9BZrq/y7Rigz9etNioMA9Lus1DphW600t/5EMbtcUlI2jQWrKqNdMTRKi2R
ElYoL/UrCtmkv5Q2Gv7RrEn4Rw3EY1OqZ8WeOkLxNUlXlW8nPgYYTmFi/CY44d9H
9sV586au6ECu9EBPuAMPL8javGRwZbE5cSGZVculrj4ZNRll3EmBnn3/ZNPnimWo
k/tf5mxA0DhDxcYCMcpB
=mxZT
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: Fixes
- Fix random memory corruption when running as guest2 (e.g. KVM in
LPAR) and starting guest3 (nested KVM) with many CPUs (e.g. a nested
guest with 200 vcpu)
- io interrupt delivery counter was not exported
- Fix guest time accounting in the host
- Fix large-page backing for radix guests on POWER9
- Fix HPT guests on POWER9 backed by 2M or 1G pages
- Compile fixes for some configs and gcc versions
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJammR1AAoJEJ2a6ncsY3Gf2pQIAKf1sBKimpDj/yeBlWVbS41q
6mIJh8R+4DV7TcOSVoOdzUz1dU1cseVznqqr3kexu+unoUpcqm240ZUDsDNWy9j0
Xv0JyrGOcPor9sQmlb1s2gOsybxhic4u8Ih1eQV47bEUw1Rb84/da0JI1u5nMRTq
nm3OHPGSnK2C8UkBBVjGelLJGUx+uaLFLjJSSTd0F9+hlxjGT3yXjP3wLG/ZNajT
6Reuzpr95hGpmIaml8gh73clLk4WAjF3+5SyiLo5nlsXzvMnC0DyzaUrHocIo6i7
nZxrx9UguzEdiUbuc5NEs4klTc+GPwMCfd+7z6vmtyw87A0sVOUgGWNGcZL60ew=
=Wy2j
-----END PGP SIGNATURE-----
Merge tag 'kvm-ppc-fixes-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
Fixes for PPC KVM:
- Fix guest time accounting in the host
- Fix large-page backing for radix guests on POWER9
- Fix HPT guests on POWER9 backed by 2M or 1G pages
- Compile fixes for some configs and gcc versions
Even if we don't have extended SCA support, we can have more than 64 CPUs
if we don't enable any HW features that might use the SCA entries.
Now, this works just fine, but we missed a return, which is why we
would actually store the SCA entries. If we have more than 64 CPUs, this
means writing outside of the basic SCA - bad.
Let's fix this. This allows > 64 CPUs when running nested (under vSIE)
without random crashes.
Fixes: a6940674c3 ("KVM: s390: allow 255 VCPUs when sca entries aren't used")
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180306132758.21034-1-david@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
With ibm,dynamic-memory-v2 and ibm,drc-info coming around the same
time, byte22 in vector5 of ibm architecture vector table got set twice
separately. The end result is that guest kernel isn't advertising
support for ibm,dynamic-memory-v2.
Fix this by removing the duplicate assignment of byte22.
Fixes: 02ef6dd810 ("powerpc: Enable support for ibm,drc-info devtree property")
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
when a system call is interrupted we might call the critical section
cleanup handler that re-does some of the operations. When we are between
.Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the
problem state registers r0-r7:
.Lcleanup_system_call:
[...]
0: # update accounting time stamp
mvc __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER
# set up saved register r11
lg %r15,__LC_KERNEL_STACK
la %r9,STACK_FRAME_OVERHEAD(%r15)
stg %r9,24(%r11) # r11 pt_regs pointer
# fill pt_regs
mvc __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC
---> stmg %r0,%r7,__PT_R0(%r9)
The problem is now, that we might have already zeroed out r0.
The fix is to move the zeroing of r0 after sysc_do_svc.
Reported-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Fixes: 7041d28115 ("s390: scrub registers on kernel entry and KVM exit")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Due to an oversight when refactoring siginfo_t si_pkey has been in the
wrong position since 4.16-rc1. Add an explicit check of the offset of
every user space field in siginfo_t and compat_siginfo_t to make a
mistake like this hard to make in the future.
I have run this code on 4.15 and 4.16-rc1 with the position of si_pkey
fixed and all of the fields show up in the same location.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Some other small cleanups.
-----BEGIN PGP SIGNATURE-----
iQIxBAABCAAbBQJancx9FBx0b255Lmx1Y2tAaW50ZWwuY29tAAoJEKurIx+X31iB
6SsQAJkSqqDoQaXQOvdRsKSzE3i/90iOev+8dp7aWaJQkrmpmepRh8sLB/rD7B/3
GkprvcBKhaQfXOxB4iE+m4IkP9Lw/AoacKva5K0yDEW7eCOOUCCgbfDa0RvXu0Is
b2qTsLvfqF9yamctMSXGzwcdvHccOUe5q6YZCMLcULvWJBPVAxEbryC6lglIkkmm
hkOvpDllbCZL8QKxHK+MeHJ+t2YiRuSIbzOIF5EHeAVQGwxUNgxZKktHDHgrzS79
LbJSjGuL/XyT6gWp/IB9OIQQn8gcAPzmn4nDrd8b8GPQFohg7cGn80z8dNgoJCKE
cTUxiwdYDjW3wc4NlmbapajltlsCaRc54hING1U0sC8sFCS1BfNlRKqMIjZcPMr+
sArylsY4zyLcoLEmdKVS6ruV5MVATBjFo4DawFnQ6PMzFQP04ZnftXcQmyS8FCeP
Uk1lHOYFDTa4pi4GSAwTNizLsvXnnT1oM6a1OtfCj7rpcQ9YlUUEjdt3wggY112H
8WPJu5YegdVnVENUL5SgYDcZcdpL3qbTcXqNWaTaqKGvkfXceeWJZjtIMKVDEpj3
KpznoGGiNmMNs2eOXxFq+JiJbeioXWF4UEH3slQz2lfznUldSLlTI2KMr7eSDygN
IXvrMNeD41K/IWrtF5yO3A2LtzifWPiDDaKAHkQXgAHFGi5q
=s8IY
-----END PGP SIGNATURE-----
Merge tag 'please-pull-ia64_misc' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 cleanups from Tony Luck:
- More atomic cleanup from willy
- Fix a python script to work with version 3
- Some other small cleanups
* tag 'please-pull-ia64_misc' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
ia64/err-inject: fix spelling mistake: "capapbilities" -> "capabilities"
ia64/err-inject: Use get_user_pages_fast()
ia64: doc: tweak whitespace for 'console=' parameter
ia64: Convert remaining atomic operations
ia64: convert unwcheck.py to python3