For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:
static inline void foo()
{
register void *__sp asm(_ASM_SP);
asm("call bar" : "+r" (__sp))
}
Unfortunately, that pattern causes Clang to corrupt the stack pointer.
The fix is easy: convert the stack pointer register variable to a global
variable.
It should be noted that the end result is different based on the GCC
version. With GCC 6.4, this patch has exactly the same result as
before:
defconfig defconfig-nofp distro distro-nofp
before 9820389 9491555 8816046 8516940
after 9820389 9491555 8816046 8516940
With GCC 7.2, however, GCC's behavior has changed. It now changes its
behavior based on the conversion of the register variable to a global.
That somehow convinces it to *always* set up the frame pointer before
inserting *any* inline asm. (Therefore, listing the variable as an
output constraint is a no-op and is no longer necessary.) It's a bit
overkill, but the performance impact should be negligible. And in fact,
there's a nice improvement with frame pointers disabled:
defconfig defconfig-nofp distro distro-nofp
before 9796316 9468236 9076191 8790305
after 9796957 9464267 9076381 8785949
So in summary, while listing the stack pointer as an output constraint
is no longer necessary for newer versions of GCC, it's still needed for
older versions.
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull arch/tile fixes from Chris Metcalf:
"These are a code cleanup and config cleanup, respectively"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: array underflow in setup_maxnodemem()
tile: defconfig: Cleanup from old Kconfig options
- #ifdef CONFIG_EFI around __efi_fpsimd_begin/end
- Assembly code alignment reduced to 4 bytes from 16
- Ensure the kernel is compiled for LP64 (there are some arm64 compilers
around defaulting to ILP32)
- Fix arm_pmu_acpi memory leak on the error path
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlnFPSsACgkQa9axLQDI
XvGdxBAAhdWVExEaygHhoWl6/Dj5wpMu3ydYKb3crF7Ok+X/H5842vqVGxhSLxiN
5P9ZPmFN0nzOabBEGY5bJX35l0/JLA9UnYoNvHtIJc3MkLMCgJspf7yTE18jAjlj
rC53CBysDyMTTgfuJEOjyh2r06vuHnbMg5bXPtdXZe63tOSwWIiVxErhz3sl32cK
MTzAkLlJtnei5Da8EqEQuSV5/aRGK9ZHwWVuJqtnjOIT8rnR0ZZKVebwPdXJrMLt
FWjQLYuHHzlt6guQvV85SuokWX/Rkd/3VR1/6ShLXFbDGEgeRO9hvwoyZ7yRmxr8
y/agTDUWmGQoM7wtxKDXJUW/GFgib1Kg70C9L0YofN4CpcXrlBvqPDF2OIslfTil
smFkjchLHu1ToVww8zOgeMnXuWJoFvmUTvKau5jyWqNBqw71OYuxUBDWsx2NDVEi
OJJrQ1YMKRhI/n2berYJxZxreSsySEejcrPff55H60NlFPwlX6xQwzcJLpa6umkb
OfwW9gC29rpbM1uVMbPKj0faLNo11/+tkBxvZivzLKN2jtuPaV5MuIfXfvTZhuM1
16WYRMk4fQ2rqfQOiF1R5RIbKr1JHRCAmSEv2hiUSei35mfeVcqfUHcBS3VURgwa
UBcwCdlLkSxkLgCV4MAhZlST7FsRkQx+6OBiOJtiReGqkQtyjkg=
=ZHek
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- #ifdef CONFIG_EFI around __efi_fpsimd_begin/end
- Assembly code alignment reduced to 4 bytes from 16
- Ensure the kernel is compiled for LP64 (there are some arm64
compilers around defaulting to ILP32)
- Fix arm_pmu_acpi memory leak on the error path
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
drivers/perf: arm_pmu_acpi: Release memory obtained by kasprintf
arm64: ensure the kernel is compiled for LP64
arm64: relax assembly code alignment from 16 byte to 4 byte
arm64: efi: Don't include EFI fpsimd save/restore code in non-EFI kernels
Some architectures define the no-op macros/functions copy_segments,
release_segments and forget_segments. These are used nowhere in the
tree, so removed them.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [for arch/arc]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc-7 optimizes the byte-wise accesses of get_unaligned_le32() into
word-wise accesses if the 32-bit integer output_len is declared as
external. This panics then the bootloader since we don't have the
unaligned access fault trap handler installed during boot time.
Avoid this optimization by declaring output_len as byte-aligned and thus
unbreak the bootloader code.
Additionally, compile the boot code optimized for size.
Signed-off-by: Helge Deller <deller@gmx.de>
By adding the feature to build the kernel as self-extracting
executeable, the possibility to simply compress the kernel with gzip was
lost.
This patch now reintroduces this possibilty again and leaves it up to
the user to decide how the kernel should be built.
The palo bootloader is able to natively load both formats.
Signed-off-by: Helge Deller <deller@gmx.de>
Commit 24587380f6 ("parisc: Add MADV_HWPOISON and MADV_SOFT_OFFLINE") added
the necessary constants to handle hardware-poisoning. Those were needed to
support the page deallocation feature from firmware.
But I completely missed to add the relevant fault handler code. This now
showed up when I ran the madvise07 testcase from the Linux Test Project,
which failed with a kernel BUG at arch/parisc/mm/fault.c:320.
With this patch the parisc kernel now behaves like other platforms and
gives the same kernel syslog warnings when poisoning pages.
Signed-off-by: Helge Deller <deller@gmx.de>
While scanning the PDT for reported broken memory modules, warn if the
initrd was coincidentally loaded into bad memory.
Signed-off-by: Helge Deller <deller@gmx.de>
According to the programming note at page 1-31 of the PA 1.1 Firmware
Architecture document, one should use the PDC_INSTR firmware function to
get the instruction that invokes a PDCE_CHECK in the HPMC handler. This
patch follows this note and sets the instruction which has been a nop up
until now.
Testing on a C3000 and C8000 showed that this firmware call isn't
implemented on those machines, so maybe it's only needed on older ones.
Signed-off-by: Helge Deller <deller@gmx.de>
Check stack pointer if we are reaching the stack end and stop unwinding
if we do. This fixes early backtraces and avoids showing unrealistic
call stacks.
Signed-off-by: Helge Deller <deller@gmx.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJZxSV8AAoJELDendYovxMvf5YH/jUEgeFVgP0KRjNsvgp4gu88
4BW7nWYtFt4gGE1KnrKEPbg5Je0OwkpW7vUXxvLwDGWymtHMzVuuB2xxwzkePyzS
17Kzmb/JiuaNpVF4+5v3JvAw3b9iHrZ7T6cXOQgm28agd3m/y+9FSyzoMoNNRdGG
xURwUyK1idRqtkQV5VsQAK0Z1lVF7YhhaxWXBtClsqnKWoeLBLc8fpRJmUNruA33
E2Sdi06mNNN3xudu1s2edC5hAO4EgVKmonnmyHRYonIYwuqSND8fhEXj+PRdHj7s
lLVRQixd3raBiSscLASaQ7I/66frBm+TXzmoHAVtYkdlXBJlisTIvQlMPtAwu60=
=c3HX
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.14b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"A fix for a missing __init annotation and two cleanup patches"
* tag 'for-linus-4.14b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen, arm64: drop dummy lookup_address()
xen: don't compile pv-specific parts if XEN_PV isn't configured
xen: x86: mark xen_find_pt_base as __init
Four fixes for the new instruction emulation code. A fix for CPU offline on bare
metal machines when certain idle states are not supported, and a fix for a
device_node refcounting oops during CPU hotplug, caused by recent changes.
Going to stable are a fix for an oops during core dump on machines that have TM
(Transactional Memory) disabled. Reordering some EEH initialisation to avoid
trashing memory, and another device_node refcounting fix.
And a few other minor things.
Thanks to:
Anton Blanchard, Benjamin Herrenschmidt, Cyril Bur, Gautham R. Shenoy, Gustavo
Romero, Kamalesh Babulal, Matthew Weber, Matt Weber, Naveen N. Rao, Nicholas
Piggin, Pavithra Prakash, Ravi Bangoria, Ronak Desai, Scott Wood, Tyrel
Datwyler.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZxPnDAAoJEFHr6jzI4aWAPBgP+wVskXfJQXxm3kvnOm1UZ56m
pKEjoH1n54gKyPcNpQmd7sqMWzdBqzBToeOSr+zMbTm7+xcjURJDulEBh3Fhiame
YnJDal5SYxiWZLylHwLseu8t7J7n5J+dWXYkZI7eIupWsKxuL4kr5GHtFpezbA/a
kFhx2aNLZ5iy/mLJ6p01/qwn1bUKkjSTuvn75JG3kSdoUCB9d+l8VfYqT1cLC7sl
doyKQ2AEHDMV8hFs4A2S1bgNdenq9IAmIMiZf6BkoTkIMg9QxRNsjHvY8wa4N/by
4iHK9Ig30qfMIXWr6hmpl8HmR9BL1BZ2nJOX+0ilUI1ktXz/amEdr4bL0NrCKZKk
waUg+aK5HDNMTi8dOv+zuF4oCHBwGz8HKNLiuQDv2ek3BSlG3Fb4ge/Quibsn3je
jj9bY7kC0i+w7By9NRo7nL1GV0LV4LArjhHmOqBtW++vKwl3dseMydj2Ml9mpSr7
U385dGBoazxOkdsO84WHgMcFuP61LYRA14boMqFkdRd9G7tRT2DCAdpALyz1Mr/E
TqiNRu65ZGaMZzOS14VBP+1xpNEx5a6CtoambES94MC2IBH5SrOTsmYkyZJL9PbO
USznplm5hcEt8obdNYSMpGsV5U2iGm9XwDcFxJW4JtmoPxQwa0ap1M3eP7aJXVIk
N5KhYUgsYAdqb6skvLcM
=Eiyz
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"It turns out our single-fix pull from last week was too good to be
true. I missed a few fixes in that pull that had already come in
because I was on leave, but also we hadn't found the bugs yet. So this
week it's a bit bigger, though not ridiculous. Hopefully things will
settle down from here on.
Four fixes for the new instruction emulation code. A fix for CPU
offline on bare metal machines when certain idle states are not
supported, and a fix for a device_node refcounting oops during CPU
hotplug, caused by recent changes.
Going to stable are a fix for an oops during core dump on machines
that have TM (Transactional Memory) disabled. Reordering some EEH
initialisation to avoid trashing memory, and another device_node
refcounting fix.
And a few other minor things.
Thanks to: Anton Blanchard, Benjamin Herrenschmidt, Cyril Bur, Gautham
R. Shenoy, Gustavo Romero, Kamalesh Babulal, Matthew Weber, Matt Weber,
Naveen N. Rao, Nicholas Piggin, Pavithra Prakash, Ravi Bangoria, Ronak
Desai, Scott Wood, Tyrel Datwyler"
* tag 'powerpc-4.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Fix parent_dn reference leak in add_dt_node()
powerpc/pseries: Fix "OF: ERROR: Bad of_node_put() on /cpus" during DLPAR
powerpc/eeh: Create PHB PEs after EEH is initialized
powerpc/kprobes: Update optprobes to use emulate_update_regs()
powerpc/powernv: Clear LPCR[PECE1] via stop-api only for deep state offline
powerpc/sstep: mullw should calculate a 64 bit signed result
powerpc/sstep: Fix issues with mcrf
powerpc/sstep: Fix issues with set_cr0()
powerpc/tm: Flush TM only if CPU has TM feature
powerpc/sysrq: Fix oops whem ppmu is not registered
powerpc/configs: Update for CONFIG_SND changes
powerpc/e6500: Update machine check for L1D cache err
Pull MIPS fixes from Ralf Baechle:
- Fix a build error on MSP71xx which used to rely on somehow magically
<asm/setup.h> being pulled in which no longer happens.
- Fix the __write_64bit_c0_split inline assembler where there was the
theoretical possibility of GCC interpret the constraints such that
bad code could result.
- A __init was causing section mismatch errors on Alchemy. Just to be
on the safe side, Manuel's patch does away with all of them.
- Fix perf event init.
* '4.14-fixes' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: PCI: fix pcibios_map_irq section mismatch
MIPS: Fix input modify in __write_64bit_c0_split()
MIPS: MSP71xx: Include asm/setup.h
MIPS: Fix perf event init
Pull s390 fixes from Martin Schwidefsky:
- A couple of bug fixes: memory management, perf, cio, dasd and
scm_blk.
- A larger change in regard to the CPU topology to improve performance
for systems running under z/VM or KVM.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/topology: enable / disable topology dynamically
s390/topology: alternative topology for topology-less machines
s390/mm: fix write access check in gup_huge_pmd()
s390/mm: make pmdp_invalidate() do invalidation only
s390/cio: recover from bad paths
s390/scm_blk: consistently use blk_status_t as error type
s390/dasd: fix race during dasd initialization
s390/perf: fix bug when creating per-thread event
Drop the __init from pcibios_map_irq() to make this section mis-
match go away:
WARNING: vmlinux.o(.text+0x56acd4): Section mismatch in reference from the function pcibios_scanbus() to the function .init.text:pcibios_map_irq()
The function pcibios_scanbus() references
the function __init pcibios_map_irq().
This is often because pcibios_scanbus lacks a __init
annotation or the annotation of pcibios_map_irq is wrong.
Run-Tested only on Alchemy.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17267/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The inline asm in __write_64bit_c0_split() modifies the 64-bit input
operand by shifting the high register left by 32, and constructing the
full 64-bit value in the low register (even on a 32-bit kernel), so if
that value is used again it could cause breakage as GCC would assume the
registers haven't changed when they have.
To quote the GCC extended asm documentation:
> Warning: Do not modify the contents of input-only operands (except for
> inputs tied to outputs). The compiler assumes that on exit from the
> asm statement these operands contain the same values as they had
> before executing the statement.
Avoid modifying the input by using a temporary variable as an output
which is modified instead of the input and not otherwise used. The asm
is always __volatile__ so GCC shouldn't optimise it out. The low
register of the temporary output is written before the high register of
the input is read, so we have two constraint alternatives, one where
both use the same registers (for when the input value isn't subsequently
used), and one with an early clobber on the output in case the low
output uses the same register as the high input. This allows the
resulting assembly to remain mostly unchanged.
A diff of a MIPS32r6 kernel reveals only three differences, two in
relation to write_c0_r10k_diag() in cpu_probe() (register allocation
rearranged slightly but otherwise identical), and one in relation to
write_c0_cvmmemctl2() in kvm_vz_local_flush_guesttlb_all(), but the
octeon CPU is only supported on 64-bit kernels where
__write_64bit_c0_split() isn't used so that shouldn't matter in
practice. So there currently doesn't appear to be anything broken by
this bug.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17315/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
msp71xx_defconfig can not be built at the in v4.14-rc1
arch/mips/pmcs-msp71xx/msp_smp.c:72:2: error: implicit declaration of function 'set_vi_handler' [-Werror=implicit-function-declaration]
I don't know what caused the regression, but including the right
header is the obvious fix.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17309/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
A reference to the parent device node is held by add_dt_node() for the
node to be added. If the call to dlpar_configure_connector() fails
add_dt_node() returns ENOENT and that reference is not freed.
Add a call to of_node_put(parent_dn) prior to bailing out after a
failed dlpar_configure_connector() call.
Fixes: 8d5ff32076 ("powerpc/pseries: Make dlpar_configure_connector parent node aware")
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 215ee763f8 ("powerpc: pseries: remove dlpar_attach_node
dependency on full path") reworked dlpar_attach_node() to no longer
look up the parent node "/cpus", but instead to have the parent node
passed by the caller in the function parameter list.
As a result dlpar_attach_node() is no longer responsible for freeing
the reference to the parent node. However, commit 215ee763f8 failed
to remove the of_node_put(parent) call in dlpar_attach_node(), or to
take into account that the reference to the parent in the caller
dlpar_cpu_add() needs to be held until after dlpar_attach_node()
returns.
As a result doing repeated cpu add/remove dlpar operations will
eventually result in the following error:
OF: ERROR: Bad of_node_put() on /cpus
CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1
Call Trace:
dump_stack+0x15c/0x1f8 (unreliable)
of_node_release+0x1a4/0x1c0
kobject_put+0x1a8/0x310
kobject_del+0xbc/0xf0
__of_detach_node_sysfs+0x144/0x210
of_detach_node+0xf0/0x180
dlpar_detach_node+0xc4/0x120
dlpar_cpu_remove+0x280/0x560
dlpar_cpu_release+0xbc/0x1b0
arch_cpu_release+0x6c/0xb0
cpu_release_store+0xa0/0x100
dev_attr_store+0x68/0xa0
sysfs_kf_write+0xa8/0xf0
kernfs_fop_write+0x2cc/0x400
__vfs_write+0x5c/0x340
vfs_write+0x1a8/0x3d0
SyS_write+0xa8/0x1a0
system_call+0x58/0x6c
Fix the issue by removing the of_node_put(parent) call from
dlpar_attach_node(), and ensuring that the reference to the parent
node is properly held and released by the caller dlpar_cpu_add().
Fixes: 215ee763f8 ("powerpc: pseries: remove dlpar_attach_node dependency on full path")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
[mpe: Add a comment in the code and frob the change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Otherwise we end up not yet having computed the right diag data size
on powernv where EEH initialization is delayed, thus causing memory
corruption later on when calling OPAL.
Fixes: 5cb1f8fddd ("powerpc/powernv/pci: Dynamically allocate PHB diag data")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add a new sysctl file /proc/sys/s390/topology which displays if
topology is on (1) or off (0) as specified by the "topology=" kernel
parameter.
This allows to change topology information during runtime and
configuring it via /etc/sysctl.conf instead of using the kernel line
parameter.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If running on machines that do not provide topology information we
currently generate a "fake" topology which defines the maximum
distance between each cpu: each cpu will be put into an own drawer.
Historically this used to be the best option for (virtual) machines in
overcommited hypervisors.
For some workloads however it is better to generate a different
topology where all cpus are siblings within a package (all cpus are
core siblings). This shows performance improvements of up to 10%,
depending on the workload.
In order to keep the current behaviour, but also allow to switch to
the different core sibling topology use the existing "topology="
kernel parameter:
Specifying "topology=on" on machines without topology information will
generate the core siblings (fake) topology information, instead of the
default topology information where all cpus have the maximum distance.
On machines which provide topology information specifying
"topology=on" does not have any effect.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Optprobes depended on an updated regs->nip from analyse_instr() to
identify the location to branch back from the optprobes trampoline.
However, since commit 3cdfcbfd32 ("powerpc: Change analyse_instr so
it doesn't modify *regs"), analyse_instr() doesn't update the registers
anymore. Due to this, we end up branching back from the optprobes
trampoline to the same branch into the trampoline resulting in a loop.
Fix this by calling out to emulate_update_regs() before using the nip.
Additionally, explicitly compare the return value from analyse_instr()
to 1, rather than just checking for !0 so as to guard against any
future changes to analyse_instr() that may result in -1 being returned
in more scenarios.
Fixes: 3cdfcbfd32 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R13 instead of RBP. Both are callee-saved registers, so the
substitution is straightforward.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Mix things up a little bit to get rid of the RBP usage, without hurting
performance too much. Use RDI instead of RBP for the TBL pointer. That
will clobber CTX, so spill CTX onto the stack and use R12 to read it in
the outer loop. R12 is used as a non-persistent temporary variable
elsewhere, so it's safe to use.
Also remove the unused y4 variable.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Swap the usages of R12 and RBP. Use R12 for the TBL register, and use
RBP to store the pre-aligned stack pointer.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
There's no need to use RBP as a temporary register for the TBL value,
because it always stores the same value: the address of the K256 table.
Instead just reference the address of K256 directly.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Swap the usages of R12 and RBP. Use R12 for the TBL register, and use
RBP to store the pre-aligned stack pointer.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Swap the usages of R12 and RBP. Use R12 for the REG_D register, and use
RBP to store the pre-aligned stack pointer.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R11 instead of RBP. Since R11 isn't a callee-saved register, it
doesn't need to be saved and restored on the stack.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use RSI instead of RBP for RT1. Since RSI is also used as a the 'dst'
function argument, it needs to be saved on the stack until the argument
is needed.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R15 instead of RBP. R15 can't be used as the RID1 register because
of x86 instruction encoding limitations. So use R15 for CTX and RDI for
CTX. This means that CTX is no longer an implicit function argument.
Instead it needs to be explicitly copied from RDI.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R15 instead of RBP. R15 can't be used as the RID1 register because
of x86 instruction encoding limitations. So use R15 for CTX and RDI for
CTX. This means that CTX is no longer an implicit function argument.
Instead it needs to be explicitly copied from RDI.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R12 instead of RBP. Both are callee-saved registers, so the
substitution is straightforward.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R12 instead of RBP. R12 can't be used as the RT0 register because
of x86 instruction encoding limitations. So use R12 for CTX and RDI for
CTX. This means that CTX is no longer an implicit function argument.
Instead it needs to be explicitly copied from RDI.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit c311c79799 ("cpumask: make "nr_cpumask_bits" unsigned")
modified mipspmu_event_init() to cast the struct perf_event cpu field to
an unsigned integer before it is compared with nr_cpumask_bits (and
*ahem* did so without copying the linux-mips mailing list or any MIPS
developers...). This is broken because the cpu field may be -1 for
events which follow a process rather than being affine to a particular
CPU. When this is the case the cast to an unsigned int results in a
value equal to ULONG_MAX, which is always greater than nr_cpumask_bits
so we always fail mipspmu_event_init() and return -ENODEV.
The check against nr_cpumask_bits seems nonsensical anyway, so this
patch simply removes it. The cpu field is going to either be -1 or a
valid CPU number. Comparing it with nr_cpumask_bits is effectively
checking that it's a valid cpu number, but it seems safe to rely on the
core perf events code to ensure that's the case.
The end result is that this fixes use of perf on MIPS when not
constraining events to a particular CPU, and fixes the "perf list hw"
command which fails to list any events without this.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: c311c79799 ("cpumask: make "nr_cpumask_bits" unsigned")
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mips@linux-mips.org
Cc: stable <stable@vger.kernel.org> # v4.12+
Patchwork: https://patchwork.linux-mips.org/patch/17323/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit 24be85a23d ("powerpc/powernv: Clear PECE1 in LPCR via
stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
an offlined CPU which is in a deep stop state.
In the case where the stop-api support is found to be lacking, the
commit 785a12afdb ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT
states when stop-api fails") disables deep states that lose hypervisor
context. Thus in this case, the offlined CPU will be put to some
shallow idle state.
However, we currently unconditionally clear the PECE1 in LPCR via
stop-api during CPU-Hotplug even when deep states are disabled due to
stop-api failure.
Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
*only* when the offlined CPU will be put to a deep state that loses
hypervisor context.
Fixes: 24be85a23d ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
Reported-by: Pavithra Prakash <pavirampu@linux.vnet.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
mullw should do a 32 bit signed multiply and create a 64 bit signed
result. It currently truncates the result to 32 bits.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
mcrf broke when we changed analyse_instr() to not modify the register
state. The instruction writes to the CR, so we need to store the result
in op->ccval, not op->val.
Fixes: 3cdfcbfd32 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
set_cr0() broke when we changed analyse_instr() to not modify the
register state. Instead of looking at regs->gpr[x] which has not
been updated yet, we need to look at op->val.
Fixes: 3cdfcbfd32 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
added code to access TM SPRs in flush_tmregs_to_thread(). However
flush_tmregs_to_thread() does not check if TM feature is available on
CPU before trying to access TM SPRs in order to copy live state to
thread structures. flush_tmregs_to_thread() is indeed guarded by
CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel
was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on
a CPU without TM feature available, thus rendering the execution
of TM instructions that are treated by the CPU as illegal instructions.
The fix is just to add proper checking in flush_tmregs_to_thread()
if CPU has the TM feature before accessing any TM-specific resource,
returning immediately if TM is no available on the CPU. Adding
that checking in flush_tmregs_to_thread() instead of in places
where it is called, like in vsr_get() and vsr_set(), is better because
avoids the same problem cropping up elsewhere.
Cc: stable@vger.kernel.org # v4.13+
Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Kernel crashes if power pmu is not registered and user tries to dump
regs with 'echo p > /proc/sysrq-trigger'. Sample log:
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc0000000000d52f0
NIP [c0000000000d52f0] perf_event_print_debug+0x10/0x230
LR [c00000000058a938] sysrq_handle_showregs+0x38/0x50
Call Trace:
printk+0x38/0x4c (unreliable)
__handle_sysrq+0xe4/0x270
write_sysrq_trigger+0x64/0x80
proc_reg_write+0x80/0xd0
__vfs_write+0x40/0x200
vfs_write+0xc8/0x240
SyS_write+0x60/0x110
system_call+0x58/0x6c
Fixes: 5f6d0380c6 ("powerpc/perf: Define perf_event_print_debug() to print PMU register values")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit eb3b705aae ("ALSA: Make CONFIG_SND_OSSEMUL user-selectable")
means we need to set CONFIG_SND_OSSEMUL in our configs, otherwise we
lose some of the SND symbols.
And commit 0181307abc ("ALSA: seq: Reorganize kconfig and build")
reorganised things, which causes the churn.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- fix build without CONFIG_HAVE_KVM_IRQ_ROUTING
- fix NULL access in x86 CR access
- fix race with VMX posted interrups
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJZwS3PAAoJEED/6hsPKofoT+EH/0EGL2BdSAMmtLm5HUrGJHpO
412Q0bxV2KREcic1xJ+eJiuUcM2UihvflOyJQVBFEkToClw9jbB8Ms0kQUufYkLa
R1y7HmrDVVSbuEtd68fqbApuUaOKbjQEjmjKL5j3A2vxs9dgID5qMffRj5yGBC+a
V0ZpVsdLwQvqix77ibPXpoZnerbvOqkFadskGjYBpoiXEhNPbsEdc4Ca6sHAiqSs
hfUGTAnMSLBl34GfMBwvh++b8H/YlAoWM2vDnV4LnQb48hbGwqSwcVQ3CFEQbFgN
MrZoRFYpdx4FzXYYsh7dTSvPO4JyZXex7QKZSrZpg59Azfcx8pKv3am7H9W811g=
=ksrm
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
- fix build without CONFIG_HAVE_KVM_IRQ_ROUTING
- fix NULL access in x86 CR access
- fix race with VMX posted interrups
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
KVM: VMX: do not change SN bit in vmx_update_pi_irte()
KVM: x86: Fix the NULL pointer parameter in check_cr_write()
Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"
04c81c7293 ("MIPS: PCI: Replace pci_fixup_irqs() call with host bridge
IRQ mapping hooks") moved the PCI IRQ fixup to the new host bridge
map/swizzle_irq() hooks mechanism. Those hooks can also be called after
boot, when all the __init/__initdata/__initconst sections have been freed.
Therefore, functions called by them (and the data they refer to) must not
be marked as __init/__initdata/__initconst lest compilation trigger section
mismatch warnings.
Fix all the board files map_irq() hooks by simply removing the respective
__init/__initdata/__initconst section markers and by adding another
persistent hook IRQ map for the txx9 board files.
Fixes: 04c81c7293 ("MIPS: PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks")
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steve French <smfrench@gmail.com>
WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc)) in kvm_vcpu_trigger_posted_interrupt()
intends to detect the violation of invariant that VT-d PI notification
event is not suppressed when vcpu is in the guest mode. Because the
two checks for the target vcpu mode and the target suppress field
cannot be performed atomically, the target vcpu mode may change in
between. If that does happen, WARN_ON_ONCE() here may raise false
alarms.
As the previous patch fixed the real invariant breaker, remove this
WARN_ON_ONCE() to avoid false alarms, and document the allowed cases
instead.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d60f ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In kvm_vcpu_trigger_posted_interrupt() and pi_pre_block(), KVM
assumes that PI notification events should not be suppressed when the
target vCPU is not blocked.
vmx_update_pi_irte() sets the SN field before changing an interrupt
from posting to remapping, but it does not check the vCPU mode.
Therefore, the change of SN field may break above the assumption.
Besides, I don't see reasons to suppress notification events here, so
remove the changes of SN field to avoid race condition.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d60f ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Routine check_cr_write() will trigger emulator_get_cpuid()->
kvm_cpuid() to get maxphyaddr, and NULL is passed as values
for ebx/ecx/edx. This is problematic because kvm_cpuid() will
dereference these pointers.
Fixes: d1cd3ce900 ("KVM: MMU: check guest CR3 reserved bits based on its physical address width.")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The check for the _SEGMENT_ENTRY_PROTECT bit in gup_huge_pmd() is the
wrong way around. It must not be set for write==1, and not be checked for
write==0. Fix this similar to how it was fixed for ptes long time ago in
commit 25591b0703 ("[S390] fix get_user_pages_fast").
One impact of this bug would be unnecessarily using the gup slow path for
write==0 on r/w mappings. A potentially more severe impact would be that
gup_huge_pmd() will succeed for write==1 on r/o mappings.
Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 227be799c3 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
inadvertently changed the behavior of pmdp_invalidate(), so that it now
clears the pmd instead of just marking it as invalid. Fix this by restoring
the original behavior.
A possible impact of the misbehaving pmdp_invalidate() would be the
MADV_DONTNEED races (see commits ced10803 and 58ceeb6b), although we
should not have any negative impact on the related dirty/young flags,
since those flags are not set by the hardware on s390.
Fixes: 227be799c3 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The kernel needs to be compiled as a LP64 binary for ARM64, even when
using a compiler that defaults to code-generation for the ILP32 ABI.
Consequently, we need to explicitly pass '-mabi=lp64' (supported on
gcc-4.9 and newer).
Signed-off-by: Andrew Pinski <Andrew.Pinski@caviumnetworks.com>
Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Aarch64 instructions must be word aligned. The current 16 byte
alignment is more than enough. Relax it into 4 byte alignment.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
__efi_fpsimd_begin()/__efi_fpsimd_end() are for use when making EFI
calls only, so using them in non-EFI kernels is not allowed.
This patch compiles them out if CONFIG_EFI is not set.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
A bug was reported on ARM where set_fs might be called after it was
checked on the work pending function. ARM64 is not affected by this bug
but has a similar construct. In order to avoid any similar problems in
the future, the addr_limit_user_check function is moved at the beginning
of the loop.
Fixes: cf7de27ab3 ("arm64/syscalls: Check address limit on user-mode return")
Reported-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Will Drewry <wad@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1504798247-48833-5-git-send-email-keescook@chromium.org
Disable the generic address limit check in favor of an architecture
specific optimized implementation. The generic implementation using
pending work flags did not work well with ARM and alignment faults.
The address limit is checked on each syscall return path to user-mode
path as well as the irq user-mode return function. If the address limit
was changed, a function is called to report data corruption (stopping
the kernel or process based on configuration).
The address limit check has to be done before any pending work because
they can reset the address limit and the process is killed using a
SIGKILL signal. For example the lkdtm address limit check does not work
because the signal to kill the process will reset the user-mode address
limit.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Will Drewry <wad@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1504798247-48833-4-git-send-email-keescook@chromium.org
This reverts commit 73ac5d6a2b.
The work pending loop can call set_fs after addr_limit_user_check
removed the _TIF_FSCHECK flag. This may happen at anytime based on how
ARM handles alignment exceptions. It leads to an infinite loop condition.
After discussion, it has been agreed that the generic approach is not
tailored to the ARM architecture and any fix might not be complete. This
patch will be replaced by an architecture specific implementation. The
work flag approach will be kept for other architectures.
Reported-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Will Drewry <wad@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1504798247-48833-3-git-send-email-keescook@chromium.org
For unknown historical reasons (i.e. Borislav doesn't recall),
32-bit kernels invoke cpu_init() on secondary CPUs with
initial_page_table loaded into CR3. Then they set
current->active_mm to &init_mm and call enter_lazy_tlb() before
fixing CR3. This means that the x86 TLB code gets invoked while CR3
is inconsistent, and, with the improved PCID sanity checks I added,
we warn.
Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 72c0098d92 ("x86/mm: Reinitialize TLB state on hotplug and resume")
Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Otherwise we might have the PCID feature bit set during cpu_init().
This is just for robustness. I haven't seen any actual bugs here.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cba4671af7 ("x86/mm: Disable PCID on 32-bit kernels")
Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Putting the logical ASID into CR3's PCID bits directly means that we
have two cases to consider separately: ASID == 0 and ASID != 0.
This means that bugs that only hit in one of these cases trigger
nondeterministically.
There were some bugs like this in the past, and I think there's
still one in current kernels. In particular, we have a number of
ASID-unware code paths that save CR3, write some special value, and
then restore CR3. This includes suspend/resume, hibernate, kexec,
EFI, and maybe other things I've missed. This is currently
dangerous: if ASID != 0, then this code sequence will leave garbage
in the TLB tagged for ASID 0. We could potentially see corruption
when switching back to ASID 0. In principle, an
initialize_tlbstate_and_flush() call after these sequences would
solve the problem, but EFI, at least, does not call this. (And it
probably shouldn't -- initialize_tlbstate_and_flush() is rather
expensive.)
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Current, the code that assembles a value to load into CR3 is
open-coded everywhere. Factor it out into helpers build_cr3() and
build_cr3_noflush().
This makes one semantic change: __get_current_cr3_fast() was wrong
on SME systems. No one noticed because the only caller is in the
VMX code, and there are no CPUs with both SME and VMX.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fix from Thomas Gleixner:
"A single fix addressing the missing CP8 feature bit in CPUID for a
range of AMD ZEN models/mask revisions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/AMD: Fix erratum 1076 (CPB bit)
Pull UML updates from Richard Weinberger:
- minor improvements
- fixes for Debian's new gcc defaults (pie enabled by default)
- fixes for XSTATE/XSAVE to make UML work again on modern systems
* 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: return negative in tuntap_open_tramp()
um: remove a stray tab
um: Use relative modversions with LD_SCRIPT_DYN
um: link vmlinux with -no-pie
um: Fix CONFIG_GCOV for modules.
Fix minor typos and grammar in UML start_up help
um: defconfig: Cleanup from old Kconfig options
um: Fix FP register size for XSTATE/XSAVE
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for 4.14 for MIPS; below a summary of
the non-merge commits:
CM:
- Rename mips_cm_base to mips_gcr_base
- Specify register size when generating accessors
- Use BIT/GENMASK for register fields, order & drop shifts
- Add cluster & block args to mips_cm_lock_other()
CPC:
- Use common CPS accessor generation macros
- Use BIT/GENMASK for register fields, order & drop shifts
- Introduce register modify (set/clear/change) accessors
- Use change_*, set_* & clear_* where appropriate
- Add CM/CPC 3.5 register definitions
- Use GlobalNumber macros rather than magic numbers
- Have asm/mips-cps.h include CM & CPC headers
- Cluster support for topology functions
- Detect CPUs in secondary clusters
CPS:
- Read GIC_VL_IDENT directly, not via irqchip driver
DMA:
- Consolidate coherent and non-coherent dma_alloc code
- Don't use dma_cache_sync to implement fd_cacheflush
FPU emulation / FP assist code:
- Another series of 14 commits fixing corner cases such as NaN
propgagation and other special input values.
- Zero bits 32-63 of the result for a CLASS.D instruction.
- Enhanced statics via debugfs
- Do not use bools for arithmetic. GCC 7.1 moans about this.
- Correct user fault_addr type
Generic MIPS:
- Enhancement of stack backtraces
- Cleanup from non-existing options
- Handle non word sized instructions when examining frame
- Fix detection and decoding of ADDIUSP instruction
- Fix decoding of SWSP16 instruction
- Refactor handling of stack pointer in get_frame_info
- Remove unreachable code from force_fcr31_sig()
- Convert to using %pOF instead of full_name
- Remove the R6000 support.
- Move FP code from *_switch.S to *_fpu.S
- Remove unused ST_OFF from r2300_switch.S
- Allow platform to specify multiple its.S files
- Add #includes to various files to ensure code builds reliable and
without warning..
- Remove __invalidate_kernel_vmap_range
- Remove plat_timer_setup
- Declare various variables & functions static
- Abstract CPU core & VP(E) ID access through accessor functions
- Store core & VP IDs in GlobalNumber-style variable
- Unify checks for sibling CPUs
- Add CPU cluster number accessors
- Prevent direct use of generic_defconfig
- Make CONFIG_MIPS_MT_SMP default y
- Add __ioread64_copy
- Remove unnecessary inclusions of linux/irqchip/mips-gic.h
GIC:
- Introduce asm/mips-gic.h with accessor functions
- Use new GIC accessor functions in mips-gic-timer
- Remove counter access functions from irq-mips-gic.c
- Remove gic_read_local_vp_id() from irq-mips-gic.c
- Simplify shared interrupt pending/mask reads in irq-mips-gic.c
- Simplify gic_local_irq_domain_map() in irq-mips-gic.c
- Drop gic_(re)set_mask() functions in irq-mips-gic.c
- Remove gic_set_polarity(), gic_set_trigger(), gic_set_dual_edge(),
gic_map_to_pin() and gic_map_to_vpe() from irq-mips-gic.c.
- Convert remaining shared reg access, local int mask access and
remaining local reg access to new accessors
- Move GIC_LOCAL_INT_* to asm/mips-gic.h
- Remove GIC_CPU_INT* macros from irq-mips-gic.c
- Move various definitions to the driver
- Remove gic_get_usm_range()
- Remove __gic_irq_dispatch() forward declaration
- Remove gic_init()
- Use mips_gic_present() in place of gic_present and remove
gic_present
- Move gic_get_c0_*_int() to asm/mips-gic.h
- Remove linux/irqchip/mips-gic.h
- Inline __gic_init()
- Inline gic_basic_init()
- Make pcpu_masks a per-cpu variable
- Use pcpu_masks to avoid reading GIC_SH_MASK*
- Clean up mti, reserved-cpu-vectors handling
- Use cpumask_first_and() in gic_set_affinity()
- Let the core set struct irq_common_data affinity
microMIPS:
- Fix microMIPS stack unwinding on big endian systems
MIPS-GIC:
- SYNC after enabling GIC region
NUMA:
- Remove the unused parent_node() macro
R6:
- Constify r2_decoder_tables
- Add accessor & bit definitions for GlobalNumber
SMP:
- Constify smp ops
- Allow boot_secondary SMP op to return errors
VDSO:
- Drop gic_get_usm_range() usage
- Avoid use of linux/irqchip/mips-gic.h
Platform changes:
Alchemy:
- Add devboard machine type to cpuinfo
- update cpu feature overrides
- Threaded carddetect irqs for devboards
AR7:
- allow NULL clock for clk_get_rate
BCM63xx:
- Fix ENETDMA_6345_MAXBURST_REG offset
- Allow NULL clock for clk_get_rate
CI20:
- Enable GPIO and RTC drivers in defconfig
- Add ethernet and fixed-regulator nodes to DTS
Generic platform:
- Move Boston and NI 169445 FIT image source to their own files
- Include asm/bootinfo.h for plat_fdt_relocated()
- Include asm/time.h for get_c0_*_int()
- Include asm/bootinfo.h for plat_fdt_relocated()
- Include asm/time.h for get_c0_*_int()
- Allow filtering enabled boards by requirements
- Don't explicitly disable CONFIG_USB_SUPPORT
- Bump default NR_CPUS to 16
JZ4700:
- Probe the jz4740-rtc driver from devicetree
Lantiq:
- Drop check of boot select from the spi-falcon driver.
- Drop check of boot select from the lantiq-flash MTD driver.
- Access boot cause register in the watchdog driver through regmap
- Add device tree binding documentation for the watchdog driver
- Add docs for the RCU DT bindings.
- Convert the fpi bus driver to a platform_driver
- Remove ltq_reset_cause() and ltq_boot_select(
- Switch to a proper reset driver
- Switch to a new drivers/soc GPHY driver
- Add an USB PHY driver for the Lantiq SoCs using the RCU module
- Use of_platform_default_populate instead of __dt_register_buses
- Enable MFD_SYSCON to be able to use it for the RCU MFD
- Replace ltq_boot_select() with dummy implementation.
Loongson 2F:
- Allow NULL clock for clk_get_rate
Malta:
- Use new GIC accessor functions
NI 169445:
- Add support for NI 169445 board.
- Only include in 32r2el kernels
Octeon:
- Add support for watchdog of 78XX SOCs.
- Add support for watchdog of CN68XX SOCs.
- Expose support for mips32r1, mips32r2 and mips64r1
- Enable more drivers in config file
- Add support for accessing the boot vector.
- Remove old boot vector code from watchdog driver
- Define watchdog registers for 70xx, 73xx, 78xx, F75xx.
- Make CSR functions node aware.
- Allow access to CIU3 IRQ domains.
- Misc cleanups in the watchdog driver
Omega2+:
- New board, add support and defconfig
Pistachio:
- Enable Root FS on NFS in defconfig
Ralink:
- Add Mediatek MT7628A SoC
- Allow NULL clock for clk_get_rate
- Explicitly request exclusive reset control in the pci-mt7620 PCI driver.
SEAD3:
- Only include in 32 bit kernels by default
VoCore:
- Add VoCore as a vendor t0 dt-bindings
- Add defconfig file"
* '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (167 commits)
MIPS: Refactor handling of stack pointer in get_frame_info
MIPS: Stacktrace: Fix microMIPS stack unwinding on big endian systems
MIPS: microMIPS: Fix decoding of swsp16 instruction
MIPS: microMIPS: Fix decoding of addiusp instruction
MIPS: microMIPS: Fix detection of addiusp instruction
MIPS: Handle non word sized instructions when examining frame
MIPS: ralink: allow NULL clock for clk_get_rate
MIPS: Loongson 2F: allow NULL clock for clk_get_rate
MIPS: BCM63XX: allow NULL clock for clk_get_rate
MIPS: AR7: allow NULL clock for clk_get_rate
MIPS: BCM63XX: fix ENETDMA_6345_MAXBURST_REG offset
mips: Save all registers when saving the frame
MIPS: Add DWARF unwinding to assembly
MIPS: Make SAVE_SOME more standard
MIPS: Fix issues in backtraces
MIPS: jz4780: DTS: Probe the jz4740-rtc driver from devicetree
MIPS: Ci20: Enable RTC driver
watchdog: octeon-wdt: Add support for 78XX SOCs.
watchdog: octeon-wdt: Add support for cn68XX SOCs.
watchdog: octeon-wdt: File cleaning.
...
gcc-4.6 causes a harmless link-time warning:
WARNING: vmlinux.o(.text.unlikely+0x48e): Section mismatch in reference from the function xen_find_pt_base() to the function .init.text:m2p()
The function xen_find_pt_base() references
the function __init m2p().
This is often because xen_find_pt_base lacks a __init
annotation or the annotation of m2p is wrong.
Newer compilers inline this function, so it never shows up, but marking
it __init is the right way to avoid the warning.
Fixes: 70e6119955 ("xen: move p2m list if conflicting with e820 map")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
nios2: time: Read timer in get_cycles only if initialized
nios2: add earlycon support to 3c120 devboard DTS
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZu+wvAAoJEFWoEK+e3syCojkP/R1BwzrM3hi5elL0Z/uH7X5Z
zcRcfDsAMssdYEERnLk+8Ecw2ehN/8qi7qMD2AW5Lm+N1Tob7L4fG87ZTQQLXi9Y
uIlVvbeIwlkanN7iQXm9E9s7RMyu/7NO3JrdnKAo0SJaL6zbiYIAGdgIItvgrfUI
VVbYRBUmwkzR9qqFL1BSEZLA0CRQCVAhTXKDcmiSFKn6g8dhmlSQ0xxs0J9LVAgI
ClhHQQejX0eMFUlZ5/I3lpwZHLBA+AVBcCsRO6HqK5erLsZy697mZVvrzGI+2mPk
Rumo0Fwgwz0MjtMkzVAFBn7J6tdoIQJxlwOyg8iJ65oWzEhplvc2KgkHzxaONKkG
ISMm+JDdH7R+Nz+O3PmCio/HBrvZLxC3wSsHtqh19rfA9TiX+Y+J260GIqvksmLo
06MgdMOzdvIAuRmwGIDy12Q38gIwNIPr4NPh2m89QXtuTbUoWSM70dA4jKmCmVg0
ux4hAe4qxBRcse9U/pVx+H7z2piwZyXtbk453xa0zucNun1mh0Y+v79fY5IF07T9
v84ph7aUSBoaXLyTuDfgfCjB/cSw2xlK0W1k0rra2XTWm7YKIp38QsYvCSBdc54t
p8EtURPf5W5514B0QIfhyHNmVlSz9MIKYhigYC42AyvRESCNkzwxtxy/bEPXuUOm
t/iNPv2TdNQCQ9aMWhqx
=pf1f
-----END PGP SIGNATURE-----
Merge tag 'nios2-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2
Pull arch/nios2 update from Ley Foon Tan.
* tag 'nios2-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: time: Read timer in get_cycles only if initialized
nios2: add earlycon support to 3c120 devboard DTS
Just one fix, for the handling of alignment interrupts on dcbz instructions.
Thanks to:
Paul Mackerras, Christian Zigotzky, Michal Sojka.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZu5HyAAoJEFHr6jzI4aWA+yoP+gOKF4obMWnvSJ5YWZZWQee/
T++Uhps7AF0G2y99rBVUKzbXzweDPpBrNQW5cFlF51WinNCA4KJ4E08xoNSrnlKl
ebc6Tn2cHbpWgydSRxWDig/Dp90U95toKp14irO2YvOVWoTgYBX7gOHhkMcpxdLH
MeVe6IAatOSNNNpJK9gLrNDLZaDhTGqXf64VIaw44llf0R9avNFhoS+8F94bllm2
VSsXizP1HZxWFcxJ1W/F54zkOvOkSXxjmd9Lf0IOxERPAshZBO7MfDaXF9MaYXwL
4AU5GZc2UmWIJpwmSQqRQPpLOPFvsJkhNnXeUb/8kQAG2W+HfZy6BvHpVh1lajm7
j+tyIYe+E7Jt63tQxyxSKpGnu20po78iimciYhl77A2RksGl9Tw5sHWj8gok30TI
pQFIRLPAO38MnnD3OrsboAnBrruS0Og+lkSWan7nV5n8ybHflvimjG+MfkNx9vzL
as7yJThQ8D8JBRAyaIHG3aAUPgBDxpZwqhDU3rac0qZChOc9DH4X7YFHR1UzGH8n
J8h9e7IsKGfvm4O9j4t3/nq0Qwh0LvpAlfu1/tPdJ0KXc9BnQTVoCpkFzhA+IFV6
JY2y4F1MfW8ZWS20iPHLlYEXYcTtcTq5b47JYQx97TgSp82ShXSLwbUxXNwSraDv
umUvJle++yyRm927njki
=F9yZ
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"Just one fix, for the handling of alignment interrupts on dcbz
instructions.
Thanks to Paul Mackerras, Christian Zigotzky, Michal Sojka"
* tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix handling of alignment interrupt on dcbz instruction
When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.
The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.
On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For example, the following could occur, making us miss a wakeup:
CPU0 CPU1
kvm_vcpu_block kvm_mips_comparecount_func
[L] swait_active(&vcpu->wq)
[S] prepare_to_swait(&vcpu->wq)
[L] if (!kvm_vcpu_has_pending_timer(vcpu))
schedule() [S] queue_timer_int(vcpu)
Ensure that the swait_active() check is not hoisted over the interrupt.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Particularly because kvmppc_fast_vcpu_kick_hv() is a callback,
ensure that we properly serialize wq active checks in order to
avoid potentially missing a wakeup due to racing with the waiter
side.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During code inspection, the following potential race was seen:
CPU0 CPU1
kvm_async_pf_task_wait apf_task_wake_one
[L] swait_active(&n->wq)
[S] prepare_to_swait(&n.wq)
[L] if (!hlist_unhahed(&n.link))
schedule() [S] hlist_del_init(&n->link);
Properly serialize swait_active() checks such that a wakeup is
not missed.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of bounds. (Especially,
since KVM as a whole seems to hang after that.)
Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).
This fixes CVE-2017-1000252.
Fixes: efc644048e ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Mainline crashes as follows when running nios2 images.
On node 0 totalpages: 65536
free_area_init_node: node 0, pgdat c8408fa0, node_mem_map c8726000
Normal zone: 512 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 65536 pages, LIFO batch:15
Unable to handle kernel NULL pointer dereference at virtual address 00000000
ea = c8003cb0, ra = c81cbf40, cause = 15
Kernel panic - not syncing: Oops
Problem is seen because get_cycles() is called before the timer it depends
on is initialized. Returning 0 in that situation fixes the problem.
Fixes: 33d72f3822 ("init/main.c: extract early boot entropy from the ..")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.
This fixes CVE-2017-12154.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do
support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull more set_fs removal from Al Viro:
"Christoph's 'use kernel_read and friends rather than open-coding
set_fs()' series"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: unexport vfs_readv and vfs_writev
fs: unexport vfs_read and vfs_write
fs: unexport __vfs_read/__vfs_write
lustre: switch to kernel_write
gadget/f_mass_storage: stop messing with the address limit
mconsole: switch to kernel_read
btrfs: switch write_buf to kernel_write
net/9p: switch p9_fd_read to kernel_write
mm/nommu: switch do_mmap_private to kernel_read
serial2002: switch serial2002_tty_write to kernel_{read/write}
fs: make the buf argument to __kernel_write a void pointer
fs: fix kernel_write prototype
fs: fix kernel_read prototype
fs: move kernel_read to fs/read_write.c
fs: move kernel_write to fs/read_write.c
autofs4: switch autofs4_write to __kernel_write
ashmem: switch to ->read_iter
Pull ipc compat cleanup and 64-bit time_t from Al Viro:
"IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
Dinamani"
* 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
utimes: Make utimes y2038 safe
ipc: shm: Make shmid_kernel timestamps y2038 safe
ipc: sem: Make sem_array timestamps y2038 safe
ipc: msg: Make msg_queue timestamps y2038 safe
ipc: mqueue: Replace timespec with timespec64
ipc: Make sys_semtimedop() y2038 safe
get rid of SYSVIPC_COMPAT on ia64
semtimedop(): move compat to native
shmat(2): move compat to native
msgrcv(2), msgsnd(2): move compat to native
ipc(2): move compat to native
ipc: make use of compat ipc_perm helpers
semctl(): move compat to native
semctl(): separate all layout-dependent copyin/copyout
msgctl(): move compat to native
msgctl(): split the actual work from copyin/copyout
ipc: move compat shmctl to native
shmctl: split the work from copyin/copyout
This fixes the emulation of the dcbz instruction in the alignment
interrupt handler. The error was that we were comparing just the
instruction type field of op.type rather than the whole thing,
and therefore the comparison "type != CACHEOP + DCBZ" was always
true.
Fixes: 31bfdb036f ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Tested-by: Michal Sojka <sojkam1@fel.cvut.cz>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull dmi update from Jean Delvare:
"Mark all struct dmi_system_id instances const"
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
dmi: Mark all struct dmi_system_id instances const
qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2
qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2
After running some memory intensive workload in guest, I catch the kworker
which completes the GUP too quickly, and queues an "Page Ready" #PF exception
after the "Page not Present" exception before the next vmentry as the above
trace which will result in #DF injected to guest.
This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
occurs before the next vmentry since the GUP has already got the required page
and shadow page table has already been fixed by "Page Ready" handler.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: 7c90705bf2 ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Don't block vCPU if there is pending exception.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
SVM AVIC hardware accelerates guest write to APIC_EOI register
(for edge-trigger interrupt), which means it does not trap to KVM.
So, only enable SVM AVIC only in split irqchip mode.
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Fixes: 44a95dae1d ("KVM: x86: Detect and Initialize AVIC support")
[Removed pr_debug - Radim.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
... and __initconst if applicable.
Based on similar work for an older kernel in the Grsecurity patch.
[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
The stacktraces always begin as follows:
[<c00117b4>] save_stack_trace_tsk+0x0/0x98
[<c0011870>] save_stack_trace+0x24/0x28
...
This is because the stack trace code includes the stack frames for
itself. This is incorrect behaviour, and also leads to "skip" doing the
wrong thing (which is the number of stack frames to avoid recording.)
Perversely, it does the right thing when passed a non-current thread.
Fix this by ensuring that we have a known constant number of frames
above the main stack trace function, and always skip these.
This was fixed for arch arm by commit 3683f44c42 ("ARM: stacktrace:
avoid listing stacktrace functions in stacktrace")
Link: http://lkml.kernel.org/r/1504078343-28754-1-git-send-email-guptap@codeaurora.org
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.
This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.
[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The intention is to return negative error codes. "pid" is already
negative but we accidentally negate it again back to positive.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Static checkers would urge us to add curly braces to this code, but
actually the code works correctly. It just isn't indented right.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
When building a dynamic kernel image use relative symbols with MODVERSIONS.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Debian's gcc defaults to pie. The global Makefile already defines the -fno-pie option.
Link UML dynamic kernel image also with -no-pie to fix the build.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>