Commit Graph

134895 Commits

Author SHA1 Message Date
Olof Johansson a1858df975 Allwinner fixes for 4.12
A few fixes around the PRCM support that got in 4.12 with a wrong
 compatible, and a missing clock in the binding.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZQZkZAAoJEBx+YmzsjxAg4nAQAJJqzy8//Ur0o6Ppc6eufIAY
 gYGS80x5n/a6/X7PPMj/cMBVc1/HoOoF5YVKry2edRi4jKwpBjCE6THNJ/EdI0LM
 6PrN4y9yJAzxbwWfD9rfVNLg54665TGW8etBp5C3Sqdi+qmU9BTL068UYGcA46I8
 XGZ53wGLnCfRH5VGpVzxORbdQMStKsZ2D0PTmZ7aJU2nrPugbf4DiGg2Uhgdx+bI
 Mz3Zl6cZQraWdl6gSVTjG1Z5LQOKo5tXGIaC4zxbXe/Ss2lspxM3WKtJDhdtoTH9
 ZiLGDf6Q3XUeMN5WkQNT6ZnT8+/8NQujhcktEfxhfiA5pGeHLzuOCaOLgEWVy8sc
 Z6jNLHUht3W/XGOGY0szKfqmsOnDdnsnv3YbCUoWJ/0ER8kwQdJ8k4iI1EVMi23Q
 UcXDiZivCgjj7mYOzfhG2YYZ03rxhadPqsnoDro/a+mI6splPhQplJC/we8TUYHt
 eJmXF3rvOgXGYJAdnF8FJiftzUKUd0h8S8qsxBB3knP4mPY73vtppsvJwPOqlil2
 EPcqHmcd3EvHRZrmHNsP7qpQXMiaWqImf6Ioq7hz4mPPJ5uiIPrEIdRmACCAdn9H
 eeOQyI0rdg3bTCKi/dztaX4zVCMjEy9HG0xZ/Y6dvPwKjuWTpJApsTC+xEaiql2L
 gQ+OFMX4mvgr79OUNp7L
 =WD1J
 -----END PGP SIGNATURE-----

Merge tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes

Allwinner fixes for 4.12

A few fixes around the PRCM support that got in 4.12 with a wrong
compatible, and a missing clock in the binding.

* tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
  ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
  arm64: allwinner: h5: Remove syslink to shared DTSI
  ARM: sunxi: h3/h5: fix the compatible of R_CCU

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 20:42:21 -07:00
Olof Johansson 51b6e2813c Two fixes for am335x-sl50 to fix a boot time error
for claiming SPI pins, and to fix a SDIO card detect
 pin for production version of the device.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAlk3q9cRHHRvbnlAYXRv
 bWlkZS5jb20ACgkQG9Q+yVyrpXMFbRAAx9XBHtsHPI0AcWEagprXFoeKClEmlV33
 Az06mEwgWKM0IMKELdstFLgpT80zp71v44k61vxWHudJi3a3rEbcDiujmttKg4II
 fL3CfjpYS3son449VZWc/X0ZsvU/5vhusDkx8QwGWK1FgfOLVNvwNtgxWr3roUFC
 qUQZImkTvgKYuhoaoV5Sx0VtUEJ9ukLNAqjy0HTfTU15jkX4/nqmlB+8ov2oamYv
 r502+LEzNVoCmlYW1SBH+yn7zebIad2UtRZqz+TV8FnJl3p4yFI1Odvo8hbODTzT
 vo+YsSmhd23eKnw+yXpqdPYng3yXH5Co3vLFumpBcyNkZlCtSokLrJ1/VOjtTNVn
 oM+gsR77I6zGknFdiBzITLJRFABHlziSfeaDyyhfQX9CAfrCv9t5hiqn6KwRuiGj
 H1hD2JkFWeJLLPGO4YauJk5PLcAdMILfjHRHW7lBeKaT76nH4Ha5PgpCj22hNSCo
 m1EDdR30QzkfL28iYp8LM9yNhkm05Y8VG517AmIrzJUl6/RhTRH2f5d6+ChTQ7AA
 cp89ITqEeccCso6xULCPjqaGIuYCEopDFjQN/WP8Pt5bXweiiQ6HpAgMDp2E5PfD
 3yKwYvXDyl5HIa+ZbEDPLuna1OjDu7BrCe+nuDwsVfeM0fOHViABPl6mAvH7VXvb
 nQhOVeQSP8I=
 =e4F/
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

Two fixes for am335x-sl50 to fix a boot time error
for claiming SPI pins, and to fix a SDIO card detect
pin for production version of the device.

* tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
  ARM: dts: am335x-sl50: Fix card detect pin for mmc1

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-18 18:55:12 -07:00
Linus Torvalds b3ee4edd8a Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:

 - Three highmem fixes:
    + Fixed mapping initialization
    + Adjust the pkmap location
    + Ensure we use at most one page for PTEs

 - Fix makefile dependencies for .its targets to depend on vmlinux

 - Fix reversed condition in BNEZC and JIALC software branch emulation

 - Only flush initialized flush_insn_slot to avoid NULL pointer
   dereference

 - perf: Remove incorrect odd/even counter handling for I6400

 - ftrace: Fix init functions tracing

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: .its targets depend on vmlinux
  MIPS: Fix bnezc/jialc return address calculation
  MIPS: kprobes: flush_insn_slot should flush only if probe initialised
  MIPS: ftrace: fix init functions tracing
  MIPS: mm: adjust PKMAP location
  MIPS: highmem: ensure that we don't use more than one page for PTEs
  MIPS: mm: fixed mappings: correct initialisation
  MIPS: perf: Remove incorrect odd/even counter handling for I6400
2017-06-19 09:01:01 +09:00
Linus Torvalds edf9364d3f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Two fixlets for x86:

   - Handle WARN_ONs proper with the new UD based WARN implementation

   - Disable 1G mappings when 2M mappings are disabled by kmemleak or
     debug_pagealloc. Otherwise 1G mappings might still be used,
     confusing the debug mechanisms"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Disable 1GB direct mappings when disabling 2MB mappings
  x86/debug: Handle early WARN_ONs proper
2017-06-18 18:49:12 +09:00
Linus Torvalds 5ac447d268 powerpc fixes for 4.12 #6
Three small fixes for recently merged code:
 
  - remove a spurious WARN_ON when a PCI device has no of_node, it's allowed in
    some circumstances for there to be no of_node.
 
  - fix the offset for store EOI MMIOs in the XIVE interrupt controller.
 
  - fix non-const WARN_ONs which were becoming BUGs due to them losing
    BUGFLAG_WARNING in a recent cleanup patch.
 
 Thanks to:
   Alexey Kardashevskiy, Alistair Popple, Benjamin Herrenschmidt.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZQ7XeAAoJEFHr6jzI4aWA2bgP/R++c9YehNdDKbZyLqumY+6q
 7ns8NoxgEW/Gc8JuSTE4MW51q2HimBJu6ntyHfMUwgpGQpGzOGDn6g8OxVfjOySa
 kFc7cytgOOhEpTHENDZ3xxZtcSd9iafyX9ga/0dz6UycfEHcZayiXDRuXffRzJwa
 RNqbwDxOtkgn6w4bW02SRlfDSTra+zQZQd6NsPXSJJgF+tb3MflMj1A9WoJp/mj/
 tXc9fpKQsZkIG/AvAHziizHqeAKJxUrmoVb8qy1SYTKVUDZoxTYgiO1G1nebZX/s
 Zzsdd/fcHcd0DIEJkjf2V3cegmIGTLzw7mUOodU7IF3mZ1LPgCMVF5lTTZzjcXDQ
 d1gugVojHnGr7KB3lNNijyHxsmHG7LdTQmRHcyZ2L8KYpa3/+Ca3ZuFnTwjvgRNx
 dJEFX5JdAhCrkg1B73rvcjKCFg0ysVIrkdf27SaameaQdQQuZU4+5+s1LB2EqJQr
 II3+pnZr/RF3OWu4yJE5KAHX5ZBQQ+unzVPpW4pqvwYMoVKhO7dhCPPISeRCtzJE
 +po5Ys4ncheSRhwf5dQhf+H04kXmL6ekpl1GJOBB3BskJcsIr8hiLp3/mF238et1
 80o6yTAJLADKUIl75ISiePz+KFZNamgke1/XWZolfHYZ9dNRF0c//E0qvpopz8jE
 F90hxEAtJ9ws/VUlo40Q
 =Mnxp
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Three small fixes for recently merged code:

   - remove a spurious WARN_ON when a PCI device has no of_node, it's
     allowed in some circumstances for there to be no of_node.

   - fix the offset for store EOI MMIOs in the XIVE interrupt
     controller.

   - fix non-const WARN_ONs which were becoming BUGs due to them losing
     BUGFLAG_WARNING in a recent cleanup patch.

  Thanks to: Alexey Kardashevskiy, Alistair Popple, Benjamin
  Herrenschmidt"

* tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
  powerpc/xive: Fix offset for store EOI MMIOs
  powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node
2017-06-17 05:57:54 +09:00
Ravi Bangoria bf05fc25f2 powerpc/perf: Fix oops when kthread execs user process
When a kthread calls call_usermodehelper() the steps are:
  1. allocate current->mm
  2. load_elf_binary()
  3. populate current->thread.regs

While doing this, interrupts are not disabled. If there is a perf
interrupt in the middle of this process (i.e. step 1 has completed
but not yet reached to step 3) and if perf tries to read userspace
regs, kernel oops with following log:

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc0000000000da0fc
  ...
  Call Trace:
  perf_output_sample_regs+0x6c/0xd0
  perf_output_sample+0x4e4/0x830
  perf_event_output_forward+0x64/0x90
  __perf_event_overflow+0x8c/0x1e0
  record_and_restart+0x220/0x5c0
  perf_event_interrupt+0x2d8/0x4d0
  performance_monitor_exception+0x54/0x70
  performance_monitor_common+0x158/0x160
  --- interrupt: f01 at avtab_search_node+0x150/0x1a0
      LR = avtab_search_node+0x100/0x1a0
  ...
  load_elf_binary+0x6e8/0x15a0
  search_binary_handler+0xe8/0x290
  do_execveat_common.isra.14+0x5f4/0x840
  call_usermodehelper_exec_async+0x170/0x210
  ret_from_kernel_thread+0x5c/0x7c

Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
pt_regs are not set.

Fixes: ed4a4ef85c ("powerpc/perf: Add support for sampling interrupt register state")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 21:02:46 +10:00
Naveen N. Rao d89ba5353f powerpc/64s: Handle data breakpoints in Radix mode
On Power9, trying to use data breakpoints throws the splat shown
below. This is because the check for a data breakpoint in DSISR is in
do_hash_page(), which is not called when in Radix mode.

  Unable to handle kernel paging request for data at address 0xc000000000e19218
  Faulting instruction address: 0xc0000000001155e8
  cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20]
  pc: c0000000001155e8: find_pid_ns+0x48/0xe0
  lr: c000000000116ac4: find_task_by_vpid+0x44/0x90
  sp: c0000000ef1e7da0
  msr: 9000000000009033
  dar: c000000000e19218
  dsisr: 400000

Move the check to handle_page_fault() so as to catch data breakpoints
in both Hash and Radix MMU modes.

We have to change the check in do_hash_page() against 0xa410 to use
0xa450, so as to include the value of (DSISR_DABRMATCH << 16).

There are two sites that call handle_page_fault() when in Radix, both
already pass DSISR in r4.

Fixes: caca285e5a ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Fix the fall-through case on hash, we need to reload DSISR]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Naveen N. Rao c05b8c4474 powerpc/kprobes: Skip livepatch_handler() for jprobes
ftrace_caller() depends on a modified regs->nip to detect if a certain
function has been livepatched. However, with KPROBES_ON_FTRACE, it is
possible for regs->nip to have been modified by the kprobes pre_handler
(jprobes, for instance). In this case, we do not want to invoke the
livepatch_handler so as not to consume the livepatch stack.

To distinguish between the two (kprobes and livepatch), we check if
there is an active kprobe on the current function. If there is, then we
know for sure that it must have modified the NIP as we don't support
livepatching a kprobe'd function. In this case, we simply skip the
livepatch_handler and branch to the new NIP. Otherwise, the
livepatch_handler is invoked.

Fixes: ead514d5fb ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Naveen N. Rao a4979a7e71 powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS
For DYNAMIC_FTRACE_WITH_REGS, we should be passing-in the original set
of registers in pt_regs, to capture the state _before_ ftrace_caller.
However, we are instead passing the stack pointer *after* allocating a
stack frame in ftrace_caller. Fix this by saving the proper value of r1
in pt_regs. Also, use SAVE_10GPRS() to simplify the code.

Fixes: 153086644f ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Naveen N. Rao a9f8553e93 powerpc/kprobes: Pause function_graph tracing during jprobes handling
This fixes a crash when function_graph and jprobes are used together.
This is essentially commit 237d28db03 ("ftrace/jprobes/x86: Fix
conflict between jprobes and function graph tracing"), but for powerpc.

Jprobes breaks function_graph tracing since the jprobe hook needs to use
jprobe_return(), which never returns back to the hook, but instead to
the original jprobe'd function. The solution is to momentarily pause
function_graph tracing before invoking the jprobe hook and re-enable it
when returning back to the original jprobe'd function.

Fixes: 6794c78243 ("powerpc64: port of the function graph tracer")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Alexey Kardashevskiy a093c92dc7 powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
When trapped on WARN_ON(), report_bug() is expected to return
BUG_TRAP_TYPE_WARN so the caller will increment NIP by 4 and continue.
The __builtin_constant_p() path of the PPC's WARN_ON()
calls (indirectly) __WARN_FLAGS() which has BUGFLAG_WARNING set,
however the other branch does not which makes report_bug() report a
bug rather than a warning.

Fixes: f26dee1510 ("debug: Avoid setting BUGFLAG_WARNING twice")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 16:10:37 +10:00
Paul Mackerras 3d3efb68c1 KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1
POWER9 DD1 has an erratum where writing to the TBU40 register, which
is used to apply an offset to the timebase, can cause the timebase to
lose counts.  This results in the timebase on some CPUs getting out of
sync with other CPUs, which then results in misbehaviour of the
timekeeping code.

To work around the problem, we make KVM ignore the timebase offset for
all guests on POWER9 DD1 machines.  This means that live migration
cannot be supported on POWER9 DD1 machines.

Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-16 16:04:57 +10:00
Paul Mackerras 7ceaa6dcd8 KVM: PPC: Book3S HV: Save/restore host values of debug registers
At present, HV KVM on POWER8 and POWER9 machines loses any instruction
or data breakpoint set in the host whenever a guest is run.
Instruction breakpoints are currently only used by xmon, but ptrace
and the perf_event subsystem can set data breakpoints as well as xmon.

To fix this, we save the host values of the debug registers (CIABR,
DAWR and DAWRX) before entering the guest and restore them on exit.
To provide space to save them in the stack frame, we expand the stack
frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-16 11:53:19 +10:00
Shaohua Li 7304e8f28b iommu/vt-d: Correctly disable Intel IOMMU force on
I made a mistake in commit bfd20f1. We should skip the force on with the
option enabled instead of vice versa. Not sure why this passed our
performance test, sorry.

Fixes: bfd20f1cc8 ('x86, iommu/vt-d: Add an option to disable Intel IOMMU force on')
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-15 16:41:10 +02:00
Benjamin Herrenschmidt 25642705b2 powerpc/xive: Fix offset for store EOI MMIOs
Architecturally we should apply a 0x400 offset for these. Not doing
it will break future HW implementations.

The offset of 0 is supposed to remain for "triggers" though not all
sources support both trigger and store EOI, and in P9 specifically,
some sources will treat 0 as a store EOI. But future chips will not.
So this makes us use the properly architected offset which should work
always.

Fixes: 243e25112d ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 23:29:39 +10:00
Paul Burton bcd7c45e0d MIPS: .its targets depend on vmlinux
The .its targets require information about the kernel binary, such as
its entry point, which is extracted from the vmlinux ELF. We therefore
require that the ELF is built before the .its files are generated.
Declare this requirement in the Makefile such that make will ensure this
is always the case, otherwise in corner cases we can hit issues as the
.its is generated with an incorrect (either invalid or stale) entry
point.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: cf2a5e0bb4 ("MIPS: Support generating Flattened Image Trees (.itb)")
Cc: linux-mips@linux-mips.org
Cc: stable <stable@vger.kernel.org> # v4.9+
Patchwork: https://patchwork.linux-mips.org/patch/16179/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-15 11:48:15 +02:00
Paul Burton 1a73d9310e MIPS: Fix bnezc/jialc return address calculation
The code handling the pop76 opcode (ie. bnezc & jialc instructions) in
__compute_return_epc_for_insn() needs to set the value of $31 in the
jialc case, which is encoded with rs = 0. However its check to
differentiate bnezc (rs != 0) from jialc (rs = 0) was unfortunately
backwards, meaning that if we emulate a bnezc instruction we clobber $31
& if we emulate a jialc instruction it actually behaves like a jic
instruction.

Fix this by inverting the check of rs to match the way the instructions
are actually encoded.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: 28d6f93d20 ("MIPS: Emulate the new MIPS R6 BNEZC and JIALC instructions")
Cc: stable <stable@vger.kernel.org> # v4.0+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16178/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-15 11:47:42 +02:00
Linus Torvalds a090bd4ff8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) The netlink attribute passed in to dev_set_alias() is not
    necessarily NULL terminated, don't use strlcpy() on it. From
    Alexander Potapenko.

 2) Fix implementation of atomics in arm64 bpf JIT, from Daniel
    Borkmann.

 3) Correct the release of netdevs and driver private data in certain
    circumstances.

 4) Sanitize netlink message length properly in decnet, from Mateusz
    Jurczyk.

 5) Don't leak kernel data in rtnl_fill_vfinfo() netlink blobs. From
    Yuval Mintz.

 6) Hash secret is never initialized in ipv6 ILA translation code, from
    Arnd Bergmann. I guess those clang warnings about unused inline
    functions are useful for something!

 7) Fix endian selection in bpf_endian.h, from Daniel Borkmann.

 8) Sanitize sockaddr length before dereferncing any fields in AF_UNIX
    and CAIF. From Mateusz Jurczyk.

 9) Fix timestamping for GMAC3 chips in stmmac driver, from Mario
    Molitor.

10) Do not leak netdev on dev_alloc_name() errors in mac80211, from
    Johannes Berg.

11) Fix locking in sctp_for_each_endpoint(), from Xin Long.

12) Fix wrong memset size on 32-bit in snmp6, from Christian Perle.

13) Fix use after free in ip_mc_clear_src(), from WANG Cong.

14) Fix regressions caused by ICMP rate limiting changes in 4.11, from
    Jesper Dangaard Brouer.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
  i40e: Fix a sleep-in-atomic bug
  net: don't global ICMP rate limit packets originating from loopback
  net/act_pedit: fix an error code
  net: update undefined ->ndo_change_mtu() comment
  net_sched: move tcf_lock down after gen_replace_estimator()
  caif: Add sockaddr length check before accessing sa_family in connect handler
  qed: fix dump of context data
  qmi_wwan: new Telewell and Sierra device IDs
  net: phy: Fix MDIO_THUNDER dependencies
  netconsole: Remove duplicate "netconsole: " logging prefix
  igmp: acquire pmc lock for ip_mc_clear_src()
  r8152: give the device version
  net: rps: fix uninitialized symbol warning
  mac80211: don't send SMPS action frame in AP mode when not needed
  mac80211/wpa: use constant time memory comparison for MACs
  mac80211: set bss_info data before configuring the channel
  mac80211: remove 5/10 MHz rate code from station MLME
  mac80211: Fix incorrect condition when checking rx timestamp
  mac80211: don't look at the PM bit of BAR frames
  i40e: fix handling of HW ATR eviction
  ...
2017-06-15 18:09:47 +09:00
Paul Mackerras 46a704f840 KVM: PPC: Book3S HV: Preserve userspace HTM state properly
If userspace attempts to call the KVM_RUN ioctl when it has hardware
transactional memory (HTM) enabled, the values that it has put in the
HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
guest values.  To fix this, we detect this condition and save those
SPR values in the thread struct, and disable HTM for the task.  If
userspace goes to access those SPRs or the HTM facility in future,
a TM-unavailable interrupt will occur and the handler will reload
those SPRs and re-enable HTM.

If userspace has started a transaction and suspended it, we would
currently lose the transactional state in the guest entry path and
would almost certainly get a "TM Bad Thing" interrupt, which would
cause the host to crash.  To avoid this, we detect this case and
return from the KVM_RUN ioctl with an EINVAL error, with the KVM
exit reason set to KVM_EXIT_FAIL_ENTRY.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-15 16:18:17 +10:00
Paul Mackerras 4c3bb4ccd0 KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
This restores several special-purpose registers (SPRs) to sane values
on guest exit that were missed before.

TAR and VRSAVE are readable and writable by userspace, and we need to
save and restore them to prevent the guest from potentially affecting
userspace execution (not that TAR or VRSAVE are used by any known
program that run uses the KVM_RUN ioctl).  We save/restore these
in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.

FSCR affects userspace execution in that it can prohibit access to
certain facilities by userspace.  We restore it to the normal value
for the task on exit from the KVM_RUN ioctl.

IAMR is normally 0, and is restored to 0 on guest exit.  However,
with a radix host on POWER9, it is set to a value that prevents the
kernel from executing user-accessible memory.  On POWER9, we save
IAMR on guest entry and restore it on guest exit to the saved value
rather than 0.  On POWER8 we continue to set it to 0 on guest exit.

PSPB is normally 0.  We restore it to 0 on guest exit to prevent
userspace taking advantage of the guest having set it non-zero
(which would allow userspace to set its SMT priority to high).

UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
the AMR from being used as a covert channel between userspace
processes, since the AMR is not context-switched at present.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-15 16:17:09 +10:00
Heiko Carstens 4130b28f56 s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPL
This reverts the two commits

7afbeb6df2 ("s390/ipl: always use load normal for CCW-type re-IPL")
0f7451ff3a ("s390/ipl: use load normal for LPAR re-ipl")

The two commits did not take into account that behavior of standby
memory changes fundamentally if the re-IPL method is changed from
Load Clear to Load Normal.

In case of the old re-IPL clear method all memory that was initially
in standby state will be put into standby state again within the
re-IPL process. Or in other words: memory that was brought online
before a re-IPL will be offline again after a reboot.

Given that we use different re-IPL methods depending on the hypervisor
and CCW-type vs SCSI re-IPL it is not easy to tell in advance when and
why memory will stay online or will be offline after a re-IPL.
This does also have other side effects, since memory that is online
from the beginning will be in ZONE_NORMAL by default vs ZONE_MOVABLE
for memory that is offline.

Therefore, before the change, a user could online and offline memory
easily since standby memory was always in ZONE_NORMAL.  After the
change, and a re-IPL, this depended on which memory parts were online
before the re-IPL.

From a usability point of view the current behavior is more than
suboptimal. Therefore revert these changes until we have a better
solution and get back to a consistent behavior. The bad thing about
this is that the time required for a re-IPL will be significantly
increased for configurations with several 100GB or 1TB of memory.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-14 15:35:31 +02:00
Daniel Lezcano bb0eb050a5 clocksource/drivers: Rename CLKSRC_OF to TIMER_OF
The config option name is now renamed to 'TIMER_OF' for consistency with
the CLOCKSOURCE_OF_DECLARE => TIMER_OF_DECLARE change.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-14 12:01:03 +02:00
Daniel Lezcano ba5d08c0ea clocksource/drivers: Rename clocksource_probe to timer_probe
The function name is now renamed to 'timer_probe' for consistency with
the CLOCKSOURCE_OF_DECLARE => TIMER_OF_DECLARE change.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-14 11:59:16 +02:00
Daniel Lezcano 1727339590 clocksource/drivers: Rename CLOCKSOURCE_OF_DECLARE to TIMER_OF_DECLARE
The CLOCKSOURCE_OF_DECLARE macro is used widely for the timers to declare the
clocksource at early stage. However, this macro is also used to initialize
the clockevent if any, or the clockevent only.

It was originally suggested to declare another macro to initialize a
clockevent, so in order to separate the two entities even they belong to the
same IP. This was not accepted because of the impact on the DT where splitting
a clocksource/clockevent definition does not make sense as it is a Linux
concept not a hardware description.

On the other side, the clocksource has not interrupt declared while the
clockevent has, so it is easy from the driver to know if the description is
for a clockevent or a clocksource, IOW it could be implemented at the driver
level.

So instead of dealing with a named clocksource macro, let's use a more generic
one: TIMER_OF_DECLARE.

The patch has not functional changes.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Matthias Brugger <matthias.bgg@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-14 11:58:45 +02:00
Yazen Ghannam 6057077f6e x86/mce: Update bootlog description to reflect behavior on AMD
The bootlog option is only disabled by default on AMD Fam10h and older
systems.

Update bootlog description to say this. Change the family value to hex
to avoid confusion.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170613162835.30750-9-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:32:10 +02:00
Yazen Ghannam ec33838244 x86/mce: Don't disable MCA banks when offlining a CPU on AMD
AMD systems have non-core, shared MCA banks within a die. These banks
are controlled by a master CPU per die. If this CPU is offlined then all
the shared banks are disabled in addition to the CPU's core banks.

Also, Fam17h systems may have SMT enabled. The MCA_CTL register is shared
between SMT thread siblings. If a CPU is offlined then all its sibling's
MCA banks are also disabled.

Extend the existing vendor check to AMD too.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
[ Fix up comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170613162835.30750-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:32:09 +02:00
Borislav Petkov 86d2eac5a7 x86/mce/mce-inject: Preset the MCE injection struct
Populate the MCE injection struct before doing initial injection so that
values which don't change have sane defaults.

Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20170613162835.30750-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:32:09 +02:00
Borislav Petkov 5c99881b33 x86/mce: Clean up include files
Not really needed.

Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20170613162835.30750-6-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:32:08 +02:00
Borislav Petkov fbe9ff9eaf x86/mce: Get rid of register_mce_write_callback()
Make the mcelog call a notifier which lands in the injector module and
does the injection. This allows for mce-inject to be a normal kernel
module now.

Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20170613162835.30750-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:32:07 +02:00
Borislav Petkov bc8e80d56c x86/mce: Merge mce_amd_inj into mce-inject
Reuse mce_amd_inj's debugfs interface so that mce-inject can
benefit from it too. The old functionality is still preserved under
CONFIG_X86_MCELOG_LEGACY.

Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20170613162835.30750-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:32:07 +02:00
Yazen Ghannam 17ef4af0ec x86/mce/AMD: Use saved threshold block info in interrupt handler
In the amd_threshold_interrupt() handler, we loop through every possible
block in each bank and rediscover the block's address and if it's valid,
e.g. valid, counter present and not locked.

However, we already have the address saved in the threshold blocks list
for each CPU and bank. The list only contains blocks that have passed
all the valid checks.

Besides the redundancy, there's also a smp_call_function* in
get_block_address() which causes a warning when servicing the interrupt:

 WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0
 ...
 Call Trace:
  <IRQ>
  rdmsr_safe_on_cpu()
  get_block_address.isra.2()
  amd_threshold_interrupt()
  smp_threshold_interrupt()
  threshold_interrupt()

because we do get called in an interrupt handler *with* interrupts
disabled, which can result in a deadlock.

Drop the redundant valid checks and move the overflow check, logging and
block reset into a separate function.

Check the first block then iterate over the rest. This procedure is
needed since the first block is used as the head of the list.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170613162835.30750-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:32:06 +02:00
Yazen Ghannam a24b8c3409 x86/mce/AMD: Use msr_stat when clearing MCA_STATUS
The value of MCA_STATUS is used as the MSR when clearing MCA_STATUS.

This may cause the following warning:

 unchecked MSR access error: WRMSR to 0x11b (tried to write 0x0000000000000000)
 Call Trace:
  <IRQ>
  smp_threshold_interrupt()
  threshold_interrupt()

Use msr_stat instead which has the MSR address.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 37d43acfd7 ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers")
Link: http://lkml.kernel.org/r/20170613162835.30750-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:32:06 +02:00
Ingo Molnar 10b90ee2ec Linux 4.12-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh
 biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G
 kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU
 4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+
 2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E
 W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4=
 =m2Gx
 -----END PGP SIGNATURE-----

Merge tag 'v4.12-rc5' into ras/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14 07:31:46 +02:00
Alistair Popple 377aa6b0ef powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node
Commit 4c3b89effc ("powerpc/powernv: Add sanity checks to
pnv_pci_get_{gpu|npu}_dev") introduced explicit warnings in
pnv_pci_get_npu_dev() when a PCIe device has no associated device-tree
node. However not all PCIe devices have an of_node and
pnv_pci_get_npu_dev() gets indirectly called at least once for every
PCIe device in the system. This results in spurious WARN_ON()'s so
remove it.

The same situation should not exist for pnv_pci_get_gpu_dev() as any
NPU based PCIe device requires a device-tree node.

Fixes: 4c3b89effc ("powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-14 15:23:19 +10:00
Martin Schwidefsky f044f4c588 s390/fpu: export save_fpu_regs for all configs
The save_fpu_regs function is a general API that is supposed to be
usable for modules as well. Remove the #ifdef that hides the symbol
for CONFIG_KVM=n.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-13 13:03:43 +02:00
Martin Schwidefsky 23fefe119c s390/kvm: avoid global config of vm.alloc_pgste=1
The system control vm.alloc_pgste is used to control the size of the
page tables, either 2K or 4K. The idea is that a KVM host sets the
vm.alloc_pgste control to 1 which causes *all* new processes to run
with 4K page tables. For a non-kvm system the control should stay off
to save on memory used for page tables.

Trouble is that distributions choose to set the control globally to
be able to run KVM guests. This wastes memory on non-KVM systems.

Introduce the PT_S390_PGSTE ELF segment type to "mark" the qemu
executable with it. All executables with this (empty) segment in
its ELF phdr array will be started with 4K page tables. Any executable
without PT_S390_PGSTE will run with the default 2K page tables.

This removes the need to set vm.alloc_pgste=1 for a KVM host and
minimizes the waste of memory for page tables.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-13 13:03:41 +02:00
Christoph Hellwig fdd050b5b3 Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into nvme-base 2017-06-13 11:45:14 +02:00
Kirill A. Shutemov 8624c1f66f x86/mm: Add support for 5-level paging for KASLR
With 5-level paging randomization happens on P4D level instead of PUD.

Maximum amount of physical memory also bumped to 52-bits for 5-level
paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-13-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:58 +02:00
Kirill A. Shutemov 7e82ea946a x86/mm: Make kernel_physical_mapping_init() support 5-level paging
Populate additional page table level if CONFIG_X86_5LEVEL is enabled.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-12-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:57 +02:00
Kirill A. Shutemov 141efad7d7 x86/mm: Add sync_global_pgds() for configuration with 5-level paging
This basically restores slightly modified version of original
sync_global_pgds() which we had before folded p4d was introduced.

The only modification is protection against 'addr' overflow.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-11-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:56 +02:00
Kirill A. Shutemov 032370b9c8 x86/boot/64: Add support of additional page table level during early boot
This patch adds support for 5-level paging during early boot.
It generalizes boot for 4- and 5-level paging on 64-bit systems with
compile-time switch between them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-10-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:55 +02:00
Kirill A. Shutemov 65ade2f872 x86/boot/64: Rename init_level4_pgt and early_level4_pgt
With CONFIG_X86_5LEVEL=y, level 4 is no longer top level of page tables.

Let's give these variable more generic names: init_top_pgt and
early_top_pgt.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:55 +02:00
Kirill A. Shutemov c88d71508e x86/boot/64: Rewrite startup_64() in C
The patch write most of startup_64 logic in C.

This is preparation for 5-level paging enabling.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-8-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:54 +02:00
Kirill A. Shutemov 34bbb0009f x86/boot/compressed: Enable 5-level paging during decompression stage
We need to cover two basic cases: when bootloader left us in 32-bit mode
and when bootloader enabled long mode.

The patch implements unified codepath to enabled 5-level paging for both
cases. It means case when we start in 32-bit mode, we first enable long
mode with 4-level and then switch over to 5-level paging.

Switching from 4-level to 5-level paging is not trivial. We cannot do it
directly. Setting LA57 in long mode would trigger #GP. So we need to
switch off long mode first and the then re-enable with 5-level paging.

NOTE: The need of switching off long mode means we are in trouble if
bootloader put us above 4G boundary. If bootloader wants to boot 5-level
paging kernel, it has to put kernel below 4G or enable 5-level paging on
it's own, so we could avoid the step.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:53 +02:00
Kirill A. Shutemov 919a02d128 x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
We would need to switch temporarily to compatibility mode during booting
with 5-level paging enabled. It would require 32-bit code segment
descriptor.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:53 +02:00
Kirill A. Shutemov 4c94117c7f x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
Define __KERNEL_CS GDT entry as long mode (.L=1, .D=0) on 64-bit
configurations.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:52 +02:00
Kirill A. Shutemov f8fceacbd1 x86/boot/efi: Cleanup initialization of GDT entries
This is preparation for following patches without changing semantics of the
code.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:51 +02:00
Kirill A. Shutemov cbe0317bf1 x86/asm: Fix comment in return_from_SYSCALL_64()
On x86-64 __VIRTUAL_MASK_SHIFT depends on paging mode now.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:51 +02:00
Kirill A. Shutemov e585513b76 x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
This patch provides all required callbacks required by the generic
get_user_pages_fast() code and switches x86 over - and removes
the platform specific implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:50 +02:00
Andy Lutomirski 6c690ee103 x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()
The kernel has several code paths that read CR3.  Most of them assume that
CR3 contains the PGD's physical address, whereas some of them awkwardly
use PHYSICAL_PAGE_MASK to mask off low bits.

Add explicit mask macros for CR3 and convert all of the CR3 readers.
This will keep them from breaking when PCID is enabled.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:48:09 +02:00
Ingo Molnar 3f365cf304 Merge branch 'sched/urgent' into x86/mm, to pick up dependent fix
Andy will need the following scheduler fix for the PCID series:

  252d2a4117bc: sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()

So do a cross-merge.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:47:22 +02:00
Dou Liyang b1b4f2fe68 x86/time: Make setup_default_timer_irq() static
This function isn't used outside of time.c, so let's mark it static.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1497321029-29049-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:42:09 +02:00
Vlastimil Babka d9ee35acfa x86/mm: Disable 1GB direct mappings when disabling 2MB mappings
The kmemleak and debug_pagealloc features both disable using huge pages for
direct mappings so they can do cpa() on page level granularity in any context.

However they only do that for 2MB pages, which means 1GB pages can still be
used if the CPU supports it, unless disabled by a boot param, which is
non-obvious. Disable also 1GB pages when disabling 2MB pages.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/2be70c78-6130-855d-3dfa-d87bd1dd4fda@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:33:00 +02:00
Linus Torvalds 63f700aab4 Xtensa fixes for v4.12-rc6
- don't use linux IRQ #0 in legacy irq domains: fixes timer interrupt
   assignment when it's hardware IRQ # is 0 and the kernel is built w/o
   device tree support;
 - reduce reservation size for double exception vector literals from 48
   to 20 bytes: fixes build on cores with small user exception vector;
 - cleanups: use kmalloc_array instead of kmalloc in simdisk_init and
   seq_puts instead of seq_printf in c_show.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZPwZjAAoJEFH5zJH4P6BEFxIP/25rNFHjeJFcUevm6JNR9kz2
 BThpznk0a8I0Nm1PBk6345YfYyC8DOsbR7rClwjjmZePc1ZUfFJU0TFdHlot1scQ
 MASib9qeWToG/kl/JlcC88CJn2uFgbzLXNlVzYpdOAqvYY6sfthf9b0tgdWnByUb
 WLbdg+MN5B+qfhZzNpsKZHHHPv26QV6iwre3ojUk2KgTgGoLHOYo5bcvmr09id0Y
 nWplIteG6RggUxIEghzvB/Uk2pDyYuAjku/dJtFcCShER172v9d6XE8FqV3naTjy
 Boo310fZhLgo93jEUTAjLZHT45vYxstggVRGzRnSiXE+QIqigIxQ76rbqbk8jwWG
 AhQvlNY+XxALYGyQ9SEGcX5JQC9XmjSuuVO8L3I7JeK3MNoH8FkdCrGRoJs2pwJr
 CXBkbvHkSYgaWfStAInFS0Jga8FJ4lCAxKsnVgchB/WoYmqcI9tqGsXnjPa+zOqR
 b9RY6lKeqfVxda1NRQo6TGu03lCFrqHcQZ5fwYPy8oxZfVL3DjQha1E4JZ68GiFI
 E7sEX0W6IIJozHCWruh4acZN/q25OzSnliI0UvPQpPUbPAbI834ouiPXuiJRJNv1
 CffCb2aYFFar4JBEg5t3w87fBUrEe6HDvjVXwwDGqH6HM8kiQyqftMvQ38lQ8g8b
 X3W4SYmFxiqmEaEAvlnr
 =5lPB
 -----END PGP SIGNATURE-----

Merge tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fixes from Max Filippov:

 - don't use linux IRQ #0 in legacy irq domains: fixes timer interrupt
   assignment when it's hardware IRQ # is 0 and the kernel is built w/o
   device tree support

 - reduce reservation size for double exception vector literals from 48
   to 20 bytes: fixes build on cores with small user exception vector

 - cleanups: use kmalloc_array instead of kmalloc in simdisk_init and
   seq_puts instead of seq_printf in c_show.

* tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: don't use linux IRQ #0
  xtensa: reduce double exception literal reservation
  xtensa: ISS: Use kmalloc_array() in simdisk_init()
  xtensa: Use seq_puts() in c_show()
2017-06-13 15:09:10 +09:00
Linus Torvalds 2ab99b001d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:

 - A fix for KVM to avoid kernel oopses in case of host protection
   faults due to runtime instrumentation

 - A fix for the AP bus to avoid dead devices after unbind / bind

 - A fix for a compile warning merged from the vfio_ccw tree

 - Updated default configurations

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: update defconfig
  s390/zcrypt: Fix blocking queue device after unbind/bind.
  s390/vfio_ccw: make some symbols static
  s390/kvm: do not rely on the ILC on kvm host protection fauls
2017-06-13 15:07:11 +09:00
Paul Mackerras ca8efa1df1 KVM: PPC: Book3S HV: Context-switch EBB registers properly
This adds code to save the values of three SPRs (special-purpose
registers) used by userspace to control event-based branches (EBBs),
which are essentially interrupts that get delivered directly to
userspace.  These registers are loaded up with guest values when
entering the guest, and their values are saved when exiting the
guest, but we were not saving the host values and restoring them
before going back to userspace.

On POWER8 this would only affect userspace programs which explicitly
request the use of EBBs and also use the KVM_RUN ioctl, since the
only source of EBBs on POWER8 is the PMU, and there is an explicit
enable bit in the PMU registers (and those PMU registers do get
properly context-switched between host and guest).  On POWER9 there
is provision for externally-generated EBBs, and these are not subject
to the control in the PMU registers.

Since these registers only affect userspace, we can save them when
we first come in from userspace and restore them before returning to
userspace, rather than saving/restoring the host values on every
guest entry/exit.  Similarly, we don't need to worry about their
values on offline secondary threads since they execute in the context
of the idle task, which never executes in userspace.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-13 14:12:02 +10:00
Peter Zijlstra 8a524f803a x86/debug: Handle early WARN_ONs proper
Hans managed to trigger a WARN very early in the boot which killed his
(Virtual) box.

The reason is that the recent rework of WARN() to use UD0 forgot to add the
fixup_bug() call to early_fixup_exception(). As a result the kernel does
not handle the WARN_ON injected UD0 exception and panics.

Add the missing fixup call, so early UD's injected by WARN() get handled.

Fixes: 9a93848fe7 ("x86/debug: Implement __WARN() using UD0")
Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frank Mehnert <frank.mehnert@oracle.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Michael Thayer <michael.thayer@oracle.com>
Link: http://lkml.kernel.org/r/20170612180108.w4vgu2ckucmllf3a@hirez.programming.kicks-ass.net
2017-06-12 21:17:48 +02:00
Vladimir Murzin d360a687d9 ARM: 8682/1: V7M: Set cacheid iff DminLine or IminLine is nonzero
Cache support is optional feature in M-class cores, thus DminLine or
IminLine of Cache Type Register is zero if caches are not implemented,
but we check the whole CTR which has other features encoded there.
Let's be more precise and check for DminLine and IminLine of CTR
before we set cacheid.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-12 15:47:29 +01:00
Yisheng Xie bbeedfda8e ARM: 8681/1: make VMSPLIT_3G_OPT depends on !ARM_LPAE
When both enable CONFIG_ARM_LPAE=y and CONFIG_VMSPLIT_3G_OPT=y, which
means use PAGE_OFFSET=0xB0000000 with ARM_LPAE, the kernel will boot
fail and stop after uncompressed:

   Starting kernel ...

   Uart base = 0x20001000
   watchdog reg = 0x20013000
   dtb addr = 0x80840308
   Uncompressing Linux... done, booting the kernel.

For ARM_LPAE only support 3:1, 2:2, 1:3 split of TTBR1, which mention in:
   http://elinux.org/images/6/6a/Elce11_marinas.pdf - p16

So we should make VMSPLIT_3G_OPT depends on !ARM_LPAE to avoid trigger
this bug.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-12 15:47:28 +01:00
Ard Biesheuvel 60ce285851 ARM: 8680/1: boot/compressed: fix inappropriate Thumb2 mnemonic for __nop
Commit 06a4b6d009 ("ARM: 8677/1: boot/compressed: fix decompressor
header layout for v7-M") fixed an issue in the layout of the header
of the compressed kernel image that was caused by the assembler
emitting narrow opcodes for 'mov r0, r0', and for this reason, the
mnemonic was updated to use the W() macro, which will append the .w
suffix (which forces a wide encoding) if required, i.e., when building
the kernel in Thumb2 mode.

However, this failed to take into account that on Thumb2 kernels built
for CPUs that are also ARM capable, the entry point is entered in ARM
mode, and so the instructions emitted here will be ARM instructions
that only exist in a wide encoding to begin with, which is why the
assembler rejects the .w suffix here and aborts the build with the
following message:

  head.S: Assembler messages:
  head.S:132: Error: width suffixes are invalid in ARM mode -- `mov.w r0,r0'

So replace the W(mov) with separate ARM and Thumb2 instructions, where
the latter will only be used for THUMB2_ONLY builds.

Fixes: 06a4b6d009 ("ARM: 8677/1: boot/compressed: fix decompressor ...")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-12 15:47:27 +01:00
Jens Axboe 8f66439eec Linux 4.12-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh
 biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G
 kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU
 4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+
 2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E
 W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4=
 =m2Gx
 -----END PGP SIGNATURE-----

Merge tag 'v4.12-rc5' into for-4.13/block

We've already got a few conflicts and upcoming work depends on some of the
changes that have gone into mainline as regression fixes for this series.

Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream
trees to continue working on 4.13 changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-12 08:30:13 -06:00
Heiko Carstens a752598254 s390: rename struct psw_bits members
Rename a couple of the struct psw_bits members so it is more obvious
for what they are good. Initially I thought using the single character
names from the PoP would be sufficient and obvious, but admittedly
that is not true.

The current implementation is not easy to use, if one has to look into
the source file to figure out which member represents the 'per' bit
(which is the 'r' member).

Therefore rename the members to sane names that are identical to the
uapi psw mask defines:

r -> per
i -> io
e -> ext
t -> dat
m -> mcheck
w -> wait
p -> pstate

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:02 +02:00
Heiko Carstens 8bb3fdd686 s390: rename psw_bits enums
The address space enums that must be used when modifying the address
space part of a psw with the psw_bits() macro can easily be confused
with the psw defines that are used to mask and compare directly the
mask part of a psw.
We have e.g. PSW_AS_PRIMARY vs PSW_ASC_PRIMARY.

To avoid confusion rename the PSW_AS_* enums to PSW_BITS_AS_*.

In addition also rename the PSW_AMODE_* enums, so they also follow the
same naming scheme: PSW_BITS_AMODE_*.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:02 +02:00
Heiko Carstens 60c497014e s390/mm: use correct address space when enabling DAT
Right now the kernel uses the primary address space until finally the
switch to the correct home address space will be done when the idle
PSW will be loaded within psw_idle().

Correct this and simply use the home address space when DAT is enabled
for the first time.

This doesn't really fix a bug, but fixes odd behavior.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:02 +02:00
Heiko Carstens ead1dec8ed s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPL
This reverts the two commits

7afbeb6df2 ("s390/ipl: always use load normal for CCW-type re-IPL")
0f7451ff3a ("s390/ipl: use load normal for LPAR re-ipl")

The two commits did not take into account that behavior of standby
memory changes fundamentally if the re-IPL method is changed from
Load Clear to Load Normal.

In case of the old re-IPL clear method all memory that was initially
in standby state will be put into standby state again within the
re-IPL process. Or in other words: memory that was brought online
before a re-IPL will be offline again after a reboot.

Given that we use different re-IPL methods depending on the hypervisor
and CCW-type vs SCSI re-IPL it is not easy to tell in advance when and
why memory will stay online or will be offline after a re-IPL.
This does also have other side effects, since memory that is online
from the beginning will be in ZONE_NORMAL by default vs ZONE_MOVABLE
for memory that is offline.

Therefore, before the change, a user could online and offline memory
easily since standby memory was always in ZONE_NORMAL.  After the
change, and a re-IPL, this depended on which memory parts were online
before the re-IPL.

From a usability point of view the current behavior is more than
suboptimal. Therefore revert these changes until we have a better
solution and get back to a consistent behavior. The bad thing about
this is that the time required for a re-IPL will be significantly
increased for configurations with several 100GB or 1TB of memory.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:01 +02:00
Heiko Carstens 2b7b9817c2 s390/dumpstack: remove raw stack dump
Remove raw stack dumps that are printed before call traces in case of
a warning, or the 'l' sysrq trigger (show a stack backtrace for all
active CPUs).

Besides that a raw stack dump should not be shown for the 'l' sysrq
trigger the value of the dump is close to zero. That's also why we
don't print it in case of a panic since ages anymore. That this is
still printed on warnings is just a leftover. So get rid of this
completely.

The following won't be printed anymore with this change:

Stack:
       00000000bbc4fbc8 00000000bbc4fc58 0000000000000003 0000000000000000
       00000000bbc4fcf8 00000000bbc4fc70 00000000bbc4fc70 0000000000000020
       000000007fe00098 00000000bfe8be00 00000000bbc4fe94 000000000000000a
       000000000000000c 00000000bbc4fcc0 0000000000000000 0000000000000000
       000000000095b930 0000000000113366 00000000bbc4fc58 00000000bbc4fca0

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:01 +02:00
Logan Gunthorpe 7e9710af23 s390: provide default ioremap and iounmap declaration
Move the CONFIG_PCI device so that ioremap and iounmap are always
available. This looks safe as there's nothing PCI specific in the
implementation of these functions.

I have designs to use these functions in scatterlist.c where they'd likely
never be called without CONFIG_PCI set, but this is needed to compile
such changes.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:00 +02:00
Thomas Richter c39457ff1f s390/perf: fix null string in perf list pmu command
Command 'perf list pmu' displays events which contain
an invalid string "(null)=xxx", where xxx is the pmu event
name, for example:
   cpum_cf/AES_BLOCKED_CYCLES,(null)=AES_BLOCKED_CYCLES/
This is not correct, the invalid string should not be
displayed at all.

It is caused by an obsolete term in the
sysfs attribute file for each s390 CPUMF counter event.
Reading from the sysfs file also displays the event
name.

Fix this by omitting the event name.  This patch makes
s390 CPUMF sysfs files consistent with other plattforms.

This is an interface change between user and kernel
but does not break anything. Reading from a counter event
sysfs file should only list terms mentioned in the
/sys/bus/event_source/devices/<cpumf>/format directory.
Name is not listed.

Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:00 +02:00
Heiko Carstens cc18b460dc s390/mm: add p?d_folded() helper functions
Introduce and use p?d_folded() functions to clarify the page table
code a bit more.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:00 +02:00
Heiko Carstens f96c6f72bc s390/mm: remove incorrect _REGION3_ENTRY_ORIGIN define
_REGION3_ENTRY_ORIGIN defines a wrong mask which can be used to
extract a segment table origin from a region 3 table entry. It removes
only the lower 11 instead of 12 bits from a region 3 table entry.
Luckily this bit is currently always zero, so nothing bad happened yet.

In order to avoid future bugs just remove the region 3 specific mask
and use the correct generic _REGION_ENTRY_ORIGIN mask.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:26:00 +02:00
Martin Schwidefsky f5bbd72198 s390/ptrace: guarded storage regset for the current task
The regset functions for guarded storage are supposed to work on
the current task as well. For task == current add the required
load and store instructions for the guarded storage control block.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:59 +02:00
Heiko Carstens 53d7f25f09 s390/facilities: remove stfle requirement
All call sites of "stfle" check if the instruction is available before
executing it. Therefore there is no reason to have the corresponding
facility bit set within the architecture level set.
This removes the last more or less odd bit from the list.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:59 +02:00
Thomas Huth 8aa8680aa3 s390: Remove 'message security assist' from the list of vital facilities
The code in arch/s390/crypto checks for the availability of the
'message security assist' facility on its own, either by using
module_cpu_feature_match(MSA, ...) or by checking the facility
bit during cpacf_query(). Thus setting the MSA facility bit in
gen_facilities.c as hard requirement is not necessary. We can
remove it here, so that the kernel can also run on systems that
do not provide the MSA facility yet (like the emulated environment
of QEMU, for example).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:59 +02:00
Heiko Carstens fe7b274729 s390/fault: use _ASCE_ORIGIN instead of PAGE_MASK
When masking an ASCE to get its origin use the corresponding define
instead of the unrelated PAGE_MASK.
This doesn't fix a bug since both masks are identical.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:59 +02:00
Heiko Carstens bf10b6687c s390/smp: use sigp condition code define
Use proper define instead of open-coding the condition code value.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:58 +02:00
Christian Borntraeger 9cf8edb7a3 s390/smp: fix false positive kmemleak of mcesa data structure
I get number of CPUs - 1 kmemleak hits like

unreferenced object 0x37ec6f000 (size 1024):
  comm "swapper/0", pid 1, jiffies 4294937330 (age 889.690s)
  hex dump (first 32 bytes):
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  backtrace:
    [<000000000034a848>] kmem_cache_alloc+0x2b8/0x3d0
    [<00000000001164de>] __cpu_up+0x456/0x488
    [<000000000016f60c>] bringup_cpu+0x4c/0xd0
    [<000000000016d5d2>] cpuhp_invoke_callback+0xe2/0x9e8
    [<000000000016f3c6>] cpuhp_up_callbacks+0x5e/0x110
    [<000000000016f988>] _cpu_up+0xe0/0x158
    [<000000000016faf0>] do_cpu_up+0xf0/0x110
    [<0000000000dae1ee>] smp_init+0x126/0x130
    [<0000000000d9bd04>] kernel_init_freeable+0x174/0x2e0
    [<000000000089fc62>] kernel_init+0x2a/0x148
    [<00000000008adce2>] kernel_thread_starter+0x6/0xc
    [<00000000008adcdc>] kernel_thread_starter+0x0/0xc
    [<ffffffffffffffff>] 0xffffffffffffffff

The pointer of this data structure is stored in the prefix page of that
CPU together with some extra bits ORed into the the low bits.
Mark the data structure as non-leak.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:58 +02:00
Harald Freudenberger c4684f98d3 s390/crypto: fix aes/paes Kconfig dependeny
The s390_paes and the s390_aes kernel module used just one
config symbol CONFIG_CRYPTO_AES. As paes has a dependency
to PKEY and this requires ZCRYPT the aes module also had
a dependency to the zcrypt device driver which is not true.
Fixed by introducing a new config symbol CONFIG_CRYPTO_PAES
which has dependencies to PKEY and ZCRYPT. Removed the
dependency for the aes module to ZCRYPT.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:58 +02:00
Martin Schwidefsky 35bb092a91 s390/vdso: use _install_special_mapping to establish vdso
Switch to the improved _install_special_mapping function to install
the vdso mapping. This has two advantages, the arch_vma_name function
is not needed anymore and the vdso vma still has its name after its
memory location has been changed with mremap.

Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:57 +02:00
Martin Schwidefsky b29e061bb7 s390/cputime: simplify account_system_index_scaled
The account_system_index_scaled gets two cputime values, a raw value
derived from CPU timer deltas and a scaled value. The scaled value
is always calculated from the raw value, the code can be simplified
by moving the scale_vtime call into account_system_index_scaled.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:57 +02:00
Heiko Carstens 6c386da799 s390: use two more generic header files
I missed at least these two header files where we can make use of the
generic ones.  vga.h is another one, however that is already addressed
by a patch from Jiri Slaby.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:57 +02:00
Heiko Carstens d12a3d6036 s390/mm: add __rcu annotations
Add __rcu annotations so sparse correctly warns only if "slot" gets
derefenced without using rcu_dereference(). Right now we get warnings
because of the missing annotation:

arch/s390/mm/gmap.c:135:17: warning: incorrect type in assignment (different address spaces)
arch/s390/mm/gmap.c:135:17:    expected void **slot
arch/s390/mm/gmap.c:135:17:    got void [noderef] <asn:4>**

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:55 +02:00
Heiko Carstens 92acfb7406 s390: add missing header includes for type checking
Add missing include statements to make sure that prototypes match
implementation. As reported by sparse:

arch/s390/crypto/arch_random.c:18:1:
  warning: symbol 's390_arch_random_available' was not declared. Should it be static?
arch/s390/kernel/traps.c:279:13: warning:
  symbol 'trap_init' was not declared. Should it be static?

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:55 +02:00
Martin Schwidefsky 1aea9b3f92 s390/mm: implement 5 level pages tables
Add the logic to upgrade the page table for a 64-bit process to
five levels. This increases the TASK_SIZE from 8PB to 16EB-4K.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12 16:25:54 +02:00
Linus Walleij ec14ba1ec5 clocksource/drivers/fttmr010: Merge Moxa into FTTMR010
This merges the Moxa Art timer driver into the Faraday FTTMR010
driver and replaces all Kconfig symbols to use the Faraday
driver instead. We are now so similar that the drivers can
be merged by just adding a few lines to the Faraday timer.

Differences:

- The Faraday driver explicitly sets the counter to count
  upwards for the clocksource, removing the need for the
  clocksource core to invert the value.

- The Faraday driver also handles sched_clock()

On the Aspeed, the counter can only count downwards, so support
the timers in downward-counting mode as well, and flag the
Aspeed to use this mode. This mode was tested on the Gemini so
I have high hopes that it'll work fine on the Aspeed as well.

After this we have one driver for all three SoCs and a generic
Faraday FTTMR010 timer driver, which is nice.

Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonas Jensen <jonas.jensen@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-06-12 10:45:10 +02:00
Andrew Jeffery 164a1a90a4 arm: aspeed: Add clock-names property to timer node
The merging of a number of clocksource drivers into fttmr010 means we
require clock-names to be specified in the Aspeed timer node, else the
clocksource fails to probe and boot hangs.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-06-12 10:13:56 +02:00
Linus Torvalds 32627645e9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key subsystem fixes from James Morris:
 "Here are a bunch of fixes for Linux keyrings, including:

   - Fix up the refcount handling now that key structs use the
     refcount_t type and the refcount_t ops don't allow a 0->1
     transition.

   - Fix a potential NULL deref after error in x509_cert_parse().

   - Don't put data for the crypto algorithms to use on the stack.

   - Fix the handling of a null payload being passed to add_key().

   - Fix incorrect cleanup an uninitialised key_preparsed_payload in
     key_update().

   - Explicit sanitisation of potentially secure data before freeing.

   - Fixes for the Diffie-Helman code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  KEYS: fix refcount_inc() on zero
  KEYS: Convert KEYCTL_DH_COMPUTE to use the crypto KPP API
  crypto : asymmetric_keys : verify_pefile:zero memory content before freeing
  KEYS: DH: add __user annotations to keyctl_kdf_params
  KEYS: DH: ensure the KDF counter is properly aligned
  KEYS: DH: don't feed uninitialized "otherinfo" into KDF
  KEYS: DH: forbid using digest_null as the KDF hash
  KEYS: sanitize key structs before freeing
  KEYS: trusted: sanitize all key material
  KEYS: encrypted: sanitize all key material
  KEYS: user_defined: sanitize key payloads
  KEYS: sanitize add_key() and keyctl() key payloads
  KEYS: fix freeing uninitialized memory in key_update()
  KEYS: fix dereferencing NULL payload with nonzero length
  KEYS: encrypted: use constant-time HMAC comparison
  KEYS: encrypted: fix race causing incorrect HMAC calculations
  KEYS: encrypted: fix buffer overread in valid_master_desc()
  KEYS: encrypted: avoid encrypting/decrypting stack buffers
  KEYS: put keyring if install_session_keyring_to_cred() fails
  KEYS: Delete an error message for a failed memory allocation in get_derived_key()
  ...
2017-06-11 16:17:29 -07:00
Linus Torvalds 9d66af6bbf Fix hexagon build error
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZPKckAAoJEMsfJm/On5mB7ngQAKqyaAJ1Nrr2+Dur//XhDcIj
 Q3FgBJL2ip9s3S9NTgfXy3N16s/h9XoBU3SNvpzOOaeeSSGvdaTPJSsQieJ/Vtbo
 FOp1kp25W/nz6NCrhLxFK1dMLIg+128nsjm2XhzoPC6CfxXSUpxwGQX5CwfqBFP5
 2mfW6htJUiVenjS6QIDM3/fP0hcSgAsiSfTY3flQxn8dBzXgKFJgI6u4VuoqWV3U
 6+NgPSjRJa0aUeZj8z9Kf+1mjoecaBh2cKoztPGLm6tznbSrUJWP1euSLi4DmEQl
 NvQWhi/Sk3B73fNHqJ3k+n2ZsFOErodFnUAGh5Q+18Tz7L9l8TXBF8god7yZYhbO
 YZC9IeJ34fQcnMxvHS26WBADfm6hjRqtRa0BABWtwOio18vhALjqBbCjzlsgp7Cx
 tE///cNk8eGUbR1V/g0i44mSTIv5xBwDnZ9yHoALWVWvefYkHAT3tr1GmlaulKJM
 d2Olv4eLf9FeBkciA57cBJNNx5Mi9kM6VssDkaxVsSYQm7ZA4vFNCRqi9GgvPdAx
 NYPifhxGMn+so5wBDwH1HKbeTFBTgM7TRkZlOJLqPH/p6o6WnlpnvgweaP4YT0cP
 uR9Q75F/NAhMgomqoDkcJlbmup6kfSPku8wuGUoDR2qbU59wOw/rWcyqgc0CFDHg
 ptn2N+FcS/ZH3OSDbe5y
 =TDsn
 -----END PGP SIGNATURE-----

Merge tag 'hexagon-for-linus-v4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hexagon fix from Guenter Roeck:
 "This fixes a build error seen when building hexagon images.

  Richard sent me an Ack, but didn't reply when asked if he wants me to
  send the patch to you directly, so I figured I'd just do it"

* tag 'hexagon-for-linus-v4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hexagon: Use raw_copy_to_user
2017-06-11 11:09:15 -07:00
Linus Torvalds 9d0eb46246 Bug fixes (ARM, s390, x86)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZPOhfAAoJEL/70l94x66DNgYH/i1pbBmsxeYxWEoScLDTVYZa
 gLKHpjpciCBcKdMGSuBXeP702pi85N/TG7M2XIT/nXmFHYsie9RUSWXK63IZxpPx
 7wRNocstQU+DyMAP3pagJIFPnhUT9ufCZYFDin8sQoh1Dk1xbV38WaUPb/YfPYIt
 xGyti2SzT/CiBOR5zQiNb8m8k+M19QGzjcglHRq5Uk/oSElaw635M6u3PZD0Zvtd
 9L3NhtDYp1tUpG2Rc/5fXX651BVl+J6+xMukuAJBSqcI2hNfe0oM7tTi6larPHPo
 eguW0ChU2qat7Gjob18oKlLAMGwPHt+p7utH2pju5UTWFdPBY2lxUgv5Yfd3Qzo=
 =C/gM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Bug fixes (ARM, s390, x86)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: async_pf: avoid async pf injection when in guest mode
  KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
  arm: KVM: Allow unaligned accesses at HYP
  arm64: KVM: Allow unaligned accesses at EL2
  arm64: KVM: Preserve RES1 bits in SCTLR_EL2
  KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
  KVM: nVMX: Fix exception injection
  kvm: async_pf: fix rcu_irq_enter() with irqs enabled
  KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
  KVM: s390: fix ais handling vs cpu model
  KVM: arm/arm64: Fix isues with GICv2 on GICv3 migration
2017-06-11 11:07:25 -07:00
Wanpeng Li 9bc1f09f6f KVM: async_pf: avoid async pf injection when in guest mode
INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
       Not tainted 4.12.0-rc4+ #8
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 gnome-terminal- D    0  1734   1015 0x00000000
 Call Trace:
  __schedule+0x3cd/0xb30
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? __vfs_read+0x37/0x150
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30

This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
and then gives stress to memory on L1, I can observed this hang on L1 when
at least ~70% swap area is occupied on L0.

This is due to async pf was injected to L2 which should be injected to L1,
L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.

This patch fixes the hang by doing async pf when executing L1 guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-11 08:39:24 +02:00
Guenter Roeck 4d801cca0b hexagon: Use raw_copy_to_user
Commit ac4691fac8 ("hexagon: switch to RAW_COPY_USER") replaced
__copy_to_user_hexagon() with raw_copy_to_user(), but did not catch
all callers, resulting in the following build error.

arch/hexagon/mm/uaccess.c: In function '__clear_user_hexagon':
arch/hexagon/mm/uaccess.c:40:3: error:
	implicit declaration of function '__copy_to_user_hexagon'

Fixes: ac4691fac8 ("hexagon: switch to RAW_COPY_USER")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-06-10 19:10:31 -07:00
Linus Torvalds ac1a14a239 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: a Geode fix plus a microcode loader fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Clear patch pointer before jettisoning the initrd
  x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC
2017-06-10 10:51:25 -07:00
Linus Torvalds 179145e631 IOMMU Fixes for Linux v4.12-rc4
Including:
 
 	* Another compile-fix for my header cleanup
 
 	* A couple of fixes for the recently merged IOMMU probe
 	  deferal code
 
 	* Includes fixes for ACPI/IORT code necessary with
 	  IOMMU probe deferal
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZOyKuAAoJECvwRC2XARrjb1EP/iM+5lHWikgAZ7dbH+3oS3hE
 6tkevfr2+6SikuJp4rITtIXWDK3nSO/2kXbt6LqCVTFKuiszEHO1BDAep0ytIPzE
 63m8INBpSm+bs4QhdQUR5f8Ab8uFh/SPozZlxq3mxMf5QARvbbWeAiMH/iTl/HD7
 XlywryI9SD7fGHkdVWApCtUH6AoYv33c9jg9a/+7RngOjv3gvI0hHr4Omt0udf+8
 udS0WGNjsJy9HqtIzaaRCCe4rWqsgLzi9iCFe856P+smD9g9BoH2VgePbOzeOI4r
 IMnEcGBExFuEpx67lDseHYqg6R79lzlE/C1SyzcCOEbvvjMUL+/nqm+AjEMa5++w
 wL0gyiAZrUZjFhsr4QjQESsUDqlB7K7YHfLNLxwlC8vg9/4V7StoeOWhE+YVROLz
 1+MJ2Kv5wlZRN/B6wKCCRSAGnuMT02xXWxNRKfS7+sHPT8eJVWBEo6K+0WNGTkhq
 oFJCggBBllmlegt/IOKTe6jLzKN95UHz0NSoMoj1LIqCZOMFTPJwkV5976oPy0Ba
 uAoH5uJpai2yRIE15mHB23Bkc1SBE3pm7VC/4NeT4i7pyb8hnb4QwhmSD1vd7J+E
 ZApOSoptDWSF9al6LXzEZTjCMF85fY/JsG3+jdqY6zCacUK795gcyUZ/3al5+F4E
 r50a/SvSp90AxrZZcpT7
 =6dZB
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:

 - another compile-fix for my header cleanup

 - a couple of fixes for the recently merged IOMMU probe deferal code

 - fixes for ACPI/IORT code necessary with IOMMU probe deferal

* tag 'iommu-fixes-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  arm: dma-mapping: Reset the device's dma_ops
  ACPI/IORT: Move the check to get iommu_ops from translated fwspec
  ARM: dma-mapping: Don't tear down third-party mappings
  ACPI/IORT: Ignore all errors except EPROBE_DEFER
  iommu/of: Ignore all errors except EPROBE_DEFER
  iommu/of: Fix check for returning EPROBE_DEFER
  iommu/dma: Fix function declaration
2017-06-09 22:30:55 -07:00
Linus Torvalds a92f63cd13 powerpc fixes for 4.12 #5
Mostly fairly minor, of note are:
  - Fix percpu allocations to be NUMA aware
  - Limit 4k page size config to 64TB virtual address space
  - Avoid needlessly restoring FP and vector registers
 
 Thanks to:
   Aneesh Kumar K.V, Breno Leitao, Christophe Leroy, Frederic Barrat, Madhavan
   Srinivasan, Michael Bringmann, Nicholas Piggin, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZOobDAAoJEFHr6jzI4aWA/2AQALYfeTGo5Pl+c4E1wd5XpDae
 cXNxDaFFqdyNBW3OzWcDDW7w+pGvuu6SLPdki1CNosp4MprqpZurjTe9bhfYoAy1
 SQAcUgk7vra6WlbfguinwlCDeeKt53Np9ZYs4jhq/wy9bp6+T0evpdoY5NvHRr+x
 wR4y3KtP1t3PSlnJFfXq2jHg6xYcqJ14cVjWD6KPW8c8YB81WU7FkeVF5YtzinxE
 OWfuJlwzXG3JtG5q9CwwcxUiwpWSPZ7WPHovD8sCQc1pkdXx98LY5syhYy+8n2pa
 B2DbHoRwwkxQwT2z6KAGBUPtf94Fhe4bHcH24RXVeTnr1Wu1sIrkJkQ7c0OR6wN/
 nT8H+uJMpuWPNmbVk6n1LWqLem3YliVB6HwcP0944TF9P/f1GCF6YotkZJ/aXTSc
 FLA/teN0tzQyaX+DOrgTGTgmuscCOkDMRUHPmB9HoBYP3XLgeA+N9mphbxtj03d0
 ch62OqZBw6MiweEx3Ji1s7mfTr8/VaCtSx+rzz7GsQllXeo8k2TVoBewr2X1wesv
 0hB779hRYt9JvDVgmJlLyClLsE/kEy6ZPD6r7kf9DgBYwQRaHdMZDZIR6jDZjklJ
 viVEmQ91lCRx+g3q7hqgZO8VPr2EY12h7PqGgcDvkzHq0vawkxif4aBJ5AIZHpWs
 J23ZZB08M8llurG99+uA
 =KDBG
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Mostly fairly minor, of note are:

   - Fix percpu allocations to be NUMA aware

   - Limit 4k page size config to 64TB virtual address space

   - Avoid needlessly restoring FP and vector registers

  Thanks to Aneesh Kumar K.V, Breno Leitao, Christophe Leroy, Frederic
  Barrat, Madhavan Srinivasan, Michael Bringmann, Nicholas Piggin,
  Vaibhav Jain"

* tag 'powerpc-4.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by default
  powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space
  cxl: Fix error path on bad ioctl
  powerpc/perf: Fix Power9 test_adder fields
  powerpc/numa: Fix percpu allocations to be NUMA aware
  cxl: Avoid double free_irq() for psl,slice interrupts
  powerpc/kernel: Initialize load_tm on task creation
  powerpc/kernel: Fix FP and vector register restoration
  powerpc/64: Reclaim CPU_FTR_SUBCORE
  powerpc/hotplug-mem: Fix missing endian conversion of aa_index
  powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
  powerpc/spufs: Fix coredump of SPU contexts
  powerpc/64s: Add dt_cpu_ftrs boot time setup option
2017-06-09 09:44:46 -07:00
Linus Torvalds 788a73f4e5 ARM: SoC fixes
Been sitting on these for a couple of weeks waiting on some larger batches
 to come in but it's been pretty quiet.
 
 Just your garden variety fixes here:
 
  - A few maintainers updates (ep93xx, Exynos, TI, Marvell)
  - Some PM fixes for Atmel/at91 and Marvell
  - A few DT fixes for Marvell, Versatile, TI Keystone, bcm283x
  - A reset driver patch to set module license for symbol access
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZN343AAoJEIwa5zzehBx3xJ0QAJSexz+rYI7V3aqvjtNmdEaE
 2l7Rl4dNQ13u7RBx+67/m1vAgxTgefXahckuv6x4Jr5S5sQO++OkTm0XBO1+3trY
 pwQVJYatOwDt5X7+HOKmTvCgFh48KyrNegXy1lvr/p77CyA+B61zQ2w9wqO0VXua
 MQ05HzOt2JroKytPz70MywxtQpULWC8FGZTFbzZqUfdS30HxM4ZXp6gKxMDvRAqh
 LpP2hfjCnM0H3QoeNXYsfSydI0T0J0PcavouUzGQk2XSA6k5g+MXpL1IUB+iN9EH
 UdmEiVhDcNB3upWQ0lPFi84sexDXSqcu6M9VIozdC/LYDD1lGnHBEZuagoq72/xA
 CEU3H81inCQ6cpYRgan7uzlA4+dqKf4HD3H1fkwrowblMQppWPeDe9e/5XAq73Xl
 4+5GxXtDhK1KvPaH3USkTnFOjEQ2QELmDxdLqmiTXP8GnXdn5wJTobUj7z6HttXY
 Q4jA7F/A8ObHbEbnZI9e8pmrnQeMd/cK47NCZTBkJgN2eIzPw/TJk/bQcIXAq/km
 HcVn5R8GbrN9DwJMpMQN9fpH3sXCmcUxujbfldTYGdsBo8rvXChs8DHxJF94FXOV
 rMO6Bb25bd7kN8oCvY3r7VeGavpSkO8WVWi3YnNW4KGF9/oGE24LGHdbChjoLyJH
 rvv3uVsXtx2A9O9uGYl1
 =WlSc
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "Been sitting on these for a couple of weeks waiting on some larger
  batches to come in but it's been pretty quiet.

  Just your garden variety fixes here:

   - A few maintainers updates (ep93xx, Exynos, TI, Marvell)
   - Some PM fixes for Atmel/at91 and Marvell
   - A few DT fixes for Marvell, Versatile, TI Keystone, bcm283x
   - A reset driver patch to set module license for symbol access"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: EP93XX: Update maintainership
  MAINTAINERS: remove kernel@stlinux.com obsolete mailing list
  ARM: dts: versatile: use #include "..." to include local DT
  MAINTAINERS: add device-tree files to TI DaVinci entry
  ARM: at91: select CONFIG_ARM_CPU_SUSPEND
  ARM: dts: keystone-k2l: fix broken Ethernet due to disabled OSR
  arm64: defconfig: enable some core options for 64bit Rockchip socs
  arm64: marvell: dts: fix interrupts in 7k/8k crypto nodes
  reset: hi6220: Set module license so that it can be loaded
  MAINTAINERS: add irqchip related drivers to Marvell EBU maintainers
  MAINTAINERS: sort F entries for Marvell EBU maintainers
  ARM: davinci: PM: Do not free useful resources in normal path in 'davinci_pm_init'
  ARM: davinci: PM: Free resources in error handling path in 'davinci_pm_init'
  ARM: dts: bcm283x: Reserve first page for firmware
  memory: atmel-ebi: mark PM ops as __maybe_unused
  MAINTAINERS: Remove Javier Martinez Canillas as reviewer for Exynos
2017-06-09 09:40:08 -07:00
Christoph Hellwig 2a842acab1 block: introduce new block status code type
Currently we use nornal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings.  This patch
instead introduces a new  blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning.  Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return somethings that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.

For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruite to improve it.

blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Helge Deller f02e6c61bc parisc: Don't hardcode PSW values in hpmc code
Signed-off-by: Helge Deller <deller@gmx.de>
2017-06-09 11:34:56 +02:00
Helge Deller 3f4fb1084d parisc: Don't hardcode PSW values in gsc_*() functions
Signed-off-by: Helge Deller <deller@gmx.de>
2017-06-09 11:34:54 +02:00
Helge Deller b752c7b207 parisc: Avoid zeroing gr[0] in fixup_exception()
Register gr[0] holds the PSW in interrupt context. It's absolutely
unlikely that the compiler will use register zero in a get_user() call,
but better BUG on such a case in fixup_exception() anyway.

Signed-off-by: Helge Deller <deller@gmx.de>
2017-06-09 11:34:53 +02:00
Helge Deller 649aa24254 parisc/mm: Ensure IRQs are off in switch_mm()
This is because of commit f98db6013c ("sched/core: Add switch_mm_irqs_off()
and use it in the scheduler") in which switch_mm_irqs_off() is called by the
scheduler, vs switch_mm() which is used by use_mm().

This patch lets the parisc code mirror the x86 and powerpc code, ie. it
disables interrupts in switch_mm(), and optimises the scheduler case by
defining switch_mm_irqs_off().

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Helge Deller <deller@gmx.de>
2017-06-09 11:31:06 +02:00
Bilal Amarni 47b2c3fff4 security/keys: add CONFIG_KEYS_COMPAT to Kconfig
CONFIG_KEYS_COMPAT is defined in arch-specific Kconfigs and is missing for
several 64-bit architectures : mips, parisc, tile.

At the moment and for those architectures, calling in 32-bit userspace the
keyctl syscall would return an ENOSYS error.

This patch moves the CONFIG_KEYS_COMPAT option to security/keys/Kconfig, to
make sure the compatibility wrapper is registered by default for any 64-bit
architecture as long as it is configured with CONFIG_COMPAT.

[DH: Modified to remove arm64 compat enablement also as requested by Eric
 Biggers]

Signed-off-by: Bilal Amarni <bilal.amarni@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
cc: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09 13:29:45 +10:00
Paolo Bonzini 9e53932d88 KVM: s390: Fix for master (4.12)
- The newly created AIS capability enables the feature unconditionally
   and ignores the cpu model
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJZOVMaAAoJEBF7vIC1phx8d9oP/1ReHTErJN/MBk3EKuHSX2PY
 kgV27DSDk/kxltFL0giXFNQG8CG+eOQ/uWByg9B7ZrRHwsdpsdr5ErWGHocwu0uA
 sM8qm64+79QHb6sb7hmm0VpfmQ6LT8ovbOPUJtvfb4X7+rbd0fNCZxfrDLSzvMTX
 yccFkNymx1c+zEjbhmy0CgTgikNFkiRzC8nWmfwg6De9vkykDRHDZ/hz6LBPb1Ln
 uw6g9jZq/2Msi0pU1Y/mZvNkade2HdXEWIQepwggQRp+xNYH84gk9CnQ5MsSJ+94
 EfVyQXOlMnuB/H0wJOKz6KMQ9OjYguBwFsRUEaB2Xqng0k3Y975UQnQ3iRN7p9Kd
 NgmGOhlfRCfv1vi/l2gPhVHMnEJqzVHso7HrLXqkHUDY9KagFdLM5nF5GRAtj0bW
 nRr6K4CioLS9mILEQgxwOprvrfKH37dUi5mb/0HHzyYPytwOfiWkKq5r9OQbbJks
 nY2t7aBn3VliFeJtWbst7ntR54kfm589liAofO/TjrhSMtZANm6BrHe39w9pMzfE
 PcOZYs8V1HW4KLlB6sky3ghBrK1OTTaCdOgbWE4TeoQWDEIDoagYVxSzcHZ25aAc
 rnFshZOsZlXIJilCxIyalmJkTKr3HWW+vE94F8H8Ft0k0nrrDwWWYhISUjbigbZz
 zu7jltSocuep0c7OTzKM
 =MjM+
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fix for master (4.12)

- The newly created AIS capability enables the feature unconditionally
  and ignores the cpu model
2017-06-08 16:35:18 +02:00
Martin Schwidefsky 16ddcc34b8 s390: update defconfig
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-08 15:53:48 +02:00
Marcin Nowakowski 698b851073 MIPS: kprobes: flush_insn_slot should flush only if probe initialised
When ftrace is used with kprobes, it is possible for a kprobe to contain
an invalid location (ie. only initialised to 0 and not to a specific
location in the code). Trying to perform a cache flush on such location
leads to a crash r4k_flush_icache_range().

Fixes: c1bf207d6e ("MIPS: kprobe: Add support.")
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16296/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08 15:42:05 +02:00
Wanpeng Li a3641631d1 KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it
potentially can be exploited the vulnerability. this will out-of-bounds
read and write.  Luckily, the effect is small:

	/* when no next entry is found, the current entry[i] is reselected */
	for (j = i + 1; ; j = (j + 1) % nent) {
		struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
		if (ej->function == e->function) {

It reads ej->maxphyaddr, which is user controlled.  However...

			ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;

After cpuid_entries there is

	int maxphyaddr;
	struct x86_emulate_ctxt emulate_ctxt;  /* 16-byte aligned */

So we have:

- cpuid_entries at offset 1B50 (6992)
- maxphyaddr at offset 27D0 (6992 + 3200 = 10192)
- padding at 27D4...27DF
- emulate_ctxt at 27E0

And it writes in the padding.  Pfew, writing the ops field of emulate_ctxt
would have been much worse.

This patch fixes it by modding the index to avoid the out-of-bounds
access. Worst case, i == j and ej->function == e->function,
the loop can bail out.

Reported-by: Moguofang <moguofang@huawei.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Guofang Mo <moguofang@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-08 15:38:21 +02:00
Paolo Bonzini 38a4f43d56 KVM/ARM Fixes for v4.12-rc5 - Take 2
Changes include:
  - Fix an issue with migrating GICv2 VMs on GICv3 systems.
  - Squashed a bug for gicv3 when figuring out preemption levels.
  - Fix a potential null pointer derefence in KVM happening under memory
    pressure.
  - Maintain RES1 bits in the SCTLR_EL2 to make sure KVM works on new
    architecture revisions.
  - Allow unaligned accesses at EL2/HYP
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZODC6AAoJEEtpOizt6ddy7qsH/RakZzHHlPcIFk+VPhK4AvIV
 ke6y1IznIVVv024geILb2NyF2pZoSUROxk1NF0wBIWM4ryjPm7oYgK7TTLyxkiX0
 00gNxWpRRerCSxfh11a28tQywc7ATlw0yFpogGvbbHG9qEMX1NaGP/CNFK5us0LT
 dw3y7jIZounlHlHu0W85AE27Osn5anFPHQnEtvJlUsM7WkIQf765EIfttXGUKRDZ
 szmwuFAhdsSeIfo23LNXj87WAn6uP/37qRUmNXnxSya4u5urXa4qlOM5Hvg6agw2
 K6LdpDXF/FnHhiT+b/xMTRPPivy4rXJZTpP51shl5GqKE2gI0tbhsHwJJ5Di/Aw=
 =3xSf
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-v4.12-rc5-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Fixes for v4.12-rc5 - Take 2

Changes include:
 - Fix an issue with migrating GICv2 VMs on GICv3 systems.
 - Squashed a bug for gicv3 when figuring out preemption levels.
 - Fix a potential null pointer derefence in KVM happening under memory
   pressure.
 - Maintain RES1 bits in the SCTLR_EL2 to make sure KVM works on new
   architecture revisions.
 - Allow unaligned accesses at EL2/HYP
2017-06-08 15:04:38 +02:00
Marcin Nowakowski 87051ec120 MIPS: ftrace: fix init functions tracing
Since introduction of tracing for init functions the in_kernel_space()
check is no longer correct, as it ignores the init sections. As a
result, when probes are inserted (and disabled) in the init functions,
a branch instruction is inserted instead of a nop, which is likely to
result in random crashes during boot.

Remove the MIPS-specific in_kernel_space() method and replace it with a
generic core_kernel_text() that also checks for init sections during
system boot stage.

Fixes: 42c269c88d ("ftrace: Allow for function tracing to record init functions on boot up")
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Tested-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16092/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08 14:51:59 +02:00
Marcin Nowakowski c56e7a4c3e MIPS: mm: adjust PKMAP location
Space reserved for PKMap should span from PKMAP_BASE to FIXADDR_START.
For large page sizes this is not the case as eg. for 64k pages the range
currently defined is from 0xfe000000 to 0x102000000(!!) which obviously
isn't right.
Remove the hardcoded location and set the BASE address as an offset from
FIXADDR_START.

Since all PKMAP ptes have to be placed in a contiguous memory, ensure
that this is the case by placing them all in a single page. This is
achieved by aligning the end address to pkmap pages count pages.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15950/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08 14:51:58 +02:00
Marcin Nowakowski 725a269b3d MIPS: highmem: ensure that we don't use more than one page for PTEs
All PTEs used by PKMAP should be allocated in a contiguous memory area,
but we do not currently have a mechanism to enforce that, so ensure that
we don't try to allocate more entries than would fit in a single page.

Current fixed value of 1024 would not work with XPA enabled when
sizeof(pte_t)==8 and we need two pages to store pte tables.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15949/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08 14:51:58 +02:00
Marcin Nowakowski 71eb989ab5 MIPS: mm: fixed mappings: correct initialisation
fixrange_init operates at PMD-granularity and expects the addresses to
be PMD-size aligned, but currently that might not be the case for
PKMAP_BASE unless it is defined properly, so ensure a correct alignment
is used before passing the address to fixrange_init.

fixed mappings: only align the start address that is passed to
fixrange_init rather than the value before adding the size, as we may
end up with uninitialised upper part of the range.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15948/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08 14:51:58 +02:00
Marcin Nowakowski f7a31b5e78 MIPS: perf: Remove incorrect odd/even counter handling for I6400
All performance counters on I6400 (odd and even) are capable of counting
any of the available events, so drop current logic of using the extra
bit to determine which counter to use.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Fixes: 4e88a86213 ("MIPS: Add cases for CPU_I6400")
Fixes: fd716fca10 ("MIPS: perf: Fix I6400 event numbers")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15991/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-08 14:51:58 +02:00
Michael Ellerman c6ee9619e2 powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by default
The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with
other general kernel options. It's really more at home in the "Platform
Support" section, so move it there.

Also enable it by default, for Book3s 64. It does mostly nothing unless
the device tree properties are found, and we will want it enabled
eventually in distro kernels, so turn it on to start getting more
testing.

Fixes: 5a61ef74f2 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08 20:42:57 +10:00
Aneesh Kumar K.V 92d9dfda8b powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space
Supporting 512TB requires us to do a order 3 allocation for level 1 page
table (pgd). This results in page allocation failures with certain workloads.
For now limit 4k linux page size config to 64TB.

Fixes: f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08 20:42:56 +10:00
Dmitry Vyukov 31b35f6b4d locking/x86: Remove the unused atomic_inc_short() methd
It is completely unused and implemented only on x86.
Remove it.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170526172900.91058-1-dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08 10:33:50 +02:00
Ingo Molnar a5506c46a4 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08 10:12:12 +02:00
Dominik Brodowski 5b0bc9ac2c x86/microcode/intel: Clear patch pointer before jettisoning the initrd
During early boot, load_ucode_intel_ap() uses __load_ucode_intel()
to obtain a pointer to the relevant microcode patch (embedded in the
initrd), and stores this value in 'intel_ucode_patch' to speed up the
microcode patch application for subsequent CPUs.

On resuming from suspend-to-RAM, however, load_ucode_ap() calls
load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is
long gone so the pointer stored in 'intel_ucode_patch' no longer points to
a valid microcode patch.

Clear that pointer so that we effectively fall back to the CPU hotplug
notifier callbacks to update the microcode.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
[ Edit and massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.10..
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170607095819.9754-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08 10:03:05 +02:00
Borislav Petkov bbf79d21bd x86/ldt: Rename ldt_struct::size to ::nr_entries
... because this is exactly what it is: the number of entries in the
LDT. Calling it "size" is simply confusing and it is actually begging
to be called "nr_entries" or somesuch, especially if you see constructs
like:

	alloc_size = size * LDT_ENTRY_SIZE;

since LDT_ENTRY_SIZE is the size of a single entry.

There should be no functionality change resulting from this patch, as
the before/after output from tools/testing/selftests/x86/ldt_gdt.c
shows.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170606173116.13977-1-bp@alien8.de
[ Renamed 'n_entries' to 'nr_entries' ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08 09:28:21 +02:00
Daniel Borkmann 7005cade1b bpf, arm64: use separate register for state in stxr
Will reported that in BPF_XADD we must use a different register in stxr
instruction for the status flag due to otherwise CONSTRAINED UNPREDICTABLE
behavior per architecture. Reference manual says [1]:

  If s == t, then one of the following behaviors must occur:

   * The instruction is UNDEFINED.
   * The instruction executes as a NOP.
   * The instruction performs the store to the specified address, but
     the value stored is UNKNOWN.

Thus, use a different temporary register for the status flag to fix it.

Disassembly extract from test 226/STX_XADD_DW from test_bpf.ko:

  [...]
  0000003c:  c85f7d4b  ldxr x11, [x10]
  00000040:  8b07016b  add x11, x11, x7
  00000044:  c80c7d4b  stxr w12, x11, [x10]
  00000048:  35ffffac  cbnz w12, 0x0000003c
  [...]

  [1] https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf, p.6132

Fixes: 85f68fe898 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:27:20 -04:00
Linus Torvalds b29794ec95 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Made TCP congestion control documentation match current reality,
    from Anmol Sarma.

 2) Various build warning and failure fixes from Arnd Bergmann.

 3) Fix SKB list leak in ipv6_gso_segment().

 4) Use after free in ravb driver, from Eugeniu Rosca.

 5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet.

 6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme
    Piccoli.

 7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping
    Zhang.

 8) Use after free in vxlan deletion, from Mark Bloch.

 9) Fix ordering of NAPI poll enabled in ethoc driver, from Max
    Filippov.

10) Fix stmmac hangs with TSO, from Niklas Cassel.

11) Fix crash in CALIPSO ipv6, from Richard Haines.

12) Clear nh_flags properly on mpls link up. From Roopa Prabhu.

13) Fix regression in sk_err socket error queue handling, noticed by
    ping applications. From Soheil Hassas Yeganeh.

14) Update mlx4/mlx5 MAINTAINERS information.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
  net: stmmac: fix a broken u32 less than zero check
  net: stmmac: fix completely hung TX when using TSO
  net: ethoc: enable NAPI before poll may be scheduled
  net: bridge: fix a null pointer dereference in br_afspec
  ravb: Fix use-after-free on `ifconfig eth0 down`
  net/ipv6: Fix CALIPSO causing GPF with datagram support
  net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
  Revert "sit: reload iphdr in ipip6_rcv"
  i40e/i40evf: proper update of the page_offset field
  i40e: Fix state flags for bit set and clean operations of PF
  iwlwifi: fix host command memory leaks
  iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
  iwlwifi: mvm: clear new beacon command template struct
  iwlwifi: mvm: don't fail when removing a key from an inexisting sta
  iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
  iwlwifi: mvm: fix firmware debug restart recording
  iwlwifi: tt: move ucode_loaded check under mutex
  iwlwifi: mvm: support ibss in dqa mode
  iwlwifi: mvm: Fix command queue number on d0i3 flow
  iwlwifi: mvm: rs: start using LQ command color
  ...
2017-06-06 14:30:17 -07:00
Linus Torvalds e87f327ecd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:

 1) Fix TLB context wrap races, from Pavel Tatashin.

 2) Cure some gcc-7 build issues.

 3) Handle invalid setup_hugepagesz command line values properly, from
    Liam R Howlett.

 4) Copy TSB using the correct address shift for the huge TSB, from Mike
    Kravetz.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: delete old wrap code
  sparc64: new context wrap
  sparc64: add per-cpu mm of secondary contexts
  sparc64: redefine first version
  sparc64: combine activate_mm and switch_mm
  sparc64: reset mm cpumask after wrap
  sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.
  sparc: Machine description indices can vary
  sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
  arch/sparc: support NR_CPUS = 4096
  sparc64: Add __multi3 for gcc 7.x and later.
  sparc64: Fix build warnings with gcc 7.
  arch/sparc: increase CONFIG_NODES_SHIFT on SPARC64 to 5
2017-06-06 14:28:18 -07:00
Pavel Tatashin 0197e41ce7 sparc64: delete old wrap code
The old method that is using xcall and softint to get new context id is
deleted, as it is replaced by a method of using per_cpu_secondary_mm
without xcall to perform the context wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:29 -07:00
Pavel Tatashin a0582f26ec sparc64: new context wrap
The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap.  This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a  memory location in every process.

Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.

Several approaches were explored before settling on this one:

Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).

This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.

Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.

The approach in this patch is the simplest and has almost no impact on
runtime.  We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:29 -07:00
Pavel Tatashin 7a5b4bbf49 sparc64: add per-cpu mm of secondary contexts
The new wrap is going to use information from this array to figure out
mm's that currently have valid secondary contexts setup.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:29 -07:00
Pavel Tatashin c4415235b2 sparc64: redefine first version
CTX_FIRST_VERSION defines the first context version, but also it defines
first context. This patch redefines it to only include the first context
version.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:28 -07:00
Pavel Tatashin 14d0334c67 sparc64: combine activate_mm and switch_mm
The only difference between these two functions is that in activate_mm we
unconditionally flush context. However, there is no need to keep this
difference after fixing a bug where cpumask was not reset on a wrap. So, in
this patch we combine these.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:28 -07:00
Pavel Tatashin 5889748573 sparc64: reset mm cpumask after wrap
After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:

1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU

This patch implements the first solution

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:28 -07:00
Liam R. Howlett f322980b74 sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.
hugetlb_bad_size needs to be called on invalid values.  Also change the
pr_warn to a pr_err to better align with other platforms.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:03 -07:00
James Clarke c982aa9c30 sparc: Machine description indices can vary
VIO devices were being looked up by their index in the machine
description node block, but this often varies over time as devices are
added and removed. Instead, store the ID and look up using the type,
config handle and ID.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:03 -07:00
Mike Kravetz 654f480762 sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
When a TSB grows beyond its current capacity, a new TSB is allocated
and copy_tsb is called to copy entries from the old TSB to the new.
A hash shift based on page size is used to calculate the index of an
entry in the TSB.  copy_tsb has hard coded PAGE_SHIFT in these
calculations.  However, for huge page TSBs the value REAL_HPAGE_SHIFT
should be used.  As a result, when copy_tsb is called for a huge page
TSB the entries are placed at the incorrect index in the newly
allocated TSB.  When doing hardware table walk, the MMU does not
match these entries and we end up in the TSB miss handling code.
This code will then create and write an entry to the correct index
in the TSB.  We take a performance hit for the table walk miss and
recreation of these entries.

Pass a new parameter to copy_tsb that is the page size shift to be
used when copying the TSB.

Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 13:45:02 -07:00
Jane Chu c79a13734d arch/sparc: support NR_CPUS = 4096
Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
only allocates a single page for NR_CPUS mondo entries. Thus we cannot
use all 4096 CPUs on some SPARC platforms.

To fix, allocate (2^order) pages where order is set according to the size
of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
are not used in asm code, there are no imm13 offsets from the base PA
that will break because they can only reach one page.

Orabug: 25505750

Signed-off-by: Jane Chu <jane.chu@oracle.com>

Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 16:41:47 -04:00
Marc Zyngier 33b5c38852 arm: KVM: Allow unaligned accesses at HYP
We currently have the HSCTLR.A bit set, trapping unaligned accesses
at HYP, but we're not really prepared to deal with it.

Since the rest of the kernel is pretty happy about that, let's follow
its example and set HSCTLR.A to zero. Modern CPUs don't really care.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-06 22:20:02 +02:00
Marc Zyngier 78fd6dcf11 arm64: KVM: Allow unaligned accesses at EL2
We currently have the SCTLR_EL2.A bit set, trapping unaligned accesses
at EL2, but we're not really prepared to deal with it. So far, this
has been unnoticed, until GCC 7 started emitting those (in particular
64bit writes on a 32bit boundary).

Since the rest of the kernel is pretty happy about that, let's follow
its example and set SCTLR_EL2.A to zero. Modern CPUs don't really
care.

Cc: stable@vger.kernel.org
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-06 22:20:02 +02:00
Marc Zyngier d68c1f7fd1 arm64: KVM: Preserve RES1 bits in SCTLR_EL2
__do_hyp_init has the rather bad habit of ignoring RES1 bits and
writing them back as zero. On a v8.0-8.2 CPU, this doesn't do anything
bad, but may end-up being pretty nasty on future revisions of the
architecture.

Let's preserve those bits so that we don't have to fix this later on.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-06 22:20:02 +02:00
Wanpeng Li d4912215d1 KVM: nVMX: Fix exception injection
WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
 CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G           OE   4.12.0-rc3+ #23
 RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
 Call Trace:
  ? kvm_check_async_pf_completion+0xef/0x120 [kvm]
  ? rcu_read_lock_sched_held+0x79/0x80
  vmx_queue_exception+0x104/0x160 [kvm_intel]
  ? vmx_queue_exception+0x104/0x160 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm]
  ? kvm_arch_vcpu_load+0x47/0x240 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x240 [kvm]
  kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? __fget+0xf3/0x210
  do_vfs_ioctl+0xa4/0x700
  ? __fget+0x114/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x81/0x220
  entry_SYSCALL64_slow_path+0x25/0x25

This is triggered occasionally by running both win7 and win2016 in L2, in
addition, EPT is disabled on both L1 and L2. It can't be reproduced easily.

Commit 0b6ac343fc (KVM: nVMX: Correct handling of exception injection) mentioned
that "KVM wants to inject page-faults which it got to the guest. This function
assumes it is called with the exit reason in vmcs02 being a #PF exception".
Commit e011c663 (KVM: nVMX: Check all exceptions for intercept during delivery to
L2) allows to check all exceptions for intercept during delivery to L2. However,
there is no guarantee the exit reason is exception currently, when there is an
external interrupt occurred on host, maybe a time interrupt for host which should
not be injected to guest, and somewhere queues an exception, then the function
nested_vmx_check_exception() will be called and the vmexit emulation codes will
try to emulate the "Acknowledge interrupt on exit" behavior, the warning is
triggered.

Reusing the exit reason from the L2->L0 vmexit is wrong in this case,
the reason must always be EXCEPTION_NMI when injecting an exception into
L1 as a nested vmexit.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: e011c663b9 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-06-06 15:21:50 +02:00
Paolo Bonzini bbaf0e2b1c kvm: async_pf: fix rcu_irq_enter() with irqs enabled
native_safe_halt enables interrupts, and you just shouldn't
call rcu_irq_enter() with interrupts enabled.  Reorder the
call with the following local_irq_disable() to respect the
invariant.

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-06-06 14:43:16 +02:00
Madhavan Srinivasan 8c218578fc powerpc/perf: Fix Power9 test_adder fields
Commit 8d911904f3 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1')
was added to restrict the use of PMC5 in Power9 DD1. Intention was to disable
the use of PMC5 using raw event code. But instead of updating the
power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the
power9_pmu structure. Fix it.

Fixes: 8d911904f3 ("powerpc/perf: Add restrictions to PMC5 in power9 DD1")
Reported-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:21:19 +10:00
Michael Ellerman ba4a648f12 powerpc/numa: Fix percpu allocations to be NUMA aware
In commit 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, as all pcpu allocs being in group 0:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
  pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
  pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
  pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
  pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

  CPU 0:  data_offset = 0x0ffe8b0000
  CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

  node   0: [mem 0x0000000000000000-0x0000000fffffffff]
  node   1: [mem 0x0000001000000000-0x0000001fffffffff]

Cc: stable@vger.kernel.org # v3.16+
Fixes: 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:19:46 +10:00
Breno Leitao 7f22ced437 powerpc/kernel: Initialize load_tm on task creation
Currently tsk->thread.load_tm is not initialized in the task creation
and can contain garbage on a new task.

This is an undesired behaviour, since it affects the timing to enable
and disable the transactional memory laziness (disabling and enabling
the MSR TM bit, which affects TM reclaim and recheckpoint in the
scheduling process).

Fixes: 5d176f751e ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 19:09:22 +10:00
Max Filippov e5c86679d5 xtensa: don't use linux IRQ #0
Linux IRQ #0 is reserved for error reporting and may not be used.
Increase NR_IRQS for one additional slot and increase
irq_domain_add_legacy parameter first_irq value to 1, so that linux
IRQ #0 is not associated with hardware IRQ #0 in legacy IRQ domains.
Introduce macro XTENSA_PIC_LINUX_IRQ for static translation of xtensa
PIC hardware IRQ # to linux IRQ #. Use this macro in XTFPGA platform
data definitions.

This fixes inability to use hardware IRQ #0 in configurations that don't
use device tree and allows for non-identity mapping between linux IRQ #
and hardware IRQ #.

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2017-06-05 16:53:10 -07:00
Max Filippov 6bf28969f6 xtensa: reduce double exception literal reservation
Double exception vector only needs 20 bytes of space for 5 literals, not
48. Reduce the reservation for double exception vector literals
accordingly. This fixes build for configurations with small user
exception vector size.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2017-06-05 11:52:07 -07:00
David S. Miller 1b4af13ff2 sparc64: Add __multi3 for gcc 7.x and later.
Reported-by: Waldemar Brodkorb <wbx@openadk.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 11:30:33 -07:00
Linus Torvalds 112eb07287 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Three fixes this time around:

   - Two fixes for noMMU, fixing the decompressor header layout, and
     preventing a build error with some configurations.

   - Fixing the hyp-stub updates that went in during the merge window
     for platforms that use MCPM"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
  ARM: 8676/1: NOMMU: provide pgprot_device() macro
  ARM: 8675/1: MCPM: ensure not to enter __hyp_soft_restart from loopback and cpu_power_down
2017-06-05 11:19:40 -07:00
Ard Biesheuvel bb817bef3b efi/arm: Enable DMI/SMBIOS
Wire up the existing arm64 support for SMBIOS tables (aka DMI) for ARM as
well, by moving the arm64 init code to drivers/firmware/efi/arm-runtime.c
(which is shared between ARM and arm64), and adding a asm/dmi.h header to
ARM that defines the mapping routines for the firmware tables.

This allows userspace to access these tables to discover system information
exposed by the firmware. It also sets the hardware name used in crash
dumps, e.g.:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = ed3c0000
  [00000000] *pgd=bf1f3835
  Internal error: Oops: 817 [#1] SMP THUMB2
  Modules linked in:
  CPU: 0 PID: 759 Comm: bash Not tainted 4.10.0-09601-g0e8f38792120-dirty #112
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  ^^^

NOTE: This does *NOT* enable or encourage the use of DMI quirks, i.e., the
      the practice of identifying the platform via DMI to decide whether
      certain workarounds for buggy hardware and/or firmware need to be
      enabled. This would require the DMI subsystem to be enabled much
      earlier than we do on ARM, which is non-trivial.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170602135207.21708-14-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 17:50:44 +02:00
Sai Praneeth ac81d3de03 x86/efi: Extend CONFIG_EFI_PGT_DUMP support to x86_32 and kexec as well
CONFIG_EFI_PGT_DUMP=y, as the name suggests, dumps EFI page tables to the
kernel log during kernel boot.

This feature is very useful while debugging page faults/null pointer
dereferences to EFI related addresses.

Presently, this feature is limited only to x86_64, so let's extend it to
other EFI configurations like kexec kernel, efi=old_map and to x86_32 as well.

This doesn't effect normal boot path because this config option should
be used only for debug purposes.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170602135207.21708-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 17:50:43 +02:00
Jan Kiszka 2959c95d51 efi/capsule: Add support for Quark security header
The firmware for Quark X102x prepends a security header to the capsule
which is needed to support the mandatory secure boot on this processor.
The header can be detected by checking for the "_CSH" signature and -
to avoid any GUID conflict - validating its size field to contain the
expected value. Then we need to look for the EFI header right after the
security header and pass the real header to __efi_capsule_setup_info.

To be minimal invasive and maximal safe, the quirk version of
efi_capsule_setup_info() is only effective on Quark processors.

Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170602135207.21708-11-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 17:50:42 +02:00
Christoph Hellwig e64e17a554 S390/sysinfo: use uuid_is_null instead of opencoding it
And switch to use uuid_t instead of the old uuid_be type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-06-05 16:59:06 +02:00
Ard Biesheuvel 06a4b6d009 ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
As reported by Patrice, the header layout of the decompressor is
incorrect when building for v7-M. In this case, the __nop macro
resolves to 'mov r0, r0', which is emitted as a narrow encoding,
resulting in the header data fields to end up at lower offsets than
required.

Given the variety of targets we need to support with the same code,
the startup sequence is a bit of a jumble, and uses instructions
and macros whose encoding widths cannot be specified (badr), or only
exist in a narrow encoding (bx)

So force the use of a wide encoding in __nop, and replace the start
sequence with a simple jump to the label marking the start of code,
preceded by a Thumb2 mode switch if required (using explicit wide
encodings where appropriate). The label itself can be moved to the
start of code [where it belongs] due to the larger range of branch
instructions as compared to adr instructions.

Reported-by: Patrice CHOTARD <patrice.chotard@st.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-05 10:29:40 +01:00
Vladimir Murzin 7ef4783e19 ARM: 8676/1: NOMMU: provide pgprot_device() macro
NOMMU build leads to the following error:

  CC      drivers/pci/mmap.o
drivers/pci/mmap.c: In function 'pci_mmap_resource_range':
drivers/pci/mmap.c:60:3: error: implicit declaration of function 'pgprot_device' [-Werror=implicit-function-declaration]
   vma->vm_page_prot = pgprot_device(vma->vm_page_prot);
   ^

cc1: some warnings being treated as errors
scripts/Makefile.build:302: recipe for target 'drivers/pci/mmap.o' failed
make[2]: *** [drivers/pci/mmap.o] Error 1
scripts/Makefile.build:561: recipe for target 'drivers/pci' failed
make[1]: *** [drivers/pci] Error 2
Makefile:1016: recipe for target 'drivers' failed
make: *** [drivers] Error 2

Fix it with support of pgprot_device() macro for NOMMU.

Fixes: 00d2904ffe ("ARM/PCI: Use generic pci_mmap_resource_range()")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-05 10:29:40 +01:00
Andy Lutomirski d6e41f1151 x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant
When PCID is enabled, CR3's PCID bits can change during context
switches, so KVM won't be able to treat CR3 as a per-mm constant any
more.

I structured this like the existing CR4 handling.  Under ordinary
circumstances (PCID disabled or if the current PCID and the value
that's already in the VMCS match), then we won't do an extra VMCS
write, and we'll never do an extra direct CR3 read.  The overhead
should be minimal.

I disallowed using the new helper in non-atomic context because
PCID support will cause CR3 to stop being constant in non-atomic
process context.

(Frankly, it also scares me a bit that KVM ever treated CR3 as
constant, but it looks like it was okay before.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:59:45 +02:00
Andy Lutomirski be4ffc0d78 x86/mm: Be more consistent wrt PAGE_SHIFT vs PAGE_SIZE in tlb flush code
Nadav pointed out that some code used PAGE_SIZE and other code used
PAGE_SHIFT.  Use PAGE_SHIFT instead of multiplying or dividing by
PAGE_SIZE.

Requested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:59:44 +02:00
Andy Lutomirski 3d28ebceaf x86/mm: Rework lazy TLB to track the actual loaded mm
Lazy TLB state is currently managed in a rather baroque manner.
AFAICT, there are three possible states:

 - Non-lazy.  This means that we're running a user thread or a
   kernel thread that has called use_mm().  current->mm ==
   current->active_mm == cpu_tlbstate.active_mm and
   cpu_tlbstate.state == TLBSTATE_OK.

 - Lazy with user mm.  We're running a kernel thread without an mm
   and we're borrowing an mm_struct.  We have current->mm == NULL,
   current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
   != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
   in mm_cpumask(current->active_mm).  CR3 points to
   current->active_mm->pgd.  The TLB is up to date.

 - Lazy with init_mm.  This happens when we call leave_mm().  We
   have current->mm == NULL, current->active_mm ==
   cpu_tlbstate.active_mm, but that mm is only relelvant insofar as
   the scheduler is tracking it for refcounting.  cpu_tlbstate.state
   != TLBSTATE_OK.  The current cpu is clear in
   mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
   i.e. init_mm->pgd.

This patch simplifies the situation.  Other than perf, x86 stops
caring about current->active_mm at all.  We have
cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
TLB is always up to date for that mm.  leave_mm() just switches us
to init_mm.  There are no longer any special cases for mm_cpumask,
and switch_mm() switches mms without worrying about laziness.

After this patch, cpu_tlbstate.state serves only to tell the TLB
flush code whether it may switch to init_mm instead of doing a
normal flush.

This makes fairly extensive changes to xen_exit_mmap(), which used
to look a bit like black magic.

Perf is unchanged.  With or without this change, perf may behave a bit
erratically if it tries to read user memory in kernel thread context.
We should build on this patch to teach perf to never look at user
memory when cpu_tlbstate.loaded_mm != current->mm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:59:44 +02:00
Andy Lutomirski ce4a4e565f x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
Aside from that, it's fallen quite a bit behind the SMP code:

 - flush_tlb_mm_range() didn't flush individual pages if the range
   was small.

 - The lazy TLB code was much weaker.  This usually wouldn't matter,
   but, if a kernel thread flushed its lazy "active_mm" more than
   once (due to reclaim or similar), it wouldn't be unlazied and
   would instead pointlessly flush repeatedly.

 - Tracepoints were missing.

Aside from that, simply having the UP code around was a maintanence
burden, since it means that any change to the TLB flush code had to
make sure not to break it.

Simplify everything by deleting the UP code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:59:44 +02:00
Andy Lutomirski 3f79e4c7c9 x86/mm: Use new merged flush logic in arch_tlbbatch_flush()
Now there's only one copy of the local tlb flush logic for
non-kernel pages on SMP kernels.

The only functional change is that arch_tlbbatch_flush() will now
leave_mm() on the local CPU if that CPU is in the batch and is in
TLBSTATE_LAZY mode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:59:43 +02:00
Andy Lutomirski 454bbad979 x86/mm: Refactor flush_tlb_mm_range() to merge local and remote cases
The local flush path is very similar to the remote flush path.
Merge them.

This is intended to make no difference to behavior whatsoever.  It
removes some code and will make future changes to the flushing
mechanics simpler.

This patch does remove one small optimization: flush_tlb_mm_range()
now has an unconditional smp_mb() instead of using MOV to CR3 or
INVLPG as a full barrier when applicable.  I think this is okay for
a few reasons.  First, smp_mb() is quite cheap compared to the cost
of a TLB flush.  Second, this rearrangement makes a bigger
optimization available: with some work on the SMP function call
code, we could do the local and remote flushes in parallel.  Third,
I'm planning a rework of the TLB flush algorithm that will require
an atomic operation at the beginning of each flush, and that
operation will replace the smp_mb().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:59:43 +02:00
Andy Lutomirski 59f537c1de x86/mm: Change the leave_mm() condition for local TLB flushes
On a remote TLB flush, we leave_mm() if we're TLBSTATE_LAZY.  For a
local flush_tlb_mm_range(), we leave_mm() if !current->mm.  These
are approximately the same condition -- the scheduler sets lazy TLB
mode when switching to a thread with no mm.

I'm about to merge the local and remote flush code, but for ease of
verifying and bisecting the patch, I want the local and remote flush
behavior to match first.  This patch changes the local code to match
the remote code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:59:42 +02:00
Andy Lutomirski a2055abe9c x86/mm: Pass flush_tlb_info to flush_tlb_others() etc
Rather than passing all the contents of flush_tlb_info to
flush_tlb_others(), pass a pointer to the structure directly. For
consistency, this also removes the unnecessary cpu parameter from
uv_flush_tlb_others() to make its signature match the other
*flush_tlb_others() functions.

This serves two purposes:

 - It will dramatically simplify future patches that change struct
   flush_tlb_info, which I'm planning to do.

 - struct flush_tlb_info is an adequate description of what to do
   for a local flush, too, so by reusing it we can remove duplicated
   code between local and remove flushes in a future patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
[ Fix build warning. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:59:35 +02:00
Ingo Molnar 4241119eeb Linux 4.12-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZNJwlAAoJEHm+PkMAQRiGfmgH/1wYUgliF41UCVKEoZHR1vcr
 L1JkJhu/aXRYhtfOlna22w7QphZKPn5y/XcbqXbX782qfRi+3AIuNvPxiy90YGoy
 TtJyzv+JjmvCZpQgDKsy3mL2/kSvcKOQ2kGUEgZxNhorfXO209xO0gSNWBmo0pkJ
 xtSM2QDJMFUR24n3defRrIRGPcWw2X9N7NzEoIo8Dv6axNuNTGU1bUwjluHasSiU
 kNzhwfjUqPc/DppluLKYn18YstV+2kV6wofsnsH+w2N8wgSaqeipbolOCR1HFlc+
 k89TvaR9SfF8FfRwsH3Q6R+Aw18WgCJlNr4EFJFIlx32/hrwaWiUe0ckr1VBm/w=
 =R7aq
 -----END PGP SIGNATURE-----

Merge tag 'v4.12-rc4' into x86/mm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:54:49 +02:00
Jiri Slaby 28be1b454c x86/boot: Remove unused copy_*_gs() functions
copy_from_gs() and copy_to_gs() are unused in the boot code. They have
actually never been used -- they were always commented out since their
addition in 2007:

  5be8656615 ("String-handling functions for the new x86 setup code.")

So remove them -- they can be restored from history if needed.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170531081243.5709-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 09:35:16 +02:00
Christian Sünkenberg ae1d557d8f x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC
A SoC variant of Geode GX1, notably NSC branded SC1100, seems to
report an inverted Device ID in its DIR0 configuration register,
specifically 0xb instead of the expected 0x4.

Catch this presumably quirky version so it's properly recognized
as GX1 and has its cache switched to write-back mode, which provides
a significant performance boost in most workloads.

SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip",
states in section 1.1.7.1 "Device ID" that device identification
values are specified in SC1100's device errata. These, however,
seem to not have been publicly released.

Wading through a number of boot logs and /proc/cpuinfo dumps found on
pastebin and blogs, this patch should mostly be relevant for a number
of now admittedly aging Soekris NET4801 and PC Engines WRAP devices,
the latter being the platform this issue was discovered on.
Performance impact was verified using "openssl speed", with
write-back caching scaling throughput between -3% and +41%.

Signed-off-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05 08:34:20 +02:00
Breno Leitao 1195892c09 powerpc/kernel: Fix FP and vector register restoration
Currently tsk->thread->load_vec and load_fp are not initialized during
task creation, which can lead to garbage values in these variables (non-zero
values).

These variables will be checked later in restore_math() to validate if the
FP and vector registers are being utilized. Since these values might be
non-zero, the restore_math() will continue to save the FP and vectors even if
they were never utilized by the userspace application. load_fp and load_vec
counters will then overflow (they wrap at 255) and the FP and Altivec will be
finally disabled, but before that condition is reached (counter overflow)
several context switches will have restored FP and vector registers without
need, causing a performance degradation.

Fixes: 70fe3d980f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 15:55:30 +10:00
Peter Zijlstra 855615eee9 x86/tsc: Remove the TSC_ADJUST clamp
Now that all affected platforms have a microcode update; and we check
this and disable TSC_DEADLINE and print a microcode revision update
error if its too old, we can remove the TSC_ADJUST clamp.

This should help with systems where the second socket runs ahead of
the first socket and needs a negative adjustment. In this case we'd
hit the 0 clamp and give up for not achieving synchronization.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: kevin.b.stanton@intel.com
Link: http://lkml.kernel.org/r/20170531155306.100950003@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-04 21:55:53 +02:00
Peter Zijlstra bd9240a18e x86/apic: Add TSC_DEADLINE quirk due to errata
Due to errata it is possible for the TSC_DEADLINE timer to misbehave
after using TSC_ADJUST. A microcode update is available to fix this
situation.

Avoid using the TSC_DEADLINE timer if it is affected by this issue and
report the required microcode version.

[ tglx: Renamed function to apic_check_deadline_errata() ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: kevin.b.stanton@intel.com
Link: http://lkml.kernel.org/r/20170531155306.050849877@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-04 21:55:53 +02:00
Peter Zijlstra c6e9f42bbe x86/apic: Change the lapic name in deadline mode
So that we can more easily see in what mode the lapic timer operates.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: kevin.b.stanton@intel.com
Link: http://lkml.kernel.org/r/20170531155305.989808008@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-04 21:55:52 +02:00
Thomas Gleixner 978267b643 Merge branch 'timers/urgent' into WIP.timers
Pick up urgent fixes to avoid conflicts.
2017-06-04 15:21:52 +02:00
Christoph Hellwig 6bc51cbaa9 signal: Remove non-uapi <asm/siginfo.h>
By moving the kernel side __SI_* defintions right next to the userspace
ones we can kill the non-uapi versions of <asm/siginfo.h> include
include/asm-generic/siginfo.h and untangle the unholy mess of includes.

[ tglx: Removed uapi/asm/siginfo.h from m32r, microblaze, mn10300 and score ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170603190102.28866-6-hch@lst.de
2017-06-04 15:11:47 +02:00
Christoph Hellwig 7994200ce6 ia64: Remove HAVE_ARCH_COPY_SIGINFO
Since ia64 defines __ARCH_SI_PREAMBLE_SIZE it can just use the generic
copy_siginfo implementation, which is identical to the architecture
specific one.

With that support for HAVE_ARCH_COPY_SIGINFO can go away entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170603190102.28866-3-hch@lst.de
2017-06-04 15:11:46 +02:00
Christoph Hellwig f70c80068f sparc: Simplify <asm/siginfo.h>
There is no need for the forward declaration of compat_siginfo provided
here.  We can't yet use the generic header as we need to pull in the
sparc-specific version of the uapi <asm/siginfo.h>, but this prepares
for removing the non-uapi <asm/siginfo.h> entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20170603190102.28866-2-hch@lst.de
2017-06-04 15:11:45 +02:00
Chen-Yu Tsai f74994a940 arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
The AR100 clock within the R_CCU (PRCM) has the PLL_PERIPH0 as one of
its parents.

This adds the reference in the device tree describing this relationship.
This patch uses a raw number for the clock index to ease merging by
avoiding cross tree dependencies.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2017-06-03 10:04:49 +08:00
Chen-Yu Tsai 77125a701a ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
The AR100 clock within the R_CCU (PRCM) has the PLL_PERIPH0 as one of
its parents.

This adds the reference in the device tree describing this relationship.
This patch uses a raw number for the clock index to ease merging by
avoiding cross tree dependencies.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2017-06-03 10:04:48 +08:00
Linus Torvalds f219764920 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  scripts/gdb: make lx-dmesg command work (reliably)
  mm: consider memblock reservations for deferred memory initialization sizing
  mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
  mlock: fix mlock count can not decrease in race condition
  mm/migrate: fix refcount handling when !hugepage_migration_supported()
  dax: fix race between colliding PMD & PTE entries
  mm: avoid spurious 'bad pmd' warning messages
  mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
  pcmcia: remove left-over %Z format
  slub/memcg: cure the brainless abuse of sysfs attributes
  initramfs: fix disabling of initramfs (and its compression)
  mm: clarify why we want kmalloc before falling backto vmallock
  frv: declare jiffies to be located in the .data section
  include/linux/gfp.h: fix ___GFP_NOLOCKDEP value
  ksm: prevent crash after write_protect_page fails
2017-06-02 15:49:46 -07:00
Matthias Kaehlcke 60b0a8c3d2 frv: declare jiffies to be located in the .data section
Commit 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with
____cacheline_aligned_in_smp") removed a section specification from the
jiffies declaration that caused conflicts on some platforms.

Unfortunately this change broke the build for frv:

  kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
      `jiffies' defined in *ABS* section in .tmp_vmlinux1
  kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
      `jiffies' defined in *ABS* section in .tmp_vmlinux1
  kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
      symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
  ...

Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
include the section specification.  For all other platforms
__jiffy_arch_data (currently) has no effect.

Fixes: 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Linus Torvalds b939c51445 ACPI-related fixes for arm64:
- GICC MADT entry validity check fix
 
 - Skip IRQ registration with pmu=off in an ACPI guest
 
 - struct acpi_pci_root_ops freeing on error path
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZMZyaAAoJEGvWsS0AyF7xIQIQAIdbwi6/NWKAa3p7ZOte8hHz
 wTiDbW0swAdWHxgwrSkIJLHWsEKj1Tc2xvTgrWyc1ygxVh6GipgRyJtVVgcLyo00
 2vpxp7e3JOHBzKEbapDjAI4vMV0IARa9GCD5+Zbd0TXC8cxlhrWW4K94P5KjNbOf
 piTKvXrudCPU+plwaNHjROgiZmaKPV4H9Nwlo9AQ+cYHLD8zUn4Xu6vnZsYS40pZ
 WVJ5a9LDD7PRfG4ox1Ie/b8G1DlT7sqv4j/HRXXVBEt9NtOAnKheaKMarolhGgzO
 5OlBIEiR+T7HZrSKCrvpsZZF2WhKIZHBuiwXNOHj5yKiU6LTyT2gZ0R5+EA9V9e7
 h447uf4DyqxlK4vNvtn+JL2pBv8oVJHAuxinp2PW1N8IriRU0XyGnWV4nIw9QkN9
 H+i8pDGcXapo64oIPasnTNB0rTIOd5sdQ41fKKE/lgIrOeGHhIVbgHGrJ2yp/0yu
 WUpIsMZY0Ng5yTyxeiq3aLjOiaAVevsBumoievaG2IUbIBEfxk+NGMFFTfJKjvxZ
 JFswluTk/SydX325E1QDdhiWCpeeeGTQTFgjLNK9hVCOhtYmupROD/0vg5e20Ldh
 tWADqnOyvrfMZoetp9kgyh3ElHN1wxBint3w7NgB5oPTEjg7CFywQeEvTc6ZcB6X
 yjf8lTnbnf3IrVARYuk2
 =uGlu
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "ACPI-related fixes for arm64:

   - GICC MADT entry validity check fix

   - Skip IRQ registration with pmu=off in an ACPI guest

   - struct acpi_pci_root_ops freeing on error path"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
  drivers/perf: arm_pmu_acpi: avoid perf IRQ init when guest PMU is off
  ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path
2017-06-02 12:06:27 -07:00
Linus Torvalds f2a025defd Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - revert a broken PAT commit that broke a number of systems

   - fix two preemptability warnings/bugs that can trigger under certain
     circumstances, in the debug code and in the microcode loader"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"
  x86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning
  x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug
2017-06-02 08:53:42 -07:00
Linus Torvalds f56f88ee3f Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Misc fixes:

   - three boot crash fixes for uncommon configurations

   - silence a boot warning under virtualization

   - plus a GCC 7 related (harmless) build warning fix"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
  x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled
  x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map
  efi: Remove duplicate 'const' specifiers
  efi: Don't issue error message when booted under Xen
2017-06-02 08:51:53 -07:00
Lorenzo Pieralisi cb7cf772d8 ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes
muster from an ACPI specification standpoint. Current macro detects the
MADT GICC entry length through ACPI firmware version (it changed from 76
to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification)
but always uses (erroneously) the ACPICA (latest) struct (ie struct
acpi_madt_generic_interrupt - that is 80-bytes long) length to check if
the current GICC entry memory record exceeds the MADT table end in
memory as defined by the MADT table header itself, which may result in
false negatives depending on the ACPI firmware version and how the MADT
entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC
entries are 76 bytes long, so by adding 80 to a GICC entry start address
in memory the resulting address may well be past the actual MADT end,
triggering a false negative).

Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks
and update them to always use the firmware version specific MADT GICC
entry length in order to carry out boundary checks.

Fixes: b6cfb27737 ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro")
Reported-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Al Stone <ahs3@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-06-02 15:13:52 +01:00
Olof Johansson 1ba2eaaacd Merge tag 'mvebu-fixes-4.12-1' of git://git.infradead.org/linux-mvebu into fixes
mvebu fixes for 4.12

Fix the interrupt description of the crypto node for device tree of
the Armada 7K/8K SoCs

* tag 'mvebu-fixes-4.12-1' of git://git.infradead.org/linux-mvebu: (316 commits)
  arm64: marvell: dts: fix interrupts in 7k/8k crypto nodes
  + Linux 4.12-rc2

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-01 17:07:38 -07:00
Olof Johansson da3d1d4a4e Merge tag 'at91-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux into fixes
Fixes for 4.12:

Fix two compilation issues

* tag 'at91-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  ARM: at91: select CONFIG_ARM_CPU_SUSPEND
  memory: atmel-ebi: mark PM ops as __maybe_unused

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-01 17:07:31 -07:00
Masahiro Yamada cf5cde2199 ARM: dts: versatile: use #include "..." to include local DT
Most of DT files in ARM use #include "..." to make pre-processor
include DT in the same directory, but this is one of the exceptional
files that use #include <...> for that.

Fix it to remove -I$(srctree)/arch/$(SRCARCH)/boot/dts path from
dtc_cpp_flags.

ARM: dts: versatile: use #include "..." to include DT in the same directory

Most of DT files in ARM use #include "..." to make pre-processor
include DT in the same directory, but we have 3 exceptional files
that use #include <...> for that.

They must be fixed to remove -I$(srctree)/arch/$(SRCARCH)/boot/dts
path from dtc_cpp_flags.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2017-06-01 17:07:29 -07:00
Leonard Crestez e6f4292ae0 ARM: dts: imx6ul-14x14-evk: Add ksz8081 phy properties
Right now mach-imx6ul registers a fixup for the ksz8081 phy. The same
register values can be set through the micrel phy driver by using dts
properties.

This seems preferable and allows cleanly fixing suspend/resume.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 15:02:30 -04:00
Linus Torvalds 9ea15a59c3 Many small x86 bug fixes: SVM segment registers access rights, nested VMX,
preempt notifiers, LAPIC virtual wire mode, NMI injection.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZMETIAAoJEL/70l94x66DyjgH/i0LJydL2vnDKanUsPdQYrml
 daoRm67uaz8cy456SIvz+j+NAmVLKoOsQy+jRigFpOWDJglBhy77Fw0rD68uTaf1
 vGRl75pOYANxVlDC0BLgUwXVUjfFsLs6sKYpIIb9pTKNf8Q04MaWSpeCX+GUe0IR
 8Ere9LK0UKTinF1cHmZe4ihG9DYylK6DEagk/9qnxEu1rU8ZiC9SXguXXeNDQI9p
 wppkBngOokqbZ5oTVIkBmbbDMpVefj6ioGqeBVBjS6xE0UlfvJsjsJ54wdLXsGue
 7CF8E1cX7d8NohtlN5qZGssTDscUPPJghalXpeIhtHYgooKf1VeZExATA5YCVLw=
 =bHQF
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Many small x86 bug fixes: SVM segment registers access rights, nested
  VMX, preempt notifiers, LAPIC virtual wire mode, NMI injection"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix nmi injection failure when vcpu got blocked
  KVM: SVM: do not zero out segment attributes if segment is unusable or not present
  KVM: SVM: ignore type when setting segment registers
  KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging
  KVM: x86: Fix virtual wire mode
  KVM: nVMX: Fix handling of lmsw instruction
  KVM: X86: Fix preempt the preemption timer cancel
2017-06-01 10:48:09 -07:00
David S. Miller 0fde7ad71e sparc64: Fix build warnings with gcc 7.
arch/sparc/kernel/ds.c: In function ‘register_services’:
arch/sparc/kernel/ds.c:912:3: error: ‘strcpy’: writing at least 1 byte
into a region of size 0 overflows the destination

Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 09:42:46 -07:00
Ingo Molnar c08d517480 Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"
This reverts commit cbed27cdf0.

As Andy Lutomirski observed:

 "I think this patch is bogus. pat_enabled() sure looks like it's
  supposed to return true if PAT is *enabled*, and these days PAT is
  'enabled' even if there's no HW PAT support."

Reported-by: Bernhard Held <berny156@gmx.de>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: stable@vger.kernel.org # v4.2+
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-01 15:52:23 +02:00
Michael Ellerman 0e5e7f5e97 powerpc/64: Reclaim CPU_FTR_SUBCORE
We are running low on CPU feature bits, so we only want to use them when
it's really necessary.

CPU_FTR_SUBCORE is only used in one place, and only in C, so we don't
need it in order to make asm patching work. It can only be set on
"Power8" CPUs, which in practice means POWER8, POWER8E and POWER8NVL.
There are no plans to implement it on future CPUs, but if there ever
were we could retrofit it then.

Although KVM uses subcores, it never looks at the CPU feature, it either
looks at the ISA level or the threads_per_subcore value.

So drop the CPU feature and do a PVR check instead. Drop the device tree
"subcore" feature as we no longer support doing anything with it, and we
will drop it from skiboot too.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:56:28 +10:00
Michael Bringmann dc421b200f powerpc/hotplug-mem: Fix missing endian conversion of aa_index
When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property.  Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.

Fixes: 5f97b2a0d1 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:54:41 +10:00
Christophe Leroy 6f553912ee powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
of_mm_gpiochip_add_data() generates an oops for NULL pointer dereference.

of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
setting the data, therefore ->save_regs() cannot use gpiochip_get_data()

Fixes: 937daafca7 ("powerpc: simple-gpio: use gpiochip data pointer")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:54:41 +10:00
Michael Ellerman 99acc9bede powerpc/spufs: Fix coredump of SPU contexts
If a process dumps core while it has SPU contexts active then we have
code to also dump information about the SPU contexts.

Unfortunately it's been broken for 3 1/2 years, and we didn't notice. In
commit 7b1f4020d0 ("spufs: get rid of dump_emit() wrappers") the nread
variable was removed and rc used instead. That means when the loop exits
successfully, rc has the number of bytes read, but it's then used as the
return value for the function, which should return 0 on success.

So fix it by setting rc = 0 before returning in the success case.

Fixes: 7b1f4020d0 ("spufs: get rid of dump_emit() wrappers")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:54:40 +10:00
Nicholas Piggin a2b05b7aa6 powerpc/64s: Add dt_cpu_ftrs boot time setup option
Provide a dt_cpu_ftrs= cmdline option to disable the dt_cpu_ftrs CPU
feature discovery, and fall back to the "cputable" based version.

Also allow control of advertising unknown features to userspace and
with this parameter, and remove the clunky CONFIG option.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add explicit early check of bootargs in dt_cpu_ftrs_init()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:54:33 +10:00
ZhuangYanying 47a66eed99 KVM: x86: Fix nmi injection failure when vcpu got blocked
When spin_lock_irqsave() deadlock occurs inside the guest, vcpu threads,
other than the lock-holding one, would enter into S state because of
pvspinlock. Then inject NMI via libvirt API "inject-nmi", the NMI could
not be injected into vm.

The reason is:
1 It sets nmi_queued to 1 when calling ioctl KVM_NMI in qemu, and sets
cpu->kvm_vcpu_dirty to true in do_inject_external_nmi() meanwhile.
2 It sets nmi_queued to 0 in process_nmi(), before entering guest, because
cpu->kvm_vcpu_dirty is true.

It's not enough just to check nmi_queued to decide whether to stay in
vcpu_block() or not. NMI should be injected immediately at any situation.
Add checking nmi_pending, and testing KVM_REQ_NMI replaces nmi_queued
in vm_vcpu_has_events().

Do the same change for SMIs.

Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-01 11:23:10 +02:00
Roman Pen d9c1b5431d KVM: SVM: do not zero out segment attributes if segment is unusable or not present
This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt
was taken on userspace stack.  The root cause lies in the specific AMD CPU
behaviour which manifests itself as unusable segment attributes on SYSRET.
The corresponding work around for the kernel is the following:

61f01dd941 ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue")

In other turn virtualization side treated unusable segment incorrectly and
restored CPL from SS attributes, which were zeroed out few lines above.

In current patch it is assured only that P bit is cleared in VMCB.save state
and segment attributes are not zeroed out if segment is not presented or is
unusable, therefore CPL can be safely restored from DPL field.

This is only one part of the fix, since QEMU side should be fixed accordingly
not to zero out attributes on its side.  Corresponding patch will follow.

[1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Mikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-01 11:21:17 +02:00
Christian Borntraeger 1ba15b24f0 KVM: s390: fix ais handling vs cpu model
If ais is disabled via cpumodel, we must act accordingly, even if
KVM_CAP_S390_AIS was enabled.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
2017-05-31 19:54:49 +02:00
Arnd Bergmann 5b8b9cf76a x86/KASLR: Use the right memcpy() implementation
The decompressor has its own implementation of the string functions,
but has to include the right header to get those, while implicitly
including linux/string.h may result in a link error:

  arch/x86/boot/compressed/kaslr.o: In function `choose_random_location':
  kaslr.c:(.text+0xf51): undefined reference to `_mmx_memcpy'

This has appeared now as KASLR started using memcpy(), via:

	d52e7d5a95 ("x86/KASLR: Parse all 'memmap=' boot option entries")

Other files in the decompressor already do the same thing.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170530091446.1000183-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-31 07:59:45 +02:00
Gioh Kim 8eae9570d1 KVM: SVM: ignore type when setting segment registers
Commit 19bca6ab75 ("KVM: SVM: Fix cross vendor migration issue with
unusable bit") added checking type when setting unusable.
So unusable can be set if present is 0 OR type is 0.
According to the AMD processor manual, long mode ignores the type value
in segment descriptor. And type can be 0 if it is read-only data segment.
Therefore type value is not related to unusable flag.

This patch is based on linux-next v4.12.0-rc3.

Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-30 17:17:22 +02:00
Radim Krčmář cbf712792b KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging
kvm_skip_emulated_instruction() will return 0 if userspace is
single-stepping the guest.

kvm_skip_emulated_instruction() uses return status convention of exit
handler: 0 means "exit to userspace" and 1 means "continue vm entries".
The problem is that nested_vmx_check_vmptr() return status means
something else: 0 is ok, 1 is error.

This means we would continue executing after a failure.  Static checker
noticed it because vmptr was not initialized.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 6affcbedca ("KVM: x86: Add kvm_skip_emulated_instruction and use it.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-30 17:17:21 +02:00
Sricharan R d3e01c5159 arm: dma-mapping: Reset the device's dma_ops
arch_teardown_dma_ops() being the inverse of arch_setup_dma_ops()
,dma_ops should be cleared in the teardown path. Currently, only the
device's iommu mapping structures are cleared in arch_teardown_dma_ops,
but not the dma_ops. So on the next reprobe, dma_ops left in place is
stale from the first IOMMU setup, but iommu mappings has been disposed
of. This is a problem when the probe of the device is deferred and
recalled with the IOMMU probe deferral.

So for fixing this, slightly refactor by moving the code from
__arm_iommu_detach_device to arm_iommu_detach_device and cleanup
the former. This takes care of resetting the dma_ops in the teardown
path.

Fixes: 09515ef5dd ("of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices")
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-05-30 11:31:34 +02:00
Laurent Pinchart a93a121a96 ARM: dma-mapping: Don't tear down third-party mappings
arch_setup_dma_ops() is used in device probe code paths to create an
IOMMU mapping and attach it to the device. The function assumes that the
device is attached to a device-specific IOMMU instance (or at least a
device-specific TLB in a shared IOMMU instance) and thus creates a
separate mapping for every device.

On several systems (Renesas R-Car Gen2 being one of them), that
assumption is not true, and IOMMU mappings must be shared between
multiple devices. In those cases the IOMMU driver knows better than the
generic ARM dma-mapping layer and attaches mapping to devices manually
with arm_iommu_attach_device(), which sets the DMA ops for the device.

The arch_setup_dma_ops() function takes this into account and bails out
immediately if the device already has DMA ops assigned. However, the
corresponding arch_teardown_dma_ops() function, called from driver
unbind code paths (including probe deferral), will tear the mapping down
regardless of who created it. When the device is reprobed
arch_setup_dma_ops() will be called again but won't perform any
operation as the DMA ops will still be set.

We need to reset the DMA ops in arch_teardown_dma_ops() to fix this.
However, we can't do so unconditionally, as then a new mapping would be
created by arch_setup_dma_ops() when the device is reprobed, regardless
of whether the device needs to share a mapping or not. We must thus keep
track of whether arch_setup_dma_ops() created the mapping, and only in
that case tear it down in arch_teardown_dma_ops().

Keep track of that information in the dev_archdata structure. As the
structure is embedded in all instances of struct device let's not grow
it, but turn the existing dma_coherent bool field into a bitfield that
can be used for other purposes.

Fixes: 09515ef5dd ("of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-05-30 11:31:33 +02:00
Vegard Nossum b0f5a8f32e kthread: fix boot hang (regression) on MIPS/OpenRISC
This fixes a regression in commit 4d6501dce0 where I didn't notice
that MIPS and OpenRISC were reinitialising p->{set,clear}_child_tid to
NULL after our initialisation in copy_process().

We can simply get rid of the arch-specific initialisation here since it
is now always done in copy_process() before hitting copy_thread{,_tls}().

Review notes:

 - As far as I can tell, copy_process() is the only user of
   copy_thread_tls(), which is the only caller of copy_thread() for
   architectures that don't implement copy_thread_tls().

 - After this patch, there is no arch-specific code touching
   p->set_child_tid or p->clear_child_tid whatsoever.

 - It may look like MIPS/OpenRISC wanted to always have these fields be
   NULL, but that's not true, as copy_process() would unconditionally
   set them again _after_ calling copy_thread_tls() before commit
   4d6501dce0.

Fixes: 4d6501dce0 ("kthread: Fix use-after-free if kthread fork fails")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net> # MIPS only
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Jamie Iles <jamie.iles@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-29 09:40:54 -07:00
Arnd Bergmann cc7a938f5f ARM: at91: select CONFIG_ARM_CPU_SUSPEND
The reference to cpu_resume requires the corresponding
generic code to be enabled when CONFIG_PM is set:

arch/arm/mach-at91/pm.o: In function `sama5d2_pm_init':
pm.c:(.init.text+0x5e8): undefined reference to `cpu_resume'

Fixes: 24a0f5c539 ("ARM: at91: pm: Add sama5d2 backup mode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2017-05-29 11:06:56 +02:00
Tobias Klauser 83f0124ad8 microblaze: remove asm-generic wrapper headers
Some of microblaze's asm and uapi header are merely including their
asm-generic counterpart. Thus, the arch specific headers can be removed
and the asm-generic header can be used directly via generic-y.

The headers removed from uapi don't need to be added to generic-y in the
uapi Kbuild in order to get exported, as they are already listed in
mandatory-y.

Also order the generic-y list alphabetically.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2017-05-29 11:06:10 +02:00
Tobias Klauser f5ef419630 microblaze: wire up statx syscall
Add the new statx syscall.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2017-05-29 11:03:56 +02:00
Nicolai Stange bb26081a98 microblaze: Set ->min_delta_ticks and ->max_delta_ticks for timer
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.

Make the microblaze arch's clockevent driver initialize these fields
properly.

This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2017-05-29 11:02:43 +02:00
Geliang Tang e56751c961 microblaze: use sg_phys()
Use sg_phys() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2017-05-29 11:00:02 +02:00