Commit Graph

5884 Commits

Author SHA1 Message Date
Cyril Bur 91381b9cb1 powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has lead
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix
which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
altogether has been upstreamed to apply on top of this patch.

Fixes: dc3106690b ("powerpc: tm: Always use fp_state and vr_state to
store live registers")

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 20:39:33 +11:00
Cyril Bur a7771176b4 powerpc: Don't enable FP/Altivec if not checkpointed
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled whenever
a thread is transactional.

Commit dc16b553c9 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should remain)
disabled. The problem with this extraneous enablement is that the
registers for the erroneously enabled facility have not been correctly
recheckpointed - the recheckpointing code assumed the facility would
remain disabled.

Further compounding the issue, the transactional {fp,altivec,vsx}
unavailable code has been incorrectly using the MSR to enable
facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
means if the registers are live on the CPU, not if the kernel should
load them before returning to userspace. This has worked due to the bug
mentioned above.

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either sets of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel.

The kernel then loads the FP registers - and only the FP registers.
Since the thread is transactional it must perform a reclaim and
recheckpoint to ensure both the checkpointed registers and the
transactional registers are correct. It then (correctly) enables
MSR[FP] for the process. Later (on exception exist) the kernel also
(inadvertently) enables MSR[VEC]. The process is then returned to
userspace.

Since the act of loading the FP registers doomed the transaction we know
CPU will fail the transaction, restore its checkpointed registers, and
return the process to its failure handler. The problem is that we're
now running with Altivec enabled and the 'junk' checkpointed registers
are restored. The kernel had only recheckpointed FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.

Fixes: dc16b553c9 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use")

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 20:39:32 +11:00
Arnd Bergmann edfd17ff39 powerpc/eeh: Stop using do_gettimeofday()
This interface is inefficient and deprecated because of the y2038
overflow.

ktime_get_seconds() is an appropriate replacement here, since it
has sufficient granularity but is more efficient and uses monotonic
time.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 17:40:00 +11:00
Michael Ellerman 632f057416 powerpc/tm: Don't check for WARN in TM Bad Thing handling
Currently when we take a TM Bad Thing program check exception, we
search the bug table to see if the program check was generated by a
WARN/WARN_ON etc.

That makes no sense, the WARN macros use trap instructions, which
should never generate a TM Bad Thing exception. If they ever did that
would be a bug and we should oops.

We do have some hand-coded bugs in tm.S, using EMIT_BUG_ENTRY, but
those are all BUGs not WARNs, and they all use trap instructions
anyway. Almost certainly this check was incorrectly copied from the
REASON_TRAP handling in the same function.

Remove it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 16:48:16 +11:00
Michael Ellerman 4e00374704 powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64
CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
found in "server" processors, from IBM mainly.

Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
annoying to have two symbols that always have the same value, it's not
quite annoying enough to bother removing one.

However with the arrival of Power9, we now have the situation where
CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
Radix MMU - *not* the "standard" MMU. So it is now actively confusing
to use it, because it implies that code is disabled or inactive when
the Radix MMU is in use, however that is not necessarily true.

So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
formatting updates of some of the affected lines.

This will be a pain for backports, but c'est la vie.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 16:48:14 +11:00
Michael Ellerman c1807e3f84 powerpc/64: Free up CPU_FTR_ICSWX
The last user of CPU_FTR_ICSWX was removed in commit
6ff4d3e966 ("powerpc: Remove old unused icswx based coprocessor
support"), so free the bit up for future use.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 16:48:14 +11:00
Nicholas Piggin ff967900c9 powerpc/64: Fix latency tracing for lazy irq replay
When returning from an exception to a soft-enabled context, pending
IRQs are replayed but IRQ tracing is not reset, so a number of them
can get chained together into the same IRQ-disabled trace.

Fix this by having __check_irq_replay re-set IRQ trace. This is
conceptually where we respond to the next interrupt, so it fits the
semantics of the IRQ tracer.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 16:48:07 +11:00
Nicholas Piggin 6de6638b35 KVM: PPC: Book3S HV: Handle host system reset in guest mode
If the host takes a system reset interrupt while a guest is running,
the CPU must exit the guest before processing the host exception
handler.

After this patch, taking a sysrq+x with a CPU running in a guest
gives a trace like this:

   cpu 0x27: Vector: 100 (System Reset) at [c000000fdf5776f0]
       pc: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
       lr: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
       sp: c000000fdf577850
      msr: 9000000002803033
     current = 0xc000000fdf4b1e00
     paca    = 0xc00000000fd4d680	 softe: 3	 irq_happened: 0x01
       pid   = 6608, comm = qemu-system-ppc
   Linux version 4.14.0-rc7-01489-g47e1893a404a-dirty #26 SMP
   [c000000fdf577a00] c008000010159dd4 kvmppc_vcpu_run_hv+0x3dc/0x12d0 [kvm_hv]
   [c000000fdf577b30] c0080000100a537c kvmppc_vcpu_run+0x44/0x60 [kvm]
   [c000000fdf577b60] c0080000100a1ae0 kvm_arch_vcpu_ioctl_run+0x118/0x310 [kvm]
   [c000000fdf577c00] c008000010093e98 kvm_vcpu_ioctl+0x530/0x7c0 [kvm]
   [c000000fdf577d50] c000000000357bf8 do_vfs_ioctl+0xd8/0x8c0
   [c000000fdf577df0] c000000000358448 SyS_ioctl+0x68/0x100
   [c000000fdf577e30] c00000000000b220 system_call+0x58/0x6c
   --- Exception: c01 (System Call) at 00007fff76868df0
   SP (7fff7069baf0) is in userspace

Fixes: e36d0a2ed5 ("powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 16:48:06 +11:00
Linus Torvalds 866ba84ea3 powerpc fixes for 4.14 #6
A fix to the handling of misaligned paste instructions (P9 only), where a change
 to a #define has caused the check for the instruction to always fail.
 
 The preempt handling was unbalanced in the radix THP flush (P9 only). Though we
 don't generally use preempt we want to keep it working as much as possible.
 
 Two fixes for IMC (P9 only), one when booting with restricted number of CPUs and
 one in the error handling when initialisation fails due to firmware etc.
 
 A revert to fix function_graph on big endian machines, and then a rework of the
 reverted patch to fix kprobes blacklist handling on big endian machines.
 
 Thanks to:
   Anju T Sudhakar, Guilherme G. Piccoli, Madhavan Srinivasan, Naveen N. Rao,
   Nicholas Piggin, Paul Mackerras.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZ/ASDAAoJEFHr6jzI4aWAtWYP/34uCIJ83roZHfBdASGcczeO
 AEOnxZdVdAB6ynU0eM2JSlJMhx39mqT5qJQP4ucUlqNa+SHhFxTW4EmoQm/qAr9N
 4xhwix6OAsMleiPJx+RGGAnKB8rFhUEYH/+cql67FAq3u8zRbAdurR/A1lnIAY9x
 2bQ85GsjFlRSFdmSohXegBJ7Xey3QE3jpj+co3dAEm3JQVRHNpGqgvRy8yeowvod
 c/hHH7vmdKyDCZDtS5uF/egZOJLK1mueVuV69O0KKxT7xtEKk1cjFB08aHhcuJDm
 QIk5Efr8iCbn4EyRBzthij7YJ0VPBTNCmsGcf6RorrYv3hT0uN72qQPjg13WH7HY
 wkdMtkQvaFLbnZDqOxm9saczE27Ce7QNe1PNh6eapNlunz2nsb8Z/SvSyqskK3Yh
 5cBxHrfi+NcJHvdDAJNobNoJ3vlTJGI0qQ/dSexJsXxZw2zR3tPojRKCtK0pZGql
 xKDSqU7AbuUhLMxVBDDSQ+KONwyaBS84baLc98dapflw7htGWU339c5f9JTmnGvZ
 7fWT3hNihAmZNZZ957BC5J0h5kgkKgZ7y67SA/+RI0o2ZoJlxoeDYAyqc7dwfvs+
 ncxG4S6okSQP/EHf9bRSdYeJsZup/zP7+wD+02cbJd5zFJ+6cfu0AhPDRcBNMjHg
 EPkZvfZzRxTlGvmHCoWv
 =vTi1
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 4.14.

  This is bigger than I like to send at rc7, but that's at least partly
  because I didn't send any fixes last week. If it wasn't for the IMC
  driver, which is new and getting heavy testing, the diffstat would
  look a bit better. I've also added ftrace on big endian to my test
  suite, so we shouldn't break that again in future.

   - A fix to the handling of misaligned paste instructions (P9 only),
     where a change to a #define has caused the check for the
     instruction to always fail.

   - The preempt handling was unbalanced in the radix THP flush (P9
     only). Though we don't generally use preempt we want to keep it
     working as much as possible.

   - Two fixes for IMC (P9 only), one when booting with restricted
     number of CPUs and one in the error handling when initialisation
     fails due to firmware etc.

   - A revert to fix function_graph on big endian machines, and then a
     rework of the reverted patch to fix kprobes blacklist handling on
     big endian machines.

  Thanks to: Anju T Sudhakar, Guilherme G. Piccoli, Madhavan Srinivasan,
  Naveen N. Rao, Nicholas Piggin, Paul Mackerras"

* tag 'powerpc-4.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/perf: Fix core-imc hotplug callback failure during imc initialization
  powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text
  Revert "powerpc64/elfv1: Only dereference function descriptor for non-text symbols"
  powerpc/64s/radix: Fix preempt imbalance in TLB flush
  powerpc: Fix check for copy/paste instructions in alignment handler
  powerpc/perf: Fix IMC allocation routine
2017-11-03 09:25:53 -07:00
Kees Cook 5943cf4a59 powerpc/watchdog: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-02 15:50:31 -07:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Dave Airlie 7a88cbd8d6 Linux 4.14-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZ9kEFAAoJEHm+PkMAQRiGw6wH/0j197qyGd0hkVFMJO6LAgN3
 KQWS4nZ5BkVDocwv0RVnUJTtXqU1eozFgdVEtSoaFXpzlHGuptR2Tau9efDCJ7w3
 /utZxqvhGebZd2T+j+/o/LE8BRQxhADBNJq2D/o0WNt8ecxuG0GIkhkEYt/o3z1v
 /sxlwVwzXB7Dc/h1WcgGJG7cS6L9KzzAzGAS/iNvdFrPOygHBv8c0MxVZIiBIeeK
 1nZdyvbyM8uenSyG+prGt9ENrqXZxxfwUxIchi2V7A9m1WmD5zijNkf1JCWji/O+
 UsA1auxna7MwoxjxqZuGm4MlKOwZ+8xutk4JGgc+aP/ulndJbJYu+4op/3vaFBM=
 =Mhx+
 -----END PGP SIGNATURE-----

Backmerge tag 'v4.14-rc7' into drm-next

Linux 4.14-rc7

Requested by Ben Skeggs for nouveau to avoid major conflicts,
and things were getting a bit conflicty already, esp around amdgpu
reverts.
2017-11-02 12:40:41 +10:00
Naveen N. Rao e6c4dcb308 powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text
This makes the changes introduced in commit 83e840c770
("powerpc64/elfv1: Only dereference function descriptor for non-text
symbols") to be specific to the kprobe subsystem.

We previously changed ppc_function_entry() to always check the provided
address to confirm if it needed to be dereferenced. This is actually
only an issue for kprobe blacklisted asm labels (through use of
_ASM_NOKPROBE_SYMBOL) and can cause other issues with ftrace. Also, the
additional checks are not really necessary for our other uses.

As such, move this check to the kprobes subsystem.

Fixes: 83e840c770 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-01 15:51:03 +11:00
Paul Mackerras c01015091a KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts
This patch removes the restriction that a radix host can only run
radix guests, allowing us to run HPT (hashed page table) guests as
well.  This is useful because it provides a way to run old guest
kernels that know about POWER8 but not POWER9.

Unfortunately, POWER9 currently has a restriction that all threads
in a given code must either all be in HPT mode, or all in radix mode.
This means that when entering a HPT guest, we have to obtain control
of all 4 threads in the core and get them to switch their LPIDR and
LPCR registers, even if they are not going to run a guest.  On guest
exit we also have to get all threads to switch LPIDR and LPCR back
to host values.

To make this feasible, we require that KVM not be in the "independent
threads" mode, and that the CPU cores be in single-threaded mode from
the host kernel's perspective (only thread 0 online; threads 1, 2 and
3 offline).  That allows us to use the same code as on POWER8 for
obtaining control of the secondary threads.

To manage the LPCR/LPIDR changes required, we extend the kvm_split_info
struct to contain the information needed by the secondary threads.
All threads perform a barrier synchronization (where all threads wait
for every other thread to reach the synchronization point) on guest
entry, both before and after loading LPCR and LPIDR.  On guest exit,
they all once again perform a barrier synchronization both before
and after loading host values into LPCR and LPIDR.

Finally, it is also currently necessary to flush the entire TLB every
time we enter a HPT guest on a radix host.  We do this on thread 0
with a loop of tlbiel instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-01 15:36:41 +11:00
Paul Mackerras 3e8f150a3b Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in the ppc-kvm topic branch of the powerpc tree to get the
commit that reverts the patch "KVM: PPC: Book3S HV: POWER9 does not
require secondary thread management".  This is needed for subsequent
patches which will be applied on this branch.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-01 15:33:39 +11:00
Paul Mackerras 158f19698b powerpc: Fix check for copy/paste instructions in alignment handler
Commit 07d2a628bc ("powerpc/64s: Avoid cpabort in context switch
when possible", 2017-06-09) changed the definition of PPC_INST_COPY
and in so doing inadvertently broke the check for copy/paste
instructions in the alignment fault handler. The check currently
matches no instructions.

This fixes it by ANDing both sides of the comparison with the mask.

Fixes: 07d2a628bc ("powerpc/64s: Avoid cpabort in context switch when possible")
Cc: stable@vger.kernel.org # v4.13+
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-25 12:42:35 +02:00
Ingo Molnar 9babb091e0 Linux 4.14-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZ7clWAAoJEHm+PkMAQRiG07AH/iKcej+AsurISHx6i/LUEDC1
 a9wo5HAR5kEj+ohdE3JSkD9BHLcyhcCXaqIk9yOrwi9xv1DrPv8U/nGkKzZJzFi2
 mGWK09Zgi+vgSpA+YSErgl05IVGtgaryQQPqQdawpyRpqTUwP0+2pLnKEnJe0f05
 fpv+S4bDKUCuE8GcVNjF9gxXDg8j60fFa+oAcn7QPS6dCun/H6TbDRue5oeky0Y+
 50ZYjjioy9S9DIm2VF7pktMCP/mK/fgb+Q+4Up09VJGHGhq+891SRJ27yDulxo47
 /gq22SRIGBX2PGNllSwhYslgaCRRlYTMBYOIWrBreanA4NpGD662dp+GgWhD154=
 =TAMw
 -----END PGP SIGNATURE-----

Merge tag 'v4.14-rc6' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24 13:17:20 +02:00
Michael Ellerman 727f13616c powerpc: Disable the fast-endian switch syscall by default
Back in 2008 we added support for "fast little-endian switch" in the
syscall path. This added a special case syscall number 0x1ebe, which
is caught very early in the system call exception and switches endian
with as little overhead as possible. See commit 745a14cc26
("[POWERPC] Add fast little-endian switch system call") for full
details.

Although it is fast, it's also completely non standard. The "syscall
number" is out of the range of normal syscalls, it can't be traced or
audited, and it's a bit of a wart. To the best of our knowledge it was
only used by one program, now long since discontinued.

So in an effort to shake out any current users, put it behind a config
option, and make it default n. If anyone *is* using it they can
quickly reinstate it with a rebuild, and we can flip it to default y.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-22 12:08:31 +02:00
Michael Ellerman 5c2511bff4 powerpc/64s: Move the two FAST_ENDIAN macros next to each other
So we can #ifdef them in the next patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-22 12:08:31 +02:00
Michael Neuling 92fb8690bd powerpc/tm: P9 disable transactionally suspended sigcontexts
Unfortunately userspace can construct a sigcontext which enables
suspend. Thus userspace can force Linux into a path where trechkpt is
executed.

This patch blocks this from happening on POWER9 by sanity checking
sigcontexts passed in.

ptrace doesn't have this problem as only MSR SE and BE can be changed
via ptrace.

This patch also adds a number of WARN_ON()s in case we ever enter
suspend when we shouldn't. This should not happen, but if it does the
symptoms are soft lockup warnings which are not obviously TM related,
so the WARN_ON()s should make it obvious what's happening.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-21 09:36:28 +11:00
Michael Ellerman 54820530c5 powerpc/powernv: Enable TM without suspend if possible
Some Power9 revisions can run in a mode where TM operates without
suspended state. If we find ourself on a CPU that might be in this
mode, we query OPAL to check, and if so we reenable TM in CPU
features, and enable a new user feature to signal to userspace that we
are in this mode.

We do not enable the "normal" user feature, PPC_FEATURE2_HTM, but we
do enable PPC_FEATURE2_HTM_NOSC because that indicates to userspace
that the kernel will abort transactions on syscall entry, which is
true regardless of the suspend mode.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-21 09:33:05 +11:00
Cyril Bur 07fd1761e1 powerpc/tm: Add commandline option to disable hardware transactional memory
Currently the kernel relies on firmware to inform it whether or not the
CPU supports HTM and as long as the kernel was built with
CONFIG_PPC_TRANSACTIONAL_MEM=y then it will allow userspace to make
use of the facility.

There may be situations where it would be advantageous for the kernel
to not allow userspace to use HTM, currently the only way to achieve
this is to recompile the kernel with CONFIG_PPC_TRANSACTIONAL_MEM=n.

This patch adds a simple commandline option so that HTM can be
disabled at boot time.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Simplify to a bool, move to prom.c, put doco in the right place.
 Always disable, regardless of initial state, to avoid user confusion.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-20 11:10:56 +11:00
Michael Ellerman ddd46ed2e6 Merge branch 'topic/ppc-kvm' into next
Bring in some KVM commits we need (the TM one in particular).
2017-10-20 11:10:30 +11:00
Paul Mackerras 31a4d4480c Revert "KVM: PPC: Book3S HV: POWER9 does not require secondary thread management"
This reverts commit 94a04bc25a.

In order to run HPT guests on a radix POWER9 host, we will have to run
the host in single-threaded mode, because POWER9 processors do not
currently support running some threads of a core in HPT mode while
others are in radix mode ("mixed mode").

That means that we will need the same mechanisms that are used on
POWER8 to make the secondary threads available to KVM, which were
disabled on POWER9 by commit 94a04bc25a.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-19 15:28:04 +11:00
Will Deacon 58788a9b60 locking/arch, powerpc/rtas: Use arch_spin_lock() instead of arch_spin_lock_flags()
arch_spin_lock_flags() is an internal part of the spinlock implementation
and is no longer available when SMP=n and DEBUG_SPINLOCK=y, so the PPC
RTAS code fails to compile in this configuration:

   arch/powerpc/kernel/rtas.c: In function 'lock_rtas':
>> arch/powerpc/kernel/rtas.c:81:2: error: implicit declaration of function 'arch_spin_lock_flags' [-Werror=implicit-function-declaration]
     arch_spin_lock_flags(&rtas.lock, flags);
     ^~~~~~~~~~~~~~~~~~~~

Since there's no good reason to use arch_spin_lock_flags() here (the code
in question already calls local_irq_save(flags)), switch it over to
arch_spin_lock and get things building again.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508327469-20231-1-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18 15:15:07 +02:00
Bjorn Helgaas a37c0f4895 vgaarb: Select a default VGA device even if there's no legacy VGA
Daniel Axtens reported that on the HiSilicon D05 board, the VGA device is
behind a bridge that doesn't support PCI_BRIDGE_CTL_VGA, so the VGA arbiter
never selects it as the default, which means Xorg auto-detection doesn't
work.

VGA is a legacy PCI feature: a VGA device can respond to addresses, e.g.,
[mem 0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df], etc., that are
not configurable by BARs.  Consequently, multiple VGA devices can conflict
with each other.  The VGA arbiter avoids conflicts by ensuring that those
legacy resources are only routed to one VGA device at a time.

The arbiter identifies the "default VGA" device, i.e., a legacy VGA device
that was used by boot firmware.  It selects the first device that:

  - is of PCI_CLASS_DISPLAY_VGA,
  - has both PCI_COMMAND_IO and PCI_COMMAND_MEMORY enabled, and
  - has PCI_BRIDGE_CTL_VGA set in all upstream bridges.

Some systems don't have such a device.  For example, if a host bridge
doesn't support I/O space, PCI_COMMAND_IO probably won't be enabled for any
devices below it.  Or, as on the HiSilicon D05, the VGA device may be
behind a bridge that doesn't support PCI_BRIDGE_CTL_VGA, so accesses to the
legacy VGA resources will never reach the device.

This patch extends the arbiter so that if it doesn't find a device that
meets all the above criteria, it selects the first device that:

  - is of PCI_CLASS_DISPLAY_VGA and
  - has PCI_COMMAND_IO or PCI_COMMAND_MEMORY enabled

If it doesn't find even that, it selects the first device that:

  - is of class PCI_CLASS_DISPLAY_VGA.

Such a device may not be able to use the legacy VGA resources, but most
drivers can operate the device without those.  Setting it as the default
device means its "boot_vga" sysfs file will contain "1", which Xorg (via
libpciaccess) uses to help select its default output device.

This fixes Xorg auto-detection on some arm64 systems (HiSilicon D05 in
particular; see the link below).

It also replaces the powerpc fixup_vga() quirk, albeit with slightly
different semantics: the quirk selected the first VGA device we found, and
overrode that selection with any enabled VGA device we found.  If there
were several enabled VGA devices, the *last* one we found would become the
default.

The code here instead selects the *first* enabled VGA device we find, and
if none are enabled, the first VGA device we find.

Link: http://lkml.kernel.org/r/20170901072744.2409-1-dja@axtens.net
Tested-by: Daniel Axtens <dja@axtens.net>       # arm64, ppc64-qemu-tcg
Tested-by: Zhou Wang <wangzhou1@hisilicon.com>  # D05 Hisi Hip07, Hip08
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013034721.14630.65913.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-10-18 10:04:56 +02:00
Balbir Singh 733e4a4c44 powerpc/mce: hookup memory_failure for UE errors
If we are in user space and hit a UE error, we now have the
basic infrastructure to walk the page tables and find out
the effective address that was accessed, since the DAR
is not valid.

We use a work_queue content to hookup the bad pfn, any
other context causes problems, since memory_failure itself
can call into schedule() via lru_drain_ bits.

We could probably poison the struct page to avoid a race
between detection and taking corrective action.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16 23:12:02 +11:00
Balbir Singh 01eaac2b05 powerpc/mce: Hookup ierror (instruction) UE errors
Hookup instruction errors (UE) for memory offling via memory_failure()
in a manner similar to load/store errors (derror). Since we have access
to the NIP, the conversion is a one step process in this case.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16 23:12:01 +11:00
Balbir Singh ba41e1e1cc powerpc/mce: Hookup derror (load/store) UE errors
Extract physical_address for UE errors by walking the page
tables for the mm and address at the NIP, to extract the
instruction. Then use the instruction to find the effective
address via analyse_instr().

We might have page table walking races, but we expect them to
be rare, the physical address extraction is best effort. The idea
is to then hook up this infrastructure to memory failure eventually.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16 23:12:01 +11:00
Balbir Singh 81b61fa7a0 powerpc/mce: Align the print of physical address better
Use the same alignment as Effective address.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16 23:11:50 +11:00
Balbir Singh 73e341eb6b powerpc/mce: Remove unused function get_mce_fault_addr()
There are no users of get_mce_fault_addr() since commit
1363875bdb ("powerpc/64s: fix handling of non-synchronous machine
checks") removed the last usage.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16 23:11:19 +11:00
Linus Torvalds e18e884445 powerpc fixes for 4.14 #5
A fix for a bad bug (written by me) in our livepatch handler. Removal of an
 over-zealous lockdep_assert_cpus_held() in our topology code. A fix to the
 recently added emulation of cntlz[wd]. And three small fixes to the recently
 added IMC PMU driver.
 
 Thanks to:
   Anju T Sudhakar, Balbir Singh, Kamalesh Babulal, Naveen N. Rao, Sandipan Das,
   Santosh Sivaraj, Thiago Jung Bauermann.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZ4JXVAAoJEFHr6jzI4aWAi9MQAMBJLPG62uDmuhRY6v1MLp44
 8iVvNqTSGXuzz2c293Zo3Y8LHQ28S7f5wdwZs75fZ6GVup033pkuhL8XipFTB87m
 RzotMwJ1V1H0J7RRsDZ+QcyybwfE4GdqRHLjSU5VIUwJu7cQSIjjNWdkPUqswzXI
 g3BbNoxEOlEx31NbyNB+as7AnSvS0lLMvwB7TEy9rf3mmadN3UiAl3OJ93M/7Nm2
 qCWjCib/gUQ5U4N5STKWl62yGyvJ330OHoWdsTlHzHgYsEatQFHN8mjnW+UPN3rR
 Mz+xCIkt6PnYMdiJNXND42iwdx/7C7BR9JhQ5t6150Swfnbe+CT5nBxk2+IDKpQu
 V1rSX4S18TLQ+YCQm0wKuaQs/0EZ4kiHqUDZYVP/YQLSSYM5Ftf4W94Dwu+AZdtr
 wWX3szZxqvCZjvdT8HWQQW8vAIqpdxOkr019fjoUzcDoUIYKs3cLDJVx6CAY6n3t
 GGN3oGhffKbg1NyldDrZRDBJ+ie7gGincxlcMe1YrXDsXNum6edAhu01cUz7o9vO
 b9/fIPjWAWotS7kqWXgGfZL6vlRx3cdSh58vKaThmNAbZLaIAnPTYUynXqXZ++Nn
 bBDQUE9zwJ42ZgzYDr3bT6+9pwqPQ3w8zmOjZGB/J1ygS8k9/0gYVe1FO2N8vElM
 w5VBKl69v2RwItwHuSRJ
 =wWnt
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A fix for a bad bug (written by me) in our livepatch handler. Removal
  of an over-zealous lockdep_assert_cpus_held() in our topology code. A
  fix to the recently added emulation of cntlz[wd]. And three small
  fixes to the recently added IMC PMU driver.

  Thanks to: Anju T Sudhakar, Balbir Singh, Kamalesh Babulal, Naveen N.
  Rao, Sandipan Das, Santosh Sivaraj, Thiago Jung Bauermann"

* tag 'powerpc-4.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/perf: Fix IMC initialization crash
  powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node()
  powerpc/perf: Fix for core/nest imc call trace on cpuhotplug
  powerpc: Don't call lockdep_assert_cpus_held() from arch_update_cpu_topology()
  powerpc/lib/sstep: Fix count leading zeros instructions
  powerpc/livepatch: Fix livepatch stack access
2017-10-13 11:39:28 -07:00
Kamalesh Babulal 1c0437af9f powerpc/modules: Use WARN_ON() in stub_for_addr()
Use WARN_ON(), while running out of stubs in stub_for_addr()
and abort loading of the module instead of BUG_ON().

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-13 19:41:57 +11:00
Kamalesh Babulal e36a82ee4c powerpc/livepatch: Fix livepatch stack access
While running stress test with livepatch module loaded, kernel bug was
triggered.

  cpu 0x5: Vector: 400 (Instruction Access) at [c0000000eb9d3b60]
  5:mon> t
  [c0000000eb9d3de0] c0000000eb9d3e30 (unreliable)
  [c0000000eb9d3e30] c000000000008ab4 hardware_interrupt_common+0x114/0x120
   --- Exception: 501 (Hardware Interrupt) at c000000000053040 livepatch_handler+0x4c/0x74
  [c0000000eb9d4120] 0000000057ac6e9d (unreliable)
  [d0000000089d9f78] 2e0965747962382e
  SP (965747962342e09) is in userspace

When an interrupt occurs during the livepatch_handler execution, it's
possible for the livepatch_stack and/or thread_info to be corrupted.
eg:

  Task A                        Interrupt Handler
  =========                     =================
  livepatch_handler:
  mr r0, r1
  ld r1, TI_livepatch_sp(r12)
                                hardware_interrupt_common:
                                  do_IRQ+0x8:
                                    mflr    r0          <- saved stack pointer is overwritten
                                    bl      _mcount
                                    ...
                                    std     r27,-40(r1) <- overwrite of thread_info()

  lis r2, STACK_END_MAGIC@h
  ori r2, r2, STACK_END_MAGIC@l
  ld  r12, -8(r1)

Fix the corruption by using r11 register for livepatch stack
manipulation, instead of shuffling task stack and livepatch stack into
r1 register. Using r11 register also avoids disabling/enabling irq's
while setting up the livepatch stack.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-10 15:27:42 +11:00
Linus Torvalds 529a86e063 Merge branch 'ppc-bundle' (bundle from Michael Ellerman)
Merge powerpc transactional memory fixes from Michael Ellerman:
 "I figured I'd still send you the commits using a bundle to make sure
  it works in case I need to do it again in future"

This fixes transactional memory state restore for powerpc.

* bundle'd patches from Michael Ellerman:
  powerpc/tm: Fix illegal TM state in signal handler
  powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks
2017-10-09 19:08:32 -07:00
Linus Torvalds 1249b571ba powerpc fixes for 4.14 #4
Nine small fixes, really nothing that stands out.
 
 A work around for a spurious MCE on Power9. A CXL fault handling fix, some fixes
 to the new XIVE code, and a fix to the new 32-bit STRICT_KERNEL_RWX code.
 
 Fixes for old code/stable: an fix to an incorrect TLB flush on boot but not on
 any current machines, a compile error on 4xx and a fix to memory hotplug when
 using radix (Power9).
 
 Thanks to:
   Anton Blanchard, Cédric Le Goater, Christian Lamparter, Christophe Leroy,
   Christophe Lombard, Guenter Roeck, Jeremy Kerr, Michael Neuling, Nicholas
   Piggin.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZ11/yAAoJEFHr6jzI4aWAooYQAKjOKamLyxRotFGSqBfrfOMN
 UfWKAmfeoF/gi4z03x8JdhcMQ1CMh5c/cQmrwzkzDosQ2m83CEWzNbk0gFV0Tpld
 2qDTrEnBfTC4TRhCwWuOlaQC8LFeqxwpsCS6iXGpYoTY2NzmsbliPIQaruCIqj3f
 F+7etfTqGJ+gCCkGfT0s7m96QO+daaT32nXL1VM8i/rOOpwZ23oj6kR3/ZSTWxBc
 eyzvUcTLLhLp8hJoF0kU4f1VQtYijnpqK6ULr3wSfxYqjB9S4dsGyiF537aw5bKX
 UPUCoMgkttVy8Jh35dtvKLuMqlfjHGqoe6UcezSG9a6wxeUnQRzsVmXihNDSkyEt
 EawmPS6R5YaXXu9NKsleKBZDAdU4+EkcX24Wwn/lZy3wTKc1Lys+wH9hXuzzGNiH
 JYEcMacXvpM1yFYNT+ouyLtoT86VWVCdwihueBMV74lyV87Wr55zS65XP6Z+Bb8/
 aPL906sOAvnJW1JhHbYq/UxLfvs0enHNHpyR0xM4/oXi+RoGfhbM/MC1uJxv4w9z
 12n1hbrpaRxHSSnJr1gahCrDoNUXLjkO2+0Ur3iXP4tfMMhwDs8qCILeaKpNmzkC
 sO1ehOmXSSYqJLNiPmSIxs/QEzjFrvQdNwZOIav+3Tsc5ofR6wsauDCmfK5br2Mg
 BdhW7LXtqL77cmOzN95e
 =etbe
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Nine small fixes, really nothing that stands out.

  A work-around for a spurious MCE on Power9. A CXL fault handling fix,
  some fixes to the new XIVE code, and a fix to the new 32-bit
  STRICT_KERNEL_RWX code.

  Fixes for old code/stable: an fix to an incorrect TLB flush on boot
  but not on any current machines, a compile error on 4xx and a fix to
  memory hotplug when using radix (Power9).

  Thanks to: Anton Blanchard, Cédric Le Goater, Christian Lamparter,
  Christophe Leroy, Christophe Lombard, Guenter Roeck, Jeremy Kerr,
  Michael Neuling, Nicholas Piggin"

* tag 'powerpc-4.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Increase memory block size to 1GB on radix
  powerpc/mm: Call flush_tlb_kernel_range with interrupts enabled
  powerpc/xive: Clear XIVE internal structures when a CPU is removed
  powerpc/xive: Fix IPI reset
  powerpc/4xx: Fix compile error with 64K pages on 40x, 44x
  powerpc: Fix action argument for cpufeatures-based TLB flush
  cxl: Fix memory page not handled
  powerpc: Fix workaround for spurious MCE on POWER9
  powerpc: Handle MCE on POWER9 with only DSISR bit 30 set
2017-10-06 08:47:21 -07:00
Linus Torvalds 27efed3e83 Merge branch 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchddog clean-up and fixes from Thomas Gleixner:
 "The watchdog (hard/softlockup detector) code is pretty much broken in
  its current state. The patch series addresses this by removing all
  duct tape and refactoring it into a workable state.

  The reasons why I ask for inclusion that late in the cycle are:

   1) The code causes lockdep splats vs. hotplug locking which get
      reported over and over. Unfortunately there is no easy fix.

   2) The risk of breakage is minimal because it's already broken

   3) As 4.14 is a long term stable kernel, I prefer to have working
      watchdog code in that and the lockdep issues resolved. I wouldn't
      ask you to pull if 4.14 wouldn't be a LTS kernel or if the
      solution would be easy to backport.

   4) The series was around before the merge window opened, but then got
      delayed due to the UP failure caused by the for_each_cpu()
      surprise which we discussed recently.

  Changes vs. V1:

   - Addressed your review points

   - Addressed the warning in the powerpc code which was discovered late

   - Changed two function names which made sense up to a certain point
     in the series. Now they match what they do in the end.

   - Fixed a 'unused variable' warning, which got not detected by the
     intel robot. I triggered it when trying all possible related config
     combinations manually. Randconfig testing seems not random enough.

  The changes have been tested by and reviewed by Don Zickus and tested
  and acked by Micheal Ellerman for powerpc"

* 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  watchdog/core: Put softlockup_threads_initialized under ifdef guard
  watchdog/core: Rename some softlockup_* functions
  powerpc/watchdog: Make use of watchdog_nmi_probe()
  watchdog/core, powerpc: Lock cpus across reconfiguration
  watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
  watchdog/hardlockup/perf: Fix spelling mistake: "permanetely" -> "permanently"
  watchdog/hardlockup/perf: Cure UP damage
  watchdog/hardlockup: Clean up hotplug locking mess
  watchdog/hardlockup/perf: Simplify deferred event destroy
  watchdog/hardlockup/perf: Use new perf CPU enable mechanism
  watchdog/hardlockup/perf: Implement CPU enable replacement
  watchdog/hardlockup/perf: Implement init time detection of perf
  watchdog/hardlockup/perf: Implement init time perf validation
  watchdog/core: Get rid of the racy update loop
  watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage
  watchdog/sysctl: Clean up sysctl variable name space
  watchdog/sysctl: Get rid of the #ifdeffery
  watchdog/core: Clean up header mess
  watchdog/core: Further simplify sysctl handling
  watchdog/core: Get rid of the thread teardown/setup dance
  ...
2017-10-06 08:36:41 -07:00
Gustavo Romero 044215d145 powerpc/tm: Fix illegal TM state in signal handler
Currently it's possible that on returning from the signal handler
through the restore_tm_sigcontexts() code path (e.g. from a signal
caught due to a `trap` instruction executed in the middle of an HTM
block, or a deliberately constructed sigframe) an illegal TM state
(like TS=10 TM=0, i.e. "T0") is set in SRR1 and when `rfid` sets
implicitly the MSR register from SRR1 register on return to userspace
it causes a TM Bad Thing exception.

That illegal state can be set (a) by a malicious user that disables
the TM bit by tweaking the bits in uc_mcontext before returning from
the signal handler or (b) by a sufficient number of context switches
occurring such that the load_tm counter overflows and TM is disabled
whilst in the signal handler.

This commit fixes the illegal TM state by ensuring that TM bit is
always enabled before we return from restore_tm_sigcontexts(). A small
comment correction is made as well.

Fixes: 5d176f751e ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-06 22:12:55 +11:00
Cyril Bur 265e60a170 powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks
When using transactional memory (TM), the CPU can be in one of six
states as far as TM is concerned, encoded in the Machine State
Register (MSR). Certain state transitions are illegal and if attempted
trigger a "TM Bad Thing" type program check exception.

If we ever hit one of these exceptions it's treated as a bug, ie. we
oops, and kill the process and/or panic, depending on configuration.

One case where we can trigger a TM Bad Thing, is when returning to
userspace after a system call or interrupt, using RFID. When this
happens the CPU first restores the user register state, in particular
r1 (the stack pointer) and then attempts to update the MSR. However
the MSR update is not allowed and so we take the program check with
the user register state, but the kernel MSR.

This tricks the exception entry code into thinking we have a bad
kernel stack pointer, because the MSR says we're coming from the
kernel, but r1 is pointing to userspace.

To avoid this we instead always switch to the emergency stack if we
take a TM Bad Thing from the kernel. That way none of the user
register values are used, other than for printing in the oops message.

This is the fix for CVE-2017-1000255.

Fixes: 5d176f751e ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Rewrite change log & comments, tweak asm slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-06 22:12:16 +11:00
Kautuk Consul 4ca360f3db powerpc: get_wchan(): solve possible race scenario due to parallel wakeup
Add a check for p->state == TASK_RUNNING so that any wake-ups on
task_struct p in the interim lead to 0 being returned by get_wchan().

Signed-off-by: Kautuk Consul <kautuk.consul.1980@gmail.com>
[mpe: Confirmed other architectures do similar]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-06 20:51:52 +11:00
Jan H. Schönherr 753f612471 PCI: Remove reset argument from pci_iov_{add,remove}_virtfn()
The "reset" argument passed to pci_iov_add_virtfn() and
pci_iov_remove_virtfn() is always zero since 46cb7b1bd8 ("PCI: Remove
unused SR-IOV VF Migration support")

Remove the argument together with the associated code.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Russell Currey <ruscur@russell.cc>
2017-10-05 15:54:56 -05:00
Naveen N. Rao 3368f5699a powerpc/jprobes: Validate break handler invocation as being due to a jprobe_return()
Fix a circa 2005 FIXME by implementing a check to ensure that we
actually got into the jprobe break handler() due to the trap in
jprobe_return().

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-05 16:12:48 +11:00
Naveen N. Rao 6baea433bc powerpc/jprobes: Disable preemption when triggered through ftrace
KPROBES_SANITY_TEST throws the below splat when CONFIG_PREEMPT is
enabled:

  Kprobe smoke test: started
  DEBUG_LOCKS_WARN_ON(val > preempt_count())
  ------------[ cut here ]------------
  WARNING: CPU: 19 PID: 1 at kernel/sched/core.c:3094 preempt_count_sub+0xcc/0x140
  Modules linked in:
  CPU: 19 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc7-nnr+ #97
  task: c0000000fea80000 task.stack: c0000000feb00000
  NIP:  c00000000011d3dc LR: c00000000011d3d8 CTR: c000000000a090d0
  REGS: c0000000feb03400 TRAP: 0700   Not tainted  (4.13.0-rc7-nnr+)
  MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 28000282  XER: 00000000
  CFAR: c00000000015aa18 SOFTE: 0
  <snip>
  NIP preempt_count_sub+0xcc/0x140
  LR  preempt_count_sub+0xc8/0x140
  Call Trace:
    preempt_count_sub+0xc8/0x140 (unreliable)
    kprobe_handler+0x228/0x4b0
    program_check_exception+0x58/0x3b0
    program_check_common+0x16c/0x170
    --- interrupt: 0 at kprobe_target+0x8/0x20
                     LR = init_test_probes+0x248/0x7d0
    kp+0x0/0x80 (unreliable)
    livepatch_handler+0x38/0x74
    init_kprobes+0x1d8/0x208
    do_one_initcall+0x68/0x1d0
    kernel_init_freeable+0x298/0x374
    kernel_init+0x24/0x160
    ret_from_kernel_thread+0x5c/0x70
  Instruction dump:
  419effdc 3d22001b 39299240 81290000 2f890000 409effc8 3c82ffcb 3c62ffcb
  3884bc68 3863bc18 4803d5fd 60000000 <0fe00000> 4bffffa8 60000000 60000000
  ---[ end trace 432dd46b4ce3d29f ]---
  Kprobe smoke test: passed successfully

The issue is that we aren't disabling preemption in
kprobe_ftrace_handler(). Disable it.

Fixes: ead514d5fb ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Trim oops a little for formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-05 16:11:29 +11:00
Naveen N. Rao c179ea2701 powerpc/kprobes: Fix warnings from __this_cpu_read() on preempt kernels
Kamalesh pointed out that we are getting the below call traces with
livepatched functions when we enable CONFIG_PREEMPT:

[  495.470721] BUG: using __this_cpu_read() in preemptible [00000000] code: cat/8394
[  495.471167] caller is is_current_kprobe_addr+0x30/0x90
[  495.471171] CPU: 4 PID: 8394 Comm: cat Tainted: G              K 4.13.0-rc7-nnr+ #95
[  495.471173] Call Trace:
[  495.471178] [c00000008fd9b960] [c0000000009f039c] dump_stack+0xec/0x160 (unreliable)
[  495.471184] [c00000008fd9b9a0] [c00000000059169c] check_preemption_disabled+0x15c/0x170
[  495.471187] [c00000008fd9ba30] [c000000000046460] is_current_kprobe_addr+0x30/0x90
[  495.471191] [c00000008fd9ba60] [c00000000004e9a0] ftrace_call+0x1c/0xb8
[  495.471195] [c00000008fd9bc30] [c000000000376fd8] seq_read+0x238/0x5c0
[  495.471199] [c00000008fd9bcd0] [c0000000003cfd78] proc_reg_read+0x88/0xd0
[  495.471203] [c00000008fd9bd00] [c00000000033e5d4] __vfs_read+0x44/0x1b0
[  495.471206] [c00000008fd9bd90] [c0000000003402ec] vfs_read+0xbc/0x1b0
[  495.471210] [c00000008fd9bde0] [c000000000342138] SyS_read+0x68/0x110
[  495.471214] [c00000008fd9be30] [c00000000000bc6c] system_call+0x58/0x6c

Commit c05b8c4474 ("powerpc/kprobes: Skip livepatch_handler() for
jprobes") introduced a helper is_current_kprobe_addr() to help determine
if the current function has been livepatched or if it has a jprobe
installed, both of which modify the NIP. This was subsequently renamed
to __is_active_jprobe().

In the case of a jprobe, kprobe_ftrace_handler() disables pre-emption
before calling into setjmp_pre_handler() which returns without disabling
pre-emption. This is done to ensure that the jprobe handler won't
disappear beneath us if the jprobe is unregistered between the
setjmp_pre_handler() and the subsequent longjmp_break_handler() called
from the jprobe handler. Due to this, we can use __this_cpu_read() in
__is_active_jprobe() with the pre-emption check as we know that
pre-emption will be disabled.

However, if this function has been livepatched, we are still doing this
check and when we do so, pre-emption won't necessarily be disabled. This
results in the call trace shown above.

Fix this by only invoking __is_active_jprobe() when pre-emption is
disabled. And since we now guard this within a pre-emption check, we can
instead use raw_cpu_read() to get the current_kprobe value skipping the
check done by __this_cpu_read().

Fixes: c05b8c4474 ("powerpc/kprobes: Skip livepatch_handler() for jprobes")
Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 23:42:20 +11:00
Naveen N. Rao bf3a912517 powerpc/kprobes: Clean up jprobe detection in livepatch handler
In commit c05b8c4474 ("powerpc/kprobes: Skip livepatch_handler() for
jprobes"), we added a helper is_current_kprobe_addr() to help detect if
the modified regs->nip was due to a jprobe or livepatch. Masami felt
that the function name was not quite clear. To that end, this patch
renames is_current_kprobe_addr() to __is_active_jprobe() and adds a
comment to (hopefully) better clarify the purpose of this helper. The
helper has also now been moved to kprobes-ftrace.c so that it is only
available for KPROBES_ON_FTRACE.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 23:42:17 +11:00
Naveen N. Rao a7b440383f powerpc/kprobes: Do not suppress instruction emulation if a single run failed
Currently, we disable instruction emulation if emulate_step() fails for
any reason. However, such failures could be transient and specific to a
particular run. Instead, only disable instruction emulation if we have
never been able to emulate this. If we had emulated this instruction
successfully at least once, then we single step only this probe hit and
continue to try emulating the instruction in subsequent probe hits.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 23:42:16 +11:00
Naveen N. Rao 22085337f5 powerpc/kprobes: Some cosmetic updates to try_to_emulate()
1. This is only used in kprobes.c, so make it static.
2. Remove the un-necessary (ret == 0) comparison in the else clause.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 23:42:12 +11:00
Thomas Gleixner 34ddaa3e5c powerpc/watchdog: Make use of watchdog_nmi_probe()
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu()
on powerpc because it is called multiple times for the boot CPU.

The first call is via:

  start_wd_on_cpu+0x80/0x2f0
  watchdog_nmi_reconfigure+0x124/0x170
  softlockup_reconfigure_threads+0x110/0x130
  lockup_detector_init+0xbc/0xe0
  kernel_init_freeable+0x18c/0x37c
  kernel_init+0x2c/0x160
  ret_from_kernel_thread+0x5c/0xbc

And then again via the CPU hotplug registration:

  start_wd_on_cpu+0x80/0x2f0
  cpuhp_invoke_callback+0x194/0x620
  cpuhp_thread_fun+0x7c/0x1b0
  smpboot_thread_fn+0x290/0x2a0
  kthread+0x168/0x1b0
  ret_from_kernel_thread+0x5c/0xbc

This can be avoided by setting up the cpu hotplug state with nocalls and
move the initialization to the watchdog_nmi_probe() function. That
initializes the hotplug callbacks without invoking the callback and the
following core initialization function then configures the watchdog for the
online CPUs (in this case CPU0) via softlockup_reconfigure_threads().

Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
2017-10-04 10:53:54 +02:00
Thomas Gleixner e31d6883f2 watchdog/core, powerpc: Lock cpus across reconfiguration
Instead of dropping the cpu hotplug lock after stopping NMI watchdog and
threads and reaquiring for restart, the code and the protection rules
become more obvious when holding cpu hotplug lock across the full
reconfiguration.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos
2017-10-04 10:53:54 +02:00
Thomas Gleixner 6b9dc4806b watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
into two stages. One to stop the NMI and one to restart it after
reconfiguration. That was done by adding a boolean 'run' argument to the
code, which is functionally correct but not necessarily a piece of art.

Replace it by two explicit functions: watchdog_nmi_stop() and
watchdog_nmi_start().

Fixes: 6592ad2fcc ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage")
Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
2017-10-04 10:53:53 +02:00
Allen Pais 8d6b1bf20f powerpc/6xx: Use setup_timer() helper
Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 11:28:02 +11:00
Nicholas Piggin 969a86a285 powerpc/powernv: Use early_radix_enabled in POWER9 tlb flush
This code is used at boot and machine checks, so it should be using
early_radix_enabled() (which is usable any time).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 11:28:01 +11:00
Nicholas Piggin 78adf6c214 powerpc/64s: Implement system reset idle wakeup reason
It is possible to wake from idle due to a system reset exception, in
which case the CPU takes a system reset interrupt to wake from idle,
with system reset as the wakeup reason.

The regular (not idle wakeup) system reset interrupt handler must be
invoked in this case, otherwise the system reset interrupt is lost.

Handle the system reset interrupt immediately after CPU state has been
restored.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 11:26:32 +11:00
Nicholas Piggin 80e4d70b06 powerpc/watchdog: Do not trigger SMP crash from touch_nmi_watchdog
In xmon, touch_nmi_watchdog() is not expected to be checking that
other CPUs have not touched the watchdog, so the code will just call
touch_nmi_watchdog() once before re-enabling hard interrupts.

Just update our CPU's state, and ignore apparently stuck SMP threads.

Arguably touch_nmi_watchdog should check for SMP lockups, and callers
should be fixed, but that's not trivial for the input code of xmon.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 11:26:02 +11:00
Nicholas Piggin d58fdd9d7f powerpc/watchdog: Do not backtrace locked CPUs twice if allcpus backtrace is enabled
If sysctl_hardlockup_all_cpu_backtrace is enabled, there is no need to
IPI stuck CPUs for backtrace before trigger_allbutself_cpu_backtrace(),
which does the same thing again.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 11:25:50 +11:00
Nicholas Piggin 842dc1dbab powerpc/watchdog: Do not panic from locked CPU's IPI handler
The SMP watchdog will detect locked CPUs and IPI them to print a
backtrace and registers. If panic on hard lockup is enabled, do not
panic from this handler, because that can cause recursion into the IPI
layer during the panic.

The caller already panics in this case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 11:25:40 +11:00
Christian Lamparter 070e004912 powerpc/4xx: Fix compile error with 64K pages on 40x, 44x
The mmu context on the 40x, 44x does not define pte_frag entry. This
causes gcc abort the compilation due to:

  setup-common.c: In function ‘setup_arch’:
  setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’

This patch fixes the issue by removing the pte_frag initialization in
setup-common.c.

This is possible, because the compiler will do the initialization,
since the mm_context is a sub struct of init_mm. init_mm is declared
in mm_types.h as external linkage.

According to C99 6.2.4.3:
  An object whose identifier is declared with external linkage
  [...] has static storage duration.

C99 defines in 6.7.8.10 that:
  If an object that has static storage duration is not
  initialized explicitly, then:
  - if it has pointer type, it is initialized to a null pointer

Fixes: b1923caa6e ("powerpc: Merge 32-bit and 64-bit setup_arch()")
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-03 21:59:48 +11:00
Jeremy Kerr 3b7af5c0fd powerpc: Fix action argument for cpufeatures-based TLB flush
Commit 41d0c2ecde ("powerpc/powernv: Fix local TLB flush for boot
and MCE on POWER9") introduced calls to __flush_tlb_power[89] from the
cpufeatures code, specifying the number of sets to flush.

However, these functions take an action argument, not a number of
sets. This means we hit the BUG() in __flush_tlb_{206,300} when using
cpufeatures-style configuration.

This change passes TLB_INVAL_SCOPE_GLOBAL instead.

Fixes: 41d0c2ecde ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-03 16:16:55 +11:00
Michael Neuling bca73f595a powerpc: Fix workaround for spurious MCE on POWER9
In the recent commit d8bd9f3f09 ("powerpc: Handle MCE on POWER9 with
only DSISR bit 30 set") I screwed up the bit number. It should be bit
25 (IBM bit 38).

Fixes: d8bd9f3f09 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-29 14:19:44 +10:00
Michael Neuling 5080332c2c powerpc/64s: Add workaround for P9 vector CI load issue
POWER9 DD2.1 and earlier has an issue where some cache inhibited
vector load will return bad data. The workaround is two part, one
firmware/microcode part triggers HMI interrupts when hitting such
loads, the other part is this patch which then emulates the
instructions in Linux.

The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
lxvh8x.

When an instruction triggers the HMI, all threads in the core will be
sent to the HMI handler, not just the one running the vector load.

In general, these spurious HMIs are detected by the emulation code and
we just return back to the running process. Unfortunately, if a
spurious interrupt occurs on a vector load that's to normal memory we
have no way to detect that it's spurious (unless we walk the page
tables, which is very expensive). In this case we emulate the load but
we need do so using a vector load itself to ensure 128bit atomicity is
preserved.

Some additional debugfs emulated instruction counters are added also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-27 08:23:22 +10:00
Michael Neuling d8bd9f3f09 powerpc: Handle MCE on POWER9 with only DSISR bit 30 set
On POWER9 DD2.1 and below, it's possible for a paste instruction to
cause a Machine Check Exception (MCE) where only DSISR bit 30 (IBM 33)
is set. This will result in the MCE handler seeing an unknown event,
which triggers linux to crash.

We change this by detecting unknown events caused by load/stores in
the MCE handler and marking them as handled so that we no longer
crash.

An MCE that occurs like this is spurious, so we don't need to do
anything in terms of servicing it. If there is something that needs to
be serviced, the CPU will raise the MCE again with the correct DSISR
so that it can be serviced properly.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com
Acked-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Expand comment with details from change log, use normal bit #s]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-26 21:01:59 +10:00
Benjamin Herrenschmidt b9fde58db7 powerpc/powernv: Rework EEH initialization on powernv
Remove the post_init callback which is only used
by powernv, we can just call it explicitly from
the powernv code.

This partially kills the ability to "disable" eeh at
runtime via debugfs as this was calling that same
callback again, but this is both unused and broken
in several ways. If we want to revive it, we need
to create a dedicated enable/disable callback on the
backend that does the right thing.

Let the bulk of eeh initialize normally at
core_initcall() like it does on pseries by removing
the hack in eeh_init() that delays it.

Instead we make sure our eeh->probe cleanly bails
out of the PEs haven't been created yet and we force
a re-probe where we used to call eeh_init() again.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-26 11:19:07 +10:00
Benjamin Herrenschmidt 3e77adeea3 powerpc/eeh: Create PHB PEs after EEH is initialized
Otherwise we end up not yet having computed the right diag data size
on powernv where EEH initialization is delayed, thus causing memory
corruption later on when calling OPAL.

Fixes: 5cb1f8fddd ("powerpc/powernv/pci: Dynamically allocate PHB diag data")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-21 14:56:00 +10:00
Naveen N. Rao 8afafa6fba powerpc/kprobes: Update optprobes to use emulate_update_regs()
Optprobes depended on an updated regs->nip from analyse_instr() to
identify the location to branch back from the optprobes trampoline.
However, since commit 3cdfcbfd32 ("powerpc: Change analyse_instr so
it doesn't modify *regs"), analyse_instr() doesn't update the registers
anymore.  Due to this, we end up branching back from the optprobes
trampoline to the same branch into the trampoline resulting in a loop.

Fix this by calling out to emulate_update_regs() before using the nip.
Additionally, explicitly compare the return value from analyse_instr()
to 1, rather than just checking for !0 so as to guard against any
future changes to analyse_instr() that may result in -1 being returned
in more scenarios.

Fixes: 3cdfcbfd32 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20 20:21:24 +10:00
Michael Ellerman b134165ead Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixes
Merge one commit from Scott which I missed while away.
2017-09-20 20:05:24 +10:00
Gustavo Romero c1fa0768a8 powerpc/tm: Flush TM only if CPU has TM feature
Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
added code to access TM SPRs in flush_tmregs_to_thread(). However
flush_tmregs_to_thread() does not check if TM feature is available on
CPU before trying to access TM SPRs in order to copy live state to
thread structures. flush_tmregs_to_thread() is indeed guarded by
CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel
was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on
a CPU without TM feature available, thus rendering the execution
of TM instructions that are treated by the CPU as illegal instructions.

The fix is just to add proper checking in flush_tmregs_to_thread()
if CPU has the TM feature before accessing any TM-specific resource,
returning immediately if TM is no available on the CPU. Adding
that checking in flush_tmregs_to_thread() instead of in places
where it is called, like in vsr_get() and vsr_set(), is better because
avoids the same problem cropping up elsewhere.

Cc: stable@vger.kernel.org # v4.13+
Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20 13:30:09 +10:00
Al Viro a5ae754a3d ppc: switch to {get,put}_compat_sigset()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-19 17:56:03 -04:00
Linus Torvalds 418702b910 powerpc fixes for 4.14 #2
Just one fix, for the handling of alignment interrupts on dcbz instructions.
 
 Thanks to:
   Paul Mackerras, Christian Zigotzky, Michal Sojka.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZu5HyAAoJEFHr6jzI4aWA+yoP+gOKF4obMWnvSJ5YWZZWQee/
 T++Uhps7AF0G2y99rBVUKzbXzweDPpBrNQW5cFlF51WinNCA4KJ4E08xoNSrnlKl
 ebc6Tn2cHbpWgydSRxWDig/Dp90U95toKp14irO2YvOVWoTgYBX7gOHhkMcpxdLH
 MeVe6IAatOSNNNpJK9gLrNDLZaDhTGqXf64VIaw44llf0R9avNFhoS+8F94bllm2
 VSsXizP1HZxWFcxJ1W/F54zkOvOkSXxjmd9Lf0IOxERPAshZBO7MfDaXF9MaYXwL
 4AU5GZc2UmWIJpwmSQqRQPpLOPFvsJkhNnXeUb/8kQAG2W+HfZy6BvHpVh1lajm7
 j+tyIYe+E7Jt63tQxyxSKpGnu20po78iimciYhl77A2RksGl9Tw5sHWj8gok30TI
 pQFIRLPAO38MnnD3OrsboAnBrruS0Og+lkSWan7nV5n8ybHflvimjG+MfkNx9vzL
 as7yJThQ8D8JBRAyaIHG3aAUPgBDxpZwqhDU3rac0qZChOc9DH4X7YFHR1UzGH8n
 J8h9e7IsKGfvm4O9j4t3/nq0Qwh0LvpAlfu1/tPdJ0KXc9BnQTVoCpkFzhA+IFV6
 JY2y4F1MfW8ZWS20iPHLlYEXYcTtcTq5b47JYQx97TgSp82ShXSLwbUxXNwSraDv
 umUvJle++yyRm927njki
 =F9yZ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Just one fix, for the handling of alignment interrupts on dcbz
  instructions.

  Thanks to Paul Mackerras, Christian Zigotzky, Michal Sojka"

* tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Fix handling of alignment interrupt on dcbz instruction
2017-09-15 12:44:59 -07:00
Paul Mackerras 1bc944cee6 powerpc: Fix handling of alignment interrupt on dcbz instruction
This fixes the emulation of the dcbz instruction in the alignment
interrupt handler.  The error was that we were comparing just the
instruction type field of op.type rather than the whole thing,
and therefore the comparison "type != CACHEOP + DCBZ" was always
true.

Fixes: 31bfdb036f ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Tested-by: Michal Sojka <sojkam1@fel.cvut.cz>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-15 08:41:18 +10:00
Thomas Gleixner ab5fe3ff38 watchdog/hardlockup: Clean up hotplug locking mess
All watchdog thread related functions are delegated to the smpboot thread
infrastructure, which handles serialization against CPU hotplug correctly.

The sysctl interface is completely decoupled from anything which requires
CPU hotplug protection.

No need to protect the sysctl writes against cpu hotplug anymore. Remove it
and add the now required protection to the powerpc arch_nmi_watchdog
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170912194148.418497420@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:09 +02:00
Thomas Gleixner 6592ad2fcc watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
need to be done in two steps.

     1) Stop all NMIs
     2) Read the new parameters and start NMIs

Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
clean reconfiguration add a 'run' argument and split the functionality in
powerpc.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:07 +02:00
Thomas Gleixner 5490125d77 watchdog/core: Remove broken suspend/resume interfaces
This interface has several issues:

 - It's causing recursive locking of the hotplug lock.

 - It's complete overkill to teardown all threads and then recreate them

The same can be achieved with the simple hardlockup_detector_perf_stop /
restart() interfaces. The abuse from the busy looping poweroff() loop of
PARISC has been solved as well.

Remove the cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.487537732@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:04 +02:00
Michal Hocko 0ee931c4e3 mm: treewide: remove GFP_TEMPORARY allocation flag
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:16 -07:00
Linus Torvalds dd198ce714 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "Life has been busy and I have not gotten half as much done this round
  as I would have liked. I delayed it so that a minor conflict
  resolution with the mips tree could spend a little time in linux-next
  before I sent this pull request.

  This includes two long delayed user namespace changes from Kirill
  Tkhai. It also includes a very useful change from Serge Hallyn that
  allows the security capability attribute to be used inside of user
  namespaces. The practical effect of this is people can now untar
  tarballs and install rpms in user namespaces. It had been suggested to
  generalize this and encode some of the namespace information
  information in the xattr name. Upon close inspection that makes the
  things that should be hard easy and the things that should be easy
  more expensive.

  Then there is my bugfix/cleanup for signal injection that removes the
  magic encoding of the siginfo union member from the kernel internal
  si_code. The mips folks reported the case where I had used FPE_FIXME
  me is impossible so I have remove FPE_FIXME from mips, while at the
  same time including a return statement in that case to keep gcc from
  complaining about unitialized variables.

  I almost finished the work to get make copy_siginfo_to_user a trivial
  copy to user. The code is available at:

     git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

  But I did not have time/energy to get the code posted and reviewed
  before the merge window opened.

  I was able to see that the security excuse for just copying fields
  that we know are initialized doesn't work in practice there are buggy
  initializations that don't initialize the proper fields in siginfo. So
  we still sometimes copy unitialized data to userspace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Introduce v3 namespaced file capabilities
  mips/signal: In force_fcr31_sig return in the impossible case
  signal: Remove kernel interal si_code magic
  fcntl: Don't use ambiguous SIG_POLL si_codes
  prctl: Allow local CAP_SYS_ADMIN changing exe_file
  security: Use user_namespace::level to avoid redundant iterations in cap_capable()
  userns,pidns: Verify the userns for new pid namespaces
  signal/testing: Don't look for __SI_FAULT in userspace
  signal/mips: Document a conflict with SI_USER with SIGFPE
  signal/sparc: Document a conflict with SI_USER with SIGFPE
  signal/ia64: Document a conflict with SI_USER with SIGFPE
  signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-11 18:34:47 -07:00
Alexey Dobriyan 9b130ad5bb treewide: make "nr_cpu_ids" unsigned
First, number of CPUs can't be negative number.

Second, different signnnedness leads to suboptimal code in the following
cases:

1)
	kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
	while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longed than the same MOV.

Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
	function                                     old     new   delta
	coretemp_cpu_online                          450     512     +62
	rcu_init_one                                1234    1272     +38
	pci_device_probe                             374     399     +25

				...

	pgdat_reclaimable_pages                      628     556     -72
	select_fallback_rq                           446     369     -77
	task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Cédric Le Goater ac5e5a5402 powerpc/xive: add XIVE Exploitation Mode to CAS
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's inform the hypervisor what we can
do.

The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only
 - 0b10 XIVE legacy or exploitation mode

The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:38 +10:00
Julia Lawall 8a7aef2cb3 powerpc/iommu: Use permission-specific DEVICE_ATTR variants
Use DEVICE_ATTR_RW for read-write attributes.  This simplifies the
source code, improves readbility, and reduces the chance of
inconsistencies.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:54 +10:00
Markus Elfring 6ab41161b4 powerpc/eeh: Delete an error out of memory message at init time
Omit an extra message for a memory allocation failure in
eeh_dev_init().

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[mpe: Do not drop the message that can happen at runtime and lead to
 an event not being handled]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:53 +10:00
Christophe Leroy ad1b0122bd powerpc/32: remove a NOP from memset()
memset() is patched after initialisation to activate the
optimised part which uses cache instructions.

Today we have a 'b 2f' to skip the optimised patch, which then gets
replaced by a NOP, implying a useless cycle consumption.
As we have a 'bne 2f' just before, we could use that instruction
for the live patching, hence removing the need to have a
dedicated 'b 2f' to be replaced by a NOP.

This patch changes the 'bne 2f' by a 'b 2f'. During init, that
'b 2f' is then replaced by 'bne 2f'

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:46 +10:00
Paul Mackerras 31bfdb036f powerpc: Use instruction emulation infrastructure to handle alignment faults
This replaces almost all of the instruction emulation code in
fix_alignment() with calls to analyse_instr(), emulate_loadstore()
and emulate_dcbz().  The only emulation code left is the SPE
emulation code; analyse_instr() etc. do not handle SPE instructions
at present.

One result of this is that we can now handle alignment faults on
all the new VSX load and store instructions that were added in POWER9.
VSX loads/stores will take alignment faults for unaligned accesses
to cache-inhibited memory.

Another effect is that we no longer rely on the DAR and DSISR values
set by the processor.

With this, we now need to include the instruction emulation code
unconditionally.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:43 +10:00
Michael Ellerman f9effe9250 powerpc: Fix DAR reporting when alignment handler faults
Anton noticed that if we fault part way through emulating an unaligned
instruction, we don't update the DAR to reflect that.

The DAR value is eventually reported back to userspace as the address
in the SEGV signal, and if userspace is using that value to demand
fault then it can be confused by us not setting the value correctly.

This patch is ugly as hell, but is intended to be the minimal fix and
back ports easily.

Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
2017-08-31 22:06:57 +10:00
Oliver O'Halloran 96d91431d6 powerpc/smp: Add Power9 scheduler topology
In previous generations of Power processors each core had a private L2
cache. The Power 9 processor has a slightly different design where the
L2 cache is shared among pairs of cores rather than being completely
private.

Making the scheduler aware of this cache sharing allows the scheduler to
make better migration decisions. For example, if two CPU heavy tasks
share a core then one task can be migrated to the paired core to improve
throughput. Under the existing three level topology the task could be
migrated to any core on the same chip, while with the new topology it
would be preferentially migrated to the paired core so it remains
cache-hot.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 18:16:08 +10:00
Oliver O'Halloran 2a636a56d2 powerpc/smp: Add cpu_l2_cache_map
We want to add an extra level to the CPU scheduler topology to account
for cores which share a cache. To do this we need to build a cpumask
for each CPU that indicates which CPUs share this cache to use as an
input to the scheduler.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:56 +10:00
Oliver O'Halloran df52f67140 powerpc/smp: Rework CPU topology construction
The CPU scheduler topology is constructed from a number of per-cpu
cpumasks which describe which sets of logical CPUs are related in some
fashion. Current code that handles constructing these masks when CPUs
are hot(un)plugged can be simplified a bit by exploiting the fact that
the scheduler requires higher levels of the toplogy (e.g package level
groupings) to be supersets of the lower levels (e.g.  threas in a core).
This patch reworks the cpumask construction to be simpler and easier to
extend with extra topology levels.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Fix CONFIG_HOTPLUG_CPU=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:51 +10:00
Oliver O'Halloran e3d8b67e2c powerpc/smp: Use cpu_to_chip_id() to find core siblings
When building the CPU scheduler topology the kernel uses the ibm,chipid
property from the devicetree to group logical CPUs. Currently the DT
search for this property is open-coded in smp.c and this functionality
is a duplication of what's in cpu_to_chip_id() already. This patch
removes the existing search in favor of that.

It's worth mentioning that the semantics of the search are different
in cpu_to_chip_id(). When there is no ibm,chipid in the CPUs node it
will also search /cpus and / for the property, but this should not
effect the output topology.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:50 +10:00
Tobin C. Harding eb039161da powerpc/asm: Convert .llong directives to .8byte
.llong is an undocumented PPC specific directive. The generic
equivalent is .quad, but even better (because it's self describing) is
.8byte.

Convert all .llong directives to .8byte.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:47 +10:00
Masahiro Yamada 7f2462acb6 powerpc: Squash lines for simple wrapper functions
Remove unneeded variables and assignments.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:42 +10:00
Bryant G. Ly f9df74dfce powerpc/kernel: Change retrieval of pci_dn
For a PCI device it's pci_dn can be retrieved from
pdev->dev.archdata.firmware_data, PCI_DN(devnode), or parent's list.
Thus, we should just use the existing function pci_get_pdn_by_devfn
to get the pci_dn.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:40 +10:00
Alexey Kardashevskiy f1e08232ed powerpc/pci: Remove OF node back pointer from pci_dn
The check_req() helper uses pci_get_pdn() to get an OF node pointer.
pci_get_pdn() returns a pci_dn pointer which either:
1) from the OF node returned by pci_device_to_OF_node();
2) from the parent child_list where entries don't have OF node pointers.
Since check_req() does not care about 2), it can call
pci_device_to_OF_node() directly, hence the change.

The find_pe_dn() helper uses embedded pci_dn to get an OF node which is
also stored in edev->pdev so let's take a shortcut and call
pci_device_to_OF_node() directly.

With these 2 changes, we can finally get rid of the OF node back pointer.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:12 +10:00
Alexey Kardashevskiy 14db3d52d3 powerpc/eeh: Reduce use of pci_dn::node
The pci_dn struct caches a OF device node pointer in order to access
the "ibm,loc-code" property when EEH is recovering.

However, when this happens in eeh_dev_check_failure(), we also have
a pci_dev pointer which should have a valid pointer to the device node
when pci_dn has one (both pointers are not NULL for physical functions
and are NULL for virtual functions).

This changes pci_remove_device_node_info() to look for a parent of
the node being removed, just like pci_add_device_node_info() does when it
references the parent node.

This is the first step to get rid of pci_dn::node.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:10 +10:00
Alexey Kardashevskiy 405b33a76d powerpc/eeh: Remove unnecessary config_addr from eeh_dev
The eeh_dev struct hold a config space address of an associated node
and the very same address is also stored in the pci_dn struct which
is always present during the eeh_dev lifetime.

This uses bus:devfn directly from pci_dn instead of cached and packed
config_addr.

Since config_addr is made from device's bus:dev.fn, there is no point
in keeping it in the debugfs either so remove that too.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:09 +10:00
Alexey Kardashevskiy 69672bd748 powerpc/eeh: Remove unnecessary pointer to phb from eeh_dev
The eeh_dev struct already holds a pointer to pci_dn which it does not
exist without and pci_dn itself holds the very same pointer so just
use it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:09 +10:00
Alexey Kardashevskiy 8bae6a2319 powerpc/eeh: Reduce to one the number of places where edev is allocated
arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev
is allocated; other 2 places allocate it on stack and in the heap for
a very short period of time to use eeh_pe_get() as takes edev.

This changes eeh_pe_get() to receive required parameters explicitly.

This removes unnecessary temporary allocation of edev.

This uses the "pe_no" name instead of the "pe_config_addr" name as
it actually is a PE number and not a config space address as it seemed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:08 +10:00
Alexey Kardashevskiy 5f600b17d1 powerpc/pci: Remove unused parameter from add_one_dev_pci_data()
pdev is always NULL, remove it.

To make checkpatch.pl happy, this also removes the "out of memory"
message.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:07 +10:00
Nicholas Piggin b96672dd84 powerpc: Machine check interrupt is a non-maskable interrupt
Use nmi_enter similarly to system reset interrupts. This uses NMI
printk NMI buffers and turns off various debugging facilities that
helps avoid tripping on ourselves or other CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:04 +10:00
Nicholas Piggin 6fcd6baa90 powerpc/powernv: Use kernel crash path for machine checks
There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
  [  127.426651616,0] OPAL: Reboot requested due to Platform error.
      Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
  opal: Reboot type 1 not supported
  Kernel panic - not syncing: PowerNV Unrecovered Machine Check
  CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #35
  Call Trace:
  [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
    buggy/mising code in OPAL for this BMC
    Rebooting in 10 seconds..
  Trying to free IRQ 496 from IRQ context!

After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
      Effective address: ffff000000000000
  Oops: Machine check, sig: 7 [#1]
  SMP NR_CPUS=2048
  NUMA
  PowerNV
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
  CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #36
  task: c000000932300000 task.stack: c000000932380000
  NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
  REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a261072-dirty)
  MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
    CR: 24000484  XER: 20000000
  CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
  GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
  GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
  GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
  GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
  NIP [ffff000000000000] 0xffff000000000000
  LR [00000000217706a4] 0x217706a4
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:04 +10:00
Nicholas Piggin 4388c9b3a6 powerpc: Do not send system reset request through the oops path
A system reset is a request to crash / debug the system rather than
necessarily caused by encountering a BUG. So there is no need to
serialize all CPUs behind the die lock, adding taints to all
subsequent traces beyond the first, breaking console locks, etc.

The system reset is NMI context which has its own printk buffers to
prevent output being interleaved. Then it's better to have all
secondaries print out their debug as quickly as possible and the
primary will flush out all printk buffers during panic().

So remove the 0x100 path from die, and move it into system_reset. Name
the crash/dump reasons "System Reset".

This gives "not tained" traces when crashing an untainted kernel. It
also gives the panic reason as "System Reset" as opposed to "Fatal
exception in interrupt" (or "die oops" for fadump).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:02 +10:00
Nicholas Piggin a3b2cb30f2 powerpc: Do not call ppc_md.panic in fadump panic notifier
If fadump is not registered, and no other crash or debug handlers are
registered, the powerpc panic handler stops the guest before the
generic panic code can push out debug information to the console.

Currently, system reset injection causes the guest to silently stop.

Stop calling ppc_md.panic in the panic notifier. crash_fadump already
does rtas_os_term() to terminate the guest if fadump is registered.

Remove ppc_md.panic. Move fadump panic notifier into fadump code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:01 +10:00
Nicholas Piggin 70412c55d4 powerpc/64: Fix watchdog configuration regressions
This fixes a couple more bits of fallout from the new hard lockup watchdog
patch.

It restores the required hw_nmi_get_sample_period() function for the
perf watchdog, and removes some function declarations on 64e that are only
defined for 64s. This fixes the 64e build when the hardlockup detector is
enabled.

It restores the default behaviour of disabling the perf watchdog, and also
fixes disabling the 64s watchdog when running as a guest.

Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:00 +10:00
Nicholas Piggin b68b1d7487 powerpc/64s/radix: Do not allocate SLB shadow structures
These are unused in radix mode.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:25:59 +10:00
Nicholas Piggin d55071905e powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacks
Radix MMU does not take SLB or TLB interrupts when accessing kernel
linear address. Remove this restriction for radix mode.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:25:59 +10:00
Nicholas Piggin 72b0d51d97 powerpc/64s: idle POWER9 can execute stop in virtual mode
The hardware can execute stop in any context, and KVM does not
require real mode because siblings do not share MMU state. This
saves a switch to real-mode when going idle.

Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 21:42:14 +10:00
Nicholas Piggin 65dbbe812f powerpc/64s: Drop no longer used IDLE_STATE_ENTER_SEQ
There are no longer any callers of IDLE_STATE_ENTER_SEQ, all callers
use IDLE_STATE_ENTER_SEQ_NORET. So drop the former.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch, write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 21:41:44 +10:00
Nicholas Piggin 56ee52408e powerpc/64s: POWER9 can execute stop without a sync sequence
We don't need to use IDLE_STATE_ENTER_SEQ_NORET on Power9.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 21:39:07 +10:00
Nicholas Piggin aafc8a8300 powerpc/64s: Move IDLE_STATE_ENTER_SEQ[_NORET] into idle_book3s.S
This macro is only used in idle_book3s.S, move it in there and add a
more descriptive comment.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch and write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 21:38:47 +10:00
Michael Ellerman 82b7fcc005 Merge branch 'topic/ppc-kvm' into next
Merge Nicks commit to rework the KVM thread management, shared with the
KVM tree via the ppc-kvm topic branch.
2017-08-29 21:26:30 +10:00
Nicholas Piggin 94a04bc25a KVM: PPC: Book3S HV: POWER9 does not require secondary thread management
POWER9 CPUs have independent MMU contexts per thread, so KVM does not
need to quiesce secondary threads, so the hwthread_req/hwthread_state
protocol does not have to be used. So patch it away on POWER9, and patch
away the branch from the Linux idle wakeup to kvm_start_guest that is
never used.

Add a warning and error out of kvmppc_grab_hwthread in case it is ever
called on POWER9.

This avoids a hwsync in the idle wakeup path on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Use WARN(...) instead of WARN_ON()/pr_err(...)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 14:48:59 +10:00
Matt Weber a4e89ffb59 powerpc/e6500: Update machine check for L1D cache err
This patch updates the machine check handler of Linux kernel to
handle the e6500 architecture case. In e6500 core, L1 Data Cache Write
Shadow Mode (DCWS) register is not implemented but L1 data cache always
runs in write shadow mode. So, on L1 data cache parity errors, hardware
will automatically invalidate the data cache but will still log a
machine check interrupt.

Signed-off-by: Ronak Desai <ronak.desai@rockwellcollins.com>
Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2017-08-28 23:15:32 -05:00
Michael Ellerman a6036100ed powerpc/oops: Line up NIP & MSR with other rows
This is purely cosmetic, but does look nicer IMHO:

Before:

  task: c000000001453400 task.stack: c000000001c6c000
  NIP: c000000000a0fbfc LR: c000000000a0fbf4 CTR: c000000000ba6220
  REGS: c0000001fffef820 TRAP: 0300   Not tainted  (4.13.0-rc6-gcc-6.3.1-00234-g423af27f7d81)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 88088242  XER: 00000000
  CFAR: c0000000000b3488 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0

After:
  task: c000000001453400 task.stack: c000000001c6c000
  NIP:  c000000000a0fbfc LR: c000000000a0fbf4 CTR: c000000000ba6220
  REGS: c0000001fffef820 TRAP: 0300   Not tainted  (4.13.0-rc6-gcc-6.3.1-00234-g423af27f7d81-dirty)
  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 88088242  XER: 00000000
  CFAR: c0000000000b34a4 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28 22:10:00 +10:00
Michael Ellerman f6fc73fb96 powerpc/oops: Print CR/XER on same line as MSR
Somehow we missed this when the pr_cont() changes went in. Fix CR/XER
to go on the same line as MSR, as they have historically, eg:

  MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 4804408a  XER: 20000000

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28 22:09:59 +10:00
Michael Ellerman 1c56cd8ee9 powerpc/oops: Use IS_ENABLED() for oops markers
Just because it looks less gross.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28 22:09:59 +10:00
Michael Ellerman 2e82ca3c39 powerpc/oops: Print the kernel's endian in the oops
Although the MSR tells you what endian you're in it's possible that
isn't the same endian the kernel was built for, and if that happens
you're usually having a very bad day. So print a marker to make
it 100% clear which endian the kernel was built for.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28 22:09:58 +10:00
Michael Ellerman 72c0d9ee4a powerpc/oops: Fix the oops markers to use pr_cont()
When we oops we print a few markers for significant config options
such as PREEMPT, SMP etc. Currently these appear on separate lines
because we're not using pr_cont() properly. Fix it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28 22:09:58 +10:00
Naveen N. Rao 2dea1d9c38 powerpc/uprobes: Implement arch_uretprobe_is_alive()
This helper is used to detect if a uprobe'd function has returned
through a setjmp/longjmp, rather than branching to the LR that was
updated previously by us. This fixes a SIGSEGV that gets generated when
programs use setjmp/longjmp with uretprobes.

We use the arm64 model (arch/arm64/kernel/probes/uprobes.c:
arch_uretprobe_is_alive()) for detecting when stack frames have been
removed from under us.

Reference:
https://marc.info/?l=linux-kernel&m=143748610330073
commit 7b868e4802 ("uprobes/x86: Reimplement arch_uretprobe_is_alive()")
commit db087ef69a ("uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more
clever")

Tested with the test program from:
https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob;f=testsuite/systemtap.base/bz5274.c;hb=HEAD

And this script:
    $ cat test.sh
    #!/bin/bash

    perf probe -x ./bz5274 -a bz5274_main_return=main%return
    perf probe -x ./bz5274 -a bz5274_funca_return=funca%return
    perf probe -x ./bz5274 -a bz5274_funcb_return=funcb%return
    perf probe -x ./bz5274 -a bz5274_funcc_return=funcc%return
    perf probe -x ./bz5274 -a bz5274_funcd_return=funcd%return

    perf record -e 'probe_bz5274:*' -aR ./bz5274

Reported-by: Gustavo Luiz Duarte <gduarte@redhat.com>
Reported-by: zsun@redhat.com
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24 16:19:21 +10:00
Naveen N. Rao ec4189c4e8 powerpc/kprobes: Don't save/restore DAR/DSISR to/from pt_regs for optprobes
We don't save/restore these across a trap, or with KPROBES_ON_FTRACE.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24 16:19:01 +10:00
Nicholas Piggin d1d0d5ffb3 powerpc/64: Optimise set/clear of CTRL[RUN] (runlatch)
On modern CPUs the CTRL register is read-only except bit 63 which is
the run latch control. This means it can be updated with a mtspr
rather than mfspr/mtspr.

To accomodate older CPUs (Cell at least), where there are other bits
in the register, we still do a read/modify/write on pre 2.06 CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Update change log to mention 2.06 workaround]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:48:38 +10:00
Nicholas Piggin 7b76a1f5ed powerpc/64s: Remove spurious IRQ reason in IRQ replay
HVI interrupts have always used 0x500, so remove the dead branch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:17:29 +10:00
Nicholas Piggin ccd5eb837c powerpc/64: Remove redundant instruction in interrupt replay
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:17:16 +10:00
Nicholas Piggin e6c1203d5c powerpc/64s: Use the HV handler for external IRQ replay in HV mode on POWER9
POWER9 host external interrupts use the h_virt_irq_common handler, so
use that to replay them rather than using the hardware_interrupt_common
handler. Both call do_IRQ, but using the correct handler reduces
i-cache footprint.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:15:23 +10:00
Nicholas Piggin d6f73fc69b powerpc/64s: Merge HV and non-HV paths for doorbell IRQ replay
This results in smaller code, and fewer branches. This relies on the
fact that both the 0xe80 and 0xa00 handlers call the same upper level
code, namely doorbell_exception().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Mention we rely on the implementation of the 0xe80/0xa00 handlers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:13:27 +10:00
Nicholas Piggin 6f881eaeb5 powerpc/64: Cleanup __check_irq_replay()
Move the clearing of irq_happened bits into the condition where they
were found to be set. This reduces instruction count slightly, and
reduces stores into irq_happened.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:13:15 +10:00
Nicholas Piggin c05f0be888 powerpc/64s: masked_interrupt() returns to kernel so avoid restoring r13
Places in the kernel where r13 is not the PACA pointer must have
maskable interrupts disabled, so r13 does not have to be restored when
returning from a soft-masked interrupt. We should never have
interrupts soft disabled when we're in user space.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:11:28 +10:00
Nicholas Piggin 6e9a2f6eba powerpc/64s: Optimise clearing of MSR_EE in masked_[H]interrupt()
MSR_EE is always enabled in SRR1 for masked interrupts, so we can use
xor to clear it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:06:48 +10:00
Nicholas Piggin e0c827c09c powerpc/64s: Avoid a branch in masked_[H]interrupt()
Interrupts which do not require EE to be cleared can all be tested
with a single bitwise test.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 23:02:48 +10:00
Michael Ellerman 3e23a12bca powerpc/64s: Fix replay interrupt return label name
In __replay_interrupt() we take the address of a local label so we can
return to it later. However the assembler turns the local label into a
symbol with a name like ".L1^B42" - where "^B" is literally "\002".
This does not make for pleasant stack traces. Fix it by giving the
label a sensible name.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:27:05 +10:00
Rob Herring b7c670d673 powerpc: Convert to using %pOF instead of full_name
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:27:04 +10:00
Michael Ellerman 15c659ff9d Merge branch 'fixes' into next
There's a non-trivial dependency between some commits we want to put in
next and the KVM prefetch work around that went into fixes. So merge
fixes into next.
2017-08-23 22:20:10 +10:00
Michael Ellerman 8434f0892e Merge branch 'topic/ppc-kvm' into next
Bring in the commit to rename find_linux_pte_or_hugepte() which touches
arch and KVM code, and might need to be merged with the kvmppc tree to
avoid conflicts.
2017-08-17 23:14:17 +10:00
Aneesh Kumar K.V 94171b19c3 powerpc/mm: Rename find_linux_pte_or_hugepte()
Add newer helpers to make the function usage simpler. It is always
recommended to use find_current_mm_pte() for walking the page table.
If we cannot use find_current_mm_pte(), it should be documented why
the said usage of __find_linux_pte() is safe against a parallel THP
split.

For now we have KVM code using __find_linux_pte(). This is because kvm
code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
with PACA soft_enabled = 1. We may want to fix that later and make
sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
we can switch kvm to use find_linux_pte().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17 23:13:46 +10:00
Benjamin Herrenschmidt 96c79b6bd7 powerpc: Remove more redundant VSX save/tests
__giveup_vsx/save_vsx are completely equivalent to testing MSR_FP
and MSR_VEC and calling the corresponding giveup/save function so
just remove the spurious VSX cases. Also add WARN_ONs checking that
we never have VSX enabled without the two other.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 22:35:04 +10:00
Benjamin Herrenschmidt dc801081f2 powerpc: Remove redundant clear of MSR_VSX in __giveup_vsx()
__giveup_fpu() already does it and we cannot have MSR_VSX set
without having MSR_FP also set.

This also adds a warning to check we indeed do

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 22:35:03 +10:00
Benjamin Herrenschmidt 746874d31c powerpc: Remove redundant FP/Altivec giveup code
__giveup_vsx() already calls those two functions.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 22:34:42 +10:00
Benjamin Herrenschmidt 6a303833b5 powerpc: Fix missing newline before {
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 22:34:34 +10:00
Benjamin Herrenschmidt 5a69aec945 powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
VSX uses a combination of the old vector registers, the old FP
registers and new "second halves" of the FP registers.

Thus when we need to see the VSX state in the thread struct
(flush_vsx_to_thread()) or when we'll use the VSX in the kernel
(enable_kernel_vsx()) we need to ensure they are all flushed into
the thread struct if either of them is individually enabled.

Unfortunately we only tested if the whole VSX was enabled, not if they
were individually enabled.

Fixes: 72cd7b44bc ("powerpc: Uncomment and make enable_kernel_vsx() routine available")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 19:35:54 +10:00
Aneesh Kumar K.V 79cc38ded1 powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line
With commit aa888a7497 ("hugetlb: support larger than MAX_ORDER") we added
support for allocating gigantic hugepages via kernel command line. Switch
ppc64 arch specific code to use that.

W.r.t FSL support, we now limit our allocation range using BOOTMEM_ALLOC_ACCESSIBLE.

We use the kernel command line to do reservation of hugetlb pages on powernv
platforms. On pseries hash mmu mode the supported gigantic huge page size is
16GB and that can only be allocated with hypervisor assist. For pseries the
command line option doesn't do the allocation. Instead pseries does gigantic
hugepage allocation based on hypervisor hint that is specified via
"ibm,expected#pages" property of the memory node.

Cc: Scott Wood <oss@buserror.net>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 14:56:12 +10:00
Christophe Leroy 95902e6c88 powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32
This patch implements STRICT_KERNEL_RWX on PPC32.

As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings
in order to allow page protection setup at the level of each page.

As BAT/LTLB mappings are deactivated, there might be a performance
impact.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:57 +10:00
Christophe Leroy 2ef973a9af powerpc/8xx: Reduce DTLB miss handler by one insn
This reduces the DTLB miss handler hot path (user address path)
by one instruction by preserving r10.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:55 +10:00
Christophe Leroy a3059b0ca0 powerpc/8xx: Make pinning of ITLBs optional
As stated in a comment in head_8xx.S, today we "Always pin the first
8 MB ITLB to prevent ITLB misses while mucking around with SRR0/SRR1
in asm".

This issue has just been cleared by the preceding patch, therefore
we can make this pinning optional (on by default) and independent
of DATA pinning.

This patch also makes pinning of IMMR independent of pinning of DATA.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:53 +10:00
Christophe Leroy 0eb0d2e77d powerpc/32: Avoid risk of unrecoverable TLBmiss inside entry_32.S
By default, the 8xx pins an ITLB on the first 8M of memory in order
to avoid any ITLB miss on kernel code.
However, with some debug functions like DEBUG_PAGEALLOC and
DEBUG_RODATA, pinning TLBs is contradictory.

In order to avoid any ITLB miss in a critical section without pinning
TLBs, we have to ensure that there is no page boundary crossed between
the setup of a new value in SRR0/SRR1 and the associated RFI.

The functions modifying srr0/srr1 are all located in setup_32.S.
They are spread over almost 4kbytes.

The patch forces a 12 bits (4kbytes) alignment for those
functions. This garanties that the functions remain in a
single 4k page.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:53 +10:00
Christophe Leroy c8a127092e powerpc/8xx: Remove macro that checks kernel address
The macro to check if an address is a kernel address or not is
not used anymore in DTLBmiss handler. It is used in ITLB miss handler
and in DTLB error handler. DTLB error handler is not a hot path, it
doesn't need such optimisation.

In order to simplify a following patch which will rework ITLB miss
handler, we remove the macros and reintroduce them inside the handler.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:52 +10:00
Andreas Schwab 6c80d3164e powerpc/l2cr_6xx: Fix invalid use of register expressions
This fixes another invalid use of register expressions.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 21:04:32 +10:00
Michael Ellerman 63b85621d9 powerpc/iommu: Avoid undefined right shift in iommu_range_alloc()
In iommu_range_alloc() we generate a mask by right shifting ~0,
however if the specified alignment is 0 then we right shift by 64,
which is undefined. UBSAN tells us so:

  UBSAN: Undefined behaviour in ../arch/powerpc/kernel/iommu.c:193:35
  shift exponent 64 is too large for 64-bit type 'long unsigned int'

We can avoid it by instead generating the mask with:

  align_mask = (1ull << align_order) - 1;

That will also generate an undefined shift if align_order is 64 or
greater, but that shouldn't be a problem for a while.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 20:30:58 +10:00
Christophe Leroy 0e9645df58 powerpc/8xx: Remove cpu dependent macro instructions from head_8xx
head_8xx is dedicated to 8xx so no need to use macros that
depends on the CPU

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:21 +10:00
Christophe Leroy 4915349b10 powerpc/8xx: Use symbolic names for DSISR bits in DSI
Use symbolic names for DSISR bits in DSI

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:20 +10:00
Christophe Leroy 3ee87674e0 powerpc/8xx: Use symbolic PVR value
For the 8xx, PVR values defined in arch/powerpc/include/asm/reg.h
are nowhere used.

Remove all defines and add PVR_8xx

Use it in arch/powerpc/kernel/cputable.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:18 +10:00
Christophe Leroy 968159c003 powerpc/8xx: Getting rid of remaining use of CONFIG_8xx
Two config options exist to define powerpc MPC8xx:
* CONFIG_PPC_8xx
* CONFIG_8xx

arch/powerpc/platforms/Kconfig.cputype has contained the following
comment about CONFIG_8xx item for some years:
"# this is temp to handle compat with arch=ppc"

arch/powerpc is now the only place with remaining use of
CONFIG_8xx: get rid of them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:12 +10:00
Christophe Leroy 72e4b2cdf0 powerpc/time: refactor MFTB() to limit number of ifdefs
The 8xx cannot access the TBL and TBU registers using mfspr/mtspr
It must be accessed using mftb/mftbu

Due to this, there is a number of places with #ifdef CONFIG_8xx

This patch defines new macros MFTBL(x) and MFTBU(x) on the same model
as MFTB(x) and tries to make use of them as much as possible.

In arch/powerpc/include/asm/timex.h, we also remove the ifdef
for the asm() operands as the compiler doesn't mind unused operands

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:09 +10:00
Christophe Leroy fbbcc3bb13 powerpc/8xx: Remove SoftwareEmulation()
Since commit aa42c69c67 ("[POWERPC] Add support for FP emulation
for the e300c2 core"), program_check_exception() can be called for
math emulation. In that case, 'reason' is 0.

On the 8xx, there is a Software Emulation interrupt which is
called for all unimplemented and illegal instructions. This
interrupt calls SoftwareEmulation() which does almost the
same as program_check_exception() called with reason = 0.

The Software Emulation interrupt sets all reason bits to 0,
it is therefore possible to call program_check_exception()
directly from the interrupt handler.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:04 +10:00
Christophe Leroy f70b1e8d17 powerpc/8xx: Move 8xx machine check handlers into platforms/8xx
In the same spirit as what was done for 4xx and 44x, move
the 8xx machine check into platforms/8xx

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:02 +10:00
Michael Ellerman d30a5a5262 powerpc/traps: Use SRR1 defines for program check reasons
Currently we open code the reason codes for program checks. Instead use
the existing SRR1 defines.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:31:32 +10:00
Michael Ellerman ccd3cd3613 powerpc/mce: Move 64-bit machine check code into mce.c
We already have mce.c which is built for 64bit and contains other parts
of the machine check code, so move these bits in there too.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:31:31 +10:00
Michael Ellerman 7f3f819e94 powerpc/traps: machine_check_generic() is only used on 32-bit
Make it clear that the fallback version of machine_check_generic() is
only used on 32-bit configs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:31:31 +10:00
Michael Ellerman 42bff234be powerpc/traps: Inline get_mc_reason()
get_mc_reason() no longer provides (if it ever really did) any
meaningful abstraction, so remove it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:31:30 +10:00
Michael Ellerman 0d0935b367 powerpc/4xx: Move machine_check_4xx() into platforms/4xx
Now that we have 4xx platform directory we can move the 4xx machine
check handler in there. Again we drop get_mc_reason() and replace it
with regs->dsisr directly (which is actually SPRN_ESR).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:31:30 +10:00
Michael Ellerman 5b4e28577b powerpc/44x: Move 44x machine check handlers into platforms/44x
We have several 44x machine check handlers defined in traps.c. It would
be preferable if they were split out with the platforms that use them.
Do that.

In the process, drop get_mc_reason() and instead just open code the
lookup of reason from regs->dsisr. This avoids a pointless layer of
abstraction.

We know to use regs->dsisr because 44x enables BOOKE which enables
PPC_ADV_DEBUG_REGS, and FSL_BOOKE is not enabled on 44x builds.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:31:24 +10:00
Michael Ellerman 13fef7f9da powerpc/47x: Guard 47x cputable entries with CONFIG_PPC_47x
Currently we build the 47x cputable entries even when CONFIG_PPC_47x is
disabled. That means a kernel built without CONFIG_PPC_47x will claim to
support a 47x CPU and start booting, only to break somewhere later
because it doesn't have 47x support compiled in.

So guard the 47x cputable entries with CONFIG_PPC_47x. Note that this is
inside the #ifdef CONFIG_44x section, because 47x depends on 44x.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:28:59 +10:00
Nicholas Piggin 04019bf8eb powerpc: Add irq accounting for watchdog interrupts
This adds an irq counter for the watchdog soft-NMI. This interrupt
only fires when interrupts are soft-disabled, so it will not
increment much even when the watchdog is running. However it's
useful for debugging and sanity checking.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:30:02 +10:00
Nicholas Piggin ca41ad4377 powerpc: Add irq accounting for system reset interrupts
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:30:01 +10:00
Nicholas Piggin 75eb767e4c powerpc: Fix powerpc-specific watchdog build configuration
The powerpc kernel/watchdog.o should be built when HARDLOCKUP_DETECTOR
and HAVE_HARDLOCKUP_DETECTOR_ARCH are both selected. If only the former
is selected, then the generic perf watchdog has been selected.

To simplify this check, introduce a new Kconfig symbol PPC_WATCHDOG that
depends on both. This Kconfig option means the powerpc specific
watchdog is enabled.

Without this patch, Book3E will attempt to build the powerpc watchdog.

Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:30:01 +10:00
Nicholas Piggin f886f0f6e0 powerpc/64s: Fix mce accounting for powernv
On 64-bit Book3s, when we're in HV mode, we have already counted the
machine check exception in machine_check_early().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use IS_ENABLED() rather than an #ifdef]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:30:00 +10:00
Andreas Schwab 8a583c0a8d powerpc: Fix invalid use of register expressions
binutils >= 2.26 now warns about misuse of register expressions in
assembler operands that are actually literals, for example:

  arch/powerpc/kernel/entry_64.S:535: Warning: invalid register expression

In practice these are almost all uses of r0 that should just be a
literal 0.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Mention r0 is almost always the culprit, fold in purgatory change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:29:41 +10:00
Nicholas Piggin 96ea91e7b6 powerpc/watchdog: add locking around init/exit functions
When CPUs start and stop the watchdog, they manipulate shared data
that is normally protected by the lock. Other CPUs can be running
concurrently at this time, so it's a good idea to use locking here
to be on the safe side.

Remove the barrier which is undocumented and didn't do anything.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09 23:45:33 +10:00
Nicholas Piggin 87607a30be powerpc/watchdog: Fix marking of stuck CPUs
When the SMP detector finds other CPUs stuck, it iterates over
them and marks them as stuck. This pulls them out of the pending
mask and allows the detector to continue with remaining good
CPUs (if nmi_watchdog=panic is not enabled).

The code to dothat was buggy because when setting a CPU stuck,
if the pending mask became empty, it resets it to keep the
watchdog running. However the iterator will continue to run
over the new pending mask and mark remaining good CPUs sas stuck.

Fix this by doing it with cpumask bitwise operations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09 23:45:32 +10:00
Nicholas Piggin 8e23692175 powerpc/watchdog: Fix final-check recovered case
When the watchdog decides to panic, it takes the lock and double
checks everything (to avoid races with the CPU being unstuck or
panic()ed by something else).

The exit label was misplaced and would result in all-CPUs backtrace
and watchdog panic even in the case that the condition was found to be
resolved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09 23:45:31 +10:00
Nicholas Piggin 26c5c6e129 powerpc/watchdog: Moderate touch_nmi_watchdog overhead
Some code can go into a tight loop calling touch_nmi_watchdog (e.g.,
stop_machine CPU hotplug code). This can cause contention on watchdog
locks particularly if all CPUs with watchdog enabled are spinning in
the loops.

Avoid this storm of activity by running the watchdog timer callback
from this path if we have exceeded the timer period since it was last
run.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09 23:45:29 +10:00
Nicholas Piggin d8e2a40535 powerpc/watchdog: Improve watchdog lock primitive
- Hard-disable interrupts before taking the lock, which prevents
  soft-NMI re-entrancy and therefore can prevent deadlocks.
- Use raw_ variants of local_irq_disable to avoid irq debugging.
- When the lock is contended, spin at low SMT priority, using
  loads only, and with interrupts enabled (where possible).

Some stalls have been noticed at high loads that go away with improved
locking. There should not be so much locking contention in the first
place (which is addressed in a subsequent patch), but locking should
still be improved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09 23:45:28 +10:00
Nicholas Piggin 0459ddfdb3 powerpc: NMI IPI improve lock primitive
When the NMI IPI lock is contended, spin at low SMT priority, using
loads only, and with interrupts enabled (where possible). This
improves behaviour under high contention (e.g., a system crash when
a number of CPUs are trying to enter the debugger).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-09 23:45:26 +10:00
Christophe Leroy 64d0a506fb powerpc/32: Fix boot failure on non 6xx platforms
Commit d300627c6a ("powerpc/6xx: Handle DABR match before calling
do_page_fault") breaks non 6xx platforms.

  Failed to execute /init (error -14)
  Starting init: /bin/sh exists but couldn't execute it (error -14)
  Kernel panic - not syncing: No working init found.  Try passing init= ...
  CPU: 0 PID: 1 Comm: init Not tainted 4.13.0-rc3-s3k-dev-00143-g7aa62e972a56 #56
  Call Trace:
    panic+0x108/0x250 (unreliable)
    rootfs_mount+0x0/0x58
    ret_from_kernel_thread+0x5c/0x64
  Rebooting in 180 seconds..

This is because in handle_page_fault(), the call to do_page_fault() has been
mistakenly enclosed inside an #ifdef CONFIG_6xx

Fixes: d300627c6a ("powerpc/6xx: Handle DABR match before calling do_page_fault")
Brown-paper-bag-to-be-worn-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08 19:35:34 +10:00
Michael Ellerman 44a12806d0 Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"
This reverts commit bc4f65e4cf.

As reported by Andreas, this commit is causing unrecoverable SLB misses in the
system call exit path:

  Unrecoverable exception 4100 at c00000000000a1ec
  Oops: Unrecoverable exception, sig: 6 [#1]
  SMP NR_CPUS=2 PowerMac
  ...
  CPU: 0 PID: 18626 Comm: rm Not tainted 4.13.0-rc3 #1
  task: c00000018335e080 task.stack: c000000139e50000
  NIP: c00000000000a1ec LR: c00000000000a118 CTR: 0000000000000000
  REGS: c000000139e53bb0 TRAP: 4100   Not tainted  (4.13.0-rc3)
  MSR: 9000000000001030 <SF,HV,ME,IR,DR> CR: 24000044  XER: 20000000 SOFTE: 1
  GPR00: 0000000000000000 c000000139e53e30 c000000000abb500 fffffffffffffffe
  GPR04: c0000001eb866298 0000000000000000 0000000000000000 c00000018335e080
  GPR08: 900000000000d032 0000000000000000 0000000000000002 fffffffffffff001
  GPR12: c000000139e50000 c00000000ffff000 00003fffa8c0dca0 00003fffa8c0dc88
  GPR16: 0000000010000000 0000000000000001 00003fffa8c0eaa0 0000000000000000
  GPR20: 00003fffa8c27528 00003fffa8c27b00 0000000000000000 0000000000000000
  GPR24: 00003fffa8c0d918 00003ffff1b3efa0 00003fffa8c26d68 0000000000000000
  GPR28: 00003fffa8c249e8 00003fffa8c263d0 00003fffa8c27550 00003ffff1b3ef10
  NIP [c00000000000a1ec] system_call_exit+0xc0/0x21c
  LR [c00000000000a118] system_call+0x58/0x6c
  Call Trace:
  [c000000139e53e30] [c00000000000a118] system_call+0x58/0x6c (unreliable)
  Instruction dump:
  64a51000 7c6300d0 f8a101a0 4bffff9c 3c000000 60000006 780007c6 64000000
  60000000 7c004039 4082001c e8ed0170 <88070b78> 88c70b79 7c003214 2c200000

This is caused by us trying to load THREAD_LOAD_FP with MSR_RI=0, and taking an
SLB miss on the thread struct.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Diagnosed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-07 21:36:56 +10:00
Nicholas Piggin 3db40c312c powerpc/64: Fix __check_irq_replay missing decrementer interrupt
If the decrementer wraps again and de-asserts the decrementer
exception while hard-disabled, __check_irq_replay() has a test to
notice the wrap when interrupts are re-enabled.

The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
because the decrementer interrupt was always the first one checked
after clearing the hard disable flag, but HMI check was moved ahead of
that, which introduced this bug.

This can cause a missed decrementer interrupt if we soft-disable
interrupts then take an HMI which is recorded in irq_happened, then
hard-disable interrupts for > 4s to wrap the decrementer.

Fixes: e0e0d6b739 ("powerpc/64: Replay hypervisor maintenance interrupt first")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-04 12:55:49 +10:00
Nicholas Piggin 09539f9b12 powerpc/perf: POWER9 PMU stops after idle workaround
POWER9 DD2 PMU can stop after a state-loss idle in some conditions.

A solution is to set then clear MMCRA[60] after wake from state-loss
idle. MMCRA[60] is a non-architected bit, see the user manual for
details.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-04 12:52:26 +10:00
Benjamin Herrenschmidt b4c001dc44 powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs
This uses the newly defined constants for this rather than open-coded
numbers. There is a side effect on 64-bit which is to pass through
some of the new P9 bits which we didn't before.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03 16:06:44 +10:00
Benjamin Herrenschmidt 398a719d34 powerpc/mm: Update bits used to skip hash_page
We test a number of bits from DSISR/SRR1 before deciding
to call hash_page(). If any of these is set, we go directly
to do_page_fault() as the bit indicate a fault that needs
to be handled there (no hashing needed).

This updates the current open-coded masks to use the new
DSISR definitions.

This *does* change the masks actually used in two ways:

 - We used to test various bits that were defined as "always 0"
in the architecture and could be repurposed for something
else. From now on, we just ignore such bits.

 - We were missing some new bits defined on P9

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03 16:06:43 +10:00
Benjamin Herrenschmidt d300627c6a powerpc/6xx: Handle DABR match before calling do_page_fault
On legacy 6xx 32-bit procesors, we checked for the DABR match bit
in DSISR from do_page_fault(), in the middle of a pile of ifdef's
because all other CPU types do it in assembly prior to calling
do_page_fault. Fix that.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Add #ifdef CONFIG_6xx]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03 16:06:26 +10:00
Benjamin Herrenschmidt c433ec0455 powerpc/mm: Pre-filter SRR1 bits before do_page_fault()
By filtering the relevant SRR1 bits in the assembly rather than
in do_page_fault() itself, we avoid a conditional branch (since we
already come from different path for data and instruction faults).

This will allow more simplifications later

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02 13:11:07 +10:00
Victor Aoqui 75f327c6b7 powerpc/kernel: Avoid preemption check in iommu_range_alloc()
Replace the __this_cpu_read() with raw_cpu_read() in
iommu_range_alloc(). Otherwise we get a warning about using
__this_cpu_read() in preemptible code:

  BUG: using __this_cpu_read() in preemptible
  caller is iommu_range_alloc+0xa8/0x3d0

Preemption doesn't need to be disabled since according to the comment
any CPU can safely use any IOMMU pool.

Signed-off-by: Victor Aoqui <victora@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-01 21:49:23 +10:00
Gautham R. Shenoy e1c1cfed54 powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle
The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.

Until now, the deep idle states which lose hypervisor resources (eg:
winkle) were only exposed via CPU-Hotplug.  Hence currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older value, rest of the SPRS are reinitialized to their values
corresponding to that at boot time.

When stop4 is used in the context of cpuidle, we want these additional
SPRs to be restored to their older value, to ensure that the context
on the CPU coming back from idle is same as it was before going idle.

In this patch, we define a SPR save area in PACA (since we have used
up the volatile register space in the stack) and on POWER9, we restore
SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
SPRN_MMCR2 to the values they had before entering stop.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-01 21:01:20 +10:00
Nicholas Piggin cc491f1d35 powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
The watchdog soft-NMI exception stack setup loads a stack pointer
twice, which is an obvious error. It ends up using the system reset
interrupt (true-NMI) stack, which is also a bug because the watchdog
could be preempted by a system reset interrupt that overwrites the
NMI stack.

Change the soft-NMI to use the "emergency stack". The current kernel
stack is not used, because of the longer-term goal to prevent
asynchronous stack access using soft-disable.

Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-31 20:22:37 +10:00
Michael Ellerman bb272221e9 Linux v4.13-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZapWhAAoJEHm+PkMAQRiGKb0IAJM6b7SbWaw69Og7+qiFB+zZ
 xp29iXqbE9fPISC6a5BRQV1ONjeDM6opGixGHqGC8Hla6k2IYz25VDNoF8wd0MXN
 cz/Ih20vd3C5afxXGe5cTT8lsPAlV0mWXxForlu6j8jPeL62FPfq6RhEkw7AcrYL
 yfYy3k3qSdOrrvBdII0WAAUi46UfIs+we9BQgbsMbkHOiqV2K0MOrzKE84Xbgepq
 RAy2xg6P4b4+hTx8xTrYc1MXwpnqjRc0oJ08gdmiwW3AOOU7LxYFn7zDkLPWi9Rr
 g4x6r4YhBTGxT4wNvovLIiqd9QFs//dMCuPWYwEtTICG48umIqqq24beQ0mvCdg=
 =08Ic
 -----END PGP SIGNATURE-----

Merge tag 'v4.13-rc1' into fixes

The fixes branch is based off a random pre-rc1 commit, because we had
some fixes that needed to go in before rc1 was released.

However we now need to fix some code that went in after that point, but
before rc1, so merge rc1 to get that code into fixes so we can fix it!
2017-07-31 20:20:29 +10:00
Michael Ellerman 7b7622bb95 powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot
CPU, which means it has to run *on* the boot CPU.

In the past we ensured it ran on the boot CPU by changing the CPU
affinity mask of current directly. That was removed in commit
6d11b87d55 ("powerpc/smp: Replace open coded task affinity logic"),
and replaced with a work queue call.

Unfortunately using a work queue leads to a lockdep warning, now that
the CPU hotplug lock is a regular semaphore:

  ======================================================
  WARNING: possible circular locking dependency detected
  ...
  kworker/0:1/971 is trying to acquire lock:
   (cpu_hotplug_lock.rw_sem){++++++}, at: [<c000000000100974>] apply_workqueue_attrs+0x34/0xa0

  but task is already holding lock:
   ((&wfc.work)){+.+.+.}, at: [<c0000000000fdb2c>] process_one_work+0x25c/0x800
  ...
       CPU0                    CPU1
       ----                    ----
  lock((&wfc.work));
                               lock(cpu_hotplug_lock.rw_sem);
                               lock((&wfc.work));
  lock(cpu_hotplug_lock.rw_sem);

Although the deadlock can't happen in practice, because
smp_cpus_done() only runs in early boot before CPU hotplug is allowed,
lockdep can't tell that.

Luckily in commit 8fb12156b8 ("init: Pin init task to the boot CPU,
initially") tglx changed the generic code to pin init to the boot CPU
to begin with. The unpinning of init from the boot CPU happens in
sched_init_smp(), which is called after smp_cpus_done().

So smp_cpus_done() is always called on the boot CPU, which means we
don't need the work queue call at all - and the lockdep warning goes
away.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2017-07-28 19:35:45 +10:00
Gustavo Romero cd63f3cf1d powerpc/tm: Fix saving of TM SPRs in core dump
Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR,
TFIAR, TEXASR) to the thread struct, unless the process is currently
inside a suspended transaction.

If the process is core dumping, and the TM SPRs have changed since the
last time the process was context switched, then we will save stale
values of the TM SPRs to the core dump.

Fix it by saving the live register state to the thread struct in that
case.

Fixes: 08e1c01d6a ("powerpc/ptrace: Enable support for TM SPR state")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-28 15:56:06 +10:00
Eric W. Biederman cc731525f2 signal: Remove kernel interal si_code magic
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS

While this looks plausible on the surface, in practice this situation has
not worked well.

- Injected positive signals are not copied to user space properly
  unless they have these magic high bits set.

- Injected positive signals are not reported properly by signalfd
  unless they have these magic high bits set.

- These kernel internal values leaked to userspace via ptrace_peek_siginfo

- It was possible to inject these kernel internal values and cause the
  the kernel to misbehave.

- Kernel developers got confused and expected these kernel internal values
  in userspace in kernel self tests.

- Kernel developers got confused and set si_code to __SI_FAULT which
  is SI_USER in userspace which causes userspace to think an ordinary user
  sent the signal and that it was not kernel generated.

- The values make it impossible to reorganize the code to transform
  siginfo_copy_to_user into a plain copy_to_user.  As si_code must
  be massaged before being passed to userspace.

So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.

To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used.  Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.

A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like.  The good news is only problem
architectures pay the cost.

Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values.  Escept where not all of the cases are handled remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.

Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first.  The high bits are no
longer used to hold a magic union member.

Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.

The fixes to copy_siginfo_from_user32 implementations has the
interesting property that several of them perviously should never have
worked as the __SI_ values they depended up where kernel internal.
With that dependency gone those implementations should work much
better.

The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.

Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-07-24 14:30:28 -05:00
Linus Torvalds 10fc95547f powerpc fixes for 4.13 #3
A handful of fixes, mostly for new code.
 
 Some reworking of the new STRICT_KERNEL_RWX support to make sure we also remove
 executable permission from __init memory before it's freed.
 
 A fix to some recent optimisations to the hypercall entry where we were
 clobbering r12, this was breaking nested guests (PR KVM).
 
 A fix for the recent patch to opal_configure_cores(). This could break booting
 on bare metal Power8 boxes if the kernel was built without
 CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG.
 
 And finally a workaround for spurious PMU interrupts on Power9 DD2.
 
 Thanks to:
   Nicholas Piggin, Anton Blanchard, Balbir Singh.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZcctRAAoJEFHr6jzI4aWAji8P/RaG3/vGmhP4HGsAh7jHx/3v
 EIBsjXPozQ16AIs4235JcQ2Vg54LLms6IaFfw1w9oYZxi98sNC3b56Rtd3AGxLbq
 WHEGAD+mE9aXUOXkycLRkKZ+THiNzR0ZQKiMDqU+oJ9XbIAoof2gI4nc7UzUxGnt
 EqxERiL2RrYqRBvO3W2ErtL29yXhopUhv+HKdbuMlIcoH4uooSdn435MPleIJWxl
 WRFFv17DFw7PN6juwlff2YWvbTgrM1ND8czLfINzSZQDy0F+HEbKDFbtHlgAvOFN
 GjbODX9OuU/QRrPQv6MoY2UumZNtLSCUsw5BUg0y4Qaak8p0gAEb4D/gmvIZNp/r
 9ZqJDPfFKh/oA08apLTPJBKQmDUwCaYHyHVJNw10CU9GZ9CzW/2/B2/h2Egpgqao
 vt0TgR3FXYizhXGAEpmBZzadAg9VKZTXH/irN6X62wMxNfvRIDOS57829QFHr0ka
 wbsOK1gI1rE1g1GwsZafyURRSe95Sv84RDHW1fFQAtvFgpA3eHl26LUkpcvwOOFI
 53g9zMedYOyZXXIdBn05QojwxgXZ0ry1Wc93oDHOn3rgDL6XBeOXFfFI5jIwPJDL
 VZhOSvN1v+Ti3Q+I05zH+ll6BMhlU8RKBf+esq419DiLUeaNLHZR6GF0+UVJyJo+
 WcLsJ1x/GDa9pOBfMEJy
 =Oy4/
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A handful of fixes, mostly for new code:

   - some reworking of the new STRICT_KERNEL_RWX support to make sure we
     also remove executable permission from __init memory before it's
     freed.

   - a fix to some recent optimisations to the hypercall entry where we
     were clobbering r12, this was breaking nested guests (PR KVM).

   - a fix for the recent patch to opal_configure_cores(). This could
     break booting on bare metal Power8 boxes if the kernel was built
     without CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG.

   - .. and finally a workaround for spurious PMU interrupts on Power9
     DD2.

  Thanks to: Nicholas Piggin, Anton Blanchard, Balbir Singh"

* tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y
  powerpc/mm/hash: Refactor hash__mark_rodata_ro()
  powerpc/mm/radix: Refactor radix__mark_rodata_ro()
  powerpc/64s: Fix hypercall entry clobbering r12 input
  powerpc/perf: Avoid spurious PMU interrupts after idle
  powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
2017-07-21 13:54:37 -07:00
Nicholas Piggin 76fc0cfcc5 powerpc/64s: Fix hypercall entry clobbering r12 input
A previous optimisation incorrectly assumed the PAPR hcall does
not use r12, and clobbers it upon entry. In fact it is used as
an input. This can result in KVM guests crashing (observed with
PR KVM).

Instead of using r12 to save r13, tihs patch saves r13 in ctr.
This is more costly, but not as slow as using the SPRG.

Fixes: acd7d8cef0 ("powerpc/64s: Optimize hypercall/syscall entry")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-18 16:45:11 +10:00
Nicholas Piggin 101dd590a7 powerpc/perf: Avoid spurious PMU interrupts after idle
POWER9 DD2 can see spurious PMU interrupts after state-loss idle in
some conditions.

A solution is to save and reload MMCR0 over state-loss idle.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-18 11:45:47 +10:00
Linus Torvalds deed9deb62 powerpc fixes for 4.13 #2
Nothing that really stands out, just a bunch of fixes that have come in in the
 last couple of weeks.
 
 None of these are actually fixes for code that is new in 4.13. It's roughly half
 older bugs, with fixes going to stable, and half fixes/updates for Power9.
 
 Thanks to:
   Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
   Madhavan Srinivasan, Michael Neuling, Nicholas Piggin, Oliver O'Halloran.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZaJKuAAoJEFHr6jzI4aWAojsQAI/Mnu3T9XCPbcuWJVHiNZu2
 CubNKFb9H+HdjRZlgdT7dpx0XITCNWziQmBpQZO+w1kwa3s5uv5qNwj8I30HyTFH
 qAArMq2Yb7GIW4PR8IpbSdCEWZii4YcIiLYk3kiEMyC2UIWeznBq+j79jFyITd9T
 beevhLLq7dMoLO8T2VRSdp4YhdLKNYmgjQG1YFh3YIh8s/tyPQ0OsDaxNk5z8wgt
 /htWiVQ2X/xWRHBnVKT4olc97pXqTvtaCdBWjD/fKNZK50YIwAjr2cJVoXpJidAW
 POwwMto7rjSJDYpE6IgqQnhz0VqRTNB/4QkyaeHJkt6BPhAaafkrHZYtDgc2GaUJ
 G96J6xT9bwHn8Jz1tpsxSgSA9A2GfBhdOUKMpIUVn1l9Q1ELPA0dJzzagLhgRDRG
 w97UQ+CJqvRNfJkRHcuJbNqP5i++6yOHcHGB8QLkmkSYVcCm9Hj/fMXj/DJ5tHt/
 c9t6uw25eqI7raFcxfh09+QTX8VDV96fa+GTo1HnCH71nGG2GqCoJFXyb3HEW64s
 gD7uOPOlFlqfpuGDmV1kmdwH0R15EIJb2b6QsDssrcHEg5fA6z/xcAcV839c9y47
 U8tcBiqFF0vWgB1QljTDrKuNsjg8dIeZfkSekBGD0WuBVLSADBl7f3b2k3BZB5H3
 WxosyR4ufSVfw53HaDsb
 =6eIT
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Nothing that really stands out, just a bunch of fixes that have come
  in in the last couple of weeks.

  None of these are actually fixes for code that is new in 4.13. It's
  roughly half older bugs, with fixes going to stable, and half
  fixes/updates for Power9.

  Thanks to: Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin
  Herrenschmidt, Madhavan Srinivasan, Michael Neuling, Nicholas Piggin,
  Oliver O'Halloran"

* tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64: Fix atomic64_inc_not_zero() to return an int
  powerpc: Fix emulation of mfocrf in emulate_step()
  powerpc: Fix emulation of mcrf in emulate_step()
  powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events
  powerpc/perf: Fix SDAR_MODE value for continous sampling on Power9
  powerpc/asm: Mark cr0 as clobbered in mftb()
  powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
  powerpc/mm/radix: Synchronize updates to the process table
  powerpc/mm/radix: Properly clear process table entry
  powerpc/powernv: Tell OPAL about our MMU mode on POWER9
  powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR
2017-07-14 15:33:15 -07:00
Daniel Axtens 054f367a32 powerpc: don't fortify prom_init
prom_init is a bit special; in theory it should be able to be linked
separately to the kernel.  To keep this from getting too complex, the
symbols that prom_init.c uses are checked.

Fortification adds symbols, and it gets quite messy as it includes
things like panic().  So just don't fortify prom_init.c for now.

Link: http://lkml.kernel.org/r/1497903987-21002-6-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Nicholas Piggin 2104180a53 powerpc/64s: implement arch-specific hardlockup watchdog
Implement an arch-speicfic watchdog rather than use the perf-based
hardlockup detector.

The new watchdog takes the soft-NMI directly, rather than going through
perf.  Perf interrupts are to be made maskable in future, so that would
prevent the perf detector from working in those regions.

Additionally, implement a SMP based detector where all CPUs watch one
another by pinging a shared cpumask.  This is because powerpc Book3S
does not have a true periodic local NMI, but some platforms do implement
a true NMI IPI.

If a CPU is stuck with interrupts hard disabled, the soft-NMI watchdog
does not work, but the SMP watchdog will.  Even on platforms without a
true NMI IPI to get a good trace from the stuck CPU, other CPUs will
notice the lockup sufficiently to report it and panic.

[npiggin@gmail.com: honor watchdog disable at boot/hotplug]
  Link: http://lkml.kernel.org/r/20170621001346.5bb337c9@roar.ozlabs.ibm.com
[npiggin@gmail.com: fix false positive warning at CPU unplug]
  Link: http://lkml.kernel.org/r/20170630080740.20766-1-npiggin@gmail.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170616065715.18390-6-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Tested-by: Babu Moger <babu.moger@oracle.com>	[sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:02 -07:00
Nicholas Piggin 05a4a95279 kernel/watchdog: split up config options
Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.

An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.

Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
does not implement the LOCKUP_DETECTOR interfaces.

sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet.  It should probably be converted to a full
HAVE_HARDLOCKUP_DETECTOR_ARCH.

[npiggin@gmail.com: fix]
  Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com
Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Tested-by: Babu Moger <babu.moger@oracle.com>	[sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:02 -07:00
Xunlei Pang 5203f4995d powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdr
vmcoreinfo_max_size stands for the vmcoreinfo_data, the correct one we
should use is vmcoreinfo_note whose total size is VMCOREINFO_NOTE_SIZE.

Like explained in commit 77019967f0 ("kdump: fix exported size of
vmcoreinfo note"), it should not affect the actual function, but we
better fix it, also this change should be safe and backward compatible.

After this, we can get rid of variable vmcoreinfo_max_size, let's use
the corresponding macros directly, fewer variables means more safety for
vmcoreinfo operation.

[xlpang@redhat.com: fix build warning]
  Link: http://lkml.kernel.org/r/1494830606-27736-1-git-send-email-xlpang@redhat.com
Link: http://lkml.kernel.org/r/1493281021-20737-2-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:25:59 -07:00
Nicholas Piggin 41d0c2ecde powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
There are two cases outside the normal address space management
where a CPU's local TLB is to be flushed:

  1. Host boot; in case something has left stale entries in the
     TLB (e.g., kexec).

  2. Machine check; to clean corrupted TLB entries.

CPU state restore from deep idle states also flushes the TLB.
However this seems to be a side effect of reusing the boot code to set
CPU state, rather than a requirement itself.

The current flushing has a number of problems with ISA v3.0B:

- The current radix mode of the MMU is not taken into account. tlbiel
  is undefined if the R field does not match the current radix mode.

- ISA v3.0B hash must flush the partition and process table caches.

- ISA v3.0B radix must flush partition and process scoped translations,
  partition and process table caches, and also the page walk cache.

Add POWER9 cases to handle these, with radix vs hash determined by the
host MMU mode.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-11 12:53:53 +10:00
Balbir Singh 1e2a516e89 powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR
This patch fixes a crash seen while doing a kexec from radix mode to
hash mode. Key 0 is special in hash and used in the RPN by default, we
set the key values to 0 today. In radix mode key 0 is used to control
supervisor<->user access. In hash key 0 is used by default, so the
first instruction after the switch causes a crash on kexec.

Commit 3b10d0095a ("powerpc/mm/radix: Prevent kernel execution of
user space") introduced the setting of IAMR and AMOR values to prevent
execution of user mode instructions from supervisor mode. We need to
clean up these SPR's on kexec.

Fixes: 3b10d0095a ("powerpc/mm/radix: Prevent kernel execution of user space")
Cc: stable@vger.kernel.org # v4.10+
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-10 21:07:38 +10:00
Linus Torvalds c3931a87db Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A couple of fixes for perf and kprobes:

   - Add he missing exclude_kernel attribute for the precise_ip level so
     !CAP_SYS_ADMIN users get the proper results.

   - Warn instead of failing completely when perf has no unwind support
     for a particular architectiure built in.

   - Ensure that jprobes are at function entry and not at some random
     place"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Ensure that jprobe probepoints are at function entry
  kprobes: Simplify register_jprobes()
  kprobes: Rename [arch_]function_offset_within_entry() to [arch_]kprobe_on_func_entry()
  perf unwind: Do not fail due to missing unwind support
  perf evsel: Set attr.exclude_kernel when probing max attr.precise_ip
2017-07-09 10:49:47 -07:00
Naveen N. Rao 659b957f20 kprobes: Rename [arch_]function_offset_within_entry() to [arch_]kprobe_on_func_entry()
Rename function_offset_within_entry() to scope it to kprobe namespace by
using kprobe_ prefix, and to also simplify it.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3aa6c7e2e4fb6e00f3c24fa306496a66edb558ea.1499443367.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-08 11:05:34 +02:00
Linus Torvalds d691b7e7d1 powerpc updates for 4.13
Highlights include:
 
  - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
 
  - Platform support for FSP2 (476fpe) board
 
  - Enable ZONE_DEVICE on 64-bit server CPUs.
 
  - Generic & powerpc spin loop primitives to optimise busy waiting
 
  - Convert VDSO update function to use new update_vsyscall() interface
 
  - Optimisations to hypercall/syscall/context-switch paths
 
  - Improvements to the CPU idle code on Power8 and Power9.
 
 As well as many other fixes and improvements.
 
 Thanks to:
   Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton
   Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
   Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian
   Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan,
   Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo
   Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul
   Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell,
   Thiago Jung Bauermann, Yang Li.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZXyPCAAoJEFHr6jzI4aWAI9QQAISf2x5y//cqCi4ISyQB5pTq
 KLS/yQajNkQOw7c0fzBZOaH5Xd/SJ6AcKWDg8yDlpDR3+sRRsr98iIRECgKS5I7/
 DxD9ywcbSoMXFQQo1ZMCp5CeuMUIJRtugBnUQM+JXCSUCPbznCY5DchDTLyTBTpO
 MeMVhI//JxthhoOMA9MudiEGaYCU9ho442Z4OJUSiLUv8WRbvQX9pTqoc4vx1fxA
 BWf2mflztBVcIfKIyxIIIlDLukkMzix6gSYPMCbC7lzkbnU7JSqKiheJXjo1gJS2
 ePHKDxeNR2/QP0g/j3aT/MR1uTt9MaNBSX3gANE1xQ9OoJ8m1sOtCO4gNbSdLWae
 eXhDnoiEp930DRZOeEioOItuWWoxFaMyYk3BMmRKV4QNdYL3y3TRVeL2/XmRGqYL
 Lxz4IY/x5TteFEJNGcRX90uizNSi8AaEXPF16pUk8Ctt6eH3ZSwPMv2fHeYVCMr0
 KFlKHyaPEKEoztyzLcUR6u9QB56yxDN58bvLpd32AeHvKhqyxFoySy59x9bZbatn
 B2y8mmDItg860e0tIG6jrtplpOVvL8i5jla5RWFVoQDuxxrLAds3vG9JZQs+eRzx
 Fiic93bqeUAS6RzhXbJ6QQJYIyhE7yqpcgv7ME1W87SPef3HPBk9xlp3yIDwdA2z
 bBDvrRnvupusz8qCWrxe
 =w8rj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights include:

   - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.

   - Platform support for FSP2 (476fpe) board

   - Enable ZONE_DEVICE on 64-bit server CPUs.

   - Generic & powerpc spin loop primitives to optimise busy waiting

   - Convert VDSO update function to use new update_vsyscall() interface

   - Optimisations to hypercall/syscall/context-switch paths

   - Improvements to the CPU idle code on Power8 and Power9.

  As well as many other fixes and improvements.

  Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
  Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
  Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
  Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
  Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
  Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
  Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
  Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
  Bauermann, Yang Li"

* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
  powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
  powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
  powerpc/mm/hash: Implement mark_rodata_ro() for hash
  powerpc/vmlinux.lds: Align __init_begin to 16M
  powerpc/lib/code-patching: Use alternate map for patch_instruction()
  powerpc/xmon: Add patch_instruction() support for xmon
  powerpc/kprobes/optprobes: Use patch_instruction()
  powerpc/kprobes: Move kprobes over to patch_instruction()
  powerpc/mm/radix: Fix execute permissions for interrupt_vectors
  powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
  powerpc/64s: Blacklist rtas entry/exit from kprobes
  powerpc/64s: Blacklist functions invoked on a trap
  powerpc/64s: Un-blacklist system_call() from kprobes
  powerpc/64s: Move system_call() symbol to just after setting MSR_EE
  powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
  powerpc/64s: Convert .L__replay_interrupt_return to a local label
  powerpc64/elfv1: Only dereference function descriptor for non-text symbols
  cxl: Export library to support IBM XSL
  powerpc/dts: Use #include "..." to include local DT
  powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
  ...
2017-07-07 13:55:45 -07:00
Linus Torvalds f72e24a124 This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
 code related to dma-mapping, and will further consolidate arch code
 into common helpers.
 
 This pull request contains:
 
  - removal of the DMA_ERROR_CODE macro, replacing it with calls
    to ->mapping_error so that the dma_map_ops instances are
    more self contained and can be shared across architectures (me)
  - removal of the ->set_dma_mask method, which duplicates the
    ->dma_capable one in terms of functionality, but requires more
    duplicate code.
  - various updates for the coherent dma pool and related arm code
    (Vladimir)
  - various smaller cleanups (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG
 oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X
 VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh
 YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H
 1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml
 LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B
 GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl
 PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4
 LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5
 +i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R
 rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT
 R4o=
 =0Fso
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping infrastructure from Christoph Hellwig:
 "This is the first pull request for the new dma-mapping subsystem

  In this new subsystem we'll try to properly maintain all the generic
  code related to dma-mapping, and will further consolidate arch code
  into common helpers.

  This pull request contains:

   - removal of the DMA_ERROR_CODE macro, replacing it with calls to
     ->mapping_error so that the dma_map_ops instances are more self
     contained and can be shared across architectures (me)

   - removal of the ->set_dma_mask method, which duplicates the
     ->dma_capable one in terms of functionality, but requires more
     duplicate code.

   - various updates for the coherent dma pool and related arm code
     (Vladimir)

   - various smaller cleanups (me)"

* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
  ARM: dma-mapping: Remove traces of NOMMU code
  ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
  ARM: NOMMU: Introduce dma operations for noMMU
  drivers: dma-mapping: allow dma_common_mmap() for NOMMU
  drivers: dma-coherent: Introduce default DMA pool
  drivers: dma-coherent: Account dma_pfn_offset when used with device tree
  dma: Take into account dma_pfn_offset
  dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
  dma-mapping: remove dmam_free_noncoherent
  crypto: qat - avoid an uninitialized variable warning
  au1100fb: remove a bogus dma_free_nonconsistent call
  MAINTAINERS: add entry for dma mapping helpers
  powerpc: merge __dma_set_mask into dma_set_mask
  dma-mapping: remove the set_dma_mask method
  powerpc/cell: use the dma_supported method for ops switching
  powerpc/cell: clean up fixed mapping dma_ops initialization
  tile: remove dma_supported and mapping_error methods
  xen-swiotlb: remove xen_swiotlb_set_dma_mask
  arm: implement ->dma_supported instead of ->set_dma_mask
  mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
  ...
2017-07-06 19:20:54 -07:00
Linus Torvalds c136b84393 PPC:
- Better machine check handling for HV KVM
 - Ability to support guests with threads=2, 4 or 8 on POWER9
 - Fix for a race that could cause delayed recognition of signals
 - Fix for a bug where POWER9 guests could sleep with interrupts pending.
 
 ARM:
 - VCPU request overhaul
 - allow timer and PMU to have their interrupt number selected from userspace
 - workaround for Cavium erratum 30115
 - handling of memory poisonning
 - the usual crop of fixes and cleanups
 
 s390:
 - initial machine check forwarding
 - migration support for the CMMA page hinting information
 - cleanups and fixes
 
 x86:
 - nested VMX bugfixes and improvements
 - more reliable NMI window detection on AMD
 - APIC timer optimizations
 
 Generic:
 - VCPU request overhaul + documentation of common code patterns
 - kvm_stat improvements
 
 There is a small conflict in arch/s390 due to an arch-wide field rename.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZW4XTAAoJEL/70l94x66DkhMH/izpk54KI17PtyQ9VYI2sYeZ
 BWK6Kl886g3ij4pFi3pECqjDJzWaa3ai+vFfzzpJJ8OkCJT5Rv4LxC5ERltVVmR8
 A3T1I/MRktSC0VJLv34daPC2z4Lco/6SPipUpPnL4bE2HATKed4vzoOjQ3tOeGTy
 dwi7TFjKwoVDiM7kPPDRnTHqCe5G5n13sZ49dBe9WeJ7ttJauWqoxhlYosCGNPEj
 g8ZX8+cvcAhVnz5uFL8roqZ8ygNEQq2mgkU18W8ZZKuiuwR0gdsG0gSBFNTdwIMK
 NoreRKMrw0+oLXTIB8SZsoieU6Qi7w3xMAMabe8AJsvYtoersugbOmdxGCr1lsA=
 =OD7H
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending.

  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisonning
   - the usual crop of fixes and cleanups

  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes

  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations

  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  Update my email address
  kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: make spte mmio mask more explicit
  x86: kvm: mmu: dead code thanks to access tracking
  KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
  KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
  KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
  KVM: x86: remove ignored type attribute
  KVM: LAPIC: Fix lapic timer injection delay
  KVM: lapic: reorganize restart_apic_timer
  KVM: lapic: reorganize start_hv_timer
  kvm: nVMX: Check memory operand to INVVPID
  KVM: s390: Inject machine check into the nested guest
  KVM: s390: Inject machine check into the guest
  tools/kvm_stat: add new interactive command 'b'
  tools/kvm_stat: add new command line switch '-i'
  tools/kvm_stat: fix error on interactive command 'g'
  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
  ...
2017-07-06 18:38:31 -07:00
Linus Torvalds 2cc7b4ca7d Various fixes and tweaks for the pstore subsystem. Highlights:
- use memdup_user() instead of open-coded copies (Geliang Tang)
 - fix record memory leak during initialization (Douglas Anderson)
 - avoid confused compressed record warning (Ankit Kumar)
 - prepopulate record timestamp and remove redundant logic from backends
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJZXGq0AAoJEIly9N/cbcAmecQP/iw5ngoGaB5pQD8Jq8srzWJK
 nGysSHuEQmDMSmTpXKllmi+AVotMXtvLzeEy0ThmtaTYJPUF2NYi4BIv0SonAu/v
 6Jds4AP9OYBZAxe95Xdk/VlDpo3LW2DxDk3URC3kmDCWqr91zH2a8RQCfr1ArGb0
 7vI0fEKuc4rDTnOIw4hlJ60UyYX+PsD7m/s/9p///mFN7nIhCvm1w9ToIIwNovX7
 4Hvgs135ZanBjLkvPEKPMQRoizCGEeznZPNhn0WFe+AKFIW0KLME+XArgcrCg5w+
 UZr3p706fqMe54ZuZhzlaoHZKuEEfsSda8XamgSA1tMuHm983DZJ0k9nskyXRqtT
 tGBSaFbrArAim3JvI5diJ6LB7QGGThGWjUc8tkbTMyJyxwZeDvoPIyirzTnignRz
 RbnL3DJDAnKqNuf0RyX6a6iz6JobXRz52SZqOWZ/CWrDnBtsXnvPz/enMANgKLZn
 5Hq3ngapIa+DdK6jipppgPMY2woHrb3Jr6E0xhU6PDXQFMNI8cnD0+6H8h3//XG0
 4q6bGsDMy6G6o6RvxIFN+Nr7Xrff8CSlujClIQBPSgn0fPcxvOnZTnYjN0UQ0RMW
 OCh68vb4eJgi3diYLqQ/1m25fIRsxC8O0uu089bH4uGJgtZUfEX+D6L5UtBGt+fe
 BXbX1HbaVFeatVB/o0el
 =VRl5
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:
 "Various fixes and tweaks for the pstore subsystem.

  Highlights:

   - use memdup_user() instead of open-coded copies (Geliang Tang)

   - fix record memory leak during initialization (Douglas Anderson)

   - avoid confused compressed record warning (Ankit Kumar)

   - prepopulate record timestamp and remove redundant logic from
     backends"

* tag 'pstore-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  powerpc/nvram: use memdup_user
  pstore: use memdup_user
  pstore: Fix format string to use %u for record id
  pstore: Populate pstore record->time field
  pstore: Create common record initializer
  efi-pstore: Refactor erase routine
  pstore: Avoid potential infinite loop
  pstore: Fix leaked pstore_record in pstore_get_backend_records()
  pstore: Don't warn if data is uncompressed and type is not PSTORE_TYPE_DMESG
2017-07-05 11:43:47 -07:00
Linus Torvalds 9a9594efe5 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP hotplug updates from Thomas Gleixner:
 "This update is primarily a cleanup of the CPU hotplug locking code.

  The hotplug locking mechanism is an open coded RWSEM, which allows
  recursive locking. The main problem with that is the recursive nature
  as it evades the full lockdep coverage and hides potential deadlocks.

  The rework replaces the open coded RWSEM with a percpu RWSEM and
  establishes full lockdep coverage that way.

  The bulk of the changes fix up recursive locking issues and address
  the now fully reported potential deadlocks all over the place. Some of
  these deadlocks have been observed in the RT tree, but on mainline the
  probability was low enough to hide them away."

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  cpu/hotplug: Constify attribute_group structures
  powerpc: Only obtain cpu_hotplug_lock if called by rtasd
  ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
  cpu/hotplug: Remove unused check_for_tasks() function
  perf/core: Don't release cred_guard_mutex if not taken
  cpuhotplug: Link lock stacks for hotplug callbacks
  acpi/processor: Prevent cpu hotplug deadlock
  sched: Provide is_percpu_thread() helper
  cpu/hotplug: Convert hotplug locking to percpu rwsem
  s390: Prevent hotplug rwsem recursion
  arm: Prevent hotplug rwsem recursion
  arm64: Prevent cpu hotplug rwsem recursion
  kprobes: Cure hotplug lock ordering issues
  jump_label: Reorder hotplug lock and jump_label_lock
  perf/tracing/cpuhotplug: Fix locking order
  ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
  PCI: Replace the racy recursion prevention
  PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
  perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
  x86/perf: Drop EXPORT of perf_check_microcode
  ...
2017-07-03 18:08:06 -07:00
Balbir Singh d924cc3fed powerpc/vmlinux.lds: Align __init_begin to 16M
For CONFIG_STRICT_KERNEL_RWX align __init_begin to 16M. We use 16M
since its the larger of 2M on radix and 16M on hash for our linear
mapping. The plan is to have .text, .rodata and everything upto
__init_begin marked as RX. Note we still have executable read only
data. We could further align rodata to another 16M boundary. I've used
keeping text plus rodata as read-only-executable as a trade-off to
doing read-only-executable for text and read-only for rodata.

We don't use multi PT_LOAD in PHDRS because we are not sure if all
bootloaders support them. This patch keeps PHDRS in vmlinux.lds.S as
the same they are with just one PT_LOAD for all of the kernel marked
as RWX (7).

mpe: What this means is the added alignment bloats the resulting
binary on disk, a powernv kernel goes from 17M to 22M.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:19 +10:00