Commit Graph

722401 Commits

Author SHA1 Message Date
Michael Bringmann e67e02a544 powerpc/pseries: Fix cpu hotplug crash with memoryless nodes
On powerpc systems with shared configurations of CPUs and memory and
memoryless nodes at boot, an event ordering problem was observed on a
SLES12 build platforms with the hot-add of CPUs to the memoryless
nodes.

* The most common error occurred when the memory SLAB driver attempted
  to reference the memoryless node to which a CPU was being added
  before the kernel had finished initializing all of the data
  structures for the CPU and exited 'device_online' under
  DLPAR/hot-add.

  Normally the memoryless node would be initialized through the call
  path device_online ... arch_update_cpu_topology ... find_cpu_nid ...
  try_online_node. This patch ensures that the powerpc node will be
  initialized as early as possible, even if it was memoryless and
  CPU-less at the point when we are trying to hot-add a new CPU to it.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:59:45 +11:00
Michael Bringmann ea05ba7c55 powerpc/numa: Ensure nodes initialized for hotplug
This patch fixes some problems encountered at runtime with
configurations that support memory-less nodes, or that hot-add CPUs
into nodes that are memoryless during system execution after boot. The
problems of interest include:

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
  them are allowed to be 'possible' and 'online'. Memory allocations
  for those nodes are taken from another node that does have memory
  until and if memory is hot-added to the node.

* Nodes which have no resources assigned at boot, but which may still
  be referenced subsequently by affinity or associativity attributes,
  are kept in the list of 'possible' nodes for powerpc. Hot-add of
  memory or CPUs to the system can reference these nodes and bring
  them online instead of redirecting the references to one of the set
  of nodes known to have memory at boot.

Note that this software operates under the context of CPU hotplug. We
are not doing memory hotplug in this code, but rather updating the
kernel's CPU topology (i.e. arch_update_cpu_topology /
numa_update_cpu_topology). We are initializing a node that may be used
by CPUs or memory before it can be referenced as invalid by a CPU
hotplug operation. CPU hotplug operations are protected by a range of
APIs including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
Memory hotplug operations, including try_online_node, are protected by
mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no
overlap. Using HMC hot-add/hot-remove operations, we have been able to
add and remove CPUs to any possible node without failures. HMC
operations involve a degree self-serialization, though.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:59:02 +11:00
Michael Bringmann a346137e91 powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
On powerpc systems which allow 'hot-add' of CPU or memory resources,
it may occur that the new resources are to be inserted into nodes that
were not used for these resources at bootup. In the kernel, any node
that is used must be defined and initialized. These empty nodes may
occur when,

* Dedicated vs. shared resources. Shared resources require information
  such as the VPHN hcall for CPU assignment to nodes. Associativity
  decisions made based on dedicated resource rules, such as
  associativity properties in the device tree, may vary from decisions
  made using the values returned by the VPHN hcall.

* memoryless nodes at boot. Nodes need to be defined as 'possible' at
  boot for operation with other code modules. Previously, the powerpc
  code would limit the set of possible nodes to those which have
  memory assigned at boot, and were thus online. Subsequent add/remove
  of CPUs or memory would only work with this subset of possible
  nodes.

* memoryless nodes with CPUs at boot. Due to the previous restriction
  on nodes, nodes that had CPUs but no memory were being collapsed
  into other nodes that did have memory at boot. In practice this
  meant that the node assignment presented by the runtime kernel
  differed from the affinity and associativity attributes presented by
  the device tree or VPHN hcalls. Nodes that might be known to the
  pHyp were not 'possible' in the runtime kernel because they did not
  have memory at boot.

This patch ensures that sufficient nodes are defined to support
configuration requirements after boot, as well as at boot. This patch
set fixes a couple of problems.

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
  them are allowed to be 'possible' and 'online'. Memory allocations
  for those nodes are taken from another node that does have memory
  until and if memory is hot-added to the node. * Nodes which have no
  resources assigned at boot, but which may still be referenced
  subsequently by affinity or associativity attributes, are kept in
  the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs
  to the system can reference these nodes and bring them online
  instead of redirecting to one of the set of nodes that were known to
  have memory at boot.

This patch extracts the value of the lowest domain level (number of
allocable resources) from the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system. This new setting will
override the instruction:

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the "ibm,max-associativity-domains" property is not present at
boot, no operation will be performed to define or enable additional
nodes, or enable the above 'nodes_and()'.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:55:02 +11:00
Sukadev Bhattiprolu 384dfd627f powerpc/kernel: Block interrupts when updating TIDR
clear_thread_tidr() is called in interrupt context as a part of delayed
put of the task structure (i.e as a part of timer interrupt). To prevent
a deadlock, block interrupts when holding vas_thread_id_lock to set/
clear TIDR for a task.

Fixes: ec233ede4c ("powerpc: Add support for setting SPRN_TIDR")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:54:57 +11:00
Alexey Kardashevskiy 902bdc5745 powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
The pcidev value stored in pci_dn is only used for NPU/NPU2
initialization. We can easily drop the cached pointer and
use an ancient helper - pci_get_domain_bus_and_slot() instead in order
to reduce complexity.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:39:01 +11:00
Christophe Leroy 5c8136fa1a powerpc/mm/nohash: do not flush the entire mm when range is a single page
Most of the time, flush_tlb_range() is called on single pages.
At the time being, flush_tlb_range() inconditionnaly calls
flush_tlb_mm() which flushes at least the entire PID pages and on
older CPUs like 4xx or 8xx it flushes the entire TLB table.

This patch calls flush_tlb_page() instead of flush_tlb_mm() when
the range is a single page.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:24:44 +11:00
Bryant G. Ly fc5f622163 powerpc/pseries: Add Initialization of VF Bars
When enabling SR-IOV in pseries platform, the VF bar properties for a
PF are reported on the device node in the device tree.

This patch adds the IOV Bar resources to Linux structures from the
device tree for later use when configuring SR-IOV by PF driver.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:53 +11:00
Bryant G. Ly 9a7f6b4386 powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV
After initial validation of SR-IOV resources, firmware will associate
PEs to the dynamic VFs created within this call. This patch adds the
association of PEs to the PF array of PE numbers indexed by VF.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:53 +11:00
Bryant G. Ly 6ea3df6931 powerpc/eeh: Add EEH notify resume sysfs
Introduce a method for notify resume to be called from sysfs. In this
patch one can now call notify resume from sysfs when is supported by
platform.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
[mpe: Add NULL check, add empty versions to avoid #ifdefs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:52 +11:00
Bryant G. Ly 67923cfcfa powerpc/eeh: Add EEH operations to notify resume
When pseries SR-IOV is enabled and after a PF driver has resumed from
EEH, platform has to be notified of the event so the child VFs can be
allowed to resume their normal recovery path.

This patch makes the EEH operation allow unfreeze platform dependent
code and adds the call to pseries EEH code.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:52 +11:00
Bryant G. Ly 565a744dd2 powerpc/pseries: Set eeh_pe of EEH_PE_VF type
To correctly use EEH code one has to make sure that the EEH_PE_VF is
set for dynamic created VFs. Therefore this patch allocates an eeh_pe
of eeh type EEH_PE_VF and associates PE with parent.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:51 +11:00
Bryant G. Ly 856e1eb9bd PCI/AER: Add uevents in AER and EEH error/resume
Devices can go offline when erors reported. This patch adds a change
to the kernel object and lets udev know of error. When device resumes,
a change is also set reporting device as online. Therefore, EEH and
AER events are better propagated to user space for PCI devices in all
arches.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:51 +11:00
Bryant G. Ly 64ba3dc7bf powerpc/eeh: Update VF config space after EEH
Add EEH platform operations for pseries to update VF config space.
With this change after EEH, the VF will have updated config space for
pseries platform.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:51 +11:00
Frederic Barrat 6385d6f85f ocxl: add MAINTAINERS entry
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:50 +11:00
Frederic Barrat 00b96c0e3c ocxl: Documentation
ocxl.rst gives a quick, high-level view of opencapi.

Update ioctl-number.txt to reflect ioctl numbers being used by the
ocxl driver

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
[mpe: Fix up mixed whitespace as spotted by gregkh]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:02:24 +11:00
Frederic Barrat 741ddae6c4 cxl: Remove support for "Processing accelerators" class
The cxl driver currently declares in its table of supported PCI
devices the class "Processing accelerators". Therefore it may be
called to probe for opencapi devices, which generates errors, as the
config space of a cxl device is not compatible with opencapi.

So remove support for the generic class, as we now have (at least) two
drivers for devices of the same class. Most cxl devices are FPGAs with
a PSL which will show a known device ID of 0x477. Other devices are
really supported by the cxlflash driver and are already listed in the
table. So removing the class is expected to go unnoticed.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:43:00 +11:00
Frederic Barrat b97f02246e ocxl: Add Makefile and Kconfig
OCXL_BASE triggers the platform support needed by the driver.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:59 +11:00
Frederic Barrat 92add22e84 ocxl: Add trace points
Define a few trace points so that we can use the standard tracing
mechanism for debug and/or monitoring.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:59 +11:00
Frederic Barrat 280b983ce2 ocxl: Add a kernel API for other opencapi drivers
Some of the functions done by the generic driver should also be needed
by other opencapi drivers: attaching a context to an adapter,
translation fault handling, AFU interrupt allocation...

So to avoid code duplication, the driver provides a kernel API that
other drivers can use, similar to calling a in-kernel library.

It is still a bit theoretical, for lack of real hardware, and will
likely need adjustements down the road. But we used the cxlflash
driver as a guinea pig.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:59 +11:00
Frederic Barrat aeddad1760 ocxl: Add AFU interrupt support
Add user APIs through ioctl to allocate, free, and be notified of an
AFU interrupt.

For opencapi, an AFU can trigger an interrupt on the host by sending a
specific command targeting a 64-bit object handle. On POWER9, this is
implemented by mapping a special page in the address space of a
process and a write to that page will trigger an interrupt.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:58 +11:00
Frederic Barrat 5ef3166e8a ocxl: Driver code for 'generic' opencapi devices
Add an ocxl driver to handle generic opencapi devices. Of course, it's
not meant to be the only opencapi driver, any device is free to
implement its own. But if a host application only needs basic services
like attaching to an opencapi adapter, have translation faults handled
or allocate AFU interrupts, it should suffice.

The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.

The driver exposes the device AFUs as a char device in /dev/ocxl/

Note that the driver currently doesn't handle memory attached to the
opencapi device.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:58 +11:00
Frederic Barrat 2cb3d64b26 powerpc/powernv: Capture actag information for the device
In the opencapi protocol, host memory contexts are referenced by a
'actag'. During setup, a driver must tell the device how many actags
it can used, and what values are acceptable.

On POWER9, the NPU can handle 64 actags per link, so they must be
shared between all the PCI functions of the link. To get a global
picture of how many actags are used by each AFU of every function, we
capture some data at the end of PCI enumeration, so that actags can be
shared fairly if needed.

This is not powernv specific per say, but rather a consequence of the
opencapi configuration specification being quite general. The number
of available actags on POWER9 makes it more likely to be hit. This is
somewhat mitigated by the fact that existing AFUs are coded by
requesting a reasonable count of actags and existing devices carry
only one AFU.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:57 +11:00
Frederic Barrat 6914c75711 powerpc/powernv: Add platform-specific services for opencapi
Implement a few platform-specific calls which can be used by drivers:

- provide the Transaction Layer capabilities of the host, so that the
  driver can find some common ground and configure the device and host
  appropriately.

- provide the hw interrupt to be used for translation faults raised by
  the NPU

- map/unmap some NPU mmio registers to get the fault context when the
  NPU raises an address translation fault

The rest are wrappers around the previously-introduced opal calls.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:57 +11:00
Frederic Barrat 74d656d219 powerpc/powernv: Add opal calls for opencapi
Add opal calls to interact with the NPU:

OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA)
The SPA is a table containing one entry (Process Element) per memory
context which can be accessed by the opencapi device.

OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache
The NPU keeps a cache of recently accessed memory contexts. When a
Process Element is removed from the SPA, the cache for the link must
be cleared.

OPAL_NPU_TL_SET: configure the Transaction Layer
The Transaction Layer specification defines several templates for
messages to be exchanged on the link. During link setup, the host and
device must negotiate what templates are supported on both sides and
at what rates those messages can be sent.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:56 +11:00
Andrew Donnellan 228c2f4103 powerpc/powernv: Set correct configuration space size for opencapi devices
The configuration space for opencapi devices doesn't have a PCI
Express capability, therefore confusing linux in thinking it's of an
old PCI type with a 256-byte configuration space size, instead of the
desired 4k. So add a PCI fixup to declare the correct size.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:56 +11:00
Frederic Barrat 7f2c39e91f powerpc/powernv: Introduce new PHB type for opencapi links
The NPU was already abstracted by opal as a virtual PHB for nvlink,
but it helps to be able to differentiate between a nvlink or opencapi
PHB, as it's not completely transparent to linux. In particular, PE
assignment differs and we'll also need the information in later
patches.

So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a
new type PNV_PHB_NPU_OCAPI.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:56 +11:00
Nicholas Piggin bdcb1aefc5 powerpc/64s: Improve RFI L1-D cache flush fallback
The fallback RFI flush is used when firmware does not provide a way
to flush the cache. It's a "displacement flush" that evicts useful
data by displacing it with an uninteresting buffer.

The flush has to take care to work with implementation specific cache
replacment policies, so the recipe has been in flux. The initial
slow but conservative approach is to touch all lines of a congruence
class, with dependencies between each load. It has since been
determined that a linear pattern of loads without dependencies is
sufficient, and is significantly faster.

Measuring the speed of a null syscall with RFI fallback flush enabled
gives the relative improvement:

P8 - 1.83x
P9 - 1.75x

The flush also becomes simpler and more adaptable to different cache
geometries.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-23 16:16:33 +11:00
Nicholas Piggin 35adacd6fc powerpc/pseries, ps3: panic flush kernel messages before halting system
Platforms with a panic handler that halts the system can have problems
getting kernel messages out, because the panic notifiers are called
before kernel/panic.c does its flushing of printk buffers an console
etc.

This was attempted to be solved with commit a3b2cb30f2 ("powerpc: Do
not call ppc_md.panic in fadump panic notifier"), but that wasn't the
right approach and caused other problems, and was reverted by commit
ab9dbf771f.

Instead, the powernv shutdown paths have already had a similar
problem, fixed by taking the message flushing sequence from
kernel/panic.c. That's a little bit ugly, but while we have the code
duplicated, it will work for this case as well. So have ppc panic
handlers do the same flushing before they terminate.

Without this patch, a qemu pseries_le_defconfig guest stops silently
when issued the nmi command when xmon is off and no crash dumpers
enabled. Afterwards, an oops is printed by each CPU as expected.

Fixes: ab9dbf771f ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 11:44:24 +11:00
Gustavo Romero a08082f8e4 powerpc/selftests: Check endianness on trap in TM
Add a selftest to check if endianness is flipped inadvertently to BE
(MSR.LE set to zero) on BE and LE machines when a trap is caught in
transactional mode and load_fp and load_vec are zero, i.e. when MSR.FP
and MSR.VEC are zeroed (disabled).

Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:37 +11:00
Gustavo Romero 1c200e63d0 powerpc/tm: Fix endianness flip on trap
Currently it's possible that a thread on PPC64 LE has its endianness
flipped inadvertently to Big-Endian resulting in a crash once the process
is back from the signal handler.

If giveup_all() is called when regs->msr has the bits MSR.FP and MSR.VEC
disabled (and hence MSR.VSX disabled too) it returns without calling
check_if_tm_restore_required() which copies regs->msr to ckpt_regs->msr if
the process caught a signal whilst in transactional mode. Then once in
setup_tm_sigcontexts() MSR from ckpt_regs.msr is used, but since
check_if_tm_restore_required() was not called previuosly, gp_regs[PT_MSR]
gets a copy of invalid MSR bits as MSR in ckpt_regs was not updated from
regs->msr and so is zeroed. Later when leaving the signal handler once in
sys_rt_sigreturn() the TS bits of gp_regs[PT_MSR] are checked to determine
if restore_tm_sigcontexts() must be called to pull in the correct MSR state
into the user context. Because TS bits are zeroed
restore_tm_sigcontexts() is never called and MSR restored from the user
context on returning from the signal handler has the MSR.LE (the endianness
bit) forced to zero (Big-Endian). That leads, for instance, to 'nop' being
treated as an illegal instruction in the following sequence:

	tbegin.
	beq	1f
	trap
	tend.
1:	nop

on PPC64 LE machines and the process dies just after returning from the
signal handler.

PPC64 BE is also affected but in a subtle way since forcing Big-Endian on
a BE machine does not change the endianness.

This commit fixes the issue described above by ensuring that once in
setup_tm_sigcontexts() the MSR used is from regs->msr instead of from
ckpt_regs->msr and by ensuring that we pull in only the MSR.FP, MSR.VEC,
and MSR.VSX bits from ckpt_regs->msr.

The fix was tested both on LE and BE machines and no regression regarding
the powerpc/tm selftests was observed.

Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:36 +11:00
Anton Blanchard b6d34eb4d2 powerpc: Expose TSCR via sysfs
The thread switch control register (TSCR) is a per core register
that configures how the CPU shares resources between SMT threads.

Exposing it via sysfs allows us to tune it at run time.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:36 +11:00
Mahesh Salgaonkar 8d81296cfc powerpc/radix: Remove trace_tlbie call from radix__flush_tlb_all
radix__flush_tlb_all() is called only in kexec path in real mode and any
tracepoints at this stage will make kexec to fail if enabled.

To verify enable tlbie trace before kexec.

$ echo 1 > /sys/kernel/debug/tracing/events/powerpc/tlbie/enable
== kexec into new kernel and kexec fails.

Fix this by not calling trace_tlbie from radix__flush_tlb_all().

Fixes: 0428491cba ("powerpc/mm: Trace tlbie(l) instructions")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:35 +11:00
Guilherme G. Piccoli 45baee1416 powerpc/powernv: Add ppc_pci_reset_phbs parameter to issue a PHB reset
During a kdump kernel boot in PowerPC, we request a reset of the PHBs
to the FW. It makes sense, since if we are booting a kdump kernel it
means we had some trouble before and we cannot rely in the adapters'
health; they could be in a bad state, hence the reset is needed.

But this reset is useful not only in kdump - there are situations,
specially when debugging drivers, that we could break an adapter in
a way it requires such reset. One can tell to just go ahead and
reboot the machine, but happens that many times doing kexec is much
faster, and so preferable than a full power cycle.

This patch adds the ppc_pci_reset_phbs parameter to perform such reset.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:35 +11:00
David Gibson 7339390d77 powerpc/pseries: Don't give a warning when HPT resizing isn't available
As of 438cc81a41 "powerpc/pseries: Automatically resize HPT for memory hot
add/remove" when running on the pseries platform, we always attempt to
use the PAPR extension to resize the hashed page table (HPT) when we add
or remove memory.

This is fine, but when the extension is available we'll give a harmless,
but scary warning.  This patch suppresses the warning in this case.  It
will still warn if the feature is supposed to be available, but didn't
work.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:34 +11:00
Geert Uytterhoeven 4be4119d1f dt: booting-without-of: DT fix s/#interrupt-cell/#interrupt-cells/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:34 +11:00
Andrew Donnellan 8d1915873d selftests/powerpc: Add alignment handler selftest
Add a selftest to exercise the powerpc alignment fault handler.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
[mpe: Add 32-bit support to the signal handler]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:34 +11:00
Russell Currey 57ad583f20 powerpc: Use octal numbers for file permissions
Symbolic macros are unintuitive and hard to read, whereas octal constants
are much easier to interpret.  Replace macros for the basic permission
flags (user/group/other read/write/execute) with numeric constants
instead, across the whole powerpc tree.

Introducing a significant number of changes across the tree for no runtime
benefit isn't exactly desirable, but so long as these macros are still
used in the tree people will keep sending patches that add them.  Not only
are they hard to parse at a glance, there are multiple ways of coming to
the same value (as you can see with 0444 and 0644 in this patch) which
hurts readability.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:33 +11:00
Mathieu Malaterre 600ecc1936 powerpc/boot/dts: Remove leading 0x and 0s from bindings notation
Improve the DTS files by removing all the leading "0x" and zeros to
fix the following dtc warnings:

  Warning (unit_address_format): Node /XXX unit name should not have leading "0x"

and:

  Warning (unit_address_format): Node /XXX unit name should not have leading 0s

Converted using the following command:

  find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +

For simplicity, two sed expressions were used to solve each warnings
separately.

To make the regex expression more robust a few other issues were
resolved, namely setting unit-address to lower case, and adding a
whitespace before the the opening curly brace:

  https://elinux.org/Device_Tree_Linux#Linux_conventions

This is a follow up to commit 4c9847b737 ("dt-bindings: Remove
leading 0x from bindings notation")

Reported-by: David Daney <ddaney@caviumnetworks.com>
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 23:37:45 +11:00
Mathieu Malaterre 00f7b29f6e backlight: Fix old-style function definition
Fix warning:

drivers/macintosh/via-pmu-backlight.c: In function ‘pmu_backlight_init’:
drivers/macintosh/via-pmu-backlight.c:140:13: warning: old-style function definition [-Wold-style-definition]
 void __init pmu_backlight_init()
             ^~~~~~~~~~~~~~~~~~

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 23:37:44 +11:00
Mathieu Malaterre 4a7b8a4997 powerpc: Fix old-style function definition
Fix warnings such as:

arch/powerpc/platforms/powermac/backlight.c: In function ‘pmac_backlight_get_legacy_brightness’:
arch/powerpc/platforms/powermac/backlight.c:189:5: error: old-style function definition [-Werror=old-style-definition]
 int pmac_backlight_get_legacy_brightness()
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 23:37:44 +11:00
Mathieu Malaterre 104d55ae4d powerpc/xmon: Do not compute/store the major opcode
In commit 5b102782c7 ("powerpc/xmon: Enable disassembly files (compilation
changes)") usage of variable `op` has been removed. Completely remove opcode
computation since not used anymore.

Fix fatal warning:

arch/powerpc/xmon/ppc-dis.c: In function ‘lookup_powerpc’:
arch/powerpc/xmon/ppc-dis.c:96:17: error: variable ‘op’ set but not used [-Werror=unused-but-set-variable]
   unsigned long op;
                 ^~

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 23:37:43 +11:00
Mathieu Malaterre 38833faa11 powerpc/xive: Properly use static keyword for inline function
Fix fatal warning during compilation:

In file included from arch/powerpc/xmon/xmon.c:54:0:
./arch/powerpc/include/asm/xive.h:157:20: error: no previous prototype for ‘xive_smp_prepare_cpu’ [-Werror=missing-prototypes]
 extern inline int  xive_smp_prepare_cpu(unsigned int cpu) { return -EINVAL; }
                    ^

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 23:37:43 +11:00
Aneesh Kumar K.V be68fb64f7 selftest/powerpc: Add additional option to mmap_bench test
This patch adds --pgfault and --iterations options to mmap_bench test. With
--pgfault we touch every page mapped. This helps in measuring impact in the
page fault path with a patch series.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 23:37:43 +11:00
Aneesh Kumar K.V 10527e8081 powerpc/hash: Skip non initialized page size in init_hpte_page_sizes
One of the easiest way to test config with 4K HPTE is to disable 64K hardware
page size like below.

int __init htab_dt_scan_page_sizes(unsigned long node,

 		size -= 3; prop += 3;
 		base_idx = get_idx_from_shift(base_shift);
-		if (base_idx < 0) {
+		if (base_idx < 0 || base_idx == MMU_PAGE_64K) {
 			/* skip the pte encoding also */
 			prop += lpnum * 2; size -= lpnum * 2;

But then this results in error in other part of the code such as MPSS parsing
where we look at 4K base page size and 64K actual page size support.

This patch fix MPSS parsing by ignoring the actual page sizes marked
unsupported. In reality this can happen only with a corrupt device tree. But it
is good to tighten the error check.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 23:37:42 +11:00
Michael Ellerman 4afa0f3a89 Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Contains fixes for CPM GPIO and an FSL PCI erratum workaround, plus a
 minor cleanup patch."
2018-01-21 23:32:02 +11:00
Michael Ellerman ebf0b6a8b1 Merge branch 'fixes' into next
Merge our fixes branch from the 4.15 cycle.

Unusually the fixes branch saw some significant features merged,
notably the RFI flush patches, so we want the code in next to be
tested against that, to avoid any surprises when the two are merged.

There's also some other work on the panic handling that was reverted
in fixes and we now want to do properly in next, which would conflict.

And we also fix a few other minor merge conflicts.
2018-01-21 23:21:14 +11:00
Michael Ellerman 5400fc229e Merge branch 'topic/ppc-kvm' into next
Merge the topic branch we share with kvm-ppc, this brings in two xive
commits, one from Paul to rework HMI handling, and a minor cleanup to
drop an unused flag.
2018-01-21 22:43:43 +11:00
Aneesh Kumar K.V 76b03dc07e powerpc/mm: Remove unused flag arg in global_invalidates
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 20:30:44 +11:00
Christophe Leroy c095ff93f9 powerpc/sysdev: change CPM GPIO to platform_device
Since commit 9427ecbed4 ("gpio: Rework of_gpiochip_set_names()
to use device property accessors"), gpio chips have to have a
parent, otherwise devprop_gpiochip_set_names() prematurely exists
with message "GPIO chip parent is NULL" and doesn't proceed
'gpio-line-names' DT property.

This patch wraps the CPM GPIO into a platform driver to allow
assignment of the parent device.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2018-01-20 23:29:02 -06:00
Michael Bringmann 02ef6dd810 powerpc: Enable support for ibm,drc-info devtree property
To: linuxppc-dev@lists.ozlabs.org

From: Michael Bringmann <mwb@linux.vnet.ibm.com>

Cc: Michael Bringmann <mwb@linux.vnet.ibm.com>
Cc: nfont@linux.vnet.ibm.com
Subject: [PATCH V6 4/4] powerpc: Enable support for ibm,drc-info devtree property

prom_init.c: Enable support for new DRC device tree property
"ibm,drc-info" in initial handshake between the Linux kernel and
the front end processor.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 16:21:50 +11:00