Commit Graph

533413 Commits

Author SHA1 Message Date
Gavin Shan dd497154d3 powerpc: Export include/uapi/asm/eeh.h
This adds include/uapi/asm/eeh.h to kbuild so that the header
file will be exported automatically with below command. The
header file was added by commit ed3e81ff20 ("powerpc/eeh: Move PE
state constants around")

   make INSTALL_HDR_PATH=/tmp/headers \
        SRCARCH=powerpc headers_install

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-18 19:34:41 +10:00
Vaibhav Jain e0ad784b65 powerpc/pseries: enable RTC class support
A working rtc kernel driver is needed so that hwclock can synchronize
system clock to rtc during shutdown/boot. We already have a powernv
platform rtc driver located at drivers/rtc/rtc-opal.c. However it depends
on CONFIG_RTC_CLASS which is disabled by default. Hence the driver isn't
enabled and not compiled for the powernv kernel.

We fix this by enabling rtc class support in pseries defconfig which
enables this driver and compiles it into the pseries kernel. In case
CONFIG_PPC_POWERNV is not enabled we fallback to 'Generic RTC support'
driver which emulates the legacy 'PC RTC driver'.

Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-18 19:34:41 +10:00
Nikunj A Dadhania 1d805440a3 powerpc/numa: initialize distance lookup table from drconf path
In some situations, a NUMA guest that supports
ibm,dynamic-memory-reconfiguration node will end up having flat NUMA
distances between nodes. This is because of two problems in the
current code.

1) Different representations of associativity lists.

   There is an assumption about the associativity list in
   initialize_distance_lookup_table(). Associativity list has two forms:

   a) [cpu,memory]@x/ibm,associativity has following
      format:
           <N> <N integers>

   b) ibm,dynamic-reconfiguration-memory/ibm,associativity-lookup-arrays

           <M> <N> <M associativity lists each having N integers>
           M = the number of associativity lists
           N = the number of entries per associativity list

   Fix initialize_distance_lookup_table() so that it does not assume
   "case a". And update the caller to skip the length field before
   sending the associativity list.

2) Distance table not getting updated from drconf path.

   Node distance table will not get initialized in certain cases as
   ibm,dynamic-reconfiguration-memory path does not initialize the
   lookup table.

   Call initialize_distance_lookup_table() from drconf path with
   appropriate associativity list.

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-18 19:34:41 +10:00
Andrew Donnellan 53522982fc powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops
Simplify the dma_get_required_mask call chain by moving it from pnv_phb to
pci_controller_ops, similar to commit 763d2d8df1 ("powerpc/powernv:
Move dma_set_mask from pnv_phb to pci_controller_ops").

Previous call chain:

  0) call dma_get_required_mask() (kernel/dma.c)
  1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that
     points to pnv_dma_get_required_mask() (platforms/powernv/setup.c)
  2) device is PCI, therefore call pnv_pci_dma_get_required_mask()
     (platforms/powernv/pci.c)
  3) call phb->dma_get_required_mask if it exists
  4) it only exists in the ioda case, where it points to
       pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c)

New call chain:

  0) call dma_get_required_mask() (kernel/dma.c)
  1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask
     if it exists
  2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask()
     (platforms/powernv/pci-ioda.c)

In the p5ioc2 case, the call chain remains the same -
dma_get_required_mask() does not find either a ppc_md call or
pci_controller_ops call, so it calls __dma_get_required_mask().

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-18 19:32:11 +10:00
Michael Ellerman 73b341efda powerpc/mm: Drop CONFIG_PPC_HAS_HASH_64K
The relation between CONFIG_PPC_HAS_HASH_64K and CONFIG_PPC_64K_PAGES is
painfully complicated.

But if we rearrange it enough we can see that PPC_HAS_HASH_64K
essentially depends on PPC_STD_MMU_64 && PPC_64K_PAGES.

We can then notice that PPC_HAS_HASH_64K is used in files that are only
built for PPC_STD_MMU_64, meaning it's equivalent to PPC_64K_PAGES.

So replace all uses and drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2015-08-18 19:32:10 +10:00
Michael Ellerman 55f8b5b82f powerpc/mm: Simplify page size kconfig dependencies
For config options with only a single value, guarding the single value
with 'if' is the same as adding a 'depends' statement. And it's more
standard to just use 'depends'.

And if the option has both an 'if' guard and a 'depends' we can collapse
them into a single 'depends' by combining them with &&.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2015-08-18 19:32:10 +10:00
Michael Ellerman 953005770e powerpc/mm: Drop the 64K on 4K version of pte_pagesize_index()
Now that support for 64k pages with a 4K kernel is removed, this code is
unreachable.

CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is
also true.

But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which
includes pte-hash64-64k.h, which defines both pte_pagesize_index() and
crucially __real_pte, which means this definition can never be used.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2015-08-18 19:31:55 +10:00
Michael Ellerman f444f1f898 powerpc/cell: Drop support for 64K local store on 4K kernels
Back in the olden days we added support for using 64K pages to map the
SPU (Synergistic Processing Unit) local store on Cell, when the main
kernel was using 4K pages.

This was useful at the time because distros were using 4K pages, but
using 64K pages on the SPUs could reduce TLB pressure there.

However these days the number of Cell users is approaching zero, and
supporting this option adds unpleasant complexity to the memory
management code.

So drop the option, CONFIG_SPU_FS_64K_LS, and all related code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
2015-08-18 19:29:49 +10:00
Michael Ellerman 74b5037baa powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash
The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f33 ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.

Fixes: 7aa0727f33 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2015-08-18 19:29:13 +10:00
Michael Ellerman 281786ea2c selftests/powerpc: Install tempfile so the subpage_prot_file test works
We forgot to install the tempfile, so when the selftests are installed
and then run the subpage_prot_file test fails.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-17 18:28:49 +10:00
Vaibhav Jain 8c7dd08a8c cxl: Plug irq_bitmap getting leaked in cxl_context
This patch plugs the leak of irq_bitmap, allocated as part of
initialization of cxl_context struct; during the call to
afu_allocate_irqs. The bitmap is now release during the call to function
afu_release_irqs.

Reported-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-17 13:56:32 +10:00
Daniel Axtens 25901632c9 cxl: Add CONFIG_CXL_EEH symbol
CONFIG_CXL_EEH is for CXL's EEH related code.

Other drivers can depend on or #ifdef on this symbol to configure
PERST behaviour, allowing CXL to participate in the EEH process.

Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-17 13:56:29 +10:00
Daniel Axtens 9e8df8a219 cxl: EEH support
EEH (Enhanced Error Handling) allows a driver to recover from the
temporary failure of an attached PCI card. Enable basic CXL support
for EEH.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:08 +10:00
Daniel Axtens 13e68d8bd0 cxl: Allow the kernel to trust that an image won't change on PERST.
Provide a kernel API and a sysfs entry which allow a user to specify
that when a card is PERSTed, it's image will stay the same, allowing
it to participate in EEH.

cxl_reset is used to reflash the card. In that case, we cannot safely
assert that the image will not change. Therefore, disallow cxl_reset
if the flag is set.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:07 +10:00
Daniel Axtens 4e1efb403c cxl: Don't remove AFUs/vPHBs in cxl_reset
If the driver doesn't participate in EEH, the AFUs will be removed
by cxl_remove, which will be invoked by EEH.

If the driver does particpate in EEH, the vPHB needs to stick around
so that the it can particpate.

In both cases, we shouldn't remove the AFU/vPHB.

Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:07 +10:00
Daniel Axtens d76427b0d8 cxl: Refactor AFU init/teardown
As with an adapter, some aspects of initialisation are done only once
in the lifetime of an AFU: for example, allocating memory, or setting
up sysfs/debugfs files.

However, we may want to be able to do some parts of the initialisation
multiple times: for example, in error recovery we want to be able to
tear down and then re-map IO memory and IRQs.

Therefore, refactor AFU init/teardown as follows.

 - Create two new functions: 'cxl_configure_afu', and its pair
   'cxl_deconfigure_afu'. As with the adapter functions,
   these (de)configure resources that do not need to last the entire
   lifetime of the AFU.

 - Allocating and releasing memory remain the task of 'cxl_alloc_afu'
   and 'cxl_release_afu'.

 - Once-only functions that do not involve allocating/releasing memory
   stay in the overarching 'cxl_init_afu'/'cxl_remove_afu' pair.
   However, the task of picking an AFU mode and activating it has been
   broken out.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:07 +10:00
Daniel Axtens c044c41542 cxl: Refactor adaptor init/teardown
Some aspects of initialisation are done only once in the lifetime of
an adapter: for example, allocating memory for the adapter,
allocating the adapter number, or setting up sysfs/debugfs files.

However, we may want to be able to do some parts of the
initialisation multiple times: for example, in error recovery we
want to be able to tear down and then re-map IO memory and IRQs.

Therefore, refactor CXL init/teardown as follows.

 - Keep the overarching functions 'cxl_init_adapter' and its pair,
   'cxl_remove_adapter'.

 - Move all 'once only' allocation/freeing steps to the existing
   'cxl_alloc_adapter' function, and its pair 'cxl_release_adapter'
   (This involves moving allocation of the adapter number out of
   cxl_init_adapter.)

 - Create two new functions: 'cxl_configure_adapter', and its pair
   'cxl_deconfigure_adapter'. These two functions 'wire up' the
   hardware --- they (de)configure resources that do not need to
   last the entire lifetime of the adapter

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:05 +10:00
Daniel Axtens 575e6986f0 cxl: Clean up adapter MMIO unmap path.
- MMIO pointer unmapping is guarded by a null pointer check.
   However, iounmap doesn't null the pointer, just invalidate it.
   Therefore, explicitly null the pointer after unmapping.

 - afu_desc_mmio also needs to be unmapped.

 - PCI regions are allocated in cxl_map_adapter_regs.
   Therefore they should be released in unmap, not elsewhere.

Acked-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:05 +10:00
Daniel Axtens e640d2fc81 cxl: Make IRQ release idempotent
Check if an IRQ is mapped before releasing it.

This will simplify future EEH code by allowing unconditional unmapping
of IRQs.

Acked-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:04 +10:00
Daniel Axtens 05155772f6 cxl: Allocate and release the SPA with the AFU
Previously the SPA was allocated and freed upon entering and leaving
AFU-directed mode. This causes some issues for error recovery - contexts
hold a pointer inside the SPA, and they may persist after the AFU has
been detached.

We would ideally like to allocate the SPA when the AFU is allocated, and
release it until the AFU is released. However, we don't know how big the
SPA needs to be until we read the AFU descriptor.

Therefore, restructure the code:

 - Allocate the SPA only once, on the first attach.

 - Release the SPA only when the entire AFU is being released (not
   detached). Guard the release with a NULL check, so we don't free
   if it was never allocated (e.g. dedicated mode)

Acked-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:04 +10:00
Daniel Axtens 0b3f9c757c cxl: Drop commands if the PCI channel is not in normal state
If the PCI channel has gone down, don't attempt to poke the hardware.

We need to guard every time cxl_whatever_(read|write) is called. This
is because a call to those functions will dereference an offset into an
mmio register, and the mmio mappings get invalidated in the EEH
teardown.

Check in the read/write functions in the header.
We give them the same semantics as usual PCI operations:
 - a write to a channel that is down is ignored.
 - a read from a channel that is down returns all fs.

Also, we try to access the MMIO space of a vPHB device as part of the
PCI disable path. Because that's a read that bypasses most of our usual
checks, we handle it explicitly.

As far as user visible warnings go:
 - Check link state in file ops, return -EIO if down.
 - Be reasonably quiet if there's an error in a teardown path,
   or when we already know the hardware is going down.
 - Throw a big WARN if someone tries to start a CXL operation
   while the card is down. This gives a useful stacktrace for
   debugging whatever is doing that.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:03 +10:00
Daniel Axtens 588b34be20 cxl: Convert MMIO read/write macros to inline functions
We're about to make these more complex, so make them functions
first.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:32:02 +10:00
Daniel Axtens e642d11bdb powerpc/eeh: Probe after unbalanced kref check
In the complete hotplug case, EEH PEs are supposed to be released
and set to NULL. Normally, this is done by eeh_remove_device(),
which is called from pcibios_release_device().

However, if something is holding a kref to the device, it will not
be released, and the PE will remain. eeh_add_device_late() has
a check for this which will explictly destroy the PE in this case.

This check in eeh_add_device_late() occurs after a call to
eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
which will exit without probing if there is an existing PE.

This means that on PowerNV, devices with outstanding krefs will not
be rediscovered by EEH correctly after a complete hotplug. This is
affecting CXL (CAPI) devices in the field.

Put the probe after the kref check so that the PE is destroyed
and affected devices are correctly rediscovered by EEH.

Fixes: d91dafc02f ("powerpc/eeh: Delay probing EEH device during hotplug")
Cc: stable@vger.kernel.org
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 21:31:49 +10:00
Gautham R. Shenoy e63dbd16ab powerpc: Add an inline function to update POWER8 HID0
Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
prescribes that updates to HID0 be preceded by a SYNC instruction and
followed by an ISYNC instruction (Page 91).

Create an inline function name update_power8_hid0() which follows this
recipe and invoke it from the static split core path.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 15:58:28 +10:00
Anshuman Khandual 9afac93343 powerpc/prom: Use DRCONF flags while processing detected LMBs
Replace hard coded values with existing DRCONF flags while procesing
detected LMBs from the device tree. Does not change any functionality.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 15:05:47 +10:00
Anshuman Khandual 8218a3031c powerpc/xmon: Drop the valid variable completely in dump_segments()
The value of 'valid' is always zero when 'esid' is zero, and if 'esid'
is non-zero then the value of 'valid' is irrelevant because we are using
logical or in the if expression.

In fact 'valid' can be dropped completely from dump_segments() by
simply doing the check with SLB_ESID_V directly in the if.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 15:05:47 +10:00
Anshuman Khandual 9c61f7a0ad powerpc/prom: Simplify the logic to fetch SLB size
The code to fetch the SLB size from the device tree wants to first look
for "slb-size" and then if that's not found "ibm,slb-size".

We can simplify the code by looking for the properties and then if we
find one of them we set mmu_slb_size.

We also change the function name from check_cpu_slb_size() to
init_mmu_slb_size() as the function doesn't check anything, it only
initialises mmu_slb_size.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 15:05:46 +10:00
Anshuman Khandual 79d0be7407 powerpc/slb: Add documentation on runtime patching of SLB encoding
This patch adds some documentation to patch_slb_encoding() explaining
how it works.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Update change log and mention the signedness of the immediate]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 15:04:52 +10:00
Anshuman Khandual 2be682af48 powerpc/slb: Rename all the 'slot' occurrences to 'entry'
The SLB code uses 'slot' and 'entry' interchangeably, change it to always
use 'entry'.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 14:50:12 +10:00
Anshuman Khandual 752b8adec4 powerpc/slb: Remove a duplicate extern variable
This patch just removes one redundant entry for one extern variable
'slb_compare_rr_to_size' from the scope. This patch does not change
any functionality.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 14:50:12 +10:00
Daniel Axtens 83c3fee7e7 cxl: sparse: Silence iomem warning in debugfs file creation
An IO address, tagged with __iomem, is passed to debugfs_create_file
as private data. This requires that it be cast to void *. The cast
drops the __iomem annotation and so creates a sparse warning:

  drivers/misc/cxl/debugfs.c:51:57: warning: cast removes address space of expression

The address space marker is added back in the file operations
(fops_io_u64).

Silence the warning with __force.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 14:49:29 +10:00
Daniel Axtens 3d6b040e73 cxl: sparse: Make declarations static
A few declarations were identified by sparse as needing to be static:

  drivers/misc/cxl/irq.c:408:6: warning: symbol 'afu_irq_name_free' was not declared. Should it be static?
  drivers/misc/cxl/irq.c:467:6: warning: symbol 'afu_register_hwirqs' was not declared. Should it be static?
  drivers/misc/cxl/file.c:254:6: warning: symbol 'afu_compat_ioctl' was not declared. Should it be static?
  drivers/misc/cxl/file.c:399:30: warning: symbol 'afu_master_fops' was not declared. Should it be static?

Make them static.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 14:49:09 +10:00
Daniel Axtens d3d73f4b38 cxl: Compile with -Werror
It's a good idea, and it brings us in line with the rest of arch/powerpc.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-11 07:43:40 +10:00
Naveen N. Rao 197165d449 powerpc/ftrace: add powerpc timebase as a trace clock source
Add a new powerpc-specific trace clock using the timebase register,
similar to x86-tsc. This gives us
- a fast, monotonic, hardware clock source for trace entries, and
- a clock that can be used to correlate events across cpus as well as across
  hypervisor and guests.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 16:36:23 +10:00
Wei Yongjun 35a7f41cc6 powerpc/4xx: Fix return value check in hsta_msi_probe()
In case of error, the functions platform_get_resource() and kmalloc()
returns NULL not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 16:33:46 +10:00
Paul Bolle a368c29cf1 windfarm: remove three exported but unused functions
wf_find_control(), wf_find_sensor(), and wf_is_overtemp() are exported
but unused. Remove these three functions.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:21 +10:00
Paul Bolle ca94bbab1a windfarm: make wf_critical_overtemp() static
wf_critical_overtemp() is exported. But nothing uses that export.
That's unsurprising because there's no header that defines it. Stop
exporting that function and make it static.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:21 +10:00
Paul Bolle fe2b592173 windfarm: decrement client count when unregistering
wf_unregister_client() increments the client count when a client
unregisters. That is obviously incorrect. Decrement that client count
instead.

Fixes: 75722d3992 ("[PATCH] ppc64: Thermal control for SMU based machines")

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:20 +10:00
Joe Perches a825ac078b powerpc: Remove redundant breaks
break; break; isn't useful.

Remove one.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:20 +10:00
Kevin Hao ae2a84b407 powerpc: pci: use %pR for printing struct resource
Use %pR to simplify the debug code. This also make the debug info more
readable.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
[mpe: Unsplit multi-line printk strings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:19 +10:00
Daniel Axtens 368857c16c cxl: Don't ignore add_process_element() result when attaching context
Currently when attaching a context in dedicated mode, we ignore the
result of add_process_element(), which could potentially fail.

If add_process_element() returns an error, pass it back to the caller.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:19 +10:00
Mahesh Salgaonkar 62521ea6db powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable HMI.
Invoke new opal_cec_reboot2() call with reboot type
OPAL_REBOOT_PLATFORM_ERROR (for unrecoverable HMI interrupts) to inform
BMC/OCC about this error, so that BMC can collect relevant data for error
analysis and decide what component to de-configure before rebooting.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:19 +10:00
Mahesh Salgaonkar e784b6499d powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors.
On non-recoverable MCE errors in kernel space, Linux kernel panics
and system reboots. On BMC based system opal-prd runs as a daemon
in the host. Hence, kernel crash may prevent opal-prd to detect and
analyze this MCE error. This may land us in a situation where the faulty
memory never gets de-configured and Linux would keep hitting same MCE error
again and again. If this happens in early stage of kernel initialization,
then Linux will keep crashing and rebooting in a loop.

This patch fixes this issue by invoking new opal_cec_reboot2() call with
reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about this
error, so that BMC can collect relevant data for error analysis and
decide what component to de-configure before rebooting.

This patch is dependent on OPAL patchset posted on skiboot mailing list
at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html that
introduces opal_cec_reboot2() opal call.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:18 +10:00
Mahesh Salgaonkar 1852ae276b powerpc/powernv: Pull all HMI events before panic.
In the event of unrecovered HMI the existing code panics as soon as
it receives the first unrecovered HMI event. This makes host to report
partial information about HMIs before panic. There may be more errors
which would have caused the HMI and hence more HMI event would have been
generated waiting to be pulled by host. This patch implements a logic to
pull and display all the HMI event before going down panic path.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:18 +10:00
Mahesh Salgaonkar c33e11d0dd powerpc/powernv: display reason for Malfunction Alert HMI.
The V2 version of HMI event now carries additional information for
Malfunction Alert. It now contains error information about CORE and NX
checkstop. This patch checks and displays the check stop reason before
panic.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:09:59 +10:00
Michael Ellerman 5d83c2b37d selftests/seccomp: Add powerpc support
Wire up the syscall number and regs so the tests work on powerpc.

With the powerpc kernel support just merged, all tests pass on ppc64,
ppc64 (compat), ppc64le, ppc, ppc64e and ppc64e (compat).

Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-30 14:35:36 +10:00
Michael Ellerman c385d0db30 selftests/seccomp: Make seccomp tests work on big endian
The seccomp_bpf test uses BPF_LD|BPF_W|BPF_ABS to load 32-bit values
from seccomp_data->args. On big endian machines this will load the high
word of the argument, which is not what the test wants.

Borrow a hack from samples/seccomp/bpf-helper.h which changes the offset
on big endian to account for this.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
2015-07-30 14:35:36 +10:00
Michael Ellerman 2449acc534 powerpc/kernel: Enable seccomp filter
This commit enables seccomp filter on powerpc, now that we have all the
necessary pieces in place.

To support seccomp's desire to modify the syscall return value under
some circumstances, we use a different ABI to the ptrace ABI. That is we
use r3 as the syscall return value, and orig_gpr3 is the first syscall
parameter.

This means the seccomp code, or a ptracer via SECCOMP_RET_TRACE, will
see -ENOSYS preloaded in r3. This is identical to the behaviour on x86,
and allows seccomp or the ptracer to either leave the -ENOSYS or change
it to something else, as well as rejecting or not the syscall by
modifying r0.

If seccomp does not reject the syscall, we restore the register state to
match what ptrace and audit expect, ie. r3 is the first syscall
parameter again. We do this restore using orig_gpr3, which may have been
modified by seccomp, which allows seccomp to modify the first syscall
paramater and allow the syscall to proceed.

We need to #ifdef the the additional handling of r3 for seccomp, so move
it all out of line.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
2015-07-30 14:34:44 +10:00
Michael Ellerman 1b60bab04e powerpc/kernel: Add SIG_SYS support for compat tasks
SIG_SYS was added in commit a0727e8ce5 "signal, x86: add SIGSYS info
and make it synchronous."

Because we use the asm-generic struct siginfo, we got support for
SIG_SYS for free as part of that commit.

However there was no compat handling added for powerpc. That means we've
been advertising the existence of signfo._sifields._sigsys to compat
tasks, but not actually filling in the fields correctly.

Luckily it looks like no one has noticed, presumably because the only
user of SIGSYS in the kernel is seccomp filter, which we don't support
yet.

So before we enable seccomp filter, add compat handling for SIGSYS.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
2015-07-29 11:56:13 +10:00
Michael Ellerman e9fbe68632 powerpc: Change syscall_get_nr() to return int
The documentation for syscall_get_nr() in asm-generic says:

 Note this returns int even on 64-bit machines. Only 32 bits of
 system call number can be meaningful. If the actual arch value
 is 64 bits, this truncates to 32 bits so 0xffffffff means -1.

However our implementation was never updated to reflect this.

Generally it's not important, but there is once case where it matters.

For seccomp filter with SECCOMP_RET_TRACE, the tracer will set
regs->gpr[0] to -1 to reject the syscall. When the task is a compat
task, this means we end up with 0xffffffff in r0 because ptrace will
zero extend the 32-bit value.

If syscall_get_nr() returns an unsigned long, then a 64-bit kernel will
see a positive value in r0 and will incorrectly allow the syscall
through seccomp.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
2015-07-29 11:56:13 +10:00