Rename cxl_afu_reset() to __cxl_afu_reset() so we can reuse this function name
in the API.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rework __detach_context() and cxl_context_detach() so we can reuse them in the
kernel API.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add cookie parameter to afu_release_irqs() so that we can pass in a different
cookie than the context structure. This will be useful for other kernel
drivers that want to call this but get their own cookie back in the interrupt
handler.
Update all existing call sites.
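As a rough illustration of the cookie idea, here is a user-space sketch. All names (model_request_irq, model_release_irqs, model_other_driver and so on) are hypothetical and are not the cxl driver's functions; the point is only that the handler gets back whatever cookie it registered with, and release matches on that same cookie.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the cxl driver. */
typedef void (*model_irq_handler)(int irq, void *cookie);

struct model_irq_slot {
	model_irq_handler handler;
	void *cookie;	/* passed to the handler, and matched again on release */
};

#define MODEL_NR_IRQS 4
static struct model_irq_slot model_slots[MODEL_NR_IRQS];

/* Register a handler together with the cookie it wants handed back. */
int model_request_irq(int irq, model_irq_handler fn, void *cookie)
{
	assert(fn != NULL);
	if (irq < 0 || irq >= MODEL_NR_IRQS || model_slots[irq].handler)
		return -1;
	model_slots[irq].handler = fn;
	model_slots[irq].cookie = cookie;
	return 0;
}

/* Release every slot registered with this cookie; returns how many were freed. */
int model_release_irqs(void *cookie)
{
	int i, freed = 0;

	for (i = 0; i < MODEL_NR_IRQS; i++) {
		if (model_slots[i].handler && model_slots[i].cookie == cookie) {
			model_slots[i].handler = NULL;
			model_slots[i].cookie = NULL;
			freed++;
		}
	}
	return freed;
}

/* Deliver an interrupt: the handler sees the cookie it registered with. */
void model_raise_irq(int irq)
{
	if (irq >= 0 && irq < MODEL_NR_IRQS && model_slots[irq].handler)
		model_slots[irq].handler(irq, model_slots[irq].cookie);
}

/* A second driver with its own private structure as the cookie. */
struct model_other_driver {
	int hits;
};

void model_other_handler(int irq, void *cookie)
{
	struct model_other_driver *drv = cookie;

	(void)irq;
	drv->hits++;
}
```

A caller that registered with its own structure releases with that same pointer, so it never needs to know about the context structure.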
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we parse the AFU Configuration record, dump some info on it when in
debug mode.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When probing we call pci_enable_device() but don't call pci_disable_device() on
fail. This causes refcounting issues in the PCI subsystem if a second driver
tries to bind to the same device.
This patch adds the pci_disable_device() to the probe error path. This error
path is hit when this cxl driver tries to bind to AFUs (on the vPHB) rather
than the physical device.
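The shape of the fix can be sketched in user space as below. This is a model only: model_pci_dev, its enable_cnt field and the is_afu flag are hypothetical stand-ins for the PCI core's bookkeeping, and the real probe function does much more.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a PCI device; only the enable refcount is modelled. */
struct model_pci_dev {
	int enable_cnt;	/* models the PCI core's per-device enable count */
	int is_afu;	/* 1 if this is an AFU on the vPHB, not the physical card */
};

static int model_pci_enable_device(struct model_pci_dev *dev)
{
	dev->enable_cnt++;
	return 0;
}

static void model_pci_disable_device(struct model_pci_dev *dev)
{
	assert(dev->enable_cnt > 0);
	dev->enable_cnt--;
}

/* Returns 0 on success; on failure the earlier enable is undone before returning. */
int model_probe(struct model_pci_dev *dev)
{
	int rc;

	rc = model_pci_enable_device(dev);
	if (rc)
		return rc;

	if (dev->is_afu) {
		rc = -ENODEV;	/* not the physical device, refuse to bind */
		goto err_disable;
	}
	return 0;

err_disable:
	model_pci_disable_device(dev);
	return rc;
}
```

With the unwind in place the enable count is back to zero after a failed probe, so a second driver binding to the same device starts from a clean state.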
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When we expose AFUs as virtual PCI devices, they may look like the physical
CAPI PCI card, i.e. they may have the same vendor/device IDs.
We want to avoid these AFUs binding to this driver and any init this driver may
do.
Re-order card init to check the VSEC earlier before assigning BARs or
activating CXL. Also change the dev used in early prints as the adapter struct
may not be inited at this earlier stage.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that libcxl is public, let's document it.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds a hook into the powerpc pci code for pci_disable_device() calls. The
generic code already provides a weak pcibios_disable_device() symbol, so we
just need to provide our own in powerpc and it'll get picked up.
This is passed directly to the phb controller ops, provided one exists.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently pnv_pci_shutdown() calls the PHB shutdown code for all PHBs in the
system. It dereferences the private_data assuming it's a powernv PHB, which
won't be the case when we have different PHBs in the system (like when we add
vPHBs for CXL).
This moves the shutdown hook to the pci_controller_ops and fixes the call site
to use that instead.
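A minimal sketch of the per-controller dispatch, in user space. The struct and function names are hypothetical models, not the kernel's pci_controller definitions; the point is that each PHB's private data is only touched by that PHB's own shutdown hook, and PHBs without a hook are skipped.

```c
#include <assert.h>
#include <stddef.h>

struct model_phb;

struct model_controller_ops {
	void (*shutdown)(struct model_phb *phb);	/* optional hook */
};

struct model_phb {
	const struct model_controller_ops *ops;
	void *private_data;	/* only meaningful to this PHB's own ops */
	struct model_phb *next;
};

/* Call shutdown only on PHBs that provide it; returns how many were called. */
int model_pci_shutdown(struct model_phb *head)
{
	struct model_phb *phb;
	int called = 0;

	for (phb = head; phb; phb = phb->next) {
		if (phb->ops && phb->ops->shutdown) {
			phb->ops->shutdown(phb);
			called++;
		}
	}
	return called;
}

/* A powernv-style hook; in this model its private data is a plain counter. */
void model_pnv_shutdown(struct model_phb *phb)
{
	assert(phb->private_data != NULL);
	(*(int *)phb->private_data)++;
}
```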
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add cxl context pointer to archdata. We'll want to create one of these for cxl
PCI devices. Put them here until we can get a pci_dev specific private data.
This location was suggested by benh.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add release_device() hook to phb ops so we can clean up for specific phbs.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Export pcibios_claim_one_bus, pcibios_scan_phb and pcibios_alloc_controller.
These will be used by the CXL driver.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This fixes calculating the key bits (KP and KS) in the SLB VSID for kernel
mappings.
I'm not CCing this to stable as there are no uses of this currently.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The afu fd release path was identified as a significant bottleneck in
the overall performance of cxl. While an optimal AFU design would
minimise the need to close & reopen the AFU fd, it is not always
practical to avoid.
The bottleneck seems to be down to the call to synchronize_rcu(), which
will block until every other thread is guaranteed to be out of an RCU
critical section. Replace it with call_rcu() to free the context
structures later so we can return to the application sooner.
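The difference between waiting and deferring can be modelled in a few lines of single-threaded C. This is not real RCU: model_call_rcu() and model_grace_period_end() are hypothetical stand-ins that only show the release path queueing the free and returning at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-threaded model of deferred freeing; not real RCU. */
struct model_rcu_head {
	struct model_rcu_head *next;
	void (*func)(struct model_rcu_head *head);
};

static struct model_rcu_head *model_pending;
static int model_contexts_freed;

/* Queue the callback and return at once, instead of waiting for readers. */
void model_call_rcu(struct model_rcu_head *head,
		    void (*func)(struct model_rcu_head *head))
{
	head->func = func;
	head->next = model_pending;
	model_pending = head;
}

/* Stand-in for the end of a grace period: run all queued callbacks. */
int model_grace_period_end(void)
{
	int ran = 0;

	while (model_pending) {
		struct model_rcu_head *head = model_pending;

		model_pending = head->next;
		head->func(head);
		ran++;
	}
	return ran;
}

struct model_context {
	int pe;
	struct model_rcu_head rcu;	/* embedded so no extra allocation is needed */
};

/* Recover the enclosing context from its embedded head and free it. */
void model_reclaim_ctx(struct model_rcu_head *head)
{
	struct model_context *ctx = (struct model_context *)
		((char *)head - offsetof(struct model_context, rcu));

	free(ctx);
	model_contexts_freed++;
}

/* Release path: hand the context to the deferred free and return immediately. */
void model_context_free(struct model_context *ctx)
{
	assert(ctx != NULL);
	model_call_rcu(&ctx->rcu, model_reclaim_ctx);
}
```

The caller of model_context_free() pays only for the enqueue; the actual free happens whenever the grace period stand-in runs.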
This reduces the time spent in the fd release path from 13356 usec to
13.3 usec - about a 1000x speed up.
Reported-by: Fei K Chen <uchen@cn.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Export the "AFU Error Buffer" via a sysfs attribute (afu_err_buf). The AFU
error buffer is used by the AFU to report application specific
errors. The contents of this buffer are AFU specific and are intended to
be interpreted by the application interacting with the AFU.
Suggested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Given a file descriptor on an afu device, libcxl currently uses the
major/minor number obtained from fstat on the fd to construct the path to
the afu's sysfs directory. However, it is possible that rather than using
one of the devices in /dev/cxl, a kernel driver creates its own device
which exports the generic cxl interface to userspace. This causes
problems with libcxl, as it tries to use the wrong major/minor number to
construct the sysfs path and fails.
So this patch introduces a new ioctl called CXL_IOCTL_GET_AFU_ID on the
afu file descriptor to fetch the cxl_afu_id struct that holds the
card/offset-id and mode information. This information is then used by
libcxl to construct the correct path to the afu sysfs directory.
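For illustration, a sketch of how a library might turn the id information into a path. Everything here is an assumption of the sketch: model_afu_id is a cut-down hypothetical struct (the real cxl_afu_id is defined in the cxl uapi header), and the "afu<card>.<offset>" name with a d/m/s mode suffix is assumed rather than taken from this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the id data; not the real struct cxl_afu_id layout. */
enum model_afu_mode { MODEL_DEDICATED, MODEL_MASTER, MODEL_SLAVE };

struct model_afu_id {
	unsigned int card_id;
	unsigned int afu_offset;
	enum model_afu_mode mode;
};

/*
 * Format "<root>/afu<card>.<offset><suffix>" into buf.
 * Returns 0 on success, -1 on a bad mode or if the result does not fit.
 */
int model_afu_sysfs_path(const struct model_afu_id *id, const char *root,
			 char *buf, size_t len)
{
	static const char suffix[] = { 'd', 'm', 's' };
	int n;

	assert(id != NULL && root != NULL && buf != NULL);
	if ((int)id->mode < 0 || (int)id->mode > MODEL_SLAVE)
		return -1;
	n = snprintf(buf, len, "%s/afu%u.%u%c", root, id->card_id,
		     id->afu_offset, suffix[id->mode]);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the path is built from data the driver reports about the AFU itself, it does not matter which character device the fd was opened on.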
Testing:
- Build against pseries be/le configs
- Testing with corresponding libcxl changes to verify that it constructs
the right sysfs path to the afu.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These tests were merged in parallel to the install support, update them
now to use it.
This also adds cross compile support for the VPHN test which was missing
it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rather than continuing to maintain a copy of pseries_defconfig with
CONFIG_CPU_LITTLE_ENDIAN enabled, use the generic merge_config script
and use an le.config to enable little endian on top of pseries_defconfig
without the need for a duplicated _defconfig file.
This method will require less maintenance in the future and will ensure
that both 'defconfigs' are always in sync.
It is worth noting that the seemingly more simple approach of:
    pseries_le_defconfig: pseries_defconfig
            $(Q)$(MAKE) le.config
Will not work when building using O=builddir.
The obvious fix to that:
    pseries_le_defconfig:
            $(Q)$(MAKE) -f $(srctree)/Makefile pseries_defconfig le.config
Also does not work. This is because if we have for example:
    config FOO
            depends on CPU_BIG_ENDIAN
            select BAR
Then BAR will be enabled by the first call to kconfig (via
pseries_defconfig), and then will remain enabled after we merge
le.config, even though FOO will have been turned off.
The solution is to invoke the kconfig logic only once, after
we have merged all the config fragments. This ensures nothing is
select'ed on that should then be disabled by the later merged configs.
This is done through the explicit call to make olddefconfig.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Reviewed-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
[mpe: Massage change log, fix white space and use ARCH not SRCARCH]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These two configs should be identical with the exception of big or little
endian.
The big endian version has XMON_DEFAULT turned on while the little endian
has XMON_DEFAULT not set. It makes the most sense for defconfigs not to use
xmon by default: production systems should get back up as quickly as
possible, not sit in xmon.
In the event debugging is required, the option can be enabled or xmon=on
can be specified on the command line.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use irq_desc_get_xxx() to avoid a redundant lookup of irq_desc when we
already have a pointer to the corresponding irq_desc.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We need to use a trampoline when using LOAD_HANDLER(), because the
destination needs to be in the first 64kB. An absolute branch has
no such limitations, so just jump there.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We had some code to restore the LR in the relocatable system call path
back when we used the LR to do an indirect branch.
Commit 6a404806df ("powerpc: Avoid link stack corruption in MMU
on syscall entry path") changed this to use the CTR which is volatile
across system calls so does not need restoring.
Remove the stale comment and the restore of the LR.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.
If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().
Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:
0.11% qemu-system-ppc [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
---_raw_spin_lock_irqsave
|
|--52.35%-- 0
| |
| |--46.39%-- __hrtimer_start_range_ns
| | kvmppc_run_core
| | kvmppc_vcpu_run_hv
| | kvmppc_vcpu_run
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | |
| | |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
| | | |
| | | --100.00%-- 0x7e714
| | | 0x7e714
Notice the bogus _raw_spin_lock_irqsave when we transition from kernel
(system_call) to userspace (0x7e714). We inserted what was in the SIAR.
Add a check in regs_use_siar() to check that the regs in question
are from a PMU exception. With this fix the backtrace makes sense:
0.47% qemu-system-ppc [kernel.vmlinux] [k] _raw_spin_lock_irqsave
|
---_raw_spin_lock_irqsave
|
|--53.83%-- 0
| |
| |--44.73%-- hrtimer_try_to_cancel
| | kvmppc_start_thread
| | kvmppc_run_core
| | kvmppc_vcpu_run_hv
| | kvmppc_vcpu_run
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | __ioctl
| | 0x7e714
| | 0x7e714
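The check can be sketched as a small user-space model. The struct below is a hypothetical cut-down register frame, and treating 0xf00 as the PMU exception vector (and 0xc00 as a syscall frame in the test) is an assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TRAP_PMU 0xf00	/* assumed performance monitor exception vector */

/* Hypothetical cut-down register frame; only the fields used here. */
struct model_regs {
	unsigned long nip;	/* saved instruction pointer */
	unsigned long trap;	/* exception vector that built this frame */
	unsigned long result;	/* overloaded by perf_read_regs() on PMU frames */
};

/* Only trust 'result' when the frame really came from a PMU exception. */
int model_regs_use_siar(const struct model_regs *regs)
{
	assert(regs != NULL);
	return regs->trap == MODEL_TRAP_PMU && regs->result != 0;
}

/* Pick the SIAR or the saved nip as the reported instruction address. */
unsigned long model_instruction_pointer(const struct model_regs *regs,
					unsigned long siar)
{
	return model_regs_use_siar(regs) ? siar : regs->nip;
}
```

A userspace frame from a system call can hold any value in result, yet it is never mistaken for a request to use the SIAR.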
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If both STRICT_MM_TYPECHECKS and DEBUG_PAGEALLOC are enabled, the code
in kernel_map_linear_page() is built, and so we fail with:
arch/powerpc/mm/hash_utils_64.c:1478:2:
error: incompatible type for argument 1 of 'htab_convert_pte_flags'
Fix it by using pgprot_val().
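Why the accessor is needed can be shown with a tiny model of a strictly typed wrapper. The names below (model_pgprot_t, model_pgprot_val, model_convert_flags) are hypothetical and the mask is arbitrary; only the struct-versus-integer mismatch mirrors the build failure.

```c
#include <assert.h>

/* Hypothetical strict-typecheck style wrapper: the struct blocks implicit conversion. */
typedef struct { unsigned long pgprot; } model_pgprot_t;

#define model_pgprot_val(x)	((x).pgprot)
#define model_pgprot(x)		((model_pgprot_t) { (x) })

/* Takes raw flag bits, like a helper written against plain integers. */
unsigned long model_convert_flags(unsigned long pteflags)
{
	return pteflags & 0xffUL;	/* arbitrary mask for the sketch */
}

unsigned long model_map_page(model_pgprot_t prot)
{
	/* model_convert_flags(prot) would not compile: struct vs unsigned long. */
	return model_convert_flags(model_pgprot_val(prot));
}
```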
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Previously, dma_set_mask() on powernv was convoluted:
0) Call dma_set_mask() (a/p/kernel/dma.c)
1) In dma_set_mask(), ppc_md.dma_set_mask() exists, so call it.
2) On powernv, that function pointer is pnv_dma_set_mask().
In pnv_dma_set_mask(), the device is pci, so call pnv_pci_dma_set_mask().
3) In pnv_pci_dma_set_mask(), call pnv_phb->set_dma_mask() if it exists.
4) It only exists in the ioda case, where it points to
pnv_pci_ioda_dma_set_mask(), which is the final function.
So the call chain is:
dma_set_mask() ->
pnv_dma_set_mask() ->
pnv_pci_dma_set_mask() ->
pnv_pci_ioda_dma_set_mask()
Both ppc_md and pnv_phb function pointers are used.
Rip out the ppc_md call, pnv_dma_set_mask() and pnv_pci_dma_set_mask().
Instead:
0) Call dma_set_mask() (a/p/kernel/dma.c)
1) In dma_set_mask(), the device is pci, and pci_controller_ops.dma_set_mask()
exists, so call pci_controller_ops.dma_set_mask()
2) In the ioda case, that points to pnv_pci_ioda_dma_set_mask().
The new call chain is
dma_set_mask() ->
pnv_pci_ioda_dma_set_mask()
Now only the pci_controller_ops function pointer is used.
The fallback paths for p5ioc2 are the same.
Previously, pnv_pci_dma_set_mask() would find no pnv_phb->set_dma_mask()
function, so it would call __set_dma_mask().
Now, dma_set_mask() finds no ppc_md call or pci_controller_ops call,
so it calls __set_dma_mask().
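The new lookup order can be modelled in user space as follows. All dma_model_* names are hypothetical; they stand in for the ppc_md hook, the pci_controller_ops hook, the ioda handler and the generic fallback described above.

```c
#include <assert.h>
#include <stddef.h>

struct dma_model_dev;

struct dma_model_phb_ops {
	int (*dma_set_mask)(struct dma_model_dev *dev, unsigned long long mask);
};

struct dma_model_dev {
	int is_pci;
	const struct dma_model_phb_ops *phb_ops;	/* NULL if not behind a PHB */
	unsigned long long dma_mask;
	int used_ioda_path;
};

/* Optional platform-wide hook; left unset here, as on powernv after this patch. */
static int (*dma_model_platform_hook)(struct dma_model_dev *dev,
				      unsigned long long mask);

/* Generic fallback used when no hook claims the device. */
static int dma_model_generic(struct dma_model_dev *dev, unsigned long long mask)
{
	dev->dma_mask = mask;
	return 0;
}

/* Controller-specific handler, standing in for the ioda implementation. */
int dma_model_ioda(struct dma_model_dev *dev, unsigned long long mask)
{
	dev->used_ioda_path = 1;
	dev->dma_mask = mask;
	return 0;
}

/* Platform hook first, then the PCI controller hook, then the fallback. */
int dma_model_set_mask(struct dma_model_dev *dev, unsigned long long mask)
{
	assert(dev != NULL);
	if (dma_model_platform_hook)
		return dma_model_platform_hook(dev, mask);
	if (dev->is_pci && dev->phb_ops && dev->phb_ops->dma_set_mask)
		return dev->phb_ops->dma_set_mask(dev, mask);
	return dma_model_generic(dev, mask);
}
```

A controller with no dma_set_mask entry, as in the p5ioc2 case, falls straight through to the generic path.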
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some systems only need to deal with DMA masks for PCI devices.
For these systems, we can avoid the need for a platform hook and
instead use a pci controller based hook.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove powernv generic PCI controller operations. Replace it with
controller ops for each of the two supported PHBs.
As an added bonus, make the two new structs const, which will help
guard against bugs such as the one introduced in 65ebf4b63
("powerpc/powernv: Move controller ops from ppc_md to controller_ops")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove unneeded ppc_md functions. Patch callsites to use pci_controller_ops
functions exclusively.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the u3 MPIC msi subsystem to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
As with fsl_msi, operations are plugged in at the subsys level, after
controller creation. Again, we iterate over all controllers and
populate them with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the PaSemi MPIC msi subsystem to use the pci_controller_ops
structure rather than ppc_md for MSI related PCI controller
operations.
As with fsl_msi, operations are plugged in at the subsys level, after
controller creation. Again, we iterate over all controllers and
populate them with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the ppc4xx hsta msi subsystem to use the pci_controller_ops
structure rather than ppc_md for MSI related PCI controller
operations.
As with fsl_msi, operations are plugged in at the subsys level, after
controller creation. Again, we iterate over all controllers and
populate them with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the ppc4xx msi subsystem to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
As with fsl_msi, operations are plugged in at the subsys level, after
controller creation. Again, we iterate over all controllers and
populate them with the MSI ops.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the fsl_msi subsystem to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
Previously, MSI ops were added to ppc_md at the subsys level. However,
in fsl_pci.c, PCI controllers are created at the arch level. So,
unlike in e.g. PowerNV/pSeries/Cell, we can't simply populate a
platform-level controller ops structure and have it copied into the
controllers when they are created.
Instead, walk every phb, and attempt to populate it with the MSI ops.
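A user-space sketch of walking the existing controllers and filling in the ops. The msi_model_* names are hypothetical, and skipping controllers whose ops are already set is an assumption of the sketch about what "attempt to populate" means.

```c
#include <assert.h>
#include <stddef.h>

struct msi_model_ops {
	int (*setup_msi_irqs)(int nvec);
	void (*teardown_msi_irqs)(void);
};

struct msi_model_phb {
	struct msi_model_ops ops;
	struct msi_model_phb *next;
};

/* Fill in the MSI ops on each existing controller; returns how many were set. */
int msi_model_populate(struct msi_model_phb *head,
		       int (*setup)(int nvec), void (*teardown)(void))
{
	struct msi_model_phb *phb;
	int populated = 0;

	assert(setup != NULL && teardown != NULL);
	for (phb = head; phb; phb = phb->next) {
		if (phb->ops.setup_msi_irqs)
			continue;	/* assumed: leave already-populated ops alone */
		phb->ops.setup_msi_irqs = setup;
		phb->ops.teardown_msi_irqs = teardown;
		populated++;
	}
	return populated;
}

/* Trivial example ops for the sketch. */
int msi_model_setup(int nvec)
{
	return nvec > 0 ? 0 : -1;
}

void msi_model_teardown(void)
{
}
```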
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the pseries platform to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
We need to iterate all PHBs because the MSI setup happens later than
find_and_init_phbs() - mpe.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the Cell platform to use the pci_controller_ops structure rather
than ppc_md for MSI related PCI controller operations.
We can be confident that the functions will be added to the platform's
ops struct before any PCI controller's ops struct is populated
because:
1) These ops are added to the struct in a subsys initcall.
We populate the ops in axon_msi_probe, which is the probe call for the
axon-msi driver. However the driver is registered in axon_msi_init,
which is a subsys initcall, so this will happen at the subsys level.
2) The controller receives the struct later, in a device initcall.
Cell populates the controller in cell_setup_phb, which is hooked up to
ppc_md.pci_setup_phb. ppc_md.pci_setup_phb is only ever called in
of_platform.c, as part of the OpenFirmware PCI driver's probe
routine. That driver is registered in a device initcall, so it will
occur *after* the struct is properly populated.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the PowerNV/BML platform to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add MSI setup and teardown functions to pci_controller_ops.
Patch the callsites (arch_{setup,teardown}_msi_irqs) to prefer the
controller ops version if it's available.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All users of the old opal events notifier have been converted over to
the irq domain so remove the event notifier functions.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Convert the opal dump driver to the new opal irq domain.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch converts the elog code to use the opal irq domain instead
of notifier events.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch converts the opal message event to use the new opal irq
domain.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The eeh code currently uses the old notifier method to get eeh events
from OPAL. It also contains some logic to filter opal events which has
been moved into the virtual irqchip. This patch converts the eeh code
to the new event interface which simplifies event handling.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Convert the opal hvc driver to use the new irqchip to register for
opal events. As older firmware versions may not have device tree
bindings for the interrupt parent we just use a hardcoded hwirq based
on the event number.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Convert the opal ipmi driver to use the new irq interface for events.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: openipmi-developer@lists.sourceforge.net
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Whenever an interrupt is received for opal the linux kernel gets a
bitfield indicating certain events that have occurred and need handling
by the various device drivers. Currently this is handled using a
notifier interface where we call every device driver that has
registered to receive opal events.
This approach has several drawbacks. For example each driver has to do
its own checking to see if the event is relevant as well as event
masking. There is also no easy method of recording the number of times
we receive particular events.
This patch solves these issues by exposing opal events via the
standard interrupt APIs by adding a new interrupt chip and
domain. Drivers can then register for the appropriate events using
standard kernel calls such as irq_of_parse_and_map().
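The demultiplexing idea can be modelled in user space as below. This is not the kernel irq domain API: the ev_model_* names are hypothetical, and the table simply maps each event bit to at most one handler while counting deliveries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EV_MODEL_NR 64

typedef void (*ev_model_handler)(unsigned int hwirq, void *data);

struct ev_model_desc {
	ev_model_handler handler;
	void *data;
	unsigned long count;	/* per-event statistics fall out of the design */
};

static struct ev_model_desc ev_model_table[EV_MODEL_NR];

/* Register for one event bit, as a driver would request one interrupt. */
int ev_model_request(unsigned int hwirq, ev_model_handler fn, void *data)
{
	if (hwirq >= EV_MODEL_NR || ev_model_table[hwirq].handler)
		return -1;
	ev_model_table[hwirq].handler = fn;
	ev_model_table[hwirq].data = data;
	return 0;
}

/* Demultiplex the bitfield: each set bit goes only to its own handler. */
int ev_model_dispatch(uint64_t events)
{
	unsigned int hwirq;
	int handled = 0;

	for (hwirq = 0; hwirq < EV_MODEL_NR; hwirq++) {
		struct ev_model_desc *desc = &ev_model_table[hwirq];

		if (!(events & (UINT64_C(1) << hwirq)) || !desc->handler)
			continue;
		desc->count++;
		desc->handler(hwirq, desc->data);
		handled++;
	}
	return handled;
}

/* Number of times a given event has been delivered. */
unsigned long ev_model_count(unsigned int hwirq)
{
	assert(hwirq < EV_MODEL_NR);
	return ev_model_table[hwirq].count;
}

/* Example handler: records the last event number it saw. */
void ev_model_note(unsigned int hwirq, void *data)
{
	*(unsigned int *)data = hwirq;
}
```

Handlers no longer need to inspect the bitfield themselves, since they are only called for the bit they registered for.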
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Most of the OPAL subsystems are always compiled in for PowerNV and
many of them need to be initialised before or after other OPAL
subsystems. Rather than trying to control this ordering through
machine initcalls it is clearer and easier to control initialisation
order with explicit calls in opal_init.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Cc: Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fastsleep is one of the idle states which the cpuidle subsystem currently
uses on power8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing.
OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes checkstop
if any correctable error in L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.
In the existing implementation, the workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles every time the core has to enter or wake up from fastsleep.
This patch introduces a sysfs attribute (fastsleep_workaround_applyonce)
to choose the behavior of this workaround.
By default, fastsleep_workaround_applyonce = 0. In this case, the workaround
is applied/undone every time the core enters/exits fastsleep.
When fastsleep_workaround_applyonce = 1, the workaround is applied once on
all the cores and never undone. This can be triggered by:
    echo 1 > /sys/devices/system/cpu/fastsleep_workaround_applyonce
For simplicity this attribute can be modified only once. That is, once
fastsleep_workaround_applyonce is changed to 1, it cannot be reverted
to the default state.
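The write-once behaviour can be sketched as a small state machine. The applyonce_model_* names are hypothetical, the -EINVAL return values are an assumption of the sketch, and the count of two firmware calls per sleep simply models one apply plus one undo.

```c
#include <assert.h>
#include <errno.h>

static int applyonce_model_state;	/* 0 = apply/undo each time, 1 = applied once */
static int applyonce_model_apply_all;	/* counts the one-off "apply on all cores" step */

/* Model of the attribute's store: 0 -> 1 is allowed once, 1 -> 0 is refused. */
int applyonce_model_store(int val)
{
	if (val != 0 && val != 1)
		return -EINVAL;
	if (val == applyonce_model_state)
		return 0;		/* no change requested */
	if (applyonce_model_state == 1)
		return -EINVAL;		/* cannot revert to the default */
	applyonce_model_apply_all++;	/* stand-in for applying on all cores */
	applyonce_model_state = 1;
	return 0;
}

/* Cost model: firmware calls per fastsleep are only needed in the default mode. */
int applyonce_model_calls_per_sleep(void)
{
	assert(applyonce_model_state == 0 || applyonce_model_state == 1);
	return applyonce_model_state ? 0 : 2;	/* apply on entry + undo on exit */
}
```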
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is a cleanup patch; it doesn't change any functionality. It moves
all cpuidle related code from setup.c to a new file.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
[mpe: Fix the SMP=n build by including asm/smp.h in idle.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>