Add SWITCHTEC_QUIRK() to reduce redundancy in declaring devices that use
quirk_switchtec_ntb_dma_alias().
By itself, this is no functional change, but a subsequent patch updates
SWITCHTEC_QUIRK() to fix ad281ecf1c ("PCI: Add DMA alias quirk for
Microsemi Switchtec NTB").
Fixes: ad281ecf1c ("PCI: Add DMA alias quirk for Microsemi Switchtec NTB")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add Device IDs to the Intel GPU "spurious interrupt" quirk table.
For these devices, unplugging the VGA cable and plugging it in again causes
spurious interrupts from the IGD. Linux eventually disables the interrupt,
but of course that disables any other devices sharing the interrupt.
The theory is that this is a VGA BIOS defect: it should have disabled the
IGD interrupt but failed to do so.
See f67fd55fa9 ("PCI: Add quirk for still enabled interrupts on Intel
Sandy Bridge GPUs") and 7c82126a94 ("PCI: Add new ID for Intel GPU
"spurious interrupt" quirk") for some history.
[bhelgaas: See link below for discussion about how to fix this more
generically instead of adding device IDs for every new Intel GPU. I hope
this is the last patch to add device IDs.]
Link: https://lore.kernel.org/linux-pci/1537974841-29928-1-git-send-email-bmeng.cn@gmail.com
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v3.4+
The few callers can just use dma_set_max_seg_size ()directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The two callers can just use dma_set_seg_boundary() directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Some PCI devices may have memory mapped in a BAR space that's intended for
use in peer-to-peer transactions. To enable such transactions the memory
must be registered with ZONE_DEVICE pages so it can be used by DMA
interfaces in existing drivers.
Add an interface for other subsystems to find and allocate chunks of P2P
memory as necessary to facilitate transfers between two PCI peers:
struct pci_dev *pci_p2pmem_find[_many]();
int pci_p2pdma_distance[_many]();
void *pci_alloc_p2pmem();
The new interface requires a driver to collect a list of client devices
involved in the transaction then call pci_p2pmem_find() to obtain any
suitable P2P memory. Alternatively, if the caller knows a device which
provides P2P memory, they can use pci_p2pdma_distance() to determine if it
is usable. With a suitable p2pmem device, memory can then be allocated
with pci_alloc_p2pmem() for use in DMA transactions.
Depending on hardware, using peer-to-peer memory may reduce the bandwidth
of the transfer but can significantly reduce pressure on system memory.
This may be desirable in many cases: for example a system could be designed
with a small CPU connected to a PCIe switch by a small number of lanes
which would maximize the number of lanes available to connect to NVMe
devices.
The code is designed to only utilize the p2pmem device if all the devices
involved in a transfer are behind the same PCI bridge. This is because we
have no way of knowing whether peer-to-peer routing between PCIe Root Ports
is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
transfers that go through the RC is limited to only reducing DRAM usage
and, in some cases, coding convenience. The PCI-SIG may be exploring
adding a new capability bit to advertise whether this is possible for
future hardware.
This commit includes significant rework and feedback from Christoph
Hellwig.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: fold in fix from Keith Busch <keith.busch@intel.com>:
https://lore.kernel.org/linux-pci/20181012155920.15418-1-keith.busch@intel.com,
to address comment from Dan Carpenter <dan.carpenter@oracle.com>, fold in
https://lore.kernel.org/linux-pci/20181017160510.17926-1-logang@deltatee.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The threaded IRQ is naturally single threaded as desired, so use that to
simplify the AER bottom half handler. Since the root port structure has
much less to do now, remove the rpc construction helper routine.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use the recommended kernel API for writing to a concurrently-accessed
kfifo. No functional change here.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The kernel provides a generic FIFO implementation, so no need to reinvent
that capability in a driver. Replace the AER-specific implementation with
the kernel-provided kfifo. Since the interrupt handler producer and work
queue consumer run single threaded, there is no need for additional
locking, so remove that lock, too.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The AER struct aer_rpc was carrying a copy of the error source simply as a
temperary variable. Remove that from the structure and use a stack
variable for the purpose.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The error recovery callbacks are only run on child devices. A Root Port is
never a child device, so this error resume callback was never invoked.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
As done treewide earlier, this catches several more open-coded
allocation size calculations that were added to the kernel during the
merge window. This performs the following mechanical transformations
using Coccinelle:
kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)
Signed-off-by: Kees Cook <keescook@chromium.org>
When the root complex suspends it must send a PME_Turn_Off TLP.
Implement this by asserting the "turnoff" reset.
On imx7d this functionality is part of the System Reset Controller (SRC)
and is exposed through the linux reset-controller subsystem.
On imx6 equivalent bits are in the IOMUXC pinmux controller General
Purpose Register (GPR) area which the imx6-pcie driver accesses
directly.
This is only for imx7d right now but it's deliberately implemented as an
optional reset, ignoring the chip variant:
* Older dtbs won't have this reset so it will be ignored.
* Future chips might also expose this as a reset controller.
For example imx8m (not yet supported) has the exact same
PCIE_CTRL_APPS_TURNOFF bit in the same location.
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
The PCI bus config accessors could be inlined into other accessor
functions, which makes it so they can't be traced. Force them to never be
inlined so that ftrace can hook into these functions.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1472052 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use kmemdup() rather than duplicating its implementation.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/pci/hotplug/cpqphp_core.c: In function 'init_SERR':
drivers/pci/hotplug/cpqphp_core.c:124:5: warning: variable 'physical_slot' set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Currently, a hotplug bridge will be given hpmemsize additional memory
and hpiosize additional io if available, in order to satisfy any future
hotplug allocation requirements.
These calculations don't consider the current memory/io size of the
hotplug bridge/slot, so hotplug bridges/slots which have downstream
devices will be allocated their current allocation in addition to the
hpmemsize value.
This makes for possibly undesirable results with a mix of unoccupied and
occupied slots (ex, with hpmemsize=2M):
02:03.0 PCI bridge: <-- Occupied
Memory behind bridge: d6200000-d64fffff [size=3M]
02:04.0 PCI bridge: <-- Unoccupied
Memory behind bridge: d6500000-d66fffff [size=2M]
This change considers the current allocation size when using the
hpmemsize/hpiosize parameters to make the reservations predictable for
the mix of unoccupied and occupied slots:
02:03.0 PCI bridge: <-- Occupied
Memory behind bridge: d6200000-d63fffff [size=2M]
02:04.0 PCI bridge: <-- Unoccupied
Memory behind bridge: d6400000-d65fffff [size=2M]
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In order to have better power management for Thunderbolt PCIe chains,
Windows enables power management for native PCIe hotplug ports if there is
the following ACPI _DSD attached to the root port:
Name (_DSD, Package () {
ToUUID ("6211e2c0-58a3-4af3-90e1-927a4e0c55a4"),
Package () {
Package () {"HotPlugSupportInD3", 1}
}
})
This is also documented in:
https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3
Do the same in Linux by introducing new firmware PM callback
(->bridge_d3()) and then implement it for ACPI based systems so that the
above property is checked.
There is one catch, though. The initial pci_dev->bridge_d3 is set before
the root port has ACPI companion bound (the device is not added to the PCI
bus either) so we need to look up the ACPI companion manually in that case
in acpi_pci_bridge_d3().
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Basically we need to do the same steps than what we do when system sleep is
entered and disable PME interrupt when the root port is runtime suspended.
This prevents spurious wakeups immediately when the port is transitioned
into D3cold.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Basically we need to do the same thing when runtime suspending than with
system sleep so re-use those operations here. This makes sure hotplug
interrupt does not trigger immediately when the link goes down.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When PCIe port is runtime suspended/resumed some extra steps might be
needed to be executed from the port service driver side. For instance we
may need to disable PCIe hotplug interrupt to prevent it from triggering
immediately when PCIe link to the downstream component goes down.
To make the above possible add optional ->runtime_suspend() and
->runtime_resume() callbacks to struct pcie_port_service_driver and call
them for each port service in runtime suspend/resume callbacks of portdrv.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: adjust "slot->state" for 5790a9c78e ("PCI: pciehp: Unify
controller and slot structs")]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently we try to keep PCIe ports runtime suspended over system suspend
if possible. This mostly happens when entering suspend-to-idle because
there is no need to re-configure wake settings.
This causes problems if the parent port goes into D3cold and it gets
resumed upon exit from system suspend. This may happen for example if the
port is part of PCIe switch and the same switch is connected to a PCIe
endpoint that needs to be resumed. The way exit from D3cold works according
PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold reset is
signaled. After this the device is in D0unitialized state keeping PME
context if it supports wake from D3cold.
The problem occurs when a PCIe hotplug port is left suspended and the
parent port goes into D3cold and back to D0: the port keeps its PME context
but since everything else is reset back to defaults (D0unitialized) it is
not set to detect hotplug events anymore.
For this reason change the PCIe portdrv power management logic so that it
is fine to keep the port runtime suspended over system suspend but it needs
to be resumed upon exit to make sure it gets properly re-initialized.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCIe native hotplug shares MSI vector with native PME so the interrupt
handler might get called even the hotplug interrupt is masked. In that case
we should not handle any events because the interrupt was not meant for us.
Modify the PCIe hotplug interrupt handler to check this accordingly and
bail out if it finds out that the interrupt was not about hotplug.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
When PCIe hotplug port is transitioned into D3hot, the link to the
downstream component will go down. If hotplug interrupt generation is
enabled when that happens, it will trigger immediately, waking up the
system and bringing the link back up.
To prevent this, disable hotplug interrupt generation when system suspend
is entered. This does not prevent wakeup from low power states according
to PCIe 4.0 spec section 6.7.3.4:
Software enables a hot-plug event to generate a wakeup event by
enabling software notification of the event as described in Section
6.7.3.1. Note that in order for software to disable interrupt generation
while keeping wakeup generation enabled, the Hot-Plug Interrupt Enable
bit must be cleared.
So as long as we have set the slot event mask accordingly, wakeup should
work even if slot interrupt is disabled. The port should trigger wake and
then send PME to the root port when the PCIe hierarchy is brought back up.
Limit this to systems using native PME mechanism to make sure older Apple
systems depending on commit e3354628c376 ("PCI: pciehp: Support interrupts
sent from D3hot") still continue working.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We enable power management automatically for bridges where
pci_bridge_d3_possible() returns true. However, these bridges may have
ACPI methods such as _DSW that need to be called before D3 entry. For
example in Lenovo Thinkpad X1 Carbon 6th _DSW method is used to prepare
D3cold for the PCIe root port hosting Thunderbolt chain. Because wake is
not enabled _DSW method is never called and the port does not enter
D3cold properly consuming more power than necessary.
Users can work this around by writing "enabled" to "wakeup" sysfs file
under the device in question but that is not something an ordinary user
is expected to do.
Since we already automatically enable power management for PCIe ports
with ->bridge_d3 set extend that to enable wake for them as well,
assuming the port has any ACPI wakeup related objects implemented in the
namespace (adev->wakeup.flags.valid is true). This ensures the necessary
ACPI methods get called at appropriate times and allows the root port in
Thinkpad X1 Carbon 6th to go into D3cold.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit baecc470d5 ("PCI / PM: Skip bridges in pci_enable_wake()") changed
pci_enable_wake() so that all bridges are skipped when wakeup is enabled
(or disabled) with the reasoning that bridges can only signal wakeup on
behalf of their subordinate devices.
However, there are bridges that can signal wakeup themselves. For example
PCIe downstream and root ports supporting hotplug may signal wakeup upon
hotplug event.
For this reason change pci_enable_wake() so that it skips all bridges
except those that we power manage (->bridge_d3 is set). Those are the ones
that can go into low power states and may need to signal wakeup.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The spec has timing requirements when waiting for a link to become active
after a conventional reset. Implement those hard delays when waiting for
an active link so pciehp and dpc drivers don't need to duplicate this.
For devices that don't support data link layer active reporting, wait the
fixed time recommended by the PCIe spec.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Bring surprise removals and permanent failures together so we no longer
need separate flags. The implementation enforces that error handling will
not be able to override a surprise removal's permanent channel failure.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
A device still participates in error recovery even if it doesn't have
the error callbacks.
Always provide the status for user event watchers.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
There is no point in having a generic broadcast function if it needs to
have special cases for each callback it broadcasts.
Abstract the error broadcast to only the necessary information and removes
the now unnecessary helper to walk the bus.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Going primarily by:
https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
with additional information gleaned from other related pages; notably:
- Bonnell shrink was called Saltwell
- Moorefield is the Merriefield refresh which makes it Airmont
The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
for i in `git grep -l FAM6_ATOM` ; do
sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit ee1604381a ("PCI: mvebu: Only remap I/O space if configured") had
the side effect that the PCI I/O mapping was created much earlier than
before, at a point where the probe() of the driver could still fail. This
is for example a problem if one gets an -EPROBE_DEFER at some point during
probe(), after pci_ioremap_io() has been called.
Indeed, there is currently no function to undo what pci_ioremap_io() did,
and switching to pci_remap_iospace() is not an option in pci-mvebu due to
the need for special memory attributes on Armada 38x.
Reverting ee1604381a ("PCI: mvebu: Only remap I/O space if configured")
would be a possibility, but it would require also reverting 42342073e3
("PCI: mvebu: Convert to use pci_host_bridge directly"). So instead, we use
an open-coded version of pci_host_probe() that creates the PCI I/O mapping
at a point where we are guaranteed not to fail anymore.
Fixes: ee1604381a ("PCI: mvebu: Only remap I/O space if configured")
Reported-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Tested-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
The PCI kirin driver compilation produces the following section mismatch
warning:
WARNING: vmlinux.o(.text+0x4758cc): Section mismatch in reference from
the function kirin_pcie_probe() to the function
.init.text:kirin_add_pcie_port()
The function kirin_pcie_probe() references
the function __init kirin_add_pcie_port().
This is often because kirin_pcie_probe lacks a __init
annotation or the annotation of kirin_add_pcie_port is wrong.
Remove '__init' from kirin_add_pcie_port() to fix it.
Fixes: fc5165db24 ("PCI: kirin: Add HiSilicon Kirin SoC PCIe controller driver")
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
This save some duplication for ia64, and makes the interface more
general. In the long run we want each dma_map_ops instance to fill this
out, but this will take a little more prep work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PCIe r4.0, sec 7.5.1.1.4 defines a new bit in the Status Register:
Immediate Readiness – This optional bit, when Set, indicates the Function
is guaranteed to be ready to successfully complete valid configuration
accesses at any time following any reset that the host is capable of
issuing Configuration Requests to this Function.
When this bit is Set, for accesses to this Function, software is exempt
from all requirements to delay configuration accesses following any type
of reset, including but not limited to the timing requirements defined in
Section 6.6.
This means that all delays after a Conventional or Function Reset can be
skipped.
This patch reads such bit and caches its value in a flag inside struct
pci_dev to be checked later if we should delay or can skip delays after a
reset. While at that, also move the explicit msleep(100) call from
pcie_flr() and pci_af_flr() to pci_dev_wait().
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bhelgaas: rename PCI_STATUS_IMMEDIATE to PCI_STATUS_IMM_READY]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Test the correct value to see whether the PHY get failed.
Use devm_phy_get() instead of devm_phy_optional_get(), since it is
only called if phy name is given in devicetree and so should exist.
If failure when getting or linking PHY, put any PHYs which were
already got and unlink them.
Fixes: dfb8053469 ("PCI: cadence: Add generic PHY support to host and EP drivers")
Reported-by: Colin King <colin.king@canonical.com>
Signed-off-by: Alan Douglas <adouglas@cadence.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
On 38+ Intel-based ASUS products, the NVIDIA GPU becomes unusable after S3
suspend/resume. The affected products include multiple generations of
NVIDIA GPUs and Intel SoCs. After resume, nouveau logs many errors such
as:
fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04
[HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
DRM: failed to idle channel 0 [DRM]
Similarly, the NVIDIA proprietary driver also fails after resume (black
screen, 100% CPU usage in Xorg process). We shipped a sample to NVIDIA for
diagnosis, and their response indicated that it's a problem with the parent
PCI bridge (on the Intel SoC), not the GPU.
Runtime suspend/resume works fine, only S3 suspend is affected.
We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In the
cases that I checked, this register has value 0 and we just have to rewrite
that value.
Linux already saves and restores PCI config space during suspend/resume,
but this register was being skipped because upon resume, it already has
value 0 (the correct, pre-suspend value).
Intel appear to have previously acknowledged this behaviour and the
requirement to rewrite this register:
https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
Based on that, rewrite the prefetch register values even when that appears
unnecessary.
We have confirmed this solution on all the affected models we have in-hands
(X542UQ, UX533FD, X530UN, V272UN).
Additionally, this solves an issue where r8169 MSI-X interrupts were broken
after S3 suspend/resume on ASUS X441UAR. This issue was recently worked
around in commit 7bb05b85bc ("r8169: don't use MSI-X on RTL8106e"). It
also fixes the same issue on RTL6186evl/8111evl on an Aimfor-tech laptop
that we had not yet patched. I suspect it will also fix the issue that was
worked around in commit 7c53a72245 ("r8169: don't use MSI-X on
RTL8168g").
Thomas Martitz reports that this change also solves an issue where the AMD
Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive after S3
suspend/resume.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-By: Peter Wu <peter@lekensteyn.nl>
CC: stable@vger.kernel.org
HP 6730b laptop has an ethernet NIC connected to one of the PCIe root
ports. The root ports themselves are native PCIe hotplug capable. Now,
during boot after PCI devices are scanned the BIOS triggers ACPI bus check
directly to the NIC:
ACPI: \_SB_.PCI0.RP06.NIC_: Bus check in hotplug_event()
It is not clear why it is sending bus check but regardless the ACPI hotplug
notify handler calls enable_slot() directly (instead of going through
acpiphp_check_bridge() as there is no bridge), which ends up handling
special case for non-hotplug bridges with native PCIe hotplug. This
results a crash of some kind but the reporter only sees black screen so it
is hard to figure out the exact spot and what actually happens. Based on
a few fix proposals it was tracked to crash somewhere inside
pci_assign_unassigned_bridge_resources().
In any case we should not really be in that special branch at all because
the ACPI notify happened to a slot that is not a PCI bridge (it is just a
regular PCI device).
Fix this so that we only go to that special branch if we are calling
enable_slot() for a bridge (e.g., the ACPI notification was for the
bridge).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201127
Fixes: 84c8b58ed3 ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug")
Reported-by: Peter Anemone <peter.anemone@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v4.18+
If an Endpoint reported an error with ERR_FATAL, we previously ran driver
error recovery callbacks only for the Endpoint's driver. But if we reset a
Link to recover from the error, all downstream components are affected,
including the Endpoint, any multi-function peers, and children of those
peers.
Initiate the Link reset from the deepest Downstream Port that is
reliable, and call the error recovery callbacks for all its children.
If a Downstream Port (including a Root Port) reports an error, we assume
the Port itself is reliable and we need to reset its downstream Link. In
all other cases (Switch Upstream Ports, Endpoints, Bridges, etc), we assume
the Link leading to the component needs to be reset, so we initiate the
reset at the parent Downstream Port.
This allows two other clean-ups. First, we currently only use a Link
reset, which can only be initiated using a Downstream Port, so we can
remove checks for Endpoints. Second, the Downstream Port where we initiate
the Link reset is reliable (unlike components downstream from it), so the
special cases for error detect and resume are no longer necessary.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
We don't need to be paranoid about the topology changing while handling an
error. If the device has changed in a hotplug capable slot, we can rely on
the presence detection handling to react to a changing topology.
Restore the fatal error handling behavior that existed before merging DPC
with AER with 7e9084b367 ("PCI/AER: Handle ERR_FATAL with removal and
re-enumeration of devices").
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
It is a serious driver defect to enable MSI or MSI-X more than once. Doing
so may panic the kernel as in the stack trace below:
Call Trace:
sysfs_add_one+0xa5/0xd0
create_dir+0x7c/0xe0
sysfs_create_subdir+0x1c/0x20
internal_create_group+0x6d/0x290
sysfs_create_groups+0x4a/0xa0
populate_msi_sysfs+0x1cd/0x210
pci_enable_msix+0x31c/0x3e0
igbuio_pci_open+0x72/0x300 [igb_uio]
uio_open+0xcc/0x120 [uio]
chrdev_open+0xa1/0x1e0
[...]
do_sys_open+0xf3/0x1f0
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
---[ end trace 11042e2848880209 ]---
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa056b4fa
We want to keep the WARN_ON() and stack trace so the driver can be fixed,
but we can avoid the kernel panic by returning an error. We may still get
warnings like this:
Call Trace:
pci_enable_msix+0x3c9/0x3e0
igbuio_pci_open+0x72/0x300 [igb_uio]
uio_open+0xcc/0x120 [uio]
chrdev_open+0xa1/0x1e0
[...]
do_sys_open+0xf3/0x1f0
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:526 sysfs_add_one+0xa5/0xd0()
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:01:00.1/msi_irqs'
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
[bhelgaas: changelog, fix patch whitespace, remove !!]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Errata i870 is applicable in both EP and RC mode. Therefore rename
function dra7xx_pcie_ep_unaligned_memaccess(), that implements errata
workaround, to dra7xx_pcie_unaligned_memaccess() and call it for both RC
and EP. Make sure driver probe does not fail in case the workaround is not
applied for RC mode in order to maintain DT backward compatibility.
Reported-by: Chris Welch <Chris.Welch@viavisolutions.com>
Signed-off-by: Vignesh R <vigneshr@ti.com>
[lorenzo.pieralisi@arm.com: reworded the log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
PCI host drivers have already matched on compatible strings, so checking
device_type is redundant. Also, device_type is considered deprecated for
FDT though we've still been requiring it for PCI hosts as it is useful
for finding PCI buses.
Signed-off-by: Rob Herring <robh@kernel.org>
[lorenzo.pieralisi@arm.com: reformatted the log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Alan Douglas <adouglas@cadence.com>
Acked-by: Subrahmaya Lingappa <l.subrahmanya@mobiveil.co.in>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alan Douglas <adouglas@cadence.com>
Cc: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: linux-pci@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
iommu-map property is also used by devices with fsl-mc. This
patch moves the of_pci_map_rid to generic location, so that it
can be used by other busses too.
'of_pci_map_rid' is renamed here to 'of_map_rid' and there is no
functional change done in the API.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In case of error, the function pci_create_slot() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().
Fixes: a15f2c08c7 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The secondary bus reset may have link side effects that a hotplug capable
port may incorrectly react to. Use the slot specific reset for hotplug
ports, fixing the undesirable link down-up handling during error
recovering.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: fold in
https://lore.kernel.org/linux-pci/20180926152326.14821-1-keith.busch@intel.com
for issue reported by Stephen Rothwell <sfr@canb.auug.org.au>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
The AER driver has never read the config space of an endpoint that reported
a fatal error because the link to that device is considered unreliable.
An ERR_FATAL from an upstream port almost certainly indicates an error on
its upstream link, so we can't expect to reliably read its config space for
the same reason.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Error handling may be running in parallel with a hot removal. Reference
count the device during AER handling so the device can not be freed while
AER wants to reference it.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
When programming the inbound/outbound ATUs, we call usleep_range() after
each checking PCIE_ATU_ENABLE bit. Unfortunately, the ATU programming
can be executed in atomic context:
inbound ATU programming could be called through
pci_epc_write_header()
=>dw_pcie_ep_write_header()
=>dw_pcie_prog_inbound_atu()
outbound ATU programming could be called through
pci_bus_read_config_dword()
=>dw_pcie_rd_conf()
=>dw_pcie_prog_outbound_atu()
Fix this issue by calling mdelay() instead.
Fixes: f8aed6ec62 ("PCI: dwc: designware: Add EP mode support")
Fixes: d8bbeb39fb ("PCI: designware: Wait for iATU enable")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
[lorenzo.pieralisi@arm.com: commit log update]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
This patch provides DPC save and restore capabilities. This is necessary
for the driver to observe DPC events in the event the configuration space
needs to be restored after a reset.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
The port's config space may be cleared after a link reset, which wipes out
the bridge's bus and memory windows. Restore the config space that was
saved during probe so we can access downstream devices.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
The PCI port driver saves the PCI state after initializing the device with
the applicable service devices. This was, however, before the service
drivers were even registered because PCI probe happens before the
device_initcall initialized those service drivers. The config space state
that the services set up were not being saved. The end result would cause
PCI devices to not react to events that the drivers think they did if the
PCI state ever needed to be restored.
Fix this by changing the service drivers from using the init calls to
having the portdrv driver calling the services directly. This will get the
state saved as desired, while making the relationship between the port
driver and the services under it more explicit in the code.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
While refactoring the PCI hotplug core's API, I noticed a significant
amount of technical debt in some of the hotplug drivers. Document the
issues that caught my eye for starters.
I do not have hardware at my disposal that utilizes the listed drivers
and I think that's a prerequisite to work on them to ensure that no
regressions sneak in. But some of this hardware is so old that it may be
hard to come by. Obviously, it is fine to support old hardware, but the
drivers need to be maintained.
If noone steps up, perhaps we should consider sunsetting a few drivers
by moving them to staging. Based on my findings, ibmphp would be the
first candidate. I've found it fairly difficult to apply my API
refactorings to it and have listed some obvious bugs in the driver.
cpqphp is also in need of a modernization and would be a second
candidate for relegation to staging.
shpchp was introduced in the same commit as pciehp but hasn't benefited
from the same amount of refactoring due to the decline of conventional
PCI's relevance. Yet hardware supporting it may be more prevalent than
for the proprietary hotplug methods.
Per Documentation/process/2.Process.rst, "a TODO file should be present"
for drivers in staging. The file introduced by the present commit may
serve as a basis for this.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Dan Zink <dan.zink@hpe.com>
Cc: Prarit Bhargava <prarit@redhat.com>
When the PCI hotplug core and its first user, cpqphp, were introduced in
February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot
struct for its internal use plus a hotplug_slot struct to be registered
with the hotplug core and linked the two with pointers:
https://git.kernel.org/tglx/history/c/a8a2069f432c
Nowadays, the predominant pattern in the tree is to embed ("subclass")
such structures in one another and cast to the containing struct with
container_of(). But it wasn't until July 2002 that container_of() was
introduced with historic commit ec4f214232cf:
https://git.kernel.org/tglx/history/c/ec4f214232cf
pnv_php, introduced in 2016, did the right thing and embedded struct
hotplug_slot in its internal struct pnv_php_slot, but all other drivers
cargo-culted cpqphp's design and linked separate structs with pointers.
Embedding structs is preferrable to linking them with pointers because
it requires fewer allocations, thereby reducing overhead and simplifying
error paths. Casting an embedded struct to the containing struct
becomes a cheap subtraction rather than a dereference. And having fewer
pointers reduces the risk of them pointing nowhere either accidentally
or due to an attack.
Convert all drivers to embed struct hotplug_slot in their internal slot
struct. The "private" pointer in struct hotplug_slot thereby becomes
unused, so drop it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Ever since the PCI hotplug core was introduced in 2002, drivers had to
allocate and register a struct hotplug_slot_info for every slot:
https://git.kernel.org/tglx/history/c/a8a2069f432c
Apparently the idea was that drivers furnish the hotplug core with an
up-to-date card presence status, power status, latch status and
attention indicator status as well as notify the hotplug core of changes
thereof. However only 4 out of 12 hotplug drivers bother to notify the
hotplug core with pci_hp_change_slot_info() and the hotplug core never
made any use of the information: There is just a single macro in
pci_hotplug_core.c, GET_STATUS(), which uses the hotplug_slot_info if
the driver lacks the corresponding callback in hotplug_slot_ops. The
macro is called when the user reads the attribute via sysfs.
Now, if the callback isn't defined, the attribute isn't exposed in sysfs
in the first place (see e.g. has_power_file()). There are only two
situations when the hotplug_slot_info would actually be accessed:
* If the driver defines ->enable_slot or ->disable_slot but not
->get_power_status.
* If the driver defines ->set_attention_status but not
->get_attention_status.
There is no driver doing the former and just a single driver doing the
latter, namely pnv_php.c. Amend it with a ->get_attention_status
callback. With that, the hotplug_slot_info becomes completely unused by
the PCI hotplug core. But a few drivers use it internally as a cache:
cpcihp uses it to cache the latch_status and adapter_status.
cpqhp uses it to cache the adapter_status.
pnv_php and rpaphp use it to cache the attention_status.
shpchp uses it to cache all four values.
Amend these drivers to cache the information in their private slot
struct. shpchp's slot struct already contains members to cache the
power_status and adapter_status, so additional members are only needed
for the other two values. In the case of cpqphp, the cached value is
only accessed in a single place, so instead of caching it, read the
current value from the hardware.
Caution: acpiphp, cpci, cpqhp, shpchp, asus-wmi and eeepc-laptop
populate the hotplug_slot_info with initial values on probe. That code
is herewith removed. There is a theoretical chance that the code has
side effects without which the driver fails to function, e.g. if the
ACPI method to read the adapter status needs to be executed at least
once on probe. That seems unlikely to me, still maintainers should
review the changes carefully for this possibility.
Rafael adds: "I'm not aware of any case in which it will break anything,
[...] but if that happens, it may be necessary to add the execution of
the control methods in question directly to the initialization part."
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Hotplug drivers cannot declare their hotplug_slot_ops const, making them
attractive targets for attackers, because upon registration of a hotplug
slot, __pci_hp_initialize() writes to the "owner" and "mod_name" members
in that struct.
Fix by moving these members to struct hotplug_slot and constify every
driver's hotplug_slot_ops except for pciehp.
pciehp constructs its hotplug_slot_ops at runtime based on the PCIe
port's capabilities, hence cannot declare them const. It can be
converted to __write_rarely once that's mainlined:
http://www.openwall.com/lists/kernel-hardening/2016/11/16/3
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
The members in pciehp's controller struct are arranged in a seemingly
arbitrary order and have grown to an amount that I no longer consider
easily graspable by contributors.
Sort the members into 5 rubrics:
* Slot Capabilities register and quirks
* Slot Control register access
* Slot Status register event handling
* state machine
* hotplug core interface
Obviously, this is just my personal bikeshed color and if anyone has a
better idea, please come forward. Any ordering will do as long as the
information is presented in a manageable manner.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Of the members which were just moved from pciehp's slot struct to the
controller struct, rename "lock" to "state_lock" and rename "work" to
"button_work" for clarity. Perform the rename separately to the
unification of the two structs per Sinan's request.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sinan Kaya <okaya@kernel.org>
pciehp was originally introduced together with shpchp in a single
commit, c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI Express hot-plug
drivers"):
https://git.kernel.org/tglx/history/c/c16b4b14d980
shpchp supports up to 31 slots per controller, hence uses separate slot
and controller structs. pciehp has a 1:1 relationship between slot and
controller and therefore never required this separation. Nevertheless,
because much of the code had been copy-pasted between the two drivers,
pciehp likewise uses separate structs to this very day.
The artificial separation of data structures adds unnecessary complexity
and bloat to pciehp and requires constantly chasing pointers at runtime.
Simplify the driver by merging struct slot into struct controller.
Merge the slot constructor pcie_init_slot() and the destructor
pcie_cleanup_slot() into the controller counterparts.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The WiGig Bus Extension (WBE) specification allows tunneling PCIe over
IEEE 802.11. A product implementing this spec is the wil6210 from
Wilocity (now part of Qualcomm Atheros). It integrates a PCIe switch
with a wireless network adapter:
00.0-+ [1ae9:0101] Upstream Port
+-00.0-+ [1ae9:0200] Downstream Port
| +-00.0 [168c:0034] Atheros AR9462 Wireless Network Adapter
+-02.0 [1ae9:0201] Downstream Port
+-03.0 [1ae9:0201] Downstream Port
Wirelessly attached devices presumably appear below the hotplug ports
with device ID [1ae9:0201]. Oddly, the Downstream Port [1ae9:0200]
leading to the wireless network adapter is likewise Hotplug Capable,
but has its Presence Detect State bit hardwired to zero. Even if the
Link Active bit is set, Presence Detect is zero, so this cannot be
caused by in-band presence detection but only by broken hardware.
pciehp assumes an empty slot if Presence Detect State is zero,
regardless of Link Active being one. Consequently, up until v4.18 it
removes the wireless network adapter in pciehp_resume(). From v4.19 it
already does so in pciehp_probe().
Be lenient towards broken hardware and assume the slot is occupied if
Link Active is set: Introduce pciehp_card_present_or_link_active()
and use it in lieu of pciehp_get_adapter_status() everywhere, except
in pciehp_handle_presence_or_link_change() whose log messages depend
on which of Presence Detect State or Link Active is set.
Remove the Presence Detect State check from __pciehp_enable_slot()
because it is only called if either of Presence Detect State or Link
Active is set.
Caution: There is a possibility that broken hardware exists which has
working Presence Detect but hardwires Link Active to one. On such
hardware the slot will now incorrectly be considered always occupied.
If such hardware is discovered, this commit can be rolled back and a
quirk can be added which sets is_hotplug_bridge = 0 for [1ae9:0200].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200839
Reported-and-tested-by: David Yang <mmyangfl@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rajat Jain <rajatja@google.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Now that ASPM is configured for *all* PCIe devices at boot, a problem is
seen with systems that set the FADT NO_ASPM bit. This bit indicates that
the OS should not alter the ASPM state, but when
pcie_aspm_init_link_state() runs it only checks for !aspm_support_enabled.
This misses the ACPI_FADT_NO_ASPM case because that is setting
aspm_disabled.
The result is systems may hang at boot after 1302fcf; avoidable if they
boot with pcie_aspm=off (sets !aspm_support_enabled).
Fix this by having aspm_init_link_state() check for either
!aspm_support_enabled or acpm_disabled.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201001
Fixes: 1302fcf0d0 ("PCI: Configure *all* devices, not just hot-added ones")
Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Check return value of devm_pci_remap_iospace().
Addresses-Coverity-ID: 1471965 ("Unchecked return value")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Honghui Zhang <honghui.zhang@mediatek.com>
The driver does not cope with the fact that probe can fail in a number
of cases after enabling runtime PM on the device; this results in
warnings about "Unbalanced pm_runtime_enable". Furthermore if probe
fails after invoking qcom_pcie_host_init() the power-domain will be left
referenced.
As it is not possible for the error handling in qcom_pcie_host_init() to
handle errors happening after returning from that function the
pm_runtime_get_sync() is moved to qcom_pcie_probe() as well.
Fixes: 854b69efbd ("PCI: qcom: add runtime pm support to pcie_port")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com>
On imx7d the pcie-phy power domain is turned off in suspend and this can
make the system hang after resume when attempting any read from PCI.
Fix this by adding minimal suspend/resume code. This will prepare for
powering down on suspend and reset the block on resume.
Code is only for imx7d but a very similar sequence can be used for
other SOCs.
Original-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
[lorenzo.pieralisi@arm.com: commit log update]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
The power up defaults of the MPLL are designed for the standard 125MHz
refclock derived from the ENET PLL. As this clock has a jitter that
violates the PCIe Gen2 timing requirements, some board designs use
an external reference clock generator. Those clock generators may
output a clock at a different rate than what the MPLL expects
(usually a 100MHz clock, to re-use the PCIe bus clock).
In that case the MPLL must be reconfigured via overrides to use
different refclock dividers and loop multipliers. The i.MX6
reference manual lists both 100MHz and 200MHz as supported refclock
rates and the associated mult and div values.
Only the 100MHz setup has been tested on a real board, but since the
200MHz setup only differs in the used pre-divider it seems safe to
add it now.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Richard Zhu <hongxing.zhu@nxp.com>
Fix previous incorrect logic that limits PAXC slot number to zero only.
In order for SRIOV/VF to work, we need to allow the slot number to be
greater than zero.
Fixes: 46560388c4 ("PCI: iproc: Allow multiple devices except on PAXC")
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Dave writes:
"Various fixes, all over the place:
1) OOB data generation fix in bluetooth, from Matias Karhumaa.
2) BPF BTF boundary calculation fix, from Martin KaFai Lau.
3) Don't bug on excessive frags, to be compatible in situations mixing
older and newer kernels on each end. From Juergen Gross.
4) Scheduling in RCU fix in hv_netvsc, from Stephen Hemminger.
5) Zero keying information in TLS layer before freeing copies
of them, from Sabrina Dubroca.
6) Fix NULL deref in act_sample, from Davide Caratti.
7) Orphan SKB before GRO in veth to prevent crashes with XDP,
from Toshiaki Makita.
8) Fix use after free in ip6_xmit, from Eric Dumazet.
9) Fix VF mac address regression in bnxt_en, from Micahel Chan.
10) Fix MSG_PEEK behavior in TLS layer, from Daniel Borkmann.
11) Programming adjustments to r8169 which fix not being to enter deep
sleep states on some machines, from Kai-Heng Feng and Hans de
Goede.
12) Fix DST_NOCOUNT flag handling for ipv6 routes, from Peter
Oskolkov."
* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (45 commits)
net/ipv6: do not copy dst flags on rt init
qmi_wwan: set DTR for modems in forced USB2 mode
clk: x86: Stop marking clocks as CLK_IS_CRITICAL
r8169: Get and enable optional ether_clk clock
clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail
r8169: enable ASPM on RTL8106E
r8169: Align ASPM/CLKREQ setting function with vendor driver
Revert "kcm: remove any offset before parsing messages"
kcm: remove any offset before parsing messages
net: ethernet: Fix a unused function warning.
net: dsa: mv88e6xxx: Fix ATU Miss Violation
tls: fix currently broken MSG_PEEK behavior
hv_netvsc: pair VF based on serial number
PCI: hv: support reporting serial number as slot information
bnxt_en: Fix VF mac address regression.
ipv6: fix possible use-after-free in ip6_xmit()
net: hp100: fix always-true check for link up state
ARM: dts: at91: add new compatibility string for macb on sama5d3
net: macb: disable scatter-gather for macb on sama5d3
net: mvpp2: let phylink manage the carrier state
...
Remove a set but unused variable in quirks.c. Fixes warning:
variable ‘mmio_sys_info’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Joshua Abraham <j.abraham1776@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides
that it returns pointer of bitmap type ("unsigned long *") instead of the
opaque "void *".
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pciehp's ->enable_slot, ->disable_slot, ->get_attention_status and
->reset_slot callbacks are currently implemented by wrapper functions
that do nothing else but call down to a backend function. The backends
are not called from anywhere else, so drop the wrappers and use the
backends directly as callbacks, thereby shaving off a few lines of
unnecessary code.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Drop the following includes from pciehp source files which no longer use
any of the included symbols:
* <linux/sched/signal.h> in pciehp.h
<linux/signal.h> in pciehp_hpc.c
Added by commit de25968cc8 ("fix more missing includes") to
accommodate for a call to signal_pending().
The call was removed by commit 262303fe32 ("pciehp: fix wait command
completion").
* <linux/interrupt.h> in pciehp_core.c
Added by historic commit f308a2dfbe63 ("PCI: add PCI Express Port Bus
Driver subsystem") to accommodate for a call to free_irq():
https://git.kernel.org/tglx/history/c/f308a2dfbe63
The call was removed by commit 407f452b05 ("pciehp: remove
unnecessary free_irq").
* <linux/time.h> in pciehp_core.c and pciehp_hpc.c
Added by commit 34d03419f0 ("PCIEHP: Add Electro Mechanical
Interlock (EMI) support to the PCIE hotplug driver."),
which was reverted by commit bd3d99c170 ("PCI: Remove untested
Electromechanical Interlock (EMI) support in pciehp.").
* <linux/module.h> in pciehp_ctrl.c, pciehp_hpc.c and pciehp_pci.c
Added by historic commit c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI
Express hot-plug drivers"):
https://git.kernel.org/tglx/history/c/c16b4b14d980
Module-related symbols were neither used back then in those files,
nor are they used today.
* <linux/slab.h> in pciehp_ctrl.c
Added by commit 5a0e3ad6af ("include cleanup: Update gfp.h and
slab.h includes to prepare for breaking implicit slab.h inclusion from
percpu.h") to accommodate for calls to kmalloc().
The calls were removed by commit 0e94916e60 ("PCI: pciehp: Handle
events synchronously").
* "../pci.h" in pciehp_ctrl.c
Added by historic commit 67f4660b72f2 ("PCI: ASPM patch for") to
accommodate for usage of the global variable pcie_mch_quirk:
https://git.kernel.org/tglx/history/c/67f4660b72f2
The global variable was removed by commit 0ba379ec0f ("PCI: Simplify
hotplug mch quirk").
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When removing PCI devices below a hotplug bridge, pciehp marks them as
disconnected if the card is no longer present in the slot or it quiesces
them if the card is still present (by disabling INTx interrupts, bus
mastering and SERR# reporting).
To detect whether the card is still present, pciehp checks the Presence
Detect State bit in the Slot Status register. The problem with this
approach is that even if the card is present, the link to it may be
down, and it that case it would be better to mark the devices as
disconnected instead of trying to quiesce them. Moreover, if the card
in the slot was quickly replaced by another one, the Presence Detect
State bit would be set, yet trying to quiesce the new card's devices
would be wrong and the correct thing to do is to mark the previous
card's devices as disconnected.
Instead of looking at the Presence Detect State bit, it is better to
differentiate whether the card was surprise removed versus safely
removed (via sysfs or an Attention Button press). On surprise removal,
the devices should be marked as disconnected, whereas on safe removal it
is correct to quiesce the devices.
The knowledge whether a surprise removal or a safe removal is at hand
does exist further up in the call stack: A surprise removal is
initiated by pciehp_handle_presence_or_link_change(), a safe removal by
pciehp_handle_disable_request().
Pass that information down to pciehp_unconfigure_device() and use it in
lieu of the Presence Detect State bit. While there, add kernel-doc to
pciehp_unconfigure_device() and pciehp_configure_device().
Tested-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <keith.busch@intel.com>
Commit 89ee9f7680 ("PCI: Add device disconnected state") iterates over
the devices on a parent bus, marks each as disconnected, then marks
each device's children as disconnected using pci_walk_bus().
The same can be achieved more succinctly by calling pci_walk_bus() on
the parent bus. Moreover, this does not need to wait until acquiring
pci_lock_rescan_remove(), so move it out of that critical section.
The critical section in err.c contains a pci_dev_get() / pci_dev_put()
pair which was apparently copy-pasted from pciehp_pci.c. In the latter
it serves the purpose of holding the struct pci_dev in place until the
Command register is updated. err.c doesn't do anything like that, hence
the pair is unnecessary. Remove it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Oza Pawandeep <poza@codeaurora.org>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ACPI specification allows you to provide _PXM entries for devices based
on their location on a particular bus. Let us use that if it is provided
rather than just assuming it makes sense to put the device into the
proximity domain of the root.
An example DSDT entry that will supply this is:
Device (PCI2)
{
Name (_HID, "PNP0A08") // PCI Express Root Bridge
Name (_CID, "PNP0A03") // Compatible PCI Root Bridge
Name(_SEG, 2) // Segment of this Root complex
Name(_BBN, 0xF8) // Base Bus Number
Name(_CCA, 1)
Method (_PXM, 0, NotSerialized) {
Return(0x00)
}
...
Device (BRI0) {
Name (_HID, "19E51610")
Name (_ADR, 0)
Name (_BBN, 0xF9)
Device (CAR0) {
Name (_HID, "97109912")
Name (_ADR, 0)
Method (_PXM, 0, NotSerialized) {
Return(0x02)
}
}
}
}
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Upon removal of the last device on a bus, the link_state of the bridge
leading to that bus is sought to be torn down by having pci_stop_dev()
call pcie_aspm_exit_link_state().
When ASPM was originally introduced by commit 7d715a6c1a ("PCI: add
PCI Express ASPM support"), it determined whether the device being
removed is the last one by calling list_empty() on the bridge's
subordinate devices list. That didn't work because the device is only
removed from the list slightly later in pci_destroy_dev().
Commit 3419c75e15 ("PCI: properly clean up ASPM link state on device
remove") attempted to fix it by calling list_is_last(), but that's not
correct either because it checks whether the device is at the *end* of
the list, not whether it's the last one *left* in the list. If the user
removes the device which happens to be at the end of the list via sysfs
but other devices are preceding the device in the list, the link_state
is torn down prematurely.
The real fix is to move the invocation of pcie_aspm_exit_link_state() to
pci_destroy_dev() and reinstate the call to list_empty(). Remove a
duplicate check for dev->bus->self because pcie_aspm_exit_link_state()
already contains an identical check.
Fixes: 7d715a6c1a ("PCI: add PCI Express ASPM support")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: stable@vger.kernel.org # v2.6.26
The Hyper-V host API for PCI provides a unique "serial number" which
can be used as basis for sysfs PCI slot table. This can be useful
for cases where userspace wants to find the PCI device based on
serial number.
When an SR-IOV NIC is added, the host sends an attach message
with serial number. The kernel doesn't use the serial number, but
it is useful when doing the same thing in a userspace driver such
as the DPDK. By having /sys/bus/pci/slots/N it provides a direct
way to find the matching PCI device.
There maybe some cases where serial number is not unique such
as when using GPU's. But the PCI slot infrastructure will handle
that.
This has a side effect which may also be useful. The common udev
network device naming policy uses the slot information (rather
than PCI address).
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the eetlp_prefix_path on PCIE_EXP_TYPE_RC_END devices to allow PASID
to be enabled on them. This fixes IOMMUv2 initialization on AMD Carrizo
APUs.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201079
Fixes: 7ce3f912ae ("PCI: Enable PASID only if entire path supports End-End TLP prefixes")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Calling into the new API to reset the secondary bus results in a deadlock.
This occurs because the device/bus is already locked at probe time.
Reverting back to the old behavior while the API is improved.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
Fixes: c6a44ba950 ("PCI: Rename pci_try_reset_bus() to pci_reset_bus()")
Fixes: 409888e096 ("IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
The pci_reset_bus() function calls pci_probe_reset_slot() to determine
whether to call the slot or bus reset. The check has faulty logic in that
it does not account for pci_probe_reset_slot() being able to return an
errno. Fix by only calling the slot reset when the function returns 0.
Fixes: 811c5cb37d ("PCI: Unify try slot and bus reset API")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
If both hot-add and power fault were observed in a single interrupt, we
handled the hot-add first, then the power fault, in this path:
pciehp_ist
if (events & (PDC | DLLSC))
pciehp_handle_presence_or_link_change
case OFF_STATE:
pciehp_enable_slot
__pciehp_enable_slot
board_added
pciehp_power_on_slot
ctrl->power_fault_detected = 0
pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC)
pciehp_green_led_on(p_slot) # power LED on
pciehp_set_attention_status(p_slot, 0) # attention LED off
if ((events & PFD) && !ctrl->power_fault_detected)
ctrl->power_fault_detected = 1
pciehp_set_attention_status(1) # attention LED on
pciehp_green_led_off(slot) # power LED off
This left the attention indicator on (even though the hot-add succeeded)
and the power indicator off (even though the slot power was on).
Fix this by checking for power faults before checking for new devices.
Prior to 0e94916e60, this was successful because everything was chained
through work queues and the order was:
INT_PRESENCE_ON -> INT_POWER_FAULT -> ENABLE_REQ
The ENABLE_REQ cleared the power fault at the end, but now everything is
handled inline with the interrupt thread, such that the work ENABLE_REQ was
doing happens before power fault handling now.
Fixes: 0e94916e60 ("PCI: pciehp: Handle events synchronously")
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
p.port can is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/pci/switch/switchtec.c:912 ioctl_port_to_pff() warn: potential spectre issue 'pcfg->dsp_pff_inst_id' [r]
Fix this by sanitizing p.port before using it to index
pcfg->dsp_pff_inst_id
Notice that given that speculation windows are large, the policy is to kill
the speculation on the first load and not worry if it can be completed with
a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Logan Gunthorpe <logang@deltatee.com>
Cc: stable@vger.kernel.org
Merge more updates from Andrew Morton:
- the rest of MM
- procfs updates
- various misc things
- more y2038 fixes
- get_maintainer updates
- lib/ updates
- checkpatch updates
- various epoll updates
- autofs updates
- hfsplus
- some reiserfs work
- fatfs updates
- signal.c cleanups
- ipc/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits)
ipc/util.c: update return value of ipc_getref from int to bool
ipc/util.c: further variable name cleanups
ipc: simplify ipc initialization
ipc: get rid of ids->tables_initialized hack
lib/rhashtable: guarantee initial hashtable allocation
lib/rhashtable: simplify bucket_table_alloc()
ipc: drop ipc_lock()
ipc/util.c: correct comment in ipc_obtain_object_check
ipc: rename ipcctl_pre_down_nolock()
ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
ipc: reorganize initialization of kern_ipc_perm.seq
ipc: compute kern_ipc_perm.id under the ipc lock
init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
adfs: use timespec64 for time conversion
kernel/sysctl.c: fix typos in comments
drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
fork: don't copy inconsistent signal handler state to child
signal: make get_signal() return bool
signal: make sigkill_pending() return bool
...
Allow the PCI quirk tables to be emitted in a way that avoids absolute
references to the hook functions. This reduces the size of the entries,
and, more importantly, makes them invariant under runtime relocation
(e.g., for KASLR)
Link: http://lkml.kernel.org/r/20180704083651.24360-6-ard.biesheuvel@linaro.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Make the idle loop handle stopped scheduler tick correctly (Rafael
Wysocki).
- Prevent the menu cpuidle governor from letting CPUs spend too much
time in shallow idle states when it is invoked with scheduler tick
stopped and clean it up somewhat (Rafael Wysocki).
- Avoid invoking the platform firmware to make the platform enter
the ACPI S3 sleep state with suspended PCIe root ports which may
confuse the firmware and cause it to crash (Rafael Wysocki).
- Fix sysfs-related race in the ondemand and conservative cpufreq
governors which may cause the system to crash if the governor
module is removed during an update of CPU frequency limits (Henry
Willard).
- Select SRCU when building the system wakeup framework to avoid a
build issue in it (zhangyi).
- Make the descriptions of ACPI C-states vendor-neutral to avoid
confusion (Prarit Bhargava).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbfSnVAAoJEILEb/54YlRxBn4QAKQ8PqkSYkBby+1hb90ET4dk
VaLkbCYXuzLK5rIDvnbYOALhVKo4B29Ex5GdCLN7cWkZMkrVKe7oX8QQTnp3/7lF
URjTKgTNec5uJG652PrE3ESAa3X/kYggj6aeQOxDR4iYKzcpJEQ92ekFW+SoJTNp
Jc2kZh3qkC2On64GB3ibsZaKnmHfPvLg0t4agwzuYq/Gff8NRJFk7kMwAPzqGzZo
b2UVRcYFWIRkJjgmU9iInoeHIY8mBdT3IiKwTemZP1dOhb5T1AHOXwGTk6/cS+RH
A9qx4eg7I3R00KmnYvO8WytYJeOu2qb83GIUx4fIJGOqfvevm5xkxB9F+nfE+ouj
ROBqO4+X4XfQGPw8slayg0rJjI9JSkXLnLdl0Qw2WRlbc4/fVWntra1C57EeKFBR
EG9UAF9+7nUUx0bOCLsfFF3+r9R3SDUjk7b4thyhYncyQRsYC+FL7ztlxnMzVtzW
M5SF2sPrpcQzqmcszdUdbESI10n5X8m/crJW4rsbTxBpAM+coO+uLcvHWOY4MpkW
BgBsR6bMDAlG/VlTFgeeP/tkCRd5zNlJi7yBFItXuOoVKXpnHCJuxq2WkZ1Rb74M
Gk1d3TduekHJm8VsLEdCJR/tEk1cMc0zVUD/a1yzI4Z21QxvXUCqMDdws4/Ey184
qmKgNR9R94vSC5xIPRhM
=9GrU
-----END PGP SIGNATURE-----
Merge tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These fix the main idle loop and the menu cpuidle governor, clean up
the latter, fix a mistake in the PCI bus type's support for system
suspend and resume, fix the ondemand and conservative cpufreq
governors, address a build issue in the system wakeup framework and
make the ACPI C-states desciptions less confusing.
Specifics:
- Make the idle loop handle stopped scheduler tick correctly (Rafael
Wysocki).
- Prevent the menu cpuidle governor from letting CPUs spend too much
time in shallow idle states when it is invoked with scheduler tick
stopped and clean it up somewhat (Rafael Wysocki).
- Avoid invoking the platform firmware to make the platform enter the
ACPI S3 sleep state with suspended PCIe root ports which may
confuse the firmware and cause it to crash (Rafael Wysocki).
- Fix sysfs-related race in the ondemand and conservative cpufreq
governors which may cause the system to crash if the governor
module is removed during an update of CPU frequency limits (Henry
Willard).
- Select SRCU when building the system wakeup framework to avoid a
build issue in it (zhangyi).
- Make the descriptions of ACPI C-states vendor-neutral to avoid
confusion (Prarit Bhargava)"
* tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: menu: Handle stopped tick more aggressively
sched: idle: Avoid retaining the tick when it has been stopped
PCI / ACPI / PM: Resume all bridges on suspend-to-RAM
cpuidle: menu: Update stale polling override comment
cpufreq: governor: Avoid accessing invalid governor_data
x86/ACPI/cstate: Make APCI C1 FFH MWAIT C-state description vendor-neutral
cpuidle: menu: Fix white space
PM / sleep: wakeup: Fix build error caused by missing SRCU support
Commit 26112ddc25 (PCI / ACPI / PM: Resume bridges w/o drivers on
suspend-to-RAM) attempted to fix a functional regression resulting
from commit c62ec4610c (PM / core: Fix direct_complete handling
for devices with no callbacks) by resuming PCI bridges without
drivers (that is, "parallel PCI" ones) during system-wide suspend if
the target system state is not ACPI S0 (working state).
That turns out insufficient, however, as it is reported that, at
least in one case, the platform firmware gets confused if a PCIe
root port is suspended before entering the ACPI S3 sleep state.
That issue was exposed by commit 77b3729ca03 (PCI / PM: Use
SMART_SUSPEND and LEAVE_SUSPENDED flags for PCIe ports) that allowed
PCIe ports to stay in runtime suspend during system-wide suspend
(which is OK for suspend-to-idle, but turns out to be problematic
otherwise).
For this reason, drop the driver check from acpi_pci_need_resume()
and resume all bridges (including PCIe ports with drivers) during
system-wide suspend if the target system state is not ACPI S0.
[If the target system state is ACPI S0, it means suspend-to-idle
and the platform firmware is not going to be invoked to actually
suspend the system, so there is no need to resume the bridges in
that case.]
Fixes: 77b3729ca03 (PCI / PM: Use SMART_SUSPEND and LEAVE_SUSPENDED flags for PCIe ports)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200675
Reported-by: teika kazura <teika@gmx.com>
Tested-by: teika kazura <teika@gmx.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: 4.16+ <stable@vger.kernel.org> # 4.16+: 26112ddc25 (PCI / ACPI / PM: Resume bridges ...)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlt1f9AUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxbdhAArnhRvkwOk4m4/LCuKF6HpmlxbBNC
TjnBCenNf+lFXzWskfDFGFl/Wif4UzGbRTSCNQrwMzj3Ww3f/6R2QIq9rEJvyNC4
VdxQnaBEZSUgN87q5UGqgdjMTo3zFvlFH6fpb5XDiQ5IX/QZeXeYqoB64w+HvKPU
M+IsoOvnA5gb7pMcpchrGUnSfS1e6AqQbbTt6tZflore6YCEA4cH5OnpGx8qiZIp
ut+CMBvQjQB01fHeBc/wGrVte4NwXdONrXqpUb4sHF7HqRNfEh0QVyPhvebBi+k1
kquqoBQfPFTqgcab31VOcQhg70dEx+1qGm5/YBAwmhCpHR/g2gioFXoROsr+iUOe
BtF6LZr+Y8cySuhJnkCrJBqWvvBaKbJLg0KMbI+7p4o9MZpod2u7LS5LFrlRDyKW
3nz3o+b1+v3tCCKVKIhKo0ljolgkweQtR1f6KIHvq93wBODHVQnAOt9NlPfHVyks
ryGBnOhMjoU5hvfexgIWFk9Ph9MEVQSffkI+TeFPO/tyGBfGfQyGtESiXuEaMQaH
FGdZHX2RLkY3pWHOtWeMzRHzOnr2XjpDFcAqL3HBGPdJ30K3Umv3WOgoFe2SaocG
0gaddPjKSwwM4Sa/VP+O5cjGuzi7QnczSDdpYjxIGZzBav32hqx4/rsnLw7bHH8y
XkEme7cYJc8MGsA=
=2Dmn
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
- Defer DPC event handling to work queue (Keith Busch)
- Use threaded IRQ for DPC bottom half (Keith Busch)
- Print AER status while handling DPC events (Keith Busch)
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a
device below it (Myron Stowe)
- Simplify SHPC existence/permission checks (Bjorn Helgaas)
- Remove hotplug sample skeleton driver (Lukas Wunner)
- Convert pciehp to threaded IRQ handling (Lukas Wunner)
- Improve pciehp tolerance of missed events and initially unstable
links (Lukas Wunner)
- Clear spurious pciehp events on resume (Lukas Wunner)
- Add pciehp runtime PM support, including for Thunderbolt controllers
(Lukas Wunner)
- Support interrupts from pciehp bridges in D3hot (Lukas Wunner)
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph
Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is
supplied (Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum
(Jakub Kicinski)
- Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)
- Document ACPI description of PCI host bridges (Bjorn Helgaas)
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)
- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)
- To avoid bus errors, enable PASID only if entire path supports
End-End TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD
Controller (Bjorn Helgaas)
- Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)
- Remove Aardvark outbound window configuration (Evan Wang)
- Fix Aardvark bridge window sizing issue (Zachary Zhang)
- Convert Aardvark to use pci_host_probe() to reduce code duplication
(Thomas Petazzoni)
- Correct the Cadence cdns_pcie_writel() signature (Alan Douglas)
- Add Cadence support for optional generic PHYs (Alan Douglas)
- Add Cadence power management ops (Alan Douglas)
- Remove redundant variable from Cadence driver (Colin Ian King)
- Add Kirin MSI support (Xiaowei Song)
- Drop unnecessary root_bus_nr setting from exynos, imx6, keystone,
armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn
Guo)
- Move link notification settings from DesignWare core to individual
drivers (Gustavo Pimentel)
- Add endpoint library MSI-X interfaces (Gustavo Pimentel)
- Correct signature of endpoint library IRQ interfaces (Gustavo
Pimentel)
- Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel)
- Add endpoint library MSI-X test support (Gustavo Pimentel)
- Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation
(Jia-Ju Bai)
- Add more devices to Broadcom PAXC quirk (Ray Jui)
- Work around corrupted Broadcom PAXC config space to enable SMMU and
GICv3 ITS (Ray Jui)
- Disable MSI parsing to work around broken Broadcom PAXC logic in some
devices (Ray Jui)
- Hide unconfigured functions to work around a Broadcom PAXC defect
(Ray Jui)
- Lower iproc log level to reduce console output during boot (Ray Jui)
- Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi)
- Fix mobiveil missing include file (Lorenzo Pieralisi)
- Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi)
- Fix mvebu I/O space remapping issues (Thomas Petazzoni)
- Use generic pci_host_bridge in mvebu instead of ARM-specific API
(Thomas Petazzoni)
- Whitelist VMD devices with fast interrupt handlers to avoid sharing
vectors with slow handlers (Keith Busch)
* tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits)
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI: Limit config space size for Netronome NFP5000
PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Check for PCIe Link downtraining
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbc41pAAoJEAx081l5xIa+ZrAP/AzKj4i4pBLVJcvNZ2BwD+UD
ZNSNj2iqCJ5+Jo/WtIwQ8tLct9UqfVssUwBke6tZksiLdTigGPTUyVIAdK+9kyWD
D00m3x/pToJrSF2D0FwxQlPUtPkohp9N+E6+TU7gd1oCasZfBzmcEpoVAmZf+NWE
kN1xXpmGxZWpu0wc7JA2lv9MuUTijCwIqJqa5E0bB3z06G5mw+PJ89kYzMx19OyA
ZYQK8y3A40ZGl8UbajZ4xg9pqFCRYFFHGqfYlpUWWTh0XMAXu8+Yqzh3dJxmak7r
4u2pdQBsxPMZO8qKBHpVvI7Zhoe0Ntnolc0XVD+2IbqqnTprVbQs0bWf3YyfUlQi
1/9bWFK67W0LEuzac6M7a7EQqFNiHF13Btao7aqENTIe/GaCZJoopaiRMAmh6EHD
4PezeYqrW8cSaPj6OKouL1BhW9Bjixsg0bvjS/uB6m4KekFCt1++BDFGzkqvm6Mo
SVW7nkJoCFpCASaR7DhUEOPexaHeJ65HCDDUvYdqz9jd2w1TgvvanEZWual1NwEm
ImA8A4wGZ/3KijpyyKm0gE96RX7+zMMZ3brW6p1vhUUKVYJCrvSr5jrXH5+2k6Aw
Y455doGL87IRkwyje/YbQF0I8pbUZD9QS5wII13tLGwOH9/uC/Xl6dHNM40gtqyh
W4gEdY+NAMJmYLvRNawa
=g9rD
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 4.19.
Rob has some new hardware support for new qualcomm hw that I'll send
along separately. This has the display part of it, the remaining pull
is for the acceleration engine.
This also contains a wound-wait/wait-die mutex rework, Peter has acked
it for merging via my tree.
Otherwise mostly the usual level of activity. Summary:
core:
- Wound-wait/wait-die mutex rework
- Add writeback connector type
- Add "content type" property for HDMI
- Move GEM bo to drm_framebuffer
- Initial gpu scheduler documentation
- GPU scheduler fixes for dying processes
- Console deferred fbcon takeover support
- Displayport support for CEC tunneling over AUX
panel:
- otm8009a panel driver fixes
- Innolux TV123WAM and G070Y2-L01 panel driver
- Ilitek ILI9881c panel driver
- Rocktech RK070ER9427 LCD
- EDT ETM0700G0EDH6 and EDT ETM0700G0BDH6
- DLC DLC0700YZG-1
- BOE HV070WSA-100
- newhaven, nhd-4.3-480272ef-atxl LCD
- DataImage SCF0700C48GGU18
- Sharp LQ035Q7DB03
- p079zca: Refactor to support multiple panels
tinydrm:
- ILI9341 display panel
New driver:
- vkms - virtual kms driver to testing.
i915:
- Icelake:
Display enablement
DSI support
IRQ support
Powerwell support
- GPU reset fixes and improvements
- Full ppgtt support refactoring
- PSR fixes and improvements
- Execlist improvments
- GuC related fixes
amdgpu:
- Initial amdgpu documentation
- JPEG engine support on VCN
- CIK uses powerplay by default
- Move to using core PCIE functionality for gens/lanes
- DC/Powerplay interface rework
- Stutter mode support for RV
- Vega12 Powerplay updates
- GFXOFF fixes
- GPUVM fault debugging
- Vega12 GFXOFF
- DC improvements
- DC i2c/aux changes
- UVD 7.2 fixes
- Powerplay fixes for Polaris12, CZ/ST
- command submission bo_list fixes
amdkfd:
- Raven support
- Power management fixes
udl:
- Cleanups and fixes
nouveau:
- misc fixes and cleanups.
msm:
- DPU1 support display controller in sdm845
- GPU coredump support.
vmwgfx:
- Atomic modesetting validation fixes
- Support for multisample surfaces
armada:
- Atomic modesetting support completed.
exynos:
- IPPv2 fixes
- Move g2d to component framework
- Suspend/resume support cleanups
- Driver cleanups
imx:
- CSI configuration improvements
- Driver cleanups
- Use atomic suspend/resume helpers
- ipu-v3 V4L2 XRGB32/XBGR32 support
pl111:
- Add Nomadik LCDC variant
v3d:
- GPU scheduler jobs management
sun4i:
- R40 display engine support
- TCON TOP driver
mediatek:
- MT2712 SoC support
rockchip:
- vop fixes
omapdrm:
- Workaround for DRA7 errata i932
- Fix mm_list locking
mali-dp:
- Writeback implementation
PM improvements
- Internal error reporting debugfs
tilcdc:
- Single fix for deferred probing
hdlcd:
- Teardown fixes
tda998x:
- Converted to a bridge driver.
etnaviv:
- Misc fixes"
* tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm: (1506 commits)
drm/amdgpu/sriov: give 8s for recover vram under RUNTIME
drm/scheduler: fix param documentation
drm/i2c: tda998x: correct PLL divider calculation
drm/i2c: tda998x: get rid of private fill_modes function
drm/i2c: tda998x: move mode_valid() to bridge
drm/i2c: tda998x: register bridge outside of component helper
drm/i2c: tda998x: cleanup from previous changes
drm/i2c: tda998x: allocate tda998x_priv inside tda998x_create()
drm/i2c: tda998x: convert to bridge driver
drm/scheduler: fix timeout worker setup for out of order job completions
drm/amd/display: display connected to dp-1 does not light up
drm/amd/display: update clk for various HDMI color depths
drm/amd/display: program display clock on cache match
drm/amd/display: Add NULL check for enabling dp ss
drm/amd/display: add vbios table check for enabling dp ss
drm/amd/display: Don't share clk source between DP and HDMI
drm/amd/display: Fix DP HBR2 Eye Diagram Pattern on Carrizo
drm/amd/display: Use calculated disp_clk_khz value for dce110
drm/amd/display: Implement custom degamma lut on dcn
drm/amd/display: Destroy aux_engines only once
...
- Whitelist VMD devices with fast interrupt handlers to avoid sharing
vectors with slow handlers (Keith Busch)
* remotes/lorenzo/pci/vmd:
PCI: vmd: White list for fast interrupt handlers
- Fix mvebu I/O space remapping issues (Thomas Petazzoni)
- Use generic pci_host_bridge in mvebu instead of ARM-specific API
(Thomas Petazzoni)
* remotes/lorenzo/pci/mvebu:
PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
PCI: mvebu: Convert to use pci_host_bridge directly
PCI: mvebu: Use resource_size() to remap I/O space
PCI: mvebu: Only remap I/O space if configured
PCI: mvebu: Fix I/O space end address calculation
PCI: mvebu: Remove redundant platform_set_drvdata() call
- Add more devices to Broadcom PAXC quirk (Ray Jui)
- Work around corrupted Broadcom PAXC config space to enable SMMU and
GICv3 ITS (Ray Jui)
- Disable MSI parsing to work around broken Broadcom PAXC logic in some
devices (Ray Jui)
- Hide unconfigured functions to work around a Broadcom PAXC defect (Ray
Jui)
- Lower iproc log level to reduce console output during boot (Ray Jui)
* remotes/lorenzo/pci/iproc:
PCI: iproc: Reduce inbound/outbound mapping print level
PCI: iproc: Reject unconfigured physical functions from PAXC
PCI: iproc: Disable MSI parsing in certain PAXC blocks
PCI: iproc: Fix up corrupted PAXC root complex config registers
PCI: iproc: Activate PAXC bridge quirk for more devices
- Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)
* remotes/lorenzo/pci/controller/misc:
PCI/xilinx: Depend on OF instead of the ARCH
- To avoid bus errors, enable PASID only if entire path supports End-End
TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller
(Bjorn Helgaas)
* pci/virtualization:
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: Rename pci_try_reset_bus() to pci_reset_bus()
PCI: Deprecate pci_reset_bus() and pci_reset_slot() functions
PCI: Unify try slot and bus reset API
PCI: Hide pci_reset_bridge_secondary_bus() from drivers
IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset
PCI: Handle error return from pci_reset_bridge_secondary_bus()
PCI/IOV: Tidy pci_sriov_set_totalvfs()
PCI: Enable PASID only if entire path supports End-End TLP prefixes
# Conflicts:
# drivers/pci/hotplug/pciehp_hpc.c
- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)
- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)
* pci/switchtec:
PCI: Expand documentation for pci_add_dma_alias()
PCI: Add DMA alias quirk for Microsemi Switchtec NTB
switchtec: Use generic PCI Vendor ID and Class Code
# Conflicts:
# drivers/pci/quirks.c
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
* pci/resource:
PCI: Make pci_get_rom_size() static
PCI: Add check code for last image indicator not set
PCI: Avoid accessing memory outside the ROM BAR
PCI: Make early dump functionality generic
PCI: Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling
PCI: Restore resized BAR state on resume
PCI: Clean up resource allocation in devm_of_pci_get_host_bridge_resources()
# Conflicts:
# Documentation/admin-guide/kernel-parameters.txt
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
* pci/peer-to-peer:
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied
(Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum (Jakub
Kicinski)
* pci/misc:
PCI: Limit config space size for Netronome NFP5000
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Unify PCI and normal DMA direction definitions
PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler
PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core
PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthrough
# Conflicts:
# drivers/pci/hotplug/pciehp_ctrl.c
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a device
below it (Myron Stowe)
* pci/enumeration:
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Check for PCIe Link downtraining
PCI: Workaround IDT switch ACS Source Validation erratum
- Defer DPC event handling to work queue (Keith Busch)
- Use threaded IRQ for DPC bottom half (Keith Busch)
- Print AER status while handling DPC events (Keith Busch)
* pci/dpc:
PCI/DPC: Remove indirection waiting for inactive link
PCI/DPC: Use threaded IRQ for bottom half handling
PCI/DPC: Print AER status in DPC event handling
PCI/DPC: Remove rp_pio_status from dpc struct
PCI/DPC: Defer event handling to work queue
PCI/DPC: Leave interrupts enabled while handling event
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
* pci/aspm:
PCI: Remove unnecessary include of <linux/pci-aspm.h>
iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
ath9k: Remove unnecessary include of <linux/pci-aspm.h>
igb: Remove unnecessary include of <linux/pci-aspm.h>
PCI/ASPM: Convert to use sysfs_match_string() helper
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
* pci/aer:
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
PCI/AER: Clear device status bits during ERR_COR handling
PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
PCI/AER: Factor out ERR_NONFATAL status bit clearing
PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
PCI/AER: Add sysfs attributes for rootport cumulative stats
PCI/AER: Add sysfs attributes to provide AER stats and breakdown
PCI/AER: Define aer_stats structure for AER capable devices
PCI/AER: Move internal declarations to drivers/pci/pci.h
PCI/AER: Adopt lspci names for AER error decoding
PCI/AER: Expose internal API for obtaining AER information
# Conflicts:
# drivers/pci/pci.h
If the platform requests Firmware-First error handling, firmware is
responsible for reading and clearing AER status bits. If OSPM also clears
them, we may miss errors. See ACPI v6.2, sec 18.3.2.5 and 18.4.
This race is mostly of theoretical significance, as it is not easy to
reasonably demonstrate it in testing.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: add similar guards to pci_cleanup_aer_uncorrect_error_status()
and pci_aer_clear_fatal_status()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Like the NFP4000 and NFP6000, the NFP5000 as an erratum where reading/
writing to PCI config space addresses above 0x600 can cause the NFP to
generate PCIe completion timeouts.
Limit the NFP5000's PF's config space size to 0x600 bytes as is already
done for the NFP4000 and NFP6000.
The NFP5000's VF is 0x6003 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same
device ID as the NFP6000's VF. Thus, its config space is already limited
by the existing use of quirk_nfp6000().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tony Egan <tony.egan@netronome.com>
If flag IRQCHIP_ONESHOT_SAFE isn't set for an irqchip and we have a
threaded interrupt with no primary handler, flag IRQF_ONESHOT needs to be
set for the interrupt, causing some overhead in the threaded interrupt
handler. For more detailed explanation also check following comment in
__setup_irq():
The interrupt was requested with handler = NULL, so we use the default
primary handler for it. But it does not have the oneshot flag set. In
combination with level interrupts this is deadly, because the default
primary handler just wakes the thread, then the irq lines is reenabled,
but the device still has the level irq asserted. Rinse and repeat....
While this works for edge type interrupts, we play it safe and reject
unconditionally because we can't say for sure which type this interrupt
really has. The type flags are unreliable as the underlying chip
implementation can override them.
Another comment in __setup_irq() gives a hint already that this
overhead can be avoided for PCI-MSI:
Some irq chips like MSI based interrupts are per se one shot safe. Check
the chip flags, so we can avoid the unmask dance at the end of the
threaded handler for those.
Following this let's mark all PCI-MSI irqchips as oneshot-safe.
See also discussion here:
https://lkml.kernel.org/r/alpine.DEB.2.21.1808032136490.1658@nanos.tec.linutronix.de
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously we checked the timeout before checking the VPD access completion
bit. On a very heavily loaded system this can cause VPD access to timeout.
Check the completion bit before checking the timeout.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Merge L1 Terminal Fault fixes from Thomas Gleixner:
"L1TF, aka L1 Terminal Fault, is yet another speculative hardware
engineering trainwreck. It's a hardware vulnerability which allows
unprivileged speculative access to data which is available in the
Level 1 Data Cache when the page table entry controlling the virtual
address, which is used for the access, has the Present bit cleared or
other reserved bits set.
If an instruction accesses a virtual address for which the relevant
page table entry (PTE) has the Present bit cleared or other reserved
bits set, then speculative execution ignores the invalid PTE and loads
the referenced data if it is present in the Level 1 Data Cache, as if
the page referenced by the address bits in the PTE was still present
and accessible.
While this is a purely speculative mechanism and the instruction will
raise a page fault when it is retired eventually, the pure act of
loading the data and making it available to other speculative
instructions opens up the opportunity for side channel attacks to
unprivileged malicious code, similar to the Meltdown attack.
While Meltdown breaks the user space to kernel space protection, L1TF
allows to attack any physical memory address in the system and the
attack works across all protection domains. It allows an attack of SGX
and also works from inside virtual machines because the speculation
bypasses the extended page table (EPT) protection mechanism.
The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646
The mitigations provided by this pull request include:
- Host side protection by inverting the upper address bits of a non
present page table entry so the entry points to uncacheable memory.
- Hypervisor protection by flushing L1 Data Cache on VMENTER.
- SMT (HyperThreading) control knobs, which allow to 'turn off' SMT
by offlining the sibling CPU threads. The knobs are available on
the kernel command line and at runtime via sysfs
- Control knobs for the hypervisor mitigation, related to L1D flush
and SMT control. The knobs are available on the kernel command line
and at runtime via sysfs
- Extensive documentation about L1TF including various degrees of
mitigations.
Thanks to all people who have contributed to this in various ways -
patches, review, testing, backporting - and the fruitful, sometimes
heated, but at the end constructive discussions.
There is work in progress to provide other forms of mitigations, which
might be less horrible performance wise for a particular kind of
workloads, but this is not yet ready for consumption due to their
complexity and limitations"
* 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
x86/microcode: Allow late microcode loading with SMT disabled
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
x86/mm/kmmio: Make the tracer robust against L1TF
x86/mm/pat: Make set_memory_np() L1TF safe
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/speculation/l1tf: Invert all not present mappings
cpu/hotplug: Fix SMT supported evaluation
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Documentation/l1tf: Remove Yonah processors from not vulnerable list
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
x86: Don't include linux/irq.h from asm/hardirq.h
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
cpu/hotplug: detect SMT disabled by BIOS
...
In commit 27d868b5e6 ("PCI: Set MPS to match upstream bridge"), we made
sure every device's MPS setting matches its upstream bridge, making it more
likely that a hot-added device will work in a system with an optimized MPS
configuration.
Recently I've started encountering systems where the endpoint device's MPSS
capability is less than its Root Port's current MPS value, thus the
endpoint is not capable of matching its upstream bridge's MPS setting (see:
bugzilla via "Link:" below). This leaves the system vulnerable - the
upstream Root Port could respond with larger TLPs than the device can
handle, and the device will consider them to be 'Malformed'.
One could use the "pci=pcie_bus_safe" kernel parameter to work around the
issue, but that forces a user to supply a kernel parameter to get the
system to function reliably and may end up limiting MPS settings of other
unrelated, sub-topologies which could benefit from maintaining their larger
values.
Augment Keith's approach to include tuning down a Root Port's MPS setting
when its hot-added endpoint device is not capable of matching it.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both
Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for
VFs. Just prior to the table it states:
"PF and VF functionality is defined in Section 7.5.3.4 except where
noted in Table 9-16. For VF fields marked 'RsvdP', the PF setting
applies to the VF."
All of which implies that with respect to Max_Payload_Size Supported
(MPSS), MPS, and MRRS values, we should not be paying any attention to the
VF's fields, but rather only to the PF's. Only looking at the PF's fields
also logically makes sense as it's the sole physical interface to the PCIe
bus.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Fixes: 27d868b5e6 ("PCI: Set MPS to match upstream bridge")
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.3+
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
When both ends of a PCIe Link are capable of a higher bandwidth than is
currently in use, the Link is said to be "downtrained". A downtrained Link
may indicate hardware or configuration problems in the system, but it's
hard to identify such Links from userspace.
Refactor pcie_print_link_status() so it continues to always print PCIe
bandwidth information, as several NIC drivers desire.
Add a new internal __pcie_print_link_status() to emit a message only when a
device's bandwidth is constrained by the fabric and call it from the PCI
core for all devices, which identifies all downtrained Links. It also
emits messages for a few cases that are technically not downtrained, such
as a x4 device in an open-ended x1 slot.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: changelog, move __pcie_print_link_status() declaration to
drivers/pci/, rename pcie_check_upstream_link() to
pcie_report_downtraining()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Intel Sunrise Point PCH hardware has an implementation of the ACS bits that
does not comply with the PCIe standard. Add a device-specific quirk,
pci_quirk_disable_intel_spt_pch_acs_redir() to disable ACS Redirection on
this system.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: changelog, split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Intel Sunrise Point (SPT) PCH hardware has an implementation of the ACS
bits that does not comply with the PCIe standard. To deal with this we
need device-specific quirks to disable ACS redirection.
Add a new pci_dev_specific_disable_acs_redir() quirk and a new
.disable_acs_redir() function pointer for use by non-compliant devices. No
functional change intended.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: split to separate patch, move
pci_dev_specific_disable_acs_redir() declarations to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Convert the search for device-specific ACS enable quirks from searching a
NULL-terminated array to iterating through the array, which is always
fixed-size anyway. No functional change intended.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: changelog, split to separate patch for reviewability]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
disable the ACS redirect bits for select PCI bridges. The bridges must be
selected before the devices are discovered by the kernel and the IOMMU
groups created. Therefore, add a kernel command line parameter to specify
devices which must have their ACS bits disabled.
The new parameter takes a list of devices separated by a semicolon. Each
device specified will have its ACS redirect bits disabled. This is
similar to the existing 'resource_alignment' parameter.
The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
Egress Control bits are disabled, which is sufficient to always allow
passing P2P traffic uninterrupted. The bits are set after the kernel
(optionally) enables the ACS bits itself. It is also done regardless of
whether the kernel or platform firmware sets the bits.
If the user tries to disable the ACS redirect for a device without the ACS
capability, print a warning to dmesg.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: reorder to add the generic code first and move the
device-specific quirk to subsequent patches]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
When specifying PCI devices on the kernel command line using a
bus/device/function address, bus numbers can change when adding or
replacing a device, changing motherboard firmware, or applying kernel
parameters like "pci=assign-buses". When bus numbers change, it's likely
the command line tweak will be applied to the wrong device.
Therefore, it is useful to be able to specify devices with a base bus
number and the path of devfns needed to get to it, similar to the "device
scope" structure in the Intel VT-d spec, Section 8.3.1.
Thus, we add an option to specify devices in the following format:
[<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
The path can be any segment within the PCI hierarchy of any length and
determined through the use of 'lspci -t'. When specified this way, it is
less likely that a renumbered bus will result in a valid device
specification and the tweak won't be applied to the wrong device.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Separate out the code to match a PCI device with a string (typically
originating from a kernel parameter) from the
pci_specified_resource_alignment() function into its own helper function.
While we are at it, this change fixes the kernel style of the function
(fixing a number of long lines and extra parentheses).
Additionally, make the analogous change to the kernel parameter
documentation: Separate the description of how to specify a PCI device
into its own section at the head of the "pci=" parameter.
This patch should have no functional alterations.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Move declarations for these functions:
pci_dev_specific_acs_enabled()
pci_dev_specific_enable_acs()
from include/linux/pci.h to drivers/pci/pci.h because nothing outside the
PCI core needs to use them.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add a device-specific reset for Intel DC P3700 NVMe device which exhibits a
timeout failure in drivers waiting for the ready status to update after
NVMe enable if the driver interacts with the device too soon after FLR. As
this has been observed in device assignment scenarios, resolve this with a
device-specific reset quirk to add an additional, heuristically determined,
delay after the FLR completes.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1592654
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The Samsung SM961/PM961 (960 EVO) sometimes fails to return from FLR with
the PCI config space reading back as -1. A reproducible instance of this
behavior is resolved by clearing the enable bit in the NVMe configuration
register and waiting for the ready status to clear (disabling the NVMe
controller) prior to FLR.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1542494
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pcie_flr() suggests pcie_has_flr() to ensure that PCIe FLR support is
present prior to calling. pcie_flr() is exported while pcie_has_flr()
is not. Resolve this.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This comment has been there since the driver was introduced, but seems
to be a leftover from previous iterations of the driver. Indeed, we do
not lookup in a list to find the register ranges that matches the
given port/lane, as the "reg" property is in each sub-node
representing a PCI port. There is no lookup involved at all.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Rather than using the ARM-specific pci_common_init_dev() API, use the
pci_host_bridge logic directly.
Unfortunately, we can't use devm_of_pci_get_host_bridge_resources(),
because the DT binding for describing PCIe apertures for this PCI
controller is a bit special, and we cannot retrieve them from the
'ranges' property. Therefore, we still have some special code to
handle this.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Instead of hardcoding the remapping of IO_SPACE_LIMIT - SZ_64K, use
resource_size().
However, we cannot use just IO_SPACE_LIMIT, because pci_ioremap_io() has
a bug and doesn't allow remapping the last 64 KB before IO_SPACE_LIMIT,
so we ensure that we do not exceed this limit. When the pci_ioremap_io()
issue is fixed, this work around can be dropped.
Note that this workaround already existed, since we were mapping only
up to IO_SPACE_LIMIT - SZ_64K.
Suggested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
[lorenzo.pieralisi@arm.com: tweaked the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
If there is no PCI I/O aperture configured in the Device Tree, it does
not make sense to create the virtual mapping for the PCI I/O space,
since we will anyway not create the MBus window that will allow to
access it. Therefore, do the pci_ioremap_io() only if necessary.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
pcie->realio.end should be the address of last byte of the area,
therefore using resource_size() of another resource is not correct, we
must substract 1 to get the address of the last byte.
Fixes: 11be65472a ("PCI: mvebu: Adapt to the new device tree layout")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
This is already done earlier in mvebu_pcie_probe().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Several PCI core files include pci-aspm.h even though they don't need
anything provided by that file. Remove the unnecessary includes of it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
The sysfs_match_string() helper returns index of the matching string in an
array. Use it in pcie_aspm_set_policy() to simplify the code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[bhelgaas: squash sysfs_match_string() fix into original patch for issue
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There isn't a hard dependency of the Xilinx AXI-PCIe host bridge on any
architecture. For example: at SiFive we map RISC-V cores to Xilinx FPGAs
and connect the Xilinx IP via a TileLink adapter, so the RISC-V Linux
port will need to be able to enable PCIE_XILINX in order to have PCIe
support.
This patch decouples the PCIE_XILINX support from ARCH. Instead it just
depends on OF, which is the only true dependency.
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
[hch: switch to OF instead of OF_PCI now that the latter is gone]
Signed-off-by: Christoph Hellwig <hch@lst.de>
[lorenzo.pieralisi@arm.com: trimmed the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().
Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/topology.h
linux/smp.h
asm/smp.h
or
linux/gfp.h
linux/mmzone.h
asm/mmzone.h
asm/mmzone_64.h
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/irqdesc.h
linux/kobject.h
linux/sysfs.h
linux/kernfs.h
linux/idr.h
linux/gfp.h
and others.
This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.
A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both, asm/hardirq.h and
asm/apic.h.
However, this wouldn't solve the real problem, namely asm/harirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.
Remove the linux/irq.h #include from x86' asm/hardirq.h.
Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.
Note that some of these *.c files could be cleaned up a bit wrt. to their
set of #includes, but that should better be done from separate patches, if
at all.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
under #ifdef CONFIG_ACPI_APEI, and again at the top level. This looks like
my merge error from these commits:
fd3362cb73 ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
41cbc9eb1a ("PCI/AER: Squash ecrc.c into aerdrv.c")
Remove the duplicate PCI_EXP_AER_FLAGS definition.
Fixes: 41cbc9eb1a ("PCI/AER: Squash ecrc.c into aerdrv.c")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
On driver probe and on resume from system sleep, pciehp checks the
Presence Detect State bit in the Slot Status register to bring up an
occupied slot or bring down an unoccupied slot. Both code paths are
identical, so deduplicate them per Mika's request.
On probe, an additional check is performed to disable power of an
unoccupied slot. This can e.g. happen if power was enabled by BIOS.
It cannot happen once pciehp has taken control, hence is not necessary
on resume: The Slot Control register is set to the same value that it
had on suspend by pci_restore_state(), so if the slot was occupied,
power is enabled and if it wasn't, power is disabled. Should occupancy
have changed during the system sleep transition, power is adjusted by
bringing up or down the slot per the paragraph above.
To allow for deduplication of the presence check, move the power check
to pcie_init(). This seems safer anyway, because right now it is
performed while interrupts are already enabled, and although I can't
think of a scenario where pciehp_power_off_slot() and the IRQ thread
collide, it does feel brittle.
However this means that pcie_init() may now write to the Slot Control
register before the IRQ is requested. If both the CCIE and HPIE bits
happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
of polling the Command Completed bit) and eventually emit a timeout
message. Additionally, if a level-triggered INTx interrupt is used,
the user may see a spurious interrupt splat. Avoid by disabling
interrupts before disabling power. (Normally the HPIE and CCIE bits
should be clear on probe, but conceivably they may already have been
set e.g. by BIOS.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Per Mika's request, add an explicit break to the last case of switch
statements everywhere in pciehp to be more defensive towards future
amendments.
Per Gustavo's request, mark all non-empty implicit fallthroughs with a
comment to silence warnings triggered by -Wimplicit-fallthrough=2.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.
When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.
is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.
A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.
Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master(). As these fields are not
handled atomically, that clears the is_added bit.
Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state. This avoids the
race because is_added no longer shares a memory location with is_busmaster.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W.
This requires that runtime D3 is allowed on its PCIe ports, so whitelist
them.
The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports
is unnecessary on Thunderbolt because we know that even the oldest
controller, Light Ridge (2010), is able to suspend its ports to D3 just
fine -- specifically including its hotplug ports. And the power saving
should be afforded to machines even if their BIOS predates 2015.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andreas Noever <andreas.noever@gmail.com>
Previously we blacklisted PCIe hotplug ports for runtime D3 because:
(a) Ports handled by the firmware must not be transitioned to D3 by the
OS behind the firmware's back:
https://bugzilla.kernel.org/show_bug.cgi?id=53811
(b) Ports handled natively by the OS lacked runtime D3 support in the
pciehp driver.
We've just rectified the latter, so allow users to manually enable and
test it by passing pcie_port_pm=force on the command line. Vendors are
thus put in a position to validate hotplug ports for runtime D3 and
perhaps we can someday enable it by default, but with a BIOS cutoff date.
Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in
2017 and encountered Hardware Error NMIs, so this feature clearly cannot
be enabled for everyone yet:
https://lkml.kernel.org/r/20170503180426.GA4058@otc-nc-03
While at it, remove an erroneous code comment I added with 97a90aee5d
("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which
claims that parents of a hotplug port must stay awake lest interrupts
cannot be delivered. That has turned out to be wrong at least for
Thunderbolt hotplug ports.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
When performing a function reset via sysfs, the device's config space is
accessed in places such as pcie_flr() and its MMIO space is accessed e.g.
in reset_ivb_igd(), so ensure accessibility by resuming the device to D0.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Ensure accessibility of a hotplug port's config space when accessed via
sysfs by resuming its parent to D0.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
pciehp's IRQ thread ensures accessibility of the port by runtime resuming
its parent to D0. However when the slot is enabled/disabled, the port
itself needs to be in D0 because its secondary bus is accessed in:
pciehp_check_link_status(),
pciehp_configure_device() (both called from board_added())
and
pciehp_unconfigure_device() (called from remove_board()).
Thus, acquire a runtime PM ref on enable/disablement of the slot.
Yinghai Lu additionally discovered that some SkyLake servers feature a
Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8)
which requires the port to be in D0 when invoking
pciehp_power_on_slot() (likewise called from board_added()).
If slot power is turned on while in D3hot, link training later fails:
https://lkml.kernel.org/r/20170205073454.GA253@wunner.de
The spec is silent about such a requirement, but it seems prudent to
assume that any hotplug port with a Power Controller may need this.
The present commit holds a runtime PM ref whenever slot power is turned
on and off, but it doesn't keep the port in D0 as long as slot power is
on. If vendors determine that's necessary, they need to amend pciehp to
acquire a runtime PM ref in pciehp_power_on_slot() and release one in
pciehp_power_off_slot().
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Upon resume from system sleep, the Slot Control register is written via:
pci_pm_resume_noirq()
pci_pm_default_resume_early()
pci_restore_state()
pci_restore_pcie_state()
PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that
targets any portion of the Port's Slot Control register, [...] software
must wait for [the] command to complete before issuing the next command".
pciehp currently fails to enforce that rule after the above-mentioned
write. Fix it.
(Moving restoration of the Slot Control register to pciehp doesn't seem
to make sense because the other PCIe hotplug drivers may need it as
well.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Thunderbolt hotplug ports that were occupied before system sleep resume
with their downstream link in "off" state. Only after the Thunderbolt
controller has reestablished the PCIe tunnels does the link go up.
As a result, a spurious Presence Detect Changed and/or Data Link Layer
State Changed event occurs.
The events are not immediately acted upon because tunnel reestablishment
happens in the ->resume_noirq phase, when interrupts are still disabled.
Also, notification of events may initially be disabled in the Slot
Control register when coming out of system sleep and is reenabled in the
->resume_noirq phase through:
pci_pm_resume_noirq()
pci_pm_default_resume_early()
pci_restore_state()
pci_restore_pcie_state()
It is not guaranteed that the events are acted upon at all: PCIe r4.0,
sec 6.7.3.4 says that "a port may optionally send an MSI when there are
hot-plug events that occur while interrupt generation is disabled, and
interrupt generation is subsequently enabled." Note the "optionally".
If an MSI is sent, pciehp will gratuitously turn the slot off and back
on once the ->resume_early phase has commenced.
If an MSI is not sent, the extant, unacknowledged events in the Slot
Status register will prevent future notification of presence or link
changes.
Commit 13c65840fe ("PCI: pciehp: Clear Presence Detect and Data Link
Layer Status Changed on resume") fixed the latter by clearing the events
in the ->resume phase. Move this to the ->resume_noirq phase to also
fix the gratuitous disable/enablement of the slot.
The commit further restored the Slot Control register in the ->resume
phase, but that's dispensable because as shown above it's already been
done in the ->resume_noirq phase.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Replace suspend_iter() and resume_iter() with a single function pm_iter()
to allow addition of port service callbacks for further power management
phases without having to add another iterator each time.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The ->reset_slot callback introduced by commits:
2e35afaefe ("PCI: pciehp: Add reset_slot() method") and
06a8d89af5 ("PCI: pciehp: Disable link notification across slot reset")
disables notification of Presence Detect Changed and Data Link Layer
State Changed events for the duration of a secondary bus reset.
However a bus reset not only triggers these events, but may also clear
the Presence Detect State bit in the Slot Status register and the Data
Link Layer Link Active bit in the Link Status register momentarily.
According to Sinan Kaya:
"I know for a fact that bus reset clears the Data Link Layer Active bit
as soon as link goes down. It gets set again following link up.
Presence detect depends on the HW implementation. QDT root ports
don't change presence detect for instance since nobody actually
removed the card. If an implementation supports in-band presence
detect, the answer is yes. As soon as the link goes down, presence
detect bit will get cleared until recovery."
https://lkml.kernel.org/r/42e72f83-3b24-f7ef-e5bc-290fae99259a@codeaurora.org
In-band presence detect is also covered in Table 4-15 in PCIe r4.0,
sec 4.2.6.
pciehp should therefore ensure that any parts of the driver that access
those bits do not run concurrently to a bus reset. The only precaution
the commits took to that effect was to halt interrupt polling. They
made no effort to drain the slot workqueue, cancel an outstanding
Attention Button work, or block slot enable/disable requests via sysfs
and in the ->probe hook.
Now that pciehp is converted to enable/disable the slot exclusively from
the IRQ thread, the only places accessing the two above-mentioned bits
are the IRQ thread and the ->probe hook. Add locking to serialize them
with a bus reset. This obviates the need to halt interrupt polling.
Do not add locking to the ->get_adapter_status sysfs callback to afford
users unfettered access to that bit. Use an rw_semaphore in lieu of a
regular mutex to allow parallel execution of the non-reset code paths
accessing the critical bits, i.e. the IRQ thread and the ->probe hook.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rajat Jain <rajatja@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Sinan Kaya <okaya@kernel.org>
If we have a threaded interrupt with the handler being NULL, then
request_threaded_irq() -> __setup_irq() will complain and bail out if the
IRQF_ONESHOT flag isn't set. Therefore check for the handler being NULL
and set IRQF_ONESHOT in this case.
This change is needed to migrate the mei_me driver to
pci_alloc_irq_vectors() and pci_request_irq().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
There is nothing arch-specific about PCI or dma-debug, so call
dma_debug_add_bus() from the PCI core just after registering the bus type.
Most of dma-debug is already generic; this just adds reporting of pending
dma-allocations on driver unload for arches other than powerpc, sh, and
x86.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
commit 9af6bcb11e ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP
driver") did not add the configuration and build infrastructure to
configure and build the mobiveil controller driver, so at present the
driver code is in the kernel but cannot be compiled.
Add the mobiveil controller driver Kconfig/Makefile infrastructure.
Fixes: 9af6bcb11e ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP
driver")
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in>
PCI mobiveil host controller driver currently fails to compile
with the following error:
drivers/pci/controller/pcie-mobiveil.c: In function
'mobiveil_pcie_probe':
drivers/pci/controller/pcie-mobiveil.c:788:8: error: implicit
declaration of function 'devm_of_pci_get_host_bridge_resources'; did you
mean 'pci_get_host_bridge_device'?
[-Werror=implicit-function-declaration]
ret = devm_of_pci_get_host_bridge_resources(dev, 0, 0xff,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pci_get_host_bridge_device
Add the missing include file to pull in the required function declaration.
Fixes: 9af6bcb11e ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP
driver")
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in>
The field pcie_reg_base in struct mobiveil_pcie represents a physical
address so it should be of phys_addr_t type rather than void __iomem*;
this results in the following compilation warnings:
drivers/pci/controller/pcie-mobiveil.c: In function
'mobiveil_pcie_parse_dt':
drivers/pci/controller/pcie-mobiveil.c:326:22: warning: assignment makes
pointer from integer without a cast [-Wint-conversion]
pcie->pcie_reg_base = res->start;
^
drivers/pci/controller/pcie-mobiveil.c: In function
'mobiveil_pcie_enable_msi':
drivers/pci/controller/pcie-mobiveil.c:485:25: warning: initialization
makes integer from pointer without a cast [-Wint-conversion]
phys_addr_t msg_addr = pcie->pcie_reg_base;
^~~~
drivers/pci/controller/pcie-mobiveil.c: In function
'mobiveil_compose_msi_msg':
drivers/pci/controller/pcie-mobiveil.c:640:21: warning: initialization
makes integer from pointer without a cast [-Wint-conversion]
phys_addr_t addr = pcie->pcie_reg_base + (data->hwirq * sizeof(int));
Fix the type and with it the compilation warnings.
Fixes: 9af6bcb11e ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP
driver")
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in>
IB_WIN_SIZE is larger than INT_MAX so we need to cast it to u64.
Fixes: 9af6bcb11e ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When an fatal error is received by a non-bridge device, the device is
removed, and pci_stop_and_remove_bus_device() deallocates the device
structure. The freed device structure is used by subsequent code to send
uevents and print messages.
Hold a reference on the device until we're finished using it. This is not
an ideal fix because pcie_do_fatal_recovery() should not use the device at
all after removing it, but that's too big a project for right now.
Fixes: 7e9084b367 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices")
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
[bhelgaas: changelog, reduce get/put coverage]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
IB_WIN_SIZE is larger than INT_MAX so we need to cast it to u64.
Fixes: 9af6bcb11e ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when
there are hot-plug events that occur while interrupt generation is
disabled, and interrupt generation is subsequently enabled."
On probe, we currently clear all event bits in the Slot Status register
with the notable exception of the Presence Detect Changed bit. Thereby
we seek to receive an interrupt for an already occupied slot once event
notification is enabled.
But because the interrupt is optional, users may have to specify the
pciehp_force parameter on the command line, which is inconvenient.
Moreover, now that pciehp's event handling has become resilient to
missed events, a Presence Detect Changed interrupt for a slot which is
powered on is interpreted as removal of the card. If the slot has
already been brought up by the BIOS, receiving such an interrupt on
probe causes the slot to be powered off and immediately back on, which
is likewise undesirable.
Avoid both issues by making the behavior of pciehp_force the default and
clearing the Presence Detect Changed bit on probe.
Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC
("Force pciehp, even if OSHP is missing") seems nonsensical because the
OSHP control method is only relevant for SHCP slots according to the
PCI Firmware specification r3.0, sec 4.8.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
A hotplug port's Slot Status register does not count how often each type
of event occurred, it only records the fact *that* an event has occurred.
Previously pciehp queued a work item for each event. But if it missed
an event, e.g. removal of a card in-between two back-to-back insertions,
it queued up the wrong work item or no work item at all. Commit
fad214b0aa ("PCI: pciehp: Process all hotplug events before looking
for new ones") sought to improve the situation by shrinking the window
during which events may be missed.
But Stefan Roese reports unbalanced Card present and Link Up events,
suggesting that we're still missing events if they occur very rapidly.
Bjorn Helgaas responds that he considers pciehp's event handling
"baroque" and calls for its simplification and rationalization:
https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com
It gets worse once a hotplug port is runtime suspended: The port can
signal an interrupt while it and its parents are in D3hot, i.e. while
it is inaccessible. By the time we've runtime resumed all parents to D0
and read the port's Slot Status register, we may have missed an arbitrary
number of events. Event handling therefore needs to be reworked to
become resilient to missed events.
Assume that a Presence Detect Changed event has occurred.
Consider the following truth table:
- Slot is in OFF_STATE and is currently empty. => Do nothing.
(The event is trailing a Link Down or we've
missed an insertion and subsequent removal.)
- Slot is in OFF_STATE and is currently occupied. => Turn the slot on.
- Slot is in ON_STATE and is currently empty. => Turn the slot off.
- Slot is in ON_STATE and is currently occupied. => Turn the slot off,
(Be cautious and assume the card in then back on.
the slot isn't the same as before.)
This leads to the following simple algorithm:
1 If the slot is in ON_STATE, turn it off unconditionally.
2 If the slot is currently occupied, turn it on.
Because those actions are now carried out synchronously, rather than by
scheduled work items, pciehp reacts to the *current* situation and
missed events no longer matter.
Data Link Layer State Changed events can be handled identically to
Presence Detect Changed events. Note that in the above truth table,
a Link Up trailing a Card present event didn't have to be accounted for:
It is filtered out by pciehp_check_link_status().
As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says:
"Once the Power Indicator begins blinking, a 5-second abort interval
exists during which a second depression of the Attention Button cancels
the operation." In other words, the user can only expect the system to
react to a button press after it starts blinking. Missed button presses
that occur in-between are irrelevant.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Stefan Roese <sr@denx.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
When a device is hotplugged, Presence Detect and Link Up events often do
not occur simultaneously, but with a lag of a few milliseconds. Only
the first event received is relevant, the other one can be disregarded.
Moreover, Stefan Roese reports that on certain platforms, Link State and
Presence Detect may flap for up to 100 ms before stabilizing, suggesting
that such events should be disregarded for at least this long:
https://lkml.kernel.org/r/20180130084121.18653-1-sr@denx.de
On slot enablement, pciehp_check_link_status() waits for 100 ms per
PCIe r4.0, sec 6.7.3.3, then probes the hotplugged device's vendor
register for up to 1 second.
If this succeeds, the link is definitely up, so ignore any Presence
Detect or Link State events that occurred up to this point.
pciehp_check_link_status() then checks the Link Training bit in the
Link Status register. This is the final opportunity to detect
inaccessibility of the device and abort slot enablement. Any link
or presence change that occurs afterwards will cause the slot to be
disabled again immediately after attempting to enable it.
The astute reviewer may appreciate that achieving this behavior would be
more complicated had pciehp not just been converted to enable/disable
the slot exclusively from the IRQ thread: When the slot is enabled via
sysfs, each link or presence flap would otherwise cause the IRQ thread
to run and it would have to sense that those events are belonging to a
concurrent slot enablement operation and disregard them. It would be
much more difficult than this mere 3 line change.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Stefan Roese <sr@denx.de>
No callers of pciehp_enable/disable_slot() outside of pciehp_ctrl.c
remain, so declare the functions static. For now this requires forward
declarations. Those can be eliminated by reshuffling functions once the
ongoing effort to refactor the driver has settled.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously slot enablement and disablement could happen concurrently.
But now it's under the exclusive control of the IRQ thread, rendering
the locking obsolete. Drop it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:
- pciehp_probe() enables the slot if it's occupied and the pciehp_force
module parameter is used.
- pciehp_resume() enables or disables the slot after system sleep.
- pciehp_queue_pushbutton_work() enables or disables the slot after the
5 second delay following an Attention Button press.
- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
disable the slot on sysfs write.
This requires locking and complicates pciehp's state machine.
A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.
Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request. The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.
For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result. There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.
The POWERON_STATE and POWEROFF_STATE is now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition. Afterwards, it moves to STATIC_STATE. The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.
An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot(). So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
handle_button_press_event() currently determines whether the slot has
been turned on or off by looking at the Power Controller Control bit in
the Slot Control register. This assumes that an attention button
implies presence of a power controller even though that's not mandated
by the spec. Moreover the Power Controller Control bit is unreliable
when a power fault occurs (PCIe r4.0, sec 6.7.1.8). This issue has
existed since the driver was introduced in 2004.
Fix by replacing STATIC_STATE with ON_STATE and OFF_STATE and tracking
whether the slot has been turned on or off. This is also a required
ingredient to make pciehp resilient to missed events, which is the
object of an upcoming commit.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The PCI hotplug core has just been refactored to separate slot
initialization for in-kernel use from publication to user space.
Take advantage of it in pciehp by publishing to user space last on
probe. This will allow enable/disablement of the slot exclusively from
the IRQ thread because the IRQ is requested after initialization for
in-kernel use (thereby getting its unique name needed by the IRQ thread)
but before user space is able to submit enable/disable requests.
On teardown, the order is the same in reverse: The user space interface
is removed prior to freeing the IRQ and destroying the slot.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When a hotplug driver calls pci_hp_register(), all steps necessary for
registration are carried out in one go, including creation of a kobject
and addition to sysfs. That's a problem for pciehp once it's converted
to enable/disable the slot exclusively from the IRQ thread: The thread
needs to be spawned after creation of the kobject (because it uses the
kobject's name), but before addition to sysfs (because it will handle
enable/disable requests submitted via sysfs).
pci_hp_deregister() does offer a ->release callback that's invoked
after deletion from sysfs and before destruction of the kobject. But
because pci_hp_register() doesn't offer a counterpart, hotplug drivers'
->probe and ->remove code becomes asymmetric, which is error prone
as recently discovered use-after-free bugs in pciehp's ->remove hook
have shown.
In a sense, this appears to be a case of the midlayer antipattern:
"The core thesis of the "midlayer mistake" is that midlayers are
bad and should not exist. That common functionality which it is
so tempting to put in a midlayer should instead be provided as
library routines which can [be] used, augmented, or ignored by
each bottom level driver independently. Thus every subsystem
that supports multiple implementations (or drivers) should
provide a very thin top layer which calls directly into the
bottom layer drivers, and a rich library of support code that
eases the implementation of those drivers. This library is
available to, but not forced upon, those drivers."
-- Neil Brown (2009), https://lwn.net/Articles/336262/
The presence of midlayer traits in the PCI hotplug core might be ascribed
to its age: When it was introduced in February 2002, the blessings of a
library approach might not have been well known:
https://git.kernel.org/tglx/history/c/a8a2069f432c
For comparison, the driver core does offer split functions for creating
a kobject (device_initialize()) and addition to sysfs (device_add()) as
an alternative to carrying out everything at once (device_register()).
This was introduced in October 2002:
https://git.kernel.org/tglx/history/c/8b290eb19962
The odd ->release callback in the PCI hotplug core was added in 2003:
https://git.kernel.org/tglx/history/c/69f8d663b595
Clearly, a library approach would not force every hotplug driver to
implement a ->release callback, but rather allow the driver to remove
the sysfs files, release its data structures and finally destroy the
kobject. Alternatively, a driver may choose to remove everything with
pci_hp_deregister(), then release its data structures.
To this end, offer drivers pci_hp_initialize() and pci_hp_add() as a
split-up version of pci_hp_register(). Likewise, offer pci_hp_del()
and pci_hp_destroy() as a split-up version of pci_hp_deregister().
Eliminate the ->release callback and move its code into each driver's
teardown routine.
Declare pci_hp_deregister() void, in keeping with the usual kernel
pattern that enablement can fail, but disablement cannot. It only
returned an error if the caller passed in a NULL pointer or a slot which
has never or is no longer registered or is sharing its name with another
slot. Those would be bugs, so WARN about them. Few hotplug drivers
actually checked the return value and those that did only printed a
useless error message to dmesg. Remove that.
For most drivers the conversion was straightforward since it doesn't
matter whether the code in the ->release callback is executed before or
after destruction of the kobject. But in the case of ibmphp, it was
unclear to me whether setting slot_cur->ctrl and slot_cur->bus_on to
NULL needs to happen before the kobject is destroyed, so I erred on
the side of caution and ensured that the order stays the same. Another
nontrivial case is pnv_php, I've found the list and kref logic difficult
to understand, however my impression was that it is safe to delete the
list element and drop the references until after the kobject is
destroyed.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Previously the slot workqueue was used to handle events and enable or
disable the slot. That's no longer the case as those tasks are done
synchronously in the IRQ thread. The slot workqueue is thus merely used
to handle a button press after the 5 second delay and only one such work
item may be in flight at any given time. A separate workqueue isn't
necessary for this simple task, so use the system workqueue instead.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Up until now, pciehp's IRQ handler schedules a work item for each event,
which in turn schedules a work item to enable or disable the slot. This
double indirection was necessary because sleeping wasn't allowed in the
IRQ handler.
However it is now that pciehp has been converted to threaded IRQ handling
and polling, so handle events synchronously in pciehp_ist() and remove
the work item infrastructure (with the exception of work items to handle
a button press after the 5 second delay).
For link or presence change events, move the register read to determine
the current link or presence state behind acquisition of the slot lock
to prevent it from becoming stale while the lock is contended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If the attention button is pressed to power on the slot AND the user
powers on the slot via sysfs before 5 seconds have elapsed AND powering
on the slot fails because either the slot is unoccupied OR the latch is
open, we neglect turning off the green LED so it keeps on blinking.
That's because the error path of pciehp_sysfs_enable_slot() doesn't call
pciehp_green_led_off(), unlike pciehp_power_thread() which does.
The bug has been present since 2004 when the driver was introduced.
Fix by deduplicating common code in pciehp_sysfs_enable_slot() and
pciehp_power_thread() into a wrapper function pciehp_enable_slot() and
renaming the existing function to __pciehp_enable_slot(). Same for
pciehp_disable_slot(). This will also simplify the upcoming rework of
pciehp's event handling.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
We've just converted pciehp to threaded IRQ handling, but still cannot
sleep in pciehp_ist() because the function is also called in poll mode,
which runs in softirq context (from a timer).
Convert poll mode to a kthread so that pciehp_ist() always runs in task
context.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac4 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Document the driver's data structures to lower the barrier to entry for
contributors.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Since commit 0f4bd8014d ("PCI: hotplug: Drop checking of PCI_BRIDGE_
CONTROL in *_unconfigure_device()"), pciehp_unconfigure_device() can no
longer fail, so declare it and its sole caller remove_board() void, in
keeping with the usual kernel pattern that enablement can fail, but
disablement cannot. No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
pciehp_disable_slot() checks if the ctrl attribute of the slot is NULL
and bails out if so. However the function is not called prior to the
attribute being set in pcie_init_slot(), and pcie_init_slot() is not
called if ctrl is NULL. So the check is unnecessary. Drop it.
It has been present ever since the driver was introduced in 2004, but it
was already unnecessary back then:
https://git.kernel.org/tglx/history/c/c16b4b14d980
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Commit b440bde74f ("PCI: Add pci_ignore_hotplug() to ignore hotplug
events for a device") iterates over the devices on a hotplug port's
subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem.
It is thus possible for a user to cause a crash by concurrently
manipulating the device list, e.g. by disabling slot power via sysfs
on a different CPU or by initiating a remove/rescan via sysfs.
This can't be fixed by acquiring pci_bus_sem because it may sleep.
The simplest fix is to avoid the list iteration altogether and just
check the ignore_hotplug flag on the port itself. This works because
pci_ignore_hotplug() sets the flag both on the device as well as on its
parent bridge.
We do lose the ability to print the name of the device blocking hotplug
in the debug message, but that's probably bearable.
Fixes: b440bde74f ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
When pciehp is unbound (e.g. on unplug of a Thunderbolt device), the
hotplug_slot struct is deregistered and thus freed before freeing the
IRQ. The IRQ handler and the work items it schedules print the slot
name referenced from the freed structure in various informational and
debug log messages, each time resulting in a quadruple dereference of
freed pointers (hotplug_slot -> pci_slot -> kobject -> name).
At best the slot name is logged as "(null)", at worst kernel memory is
exposed in logs or the driver crashes:
pciehp 0000:10:00.0:pcie204: Slot((null)): Card not present
An attacker may provoke the bug by unplugging multiple devices on a
Thunderbolt daisy chain at once. Unplugging can also be simulated by
powering down slots via sysfs. The bug is particularly easy to trigger
in poll mode.
It has been present since the driver's introduction in 2004:
https://git.kernel.org/tglx/history/c/c16b4b14d980
Fix by rearranging teardown such that the IRQ is freed first. Run the
work items queued by the IRQ handler to completion before freeing the
hotplug_slot struct by draining the work queue from the ->release_slot
callback which is invoked by pci_hp_deregister().
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v2.6.4
If addition of sysfs files fails on registration of a hotplug slot, the
struct pci_slot as well as the entry in the slot_list is leaked. The
issue has been present since the hotplug core was introduced in 2002:
https://git.kernel.org/tglx/history/c/a8a2069f432c
Perhaps the idea was that even though sysfs addition fails, the slot
should still be usable. But that's not how drivers use the interface,
they abort probe if a non-zero value is returned.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v2.4.15+
Cc: Greg Kroah-Hartman <greg@kroah.com>
Ten years ago, commit 58319b802a ("PCI: Hotplug core: remove 'name'")
dropped the name element from struct hotplug_slot but neglected to update
the skeleton driver.
That same year, commit f46753c5e3 ("PCI: introduce pci_slot") raised the
number of arguments to pci_hp_register() from one to four.
Fourteen years ago, historic commit 7ab60fc1b8e7 ("PCI Hotplug skeleton:
final cleanups") removed all usages of the retval variable from
pcihp_skel_init() but not the variable itself, provoking a compiler
warning: https://git.kernel.org/tglx/history/c/7ab60fc1b8e7
It seems fair to assume the driver hasn't been used as a template for a new
driver in a while. Per Bjorn's and Christoph's preference, delete it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
The pci_error_handlers.slot_reset() callback is only used for non-bridge
devices (see broadcast_error_message()). Since portdrv only binds to
bridges, we don't need pcie_portdrv_slot_reset(), so remove it.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, remove pcie_portdrv_slot_reset() completely]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In case of correctable error, the Correctable Error Detected bit in the
Device Status register is set. Clear it after handling the error.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Clear the device status bits while handling both ERR_FATAL and ERR_NONFATAL
cases.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: rename to pci_aer_clear_device_status(), declare internal to PCI
core instead of exposing it everywhere]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
broadcast_error_message() is only used for ERR_NONFATAL events, when the
state is always pci_channel_io_normal, so remove the unused alternate path.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
aer_error_resume() clears all ERR_NONFATAL error status bits. This is
exactly what pci_cleanup_aer_uncorrect_error_status(), so use that instead
of duplicating the code.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_cleanup_aer_uncorrect_error_status() is called by driver .slot_reset()
methods when handling ERR_NONFATAL errors. Previously this cleared *all*
the bits, including ERR_FATAL bits.
Since we're only handling ERR_NONFATAL errors, clear only the ERR_NONFATAL
error status bits.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
During recovery from fatal errors, we previously called
pci_cleanup_aer_uncorrect_error_status(), which cleared *all* uncorrectable
error status bits (both ERR_FATAL and ERR_NONFATAL).
Instead, call a new pci_aer_clear_fatal_status() that clears only the
ERR_FATAL bits (as indicated by the PCI_ERR_UNCOR_SEVER register).
Based-on-patch-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Now that the old implementation of pci_reset_bus() is gone, replace
pci_try_reset_bus() with pci_reset_bus().
Compared to the old implementation, new code will fail immmediately with
-EAGAIN if object lock cannot be obtained.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_reset_bus() and pci_reset_slot() functions are not being used by any
code. Remove them from the kernel in favor of pci_try_reset_bus() and
pci_try_reset_slot() functions.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Drivers are expected to call pci_try_reset_slot() or pci_try_reset_bus() by
querying if a system supports hotplug or not. A survey showed that most
drivers don't do this and we are leaking hotplug capability to the user.
Hide pci_try_slot_reset() from drivers and embed into pci_try_bus_reset().
Change pci_try_reset_bus() parameter from struct pci_bus to struct pci_dev.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>