A Root Complex Event Collector terminates error and PME messages from
associated RCiEPs.
Use the RCEC Endpoint Association Extended Capability to identify
associated RCiEPs. Link the associated RCiEPs as the RCECs are enumerated.
Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20201121001036.8560-12-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
pci_msi_set_enable() and pci_msix_clear_and_set_ctrl() are only used from
msi.c, so move them from drivers/pci/pci.h to msi.c. No functional change
intended.
Link: https://lore.kernel.org/r/20201203185110.1583077-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Move pci_msi_setup_pci_dev(), which disables MSI and MSI-X interrupts, from
probe.c to msi.c so it's with all the other MSI code and more consistent
with other capability initialization. This means we must compile msi.c
always, even without CONFIG_PCI_MSI, so wrap the rest of msi.c in an #ifdef
and adjust the Makefile accordingly. No functional change intended.
Link: https://lore.kernel.org/r/20201203185110.1583077-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
reset_link() appears to be misnamed. The point is to reset any devices
below a given bridge, so rename it to reset_subordinates() to make it clear
that we are passing a bridge with the intent to reset the devices below it.
Link: https://lore.kernel.org/r/20201121001036.8560-5-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Extend support for Root Complex Event Collectors by decoding and caching
the RCEC Endpoint Association Extended Capabilities when enumerating. Use
that cached information for later error source reporting. See PCIe r5.0,
sec 7.9.10.
Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20201121001036.8560-4-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
PCIe r6.0, sec 7.5.3.18, defines a new 64.0 GT/s bit in the Supported Link
Speeds Vector of Link Capabilities 2.
This patch does not affect the speed of the link, which should be
negotiated automatically by the hardware; it only adds decoding when
showing the speed to the user.
Decode this new speed. Previously, reading the speed of a link operating
at this speed showed "Unknown speed" instead of "64.0 GT/s".
Link: https://lore.kernel.org/r/aaaab33fe18975e123a84aebce2adb85f44e2bbe.1605739760.git.gustavo.pimentel@synopsys.com
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
saved and restored during suspend/resume leading to L1 Substates
configuration being lost post-resume.
Save the L1 Substates control registers so that the configuration is
retained post-resume.
Link: https://lore.kernel.org/r/20201024190442.871-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 7e24bc347e.
7e24bc347e was based on PCIe r5.0, sec 5.9, which claims we need a 200 ms
delay when transitioning to or from D2. However, sec 5.3.1.3 states the
delay as 200 μs (microseconds), as does the table in PCIe r4.0, sec 5.9.1.
This looks like a typo in the r5.0 spec, so revert back to a 200 μs delay
instead of a 200 ms delay.
Fixes: 7e24bc347e ("PCI/PM: Apply D2 delay as milliseconds, not microseconds")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
476e7faefc ("PCI PM: Do not wait for buses in B2 or B3 during resume")
removed the last use of PCI_PM_BUS_WAIT. Remove the definition as well.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PCI devices support two variants of the D3 power state: D3hot (main power
present) D3cold (main power removed). Previously struct pci_dev contained:
unsigned int d3_delay; /* D3->D0 transition time in ms */
unsigned int d3cold_delay; /* D3cold->D0 transition time in ms */
"d3_delay" refers specifically to the D3hot state. Rename it to
"d3hot_delay" to avoid ambiguity and align with the ACPI "_DSM for
Specifying Device Readiness Durations" in the PCI Firmware spec r3.2,
sec 4.6.9.
There is no change to the functionality.
Link: https://lore.kernel.org/r/20200730210848.1578826-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- Use pci_host_bridge.windows list directly instead of splicing in a
temporary list for cadence, mvebu, host-common (Rob Herring)
- Use pci_host_probe() instead of open-coding all the pieces for altera,
brcmstb, iproc, mobiveil, rcar, rockchip, tegra, v3, versatile, xgene,
xilinx, xilinx-nwl (Rob Herring)
- Convert to devm_platform_ioremap_resource_byname() instead of open-coding
platform_get_resource_byname() and devm_ioremap_resource() for altera,
cadence, mediatek, rockchip, tegra, xgene (Dejin Zheng)
- Convert to devm_platform_ioremap_resource() instead of open-coding
platform_get_resource() and devm_ioremap_resource() for aardvark,
brcmstb, exynos, ftpci100, versatile (Dejin Zheng)
- Remove redundant error messages from devm_pci_remap_cfg_resource()
callers (Dejin Zheng)
- Drop useless PCI_ENABLE_PROC_DOMAINS from versatile driver (Rob Herring)
- Default host bridge parent device to the platform device (Rob Herring)
- Drop unnecessary zeroing of host bridge fields (Rob Herring)
- Use pci_is_root_bus() instead of tracking root bus number separately in
aardvark, designware (imx6, keystone, designware-host), mobiveil,
xilinx-nwl, xilinx, rockchip, rcar (Rob Herring)
- Set host bridge bus number in pci_scan_root_bus_bridge() instead of each
driver for aardvark, designware-host, host-common, mediatek, rcar, tegra,
v3-semi (Rob Herring)
- Use bridge resources instead of parsing DT 'ranges' again for cadence
(Rob Herring)
- Remove private bus number and range from cadence (Rob Herring)
- Use devm_pci_alloc_host_bridge() to simplify rcar (Rob Herring)
- Use struct pci_host_bridge.windows list directly rather than a temporary
(Rob Herring)
- Reduce OF "missing non-prefetchable window" from error to warning message
(Rob Herring)
- Convert rcar-gen2 from old Arm-specific pci_common_init_dev() to new
arch-independent interfaces (Rob Herring)
- Move DT resource setup into devm_pci_alloc_host_bridge() (Rob Herring)
- Set bridge map_irq and swizzle_irq to default functions; drivers that
don't support legacy IRQs (iproc) need to undo this (Rob Herring)
* pci/host-probe-refactor:
PCI: Set bridge map_irq and swizzle_irq to default functions
PCI: Move DT resource setup into devm_pci_alloc_host_bridge()
PCI: rcar-gen2: Convert to use modern host bridge probe functions
PCI: of: Reduce missing non-prefetchable memory region to a warning
PCI: rcar: Use struct pci_host_bridge.windows list directly
PCI: rcar: Use devm_pci_alloc_host_bridge()
PCI: cadence: Remove private bus number and range storage
PCI: cadence: Use bridge resources for outbound window setup
PCI: Move setting pci_host_bridge.busnr out of host drivers
PCI: rcar: Use pci_is_root_bus() to check if bus is root bus
PCI: rockchip: Use pci_is_root_bus() to check if bus is root bus
PCI: xilinx: Use pci_is_root_bus() to check if bus is root bus
PCI: xilinx-nwl: Use pci_is_root_bus() to check if bus is root bus
PCI: mobiveil: Use pci_is_root_bus() to check if bus is root bus
PCI: designware: Use pci_is_root_bus() to check if bus is root bus
PCI: aardvark: Use pci_is_root_bus() to check if bus is root bus
PCI: Drop unnecessary zeroing of bridge fields
PCI: Set default bridge parent device
PCI: versatile: Drop flag PCI_ENABLE_PROC_DOMAINS
PCI: controller: Remove duplicate error message
PCI: controller: Convert to devm_platform_ioremap_resource()
PCI: controller: Convert to devm_platform_ioremap_resource_byname()
PCI: xilinx: Use pci_host_probe() to register host
PCI: xilinx-nwl: Use pci_host_probe() to register host
PCI: rockchip: Use pci_host_probe() to register host
PCI: rcar: Use pci_host_probe() to register host
PCI: iproc: Use pci_host_probe() to register host
PCI: altera: Use pci_host_probe() to register host
PCI: xgene: Use pci_host_probe() to register host
PCI: versatile: Use pci_host_probe() to register host
PCI: v3: Use pci_host_probe() to register host
PCI: tegra: Use pci_host_probe() to register host
PCI: mobiveil: Use pci_host_probe() to register host
PCI: brcmstb: Use pci_host_probe() to register host
PCI: host-common: Use struct pci_host_bridge.windows list directly
PCI: mvebu: Use struct pci_host_bridge.windows list directly
PCI: cadence: Use struct pci_host_bridge.windows list directly
# Conflicts:
# drivers/pci/controller/cadence/pcie-cadence-host.c
- Use pci_channel_state_t instead of enum pci_channel_state (Luc Van
Oostenryck)
- Simplify __aer_print_error() (Bjorn Helgaas)
- Log AER correctable errors as warning, not error (Matt Jolly)
- Rename pci_aer_clear_device_status() to pcie_clear_device_status() (Bjorn
Helgaas)
- Clear PCIe Device Status errors only if OS owns AER (Jonathan Cameron)
* pci/error:
PCI/ERR: Clear PCIe Device Status errors only if OS owns AER
PCI/ERR: Rename pci_aer_clear_device_status() to pcie_clear_device_status()
PCI/AER: Log correctable errors as warning, not error
PCI/AER: Simplify __aer_print_error()
PCI: Use 'pci_channel_state_t' instead of 'enum pci_channel_state'
Now that pci_parse_request_of_pci_ranges() callers just setup
pci_host_bridge.windows and dma_ranges directly and don't need the bus
range returned, we can just initialize them when allocating the
pci_host_bridge struct.
With this, pci_parse_request_of_pci_ranges() becomes a static function.
Link: https://lore.kernel.org/r/20200722022514.1283916-19-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
pci_aer_clear_device_status() clears the error bits in the PCIe Device
Status Register (PCI_EXP_DEVSTA). Every PCIe device has this register,
regardless of whether it supports AER.
Rename pci_aer_clear_device_status() to pcie_clear_device_status() to make
clear that it is PCIe-specific but not AER-specific. Move it to
drivers/pci/pci.c, again since it's not AER-specific. No functional change
intended.
Link: https://lore.kernel.org/r/20200717195619.766662-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Currently the ACS capability is being looked up at a number of places. Read
and store it once at enumeration so that it can be used by all later. No
functional change intended.
Link: https://lore.kernel.org/r/20200707224604.3737893-2-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The method struct pci_error_handlers.error_detected() is defined and
documented as taking an 'enum pci_channel_state' for the second argument,
but most drivers use 'pci_channel_state_t' instead.
This 'pci_channel_state_t' is not a typedef for the enum but a typedef for
a bitwise type in order to have better/stricter typechecking.
Consolidate everything by using 'pci_channel_state_t' in the method's
definition, in the related helpers and in the drivers.
Enforce use of 'pci_channel_state_t' by replacing 'enum pci_channel_state'
with an anonymous 'enum'.
Note: Currently, from a typechecking point of view this patch changes
nothing because only the constants defined by the enum are bitwise, not the
enum itself (sparse doesn't have the notion of 'bitwise enum'). This may
change in some not too far future, hence the patch.
[bhelgaas: squash in
https://lore.kernel.org/r/20200702162651.49526-3-luc.vanoostenryck@gmail.comhttps://lore.kernel.org/r/20200702162651.49526-4-luc.vanoostenryck@gmail.com]
Link: https://lore.kernel.org/r/20200702162651.49526-2-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The AER interfaces to clear error status registers were a confusing mess:
- pci_cleanup_aer_uncorrect_error_status() cleared non-fatal errors
from the Uncorrectable Error Status register.
- pci_aer_clear_fatal_status() cleared fatal errors from the
Uncorrectable Error Status register.
- pci_cleanup_aer_error_status_regs() cleared the Root Error Status
register (for Root Ports), the Uncorrectable Error Status register,
and the Correctable Error Status register.
Rename them to make them consistent:
From To
---------------------------------------- -------------------------------
pci_cleanup_aer_uncorrect_error_status() pci_aer_clear_nonfatal_status()
pci_aer_clear_fatal_status() pci_aer_clear_fatal_status()
pci_cleanup_aer_error_status_regs() pci_aer_clear_status()
Since pci_cleanup_aer_error_status_regs() (renamed to
pci_aer_clear_status()) is only used within drivers/pci/, move the
declaration from <linux/aer.h> to drivers/pci/pci.h.
[bhelgaas: commit log, add renames]
Link: https://lore.kernel.org/r/d1310a75dc3d28f7e8da4e99c45fbd3e60fe238e.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If firmware controls DPC, it is generally responsible for managing the DPC
capability and events, and the OS should not access the DPC capability.
However, if firmware controls DPC and both the OS and the platform support
Error Disconnect Recover (EDR) notifications, the OS EDR notify handler is
responsible for recovery, and the notify handler may read/write the DPC
capability until it clears the DPC Trigger Status bit. See [1], sec 4.5.1,
table 4-6.
Expose some DPC error handling functions so they can be used by the EDR
notify handler.
[1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/12888
Link: https://lore.kernel.org/r/e9000bb15b3a4293e81d98bb29ead7c84a6393c9.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Per the SFI _OSC and DPC Updates ECN [1] implementation note flowchart, the
OS seems to be expected to clear AER status even if it doesn't have
ownership of the AER capability. Unlike the DPC capability, where a DPC
ECN [2] specifies a window when the OS is allowed to access DPC registers
even if it doesn't have ownership, there is no clear model for AER.
Add pci_aer_raw_clear_status() to clear the AER error status registers
unconditionally. This is intended for use only by the EDR path (see [2]).
[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
2020, affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/14076
[2] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/12888
[bhelgaas: changelog]
Link: https://lore.kernel.org/r/c19ad28f3633cce67448609e89a75635da0da07d.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
As per the DPC Enhancements ECN [1], sec 4.5.1, table 4-4, if the OS
supports Error Disconnect Recover (EDR), it must invalidate the software
state associated with child devices of the port without attempting to
access the child device hardware. In addition, if the OS supports DPC, it
must attempt to recover the child devices if the port implements the DPC
Capability. If the OS continues operation, the OS must inform the firmware
of the status of the recovery operation via the _OST method.
Return the result of pcie_do_recovery() so we can report it to firmware via
_OST.
[1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/12888
Link: https://lore.kernel.org/r/eb60ec89448769349c6722954ffbf2de163155b5.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously we passed the PCIe service type parameter to pcie_do_recovery(),
where reset_link() looked up the underlying pci_port_service_driver and its
.reset_link() function pointer. Instead of using this roundabout way, we
can just pass the driver-specific .reset_link() callback function when
calling pcie_do_recovery() function.
This allows us to call pcie_do_recovery() from code that is not a PCIe port
service driver, e.g., Error Disconnect Recover (EDR) support.
Remove pcie_port_find_service() and pcie_port_service_driver.reset_link
since they are now unused.
Link: https://lore.kernel.org/r/60e02b87b526cdf2930400059d98704bf0a147d1.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add PCIE_LNKCAP2_SLS2SPEED macro for transforming raw Link Capabilities 2
values to the pci_bus_speed. This is next to PCIE_SPEED2MBS_ENC() to make
it easier to update both places when adding support for new speeds.
Link: https://lore.kernel.org/r/1581937984-40353-10-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously some PCI speed strings came from pci_speed_string(), some came
from the PCIe-specific PCIE_SPEED2STR(), and some came from a PCIe-specific
switch statement. These methods were inconsistent:
pci_speed_string() PCIE_SPEED2STR() switch
------------------ ---------------- ------
33 MHz PCI
...
2.5 GT/s PCIe 2.5 GT/s 2.5 GT/s
5.0 GT/s PCIe 5 GT/s 5 GT/s
8.0 GT/s PCIe 8 GT/s 8 GT/s
16.0 GT/s PCIe 16 GT/s 16 GT/s
32.0 GT/s PCIe 32 GT/s 32 GT/s
Standardize on pci_speed_string() as the single source of these strings.
Note that this adds ".0" and "PCIe" to some messages, including sysfs
"max_link_speed" files, a brcmstb "link up" message, and the link status
dmesg logging, e.g.,
nvme 0000:01:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:01.1 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
I think it's better to standardize on a single version of the speed text.
Previously we had strings like this:
/sys/bus/pci/slots/0/cur_bus_speed: 8.0 GT/s PCIe
/sys/bus/pci/slots/0/max_bus_speed: 8.0 GT/s PCIe
/sys/devices/pci0000:00/0000:00:1c.0/current_link_speed: 8 GT/s
/sys/devices/pci0000:00/0000:00:1c.0/max_link_speed: 8 GT/s
This changes the latter two to match the slots files:
/sys/devices/pci0000:00/0000:00:1c.0/current_link_speed: 8.0 GT/s PCIe
/sys/devices/pci0000:00/0000:00:1c.0/max_link_speed: 8.0 GT/s PCIe
Based-on-patch by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add pci_speed_string() to return a text description of the supplied bus or
link speed. The slot code previously used the private
pci_bus_speed_strings[] array for this purpose, but adding this interface
will enable us to consolidate similar code elsewhere.
Export pcie_link_speed[] and pci_speed_string() so they can be used by
modules.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link speed 32.0 GT/s is supported in PCIe r5.0. Add this speed to
PCIE_SPEED2STR() and PCIE_SPEED2MBS_ENC() to correctly decode it.
This is complementary to de76cda215 ("PCI: Decode PCIe 32 GT/s link
speed").
Link: https://lore.kernel.org/r/1581937984-40353-2-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The number of possible devfns is 256, but pci_add_dma_alias() allocated a
bitmap of size 255. Fix this off-by-one error.
This fixes commits 338c3149a2 ("PCI: Add support for multiple DMA
aliases") and c663579273 ("PCI: Allocate dma_alias_mask with
bitmap_zalloc()"), but I doubt it was possible to see a problem because
it takes 4 64-bit longs (or 8 32-bit longs) to hold 255 bits, and
bitmap_zalloc() doesn't save the 255-bit size anywhere.
[bhelgaas: commit log, move #define to drivers/pci/pci.h, include loop
limit fix from Qian Cai:
https://lore.kernel.org/r/20191218170004.5297-1-cai@lca.pw]
Signed-off-by: James Sewart <jamessewart@arista.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
- Consolidate DT "dma-ranges" parsing and convert all host drivers to use
shared parsing (Rob Herring)
* remotes/lorenzo/pci/mmio-dma-ranges:
PCI: Make devm_of_pci_get_host_bridge_resources() static
PCI: rcar: Use inbound resources for setup
PCI: iproc: Use inbound resources for setup
PCI: xgene: Use inbound resources for setup
PCI: v3-semi: Use inbound resources for setup
PCI: ftpci100: Use inbound resources for setup
PCI: of: Add inbound resource parsing to helpers
PCI: versatile: Enable COMPILE_TEST
PCI: versatile: Remove usage of PHYS_OFFSET
PCI: versatile: Use pci_parse_request_of_pci_ranges()
PCI: xilinx-nwl: Use pci_parse_request_of_pci_ranges()
PCI: xilinx: Use pci_parse_request_of_pci_ranges()
PCI: xgene: Use pci_parse_request_of_pci_ranges()
PCI: v3-semi: Use pci_parse_request_of_pci_ranges()
PCI: rockchip: Drop storing driver private outbound resource data
PCI: rockchip: Use pci_parse_request_of_pci_ranges()
PCI: mobiveil: Use pci_parse_request_of_pci_ranges()
PCI: mediatek: Use pci_parse_request_of_pci_ranges()
PCI: iproc: Use pci_parse_request_of_pci_ranges()
PCI: faraday: Use pci_parse_request_of_pci_ranges()
PCI: dwc: Use pci_parse_request_of_pci_ranges()
PCI: altera: Use pci_parse_request_of_pci_ranges()
PCI: aardvark: Use pci_parse_request_of_pci_ranges()
PCI: Export pci_parse_request_of_pci_ranges()
resource: Add a resource_list_first_type helper
# Conflicts:
# drivers/pci/controller/pcie-rcar.c
- Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn
Helgaas)
- Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn Helgaas)
- Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code
previously didn't recognize that) (Kuppuswamy Sathyanarayanan)
- Allow VFs to use PASID (the PF PASID capability is shared by the VFs,
but the code previously didn't recognize that) (Kuppuswamy
Sathyanarayanan)
- Disconnect PF and VF ATS enablement, since ATS in PFs and associated
VFs can be enabled independently (Kuppuswamy Sathyanarayanan)
- Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan)
- Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas)
- Consolidate ATS declarations in linux/pci-ats.h (Krzysztof Wilczynski)
- Remove unused PRI and PASID stubs (Bjorn Helgaas)
- Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID
interfaces that are only used by built-in IOMMU drivers (Bjorn Helgaas)
- Hide PRI and PASID state restoration functions used only inside the PCI
core (Bjorn Helgaas)
- Fix the UPDCR register address in the Intel ACS quirk (Steffen
Liebergeld)
- Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski)
- Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut)
- Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George Cherian)
- Unify ACS quirk implementations (Bjorn Helgaas)
* pci/virtualization:
PCI: Unify ACS quirk desired vs provided checking
PCI: Make ACS quirk implementations more uniform
PCI: Apply Cavium ACS quirk to ThunderX2 and ThunderX3
PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes
PCI: Add DMA alias quirk for Intel VCA NTB
PCI: Fix Intel ACS quirk UPDCR register address
PCI/ATS: Make pci_restore_pri_state(), pci_restore_pasid_state() private
PCI/ATS: Remove unnecessary EXPORT_SYMBOL_GPL()
PCI/ATS: Remove unused PRI and PASID stubs
PCI/ATS: Consolidate ATS declarations in linux/pci-ats.h
PCI/ATS: Cache PRI PRG Response PASID Required bit
PCI/ATS: Cache PASID Capability offset
PCI/ATS: Cache PRI Capability offset
PCI/ATS: Disable PF/VF ATS service independently
PCI/ATS: Handle sharing of PF PASID Capability with all VFs
PCI/ATS: Handle sharing of PF PRI Capability with all VFs
PCI/ATS: Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI
iommu/vt-d: Select PCI_PRI for INTEL_IOMMU_SVM
- Protect pci_reassign_bridge_resources() against concurrent
addition/removal (Benjamin Herrenschmidt)
- Fix bridge dma_ranges resource list cleanup (Rob Herring)
- Add PCI_STD_NUM_BARS for the number of standard BARs (Denis Efremov)
- Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control the
MMIO and prefetchable MMIO window sizes of hotplug bridges
independently (Nicholas Johnson)
- Fix MMIO/MMIO_PREF window assignment that assigned more space than
desired (Nicholas Johnson)
- Only enforce bus numbers from bridge EA if the bridge has EA devices
downstream (Subbaraya Sundeep)
* pci/resource:
PCI: Do not use bus number zero from EA capability
PCI: Avoid double hpmemsize MMIO window assignment
PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters
PCI: Add PCI_STD_NUM_BARS for the number of standard BARs
PCI: Fix missing bridge dma_ranges resource list cleanup
PCI: Protect pci_reassign_bridge_resources() against concurrent addition/removal
- Always return devices to D0 when thawing to fix hibernation with
drivers like mlx4 that used legacy power management (previously we only
did it for drivers with new power management ops) (Dexuan Cui)
- Clear PCIe PME Status even for legacy power management (Bjorn Helgaas)
- Fix PCI PM documentation errors (Bjorn Helgaas)
- Use dev_printk() for more power management messages (Bjorn Helgaas)
- Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas)
- Convert xen-platform from legacy to generic power management (Bjorn
Helgaas)
- Removed unused .resume_early() and .suspend_late() legacy power
management hooks (Bjorn Helgaas)
- Rearrange power management code for clarity (Rafael J. Wysocki)
- Decode power states more clearly ("4" or "D4" really refers to
"D3cold") (Bjorn Helgaas)
- Notice when reading PM Control register returns an error (~0) instead
of interpreting it as being in D3hot (Bjorn Helgaas)
- Add missing link delays required by the PCIe spec (Mika Westerberg)
* pci/pm:
PCI/PM: Move pci_dev_wait() definition earlier
PCI/PM: Add missing link delays required by the PCIe spec
PCI/PM: Add pcie_wait_for_link_delay()
PCI/PM: Return error when changing power state from D3cold
PCI/PM: Decode D3cold power state correctly
PCI/PM: Fold __pci_complete_power_transition() into its caller
PCI/PM: Avoid exporting __pci_complete_power_transition()
PCI/PM: Fold __pci_start_power_transition() into its caller
PCI/PM: Use pci_power_up() in pci_set_power_state()
PCI/PM: Move power state update away from pci_power_up()
PCI/PM: Remove unused pci_driver.suspend_late() hook
PCI/PM: Remove unused pci_driver.resume_early() hook
xen-platform: Convert to generic power management
PCI/PM: Simplify pci_set_power_state()
PCI/PM: Expand PM reset messages to mention D3hot (not just D3)
PCI/PM: Apply D2 delay as milliseconds, not microseconds
PCI/PM: Use pci_WARN() to include device information
PCI/PM: Use PCI dev_printk() wrappers for consistency
PCI/PM: Wrap long lines in documentation
PCI/PM: Note that PME can be generated from D0
PCI/PM: Make power management op coding style consistent
PCI/PM: Run resume fixups before disabling wakeup events
PCI/PM: Clear PCIe PME Status even for legacy power management
PCI/PM: Correct pci_pm_thaw_noirq() documentation
PCI/PM: Always return devices to D0 when thawing
Previously, CONFIG_PCIEASPM_DEBUG enabled "link_state" and "clk_ctl" sysfs
files that controlled ASPM. We believe these files were rarely if ever
used.
We recently added sysfs ASPM controls that are always present, so the debug
code is no longer needed. Removing this debug code has been discussed for
quite some time, see e.g. [0].
Remove PCIEASPM_DEBUG and the related code.
[0] https://lore.kernel.org/lkml/20180727202619.GD173328@bhelgaas-glaptop.roam.corp.google.com/
Link: https://lore.kernel.org/r/ec935d8e-c084-3938-f1d1-748617596b25@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add sysfs attributes to Endpoints and other Upstream Ports to control ASPM,
Clock PM, and L1 PM Substates. The new attributes are:
/sys/devices/pci*/.../link/clkpm
/sys/devices/pci*/.../link/l0s_aspm
/sys/devices/pci*/.../link/l1_aspm
/sys/devices/pci*/.../link/l1_1_aspm
/sys/devices/pci*/.../link/l1_2_aspm
/sys/devices/pci*/.../link/l1_1_pcipm
/sys/devices/pci*/.../link/l1_2_pcipm
An attribute is only visible if both ends of the Link leading to the device
support the state. Writing y/1/on to the file enables the state; n/0/off
disables it.
These attributes can be used to tune the power/performance tradeoff for
individual devices.
[bhelgaas: commit log, rename directory to "link"]
Link: https://lore.kernel.org/r/b1c83f8a-9bf6-eac5-82d0-cf5b90128fbf@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Currently Linux does not follow PCIe spec regarding the required delays
after reset. A concrete example is a Thunderbolt add-in-card that consists
of a PCIe switch and two PCIe endpoints:
+-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
+-01.0-[04-36]-- DS hotplug port
+-02.0-[37]----00.0 xHCI controller
\-04.0-[38-6b]-- DS hotplug port
The root port (1b.0) and the PCIe switch downstream ports are all PCIe Gen3
so they support 8GT/s link speeds.
We wait for the PCIe hierarchy to enter D3cold (runtime):
pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
When it wakes up from D3cold, according to the PCIe 5.0 section 5.8 the
PCIe switch is put to reset and its power is re-applied. This means that we
must follow the rules in PCIe 5.0 section 6.6.1.
For the PCIe Gen3 ports we are dealing with here, the following applies:
With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
software must wait a minimum of 100 ms after Link training completes
before sending a Configuration Request to the device immediately below
that Port. Software can determine when Link training completes by polling
the Data Link Layer Link Active bit or by setting up an associated
interrupt (see Section 6.7.3.3).
Translating this into the above topology we would need to do this (DLLLA
stands for Data Link Layer Link Active):
0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
I've instrumented the kernel with some additional logging so we can see the
actual delays performed:
pcieport 0000:00:1b.0: power state changed by ACPI to D0
pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
For the switch upstream port (01:00.0 reachable through 00:1b.0 root port)
we wait for 100 ms but not taking into account the DLLLA requirement. We
then wait 10 ms for D3hot -> D0 transition of the root port and the two
downstream hotplug ports. This means that we deviate from what the spec
requires.
Performing the same check for system sleep (s2idle) transitions it turns
out to be even worse. None of the mandatory delays are performed. If this
would be S3 instead of s2idle then according to PCI FW spec 3.2 section
4.6.8. there is a specific _DSM that allows the OS to skip the delays but
this platform does not provide the _DSM and does not go to S3 anyway so no
firmware is involved that could already handle these delays.
On this particular platform these delays are not actually needed because
there is an additional delay as part of the ACPI power resource that is
used to turn on power to the hierarchy but since that additional delay is
not required by any of standards (PCIe, ACPI) it is not present in the
Intel Ice Lake, for example where missing the mandatory delays causes
pciehp to start tearing down the stack too early (links are not yet
trained). Below is an example how it looks like when this happens:
pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
pcieport 0000:87:04.0: PME# disabled
pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain🚌dev = 0000:86:00
pcieport 0000:86:00.0: Refused to change power state, currently in D3
pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
...
There is also one reported case (see the bugzilla link below) where the
missing delay causes xHCI on a Titan Ridge controller fail to runtime
resume when USB-C dock is plugged. This does not involve pciehp but instead
the PCI core fails to runtime resume the xHCI device:
pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
...
Add a new function pci_bridge_wait_for_secondary_bus() that is called on
PCI core resume and runtime resume paths accordingly if the bridge entered
D3cold (and thus went through reset).
This is second attempt to add the missing delays. The previous solution in
c2bf1fc212 ("PCI: Add missing link delays required by the PCIe spec") was
reverted because of two issues it caused:
1. One system become unresponsive after S3 resume due to PME service
spinning in pcie_pme_work_fn(). The root port in question reports that
the xHCI sent PME but the xHCI device itself does not have PME status
set. The PME status bit is never cleared in the root port resulting
the indefinite loop in pcie_pme_work_fn().
2. Slows down resume if the root/downstream port does not support Data
Link Layer Active Reporting because pcie_wait_for_link_delay() waits
1100 ms in that case.
This version should avoid the above issues because we restrict the delay to
happen only if the port went into D3cold.
Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
Link: https://lore.kernel.org/r/20191112091617.70282-3-mika.westerberg@linux.intel.com
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Make it explicitly clear that the code to put devices into D0 in
pci_set_power_state() and in pci_pm_default_resume_early() is the
same by making the latter use pci_power_up() for transitions into D0.
Code rearrangement, no intentional functional impact.
Link: https://lore.kernel.org/r/2520019.OZ1nXS5aSj@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Now that all the PCI host drivers are using pci_parse_request_of_pci_ranges(),
make devm_of_pci_get_host_bridge_resources() static.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Extend devm_of_pci_get_host_bridge_resources() and
pci_parse_request_of_pci_ranges() helpers to also parse the inbound
addresses from DT 'dma-ranges' and populate a resource list with the
translated addresses. This will help ensure 'dma-ranges' is always
parsed in a consistent way.
Tested-by: Srinath Mannam <srinath.mannam@broadcom.com>
Tested-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> # for AArdvark
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Srinath Mannam <srinath.mannam@broadcom.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Will Deacon <will@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Toan Le <toan@os.amperecomputing.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Tom Joseph <tjoseph@cadence.com>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: Ryder Lee <ryder.lee@mediatek.com>
Cc: Karthikeyan Mitran <m.karthikeyan@mobiveil.co.in>
Cc: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: rfi@lists.rocketboards.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-renesas-soc@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
The existing "pci=hpmemsize=nn[KMG]" kernel parameter overrides the default
size of both the non-prefetchable and the prefetchable MMIO windows for
hotplug bridges.
Add "pci=hpmmiosize=nn[KMG]" to override the default size of only the
non-prefetchable MMIO window.
Add "pci=hpmmioprefsize=nn[KMG]" to override the default size of only the
prefetchable MMIO window.
Link: https://lore.kernel.org/r/SL2P216MB0187E4D0055791957B7E2660806B0@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM
Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Previously we did not save and restore the AER configuration on
suspend/resume, so the configuration may be lost after resume.
Save the AER configuration during suspend and restore it during resume.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/92EBB4272BF81E4089A7126EC1E7B28492C3B007@IRSMSX101.ger.corp.intel.com
Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
These interfaces:
void pci_restore_pri_state(struct pci_dev *pdev);
void pci_restore_pasid_state(struct pci_dev *pdev);
are only used in drivers/pci and do not need to be seen by the rest of the
kernel. Most them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
- Use devm_add_action_or_reset() helper (Fuqian Huang)
- Mark expected switch fall-through (Gustavo A. R. Silva)
- Convert sysfs device attributes from __ATTR() to DEVICE_ATTR() (Kelsey
Skunberg)
- Convert sysfs file permissions from S_IRUSR etc to octal (Kelsey
Skunberg)
- Move SR-IOV sysfs functions to iov.c (Kelsey Skunberg)
- Add pci_info_ratelimited() to ratelimit PCI messages separately
(Krzysztof Wilczynski)
- Fix "'static' not at beginning of declaration" warnings (Krzysztof
Wilczynski)
- Clean up resource_alignment parameter to not require static buffer
(Logan Gunthorpe)
- Add ACS quirk for iProc PAXB (Abhinav Ratna)
- Add pci_irq_vector() and other stubs for !CONFIG_PCI (Herbert Xu)
* pci/misc:
PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
PCI: Add ACS quirk for iProc PAXB
PCI: Force trailing new line to resource_alignment_param in sysfs
PCI: Move pci_[get|set]_resource_alignment_param() into their callers
PCI: Clean up resource_alignment parameter to not require static buffer
PCI: Use static const struct, not const static struct
PCI: Add pci_info_ratelimited() to ratelimit PCI separately
PCI/IOV: Remove group write permission from sriov_numvfs, sriov_drivers_autoprobe
PCI/IOV: Move sysfs SR-IOV functions to iov.c
PCI: sysfs: Change permissions from symbolic to octal
PCI: sysfs: Change DEVICE_ATTR() to DEVICE_ATTR_WO()
PCI: sysfs: Define device attributes with DEVICE_ATTR*()
PCI: Mark expected switch fall-through
PCI: Use devm_add_action_or_reset()
- Consolidate _HPP & _HPX code in pci-acpi.h and remove unnecessary
struct hotplug_program_ops (Krzysztof Wilczynski)
- Fixup PCIe device types to remove the need for dev->has_secondary_link
(Mika Westerberg)
* pci/enumeration:
PCI: Get rid of dev->has_secondary_link flag
PCI: Make pcie_downstream_port() available outside of access.c
PCI/ACPI: Remove unnecessary struct hotplug_program_ops
PCI/ACPI: Move _HPP & _HPX functions to pci-acpi.c
PCI/ACPI: Rename _HPX structs from hpp_* to hpx_*
pcie_downstream_port() is useful in other places where code needs to
determine whether the PCIe port is downstream so make it available outside
of access.c.
Link: https://lore.kernel.org/r/20190822085553.62697-1-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Move the ACPI-specific structs hpx_type0, hpx_type1, hpx_type2 and
hpx_type3 to drivers/pci/pci-acpi.c as they are not used anywhere else.
Then remove the struct hotplug_program_ops that has been shared between
drivers/pci/probe.c and drivers/pci/pci-acpi.c from drivers/pci/pci.h as it
is no longer needed.
The struct hotplug_program_ops was added by 87fcf12e84 ("PCI/ACPI: Remove
the need for 'struct hotplug_params'") and replaced previously used struct
hotplug_params enabling the support for the _HPX Type 3 Setting Record that
was added by f873c51a15 ("PCI/ACPI: Implement _HPX Type 3 Setting
Record").
The new struct allowed for the static functions such program_hpx_type0(),
program_hpx_type1(), etc., from the drivers/pci/probe.c to be called from
the function pci_acpi_program_hp_params() in the drivers/pci/pci-acpi.c.
Previously a programming of _HPX Type 0 was as follows:
drivers/pci/probe.c:
program_hpx_type0()
...
pci_configure_device()
hp_ops = {
.program_type0 = program_hpx_type0,
...
}
pci_acpi_program_hp_params(&hp_ops)
drivers/pci/pci-acpi.c:
pci_acpi_program_hp_params(&hp_ops)
acpi_run_hpx(hp_ops)
decode_type0_hpx_record()
hp_ops->program_type0 # program_hpx_type0() called via hp_ops
After the ACPI-specific functions, structs, enums, etc., have been moved to
drivers/pci/pci-acpi.c there is no need for the hotplug_program_ops as all
of the _HPX Type 0, 1, 2 and 3 are directly accessible.
Link: https://lore.kernel.org/r/20190827094951.10613-4-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Move program_hpx_type0(), program_hpx_type1(), etc., and enums
hpx_type3_dev_type, hpx_type3_fn_type and hpx_type3_cfg_loc to
drivers/pci/pci-acpi.c as these functions and enums are ACPI-specific.
Move structs hpx_type0, hpx_type1, hpx_type2 and hpx_type3 to
drivers/pci/pci.h as these are shared between drivers/pci/pci-acpi.c and
drivers/pci/probe.c.
Link: https://lore.kernel.org/r/20190827094951.10613-3-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The sysfs SR-IOV functions are only needed when the kernel is built with
SR-IOV support. Rather than put them in pci-sysfs.c under #ifdef
CONFIG_PCI_IOV, move them to iov.c, which is only compiled when
CONFIG_PCI_IOV=y.
Link: https://lore.kernel.org/r/20190813204513.4790-4-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
These interfaces:
void pci_set_of_node(struct pci_dev *dev);
void pci_release_of_node(struct pci_dev *dev);
void pci_set_bus_of_node(struct pci_bus *bus);
void pci_release_bus_of_node(struct pci_bus *bus);
are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel. Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-12-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This interface:
int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
is only used in drivers/pci/ and does not need to be seen by the rest of
the kernel. Move it to drivers/pci/pci.h so it's private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-11-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These interfaces:
void pcie_set_ecrc_checking(struct pci_dev *dev);
void pcie_ecrc_get_policy(char *str);
are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel. Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-10-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This interface:
void pci_ats_init(struct pci_dev *dev);
is only used in drivers/pci/ and does not need to be seen by the rest of
the kernel. Move it to drivers/pci/pci.h so it's private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-9-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This interface:
void pcie_update_link_speed(struct pci_bus *bus, u16 link_status);
is only used in drivers/pci/ and does not need to be seen by the rest of
the kernel. Move it to drivers/pci/pci.h so it's private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-8-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These interfaces:
struct pci_bus *pci_bus_get(struct pci_bus *bus);
void pci_bus_put(struct pci_bus *bus);
are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel. Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-7-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These symbols:
extern unsigned long pci_hotplug_io_size;
extern unsigned long pci_hotplug_mem_size;
extern unsigned long pci_hotplug_bus_size;
are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel. Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-6-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These Virtual Channel interfaces:
int pci_save_vc_state(struct pci_dev *dev);
void pci_restore_vc_state(struct pci_dev *dev);
void pci_allocate_vc_save_buffers(struct pci_dev *dev);
are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel. Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-5-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These interfaces:
struct device *pci_get_host_bridge_device(struct pci_dev *dev);
void pci_put_host_bridge_device(struct device *dev);
are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel. Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-4-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These interfaces:
bool pci_check_pme_status(struct pci_dev *dev);
void pci_pme_wakeup_bus(struct pci_bus *bus);
are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel. Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-3-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These delay time definitions:
#define PCI_PM_D2_DELAY 200
#define PCI_PM_D3_WAIT 10
#define PCI_PM_D3COLD_WAIT 100
#define PCI_PM_BUS_WAIT 50
are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel. Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.
Link: https://lore.kernel.org/r/20190724233848.73327-2-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Revert 975bb8b4dc ("PCI/IOV: Use VF0 cached config space size for other
VFs"), which attempted to cache the config space size from the first VF to
re-use for subsequent VFs.
The cached value was determined prior to discovering the PCIe capability on
the VF, which resulted in the first VF reporting the correct config space
size (4K), as it has a special case through pci_cfg_space_size(), while all
the other VFs only reported 256 bytes. As this was only a performance
optimization, we're better off without it.
Fixes: 975bb8b4dc ("PCI/IOV: Use VF0 cached config space size for other VFs")
Link: https://lore.kernel.org/r/156046663197.29869.3633634445109057665.stgit@gimli.home
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Hao Zheng <yinhe@linux.alibaba.com>
In pci_pm_complete() there are checks to decide whether or not to
resume devices that were left in runtime-suspend during the preceding
system-wide transition into a sleep state. They involve checking the
current power state of the device and comparing it with the power
state of it set before the preceding system-wide transition, but the
platform component of the device's power state is not handled
correctly in there.
Namely, on platforms with ACPI, the device power state information
needs to be updated with care, so that the reference counters of
power resources used by the device (if any) are set to ensure that
the refreshed power state of it will be maintained going forward.
To that end, introduce a new ->refresh_state() platform PM callback
for PCI devices, for asking the platform to refresh the device power
state data and ensure that the corresponding power state will be
maintained going forward, make it invoke acpi_device_update_power()
(for devices with ACPI PM) on platforms with ACPI and make
pci_pm_complete() use it, through a new pci_refresh_power_state()
wrapper function.
Fixes: a0d2a959d3 (PCI: Avoid unnecessary resume after direct-complete)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Currently Linux does not follow PCIe spec regarding the required delays
after reset. A concrete example is a Thunderbolt add-in-card that
consists of a PCIe switch and two PCIe endpoints:
+-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
+-01.0-[04-36]-- DS hotplug port
+-02.0-[37]----00.0 xHCI controller
\-04.0-[38-6b]-- DS hotplug port
The root port (1b.0) and the PCIe switch downstream ports are all PCIe
gen3 so they support 8GT/s link speeds.
We wait for the PCIe hierarchy to enter D3cold (runtime):
pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
PCIe switch is put to reset and its power is re-applied. This means that
we must follow the rules in PCIe 4.0 section 6.6.1.
For the PCIe gen3 ports we are dealing with here, the following applies:
With a Downstream Port that supports Link speeds greater than 5.0
GT/s, software must wait a minimum of 100 ms after Link training
completes before sending a Configuration Request to the device
immediately below that Port. Software can determine when Link training
completes by polling the Data Link Layer Link Active bit or by setting
up an associated interrupt (see Section 6.7.3.3).
Translating this into the above topology we would need to do this (DLLLA
stands for Data Link Layer Link Active):
pcieport 0000:00:1b.0: wait for 100ms after DLLLA is set before access to 0000:01:00.0
pcieport 0000:02:00.0: wait for 100ms after DLLLA is set before access to 0000:03:00.0
pcieport 0000:02:02.0: wait for 100ms after DLLLA is set before access to 0000:37:00.0
I've instrumented the kernel with additional logging so we can see the
actual delays the kernel performs:
pcieport 0000:00:1b.0: power state changed by ACPI to D0
pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
pcieport 0000:00:1b.0: waking up bus
pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60)
...
pcieport 0000:00:1b.0: PME# disabled
pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:01:00.0: PME# disabled
pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:00.0: PME# disabled
pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
pcieport 0000:02:01.0: PME# disabled
pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:02.0: PME# disabled
pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:04.0: PME# disabled
pcieport 0000:02:01.0: PME# enabled
pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
pcieport 0000:02:04.0: PME# enabled
pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000)
...
thunderbolt 0000:03:00.0: PME# disabled
xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000)
...
xhci_hcd 0000:37:00.0: PME# disabled
For the switch upstream port (01:00.0) we wait for 100ms but not taking
into account the DLLLA requirement. We then wait 10ms for D3hot -> D0
transition of the root port and the two downstream hotplug ports. This
means that we deviate from what the spec requires.
Performing the same check for system sleep (s2idle) transitions we can
see following when resuming from s2idle:
pcieport 0000:00:1b.0: power state changed by ACPI to D0
pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60)
...
pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
...
pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
pcieport 0000:02:02.0: restoring config space at offset 0x2c (was 0x0, writing 0x0)
pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
pcieport 0000:02:02.0: restoring config space at offset 0x28 (was 0x0, writing 0x0)
pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
pcieport 0000:02:02.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1)
pcieport 0000:02:01.0: restoring config space at offset 0x2c (was 0x0, writing 0x60)
pcieport 0000:02:02.0: restoring config space at offset 0x20 (was 0x0, writing 0x73f073f0)
pcieport 0000:02:04.0: restoring config space at offset 0x2c (was 0x0, writing 0x60)
pcieport 0000:02:01.0: restoring config space at offset 0x28 (was 0x0, writing 0x60)
pcieport 0000:02:00.0: restoring config space at offset 0x2c (was 0x0, writing 0x0)
pcieport 0000:02:02.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1)
pcieport 0000:02:04.0: restoring config space at offset 0x28 (was 0x0, writing 0x60)
pcieport 0000:02:01.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1ff10001)
pcieport 0000:02:00.0: restoring config space at offset 0x28 (was 0x0, writing 0x0)
pcieport 0000:02:02.0: restoring config space at offset 0x18 (was 0x0, writing 0x373702)
pcieport 0000:02:04.0: restoring config space at offset 0x24 (was 0x10001, writing 0x49f12001)
pcieport 0000:02:01.0: restoring config space at offset 0x20 (was 0x0, writing 0x73e05c00)
pcieport 0000:02:00.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1)
pcieport 0000:02:04.0: restoring config space at offset 0x20 (was 0x0, writing 0x89f07400)
pcieport 0000:02:01.0: restoring config space at offset 0x1c (was 0x101, writing 0x5151)
pcieport 0000:02:00.0: restoring config space at offset 0x20 (was 0x0, writing 0x8a008a00)
pcieport 0000:02:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:02:04.0: restoring config space at offset 0x1c (was 0x101, writing 0x6161)
pcieport 0000:02:01.0: restoring config space at offset 0x18 (was 0x0, writing 0x360402)
pcieport 0000:02:00.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1)
pcieport 0000:02:04.0: restoring config space at offset 0x18 (was 0x0, writing 0x6b3802)
pcieport 0000:02:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
pcieport 0000:02:00.0: restoring config space at offset 0x18 (was 0x0, writing 0x30302)
pcieport 0000:02:01.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:02:04.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:02:00.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
pcieport 0000:02:04.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
pcieport 0000:02:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000)
...
thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000)
This is even worse. None of the mandatory delays are performed. If this
would be S3 instead of s2idle then according to PCI FW spec 3.2 section
4.6.8. there is a specific _DSM that allows the OS to skip the delays
but this platform does not provide the _DSM and does not go to S3 anyway
so no firmware is involved that could already handle these delays.
In this particular Intel Coffee Lake platform these delays are not
actually needed because there is an additional delay as part of the ACPI
power resource that is used to turn on power to the hierarchy but since
that additional delay is not required by any of standards (PCIe, ACPI)
it is not present in the Intel Ice Lake, for example where missing the
mandatory delays causes pciehp to start tearing down the stack too early
(links are not yet trained).
For this reason, change the PCIe portdrv PM resume hooks so that they
perform the mandatory delays before the downstream component gets
resumed. We perform the delays before port services are resumed because
otherwise pciehp might find that the link is not up (even if it is just
training) and tears-down the hierarchy.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The code in pci_dev_keep_suspended() is relatively hard to follow due
to the negative checks in it and in its callers and the function has
a possible side-effect (disabling the PME) which doesn't really match
its role.
For this reason, move the PME disabling from pci_dev_keep_suspended()
to a separate function and change the semantics (and name) of the
rest of it, so that 'true' is returned when the device needs to be
resumed (and not the other way around). Change the callers of
pci_dev_keep_suspended() accordingly.
While at it, make the code flow in pci_pm_poweroff() reflect the
pci_pm_suspend() more closely to avoid arbitrary differences between
them.
This is a cosmetic change with no intention to alter behavior.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
If a multi-function device's bandwidth is already limited when it is
enumerated, a message is logged only for function 0. By contrast, when
downtraining occurs after enumeration, a message is logged for all
functions. That's because the former uses pcie_report_downtraining(),
whereas the latter uses __pcie_print_link_status() (which doesn't filter
functions != 0). I am seeing this happen on a MacBookPro9,1 with a GPU
(function 0) and an integrated HDA controller (function 1).
Avoid this incongruence by calling pcie_report_downtraining() in both
cases.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexandru Gagniuc <alex.gagniuc@dellteam.com>
This file makes use of definitions provided in <linux/pci.h>. This only
compiles when <linux/pci.h> is included beforehand, and creates a nasty
include dependency. Instead, just include the correct file.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- Cache VF config space size to optimize enumeration of many VFs
(KarimAllah Ahmed)
- Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)
* pci/virtualization:
PCI/IOV: Remove unnecessary include of <linux/pci-ats.h>
PCI/IOV: Use VF0 cached config space size for other VFs
Cache the config space size from VF0 and use it for all other VFs instead
of reading it from the config space of each VF. We assume that it will be
the same across all associated VFs.
This is an optimization when enabling SR-IOV on a device with many VFs.
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[bhelgaas: use CONFIG_PCI_IOV (not CONFIG_PCI_ATS)]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In order to have better power management for Thunderbolt PCIe chains,
Windows enables power management for native PCIe hotplug ports if there is
the following ACPI _DSD attached to the root port:
Name (_DSD, Package () {
ToUUID ("6211e2c0-58a3-4af3-90e1-927a4e0c55a4"),
Package () {
Package () {"HotPlugSupportInD3", 1}
}
})
This is also documented in:
https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3
Do the same in Linux by introducing new firmware PM callback
(->bridge_d3()) and then implement it for ACPI based systems so that the
above property is checked.
There is one catch, though. The initial pci_dev->bridge_d3 is set before
the root port has ACPI companion bound (the device is not added to the PCI
bus either) so we need to look up the ACPI companion manually in that case
in acpi_pci_bridge_d3().
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Bring surprise removals and permanent failures together so we no longer
need separate flags. The implementation enforces that error handling will
not be able to override a surprise removal's permanent channel failure.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
We don't need to be paranoid about the topology changing while handling an
error. If the device has changed in a hotplug capable slot, we can rely on
the presence detection handling to react to a changing topology.
Restore the fatal error handling behavior that existed before merging DPC
with AER with 7e9084b367 ("PCI/AER: Handle ERR_FATAL with removal and
re-enumeration of devices").
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
The secondary bus reset may have link side effects that a hotplug capable
port may incorrectly react to. Use the slot specific reset for hotplug
ports, fixing the undesirable link down-up handling during error
recovering.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: fold in
https://lore.kernel.org/linux-pci/20180926152326.14821-1-keith.busch@intel.com
for issue reported by Stephen Rothwell <sfr@canb.auug.org.au>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
This patch provides DPC save and restore capabilities. This is necessary
for the driver to observe DPC events in the event the configuration space
needs to be restored after a reset.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
- To avoid bus errors, enable PASID only if entire path supports End-End
TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller
(Bjorn Helgaas)
* pci/virtualization:
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: Rename pci_try_reset_bus() to pci_reset_bus()
PCI: Deprecate pci_reset_bus() and pci_reset_slot() functions
PCI: Unify try slot and bus reset API
PCI: Hide pci_reset_bridge_secondary_bus() from drivers
IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset
PCI: Handle error return from pci_reset_bridge_secondary_bus()
PCI/IOV: Tidy pci_sriov_set_totalvfs()
PCI: Enable PASID only if entire path supports End-End TLP prefixes
# Conflicts:
# drivers/pci/hotplug/pciehp_hpc.c
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
* pci/resource:
PCI: Make pci_get_rom_size() static
PCI: Add check code for last image indicator not set
PCI: Avoid accessing memory outside the ROM BAR
PCI: Make early dump functionality generic
PCI: Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling
PCI: Restore resized BAR state on resume
PCI: Clean up resource allocation in devm_of_pci_get_host_bridge_resources()
# Conflicts:
# Documentation/admin-guide/kernel-parameters.txt
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
* pci/peer-to-peer:
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a device
below it (Myron Stowe)
* pci/enumeration:
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Check for PCIe Link downtraining
PCI: Workaround IDT switch ACS Source Validation erratum
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
* pci/aer:
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
PCI/AER: Clear device status bits during ERR_COR handling
PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
PCI/AER: Factor out ERR_NONFATAL status bit clearing
PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
PCI/AER: Add sysfs attributes for rootport cumulative stats
PCI/AER: Add sysfs attributes to provide AER stats and breakdown
PCI/AER: Define aer_stats structure for AER capable devices
PCI/AER: Move internal declarations to drivers/pci/pci.h
PCI/AER: Adopt lspci names for AER error decoding
PCI/AER: Expose internal API for obtaining AER information
# Conflicts:
# drivers/pci/pci.h
When both ends of a PCIe Link are capable of a higher bandwidth than is
currently in use, the Link is said to be "downtrained". A downtrained Link
may indicate hardware or configuration problems in the system, but it's
hard to identify such Links from userspace.
Refactor pcie_print_link_status() so it continues to always print PCIe
bandwidth information, as several NIC drivers desire.
Add a new internal __pcie_print_link_status() to emit a message only when a
device's bandwidth is constrained by the fabric and call it from the PCI
core for all devices, which identifies all downtrained Links. It also
emits messages for a few cases that are technically not downtrained, such
as a x4 device in an open-ended x1 slot.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: changelog, move __pcie_print_link_status() declaration to
drivers/pci/, rename pcie_check_upstream_link() to
pcie_report_downtraining()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Intel Sunrise Point (SPT) PCH hardware has an implementation of the ACS
bits that does not comply with the PCIe standard. To deal with this we
need device-specific quirks to disable ACS redirection.
Add a new pci_dev_specific_disable_acs_redir() quirk and a new
.disable_acs_redir() function pointer for use by non-compliant devices. No
functional change intended.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: split to separate patch, move
pci_dev_specific_disable_acs_redir() declarations to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Move declarations for these functions:
pci_dev_specific_acs_enabled()
pci_dev_specific_enable_acs()
from include/linux/pci.h to drivers/pci/pci.h because nothing outside the
PCI core needs to use them.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.
When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.
is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.
A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.
Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master(). As these fields are not
handled atomically, that clears the is_added bit.
Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state. This avoids the
race because is_added no longer shares a memory location with is_busmaster.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Clear the device status bits while handling both ERR_FATAL and ERR_NONFATAL
cases.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: rename to pci_aer_clear_device_status(), declare internal to PCI
core instead of exposing it everywhere]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
During recovery from fatal errors, we previously called
pci_cleanup_aer_uncorrect_error_status(), which cleared *all* uncorrectable
error status bits (both ERR_FATAL and ERR_NONFATAL).
Instead, call a new pci_aer_clear_fatal_status() that clears only the
ERR_FATAL bits (as indicated by the PCI_ERR_UNCOR_SEVER register).
Based-on-patch-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Rename pci_reset_bridge_secondary_bus() to pci_bridge_secondary_bus_reset()
and move the declaration from linux/pci.h to drivers/pci.h to be used
internally in PCI directory only.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add sysfs attributes to provide total and breakdown of the AERs seen,
into different type of correctable, fatal and nonfatal errors:
/sys/bus/pci/devices/<dev>/aer_dev_correctable
/sys/bus/pci/devices/<dev>/aer_dev_fatal
/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Define a structure to hold the AER statistics. There are 2 groups of
statistics: dev_* counters that are to be collected for all AER capable
devices and rootport_* counters that are collected for all (AER capable)
rootports only. Allocate and free this structure when device is added or
released (thus counters survive the lifetime of the device).
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Since pci_aer_init() and pci_no_aer() are used only internally, move their
declarations to the PCI internal header file. Also, no one cares about
return value of pci_aer_init(), so make it void.
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Export some common AER functions and structures for other PCI core drivers
to use. Since this is making the function externally visible inside the
PCI core, prepend "aer_" to the function name.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: move AER declarations from linux/aer.h to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
Some IDT switches incorrectly flag an ACS Source Validation error on
completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
says that completions are never affected by ACS Source Validation. Here's
the text of IDT 89H32H8G3-YC, erratum #36:
Item #36 - Downstream port applies ACS Source Validation to Completions
Section 6.12.1.1 of the PCI Express Base Specification 3.1 states that
completions are never affected by ACS Source Validation. However,
completions received by a downstream port of the PCIe switch from a
device that has not yet captured a PCIe bus number are incorrectly
dropped by ACS Source Validation by the switch downstream port.
Workaround: Issue a CfgWr1 to the downstream device before issuing the
first CfgRd1 to the device. This allows the downstream device to capture
its bus number; ACS Source Validation no longer stops completions from
being forwarded by the downstream port. It has been observed that
Microsoft Windows implements this workaround already; however, some
versions of Linux and other operating systems may not.
When doing the first config read to probe for a device, if the device is
behind an IDT switch with this erratum:
1. Disable ACS Source Validation if enabled
2. Wait for device to become ready to accept config accesses (by using
the Config Request Retry Status mechanism)
3. Do a config write to the endpoint
4. Enable ACS Source Validation (if it was enabled to begin with)
The workaround suggested by IDT is basically only step 3, but we don't know
when the device is ready to accept config requests. That means we need to
do config reads until we receive a non-Config Request Retry Status, which
means we need to disable ACS SV temporarily.
Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
[bhelgaas: changelog, clean up whitespace, fold in unused variable fix
from Anders Roxell <anders.roxell@linaro.org>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Move early dump functionality into common code so that it is available for
all architectures. No need to carry arch-specific reads around as the read
hooks are already initialized by the time pci_setup_device() is getting
called during scan.
Tested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
The TotalVFs register in the SR-IOV capability is the hardware limit on the
number of VFs. A PF driver can limit the number of VFs further with
pci_sriov_set_totalvfs(). When the PF driver is removed, reset any VF
limit that was imposed by the driver because that limit may not apply to
other drivers.
Before 8d85a7a4f2 ("PCI/IOV: Allow PF drivers to limit total_VFs to 0"),
pci_sriov_set_totalvfs(pdev, 0) meant "we can enable TotalVFs virtual
functions", and the nfp driver used that to remove the VF limit when the
driver unloads.
8d85a7a4f2 broke that because instead of removing the VF limit,
pci_sriov_set_totalvfs(pdev, 0) actually sets the limit to zero, and that
limit persists even if another driver is loaded.
We could fix that by making the nfp driver reset the limit when it unloads,
but it seems more robust to do it in the PCI core instead of relying on the
driver.
The regression scenario is:
nfp_pci_probe (driver 1)
...
nfp_pci_remove
pci_sriov_set_totalvfs(pf->pdev, 0) # limits VFs to 0
...
nfp_pci_probe (driver 2)
nfp_rtsym_read_le("nfd_vf_cfg_max_vfs")
# no VF limit from firmware
Now driver 2 is broken because the VF limit is still 0 from driver 1.
Fixes: 8d85a7a4f2 ("PCI/IOV: Allow PF drivers to limit total_VFs to 0")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
[bhelgaas: changelog, rename functions]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- reduce Keystone "link already up" log level (Fabio Estevam)
- move private DT functions to drivers/pci/ (Rob Herring)
- factor out dwc CONFIG_PCI Kconfig dependencies (Rob Herring)
- add DesignWare support to the endpoint test driver (Gustavo Pimentel)
- add DesignWare support for endpoint mode (Gustavo Pimentel)
- use devm_ioremap_resource() instead of devm_ioremap() in dra7xx and
artpec6 (Gustavo Pimentel)
- fix Qualcomm bitwise NOT issue (Dan Carpenter)
- add Qualcomm runtime PM support (Srinivas Kandagatla)
* lorenzo/pci/dwc:
PCI: qcom: add runtime pm support to pcie_port
PCI: qcom: Fix a bitwise vs logical NOT typo
PCI: dwc: dra7xx: Use devm_ioremap_resource() instead of devm_ioremap()
PCI: dwc: artpec6: Use devm_ioremap_resource() instead of devm_ioremap()
misc: pci_endpoint_test: Add DesignWare EP entry
dt-bindings: PCI: designware: Add support for EP in DesignWare driver
PCI: dwc: Add support for EP mode
dt-bindings: PCI: designware: Example update
PCI: Move private DT related functions into private header
PCI: dwc: Move CONFIG_PCI depends to menu
PCI: dwc: Replace magic number by defines
PCI: dwc: Small computation improvement
PCI: dwc: Replace lower into upper case characters
PCI: dwc: Define maximum number of vectors
PCI: imx6: Remove space before tabs
PCI: keystone: Do not treat link up message as error
# Conflicts:
# include/linux/of_pci.h
Pass the service type to pcie_do_fatal_recovery() instead of assuming AER.
We will make DPC also use pcie_do_fatal_recovery(), and it needs to do
things a little differently for AER and DPC.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>