It should not be necessary to update the current_state field of
struct pci_dev in pci_enable_device_flags() before calling
do_pci_enable_device() for the device, because none of the
code between that point and the pci_set_power_state() call in
do_pci_enable_device() invoked later depends on it.
Moreover, doing that is actively harmful in some cases. For example,
if the given PCI device depends on an ACPI power resource whose _STA
method initially returns 0 ("off"), but the config space of the PCI
device is accessible and the power state retrieved from the
PCI_PM_CTRL register is D0, the current_state field in the struct
pci_dev representing that device will get out of sync with the
power.state of its ACPI companion object and that will lead to
power management issues going forward.
To avoid such issues it is better to leave the current_state value
as is until it is changed to PCI_D0 by do_pci_enable_device() as
appropriate. However, the power state of the device is not changed
to PCI_D0 if it is already enabled when pci_enable_device_flags()
gets called for it, so update its current_state in that case, but
use pci_update_current_state() covering platform PM too for that.
Link: https://lore.kernel.org/lkml/20210314000439.3138941-1-luzmaximilian@gmail.com/
Reported-by: Maximilian Luz <luzmaximilian@gmail.com>
Tested-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmA2xiQUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzRDA/9GCyEskI9DMtyT9UeoTMzpHcUZpaU
eCbLa2BSPjOKlrHLnPY7IwE0nT7ihe4OOcm8uOYOWtulE46XJNCHfxlUYP3SbI0Y
JlG0FBCh4ldzCzzKsftwkSvVhk+gn+ms9ucJ8q2iBSOXVhG/41IbX7++8IfbQM4v
VHjdYUmTCCiOSRDtBVi82p4+GAHxH8IhaB0gDNb1Q7myj+qJKL5nKjK/nukgO0fO
UpCnSxyua48Ij+c59Y1QAIhGeORq5Gg5Q4ussY3FxS9ovhZODEGQwCFniTfilqRw
wEB9Fb8tiPY60ljEyDPnERMkiW69zutTJqOY4LfwmoRM9IEbxD6VPIqF5gin8sB7
pHhX4KUU+eB1hQdK9SGKjkwyehquNKzTdxsu2jccltOKwBm5jcXYeOvu2bJTzZn+
rrZPYJoA1dQig3bEuOzsBxvW4Jaj7IsVfVcao4OzXyh8Y7tLr9kVDXxr7JC/EkPM
zRK24yglERD2J1JXgNMvOuJQj6JmRHhEbV/faZci8x8ZEaz1FawRAUZqHf/gGmnW
2CllarHbRnchPyD8btv03Mp84WG6fCfKy7zG2D8HxOsiStDO/5ICehHtGcvYg7IL
RuE4Tj8OKdcbw/8cO4C3842FqiSj34+jooNIHSLyBqcpJam6VsN4XqNIZCL+DeG5
Q2JXruAaahTWOZg=
=GXL5
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Remove unnecessary locking around _OSC (Bjorn Helgaas)
- Clarify message about _OSC failure (Bjorn Helgaas)
- Remove notification of PCIe bandwidth changes (Bjorn Helgaas)
- Tidy checking of syscall user config accessors (Heiner Kallweit)
Resource management:
- Decline to resize resources if boot config must be preserved (Ard
Biesheuvel)
- Fix pci_register_io_range() memory leak (Geert Uytterhoeven)
Error handling (Keith Busch):
- Clear error status from the correct device
- Retain error recovery status so drivers can use it after reset
- Log the type of Port (Root or Switch Downstream) that we reset
- Always request a reset for Downstream Ports in frozen state
Endpoint framework and NTB (Kishon Vijay Abraham I):
- Make *_get_first_free_bar() take into account 64 bit BAR
- Add helper API to get the 'next' unreserved BAR
- Make *_free_bar() return error codes on failure
- Remove unused pci_epf_match_device()
- Add support to associate secondary EPC with EPF
- Add support in configfs to associate two EPCs with EPF
- Add pci_epc_ops to map MSI IRQ
- Add pci_epf_ops to expose function-specific attrs
- Allow user to create sub-directory of 'EPF Device' directory
- Implement ->msi_map_irq() ops for cadence
- Configure LM_EP_FUNC_CFG based on epc->function_num_map for cadence
- Add EP function driver to provide NTB functionality
- Add support for EPF PCI Non-Transparent Bridge
- Add specification for PCI NTB function device
- Add PCI endpoint NTB function user guide
- Add configfs binding documentation for pci-ntb endpoint function
Broadcom STB PCIe controller driver:
- Add support for BCM4908 and external PERST# signal controller
(Rafał Miłecki)
Cadence PCIe controller driver:
- Retrain Link to work around Gen2 training defect (Nadeem Athani)
- Fix merge botch in cdns_pcie_host_map_dma_ranges() (Krzysztof
Wilczyński)
Freescale Layerscape PCIe controller driver:
- Add LX2160A rev2 EP mode support (Hou Zhiqiang)
- Convert to builtin_platform_driver() (Michael Walle)
MediaTek PCIe controller driver:
- Fix OF node reference leak (Krzysztof Wilczyński)
Microchip PolarFlare PCIe controller driver:
- Add Microchip PolarFire PCIe controller driver (Daire McNamara)
Qualcomm PCIe controller driver:
- Use PHY_REFCLK_USE_PAD only for ipq8064 (Ansuel Smith)
- Add support for ddrss_sf_tbu clock for sm8250 (Dmitry Baryshkov)
Renesas R-Car PCIe controller driver:
- Drop PCIE_RCAR config option (Lad Prabhakar)
- Always allocate MSI addresses in 32bit space (Marek Vasut)
Rockchip PCIe controller driver:
- Add FriendlyARM NanoPi M4B DT binding (Chen-Yu Tsai)
- Make 'ep-gpios' DT property optional (Chen-Yu Tsai)
Synopsys DesignWare PCIe controller driver:
- Work around ECRC configuration hardware defect (Vidya Sagar)
- Drop support for config space in DT 'ranges' (Rob Herring)
- Change size to u64 for EP outbound iATU (Shradha Todi)
- Add upper limit address for outbound iATU (Shradha Todi)
- Make dw_pcie ops optional (Jisheng Zhang)
- Remove unnecessary dw_pcie_ops from al driver (Jisheng Zhang)
Xilinx Versal CPM PCIe controller driver:
- Fix OF node reference leak (Pan Bian)
Miscellaneous:
- Remove tango host controller driver (Arnd Bergmann)
- Remove IRQ handler & data together (altera-msi, brcmstb, dwc)
(Martin Kaiser)
- Fix xgene-msi race in installing chained IRQ handler (Martin
Kaiser)
- Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He)
- Fix pci-bridge-emul array overruns (Russell King)
- Remove obsolete uses of WARN_ON(in_interrupt()) (Sebastian Andrzej
Siewior)"
* tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (69 commits)
PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
PCI: qcom: Add support for ddrss_sf_tbu clock
dt-bindings: PCI: qcom: Document ddrss_sf_tbu clock for sm8250
PCI: al: Remove useless dw_pcie_ops
PCI: dwc: Don't assume the ops in dw_pcie always exist
PCI: dwc: Add upper limit address for outbound iATU
PCI: dwc: Change size to u64 for EP outbound iATU
PCI: dwc: Drop support for config space in 'ranges'
PCI: layerscape: Convert to builtin_platform_driver()
PCI: layerscape: Add LX2160A rev2 EP mode support
dt-bindings: PCI: layerscape: Add LX2160A rev2 compatible strings
PCI: dwc: Work around ECRC configuration issue
PCI/portdrv: Report reset for frozen channel
PCI/AER: Specify the type of Port that was reset
PCI/ERR: Retain status from error notification
PCI/AER: Clear AER status from Root Port when resetting Downstream Port
PCI/ERR: Clear status of the reporting device
dt-bindings: arm: rockchip: Add FriendlyARM NanoPi M4B
PCI: rockchip: Make 'ep-gpios' DT property optional
Documentation: PCI: Add PCI endpoint NTB function user guide
...
Kmemleak reports:
unreferenced object 0xc328de40 (size 64):
comm "kworker/1:1", pid 21, jiffies 4294938212 (age 1484.670s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 e0 d8 fc eb 00 00 00 00 ................
00 00 10 fe 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ad758d10>] pci_register_io_range+0x3c/0x80
[<2c7f139e>] of_pci_range_to_resource+0x48/0xc0
[<f079ecc8>] devm_of_pci_get_host_bridge_resources.constprop.0+0x2ac/0x3ac
[<e999753b>] devm_of_pci_bridge_init+0x60/0x1b8
[<a895b229>] devm_pci_alloc_host_bridge+0x54/0x64
[<e451ddb0>] rcar_pcie_probe+0x2c/0x644
In case a PCI host driver's probe is deferred, the same I/O range may be
allocated again, and be ignored, causing a memory leak.
Fix this by (a) letting logic_pio_register_range() return -EEXIST if the
passed range already exists, so pci_register_io_range() will free it, and
by (b) making pci_register_io_range() not consider -EEXIST an error
condition.
Link: https://lore.kernel.org/r/20210202100332.829047-1-geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
RX 5600 XT Pulse advertises support for BAR 0 being 256MB, 512MB,
or 1GB, but it also supports 2GB, 4GB, and 8GB. Add a rebar
size quirk so that the BAR 0 is big enough to cover complete VARM.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20210107175017.15893-5-nirmoy.das@amd.com
Users of pci_resize_resource() need a way to calculate BAR size
from desired bytes. Add a helper function and export it so that
modular drivers can use it.
Signed-off-by: Darren Salt <devspam@moreofthesa.me.uk>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20210107175017.15893-3-nirmoy.das@amd.com
- Save/restore Precision Time Measurement Capability for suspend/resume
(David E. Box)
- Disable PTM during suspend to save power (David E. Box)
* pci/ptm:
PCI: Disable PTM during suspend to save power
PCI/PTM: Save/restore Precision Time Measurement Capability for suspend/resume
- Add sysfs attribute for device power state (Maximilian Luz)
- Rename pci_wakeup_bus() to pci_resume_bus() (Mika Westerberg)
- Do not generate wakeup event when runtime resuming bus (Mika Westerberg)
* pci/pm:
PCI/PM: Do not generate wakeup event when runtime resuming device
PCI/PM: Rename pci_wakeup_bus() to pci_resume_bus()
PCI: Add sysfs attribute for device power state
- Decode PCIe 64 GT/s link speed (Gustavo Pimentel)
- De-duplicate Device IDs in the driver dynamic IDs list (Zhenzhong Duan)
- Return u8 from pci_find_capability() and similar (Puranjay Mohan)
- Return u16 from pci_find_ext_capability() and similar (Bjorn Helgaas)
- Include both device and resource name in config space resources
(Alexander Lobakin)
- Fix ACPI companion lookup for device 0 on the root bus (Rafael J.
Wysocki)
* pci/enumeration:
PCI/ACPI: Fix companion lookup for device 0 on the root bus
PCI: Keep both device and resource name for config space remaps
PCI: Return u16 from pci_find_ext_capability() and similar
PCI: Return u8 from pci_find_capability() and similar
PCI: Avoid duplicate IDs in driver dynamic IDs list
PCI: Move pci_match_device() ahead of new_id_store()
PCI: Decode PCIe 64 GT/s link speed
Follow the rule taken in commit 35bd8c07db ("devres: keep both device
name and resource name in pretty name") and keep both device and resource
names while requesting memory regions for PCI config space to prettify e.g.
/proc/iomem output:
Before (DWC Host Controller):
18b00000-18b01fff : dbi
18b10000-18b11fff : config
18b20000-18b21fff : dbi
18b30000-18b31fff : config
After:
18b00000-18b01fff : 18b00000.pci dbi
18b10000-18b11fff : 18b00000.pci config
18b20000-18b21fff : 18b20000.pci dbi
18b30000-18b31fff : 18b20000.pci config
Link: https://lore.kernel.org/r/WbKfdybjZ6xNIUjcC5oC8NcuLqrJfkxQAlnO80ag@cp3-web-020.plabs.ch
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There are systems (for example, Intel based mobile platforms since Coffee
Lake) where the power drawn while suspended can be significantly reduced by
disabling Precision Time Measurement (PTM) on PCIe root ports as this
allows the port to enter a lower-power PM state and the SoC to reach a
lower-power idle state. To save this power, disable the PTM feature on root
ports during pci_prepare_to_sleep() and pci_finish_runtime_suspend(). The
feature will be returned to its previous state during restore and error
recovery.
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209361
Link: https://lore.kernel.org/r/20201207223951.19667-2-david.e.box@linux.intel.com
Reported-by: Len Brown <len.brown@intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The PCI subsystem does not currently save and restore the configuration
space for the Precision Time Measurement (PTM) Extended Capability leading
to the possibility of the feature returning disabled on S3 resume. This
has been observed on Intel Coffee Lake desktops. Add save/restore of the
PTM control register. This saves the PTM Enable, Root Select, and Effective
Granularity bits.
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20201207223951.19667-1-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Drivers like ehci_hcd and xhci_hcd use pci_set_mwi() and emit an annnoying
message like the following that results in user questions whether something
is broken:
xhci_hcd 0000:00:15.0: cache line size of 64 is not supported
Root cause of the message is that on several chips the Cache Line Size
register is hard-wired to 0.
Change this message to debug level; an interested caller can still inform
the user (if deemed helpful) based on the return code.
Link: https://lore.kernel.org/r/be1ed3a2-98b9-ee1d-20b8-477f3d93961d@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When a PCI bridge is runtime resumed from D3cold, we resume any downstream
devices as well. Previously, we also generated a wakeup event for each
device even though this is not a wakeup signal coming from the hardware.
Normally this does not cause problems but when combined with
/sys/power/wakeup_count like using the steps below:
# count=$(cat /sys/power/wakeup_count)
# echo $count > /sys/power/wakeup_count
# echo mem > /sys/power/state
The system suspend cycle might fail at this point if a PCI bridge that was
runtime suspended (D3cold) was runtime resumed for any reason. The runtime
resume calls pci_resume_bus(), which generates a wakeup event and increases
wakeup_count.
Since this is not a real wakeup event, remove the call to
pci_wakeup_event() from pci_resume_one().
[bhelgaas: reorder, commit log]
Link: https://lore.kernel.org/r/20201125090733.77782-1-mika.westerberg@linux.intel.com
Reported-by: Utkarsh Patel <utkarsh.h.patel@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A "wakeup" is a signal from a device telling the system that the device or
the whole system should be awakened and made active. PCI devices are made
active by "resuming" them.
pci_wakeup_bus() is not involved with the wakeup signal; it *resumes*
devices on a bus (possibly in response to a wakeup signal, but that's at a
higher level).
Rename pci_wakeup_bus() to pci_resume_bus() to better reflect what it does.
No functional change intended.
[bhelgaas: commit log, reorder before removal of pci_wakeup_event()]
Link: https://lore.kernel.org/r/20201125090733.77782-2-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PCI Express Extended Capabilities are in config space between offsets 256
and 4K. These offsets all fit in 16 bits.
Change the return type of pci_find_ext_capability() and supporting
functions from int to u16 to match the specification. Many callers use
"int", which is fine, but there's no need to store more than a u16.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI Capabilities are linked in a list that must appear in the first 256
bytes of config space. Each capabilities list pointer is 8 bits.
Change the return type of pci_find_capability() and supporting functions
from int to u8 to match the specification.
[bhelgaas: change other related interfaces, fix HyperTransport typos]
Link: https://lore.kernel.org/r/20201129164626.12887-1-puranjay12@gmail.com
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The shift of 1 by align_order is evaluated using 32 bit arithmetic and the
result is assigned to a resource_size_t type variable that is a 64 bit
unsigned integer on 64 bit platforms. Fix an overflow before widening issue
by making the 1 a ULL.
Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 32a9a682be ("PCI: allow assignment of memory resources with a specified alignment")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
32-bit BARs are limited to 2GB size (2^31). By extension, I assume 64-bit
BARs are limited to 2^63 bytes. Limit the alignment requested by the
"pci=resource_alignment=" command-line parameter to 2^63.
Link: https://lore.kernel.org/r/20201007123045.GS4282@kadam
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
saved and restored during suspend/resume leading to L1 Substates
configuration being lost post-resume.
Save the L1 Substates control registers so that the configuration is
retained post-resume.
Link: https://lore.kernel.org/r/20201024190442.871-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Some devices support ACS functionality even though they don't have a
spec-compliant ACS Capability; pci_enable_acs() has a quirk mechanism to
handle them.
We want to enable ACS whenever possible, but 52fbf5bdee ("PCI: Cache ACS
capability offset in device") inadvertently broke this by calling
pci_enable_acs() only if we find an ACS Capability.
This resulted in ACS not being enabled for these non-compliant devices,
which means devices can't be separated into different IOMMU groups, which
in turn means we may not be able to pass those devices through to VMs, as
reported by Boris V:
https://lore.kernel.org/r/74aeea93-8a46-5f5a-343c-790d4c655da3@bstnet.org
Fixes: 52fbf5bdee ("PCI: Cache ACS capability offset in device")
Link: https://lore.kernel.org/r/20201028231545.4116866-1-rajatja@google.com
Reported-by: Boris V <borisvk@bstnet.org>
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
- Remove unnecessary #includes (Gustavo Pimentel)
- Fix intel_mid_pci.c build error when !CONFIG_ACPI (Randy Dunlap)
- Use scnprintf(), not snprintf(), in sysfs "show" functions (Krzysztof
Wilczyński)
- Simplify pci-pf-stub by using module_pci_driver() (Liu Shixin)
- Print IRQ used by Link Bandwidth Notification (Dongdong Liu)
- Update sysfs mmap-related #ifdef comments (Clint Sbisa)
- Simplify pci_dev_reset_slot_function() (Lukas Wunner)
- Use "NULL" instead of "0" to fix sparse warnings (Gustavo Pimentel)
- Simplify bool comparisons (Krzysztof Wilczyński)
- Drop double zeroing for P2PDMA sg_init_table() (Julia Lawall)
* pci/misc:
PCI: v3-semi: Remove unneeded break
PCI/P2PDMA: Drop double zeroing for sg_init_table()
PCI: Simplify bool comparisons
PCI: endpoint: Use "NULL" instead of "0" as a NULL pointer
PCI: Simplify pci_dev_reset_slot_function()
PCI: Update mmap-related #ifdef comments
PCI/LINK: Print IRQ number used by port
PCI/IOV: Simplify pci-pf-stub with module_pci_driver()
PCI: Use scnprintf(), not snprintf(), in sysfs "show" functions
x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
PCI: Remove unnecessary header includes
- Use for_each_child_of_node() and for_each_node_by_name() instead of
open-coding them (Qinglang Miao)
- Reduce pciehp noisiness on hot removal (Lukas Wunner)
- Remove unused assignment in shpchp (Krzysztof Wilczyński)
* pci/hotplug:
PCI: shpchp: Remove unused 'rc' assignment
PCI: pciehp: Reduce noisiness on hot removal
PCI: rpadlpar: Use for_each_child_of_node() and for_each_node_by_name()
- Tone down message about missing optional MCFG (Jeremy Linton)
- Add schedule point in pci_read_config() (Jiang Biao)
- Add Ampere Altra SOC MCFG quirk (Tuan Phan)
- Add Kconfig options for MPS/MRRS strategy (Jim Quinlan)
* pci/enumeration:
PCI: Add Kconfig options for MPS/MRRS strategy
PCI/ACPI: Add Ampere Altra SOC MCFG quirk
PCI: Add schedule point in pci_read_config()
PCI/ACPI: Tone down missing MCFG message
Add Kconfig options for changing the default pcie_bus_config, i.e., the
strategy for configuration MPS and MRRS, in the same manner as the
CONFIG_PCIEASPM_XXXX choice. The pci_bus_config setting may still be
overridden by kernel command-line parameters, e.g.,
"pci=pcie_bus_tune_off".
[bhelgaas: depend on EXPERT, tweak help texts]
Link: https://lore.kernel.org/r/20200928194651.5393-2-james.quinlan@broadcom.com
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 7e24bc347e.
7e24bc347e was based on PCIe r5.0, sec 5.9, which claims we need a 200 ms
delay when transitioning to or from D2. However, sec 5.3.1.3 states the
delay as 200 μs (microseconds), as does the table in PCIe r4.0, sec 5.9.1.
This looks like a typo in the r5.0 spec, so revert back to a 200 μs delay
instead of a 200 ms delay.
Fixes: 7e24bc347e ("PCI/PM: Apply D2 delay as milliseconds, not microseconds")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Take care about Coccinelle warnings:
drivers/pci/pci.c:6008:6-12: WARNING: Comparison to bool
drivers/pci/pci.c:6024:7-13: WARNING: Comparison to bool
No change to functionality intended.
Link: https://lore.kernel.org/r/20200925224555.1752460-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI devices support two variants of the D3 power state: D3hot (main power
present) D3cold (main power removed). Previously struct pci_dev contained:
unsigned int d3_delay; /* D3->D0 transition time in ms */
unsigned int d3cold_delay; /* D3cold->D0 transition time in ms */
"d3_delay" refers specifically to the D3hot state. Rename it to
"d3hot_delay" to avoid ambiguity and align with the ACPI "_DSM for
Specifying Device Readiness Durations" in the PCI Firmware spec r3.2,
sec 4.6.9.
There is no change to the functionality.
Link: https://lore.kernel.org/r/20200730210848.1578826-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_dev_reset_slot_function() refuses to reset a hotplug slot if it is
shared by multiple pci_devs. That's the case if and only if the slot is
occupied by a multifunction device.
Simplify the function to check the device's multifunction flag instead
of iterating over the devices on the bus. (Iterating over the devices
requires holding pci_bus_sem, which the function erroneously does not
acquire.)
Link: https://lore.kernel.org/r/c6aab5af096f7b1b3db57f6335cebba8f0fcca89.1595330431.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
When a PCIe card is hot-removed, the Presence Detect State and Data Link
Layer Link Active bits often do not clear simultaneously. I've seen delays
of up to 244 msec between the two events with Thunderbolt.
After pciehp has brought down the slot in response to the first event, the
other bit may still be set. It's not discernible whether it's set because
a new card is already in the slot or if it will soon clear. So pciehp
tries to bring up the slot and in the latter case fails with a bunch of
messages, some of them at KERN_ERR severity. If the slot is no longer
occupied, the messages are false positives and annoy users.
Stuart Hayes reports the following splat on hot removal:
KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
KERN_INFO pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect
KERN_ERR pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001
KERN_ERR pcieport 0000:3c:06.0: pciehp: Failed to check link status
Dongdong Liu complains about a similar splat:
KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Link Down
KERN_INFO iommu: Removing device 0000:87:00.0 from group 12
KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
KERN_INFO pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
KERN_ERR pciehp 0000:80:10.0:pcie004: Failed to check link status
Users are particularly irritated to see a bringup attempt even though the
slot was explicitly brought down via sysfs. In a perfect world, we could
avoid this by setting Link Disable on slot bringdown and re-enabling it
upon a Presence Detect State change. In reality however, there are broken
hotplug ports which hardwire Presence Detect to zero, see 80696f9914
("PCI: pciehp: Tolerate Presence Detect hardwired to zero"). Conversely,
PCIe r1.0 hotplug ports hardwire Link Active to zero because Link Active
Reporting wasn't specified before PCIe r1.1. On unplug, some ports first
clear Presence then Link (see Stuart Hayes' splat) whereas others use the
inverse order (see Dongdong Liu's splat). To top it off, there are hotplug
ports which flap the Presence and Link bits on slot bringup, see
6c35a1ac3d ("PCI: pciehp: Tolerate initially unstable link").
pciehp is designed to work with all of these variants. Surplus attempts at
slot bringup are a lesser evil than not being able to bring up slots at
all. Although we could try to perfect the behavior for specific hotplug
controllers, we'd risk breaking others or increasing code complexity.
But we can certainly minimize annoyance by emitting only a single message
with KERN_INFO severity if bringup is unsuccessful:
* Drop the "Timeout waiting for Presence Detect" message in
pcie_wait_for_presence(). The sole caller of that function,
pciehp_check_link_status(), ignores the timeout and carries on. It emits
error messages of its own and I don't think this particular message adds
much value.
* There's a single error condition in pciehp_check_link_status() which
does not emit a message. Adding one allows dropping the "Failed to check
link status" message emitted by board_added() if
pciehp_check_link_status() returns a non-zero integer.
* Tone down all messages in pciehp_check_link_status() to KERN_INFO
severity and rephrase them to look as innocuous as possible. To this
end, move the message emitted by pcie_wait_for_link_delay() to its
callers.
As a result, Stuart Hayes' splat becomes:
KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Cannot train link: status 0x0001
Dongdong Liu's splat becomes:
KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): No link
The messages now merely serve as information that presence or link bits
were set a little longer than expected. Bringup failures which are not
false positives are still reported, albeit no longer at KERN_ERR severity.
Link: https://lore.kernel.org/linux-pci/20200310182100.102987-1-stuart.w.hayes@gmail.com/
Link: https://lore.kernel.org/linux-pci/1547649064-19019-1-git-send-email-liudongdong3@huawei.com/
Link: https://lore.kernel.org/r/b45e46fd8a6aa6930aaac9d7718c2e4b787a4e5e.1595935071.git.lukas@wunner.de
Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Reported-by: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Translation Blocking is a required feature for Downstream Ports (Root
Ports or Switch Downstream Ports) that implement ACS. When enabled, the
Port checks the Address Type (AT) of each upstream Memory Request it
receives.
The default AT (00b) means "untranslated" and the IOMMU can decide whether
to treat the address as I/O virtual or physical.
If AT is not the default, i.e., if the Memory Request contains an
already-translated (physical) address, the Port blocks the request and
reports an ACS error.
When enabling ACS, enable Translation Blocking for external-facing ports
and untrusted (external) devices. This is to help prevent attacks from
external devices that initiate DMA with physical addresses that bypass the
IOMMU.
[bhelgaas: commit log, simplify setting bit and drop warning; TB is
required for Downstream Ports with ACS, so we should never see the warning]
Link: https://lore.kernel.org/r/20200707224604.3737893-4-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- Use pci_channel_state_t instead of enum pci_channel_state (Luc Van
Oostenryck)
- Simplify __aer_print_error() (Bjorn Helgaas)
- Log AER correctable errors as warning, not error (Matt Jolly)
- Rename pci_aer_clear_device_status() to pcie_clear_device_status() (Bjorn
Helgaas)
- Clear PCIe Device Status errors only if OS owns AER (Jonathan Cameron)
* pci/error:
PCI/ERR: Clear PCIe Device Status errors only if OS owns AER
PCI/ERR: Rename pci_aer_clear_device_status() to pcie_clear_device_status()
PCI/AER: Log correctable errors as warning, not error
PCI/AER: Simplify __aer_print_error()
PCI: Use 'pci_channel_state_t' instead of 'enum pci_channel_state'
pci_aer_clear_device_status() clears the error bits in the PCIe Device
Status Register (PCI_EXP_DEVSTA). Every PCIe device has this register,
regardless of whether it supports AER.
Rename pci_aer_clear_device_status() to pcie_clear_device_status() to make
clear that it is PCIe-specific but not AER-specific. Move it to
drivers/pci/pci.c, again since it's not AER-specific. No functional change
intended.
Link: https://lore.kernel.org/r/20200717195619.766662-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Currently the ACS capability is being looked up at a number of places. Read
and store it once at enumeration so that it can be used by all later. No
functional change intended.
Link: https://lore.kernel.org/r/20200707224604.3737893-2-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Move pci_enable_acs() and dependencies further up in the source code to
avoid having to forward declare it when we make it static in near future.
No functional changes intended.
Link: https://lore.kernel.org/r/20200707224604.3737893-1-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The PCI config accessors (pci_read_config_word(), et al) return
PCIBIOS_SUCCESSFUL (zero) or positive error values like
PCIBIOS_FUNC_NOT_SUPPORTED.
The PCIe capability accessors (pcie_capability_read_word(), et al)
similarly return PCIBIOS errors, but some callers assume they return
generic errno values like -EINVAL.
For example, the Myri-10G probe function returns a positive PCIBIOS error
if the pcie_capability_clear_and_set_word() in pcie_set_readrq() fails:
myri10ge_probe
status = pcie_set_readrq
return pcie_capability_clear_and_set_word
if (status)
return status
A positive return from a PCI driver probe function would cause a "Driver
probe function unexpectedly returned" warning from local_pci_probe()
instead of the desired probe failure.
Convert PCIBIOS errors to generic errno for all callers of:
pcie_capability_read_word
pcie_capability_read_dword
pcie_capability_write_word
pcie_capability_write_dword
pcie_capability_set_word
pcie_capability_set_dword
pcie_capability_clear_word
pcie_capability_clear_dword
pcie_capability_clear_and_set_word
pcie_capability_clear_and_set_dword
that check the return code for anything other than zero.
[bhelgaas: commit log, squash together]
Suggested-by: Bjorn Helgaas <bjorn@helgaas.com>
Link: https://lore.kernel.org/r/20200615073225.24061-1-refactormyself@gmail.com
Signed-off-by: Bolarinwa Olayemi Saheed <refactormyself@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- Check .bridge_d3() hook for NULL before calling it (Bjorn Helgaas)
- Disable PME# for Pericom OHCI/UHCI USB controllers because it's
not reliably asserted on USB hotplug (Kai-Heng Feng)
- Assume ports without DLL Link Active train links in 100 ms to work
around Thunderbolt bridge defects (Mika Westerberg)
* pci/pm:
PCI/PM: Assume ports without DLL Link Active train links in 100 ms
PCI/PM: Adjust pcie_wait_for_link_delay() for caller delay
PCI: Avoid Pericom USB controller OHCI/EHCI PME# defect
serial: 8250_pci: Move Pericom IDs to pci_ids.h
PCI/PM: Call .bridge_d3() hook only if non-NULL
Kai-Heng Feng reported that it takes a long time (> 1 s) to resume
Thunderbolt-connected devices from both runtime suspend and system sleep
(s2idle).
This was because some Downstream Ports that support > 5 GT/s do not also
support Data Link Layer Link Active reporting. Per PCIe r5.0 sec 6.6.1:
With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
software must wait a minimum of 100 ms after Link training completes
before sending a Configuration Request to the device immediately below
that Port. Software can determine when Link training completes by polling
the Data Link Layer Link Active bit or by setting up an associated
interrupt (see Section 6.7.3.3).
Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting, but
at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and the Intel
JHL7540 Thunderbolt 3 Bridge [8086:15ea] do not.
Previously we tried to wait for Link training to complete, but since there
was no DLL Link Active reporting, all we could do was wait the worst-case
1000 ms, then another 100 ms.
Instead of using the supported speeds to determine whether to wait for Link
training, check whether the port supports DLL Link Active reporting. The
Ports in question do not, so we'll wait only the 100 ms required for Ports
that support Link speeds <= 5 GT/s.
This of course assumes these Ports always train the Link within 100 ms even
if they are operating at > 5 GT/s, which is not required by the spec.
[bhelgaas: commit log, comment]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206837
Link: https://lore.kernel.org/r/20200514133043.27429-1-mika.westerberg@linux.intel.com
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>