- Reduce wait time for secondary bus to be ready to speed up resume (Mika
Westerberg)
- Avoid putting EloPOS E2/S2/H2 (as well as Elo i2) PCIe Ports in D3cold
(Ondrej Zary)
- Call _REG when transitioning D-states so AML that uses the PCI config
space OpRegion works, which fixes some ASMedia GPIO controllers (Mario
Limonciello)
* pci/pm:
PCI/ACPI: Call _REG when transitioning D-states
PCI/ACPI: Validate acpi_pci_set_power_state() parameter
PCI/PM: Avoid putting EloPOS E2/S2/H2 PCIe Ports in D3cold
PCI/PM: Shorten pci_bridge_wait_for_secondary_bus() wait time for slow links
- Add PCI_EXT_CAP_ID_PL_32GT define (Ben Dooks)
- Propagate firmware node by calling device_set_node() for better
modularity (Andy Shevchenko)
- Discover Data Link Layer Link Active Reporting earlier so quirks can take
advantage of it (Maciej W. Rozycki)
- Use cached Data Link Layer Link Active Reporting capability in pciehp,
powerpc/eeh, and mlx5 (Maciej W. Rozycki)
- Run quirk for devices that require OS to clear Retrain Link earlier, so
later quirks can rely on it (Maciej W. Rozycki)
- Export pcie_retrain_link() for use outside ASPM (Maciej W. Rozycki)
- Add Data Link Layer Link Active Reporting as another way for
pcie_retrain_link() to determine the link is up (Maciej W. Rozycki)
- Work around link training failures (especially on the ASMedia ASM2824
switch) by training first at 2.5GT/s and then attempting higher rates
(Maciej W. Rozycki)
* pci/enumeration:
PCI: Add failed link recovery for device reset events
PCI: Work around PCIe link training failures
PCI: Use pcie_wait_for_link_status() in pcie_wait_for_link_delay()
PCI: Add support for polling DLLLA to pcie_retrain_link()
PCI: Export pcie_retrain_link() for use outside ASPM
PCI: Export PCIe link retrain timeout
PCI: Execute quirk_enable_clear_retrain_link() earlier
PCI/ASPM: Factor out waiting for link training to complete
PCI/ASPM: Avoid unnecessary pcie_link_state use
PCI/ASPM: Use distinct local vars in pcie_retrain_link()
net/mlx5: Rely on dev->link_active_reporting
powerpc/eeh: Rely on dev->link_active_reporting
PCI: pciehp: Rely on dev->link_active_reporting
PCI: Initialize dev->link_active_reporting earlier
PCI: of: Propagate firmware node by calling device_set_node()
PCI: Add PCI_EXT_CAP_ID_PL_32GT define
# Conflicts:
# drivers/pci/pcie/aspm.c
Request failed link recovery with any upstream PCIe bridge where a device
has not come back after reset within PCI_RESET_WAIT time. Reset the
polling interval if recovery succeeded, otherwise continue as usual.
[bhelgaas: inline pcie_parent_link_retrain()]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111631050.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Attempt to handle cases such as with a downstream port of the ASMedia
ASM2824 PCIe switch where link training never completes and the link
continues switching between speeds indefinitely with the data link layer
never reaching the active state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch,
using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433,
wired to a SiFive HiFive Unmatched board. In this setup the switches
should negotiate a link speed of 5.0GT/s, falling back to 2.5GT/s if
necessary.
Instead the link continues oscillating between the two speeds, at the rate
of 34-35 times per second, with link training reported repeatedly active
~84% of the time. Limiting the target link speed to 2.5GT/s with the
upstream ASM2824 device makes the two switches communicate correctly.
Removing the speed restriction afterwards makes the two devices switch to
5.0GT/s then.
Make use of these observations and detect the inability to train the link
by checking for the Data Link Layer Link Active status bit being off while
the Link Bandwidth Management Status indicating that hardware has changed
the link speed or width in an attempt to correct unreliable link operation.
Restrict the speed to 2.5GT/s then with the Target Link Speed field,
request a retrain and wait 200ms for the data link to go up. If this is
successful, lift the restriction, letting the devices negotiate a higher
speed.
Also check for a 2.5GT/s speed restriction the firmware may have already
arranged and lift it too with ports of devices known to continue working
afterwards (currently only ASM2824), that already report their data link
being up.
[bhelgaas: reorder and squash stubs from
https://lore.kernel.org/r/alpine.DEB.2.21.2306111619570.64925@angie.orcam.me.uk
to avoid adding stubs that do nothing]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310038540.59226@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Remove a DLLLA status bit polling loop from pcie_wait_for_link_delay() and
call almost identical code in pcie_wait_for_link_status() instead. This
reduces the lower bound on the polling interval from 10ms to 1ms, possibly
increasing the CPU load on the system in favour to reducing the wait time.
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111611170.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Let the caller of pcie_retrain_link() specify whether they want to use the
LT bit or the DLLLA bit of the Link Status Register to determine if link
training has completed. It is up to the caller to verify whether the use
of the DLLLA bit, the implementation of which is optional, is valid for the
device requested.
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110310540.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Export pcie_retrain_link() for link retrain needs outside ASPM. Struct
pcie_link_state is local to ASPM and only used by pcie_retrain_link() to
get at the associated PCI device, so change the operand and adjust the lone
call site accordingly. Document the interface. No functional change at
this point.
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110229010.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Convert LINK_RETRAIN_TIMEOUT from jiffies to milliseconds, accordingly
rename to PCIE_LINK_RETRAIN_TIMEOUT_MS, and make available via "pci.h" for
the PCI core to use. Use in pcie_wait_for_link_delay().
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310030280.59226@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
With slow links (<= 5GT/s) active link reporting is not mandatory, so if a
device is disconnected during system sleep we might end up waiting for it
to respond for ~60s, which slows down resume time.
PCIe r6.0, sec 6.6.1, mandates that software must wait for at least 1s
before it can assume a device is broken, so use that minimum requirement
for slow links and bail out if the device doesn't respond within 1s.
However, if the port supports active link reporting we can wait longer as
we do with the fast links.
This should make system resume time faster for slow links as well while
still following the PCIe spec.
While there move the PCI_RESET_WAIT constant into pci.c because it is
not used outside of that file anymore.
Link: https://lore.kernel.org/r/20230425064751.24951-1-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening in
the driver core in the quest to be able to move "struct bus" and "struct
class" into read-only memory, a task now complete with these changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules for
all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most of
them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
LEGadNS38k5fs+73UaxV
=7K4B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
All callers of pci_bridge_wait_for_secondary_bus() supply a timeout of
PCIE_RESET_READY_POLL_MS, so drop the parameter. Move the definition of
PCIE_RESET_READY_POLL_MS into pci.c, the only user.
[bhelgaas: extracted from
https://lore.kernel.org/r/20230404052714.51315-3-mika.westerberg@linux.intel.com]
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Refactor pci_bus_for_each_resource() in the same way as
pci_dev_for_each_resource(). This allows the index to be hidden inside the
implementation so the caller can omit it when it's not used otherwise.
No functional changes intended.
Link: https://lore.kernel.org/r/20230330162434.35055-6-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
struct bus_type should never be modified in a sysfs callback as there is
nothing in the structure to modify, and frankly, the structure is almost
never used in a sysfs callback, so mark it as constant to allow struct
bus_type to be moved to read-only memory.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ben Widawsky <bwidawsk@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hu Haowen <src.res@email.cn>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Stuart Yoder <stuyoder@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Acked-by: Ilya Dryomov <idryomov@gmail.com> # rbd
Acked-by: Ira Weiny <ira.weiny@intel.com> # cxl
Reviewed-by: Alex Shi <alexs@kernel.org>
Acked-by: Iwona Winiarska <iwona.winiarska@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi
Link: https://lore.kernel.org/r/20230313182918.1312597-23-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmP2dbsUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzBDg//aW2IeJYku5ENXwwnCQjBlyjBGOOZ
456KGpFt/ky0N9Jp0ZS3nQSa5YN7q+L8XY48gu6I7s1hXly8iLZKLrJN++S//k55
BadXu7mDUyVoY74LYvBe0nlXuwJul2qnq9IJLufRucrn1yoyqApAh39IRdCzi4U8
mP+wad7sQA0Si4bpf80uwn6Yq8SrDoO0mtmO/dZSXJooM2t2SnDXEL/fxMwTNDA4
XsVSP9FrbPmcTLo8mkDa8Dy7JKbL6KQJF9yDlmYzuA2spQpTf+YLLfsNnmE+850h
WTtfCjVaYtlik7i9qTB+VcN1CsGVepYKK3H5the16Aeql2Fu+Ji5KSt74C220Yi9
ZSDA93d/EfGc5egKyBdUUMFgqhe46srRUAoWcMrx2T4ARGuOm5EYCa9C8C7dFmO0
j6f9MYL3j2Sw3FROEKViRVOFfbIfVW1TXIo3x0fE0ud3xkg73eKp/++X8QeTMjox
2ArY2AWPNQpUI1oMlKxlSEd5XjFf7n/hHDtFqj9bIuJzt0/8wXQf0jCYTjhpGkRB
pmO+lColK6lp+bg8aWRRkiwN73xGdQhKaeXLo0Iq4T6xr0Lb3XoskHZvt6NIGe/A
ds5/uwtErq6kCf2G9YG1xfh+G1bimbjWwsHCNfSNXzTsWGDFTCb8tvqF90m+7+yl
bllxTXA6PO312Tw=
=/y4d
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Rework portdrv shutdown so it disables interrupts but doesn't
disable bus mastering, which leads to hangs on Loongson LS7A
- Add mechanism to prevent Max_Read_Request_Size (MRRS) increases,
again to avoid hardware issues on Loongson LS7A (and likely other
devices based on DesignWare IP)
- Ignore devices with a firmware (DT or ACPI) node that says the
device is disabled
Resource management:
- Distribute spare resources to unconfigured hotplug bridges at
boot-time (not just when hot-adding such a bridge), which makes
hot-adding devices to docks work better. Tried this in v6.1 but had
to revert for regressions, so try again
- Fix root bus issue that dropped resources that happened to end
at 0, e.g., [bus 00]
PCI device hotplug:
- Remove device locking when marking device as disconnected so this
doesn't have to wait for concurrent driver bind/unbind to complete
- Quirk more Qualcomm bridges that don't fully implement the PCIe
Slot Status 'Command Completed' bit
Power management:
- Account for _S0W of the target bridge in acpi_pci_bridge_d3() so we
don't miss hot-add notifications for USB4 docks, Thunderbolt, etc
Reset:
- Observe delay after reset, e.g., resuming from system sleep,
regardless of whether a bridge can suspend to D3cold at runtime
- Wait for secondary bus to become ready after a bridge reset
Virtualization:
- Avoid FLR on some AMD FCH AHCI adapters where it doesn't work
- Allow independent IOMMU groups for some Wangxun NICs that prevent
peer-to-peer transactions but don't advertise an ACS Capability
Error handling:
- Configure End-to-End-CRC (ECRC) only if Linux owns the AER
Capability
- Remove redundant Device Control Error Reporting Enable in the AER
service driver since this is already done for all devices during
enumeration
ASPM:
- Add pci_enable_link_state() interface to allow drivers to enable
ASPM link state
Endpoint framework:
- Move dra7xx and tegra194 linkup processing from hard IRQ to
threaded IRQ handler
- Add a separate lock for endpoint controller list of endpoint
function drivers to prevent deadlock in callbacks
- Pass events from endpoint controller to endpoint function drivers
via callbacks instead of notifiers
Synopsys DesignWare eDMA controller driver (acked by Vinod):
- Fix CPU vs PCI address issues
- Fix source vs destination address issues
- Fix issues with interleaved transfer semantics
- Fix channel count initialization issue (issue still exists in
several other drivers)
- Clean up and improve debugfs usage so it will work on platforms
with several eDMA devices
Baikal T-1 PCIe controller driver:
- Set a 64-bit DMA mask
Freescale i.MX6 PCIe controller driver:
- Add i.MX8MM, i.MX8MQ, i.MX8MP endpoint mode DT binding and driver
support
Intel VMD host bridge driver:
- Add quirk to configure PCIe ASPM and LTR. This is normally done by
BIOS, and will be for future products
Marvell MVEBU PCIe controller driver:
- Mark this driver as broken in Kconfig since bugs prevent its daily
usage
MediaTek MT7621 PCIe controller driver:
- Delay PHY port initialization to improve boot reliability for ZBT
WE1326, ZBT WF3526-P, and some Netgear models
Qualcomm PCIe controller driver:
- Add MSM8998 DT compatible string
- Unify MSM8996 and MSM8998 clock orderings
- Add SM8350 DT binding and driver support
- Add IPQ8074 Gen3 DT binding and driver support
- Correct qcom,perst-regs in DT binding
- Add qcom_pcie_host_deinit() so the PHY is powered off and
regulators and clocks are disabled on late host-init errors
Socionext UniPhier Pro5 controller driver:
- Clean up uniphier-ep reg, clocks, resets, and their names in DT
binding
Synopsys DesignWare PCIe controller driver:
- Restrict coherent DMA mask to 32 bits for MSI, but allow controller
drivers to set 64-bit streaming DMA mask
- Add eDMA engine support in both Root Port and Endpoint controllers
Miscellaneous:
- Remove MODULE_LICENSE from boolean drivers so they don't look like
modules so modprobe can complain about them"
* tag 'pci-v6.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (86 commits)
PCI: dwc: Add Root Port and Endpoint controller eDMA engine support
PCI: bt1: Set 64-bit DMA mask
PCI: dwc: Restrict only coherent DMA mask for MSI address allocation
dmaengine: dw-edma: Prepare dw_edma_probe() for builtin callers
dmaengine: dw-edma: Depend on DW_EDMA instead of selecting it
dmaengine: dw-edma: Add mem-mapped LL-entries support
PCI: Remove MODULE_LICENSE so boolean drivers don't look like modules
PCI: hv: Drop duplicate PCI_MSI dependency
PCI/P2PDMA: Annotate RCU dereference
PCI/sysfs: Constify struct kobj_type pci_slot_ktype
PCI: hotplug: Allow marking devices as disconnected during bind/unbind
PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
PCI: qcom: Add IPQ8074 Gen3 port support
dt-bindings: PCI: qcom: Add IPQ8074 Gen3 port
dt-bindings: PCI: qcom: Sort compatibles alphabetically
PCI: qcom: Fix host-init error handling
PCI: qcom: Add SM8350 support
dt-bindings: PCI: qcom: Add SM8350
dt-bindings: PCI: qcom-ep: Correct qcom,perst-regs
dt-bindings: PCI: qcom: Unify MSM8996 and MSM8998 clock order
...
- Always observe reset delay when waking devices from D3cold, e.g., after
system sleep, regardless of whether we're allowed to runtime-suspend to
D3cold (Lukas Wunner)
- Unify reset and resume delays to wait for downstream devices after a
bridge reset (Lukas Wunner)
- Wait for downstream devices after a DPC-induced bridge reset (Lukas
Wunner)
* pci/reset:
PCI/DPC: Await readiness of secondary bus after reset
PCI: Unify delay handling for reset and resume
PCI/PM: Observe reset delay irrespective of bridge_d3
This reverts commit 4ff116d0d5.
Tasev Nikola and Mark Enriquez reported that resume from suspend was broken
in v6.1-rc1. Tasev bisected to a47126ec29 ("PCI/PTM: Cache PTM
Capability offset"), but we can't figure out how that could be related.
Mark saw the same symptoms and bisected to 4ff116d0d5 ("PCI/ASPM: Save L1
PM Substates Capability for suspend/resume"), which does have a connection:
it restores L1 Substates configuration while ASPM L1 may be enabled:
pci_restore_state
pci_restore_aspm_l1ss_state
aspm_program_l1ss
pci_write_config_dword(PCI_L1SS_CTL1, ctl1) # L1SS restore
pci_restore_pcie_state
pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++]) # L1 restore
which is a problem because PCIe r6.0, sec 5.5.4, requires that:
If setting either or both of the enable bits for ASPM L1 PM
Substates, both ports must be configured as described in this
section while ASPM L1 is disabled.
Separately, Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1
PM Substates Control Register programming") broke suspend/resume, and it
depends on 4ff116d0d5.
Revert 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") to fix the resume issue and enable revert of 5e85eba6f5
to fix the issue Thomas reported.
Note that reverting 4ff116d0d5 means L1 Substates config may be lost on
suspend/resume. As far as we know the system will use more power but will
still *work* correctly.
Fixes: 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be>
Reported-by: Mark Enriquez <enriquezmark36@gmail.com>
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Mark Enriquez <enriquezmark36@gmail.com>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
Reset, but not after a DPC-induced Hot Reset.
As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
observed and devices on the secondary bus may be accessed before
they're ready.
One affected device is Intel's Ponte Vecchio HPC GPU. It comprises a
PCIe switch whose upstream port is not immediately ready after reset.
Because its config space is restored too early, it remains in
D0uninitialized, its subordinate devices remain inaccessible and DPC
recovery fails with messages such as:
i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
pcieport 0000:89:02.0: AER: device recovery failed
Fix it.
Link: https://lore.kernel.org/r/9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@wunner.de
Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org
Sheng Bi reports that pci_bridge_secondary_bus_reset() may fail to wait
for devices on the secondary bus to become accessible after reset:
Although it does call pci_dev_wait(), it erroneously passes the bridge's
pci_dev rather than that of a child. The bridge of course is always
accessible while its secondary bus is reset, so pci_dev_wait() returns
immediately.
Sheng Bi proposes introducing a new pci_bridge_secondary_bus_wait()
function which is called from pci_bridge_secondary_bus_reset():
https://lore.kernel.org/linux-pci/20220523171517.32407-1-windy.bi.enflame@gmail.com/
However we already have pci_bridge_wait_for_secondary_bus() which does
almost exactly what we need. So far it's only called on resume from
D3cold (which implies a Fundamental Reset per PCIe r6.0 sec 5.8).
Re-using it for Secondary Bus Resets is a leaner and more rational
approach than introducing a new function.
That only requires a few minor tweaks:
- Amend pci_bridge_wait_for_secondary_bus() to await accessibility of
the first device on the secondary bus by calling pci_dev_wait() after
performing the prescribed delays. pci_dev_wait() needs two parameters,
a reset reason and a timeout, which callers must now pass to
pci_bridge_wait_for_secondary_bus(). The timeout is 1 sec for resume
(PCIe r6.0 sec 6.6.1) and 60 sec for reset (commit 821cdad5c4 ("PCI:
Wait up to 60 seconds for device to become ready after FLR")).
Introduce a PCI_RESET_WAIT macro for the 1 sec timeout.
- Amend pci_bridge_wait_for_secondary_bus() to return 0 on success or
-ENOTTY on error for consumption by pci_bridge_secondary_bus_reset().
- Drop an unnecessary 1 sec delay from pci_reset_secondary_bus() which
is now performed by pci_bridge_wait_for_secondary_bus(). A static
delay this long is only necessary for Conventional PCI, so modern
PCIe systems benefit from shorter reset times as a side effect.
Fixes: 6b2f1351af ("PCI: Wait for device to become ready after secondary bus reset")
Link: https://lore.kernel.org/r/da77c92796b99ec568bd070cbe4725074a117038.1673769517.git.lukas@wunner.de
Reported-by: Sheng Bi <windy.bi.enflame@gmail.com>
Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v4.17+
If a PCI bridge is suspended to D3cold upon entering system sleep,
resuming it entails a Fundamental Reset per PCIe r6.0 sec 5.8.
The delay prescribed after a Fundamental Reset in PCIe r6.0 sec 6.6.1
is sought to be observed by:
pci_pm_resume_noirq()
pci_pm_bridge_power_up_actions()
pci_bridge_wait_for_secondary_bus()
However, pci_bridge_wait_for_secondary_bus() bails out if the bridge_d3
flag is not set. That flag indicates whether a bridge is allowed to
suspend to D3cold at *runtime*.
Hence *no* delay is observed on resume from system sleep if runtime
D3cold is forbidden. That doesn't make any sense, so drop the bridge_d3
check from pci_bridge_wait_for_secondary_bus().
The purpose of the bridge_d3 check was probably to avoid delays if a
bridge remained in D0 during suspend. However the sole caller of
pci_bridge_wait_for_secondary_bus(), pci_pm_bridge_power_up_actions(),
is only invoked if the previous power state was D3cold. Hence the
additional bridge_d3 check seems superfluous.
Fixes: ad9001f2f4 ("PCI/PM: Add missing link delays required by the PCIe spec")
Link: https://lore.kernel.org/r/eb37fa345285ec8bacabbf06b020b803f77bdd3d.1673769517.git.lukas@wunner.de
Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v5.5+
Except for isochronous-configured devices, software may set
Max_Read_Request_Size (MRRS) to any value up to 4096. If a device issues a
read request with size greater than the completer's Max_Payload_Size (MPS),
the completer is required to break the response into multiple completions.
Instead of correctly responding with multiple completions to a large read
request, some LS7A Root Ports respond with a Completer Abort. To prevent
this, the MRRS must be limited to an implementation-specific value.
The OS cannot detect that value, so rely on BIOS to configure MRRS before
booting, and quirk the Root Ports so we never set an MRRS larger than that
BIOS value for any downstream device.
N.B. Hot-added devices are not configured by BIOS, and they power up with
MRRS = 512 bytes, so these devices will be limited to 512 bytes. If the
LS7A limit is smaller, those hot-added devices may not work correctly, but
per [1], hotplug is not supported with this chipset revision.
[1] https://lore.kernel.org/r/073638a7-ae68-2847-ac3d-29e5e760d6af@loongson.cn
[bhelgaas: commit log]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216884
Link: https://lore.kernel.org/r/20230201043018.778499-3-chenhuacai@loongson.cn
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_device_is_present() previously didn't work for VFs because it reads the
Vendor and Device ID, which are 0xffff for VFs, which looks like they
aren't present. Check the PF instead.
Wei Gong reported that if virtio I/O is in progress when the driver is
unbound or "0" is written to /sys/.../sriov_numvfs, the virtio I/O
operation hangs, which may result in output like this:
task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
Call Trace:
schedule+0x4f/0xc0
blk_mq_freeze_queue_wait+0x69/0xa0
blk_mq_freeze_queue+0x1b/0x20
blk_cleanup_queue+0x3d/0xd0
virtblk_remove+0x3c/0xb0 [virtio_blk]
virtio_dev_remove+0x4b/0x80
...
device_unregister+0x1b/0x60
unregister_virtio_device+0x18/0x30
virtio_pci_remove+0x41/0x80
pci_device_remove+0x3e/0xb0
This happened because pci_device_is_present(VF) returned "false" in
virtio_pci_remove(), so it called virtio_break_device(). The broken vq
meant that vring_interrupt() skipped the vq.callback() that would have
completed the virtio I/O operation via virtblk_done().
[bhelgaas: commit log, simplify to always use pci_physfn(), add stable tag]
Link: https://lore.kernel.org/r/20221026060912.173250-1-mst@redhat.com
Reported-by: Wei Gong <gongwei833x@gmail.com>
Tested-by: Wei Gong <gongwei833x@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Replace assignment of PCI domain IDs from atomic_inc_return() to
ida_alloc().
Use two IDAs, one for static domain allocations (those which are defined in
device tree) and second for dynamic allocations (all other).
During removal of root bus / host bridge, also release the domain ID. The
released ID can be reused again, for example when dynamically loading and
unloading native PCI host bridge drivers.
This change also allows to mix static device tree assignment and dynamic by
kernel as all static allocations are reserved in dynamic pool.
[bhelgaas: set "err" if "bus->domain_nr < 0"]
Link: https://lore.kernel.org/r/20220714184130.5436-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- Cache the PTM capability offset instead of searching for it every time
(Bjorn Helgaas)
- Separate PTM configuration from PTM enable (Bjorn Helgaas)
- Add pci_suspend_ptm() and pci_resume_ptm() to disable and re-enable PTM
on suspend/resume so some Root Ports can safely enter a lower-power PM
state (Bjorn Helgaas)
- Disable PTM for all devices during suspend; previously we only did this
for Root Ports and even then only in certain cases (Bjorn Helgaas)
- Simplify pci_pm_suspend_noirq() (Rajvi Jingar)
- Reduce the delay after transitions to/from D3hot by using usleep_range()
instead of msleep(), which reduces the typical delay from 19ms to 10ms
(Sajid Dalvi, Will McVicker)
* pci/pm:
PCI/PM: Reduce D3hot delay with usleep_range()
PCI/PM: Simplify pci_pm_suspend_noirq()
PCI/PM: Always disable PTM for all devices during suspend
PCI/PTM: Consolidate PTM interface declarations
PCI/PTM: Reorder functions in logical order
PCI/PTM: Preserve RsvdP bits in PTM Control register
PCI/PTM: Move pci_ptm_info() body into its only caller
PCI/PTM: Add pci_suspend_ptm() and pci_resume_ptm()
PCI/PTM: Separate configuration and enable
PCI/PTM: Add pci_upstream_ptm() helper
PCI/PTM: Cache PTM Capability offset
Previously the L1 PM Substates Control Registers (CTL1 and CTL2) weren't
saved and restored during suspend/resume leading to the L1 PM Substates
configuration being lost post-resume.
Save the L1 PM Substates Control Registers so that the configuration is
retained post-resume.
[bhelgaas: drop pci_is_pcie() testing; we can rely on pci_configure_ltr()
having already done that]
Link: https://lore.kernel.org/r/20220913131822.16557-3-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCIe r6.0, sec 5.9, requires a 10ms delay between programming a device to
change to or from D3hot and the time the device is next accessed (unless
Readiness Notifications are used).
The 10ms value (PCI_PM_D3HOT_WAIT) doesn't appear directly here because
some chipsets require 120ms for devices *below* them (pci_pm_d3hot_delay)
and some devices require more or less than 10ms (dev->d3hot_delay).
But msleep(10) typically waits about *20*ms, which is more than we need.
Switch to usleep_range() to improve the delay accuracy.
Based on a commit from Sajid in the Pixel 6 kernel tree [1]. On a Pixel 6,
the 10ms delay for the Exynos PCIe device delayed for an average of 19ms.
Switching to usleep_range() decreased the resume time by about 9ms.
[1] 18a8cad68d
[bhelgaas commit log, add timers-howto.rst link]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/timers/timers-howto.rst?id=v5.19#n73
Link: https://lore.kernel.org/r/20220921212735.2131588-1-willmcvicker@google.com
Signed-off-by: Sajid Dalvi <sdalvi@google.com>
Signed-off-by: Will McVicker <willmcvicker@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
We want to disable PTM on Root Ports because that allows some chips, e.g.,
Intel mobile chips since Coffee Lake, to enter a lower-power PM state.
That means we also have to disable PTM on downstream devices. PCIe r6.0,
sec 2.2.8, recommends that functions support generation of messages in
non-D0 states, so we have to assume Switch Upstream Ports or Endpoints may
send PTM Requests while in D1, D2, and D3hot. A PTM message received by a
Downstream Port (including a Root Port) with PTM disabled must be treated
as an Unsupported Request (sec 6.21.3).
PTM was previously disabled only for Root Ports, and it was disabled in
pci_prepare_to_sleep(), which is not called at all if a driver supports
legacy PM or does its own state saving.
Instead, disable PTM early in pci_pm_suspend() and pci_pm_runtime_suspend()
so we do it in all cases.
Previously PTM was disabled *after* saving device state, so the state
restore on resume automatically re-enabled it. Since we now disable PTM
*before* saving state, we must explicitly re-enable it in pci_pm_resume()
and pci_pm_runtime_resume().
Here's a sample of errors that occur when PTM is disabled only on the Root
Port. With this topology:
0000:00:1d.0 Root Port to [bus 08-71]
0000:08:00.0 Switch Upstream Port to [bus 09-71]
Kai-Heng reported errors like this:
pcieport 0000:00:1d.0: [20] UnsupReq (First)
pcieport 0000:00:1d.0: AER: TLP Header: 34000000 08000052 00000000 00000000
Decoding TLP header 0x34...... (0011 0100b) and 0x08000052:
Fmt 001b 4 DW header, no data
Type 1 0100b Msg (Local - Terminate at Receiver)
Requester ID 0x0800 Bus 08 Devfn 00.0
Message Code 0x52 0101 0010b PTM Request
The 00:1d.0 Root Port logged an Unsupported Request error when it received
a PTM Request with Requester ID 08:00.0.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216210
Fixes: a697f072f5 ("PCI: Disable PTM during suspend to save power")
Link: https://lore.kernel.org/r/20220909202505.314195-10-helgaas@kernel.org
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
We disable PTM during suspend because that allows some Root Ports to enter
lower-power PM states, which means we also need to disable PTM for all
downstream devices. Add pci_suspend_ptm() and pci_resume_ptm() for this
purpose.
pci_enable_ptm() and pci_disable_ptm() are for drivers to use to enable or
disable PTM. They use dev->ptm_enabled to keep track of whether PTM should
be enabled.
pci_suspend_ptm() and pci_resume_ptm() are PCI core-internal functions to
temporarily disable PTM during suspend and (depending on dev->ptm_enabled)
re-enable PTM during resume.
Enable/disable/suspend/resume all use internal __pci_enable_ptm() and
__pci_disable_ptm() functions that only update the PTM Control register.
Outline:
pci_enable_ptm(struct pci_dev *dev)
{
__pci_enable_ptm(dev);
dev->ptm_enabled = 1;
pci_ptm_info(dev);
}
pci_disable_ptm(struct pci_dev *dev)
{
if (dev->ptm_enabled) {
__pci_disable_ptm(dev);
dev->ptm_enabled = 0;
}
}
pci_suspend_ptm(struct pci_dev *dev)
{
if (dev->ptm_enabled)
__pci_disable_ptm(dev);
}
pci_resume_ptm(struct pci_dev *dev)
{
if (dev->ptm_enabled)
__pci_enable_ptm(dev);
}
Nothing currently calls pci_resume_ptm(); the suspend path saves the PTM
state before disabling PTM, so the PTM state restore in the resume path
implicitly re-enables it. A future change will use pci_resume_ptm() to fix
some problems with this approach.
Link: https://lore.kernel.org/r/20220909202505.314195-5-helgaas@kernel.org
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
- Remove pci_get_legacy_ide_irq(); use ATA_PRIMARY_IRQ() and
ATA_SECONDARY_IRQ() instead (Stafford Horne)
- Remove isa_dma_bridge_buggy, except for x86_32, the only place it's used
(Stafford Horne)
- Define ARCH_GENERIC_PCI_MMAP_RESOURCE for csky (Stafford Horne)
- Move common PCI definitions that arches sometimes override to
asm-generic/pci.h (Stafford Horne)
- Include <linux/isa-dma.h> for 'isa_dma_bridge_buggy' when needed
(bisection hole here) (Randy Dunlap)
* pci/header-cleanup-immutable:
PCI: Stub __pci_ioport_map() for arches that don't support it at all
x86/cyrix: include header linux/isa-dma.h
asm-generic: Add new pci.h and use it
csky: PCI: Define ARCH_GENERIC_PCI_MMAP_RESOURCE
PCI: Move isa_dma_bridge_buggy out of asm/dma.h
PCI: Remove pci_get_legacy_ide_irq() and asm-generic/pci.h
The isa_dma_bridge_buggy symbol is only used for x86_32, and only x86_32
platforms or quirks ever set it.
Add a new linux/isa-dma.h header that #defines isa_dma_bridge_buggy to 0
except on x86_32, where we keep it as a variable, and remove all the arch-
specific definitions.
[bhelgaas: commit log]
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/r/20220722214944.831438-3-shorne@gmail.com
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
pcie_aspm_pm_state_change() was introduced at the inception of PCIe ASPM
code, but it can cause some issues. For instance, when ASPM config is
changed via sysfs, those changes won't persist across power state change
because pcie_aspm_pm_state_change() overwrites them.
Also, if the driver restores L1SS [1] after system resume, the restored
state will also be overwritten by pcie_aspm_pm_state_change().
Remove pcie_aspm_pm_state_change(). If there's any hardware that really
needs it to function, a quirk can be used instead.
[1] https://lore.kernel.org/linux-pci/20220201123536.12962-1-vidyas@nvidia.com/
Link: https://lore.kernel.org/r/20220509073639.2048236-1-kai.heng.feng@canonical.com
[bhelgaas: remove additional pcie_aspm_pm_state_change() call in
pci_set_low_power_state(), added by
10aa5377fc ("PCI/PM: Split pci_raw_set_power_state()") and moved by
7957d20145 ("PCI/PM: Relocate pci_set_low_power_state()")]
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
92597f97a4 ("PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold") omitted
braces around the new Elo i2 entry, so it overwrote the existing Gigabyte
X299 entry. Add the appropriate braces.
Found by:
$ make W=1 drivers/pci/pci.o
CC drivers/pci/pci.o
drivers/pci/pci.c:2974:12: error: initialized field overwritten [-Werror=override-init]
2974 | .ident = "Elo i2",
| ^~~~~~~~
Link: https://lore.kernel.org/r/20220526221258.GA409855@bhelgaas
Fixes: 92597f97a4 ("PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.15+
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKP/tQUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzH0xAAojQowrSWzZ5FKTqI+L/L9ZXoAb+e
9IvQljKc9taJldmXp+EB9wkS/5B+VtQcC2qUQuWEQXUoECF8qHlcB4l+XQyd1tWO
O0vZxETH22xjLLrjG2F3l5rrfkJZAf2nEugwbDk97YEgiimeOiRcv3bx6AUCtj6I
rPJ13Fop3Jke7sQMcXYJe3gQLT1o1AKiQGghiCFNi/gzx2lXI6mmHBgLxFoiqcby
WpfXbvbJti95HRaahUR3HaDFfHj4HVkQNLlTtIykJ3Tl2/rOhWEJjI8JOIQpAA+M
WBrWw9rfgbScTiGV+dZ3h7hKiPnHKl9YETIX7L0oA2sj0jZcIs0d6mSBZx0kYuI9
eAlx+qSK9xpbQQr/fdYaUdF1q4QdtU0BYOvOWOzWsqYCECMRJ1PUHFSMbmR/+PNB
P5lHnAbggRSoxdAtwFYv1HTr+VpGH9S+5oxHCz3ohpMjYy6mkCZwHpZn3doaU3ci
KG6yIoVKftm3fZdtFvL03qHl/I8+X24ZhT/T/278PRGjkhSyr56hZo8hg0gqqTct
ngip8qNABmSbqpr73/W6Vl42zAbYtNk1BykYahbKupgW8FbT7hqaZTB05V87pVu+
Ko1aJM6VoOP9rMlKHI9ba8eYCzDrZbLZUFn7ljNPDpzutf0tAwtgwzvZBXN3za6+
Z9+D5dxmvrZEIbA=
=hEti
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Resource management:
- Restrict E820 clipping to PCI host bridge windows (Bjorn Helgaas)
- Log E820 clipping better (Bjorn Helgaas)
- Add kernel cmdline options to enable/disable E820 clipping (Hans de
Goede)
- Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga
Slip, Acer Spin 5, Clevo Barebone systems where clipping leaves no
usable address space for touchpads, Thunderbolt devices, etc (Hans
de Goede)
- Disable E820 clipping by default starting in 2023 (Hans de Goede)
PCI device hotplug:
- Include files to remove implicit dependencies (Christophe Leroy)
- Only put Root Ports in D3 if they can signal and wake from D3 so
AMD Yellow Carp doesn't miss hotplug events (Mario Limonciello)
Power management:
- Define pci_restore_standard_config() only for CONFIG_PM_SLEEP since
it's unused otherwise (Krzysztof Kozlowski)
- Power up devices completely, including anything platform firmware
needs to do, during runtime resume (Rafael J. Wysocki)
- Move pci_resume_bus() to PM callbacks so we observe the required
bridge power-up delays (Rafael J. Wysocki)
- Drop unneeded runtime_d3cold device flag (Rafael J. Wysocki)
- Split pci_raw_set_power_state() between pci_power_up() and a new
pci_set_low_power_state() (Rafael J. Wysocki)
- Set current_state to D3cold if config read returns ~0, indicating
the device is not accessible (Rafael J. Wysocki)
- Do not call pci_update_current_state() from pci_power_up() so BARs
and ASPM config are restored correctly (Rafael J. Wysocki)
- Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki)
- Split pci_power_up() to pci_set_full_power_state() to avoid some
redundant operations (Rafael J. Wysocki)
- Skip restoring BARs if device is not in D0 (Rafael J. Wysocki)
- Rearrange and clarify pci_set_power_state() (Rafael J. Wysocki)
- Remove redundant BAR restores from pci_pm_thaw_noirq() (Rafael J.
Wysocki)
Virtualization:
- Acquire device lock before config space access lock to avoid AB/BA
deadlock with sriov_numvfs_store() (Yicong Yang)
Error handling:
- Clear MULTI_ERR_COR/UNCOR_RCV bits, which a race could previously
leave permanently set (Kuppuswamy Sathyanarayanan)
Peer-to-peer DMA:
- Whitelist Intel Skylake-E Root Ports regardless of which devfn they
are (Shlomo Pongratz)
ASPM:
- Override L1 acceptable latency advertised by Intel DG2 so ASPM L1
can be enabled (Mika Westerberg)
Cadence PCIe controller driver:
- Set up device-specific register to allow PTM Responder to be
enabled by the normal architected bit (Christian Gmeiner)
- Override advertised FLR support since the controller doesn't
implement FLR correctly (Parshuram Thombare)
Cadence PCIe endpoint driver:
- Correct bitmap size for the ob_region_map of outbound window usage
(Dan Carpenter)
Freescale i.MX6 PCIe controller driver:
- Fix PERST# assertion/deassertion so we observe the required delays
before accessing device (Francesco Dolcini)
Freescale Layerscape PCIe controller driver:
- Add "big-endian" DT property (Hou Zhiqiang)
- Update SCFG DT property (Hou Zhiqiang)
- Add "aer", "pme", "intr" DT properties (Li Yang)
- Add DT compatible strings for ls1028a (Xiaowei Bao)
Intel VMD host bridge driver:
- Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt
remapping errors when MSI-X remapping is disabled (Nirmal Patel)
- Revert VMD workaround that kept MSI-X remapping enabled when IOMMU
remapping was enabled (Nirmal Patel)
Marvell MVEBU PCIe controller driver:
- Add of_pci_get_slot_power_limit() to parse the
'slot-power-limit-milliwatt' DT property (Pali Rohár)
- Add mvebu support for sending Set_Slot_Power_Limit message (Pali
Rohár)
MediaTek PCIe controller driver:
- Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin)
MediaTek PCIe Gen3 controller driver:
- Reset PHY and MAC at probe time (AngeloGioacchino Del Regno)
Microchip PolarFlare PCIe controller driver:
- Add chained_irq_enter()/chained_irq_exit() calls to mc_handle_msi()
and mc_handle_intx() to avoid lost interrupts (Conor Dooley)
- Fix interrupt handling race (Daire McNamara)
NVIDIA Tegra194 PCIe controller driver:
- Drop tegra194 MSI register save/restore, which is unnecessary since
the DWC core does it (Jisheng Zhang)
Qualcomm PCIe controller driver:
- Add SM8150 SoC DT binding and support (Bhupesh Sharma)
- Fix pipe clock imbalance (Johan Hovold)
- Fix runtime PM imbalance on probe errors (Johan Hovold)
- Fix PHY init imbalance on probe errors (Johan Hovold)
- Convert DT binding to YAML (Dmitry Baryshkov)
- Update DT binding to show that resets aren't required for
MSM8996/APQ8096 platforms (Dmitry Baryshkov)
- Add explicit register names per chipset in DT binding (Dmitry
Baryshkov)
- Add sc7280-specific clock and reset definitions to DT binding
(Dmitry Baryshkov)
Rockchip PCIe controller driver:
- Fix bitmap size when searching for free outbound region (Dan
Carpenter)
Rockchip DesignWare PCIe controller driver:
- Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property
because it's not fully compatible with rockchip (Peter Geis)
- Reset rockchip-dwc controller at probe (Peter Geis)
- Add rockchip-dwc INTx support (Peter Geis)
Synopsys DesignWare PCIe controller driver:
- Return error instead of success if DMA mapping of MSI area fails
(Jiantao Zhang)
Miscellaneous:
- Change pci_set_dma_mask() documentation references to
dma_set_mask() (Alex Williamson)"
* tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (64 commits)
dt-bindings: PCI: qcom: Add schema for sc7280 chipset
dt-bindings: PCI: qcom: Specify reg-names explicitly
dt-bindings: PCI: qcom: Do not require resets on msm8996 platforms
dt-bindings: PCI: qcom: Convert to YAML
PCI: qcom: Fix unbalanced PHY init on probe errors
PCI: qcom: Fix runtime PM imbalance on probe errors
PCI: qcom: Fix pipe clock imbalance
PCI: qcom: Add SM8150 SoC support
dt-bindings: pci: qcom: Document PCIe bindings for SM8150 SoC
x86/PCI: Disable E820 reserved region clipping starting in 2023
x86/PCI: Disable E820 reserved region clipping via quirks
x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
PCI: microchip: Fix potential race in interrupt handling
PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
PCI: cadence: Clear FLR in device capabilities register
PCI: cadence: Allow PTM Responder to be enabled
PCI: vmd: Revert 2565e5b69c ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
PCI: vmd: Assign VMD IRQ domain before enumeration
PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
PCI: rockchip-dwc: Add legacy interrupt support
...
The sysfs sriov_numvfs_store() path acquires the device lock before the
config space access lock:
sriov_numvfs_store
device_lock # A (1) acquire device lock
sriov_configure
vfio_pci_sriov_configure # (for example)
vfio_pci_core_sriov_configure
pci_disable_sriov
sriov_disable
pci_cfg_access_lock
pci_wait_cfg # B (4) wait for dev->block_cfg_access == 0
Previously, pci_dev_lock() acquired the config space access lock before the
device lock:
pci_dev_lock
pci_cfg_access_lock
dev->block_cfg_access = 1 # B (2) set dev->block_cfg_access = 1
device_lock # A (3) wait for device lock
Any path that uses pci_dev_lock(), e.g., pci_reset_function(), may
deadlock with sriov_numvfs_store() if the operations occur in the sequence
(1) (2) (3) (4).
Avoid the deadlock by reversing the order in pci_dev_lock() so it acquires
the device lock before the config space access lock, the same as the
sriov_numvfs_store() path.
[bhelgaas: combined and adapted commit log from Jay Zhou's independent
subsequent posting:
https://lore.kernel.org/r/20220404062539.1710-1-jianjay.zhou@huawei.com]
Link: https://lore.kernel.org/linux-pci/1583489997-17156-1-git-send-email-yangyicong@hisilicon.com/
Also-posted-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The part of pci_set_power_state() related to transitions into
low-power states is unnecessary convoluted, so clearly divide it
into the D3cold special case and the general case covering all of
the other states.
Also fix a potential issue with calling pci_bus_set_current_state()
to set the current state of all devices on the subordinate bus to
D3cold without checking if the power state of the parent bridge has
really changed to D3cold.
Link: https://lore.kernel.org/r/2139440.Mh6RI2rZIc@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Make the following assorted non-essential changes in
pci_set_low_power_state():
1. Drop two redundant checks from it (the caller takes care of these
conditions).
2. Change the log level of a messages printed by it to "debug",
because it only indicates a programming mistake.
Link: https://lore.kernel.org/r/2539071.Lt9SDvczpP@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Do not attempt to restore the device's BARs in
pci_set_full_power_state() if the actual current
power state of the device is not D0.
Link: https://lore.kernel.org/r/1849718.CQOukoFCf9@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
One of the two callers of pci_power_up() invokes
pci_update_current_state() and pci_restore_state() right after calling
it, in which case running the part of it happening after the mandatory
transition delays is redundant, so move that part out of it into a new
function called pci_set_full_power_state() that will be invoked from
pci_set_power_state() which is the other caller of pci_power_up().
Link: https://lore.kernel.org/r/1942150.usQuhbGJ8B@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in
order to put it into D0 regardless of the power state returned by
the previous read from that register which should not matter.
Link: https://lore.kernel.org/r/5748066.MhkbZ0Pkbq@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Notice that calling pci_update_current_state() from pci_power_up() is
redundant and may be harmful in some cases.
First, if the device is in a low-power state before pci_power_up()
gets called for it and platform_pci_set_power_state() successfully
changes its power state to D0, pci_update_current_state() will update
current_state to reflect that and pci_power_up() will return success
right away without restoring the device's BARs or reconfiguring ASPM
which may be necessary. This is arguably incorrect and definitely
inconsistent with the case when platform_pci_set_power_state() returns
an error (for example, because the device is not power-manageable by
the platform firmware).
Second, current_state should not be overwritten until the decision
whether or not to restore the device's BARs is made, because that
decision generally depends on its value. Again, calling
pci_update_current_state() in pci_power_up() is not consistent with
this observation.
Next, pci_power_up() attempts to read from the device's PCI_PM_CTRL
register regardless of the current_state value unless it is PCI_D0,
including the case when pci_update_current_state() sets current_state
to PCI_D3cold to indicate that the device is not accessible. If the
register read is not successful, current_state will be set to
PCI_D3cold anyway, so that pci_update_current_state() action is
redundant.
Further, if pci_update_current_state() reads the device's PCI_PM_CTRL
register, pci_power_up() will repeat that read going forward and
it is not necessary to update current_state in the meantime.
Finally, if pm_cap is not set (in which case the PCI_PM_CTRL register
is not present), the power state of the device should be determined
with the help of the platform firmware or set to D0 if that's not
possible and pci_update_current_state() does not do that.
Accordingly, rearrange pci_power_up() so as to address the above
shortcomings.
Link: https://lore.kernel.org/r/3695055.kQq0lBPeGt@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Some actions carried out by pci_platform_power_transition(() in
pci_power_up() are redundant, but before dealing with them, make
pci_power_up() call the pci_platform_power_transition() code directly
(and avoid a redundant check when pm_cap is unset while at it).
Link: https://lore.kernel.org/r/1922486.PYKUYFuaPT@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Make pci_power_up() and pci_set_low_power_state() change current_state
to PCI_D3cold when the device is not accessible along the lines of
pci_update_current_state().
Link: https://lore.kernel.org/r/10104376.nUPlyArG6x@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Because pci_set_power_state() is the only caller of
pci_set_low_power_state(), put the latter next to the former.
No functional impact.
Link: https://lore.kernel.org/r/3202976.44csPzL39Z@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The transitions from low-power states to D0 and the other way around
are unnecessarily tangled in pci_raw_set_power_state() which makes it
rather hard to follow.
Moreover, the only caller of pci_raw_set_power_state() passing PCI_D0
as its state argument is pci_power_up(), so the code carrying out
transitions into D0 can be put directly into that function.
Accordingly, move the code handling transitions from low-power states
into D0 directly into pci_power_up() and rename the remaining part
of pci_raw_set_power_state() to pci_set_low_power_state(), because
it only handles transitions into low-power state now.
While at it, fix up some white space, update some comments and modify
messages printed by pci_power_up() and pci_set_low_power_state() to
be less confusing (which is the only expected functional impact of
this change).
Link: https://lore.kernel.org/r/13038676.uLZWGnKmhe@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Save one config space access in pci_update_current_state() by testing the
retrieved PCI_PM_CTRL register value against PCI_POSSIBLE_ERROR() instead
of invoking pci_device_is_present() separately.
While at it, drop a pair of unnecessary parens.
No expected functional impact.
Link: https://lore.kernel.org/r/1917095.PYKUYFuaPT@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The runtime_d3cold flag is not needed any more, so drop it.
Link: https://lore.kernel.org/r/8077784.T7Z3S40VBb@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Calling pci_resume_bus() on the secondary bus from pci_power_up() as it is
done now is questionable, because it depends on the mandatory bridge
power-up delays that are only covered by the PCI bus type PM callbacks.
For this reason, move the subordinate bus resume to those callbacks too and
use the observation that if a bridge is being powered-up during resume from
system-wide suspend, it may be still desirable to runtime-resume its
subordinate bus after completing the system-wide transition (in case the
resume of the devices on that bus is skipped during it).
Link: https://lore.kernel.org/r/3190097.aeNJFYEL58@kreacher
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
If a Root Port on Elo i2 is put into D3cold and then back into D0, the
downstream device becomes permanently inaccessible, so add a bridge D3 DMI
quirk for that system.
This was exposed by 14858dcc3b ("PCI: Use pci_update_current_state() in
pci_enable_device_flags()"), but before that commit the Root Port in
question had never been put into D3cold for real due to a mismatch between
its power state retrieved from the PCI_PM_CTRL register (which was
accessible even though the platform firmware indicated that the port was in
D3cold) and the state of an ACPI power resource involved in its power
management.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215715
Link: https://lore.kernel.org/r/11980172.O9o76ZdvQC@kreacher
Reported-by: Stefan Gottwald <gottwald@igel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.15+