From f2c33ccacb2d4bbeae2a255a7ca0cbfd03017b7c Mon Sep 17 00:00:00 2001 From: Dexuan Cui Date: Wed, 14 Aug 2019 01:06:55 +0000 Subject: [PATCH 01/25] PCI/PM: Always return devices to D0 when thawing pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its configuration registers, but previously it only did that for devices whose drivers implemented the new power management ops. Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing devices, creating a hibernation image, thawing devices, writing the image, and powering off. The fact that thawing did not return devices with legacy power management to D0 caused errors, e.g., in this path: pci_pm_thaw_noirq if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver return pci_legacy_resume_early(dev) # ... legacy PM skips the rest pci_set_power_state(pci_dev, PCI_D0) pci_restore_state(pci_dev) pci_pm_thaw if (pci_has_legacy_pm_support(pci_dev)) pci_legacy_resume drv->resume mlx4_resume ... pci_enable_msix_range ... if (dev->current_state != PCI_D0) # <--- return -EINVAL; which caused these warnings: mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95 PM: Device a6d1:00:02.0 failed to thaw: error -95 Return devices to D0 and restore config registers for all devices, not just those whose drivers support new power management. [bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(), update comment, add stable tag, commit log] Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM Signed-off-by: Dexuan Cui Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki Cc: stable@vger.kernel.org # v4.13+ --- drivers/pci/pci-driver.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index a8124e47bf6e..d4ac8ce8c1f9 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -1076,17 +1076,22 @@ static int pci_pm_thaw_noirq(struct device *dev) return error; } - if (pci_has_legacy_pm_support(pci_dev)) - return pci_legacy_resume_early(dev); - /* - * pci_restore_state() requires the device to be in D0 (because of MSI - * restoration among other things), so force it into D0 in case the - * driver's "freeze" callbacks put it into a low-power state directly. + * Both the legacy ->resume_early() and the new pm->thaw_noirq() + * callbacks assume the device has been returned to D0 and its + * config state has been restored. + * + * In addition, pci_restore_state() restores MSI-X state in MMIO + * space, which requires the device to be in D0, so return it to D0 + * in case the driver's "freeze" callbacks put it into a low-power + * state. */ pci_set_power_state(pci_dev, PCI_D0); pci_restore_state(pci_dev); + if (pci_has_legacy_pm_support(pci_dev)) + return pci_legacy_resume_early(dev); + if (drv && drv->pm && drv->pm->thaw_noirq) error = drv->pm->thaw_noirq(dev); From dc68b406783edf33af0e997fb98e9e048b4d46e8 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 14 Oct 2019 14:14:06 -0500 Subject: [PATCH 02/25] PCI/PM: Correct pci_pm_thaw_noirq() documentation According to the documentation, pci_pm_thaw_noirq() did not put the device into the full-power state and restore its standard configuration registers. This is incorrect, so update the documentation to match the code. Link: https://lore.kernel.org/r/20191014230016.240912-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- Documentation/power/pci.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst index 0e2ef7429304..1525c594d631 100644 --- a/Documentation/power/pci.rst +++ b/Documentation/power/pci.rst @@ -600,17 +600,17 @@ using the following PCI bus type's callbacks:: respectively. -The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq(), -but it doesn't put the device into the full power state and doesn't attempt to -restore its standard configuration registers. It also executes the device -driver's pm->thaw_noirq() callback, if defined, instead of pm->resume_noirq(). +The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq(). +It puts the device into the full power state and restores its standard +configuration registers. It also executes the device driver's pm->thaw_noirq() +callback, if defined, instead of pm->resume_noirq(). The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device driver's pm->thaw() callback instead of pm->resume(). It is executed asynchronously for different PCI devices that don't depend on each other in a known way. -The complete phase it the same as for system resume. +The complete phase is the same as for system resume. After saving the image, devices need to be powered down before the system can enter the target sleep state (ACPI S4 for ACPI-based systems). This is done in From ec6a75ef8e33fe33f963b916fd902c52a0be33ff Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Thu, 10 Oct 2019 16:54:36 -0500 Subject: [PATCH 03/25] PCI/PM: Clear PCIe PME Status even for legacy power management Previously, pci_pm_resume_noirq() cleared the PME Status bit in the Root Status register only if the device had no driver or the driver did not implement legacy power management. It should clear PME Status regardless of what sort of power management the driver supports, so do this before checking for legacy power management. This affects Root Ports and Root Complex Event Collectors, for which the usual driver is the PCIe portdrv, which implements new power management, so this change is just on principle, not to fix any actual defects. Fixes: a39bd851dccf ("PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver") Link: https://lore.kernel.org/r/20191014230016.240912-4-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci-driver.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index d4ac8ce8c1f9..0c3086793e4e 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -941,12 +941,11 @@ static int pci_pm_resume_noirq(struct device *dev) pci_pm_default_resume_early(pci_dev); pci_fixup_device(pci_fixup_resume_early, pci_dev); + pcie_pme_root_status_cleanup(pci_dev); if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_resume_early(dev); - pcie_pme_root_status_cleanup(pci_dev); - if (drv && drv->pm && drv->pm->resume_noirq) error = drv->pm->resume_noirq(dev); From f7b32a86e455b035dd45a334c203abf6fe118cf3 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Sat, 12 Oct 2019 17:15:57 -0500 Subject: [PATCH 04/25] PCI/PM: Run resume fixups before disabling wakeup events pci_pm_resume() and pci_pm_restore() call pci_pm_default_resume(), which runs resume fixups before disabling wakeup events: static void pci_pm_default_resume(struct pci_dev *pci_dev) { pci_fixup_device(pci_fixup_resume, pci_dev); pci_enable_wake(pci_dev, PCI_D0, false); } pci_pm_runtime_resume() does both of these, but in the opposite order: pci_enable_wake(pci_dev, PCI_D0, false); pci_fixup_device(pci_fixup_resume, pci_dev); We should always use the same ordering unless there's a reason to do otherwise. Change pci_pm_runtime_resume() to call pci_pm_default_resume() instead of open-coding this, so the fixups are always done before disabling wakeup events. pci_pm_default_resume() is called from pci_pm_runtime_resume(), which is under #ifdef CONFIG_PM. If SUSPEND and HIBERNATION are disabled, PM_SLEEP is disabled also, so move pci_pm_default_resume() from #ifdef CONFIG_PM_SLEEP to #ifdef CONFIG_PM. Link: https://lore.kernel.org/r/20191014230016.240912-5-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci-driver.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 0c3086793e4e..abee2a790a10 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -517,6 +517,12 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev) return 0; } +static void pci_pm_default_resume(struct pci_dev *pci_dev) +{ + pci_fixup_device(pci_fixup_resume, pci_dev); + pci_enable_wake(pci_dev, PCI_D0, false); +} + #endif #ifdef CONFIG_PM_SLEEP @@ -645,12 +651,6 @@ static int pci_legacy_resume(struct device *dev) /* Auxiliary functions used by the new power management framework */ -static void pci_pm_default_resume(struct pci_dev *pci_dev) -{ - pci_fixup_device(pci_fixup_resume, pci_dev); - pci_enable_wake(pci_dev, PCI_D0, false); -} - static void pci_pm_default_suspend(struct pci_dev *pci_dev) { /* Disable non-bridge devices without PM support */ @@ -992,7 +992,6 @@ static int pci_pm_resume(struct device *dev) #ifdef CONFIG_HIBERNATE_CALLBACKS - /* * pcibios_pm_ops - provide arch-specific hooks when a PCI device is doing * a hibernate transition @@ -1345,8 +1344,7 @@ static int pci_pm_runtime_resume(struct device *dev) return 0; pci_fixup_device(pci_fixup_resume_early, pci_dev); - pci_enable_wake(pci_dev, PCI_D0, false); - pci_fixup_device(pci_fixup_resume, pci_dev); + pci_pm_default_resume(pci_dev); if (pm && pm->runtime_resume) rc = pm->runtime_resume(dev); From 6da2f2ccfd2deb81a63fc23a505ccd72de005c39 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 14 Oct 2019 13:46:50 -0500 Subject: [PATCH 05/25] PCI/PM: Make power management op coding style consistent Some of the power management ops use this style: struct device_driver *drv = dev->driver; if (drv && drv->pm && drv->pm->prepare(dev)) drv->pm->prepare(dev); while others use this: const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; if (pm && pm->runtime_resume) pm->runtime_resume(dev); Convert the first style to the second so they're all consistent. Remove local "error" variables when unnecessary. No functional change intended. Link: https://lore.kernel.org/r/20191014230016.240912-6-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci-driver.c | 76 +++++++++++++++++++--------------------- 1 file changed, 36 insertions(+), 40 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index abee2a790a10..dd70ab2519c9 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -679,11 +679,11 @@ static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev) static int pci_pm_prepare(struct device *dev) { - struct device_driver *drv = dev->driver; struct pci_dev *pci_dev = to_pci_dev(dev); + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; - if (drv && drv->pm && drv->pm->prepare) { - int error = drv->pm->prepare(dev); + if (pm && pm->prepare) { + int error = pm->prepare(dev); if (error < 0) return error; @@ -917,8 +917,7 @@ Fixup: static int pci_pm_resume_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); - struct device_driver *drv = dev->driver; - int error = 0; + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; if (dev_pm_may_skip_resume(dev)) return 0; @@ -946,17 +945,16 @@ static int pci_pm_resume_noirq(struct device *dev) if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_resume_early(dev); - if (drv && drv->pm && drv->pm->resume_noirq) - error = drv->pm->resume_noirq(dev); + if (pm && pm->resume_noirq) + return pm->resume_noirq(dev); - return error; + return 0; } static int pci_pm_resume(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; - int error = 0; /* * This is necessary for the suspend error path in which resume is @@ -972,12 +970,12 @@ static int pci_pm_resume(struct device *dev) if (pm) { if (pm->resume) - error = pm->resume(dev); + return pm->resume(dev); } else { pci_pm_reenable_device(pci_dev); } - return error; + return 0; } #else /* !CONFIG_SUSPEND */ @@ -1037,16 +1035,16 @@ static int pci_pm_freeze(struct device *dev) static int pci_pm_freeze_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); - struct device_driver *drv = dev->driver; + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_suspend_late(dev, PMSG_FREEZE); - if (drv && drv->pm && drv->pm->freeze_noirq) { + if (pm && pm->freeze_noirq) { int error; - error = drv->pm->freeze_noirq(dev); - suspend_report_result(drv->pm->freeze_noirq, error); + error = pm->freeze_noirq(dev); + suspend_report_result(pm->freeze_noirq, error); if (error) return error; } @@ -1065,8 +1063,8 @@ static int pci_pm_freeze_noirq(struct device *dev) static int pci_pm_thaw_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); - struct device_driver *drv = dev->driver; - int error = 0; + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; + int error; if (pcibios_pm_ops.thaw_noirq) { error = pcibios_pm_ops.thaw_noirq(dev); @@ -1090,10 +1088,10 @@ static int pci_pm_thaw_noirq(struct device *dev) if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_resume_early(dev); - if (drv && drv->pm && drv->pm->thaw_noirq) - error = drv->pm->thaw_noirq(dev); + if (pm && pm->thaw_noirq) + return pm->thaw_noirq(dev); - return error; + return 0; } static int pci_pm_thaw(struct device *dev) @@ -1164,24 +1162,24 @@ static int pci_pm_poweroff_late(struct device *dev) static int pci_pm_poweroff_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); - struct device_driver *drv = dev->driver; + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; if (dev_pm_smart_suspend_and_suspended(dev)) return 0; - if (pci_has_legacy_pm_support(to_pci_dev(dev))) + if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_suspend_late(dev, PMSG_HIBERNATE); - if (!drv || !drv->pm) { + if (!pm) { pci_fixup_device(pci_fixup_suspend_late, pci_dev); return 0; } - if (drv->pm->poweroff_noirq) { + if (pm->poweroff_noirq) { int error; - error = drv->pm->poweroff_noirq(dev); - suspend_report_result(drv->pm->poweroff_noirq, error); + error = pm->poweroff_noirq(dev); + suspend_report_result(pm->poweroff_noirq, error); if (error) return error; } @@ -1207,8 +1205,8 @@ static int pci_pm_poweroff_noirq(struct device *dev) static int pci_pm_restore_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); - struct device_driver *drv = dev->driver; - int error = 0; + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; + int error; if (pcibios_pm_ops.restore_noirq) { error = pcibios_pm_ops.restore_noirq(dev); @@ -1222,17 +1220,16 @@ static int pci_pm_restore_noirq(struct device *dev) if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_resume_early(dev); - if (drv && drv->pm && drv->pm->restore_noirq) - error = drv->pm->restore_noirq(dev); + if (pm && pm->restore_noirq) + return pm->restore_noirq(dev); - return error; + return 0; } static int pci_pm_restore(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; - int error = 0; /* * This is necessary for the hibernation error path in which restore is @@ -1248,12 +1245,12 @@ static int pci_pm_restore(struct device *dev) if (pm) { if (pm->restore) - error = pm->restore(dev); + return pm->restore(dev); } else { pci_pm_reenable_device(pci_dev); } - return error; + return 0; } #else /* !CONFIG_HIBERNATE_CALLBACKS */ @@ -1329,9 +1326,9 @@ static int pci_pm_runtime_suspend(struct device *dev) static int pci_pm_runtime_resume(struct device *dev) { - int rc = 0; struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; + int error = 0; /* * Restoring config space is necessary even if the device is not bound @@ -1347,18 +1344,17 @@ static int pci_pm_runtime_resume(struct device *dev) pci_pm_default_resume(pci_dev); if (pm && pm->runtime_resume) - rc = pm->runtime_resume(dev); + error = pm->runtime_resume(dev); pci_dev->runtime_d3cold = false; - return rc; + return error; } static int pci_pm_runtime_idle(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; - int ret = 0; /* * If pci_dev->driver is not set (unbound), the device should @@ -1371,9 +1367,9 @@ static int pci_pm_runtime_idle(struct device *dev) return -ENOSYS; if (pm->runtime_idle) - ret = pm->runtime_idle(dev); + return pm->runtime_idle(dev); - return ret; + return 0; } static const struct dev_pm_ops pci_dev_pm_ops = { From 85a9b0507db2fbbfe7912dc3b33d322f200e2ca7 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Tue, 8 Oct 2019 15:28:00 -0500 Subject: [PATCH 06/25] PCI/PM: Note that PME can be generated from D0 Per PCIe r5.0 sec 7.5.2.1, PME may be generated from D0, so update Documentation/power/pci.rst to reflect that. Link: https://lore.kernel.org/r/20191016194450.68959-1-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- Documentation/power/pci.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst index 1525c594d631..8a4abb7c7363 100644 --- a/Documentation/power/pci.rst +++ b/Documentation/power/pci.rst @@ -130,8 +130,8 @@ a full power-on reset sequence and the power-on defaults are restored to the device by hardware just as at initial power up. PCI devices supporting the PCI PM Spec can be programmed to generate PMEs -while in a low-power state (D1-D3), but they are not required to be capable -of generating PMEs from all supported low-power states. In particular, the +while in any power state (D0-D3), but they are not required to be capable +of generating PMEs from all supported power states. In particular, the capability of generating PMEs from D3cold is optional and depends on the presence of additional voltage (3.3Vaux) allowing the device to remain sufficiently active to generate a wakeup signal. From b64cf7a1711d7796fdac19a23bc63d378a6e7ed1 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Tue, 8 Oct 2019 15:25:23 -0500 Subject: [PATCH 07/25] PCI/PM: Wrap long lines in documentation Documentation/power/pci.rst is wrapped to fit in 80 columns, but directory structure changes made a few lines longer. Wrap them so they all fit in 80 columns again. Link: https://lore.kernel.org/r/20191014230016.240912-7-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- Documentation/power/pci.rst | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst index 8a4abb7c7363..a90e82c70a3b 100644 --- a/Documentation/power/pci.rst +++ b/Documentation/power/pci.rst @@ -426,12 +426,12 @@ pm->runtime_idle() callback. 2.4. System-Wide Power Transitions ---------------------------------- There are a few different types of system-wide power transitions, described in -Documentation/driver-api/pm/devices.rst. Each of them requires devices to be handled -in a specific way and the PM core executes subsystem-level power management -callbacks for this purpose. They are executed in phases such that each phase -involves executing the same subsystem-level callback for every device belonging -to the given subsystem before the next phase begins. These phases always run -after tasks have been frozen. +Documentation/driver-api/pm/devices.rst. Each of them requires devices to be +handled in a specific way and the PM core executes subsystem-level power +management callbacks for this purpose. They are executed in phases such that +each phase involves executing the same subsystem-level callback for every device +belonging to the given subsystem before the next phase begins. These phases +always run after tasks have been frozen. 2.4.1. System Suspend ^^^^^^^^^^^^^^^^^^^^^ @@ -636,12 +636,12 @@ System restore requires a hibernation image to be loaded into memory and the pre-hibernation memory contents to be restored before the pre-hibernation system activity can be resumed. -As described in Documentation/driver-api/pm/devices.rst, the hibernation image is loaded -into memory by a fresh instance of the kernel, called the boot kernel, which in -turn is loaded and run by a boot loader in the usual way. After the boot kernel -has loaded the image, it needs to replace its own code and data with the code -and data of the "hibernated" kernel stored within the image, called the image -kernel. For this purpose all devices are frozen just like before creating +As described in Documentation/driver-api/pm/devices.rst, the hibernation image +is loaded into memory by a fresh instance of the kernel, called the boot kernel, +which in turn is loaded and run by a boot loader in the usual way. After the +boot kernel has loaded the image, it needs to replace its own code and data with +the code and data of the "hibernated" kernel stored within the image, called the +image kernel. For this purpose all devices are frozen just like before creating the image during hibernation, in the prepare, freeze, freeze_noirq @@ -691,8 +691,8 @@ controlling the runtime power management of their devices. At the time of this writing there are two ways to define power management callbacks for a PCI device driver, the recommended one, based on using a -dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and the -"legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and +dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and +the "legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and .resume() callbacks from struct pci_driver are used. The legacy approach, however, doesn't allow one to define runtime power management callbacks and is not really suitable for any new drivers. Therefore it is not covered by this From 6941a0c2bdedfb729fb166091e12d06e4fce177f Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 7 Oct 2019 07:55:18 -0500 Subject: [PATCH 08/25] PCI/PM: Use PCI dev_printk() wrappers for consistency Use the PCI dev_printk() wrappers for consistency with the rest of the PCI core. No functional change intended. Link: https://lore.kernel.org/r/20191017212851.54237-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci-driver.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index dd70ab2519c9..407b1df5ea7c 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -315,7 +315,8 @@ static long local_pci_probe(void *_ddi) * Probe function should return < 0 for failure, 0 for success * Treat values > 0 as success, but warn. */ - dev_warn(dev, "Driver probe function unexpectedly returned %d\n", rc); + pci_warn(pci_dev, "Driver probe function unexpectedly returned %d\n", + rc); return 0; } @@ -865,7 +866,7 @@ static int pci_pm_suspend_noirq(struct device *dev) pci_prepare_to_sleep(pci_dev); } - dev_dbg(dev, "PCI PM: Suspend power state: %s\n", + pci_dbg(pci_dev, "PCI PM: Suspend power state: %s\n", pci_power_name(pci_dev->current_state)); if (pci_dev->current_state == PCI_D0) { @@ -880,7 +881,7 @@ static int pci_pm_suspend_noirq(struct device *dev) } if (pci_dev->skip_bus_pm && pm_suspend_no_platform()) { - dev_dbg(dev, "PCI PM: Skipped\n"); + pci_dbg(pci_dev, "PCI PM: Skipped\n"); goto Fixup; } @@ -1295,11 +1296,11 @@ static int pci_pm_runtime_suspend(struct device *dev) * log level. */ if (error == -EBUSY || error == -EAGAIN) { - dev_dbg(dev, "can't suspend now (%ps returned %d)\n", + pci_dbg(pci_dev, "can't suspend now (%ps returned %d)\n", pm->runtime_suspend, error); return error; } else if (error) { - dev_err(dev, "can't suspend (%ps returned %d)\n", + pci_err(pci_dev, "can't suspend (%ps returned %d)\n", pm->runtime_suspend, error); return error; } From 12bcae44bf48595c71898330076576075590e15b Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 7 Oct 2019 07:52:28 -0500 Subject: [PATCH 09/25] PCI/PM: Use pci_WARN() to include device information Add and use pci_WARN() wrappers so warnings include device information. Link: https://lore.kernel.org/r/20191017212851.54237-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci-driver.c | 34 +++++++++++++++++----------------- include/linux/pci.h | 8 ++++++++ 2 files changed, 25 insertions(+), 17 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 407b1df5ea7c..5337cbbd69de 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -585,9 +585,9 @@ static int pci_legacy_suspend(struct device *dev, pm_message_t state) if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0 && pci_dev->current_state != PCI_UNKNOWN) { - WARN_ONCE(pci_dev->current_state != prev, - "PCI PM: Device state not saved by %pS\n", - drv->suspend); + pci_WARN_ONCE(pci_dev, pci_dev->current_state != prev, + "PCI PM: Device state not saved by %pS\n", + drv->suspend); } } @@ -612,9 +612,9 @@ static int pci_legacy_suspend_late(struct device *dev, pm_message_t state) if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0 && pci_dev->current_state != PCI_UNKNOWN) { - WARN_ONCE(pci_dev->current_state != prev, - "PCI PM: Device state not saved by %pS\n", - drv->suspend_late); + pci_WARN_ONCE(pci_dev, pci_dev->current_state != prev, + "PCI PM: Device state not saved by %pS\n", + drv->suspend_late); goto Fixup; } } @@ -670,8 +670,8 @@ static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev) * supported as well. Drivers are supposed to support either the * former, or the latter, but not both at the same time. */ - WARN(ret && drv->driver.pm, "driver %s device %04x:%04x\n", - drv->name, pci_dev->vendor, pci_dev->device); + pci_WARN(pci_dev, ret && drv->driver.pm, "device %04x:%04x\n", + pci_dev->vendor, pci_dev->device); return ret; } @@ -794,9 +794,9 @@ static int pci_pm_suspend(struct device *dev) if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0 && pci_dev->current_state != PCI_UNKNOWN) { - WARN_ONCE(pci_dev->current_state != prev, - "PCI PM: State of device not saved by %pS\n", - pm->suspend); + pci_WARN_ONCE(pci_dev, pci_dev->current_state != prev, + "PCI PM: State of device not saved by %pS\n", + pm->suspend); } } @@ -842,9 +842,9 @@ static int pci_pm_suspend_noirq(struct device *dev) if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0 && pci_dev->current_state != PCI_UNKNOWN) { - WARN_ONCE(pci_dev->current_state != prev, - "PCI PM: State of device not saved by %pS\n", - pm->suspend_noirq); + pci_WARN_ONCE(pci_dev, pci_dev->current_state != prev, + "PCI PM: State of device not saved by %pS\n", + pm->suspend_noirq); goto Fixup; } } @@ -1311,9 +1311,9 @@ static int pci_pm_runtime_suspend(struct device *dev) if (pm && pm->runtime_suspend && !pci_dev->state_saved && pci_dev->current_state != PCI_D0 && pci_dev->current_state != PCI_UNKNOWN) { - WARN_ONCE(pci_dev->current_state != prev, - "PCI PM: State of device not saved by %pS\n", - pm->runtime_suspend); + pci_WARN_ONCE(pci_dev, pci_dev->current_state != prev, + "PCI PM: State of device not saved by %pS\n", + pm->runtime_suspend); return 0; } diff --git a/include/linux/pci.h b/include/linux/pci.h index f9088c89a534..4846306d521c 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2400,4 +2400,12 @@ void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type); #define pci_info_ratelimited(pdev, fmt, arg...) \ dev_info_ratelimited(&(pdev)->dev, fmt, ##arg) +#define pci_WARN(pdev, condition, fmt, arg...) \ + WARN(condition, "%s %s: " fmt, \ + dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg) + +#define pci_WARN_ONCE(pdev, condition, fmt, arg...) \ + WARN_ONCE(condition, "%s %s: " fmt, \ + dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg) + #endif /* LINUX_PCI_H */ From 7e24bc347e57992d532bc2ed700209b0fc0a4bf5 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Wed, 23 Oct 2019 17:40:52 -0500 Subject: [PATCH 10/25] PCI/PM: Apply D2 delay as milliseconds, not microseconds PCI_PM_D2_DELAY is defined as 200, which is milliseconds, but previously we used udelay(), which only waited for 200 microseconds. Use msleep() instead so we wait the correct amount of time. See PCIe r5.0, sec 5.9. Link: https://lore.kernel.org/r/20191101204558.210235-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index a97e2571a527..4dfbde4f944e 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -886,7 +886,7 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state) if (state == PCI_D3hot || dev->current_state == PCI_D3hot) pci_dev_d3_sleep(dev); else if (state == PCI_D2 || dev->current_state == PCI_D2) - udelay(PCI_PM_D2_DELAY); + msleep(PCI_PM_D2_DELAY); pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); From 993cc6d1bd3af734693a74a3ccc9445dd5b31f9c Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 28 Oct 2019 08:27:00 -0500 Subject: [PATCH 11/25] PCI/PM: Expand PM reset messages to mention D3hot (not just D3) pci_pm_reset() resets a device by putting it in D3hot and bringing it back to D0. Clarify related messages to mention "D3hot" explicitly instead of just "D3". Link: https://lore.kernel.org/r/20191101204558.210235-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 4dfbde4f944e..9378fe5fe475 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4603,7 +4603,7 @@ static int pci_pm_reset(struct pci_dev *dev, int probe) pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr); pci_dev_d3_sleep(dev); - return pci_dev_wait(dev, "PM D3->D0", PCIE_RESET_READY_POLL_MS); + return pci_dev_wait(dev, "PM D3hot->D0", PCIE_RESET_READY_POLL_MS); } /** * pcie_wait_for_link - Wait until link is active or inactive From baef7f8e5e91f85ce7625c11370479f9f0778fae Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Wed, 16 Oct 2019 15:23:20 -0500 Subject: [PATCH 12/25] PCI/PM: Simplify pci_set_power_state() Check for the PCI_DEV_FLAGS_NO_D3 quirk early, before calling __pci_start_power_transition(). This way all the cases where we don't need to do anything at all are checked up front. This doesn't fix anything because if the caller requested D3hot or D3cold, __pci_start_power_transition() is a no-op. But calling it is pointless and makes the code harder to analyze. Link: https://lore.kernel.org/r/20191101204558.210235-4-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 9378fe5fe475..ea98c77a6512 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1117,8 +1117,6 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state) if (dev->current_state == state) return 0; - __pci_start_power_transition(dev, state); - /* * This device is quirked not to be put into D3, so don't put it in * D3 @@ -1126,6 +1124,8 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state) if (state >= PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3)) return 0; + __pci_start_power_transition(dev, state); + /* * To put device in D3cold, we put device into D3hot in native * way, then put device into D3cold with platform ops From 77b84bb306fd0d5d1d3a4bf1f2acf73dc971e372 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Fri, 1 Nov 2019 08:20:40 -0500 Subject: [PATCH 13/25] xen-platform: Convert to generic power management Convert xen-platform from the legacy PCI power management callbacks to the generic operations. This is one step towards removing support for the legacy PCI callbacks. The generic .resume_noirq() operation is called by pci_pm_resume_noirq() at the same point the legacy PCI .resume_early() callback was, so this patch should not change the xen-platform behavior. Link: https://lore.kernel.org/r/20191101204558.210235-5-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Acked-by: Rafael J. Wysocki Cc: Stefano Stabellini Cc: KarimAllah Ahmed --- drivers/xen/platform-pci.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c index 5e30602fdbad..59e85e408c23 100644 --- a/drivers/xen/platform-pci.c +++ b/drivers/xen/platform-pci.c @@ -74,7 +74,7 @@ static int xen_allocate_irq(struct pci_dev *pdev) "xen-platform-pci", pdev); } -static int platform_pci_resume(struct pci_dev *pdev) +static int platform_pci_resume(struct device *dev) { int err; @@ -83,7 +83,7 @@ static int platform_pci_resume(struct pci_dev *pdev) err = xen_set_callback_via(callback_via); if (err) { - dev_err(&pdev->dev, "platform_pci_resume failure!\n"); + dev_err(dev, "platform_pci_resume failure!\n"); return err; } return 0; @@ -168,13 +168,17 @@ static const struct pci_device_id platform_pci_tbl[] = { {0,} }; +static struct dev_pm_ops platform_pm_ops = { + .resume_noirq = platform_pci_resume, +}; + static struct pci_driver platform_driver = { .name = DRV_NAME, .probe = platform_pci_probe, .id_table = platform_pci_tbl, -#ifdef CONFIG_PM - .resume_early = platform_pci_resume, -#endif + .driver = { + .pm = &platform_pm_ops, + }, }; builtin_pci_driver(platform_driver); From 89cdbc3546354c359558a1809133902028c57da4 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Thu, 31 Oct 2019 17:53:04 -0500 Subject: [PATCH 14/25] PCI/PM: Remove unused pci_driver.resume_early() hook The struct pci_driver.resume_early() hook is one of the legacy PCI power management callbacks, and there are no remaining users of it. Remove it. Link: https://lore.kernel.org/r/20191101204558.210235-6-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- Documentation/power/pci.rst | 2 +- drivers/pci/pci-driver.c | 23 ++++++----------------- include/linux/pci.h | 2 -- 3 files changed, 7 insertions(+), 20 deletions(-) diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst index a90e82c70a3b..ff7029b94068 100644 --- a/Documentation/power/pci.rst +++ b/Documentation/power/pci.rst @@ -692,7 +692,7 @@ controlling the runtime power management of their devices. At the time of this writing there are two ways to define power management callbacks for a PCI device driver, the recommended one, based on using a dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and -the "legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and +the "legacy" one, in which the .suspend(), .suspend_late(), and .resume() callbacks from struct pci_driver are used. The legacy approach, however, doesn't allow one to define runtime power management callbacks and is not really suitable for any new drivers. Therefore it is not covered by this diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 5337cbbd69de..fc372c2d529a 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -630,15 +630,6 @@ Fixup: return 0; } -static int pci_legacy_resume_early(struct device *dev) -{ - struct pci_dev *pci_dev = to_pci_dev(dev); - struct pci_driver *drv = pci_dev->driver; - - return drv && drv->resume_early ? - drv->resume_early(pci_dev) : 0; -} - static int pci_legacy_resume(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); @@ -662,8 +653,7 @@ static void pci_pm_default_suspend(struct pci_dev *pci_dev) static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev) { struct pci_driver *drv = pci_dev->driver; - bool ret = drv && (drv->suspend || drv->suspend_late || drv->resume - || drv->resume_early); + bool ret = drv && (drv->suspend || drv->suspend_late || drv->resume); /* * Legacy PM support is used by default, so warn if the new framework is @@ -944,7 +934,7 @@ static int pci_pm_resume_noirq(struct device *dev) pcie_pme_root_status_cleanup(pci_dev); if (pci_has_legacy_pm_support(pci_dev)) - return pci_legacy_resume_early(dev); + return 0; if (pm && pm->resume_noirq) return pm->resume_noirq(dev); @@ -1074,9 +1064,8 @@ static int pci_pm_thaw_noirq(struct device *dev) } /* - * Both the legacy ->resume_early() and the new pm->thaw_noirq() - * callbacks assume the device has been returned to D0 and its - * config state has been restored. + * The pm->thaw_noirq() callback assumes the device has been + * returned to D0 and its config state has been restored. * * In addition, pci_restore_state() restores MSI-X state in MMIO * space, which requires the device to be in D0, so return it to D0 @@ -1087,7 +1076,7 @@ static int pci_pm_thaw_noirq(struct device *dev) pci_restore_state(pci_dev); if (pci_has_legacy_pm_support(pci_dev)) - return pci_legacy_resume_early(dev); + return 0; if (pm && pm->thaw_noirq) return pm->thaw_noirq(dev); @@ -1219,7 +1208,7 @@ static int pci_pm_restore_noirq(struct device *dev) pci_fixup_device(pci_fixup_resume_early, pci_dev); if (pci_has_legacy_pm_support(pci_dev)) - return pci_legacy_resume_early(dev); + return 0; if (pm && pm->restore_noirq) return pm->restore_noirq(dev); diff --git a/include/linux/pci.h b/include/linux/pci.h index 4846306d521c..dd4596fc1208 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -806,7 +806,6 @@ struct module; * context, so it can sleep. * @suspend: Put device into low power state. * @suspend_late: Put device into low power state. - * @resume_early: Wake device from low power state. * @resume: Wake device from low power state. * (Please see Documentation/power/pci.rst for descriptions * of PCI Power Management and the related functions.) @@ -830,7 +829,6 @@ struct pci_driver { void (*remove)(struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */ int (*suspend)(struct pci_dev *dev, pm_message_t state); /* Device suspended */ int (*suspend_late)(struct pci_dev *dev, pm_message_t state); - int (*resume_early)(struct pci_dev *dev); int (*resume)(struct pci_dev *dev); /* Device woken up */ void (*shutdown)(struct pci_dev *dev); int (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */ From 1a1daf097e21e544dd3e7c0ff620d78a9795fbf2 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Thu, 31 Oct 2019 17:37:54 -0500 Subject: [PATCH 15/25] PCI/PM: Remove unused pci_driver.suspend_late() hook The struct pci_driver.suspend_late() hook is one of the legacy PCI power management callbacks, and there are no remaining users of it. Remove it. Link: https://lore.kernel.org/r/20191101204558.210235-7-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- Documentation/power/pci.rst | 10 +++++----- drivers/pci/pci-driver.c | 22 +--------------------- include/linux/pci.h | 2 -- 3 files changed, 6 insertions(+), 28 deletions(-) diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst index ff7029b94068..0924d29636ad 100644 --- a/Documentation/power/pci.rst +++ b/Documentation/power/pci.rst @@ -692,11 +692,11 @@ controlling the runtime power management of their devices. At the time of this writing there are two ways to define power management callbacks for a PCI device driver, the recommended one, based on using a dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and -the "legacy" one, in which the .suspend(), .suspend_late(), and -.resume() callbacks from struct pci_driver are used. The legacy approach, -however, doesn't allow one to define runtime power management callbacks and is -not really suitable for any new drivers. Therefore it is not covered by this -document (refer to the source code to learn more about it). +the "legacy" one, in which the .suspend() and .resume() callbacks from struct +pci_driver are used. The legacy approach, however, doesn't allow one to define +runtime power management callbacks and is not really suitable for any new +drivers. Therefore it is not covered by this document (refer to the source code +to learn more about it). It is recommended that all PCI device drivers define a struct dev_pm_ops object containing pointers to power management (PM) callbacks that will be executed by diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index fc372c2d529a..e89fd90eaa93 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -599,32 +599,12 @@ static int pci_legacy_suspend(struct device *dev, pm_message_t state) static int pci_legacy_suspend_late(struct device *dev, pm_message_t state) { struct pci_dev *pci_dev = to_pci_dev(dev); - struct pci_driver *drv = pci_dev->driver; - - if (drv && drv->suspend_late) { - pci_power_t prev = pci_dev->current_state; - int error; - - error = drv->suspend_late(pci_dev, state); - suspend_report_result(drv->suspend_late, error); - if (error) - return error; - - if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0 - && pci_dev->current_state != PCI_UNKNOWN) { - pci_WARN_ONCE(pci_dev, pci_dev->current_state != prev, - "PCI PM: Device state not saved by %pS\n", - drv->suspend_late); - goto Fixup; - } - } if (!pci_dev->state_saved) pci_save_state(pci_dev); pci_pm_set_unknown_state(pci_dev); -Fixup: pci_fixup_device(pci_fixup_suspend_late, pci_dev); return 0; @@ -653,7 +633,7 @@ static void pci_pm_default_suspend(struct pci_dev *pci_dev) static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev) { struct pci_driver *drv = pci_dev->driver; - bool ret = drv && (drv->suspend || drv->suspend_late || drv->resume); + bool ret = drv && (drv->suspend || drv->resume); /* * Legacy PM support is used by default, so warn if the new framework is diff --git a/include/linux/pci.h b/include/linux/pci.h index dd4596fc1208..9b0e35e09874 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -805,7 +805,6 @@ struct module; * The remove function always gets called from process * context, so it can sleep. * @suspend: Put device into low power state. - * @suspend_late: Put device into low power state. * @resume: Wake device from low power state. * (Please see Documentation/power/pci.rst for descriptions * of PCI Power Management and the related functions.) @@ -828,7 +827,6 @@ struct pci_driver { int (*probe)(struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */ void (*remove)(struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */ int (*suspend)(struct pci_dev *dev, pm_message_t state); /* Device suspended */ - int (*suspend_late)(struct pci_dev *dev, pm_message_t state); int (*resume)(struct pci_dev *dev); /* Device woken up */ void (*shutdown)(struct pci_dev *dev); int (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */ From 81cfa5908fd6fe610b7c47b742fe30d6d897ba0f Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 5 Nov 2019 11:13:43 +0100 Subject: [PATCH 16/25] PCI/PM: Move power state update away from pci_power_up() Move the invocation of pci_update_current_state() from pci_power_up() to pci_pm_default_resume_early(), which is the only caller of that function. Preparatory change, no functional impact. Link: https://lore.kernel.org/r/37482337.udjOGdOKNb@kreacher Signed-off-by: Rafael J. Wysocki Signed-off-by: Bjorn Helgaas Reviewed-by: Mika Westerberg --- drivers/pci/pci-driver.c | 1 + drivers/pci/pci.c | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index e89fd90eaa93..08d3bdbc8c04 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -531,6 +531,7 @@ static void pci_pm_default_resume(struct pci_dev *pci_dev) static void pci_pm_default_resume_early(struct pci_dev *pci_dev) { pci_power_up(pci_dev); + pci_update_current_state(pci_dev, PCI_D0); pci_restore_state(pci_dev); pci_pme_restore(pci_dev); } diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index ea98c77a6512..11b4d9274d96 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1148,7 +1148,6 @@ void pci_power_up(struct pci_dev *dev) { __pci_start_power_transition(dev, PCI_D0); pci_raw_set_power_state(dev, PCI_D0); - pci_update_current_state(dev, PCI_D0); } /** From adfac8f6b7396b408fa9a8f40ea41112bebb980f Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 5 Nov 2019 11:27:49 +0100 Subject: [PATCH 17/25] PCI/PM: Use pci_power_up() in pci_set_power_state() Make it explicitly clear that the code to put devices into D0 in pci_set_power_state() and in pci_pm_default_resume_early() is the same by making the latter use pci_power_up() for transitions into D0. Code rearrangement, no intentional functional impact. Link: https://lore.kernel.org/r/2520019.OZ1nXS5aSj@kreacher Signed-off-by: Rafael J. Wysocki Signed-off-by: Bjorn Helgaas Reviewed-by: Mika Westerberg --- drivers/pci/pci.c | 25 +++++++++++++------------ drivers/pci/pci.h | 2 +- 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 11b4d9274d96..9f0977d0ac5a 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1032,6 +1032,16 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state) } } +/** + * pci_power_up - Put the given device into D0 + * @dev: PCI device to power up + */ +int pci_power_up(struct pci_dev *dev) +{ + __pci_start_power_transition(dev, PCI_D0); + return pci_raw_set_power_state(dev, PCI_D0); +} + /** * __pci_dev_set_current_state - Set current state of a PCI device * @dev: Device to handle @@ -1117,6 +1127,9 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state) if (dev->current_state == state) return 0; + if (state == PCI_D0) + return pci_power_up(dev); + /* * This device is quirked not to be put into D3, so don't put it in * D3 @@ -1124,8 +1137,6 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state) if (state >= PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3)) return 0; - __pci_start_power_transition(dev, state); - /* * To put device in D3cold, we put device into D3hot in native * way, then put device into D3cold with platform ops @@ -1140,16 +1151,6 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state) } EXPORT_SYMBOL(pci_set_power_state); -/** - * pci_power_up - Put the given device into D0 forcibly - * @dev: PCI device to power up - */ -void pci_power_up(struct pci_dev *dev) -{ - __pci_start_power_transition(dev, PCI_D0); - pci_raw_set_power_state(dev, PCI_D0); -} - /** * pci_choose_state - Choose the power state of a PCI device * @dev: PCI device to be suspended diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 3f6947ee3324..7c68312c72f4 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -85,7 +85,7 @@ struct pci_platform_pm_ops { int pci_set_platform_pm(const struct pci_platform_pm_ops *ops); void pci_update_current_state(struct pci_dev *dev, pci_power_t state); void pci_refresh_power_state(struct pci_dev *dev); -void pci_power_up(struct pci_dev *dev); +int pci_power_up(struct pci_dev *dev); void pci_disable_enabled_device(struct pci_dev *dev); int pci_finish_runtime_suspend(struct pci_dev *dev); void pcie_clear_root_pme_status(struct pci_dev *dev); From dc2256b0735d03664a92a6cb94ea4e564dfa237b Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 5 Nov 2019 11:29:16 +0100 Subject: [PATCH 18/25] PCI/PM: Fold __pci_start_power_transition() into its caller Because pci_power_up() has become the only caller of __pci_start_power_transition(), there is no need for the latter to be a separate function any more, so fold it into the former, drop a redundant check and reduce the number of lines of code somewhat. Code rearrangement, no intentional functional impact. Link: https://lore.kernel.org/r/3458080.lsoDbfkST9@kreacher Signed-off-by: Rafael J. Wysocki Signed-off-by: Bjorn Helgaas Reviewed-by: Mika Westerberg --- drivers/pci/pci.c | 50 ++++++++++++++++++----------------------------- 1 file changed, 19 insertions(+), 31 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 9f0977d0ac5a..56bc79e33286 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1002,43 +1002,31 @@ void pci_wakeup_bus(struct pci_bus *bus) pci_walk_bus(bus, pci_wakeup, NULL); } -/** - * __pci_start_power_transition - Start power transition of a PCI device - * @dev: PCI device to handle. - * @state: State to put the device into. - */ -static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state) -{ - if (state == PCI_D0) { - pci_platform_power_transition(dev, PCI_D0); - /* - * Mandatory power management transition delays, see - * PCI Express Base Specification Revision 2.0 Section - * 6.6.1: Conventional Reset. Do not delay for - * devices powered on/off by corresponding bridge, - * because have already delayed for the bridge. - */ - if (dev->runtime_d3cold) { - if (dev->d3cold_delay && !dev->imm_ready) - msleep(dev->d3cold_delay); - /* - * When powering on a bridge from D3cold, the - * whole hierarchy may be powered on into - * D0uninitialized state, resume them to give - * them a chance to suspend again - */ - pci_wakeup_bus(dev->subordinate); - } - } -} - /** * pci_power_up - Put the given device into D0 * @dev: PCI device to power up */ int pci_power_up(struct pci_dev *dev) { - __pci_start_power_transition(dev, PCI_D0); + pci_platform_power_transition(dev, PCI_D0); + + /* + * Mandatory power management transition delays, see PCI Express Base + * Specification Revision 2.0 Section 6.6.1: Conventional Reset. Do not + * delay for devices powered on/off by corresponding bridge, because + * have already delayed for the bridge. + */ + if (dev->runtime_d3cold) { + if (dev->d3cold_delay && !dev->imm_ready) + msleep(dev->d3cold_delay); + /* + * When powering on a bridge from D3cold, the whole hierarchy + * may be powered on into D0uninitialized state, resume them to + * give them a chance to suspend again + */ + pci_wakeup_bus(dev->subordinate); + } + return pci_raw_set_power_state(dev, PCI_D0); } From d6aa37cd04fdafaf31ae89691e537535df43ca78 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 5 Nov 2019 11:30:36 +0100 Subject: [PATCH 19/25] PCI/PM: Avoid exporting __pci_complete_power_transition() Notice that radeon_set_suspend(), which is the only caller of __pci_complete_power_transition() outside of pci.c, really only cares about the pci_platform_power_transition() invoked by it, so export the latter instead of it, update the radeon driver to call pci_platform_power_transition() directly and make __pci_complete_power_transition() static. Code rearrangement, no intentional functional impact. Link: https://lore.kernel.org/r/1731661.ykamz2Tiuf@kreacher Signed-off-by: Rafael J. Wysocki Signed-off-by: Bjorn Helgaas Reviewed-by: Mika Westerberg --- drivers/pci/pci.c | 6 +++--- drivers/video/fbdev/aty/radeon_pm.c | 2 +- include/linux/pci.h | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 56bc79e33286..2f53234f9e91 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -963,7 +963,7 @@ void pci_refresh_power_state(struct pci_dev *dev) * @dev: PCI device to handle. * @state: State to put the device into. */ -static int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state) +int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state) { int error; @@ -979,6 +979,7 @@ static int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state) return error; } +EXPORT_SYMBOL_GPL(pci_platform_power_transition); /** * pci_wakeup - Wake up a PCI device @@ -1061,7 +1062,7 @@ void pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state) * * This function should not be called directly by device drivers. */ -int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state) +static int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state) { int ret; @@ -1073,7 +1074,6 @@ int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state) pci_bus_set_current_state(dev->subordinate, PCI_D3cold); return ret; } -EXPORT_SYMBOL_GPL(__pci_complete_power_transition); /** * pci_set_power_state - Set the power state of a PCI device diff --git a/drivers/video/fbdev/aty/radeon_pm.c b/drivers/video/fbdev/aty/radeon_pm.c index 2dc5703eac51..7c4483c7f313 100644 --- a/drivers/video/fbdev/aty/radeon_pm.c +++ b/drivers/video/fbdev/aty/radeon_pm.c @@ -2593,7 +2593,7 @@ static void radeon_set_suspend(struct radeonfb_info *rinfo, int suspend) * calling pci_set_power_state() */ radeonfb_whack_power_state(rinfo, PCI_D2); - __pci_complete_power_transition(rinfo->pdev, PCI_D2); + pci_platform_power_transition(rinfo->pdev, PCI_D2); } else { printk(KERN_DEBUG "radeonfb (%s): switching to D0 state...\n", pci_name(rinfo->pdev)); diff --git a/include/linux/pci.h b/include/linux/pci.h index 9b0e35e09874..86976cccdfe3 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1228,7 +1228,7 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev, int pci_add_cap_save_buffer(struct pci_dev *dev, char cap, unsigned int size); int pci_add_ext_cap_save_buffer(struct pci_dev *dev, u16 cap, unsigned int size); -int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state); +int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state); int pci_set_power_state(struct pci_dev *dev, pci_power_t state); pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state); bool pci_pme_capable(struct pci_dev *dev, pci_power_t state); From 9c77e63bd8dcbb3b59294e9176c00c5fcab3c9c6 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 5 Nov 2019 17:32:08 +0100 Subject: [PATCH 20/25] PCI/PM: Fold __pci_complete_power_transition() into its caller Because pci_set_power_state() has become the only caller of __pci_complete_power_transition(), there is no need for the latter to be a separate function any more, so fold it into the former, drop a redundant check and reduce the number of lines of code somewhat. Code rearrangement, no intentional functional impact. Link: https://lore.kernel.org/r/15576968.k611qn3UU0@kreacher Signed-off-by: Rafael J. Wysocki Signed-off-by: Bjorn Helgaas Reviewed-by: Mika Westerberg --- drivers/pci/pci.c | 30 +++++++----------------------- 1 file changed, 7 insertions(+), 23 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 2f53234f9e91..37fca1e51d16 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1055,26 +1055,6 @@ void pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state) pci_walk_bus(bus, __pci_dev_set_current_state, &state); } -/** - * __pci_complete_power_transition - Complete power transition of a PCI device - * @dev: PCI device to handle. - * @state: State to put the device into. - * - * This function should not be called directly by device drivers. - */ -static int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state) -{ - int ret; - - if (state <= PCI_D0) - return -EINVAL; - ret = pci_platform_power_transition(dev, state); - /* Power off the bridge may power off the whole hierarchy */ - if (!ret && state == PCI_D3cold) - pci_bus_set_current_state(dev->subordinate, PCI_D3cold); - return ret; -} - /** * pci_set_power_state - Set the power state of a PCI device * @dev: PCI device to handle. @@ -1132,10 +1112,14 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state) error = pci_raw_set_power_state(dev, state > PCI_D3hot ? PCI_D3hot : state); - if (!__pci_complete_power_transition(dev, state)) - error = 0; + if (pci_platform_power_transition(dev, state)) + return error; - return error; + /* Powering off a bridge may power off the whole hierarchy */ + if (state == PCI_D3cold) + pci_bus_set_current_state(dev->subordinate, PCI_D3cold); + + return 0; } EXPORT_SYMBOL(pci_set_power_state); From e43f15ea2f6d6858675fa1baa5cb624f17269af0 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Fri, 2 Aug 2019 18:47:22 -0500 Subject: [PATCH 21/25] PCI/PM: Decode D3cold power state correctly Use pci_power_name() to print pci_power_t correctly. This changes: "state 0" or "D0" to "D0" "state 1" or "D1" to "D1" "state 2" or "D2" to "D2" "state 3" or "D3" to "D3hot" "state 4" or "D4" to "D3cold" Changes dmesg logging only, no other functional change intended. Link: https://lore.kernel.org/r/20190822200551.129039-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki Reviewed-by: Keith Busch Reviewed-by: Mika Westerberg --- drivers/pci/pci.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 37fca1e51d16..3376c663035d 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -834,14 +834,16 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state) return -EINVAL; /* - * Validate current state: - * Can enter D0 from any state, but if we can only go deeper - * to sleep if we're already in a low power state + * Validate transition: We can enter D0 from any state, but if + * we're already in a low-power state, we can only go deeper. E.g., + * we can go from D1 to D3, but we can't go directly from D3 to D1; + * we'd have to go from D3 to D0, then to D1. */ if (state != PCI_D0 && dev->current_state <= PCI_D3cold && dev->current_state > state) { - pci_err(dev, "invalid power transition (from state %d to %d)\n", - dev->current_state, state); + pci_err(dev, "invalid power transition (from %s to %s)\n", + pci_power_name(dev->current_state), + pci_power_name(state)); return -EINVAL; } @@ -891,8 +893,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state) pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); if (dev->current_state != state) - pci_info_ratelimited(dev, "Refused to change power state, currently in D%d\n", - dev->current_state); + pci_info_ratelimited(dev, "refused to change power state from %s to %s\n", + pci_power_name(dev->current_state), + pci_power_name(state)); /* * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT From 327ccbbcc1497bff6d6f543c1f37c43a1653671f Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Thu, 1 Aug 2019 11:50:56 -0500 Subject: [PATCH 22/25] PCI/PM: Return error when changing power state from D3cold pci_raw_set_power_state() uses the Power Management capability to change a device's power state. The capability is in config space, which is accessible in D0, D1, D2, and D3hot, but not in D3cold. If we call pci_raw_set_power_state() on a device that's in D3cold, config reads fail and return ~0 data, which we erroneously interpreted as "the device is in D3hot", leading to messages like this: pcieport 0000:03:00.0: Refused to change power state, currently in D3 The PCI_PM_CTRL has several RsvdP fields, so ~0 is never a valid register value. If we get that value, print a more informative message and return an error. Changing the power state of a device from D3cold must be done by a platform power management method or some other non-config space mechanism. Link: https://lore.kernel.org/r/20190822200551.129039-4-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki Reviewed-by: Keith Busch Reviewed-by: Mika Westerberg --- drivers/pci/pci.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 3376c663035d..d6ff1a98bd8d 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -853,6 +853,12 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state) return -EIO; pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); + if (pmcsr == (u16) ~0) { + pci_err(dev, "can't change power state from %s to %s (config space inaccessible)\n", + pci_power_name(dev->current_state), + pci_power_name(state)); + return -EIO; + } /* * If we're (effectively) in D3, force entire word to 0. From 4827d63891b6a839dac49c6ab62e61c4b011c4f2 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Tue, 12 Nov 2019 12:16:16 +0300 Subject: [PATCH 23/25] PCI/PM: Add pcie_wait_for_link_delay() Add pcie_wait_for_link_delay(). Similar to pcie_wait_for_link() but allows passing custom activation delay in milliseconds. Link: https://lore.kernel.org/r/20191112091617.70282-2-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki --- drivers/pci/pci.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d6ff1a98bd8d..11e4b495cfac 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4586,14 +4586,17 @@ static int pci_pm_reset(struct pci_dev *dev, int probe) return pci_dev_wait(dev, "PM D3hot->D0", PCIE_RESET_READY_POLL_MS); } + /** - * pcie_wait_for_link - Wait until link is active or inactive + * pcie_wait_for_link_delay - Wait until link is active or inactive * @pdev: Bridge device * @active: waiting for active or inactive? + * @delay: Delay to wait after link has become active (in ms) * * Use this to wait till link becomes active or inactive. */ -bool pcie_wait_for_link(struct pci_dev *pdev, bool active) +static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active, + int delay) { int timeout = 1000; bool ret; @@ -4630,13 +4633,25 @@ bool pcie_wait_for_link(struct pci_dev *pdev, bool active) timeout -= 10; } if (active && ret) - msleep(100); + msleep(delay); else if (ret != active) pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n", active ? "set" : "cleared"); return ret == active; } +/** + * pcie_wait_for_link - Wait until link is active or inactive + * @pdev: Bridge device + * @active: waiting for active or inactive? + * + * Use this to wait till link becomes active or inactive. + */ +bool pcie_wait_for_link(struct pci_dev *pdev, bool active) +{ + return pcie_wait_for_link_delay(pdev, active, 100); +} + void pci_reset_secondary_bus(struct pci_dev *dev) { u16 ctrl; From ad9001f2f41198784b0423646450ba2cb24793a3 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Tue, 12 Nov 2019 12:16:17 +0300 Subject: [PATCH 24/25] PCI/PM: Add missing link delays required by the PCIe spec Currently Linux does not follow PCIe spec regarding the required delays after reset. A concrete example is a Thunderbolt add-in-card that consists of a PCIe switch and two PCIe endpoints: +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller +-01.0-[04-36]-- DS hotplug port +-02.0-[37]----00.0 xHCI controller \-04.0-[38-6b]-- DS hotplug port The root port (1b.0) and the PCIe switch downstream ports are all PCIe Gen3 so they support 8GT/s link speeds. We wait for the PCIe hierarchy to enter D3cold (runtime): pcieport 0000:00:1b.0: power state changed by ACPI to D3cold When it wakes up from D3cold, according to the PCIe 5.0 section 5.8 the PCIe switch is put to reset and its power is re-applied. This means that we must follow the rules in PCIe 5.0 section 6.6.1. For the PCIe Gen3 ports we are dealing with here, the following applies: With a Downstream Port that supports Link speeds greater than 5.0 GT/s, software must wait a minimum of 100 ms after Link training completes before sending a Configuration Request to the device immediately below that Port. Software can determine when Link training completes by polling the Data Link Layer Link Active bit or by setting up an associated interrupt (see Section 6.7.3.3). Translating this into the above topology we would need to do this (DLLLA stands for Data Link Layer Link Active): 0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0 0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0 0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0 I've instrumented the kernel with some additional logging so we can see the actual delays performed: pcieport 0000:00:1b.0: power state changed by ACPI to D0 pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms For the switch upstream port (01:00.0 reachable through 00:1b.0 root port) we wait for 100 ms but not taking into account the DLLLA requirement. We then wait 10 ms for D3hot -> D0 transition of the root port and the two downstream hotplug ports. This means that we deviate from what the spec requires. Performing the same check for system sleep (s2idle) transitions it turns out to be even worse. None of the mandatory delays are performed. If this would be S3 instead of s2idle then according to PCI FW spec 3.2 section 4.6.8. there is a specific _DSM that allows the OS to skip the delays but this platform does not provide the _DSM and does not go to S3 anyway so no firmware is involved that could already handle these delays. On this particular platform these delays are not actually needed because there is an additional delay as part of the ACPI power resource that is used to turn on power to the hierarchy but since that additional delay is not required by any of standards (PCIe, ACPI) it is not present in the Intel Ice Lake, for example where missing the mandatory delays causes pciehp to start tearing down the stack too early (links are not yet trained). Below is an example how it looks like when this happens: pcieport 0000:83:04.0: pciehp: Slot(4): Card not present pcieport 0000:87:04.0: PME# disabled pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00 pcieport 0000:86:00.0: Refused to change power state, currently in D3 pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff) pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0) ... There is also one reported case (see the bugzilla link below) where the missing delay causes xHCI on a Titan Ridge controller fail to runtime resume when USB-C dock is plugged. This does not involve pciehp but instead the PCI core fails to runtime resume the xHCI device: pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020) pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406) xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3 xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff) xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0) ... Add a new function pci_bridge_wait_for_secondary_bus() that is called on PCI core resume and runtime resume paths accordingly if the bridge entered D3cold (and thus went through reset). This is second attempt to add the missing delays. The previous solution in c2bf1fc212f7 ("PCI: Add missing link delays required by the PCIe spec") was reverted because of two issues it caused: 1. One system become unresponsive after S3 resume due to PME service spinning in pcie_pme_work_fn(). The root port in question reports that the xHCI sent PME but the xHCI device itself does not have PME status set. The PME status bit is never cleared in the root port resulting the indefinite loop in pcie_pme_work_fn(). 2. Slows down resume if the root/downstream port does not support Data Link Layer Active Reporting because pcie_wait_for_link_delay() waits 1100 ms in that case. This version should avoid the above issues because we restrict the delay to happen only if the port went into D3cold. Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885 Link: https://lore.kernel.org/r/20191112091617.70282-3-mika.westerberg@linux.intel.com Reported-by: Kai-Heng Feng Tested-by: Kai-Heng Feng Signed-off-by: Mika Westerberg Signed-off-by: Bjorn Helgaas --- drivers/pci/pci-driver.c | 11 +++- drivers/pci/pci.c | 128 +++++++++++++++++++++++++++++++++++++-- drivers/pci/pci.h | 1 + 3 files changed, 133 insertions(+), 7 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 08d3bdbc8c04..0454ca0e4e3f 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -890,6 +890,8 @@ static int pci_pm_resume_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; + pci_power_t prev_state = pci_dev->current_state; + bool skip_bus_pm = pci_dev->skip_bus_pm; if (dev_pm_may_skip_resume(dev)) return 0; @@ -908,12 +910,15 @@ static int pci_pm_resume_noirq(struct device *dev) * configuration here and attempting to put them into D0 again is * pointless, so avoid doing that. */ - if (!(pci_dev->skip_bus_pm && pm_suspend_no_platform())) + if (!(skip_bus_pm && pm_suspend_no_platform())) pci_pm_default_resume_early(pci_dev); pci_fixup_device(pci_fixup_resume_early, pci_dev); pcie_pme_root_status_cleanup(pci_dev); + if (!skip_bus_pm && prev_state == PCI_D3cold) + pci_bridge_wait_for_secondary_bus(pci_dev); + if (pci_has_legacy_pm_support(pci_dev)) return 0; @@ -1299,6 +1304,7 @@ static int pci_pm_runtime_resume(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; + pci_power_t prev_state = pci_dev->current_state; int error = 0; /* @@ -1314,6 +1320,9 @@ static int pci_pm_runtime_resume(struct device *dev) pci_fixup_device(pci_fixup_resume_early, pci_dev); pci_pm_default_resume(pci_dev); + if (prev_state == PCI_D3cold) + pci_bridge_wait_for_secondary_bus(pci_dev); + if (pm && pm->runtime_resume) error = pm->runtime_resume(dev); diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 11e4b495cfac..b9d91370856a 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1021,14 +1021,11 @@ int pci_power_up(struct pci_dev *dev) pci_platform_power_transition(dev, PCI_D0); /* - * Mandatory power management transition delays, see PCI Express Base - * Specification Revision 2.0 Section 6.6.1: Conventional Reset. Do not - * delay for devices powered on/off by corresponding bridge, because - * have already delayed for the bridge. + * Mandatory power management transition delays are handled in + * pci_pm_resume_noirq() and pci_pm_runtime_resume() of the + * corresponding bridge. */ if (dev->runtime_d3cold) { - if (dev->d3cold_delay && !dev->imm_ready) - msleep(dev->d3cold_delay); /* * When powering on a bridge from D3cold, the whole hierarchy * may be powered on into D0uninitialized state, resume them to @@ -4652,6 +4649,125 @@ bool pcie_wait_for_link(struct pci_dev *pdev, bool active) return pcie_wait_for_link_delay(pdev, active, 100); } +/* + * Find maximum D3cold delay required by all the devices on the bus. The + * spec says 100 ms, but firmware can lower it and we allow drivers to + * increase it as well. + * + * Called with @pci_bus_sem locked for reading. + */ +static int pci_bus_max_d3cold_delay(const struct pci_bus *bus) +{ + const struct pci_dev *pdev; + int min_delay = 100; + int max_delay = 0; + + list_for_each_entry(pdev, &bus->devices, bus_list) { + if (pdev->d3cold_delay < min_delay) + min_delay = pdev->d3cold_delay; + if (pdev->d3cold_delay > max_delay) + max_delay = pdev->d3cold_delay; + } + + return max(min_delay, max_delay); +} + +/** + * pci_bridge_wait_for_secondary_bus - Wait for secondary bus to be accessible + * @dev: PCI bridge + * + * Handle necessary delays before access to the devices on the secondary + * side of the bridge are permitted after D3cold to D0 transition. + * + * For PCIe this means the delays in PCIe 5.0 section 6.6.1. For + * conventional PCI it means Tpvrh + Trhfa specified in PCI 3.0 section + * 4.3.2. + */ +void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev) +{ + struct pci_dev *child; + int delay; + + if (pci_dev_is_disconnected(dev)) + return; + + if (!pci_is_bridge(dev) || !dev->bridge_d3) + return; + + down_read(&pci_bus_sem); + + /* + * We only deal with devices that are present currently on the bus. + * For any hot-added devices the access delay is handled in pciehp + * board_added(). In case of ACPI hotplug the firmware is expected + * to configure the devices before OS is notified. + */ + if (!dev->subordinate || list_empty(&dev->subordinate->devices)) { + up_read(&pci_bus_sem); + return; + } + + /* Take d3cold_delay requirements into account */ + delay = pci_bus_max_d3cold_delay(dev->subordinate); + if (!delay) { + up_read(&pci_bus_sem); + return; + } + + child = list_first_entry(&dev->subordinate->devices, struct pci_dev, + bus_list); + up_read(&pci_bus_sem); + + /* + * Conventional PCI and PCI-X we need to wait Tpvrh + Trhfa before + * accessing the device after reset (that is 1000 ms + 100 ms). In + * practice this should not be needed because we don't do power + * management for them (see pci_bridge_d3_possible()). + */ + if (!pci_is_pcie(dev)) { + pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay); + msleep(1000 + delay); + return; + } + + /* + * For PCIe downstream and root ports that do not support speeds + * greater than 5 GT/s need to wait minimum 100 ms. For higher + * speeds (gen3) we need to wait first for the data link layer to + * become active. + * + * However, 100 ms is the minimum and the PCIe spec says the + * software must allow at least 1s before it can determine that the + * device that did not respond is a broken device. There is + * evidence that 100 ms is not always enough, for example certain + * Titan Ridge xHCI controller does not always respond to + * configuration requests if we only wait for 100 ms (see + * https://bugzilla.kernel.org/show_bug.cgi?id=203885). + * + * Therefore we wait for 100 ms and check for the device presence. + * If it is still not present give it an additional 100 ms. + */ + if (!pcie_downstream_port(dev)) + return; + + if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) { + pci_dbg(dev, "waiting %d ms for downstream link\n", delay); + msleep(delay); + } else { + pci_dbg(dev, "waiting %d ms for downstream link, after activation\n", + delay); + if (!pcie_wait_for_link_delay(dev, true, delay)) { + /* Did not train, no need to wait any further */ + return; + } + } + + if (!pci_device_is_present(child)) { + pci_dbg(child, "waiting additional %d ms to become accessible\n", delay); + msleep(delay); + } +} + void pci_reset_secondary_bus(struct pci_dev *dev) { u16 ctrl; diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 7c68312c72f4..bf3c7e5de6fa 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -104,6 +104,7 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev); void pci_free_cap_save_buffers(struct pci_dev *dev); bool pci_bridge_d3_possible(struct pci_dev *dev); void pci_bridge_d3_update(struct pci_dev *dev); +void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev); static inline void pci_wakeup_event(struct pci_dev *dev) { From bae26849372b83c65da73d19ff58e987d70e6600 Mon Sep 17 00:00:00 2001 From: Vidya Sagar Date: Wed, 20 Nov 2019 10:47:42 +0530 Subject: [PATCH 25/25] PCI/PM: Move pci_dev_wait() definition earlier Move the definition of pci_dev_wait() above pci_power_up() so that it can be called from the latter with no change in functionality. This is a pure code move with no functional change. Link: https://lore.kernel.org/r/20191120051743.23124-1-vidyas@nvidia.com Signed-off-by: Vidya Sagar Signed-off-by: Bjorn Helgaas --- drivers/pci/pci.c | 82 +++++++++++++++++++++++------------------------ 1 file changed, 41 insertions(+), 41 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index b9d91370856a..d6b44ae8baa4 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1012,6 +1012,47 @@ void pci_wakeup_bus(struct pci_bus *bus) pci_walk_bus(bus, pci_wakeup, NULL); } +static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout) +{ + int delay = 1; + u32 id; + + /* + * After reset, the device should not silently discard config + * requests, but it may still indicate that it needs more time by + * responding to them with CRS completions. The Root Port will + * generally synthesize ~0 data to complete the read (except when + * CRS SV is enabled and the read was for the Vendor ID; in that + * case it synthesizes 0x0001 data). + * + * Wait for the device to return a non-CRS completion. Read the + * Command register instead of Vendor ID so we don't have to + * contend with the CRS SV value. + */ + pci_read_config_dword(dev, PCI_COMMAND, &id); + while (id == ~0) { + if (delay > timeout) { + pci_warn(dev, "not ready %dms after %s; giving up\n", + delay - 1, reset_type); + return -ENOTTY; + } + + if (delay > 1000) + pci_info(dev, "not ready %dms after %s; waiting\n", + delay - 1, reset_type); + + msleep(delay); + delay *= 2; + pci_read_config_dword(dev, PCI_COMMAND, &id); + } + + if (delay > 1000) + pci_info(dev, "ready %dms after %s\n", delay - 1, + reset_type); + + return 0; +} + /** * pci_power_up - Put the given device into D0 * @dev: PCI device to power up @@ -4406,47 +4447,6 @@ int pci_wait_for_pending_transaction(struct pci_dev *dev) } EXPORT_SYMBOL(pci_wait_for_pending_transaction); -static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout) -{ - int delay = 1; - u32 id; - - /* - * After reset, the device should not silently discard config - * requests, but it may still indicate that it needs more time by - * responding to them with CRS completions. The Root Port will - * generally synthesize ~0 data to complete the read (except when - * CRS SV is enabled and the read was for the Vendor ID; in that - * case it synthesizes 0x0001 data). - * - * Wait for the device to return a non-CRS completion. Read the - * Command register instead of Vendor ID so we don't have to - * contend with the CRS SV value. - */ - pci_read_config_dword(dev, PCI_COMMAND, &id); - while (id == ~0) { - if (delay > timeout) { - pci_warn(dev, "not ready %dms after %s; giving up\n", - delay - 1, reset_type); - return -ENOTTY; - } - - if (delay > 1000) - pci_info(dev, "not ready %dms after %s; waiting\n", - delay - 1, reset_type); - - msleep(delay); - delay *= 2; - pci_read_config_dword(dev, PCI_COMMAND, &id); - } - - if (delay > 1000) - pci_info(dev, "ready %dms after %s\n", delay - 1, - reset_type); - - return 0; -} - /** * pcie_has_flr - check if a device supports function level resets * @dev: device to check