pci-v5.3-changes
-----BEGIN PGP SIGNATURE-----

iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl0siFoUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzi9A//S4jRyyZrgUr88Az0GbgMhE4b3yqc
uL7om/Sf+443gG6C+aKkZSM/IE9hrbyIKuYq7GGxDkzZ/HkucZo2yIuAHkPgG4ik
QQYJ8fJsmMq1bUht87c1ZZwGP0++Deq/Ns2+VNy/WBYqKLulnV0DvEEaJgPs9C5D
ppwccGdo6UghiujBTpE4ddUBjFjjURWqT6wSnMRDQ4EGwfUhG0MWwwHKI4hbBuaL
N6refuggdYyUUX5FeUOHa6VF6uTnSSAQ75k+40n4nljdayqoumHLskst77o9q5ZI
oXjdpwgmuEqYhfp03HEA4Xo/bBxiRj76NuTiEMKvPokxjpanwbLrdV0GhF0OIlM0
rp1NOI1w+vppFrU+rc2gtq+7hYXFmvdhjS29hFLeD91PP36N5d29jW5NVFpm7GCm
n4TMGAOsu8RB+bNua6ZbZVcDk2EnPgQeIcM0ZPoBtPK19Fg/rScdEU4u/aFE1Y0Q
C+Ks7D1qCvFpHzl/xAg0oo9v/jFsWef3qnQWOzot964Zz4W4NSVvB9Ox6Vbfj6C4
v331LJmlPxG8fxBNA3q28FrTxcG1NW6sgo3WY9VoSp/vc0aqaPKhm7sbraTt5IrI
TwqA/WhnAHv90MQCGFcofANyYTkjPkKk2QBFK6b0suoAmVdwVWWELi1WaZ+HdvgQ
JP7YpmC2cXcQBPk=
=ZGxL
-----END PGP SIGNATURE-----

Merge tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration changes:

   - Evaluate PCI Boot Configuration _DSM to learn if firmware wants us
     to preserve its resource assignments (Benjamin Herrenschmidt)

   - Simplify resource distribution (Nicholas Johnson)

   - Decode 32 GT/s link speed (Gustavo Pimentel)

  Virtualization:

   - Fix incorrect caching of VF config space size (Alex Williamson)

   - Fix VF driver probing sysfs knobs (Alex Williamson)

  Peer-to-peer DMA:

   - Fix dma_virt_ops check (Logan Gunthorpe)

  Altera host bridge driver:

   - Allow building as module (Ley Foon Tan)

  Armada 8K host bridge driver:

   - add PHYs support (Miquel Raynal)

  DesignWare host bridge driver:

   - Export APIs to support removable loadable module (Vidya Sagar)

   - Enable Relaxed Ordering erratum workaround only on Tegra20 &
     Tegra30 (Vidya Sagar)

  Hyper-V host bridge driver:

   - Fix use-after-free in eject (Dexuan Cui)

  Mobiveil host bridge driver:

   - Clean up and fix many issues, including non-identify mapped
     windows, 64-bit windows, multi-MSI, class code, INTx clearing (Hou
     Zhiqiang)

  Qualcomm host bridge driver:

   - Use clk bulk API for 2.4.0 controllers (Bjorn Andersson)

   - Add QCS404 support (Bjorn Andersson)

   - Assert PERST for at least 100ms (Niklas Cassel)

  R-Car host bridge driver:

   - Add r8a774a1 DT support (Biju Das)

  Tegra host bridge driver:

   - Add support for Gen2, opportunistic UpdateFC and ACK (PCIe protocol
     details) AER, GPIO-based PERST# (Manikanta Maddireddy)

   - Fix many issues, including power-on failure cases, interrupt
     masking in suspend, UPHY settings, AFI dynamic clock gating,
     pending DLL transactions (Manikanta Maddireddy)

  Xilinx host bridge driver:

   - Fix NWL Multi-MSI programming (Bharat Kumar Gogada)

  Endpoint support:

   - Fix 64bit BAR support (Alan Mikhak)

   - Fix pcitest build issues (Alan Mikhak, Andy Shevchenko)

  Bug fixes:

   - Fix NVIDIA GPU multi-function power dependencies (Abhishek Sahu)

   - Fix NVIDIA GPU HDA enablement issue (Lukas Wunner)

   - Ignore lockdep for sysfs "remove" (Marek Vasut)

  Misc:

   - Convert docs to reST (Changbin Du, Mauro Carvalho Chehab)"

* tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (107 commits)
  PCI: Enable NVIDIA HDA controllers
  tools: PCI: Fix installation when `make tools/pci_install`
  PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB
  PCI: Fix typos and whitespace errors
  PCI: mobiveil: Fix INTx interrupt clearing in mobiveil_pcie_isr()
  PCI: mobiveil: Fix infinite-loop in the INTx handling function
  PCI: mobiveil: Move PCIe PIO enablement out of inbound window routine
  PCI: mobiveil: Add upper 32-bit PCI base address setup in inbound window
  PCI: mobiveil: Add upper 32-bit CPU base address setup in outbound window
  PCI: mobiveil: Mask out hardcoded bits in inbound/outbound windows setup
  PCI: mobiveil: Clear the control fields before updating it
  PCI: mobiveil: Add configured inbound windows counter
  PCI: mobiveil: Fix the valid check for inbound and outbound windows
  PCI: mobiveil: Clean-up program_{ib/ob}_windows()
  PCI: mobiveil: Remove an unnecessary return value check
  PCI: mobiveil: Fix error return values
  PCI: mobiveil: Refactor the MEM/IO outbound window initialization
  PCI: mobiveil: Make some register updates more readable
  PCI: mobiveil: Reformat the code for readability
  dt-bindings: PCI: mobiveil: Change gpio_slave and apb_csr to optional
  ...
commit
fb4da215ed
|
@ -5,7 +5,7 @@ Contact: linux-pm@vger.kernel.org
|
||||||
Description:
|
Description:
|
||||||
The powercap/ class sub directory belongs to the power cap
|
The powercap/ class sub directory belongs to the power cap
|
||||||
subsystem. Refer to
|
subsystem. Refer to
|
||||||
Documentation/power/powercap/powercap.txt for details.
|
Documentation/power/powercap/powercap.rst for details.
|
||||||
|
|
||||||
What: /sys/class/powercap/<control type>
|
What: /sys/class/powercap/<control type>
|
||||||
Date: September 2013
|
Date: September 2013
|
||||||
|
|
|
@ -1,4 +1,8 @@
|
||||||
ACPI considerations for PCI host bridges
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
========================================
|
||||||
|
ACPI considerations for PCI host bridges
|
||||||
|
========================================
|
||||||
|
|
||||||
The general rule is that the ACPI namespace should describe everything the
|
The general rule is that the ACPI namespace should describe everything the
|
||||||
OS might use unless there's another way for the OS to find it [1, 2].
|
OS might use unless there's another way for the OS to find it [1, 2].
|
||||||
|
@ -131,12 +135,13 @@ address always corresponds to bus 0, even if the bus range below the bridge
|
||||||
|
|
||||||
[4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
|
[4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
|
||||||
QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
|
QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
|
||||||
General Flags: Bit [0] Ignored
|
General Flags: Bit [0] Ignored
|
||||||
|
|
||||||
Extended Address Space Descriptor (.4)
|
Extended Address Space Descriptor (.4)
|
||||||
General Flags: Bit [0] Consumer/Producer:
|
General Flags: Bit [0] Consumer/Producer:
|
||||||
1–This device consumes this resource
|
|
||||||
0–This device produces and consumes this resource
|
* 1 – This device consumes this resource
|
||||||
|
* 0 – This device produces and consumes this resource
|
||||||
|
|
||||||
[5] ACPI 6.2, sec 19.6.43:
|
[5] ACPI 6.2, sec 19.6.43:
|
||||||
ResourceUsage specifies whether the Memory range is consumed by
|
ResourceUsage specifies whether the Memory range is consumed by
|
|
@ -0,0 +1,13 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
======================
|
||||||
|
PCI Endpoint Framework
|
||||||
|
======================
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 2
|
||||||
|
|
||||||
|
pci-endpoint
|
||||||
|
pci-endpoint-cfs
|
||||||
|
pci-test-function
|
||||||
|
pci-test-howto
|
|
@ -1,41 +1,51 @@
|
||||||
CONFIGURING PCI ENDPOINT USING CONFIGFS
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
Kishon Vijay Abraham I <kishon@ti.com>
|
|
||||||
|
=======================================
|
||||||
|
Configuring PCI Endpoint Using CONFIGFS
|
||||||
|
=======================================
|
||||||
|
|
||||||
|
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||||
|
|
||||||
The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the
|
The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the
|
||||||
PCI endpoint function and to bind the endpoint function
|
PCI endpoint function and to bind the endpoint function
|
||||||
with the endpoint controller. (For introducing other mechanisms to
|
with the endpoint controller. (For introducing other mechanisms to
|
||||||
configure the PCI Endpoint Function refer to [1]).
|
configure the PCI Endpoint Function refer to [1]).
|
||||||
|
|
||||||
*) Mounting configfs
|
Mounting configfs
|
||||||
|
=================
|
||||||
|
|
||||||
The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
|
The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
|
||||||
directory. configfs can be mounted using the following command.
|
directory. configfs can be mounted using the following command::
|
||||||
|
|
||||||
mount -t configfs none /sys/kernel/config
|
mount -t configfs none /sys/kernel/config
|
||||||
|
|
||||||
*) Directory Structure
|
Directory Structure
|
||||||
|
===================
|
||||||
|
|
||||||
The pci_ep configfs has two directories at its root: controllers and
|
The pci_ep configfs has two directories at its root: controllers and
|
||||||
functions. Every EPC device present in the system will have an entry in
|
functions. Every EPC device present in the system will have an entry in
|
||||||
the *controllers* directory and and every EPF driver present in the system
|
the *controllers* directory and and every EPF driver present in the system
|
||||||
will have an entry in the *functions* directory.
|
will have an entry in the *functions* directory.
|
||||||
|
::
|
||||||
|
|
||||||
/sys/kernel/config/pci_ep/
|
/sys/kernel/config/pci_ep/
|
||||||
.. controllers/
|
.. controllers/
|
||||||
.. functions/
|
.. functions/
|
||||||
|
|
||||||
*) Creating EPF Device
|
Creating EPF Device
|
||||||
|
===================
|
||||||
|
|
||||||
Every registered EPF driver will be listed in controllers directory. The
|
Every registered EPF driver will be listed in controllers directory. The
|
||||||
entries corresponding to EPF driver will be created by the EPF core.
|
entries corresponding to EPF driver will be created by the EPF core.
|
||||||
|
::
|
||||||
|
|
||||||
/sys/kernel/config/pci_ep/functions/
|
/sys/kernel/config/pci_ep/functions/
|
||||||
.. <EPF Driver1>/
|
.. <EPF Driver1>/
|
||||||
... <EPF Device 11>/
|
... <EPF Device 11>/
|
||||||
... <EPF Device 21>/
|
... <EPF Device 21>/
|
||||||
.. <EPF Driver2>/
|
.. <EPF Driver2>/
|
||||||
... <EPF Device 12>/
|
... <EPF Device 12>/
|
||||||
... <EPF Device 22>/
|
... <EPF Device 22>/
|
||||||
|
|
||||||
In order to create a <EPF device> of the type probed by <EPF Driver>, the
|
In order to create a <EPF device> of the type probed by <EPF Driver>, the
|
||||||
user has to create a directory inside <EPF DriverN>.
|
user has to create a directory inside <EPF DriverN>.
|
||||||
|
@ -44,34 +54,37 @@ Every <EPF device> directory consists of the following entries that can be
|
||||||
used to configure the standard configuration header of the endpoint function.
|
used to configure the standard configuration header of the endpoint function.
|
||||||
(These entries are created by the framework when any new <EPF Device> is
|
(These entries are created by the framework when any new <EPF Device> is
|
||||||
created)
|
created)
|
||||||
|
::
|
||||||
|
|
||||||
.. <EPF Driver1>/
|
.. <EPF Driver1>/
|
||||||
... <EPF Device 11>/
|
... <EPF Device 11>/
|
||||||
... vendorid
|
... vendorid
|
||||||
... deviceid
|
... deviceid
|
||||||
... revid
|
... revid
|
||||||
... progif_code
|
... progif_code
|
||||||
... subclass_code
|
... subclass_code
|
||||||
... baseclass_code
|
... baseclass_code
|
||||||
... cache_line_size
|
... cache_line_size
|
||||||
... subsys_vendor_id
|
... subsys_vendor_id
|
||||||
... subsys_id
|
... subsys_id
|
||||||
... interrupt_pin
|
... interrupt_pin
|
||||||
|
|
||||||
*) EPC Device
|
EPC Device
|
||||||
|
==========
|
||||||
|
|
||||||
Every registered EPC device will be listed in controllers directory. The
|
Every registered EPC device will be listed in controllers directory. The
|
||||||
entries corresponding to EPC device will be created by the EPC core.
|
entries corresponding to EPC device will be created by the EPC core.
|
||||||
|
::
|
||||||
|
|
||||||
/sys/kernel/config/pci_ep/controllers/
|
/sys/kernel/config/pci_ep/controllers/
|
||||||
.. <EPC Device1>/
|
.. <EPC Device1>/
|
||||||
... <Symlink EPF Device11>/
|
... <Symlink EPF Device11>/
|
||||||
... <Symlink EPF Device12>/
|
... <Symlink EPF Device12>/
|
||||||
... start
|
... start
|
||||||
.. <EPC Device2>/
|
.. <EPC Device2>/
|
||||||
... <Symlink EPF Device21>/
|
... <Symlink EPF Device21>/
|
||||||
... <Symlink EPF Device22>/
|
... <Symlink EPF Device22>/
|
||||||
... start
|
... start
|
||||||
|
|
||||||
The <EPC Device> directory will have a list of symbolic links to
|
The <EPC Device> directory will have a list of symbolic links to
|
||||||
<EPF Device>. These symbolic links should be created by the user to
|
<EPF Device>. These symbolic links should be created by the user to
|
||||||
|
@ -81,7 +94,7 @@ The <EPC Device> directory will also have a *start* field. Once
|
||||||
"1" is written to this field, the endpoint device will be ready to
|
"1" is written to this field, the endpoint device will be ready to
|
||||||
establish the link with the host. This is usually done after
|
establish the link with the host. This is usually done after
|
||||||
all the EPF devices are created and linked with the EPC device.
|
all the EPF devices are created and linked with the EPC device.
|
||||||
|
::
|
||||||
|
|
||||||
| controllers/
|
| controllers/
|
||||||
| <Directory: EPC name>/
|
| <Directory: EPC name>/
|
||||||
|
@ -102,4 +115,4 @@ all the EPF devices are created and linked with the EPC device.
|
||||||
| interrupt_pin
|
| interrupt_pin
|
||||||
| function
|
| function
|
||||||
|
|
||||||
[1] -> Documentation/PCI/endpoint/pci-endpoint.txt
|
[1] :doc:`pci-endpoint`
|
|
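The configfs workflow described in the pci-endpoint-cfs document above (create an EPF device directory, fill in header entries, symlink it into an EPC device directory, write "1" to *start*) can be sketched roughly as follows. This is an illustrative sketch only, not part of the commit: the function name `bind_epf_to_epc`, its parameters, and the example driver, device, and ID values are all hypothetical, and it assumes the `functions/<EPF Driver>` and `controllers/<EPC Device>` directories already exist under the given root.

```python
import os
from pathlib import Path


def bind_epf_to_epc(root, epf_driver, epf_device, epc_device, header=None):
    """Hypothetical helper mirroring the documented pci_ep configfs steps.

    root is the pci_ep directory (e.g. /sys/kernel/config/pci_ep).
    header maps entry names such as "vendorid" to integer values.
    """
    root = Path(root)

    # Creating a directory inside <EPF Driver> creates the <EPF Device>.
    epf_dir = root / "functions" / epf_driver / epf_device
    epf_dir.mkdir(exist_ok=True)

    # Fill in standard configuration header entries (vendorid, deviceid, ...).
    for name, value in (header or {}).items():
        (epf_dir / name).write_text(f"{value:#x}\n")

    # The user creates the symbolic link from the EPC device to the EPF device.
    epc_dir = root / "controllers" / epc_device
    link = epc_dir / epf_device
    os.symlink(epf_dir, link)

    # Writing "1" to start makes the endpoint ready to establish the link.
    (epc_dir / "start").write_text("1\n")
    return link
```

Taking the root directory as a parameter keeps the sketch usable against a scratch directory for experimentation; on a real system the entries inside the EPF device directory are created by the framework rather than by the caller.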
@ -1,11 +1,13 @@
|
||||||
PCI ENDPOINT FRAMEWORK
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
Kishon Vijay Abraham I <kishon@ti.com>
|
|
||||||
|
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||||
|
|
||||||
This document is a guide to use the PCI Endpoint Framework in order to create
|
This document is a guide to use the PCI Endpoint Framework in order to create
|
||||||
endpoint controller driver, endpoint function driver, and using configfs
|
endpoint controller driver, endpoint function driver, and using configfs
|
||||||
interface to bind the function driver to the controller driver.
|
interface to bind the function driver to the controller driver.
|
||||||
|
|
||||||
1. Introduction
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
Linux has a comprehensive PCI subsystem to support PCI controllers that
|
Linux has a comprehensive PCI subsystem to support PCI controllers that
|
||||||
operates in Root Complex mode. The subsystem has capability to scan PCI bus,
|
operates in Root Complex mode. The subsystem has capability to scan PCI bus,
|
||||||
|
@ -19,26 +21,30 @@ add endpoint mode support in Linux. This will help to run Linux in an
|
||||||
EP system which can have a wide variety of use cases from testing or
|
EP system which can have a wide variety of use cases from testing or
|
||||||
validation, co-processor accelerator, etc.
|
validation, co-processor accelerator, etc.
|
||||||
|
|
||||||
2. PCI Endpoint Core
|
PCI Endpoint Core
|
||||||
|
=================
|
||||||
|
|
||||||
The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
|
The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
|
||||||
library, the Endpoint Function library, and the configfs layer to bind the
|
library, the Endpoint Function library, and the configfs layer to bind the
|
||||||
endpoint function with the endpoint controller.
|
endpoint function with the endpoint controller.
|
||||||
|
|
||||||
2.1 PCI Endpoint Controller(EPC) Library
|
PCI Endpoint Controller(EPC) Library
|
||||||
|
------------------------------------
|
||||||
|
|
||||||
The EPC library provides APIs to be used by the controller that can operate
|
The EPC library provides APIs to be used by the controller that can operate
|
||||||
in endpoint mode. It also provides APIs to be used by function driver/library
|
in endpoint mode. It also provides APIs to be used by function driver/library
|
||||||
in order to implement a particular endpoint function.
|
in order to implement a particular endpoint function.
|
||||||
|
|
||||||
2.1.1 APIs for the PCI controller Driver
|
APIs for the PCI controller Driver
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||||
by the PCI controller driver.
|
by the PCI controller driver.
|
||||||
|
|
||||||
*) devm_pci_epc_create()/pci_epc_create()
|
* devm_pci_epc_create()/pci_epc_create()
|
||||||
|
|
||||||
The PCI controller driver should implement the following ops:
|
The PCI controller driver should implement the following ops:
|
||||||
|
|
||||||
* write_header: ops to populate configuration space header
|
* write_header: ops to populate configuration space header
|
||||||
* set_bar: ops to configure the BAR
|
* set_bar: ops to configure the BAR
|
||||||
* clear_bar: ops to reset the BAR
|
* clear_bar: ops to reset the BAR
|
||||||
|
@ -51,110 +57,116 @@ by the PCI controller driver.
|
||||||
The PCI controller driver can then create a new EPC device by invoking
|
The PCI controller driver can then create a new EPC device by invoking
|
||||||
devm_pci_epc_create()/pci_epc_create().
|
devm_pci_epc_create()/pci_epc_create().
|
||||||
|
|
||||||
*) devm_pci_epc_destroy()/pci_epc_destroy()
|
* devm_pci_epc_destroy()/pci_epc_destroy()
|
||||||
|
|
||||||
The PCI controller driver can destroy the EPC device created by either
|
The PCI controller driver can destroy the EPC device created by either
|
||||||
devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
|
devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
|
||||||
pci_epc_destroy().
|
pci_epc_destroy().
|
||||||
|
|
||||||
*) pci_epc_linkup()
|
* pci_epc_linkup()
|
||||||
|
|
||||||
In order to notify all the function devices that the EPC device to which
|
In order to notify all the function devices that the EPC device to which
|
||||||
they are linked has established a link with the host, the PCI controller
|
they are linked has established a link with the host, the PCI controller
|
||||||
driver should invoke pci_epc_linkup().
|
driver should invoke pci_epc_linkup().
|
||||||
|
|
||||||
*) pci_epc_mem_init()
|
* pci_epc_mem_init()
|
||||||
|
|
||||||
Initialize the pci_epc_mem structure used for allocating EPC addr space.
|
Initialize the pci_epc_mem structure used for allocating EPC addr space.
|
||||||
|
|
||||||
*) pci_epc_mem_exit()
|
* pci_epc_mem_exit()
|
||||||
|
|
||||||
Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
|
Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
|
||||||
|
|
||||||
2.1.2 APIs for the PCI Endpoint Function Driver
|
|
||||||
|
APIs for the PCI Endpoint Function Driver
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||||
by the PCI endpoint function driver.
|
by the PCI endpoint function driver.
|
||||||
|
|
||||||
*) pci_epc_write_header()
|
* pci_epc_write_header()
|
||||||
|
|
||||||
The PCI endpoint function driver should use pci_epc_write_header() to
|
The PCI endpoint function driver should use pci_epc_write_header() to
|
||||||
write the standard configuration header to the endpoint controller.
|
write the standard configuration header to the endpoint controller.
|
||||||
|
|
||||||
*) pci_epc_set_bar()
|
* pci_epc_set_bar()
|
||||||
|
|
||||||
The PCI endpoint function driver should use pci_epc_set_bar() to configure
|
The PCI endpoint function driver should use pci_epc_set_bar() to configure
|
||||||
the Base Address Register in order for the host to assign PCI addr space.
|
the Base Address Register in order for the host to assign PCI addr space.
|
||||||
Register space of the function driver is usually configured
|
Register space of the function driver is usually configured
|
||||||
using this API.
|
using this API.
|
||||||
|
|
||||||
*) pci_epc_clear_bar()
|
* pci_epc_clear_bar()
|
||||||
|
|
||||||
The PCI endpoint function driver should use pci_epc_clear_bar() to reset
|
The PCI endpoint function driver should use pci_epc_clear_bar() to reset
|
||||||
the BAR.
|
the BAR.
|
||||||
|
|
||||||
*) pci_epc_raise_irq()
|
* pci_epc_raise_irq()
|
||||||
|
|
||||||
The PCI endpoint function driver should use pci_epc_raise_irq() to raise
|
The PCI endpoint function driver should use pci_epc_raise_irq() to raise
|
||||||
Legacy Interrupt, MSI or MSI-X Interrupt.
|
Legacy Interrupt, MSI or MSI-X Interrupt.
|
||||||
|
|
||||||
*) pci_epc_mem_alloc_addr()
|
* pci_epc_mem_alloc_addr()
|
||||||
|
|
||||||
The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to
|
The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to
|
||||||
allocate memory address from EPC addr space which is required to access
|
allocate memory address from EPC addr space which is required to access
|
||||||
RC's buffer
|
RC's buffer
|
||||||
|
|
||||||
*) pci_epc_mem_free_addr()
|
* pci_epc_mem_free_addr()
|
||||||
|
|
||||||
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
|
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
|
||||||
free the memory space allocated using pci_epc_mem_alloc_addr().
|
free the memory space allocated using pci_epc_mem_alloc_addr().
|
||||||
|
|
||||||
2.1.3 Other APIs
|
Other APIs
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
There are other APIs provided by the EPC library. These are used for binding
|
There are other APIs provided by the EPC library. These are used for binding
|
||||||
the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
|
the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
|
||||||
using these APIs.
|
using these APIs.
|
||||||
|
|
||||||
*) pci_epc_get()
|
* pci_epc_get()
|
||||||
|
|
||||||
Get a reference to the PCI endpoint controller based on the device name of
|
Get a reference to the PCI endpoint controller based on the device name of
|
||||||
the controller.
|
the controller.
|
||||||
|
|
||||||
*) pci_epc_put()
|
* pci_epc_put()
|
||||||
|
|
||||||
Release the reference to the PCI endpoint controller obtained using
|
Release the reference to the PCI endpoint controller obtained using
|
||||||
pci_epc_get()
|
pci_epc_get()
|
||||||
|
|
||||||
*) pci_epc_add_epf()
|
* pci_epc_add_epf()
|
||||||
|
|
||||||
Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
|
Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
|
||||||
can have up to 8 functions according to the specification.
|
can have up to 8 functions according to the specification.
|
||||||
|
|
||||||
*) pci_epc_remove_epf()
|
* pci_epc_remove_epf()
|
||||||
|
|
||||||
Remove the PCI endpoint function from PCI endpoint controller.
|
Remove the PCI endpoint function from PCI endpoint controller.
|
||||||
|
|
||||||
*) pci_epc_start()
|
* pci_epc_start()
|
||||||
|
|
||||||
The PCI endpoint function driver should invoke pci_epc_start() once it
|
The PCI endpoint function driver should invoke pci_epc_start() once it
|
||||||
has configured the endpoint function and wants to start the PCI link.
|
has configured the endpoint function and wants to start the PCI link.
|
||||||
|
|
||||||
*) pci_epc_stop()
|
* pci_epc_stop()
|
||||||
|
|
||||||
The PCI endpoint function driver should invoke pci_epc_stop() to stop
|
The PCI endpoint function driver should invoke pci_epc_stop() to stop
|
||||||
the PCI LINK.
|
the PCI LINK.
|
||||||
|
|
||||||
2.2 PCI Endpoint Function(EPF) Library
|
|
||||||
|
PCI Endpoint Function(EPF) Library
|
||||||
|
----------------------------------
|
||||||
|
|
||||||
The EPF library provides APIs to be used by the function driver and the EPC
|
The EPF library provides APIs to be used by the function driver and the EPC
|
||||||
library to provide endpoint mode functionality.
|
library to provide endpoint mode functionality.
|
||||||
|
|
||||||
2.2.1 APIs for the PCI Endpoint Function Driver
|
APIs for the PCI Endpoint Function Driver
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||||
by the PCI endpoint function driver.
|
by the PCI endpoint function driver.
|
||||||
|
|
||||||
*) pci_epf_register_driver()
|
* pci_epf_register_driver()
|
||||||
|
|
||||||
The PCI Endpoint Function driver should implement the following ops:
|
The PCI Endpoint Function driver should implement the following ops:
|
||||||
* bind: ops to perform when a EPC device has been bound to EPF device
|
* bind: ops to perform when a EPC device has been bound to EPF device
|
||||||
|
@ -166,50 +178,54 @@ by the PCI endpoint function driver.
|
||||||
The PCI Function driver can then register the PCI EPF driver by using
|
The PCI Function driver can then register the PCI EPF driver by using
|
||||||
pci_epf_register_driver().
|
pci_epf_register_driver().
|
||||||
|
|
||||||
*) pci_epf_unregister_driver()
|
* pci_epf_unregister_driver()
|
||||||
|
|
||||||
The PCI Function driver can unregister the PCI EPF driver by using
|
The PCI Function driver can unregister the PCI EPF driver by using
|
||||||
pci_epf_unregister_driver().
|
pci_epf_unregister_driver().
|
||||||
|
|
||||||
*) pci_epf_alloc_space()
|
* pci_epf_alloc_space()
|
||||||
|
|
||||||
The PCI Function driver can allocate space for a particular BAR using
|
The PCI Function driver can allocate space for a particular BAR using
|
||||||
pci_epf_alloc_space().
|
pci_epf_alloc_space().
|
||||||
|
|
||||||
*) pci_epf_free_space()
|
* pci_epf_free_space()
|
||||||
|
|
||||||
The PCI Function driver can free the allocated space
|
The PCI Function driver can free the allocated space
|
||||||
(using pci_epf_alloc_space) by invoking pci_epf_free_space().
|
(using pci_epf_alloc_space) by invoking pci_epf_free_space().
|
||||||
|
|
||||||
2.2.2 APIs for the PCI Endpoint Controller Library
|
APIs for the PCI Endpoint Controller Library
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||||
by the PCI endpoint controller library.
|
by the PCI endpoint controller library.
|
||||||
|
|
||||||
*) pci_epf_linkup()
|
* pci_epf_linkup()
|
||||||
|
|
||||||
The PCI endpoint controller library invokes pci_epf_linkup() when the
|
The PCI endpoint controller library invokes pci_epf_linkup() when the
|
||||||
EPC device has established the connection to the host.
|
EPC device has established the connection to the host.
|
||||||
|
|
||||||
2.2.2 Other APIs
|
Other APIs
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
There are other APIs provided by the EPF library. These are used to notify
|
There are other APIs provided by the EPF library. These are used to notify
|
||||||
the function driver when the EPF device is bound to the EPC device.
|
the function driver when the EPF device is bound to the EPC device.
|
||||||
pci-ep-cfs.c can be used as reference for using these APIs.
|
pci-ep-cfs.c can be used as reference for using these APIs.
|
||||||
|
|
||||||
*) pci_epf_create()
|
* pci_epf_create()
|
||||||
|
|
||||||
Create a new PCI EPF device by passing the name of the PCI EPF device.
|
Create a new PCI EPF device by passing the name of the PCI EPF device.
|
||||||
This name will be used to bind the the EPF device to a EPF driver.
|
This name will be used to bind the the EPF device to a EPF driver.
|
||||||
|
|
||||||
*) pci_epf_destroy()
|
* pci_epf_destroy()
|
||||||
|
|
||||||
Destroy the created PCI EPF device.
|
Destroy the created PCI EPF device.
|
||||||
|
|
||||||
*) pci_epf_bind()
|
* pci_epf_bind()
|
||||||
|
|
||||||
pci_epf_bind() should be invoked when the EPF device has been bound to
|
pci_epf_bind() should be invoked when the EPF device has been bound to
|
||||||
a EPC device.
|
a EPC device.
|
||||||
|
|
||||||
*) pci_epf_unbind()
|
* pci_epf_unbind()
|
||||||
|
|
||||||
pci_epf_unbind() should be invoked when the binding between EPC device
|
pci_epf_unbind() should be invoked when the binding between EPC device
|
||||||
and EPF device is lost.
|
and EPF device is lost.
|
|
@ -1,5 +1,10 @@
|
||||||
PCI TEST
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
Kishon Vijay Abraham I <kishon@ti.com>
|
|
||||||
|
=================
|
||||||
|
PCI Test Function
|
||||||
|
=================
|
||||||
|
|
||||||
|
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||||
|
|
||||||
Traditionally PCI RC has always been validated by using standard
|
Traditionally PCI RC has always been validated by using standard
|
||||||
PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
|
PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
|
||||||
|
@ -23,65 +28,76 @@ The PCI endpoint test device has the following registers:
|
||||||
8) PCI_ENDPOINT_TEST_IRQ_TYPE
|
8) PCI_ENDPOINT_TEST_IRQ_TYPE
|
||||||
9) PCI_ENDPOINT_TEST_IRQ_NUMBER
|
9) PCI_ENDPOINT_TEST_IRQ_NUMBER
|
||||||
|
|
||||||
*) PCI_ENDPOINT_TEST_MAGIC
|
* PCI_ENDPOINT_TEST_MAGIC
|
||||||
|
|
||||||
This register will be used to test BAR0. A known pattern will be written
|
This register will be used to test BAR0. A known pattern will be written
|
||||||
and read back from MAGIC register to verify BAR0.
|
and read back from MAGIC register to verify BAR0.
|
||||||
|
|
||||||
*) PCI_ENDPOINT_TEST_COMMAND:
|
* PCI_ENDPOINT_TEST_COMMAND
|
||||||
|
|
||||||
This register will be used by the host driver to indicate the function
|
This register will be used by the host driver to indicate the function
|
||||||
that the endpoint device must perform.
|
that the endpoint device must perform.
|
||||||
|
|
||||||
Bitfield Description:
|
======== ================================================================
|
||||||
Bit 0 : raise legacy IRQ
|
Bitfield Description
|
||||||
Bit 1 : raise MSI IRQ
|
======== ================================================================
|
||||||
Bit 2 : raise MSI-X IRQ
|
Bit 0 raise legacy IRQ
|
||||||
Bit 3 : read command (read data from RC buffer)
|
Bit 1 raise MSI IRQ
|
||||||
Bit 4 : write command (write data to RC buffer)
|
Bit 2 raise MSI-X IRQ
|
||||||
Bit 5 : copy command (copy data from one RC buffer to another
|
Bit 3 read command (read data from RC buffer)
|
||||||
RC buffer)
|
Bit 4 write command (write data to RC buffer)
|
||||||
|
Bit 5 copy command (copy data from one RC buffer to another RC buffer)
|
||||||
|
======== ================================================================
|
||||||
|
|
||||||
*) PCI_ENDPOINT_TEST_STATUS
|
* PCI_ENDPOINT_TEST_STATUS
|
||||||
|
|
||||||
This register reflects the status of the PCI endpoint device.
|
This register reflects the status of the PCI endpoint device.
|
||||||
|
|
||||||
Bitfield Description:
|
======== ==============================
|
||||||
Bit 0 : read success
|
Bitfield Description
|
||||||
Bit 1 : read fail
|
======== ==============================
|
||||||
Bit 2 : write success
|
Bit 0 read success
|
||||||
Bit 3 : write fail
|
Bit 1 read fail
|
||||||
Bit 4 : copy success
|
Bit 2 write success
|
||||||
Bit 5 : copy fail
|
Bit 3 write fail
|
||||||
Bit 6 : IRQ raised
|
Bit 4 copy success
|
||||||
Bit 7 : source address is invalid
|
Bit 5 copy fail
|
||||||
Bit 8 : destination address is invalid
|
Bit 6 IRQ raised
|
||||||
|
Bit 7 source address is invalid
|
||||||
|
Bit 8 destination address is invalid
|
||||||
|
======== ==============================
|
||||||
|
|
||||||
*) PCI_ENDPOINT_TEST_SRC_ADDR
|
* PCI_ENDPOINT_TEST_SRC_ADDR
|
||||||
|
|
||||||
This register contains the source address (RC buffer address) for the
|
This register contains the source address (RC buffer address) for the
|
||||||
COPY/READ command.
|
COPY/READ command.
|
||||||
|
|
||||||
*) PCI_ENDPOINT_TEST_DST_ADDR
|
* PCI_ENDPOINT_TEST_DST_ADDR
|
||||||
|
|
||||||
This register contains the destination address (RC buffer address) for
|
This register contains the destination address (RC buffer address) for
|
||||||
the COPY/WRITE command.
|
the COPY/WRITE command.
|
||||||
|
|
||||||
*) PCI_ENDPOINT_TEST_IRQ_TYPE
|
* PCI_ENDPOINT_TEST_IRQ_TYPE
|
||||||
|
|
||||||
This register contains the interrupt type (Legacy/MSI) triggered
|
This register contains the interrupt type (Legacy/MSI) triggered
|
||||||
for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands.
|
for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands.
|
||||||
|
|
||||||
Possible types:
|
Possible types:
|
||||||
- Legacy : 0
|
|
||||||
- MSI : 1
|
|
||||||
- MSI-X : 2
|
|
||||||
|
|
||||||
*) PCI_ENDPOINT_TEST_IRQ_NUMBER
|
====== ==
|
||||||
|
Legacy 0
|
||||||
|
MSI 1
|
||||||
|
MSI-X 2
|
||||||
|
====== ==
|
||||||
|
|
||||||
|
* PCI_ENDPOINT_TEST_IRQ_NUMBER
|
||||||
|
|
||||||
This register contains the ID of the triggered interrupt.
|
This register contains the ID of the triggered interrupt.
|
||||||
|
|
||||||
Admissible values:
|
Admissible values:
|
||||||
- Legacy : 0
|
|
||||||
- MSI : [1 .. 32]
|
====== ===========
|
||||||
- MSI-X : [1 .. 2048]
|
Legacy 0
|
||||||
|
MSI [1 .. 32]
|
||||||
|
MSI-X [1 .. 2048]
|
||||||
|
====== ===========
|
|
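The bit assignments, interrupt types, and admissible interrupt numbers tabulated above can be written down as C masks and a small range check. This is a minimal userspace sketch, not the driver's code: the `EPF_CMD_*` and `EPF_STATUS_*` macro names, the `epf_irq_type` enum, and both helper functions are hypothetical names chosen for illustration. Only the bit positions, type values, and ranges are taken from the tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the command bitfield table. */
#define EPF_CMD_RAISE_LEGACY_IRQ   (1u << 0)
#define EPF_CMD_RAISE_MSI_IRQ      (1u << 1)
#define EPF_CMD_RAISE_MSIX_IRQ     (1u << 2)
#define EPF_CMD_READ               (1u << 3)
#define EPF_CMD_WRITE              (1u << 4)
#define EPF_CMD_COPY               (1u << 5)

/* Hypothetical names; bit positions follow the PCI_ENDPOINT_TEST_STATUS table. */
#define EPF_STATUS_READ_SUCCESS     (1u << 0)
#define EPF_STATUS_READ_FAIL        (1u << 1)
#define EPF_STATUS_WRITE_SUCCESS    (1u << 2)
#define EPF_STATUS_WRITE_FAIL       (1u << 3)
#define EPF_STATUS_COPY_SUCCESS     (1u << 4)
#define EPF_STATUS_COPY_FAIL        (1u << 5)
#define EPF_STATUS_IRQ_RAISED       (1u << 6)
#define EPF_STATUS_SRC_ADDR_INVALID (1u << 7)
#define EPF_STATUS_DST_ADDR_INVALID (1u << 8)

/* Values follow the PCI_ENDPOINT_TEST_IRQ_TYPE table. */
enum epf_irq_type {
	EPF_IRQ_LEGACY = 0,
	EPF_IRQ_MSI    = 1,
	EPF_IRQ_MSIX   = 2,
};

/* Check an IRQ number against the admissible values for its type. */
bool epf_irq_number_valid(enum epf_irq_type type, uint32_t number)
{
	switch (type) {
	case EPF_IRQ_LEGACY:
		return number == 0;
	case EPF_IRQ_MSI:
		return number >= 1 && number <= 32;
	case EPF_IRQ_MSIX:
		return number >= 1 && number <= 2048;
	}
	return false;
}

/* True if any failure or invalid-address bit is set in the status word. */
bool epf_status_has_error(uint32_t status)
{
	const uint32_t error_mask = EPF_STATUS_READ_FAIL |
				    EPF_STATUS_WRITE_FAIL |
				    EPF_STATUS_COPY_FAIL |
				    EPF_STATUS_SRC_ADDR_INVALID |
				    EPF_STATUS_DST_ADDR_INVALID;

	return (status & error_mask) != 0;
}
```

A host-side test could, for example, reject an MSI number of 33 with `epf_irq_number_valid()` before writing the IRQ number register, and treat a status word as failed whenever `epf_status_has_error()` returns true.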
@ -1,38 +1,51 @@
|
||||||
PCI TEST USERGUIDE
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
Kishon Vijay Abraham I <kishon@ti.com>
|
|
||||||
|
===================
|
||||||
|
PCI Test User Guide
|
||||||
|
===================
|
||||||
|
|
||||||
|
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||||
|
|
||||||
This document is a guide to help users use pci-epf-test function driver
|
This document is a guide to help users use pci-epf-test function driver
|
||||||
and pci_endpoint_test host driver for testing PCI. The list of steps to
|
and pci_endpoint_test host driver for testing PCI. The list of steps to
|
||||||
be followed on the host side and the EP side is given below.
|
be followed on the host side and the EP side is given below.
|
||||||
|
|
||||||
1. Endpoint Device
|
Endpoint Device
|
||||||
|
===============
|
||||||
|
|
||||||
1.1 Endpoint Controller Devices
|
Endpoint Controller Devices
|
||||||
|
---------------------------
|
||||||
|
|
||||||
To find the list of endpoint controller devices in the system:
|
To find the list of endpoint controller devices in the system::
|
||||||
|
|
||||||
# ls /sys/class/pci_epc/
|
# ls /sys/class/pci_epc/
|
||||||
51000000.pcie_ep
|
51000000.pcie_ep
|
||||||
|
|
||||||
If PCI_ENDPOINT_CONFIGFS is enabled
|
If PCI_ENDPOINT_CONFIGFS is enabled::
|
||||||
|
|
||||||
# ls /sys/kernel/config/pci_ep/controllers
|
# ls /sys/kernel/config/pci_ep/controllers
|
||||||
51000000.pcie_ep
|
51000000.pcie_ep
|
||||||
|
|
||||||
1.2 Endpoint Function Drivers
|
|
||||||
|
|
||||||
To find the list of endpoint function drivers in the system:
|
Endpoint Function Drivers
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
To find the list of endpoint function drivers in the system::
|
||||||
|
|
||||||
# ls /sys/bus/pci-epf/drivers
|
# ls /sys/bus/pci-epf/drivers
|
||||||
pci_epf_test
|
pci_epf_test
|
||||||
|
|
||||||
If PCI_ENDPOINT_CONFIGFS is enabled
|
If PCI_ENDPOINT_CONFIGFS is enabled::
|
||||||
|
|
||||||
# ls /sys/kernel/config/pci_ep/functions
|
# ls /sys/kernel/config/pci_ep/functions
|
||||||
pci_epf_test
|
pci_epf_test
|
||||||
|
|
||||||
1.3 Creating pci-epf-test Device
|
|
||||||
|
Creating pci-epf-test Device
|
||||||
|
----------------------------
|
||||||
|
|
||||||
A PCI endpoint function device can be created using configfs. To create
|
A PCI endpoint function device can be created using configfs. To create
|
||||||
a pci-epf-test device, the following commands can be used
|
a pci-epf-test device, the following commands can be used::
|
||||||
|
|
||||||
# mount -t configfs none /sys/kernel/config
|
# mount -t configfs none /sys/kernel/config
|
||||||
# cd /sys/kernel/config/pci_ep/
|
# cd /sys/kernel/config/pci_ep/
|
||||||
|
@ -42,7 +55,7 @@ The "mkdir func1" above creates the pci-epf-test function device that will
|
||||||
be probed by pci_epf_test driver.
|
be probed by pci_epf_test driver.
|
||||||
|
|
||||||
The PCI endpoint framework populates the directory with the following
|
The PCI endpoint framework populates the directory with the following
|
||||||
configurable fields.
|
configurable fields::
|
||||||
|
|
||||||
# ls functions/pci_epf_test/func1
|
# ls functions/pci_epf_test/func1
|
||||||
baseclass_code interrupt_pin progif_code subsys_id
|
baseclass_code interrupt_pin progif_code subsys_id
|
||||||
|
@ -51,67 +64,83 @@ configurable fields.
|
||||||
|
|
||||||
The PCI endpoint function driver populates these entries with default values
|
The PCI endpoint function driver populates these entries with default values
|
||||||
when the device is bound to the driver. The pci-epf-test driver populates
|
when the device is bound to the driver. The pci-epf-test driver populates
|
||||||
vendorid with 0xffff and interrupt_pin with 0x0001
|
vendorid with 0xffff and interrupt_pin with 0x0001::
|
||||||
|
|
||||||
# cat functions/pci_epf_test/func1/vendorid
|
# cat functions/pci_epf_test/func1/vendorid
|
||||||
0xffff
|
0xffff
|
||||||
# cat functions/pci_epf_test/func1/interrupt_pin
|
# cat functions/pci_epf_test/func1/interrupt_pin
|
||||||
0x0001
|
0x0001
|
||||||
|
|
||||||
1.4 Configuring pci-epf-test Device
|
|
||||||
|
Configuring pci-epf-test Device
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
The user can configure the pci-epf-test device using configfs entry. In order
|
The user can configure the pci-epf-test device using configfs entry. In order
|
||||||
to change the vendorid and the number of MSI interrupts used by the function
|
to change the vendorid and the number of MSI interrupts used by the function
|
||||||
device, the following commands can be used.
|
device, the following commands can be used::
|
||||||
|
|
||||||
# echo 0x104c > functions/pci_epf_test/func1/vendorid
|
# echo 0x104c > functions/pci_epf_test/func1/vendorid
|
||||||
# echo 0xb500 > functions/pci_epf_test/func1/deviceid
|
# echo 0xb500 > functions/pci_epf_test/func1/deviceid
|
||||||
# echo 16 > functions/pci_epf_test/func1/msi_interrupts
|
# echo 16 > functions/pci_epf_test/func1/msi_interrupts
|
||||||
# echo 8 > functions/pci_epf_test/func1/msix_interrupts
|
# echo 8 > functions/pci_epf_test/func1/msix_interrupts
|
||||||
|
|
||||||
1.5 Binding pci-epf-test Device to EP Controller
|
|
||||||
|
Binding pci-epf-test Device to EP Controller
|
||||||
|
--------------------------------------------
|
||||||
|
|
||||||
In order for the endpoint function device to be useful, it has to be bound to
|
In order for the endpoint function device to be useful, it has to be bound to
|
||||||
a PCI endpoint controller driver. Use the configfs to bind the function
|
a PCI endpoint controller driver. Use the configfs to bind the function
|
||||||
device to one of the controller drivers present in the system.
|
device to one of the controller drivers present in the system::
|
||||||
|
|
||||||
# ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/
|
# ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/
|
||||||
|
|
||||||
Once the above step is completed, the PCI endpoint is ready to establish a link
|
Once the above step is completed, the PCI endpoint is ready to establish a link
|
||||||
with the host.
|
with the host.
|
||||||
|
|
||||||
1.6 Start the Link
|
|
||||||
|
Start the Link
|
||||||
|
--------------
|
||||||
|
|
||||||
In order for the endpoint device to establish a link with the host, the _start_
|
In order for the endpoint device to establish a link with the host, the _start_
|
||||||
field should be populated with '1'.
|
field should be populated with '1'::
|
||||||
|
|
||||||
# echo 1 > controllers/51000000.pcie_ep/start
|
# echo 1 > controllers/51000000.pcie_ep/start
|
||||||
|
|
||||||
2. RootComplex Device
|
|
||||||
|
|
||||||
2.1 lspci Output
|
RootComplex Device
|
||||||
|
==================
|
||||||
|
|
||||||
Note that the devices listed here correspond to the value populated in 1.4 above
|
lspci Output
|
||||||
|
------------
|
||||||
|
|
||||||
|
Note that the devices listed here correspond to the value populated in 1.4
|
||||||
|
above::
|
||||||
|
|
||||||
00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
|
00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
|
||||||
01:00.0 Unassigned class [ff00]: Texas Instruments Device b500
|
01:00.0 Unassigned class [ff00]: Texas Instruments Device b500
|
||||||
|
|
||||||
2.2 Using Endpoint Test function Device
|
|
||||||
|
Using Endpoint Test function Device
|
||||||
|
-----------------------------------
|
||||||
|
|
||||||
pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
|
pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
|
||||||
tests. To compile this tool the following commands should be used:
|
tests. To compile this tool the following commands should be used::
|
||||||
|
|
||||||
# cd <kernel-dir>
|
# cd <kernel-dir>
|
||||||
# make -C tools/pci
|
# make -C tools/pci
|
||||||
|
|
||||||
or if you desire to compile and install in your system:
|
or if you desire to compile and install in your system::
|
||||||
|
|
||||||
# cd <kernel-dir>
|
# cd <kernel-dir>
|
||||||
# make -C tools/pci install
|
# make -C tools/pci install
|
||||||
|
|
||||||
The tool and script will be located in <rootfs>/usr/bin/
|
The tool and script will be located in <rootfs>/usr/bin/
|
||||||
|
|
||||||
2.2.1 pcitest.sh Output
|
|
||||||
|
pcitest.sh Output
|
||||||
|
~~~~~~~~~~~~~~~~~
|
||||||
|
::
|
||||||
|
|
||||||
# pcitest.sh
|
# pcitest.sh
|
||||||
BAR tests
|
BAR tests
|
||||||
|
|
|
@ -0,0 +1,18 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
=======================
|
||||||
|
Linux PCI Bus Subsystem
|
||||||
|
=======================
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 2
|
||||||
|
:numbered:
|
||||||
|
|
||||||
|
pci
|
||||||
|
picebus-howto
|
||||||
|
pci-iov-howto
|
||||||
|
msi-howto
|
||||||
|
acpi-info
|
||||||
|
pci-error-recovery
|
||||||
|
pcieaer-howto
|
||||||
|
endpoint/index
|
|
@ -1,13 +1,16 @@
|
||||||
The MSI Driver Guide HOWTO
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
Tom L Nguyen tom.l.nguyen@intel.com
|
.. include:: <isonum.txt>
|
||||||
10/03/2003
|
|
||||||
Revised Feb 12, 2004 by Martine Silbermann
|
|
||||||
email: Martine.Silbermann@hp.com
|
|
||||||
Revised Jun 25, 2004 by Tom L Nguyen
|
|
||||||
Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com>
|
|
||||||
Copyright 2003, 2008 Intel Corporation
|
|
||||||
|
|
||||||
1. About this guide
|
==========================
|
||||||
|
The MSI Driver Guide HOWTO
|
||||||
|
==========================
|
||||||
|
|
||||||
|
:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox
|
||||||
|
|
||||||
|
:Copyright: 2003, 2008 Intel Corporation
|
||||||
|
|
||||||
|
About this guide
|
||||||
|
================
|
||||||
|
|
||||||
This guide describes the basics of Message Signaled Interrupts (MSIs),
|
This guide describes the basics of Message Signaled Interrupts (MSIs),
|
||||||
the advantages of using MSI over traditional interrupt mechanisms, how
|
the advantages of using MSI over traditional interrupt mechanisms, how
|
||||||
|
@ -15,7 +18,8 @@ to change your driver to use MSI or MSI-X and some basic diagnostics to
|
||||||
try if a device doesn't support MSIs.
|
try if a device doesn't support MSIs.
|
||||||
|
|
||||||
|
|
||||||
2. What are MSIs?
|
What are MSIs?
|
||||||
|
==============
|
||||||
|
|
||||||
A Message Signaled Interrupt is a write from the device to a special
|
A Message Signaled Interrupt is a write from the device to a special
|
||||||
address which causes an interrupt to be received by the CPU.
|
address which causes an interrupt to be received by the CPU.
|
||||||
|
@ -29,7 +33,8 @@ Devices may support both MSI and MSI-X, but only one can be enabled at
|
||||||
a time.
|
a time.
|
||||||
|
|
||||||
|
|
||||||
3. Why use MSIs?
|
Why use MSIs?
|
||||||
|
=============
|
||||||
|
|
||||||
There are three reasons why using MSIs can give an advantage over
|
There are three reasons why using MSIs can give an advantage over
|
||||||
traditional pin-based interrupts.
|
traditional pin-based interrupts.
|
||||||
|
@ -61,14 +66,16 @@ Other possible designs include giving one interrupt to each packet queue
|
||||||
in a network card or each port in a storage controller.
|
in a network card or each port in a storage controller.
|
||||||
|
|
||||||
|
|
||||||
4. How to use MSIs
|
How to use MSIs
|
||||||
|
===============
|
||||||
|
|
||||||
PCI devices are initialised to use pin-based interrupts. The device
|
PCI devices are initialised to use pin-based interrupts. The device
|
||||||
driver has to set up the device to use MSI or MSI-X. Not all machines
|
driver has to set up the device to use MSI or MSI-X. Not all machines
|
||||||
support MSIs correctly, and for those machines, the APIs described below
|
support MSIs correctly, and for those machines, the APIs described below
|
||||||
will simply fail and the device will continue to use pin-based interrupts.
|
will simply fail and the device will continue to use pin-based interrupts.
|
||||||
|
|
||||||
4.1 Include kernel support for MSIs
|
Include kernel support for MSIs
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
|
To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
|
||||||
option enabled. This option is only available on some architectures,
|
option enabled. This option is only available on some architectures,
|
||||||
|
@ -76,14 +83,15 @@ and it may depend on some other options also being set. For example,
|
||||||
on x86, you must also enable X86_UP_APIC or SMP in order to see the
|
on x86, you must also enable X86_UP_APIC or SMP in order to see the
|
||||||
CONFIG_PCI_MSI option.
|
CONFIG_PCI_MSI option.
|
||||||
|
|
||||||
4.2 Using MSI
|
Using MSI
|
||||||
|
---------
|
||||||
|
|
||||||
Most of the hard work is done for the driver in the PCI layer. The driver
|
Most of the hard work is done for the driver in the PCI layer. The driver
|
||||||
simply has to request that the PCI layer set up the MSI capability for this
|
simply has to request that the PCI layer set up the MSI capability for this
|
||||||
device.
|
device.
|
||||||
|
|
||||||
To automatically use MSI or MSI-X interrupt vectors, use the following
|
To automatically use MSI or MSI-X interrupt vectors, use the following
|
||||||
function:
|
function::
|
||||||
|
|
||||||
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
|
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
|
||||||
unsigned int max_vecs, unsigned int flags);
|
unsigned int max_vecs, unsigned int flags);
|
||||||
|
@ -101,12 +109,12 @@ any possible kind of interrupt. If the PCI_IRQ_AFFINITY flag is set,
|
||||||
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
|
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
|
||||||
|
|
||||||
To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
|
To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
|
||||||
vectors, use the following function:
|
vectors, use the following function::
|
||||||
|
|
||||||
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
|
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
|
||||||
|
|
||||||
Any allocated resources should be freed before removing the device using
|
Any allocated resources should be freed before removing the device using
|
||||||
the following function:
|
the following function::
|
||||||
|
|
||||||
void pci_free_irq_vectors(struct pci_dev *dev);
|
void pci_free_irq_vectors(struct pci_dev *dev);
|
||||||
|
|
||||||
|
@ -126,7 +134,7 @@ The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
|
||||||
as possible, likely up to the limit supported by the device. If nvec is
|
as possible, likely up to the limit supported by the device. If nvec is
|
||||||
larger than the number supported by the device it will automatically be
|
larger than the number supported by the device it will automatically be
|
||||||
capped to the supported limit, so there is no need to query the number of
|
capped to the supported limit, so there is no need to query the number of
|
||||||
vectors supported beforehand:
|
vectors supported beforehand::
|
||||||
|
|
||||||
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
|
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
|
||||||
if (nvec < 0)
|
if (nvec < 0)
|
||||||
|
@ -135,7 +143,7 @@ vectors supported beforehand:
|
||||||
If a driver is unable or unwilling to deal with a variable number of MSI
|
If a driver is unable or unwilling to deal with a variable number of MSI
|
||||||
interrupts it can request a particular number of interrupts by passing that
|
interrupts it can request a particular number of interrupts by passing that
|
||||||
number to pci_alloc_irq_vectors() function as both 'min_vecs' and
|
number to pci_alloc_irq_vectors() function as both 'min_vecs' and
|
||||||
'max_vecs' parameters:
|
'max_vecs' parameters::
|
||||||
|
|
||||||
ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
|
ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
|
||||||
if (ret < 0)
|
if (ret < 0)
|
||||||
|
@ -143,23 +151,24 @@ number to pci_alloc_irq_vectors() function as both 'min_vecs' and
|
||||||
|
|
||||||
The most notorious example of the request type described above is enabling
|
The most notorious example of the request type described above is enabling
|
||||||
the single MSI mode for a device. It could be done by passing two 1s as
|
the single MSI mode for a device. It could be done by passing two 1s as
|
||||||
'min_vecs' and 'max_vecs':
|
'min_vecs' and 'max_vecs'::
|
||||||
|
|
||||||
ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
|
ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
|
||||||
if (ret < 0)
|
if (ret < 0)
|
||||||
goto out_err;
|
goto out_err;
|
||||||
|
|
||||||
Some devices might not support using legacy line interrupts, in which case
|
Some devices might not support using legacy line interrupts, in which case
|
||||||
the driver can specify that only MSI or MSI-X is acceptable:
|
the driver can specify that only MSI or MSI-X is acceptable::
|
||||||
|
|
||||||
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
|
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
|
||||||
if (nvec < 0)
|
if (nvec < 0)
|
||||||
goto out_err;
|
goto out_err;
|
||||||
|
|
||||||
4.3 Legacy APIs
|
Legacy APIs
|
||||||
|
-----------
|
||||||
|
|
||||||
The following old APIs to enable and disable MSI or MSI-X interrupts should
|
The following old APIs to enable and disable MSI or MSI-X interrupts should
|
||||||
not be used in new code:
|
not be used in new code::
|
||||||
|
|
||||||
pci_enable_msi() /* deprecated */
|
pci_enable_msi() /* deprecated */
|
||||||
pci_disable_msi() /* deprecated */
|
pci_disable_msi() /* deprecated */
|
||||||
|
@ -174,9 +183,11 @@ number of vectors. If you have a legitimate special use case for the count
|
||||||
of vectors we might have to revisit that decision and add a
|
of vectors we might have to revisit that decision and add a
|
||||||
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
|
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
|
||||||
|
|
||||||
4.4 Considerations when using MSIs
|
Considerations when using MSIs
|
||||||
|
------------------------------
|
||||||
|
|
||||||
4.4.1 Spinlocks
|
Spinlocks
|
||||||
|
~~~~~~~~~
|
||||||
|
|
||||||
Most device drivers have a per-device spinlock which is taken in the
|
Most device drivers have a per-device spinlock which is taken in the
|
||||||
interrupt handler. With pin-based interrupts or a single MSI, it is not
|
interrupt handler. With pin-based interrupts or a single MSI, it is not
|
||||||
|
@ -188,7 +199,8 @@ acquire the spinlock. Such deadlocks can be avoided by using
|
||||||
spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
|
spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
|
||||||
and acquire the lock (see Documentation/kernel-hacking/locking.rst).
|
and acquire the lock (see Documentation/kernel-hacking/locking.rst).
|
||||||
|
|
||||||
4.5 How to tell whether MSI/MSI-X is enabled on a device
|
How to tell whether MSI/MSI-X is enabled on a device
|
||||||
|
----------------------------------------------------
|
||||||
|
|
||||||
Using 'lspci -v' (as root) may show some devices with "MSI", "Message
|
Using 'lspci -v' (as root) may show some devices with "MSI", "Message
|
||||||
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
|
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
|
||||||
|
@ -196,7 +208,8 @@ has an 'Enable' flag which is followed with either "+" (enabled)
|
||||||
or "-" (disabled).
|
or "-" (disabled).
|
||||||
|
|
||||||
|
|
||||||
5. MSI quirks
|
MSI quirks
|
||||||
|
==========
|
||||||
|
|
||||||
Several PCI chipsets or devices are known not to support MSIs.
|
Several PCI chipsets or devices are known not to support MSIs.
|
||||||
The PCI stack provides three ways to disable MSIs:
|
The PCI stack provides three ways to disable MSIs:
|
||||||
|
@ -205,7 +218,8 @@ The PCI stack provides three ways to disable MSIs:
|
||||||
2. on all devices behind a specific bridge
|
2. on all devices behind a specific bridge
|
||||||
3. on a single device
|
3. on a single device
|
||||||
|
|
||||||
5.1. Disabling MSIs globally
|
Disabling MSIs globally
|
||||||
|
-----------------------
|
||||||
|
|
||||||
Some host chipsets simply don't support MSIs properly. If we're
|
Some host chipsets simply don't support MSIs properly. If we're
|
||||||
lucky, the manufacturer knows this and has indicated it in the ACPI
|
lucky, the manufacturer knows this and has indicated it in the ACPI
|
||||||
|
@ -219,7 +233,8 @@ on the kernel command line to disable MSIs on all devices. It would be
|
||||||
in your best interests to report the problem to linux-pci@vger.kernel.org
|
in your best interests to report the problem to linux-pci@vger.kernel.org
|
||||||
including a full 'lspci -v' so we can add the quirks to the kernel.
|
including a full 'lspci -v' so we can add the quirks to the kernel.
|
||||||
|
|
||||||
5.2. Disabling MSIs below a bridge
|
Disabling MSIs below a bridge
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
Some PCI bridges are not able to route MSIs between busses properly.
|
Some PCI bridges are not able to route MSIs between busses properly.
|
||||||
In this case, MSIs must be disabled on all devices behind the bridge.
|
In this case, MSIs must be disabled on all devices behind the bridge.
|
||||||
|
@ -230,7 +245,7 @@ as the nVidia nForce and Serverworks HT2000). As with host chipsets,
|
||||||
Linux mostly knows about them and automatically enables MSIs if it can.
|
Linux mostly knows about them and automatically enables MSIs if it can.
|
||||||
If you have a bridge unknown to Linux, you can enable
|
If you have a bridge unknown to Linux, you can enable
|
||||||
MSIs in configuration space using whatever method you know works, then
|
MSIs in configuration space using whatever method you know works, then
|
||||||
enable MSIs on that bridge by doing:
|
enable MSIs on that bridge by doing::
|
||||||
|
|
||||||
echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
|
echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
|
||||||
|
|
||||||
|
@ -244,7 +259,8 @@ below this bridge.
|
||||||
Again, please notify linux-pci@vger.kernel.org of any bridges that need
|
Again, please notify linux-pci@vger.kernel.org of any bridges that need
|
||||||
special handling.
|
special handling.
|
||||||
|
|
||||||
5.3. Disabling MSIs on a single device
|
Disabling MSIs on a single device
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
Some devices are known to have faulty MSI implementations. Usually this
|
Some devices are known to have faulty MSI implementations. Usually this
|
||||||
is handled in the individual device driver, but occasionally it's necessary
|
is handled in the individual device driver, but occasionally it's necessary
|
||||||
|
@ -252,7 +268,8 @@ to handle this with a quirk. Some drivers have an option to disable use
|
||||||
of MSI. While this is a convenient workaround for the driver author,
|
of MSI. While this is a convenient workaround for the driver author,
|
||||||
it is not good practice, and should not be emulated.
|
it is not good practice, and should not be emulated.
|
||||||
|
|
||||||
5.4. Finding why MSIs are disabled on a device
|
Finding why MSIs are disabled on a device
|
||||||
|
-----------------------------------------
|
||||||
|
|
||||||
From the above three sections, you can see that there are many reasons
|
From the above three sections, you can see that there are many reasons
|
||||||
why MSIs may not be enabled for a given device. Your first step should
|
why MSIs may not be enabled for a given device. Your first step should
|
||||||
|
@ -260,8 +277,8 @@ be to examine your dmesg carefully to determine whether MSIs are enabled
|
||||||
for your machine. You should also check your .config to be sure you
|
for your machine. You should also check your .config to be sure you
|
||||||
have enabled CONFIG_PCI_MSI.
|
have enabled CONFIG_PCI_MSI.
|
||||||
|
|
||||||
Then, 'lspci -t' gives the list of bridges above a device. Reading
|
Then, 'lspci -t' gives the list of bridges above a device. Reading
|
||||||
/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1)
|
`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
|
||||||
or disabled (0). If 0 is found in any of the msi_bus files belonging
|
or disabled (0). If 0 is found in any of the msi_bus files belonging
|
||||||
to bridges between the PCI root and the device, MSIs are disabled.
|
to bridges between the PCI root and the device, MSIs are disabled.
|
||||||
|
|
|
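The msi_bus rule stated above (a 0 in the msi_bus file of any bridge between the PCI root and the device means MSIs are disabled) can be sketched as a small check. This is an illustrative userspace sketch, not kernel code: the function name `msi_allowed_by_bridges()` and its array-based input are assumptions. A real tool would first read each value from the corresponding msi_bus file in sysfs, and the result says nothing about the other reasons MSIs may be disabled (global or per-device quirks).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper. msi_bus[i] holds the value read from the msi_bus
 * file of the i-th bridge between the PCI root and the device
 * (1 = enabled, 0 = disabled). MSIs are allowed by the bridge chain
 * only if no bridge on the path reports 0.
 */
bool msi_allowed_by_bridges(const int *msi_bus, size_t n_bridges)
{
	for (size_t i = 0; i < n_bridges; i++) {
		if (msi_bus[i] == 0)
			return false;
	}
	return true;
}
```

An empty path (no bridges) is treated as allowed, since there is no bridge that could block MSIs.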
@ -1,12 +1,13 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
PCI Error Recovery
|
==================
|
||||||
------------------
|
PCI Error Recovery
|
||||||
February 2, 2006
|
==================
|
||||||
|
|
||||||
Current document maintainer:
|
|
||||||
Linas Vepstas <linasvepstas@gmail.com>
|
:Authors: - Linas Vepstas <linasvepstas@gmail.com>
|
||||||
updated by Richard Lary <rlary@us.ibm.com>
|
- Richard Lary <rlary@us.ibm.com>
|
||||||
and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
|
- Mike Mason <mmlnx@us.ibm.com>
|
||||||
|
|
||||||
|
|
||||||
Many PCI bus controllers are able to detect a variety of hardware
|
Many PCI bus controllers are able to detect a variety of hardware
|
||||||
|
@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets.
|
||||||
|
|
||||||
|
|
||||||
Detailed Design
|
Detailed Design
|
||||||
---------------
|
===============
|
||||||
|
|
||||||
Design and implementation details below, based on a chain of
|
Design and implementation details below, based on a chain of
|
||||||
public email discussions with Ben Herrenschmidt, circa 5 April 2005.
|
public email discussions with Ben Herrenschmidt, circa 5 April 2005.
|
||||||
|
|
||||||
|
@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware",
|
||||||
and the actual recovery steps taken are platform dependent. The
|
and the actual recovery steps taken are platform dependent. The
|
||||||
arch/powerpc implementation will simulate a PCI hotplug remove/add.
|
arch/powerpc implementation will simulate a PCI hotplug remove/add.
|
||||||
|
|
||||||
This structure has the form:
|
This structure has the form::
|
||||||
struct pci_error_handlers
|
|
||||||
{
|
|
||||||
int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
|
|
||||||
int (*mmio_enabled)(struct pci_dev *dev);
|
|
||||||
int (*slot_reset)(struct pci_dev *dev);
|
|
||||||
void (*resume)(struct pci_dev *dev);
|
|
||||||
};
|
|
||||||
|
|
||||||
The possible channel states are:
|
struct pci_error_handlers
|
||||||
enum pci_channel_state {
|
{
|
||||||
pci_channel_io_normal, /* I/O channel is in normal state */
|
int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
|
||||||
pci_channel_io_frozen, /* I/O to channel is blocked */
|
int (*mmio_enabled)(struct pci_dev *dev);
|
||||||
pci_channel_io_perm_failure, /* PCI card is dead */
|
int (*slot_reset)(struct pci_dev *dev);
|
||||||
};
|
void (*resume)(struct pci_dev *dev);
|
||||||
|
};
|
||||||
|
|
||||||
Possible return values are:
|
The possible channel states are::
|
||||||
enum pci_ers_result {
|
|
||||||
PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
|
enum pci_channel_state {
|
||||||
PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
|
pci_channel_io_normal, /* I/O channel is in normal state */
|
||||||
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
|
pci_channel_io_frozen, /* I/O to channel is blocked */
|
||||||
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
|
pci_channel_io_perm_failure, /* PCI card is dead */
|
||||||
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
|
};
|
||||||
};
|
|
||||||
|
Possible return values are::
|
||||||
|
|
||||||
|
enum pci_ers_result {
|
||||||
|
PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
|
||||||
|
PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
|
||||||
|
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
|
||||||
|
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
|
||||||
|
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
|
||||||
|
};
|
||||||
|
|
||||||
A driver does not have to implement all of these callbacks; however,
|
A driver does not have to implement all of these callbacks; however,
|
||||||
if it implements any, it must implement error_detected(). If a callback
|
if it implements any, it must implement error_detected(). If a callback
|
||||||
|
@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a
|
||||||
|
|
||||||
All drivers participating in this system must implement this call.
|
All drivers participating in this system must implement this call.
|
||||||
The driver must return one of the following result codes:
|
The driver must return one of the following result codes:
|
||||||
- PCI_ERS_RESULT_CAN_RECOVER:
|
|
||||||
Driver returns this if it thinks it might be able to recover
|
- PCI_ERS_RESULT_CAN_RECOVER
|
||||||
the HW by just banging IOs or if it wants to be given
|
Driver returns this if it thinks it might be able to recover
|
||||||
a chance to extract some diagnostic information (see
|
the HW by just banging IOs or if it wants to be given
|
||||||
mmio_enabled, below).
|
a chance to extract some diagnostic information (see
|
||||||
- PCI_ERS_RESULT_NEED_RESET:
|
mmio_enabled, below).
|
||||||
Driver returns this if it can't recover without a
|
- PCI_ERS_RESULT_NEED_RESET
|
||||||
slot reset.
|
Driver returns this if it can't recover without a
|
||||||
- PCI_ERS_RESULT_DISCONNECT:
|
slot reset.
|
||||||
Driver returns this if it doesn't want to recover at all.
|
- PCI_ERS_RESULT_DISCONNECT
|
||||||
|
Driver returns this if it doesn't want to recover at all.
|
||||||
|
|
||||||
The next step taken will depend on the result codes returned by the
|
The next step taken will depend on the result codes returned by the
|
||||||
drivers.
|
drivers.
|
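As a rough illustration of how a platform might combine the per-driver result codes listed above, here is a minimal user-space sketch. The `ers_result` enum and the `ers_merge()` / `ers_merge_all()` helpers are hypothetical stand-ins written for this note, not the kernel's own types or policy; the sketch simply assumes that the most severe answer from any driver wins.

```c
/* Hypothetical user-space model; not kernel code. */
#include <stddef.h>

enum ers_result {
	ERS_NONE,        /* driver has no opinion / no callback */
	ERS_CAN_RECOVER, /* recovery without a slot reset may work */
	ERS_NEED_RESET,  /* driver wants the slot to be reset */
	ERS_DISCONNECT,  /* driver does not want to recover at all */
};

/* Assumed policy: the more severe of the two results wins. */
static enum ers_result ers_merge(enum ers_result a, enum ers_result b)
{
	return (b > a) ? b : a;
}

/* Fold the answers of n drivers into one result for the segment. */
static enum ers_result ers_merge_all(const enum ers_result *r, size_t n)
{
	enum ers_result acc = ERS_NONE;
	size_t i;

	for (i = 0; i < n; i++)
		acc = ers_merge(acc, r[i]);
	return acc;
}
```

Under this assumed policy a single driver asking for a reset is enough to move the whole segment to the reset step, even if every other driver reported that it could recover.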
||||||
|
@ -159,25 +165,27 @@ then recovery proceeds to STEP 4 (Slot Reset).
|
||||||
If the platform is unable to recover the slot, the next step
|
If the platform is unable to recover the slot, the next step
|
||||||
is STEP 6 (Permanent Failure).
|
is STEP 6 (Permanent Failure).
|
||||||
|
|
||||||
>>> The current powerpc implementation assumes that a device driver will
|
.. note::
|
||||||
>>> *not* schedule or semaphore in this routine; the current powerpc
|
|
||||||
>>> implementation uses one kernel thread to notify all devices;
|
|
||||||
>>> thus, if one device sleeps/schedules, all devices are affected.
|
|
||||||
>>> Doing better requires complex multi-threaded logic in the error
|
|
||||||
>>> recovery implementation (e.g. waiting for all notification threads
|
|
||||||
>>> to "join" before proceeding with recovery.) This seems excessively
|
|
||||||
>>> complex and not worth implementing.
|
|
||||||
|
|
||||||
>>> The current powerpc implementation doesn't much care if the device
|
The current powerpc implementation assumes that a device driver will
|
||||||
>>> attempts I/O at this point, or not. I/O's will fail, returning
|
*not* schedule or semaphore in this routine; the current powerpc
|
||||||
>>> a value of 0xff on read, and writes will be dropped. If more than
|
implementation uses one kernel thread to notify all devices;
|
||||||
>>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
|
thus, if one device sleeps/schedules, all devices are affected.
|
||||||
>>> assumes that the device driver has gone into an infinite loop
|
Doing better requires complex multi-threaded logic in the error
|
||||||
>>> and prints an error to syslog. A reboot is then required to
|
recovery implementation (e.g. waiting for all notification threads
|
||||||
>>> get the device working again.
|
to "join" before proceeding with recovery.) This seems excessively
|
||||||
|
complex and not worth implementing.
|
||||||
|
|
||||||
|
The current powerpc implementation doesn't much care if the device
|
||||||
|
attempts I/O at this point, or not. I/O's will fail, returning
|
||||||
|
a value of 0xff on read, and writes will be dropped. If more than
|
||||||
|
EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
|
||||||
|
assumes that the device driver has gone into an infinite loop
|
||||||
|
and prints an error to syslog. A reboot is then required to
|
||||||
|
get the device working again.
|
||||||
|
|
||||||
STEP 2: MMIO Enabled
|
STEP 2: MMIO Enabled
|
||||||
-------------------
|
--------------------
|
||||||
The platform re-enables MMIO to the device (but typically not the
|
The platform re-enables MMIO to the device (but typically not the
|
||||||
DMA), and then calls the mmio_enabled() callback on all affected
|
DMA), and then calls the mmio_enabled() callback on all affected
|
||||||
device drivers.
|
device drivers.
|
||||||
|
@ -192,34 +200,36 @@ link reset was performed by the HW. If the platform can't just re-enable IOs
|
||||||
without a slot reset or a link reset, it will not call this callback, and
|
without a slot reset or a link reset, it will not call this callback, and
|
||||||
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
|
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
|
||||||
|
|
||||||
>>> The following is proposed; no platform implements this yet:
|
.. note::
|
||||||
>>> Proposal: All I/O's should be done _synchronously_ from within
|
|
||||||
>>> this callback, errors triggered by them will be returned via
|
The following is proposed; no platform implements this yet:
|
||||||
>>> the normal pci_check_whatever() API, no new error_detected()
|
Proposal: All I/O's should be done _synchronously_ from within
|
||||||
>>> callback will be issued due to an error happening here. However,
|
this callback, errors triggered by them will be returned via
|
||||||
>>> such an error might cause IOs to be re-blocked for the whole
|
the normal pci_check_whatever() API, no new error_detected()
|
||||||
>>> segment, and thus invalidate the recovery that other devices
|
callback will be issued due to an error happening here. However,
|
||||||
>>> on the same segment might have done, forcing the whole segment
|
such an error might cause IOs to be re-blocked for the whole
|
||||||
>>> into one of the next states, that is, link reset or slot reset.
|
segment, and thus invalidate the recovery that other devices
|
||||||
|
on the same segment might have done, forcing the whole segment
|
||||||
|
into one of the next states, that is, link reset or slot reset.
|
||||||
|
|
||||||
The driver should return one of the following result codes:
|
The driver should return one of the following result codes:
|
||||||
- PCI_ERS_RESULT_RECOVERED
|
- PCI_ERS_RESULT_RECOVERED
|
||||||
Driver returns this if it thinks the device is fully
|
Driver returns this if it thinks the device is fully
|
||||||
functional and thinks it is ready to start
|
functional and thinks it is ready to start
|
||||||
normal driver operations again. There is no
|
normal driver operations again. There is no
|
||||||
guarantee that the driver will actually be
|
guarantee that the driver will actually be
|
||||||
allowed to proceed, as another driver on the
|
allowed to proceed, as another driver on the
|
||||||
same segment might have failed and thus triggered a
|
same segment might have failed and thus triggered a
|
||||||
slot reset on platforms that support it.
|
slot reset on platforms that support it.
|
||||||
|
|
||||||
- PCI_ERS_RESULT_NEED_RESET
|
- PCI_ERS_RESULT_NEED_RESET
|
||||||
Driver returns this if it thinks the device is not
|
Driver returns this if it thinks the device is not
|
||||||
recoverable in its current state and it needs a slot
|
recoverable in its current state and it needs a slot
|
||||||
reset to proceed.
|
reset to proceed.
|
||||||
|
|
||||||
- PCI_ERS_RESULT_DISCONNECT
|
- PCI_ERS_RESULT_DISCONNECT
|
||||||
Same as above. Total failure, no recovery even after
|
Same as above. Total failure, no recovery even after
|
||||||
reset; the driver is dead. (To be defined more precisely)
|
reset; the driver is dead. (To be defined more precisely)
|
||||||
|
|
||||||
The next step taken depends on the results returned by the drivers.
|
The next step taken depends on the results returned by the drivers.
|
||||||
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
|
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
|
||||||
|
@ -293,31 +303,33 @@ device will be considered "dead" in this case.
|
||||||
Drivers for multi-function cards will need to coordinate among
|
Drivers for multi-function cards will need to coordinate among
|
||||||
themselves as to which driver instance will perform any "one-shot"
|
themselves as to which driver instance will perform any "one-shot"
|
||||||
or global device initialization. For example, the Symbios sym53cxx2
|
or global device initialization. For example, the Symbios sym53cxx2
|
||||||
driver performs device init only from PCI function 0:
|
driver performs device init only from PCI function 0::
|
||||||
|
|
||||||
+ if (PCI_FUNC(pdev->devfn) == 0)
|
+ if (PCI_FUNC(pdev->devfn) == 0)
|
||||||
+ sym_reset_scsi_bus(np, 0);
|
+ sym_reset_scsi_bus(np, 0);
|
||||||
|
|
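For readers unfamiliar with `devfn`: it packs the slot and function numbers into a single byte, which is why the check above can pick out function 0. The sketch below redefines the decoding macros locally so that it builds in user space (they mirror the kernel's definitions); `is_function_zero()` is a hypothetical helper name used only here.

```c
/* User-space illustration; macros mirror the kernel's PCI_SLOT()/PCI_FUNC(). */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Hypothetical helper: only function 0 performs one-shot device init. */
static int is_function_zero(unsigned int devfn)
{
	return PCI_FUNC(devfn) == 0;
}
```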
||||||
Result codes:
|
Result codes:
|
||||||
- PCI_ERS_RESULT_DISCONNECT
|
- PCI_ERS_RESULT_DISCONNECT
|
||||||
Same as above.
|
Same as above.
|
||||||
|
|
||||||
Drivers for PCI Express cards that require a fundamental reset must
|
Drivers for PCI Express cards that require a fundamental reset must
|
||||||
set the needs_freset bit in the pci_dev structure in their probe function.
|
set the needs_freset bit in the pci_dev structure in their probe function.
|
||||||
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
|
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
|
||||||
PCI card types:
|
PCI card types::
|
||||||
|
|
||||||
+ /* Set EEH reset type to fundamental if required by hba */
|
+ /* Set EEH reset type to fundamental if required by hba */
|
||||||
+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
|
+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
|
||||||
+ pdev->needs_freset = 1;
|
+ pdev->needs_freset = 1;
|
||||||
+
|
+
|
||||||
|
|
||||||
Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
|
Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
|
||||||
Failure).
|
Failure).
|
||||||
|
|
||||||
>>> The current powerpc implementation does not try a power-cycle
|
.. note::
|
||||||
>>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
|
|
||||||
>>> However, it probably should.
|
The current powerpc implementation does not try a power-cycle
|
||||||
|
reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
|
||||||
|
However, it probably should.
|
||||||
|
|
||||||
|
|
||||||
STEP 5: Resume Operations
|
STEP 5: Resume Operations
|
||||||
|
@ -370,44 +382,43 @@ The current policy is to turn this into a platform policy.
|
||||||
That is, the recovery API only requires that:
|
That is, the recovery API only requires that:
|
||||||
|
|
||||||
- There is no guarantee that interrupt delivery can proceed from any
|
- There is no guarantee that interrupt delivery can proceed from any
|
||||||
device on the segment starting from the error detection and until the
|
device on the segment starting from the error detection and until the
|
||||||
slot_reset callback is called, at which point interrupts are expected
|
slot_reset callback is called, at which point interrupts are expected
|
||||||
to be fully operational.
|
to be fully operational.
|
||||||
|
|
||||||
- There is no guarantee that interrupt delivery is stopped, that is,
|
- There is no guarantee that interrupt delivery is stopped, that is,
|
||||||
a driver that gets an interrupt after detecting an error, or that detects
|
a driver that gets an interrupt after detecting an error, or that detects
|
||||||
an error within the interrupt handler such that it prevents proper
|
an error within the interrupt handler such that it prevents proper
|
||||||
ack'ing of the interrupt (and thus removal of the source) should just
|
ack'ing of the interrupt (and thus removal of the source) should just
|
||||||
return IRQ_NONE. It's up to the platform to deal with that
|
return IRQ_NONE. It's up to the platform to deal with that
|
||||||
condition, typically by masking the IRQ source during the duration of
|
condition, typically by masking the IRQ source during the duration of
|
||||||
the error handling. It is expected that the platform "knows" which
|
the error handling. It is expected that the platform "knows" which
|
||||||
interrupts are routed to error-management capable slots and can deal
|
interrupts are routed to error-management capable slots and can deal
|
||||||
with temporarily disabling that IRQ number during error processing (this
|
with temporarily disabling that IRQ number during error processing (this
|
||||||
isn't terribly complex). That means some IRQ latency for other devices
|
isn't terribly complex). That means some IRQ latency for other devices
|
||||||
sharing the interrupt, but there is simply no other way. High end
|
sharing the interrupt, but there is simply no other way. High end
|
||||||
platforms aren't supposed to share interrupts between many devices
|
platforms aren't supposed to share interrupts between many devices
|
||||||
anyway :)
|
anyway :)
|
||||||
|
|
||||||
>>> Implementation details for the powerpc platform are discussed in
|
.. note::
|
||||||
>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
|
|
||||||
|
|
||||||
>>> As of this writing, there is a growing list of device drivers with
|
Implementation details for the powerpc platform are discussed in
|
||||||
>>> patches implementing error recovery. Not all of these patches are in
|
the file Documentation/powerpc/eeh-pci-error-recovery.txt
|
||||||
>>> mainline yet. These may be used as "examples":
|
|
||||||
>>>
|
|
||||||
>>> drivers/scsi/ipr
|
|
||||||
>>> drivers/scsi/sym53c8xx_2
|
|
||||||
>>> drivers/scsi/qla2xxx
|
|
||||||
>>> drivers/scsi/lpfc
|
|
||||||
>>> drivers/net/bnx2.c
|
|
||||||
>>> drivers/net/e100.c
|
|
||||||
>>> drivers/net/e1000
|
|
||||||
>>> drivers/net/e1000e
|
|
||||||
>>> drivers/net/ixgb
|
|
||||||
>>> drivers/net/ixgbe
|
|
||||||
>>> drivers/net/cxgb3
|
|
||||||
>>> drivers/net/s2io.c
|
|
||||||
>>> drivers/net/qlge
|
|
||||||
|
|
||||||
The End
|
As of this writing, there is a growing list of device drivers with
|
||||||
-------
|
patches implementing error recovery. Not all of these patches are in
|
||||||
|
mainline yet. These may be used as "examples":
|
||||||
|
|
||||||
|
- drivers/scsi/ipr
|
||||||
|
- drivers/scsi/sym53c8xx_2
|
||||||
|
- drivers/scsi/qla2xxx
|
||||||
|
- drivers/scsi/lpfc
|
||||||
|
- drivers/net/bnx2.c
|
||||||
|
- drivers/net/e100.c
|
||||||
|
- drivers/net/e1000
|
||||||
|
- drivers/net/e1000e
|
||||||
|
- drivers/net/ixgb
|
||||||
|
- drivers/net/ixgbe
|
||||||
|
- drivers/net/cxgb3
|
||||||
|
- drivers/net/s2io.c
|
||||||
|
- drivers/net/qlge
|
|
@ -1,14 +1,19 @@
|
||||||
PCI Express I/O Virtualization Howto
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
Copyright (C) 2009 Intel Corporation
|
.. include:: <isonum.txt>
|
||||||
Yu Zhao <yu.zhao@intel.com>
|
|
||||||
|
|
||||||
Update: November 2012
|
====================================
|
||||||
-- sysfs-based SRIOV enable-/disable-ment
|
PCI Express I/O Virtualization Howto
|
||||||
Donald Dutile <ddutile@redhat.com>
|
====================================
|
||||||
|
|
||||||
1. Overview
|
:Copyright: |copy| 2009 Intel Corporation
|
||||||
|
:Authors: - Yu Zhao <yu.zhao@intel.com>
|
||||||
|
- Donald Dutile <ddutile@redhat.com>
|
||||||
|
|
||||||
1.1 What is SR-IOV
|
Overview
|
||||||
|
========
|
||||||
|
|
||||||
|
What is SR-IOV
|
||||||
|
--------------
|
||||||
|
|
||||||
Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
|
Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
|
||||||
capability which makes one physical device appear as multiple virtual
|
capability which makes one physical device appear as multiple virtual
|
||||||
|
@ -23,9 +28,11 @@ Memory Space, which is used to map its register set. VF device driver
|
||||||
operates on the register set so it can be functional and appear as a
|
operates on the register set so it can be functional and appear as a
|
||||||
real existing PCI device.
|
real existing PCI device.
|
||||||
|
|
||||||
2. User Guide
|
User Guide
|
||||||
|
==========
|
||||||
|
|
||||||
2.1 How can I enable SR-IOV capability
|
How can I enable SR-IOV capability
|
||||||
|
----------------------------------
|
||||||
|
|
||||||
Multiple methods are available for SR-IOV enablement.
|
Multiple methods are available for SR-IOV enablement.
|
||||||
In the first method, the device driver (PF driver) will control the
|
In the first method, the device driver (PF driver) will control the
|
||||||
|
@ -43,105 +50,123 @@ checks, e.g., check numvfs == 0 if enabling VFs, ensure
|
||||||
numvfs <= totalvfs.
|
numvfs <= totalvfs.
|
||||||
The second method is the recommended method for new/future VF devices.
|
The second method is the recommended method for new/future VF devices.
|
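The sanity checks mentioned above (no VFs enabled yet when enabling, and numvfs <= totalvfs) can be sketched in plain C. This is a user-space model, not the kernel's sysfs handler: `check_numvfs()` and its parameters are hypothetical, and the choice of `-ERANGE` and `-EBUSY` as error codes is an assumption made for illustration.

```c
#include <errno.h>

/*
 * Hypothetical model of the checks done before changing the VF count.
 * cur:   number of VFs currently enabled
 * total: maximum number of VFs the device supports
 * req:   requested VF count (0 means "disable")
 * Returns 0 if the request may proceed, or a negative errno value.
 */
static int check_numvfs(int cur, int total, int req)
{
	if (req < 0 || req > total)
		return -ERANGE;	/* must satisfy numvfs <= totalvfs */
	if (req == 0)
		return 0;	/* disabling is always allowed here */
	if (cur != 0)
		return -EBUSY;	/* enabling requires numvfs == 0 first */
	return 0;
}
```

In this model, changing from one non-zero VF count to another takes two steps: write 0 first, then the new count.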
||||||
|
|
||||||
2.2 How can I use the Virtual Functions
|
How can I use the Virtual Functions
|
||||||
|
-----------------------------------
|
||||||
|
|
||||||
VFs are treated as hot-plugged PCI devices in the kernel, so they
|
VFs are treated as hot-plugged PCI devices in the kernel, so they
|
||||||
should be able to work in the same way as real PCI devices. A VF
|
should be able to work in the same way as real PCI devices. A VF
|
||||||
requires a device driver, just as a normal PCI device does.
|
requires a device driver, just as a normal PCI device does.
|
||||||
|
|
||||||
3. Developer Guide
|
Developer Guide
|
||||||
|
===============
|
||||||
|
|
||||||
3.1 SR-IOV API
|
SR-IOV API
|
||||||
|
----------
|
||||||
|
|
||||||
To enable SR-IOV capability:
|
To enable SR-IOV capability:
|
||||||
(a) For the first method, in the driver:
|
|
||||||
|
(a) For the first method, in the driver::
|
||||||
|
|
||||||
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
|
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
|
||||||
'nr_virtfn' is number of VFs to be enabled.
|
|
||||||
(b) For the second method, from sysfs:
|
'nr_virtfn' is number of VFs to be enabled.
|
||||||
|
|
||||||
|
(b) For the second method, from sysfs::
|
||||||
|
|
||||||
echo 'nr_virtfn' > \
|
echo 'nr_virtfn' > \
|
||||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
|
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
|
||||||
|
|
||||||
To disable SR-IOV capability:
|
To disable SR-IOV capability:
|
||||||
(a) For the first method, in the driver:
|
|
||||||
|
(a) For the first method, in the driver::
|
||||||
|
|
||||||
void pci_disable_sriov(struct pci_dev *dev);
|
void pci_disable_sriov(struct pci_dev *dev);
|
||||||
(b) For the second method, from sysfs:
|
|
||||||
|
(b) For the second method, from sysfs::
|
||||||
|
|
||||||
echo 0 > \
|
echo 0 > \
|
||||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
|
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
|
||||||
|
|
||||||
To enable auto probing VFs by a compatible driver on the host, run
|
To enable auto probing VFs by a compatible driver on the host, run
|
||||||
the command below before enabling SR-IOV capabilities. This is the
|
the command below before enabling SR-IOV capabilities. This is the
|
||||||
default behavior.
|
default behavior.
|
||||||
|
::
|
||||||
|
|
||||||
echo 1 > \
|
echo 1 > \
|
||||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
|
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
|
||||||
|
|
||||||
To disable auto probing VFs by a compatible driver on the host, run
|
To disable auto probing VFs by a compatible driver on the host, run
|
||||||
the command below before enabling SR-IOV capabilities. Updating this
|
the command below before enabling SR-IOV capabilities. Updating this
|
||||||
entry will not affect VFs which are already probed.
|
entry will not affect VFs which are already probed.
|
||||||
|
::
|
||||||
|
|
||||||
echo 0 > \
|
echo 0 > \
|
||||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
|
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
|
||||||
|
|
||||||
3.2 Usage example
|
Usage example
|
||||||
|
-------------
|
||||||
|
|
||||||
The following piece of code illustrates the usage of the SR-IOV API.
|
The following piece of code illustrates the usage of the SR-IOV API.
|
||||||
|
::
|
||||||
|
|
||||||
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
|
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
|
||||||
{
|
{
|
||||||
pci_enable_sriov(dev, NR_VIRTFN);
|
pci_enable_sriov(dev, NR_VIRTFN);
|
||||||
|
|
||||||
...
|
|
||||||
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
static void dev_remove(struct pci_dev *dev)
|
|
||||||
{
|
|
||||||
pci_disable_sriov(dev);
|
|
||||||
|
|
||||||
...
|
|
||||||
}
|
|
||||||
|
|
||||||
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
|
|
||||||
{
|
|
||||||
...
|
|
||||||
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
static int dev_resume(struct pci_dev *dev)
|
|
||||||
{
|
|
||||||
...
|
|
||||||
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
static void dev_shutdown(struct pci_dev *dev)
|
|
||||||
{
|
|
||||||
...
|
|
||||||
}
|
|
||||||
|
|
||||||
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
|
|
||||||
{
|
|
||||||
if (numvfs > 0) {
|
|
||||||
...
|
|
||||||
pci_enable_sriov(dev, numvfs);
|
|
||||||
...
|
|
||||||
return numvfs;
|
|
||||||
}
|
|
||||||
if (numvfs == 0) {
|
|
||||||
....
|
|
||||||
pci_disable_sriov(dev);
|
|
||||||
...
|
...
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
static struct pci_driver dev_driver = {
|
static void dev_remove(struct pci_dev *dev)
|
||||||
.name = "SR-IOV Physical Function driver",
|
{
|
||||||
.id_table = dev_id_table,
|
pci_disable_sriov(dev);
|
||||||
.probe = dev_probe,
|
|
||||||
.remove = dev_remove,
|
...
|
||||||
.suspend = dev_suspend,
|
}
|
||||||
.resume = dev_resume,
|
|
||||||
.shutdown = dev_shutdown,
|
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
|
||||||
.sriov_configure = dev_sriov_configure,
|
{
|
||||||
};
|
...
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int dev_resume(struct pci_dev *dev)
|
||||||
|
{
|
||||||
|
...
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void dev_shutdown(struct pci_dev *dev)
|
||||||
|
{
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
|
||||||
|
{
|
||||||
|
if (numvfs > 0) {
|
||||||
|
...
|
||||||
|
pci_enable_sriov(dev, numvfs);
|
||||||
|
...
|
||||||
|
return numvfs;
|
||||||
|
}
|
||||||
|
if (numvfs == 0) {
|
||||||
|
....
|
||||||
|
pci_disable_sriov(dev);
|
||||||
|
...
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct pci_driver dev_driver = {
|
||||||
|
.name = "SR-IOV Physical Function driver",
|
||||||
|
.id_table = dev_id_table,
|
||||||
|
.probe = dev_probe,
|
||||||
|
.remove = dev_remove,
|
||||||
|
.suspend = dev_suspend,
|
||||||
|
.resume = dev_resume,
|
||||||
|
.shutdown = dev_shutdown,
|
||||||
|
.sriov_configure = dev_sriov_configure,
|
||||||
|
};
|
|
@ -1,10 +1,12 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
How To Write Linux PCI Drivers
|
==============================
|
||||||
|
How To Write Linux PCI Drivers
|
||||||
|
==============================
|
||||||
|
|
||||||
by Martin Mares <mj@ucw.cz> on 07-Feb-2000
|
:Authors: - Martin Mares <mj@ucw.cz>
|
||||||
updated by Grant Grundler <grundler@parisc-linux.org> on 23-Dec-2006
|
- Grant Grundler <grundler@parisc-linux.org>
|
||||||
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
The world of PCI is vast and full of (mostly unpleasant) surprises.
|
The world of PCI is vast and full of (mostly unpleasant) surprises.
|
||||||
Since each CPU architecture implements different chip-sets and PCI devices
|
Since each CPU architecture implements different chip-sets and PCI devices
|
||||||
have different requirements (erm, "features"), the result is the PCI support
|
have different requirements (erm, "features"), the result is the PCI support
|
||||||
|
@ -15,8 +17,7 @@ PCI device drivers.
|
||||||
A more complete resource is the third edition of "Linux Device Drivers"
|
A more complete resource is the third edition of "Linux Device Drivers"
|
||||||
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
|
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
|
||||||
LDD3 is available for free (under Creative Commons License) from:
|
LDD3 is available for free (under Creative Commons License) from:
|
||||||
|
http://lwn.net/Kernel/LDD3/.
|
||||||
http://lwn.net/Kernel/LDD3/
|
|
||||||
|
|
||||||
However, keep in mind that all documents are subject to "bit rot".
|
However, keep in mind that all documents are subject to "bit rot".
|
||||||
Refer to the source code if things are not working as described here.
|
Refer to the source code if things are not working as described here.
|
||||||
|
@ -25,9 +26,8 @@ Please send questions/comments/patches about Linux PCI API to the
|
||||||
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.
|
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.
|
||||||
|
|
||||||
|
|
||||||
|
Structure of PCI drivers
|
||||||
0. Structure of PCI drivers
|
========================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
PCI drivers "discover" PCI devices in a system via pci_register_driver().
|
PCI drivers "discover" PCI devices in a system via pci_register_driver().
|
||||||
Actually, it's the other way around. When the PCI generic code discovers
|
Actually, it's the other way around. When the PCI generic code discovers
|
||||||
a new device, the driver with a matching "description" will be notified.
|
a new device, the driver with a matching "description" will be notified.
|
||||||
|
@ -42,24 +42,25 @@ pointers and thus dictates the high level structure of a driver.
|
||||||
Once the driver knows about a PCI device and takes ownership, the
|
Once the driver knows about a PCI device and takes ownership, the
|
||||||
driver generally needs to perform the following initialization:
|
driver generally needs to perform the following initialization:
|
||||||
|
|
||||||
Enable the device
|
- Enable the device
|
||||||
Request MMIO/IOP resources
|
- Request MMIO/IOP resources
|
||||||
Set the DMA mask size (for both coherent and streaming DMA)
|
- Set the DMA mask size (for both coherent and streaming DMA)
|
||||||
Allocate and initialize shared control data (pci_allocate_coherent())
|
- Allocate and initialize shared control data (pci_allocate_coherent())
|
||||||
Access device configuration space (if needed)
|
- Access device configuration space (if needed)
|
||||||
Register IRQ handler (request_irq())
|
- Register IRQ handler (request_irq())
|
||||||
Initialize non-PCI parts of the chip (i.e. LAN/SCSI/etc.)
|
- Initialize non-PCI parts of the chip (i.e. LAN/SCSI/etc.)
|
||||||
Enable DMA/processing engines
|
- Enable DMA/processing engines
|
||||||
|
|
||||||
When done using the device, and perhaps the module needs to be unloaded,
|
When done using the device, and perhaps the module needs to be unloaded,
|
||||||
the driver needs to take the following steps:
|
the driver needs to take the following steps:
|
||||||
Disable the device from generating IRQs
|
|
||||||
Release the IRQ (free_irq())
|
- Disable the device from generating IRQs
|
||||||
Stop all DMA activity
|
- Release the IRQ (free_irq())
|
||||||
Release DMA buffers (both streaming and coherent)
|
- Stop all DMA activity
|
||||||
Unregister from other subsystems (e.g. scsi or netdev)
|
- Release DMA buffers (both streaming and coherent)
|
||||||
Release MMIO/IOP resources
|
- Unregister from other subsystems (e.g. scsi or netdev)
|
||||||
Disable the device
|
- Release MMIO/IOP resources
|
||||||
|
- Disable the device
|
||||||
|
|
||||||
Most of these topics are covered in the following sections.
|
Most of these topics are covered in the following sections.
|
||||||
For the rest look at LDD3 or <linux/pci.h>.
|
For the rest look at LDD3 or <linux/pci.h>.
|
||||||
|
@ -70,99 +71,38 @@ completely empty or just returning an appropriate error codes to avoid
|
||||||
lots of ifdefs in the drivers.
|
lots of ifdefs in the drivers.
|
||||||
|
|
||||||
|
|
||||||
|
pci_register_driver() call
|
||||||
|
==========================
|
||||||
|
|
||||||
1. pci_register_driver() call
|
PCI device drivers call ``pci_register_driver()`` during their
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
PCI device drivers call pci_register_driver() during their
|
|
||||||
initialization with a pointer to a structure describing the driver
|
initialization with a pointer to a structure describing the driver
|
||||||
(struct pci_driver):
|
(``struct pci_driver``):
|
||||||
|
|
||||||
field name Description
|
.. kernel-doc:: include/linux/pci.h
|
||||||
---------- ------------------------------------------------------
|
:functions: pci_driver
|
||||||
id_table Pointer to table of device ID's the driver is
|
|
||||||
interested in. Most drivers should export this
|
|
||||||
table using MODULE_DEVICE_TABLE(pci,...).
|
|
||||||
|
|
||||||
probe This probing function gets called (during execution
|
The ID table is an array of ``struct pci_device_id`` entries ending with an
|
||||||
of pci_register_driver() for already existing
|
|
||||||
devices or later if a new device gets inserted) for
|
|
||||||
all PCI devices which match the ID table and are not
|
|
||||||
"owned" by the other drivers yet. This function gets
|
|
||||||
passed a "struct pci_dev *" for each device whose
|
|
||||||
entry in the ID table matches the device. The probe
|
|
||||||
function returns zero when the driver chooses to
|
|
||||||
take "ownership" of the device or an error code
|
|
||||||
(negative number) otherwise.
|
|
||||||
The probe function always gets called from process
|
|
||||||
context, so it can sleep.
|
|
||||||
|
|
||||||
remove The remove() function gets called whenever a device
|
|
||||||
being handled by this driver is removed (either during
|
|
||||||
deregistration of the driver or when it's manually
|
|
||||||
pulled out of a hot-pluggable slot).
|
|
||||||
The remove function always gets called from process
|
|
||||||
context, so it can sleep.
|
|
||||||
|
|
||||||
suspend Put device into low power state.
|
|
||||||
suspend_late Put device into low power state.
|
|
||||||
|
|
||||||
resume_early Wake device from low power state.
|
|
||||||
resume Wake device from low power state.
|
|
||||||
|
|
||||||
(Please see Documentation/power/pci.txt for descriptions
|
|
||||||
of PCI Power Management and the related functions.)
|
|
||||||
|
|
||||||
shutdown Hook into reboot_notifier_list (kernel/sys.c).
|
|
||||||
Intended to stop any idling DMA operations.
|
|
||||||
Useful for enabling wake-on-lan (NIC) or changing
|
|
||||||
the power state of a device before reboot.
|
|
||||||
e.g. drivers/net/e100.c.
|
|
||||||
|
|
||||||
err_handler See Documentation/PCI/pci-error-recovery.txt
|
|
||||||
|
|
||||||
|
|
||||||
The ID table is an array of struct pci_device_id entries ending with an
|
|
||||||
all-zero entry. Definitions with static const are generally preferred.
|
all-zero entry. Definitions with static const are generally preferred.
|
||||||
|
|
||||||
Each entry consists of:
|
.. kernel-doc:: include/linux/mod_devicetable.h
|
||||||
|
:functions: pci_device_id
|
||||||
|
|
||||||
vendor,device Vendor and device ID to match (or PCI_ANY_ID)
|
Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
|
||||||
|
|
||||||
subvendor, Subsystem vendor and device ID to match (or PCI_ANY_ID)
|
|
||||||
subdevice,
|
|
||||||
|
|
||||||
class Device class, subclass, and "interface" to match.
|
|
||||||
See Appendix D of the PCI Local Bus Spec or
|
|
||||||
include/linux/pci_ids.h for a full list of classes.
|
|
||||||
Most drivers do not need to specify class/class_mask
|
|
||||||
as vendor/device is normally sufficient.
|
|
||||||
|
|
||||||
class_mask limit which sub-fields of the class field are compared.
|
|
||||||
See drivers/scsi/sym53c8xx_2/ for example of usage.
|
|
||||||
|
|
||||||
driver_data Data private to the driver.
|
|
||||||
Most drivers don't need to use driver_data field.
|
|
||||||
Best practice is to use driver_data as an index
|
|
||||||
into a static list of equivalent device types,
|
|
||||||
instead of using it as a pointer.
|
|
||||||
|
|
||||||
|
|
||||||
Most drivers only need PCI_DEVICE() or PCI_DEVICE_CLASS() to set up
|
|
||||||
a pci_device_id table.
|
a pci_device_id table.
|
||||||
|
|
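To make the matching rules for an ID table more concrete, here is a small user-space model of walking a table that ends with an all-zero entry. `struct id_entry`, `ANY_ID`, `field_ok()` and `match_id()` are simplified, hypothetical stand-ins for `struct pci_device_id`, `PCI_ANY_ID` and the PCI core's matching code; the sketch assumes that a wildcard matches anything and that only the class bits selected by class_mask are compared.

```c
#include <stddef.h>

#define ANY_ID 0xffffffffu	/* stands in for PCI_ANY_ID */

struct id_entry {		/* simplified model of struct pci_device_id */
	unsigned int vendor, device, subvendor, subdevice;
	unsigned int dev_class, class_mask;
};

/* A table field matches if it is a wildcard or equals the device's value. */
static int field_ok(unsigned int want, unsigned int have)
{
	return want == ANY_ID || want == have;
}

/* Return the first matching entry, or NULL; an all-zero entry ends the table. */
static const struct id_entry *match_id(const struct id_entry *tbl,
				       const struct id_entry *dev)
{
	for (; tbl->vendor || tbl->subvendor || tbl->class_mask; tbl++) {
		if (field_ok(tbl->vendor, dev->vendor) &&
		    field_ok(tbl->device, dev->device) &&
		    field_ok(tbl->subvendor, dev->subvendor) &&
		    field_ok(tbl->subdevice, dev->subdevice) &&
		    !((tbl->dev_class ^ dev->dev_class) & tbl->class_mask))
			return tbl;
	}
	return NULL;
}
```

An entry with class_mask set to 0 ignores the class entirely, which is why vendor/device matching alone is usually sufficient.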
||||||
New PCI IDs may be added to a device driver pci_ids table at runtime
|
New PCI IDs may be added to a device driver pci_ids table at runtime
|
||||||
as shown below:
|
as shown below::
|
||||||
|
|
||||||
echo "vendor device subvendor subdevice class class_mask driver_data" > \
|
echo "vendor device subvendor subdevice class class_mask driver_data" > \
|
||||||
/sys/bus/pci/drivers/{driver}/new_id
|
/sys/bus/pci/drivers/{driver}/new_id
|
||||||
|
|
||||||
All fields are passed in as hexadecimal values (no leading 0x).
|
All fields are passed in as hexadecimal values (no leading 0x).
|
||||||
The vendor and device fields are mandatory, the others are optional. Users
|
The vendor and device fields are mandatory, the others are optional. Users
|
||||||
need pass only as many optional fields as necessary:
|
need pass only as many optional fields as necessary:
|
||||||
o subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
|
|
||||||
o class and class_mask fields default to 0
|
- subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
|
||||||
o driver_data defaults to 0UL.
|
- class and class_mask fields default to 0
|
||||||
|
- driver_data defaults to 0UL.
|
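The defaults listed above can be illustrated with a small user-space parser. This is a sketch under stated assumptions, not the kernel's implementation: ``struct new_id_fields`` and ``parse_new_id()`` are hypothetical names, and the parsing is assumed to be a single ``sscanf()`` over hexadecimal fields in which any omitted trailing field keeps its preset default.

```c
#include <assert.h>
#include <stdio.h>

#define ANY_ID 0xffffffffu	/* stands in for PCI_ANY_ID */

/* Hypothetical container for the seven fields of a new_id line. */
struct new_id_fields {
	unsigned int vendor, device;
	unsigned int subvendor, subdevice;
	unsigned int class_code, class_mask;
	unsigned long driver_data;
};

/*
 * Parse "vendor device [subvendor subdevice class class_mask driver_data]".
 * Returns 0 on success, or -1 when vendor or device is missing.
 */
int parse_new_id(const char *line, struct new_id_fields *f)
{
	int n;

	assert(line != NULL && f != NULL);

	/* Preset the defaults; sscanf() only overwrites what it parses. */
	f->vendor = 0;
	f->device = 0;
	f->subvendor = ANY_ID;
	f->subdevice = ANY_ID;
	f->class_code = 0;
	f->class_mask = 0;
	f->driver_data = 0UL;

	n = sscanf(line, "%x %x %x %x %x %x %lx",
		   &f->vendor, &f->device, &f->subvendor, &f->subdevice,
		   &f->class_code, &f->class_mask, &f->driver_data);

	/* vendor and device are mandatory */
	return n < 2 ? -1 : 0;
}
```

Passing ``"8086 10f5"`` leaves the subsystem IDs at the wildcard value and the class, class mask and driver data at zero; passing only ``"8086"`` is rejected.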
||||||
|
|
||||||
Note that driver_data must match the value used by any of the pci_device_id
|
Note that driver_data must match the value used by any of the pci_device_id
|
||||||
entries defined in the driver. This makes the driver_data field mandatory
|
entries defined in the driver. This makes the driver_data field mandatory
|
||||||
|
@ -175,29 +115,31 @@ When the driver exits, it just calls pci_unregister_driver() and the PCI layer
|
||||||
automatically calls the remove hook for all devices handled by the driver.
|
automatically calls the remove hook for all devices handled by the driver.
|
||||||
|
|
||||||
|
|
||||||
1.1 "Attributes" for driver functions/data
|
"Attributes" for driver functions/data
|
||||||
|
--------------------------------------
|
||||||
|
|
||||||
Please mark the initialization and cleanup functions where appropriate
|
Please mark the initialization and cleanup functions where appropriate
|
||||||
(the corresponding macros are defined in <linux/init.h>):
|
(the corresponding macros are defined in <linux/init.h>):
|
||||||
|
|
||||||
|
====== =================================================
|
||||||
__init Initialization code. Thrown away after the driver
|
__init Initialization code. Thrown away after the driver
|
||||||
initializes.
|
initializes.
|
||||||
__exit Exit code. Ignored for non-modular drivers.
|
__exit Exit code. Ignored for non-modular drivers.
|
||||||
|
====== =================================================
|
||||||
|
|
||||||
Tips on when/where to use the above attributes:
|
Tips on when/where to use the above attributes:
|
||||||
o The module_init()/module_exit() functions (and all
|
- The module_init()/module_exit() functions (and all
|
||||||
initialization functions called _only_ from these)
|
initialization functions called _only_ from these)
|
||||||
should be marked __init/__exit.
|
should be marked __init/__exit.
|
||||||
|
|
||||||
o Do not mark the struct pci_driver.
|
- Do not mark the struct pci_driver.
|
||||||
|
|
||||||
o Do NOT mark a function if you are not sure which mark to use.
|
- Do NOT mark a function if you are not sure which mark to use.
|
||||||
Better to not mark the function than mark the function wrong.
|
Better to not mark the function than mark the function wrong.
|
||||||
|
|
||||||
|
|
||||||
|
How to find PCI devices manually
|
||||||
2. How to find PCI devices manually
|
================================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
PCI drivers should have a really good reason for not using the
|
PCI drivers should have a really good reason for not using the
|
||||||
pci_register_driver() interface to search for PCI devices.
|
pci_register_driver() interface to search for PCI devices.
|
||||||
|
@ -207,17 +149,17 @@ E.g. combined serial/parallel port/floppy controller.
|
||||||
|
|
||||||
A manual search may be performed using the following constructs:
|
A manual search may be performed using the following constructs:
|
||||||
|
|
||||||
Searching by vendor and device ID:
|
Searching by vendor and device ID::
|
||||||
|
|
||||||
struct pci_dev *dev = NULL;
|
struct pci_dev *dev = NULL;
|
||||||
while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)) != NULL)
|
while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)) != NULL)
|
||||||
configure_device(dev);
|
configure_device(dev);
|
||||||
|
|
||||||
Searching by class ID (iterate in a similar way):
|
Searching by class ID (iterate in a similar way)::
|
||||||
|
|
||||||
pci_get_class(CLASS_ID, dev)
|
pci_get_class(CLASS_ID, dev)
|
||||||
|
|
||||||
Searching by both vendor/device and subsystem vendor/device ID:
|
Searching by both vendor/device and subsystem vendor/device ID::
|
||||||
|
|
||||||
pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)
|
pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)
|
||||||
|
|
||||||
|
@ -230,21 +172,20 @@ the pci_dev that they return. You must eventually (possibly at module unload)
|
||||||
decrement the reference count on these devices by calling pci_dev_put().
|
decrement the reference count on these devices by calling pci_dev_put().
|
||||||
|
|
||||||
|
|
||||||
|
Device Initialization Steps
|
||||||
3. Device Initialization Steps
|
===========================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
As noted in the introduction, most PCI drivers need the following steps
|
As noted in the introduction, most PCI drivers need the following steps
|
||||||
for device initialization:
|
for device initialization:
|
||||||
|
|
||||||
Enable the device
|
- Enable the device
|
||||||
Request MMIO/IOP resources
|
- Request MMIO/IOP resources
|
||||||
Set the DMA mask size (for both coherent and streaming DMA)
|
- Set the DMA mask size (for both coherent and streaming DMA)
|
||||||
Allocate and initialize shared control data (dma_alloc_coherent())
|
- Allocate and initialize shared control data (dma_alloc_coherent())
|
||||||
Access device configuration space (if needed)
|
- Access device configuration space (if needed)
|
||||||
Register IRQ handler (request_irq())
|
- Register IRQ handler (request_irq())
|
||||||
Initialize non-PCI (i.e. LAN/SCSI/etc.) parts of the chip
|
- Initialize non-PCI (i.e. LAN/SCSI/etc.) parts of the chip
|
||||||
Enable DMA/processing engines.
|
- Enable DMA/processing engines.
|
||||||
|
|
||||||
The driver can access PCI config space registers at any time.
|
The driver can access PCI config space registers at any time.
|
||||||
(Well, almost. When running BIST, config space can go away...but
|
(Well, almost. When running BIST, config space can go away...but
|
||||||
|
@ -252,26 +193,29 @@ that will just result in a PCI Bus Master Abort and config reads
|
||||||
will return garbage).
|
will return garbage).
|
||||||
|
|
||||||
|
|
||||||
3.1 Enable the PCI device
|
Enable the PCI device
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------
|
||||||
Before touching any device registers, the driver needs to enable
|
Before touching any device registers, the driver needs to enable
|
||||||
the PCI device by calling pci_enable_device(). This will:
|
the PCI device by calling pci_enable_device(). This will:
|
||||||
o wake up the device if it was in suspended state,
|
|
||||||
o allocate I/O and memory regions of the device (if BIOS did not),
|
|
||||||
o allocate an IRQ (if BIOS did not).
|
|
||||||
|
|
||||||
NOTE: pci_enable_device() can fail! Check the return value.
|
- wake up the device if it was in suspended state,
|
||||||
|
- allocate I/O and memory regions of the device (if BIOS did not),
|
||||||
|
- allocate an IRQ (if BIOS did not).
|
||||||
|
|
||||||
[ OS BUG: we don't check resource allocations before enabling those
|
.. note::
|
||||||
resources. The sequence would make more sense if we called
|
pci_enable_device() can fail! Check the return value.
|
||||||
pci_request_resources() before calling pci_enable_device().
|
|
||||||
Currently, the device drivers can't detect the bug when two
|
.. warning::
|
||||||
devices have been allocated the same range. This is not a common
|
OS BUG: we don't check resource allocations before enabling those
|
||||||
problem and unlikely to get fixed soon.
|
resources. The sequence would make more sense if we called
|
||||||
|
pci_request_resources() before calling pci_enable_device().
|
||||||
|
Currently, the device drivers can't detect the bug when two
|
||||||
|
devices have been allocated the same range. This is not a common
|
||||||
|
problem and unlikely to get fixed soon.
|
||||||
|
|
||||||
|
This has been discussed before but not changed as of 2.6.19:
|
||||||
|
http://lkml.org/lkml/2006/3/2/194
|
||||||
|
|
||||||
This has been discussed before but not changed as of 2.6.19:
|
|
||||||
http://lkml.org/lkml/2006/3/2/194
|
|
||||||
]
|
|
||||||
|
|
||||||
pci_set_master() will enable DMA by setting the bus master bit
|
pci_set_master() will enable DMA by setting the bus master bit
|
||||||
in the PCI_COMMAND register. It also fixes the latency timer value if
|
in the PCI_COMMAND register. It also fixes the latency timer value if
|
||||||
|
@ -288,8 +232,8 @@ pci_try_set_mwi() to have the system do its best effort at enabling
|
||||||
Mem-Wr-Inval.
|
Mem-Wr-Inval.
|
||||||
|
|
||||||
|
|
||||||
3.2 Request MMIO/IOP resources
|
Request MMIO/IOP resources
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
--------------------------
|
||||||
Memory (MMIO), and I/O port addresses should NOT be read directly
|
Memory (MMIO), and I/O port addresses should NOT be read directly
|
||||||
from the PCI device config space. Use the values in the pci_dev structure
|
from the PCI device config space. Use the values in the pci_dev structure
|
||||||
as the PCI "bus address" might have been remapped to a "host physical"
|
as the PCI "bus address" might have been remapped to a "host physical"
|
||||||
|
@ -304,9 +248,10 @@ Conversely, drivers should call pci_release_region() AFTER
|
||||||
calling pci_disable_device().
|
calling pci_disable_device().
|
||||||
The idea is to prevent two devices colliding on the same address range.
|
The idea is to prevent two devices colliding on the same address range.
|
||||||
|
|
||||||
[ See OS BUG comment above. Currently (2.6.19), the driver can only
|
.. tip::
|
||||||
determine MMIO and IO Port resource availability _after_ calling
|
See OS BUG comment above. Currently (2.6.19), the driver can only
|
||||||
pci_enable_device(). ]
|
determine MMIO and IO Port resource availability _after_ calling
|
||||||
|
pci_enable_device().
|
||||||
|
|
||||||
Generic flavors of pci_request_region() are request_mem_region()
|
Generic flavors of pci_request_region() are request_mem_region()
|
||||||
(for MMIO ranges) and request_region() (for IO Port ranges).
|
(for MMIO ranges) and request_region() (for IO Port ranges).
|
||||||
|
@ -316,12 +261,13 @@ BARs.
|
||||||
Also see pci_request_selected_regions() below.
|
Also see pci_request_selected_regions() below.
|
||||||
|
|
||||||
|
|
||||||
3.3 Set the DMA mask size
|
Set the DMA mask size
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------
|
||||||
[ If anything below doesn't make sense, please refer to
|
.. note::
|
||||||
Documentation/DMA-API.txt. This section is just a reminder that
|
If anything below doesn't make sense, please refer to
|
||||||
drivers need to indicate DMA capabilities of the device and is not
|
Documentation/DMA-API.txt. This section is just a reminder that
|
||||||
an authoritative source for DMA interfaces. ]
|
drivers need to indicate DMA capabilities of the device and is not
|
||||||
|
an authoritative source for DMA interfaces.
|
||||||
|
|
||||||
While all drivers should explicitly indicate the DMA capability
|
While all drivers should explicitly indicate the DMA capability
|
||||||
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
|
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
|
||||||
|
@ -342,23 +288,23 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
|
||||||
("consistent") data.
|
("consistent") data.
|
||||||
|
|
||||||
|
|
||||||
3.4 Setup shared control data
|
Setup shared control data
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
-------------------------
|
||||||
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
|
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
|
||||||
memory. See Documentation/DMA-API.txt for a full description of
|
memory. See Documentation/DMA-API.txt for a full description of
|
||||||
the DMA APIs. This section is just a reminder that it needs to be done
|
the DMA APIs. This section is just a reminder that it needs to be done
|
||||||
before enabling DMA on the device.
|
before enabling DMA on the device.
|
||||||
|
|
||||||
|
|
||||||
3.5 Initialize device registers
|
Initialize device registers
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------------
|
||||||
Some drivers will need specific "capability" fields programmed
|
Some drivers will need specific "capability" fields programmed
|
||||||
or other "vendor specific" register initialized or reset.
|
or other "vendor specific" register initialized or reset.
|
||||||
E.g. clearing pending interrupts.
|
E.g. clearing pending interrupts.
|
||||||
|
|
||||||
|
|
||||||
3.6 Register IRQ handler
|
Register IRQ handler
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
--------------------
|
||||||
While calling request_irq() is the last step described here,
|
While calling request_irq() is the last step described here,
|
||||||
this is often just another intermediate step to initialize a device.
|
this is often just another intermediate step to initialize a device.
|
||||||
This step can often be deferred until the device is opened for use.
|
This step can often be deferred until the device is opened for use.
|
||||||
|
@ -396,6 +342,7 @@ and msix_enabled flags in the pci_dev structure after calling
|
||||||
pci_alloc_irq_vectors.
|
pci_alloc_irq_vectors.
|
||||||
|
|
||||||
There are (at least) two really good reasons for using MSI:
|
There are (at least) two really good reasons for using MSI:
|
||||||
|
|
||||||
1) MSI is an exclusive interrupt vector by definition.
|
1) MSI is an exclusive interrupt vector by definition.
|
||||||
This means the interrupt handler doesn't have to verify
|
This means the interrupt handler doesn't have to verify
|
||||||
its device caused the interrupt.
|
its device caused the interrupt.
|
||||||
|
@ -410,24 +357,23 @@ See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
|
||||||
of MSI/MSI-X usage.
|
of MSI/MSI-X usage.
|
||||||
|
|
||||||
|
|
||||||
|
PCI device shutdown
|
||||||
4. PCI device shutdown
|
===================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
When a PCI device driver is being unloaded, most of the following
|
When a PCI device driver is being unloaded, most of the following
|
||||||
steps need to be performed:
|
steps need to be performed:
|
||||||
|
|
||||||
Disable the device from generating IRQs
|
- Disable the device from generating IRQs
|
||||||
Release the IRQ (free_irq())
|
- Release the IRQ (free_irq())
|
||||||
Stop all DMA activity
|
- Stop all DMA activity
|
||||||
Release DMA buffers (both streaming and consistent)
|
- Release DMA buffers (both streaming and consistent)
|
||||||
Unregister from other subsystems (e.g. scsi or netdev)
|
- Unregister from other subsystems (e.g. scsi or netdev)
|
||||||
Disable device from responding to MMIO/IO Port addresses
|
- Disable device from responding to MMIO/IO Port addresses
|
||||||
Release MMIO/IO Port resource(s)
|
- Release MMIO/IO Port resource(s)
|
||||||
|
|
||||||
|
|
||||||
4.1 Stop IRQs on the device
|
Stop IRQs on the device
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
-----------------------
|
||||||
How to do this is chip/device specific. If it's not done, it opens
|
How to do this is chip/device specific. If it's not done, it opens
|
||||||
the possibility of a "screaming interrupt" if (and only if)
|
the possibility of a "screaming interrupt" if (and only if)
|
||||||
the IRQ is shared with another device.
|
the IRQ is shared with another device.
|
||||||
|
@ -446,16 +392,16 @@ MSI and MSI-X are defined to be exclusive interrupts and thus
|
||||||
are not susceptible to the "screaming interrupt" problem.
|
are not susceptible to the "screaming interrupt" problem.
|
||||||
|
|
||||||
|
|
||||||
4.2 Release the IRQ
|
Release the IRQ
|
||||||
~~~~~~~~~~~~~~~~~~~
|
---------------
|
||||||
Once the device is quiesced (no more IRQs), one can call free_irq().
|
Once the device is quiesced (no more IRQs), one can call free_irq().
|
||||||
This function will return control once any pending IRQs are handled,
|
This function will return control once any pending IRQs are handled,
|
||||||
"unhook" the driver's IRQ handler from that IRQ, and finally release
|
"unhook" the driver's IRQ handler from that IRQ, and finally release
|
||||||
the IRQ if no one else is using it.
|
the IRQ if no one else is using it.
|
||||||
|
|
||||||
|
|
||||||
4.3 Stop all DMA activity
|
Stop all DMA activity
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
---------------------
|
||||||
It's extremely important to stop all DMA operations BEFORE attempting
|
It's extremely important to stop all DMA operations BEFORE attempting
|
||||||
to deallocate DMA control data. Failure to do so can result in memory
|
to deallocate DMA control data. Failure to do so can result in memory
|
||||||
corruption, hangs, and on some chip-sets a hard crash.
|
corruption, hangs, and on some chip-sets a hard crash.
|
||||||
|
@ -467,8 +413,8 @@ While this step sounds obvious and trivial, several "mature" drivers
|
||||||
didn't get this step right in the past.
|
didn't get this step right in the past.
|
||||||
|
|
||||||
|
|
||||||
4.4 Release DMA buffers
|
Release DMA buffers
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~
|
-------------------
|
||||||
Once DMA is stopped, clean up streaming DMA first.
|
Once DMA is stopped, clean up streaming DMA first.
|
||||||
I.e. unmap data buffers and return buffers to "upstream"
|
I.e. unmap data buffers and return buffers to "upstream"
|
||||||
owners if there is one.
|
owners if there is one.
|
||||||
|
@ -478,8 +424,8 @@ Then clean up "consistent" buffers which contain the control data.
|
||||||
See Documentation/DMA-API.txt for details on unmapping interfaces.
|
See Documentation/DMA-API.txt for details on unmapping interfaces.
|
||||||
|
|
||||||
|
|
||||||
4.5 Unregister from other subsystems
|
Unregister from other subsystems
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
--------------------------------
|
||||||
Most low level PCI device drivers support some other subsystem
|
Most low level PCI device drivers support some other subsystem
|
||||||
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
|
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
|
||||||
driver isn't losing resources from that other subsystem.
|
driver isn't losing resources from that other subsystem.
|
||||||
|
@ -487,31 +433,30 @@ If this happens, typically the symptom is an Oops (panic) when
|
||||||
the subsystem attempts to call into a driver that has been unloaded.
|
the subsystem attempts to call into a driver that has been unloaded.
|
||||||
|
|
||||||
|
|
||||||
4.6 Disable Device from responding to MMIO/IO Port addresses
|
Disable Device from responding to MMIO/IO Port addresses
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
--------------------------------------------------------
|
||||||
iounmap() MMIO or IO Port resources and then call pci_disable_device().
|
iounmap() MMIO or IO Port resources and then call pci_disable_device().
|
||||||
This is the symmetric opposite of pci_enable_device().
|
This is the symmetric opposite of pci_enable_device().
|
||||||
Do not access device registers after calling pci_disable_device().
|
Do not access device registers after calling pci_disable_device().
|
||||||
|
|
||||||
|
|
||||||
4.7 Release MMIO/IO Port Resource(s)
|
Release MMIO/IO Port Resource(s)
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
--------------------------------
|
||||||
Call pci_release_region() to mark the MMIO or IO Port range as available.
|
Call pci_release_region() to mark the MMIO or IO Port range as available.
|
||||||
Failure to do so usually results in the inability to reload the driver.
|
Failure to do so usually results in the inability to reload the driver.
|
||||||
|
|
||||||
|
|
||||||
|
How to access PCI config space
|
||||||
|
==============================
|
||||||
|
|
||||||
5. How to access PCI config space
|
You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
space of a device represented by `struct pci_dev *`. All these functions return
|
||||||
|
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
|
||||||
You can use pci_(read|write)_config_(byte|word|dword) to access the config
|
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
|
||||||
space of a device represented by struct pci_dev *. All these functions return 0
|
|
||||||
when successful or an error code (PCIBIOS_...) which can be translated to a text
|
|
||||||
string by pcibios_strerror. Most drivers expect that accesses to valid PCI
|
|
||||||
devices don't fail.
|
devices don't fail.
|
||||||
|
|
||||||
If you don't have a struct pci_dev available, you can call
|
If you don't have a struct pci_dev available, you can call
|
||||||
pci_bus_(read|write)_config_(byte|word|dword) to access a given device
|
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
|
||||||
and function on that bus.
|
and function on that bus.
|
||||||
|
|
||||||
If you access fields in the standard portion of the config header, please
|
If you access fields in the standard portion of the config header, please
|
||||||
|
@ -522,10 +467,10 @@ pci_find_capability() for the particular capability and it will find the
|
||||||
corresponding register block for you.
|
corresponding register block for you.
|
||||||
|
|
||||||
|
|
||||||
|
Other interesting functions
|
||||||
|
===========================
|
||||||
|
|
||||||
6. Other interesting functions
|
============================= ================================================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
|
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
|
||||||
bus, and slot number. If the device is
|
bus, and slot number. If the device is
|
||||||
found, its reference count is increased.
|
found, its reference count is increased.
|
||||||
|
@ -539,11 +484,11 @@ pci_set_drvdata() Set private driver data pointer for a pci_dev
|
||||||
pci_get_drvdata() Return private driver data pointer for a pci_dev
|
pci_get_drvdata() Return private driver data pointer for a pci_dev
|
||||||
pci_set_mwi() Enable Memory-Write-Invalidate transactions.
|
pci_set_mwi() Enable Memory-Write-Invalidate transactions.
|
||||||
pci_clear_mwi() Disable Memory-Write-Invalidate transactions.
|
pci_clear_mwi() Disable Memory-Write-Invalidate transactions.
|
||||||
|
============================= ================================================
|
||||||
|
|
||||||
|
|
||||||
|
Miscellaneous hints
|
||||||
7. Miscellaneous hints
|
===================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
When displaying PCI device names to the user (for example when a driver wants
|
When displaying PCI device names to the user (for example when a driver wants
|
||||||
to tell the user which card it has found), please use pci_name(pci_dev).
|
to tell the user which card it has found), please use pci_name(pci_dev).
|
||||||
|
@ -559,9 +504,8 @@ on the bus need to be capable of doing it, so this is something which needs
|
||||||
to be handled by platform and generic code, not individual drivers.
|
to be handled by platform and generic code, not individual drivers.
|
||||||
|
|
||||||
|
|
||||||
|
Vendor and device identifications
|
||||||
8. Vendor and device identifications
|
=================================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
|
Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
|
||||||
are shared across multiple drivers. You can add private definitions in
|
are shared across multiple drivers. You can add private definitions in
|
||||||
|
@ -575,28 +519,27 @@ There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
|
||||||
and https://github.com/pciutils/pciids.
|
and https://github.com/pciutils/pciids.
|
||||||
|
|
||||||
|
|
||||||
|
Obsolete functions
|
||||||
9. Obsolete functions
|
==================
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
There are several functions which you might come across when trying to
|
There are several functions which you might come across when trying to
|
||||||
port an old driver to the new PCI interface. They are no longer present
|
port an old driver to the new PCI interface. They are no longer present
|
||||||
in the kernel as they aren't compatible with hotplug or PCI domains or
|
in the kernel as they aren't compatible with hotplug or PCI domains or
|
||||||
having sane locking.
|
having sane locking.
|
||||||
|
|
||||||
|
================= ===========================================
|
||||||
pci_find_device() Superseded by pci_get_device()
|
pci_find_device() Superseded by pci_get_device()
|
||||||
pci_find_subsys() Superseded by pci_get_subsys()
|
pci_find_subsys() Superseded by pci_get_subsys()
|
||||||
pci_find_slot() Superseded by pci_get_domain_bus_and_slot()
|
pci_find_slot() Superseded by pci_get_domain_bus_and_slot()
|
||||||
pci_get_slot() Superseded by pci_get_domain_bus_and_slot()
|
pci_get_slot() Superseded by pci_get_domain_bus_and_slot()
|
||||||
|
================= ===========================================
|
||||||
|
|
||||||
The alternative is the traditional PCI device driver that walks PCI
|
The alternative is the traditional PCI device driver that walks PCI
|
||||||
device lists. This is still possible but discouraged.
|
device lists. This is still possible but discouraged.
|
||||||
|
|
||||||
|
|
||||||
|
MMIO Space and "Write Posting"
|
||||||
10. MMIO Space and "Write Posting"
|
==============================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Converting a driver from using I/O Port space to using MMIO space
|
Converting a driver from using I/O Port space to using MMIO space
|
||||||
often requires some additional changes. Specifically, "write posting"
|
often requires some additional changes. Specifically, "write posting"
|
||||||
|
@ -609,14 +552,14 @@ the CPU before the transaction has reached its destination.
|
||||||
|
|
||||||
Thus, timing sensitive code should add readl() where the CPU is
|
Thus, timing sensitive code should add readl() where the CPU is
|
||||||
expected to wait before doing other work. The classic "bit banging"
|
expected to wait before doing other work. The classic "bit banging"
|
||||||
sequence works fine for I/O Port space:
|
sequence works fine for I/O Port space::
|
||||||
|
|
||||||
for (i = 8; --i; val >>= 1) {
|
for (i = 8; --i; val >>= 1) {
|
||||||
outb(val & 1, ioport_reg); /* write bit */
|
outb(val & 1, ioport_reg); /* write bit */
|
||||||
udelay(10);
|
udelay(10);
|
||||||
}
|
}
|
||||||
|
|
||||||
The same sequence for MMIO space should be:
|
The same sequence for MMIO space should be::
|
||||||
|
|
||||||
for (i = 8; --i; val >>= 1) {
|
for (i = 8; --i; val >>= 1) {
|
||||||
writeb(val & 1, mmio_reg); /* write bit */
|
writeb(val & 1, mmio_reg); /* write bit */
|
||||||
|
@ -633,4 +576,3 @@ handle the PCI master abort on all platforms if the PCI device is
|
||||||
expected to not respond to a readl(). Most x86 platforms will allow
|
expected to not respond to a readl(). Most x86 platforms will allow
|
||||||
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
|
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
|
||||||
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
|
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
|
||||||
|
|
|
@ -1,21 +1,29 @@
|
||||||
The PCI Express Advanced Error Reporting Driver Guide HOWTO
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
T. Long Nguyen <tom.l.nguyen@intel.com>
|
.. include:: <isonum.txt>
|
||||||
Yanmin Zhang <yanmin.zhang@intel.com>
|
|
||||||
07/29/2006
|
|
||||||
|
|
||||||
|
===========================================================
|
||||||
|
The PCI Express Advanced Error Reporting Driver Guide HOWTO
|
||||||
|
===========================================================
|
||||||
|
|
||||||
1. Overview
|
:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
|
||||||
|
- Yanmin Zhang <yanmin.zhang@intel.com>
|
||||||
|
|
||||||
1.1 About this guide
|
:Copyright: |copy| 2006 Intel Corporation
|
||||||
|
|
||||||
|
Overview
|
||||||
|
===========
|
||||||
|
|
||||||
|
About this guide
|
||||||
|
----------------
|
||||||
|
|
||||||
This guide describes the basics of the PCI Express Advanced Error
|
This guide describes the basics of the PCI Express Advanced Error
|
||||||
Reporting (AER) driver and provides information on how to use it, as
|
Reporting (AER) driver and provides information on how to use it, as
|
||||||
well as how to enable the drivers of endpoint devices to conform with
|
well as how to enable the drivers of endpoint devices to conform with
|
||||||
the PCI Express AER driver.
|
the PCI Express AER driver.
|
||||||
|
|
||||||
1.2 Copyright (C) Intel Corporation 2006.
|
|
||||||
|
|
||||||
1.3 What is the PCI Express AER Driver?
|
What is the PCI Express AER Driver?
|
||||||
|
-----------------------------------
|
||||||
|
|
||||||
PCI Express error signaling can occur on the PCI Express link itself
|
PCI Express error signaling can occur on the PCI Express link itself
|
||||||
or on behalf of transactions initiated on the link. PCI Express
|
or on behalf of transactions initiated on the link. PCI Express
|
||||||
|
@ -30,17 +38,19 @@ The PCI Express AER driver provides the infrastructure to support PCI
|
||||||
Express Advanced Error Reporting capability. The PCI Express AER
|
Express Advanced Error Reporting capability. The PCI Express AER
|
||||||
driver provides three basic functions:
|
driver provides three basic functions:
|
||||||
|
|
||||||
- Gathers comprehensive error information when errors occur.
|
- Gathers comprehensive error information when errors occur.
|
||||||
- Reports errors to the user.
|
- Reports errors to the user.
|
||||||
- Performs error recovery actions.
|
- Performs error recovery actions.
|
||||||
|
|
||||||
The AER driver only attaches to root ports which support PCI-Express AER
|
The AER driver only attaches to root ports which support PCI-Express AER
|
||||||
capability.
|
capability.
|
||||||
|
|
||||||
|
|
||||||
2. User Guide
|
User Guide
|
||||||
|
==========
|
||||||
|
|
||||||
2.1 Include the PCI Express AER Root Driver into the Linux Kernel
|
Include the PCI Express AER Root Driver into the Linux Kernel
|
||||||
|
-------------------------------------------------------------
|
||||||
|
|
||||||
The PCI Express AER Root driver is a Root Port service driver attached
|
The PCI Express AER Root driver is a Root Port service driver attached
|
||||||
to the PCI Express Port Bus driver. If a user wants to use it, the driver
|
to the PCI Express Port Bus driver. If a user wants to use it, the driver
|
||||||
|
@ -48,7 +58,8 @@ has to be compiled. Option CONFIG_PCIEAER supports this capability. It
|
||||||
depends on CONFIG_PCIEPORTBUS, so please set CONFIG_PCIEPORTBUS=y and
|
depends on CONFIG_PCIEPORTBUS, so please set CONFIG_PCIEPORTBUS=y and
|
||||||
CONFIG_PCIEAER=y.
|
CONFIG_PCIEAER=y.
|
||||||
|
|
||||||
2.2 Load PCI Express AER Root Driver
|
Load PCI Express AER Root Driver
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
Some systems have AER support in firmware. Enabling Linux AER support at
|
Some systems have AER support in firmware. Enabling Linux AER support at
|
||||||
the same time the firmware handles AER may result in unpredictable
|
the same time the firmware handles AER may result in unpredictable
|
||||||
|
@ -56,30 +67,34 @@ behavior. Therefore, Linux does not handle AER events unless the firmware
|
||||||
grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
|
grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
|
||||||
Specification for details regarding _OSC usage.
|
Specification for details regarding _OSC usage.
|
||||||
|
|
||||||
2.3 AER error output
|
AER error output
|
||||||
|
----------------
|
||||||
|
|
||||||
When a PCIe AER error is captured, an error message is printed to the
|
When a PCIe AER error is captured, an error message is printed to the
|
||||||
console. A correctable error is printed as a warning; any other error
|
console. A correctable error is printed as a warning; any other error
|
||||||
is printed as an error. Users can therefore choose a console log
|
is printed as an error. Users can therefore choose a console log
|
||||||
level that filters out correctable error messages.
|
level that filters out correctable error messages.
|
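As a rough illustration of the filtering described above, the sketch below models how a console log level decides which AER messages reach the console. It assumes the standard kernel log levels (3 for errors, 4 for warnings) and the usual rule that a message is shown only when its level is numerically lower than the console log level. The function names are hypothetical and are not part of the AER driver.

```c
#include <stdbool.h>

/* Kernel log levels as defined in include/linux/kern_levels.h:
 * a lower number means a more severe message. */
enum { LOGLEVEL_ERR = 3, LOGLEVEL_WARNING = 4 };

/* Hypothetical helper: level at which an AER message is printed,
 * following the rule stated in the text (warning if correctable,
 * error otherwise). */
static int aer_message_level(bool correctable)
{
    return correctable ? LOGLEVEL_WARNING : LOGLEVEL_ERR;
}

/* Hypothetical helper: a message reaches the console only when its
 * level is numerically lower than the console log level. */
static bool reaches_console(int msg_level, int console_loglevel)
{
    return msg_level < console_loglevel;
}
```

With a console log level of 4 (for example, the `loglevel=4` boot parameter), uncorrectable errors still reach the console, while correctable ones stay only in the kernel log buffer.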
||||||
|
|
||||||
Below shows an example:
|
Below shows an example::
|
||||||
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
|
|
||||||
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
|
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
|
||||||
0000:50:00.0: [20] Unsupported Request (First)
|
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
|
||||||
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
|
0000:50:00.0: [20] Unsupported Request (First)
|
||||||
|
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
|
||||||
|
|
||||||
In the example, 'Requester ID' means the ID of the device that sent
|
In the example, 'Requester ID' means the ID of the device that sent
|
||||||
the error message to the root port. Please refer to the PCI Express
|
the error message to the root port. Please refer to the PCI Express
|
||||||
specifications for the other fields.
|
specifications for the other fields.
|
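The fields in the example can be decoded by hand. The sketch below is illustrative and not taken from the AER driver. It splits a Requester ID into bus, device and function numbers using the conventional PCIe layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0) and finds the lowest bit set in a status word. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for a decoded Requester ID. */
struct requester_id {
    unsigned bus;
    unsigned dev;
    unsigned fn;
};

/* Split a 16-bit Requester ID: bus[15:8], device[7:3], function[2:0]. */
static struct requester_id decode_requester_id(uint16_t id)
{
    struct requester_id r = {
        .bus = (id >> 8) & 0xff,
        .dev = (id >> 3) & 0x1f,
        .fn  = id & 0x07,
    };
    return r;
}

/* Return the index of the lowest set bit in a status word,
 * or -1 if no bit is set. */
static int first_status_bit(uint32_t status)
{
    for (int bit = 0; bit < 32; bit++)
        if (status & (UINT32_C(1) << bit))
            return bit;
    return -1;
}
```

For the example above, `decode_requester_id(0x0500)` gives bus 0x05, device 0x00, function 0, and `first_status_bit(0x00100000)` gives 20, which matches the "[20] Unsupported Request" line.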
||||||
|
|
||||||
2.4 AER Statistics / Counters
|
AER Statistics / Counters
|
||||||
|
-------------------------
|
||||||
|
|
||||||
When PCIe AER errors are captured, the counters / statistics are also exposed
|
When PCIe AER errors are captured, the counters / statistics are also exposed
|
||||||
in the form of sysfs attributes which are documented at
|
in the form of sysfs attributes which are documented at
|
||||||
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
|
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
|
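A user-space tool can read these attributes as plain text. The sketch below is a hypothetical helper, not a kernel API. It assumes the attribute text holds one counter per line in the form "<name> <count>", as in the examples given in the ABI document, and looks up one counter by name. Reading the file from sysfs is left to the caller.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in the text of an AER statistics attribute.
 * Assumes each line has the form "<name> <count>". Returns the count,
 * or -1 if no line with that exact name and a valid count is found.
 */
static long aer_stat_lookup(const char *text, const char *name)
{
    size_t name_len = strlen(name);
    const char *line = text;

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Match "<name> " at the start of the line. */
        if (len > name_len + 1 &&
            strncmp(line, name, name_len) == 0 &&
            line[name_len] == ' ') {
            const char *num = line + name_len + 1;
            char *end;
            long count = strtol(num, &end, 10);

            /* Accept only if digits were parsed up to the line end. */
            if (end != num && end == line + len)
                return count;
        }
        line = eol ? eol + 1 : line + len;
    }
    return -1;
}
```

For instance, passing the contents of a device's `aer_dev_correctable` attribute and the name "TOTAL_ERR_COR" would return the total number of correctable errors recorded, assuming that attribute and counter are present on the system.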
||||||
|
|
||||||
3. Developer Guide
|
Developer Guide
|
||||||
|
===============
|
||||||
|
|
||||||
Enabling AER-aware support requires a software driver to configure
|
Enabling AER-aware support requires a software driver to configure
|
||||||
the AER capability structure within its device and to provide callbacks.
|
the AER capability structure within its device and to provide callbacks.
|
||||||
|
@ -120,7 +135,8 @@ hierarchy and links. These errors do not include any device specific
|
||||||
errors because device specific errors will still get sent directly to
|
errors because device specific errors will still get sent directly to
|
||||||
the device driver.
|
the device driver.
|
||||||
|
|
||||||
3.1 Configure the AER capability structure
|
Configure the AER capability structure
|
||||||
|
--------------------------------------
|
||||||
|
|
||||||
AER-aware drivers of PCI Express components need to change the device
|
AER-aware drivers of PCI Express components need to change the device
|
||||||
control registers to enable AER. They can also change AER registers,
|
control registers to enable AER. They can also change AER registers,
|
||||||
|
@ -128,9 +144,11 @@ including mask and severity registers. Helper function
|
||||||
pci_enable_pcie_error_reporting can be used to enable AER. See
|
pci_enable_pcie_error_reporting can be used to enable AER. See
|
||||||
section 3.3.
|
section 3.3.
|
||||||
|
|
||||||
3.2. Provide callbacks
|
Provide callbacks
|
||||||
|
-----------------
|
||||||
|
|
||||||
3.2.1 callback reset_link to reset pci express link
|
callback reset_link to reset pci express link
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
This callback is used to reset the PCI Express physical link when a
|
This callback is used to reset the PCI Express physical link when a
|
||||||
fatal error happens. The root port AER service driver provides a
|
fatal error happens. The root port AER service driver provides a
|
||||||
|
@ -140,13 +158,15 @@ upstream ports should provide their own reset_link functions.
|
||||||
|
|
||||||
In struct pcie_port_service_driver, a new pointer, reset_link, is
|
In struct pcie_port_service_driver, a new pointer, reset_link, is
|
||||||
added.
|
added.
|
||||||
|
::
|
||||||
|
|
||||||
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
|
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
|
||||||
|
|
||||||
Section 3.2.2.2 provides more detailed info on when to call
|
Section 3.2.2.2 provides more detailed info on when to call
|
||||||
reset_link.
|
reset_link.
|
||||||
|
|
||||||
3.2.2 PCI error-recovery callbacks
|
PCI error-recovery callbacks
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
The PCI Express AER Root driver uses error callbacks to coordinate
|
The PCI Express AER Root driver uses error callbacks to coordinate
|
||||||
with downstream device drivers associated with a hierarchy in question
|
with downstream device drivers associated with a hierarchy in question
|
||||||
|
@ -161,7 +181,8 @@ definitions of the callbacks.
|
||||||
|
|
||||||
Below sections specify when to call the error callback functions.
|
Below sections specify when to call the error callback functions.
|
||||||
|
|
||||||
3.2.2.1 Correctable errors
|
Correctable errors
|
||||||
|
~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Correctable errors have no impact on the functionality of
|
Correctable errors have no impact on the functionality of
|
||||||
the interface. The PCI Express protocol can recover without any
|
the interface. The PCI Express protocol can recover without any
|
||||||
|
@ -169,13 +190,16 @@ software intervention or any loss of data. These errors do not
|
||||||
require any recovery actions. The AER driver clears the device's
|
require any recovery actions. The AER driver clears the device's
|
||||||
correctable error status register accordingly and logs these errors.
|
correctable error status register accordingly and logs these errors.
|
||||||
|
|
||||||
3.2.2.2 Non-correctable (non-fatal and fatal) errors
|
Non-correctable (non-fatal and fatal) errors
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
If an error message indicates a non-fatal error, performing link reset
|
If an error message indicates a non-fatal error, performing link reset
|
||||||
at upstream is not required. The AER driver calls error_detected(dev,
|
at upstream is not required. The AER driver calls error_detected(dev,
|
||||||
pci_channel_io_normal) to all drivers associated within a hierarchy in
|
pci_channel_io_normal) to all drivers associated within a hierarchy in
|
||||||
question. For example,
|
question. For example::
|
||||||
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
|
|
||||||
|
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
|
||||||
|
|
||||||
If Upstream port A captures an AER error, the hierarchy consists of
|
If Upstream port A captures an AER error, the hierarchy consists of
|
||||||
Downstream port B and EndPoint.
|
Downstream port B and EndPoint.
|
||||||
|
|
||||||
|
@ -199,53 +223,72 @@ function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
|
||||||
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
|
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
|
||||||
to mmio_enabled.
|
to mmio_enabled.
|
||||||
|
|
||||||
3.3 helper functions
|
helper functions
|
||||||
|
----------------
|
||||||
|
::
|
||||||
|
|
||||||
|
int pci_enable_pcie_error_reporting(struct pci_dev *dev);
|
||||||
|
|
||||||
3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
|
|
||||||
pci_enable_pcie_error_reporting enables the device to send error
|
pci_enable_pcie_error_reporting enables the device to send error
|
||||||
messages to the root port when an error is detected. Note that devices
|
messages to the root port when an error is detected. Note that devices
|
||||||
don't enable error reporting by default, so device drivers need to
|
don't enable error reporting by default, so device drivers need to
|
||||||
call this function to enable it.
|
call this function to enable it.
|
||||||
|
|
||||||
3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
|
::
|
||||||
|
|
||||||
|
int pci_disable_pcie_error_reporting(struct pci_dev *dev);
|
||||||
|
|
||||||
pci_disable_pcie_error_reporting stops the device from sending error
|
pci_disable_pcie_error_reporting stops the device from sending error
|
||||||
messages to the root port when an error is detected.
|
messages to the root port when an error is detected.
|
||||||
|
|
||||||
3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
|
::
|
||||||
|
|
||||||
|
int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
|
||||||
|
|
||||||
pci_cleanup_aer_uncorrect_error_status clears the uncorrectable
|
pci_cleanup_aer_uncorrect_error_status clears the uncorrectable
|
||||||
error status register.
|
error status register.
|
||||||
|
|
||||||
3.4 Frequently Asked Questions
|
Frequently Asked Questions
|
||||||
|
--------------------------
|
||||||
|
|
||||||
Q: What happens if a PCI Express device driver does not provide an
|
Q:
|
||||||
error recovery handler (pci_driver->err_handler is equal to NULL)?
|
What happens if a PCI Express device driver does not provide an
|
||||||
|
error recovery handler (pci_driver->err_handler is equal to NULL)?
|
||||||
|
|
||||||
A: The devices bound to the driver won't be recovered. If the
|
A:
|
||||||
error is fatal, the kernel will print warning messages. Please refer
|
The devices bound to the driver won't be recovered. If the
|
||||||
to section 3 for more information.
|
error is fatal, the kernel will print warning messages. Please refer
|
||||||
|
to section 3 for more information.
|
||||||
|
|
||||||
Q: What happens if an upstream port service driver does not provide
|
Q:
|
||||||
callback reset_link?
|
What happens if an upstream port service driver does not provide
|
||||||
|
callback reset_link?
|
||||||
|
|
||||||
A: Fatal error recovery will fail if the errors are reported by the
|
A:
|
||||||
upstream ports to which the service driver is attached.
|
Fatal error recovery will fail if the errors are reported by the
|
||||||
|
upstream ports to which the service driver is attached.
|
||||||
|
|
||||||
Q: How does this infrastructure deal with a driver that is not PCI
|
Q:
|
||||||
Express aware?
|
How does this infrastructure deal with a driver that is not PCI
|
||||||
|
Express aware?
|
||||||
|
|
||||||
A: This infrastructure calls the error callback functions of the
|
A:
|
||||||
driver when an error happens. But if the driver is not aware of
|
This infrastructure calls the error callback functions of the
|
||||||
PCI Express, the device might not report its own errors to root
|
driver when an error happens. But if the driver is not aware of
|
||||||
port.
|
PCI Express, the device might not report its own errors to root
|
||||||
|
port.
|
||||||
|
|
||||||
Q: What modifications will that driver need to make it compatible
|
Q:
|
||||||
with the PCI Express AER Root driver?
|
What modifications will that driver need to make it compatible
|
||||||
|
with the PCI Express AER Root driver?
|
||||||
|
|
||||||
A: It can call the helper functions to enable AER in devices and
|
A:
|
||||||
clear the uncorrectable status register. Please refer to section 3.3.
|
It can call the helper functions to enable AER in devices and
|
||||||
|
clear the uncorrectable status register. Please refer to section 3.3.
|
||||||
|
|
||||||
|
|
||||||
4. Software error injection
|
Software error injection
|
||||||
|
========================
|
||||||
|
|
||||||
Debugging PCIe AER error recovery code is quite difficult because it
|
Debugging PCIe AER error recovery code is quite difficult because it
|
||||||
is hard to trigger real hardware errors. Software based error
|
is hard to trigger real hardware errors. Software based error
|
||||||
|
@ -261,6 +304,7 @@ After reboot with new kernel or insert the module, a device file named
|
||||||
|
|
||||||
Then, you need a user space tool named aer-inject, which can be obtained
|
Then, you need a user space tool named aer-inject, which can be obtained
|
||||||
from:
|
from:
|
||||||
|
|
||||||
https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
|
https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
|
||||||
|
|
||||||
More information about aer-inject can be found in the document comes
|
More information about aer-inject can be found in the document comes
|
|
@ -1,16 +1,23 @@
|
||||||
The PCI Express Port Bus Driver Guide HOWTO
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
Tom L Nguyen tom.l.nguyen@intel.com
|
.. include:: <isonum.txt>
|
||||||
11/03/2004
|
|
||||||
|
|
||||||
1. About this guide
|
===========================================
|
||||||
|
The PCI Express Port Bus Driver Guide HOWTO
|
||||||
|
===========================================
|
||||||
|
|
||||||
|
:Author: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004
|
||||||
|
:Copyright: |copy| 2004 Intel Corporation
|
||||||
|
|
||||||
|
About this guide
|
||||||
|
================
|
||||||
|
|
||||||
This guide describes the basics of the PCI Express Port Bus driver
|
This guide describes the basics of the PCI Express Port Bus driver
|
||||||
and provides information on how to enable the service drivers to
|
and provides information on how to enable the service drivers to
|
||||||
register/unregister with the PCI Express Port Bus Driver.
|
register/unregister with the PCI Express Port Bus Driver.
|
||||||
|
|
||||||
2. Copyright 2004 Intel Corporation
|
|
||||||
|
|
||||||
3. What is the PCI Express Port Bus Driver
|
What is the PCI Express Port Bus Driver
|
||||||
|
=======================================
|
||||||
|
|
||||||
A PCI Express Port is a logical PCI-PCI Bridge structure. There
|
A PCI Express Port is a logical PCI-PCI Bridge structure. There
|
||||||
are two types of PCI Express Port: the Root Port and the Switch
|
are two types of PCI Express Port: the Root Port and the Switch
|
||||||
|
@ -30,7 +37,8 @@ support (AER), and virtual channel support (VC). These services may
|
||||||
be handled by a single complex driver or be individually distributed
|
be handled by a single complex driver or be individually distributed
|
||||||
and handled by corresponding service drivers.
|
and handled by corresponding service drivers.
|
||||||
|
|
||||||
4. Why use the PCI Express Port Bus Driver?
|
Why use the PCI Express Port Bus Driver?
|
||||||
|
========================================
|
||||||
|
|
||||||
In existing Linux kernels, the Linux Device Driver Model allows a
|
In existing Linux kernels, the Linux Device Driver Model allows a
|
||||||
physical device to be handled by only a single driver. The PCI
|
physical device to be handled by only a single driver. The PCI
|
||||||
|
@ -51,28 +59,31 @@ PCI Express Ports and distributes all provided service requests
|
||||||
to the corresponding service drivers as required. Some key
|
to the corresponding service drivers as required. Some key
|
||||||
advantages of using the PCI Express Port Bus driver are listed below:
|
advantages of using the PCI Express Port Bus driver are listed below:
|
||||||
|
|
||||||
- Allow multiple service drivers to run simultaneously on
|
- Allow multiple service drivers to run simultaneously on
|
||||||
a PCI-PCI Bridge Port device.
|
a PCI-PCI Bridge Port device.
|
||||||
|
|
||||||
- Allow service drivers to be implemented in an independent
|
- Allow service drivers to be implemented in an independent
|
||||||
staged approach.
|
staged approach.
|
||||||
|
|
||||||
- Allow one service driver to run on multiple PCI-PCI Bridge
|
- Allow one service driver to run on multiple PCI-PCI Bridge
|
||||||
Port devices.
|
Port devices.
|
||||||
|
|
||||||
- Manage and distribute resources of a PCI-PCI Bridge Port
|
- Manage and distribute resources of a PCI-PCI Bridge Port
|
||||||
device to requested service drivers.
|
device to requested service drivers.
|
||||||
|
|
||||||
5. Configuring the PCI Express Port Bus Driver vs. Service Drivers
|
Configuring the PCI Express Port Bus Driver vs. Service Drivers
|
||||||
|
===============================================================
|
||||||
|
|
||||||
5.1 Including the PCI Express Port Bus Driver Support into the Kernel
|
Including the PCI Express Port Bus Driver Support into the Kernel
|
||||||
|
-----------------------------------------------------------------
|
||||||
|
|
||||||
Including the PCI Express Port Bus driver depends on whether the PCI
|
Including the PCI Express Port Bus driver depends on whether the PCI
|
||||||
Express support is included in the kernel config. The kernel will
|
Express support is included in the kernel config. The kernel will
|
||||||
automatically include the PCI Express Port Bus driver as a kernel
|
automatically include the PCI Express Port Bus driver as a kernel
|
||||||
driver when the PCI Express support is enabled in the kernel.
|
driver when the PCI Express support is enabled in the kernel.
|
||||||
|
|
||||||
5.2 Enabling Service Driver Support
|
Enabling Service Driver Support
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
PCI device drivers are implemented based on the Linux Device Driver Model.
|
PCI device drivers are implemented based on the Linux Device Driver Model.
|
||||||
All service drivers are PCI device drivers. As discussed above, it is
|
All service drivers are PCI device drivers. As discussed above, it is
|
||||||
|
@ -89,9 +100,11 @@ header file /include/linux/pcieport_if.h, before calling these APIs.
|
||||||
Failure to do so will result in an identity mismatch, which prevents
|
Failure to do so will result in an identity mismatch, which prevents
|
||||||
the PCI Express Port Bus driver from loading a service driver.
|
the PCI Express Port Bus driver from loading a service driver.
|
||||||
|
|
||||||
5.2.1 pcie_port_service_register
|
pcie_port_service_register
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
::
|
||||||
|
|
||||||
int pcie_port_service_register(struct pcie_port_service_driver *new)
|
int pcie_port_service_register(struct pcie_port_service_driver *new)
|
||||||
|
|
||||||
This API replaces the Linux Driver Model's pci_register_driver API. A
|
This API replaces the Linux Driver Model's pci_register_driver API. A
|
||||||
service driver should always call pcie_port_service_register at
|
service driver should always call pcie_port_service_register at
|
||||||
|
@ -99,69 +112,76 @@ module init. Note that after service driver being loaded, calls
|
||||||
such as pci_enable_device(dev) and pci_set_master(dev) are no longer
|
such as pci_enable_device(dev) and pci_set_master(dev) are no longer
|
||||||
necessary since these calls are executed by the PCI Port Bus driver.
|
necessary since these calls are executed by the PCI Port Bus driver.
|
||||||
|
|
||||||
5.2.2 pcie_port_service_unregister
|
pcie_port_service_unregister
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
::
|
||||||
|
|
||||||
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
|
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
|
||||||
|
|
||||||
pcie_port_service_unregister replaces the Linux Driver Model's
|
pcie_port_service_unregister replaces the Linux Driver Model's
|
||||||
pci_unregister_driver. It's always called by the service driver when a
|
pci_unregister_driver. It's always called by the service driver when a
|
||||||
module exits.
|
module exits.
|
||||||
|
|
||||||
5.2.3 Sample Code
|
Sample Code
|
||||||
|
~~~~~~~~~~~
|
||||||
|
|
||||||
Below is sample service driver code to initialize the port service
|
Below is sample service driver code to initialize the port service
|
||||||
driver data structure.
|
driver data structure.
|
||||||
|
::
|
||||||
|
|
||||||
static struct pcie_port_service_id service_id[] = { {
|
static struct pcie_port_service_id service_id[] = { {
|
||||||
.vendor = PCI_ANY_ID,
|
.vendor = PCI_ANY_ID,
|
||||||
.device = PCI_ANY_ID,
|
.device = PCI_ANY_ID,
|
||||||
.port_type = PCIE_RC_PORT,
|
.port_type = PCIE_RC_PORT,
|
||||||
.service_type = PCIE_PORT_SERVICE_AER,
|
.service_type = PCIE_PORT_SERVICE_AER,
|
||||||
}, { /* end: all zeroes */ }
|
}, { /* end: all zeroes */ }
|
||||||
};
|
};
|
||||||
|
|
||||||
static struct pcie_port_service_driver root_aerdrv = {
|
static struct pcie_port_service_driver root_aerdrv = {
|
||||||
.name = (char *)device_name,
|
.name = (char *)device_name,
|
||||||
.id_table = &service_id[0],
|
.id_table = &service_id[0],
|
||||||
|
|
||||||
.probe = aerdrv_load,
|
.probe = aerdrv_load,
|
||||||
.remove = aerdrv_unload,
|
.remove = aerdrv_unload,
|
||||||
|
|
||||||
.suspend = aerdrv_suspend,
|
.suspend = aerdrv_suspend,
|
||||||
.resume = aerdrv_resume,
|
.resume = aerdrv_resume,
|
||||||
};
|
};
|
||||||
|
|
||||||
Below is sample code for registering/unregistering a service
|
Below is sample code for registering/unregistering a service
|
||||||
driver.
|
driver.
|
||||||
|
::
|
||||||
|
|
||||||
static int __init aerdrv_service_init(void)
|
static int __init aerdrv_service_init(void)
|
||||||
{
|
{
|
||||||
int retval = 0;
|
int retval = 0;
|
||||||
|
|
||||||
retval = pcie_port_service_register(&root_aerdrv);
|
retval = pcie_port_service_register(&root_aerdrv);
|
||||||
if (!retval) {
|
if (!retval) {
|
||||||
/*
|
/*
|
||||||
* FIX ME
|
* FIX ME
|
||||||
*/
|
*/
|
||||||
}
|
}
|
||||||
return retval;
|
return retval;
|
||||||
}
|
}
|
||||||
|
|
||||||
static void __exit aerdrv_service_exit(void)
|
static void __exit aerdrv_service_exit(void)
|
||||||
{
|
{
|
||||||
pcie_port_service_unregister(&root_aerdrv);
|
pcie_port_service_unregister(&root_aerdrv);
|
||||||
}
|
}
|
||||||
|
|
||||||
module_init(aerdrv_service_init);
|
module_init(aerdrv_service_init);
|
||||||
module_exit(aerdrv_service_exit);
|
module_exit(aerdrv_service_exit);
|
||||||
|
|
||||||
6. Possible Resource Conflicts
|
Possible Resource Conflicts
|
||||||
|
===========================
|
||||||
|
|
||||||
Since all service drivers of a PCI-PCI Bridge Port device are
|
Since all service drivers of a PCI-PCI Bridge Port device are
|
||||||
allowed to run simultaneously, a few possible resource conflicts are
|
allowed to run simultaneously, a few possible resource conflicts are
|
||||||
listed below with proposed solutions.
|
listed below with proposed solutions.
|
||||||
|
|
||||||
6.1 MSI and MSI-X Vector Resource
|
MSI and MSI-X Vector Resource
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
Once MSI or MSI-X interrupts are enabled on a device, it stays in this
|
Once MSI or MSI-X interrupts are enabled on a device, it stays in this
|
||||||
mode until they are disabled again. Since service drivers of the same
|
mode until they are disabled again. Since service drivers of the same
|
||||||
|
@ -179,7 +199,8 @@ driver. Service drivers should use (struct pcie_device*)dev->irq to
|
||||||
call request_irq/free_irq. In addition, the interrupt mode is stored
|
call request_irq/free_irq. In addition, the interrupt mode is stored
|
||||||
in the field interrupt_mode of struct pcie_device.
|
in the field interrupt_mode of struct pcie_device.
|
||||||
|
|
||||||
6.3 PCI Memory/IO Mapped Regions
|
PCI Memory/IO Mapped Regions
|
||||||
|
----------------------------
|
||||||
|
|
||||||
Service drivers for PCI Express Power Management (PME), Advanced
|
Service drivers for PCI Express Power Management (PME), Advanced
|
||||||
Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
|
Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
|
||||||
|
@ -188,7 +209,8 @@ registers accessed are independent of each other. This patch assumes
|
||||||
that all service drivers will be well behaved and not overwrite
|
that all service drivers will be well behaved and not overwrite
|
||||||
other service drivers' configuration settings.
|
other service drivers' configuration settings.
|
||||||
|
|
||||||
6.4 PCI Config Registers
|
PCI Config Registers
|
||||||
|
--------------------
|
||||||
|
|
||||||
Each service driver runs its PCI config operations on its own
|
Each service driver runs its PCI config operations on its own
|
||||||
capability structure except the PCI Express capability structure, in
|
capability structure except the PCI Express capability structure, in
|
|
@ -13,7 +13,7 @@
|
||||||
For ARM64, ONLY "acpi=off", "acpi=on" or "acpi=force"
|
For ARM64, ONLY "acpi=off", "acpi=on" or "acpi=force"
|
||||||
are available
|
are available
|
||||||
|
|
||||||
See also Documentation/power/runtime_pm.txt, pci=noacpi
|
See also Documentation/power/runtime_pm.rst, pci=noacpi
|
||||||
|
|
||||||
acpi_apic_instance= [ACPI, IOAPIC]
|
acpi_apic_instance= [ACPI, IOAPIC]
|
||||||
Format: <int>
|
Format: <int>
|
||||||
|
@ -223,7 +223,7 @@
|
||||||
acpi_sleep= [HW,ACPI] Sleep options
|
acpi_sleep= [HW,ACPI] Sleep options
|
||||||
Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
|
Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
|
||||||
old_ordering, nonvs, sci_force_enable, nobl }
|
old_ordering, nonvs, sci_force_enable, nobl }
|
||||||
See Documentation/power/video.txt for information on
|
See Documentation/power/video.rst for information on
|
||||||
s3_bios and s3_mode.
|
s3_bios and s3_mode.
|
||||||
s3_beep is for debugging; it makes the PC's speaker beep
|
s3_beep is for debugging; it makes the PC's speaker beep
|
||||||
as soon as the kernel's real-mode entry point is called.
|
as soon as the kernel's real-mode entry point is called.
|
||||||
|
@ -4119,7 +4119,7 @@
|
||||||
Specify the offset from the beginning of the partition
|
Specify the offset from the beginning of the partition
|
||||||
given by "resume=" at which the swap header is located,
|
given by "resume=" at which the swap header is located,
|
||||||
in <PAGE_SIZE> units (needed only for swap files).
|
in <PAGE_SIZE> units (needed only for swap files).
|
||||||
See Documentation/power/swsusp-and-swap-files.txt
|
See Documentation/power/swsusp-and-swap-files.rst
|
||||||
|
|
||||||
resumedelay= [HIBERNATION] Delay (in seconds) to pause before attempting to
|
resumedelay= [HIBERNATION] Delay (in seconds) to pause before attempting to
|
||||||
read the resume files
|
read the resume files
|
||||||
|
|
|
@ -95,7 +95,7 @@ flags - flags of the cpufreq driver
|
||||||
|
|
||||||
3. CPUFreq Table Generation with Operating Performance Point (OPP)
|
3. CPUFreq Table Generation with Operating Performance Point (OPP)
|
||||||
==================================================================
|
==================================================================
|
||||||
For details about OPP, see Documentation/power/opp.txt
|
For details about OPP, see Documentation/power/opp.rst
|
||||||
|
|
||||||
dev_pm_opp_init_cpufreq_table -
|
dev_pm_opp_init_cpufreq_table -
|
||||||
This function provides a ready to use conversion routine to translate
|
This function provides a ready to use conversion routine to translate
|
||||||
|
|
|
@ -10,8 +10,10 @@ Required properties:
|
||||||
interrupt source. The value must be 1.
|
interrupt source. The value must be 1.
|
||||||
- compatible: Should contain "mbvl,gpex40-pcie"
|
- compatible: Should contain "mbvl,gpex40-pcie"
|
||||||
- reg: Should contain PCIe registers location and length
|
- reg: Should contain PCIe registers location and length
|
||||||
|
Mandatory:
|
||||||
"config_axi_slave": PCIe controller registers
|
"config_axi_slave": PCIe controller registers
|
||||||
"csr_axi_slave" : Bridge config registers
|
"csr_axi_slave" : Bridge config registers
|
||||||
|
Optional:
|
||||||
"gpio_slave" : GPIO registers to control slot power
|
"gpio_slave" : GPIO registers to control slot power
|
||||||
"apb_csr" : MSI registers
|
"apb_csr" : MSI registers
|
||||||
|
|
||||||
|
|
|
@ -65,6 +65,14 @@ Required properties:
|
||||||
- afi
|
- afi
|
||||||
- pcie_x
|
- pcie_x
|
||||||
|
|
||||||
|
Optional properties:
|
||||||
|
- pinctrl-names: A list of pinctrl state names. Must contain the following
|
||||||
|
entries:
|
||||||
|
- "default": active state, puts PCIe I/O out of deep power down state
|
||||||
|
- "idle": puts PCIe I/O into deep power down state
|
||||||
|
- pinctrl-0: phandle for the default/active state of pin configurations.
|
||||||
|
- pinctrl-1: phandle for the idle state of pin configurations.
|
||||||
|
|
||||||
Required properties on Tegra124 and later (deprecated):
|
Required properties on Tegra124 and later (deprecated):
|
||||||
- phys: Must contain an entry for each entry in phy-names.
|
- phys: Must contain an entry for each entry in phy-names.
|
||||||
- phy-names: Must include the following entries:
|
- phy-names: Must include the following entries:
|
||||||
|
|
|
@ -24,6 +24,9 @@ driver implementation may support the following properties:
|
||||||
unsupported link speed, for instance, trying to do training for
|
unsupported link speed, for instance, trying to do training for
|
||||||
unsupported link speed, etc. Must be '4' for gen4, '3' for gen3, '2'
|
unsupported link speed, etc. Must be '4' for gen4, '3' for gen3, '2'
|
||||||
for gen2, and '1' for gen1. Any other values are invalid.
|
for gen2, and '1' for gen1. Any other values are invalid.
|
||||||
|
- reset-gpios:
|
||||||
|
If present, this property specifies the PERST# GPIO. Host drivers can parse the
|
||||||
|
GPIO and apply fundamental reset to endpoints.
|
||||||
|
|
||||||
PCI-PCI Bridge properties
|
PCI-PCI Bridge properties
|
||||||
-------------------------
|
-------------------------
|
||||||
|
|
|
@ -10,6 +10,7 @@
|
||||||
- "qcom,pcie-msm8996" for msm8996 or apq8096
|
- "qcom,pcie-msm8996" for msm8996 or apq8096
|
||||||
- "qcom,pcie-ipq4019" for ipq4019
|
- "qcom,pcie-ipq4019" for ipq4019
|
||||||
- "qcom,pcie-ipq8074" for ipq8074
|
- "qcom,pcie-ipq8074" for ipq8074
|
||||||
|
- "qcom,pcie-qcs404" for qcs404
|
||||||
|
|
||||||
- reg:
|
- reg:
|
||||||
Usage: required
|
Usage: required
|
||||||
|
@ -116,6 +117,15 @@
|
||||||
- "ahb" AHB clock
|
- "ahb" AHB clock
|
||||||
- "aux" Auxiliary clock
|
- "aux" Auxiliary clock
|
||||||
|
|
||||||
|
- clock-names:
|
||||||
|
Usage: required for qcs404
|
||||||
|
Value type: <stringlist>
|
||||||
|
Definition: Should contain the following entries
|
||||||
|
- "iface" AHB clock
|
||||||
|
- "aux" Auxiliary clock
|
||||||
|
- "master_bus" AXI Master clock
|
||||||
|
- "slave_bus" AXI Slave clock
|
||||||
|
|
||||||
- resets:
|
- resets:
|
||||||
Usage: required
|
Usage: required
|
||||||
Value type: <prop-encoded-array>
|
Value type: <prop-encoded-array>
|
||||||
|
@ -167,6 +177,17 @@
|
||||||
- "ahb" AHB Reset
|
- "ahb" AHB Reset
|
||||||
- "axi_m_sticky" AXI Master Sticky reset
|
- "axi_m_sticky" AXI Master Sticky reset
|
||||||
|
|
||||||
|
- reset-names:
|
||||||
|
Usage: required for qcs404
|
||||||
|
Value type: <stringlist>
|
||||||
|
Definition: Should contain the following entries
|
||||||
|
- "axi_m" AXI Master reset
|
||||||
|
- "axi_s" AXI Slave reset
|
||||||
|
- "axi_m_sticky" AXI Master Sticky reset
|
||||||
|
- "pipe_sticky" PIPE sticky reset
|
||||||
|
- "pwr" PWR reset
|
||||||
|
- "ahb" AHB reset
|
||||||
|
|
||||||
- power-domains:
|
- power-domains:
|
||||||
Usage: required for apq8084 and msm8996/apq8096
|
Usage: required for apq8084 and msm8996/apq8096
|
||||||
Value type: <prop-encoded-array>
|
Value type: <prop-encoded-array>
|
||||||
|
@@ -195,12 +216,12 @@
|
||||||
Definition: A phandle to the PCIe endpoint power supply
|
Definition: A phandle to the PCIe endpoint power supply
|
||||||
|
|
||||||
- phys:
|
- phys:
|
||||||
Usage: required for apq8084
|
Usage: required for apq8084 and qcs404
|
||||||
Value type: <phandle>
|
Value type: <phandle>
|
||||||
Definition: List of phandle(s) as listed in phy-names property
|
Definition: List of phandle(s) as listed in phy-names property
|
||||||
|
|
||||||
- phy-names:
|
- phy-names:
|
||||||
Usage: required for apq8084
|
Usage: required for apq8084 and qcs404
|
||||||
Value type: <stringlist>
|
Value type: <stringlist>
|
||||||
Definition: Should contain "pciephy"
|
Definition: Should contain "pciephy"
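The qcs404 requirements added in this hunk can be checked mechanically. The following is a minimal Python sketch, not part of the binding or of any kernel tooling. The function `missing_qcs404_entries` and the dict-based node representation are hypothetical; only the entry names are taken from the binding text above.

```python
# Entries the binding text lists as required for "qcom,pcie-qcs404".
QCS404_REQUIRED = {
    "clock-names": ["iface", "aux", "master_bus", "slave_bus"],
    "reset-names": ["axi_m", "axi_s", "axi_m_sticky", "pipe_sticky", "pwr", "ahb"],
    "phy-names": ["pciephy"],
}


def missing_qcs404_entries(node):
    """Return {property: [missing entries]} for a node given as a dict.

    `node` maps property names to lists of strings. This representation is
    an assumption made for illustration, not a real device-tree API.
    """
    missing = {}
    for prop, required in QCS404_REQUIRED.items():
        present = node.get(prop, [])
        absent = [name for name in required if name not in present]
        if absent:
            missing[prop] = absent
    return missing
```

An empty result means that every entry named in the binding is present. The sketch does not check entry order or the matching `clocks`, `resets` and `phys` phandles.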
|
||||||
|
|
||||||
|
|
|
@@ -3,6 +3,7 @@
|
||||||
Required properties:
|
Required properties:
|
||||||
compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC;
|
compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC;
|
||||||
"renesas,pcie-r8a7744" for the R8A7744 SoC;
|
"renesas,pcie-r8a7744" for the R8A7744 SoC;
|
||||||
|
"renesas,pcie-r8a774a1" for the R8A774A1 SoC;
|
||||||
"renesas,pcie-r8a774c0" for the R8A774C0 SoC;
|
"renesas,pcie-r8a774c0" for the R8A774C0 SoC;
|
||||||
"renesas,pcie-r8a7779" for the R8A7779 SoC;
|
"renesas,pcie-r8a7779" for the R8A7779 SoC;
|
||||||
"renesas,pcie-r8a7790" for the R8A7790 SoC;
|
"renesas,pcie-r8a7790" for the R8A7790 SoC;
|
||||||
|
|
|
@@ -225,7 +225,7 @@ system-wide transition to a sleep state even though its :c:member:`runtime_auto`
|
||||||
flag is clear.
|
flag is clear.
|
||||||
|
|
||||||
For more information about the runtime power management framework, refer to
|
For more information about the runtime power management framework, refer to
|
||||||
:file:`Documentation/power/runtime_pm.txt`.
|
:file:`Documentation/power/runtime_pm.rst`.
|
||||||
|
|
||||||
|
|
||||||
Calling Drivers to Enter and Leave System Sleep States
|
Calling Drivers to Enter and Leave System Sleep States
|
||||||
|
@@ -728,7 +728,7 @@ it into account in any way.
|
||||||
|
|
||||||
Devices may be defined as IRQ-safe which indicates to the PM core that their
|
Devices may be defined as IRQ-safe which indicates to the PM core that their
|
||||||
runtime PM callbacks may be invoked with disabled interrupts (see
|
runtime PM callbacks may be invoked with disabled interrupts (see
|
||||||
:file:`Documentation/power/runtime_pm.txt` for more information). If an
|
:file:`Documentation/power/runtime_pm.rst` for more information). If an
|
||||||
IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be
|
IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be
|
||||||
disallowed, unless the domain itself is defined as IRQ-safe. However, it
|
disallowed, unless the domain itself is defined as IRQ-safe. However, it
|
||||||
makes sense to define a PM domain as IRQ-safe only if all the devices in it
|
makes sense to define a PM domain as IRQ-safe only if all the devices in it
|
||||||
|
@@ -795,7 +795,7 @@ so on) and the final state of the device must reflect the "active" runtime PM
|
||||||
status in that case.
|
status in that case.
|
||||||
|
|
||||||
During system-wide resume from a sleep state it's easiest to put devices into
|
During system-wide resume from a sleep state it's easiest to put devices into
|
||||||
the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
|
the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
|
||||||
[Refer to that document for more information regarding this particular issue as
|
[Refer to that document for more information regarding this particular issue as
|
||||||
well as for information on the device runtime power management framework in
|
well as for information on the device runtime power management framework in
|
||||||
general.]
|
general.]
|
||||||
|
|
|
@@ -46,7 +46,7 @@ device is turned off while the system as a whole remains running, we
|
||||||
call it a "dynamic suspend" (also known as a "runtime suspend" or
|
call it a "dynamic suspend" (also known as a "runtime suspend" or
|
||||||
"selective suspend"). This document concentrates mostly on how
|
"selective suspend"). This document concentrates mostly on how
|
||||||
dynamic PM is implemented in the USB subsystem, although system PM is
|
dynamic PM is implemented in the USB subsystem, although system PM is
|
||||||
covered to some extent (see ``Documentation/power/*.txt`` for more
|
covered to some extent (see ``Documentation/power/*.rst`` for more
|
||||||
information about system PM).
|
information about system PM).
|
||||||
|
|
||||||
System PM support is present only if the kernel was built with
|
System PM support is present only if the kernel was built with
|
||||||
|
|
|
@@ -103,6 +103,7 @@ needed).
|
||||||
vm/index
|
vm/index
|
||||||
bpf/index
|
bpf/index
|
||||||
usb/index
|
usb/index
|
||||||
|
PCI/index
|
||||||
misc-devices/index
|
misc-devices/index
|
||||||
|
|
||||||
Architecture-specific documentation
|
Architecture-specific documentation
|
||||||
|
|
|
@@ -1,5 +1,7 @@
|
||||||
|
============
|
||||||
APM or ACPI?
|
APM or ACPI?
|
||||||
------------
|
============
|
||||||
|
|
||||||
If you have a relatively recent x86 mobile, desktop, or server system,
|
If you have a relatively recent x86 mobile, desktop, or server system,
|
||||||
odds are it supports either Advanced Power Management (APM) or
|
odds are it supports either Advanced Power Management (APM) or
|
||||||
Advanced Configuration and Power Interface (ACPI). ACPI is the newer
|
Advanced Configuration and Power Interface (ACPI). ACPI is the newer
|
||||||
|
@@ -28,5 +30,7 @@ and be sure that they are started sometime in the system boot process.
|
||||||
Go ahead and start both. If ACPI or APM is not available on your
|
Go ahead and start both. If ACPI or APM is not available on your
|
||||||
system the associated daemon will exit gracefully.
|
system the associated daemon will exit gracefully.
|
||||||
|
|
||||||
apmd: http://ftp.debian.org/pool/main/a/apmd/
|
===== =======================================
|
||||||
acpid: http://acpid.sf.net/
|
apmd http://ftp.debian.org/pool/main/a/apmd/
|
||||||
|
acpid http://acpid.sf.net/
|
||||||
|
===== =======================================
|
|
@@ -1,12 +1,16 @@
|
||||||
|
=================================
|
||||||
Debugging hibernation and suspend
|
Debugging hibernation and suspend
|
||||||
|
=================================
|
||||||
|
|
||||||
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
||||||
|
|
||||||
1. Testing hibernation (aka suspend to disk or STD)
|
1. Testing hibernation (aka suspend to disk or STD)
|
||||||
|
===================================================
|
||||||
|
|
||||||
To check if hibernation works, you can try to hibernate in the "reboot" mode:
|
To check if hibernation works, you can try to hibernate in the "reboot" mode::
|
||||||
|
|
||||||
# echo reboot > /sys/power/disk
|
# echo reboot > /sys/power/disk
|
||||||
# echo disk > /sys/power/state
|
# echo disk > /sys/power/state
|
||||||
|
|
||||||
and the system should create a hibernation image, reboot, resume and get back to
|
and the system should create a hibernation image, reboot, resume and get back to
|
||||||
the command prompt where you have started the transition. If that happens,
|
the command prompt where you have started the transition. If that happens,
|
||||||
|
@@ -15,20 +19,21 @@ test at least a couple of times in a row for confidence. [This is necessary,
|
||||||
because some problems only show up on a second attempt at suspending and
|
because some problems only show up on a second attempt at suspending and
|
||||||
resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
|
resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
|
||||||
modes causes the PM core to skip some platform-related callbacks which on ACPI
|
modes causes the PM core to skip some platform-related callbacks which on ACPI
|
||||||
systems might be necessary to make hibernation work. Thus, if your machine fails
|
systems might be necessary to make hibernation work. Thus, if your machine
|
||||||
to hibernate or resume in the "reboot" mode, you should try the "platform" mode:
|
fails to hibernate or resume in the "reboot" mode, you should try the
|
||||||
|
"platform" mode::
|
||||||
|
|
||||||
# echo platform > /sys/power/disk
|
# echo platform > /sys/power/disk
|
||||||
# echo disk > /sys/power/state
|
# echo disk > /sys/power/state
|
||||||
|
|
||||||
which is the default and recommended mode of hibernation.
|
which is the default and recommended mode of hibernation.
|
||||||
|
|
||||||
Unfortunately, the "platform" mode of hibernation does not work on some systems
|
Unfortunately, the "platform" mode of hibernation does not work on some systems
|
||||||
with broken BIOSes. In such cases the "shutdown" mode of hibernation might
|
with broken BIOSes. In such cases the "shutdown" mode of hibernation might
|
||||||
work:
|
work::
|
||||||
|
|
||||||
# echo shutdown > /sys/power/disk
|
# echo shutdown > /sys/power/disk
|
||||||
# echo disk > /sys/power/state
|
# echo disk > /sys/power/state
|
||||||
|
|
||||||
(it is similar to the "reboot" mode, but it requires you to press the power
|
(it is similar to the "reboot" mode, but it requires you to press the power
|
||||||
button to make the system resume).
|
button to make the system resume).
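The command pairs above all follow the same pattern: write a mode to `disk`, then write `disk` to `state`. A minimal Python sketch of that pattern follows. The helper name `request_hibernation` and the `power_dir` parameter are assumptions introduced here so the logic can be tried against a scratch directory; they are not part of the kernel interface. On a real system the default path requires root and will actually hibernate the machine.

```python
import os

# Only the modes discussed in this section are accepted by this sketch.
HIBERNATION_MODES = ("platform", "shutdown", "reboot")


def request_hibernation(mode, power_dir="/sys/power"):
    """Select a hibernation mode, then trigger hibernation.

    Mirrors `echo <mode> > disk` followed by `echo disk > state`.
    """
    if mode not in HIBERNATION_MODES:
        raise ValueError(f"unknown hibernation mode: {mode!r}")
    # Step 1: tell the kernel what to do after the image is created.
    with open(os.path.join(power_dir, "disk"), "w") as f:
        f.write(mode + "\n")
    # Step 2: start the transition.
    with open(os.path.join(power_dir, "state"), "w") as f:
        f.write("disk\n")
```

Pointing `power_dir` at a temporary directory allows a dry run that only records what would have been written.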
|
||||||
|
@@ -37,6 +42,7 @@ If neither "platform" nor "shutdown" hibernation mode works, you will need to
|
||||||
identify what goes wrong.
|
identify what goes wrong.
|
||||||
|
|
||||||
a) Test modes of hibernation
|
a) Test modes of hibernation
|
||||||
|
----------------------------
|
||||||
|
|
||||||
To find out why hibernation fails on your system, you can use a special testing
|
To find out why hibernation fails on your system, you can use a special testing
|
||||||
facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then,
|
facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then,
|
||||||
|
@@ -44,36 +50,38 @@ there is the file /sys/power/pm_test that can be used to make the hibernation
|
||||||
core run in a test mode. There are 5 test modes available:
|
core run in a test mode. There are 5 test modes available:
|
||||||
|
|
||||||
freezer
|
freezer
|
||||||
- test the freezing of processes
|
- test the freezing of processes
|
||||||
|
|
||||||
devices
|
devices
|
||||||
- test the freezing of processes and suspending of devices
|
- test the freezing of processes and suspending of devices
|
||||||
|
|
||||||
platform
|
platform
|
||||||
- test the freezing of processes, suspending of devices and platform
|
- test the freezing of processes, suspending of devices and platform
|
||||||
global control methods(*)
|
global control methods [1]_
|
||||||
|
|
||||||
processors
|
processors
|
||||||
- test the freezing of processes, suspending of devices, platform
|
- test the freezing of processes, suspending of devices, platform
|
||||||
global control methods(*) and the disabling of nonboot CPUs
|
global control methods [1]_ and the disabling of nonboot CPUs
|
||||||
|
|
||||||
core
|
core
|
||||||
- test the freezing of processes, suspending of devices, platform global
|
- test the freezing of processes, suspending of devices, platform global
|
||||||
control methods(*), the disabling of nonboot CPUs and suspending of
|
control methods\ [1]_, the disabling of nonboot CPUs and suspending
|
||||||
platform/system devices
|
of platform/system devices
|
||||||
|
|
||||||
(*) the platform global control methods are only available on ACPI systems
|
.. [1]
|
||||||
|
|
||||||
|
the platform global control methods are only available on ACPI systems
|
||||||
and are only tested if the hibernation mode is set to "platform"
|
and are only tested if the hibernation mode is set to "platform"
|
||||||
|
|
||||||
To use one of them it is necessary to write the corresponding string to
|
To use one of them it is necessary to write the corresponding string to
|
||||||
/sys/power/pm_test (eg. "devices" to test the freezing of processes and
|
/sys/power/pm_test (eg. "devices" to test the freezing of processes and
|
||||||
suspending devices) and issue the standard hibernation commands. For example,
|
suspending devices) and issue the standard hibernation commands. For example,
|
||||||
to use the "devices" test mode along with the "platform" mode of hibernation,
|
to use the "devices" test mode along with the "platform" mode of hibernation,
|
||||||
you should do the following:
|
you should do the following::
|
||||||
|
|
||||||
# echo devices > /sys/power/pm_test
|
# echo devices > /sys/power/pm_test
|
||||||
# echo platform > /sys/power/disk
|
# echo platform > /sys/power/disk
|
||||||
# echo disk > /sys/power/state
|
# echo disk > /sys/power/state
|
||||||
|
|
||||||
Then, the kernel will try to freeze processes, suspend devices, wait a few
|
Then, the kernel will try to freeze processes, suspend devices, wait a few
|
||||||
seconds (5 by default, but configurable by the suspend.pm_test_delay module
|
seconds (5 by default, but configurable by the suspend.pm_test_delay module
|
||||||
|
@@ -108,11 +116,12 @@ If the "devices" test fails, most likely there is a driver that cannot suspend
|
||||||
or resume its device (in the latter case the system may hang or become unstable
|
or resume its device (in the latter case the system may hang or become unstable
|
||||||
after the test, so please take that into consideration). To find this driver,
|
after the test, so please take that into consideration). To find this driver,
|
||||||
you can carry out a binary search according to the rules:
|
you can carry out a binary search according to the rules:
|
||||||
|
|
||||||
- if the test fails, unload a half of the drivers currently loaded and repeat
|
- if the test fails, unload a half of the drivers currently loaded and repeat
|
||||||
(that would probably involve rebooting the system, so always note what drivers
|
(that would probably involve rebooting the system, so always note what drivers
|
||||||
have been loaded before the test),
|
have been loaded before the test),
|
||||||
- if the test succeeds, load a half of the drivers you have unloaded most
|
- if the test succeeds, load a half of the drivers you have unloaded most
|
||||||
recently and repeat.
|
recently and repeat.
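The two rules above amount to a bisection over the set of loaded drivers. The following Python sketch illustrates the bookkeeping only. It assumes exactly one failing driver and a repeatable test; `find_failing_driver` and the `test_passes` callback are hypothetical names, and in practice each "test" is a manual pm_test run, possibly after a reboot.

```python
def find_failing_driver(drivers, test_passes):
    """Bisect `drivers` for the single one that breaks the "devices" test.

    `test_passes(loaded)` must return True when the test succeeds with
    exactly the drivers in `loaded` present.
    """
    if not drivers:
        raise ValueError("no drivers to search")
    known_good = []           # drivers already shown to be harmless
    suspects = list(drivers)  # drivers that may still contain the culprit
    while len(suspects) > 1:
        half = len(suspects) // 2
        loaded, unloaded = suspects[:half], suspects[half:]
        if test_passes(known_good + loaded):
            # Test succeeded: the culprit is among the unloaded drivers,
            # so half of those get loaded back on the next round.
            known_good += loaded
            suspects = unloaded
        else:
            # Test failed: the culprit is still loaded, so half of the
            # loaded suspects get unloaded on the next round.
            suspects = loaded
    return suspects[0]
```

With more than one failing driver the search still ends on one of them, after which it can be repeated with that driver left unloaded.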
|
||||||
|
|
||||||
Once you have found the failing driver (there can be more than just one of
|
Once you have found the failing driver (there can be more than just one of
|
||||||
them), you have to unload it every time before hibernation. In that case please
|
them), you have to unload it every time before hibernation. In that case please
|
||||||
|
@@ -146,6 +155,7 @@ indicates a serious problem that very well may be related to the hardware, but
|
||||||
please report it anyway.
|
please report it anyway.
|
||||||
|
|
||||||
b) Testing minimal configuration
|
b) Testing minimal configuration
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
If all of the hibernation test modes work, you can boot the system with the
|
If all of the hibernation test modes work, you can boot the system with the
|
||||||
"init=/bin/bash" command line parameter and attempt to hibernate in the
|
"init=/bin/bash" command line parameter and attempt to hibernate in the
|
||||||
|
@@ -165,14 +175,15 @@ Again, if you find the offending module(s), it(they) must be unloaded every time
|
||||||
before hibernation, and please report the problem with it(them).
|
before hibernation, and please report the problem with it(them).
|
||||||
|
|
||||||
c) Using the "test_resume" hibernation option
|
c) Using the "test_resume" hibernation option
|
||||||
|
---------------------------------------------
|
||||||
|
|
||||||
/sys/power/disk generally tells the kernel what to do after creating a
|
/sys/power/disk generally tells the kernel what to do after creating a
|
||||||
hibernation image. One of the available options is "test_resume" which
|
hibernation image. One of the available options is "test_resume" which
|
||||||
causes the just created image to be used for immediate restoration. Namely,
|
causes the just created image to be used for immediate restoration. Namely,
|
||||||
after doing:
|
after doing::
|
||||||
|
|
||||||
# echo test_resume > /sys/power/disk
|
# echo test_resume > /sys/power/disk
|
||||||
# echo disk > /sys/power/state
|
# echo disk > /sys/power/state
|
||||||
|
|
||||||
a hibernation image will be created and a resume from it will be triggered
|
a hibernation image will be created and a resume from it will be triggered
|
||||||
immediately without involving the platform firmware in any way.
|
immediately without involving the platform firmware in any way.
|
||||||
|
@@ -190,6 +201,7 @@ to resume may be related to the differences between the restore and image
|
||||||
kernels.
|
kernels.
|
||||||
|
|
||||||
d) Advanced debugging
|
d) Advanced debugging
|
||||||
|
---------------------
|
||||||
|
|
||||||
In case that hibernation does not work on your system even in the minimal
|
In case that hibernation does not work on your system even in the minimal
|
||||||
configuration and compiling more drivers as modules is not practical or some
|
configuration and compiling more drivers as modules is not practical or some
|
||||||
|
@@ -200,9 +212,10 @@ kernel messages using the serial console. This may provide you with some
|
||||||
information about the reasons of the suspend (resume) failure. Alternatively,
|
information about the reasons of the suspend (resume) failure. Alternatively,
|
||||||
it may be possible to use a FireWire port for debugging with firescope
|
it may be possible to use a FireWire port for debugging with firescope
|
||||||
(http://v3.sk/~lkundrak/firescope/). On x86 it is also possible to
|
(http://v3.sk/~lkundrak/firescope/). On x86 it is also possible to
|
||||||
use the PM_TRACE mechanism documented in Documentation/power/s2ram.txt .
|
use the PM_TRACE mechanism documented in Documentation/power/s2ram.rst .
|
||||||
|
|
||||||
2. Testing suspend to RAM (STR)
|
2. Testing suspend to RAM (STR)
|
||||||
|
===============================
|
||||||
|
|
||||||
To verify that the STR works, it is generally more convenient to use the s2ram
|
To verify that the STR works, it is generally more convenient to use the s2ram
|
||||||
tool available from http://suspend.sf.net and documented at
|
tool available from http://suspend.sf.net and documented at
|
||||||
|
@@ -230,7 +243,8 @@ you will have to unload them every time before an STR transition (ie. before
|
||||||
you run s2ram), and please report the problems with them.
|
you run s2ram), and please report the problems with them.
|
||||||
|
|
||||||
There is a debugfs entry which shows the suspend to RAM statistics. Here is an
|
There is a debugfs entry which shows the suspend to RAM statistics. Here is an
|
||||||
example of its output.
|
example of its output::
|
||||||
|
|
||||||
# mount -t debugfs none /sys/kernel/debug
|
# mount -t debugfs none /sys/kernel/debug
|
||||||
# cat /sys/kernel/debug/suspend_stats
|
# cat /sys/kernel/debug/suspend_stats
|
||||||
success: 20
|
success: 20
|
||||||
|
@@ -248,6 +262,7 @@ example of its output.
|
||||||
-16
|
-16
|
||||||
last_failed_step: suspend
|
last_failed_step: suspend
|
||||||
suspend
|
suspend
|
||||||
|
|
||||||
The success field counts successful suspend-to-RAM transitions, and the fail field counts
|
The success field counts successful suspend-to-RAM transitions, and the fail field counts
|
||||||
failed ones. The remaining fields count failures in the individual steps of suspend
|
failed ones. The remaining fields count failures in the individual steps of suspend
|
||||||
to RAM. suspend_stats just lists the last 2 failed devices, error number and
|
to RAM. suspend_stats just lists the last 2 failed devices, error number and
|
|
@@ -1,4 +1,7 @@
|
||||||
|
===============
|
||||||
Charger Manager
|
Charger Manager
|
||||||
|
===============
|
||||||
|
|
||||||
(C) 2011 MyungJoo Ham <myungjoo.ham@samsung.com>, GPL
|
(C) 2011 MyungJoo Ham <myungjoo.ham@samsung.com>, GPL
|
||||||
|
|
||||||
Charger Manager provides in-kernel battery charger management that
|
Charger Manager provides in-kernel battery charger management that
|
||||||
|
@@ -55,41 +58,39 @@ Charger Manager supports the following:
|
||||||
notification to users with UEVENT.
|
notification to users with UEVENT.
|
||||||
|
|
||||||
2. Global Charger-Manager Data related with suspend_again
|
2. Global Charger-Manager Data related with suspend_again
|
||||||
========================================================
|
=========================================================
|
||||||
In order to setup Charger Manager with suspend-again feature
|
In order to setup Charger Manager with suspend-again feature
|
||||||
(in-suspend monitoring), the user should provide charger_global_desc
|
(in-suspend monitoring), the user should provide charger_global_desc
|
||||||
with setup_charger_manager(struct charger_global_desc *).
|
with setup_charger_manager(`struct charger_global_desc *`).
|
||||||
This charger_global_desc data for in-suspend monitoring is global
|
This charger_global_desc data for in-suspend monitoring is global
|
||||||
as the name suggests. Thus, the user needs to provide only once even
|
as the name suggests. Thus, the user needs to provide only once even
|
||||||
if there are multiple batteries. If there are multiple batteries, the
|
if there are multiple batteries. If there are multiple batteries, the
|
||||||
multiple instances of Charger Manager share the same charger_global_desc
|
multiple instances of Charger Manager share the same charger_global_desc
|
||||||
and it will manage in-suspend monitoring for all instances of Charger Manager.
|
and it will manage in-suspend monitoring for all instances of Charger Manager.
|
||||||
|
|
||||||
The user needs to provide all the three entries properly in order to activate
|
The user needs to provide all the three entries to `struct charger_global_desc`
|
||||||
in-suspend monitoring:
|
properly in order to activate in-suspend monitoring:
|
||||||
|
|
||||||
struct charger_global_desc {
|
`char *rtc_name;`
|
||||||
|
The name of rtc (e.g., "rtc0") used to wakeup the system from
|
||||||
char *rtc_name;
|
|
||||||
: The name of rtc (e.g., "rtc0") used to wakeup the system from
|
|
||||||
suspend for Charger Manager. The alarm interrupt (AIE) of the rtc
|
suspend for Charger Manager. The alarm interrupt (AIE) of the rtc
|
||||||
should be able to wake up the system from suspend. Charger Manager
|
should be able to wake up the system from suspend. Charger Manager
|
||||||
saves and restores the alarm value and uses the previously-defined
|
saves and restores the alarm value and uses the previously-defined
|
||||||
alarm if it is going to go off earlier than Charger Manager so that
|
alarm if it is going to go off earlier than Charger Manager so that
|
||||||
Charger Manager does not interfere with previously-defined alarms.
|
Charger Manager does not interfere with previously-defined alarms.
|
||||||
|
|
||||||
bool (*rtc_only_wakeup)(void);
|
`bool (*rtc_only_wakeup)(void);`
|
||||||
: This callback should let CM know whether
|
This callback should let CM know whether
|
||||||
the wakeup-from-suspend is caused only by the alarm of "rtc" in the
|
the wakeup-from-suspend is caused only by the alarm of "rtc" in the
|
||||||
same struct. If there is any other wakeup source triggered the
|
same struct. If there is any other wakeup source triggered the
|
||||||
wakeup, it should return false. If the "rtc" is the only wakeup
|
wakeup, it should return false. If the "rtc" is the only wakeup
|
||||||
reason, it should return true.
|
reason, it should return true.
|
||||||
|
|
||||||
bool assume_timer_stops_in_suspend;
|
`bool assume_timer_stops_in_suspend;`
|
||||||
: if true, Charger Manager assumes that
|
if true, Charger Manager assumes that
|
||||||
the timer (CM uses jiffies as timer) stops during suspend. Then, CM
|
the timer (CM uses jiffies as timer) stops during suspend. Then, CM
|
||||||
assumes that the suspend-duration is same as the alarm length.
|
assumes that the suspend-duration is same as the alarm length.
|
||||||
};
|
|
||||||
|
|
||||||
3. How to setup suspend_again
|
3. How to setup suspend_again
|
||||||
=============================
|
=============================
|
||||||
|
@@ -109,26 +110,28 @@ if the system was woken up by Charger Manager and the polling
|
||||||
=============================================
|
=============================================
|
||||||
For each battery charged independently from other batteries (if a series of
|
For each battery charged independently from other batteries (if a series of
|
||||||
batteries are charged by a single charger, they are counted as one independent
|
batteries are charged by a single charger, they are counted as one independent
|
||||||
battery), an instance of Charger Manager is attached to it.
|
battery), an instance of Charger Manager is attached to it. The following
|
||||||
|
|
||||||
struct charger_desc {
|
struct charger_desc elements:
|
||||||
|
|
||||||
char *psy_name;
|
`char *psy_name;`
|
||||||
: The power-supply-class name of the battery. Default is
|
The power-supply-class name of the battery. Default is
|
||||||
"battery" if psy_name is NULL. Users can access the psy entries
|
"battery" if psy_name is NULL. Users can access the psy entries
|
||||||
at "/sys/class/power_supply/[psy_name]/".
|
at "/sys/class/power_supply/[psy_name]/".
|
||||||
|
|
||||||
enum polling_modes polling_mode;
|
`enum polling_modes polling_mode;`
|
||||||
: CM_POLL_DISABLE: do not poll this battery.
|
CM_POLL_DISABLE:
|
||||||
CM_POLL_ALWAYS: always poll this battery.
|
do not poll this battery.
|
||||||
CM_POLL_EXTERNAL_POWER_ONLY: poll this battery if and only if
|
CM_POLL_ALWAYS:
|
||||||
an external power source is attached.
|
always poll this battery.
|
||||||
CM_POLL_CHARGING_ONLY: poll this battery if and only if the
|
CM_POLL_EXTERNAL_POWER_ONLY:
|
||||||
battery is being charged.
|
poll this battery if and only if an external power
|
||||||
|
source is attached.
|
||||||
|
CM_POLL_CHARGING_ONLY:
|
||||||
|
poll this battery if and only if the battery is being charged.
|
||||||
|
|
||||||
unsigned int fullbatt_vchkdrop_ms;
|
`unsigned int fullbatt_vchkdrop_ms; / unsigned int fullbatt_vchkdrop_uV;`
|
||||||
unsigned int fullbatt_vchkdrop_uV;
|
If both have non-zero values, Charger Manager will check the
|
||||||
: If both have non-zero values, Charger Manager will check the
|
|
||||||
battery voltage drop fullbatt_vchkdrop_ms after the battery is fully
|
battery voltage drop fullbatt_vchkdrop_ms after the battery is fully
|
||||||
charged. If the voltage drop is over fullbatt_vchkdrop_uV, Charger
|
charged. If the voltage drop is over fullbatt_vchkdrop_uV, Charger
|
||||||
Manager will try to recharge the battery by disabling and enabling
|
Manager will try to recharge the battery by disabling and enabling
|
||||||
|
@@ -136,50 +139,52 @@ unsigned int fullbatt_vchkdrop_uV;
|
||||||
condition) is needed to be implemented with hardware interrupts from
|
condition) is needed to be implemented with hardware interrupts from
|
||||||
fuel gauges or charger devices/chips.
|
fuel gauges or charger devices/chips.
|
||||||
|
|
||||||
unsigned int fullbatt_uV;
|
`unsigned int fullbatt_uV;`
|
||||||
: If specified with a non-zero value, Charger Manager assumes
|
If specified with a non-zero value, Charger Manager assumes
|
||||||
that the battery is full (capacity = 100) if the battery is not being
|
that the battery is full (capacity = 100) if the battery is not being
|
||||||
charged and the battery voltage is equal to or greater than
|
charged and the battery voltage is equal to or greater than
|
||||||
fullbatt_uV.
|
fullbatt_uV.
|
||||||
|
|
||||||
unsigned int polling_interval_ms;
|
`unsigned int polling_interval_ms;`
|
||||||
: Required polling interval in ms. Charger Manager will poll
|
Required polling interval in ms. Charger Manager will poll
|
||||||
this battery every polling_interval_ms or more frequently.
|
this battery every polling_interval_ms or more frequently.
|
||||||
|
|
||||||
enum data_source battery_present;
|
`enum data_source battery_present;`
|
||||||
: CM_BATTERY_PRESENT: assume that the battery exists.
|
CM_BATTERY_PRESENT:
|
||||||
CM_NO_BATTERY: assume that the battery does not exist.
|
assume that the battery exists.
|
||||||
CM_FUEL_GAUGE: get battery presence information from fuel gauge.
|
CM_NO_BATTERY:
|
||||||
CM_CHARGER_STAT: get battery presence from chargers.
|
assume that the battery does not exist.
|
||||||
|
CM_FUEL_GAUGE:
|
||||||
|
get battery presence information from fuel gauge.
|
||||||
|
CM_CHARGER_STAT:
|
||||||
|
get battery presence from chargers.
|
||||||
|
|
||||||
char **psy_charger_stat;
|
`char **psy_charger_stat;`
|
||||||
: An array ending with NULL that has power-supply-class names of
|
An array ending with NULL that has power-supply-class names of
|
||||||
chargers. Each power-supply-class should provide "PRESENT" (if
|
chargers. Each power-supply-class should provide "PRESENT" (if
|
||||||
battery_present is "CM_CHARGER_STAT"), "ONLINE" (shows whether an
|
battery_present is "CM_CHARGER_STAT"), "ONLINE" (shows whether an
|
||||||
external power source is attached or not), and "STATUS" (shows whether
|
external power source is attached or not), and "STATUS" (shows whether
|
||||||
the battery is {"FULL" or not FULL} or {"FULL", "Charging",
|
the battery is {"FULL" or not FULL} or {"FULL", "Charging",
|
||||||
"Discharging", "NotCharging"}).
|
"Discharging", "NotCharging"}).
|
||||||
|
|
||||||
int num_charger_regulators;
|
`int num_charger_regulators; / struct regulator_bulk_data *charger_regulators;`
|
||||||
struct regulator_bulk_data *charger_regulators;
|
Regulators representing the chargers in the form for
|
||||||
: Regulators representing the chargers in the form for
|
|
||||||
regulator framework's bulk functions.
|
regulator framework's bulk functions.
|
||||||
|
|
||||||
char *psy_fuel_gauge;
|
`char *psy_fuel_gauge;`
|
||||||
: Power-supply-class name of the fuel gauge.
|
Power-supply-class name of the fuel gauge.
|
||||||
|
|
||||||
int (*temperature_out_of_range)(int *mC);
|
`int (*temperature_out_of_range)(int *mC); / bool measure_battery_temp;`
|
||||||
bool measure_battery_temp;
|
This callback returns 0 if the temperature is safe for charging,
|
||||||
: This callback returns 0 if the temperature is safe for charging,
|
|
||||||
a positive number if it is too hot to charge, and a negative number
|
a positive number if it is too hot to charge, and a negative number
|
||||||
if it is too cold to charge. With the variable mC, the callback returns
|
if it is too cold to charge. With the variable mC, the callback returns
|
||||||
the temperature in 1/1000 of centigrade.
|
the temperature in 1/1000 of centigrade.
|
||||||
The source of temperature can be battery or ambient one according to
|
The source of temperature can be battery or ambient one according to
|
||||||
the value of measure_battery_temp.
|
the value of measure_battery_temp.
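Two of the rules described for `struct charger_desc` are simple enough to state as code: when a battery is polled for a given `polling_mode`, and when `fullbatt_uV` makes Charger Manager treat the battery as full. The following is a minimal Python sketch of those two rules as the text describes them. It is not the kernel implementation, and `PollingMode`, `should_poll` and `assumed_full` are names invented for this illustration.

```python
from enum import Enum


class PollingMode(Enum):
    # Members mirror the CM_POLL_* values described above.
    DISABLE = "CM_POLL_DISABLE"
    ALWAYS = "CM_POLL_ALWAYS"
    EXTERNAL_POWER_ONLY = "CM_POLL_EXTERNAL_POWER_ONLY"
    CHARGING_ONLY = "CM_POLL_CHARGING_ONLY"


def should_poll(mode, external_power, charging):
    """Decide whether the battery is polled under the given mode."""
    if mode is PollingMode.ALWAYS:
        return True
    if mode is PollingMode.EXTERNAL_POWER_ONLY:
        return external_power
    if mode is PollingMode.CHARGING_ONLY:
        return charging
    return False  # PollingMode.DISABLE


def assumed_full(fullbatt_uV, battery_uV, charging):
    """Apply the fullbatt_uV rule: a zero threshold disables the check."""
    return fullbatt_uV != 0 and not charging and battery_uV >= fullbatt_uV
```

For example, `should_poll(PollingMode.CHARGING_ONLY, external_power=True, charging=False)` is false: an attached power source alone does not enable polling in that mode.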
|
||||||
};
|
|
||||||
|
|
||||||
5. Notify Charger-Manager of charger events: cm_notify_event()
|
5. Notify Charger-Manager of charger events: cm_notify_event()
|
||||||
=========================================================
|
==============================================================
|
||||||
If a charger event needs to be reported to
|
If a charger event needs to be reported to
|
||||||
Charger Manager, a charger device driver that triggers the event can call
|
Charger Manager, a charger device driver that triggers the event can call
|
||||||
cm_notify_event(psy, type, msg) to notify the corresponding Charger Manager.
|
cm_notify_event(psy, type, msg) to notify the corresponding Charger Manager.
|
|
@ -1,7 +1,11 @@
|
||||||
|
====================================================
|
||||||
Testing suspend and resume support in device drivers
|
Testing suspend and resume support in device drivers
|
||||||
|
====================================================
|
||||||
|
|
||||||
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
||||||
|
|
||||||
1. Preparing the test system
|
1. Preparing the test system
|
||||||
|
============================
|
||||||
|
|
||||||
Unfortunately, to effectively test the support for the system-wide suspend and
|
Unfortunately, to effectively test the support for the system-wide suspend and
|
||||||
resume transitions in a driver, it is necessary to suspend and resume a fully
|
resume transitions in a driver, it is necessary to suspend and resume a fully
|
||||||
|
@ -14,19 +18,20 @@ the machine's BIOS.
|
||||||
Of course, for this purpose the test system has to be known to suspend and
|
Of course, for this purpose the test system has to be known to suspend and
|
||||||
resume without the driver being tested. Thus, if possible, you should first
|
resume without the driver being tested. Thus, if possible, you should first
|
||||||
resolve all suspend/resume-related problems in the test system before you start
|
resolve all suspend/resume-related problems in the test system before you start
|
||||||
testing the new driver. Please see Documentation/power/basic-pm-debugging.txt
|
testing the new driver. Please see Documentation/power/basic-pm-debugging.rst
|
||||||
for more information about the debugging of suspend/resume functionality.
|
for more information about the debugging of suspend/resume functionality.
|
||||||
|
|
||||||
2. Testing the driver
|
2. Testing the driver
|
||||||
|
=====================
|
||||||
|
|
||||||
Once you have resolved the suspend/resume-related problems with your test system
|
Once you have resolved the suspend/resume-related problems with your test system
|
||||||
without the new driver, you are ready to test it:
|
without the new driver, you are ready to test it:
|
||||||
|
|
||||||
a) Build the driver as a module, load it and try the test modes of hibernation
|
a) Build the driver as a module, load it and try the test modes of hibernation
|
||||||
(see: Documentation/power/basic-pm-debugging.txt, 1).
|
(see: Documentation/power/basic-pm-debugging.rst, 1).
|
||||||
|
|
||||||
b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
|
b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
|
||||||
"platform" modes (see: Documentation/power/basic-pm-debugging.txt, 1).
|
"platform" modes (see: Documentation/power/basic-pm-debugging.rst, 1).
|
||||||
|
|
||||||
c) Compile the driver directly into the kernel and try the test modes of
|
c) Compile the driver directly into the kernel and try the test modes of
|
||||||
hibernation.
|
hibernation.
|
||||||
|
@ -34,12 +39,12 @@ c) Compile the driver directly into the kernel and try the test modes of
|
||||||
d) Attempt to hibernate with the driver compiled directly into the kernel
|
d) Attempt to hibernate with the driver compiled directly into the kernel
|
||||||
in the "reboot", "shutdown" and "platform" modes.
|
in the "reboot", "shutdown" and "platform" modes.
|
||||||
|
|
||||||
e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.txt,
|
e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.rst,
|
||||||
2). [As far as the STR tests are concerned, it should not matter whether or
|
2). [As far as the STR tests are concerned, it should not matter whether or
|
||||||
not the driver is built as a module.]
|
not the driver is built as a module.]
|
||||||
|
|
||||||
f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
|
f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
|
||||||
(see: Documentation/power/basic-pm-debugging.txt, 2).
|
(see: Documentation/power/basic-pm-debugging.rst, 2).
|
||||||
|
|
||||||
Each of the above tests should be repeated several times and the STD tests
|
Each of the above tests should be repeated several times and the STD tests
|
||||||
should be mixed with the STR tests. If any of them fails, the driver cannot be
|
should be mixed with the STR tests. If any of them fails, the driver cannot be
|
|
@ -1,6 +1,6 @@
|
||||||
====================
|
====================
|
||||||
Energy Model of CPUs
|
Energy Model of CPUs
|
||||||
====================
|
====================
|
||||||
|
|
||||||
1. Overview
|
1. Overview
|
||||||
-----------
|
-----------
|
||||||
|
@ -20,7 +20,7 @@ kernel, hence enabling to avoid redundant work.
|
||||||
|
|
||||||
The figure below depicts an example of drivers (Arm-specific here, but the
|
The figure below depicts an example of drivers (Arm-specific here, but the
|
||||||
approach is applicable to any architecture) providing power costs to the EM
|
approach is applicable to any architecture) providing power costs to the EM
|
||||||
framework, and interested clients reading the data from it.
|
framework, and interested clients reading the data from it::
|
||||||
|
|
||||||
+---------------+ +-----------------+ +---------------+
|
+---------------+ +-----------------+ +---------------+
|
||||||
| Thermal (IPA) | | Scheduler (EAS) | | Other |
|
| Thermal (IPA) | | Scheduler (EAS) | | Other |
|
||||||
|
@ -58,15 +58,17 @@ micro-architectures.
|
||||||
2. Core APIs
|
2. Core APIs
|
||||||
------------
|
------------
|
||||||
|
|
||||||
2.1 Config options
|
2.1 Config options
|
||||||
|
^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
|
CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
|
||||||
|
|
||||||
|
|
||||||
2.2 Registration of performance domains
|
2.2 Registration of performance domains
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Drivers are expected to register performance domains into the EM framework by
|
Drivers are expected to register performance domains into the EM framework by
|
||||||
calling the following API:
|
calling the following API::
|
||||||
|
|
||||||
int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
|
int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
|
||||||
struct em_data_callback *cb);
|
struct em_data_callback *cb);
|
||||||
|
@ -80,7 +82,8 @@ callback, and kernel/power/energy_model.c for further documentation on this
|
||||||
API.
|
API.
|
||||||
|
|
||||||
|
|
||||||
2.3 Accessing performance domains
|
2.3 Accessing performance domains
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Subsystems interested in the energy model of a CPU can retrieve it using the
|
Subsystems interested in the energy model of a CPU can retrieve it using the
|
||||||
em_cpu_get() API. The energy model tables are allocated once upon creation of
|
em_cpu_get() API. The energy model tables are allocated once upon creation of
|
||||||
|
@ -99,46 +102,46 @@ More details about the above APIs can be found in include/linux/energy_model.h.
|
||||||
This section provides a simple example of a CPUFreq driver registering a
|
This section provides a simple example of a CPUFreq driver registering a
|
||||||
performance domain in the Energy Model framework using the (fake) 'foo'
|
performance domain in the Energy Model framework using the (fake) 'foo'
|
||||||
protocol. The driver implements an est_power() function to be provided to the
|
protocol. The driver implements an est_power() function to be provided to the
|
||||||
EM framework.
|
EM framework::
|
||||||
|
|
||||||
-> drivers/cpufreq/foo_cpufreq.c
|
-> drivers/cpufreq/foo_cpufreq.c
|
||||||
|
|
||||||
01 static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
|
01 static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
|
||||||
02 {
|
02 {
|
||||||
03 long freq, power;
|
03 long freq, power;
|
||||||
04
|
04
|
||||||
05 /* Use the 'foo' protocol to ceil the frequency */
|
05 /* Use the 'foo' protocol to ceil the frequency */
|
||||||
06 freq = foo_get_freq_ceil(cpu, *KHz);
|
06 freq = foo_get_freq_ceil(cpu, *KHz);
|
||||||
07 if (freq < 0)
|
07 if (freq < 0)
|
||||||
08 return freq;
|
08 return freq;
|
||||||
09
|
09
|
||||||
10 /* Estimate the power cost for the CPU at the relevant freq. */
|
10 /* Estimate the power cost for the CPU at the relevant freq. */
|
||||||
11 power = foo_estimate_power(cpu, freq);
|
11 power = foo_estimate_power(cpu, freq);
|
||||||
12 if (power < 0)
|
12 if (power < 0)
|
||||||
13 return power;
|
13 return power;
|
||||||
14
|
14
|
||||||
15 /* Return the values to the EM framework */
|
15 /* Return the values to the EM framework */
|
||||||
16 *mW = power;
|
16 *mW = power;
|
||||||
17 *KHz = freq;
|
17 *KHz = freq;
|
||||||
18
|
18
|
||||||
19 return 0;
|
19 return 0;
|
||||||
20 }
|
20 }
|
||||||
21
|
21
|
||||||
22 static int foo_cpufreq_init(struct cpufreq_policy *policy)
|
22 static int foo_cpufreq_init(struct cpufreq_policy *policy)
|
||||||
23 {
|
23 {
|
||||||
24 struct em_data_callback em_cb = EM_DATA_CB(est_power);
|
24 struct em_data_callback em_cb = EM_DATA_CB(est_power);
|
||||||
25 int nr_opp, ret;
|
25 int nr_opp, ret;
|
||||||
26
|
26
|
||||||
27 /* Do the actual CPUFreq init work ... */
|
27 /* Do the actual CPUFreq init work ... */
|
||||||
28 ret = do_foo_cpufreq_init(policy);
|
28 ret = do_foo_cpufreq_init(policy);
|
||||||
29 if (ret)
|
29 if (ret)
|
||||||
30 return ret;
|
30 return ret;
|
||||||
31
|
31
|
||||||
32 /* Find the number of OPPs for this policy */
|
32 /* Find the number of OPPs for this policy */
|
||||||
33 nr_opp = foo_get_nr_opp(policy);
|
33 nr_opp = foo_get_nr_opp(policy);
|
||||||
34
|
34
|
||||||
35 /* And register the new performance domain */
|
35 /* And register the new performance domain */
|
||||||
36 em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
|
36 em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
|
||||||
37
|
37
|
||||||
38 return 0;
|
38 return 0;
|
||||||
39 }
|
39 }
|
|
@ -1,13 +1,18 @@
|
||||||
|
=================
|
||||||
Freezing of tasks
|
Freezing of tasks
|
||||||
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
=================
|
||||||
|
|
||||||
|
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
|
||||||
|
|
||||||
I. What is the freezing of tasks?
|
I. What is the freezing of tasks?
|
||||||
|
=================================
|
||||||
|
|
||||||
The freezing of tasks is a mechanism by which user space processes and some
|
The freezing of tasks is a mechanism by which user space processes and some
|
||||||
kernel threads are controlled during hibernation or system-wide suspend (on some
|
kernel threads are controlled during hibernation or system-wide suspend (on some
|
||||||
architectures).
|
architectures).
|
||||||
|
|
||||||
II. How does it work?
|
II. How does it work?
|
||||||
|
=====================
|
||||||
|
|
||||||
There are three per-task flags used for that, PF_NOFREEZE, PF_FROZEN
|
There are three per-task flags used for that, PF_NOFREEZE, PF_FROZEN
|
||||||
and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have
|
and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have
|
||||||
|
@ -41,7 +46,7 @@ explicitly in suitable places or use the wait_event_freezable() or
|
||||||
wait_event_freezable_timeout() macros (defined in include/linux/freezer.h)
|
wait_event_freezable_timeout() macros (defined in include/linux/freezer.h)
|
||||||
that combine interruptible sleep with checking if the task is to be frozen and
|
that combine interruptible sleep with checking if the task is to be frozen and
|
||||||
calling try_to_freeze(). The main loop of a freezable kernel thread may look
|
calling try_to_freeze(). The main loop of a freezable kernel thread may look
|
||||||
like the following one:
|
like the following one::
|
||||||
|
|
||||||
set_freezable();
|
set_freezable();
|
||||||
do {
|
do {
|
||||||
|
@ -65,7 +70,7 @@ order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
|
||||||
have been frozen leave __refrigerator() and continue running.
|
have been frozen leave __refrigerator() and continue running.
|
||||||
|
|
||||||
|
|
||||||
Rationale behind the functions dealing with freezing and thawing of tasks:
|
Rationale behind the functions dealing with freezing and thawing of tasks
|
||||||
-------------------------------------------------------------------------
|
-------------------------------------------------------------------------
|
||||||
|
|
||||||
freeze_processes():
|
freeze_processes():
|
||||||
|
@ -86,6 +91,7 @@ thaw_processes():
|
||||||
|
|
||||||
|
|
||||||
III. Which kernel threads are freezable?
|
III. Which kernel threads are freezable?
|
||||||
|
========================================
|
||||||
|
|
||||||
Kernel threads are not freezable by default. However, a kernel thread may clear
|
Kernel threads are not freezable by default. However, a kernel thread may clear
|
||||||
PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
|
PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
|
||||||
|
@ -93,37 +99,39 @@ directly is not allowed). From this point it is regarded as freezable
|
||||||
and must call try_to_freeze() in a suitable place.
|
and must call try_to_freeze() in a suitable place.
|
||||||
|
|
||||||
IV. Why do we do that?
|
IV. Why do we do that?
|
||||||
|
======================
|
||||||
|
|
||||||
Generally speaking, there are several reasons to use the freezing of tasks:
|
Generally speaking, there are several reasons to use the freezing of tasks:
|
||||||
|
|
||||||
1. The principal reason is to prevent filesystems from being damaged after
|
1. The principal reason is to prevent filesystems from being damaged after
|
||||||
hibernation. At the moment we have no simple means of checkpointing
|
hibernation. At the moment we have no simple means of checkpointing
|
||||||
filesystems, so if there are any modifications made to filesystem data and/or
|
filesystems, so if there are any modifications made to filesystem data and/or
|
||||||
metadata on disks, we cannot bring them back to the state from before the
|
metadata on disks, we cannot bring them back to the state from before the
|
||||||
modifications. At the same time each hibernation image contains some
|
modifications. At the same time each hibernation image contains some
|
||||||
filesystem-related information that must be consistent with the state of the
|
filesystem-related information that must be consistent with the state of the
|
||||||
on-disk data and metadata after the system memory state has been restored from
|
on-disk data and metadata after the system memory state has been restored
|
||||||
the image (otherwise the filesystems will be damaged in a nasty way, usually
|
from the image (otherwise the filesystems will be damaged in a nasty way,
|
||||||
making them almost impossible to repair). We therefore freeze tasks that might
|
usually making them almost impossible to repair). We therefore freeze
|
||||||
cause the on-disk filesystems' data and metadata to be modified after the
|
tasks that might cause the on-disk filesystems' data and metadata to be
|
||||||
hibernation image has been created and before the system is finally powered off.
|
modified after the hibernation image has been created and before the
|
||||||
The majority of these are user space processes, but if any of the kernel threads
|
system is finally powered off. The majority of these are user space
|
||||||
may cause something like this to happen, they have to be freezable.
|
processes, but if any of the kernel threads may cause something like this
|
||||||
|
to happen, they have to be freezable.
|
||||||
|
|
||||||
2. Next, to create the hibernation image we need to free a sufficient amount of
|
2. Next, to create the hibernation image we need to free a sufficient amount of
|
||||||
memory (approximately 50% of available RAM) and we need to do that before
|
memory (approximately 50% of available RAM) and we need to do that before
|
||||||
devices are deactivated, because we generally need them for swapping out. Then,
|
devices are deactivated, because we generally need them for swapping out.
|
||||||
after the memory for the image has been freed, we don't want tasks to allocate
|
Then, after the memory for the image has been freed, we don't want tasks
|
||||||
additional memory and we prevent them from doing that by freezing them earlier.
|
to allocate additional memory and we prevent them from doing that by
|
||||||
[Of course, this also means that device drivers should not allocate substantial
|
freezing them earlier. [Of course, this also means that device drivers
|
||||||
amounts of memory from their .suspend() callbacks before hibernation, but this
|
should not allocate substantial amounts of memory from their .suspend()
|
||||||
is a separate issue.]
|
callbacks before hibernation, but this is a separate issue.]
|
||||||
|
|
||||||
3. The third reason is to prevent user space processes and some kernel threads
|
3. The third reason is to prevent user space processes and some kernel threads
|
||||||
from interfering with the suspending and resuming of devices. A user space
|
from interfering with the suspending and resuming of devices. A user space
|
||||||
process running on a second CPU while we are suspending devices may, for
|
process running on a second CPU while we are suspending devices may, for
|
||||||
example, be troublesome and without the freezing of tasks we would need some
|
example, be troublesome and without the freezing of tasks we would need some
|
||||||
safeguards against race conditions that might occur in such a case.
|
safeguards against race conditions that might occur in such a case.
|
||||||
|
|
||||||
Although Linus Torvalds doesn't like the freezing of tasks, he said this in one
|
Although Linus Torvalds doesn't like the freezing of tasks, he said this in one
|
||||||
of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
|
of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
|
||||||
|
@ -132,7 +140,7 @@ of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
|
||||||
|
|
||||||
Linus: In many ways, 'at all'.
|
Linus: In many ways, 'at all'.
|
||||||
|
|
||||||
I _do_ realize the IO request queue issues, and that we cannot actually do
|
I **do** realize the IO request queue issues, and that we cannot actually do
|
||||||
s2ram with some devices in the middle of a DMA. So we want to be able to
|
s2ram with some devices in the middle of a DMA. So we want to be able to
|
||||||
avoid *that*, there's no question about that. And I suspect that stopping
|
avoid *that*, there's no question about that. And I suspect that stopping
|
||||||
user threads and then waiting for a sync is practically one of the easier
|
user threads and then waiting for a sync is practically one of the easier
|
||||||
|
@ -150,17 +158,18 @@ thawed after the driver's .resume() callback has run, so it won't be accessing
|
||||||
the device while it's suspended.
|
the device while it's suspended.
|
||||||
|
|
||||||
4. Another reason for freezing tasks is to prevent user space processes from
|
4. Another reason for freezing tasks is to prevent user space processes from
|
||||||
realizing that a hibernation (or suspend) operation is taking place. Ideally, user
|
realizing that a hibernation (or suspend) operation is taking place. Ideally, user
|
||||||
space processes should not notice that such a system-wide operation has occurred
|
space processes should not notice that such a system-wide operation has
|
||||||
and should continue running without any problems after the restore (or resume
|
occurred and should continue running without any problems after the restore
|
||||||
from suspend). Unfortunately, in the most general case this is quite difficult
|
(or resume from suspend). Unfortunately, in the most general case this
|
||||||
to achieve without the freezing of tasks. Consider, for example, a process
|
is quite difficult to achieve without the freezing of tasks. Consider,
|
||||||
that depends on all CPUs being online while it's running. Since we need to
|
for example, a process that depends on all CPUs being online while it's
|
||||||
disable nonboot CPUs during the hibernation, if this process is not frozen, it
|
running. Since we need to disable nonboot CPUs during the hibernation,
|
||||||
may notice that the number of CPUs has changed and may start to work incorrectly
|
if this process is not frozen, it may notice that the number of CPUs has
|
||||||
because of that.
|
changed and may start to work incorrectly because of that.
|
||||||
|
|
||||||
V. Are there any problems related to the freezing of tasks?
|
V. Are there any problems related to the freezing of tasks?
|
||||||
|
===========================================================
|
||||||
|
|
||||||
Yes, there are.
|
Yes, there are.
|
||||||
|
|
||||||
|
@ -172,11 +181,12 @@ may be undesirable. That's why kernel threads are not freezable by default.
|
||||||
|
|
||||||
Second, there are the following two problems related to the freezing of user
|
Second, there are the following two problems related to the freezing of user
|
||||||
space processes:
|
space processes:
|
||||||
|
|
||||||
1. Putting processes into an uninterruptible sleep distorts the load average.
|
1. Putting processes into an uninterruptible sleep distorts the load average.
|
||||||
2. Now that we have FUSE, plus the framework for doing device drivers in
|
2. Now that we have FUSE, plus the framework for doing device drivers in
|
||||||
userspace, it gets even more complicated because some userspace processes are
|
userspace, it gets even more complicated because some userspace processes are
|
||||||
now doing the sorts of things that kernel threads do
|
now doing the sorts of things that kernel threads do
|
||||||
(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).
|
(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).
|
||||||
|
|
||||||
The problem 1. seems to be fixable, although it hasn't been fixed so far. The
|
The problem 1. seems to be fixable, although it hasn't been fixed so far. The
|
||||||
other one is more serious, but it seems that we can work around it by using
|
other one is more serious, but it seems that we can work around it by using
|
||||||
|
@ -201,6 +211,7 @@ requested early enough using the suspend notifier API described in
|
||||||
Documentation/driver-api/pm/notifiers.rst.
|
Documentation/driver-api/pm/notifiers.rst.
|
||||||
|
|
||||||
VI. Are there any precautions to be taken to prevent freezing failures?
|
VI. Are there any precautions to be taken to prevent freezing failures?
|
||||||
|
=======================================================================
|
||||||
|
|
||||||
Yes, there are.
|
Yes, there are.
|
||||||
|
|
||||||
|
@ -226,6 +237,8 @@ So, to summarize, use [un]lock_system_sleep() instead of directly using
|
||||||
mutex_[un]lock(&system_transition_mutex). That would prevent freezing failures.
|
mutex_[un]lock(&system_transition_mutex). That would prevent freezing failures.
|
||||||
|
|
||||||
VII. Miscellaneous
|
VII. Miscellaneous
|
||||||
|
==================
|
||||||
|
|
||||||
/sys/power/pm_freeze_timeout controls the maximum time allowed for freezing
|
/sys/power/pm_freeze_timeout controls the maximum time allowed for freezing
|
||||||
all user space processes or all freezable kernel threads, in milliseconds.
|
all user space processes or all freezable kernel threads, in milliseconds.
|
||||||
The default value is 20000, and the valid range is that of an unsigned integer.
|
The default value is 20000, and the valid range is that of an unsigned integer.
|
|
@ -0,0 +1,46 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
|
================
|
||||||
|
Power Management
|
||||||
|
================
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
apm-acpi
|
||||||
|
basic-pm-debugging
|
||||||
|
charger-manager
|
||||||
|
drivers-testing
|
||||||
|
energy-model
|
||||||
|
freezing-of-tasks
|
||||||
|
interface
|
||||||
|
opp
|
||||||
|
pci
|
||||||
|
pm_qos_interface
|
||||||
|
power_supply_class
|
||||||
|
runtime_pm
|
||||||
|
s2ram
|
||||||
|
suspend-and-cpuhotplug
|
||||||
|
suspend-and-interrupts
|
||||||
|
swsusp-and-swap-files
|
||||||
|
swsusp-dmcrypt
|
||||||
|
swsusp
|
||||||
|
video
|
||||||
|
tricks
|
||||||
|
|
||||||
|
userland-swsusp
|
||||||
|
|
||||||
|
powercap/powercap
|
||||||
|
|
||||||
|
regulator/consumer
|
||||||
|
regulator/design
|
||||||
|
regulator/machine
|
||||||
|
regulator/overview
|
||||||
|
regulator/regulator
|
||||||
|
|
||||||
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
Indices
|
||||||
|
=======
|
||||||
|
|
||||||
|
* :ref:`genindex`
|
|
@ -1,4 +1,6 @@
|
||||||
|
===========================================
|
||||||
Power Management Interface for System Sleep
|
Power Management Interface for System Sleep
|
||||||
|
===========================================
|
||||||
|
|
||||||
Copyright (c) 2016 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
Copyright (c) 2016 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||||||
|
|
||||||
|
@ -11,10 +13,10 @@ mounted at /sys).
|
||||||
|
|
||||||
Reading from it returns a list of supported sleep states, encoded as:
|
Reading from it returns a list of supported sleep states, encoded as:
|
||||||
|
|
||||||
'freeze' (Suspend-to-Idle)
|
- 'freeze' (Suspend-to-Idle)
|
||||||
'standby' (Power-On Suspend)
|
- 'standby' (Power-On Suspend)
|
||||||
'mem' (Suspend-to-RAM)
|
- 'mem' (Suspend-to-RAM)
|
||||||
'disk' (Suspend-to-Disk)
|
- 'disk' (Suspend-to-Disk)
|
||||||
|
|
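A minimal user-space sketch of how a tool might check the list of supported sleep states follows. It assumes the list is a single line of space-separated tokens; the function name and the sample strings in the usage note are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if 'name' appears as a whole token in 'states', a
 * whitespace-separated list such as the one read from /sys/power/state,
 * and 0 otherwise. (Illustrative helper, not a kernel interface.)
 */
static int pm_state_supported(const char *states, const char *name)
{
	size_t n = strlen(name);
	const char *p = states;

	if (n == 0)
		return 0;

	while (*p) {
		size_t len;

		/* Skip separators between tokens. */
		while (*p == ' ' || *p == '\n')
			p++;

		len = strcspn(p, " \n");
		if (len == n && strncmp(p, name, n) == 0)
			return 1;

		p += len;
	}

	return 0;
}
```

For example, pm_state_supported("freeze mem disk\n", "mem") reports a match, while a partial token such as "me" does not.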
||||||
Suspend-to-Idle is always supported. Suspend-to-Disk is always supported
|
Suspend-to-Idle is always supported. Suspend-to-Disk is always supported
|
||||||
too, as long as the kernel has been configured to support hibernation at all
|
too, as long as the kernel has been configured to support hibernation at all
|
||||||
|
@ -32,18 +34,18 @@ Specifically, it tells the kernel what to do after creating a hibernation image.
|
||||||
|
|
||||||
Reading from it returns a list of supported options encoded as:
|
Reading from it returns a list of supported options encoded as:
|
||||||
|
|
||||||
'platform' (put the system into sleep using a platform-provided method)
|
- 'platform' (put the system into sleep using a platform-provided method)
|
||||||
'shutdown' (shut the system down)
|
- 'shutdown' (shut the system down)
|
||||||
'reboot' (reboot the system)
|
- 'reboot' (reboot the system)
|
||||||
'suspend' (trigger a Suspend-to-RAM transition)
|
- 'suspend' (trigger a Suspend-to-RAM transition)
|
||||||
'test_resume' (resume-after-hibernation test mode)
|
- 'test_resume' (resume-after-hibernation test mode)
|
||||||
|
|
||||||
The currently selected option is printed in square brackets.
|
The currently selected option is printed in square brackets.
|
||||||
|
|
||||||
The 'platform' option is only available if the platform provides a special
|
The 'platform' option is only available if the platform provides a special
|
||||||
mechanism to put the system to sleep after creating a hibernation image (ACPI
|
mechanism to put the system to sleep after creating a hibernation image (ACPI
|
||||||
does that, for example). The 'suspend' option is available if Suspend-to-RAM
|
does that, for example). The 'suspend' option is available if Suspend-to-RAM
|
||||||
is supported. Refer to Documentation/power/basic-pm-debugging.txt for the
|
is supported. Refer to Documentation/power/basic-pm-debugging.rst for the
|
||||||
description of the 'test_resume' option.
|
description of the 'test_resume' option.
|
||||||
|
|
||||||
To select an option, write the string representing it to /sys/power/disk.
|
To select an option, write the string representing it to /sys/power/disk.
|
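Since the currently selected option is printed in square brackets, a reader of /sys/power/disk has to pick out the bracketed token. The sketch below shows one way to do that in user space; the helper name and the sample input strings are assumptions for illustration, not taken from the kernel sources.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the option enclosed in square brackets from 'line' into 'out'.
 * Returns 0 on success, or -1 if no bracketed option is found or it
 * does not fit into 'out'. (Illustrative helper, not a kernel interface.)
 */
static int pm_selected_option(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close;
	size_t len;

	if (!open)
		return -1;

	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';

	return 0;
}
```

Given a line such as "[platform] shutdown reboot", the helper would store "platform" in the output buffer.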
||||||
|
@ -71,7 +73,7 @@ If /sys/power/pm_trace contains '1', the fingerprint of each suspend/resume
|
||||||
event point in turn will be stored in the RTC memory (overwriting the actual
|
event point in turn will be stored in the RTC memory (overwriting the actual
|
||||||
RTC information), so it will survive a system crash if one occurs right after
|
RTC information), so it will survive a system crash if one occurs right after
|
||||||
storing it and it can be used later to identify the driver that caused the crash
|
storing it and it can be used later to identify the driver that caused the crash
|
||||||
to happen (see Documentation/power/s2ram.txt for more information).
|
to happen (see Documentation/power/s2ram.rst for more information).
|
||||||
|
|
||||||
Initially it contains '0' which may be changed to '1' by writing a string
|
Initially it contains '0' which may be changed to '1' by writing a string
|
||||||
representing a nonzero integer into it.
|
representing a nonzero integer into it.
|
|
@ -1,20 +1,23 @@
|
||||||
|
==========================================
|
||||||
Operating Performance Points (OPP) Library
|
Operating Performance Points (OPP) Library
|
||||||
==========================================
|
==========================================
|
||||||
|
|
||||||
(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated
|
(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated
|
||||||
|
|
||||||
Contents
|
.. Contents
|
||||||
--------
|
|
||||||
1. Introduction
|
1. Introduction
|
||||||
2. Initial OPP List Registration
|
2. Initial OPP List Registration
|
||||||
3. OPP Search Functions
|
3. OPP Search Functions
|
||||||
4. OPP Availability Control Functions
|
4. OPP Availability Control Functions
|
||||||
5. OPP Data Retrieval Functions
|
5. OPP Data Retrieval Functions
|
||||||
6. Data Structures
|
6. Data Structures
|
||||||
|
|
||||||
1. Introduction
|
1. Introduction
|
||||||
===============
|
===============
|
||||||
|
|
||||||
1.1 What is an Operating Performance Point (OPP)?
|
1.1 What is an Operating Performance Point (OPP)?
|
||||||
|
-------------------------------------------------
|
||||||
|
|
||||||
Complex SoCs of today consist of multiple sub-modules working in conjunction.
|
Complex SoCs of today consist of multiple sub-modules working in conjunction.
|
||||||
In an operational system executing varied use cases, not all modules in the SoC
|
In an operational system executing varied use cases, not all modules in the SoC
|
||||||
|
@ -28,16 +31,19 @@ the device will support per domain are called Operating Performance Points or
|
||||||
OPPs.
|
OPPs.
|
||||||
|
|
||||||
As an example:
|
As an example:
|
||||||
|
|
||||||
Let us consider an MPU device which supports the following:
|
Let us consider an MPU device which supports the following:
|
||||||
{300MHz at minimum voltage of 1V}, {800MHz at minimum voltage of 1.2V},
|
{300MHz at minimum voltage of 1V}, {800MHz at minimum voltage of 1.2V},
|
||||||
{1GHz at minimum voltage of 1.3V}
|
{1GHz at minimum voltage of 1.3V}
|
||||||
|
|
||||||
We can represent these as three OPPs as the following {Hz, uV} tuples:
|
We can represent these as three OPPs as the following {Hz, uV} tuples:
|
||||||
{300000000, 1000000}
|
|
||||||
{800000000, 1200000}
|
- {300000000, 1000000}
|
||||||
{1000000000, 1300000}
|
- {800000000, 1200000}
|
||||||
|
- {1000000000, 1300000}
|
||||||
|
|
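The conversion from the human-readable MPU example to {Hz, uV} tuples can be sketched in plain C as follows. The structure and helper names are hypothetical and exist only for this example; they are not the OPP library's data structures.

```c
/* Hypothetical {Hz, uV} tuple, used only for this illustration. */
struct example_opp {
	unsigned long hz;	/* frequency in Hz */
	unsigned long uv;	/* minimum voltage in microvolts */
};

/* Build a tuple from a frequency in MHz and a voltage in millivolts. */
static struct example_opp example_opp_from(unsigned long mhz, unsigned long mv)
{
	struct example_opp opp = { mhz * 1000000UL, mv * 1000UL };

	return opp;
}

/* The three OPPs of the MPU example, written out as {Hz, uV}. */
static const struct example_opp example_mpu_opps[] = {
	{  300000000UL, 1000000UL },	/* 300MHz at 1V   */
	{  800000000UL, 1200000UL },	/* 800MHz at 1.2V */
	{ 1000000000UL, 1300000UL },	/* 1GHz at 1.3V   */
};
```

Keeping both fields in base units (Hz and uV) avoids fractional values such as 1.2V in the table.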
||||||
1.2 Operating Performance Points Library
|
1.2 Operating Performance Points Library
|
||||||
|
----------------------------------------
|
||||||
|
|
||||||
OPP library provides a set of helper functions to organize and query the OPP
|
OPP library provides a set of helper functions to organize and query the OPP
|
||||||
information. The library is located in drivers/base/power/opp.c and the header
|
information. The library is located in drivers/base/power/opp.c and the header
|
||||||
|
@ -46,9 +52,10 @@ CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on
|
||||||
CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to
|
CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to
|
||||||
optionally boot at a certain OPP without needing cpufreq.
|
optionally boot at a certain OPP without needing cpufreq.
|
||||||
|
|
||||||
Typical usage of the OPP library is as follows:
|
Typical usage of the OPP library is as follows::
|
||||||
(users) -> registers a set of default OPPs -> (library)
|
|
||||||
SoC framework -> modifies on required cases certain OPPs -> OPP layer
|
(users) -> registers a set of default OPPs -> (library)
|
||||||
|
SoC framework -> modifies on required cases certain OPPs -> OPP layer
|
||||||
-> queries to search/retrieve information ->
|
-> queries to search/retrieve information ->
|
||||||
|
|
||||||
OPP layer expects each domain to be represented by a unique device pointer. SoC
|
OPP layer expects each domain to be represented by a unique device pointer. SoC
|
||||||
|
@ -57,8 +64,9 @@ list is expected to be an optimally small number typically around 5 per device.
|
||||||
This initial list contains a set of OPPs that the framework expects to be safely
|
This initial list contains a set of OPPs that the framework expects to be safely
|
||||||
enabled by default in the system.
|
enabled by default in the system.
|
||||||
|
|
||||||
Note on OPP Availability:
|
Note on OPP Availability
|
||||||
------------------------
|
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
As the system proceeds to operate, SoC framework may choose to make certain
|
As the system proceeds to operate, SoC framework may choose to make certain
|
||||||
OPPs available or not available on each device based on various external
|
OPPs available or not available on each device based on various external
|
||||||
factors. Example usage: Thermal management or other exceptional situations where
|
factors. Example usage: Thermal management or other exceptional situations where
|
||||||
|
@ -88,7 +96,8 @@ registering the OPPs is maintained by OPP library throughout the device
|
||||||
operation. The SoC framework can subsequently control the availability of the
|
operation. The SoC framework can subsequently control the availability of the
|
||||||
OPPs dynamically using the dev_pm_opp_enable / disable functions.
|
OPPs dynamically using the dev_pm_opp_enable / disable functions.
|
||||||
|
|
||||||
dev_pm_opp_add - Add a new OPP for a specific domain represented by the device pointer.
|
dev_pm_opp_add
|
||||||
|
Add a new OPP for a specific domain represented by the device pointer.
|
||||||
The OPP is defined using the frequency and voltage. Once added, the OPP
|
The OPP is defined using the frequency and voltage. Once added, the OPP
|
||||||
is assumed to be available and control of its availability can be done
|
is assumed to be available and control of its availability can be done
|
||||||
with the dev_pm_opp_enable/disable functions. OPP library internally stores
|
with the dev_pm_opp_enable/disable functions. OPP library internally stores
|
||||||
|
@ -96,9 +105,11 @@ dev_pm_opp_add - Add a new OPP for a specific domain represented by the device p
|
||||||
used by SoC framework to define an optimal list as per the demands of
|
used by SoC framework to define an optimal list as per the demands of
|
||||||
SoC usage environment.
|
SoC usage environment.
|
||||||
|
|
||||||
WARNING: Do not use this function in interrupt context.
|
WARNING:
|
||||||
|
Do not use this function in interrupt context.
|
||||||
|
|
||||||
|
Example::
|
||||||
|
|
||||||
Example:
|
|
||||||
soc_pm_init()
|
soc_pm_init()
|
||||||
{
|
{
|
||||||
/* Do things */
|
/* Do things */
|
||||||
|
@ -125,12 +136,15 @@ Callers of these functions shall call dev_pm_opp_put() after they have used the
|
||||||
OPP. Otherwise the memory for the OPP will never get freed and result in
|
OPP. Otherwise the memory for the OPP will never get freed and result in
|
||||||
memleak.
|
memleak.
|
||||||
|
|
||||||
dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
|
dev_pm_opp_find_freq_exact
|
||||||
|
Search for an OPP based on an *exact* frequency and
|
||||||
availability. This function is especially useful to enable an OPP which
|
availability. This function is especially useful to enable an OPP which
|
||||||
is not available by default.
|
is not available by default.
|
||||||
Example: In a case when SoC framework detects a situation where a
|
Example: In a case when SoC framework detects a situation where a
|
||||||
higher frequency could be made available, it can use this function to
|
higher frequency could be made available, it can use this function to
|
||||||
find the OPP prior to call the dev_pm_opp_enable to actually make it available.
|
find the OPP prior to call the dev_pm_opp_enable to actually make
|
||||||
|
it available::
|
||||||
|
|
||||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
|
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
|
||||||
dev_pm_opp_put(opp);
|
dev_pm_opp_put(opp);
|
||||||
/* dont operate on the pointer.. just do a sanity check.. */
|
/* dont operate on the pointer.. just do a sanity check.. */
|
||||||
|
@ -141,27 +155,34 @@ dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
|
||||||
dev_pm_opp_enable(dev,1000000000);
|
dev_pm_opp_enable(dev,1000000000);
|
||||||
}
|
}
|
||||||
|
|
||||||
NOTE: This is the only search function that operates on OPPs which are
|
NOTE:
|
||||||
not available.
|
This is the only search function that operates on OPPs which are
|
||||||
|
not available.
|
||||||
|
|
||||||
dev_pm_opp_find_freq_floor - Search for an available OPP which is *at most* the
|
dev_pm_opp_find_freq_floor
|
||||||
|
Search for an available OPP which is *at most* the
|
||||||
provided frequency. This function is useful while searching for a lesser
|
provided frequency. This function is useful while searching for a lesser
|
||||||
match OR operating on OPP information in the order of decreasing
|
match OR operating on OPP information in the order of decreasing
|
||||||
frequency.
|
frequency.
|
||||||
Example: To find the highest opp for a device:
|
Example: To find the highest opp for a device::
|
||||||
|
|
||||||
freq = ULONG_MAX;
|
freq = ULONG_MAX;
|
||||||
opp = dev_pm_opp_find_freq_floor(dev, &freq);
|
opp = dev_pm_opp_find_freq_floor(dev, &freq);
|
||||||
dev_pm_opp_put(opp);
|
dev_pm_opp_put(opp);
|
||||||
|
|
||||||
dev_pm_opp_find_freq_ceil - Search for an available OPP which is *at least* the
|
dev_pm_opp_find_freq_ceil
|
||||||
|
Search for an available OPP which is *at least* the
|
||||||
provided frequency. This function is useful while searching for a
|
provided frequency. This function is useful while searching for a
|
||||||
higher match OR operating on OPP information in the order of increasing
|
higher match OR operating on OPP information in the order of increasing
|
||||||
frequency.
|
frequency.
|
||||||
Example 1: To find the lowest opp for a device:
|
Example 1: To find the lowest opp for a device::
|
||||||
|
|
||||||
freq = 0;
|
freq = 0;
|
||||||
opp = dev_pm_opp_find_freq_ceil(dev, &freq);
|
opp = dev_pm_opp_find_freq_ceil(dev, &freq);
|
||||||
dev_pm_opp_put(opp);
|
dev_pm_opp_put(opp);
|
||||||
Example 2: A simplified implementation of a SoC cpufreq_driver->target:
|
|
||||||
|
Example 2: A simplified implementation of a SoC cpufreq_driver->target::
|
||||||
|
|
||||||
soc_cpufreq_target(..)
|
soc_cpufreq_target(..)
|
||||||
{
|
{
|
||||||
/* Do stuff like policy checks etc. */
|
/* Do stuff like policy checks etc. */
|
||||||
|
@ -184,12 +205,15 @@ fine grained dynamic control of which sets of OPPs are operationally available.
|
||||||
These functions are intended to *temporarily* remove an OPP in conditions such
|
These functions are intended to *temporarily* remove an OPP in conditions such
|
||||||
as thermal considerations (e.g. don't use OPPx until the temperature drops).
|
as thermal considerations (e.g. don't use OPPx until the temperature drops).
|
||||||
|
|
||||||
WARNING: Do not use these functions in interrupt context.
|
WARNING:
|
||||||
|
Do not use these functions in interrupt context.
|
||||||
|
|
||||||
dev_pm_opp_enable - Make a OPP available for operation.
|
dev_pm_opp_enable
|
||||||
|
Make a OPP available for operation.
|
||||||
Example: Let's say that 1GHz OPP is to be made available only if the
|
Example: Let's say that 1GHz OPP is to be made available only if the
|
||||||
SoC temperature is lower than a certain threshold. The SoC framework
|
SoC temperature is lower than a certain threshold. The SoC framework
|
||||||
implementation might choose to do something as follows:
|
implementation might choose to do something as follows::
|
||||||
|
|
||||||
if (cur_temp < temp_low_thresh) {
|
if (cur_temp < temp_low_thresh) {
|
||||||
/* Enable 1GHz if it was disabled */
|
/* Enable 1GHz if it was disabled */
|
||||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
|
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
|
||||||
|
@ -201,10 +225,12 @@ dev_pm_opp_enable - Make a OPP available for operation.
|
||||||
goto try_something_else;
|
goto try_something_else;
|
||||||
}
|
}
|
||||||
|
|
||||||
dev_pm_opp_disable - Make an OPP to be not available for operation
|
dev_pm_opp_disable
|
||||||
|
Make an OPP to be not available for operation
|
||||||
Example: Let's say that 1GHz OPP is to be disabled if the temperature
|
Example: Let's say that 1GHz OPP is to be disabled if the temperature
|
||||||
exceeds a threshold value. The SoC framework implementation might
|
exceeds a threshold value. The SoC framework implementation might
|
||||||
choose to do something as follows:
|
choose to do something as follows::
|
||||||
|
|
||||||
if (cur_temp > temp_high_thresh) {
|
if (cur_temp > temp_high_thresh) {
|
||||||
/* Disable 1GHz if it was enabled */
|
/* Disable 1GHz if it was enabled */
|
||||||
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true);
|
opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true);
|
||||||
|
@ -223,11 +249,13 @@ information from the OPP structure is necessary. Once an OPP pointer is
|
||||||
retrieved using the search functions, the following functions can be used by SoC
|
retrieved using the search functions, the following functions can be used by SoC
|
||||||
framework to retrieve the information represented inside the OPP layer.
|
framework to retrieve the information represented inside the OPP layer.
|
||||||
|
|
||||||
dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
|
dev_pm_opp_get_voltage
|
||||||
|
Retrieve the voltage represented by the opp pointer.
|
||||||
Example: At a cpufreq transition to a different frequency, SoC
|
Example: At a cpufreq transition to a different frequency, SoC
|
||||||
framework requires to set the voltage represented by the OPP using
|
framework requires to set the voltage represented by the OPP using
|
||||||
the regulator framework to the Power Management chip providing the
|
the regulator framework to the Power Management chip providing the
|
||||||
voltage.
|
voltage::
|
||||||
|
|
||||||
soc_switch_to_freq_voltage(freq)
|
soc_switch_to_freq_voltage(freq)
|
||||||
{
|
{
|
||||||
/* do things */
|
/* do things */
|
||||||
|
@ -239,10 +267,12 @@ dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
|
||||||
/* do other things */
|
/* do other things */
|
||||||
}
|
}
|
||||||
|
|
||||||
dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
|
dev_pm_opp_get_freq
|
||||||
|
Retrieve the freq represented by the opp pointer.
|
||||||
Example: Let's say the SoC framework uses a couple of helper functions
|
Example: Let's say the SoC framework uses a couple of helper functions
|
||||||
we could pass opp pointers instead of doing additional parameters to
|
we could pass opp pointers instead of doing additional parameters to
|
||||||
handle quite a bit of data parameters.
|
handle quite a bit of data parameters::
|
||||||
|
|
||||||
soc_cpufreq_target(..)
|
soc_cpufreq_target(..)
|
||||||
{
|
{
|
||||||
/* do things.. */
|
/* do things.. */
|
||||||
|
@ -264,9 +294,11 @@ dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
|
||||||
/* do things.. */
|
/* do things.. */
|
||||||
}
|
}
|
||||||
|
|
||||||
dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
|
dev_pm_opp_get_opp_count
|
||||||
|
Retrieve the number of available opps for a device
|
||||||
Example: Let's say a co-processor in the SoC needs to know the available
|
Example: Let's say a co-processor in the SoC needs to know the available
|
||||||
frequencies in a table, the main processor can notify as following:
|
frequencies in a table, the main processor can notify as following::
|
||||||
|
|
||||||
soc_notify_coproc_available_frequencies()
|
soc_notify_coproc_available_frequencies()
|
||||||
{
|
{
|
||||||
/* Do things */
|
/* Do things */
|
||||||
|
@ -289,54 +321,59 @@ dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
|
||||||
==================
|
==================
|
||||||
Typically an SoC contains multiple voltage domains which are variable. Each
|
Typically an SoC contains multiple voltage domains which are variable. Each
|
||||||
domain is represented by a device pointer. The relationship to OPP can be
|
domain is represented by a device pointer. The relationship to OPP can be
|
||||||
represented as follows:
|
represented as follows::
|
||||||
SoC
|
|
||||||
|- device 1
|
SoC
|
||||||
| |- opp 1 (availability, freq, voltage)
|
|- device 1
|
||||||
| |- opp 2 ..
|
| |- opp 1 (availability, freq, voltage)
|
||||||
... ...
|
| |- opp 2 ..
|
||||||
| `- opp n ..
|
... ...
|
||||||
|- device 2
|
| `- opp n ..
|
||||||
...
|
|- device 2
|
||||||
`- device m
|
...
|
||||||
|
`- device m
|
||||||
|
|
||||||
OPP library maintains an internal list that the SoC framework populates and
|
OPP library maintains an internal list that the SoC framework populates and
|
||||||
accessed by various functions as described above. However, the structures
|
accessed by various functions as described above. However, the structures
|
||||||
representing the actual OPPs and domains are internal to the OPP library itself
|
representing the actual OPPs and domains are internal to the OPP library itself
|
||||||
to allow for suitable abstraction reusable across systems.
|
to allow for suitable abstraction reusable across systems.
|
||||||
|
|
||||||
struct dev_pm_opp - The internal data structure of OPP library which is used to
|
struct dev_pm_opp
|
||||||
|
The internal data structure of OPP library which is used to
|
||||||
represent an OPP. In addition to the freq, voltage, availability
|
represent an OPP. In addition to the freq, voltage, availability
|
||||||
information, it also contains internal book keeping information required
|
information, it also contains internal book keeping information required
|
||||||
for the OPP library to operate on. Pointer to this structure is
|
for the OPP library to operate on. Pointer to this structure is
|
||||||
provided back to the users such as SoC framework to be used as an
|
provided back to the users such as SoC framework to be used as an
|
||||||
identifier for OPP in the interactions with OPP layer.
|
identifier for OPP in the interactions with OPP layer.
|
||||||
|
|
||||||
WARNING: The struct dev_pm_opp pointer should not be parsed or modified by the
|
WARNING:
|
||||||
users. The defaults for an instance are populated by dev_pm_opp_add, but the
|
The struct dev_pm_opp pointer should not be parsed or modified by the
|
||||||
availability of the OPP can be modified by dev_pm_opp_enable/disable functions.
|
users. The defaults for an instance are populated by
|
||||||
|
dev_pm_opp_add, but the availability of the OPP can be modified
|
||||||
|
by dev_pm_opp_enable/disable functions.
|
||||||
|
|
||||||
struct device - This is used to identify a domain to the OPP layer. The
|
struct device
|
||||||
|
This is used to identify a domain to the OPP layer. The
|
||||||
nature of the device and its implementation is left to the user of
|
nature of the device and its implementation is left to the user of
|
||||||
OPP library such as the SoC framework.
|
OPP library such as the SoC framework.
|
||||||
|
|
||||||
Overall, in a simplistic view, the data structure operations are represented as
|
Overall, in a simplistic view, the data structure operations are represented as
|
||||||
following:
|
following::
|
||||||
|
|
||||||
Initialization / modification:
|
Initialization / modification:
|
||||||
+-----+ /- dev_pm_opp_enable
|
+-----+ /- dev_pm_opp_enable
|
||||||
dev_pm_opp_add --> | opp | <-------
|
dev_pm_opp_add --> | opp | <-------
|
||||||
| +-----+ \- dev_pm_opp_disable
|
| +-----+ \- dev_pm_opp_disable
|
||||||
\-------> domain_info(device)
|
\-------> domain_info(device)
|
||||||
|
|
||||||
Search functions:
|
Search functions:
|
||||||
/-- dev_pm_opp_find_freq_ceil ---\ +-----+
|
/-- dev_pm_opp_find_freq_ceil ---\ +-----+
|
||||||
domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
|
domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
|
||||||
\-- dev_pm_opp_find_freq_floor ---/ +-----+
|
\-- dev_pm_opp_find_freq_floor ---/ +-----+
|
||||||
|
|
||||||
Retrieval functions:
|
Retrieval functions:
|
||||||
+-----+ /- dev_pm_opp_get_voltage
|
+-----+ /- dev_pm_opp_get_voltage
|
||||||
| opp | <---
|
| opp | <---
|
||||||
+-----+ \- dev_pm_opp_get_freq
|
+-----+ \- dev_pm_opp_get_freq
|
||||||
|
|
||||||
domain_info <- dev_pm_opp_get_opp_count
|
domain_info <- dev_pm_opp_get_opp_count
|
|
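Reader's note on the OPP document diffed above: the search and availability semantics it describes ("at most", "at least", exact match with an availability flag, temporary enable/disable) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation. The names `model_opp`, `mpu_opps`, `model_find_freq_floor`, `model_find_freq_ceil`, `model_find_freq_exact` and `model_set_available` are hypothetical. Unlike the real `dev_pm_opp_*` functions, the model takes the frequency by value, returns NULL instead of an error pointer, and needs no `dev_pm_opp_put()`.

```c
/* Userspace model of an OPP table. All names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_opp {
	unsigned long freq_hz;	/* frequency in Hz */
	unsigned long volt_uv;	/* minimum voltage in uV */
	bool available;		/* toggled by enable/disable */
};

/* The three MPU OPPs from the introduction, sorted by ascending frequency. */
struct model_opp mpu_opps[] = {
	{  300000000UL, 1000000UL, true },
	{  800000000UL, 1200000UL, true },
	{ 1000000000UL, 1300000UL, true },
};
#define MPU_OPP_COUNT (sizeof(mpu_opps) / sizeof(mpu_opps[0]))

/* Highest available OPP whose frequency is at most freq_hz, or NULL. */
struct model_opp *model_find_freq_floor(struct model_opp *table, size_t n,
					unsigned long freq_hz)
{
	struct model_opp *best = NULL;
	size_t i;

	assert(table != NULL);
	for (i = 0; i < n; i++) {
		if (table[i].freq_hz > freq_hz)
			break;		/* table is sorted, nothing lower follows */
		if (table[i].available)
			best = &table[i];
	}
	return best;
}

/* Lowest available OPP whose frequency is at least freq_hz, or NULL. */
struct model_opp *model_find_freq_ceil(struct model_opp *table, size_t n,
				       unsigned long freq_hz)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i].available && table[i].freq_hz >= freq_hz)
			return &table[i];
	return NULL;
}

/* OPP matching both the exact frequency and the requested availability. */
struct model_opp *model_find_freq_exact(struct model_opp *table, size_t n,
					unsigned long freq_hz, bool available)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i].freq_hz == freq_hz &&
		    table[i].available == available)
			return &table[i];
	return NULL;
}

/* Enable or disable the OPP at freq_hz; returns 0, or -1 if not found. */
int model_set_available(struct model_opp *table, size_t n,
			unsigned long freq_hz, bool available)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].freq_hz == freq_hz) {
			table[i].available = available;
			return 0;
		}
	}
	return -1;
}
```

In this model, passing the largest `unsigned long` to the floor search plays the role of `freq = ULONG_MAX` in the document's "highest opp" example. Disabling the 1GHz entry makes the floor search fall back to 800MHz, while the exact search with `available == false` still finds it, which mirrors the note that the exact search is the only one operating on unavailable OPPs.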
@ -1,4 +1,6 @@
|
||||||
|
====================
|
||||||
PCI Power Management
|
PCI Power Management
|
||||||
|
====================
|
||||||
|
|
||||||
Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
|
Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
|
||||||
|
|
||||||
|
@ -9,14 +11,14 @@ management. Based on previous work by Patrick Mochel <mochel@transmeta.com>
|
||||||
This document only covers the aspects of power management specific to PCI
|
This document only covers the aspects of power management specific to PCI
|
||||||
devices. For general description of the kernel's interfaces related to device
|
devices. For general description of the kernel's interfaces related to device
|
||||||
power management refer to Documentation/driver-api/pm/devices.rst and
|
power management refer to Documentation/driver-api/pm/devices.rst and
|
||||||
Documentation/power/runtime_pm.txt.
|
Documentation/power/runtime_pm.rst.
|
||||||
|
|
||||||
---------------------------------------------------------------------------
|
.. contents:
|
||||||
|
|
||||||
1. Hardware and Platform Support for PCI Power Management
|
1. Hardware and Platform Support for PCI Power Management
|
||||||
2. PCI Subsystem and Device Power Management
|
2. PCI Subsystem and Device Power Management
|
||||||
3. PCI Device Drivers and Power Management
|
3. PCI Device Drivers and Power Management
|
||||||
4. Resources
|
4. Resources
|
||||||
|
|
||||||
|
|
||||||
1. Hardware and Platform Support for PCI Power Management
|
1. Hardware and Platform Support for PCI Power Management
|
||||||
|
@ -24,6 +26,7 @@ Documentation/power/runtime_pm.txt.
|
||||||
|
|
||||||
1.1. Native and Platform-Based Power Management
|
1.1. Native and Platform-Based Power Management
|
||||||
-----------------------------------------------
|
-----------------------------------------------
|
||||||
|
|
||||||
In general, power management is a feature allowing one to save energy by putting
|
In general, power management is a feature allowing one to save energy by putting
|
||||||
devices into states in which they draw less power (low-power states) at the
|
devices into states in which they draw less power (low-power states) at the
|
||||||
price of reduced functionality or performance.
|
price of reduced functionality or performance.
|
||||||
|
@ -67,6 +70,7 @@ mechanisms have to be used simultaneously to obtain the desired result.
|
||||||
|
|
||||||
1.2. Native PCI Power Management
|
1.2. Native PCI Power Management
|
||||||
--------------------------------
|
--------------------------------
|
||||||
|
|
||||||
The PCI Bus Power Management Interface Specification (PCI PM Spec) was
|
The PCI Bus Power Management Interface Specification (PCI PM Spec) was
|
||||||
introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a
|
introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a
|
||||||
standard interface for performing various operations related to power
|
standard interface for performing various operations related to power
|
||||||
|
@ -134,6 +138,7 @@ sufficiently active to generate a wakeup signal.
|
||||||
|
|
||||||
1.3. ACPI Device Power Management
|
1.3. ACPI Device Power Management
|
||||||
---------------------------------
|
---------------------------------
|
||||||
|
|
||||||
The platform firmware support for the power management of PCI devices is
|
The platform firmware support for the power management of PCI devices is
|
||||||
system-specific. However, if the system in question is compliant with the
|
system-specific. However, if the system in question is compliant with the
|
||||||
Advanced Configuration and Power Interface (ACPI) Specification, like the
|
Advanced Configuration and Power Interface (ACPI) Specification, like the
|
||||||
|
@ -194,6 +199,7 @@ enabled for the device to be able to generate wakeup signals.
|
||||||
|
|
||||||
1.4. Wakeup Signaling
|
1.4. Wakeup Signaling
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
|
Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
|
||||||
a result of the execution of the _DSW (or _PSW) ACPI control method before
|
a result of the execution of the _DSW (or _PSW) ACPI control method before
|
||||||
putting the device into a low-power state, have to be caught and handled as
|
putting the device into a low-power state, have to be caught and handled as
|
||||||
|
@ -265,14 +271,15 @@ the native PCI Express PME signaling cannot be used by the kernel in that case.
|
||||||
|
|
||||||
2.1. Device Power Management Callbacks
|
2.1. Device Power Management Callbacks
|
||||||
--------------------------------------
|
--------------------------------------
|
||||||
|
|
||||||
The PCI Subsystem participates in the power management of PCI devices in a
|
The PCI Subsystem participates in the power management of PCI devices in a
|
||||||
number of ways. First of all, it provides an intermediate code layer between
|
number of ways. First of all, it provides an intermediate code layer between
|
||||||
the device power management core (PM core) and PCI device drivers.
|
the device power management core (PM core) and PCI device drivers.
|
||||||
Specifically, the pm field of the PCI subsystem's struct bus_type object,
|
Specifically, the pm field of the PCI subsystem's struct bus_type object,
|
||||||
pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
|
pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
|
||||||
pointers to several device power management callbacks:
|
pointers to several device power management callbacks::
|
||||||
|
|
||||||
const struct dev_pm_ops pci_dev_pm_ops = {
|
const struct dev_pm_ops pci_dev_pm_ops = {
|
||||||
.prepare = pci_pm_prepare,
|
.prepare = pci_pm_prepare,
|
||||||
.complete = pci_pm_complete,
|
.complete = pci_pm_complete,
|
||||||
.suspend = pci_pm_suspend,
|
.suspend = pci_pm_suspend,
|
||||||
|
@ -290,7 +297,7 @@ const struct dev_pm_ops pci_dev_pm_ops = {
|
||||||
.runtime_suspend = pci_pm_runtime_suspend,
|
.runtime_suspend = pci_pm_runtime_suspend,
|
||||||
.runtime_resume = pci_pm_runtime_resume,
|
.runtime_resume = pci_pm_runtime_resume,
|
||||||
.runtime_idle = pci_pm_runtime_idle,
|
.runtime_idle = pci_pm_runtime_idle,
|
||||||
};
|
};
|
||||||
|
|
||||||
These callbacks are executed by the PM core in various situations related to
|
These callbacks are executed by the PM core in various situations related to
|
||||||
device power management and they, in turn, execute power management callbacks
|
device power management and they, in turn, execute power management callbacks
|
||||||
|
@ -299,9 +306,9 @@ involving some standard configuration registers of PCI devices that device
|
||||||
drivers need not know or care about.
|
drivers need not know or care about.
|
||||||
|
|
||||||
The structure representing a PCI device, struct pci_dev, contains several fields
|
The structure representing a PCI device, struct pci_dev, contains several fields
|
||||||
that these callbacks operate on:
|
that these callbacks operate on::
|
||||||
|
|
||||||
struct pci_dev {
|
struct pci_dev {
|
||||||
...
|
...
|
||||||
pci_power_t current_state; /* Current operating state. */
|
pci_power_t current_state; /* Current operating state. */
|
||||||
int pm_cap; /* PM capability offset in the
|
int pm_cap; /* PM capability offset in the
|
||||||
|
@ -315,13 +322,14 @@ struct pci_dev {
|
||||||
unsigned int wakeup_prepared:1; /* Device prepared for wake up */
|
unsigned int wakeup_prepared:1; /* Device prepared for wake up */
|
||||||
unsigned int d3_delay; /* D3->D0 transition time in ms */
|
unsigned int d3_delay; /* D3->D0 transition time in ms */
|
||||||
...
|
...
|
||||||
};
|
};
|
||||||
|
|
||||||
They also indirectly use some fields of the struct device that is embedded in
|
They also indirectly use some fields of the struct device that is embedded in
|
||||||
struct pci_dev.
|
struct pci_dev.
|
||||||
|
|
||||||
2.2. Device Initialization
|
2.2. Device Initialization
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
The PCI subsystem's first task related to device power management is to
|
The PCI subsystem's first task related to device power management is to
|
||||||
prepare the device for power management and initialize the fields of struct
|
prepare the device for power management and initialize the fields of struct
|
||||||
pci_dev used for this purpose. This happens in two functions defined in
|
pci_dev used for this purpose. This happens in two functions defined in
|
||||||
|
@ -348,10 +356,11 @@ during system-wide transitions to a sleep state and back to the working state.
|
||||||
|
|
||||||
2.3. Runtime Device Power Management
|
2.3. Runtime Device Power Management
|
||||||
------------------------------------
|
------------------------------------
|
||||||
|
|
||||||
The PCI subsystem plays a vital role in the runtime power management of PCI
|
The PCI subsystem plays a vital role in the runtime power management of PCI
|
||||||
devices. For this purpose it uses the general runtime power management
|
devices. For this purpose it uses the general runtime power management
|
||||||
(runtime PM) framework described in Documentation/power/runtime_pm.txt.
|
(runtime PM) framework described in Documentation/power/runtime_pm.rst.
|
||||||
Namely, it provides subsystem-level callbacks:
|
Namely, it provides subsystem-level callbacks::
|
||||||
|
|
||||||
pci_pm_runtime_suspend()
|
pci_pm_runtime_suspend()
|
||||||
pci_pm_runtime_resume()
|
pci_pm_runtime_resume()
|
||||||
|
@ -425,13 +434,14 @@ to the given subsystem before the next phase begins. These phases always run
|
||||||
after tasks have been frozen.
|
after tasks have been frozen.
|
||||||
|
|
||||||
2.4.1. System Suspend
|
2.4.1. System Suspend
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
When the system is going into a sleep state in which the contents of memory will
|
When the system is going into a sleep state in which the contents of memory will
|
||||||
be preserved, such as one of the ACPI sleep states S1-S3, the phases are:
|
be preserved, such as one of the ACPI sleep states S1-S3, the phases are:
|
||||||
|
|
||||||
prepare, suspend, suspend_noirq.
|
prepare, suspend, suspend_noirq.
|
||||||
|
|
||||||
The following PCI bus type's callbacks, respectively, are used in these phases:
|
The following PCI bus type's callbacks, respectively, are used in these phases::
|
||||||
|
|
||||||
pci_pm_prepare()
|
pci_pm_prepare()
|
||||||
pci_pm_suspend()
|
pci_pm_suspend()
|
||||||
|
@ -492,6 +502,7 @@ this purpose). PCI device drivers are not encouraged to do that, but in some
|
||||||
rare cases doing that in the driver may be the optimum approach.
|
rare cases doing that in the driver may be the optimum approach.
|
||||||
|
|
||||||
2.4.2. System Resume
|
2.4.2. System Resume
|
||||||
|
^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
When the system is undergoing a transition from a sleep state in which the
|
When the system is undergoing a transition from a sleep state in which the
|
||||||
contents of memory have been preserved, such as one of the ACPI sleep states
|
contents of memory have been preserved, such as one of the ACPI sleep states
|
||||||
|
@ -500,7 +511,7 @@ S1-S3, into the working state (ACPI S0), the phases are:
|
||||||
resume_noirq, resume, complete.
|
resume_noirq, resume, complete.
|
||||||
|
|
||||||
The following PCI bus type's callbacks, respectively, are executed in these
|
The following PCI bus type's callbacks, respectively, are executed in these
|
||||||
phases:
|
phases::
|
||||||
|
|
||||||
pci_pm_resume_noirq()
|
pci_pm_resume_noirq()
|
||||||
pci_pm_resume()
|
pci_pm_resume()
|
||||||
|
@ -539,6 +550,7 @@ The pci_pm_complete() routine only executes the device driver's pm->complete()
|
||||||
callback, if defined.
|
callback, if defined.
|
||||||
|
|
||||||
2.4.3. System Hibernation
|
2.4.3. System Hibernation
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
System hibernation is more complicated than system suspend, because it requires
|
System hibernation is more complicated than system suspend, because it requires
|
||||||
a system image to be created and written into a persistent storage medium. The
|
a system image to be created and written into a persistent storage medium. The
|
||||||
|
@ -551,7 +563,7 @@ to be free) in the following three phases:
|
||||||
|
|
||||||
prepare, freeze, freeze_noirq
|
prepare, freeze, freeze_noirq
|
||||||
|
|
||||||
that correspond to the PCI bus type's callbacks:
|
that correspond to the PCI bus type's callbacks::
|
||||||
|
|
||||||
pci_pm_prepare()
|
pci_pm_prepare()
|
||||||
pci_pm_freeze()
|
pci_pm_freeze()
|
||||||
|
@ -580,7 +592,7 @@ back to the fully functional state and this is done in the following phases:
|
||||||
|
|
||||||
thaw_noirq, thaw, complete
|
thaw_noirq, thaw, complete
|
||||||
|
|
||||||
using the following PCI bus type's callbacks:
|
using the following PCI bus type's callbacks::
|
||||||
|
|
||||||
pci_pm_thaw_noirq()
|
pci_pm_thaw_noirq()
|
||||||
pci_pm_thaw()
|
pci_pm_thaw()
|
||||||
|
@ -608,7 +620,7 @@ three phases:
|
||||||
|
|
||||||
where the prepare phase is exactly the same as for system suspend. The other
|
where the prepare phase is exactly the same as for system suspend. The other
|
||||||
two phases are analogous to the suspend and suspend_noirq phases, respectively.
|
two phases are analogous to the suspend and suspend_noirq phases, respectively.
|
||||||
The PCI subsystem-level callbacks they correspond to
|
The PCI subsystem-level callbacks they correspond to::
|
||||||
|
|
||||||
pci_pm_poweroff()
|
pci_pm_poweroff()
|
||||||
pci_pm_poweroff_noirq()
|
pci_pm_poweroff_noirq()
|
||||||
|
@ -618,6 +630,7 @@ although they don't attempt to save the device's standard configuration
|
||||||
registers.
|
registers.
|
||||||
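Editorial sketch (not part of the patch): the hibernation phase order described in this section can be written down as a table. The names below (toy_hibernate_phases, toy_phase_index) are hypothetical, and the third group assumes the phases are named after the pci_pm_poweroff() and pci_pm_poweroff_noirq() callbacks listed above. It is a user-space illustration only, not kernel code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Phase order during hibernation as described in the text:
 * create the image, return to a functional state to write it out,
 * then power off. Hypothetical table for illustration only.
 */
static const char *const toy_hibernate_phases[] = {
	"prepare", "freeze", "freeze_noirq",		/* before image creation */
	"thaw_noirq", "thaw", "complete",		/* back to functional state */
	"prepare", "poweroff", "poweroff_noirq",	/* after the image is saved */
};

#define TOY_N_PHASES \
	(sizeof(toy_hibernate_phases) / sizeof(toy_hibernate_phases[0]))

/* Return the index of the first occurrence of a phase name, or -1. */
static int toy_phase_index(const char *name)
{
	size_t i;

	for (i = 0; i < TOY_N_PHASES; i++)
		if (strcmp(toy_hibernate_phases[i], name) == 0)
			return (int)i;
	return -1;
}
```

Note that prepare appears twice in the table, because the power-off stage starts with its own prepare phase.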
|
|
||||||
2.4.4. System Restore
|
2.4.4. System Restore
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
System restore requires a hibernation image to be loaded into memory and the
|
System restore requires a hibernation image to be loaded into memory and the
|
||||||
pre-hibernation memory contents to be restored before the pre-hibernation system
|
pre-hibernation memory contents to be restored before the pre-hibernation system
|
||||||
|
@ -653,7 +666,7 @@ phases:
|
||||||
|
|
||||||
The first two of these are analogous to the resume_noirq and resume phases
|
The first two of these are analogous to the resume_noirq and resume phases
|
||||||
described above, respectively, and correspond to the following PCI subsystem
|
described above, respectively, and correspond to the following PCI subsystem
|
||||||
callbacks:
|
callbacks::
|
||||||
|
|
||||||
pci_pm_restore_noirq()
|
pci_pm_restore_noirq()
|
||||||
pci_pm_restore()
|
pci_pm_restore()
|
||||||
|
@ -671,6 +684,7 @@ resume.
|
||||||
|
|
||||||
3.1. Power Management Callbacks
|
3.1. Power Management Callbacks
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
||||||
PCI device drivers participate in power management by providing callbacks to be
|
PCI device drivers participate in power management by providing callbacks to be
|
||||||
executed by the PCI subsystem's power management routines described above and by
|
executed by the PCI subsystem's power management routines described above and by
|
||||||
controlling the runtime power management of their devices.
|
controlling the runtime power management of their devices.
|
||||||
|
@ -698,6 +712,7 @@ defined, though, they are expected to behave as described in the following
|
||||||
subsections.
|
subsections.
|
||||||
|
|
||||||
3.1.1. prepare()
|
3.1.1. prepare()
|
||||||
|
^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The prepare() callback is executed during system suspend, during hibernation
|
The prepare() callback is executed during system suspend, during hibernation
|
||||||
(when a hibernation image is about to be created), during power-off after
|
(when a hibernation image is about to be created), during power-off after
|
||||||
|
@ -716,6 +731,7 @@ preallocated earlier, for example in a suspend/hibernate notifier as described
|
||||||
in Documentation/driver-api/pm/notifiers.rst).
|
in Documentation/driver-api/pm/notifiers.rst).
|
||||||
|
|
||||||
3.1.2. suspend()
|
3.1.2. suspend()
|
||||||
|
^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The suspend() callback is only executed during system suspend, after prepare()
|
The suspend() callback is only executed during system suspend, after prepare()
|
||||||
callbacks have been executed for all devices in the system.
|
callbacks have been executed for all devices in the system.
|
||||||
|
@ -742,6 +758,7 @@ operations relying on the driver's ability to handle interrupts should be
|
||||||
carried out in this callback.
|
carried out in this callback.
|
||||||
|
|
||||||
3.1.3. suspend_noirq()
|
3.1.3. suspend_noirq()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The suspend_noirq() callback is only executed during system suspend, after
|
The suspend_noirq() callback is only executed during system suspend, after
|
||||||
suspend() callbacks have been executed for all devices in the system and
|
suspend() callbacks have been executed for all devices in the system and
|
||||||
|
@ -753,6 +770,7 @@ suspend_noirq() can carry out operations that would cause race conditions to
|
||||||
arise if they were performed in suspend().
|
arise if they were performed in suspend().
|
||||||
|
|
||||||
3.1.4. freeze()
|
3.1.4. freeze()
|
||||||
|
^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The freeze() callback is hibernation-specific and is executed in two situations,
|
The freeze() callback is hibernation-specific and is executed in two situations,
|
||||||
during hibernation, after prepare() callbacks have been executed for all devices
|
during hibernation, after prepare() callbacks have been executed for all devices
|
||||||
|
@ -770,6 +788,7 @@ or put it into a low-power state. Still, either it or freeze_noirq() should
|
||||||
save the device's standard configuration registers using pci_save_state().
|
save the device's standard configuration registers using pci_save_state().
|
||||||
|
|
||||||
3.1.5. freeze_noirq()
|
3.1.5. freeze_noirq()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The freeze_noirq() callback is hibernation-specific. It is executed during
|
The freeze_noirq() callback is hibernation-specific. It is executed during
|
||||||
hibernation, after prepare() and freeze() callbacks have been executed for all
|
hibernation, after prepare() and freeze() callbacks have been executed for all
|
||||||
|
@ -786,6 +805,7 @@ The difference between freeze_noirq() and freeze() is analogous to the
|
||||||
difference between suspend_noirq() and suspend().
|
difference between suspend_noirq() and suspend().
|
||||||
|
|
||||||
3.1.6. poweroff()
|
3.1.6. poweroff()
|
||||||
|
^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The poweroff() callback is hibernation-specific. It is executed when the system
|
The poweroff() callback is hibernation-specific. It is executed when the system
|
||||||
is about to be powered off after saving a hibernation image to a persistent
|
is about to be powered off after saving a hibernation image to a persistent
|
||||||
|
@ -802,6 +822,7 @@ into a low-power state, respectively, but it need not save the device's standard
|
||||||
configuration registers.
|
configuration registers.
|
||||||
|
|
||||||
3.1.7. poweroff_noirq()
|
3.1.7. poweroff_noirq()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The poweroff_noirq() callback is hibernation-specific. It is executed after
|
The poweroff_noirq() callback is hibernation-specific. It is executed after
|
||||||
poweroff() callbacks have been executed for all devices in the system.
|
poweroff() callbacks have been executed for all devices in the system.
|
||||||
|
@ -814,6 +835,7 @@ The difference between poweroff_noirq() and poweroff() is analogous to the
|
||||||
difference between suspend_noirq() and suspend().
|
difference between suspend_noirq() and suspend().
|
||||||
|
|
||||||
3.1.8. resume_noirq()
|
3.1.8. resume_noirq()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The resume_noirq() callback is only executed during system resume, after the
|
The resume_noirq() callback is only executed during system resume, after the
|
||||||
PM core has enabled the non-boot CPUs. The driver's interrupt handler will not
|
PM core has enabled the non-boot CPUs. The driver's interrupt handler will not
|
||||||
|
@ -827,6 +849,7 @@ it should only be used for performing operations that would lead to race
|
||||||
conditions if carried out by resume().
|
conditions if carried out by resume().
|
||||||
|
|
||||||
3.1.9. resume()
|
3.1.9. resume()
|
||||||
|
^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The resume() callback is only executed during system resume, after
|
The resume() callback is only executed during system resume, after
|
||||||
resume_noirq() callbacks have been executed for all devices in the system and
|
resume_noirq() callbacks have been executed for all devices in the system and
|
||||||
|
@ -837,6 +860,7 @@ device and bringing it back to the fully functional state. The device should be
|
||||||
able to process I/O in a usual way after resume() has returned.
|
able to process I/O in a usual way after resume() has returned.
|
||||||
|
|
||||||
3.1.10. thaw_noirq()
|
3.1.10. thaw_noirq()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The thaw_noirq() callback is hibernation-specific. It is executed after a
|
The thaw_noirq() callback is hibernation-specific. It is executed after a
|
||||||
system image has been created and the non-boot CPUs have been enabled by the PM
|
system image has been created and the non-boot CPUs have been enabled by the PM
|
||||||
|
@ -851,6 +875,7 @@ freeze() and freeze_noirq(), so in general it does not need to modify the
|
||||||
contents of the device's registers.
|
contents of the device's registers.
|
||||||
|
|
||||||
3.1.11. thaw()
|
3.1.11. thaw()
|
||||||
|
^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The thaw() callback is hibernation-specific. It is executed after thaw_noirq()
|
The thaw() callback is hibernation-specific. It is executed after thaw_noirq()
|
||||||
callbacks have been executed for all devices in the system and after device
|
callbacks have been executed for all devices in the system and after device
|
||||||
|
@ -860,6 +885,7 @@ This callback is responsible for restoring the pre-freeze configuration of
|
||||||
the device, so that it will work in a usual way after thaw() has returned.
|
the device, so that it will work in a usual way after thaw() has returned.
|
||||||
|
|
||||||
3.1.12. restore_noirq()
|
3.1.12. restore_noirq()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The restore_noirq() callback is hibernation-specific. It is executed in the
|
The restore_noirq() callback is hibernation-specific. It is executed in the
|
||||||
restore_noirq phase of hibernation, when the boot kernel has passed control to
|
restore_noirq phase of hibernation, when the boot kernel has passed control to
|
||||||
|
@ -875,6 +901,7 @@ For the vast majority of PCI device drivers there is no difference between
|
||||||
resume_noirq() and restore_noirq().
|
resume_noirq() and restore_noirq().
|
||||||
|
|
||||||
3.1.13. restore()
|
3.1.13. restore()
|
||||||
|
^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The restore() callback is hibernation-specific. It is executed after
|
The restore() callback is hibernation-specific. It is executed after
|
||||||
restore_noirq() callbacks have been executed for all devices in the system and
|
restore_noirq() callbacks have been executed for all devices in the system and
|
||||||
|
@ -888,14 +915,17 @@ For the vast majority of PCI device drivers there is no difference between
|
||||||
resume() and restore().
|
resume() and restore().
|
||||||
|
|
||||||
3.1.14. complete()
|
3.1.14. complete()
|
||||||
|
^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The complete() callback is executed in the following situations:
|
The complete() callback is executed in the following situations:
|
||||||
|
|
||||||
- during system resume, after resume() callbacks have been executed for all
|
- during system resume, after resume() callbacks have been executed for all
|
||||||
devices,
|
devices,
|
||||||
- during hibernation, before saving the system image, after thaw() callbacks
|
- during hibernation, before saving the system image, after thaw() callbacks
|
||||||
have been executed for all devices,
|
have been executed for all devices,
|
||||||
- during system restore, when the system is going back to its pre-hibernation
|
- during system restore, when the system is going back to its pre-hibernation
|
||||||
state, after restore() callbacks have been executed for all devices.
|
state, after restore() callbacks have been executed for all devices.
|
||||||
|
|
||||||
It also may be executed if the loading of a hibernation image into memory fails
|
It also may be executed if the loading of a hibernation image into memory fails
|
||||||
(in that case it is run after thaw() callbacks have been executed for all
|
(in that case it is run after thaw() callbacks have been executed for all
|
||||||
devices that have drivers in the boot kernel).
|
devices that have drivers in the boot kernel).
|
||||||
|
@ -904,6 +934,7 @@ This callback is entirely optional, although it may be necessary if the
|
||||||
prepare() callback performs operations that need to be reversed.
|
prepare() callback performs operations that need to be reversed.
|
||||||
|
|
||||||
3.1.15. runtime_suspend()
|
3.1.15. runtime_suspend()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The runtime_suspend() callback is specific to device runtime power management
|
The runtime_suspend() callback is specific to device runtime power management
|
||||||
(runtime PM). It is executed by the PM core's runtime PM framework when the
|
(runtime PM). It is executed by the PM core's runtime PM framework when the
|
||||||
|
@ -915,6 +946,7 @@ put into a low-power state, but it must allow the PCI subsystem to perform all
|
||||||
of the PCI-specific actions necessary for suspending the device.
|
of the PCI-specific actions necessary for suspending the device.
|
||||||
|
|
||||||
3.1.16. runtime_resume()
|
3.1.16. runtime_resume()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The runtime_resume() callback is specific to device runtime PM. It is executed
|
The runtime_resume() callback is specific to device runtime PM. It is executed
|
||||||
by the PM core's runtime PM framework when the device is about to be resumed
|
by the PM core's runtime PM framework when the device is about to be resumed
|
||||||
|
@ -927,6 +959,7 @@ The device is expected to be able to process I/O in the usual way after
|
||||||
runtime_resume() has returned.
|
runtime_resume() has returned.
|
||||||
|
|
||||||
3.1.17. runtime_idle()
|
3.1.17. runtime_idle()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The runtime_idle() callback is specific to device runtime PM. It is executed
|
The runtime_idle() callback is specific to device runtime PM. It is executed
|
||||||
by the PM core's runtime PM framework whenever it may be desirable to suspend
|
by the PM core's runtime PM framework whenever it may be desirable to suspend
|
||||||
|
@ -939,6 +972,7 @@ PCI subsystem will call pm_runtime_suspend() for the device, which in turn will
|
||||||
cause the driver's runtime_suspend() callback to be executed.
|
cause the driver's runtime_suspend() callback to be executed.
|
||||||
|
|
||||||
3.1.18. Pointing Multiple Callback Pointers to One Routine
|
3.1.18. Pointing Multiple Callback Pointers to One Routine
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Although in principle each of the callbacks described in the previous
|
Although in principle each of the callbacks described in the previous
|
||||||
subsections can be defined as a separate function, it often is convenient to
|
subsections can be defined as a separate function, it often is convenient to
|
||||||
|
@ -962,6 +996,7 @@ dev_pm_ops to indicate that one suspend routine is to be pointed to by the
|
||||||
be pointed to by the .resume(), .thaw(), and .restore() members.
|
be pointed to by the .resume(), .thaw(), and .restore() members.
|
||||||
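Editorial sketch (not part of the patch): the idea of pointing several callback pointers at one routine can be modelled in user space as below. The kernel offers the SIMPLE_DEV_PM_OPS() macro in include/linux/pm.h for this purpose; the toy_ struct, macro and functions here are hypothetical stand-ins that only mimic the pattern, assuming the suspend routine serves .suspend, .freeze and .poweroff.

```c
/* Hypothetical device with a single piece of state. */
struct toy_device {
	int powered;
};

/* Reduced, hypothetical stand-in for the sleep part of dev_pm_ops. */
struct toy_dev_pm_ops {
	int (*suspend)(struct toy_device *dev);
	int (*resume)(struct toy_device *dev);
	int (*freeze)(struct toy_device *dev);
	int (*thaw)(struct toy_device *dev);
	int (*poweroff)(struct toy_device *dev);
	int (*restore)(struct toy_device *dev);
};

/* One suspend routine and one resume routine fill all six members. */
#define TOY_SIMPLE_PM_OPS(name, suspend_fn, resume_fn)	\
	static const struct toy_dev_pm_ops name = {		\
		.suspend = suspend_fn,				\
		.resume = resume_fn,				\
		.freeze = suspend_fn,				\
		.thaw = resume_fn,				\
		.poweroff = suspend_fn,				\
		.restore = resume_fn,				\
	}

static int toy_suspend(struct toy_device *dev)
{
	dev->powered = 0;	/* quiesce the imaginary device */
	return 0;
}

static int toy_resume(struct toy_device *dev)
{
	dev->powered = 1;	/* bring it back to the functional state */
	return 0;
}

TOY_SIMPLE_PM_OPS(toy_pm_ops, toy_suspend, toy_resume);
```

A driver written this way has to make sure the shared routine is valid in every situation it is called from; for example, it cannot skip saving state on the assumption that it only runs before power-off.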
|
|
||||||
3.1.19. Driver Flags for Power Management
|
3.1.19. Driver Flags for Power Management
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The PM core allows device drivers to set flags that influence the handling of
|
The PM core allows device drivers to set flags that influence the handling of
|
||||||
power management for the devices by the core itself and by middle layer code
|
power management for the devices by the core itself and by middle layer code
|
||||||
|
@ -1007,6 +1042,7 @@ it.
|
||||||
|
|
||||||
3.2. Device Runtime Power Management
|
3.2. Device Runtime Power Management
|
||||||
------------------------------------
|
------------------------------------
|
||||||
|
|
||||||
In addition to providing device power management callbacks PCI device drivers
|
In addition to providing device power management callbacks PCI device drivers
|
||||||
are responsible for controlling the runtime power management (runtime PM) of
|
are responsible for controlling the runtime power management (runtime PM) of
|
||||||
their devices.
|
their devices.
|
||||||
|
@ -1073,22 +1109,27 @@ device the PM core automatically queues a request to check if the device is
|
||||||
idle), device drivers are generally responsible for queuing power management
|
idle), device drivers are generally responsible for queuing power management
|
||||||
requests for their devices. For this purpose they should use the runtime PM
|
requests for their devices. For this purpose they should use the runtime PM
|
||||||
helper functions provided by the PM core, discussed in
|
helper functions provided by the PM core, discussed in
|
||||||
Documentation/power/runtime_pm.txt.
|
Documentation/power/runtime_pm.rst.
|
||||||
|
|
||||||
Devices can also be suspended and resumed synchronously, without placing a
|
Devices can also be suspended and resumed synchronously, without placing a
|
||||||
request into pm_wq. In the majority of cases this also is done by their
|
request into pm_wq. In the majority of cases this also is done by their
|
||||||
drivers that use helper functions provided by the PM core for this purpose.
|
drivers that use helper functions provided by the PM core for this purpose.
|
||||||
|
|
||||||
For more information on the runtime PM of devices refer to
|
For more information on the runtime PM of devices refer to
|
||||||
Documentation/power/runtime_pm.txt.
|
Documentation/power/runtime_pm.rst.
|
||||||
|
|
||||||
|
|
||||||
4. Resources
|
4. Resources
|
||||||
============
|
============
|
||||||
|
|
||||||
PCI Local Bus Specification, Rev. 3.0
|
PCI Local Bus Specification, Rev. 3.0
|
||||||
|
|
||||||
PCI Bus Power Management Interface Specification, Rev. 1.2
|
PCI Bus Power Management Interface Specification, Rev. 1.2
|
||||||
|
|
||||||
Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b
|
Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b
|
||||||
|
|
||||||
PCI Express Base Specification, Rev. 2.0
|
PCI Express Base Specification, Rev. 2.0
|
||||||
|
|
||||||
Documentation/driver-api/pm/devices.rst
|
Documentation/driver-api/pm/devices.rst
|
||||||
Documentation/power/runtime_pm.txt
|
|
||||||
|
Documentation/power/runtime_pm.rst
|
|
@ -1,4 +1,6 @@
|
||||||
PM Quality Of Service Interface.
|
===============================
|
||||||
|
PM Quality Of Service Interface
|
||||||
|
===============================
|
||||||
|
|
||||||
This interface provides a kernel and user mode interface for registering
|
This interface provides a kernel and user mode interface for registering
|
||||||
performance expectations by drivers, subsystems and user space applications on
|
performance expectations by drivers, subsystems and user space applications on
|
||||||
|
@ -11,6 +13,7 @@ memory_bandwidth.
|
||||||
constraints and PM QoS flags.
|
constraints and PM QoS flags.
|
||||||
|
|
||||||
Each parameters have defined units:
|
Each parameters have defined units:
|
||||||
|
|
||||||
* latency: usec
|
* latency: usec
|
||||||
* timeout: usec
|
* timeout: usec
|
||||||
* throughput: kbs (kilo bit / sec)
|
* throughput: kbs (kilo bit / sec)
|
||||||
|
@ -18,6 +21,7 @@ Each parameters have defined units:
|
||||||
|
|
||||||
|
|
||||||
1. PM QoS framework
|
1. PM QoS framework
|
||||||
|
===================
|
||||||
|
|
||||||
The infrastructure exposes multiple misc device nodes one per implemented
|
The infrastructure exposes multiple misc device nodes one per implemented
|
||||||
parameter. The set of parameters implement is defined by pm_qos_power_init()
|
parameter. The set of parameters implement is defined by pm_qos_power_init()
|
||||||
|
@ -37,38 +41,39 @@ reading the aggregated value does not require any locking mechanism.
|
||||||
From kernel mode the use of this interface is simple:
|
From kernel mode the use of this interface is simple:
|
||||||
|
|
||||||
void pm_qos_add_request(handle, param_class, target_value):
|
void pm_qos_add_request(handle, param_class, target_value):
|
||||||
Will insert an element into the list for that identified PM QoS class with the
|
Will insert an element into the list for that identified PM QoS class with the
|
||||||
target value. Upon change to this list the new target is recomputed and any
|
target value. Upon change to this list the new target is recomputed and any
|
||||||
registered notifiers are called only if the target value is now different.
|
registered notifiers are called only if the target value is now different.
|
||||||
Clients of pm_qos need to save the returned handle for future use in other
|
Clients of pm_qos need to save the returned handle for future use in other
|
||||||
pm_qos API functions.
|
pm_qos API functions.
|
||||||
|
|
||||||
void pm_qos_update_request(handle, new_target_value):
|
void pm_qos_update_request(handle, new_target_value):
|
||||||
Will update the list element pointed to by the handle with the new target value
|
Will update the list element pointed to by the handle with the new target value
|
||||||
and recompute the new aggregated target, calling the notification tree if the
|
and recompute the new aggregated target, calling the notification tree if the
|
||||||
target is changed.
|
target is changed.
|
||||||
|
|
||||||
void pm_qos_remove_request(handle):
|
void pm_qos_remove_request(handle):
|
||||||
Will remove the element. After removal it will update the aggregate target and
|
Will remove the element. After removal it will update the aggregate target and
|
||||||
call the notification tree if the target was changed as a result of removing
|
call the notification tree if the target was changed as a result of removing
|
||||||
the request.
|
the request.
|
||||||
|
|
||||||
int pm_qos_request(param_class):
|
int pm_qos_request(param_class):
|
||||||
Returns the aggregated value for a given PM QoS class.
|
Returns the aggregated value for a given PM QoS class.
|
||||||
|
|
||||||
int pm_qos_request_active(handle):
|
int pm_qos_request_active(handle):
|
||||||
Returns if the request is still active, i.e. it has not been removed from a
|
Returns if the request is still active, i.e. it has not been removed from a
|
||||||
PM QoS class constraints list.
|
PM QoS class constraints list.
|
||||||
|
|
||||||
int pm_qos_add_notifier(param_class, notifier):
|
int pm_qos_add_notifier(param_class, notifier):
|
||||||
Adds a notification callback function to the PM QoS class. The callback is
|
Adds a notification callback function to the PM QoS class. The callback is
|
||||||
called when the aggregated value for the PM QoS class is changed.
|
called when the aggregated value for the PM QoS class is changed.
|
||||||
|
|
||||||
int pm_qos_remove_notifier(int param_class, notifier):
|
int pm_qos_remove_notifier(int param_class, notifier):
|
||||||
Removes the notification callback function for the PM QoS class.
|
Removes the notification callback function for the PM QoS class.
|
||||||
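Editorial sketch (not part of the patch): a user-space toy model of the add/update/remove behaviour described above. It assumes the aggregated target is the minimum of all requests, which is typical for latency-type classes but is an assumption here, and it counts "notifications" instead of calling real notifiers. All toy_ names are hypothetical; this is not the kernel implementation.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_QOS_SLOTS		8
#define TOY_QOS_NO_CONSTRAINT	INT_MAX	/* target when the list is empty */

struct toy_qos_request {
	int active;	/* nonzero while the request is on the list */
	int value;	/* requested target, e.g. latency in usec */
};

struct toy_qos_class {
	struct toy_qos_request *slots[TOY_QOS_SLOTS];
	int target;		/* aggregated target value */
	int notifications;	/* number of times the target changed */
};

static void toy_qos_init(struct toy_qos_class *c)
{
	size_t i;

	for (i = 0; i < TOY_QOS_SLOTS; i++)
		c->slots[i] = NULL;
	c->target = TOY_QOS_NO_CONSTRAINT;
	c->notifications = 0;
}

/* Recompute the minimum; "notify" only if the target really changed. */
static void toy_qos_recompute(struct toy_qos_class *c)
{
	int min = TOY_QOS_NO_CONSTRAINT;
	size_t i;

	for (i = 0; i < TOY_QOS_SLOTS; i++)
		if (c->slots[i] && c->slots[i]->value < min)
			min = c->slots[i]->value;

	if (min != c->target) {
		c->target = min;
		c->notifications++;
	}
}

/* Returns 0 on success, -1 if the toy list is full. */
static int toy_qos_add_request(struct toy_qos_class *c,
			       struct toy_qos_request *req, int value)
{
	size_t i;

	for (i = 0; i < TOY_QOS_SLOTS; i++) {
		if (!c->slots[i]) {
			req->value = value;
			req->active = 1;
			c->slots[i] = req;
			toy_qos_recompute(c);
			return 0;
		}
	}
	return -1;
}

static void toy_qos_update_request(struct toy_qos_class *c,
				   struct toy_qos_request *req, int new_value)
{
	if (!req->active)
		return;
	req->value = new_value;
	toy_qos_recompute(c);
}

static void toy_qos_remove_request(struct toy_qos_class *c,
				   struct toy_qos_request *req)
{
	size_t i;

	for (i = 0; i < TOY_QOS_SLOTS; i++)
		if (c->slots[i] == req)
			c->slots[i] = NULL;
	req->active = 0;
	toy_qos_recompute(c);
}

static int toy_qos_read_target(const struct toy_qos_class *c)
{
	return c->target;
}
```

In this model, adding a looser request than the current minimum leaves the target unchanged and therefore triggers no notification, which matches the "only if the target value is now different" wording.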
|
|
||||||
|
|
||||||
From user mode:
|
From user mode:
|
||||||
|
|
||||||
Only processes can register a pm_qos request. To provide for automatic
|
Only processes can register a pm_qos request. To provide for automatic
|
||||||
cleanup of a process, the interface requires the process to register its
|
cleanup of a process, the interface requires the process to register its
|
||||||
parameter requests in the following way:
|
parameter requests in the following way:
|
||||||
|
@ -89,6 +94,7 @@ node.
|
||||||
|
|
||||||
|
|
||||||
2. PM QoS per-device latency and flags framework
|
2. PM QoS per-device latency and flags framework
|
||||||
|
================================================
|
||||||
|
|
||||||
For each device, there are three lists of PM QoS requests. Two of them are
|
For each device, there are three lists of PM QoS requests. Two of them are
|
||||||
maintained along with the aggregated targets of resume latency and active
|
maintained along with the aggregated targets of resume latency and active
|
||||||
|
@ -107,73 +113,80 @@ the aggregated value does not require any locking mechanism.
|
||||||
From kernel mode the use of this interface is the following:
|
From kernel mode the use of this interface is the following:
|
||||||
|
|
||||||
int dev_pm_qos_add_request(device, handle, type, value):
|
int dev_pm_qos_add_request(device, handle, type, value):
|
||||||
Will insert an element into the list for that identified device with the
|
Will insert an element into the list for that identified device with the
|
||||||
target value. Upon change to this list the new target is recomputed and any
|
target value. Upon change to this list the new target is recomputed and any
|
||||||
registered notifiers are called only if the target value is now different.
|
registered notifiers are called only if the target value is now different.
|
||||||
Clients of dev_pm_qos need to save the handle for future use in other
|
Clients of dev_pm_qos need to save the handle for future use in other
|
||||||
dev_pm_qos API functions.
|
dev_pm_qos API functions.
|
||||||
|
|
||||||
int dev_pm_qos_update_request(handle, new_value):
|
int dev_pm_qos_update_request(handle, new_value):
|
||||||
Will update the list element pointed to by the handle with the new target value
|
Will update the list element pointed to by the handle with the new target
|
||||||
and recompute the new aggregated target, calling the notification trees if the
|
value and recompute the new aggregated target, calling the notification
|
||||||
target is changed.
|
trees if the target is changed.
|
||||||
|
|
||||||
int dev_pm_qos_remove_request(handle):
|
int dev_pm_qos_remove_request(handle):
|
||||||
Will remove the element. After removal it will update the aggregate target and
|
Will remove the element. After removal it will update the aggregate target
|
||||||
call the notification trees if the target was changed as a result of removing
|
and call the notification trees if the target was changed as a result of
|
||||||
the request.
|
removing the request.
|
||||||
|
|
||||||
s32 dev_pm_qos_read_value(device):
|
s32 dev_pm_qos_read_value(device):
|
||||||
Returns the aggregated value for a given device's constraints list.
|
Returns the aggregated value for a given device's constraints list.
|
||||||
|
|
||||||
enum pm_qos_flags_status dev_pm_qos_flags(device, mask)
|
enum pm_qos_flags_status dev_pm_qos_flags(device, mask)
|
||||||
Check PM QoS flags of the given device against the given mask of flags.
|
Check PM QoS flags of the given device against the given mask of flags.
|
||||||
The meaning of the return values is as follows:
|
The meaning of the return values is as follows:
|
||||||
PM_QOS_FLAGS_ALL: All flags from the mask are set
|
|
||||||
PM_QOS_FLAGS_SOME: Some flags from the mask are set
|
PM_QOS_FLAGS_ALL:
|
||||||
PM_QOS_FLAGS_NONE: No flags from the mask are set
|
All flags from the mask are set
|
||||||
PM_QOS_FLAGS_UNDEFINED: The device's PM QoS structure has not been
|
PM_QOS_FLAGS_SOME:
|
||||||
initialized or the list of requests is empty.
|
Some flags from the mask are set
|
||||||
|
PM_QOS_FLAGS_NONE:
|
||||||
|
No flags from the mask are set
|
||||||
|
PM_QOS_FLAGS_UNDEFINED:
|
||||||
|
The device's PM QoS structure has not been initialized
|
||||||
|
or the list of requests is empty.
|
||||||
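Editorial sketch (not part of the patch): the four return values listed above can be illustrated with a small user-space function. The toy_ enum and function are hypothetical and only model the described semantics; the have_requests parameter stands in for "the PM QoS structure is initialized and the list of requests is not empty".

```c
/* Hypothetical mirror of the four documented outcomes. */
enum toy_flags_status {
	TOY_FLAGS_UNDEFINED = -1,
	TOY_FLAGS_NONE,
	TOY_FLAGS_SOME,
	TOY_FLAGS_ALL,
};

/*
 * Compare the effective flags of a device against a mask.
 * effective: OR of the flags from all requests (assumed precomputed).
 */
static enum toy_flags_status toy_check_flags(int have_requests,
					     unsigned int effective,
					     unsigned int mask)
{
	unsigned int hit;

	if (!have_requests)
		return TOY_FLAGS_UNDEFINED;

	hit = effective & mask;
	if (!hit)
		return TOY_FLAGS_NONE;

	return hit == mask ? TOY_FLAGS_ALL : TOY_FLAGS_SOME;
}
```

A caller that only cares whether a single flag is set can pass a one-bit mask, in which case the result is either "all" or "none" and never "some".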
|
|
||||||
int dev_pm_qos_add_ancestor_request(dev, handle, type, value)
|
int dev_pm_qos_add_ancestor_request(dev, handle, type, value)
|
||||||
Add a PM QoS request for the first direct ancestor of the given device whose
|
Add a PM QoS request for the first direct ancestor of the given device whose
|
||||||
power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests)
|
power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests)
|
||||||
or whose power.set_latency_tolerance callback pointer is not NULL (for
|
or whose power.set_latency_tolerance callback pointer is not NULL (for
|
||||||
DEV_PM_QOS_LATENCY_TOLERANCE requests).
|
DEV_PM_QOS_LATENCY_TOLERANCE requests).
|
||||||
|
|
||||||
int dev_pm_qos_expose_latency_limit(device, value)
|
int dev_pm_qos_expose_latency_limit(device, value)
|
||||||
Add a request to the device's PM QoS list of resume latency constraints and
|
Add a request to the device's PM QoS list of resume latency constraints and
|
||||||
create a sysfs attribute pm_qos_resume_latency_us under the device's power
|
create a sysfs attribute pm_qos_resume_latency_us under the device's power
|
||||||
directory allowing user space to manipulate that request.
|
directory allowing user space to manipulate that request.
|
||||||
|
|
||||||
void dev_pm_qos_hide_latency_limit(device)
|
void dev_pm_qos_hide_latency_limit(device)
|
||||||
Drop the request added by dev_pm_qos_expose_latency_limit() from the device's
|
Drop the request added by dev_pm_qos_expose_latency_limit() from the device's
|
||||||
PM QoS list of resume latency constraints and remove sysfs attribute
|
PM QoS list of resume latency constraints and remove sysfs attribute
|
||||||
pm_qos_resume_latency_us from the device's power directory.
|
pm_qos_resume_latency_us from the device's power directory.
|
||||||
|
|
||||||
int dev_pm_qos_expose_flags(device, value)
|
int dev_pm_qos_expose_flags(device, value)
|
||||||
Add a request to the device's PM QoS list of flags and create sysfs attribute
|
Add a request to the device's PM QoS list of flags and create sysfs attribute
|
||||||
pm_qos_no_power_off under the device's power directory allowing user space to
|
pm_qos_no_power_off under the device's power directory allowing user space to
|
||||||
change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.
|
change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.
|
||||||
|
|
||||||
void dev_pm_qos_hide_flags(device)
|
void dev_pm_qos_hide_flags(device)
|
||||||
Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
|
Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
|
||||||
of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
|
of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
|
||||||
directory.
|
directory.
|
||||||
|
|
||||||
Notification mechanisms:
|
Notification mechanisms:
|
||||||
|
|
||||||
The per-device PM QoS framework has a per-device notification tree.
|
The per-device PM QoS framework has a per-device notification tree.
|
||||||
|
|
||||||
int dev_pm_qos_add_notifier(device, notifier):
|
int dev_pm_qos_add_notifier(device, notifier):
|
||||||
Adds a notification callback function for the device.
|
Adds a notification callback function for the device.
|
||||||
The callback is called when the aggregated value of the device constraints list
|
The callback is called when the aggregated value of the device constraints list
|
||||||
is changed (for resume latency device PM QoS only).
|
is changed (for resume latency device PM QoS only).
|
||||||
|
|
||||||
int dev_pm_qos_remove_notifier(device, notifier):
|
int dev_pm_qos_remove_notifier(device, notifier):
|
||||||
Removes the notification callback function for the device.
|
Removes the notification callback function for the device.
|
||||||
|
|
||||||
|
|
||||||
Active state latency tolerance
|
Active state latency tolerance
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
This device PM QoS type is used to support systems in which hardware may switch
|
This device PM QoS type is used to support systems in which hardware may switch
|
||||||
to energy-saving operation modes on the fly. In those systems, if the operation
|
to energy-saving operation modes on the fly. In those systems, if the operation
|
|
@ -0,0 +1,282 @@
|
||||||
|
========================
|
||||||
|
Linux power supply class
|
||||||
|
========================
|
||||||
|
|
||||||
|
Synopsis
|
||||||
|
~~~~~~~~
|
||||||
|
Power supply class used to represent battery, UPS, AC or DC power supply
|
||||||
|
properties to user-space.
|
||||||
|
|
||||||
|
It defines core set of attributes, which should be applicable to (almost)
|
||||||
|
every power supply out there. Attributes are available via sysfs and uevent
|
||||||
|
interfaces.
|
||||||
|
|
||||||
|
Each attribute has well defined meaning, up to unit of measure used. While
|
||||||
|
the attributes provided are believed to be universally applicable to any
|
||||||
|
power supply, specific monitoring hardware may not be able to provide them
|
||||||
|
all, so any of them may be skipped.
|
||||||
|
|
||||||
|
The power supply class is extensible and allows drivers to define their own attributes.
|
||||||
|
The core attribute set is subject to the standard Linux evolution (i.e.
|
||||||
|
if some attribute is found to be applicable to many power supply
|
||||||
|
types or their drivers, it can be added to the core set).
|
||||||
|
|
||||||
|
It also integrates with the LED framework, for the purpose of providing
|
||||||
|
typically expected feedback of battery charging/fully charged status and
|
||||||
|
AC/USB power supply online status. (Note that specific details of the
|
||||||
|
indication (including whether to use it at all) are fully controllable by
|
||||||
|
user and/or specific machine defaults, per design principles of LED
|
||||||
|
framework).
|
||||||
|
|
||||||
|
|
||||||
|
Attributes/properties
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
The power supply class has a predefined set of attributes, which eliminates code
|
||||||
|
duplication across drivers. The power supply class insists on reusing its
|
||||||
|
predefined attributes *and* their units.
|
||||||
|
|
||||||
|
So, userspace gets a predictable set of attributes and their units for any
|
||||||
|
kind of power supply, and can process/present them to a user in a consistent
|
||||||
|
manner. Results for different power supplies and machines are also directly
|
||||||
|
comparable.
|
||||||
|
|
||||||
|
See drivers/power/supply/ds2760_battery.c and drivers/power/supply/pda_power.c
|
||||||
|
for examples of how to declare and handle attributes.
|
||||||
|
|
||||||
|
|
||||||
|
Units
|
||||||
|
~~~~~
|
||||||
|
Quoting include/linux/power_supply.h:
|
||||||
|
|
||||||
|
All voltages, currents, charges, energies, time and temperatures in µV,
|
||||||
|
µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
|
||||||
|
stated. It's driver's job to convert its raw values to units in which
|
||||||
|
this class operates.
|
||||||
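The unit rules quoted above can be illustrated from the user-space side. The following is a minimal sketch, not part of the kernel sources: the function names are hypothetical, and it only assumes the units stated above (µV for voltages, tenths of a degree Celsius for temperatures).

```python
# Hypothetical helpers illustrating the power supply class units.
# Raw values are integers as exported by the class (µV, tenths of °C).

MICRO = 1_000_000  # micro-units per base unit


def microvolts_to_volts(raw_uv: int) -> float:
    """Convert a voltage attribute value (µV) to volts."""
    return raw_uv / MICRO


def tenths_celsius_to_celsius(raw_temp: int) -> float:
    """Convert a temperature attribute value (tenths of °C) to °C."""
    return raw_temp / 10


def volts_to_microvolts(volts: float) -> int:
    """Driver-side direction: scale a reading in volts to integer µV."""
    return round(volts * MICRO)
```

A reader of such attributes would apply the first two helpers to the raw integers; the third shows the direction in which a driver has to scale its own readings.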
|
|
||||||
|
|
||||||
|
Attributes/properties detailed
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
+--------------------------------------------------------------------------+
|
||||||
|
| **Charge/Energy/Capacity - how to not confuse** |
|
||||||
|
+--------------------------------------------------------------------------+
|
||||||
|
| **Because both "charge" (µAh) and "energy" (µWh) represents "capacity" |
|
||||||
|
| of battery, this class distinguish these terms. Don't mix them!** |
|
||||||
|
| |
|
||||||
|
| - `CHARGE_*` |
|
||||||
|
| attributes represents capacity in µAh only. |
|
||||||
|
| - `ENERGY_*` |
|
||||||
|
| attributes represents capacity in µWh only. |
|
||||||
|
| - `CAPACITY` |
|
||||||
|
| attribute represents capacity in *percents*, from 0 to 100. |
|
||||||
|
+--------------------------------------------------------------------------+
|
||||||
|
|
||||||
|
Postfixes:
|
||||||
|
|
||||||
|
_AVG
|
||||||
|
*hardware* averaged value, use it if your hardware is really able to
|
||||||
|
report averaged values.
|
||||||
|
_NOW
|
||||||
|
momentary/instantaneous values.
|
||||||
|
|
||||||
|
STATUS
|
||||||
|
this attribute represents operating status (charging, full,
|
||||||
|
discharging (i.e. powering a load), etc.). This corresponds to
|
||||||
|
`POWER_SUPPLY_STATUS_*` values, as defined in include/linux/power_supply.h.
|
||||||
|
|
||||||
|
CHARGE_TYPE
|
||||||
|
batteries can typically charge at different rates.
|
||||||
|
This defines trickle and fast charges. For batteries that
|
||||||
|
are already charged or discharging, 'n/a' can be displayed (or
|
||||||
|
'unknown', if the status is not known).
|
||||||
|
|
||||||
|
AUTHENTIC
|
||||||
|
indicates the power supply (battery or charger) connected
|
||||||
|
to the platform is authentic(1) or non authentic(0).
|
||||||
|
|
||||||
|
HEALTH
|
||||||
|
represents health of the battery; values correspond to
|
||||||
|
POWER_SUPPLY_HEALTH_*, defined in include/linux/power_supply.h.
|
||||||
|
|
||||||
|
VOLTAGE_OCV
|
||||||
|
open circuit voltage of the battery.
|
||||||
|
|
||||||
|
VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN
|
||||||
|
design values for maximal and minimal power supply voltages.
|
||||||
|
Maximal/minimal means the voltage values at which the battery is considered
|
||||||
|
"full"/"empty" at normal conditions. Yes, there is no direct relation
|
||||||
|
between voltage and battery capacity, but some dumb
|
||||||
|
batteries use voltage for very approximated calculation of capacity.
|
||||||
|
A battery driver can also use this attribute just to inform userspace
|
||||||
|
about maximal and minimal voltage thresholds of a given battery.
|
||||||
|
|
||||||
|
VOLTAGE_MAX, VOLTAGE_MIN
|
||||||
|
same as _DESIGN voltage values except that these ones should be used
|
||||||
|
if hardware could only guess (measure and retain) the thresholds of a
|
||||||
|
given power supply.
|
||||||
|
|
||||||
|
VOLTAGE_BOOT
|
||||||
|
Reports the voltage measured during boot
|
||||||
|
|
||||||
|
CURRENT_BOOT
|
||||||
|
Reports the current measured during boot
|
||||||
|
|
||||||
|
CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN
|
||||||
|
design charge values, when the battery is considered full/empty.
|
||||||
|
|
||||||
|
ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN
|
||||||
|
same as above but for energy.
|
||||||
|
|
||||||
|
CHARGE_FULL, CHARGE_EMPTY
|
||||||
|
These attributes mean "last remembered value of charge when battery
|
||||||
|
became full/empty". It also could mean "value of charge when battery
|
||||||
|
considered full/empty at given conditions (temperature, age)".
|
||||||
|
I.e. these attributes represent real thresholds, not design values.
|
||||||
|
|
||||||
|
ENERGY_FULL, ENERGY_EMPTY
|
||||||
|
same as above but for energy.
|
||||||
|
|
||||||
|
CHARGE_COUNTER
|
||||||
|
the current charge counter (in µAh). This could easily
|
||||||
|
be negative; there is no empty or full value. It is only useful for
|
||||||
|
relative, time-based measurements.
|
||||||
|
|
||||||
|
PRECHARGE_CURRENT
|
||||||
|
the maximum charge current during precharge phase of charge cycle
|
||||||
|
(typically 20% of battery capacity).
|
||||||
|
|
||||||
|
CHARGE_TERM_CURRENT
|
||||||
|
Charge termination current. The charge cycle terminates when battery
|
||||||
|
voltage is above recharge threshold, and charge current is below
|
||||||
|
this setting (typically 10% of battery capacity).
|
||||||
|
|
||||||
|
CONSTANT_CHARGE_CURRENT
|
||||||
|
constant charge current programmed by charger.
|
||||||
|
|
||||||
|
|
||||||
|
CONSTANT_CHARGE_CURRENT_MAX
|
||||||
|
maximum charge current supported by the power supply object.
|
||||||
|
|
||||||
|
CONSTANT_CHARGE_VOLTAGE
|
||||||
|
constant charge voltage programmed by charger.
|
||||||
|
CONSTANT_CHARGE_VOLTAGE_MAX
|
||||||
|
maximum charge voltage supported by the power supply object.
|
||||||
|
|
||||||
|
INPUT_CURRENT_LIMIT
|
||||||
|
input current limit programmed by charger. Indicates
|
||||||
|
the current drawn from a charging source.
|
||||||
|
|
||||||
|
CHARGE_CONTROL_LIMIT
|
||||||
|
current charge control limit setting
|
||||||
|
CHARGE_CONTROL_LIMIT_MAX
|
||||||
|
maximum charge control limit setting
|
||||||
|
|
||||||
|
CALIBRATE
|
||||||
|
battery or coulomb counter calibration status
|
||||||
|
|
||||||
|
CAPACITY
|
||||||
|
capacity in percents.
|
||||||
|
CAPACITY_ALERT_MIN
|
||||||
|
minimum capacity alert value in percents.
|
||||||
|
CAPACITY_ALERT_MAX
|
||||||
|
maximum capacity alert value in percents.
|
||||||
|
CAPACITY_LEVEL
|
||||||
|
capacity level. This corresponds to POWER_SUPPLY_CAPACITY_LEVEL_*.
|
||||||
|
|
||||||
|
TEMP
|
||||||
|
temperature of the power supply.
|
||||||
|
TEMP_ALERT_MIN
|
||||||
|
minimum battery temperature alert.
|
||||||
|
TEMP_ALERT_MAX
|
||||||
|
maximum battery temperature alert.
|
||||||
|
TEMP_AMBIENT
|
||||||
|
ambient temperature.
|
||||||
|
TEMP_AMBIENT_ALERT_MIN
|
||||||
|
minimum ambient temperature alert.
|
||||||
|
TEMP_AMBIENT_ALERT_MAX
|
||||||
|
maximum ambient temperature alert.
|
||||||
|
TEMP_MIN
|
||||||
|
minimum operable temperature
|
||||||
|
TEMP_MAX
|
||||||
|
maximum operable temperature
|
||||||
|
|
||||||
|
TIME_TO_EMPTY
|
||||||
|
seconds left for battery to be considered empty
|
||||||
|
(i.e. while battery powers a load)
|
||||||
|
TIME_TO_FULL
|
||||||
|
seconds left for battery to be considered full
|
||||||
|
(i.e. while battery is charging)
|
||||||
|
|
||||||
|
|
||||||
|
Battery <-> external power supply interaction
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
Power supplies often act as supplies and supplicants at the same
|
||||||
|
time. Batteries are a good example. So, batteries usually care if they're
|
||||||
|
externally powered or not.
|
||||||
|
|
||||||
|
For that case, the power supply class implements a notification mechanism for
|
||||||
|
batteries.
|
||||||
|
|
||||||
|
An external power supply (AC) lists the names of its supplicants (batteries) in the
|
||||||
|
"supplied_to" struct member, and each power_supply_changed() call
|
||||||
|
issued by the external power supply will notify supplicants via the
|
||||||
|
external_power_changed callback.
|
||||||
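The notification flow described above can be modelled in a few lines. This is a toy user-space model, not kernel code: the `PowerSupply` and `Registry` classes are hypothetical, and only the names `supplied_to`, `power_supply_changed` and `external_power_changed` are taken from the text.

```python
# Toy model of the supply -> supplicant notification described above.
# The classes are illustrative only and do not mirror kernel structures.


class PowerSupply:
    def __init__(self, name, supplied_to=()):
        self.name = name
        # Names of the supplicants (e.g. batteries) fed by this supply.
        self.supplied_to = list(supplied_to)
        # Record of notifications received, for demonstration.
        self.events = []

    def external_power_changed(self, source):
        """Callback invoked when a supply feeding us reports a change."""
        self.events.append(source.name)


class Registry:
    def __init__(self):
        self.supplies = {}

    def register(self, psy):
        self.supplies[psy.name] = psy

    def power_supply_changed(self, psy):
        """Notify every registered supplicant listed in psy.supplied_to."""
        for name in psy.supplied_to:
            supplicant = self.supplies.get(name)
            if supplicant is not None:
                supplicant.external_power_changed(psy)
```

In this model, registering a battery and an AC supply whose `supplied_to` names that battery, then calling `power_supply_changed()` on the AC supply, results in the battery's callback being invoked once.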
|
|
||||||
|
|
||||||
|
Devicetree battery characteristics
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
Drivers should call power_supply_get_battery_info() to obtain battery
|
||||||
|
characteristics from a devicetree battery node, defined in
|
||||||
|
Documentation/devicetree/bindings/power/supply/battery.txt. This is
|
||||||
|
implemented in drivers/power/supply/bq27xxx_battery.c.
|
||||||
|
|
||||||
|
Properties in struct power_supply_battery_info and their counterparts in the
|
||||||
|
battery node have names corresponding to elements in enum power_supply_property,
|
||||||
|
for naming consistency between sysfs attributes and battery node properties.
|
||||||
|
|
||||||
|
|
||||||
|
QA
|
||||||
|
~~
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Where is POWER_SUPPLY_PROP_XYZ attribute?
|
||||||
|
A:
|
||||||
|
If you cannot find an attribute suitable for your driver's needs, feel free
|
||||||
|
to add it and send a patch along with your driver.
|
||||||
|
|
||||||
|
The attributes available currently are the ones currently provided by the
|
||||||
|
drivers written.
|
||||||
|
|
||||||
|
Good candidates to add in future: model/part#, cycle_time, manufacturer,
|
||||||
|
etc.
|
||||||
|
|
||||||
|
|
||||||
|
Q:
|
||||||
|
I have some very specific attribute (e.g. battery color), should I add
|
||||||
|
this attribute to standard ones?
|
||||||
|
A:
|
||||||
|
Most likely, no. Such an attribute can be placed in the driver itself, if
|
||||||
|
it is useful. Of course, if the attribute in question is applicable to a
|
||||||
|
large set of batteries, provided by many drivers, and/or comes from
|
||||||
|
some general battery specification/standard, it may be a candidate to
|
||||||
|
be added to the core attribute set.
|
||||||
|
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Suppose my battery monitoring chip/firmware does not provide capacity
|
||||||
|
in percents, but provides charge_{now,full,empty}. Should I calculate
|
||||||
|
percentage capacity manually, inside the driver, and register CAPACITY
|
||||||
|
attribute? The same question about time_to_empty/time_to_full.
|
||||||
|
A:
|
||||||
|
Most likely, no. This class is designed to export properties which are
|
||||||
|
directly measurable by the specific hardware available.
|
||||||
|
|
||||||
|
Inferring unavailable properties using some heuristics or mathematical
|
||||||
|
model is not the job of a battery driver. Such functionality
|
||||||
|
should be factored out, and in fact, apm_power, the driver to serve
|
||||||
|
legacy APM API on top of power supply class, uses a simple heuristic of
|
||||||
|
approximating remaining battery capacity based on its charge, current,
|
||||||
|
voltage and so on. But a full-fledged battery model is likely not a subject
|
||||||
|
for the kernel at all, as it would require floating point calculation to deal
|
||||||
|
with things like differential equations and Kalman filters. This is
|
||||||
|
better handled by batteryd/libbattery, yet to be written.
|
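As the answer above suggests, deriving a percentage belongs outside the driver. The following is a minimal user-space sketch under that assumption. The function name is hypothetical, the inputs are taken to be the charge_now, charge_full and charge_empty values in µAh, and the clamping and rounding policy is a choice made here rather than anything the class prescribes.

```python
def capacity_percent(charge_now, charge_full, charge_empty=0):
    """Estimate capacity in percent (0..100) from charge values in µAh.

    A simple linear estimate between the empty and full thresholds; it is
    not a battery model and ignores temperature, age and load.
    """
    span = charge_full - charge_empty
    if span <= 0:
        raise ValueError("charge_full must be greater than charge_empty")
    percent = 100 * (charge_now - charge_empty) / span
    # Clamp, since a counter may briefly read outside the thresholds.
    return max(0, min(100, round(percent)))
```

For instance, `capacity_percent(1_500_000, 3_000_000)` yields 50, and a reading slightly above charge_full is clamped to 100.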
|
@ -1,231 +0,0 @@
|
||||||
Linux power supply class
|
|
||||||
========================
|
|
||||||
|
|
||||||
Synopsis
|
|
||||||
~~~~~~~~
|
|
||||||
Power supply class used to represent battery, UPS, AC or DC power supply
|
|
||||||
properties to user-space.
|
|
||||||
|
|
||||||
It defines core set of attributes, which should be applicable to (almost)
|
|
||||||
every power supply out there. Attributes are available via sysfs and uevent
|
|
||||||
interfaces.
|
|
||||||
|
|
||||||
Each attribute has well defined meaning, up to unit of measure used. While
|
|
||||||
the attributes provided are believed to be universally applicable to any
|
|
||||||
power supply, specific monitoring hardware may not be able to provide them
|
|
||||||
all, so any of them may be skipped.
|
|
||||||
|
|
||||||
Power supply class is extensible, and allows to define drivers own attributes.
|
|
||||||
The core attribute set is subject to the standard Linux evolution (i.e.
|
|
||||||
if it will be found that some attribute is applicable to many power supply
|
|
||||||
types or their drivers, it can be added to the core set).
|
|
||||||
|
|
||||||
It also integrates with LED framework, for the purpose of providing
|
|
||||||
typically expected feedback of battery charging/fully charged status and
|
|
||||||
AC/USB power supply online status. (Note that specific details of the
|
|
||||||
indication (including whether to use it at all) are fully controllable by
|
|
||||||
user and/or specific machine defaults, per design principles of LED
|
|
||||||
framework).
|
|
||||||
|
|
||||||
|
|
||||||
Attributes/properties
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
Power supply class has predefined set of attributes, this eliminates code
|
|
||||||
duplication across drivers. Power supply class insist on reusing its
|
|
||||||
predefined attributes *and* their units.
|
|
||||||
|
|
||||||
So, userspace gets predictable set of attributes and their units for any
|
|
||||||
kind of power supply, and can process/present them to a user in consistent
|
|
||||||
manner. Results for different power supplies and machines are also directly
|
|
||||||
comparable.
|
|
||||||
|
|
||||||
See drivers/power/supply/ds2760_battery.c and drivers/power/supply/pda_power.c
|
|
||||||
for the example how to declare and handle attributes.
|
|
||||||
|
|
||||||
|
|
||||||
Units
|
|
||||||
~~~~~
|
|
||||||
Quoting include/linux/power_supply.h:
|
|
||||||
|
|
||||||
All voltages, currents, charges, energies, time and temperatures in µV,
|
|
||||||
µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
|
|
||||||
stated. It's driver's job to convert its raw values to units in which
|
|
||||||
this class operates.
|
|
||||||
|
|
||||||
|
|
||||||
Attributes/properties detailed
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~
|
|
||||||
~ ~
|
|
||||||
~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~
|
|
||||||
~ of battery, this class distinguish these terms. Don't mix them! ~
|
|
||||||
~ ~
|
|
||||||
~ CHARGE_* attributes represents capacity in µAh only. ~
|
|
||||||
~ ENERGY_* attributes represents capacity in µWh only. ~
|
|
||||||
~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~
|
|
||||||
~ ~
|
|
||||||
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
|
|
||||||
|
|
||||||
Postfixes:
|
|
||||||
_AVG - *hardware* averaged value, use it if your hardware is really able to
|
|
||||||
report averaged values.
|
|
||||||
_NOW - momentary/instantaneous values.
|
|
||||||
|
|
||||||
STATUS - this attribute represents operating status (charging, full,
|
|
||||||
discharging (i.e. powering a load), etc.). This corresponds to
|
|
||||||
BATTERY_STATUS_* values, as defined in battery.h.
|
|
||||||
|
|
||||||
CHARGE_TYPE - batteries can typically charge at different rates.
|
|
||||||
This defines trickle and fast charges. For batteries that
|
|
||||||
are already charged or discharging, 'n/a' can be displayed (or
|
|
||||||
'unknown', if the status is not known).
|
|
||||||
|
|
||||||
AUTHENTIC - indicates the power supply (battery or charger) connected
|
|
||||||
to the platform is authentic(1) or non authentic(0).
|
|
||||||
|
|
||||||
HEALTH - represents health of the battery, values corresponds to
|
|
||||||
POWER_SUPPLY_HEALTH_*, defined in battery.h.
|
|
||||||
|
|
||||||
VOLTAGE_OCV - open circuit voltage of the battery.
|
|
||||||
|
|
||||||
VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
|
|
||||||
minimal power supply voltages. Maximal/minimal means values of voltages
|
|
||||||
when battery considered "full"/"empty" at normal conditions. Yes, there is
|
|
||||||
no direct relation between voltage and battery capacity, but some dumb
|
|
||||||
batteries use voltage for very approximated calculation of capacity.
|
|
||||||
Battery driver also can use this attribute just to inform userspace
|
|
||||||
about maximal and minimal voltage thresholds of a given battery.
|
|
||||||
|
|
||||||
VOLTAGE_MAX, VOLTAGE_MIN - same as _DESIGN voltage values except that
|
|
||||||
these ones should be used if hardware could only guess (measure and
|
|
||||||
retain) the thresholds of a given power supply.
|
|
||||||
|
|
||||||
VOLTAGE_BOOT - Reports the voltage measured during boot
|
|
||||||
|
|
||||||
CURRENT_BOOT - Reports the current measured during boot
|
|
||||||
|
|
||||||
CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when
|
|
||||||
battery considered full/empty.
|
|
||||||
|
|
||||||
ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy.
|
|
||||||
|
|
||||||
CHARGE_FULL, CHARGE_EMPTY - These attributes means "last remembered value
|
|
||||||
of charge when battery became full/empty". It also could mean "value of
|
|
||||||
charge when battery considered full/empty at given conditions (temperature,
|
|
||||||
age)". I.e. these attributes represents real thresholds, not design values.
|
|
||||||
|
|
||||||
ENERGY_FULL, ENERGY_EMPTY - same as above but for energy.
|
|
||||||
|
|
||||||
CHARGE_COUNTER - the current charge counter (in µAh). This could easily
|
|
||||||
be negative; there is no empty or full value. It is only useful for
|
|
||||||
relative, time-based measurements.
|
|
||||||
|
|
||||||
PRECHARGE_CURRENT - the maximum charge current during precharge phase
|
|
||||||
of charge cycle (typically 20% of battery capacity).
|
|
||||||
CHARGE_TERM_CURRENT - Charge termination current. The charge cycle
|
|
||||||
terminates when battery voltage is above recharge threshold, and charge
|
|
||||||
current is below this setting (typically 10% of battery capacity).
|
|
||||||
|
|
||||||
CONSTANT_CHARGE_CURRENT - constant charge current programmed by charger.
|
|
||||||
CONSTANT_CHARGE_CURRENT_MAX - maximum charge current supported by the
|
|
||||||
power supply object.
|
|
||||||
|
|
||||||
CONSTANT_CHARGE_VOLTAGE - constant charge voltage programmed by charger.
|
|
||||||
CONSTANT_CHARGE_VOLTAGE_MAX - maximum charge voltage supported by the
|
|
||||||
power supply object.
|
|
||||||
|
|
||||||
INPUT_CURRENT_LIMIT - input current limit programmed by charger. Indicates
|
|
||||||
the current drawn from a charging source.
|
|
||||||
|
|
||||||
CHARGE_CONTROL_LIMIT - current charge control limit setting
|
|
||||||
CHARGE_CONTROL_LIMIT_MAX - maximum charge control limit setting
|
|
||||||
|
|
||||||
CALIBRATE - battery or coulomb counter calibration status
|
|
||||||
|
|
||||||
CAPACITY - capacity in percents.
|
|
||||||
CAPACITY_ALERT_MIN - minimum capacity alert value in percents.
|
|
||||||
CAPACITY_ALERT_MAX - maximum capacity alert value in percents.
|
|
||||||
CAPACITY_LEVEL - capacity level. This corresponds to
|
|
||||||
POWER_SUPPLY_CAPACITY_LEVEL_*.
|
|
||||||
|
|
||||||
TEMP - temperature of the power supply.
|
|
||||||
TEMP_ALERT_MIN - minimum battery temperature alert.
|
|
||||||
TEMP_ALERT_MAX - maximum battery temperature alert.
|
|
||||||
TEMP_AMBIENT - ambient temperature.
|
|
||||||
TEMP_AMBIENT_ALERT_MIN - minimum ambient temperature alert.
|
|
||||||
TEMP_AMBIENT_ALERT_MAX - maximum ambient temperature alert.
|
|
||||||
TEMP_MIN - minimum operatable temperature
|
|
||||||
TEMP_MAX - maximum operatable temperature
|
|
||||||
|
|
||||||
TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e.
|
|
||||||
while battery powers a load)
|
|
||||||
TIME_TO_FULL - seconds left for battery to be considered full (i.e.
|
|
||||||
while battery is charging)
|
|
||||||
|
|
||||||
|
|
||||||
Battery <-> external power supply interaction
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
Often power supplies are acting as supplies and supplicants at the same
|
|
||||||
time. Batteries are good example. So, batteries usually care if they're
|
|
||||||
externally powered or not.
|
|
||||||
|
|
||||||
For that case, power supply class implements notification mechanism for
|
|
||||||
batteries.
|
|
||||||
|
|
||||||
External power supply (AC) lists supplicants (batteries) names in
|
|
||||||
"supplied_to" struct member, and each power_supply_changed() call
|
|
||||||
issued by external power supply will notify supplicants via
|
|
||||||
external_power_changed callback.
|
|
||||||
|
|
||||||
|
|
||||||
Devicetree battery characteristics
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
Drivers should call power_supply_get_battery_info() to obtain battery
|
|
||||||
characteristics from a devicetree battery node, defined in
|
|
||||||
Documentation/devicetree/bindings/power/supply/battery.txt. This is
|
|
||||||
implemented in drivers/power/supply/bq27xxx_battery.c.
|
|
||||||
|
|
||||||
Properties in struct power_supply_battery_info and their counterparts in the
|
|
||||||
battery node have names corresponding to elements in enum power_supply_property,
|
|
||||||
for naming consistency between sysfs attributes and battery node properties.
|
|
||||||
|
|
||||||
|
|
||||||
QA
|
|
||||||
~~
|
|
||||||
Q: Where is POWER_SUPPLY_PROP_XYZ attribute?
|
|
||||||
A: If you cannot find attribute suitable for your driver needs, feel free
|
|
||||||
to add it and send patch along with your driver.
|
|
||||||
|
|
||||||
The attributes available currently are the ones currently provided by the
|
|
||||||
drivers written.
|
|
||||||
|
|
||||||
Good candidates to add in future: model/part#, cycle_time, manufacturer,
|
|
||||||
etc.
|
|
||||||
|
|
||||||
|
|
||||||
Q: I have some very specific attribute (e.g. battery color), should I add
|
|
||||||
this attribute to standard ones?
|
|
||||||
A: Most likely, no. Such attribute can be placed in the driver itself, if
|
|
||||||
it is useful. Of course, if the attribute in question applicable to
|
|
||||||
large set of batteries, provided by many drivers, and/or comes from
|
|
||||||
some general battery specification/standard, it may be a candidate to
|
|
||||||
be added to the core attribute set.
|
|
||||||
|
|
||||||
|
|
||||||
Q: Suppose, my battery monitoring chip/firmware does not provides capacity
|
|
||||||
in percents, but provides charge_{now,full,empty}. Should I calculate
|
|
||||||
percentage capacity manually, inside the driver, and register CAPACITY
|
|
||||||
attribute? The same question about time_to_empty/time_to_full.
|
|
||||||
A: Most likely, no. This class is designed to export properties which are
|
|
||||||
directly measurable by the specific hardware available.
|
|
||||||
|
|
||||||
Inferring not available properties using some heuristics or mathematical
|
|
||||||
model is not subject of work for a battery driver. Such functionality
|
|
||||||
should be factored out, and in fact, apm_power, the driver to serve
|
|
||||||
legacy APM API on top of power supply class, uses a simple heuristic of
|
|
||||||
approximating remaining battery capacity based on its charge, current,
|
|
||||||
voltage and so on. But full-fledged battery model is likely not subject
|
|
||||||
for kernel at all, as it would require floating point calculation to deal
|
|
||||||
with things like differential equations and Kalman filters. This is
|
|
||||||
better be handled by batteryd/libbattery, yet to be written.
|
|
|
@ -0,0 +1,257 @@
|
||||||
|
=======================
|
||||||
|
Power Capping Framework
|
||||||
|
=======================
|
||||||
|
|
||||||
|
The power capping framework provides a consistent interface between the kernel
|
||||||
|
and the user space that allows power capping drivers to expose the settings to
|
||||||
|
user space in a uniform way.
|
||||||
|
|
||||||
|
Terminology
|
||||||
|
===========
|
||||||
|
|
||||||
|
The framework exposes power capping devices to user space via sysfs in the
|
||||||
|
form of a tree of objects. The objects at the root level of the tree represent
|
||||||
|
'control types', which correspond to different methods of power capping. For
|
||||||
|
example, the intel-rapl control type represents the Intel "Running Average
|
||||||
|
Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
|
||||||
|
corresponds to the use of idle injection for controlling power.
|
||||||
|
|
||||||
|
Power zones represent different parts of the system, which can be controlled and
|
||||||
|
monitored using the power capping method determined by the control type the
|
||||||
|
given zone belongs to. They each contain attributes for monitoring power, as
|
||||||
|
well as controls represented in the form of power constraints. If the parts of
|
||||||
|
the system represented by different power zones are hierarchical (that is, one
|
||||||
|
bigger part consists of multiple smaller parts that each have their own power
|
||||||
|
controls), those power zones may also be organized in a hierarchy with one
|
||||||
|
parent power zone containing multiple subzones and so on to reflect the power
|
||||||
|
control topology of the system. In that case, it is possible to apply power
|
||||||
|
capping to a set of devices together using the parent power zone and if more
|
||||||
|
fine grained control is required, it can be applied through the subzones.
|
||||||
|
|
||||||
|
|
||||||
|
Example sysfs interface tree::
|
||||||
|
|
||||||
|
/sys/devices/virtual/powercap
|
||||||
|
└──intel-rapl
|
||||||
|
├──intel-rapl:0
|
||||||
|
│ ├──constraint_0_name
|
||||||
|
│ ├──constraint_0_power_limit_uw
|
||||||
|
│ ├──constraint_0_time_window_us
|
||||||
|
│ ├──constraint_1_name
|
||||||
|
│ ├──constraint_1_power_limit_uw
|
||||||
|
│ ├──constraint_1_time_window_us
|
||||||
|
│ ├──device -> ../../intel-rapl
|
||||||
|
│ ├──energy_uj
|
||||||
|
│ ├──intel-rapl:0:0
|
||||||
|
│ │ ├──constraint_0_name
|
||||||
|
│ │ ├──constraint_0_power_limit_uw
|
||||||
|
│ │ ├──constraint_0_time_window_us
|
||||||
|
│ │ ├──constraint_1_name
|
||||||
|
│ │ ├──constraint_1_power_limit_uw
|
||||||
|
│ │ ├──constraint_1_time_window_us
|
||||||
|
│ │ ├──device -> ../../intel-rapl:0
|
||||||
|
│ │ ├──energy_uj
|
||||||
|
│ │ ├──max_energy_range_uj
|
||||||
|
│ │ ├──name
|
||||||
|
│ │ ├──enabled
|
||||||
|
│ │ ├──power
|
||||||
|
│ │ │ ├──async
|
||||||
|
│ │ │ []
|
||||||
|
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||||
|
│ │ └──uevent
|
||||||
|
│ ├──intel-rapl:0:1
|
||||||
|
│ │ ├──constraint_0_name
|
||||||
|
│ │ ├──constraint_0_power_limit_uw
|
||||||
|
│ │ ├──constraint_0_time_window_us
|
||||||
|
│ │ ├──constraint_1_name
|
||||||
|
│ │ ├──constraint_1_power_limit_uw
|
||||||
|
│ │ ├──constraint_1_time_window_us
|
||||||
|
│ │ ├──device -> ../../intel-rapl:0
|
||||||
|
│ │ ├──energy_uj
|
||||||
|
│ │ ├──max_energy_range_uj
|
||||||
|
│ │ ├──name
|
||||||
|
│ │ ├──enabled
|
||||||
|
│ │ ├──power
|
||||||
|
│ │ │ ├──async
|
||||||
|
│ │ │ []
|
||||||
|
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||||
|
│ │ └──uevent
|
||||||
|
│ ├──max_energy_range_uj
|
||||||
|
│ ├──max_power_range_uw
|
||||||
|
│ ├──name
|
||||||
|
│ ├──enabled
|
||||||
|
│ ├──power
|
||||||
|
│ │ ├──async
|
||||||
|
│ │ []
|
||||||
|
│ ├──subsystem -> ../../../../../class/power_cap
|
||||||
|
│ ├──enabled
|
||||||
|
│ ├──uevent
|
||||||
|
├──intel-rapl:1
|
||||||
|
│ ├──constraint_0_name
|
||||||
|
│ ├──constraint_0_power_limit_uw
|
||||||
|
│ ├──constraint_0_time_window_us
|
||||||
|
│ ├──constraint_1_name
|
||||||
|
│ ├──constraint_1_power_limit_uw
|
||||||
|
│ ├──constraint_1_time_window_us
|
||||||
|
│ ├──device -> ../../intel-rapl
|
||||||
|
│ ├──energy_uj
|
||||||
|
│ ├──intel-rapl:1:0
|
||||||
|
│ │ ├──constraint_0_name
|
||||||
|
│ │ ├──constraint_0_power_limit_uw
|
||||||
|
│ │ ├──constraint_0_time_window_us
|
||||||
|
│ │ ├──constraint_1_name
|
||||||
|
│ │ ├──constraint_1_power_limit_uw
|
||||||
|
│ │ ├──constraint_1_time_window_us
|
||||||
|
│ │ ├──device -> ../../intel-rapl:1
|
||||||
|
│ │ ├──energy_uj
|
||||||
|
│ │ ├──max_energy_range_uj
|
||||||
|
│ │ ├──name
|
||||||
|
│ │ ├──enabled
|
||||||
|
│ │ ├──power
|
||||||
|
│ │ │ ├──async
|
||||||
|
│ │ │ []
|
||||||
|
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||||
|
│ │ └──uevent
|
||||||
|
│ ├──intel-rapl:1:1
|
||||||
|
│ │ ├──constraint_0_name
|
||||||
|
│ │ ├──constraint_0_power_limit_uw
|
||||||
|
│ │ ├──constraint_0_time_window_us
|
||||||
|
│ │ ├──constraint_1_name
|
||||||
|
│ │ ├──constraint_1_power_limit_uw
|
||||||
|
│ │ ├──constraint_1_time_window_us
|
||||||
|
│ │ ├──device -> ../../intel-rapl:1
|
||||||
|
│ │ ├──energy_uj
|
||||||
|
│ │ ├──max_energy_range_uj
|
||||||
|
│ │ ├──name
|
||||||
|
│ │ ├──enabled
|
||||||
|
│ │ ├──power
|
||||||
|
│ │ │ ├──async
|
||||||
|
│ │ │ []
|
||||||
|
│ │ ├──subsystem -> ../../../../../../class/power_cap
|
||||||
|
│ │ └──uevent
|
||||||
|
│ ├──max_energy_range_uj
|
||||||
|
│ ├──max_power_range_uw
|
||||||
|
│ ├──name
|
||||||
|
│ ├──enabled
|
||||||
|
│ ├──power
|
||||||
|
│ │ ├──async
|
||||||
|
│ │ []
|
||||||
|
│ ├──subsystem -> ../../../../../class/power_cap
|
||||||
|
│ ├──uevent
|
||||||
|
├──power
|
||||||
|
│ ├──async
|
||||||
|
│ []
|
||||||
|
├──subsystem -> ../../../../class/power_cap
|
||||||
|
├──enabled
|
||||||
|
└──uevent
|
||||||
|
|
||||||
|
The above example illustrates a case in which the Intel RAPL technology,
|
||||||
|
available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one
|
||||||
|
control type called intel-rapl which contains two power zones, intel-rapl:0 and
|
||||||
|
intel-rapl:1, representing CPU packages. Each of these power zones contains
|
||||||
|
two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
|
||||||
|
"core" and the "uncore" parts of the given CPU package, respectively. All of
|
||||||
|
the zones and subzones contain energy monitoring attributes (energy_uj,
|
||||||
|
max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
|
||||||
|
to be applied (the constraints in the 'package' power zones apply to the whole
|
||||||
|
CPU packages and the subzone constraints only apply to the respective parts of
|
||||||
|
the given package individually). Since Intel RAPL doesn't provide an instantaneous
|
||||||
|
power value, there is no power_uw attribute.
|
||||||
|
|
||||||
|
In addition to that, each power zone contains a name attribute, allowing the
|
||||||
|
part of the system represented by that zone to be identified.
|
||||||
|
For example::
|
||||||
|
|
||||||
|
cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name
|
||||||
|
|
||||||
|
package-0
|
||||||
|
---------
|
||||||
|
|
||||||
|
The Intel RAPL technology allows two constraints, short term and long term,
|
||||||
|
with two different time windows to be applied to each power zone. Thus for
|
||||||
|
each zone there are 2 attributes representing the constraint names, 2 power
|
||||||
|
limits and 2 attributes representing the sizes of the time windows. The
|
||||||
|
constraint_j_* attributes correspond to the jth constraint (j = 0,1).
|
||||||
|
|
||||||
|
For example::
|
||||||
|
|
||||||
|
constraint_0_name
|
||||||
|
constraint_0_power_limit_uw
|
||||||
|
constraint_0_time_window_us
|
||||||
|
constraint_1_name
|
||||||
|
constraint_1_power_limit_uw
|
||||||
|
constraint_1_time_window_us
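The naming scheme above can be applied mechanically when a user-space tool needs the path of a given constraint attribute. The following is a minimal sketch only: the helper name ``constraint_attr_path()`` and its argument layout are hypothetical, not part of the framework, and the caller is assumed to supply the zone directory.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the powercap framework): format
 * "<zone_dir>/constraint_<index>_<field>" into buf.
 *
 * Returns 0 on success, -1 if an argument is invalid or buf is too small.
 */
int constraint_attr_path(char *buf, size_t len, const char *zone_dir,
                         int index, const char *field)
{
    int n;

    if (!buf || !zone_dir || !field || index < 0)
        return -1;

    n = snprintf(buf, len, "%s/constraint_%d_%s", zone_dir, index, field);
    if (n < 0 || (size_t)n >= len)
        return -1; /* encoding error or truncated output */

    return 0;
}
```

For instance, passing index 1 and the field ``power_limit_uw`` yields a path ending in ``constraint_1_power_limit_uw``.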
|
||||||
|
|
||||||
|
Power Zone Attributes
|
||||||
|
=====================
|
||||||
|
|
||||||
|
Monitoring attributes
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
energy_uj (rw)
|
||||||
|
Current energy counter in microjoules. Write "0" to reset.
|
||||||
|
If the counter cannot be reset, this attribute is read-only.
|
||||||
|
|
||||||
|
max_energy_range_uj (ro)
|
||||||
|
Range of the above energy counter in microjoules.
|
||||||
|
|
||||||
|
power_uw (ro)
|
||||||
|
Current power in microwatts.
|
||||||
|
|
||||||
|
max_power_range_uw (ro)
|
||||||
|
Range of the above power value in microwatts.
|
||||||
|
|
||||||
|
name (ro)
|
||||||
|
Name of this power zone.
|
||||||
|
|
||||||
|
It is possible that some domains have both power ranges and energy counter ranges;
|
||||||
|
however, only one is mandatory.
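When a zone exposes only the energy counter and no power_uw attribute, user space can estimate average power from two energy_uj readings. The sketch below is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the counter is assumed to wrap back to zero at most once between samples, and the wrap point is assumed to be the max_energy_range_uj value.

```c
#include <stdint.h>

/*
 * Energy consumed between two energy_uj samples, in microjoules.
 * Assumption: the counter wrapped at most once and restarts from zero
 * after reaching max_range_uj (the max_energy_range_uj value).
 */
uint64_t energy_delta_uj(uint64_t prev_uj, uint64_t now_uj,
                         uint64_t max_range_uj)
{
    if (now_uj >= prev_uj)
        return now_uj - prev_uj;

    /* Counter wrapped: remainder up to the range, plus the new reading. */
    return (max_range_uj - prev_uj) + now_uj;
}

/*
 * Average power over the sampling interval, in microwatts.
 * 1 uJ per 1 us equals 1 W, hence the scaling by 1,000,000.
 * Returns 0 for a zero-length interval. The multiplication can overflow
 * for deltas above roughly 1.8e13 uJ, which this sketch does not handle.
 */
uint64_t average_power_uw(uint64_t delta_uj, uint64_t interval_us)
{
    if (interval_us == 0)
        return 0;

    return delta_uj * 1000000ULL / interval_us;
}
```

A caller would read energy_uj twice, measure the elapsed time itself, and pass the difference to ``average_power_uw()``.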
|
||||||
|
|
||||||
|
Constraints
|
||||||
|
-----------
|
||||||
|
|
||||||
|
constraint_X_power_limit_uw (rw)
|
||||||
|
Power limit in microwatts, which should be applicable for the
|
||||||
|
time window specified by "constraint_X_time_window_us".
|
||||||
|
|
||||||
|
constraint_X_time_window_us (rw)
|
||||||
|
Time window in microseconds.
|
||||||
|
|
||||||
|
constraint_X_name (ro)
|
||||||
|
An optional name of the constraint.
|
||||||
|
|
||||||
|
constraint_X_max_power_uw (ro)
|
||||||
|
Maximum allowed power in microwatts.
|
||||||
|
|
||||||
|
constraint_X_min_power_uw (ro)
|
||||||
|
Minimum allowed power in microwatts.
|
||||||
|
|
||||||
|
constraint_X_max_time_window_us (ro)
|
||||||
|
Maximum allowed time window in microseconds.
|
||||||
|
|
||||||
|
constraint_X_min_time_window_us (ro)
|
||||||
|
Minimum allowed time window in microseconds.
|
||||||
|
|
||||||
|
Except for power_limit_uw and time_window_us, all other fields are optional.
|
||||||
|
|
||||||
|
Common zone and control type attributes
|
||||||
|
---------------------------------------
|
||||||
|
|
||||||
|
enabled (rw): Enable/Disable controls at zone level or for all zones using
|
||||||
|
a control type.
|
||||||
|
|
||||||
|
Power Cap Client Driver Interface
|
||||||
|
=================================
|
||||||
|
|
||||||
|
The API summary:
|
||||||
|
|
||||||
|
Call powercap_register_control_type() to register a control type object.
|
||||||
|
Call powercap_register_zone() to register a power zone (under a given
|
||||||
|
control type), either as a top-level power zone or as a subzone of another
|
||||||
|
power zone registered earlier.
|
||||||
|
The number of constraints in a power zone and the corresponding callbacks have
|
||||||
|
to be defined prior to calling powercap_register_zone() to register that zone.
|
||||||
|
|
||||||
|
To free a power zone, call powercap_unregister_zone().
|
||||||
|
To free a control type object, call powercap_unregister_control_type().
|
||||||
|
Detailed API documentation can be generated using kernel-doc on include/linux/powercap.h.
|
|
@ -1,236 +0,0 @@
|
||||||
Power Capping Framework
|
|
||||||
==================================
|
|
||||||
|
|
||||||
The power capping framework provides a consistent interface between the kernel
|
|
||||||
and the user space that allows power capping drivers to expose the settings to
|
|
||||||
user space in a uniform way.
|
|
||||||
|
|
||||||
Terminology
|
|
||||||
=========================
|
|
||||||
The framework exposes power capping devices to user space via sysfs in the
|
|
||||||
form of a tree of objects. The objects at the root level of the tree represent
|
|
||||||
'control types', which correspond to different methods of power capping. For
|
|
||||||
example, the intel-rapl control type represents the Intel "Running Average
|
|
||||||
Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
|
|
||||||
corresponds to the use of idle injection for controlling power.
|
|
||||||
|
|
||||||
Power zones represent different parts of the system, which can be controlled and
|
|
||||||
monitored using the power capping method determined by the control type the
|
|
||||||
given zone belongs to. They each contain attributes for monitoring power, as
|
|
||||||
well as controls represented in the form of power constraints. If the parts of
|
|
||||||
the system represented by different power zones are hierarchical (that is, one
|
|
||||||
bigger part consists of multiple smaller parts that each have their own power
|
|
||||||
controls), those power zones may also be organized in a hierarchy with one
|
|
||||||
parent power zone containing multiple subzones and so on to reflect the power
|
|
||||||
control topology of the system. In that case, it is possible to apply power
|
|
||||||
capping to a set of devices together using the parent power zone and if more
|
|
||||||
fine grained control is required, it can be applied through the subzones.
|
|
||||||
|
|
||||||
|
|
||||||
Example sysfs interface tree:
|
|
||||||
|
|
||||||
/sys/devices/virtual/powercap
|
|
||||||
??? intel-rapl
|
|
||||||
??? intel-rapl:0
|
|
||||||
? ??? constraint_0_name
|
|
||||||
? ??? constraint_0_power_limit_uw
|
|
||||||
? ??? constraint_0_time_window_us
|
|
||||||
? ??? constraint_1_name
|
|
||||||
? ??? constraint_1_power_limit_uw
|
|
||||||
? ??? constraint_1_time_window_us
|
|
||||||
? ??? device -> ../../intel-rapl
|
|
||||||
? ??? energy_uj
|
|
||||||
? ??? intel-rapl:0:0
|
|
||||||
? ? ??? constraint_0_name
|
|
||||||
? ? ??? constraint_0_power_limit_uw
|
|
||||||
? ? ??? constraint_0_time_window_us
|
|
||||||
? ? ??? constraint_1_name
|
|
||||||
? ? ??? constraint_1_power_limit_uw
|
|
||||||
? ? ??? constraint_1_time_window_us
|
|
||||||
? ? ??? device -> ../../intel-rapl:0
|
|
||||||
? ? ??? energy_uj
|
|
||||||
? ? ??? max_energy_range_uj
|
|
||||||
? ? ??? name
|
|
||||||
? ? ??? enabled
|
|
||||||
? ? ??? power
|
|
||||||
? ? ? ??? async
|
|
||||||
? ? ? []
|
|
||||||
? ? ??? subsystem -> ../../../../../../class/power_cap
|
|
||||||
? ? ??? uevent
|
|
||||||
? ??? intel-rapl:0:1
|
|
||||||
? ? ??? constraint_0_name
|
|
||||||
? ? ??? constraint_0_power_limit_uw
|
|
||||||
? ? ??? constraint_0_time_window_us
|
|
||||||
? ? ??? constraint_1_name
|
|
||||||
? ? ??? constraint_1_power_limit_uw
|
|
||||||
? ? ??? constraint_1_time_window_us
|
|
||||||
? ? ??? device -> ../../intel-rapl:0
|
|
||||||
? ? ??? energy_uj
|
|
||||||
? ? ??? max_energy_range_uj
|
|
||||||
? ? ??? name
|
|
||||||
? ? ??? enabled
|
|
||||||
? ? ??? power
|
|
||||||
? ? ? ??? async
|
|
||||||
? ? ? []
|
|
||||||
? ? ??? subsystem -> ../../../../../../class/power_cap
|
|
||||||
? ? ??? uevent
|
|
||||||
? ??? max_energy_range_uj
|
|
||||||
? ??? max_power_range_uw
|
|
||||||
? ??? name
|
|
||||||
? ??? enabled
|
|
||||||
? ??? power
|
|
||||||
? ? ??? async
|
|
||||||
? ? []
|
|
||||||
? ??? subsystem -> ../../../../../class/power_cap
|
|
||||||
? ??? enabled
|
|
||||||
? ??? uevent
|
|
||||||
??? intel-rapl:1
|
|
||||||
? ??? constraint_0_name
|
|
||||||
? ??? constraint_0_power_limit_uw
|
|
||||||
? ??? constraint_0_time_window_us
|
|
||||||
? ??? constraint_1_name
|
|
||||||
? ??? constraint_1_power_limit_uw
|
|
||||||
? ??? constraint_1_time_window_us
|
|
||||||
? ??? device -> ../../intel-rapl
|
|
||||||
? ??? energy_uj
|
|
||||||
? ??? intel-rapl:1:0
|
|
||||||
? ? ??? constraint_0_name
|
|
||||||
? ? ??? constraint_0_power_limit_uw
|
|
||||||
? ? ??? constraint_0_time_window_us
|
|
||||||
? ? ??? constraint_1_name
|
|
||||||
? ? ??? constraint_1_power_limit_uw
|
|
||||||
? ? ??? constraint_1_time_window_us
|
|
||||||
? ? ??? device -> ../../intel-rapl:1
|
|
||||||
? ? ??? energy_uj
|
|
||||||
? ? ??? max_energy_range_uj
|
|
||||||
? ? ??? name
|
|
||||||
? ? ??? enabled
|
|
||||||
? ? ??? power
|
|
||||||
? ? ? ??? async
|
|
||||||
? ? ? []
|
|
||||||
? ? ??? subsystem -> ../../../../../../class/power_cap
|
|
||||||
? ? ??? uevent
|
|
||||||
? ??? intel-rapl:1:1
|
|
||||||
? ? ??? constraint_0_name
|
|
||||||
? ? ??? constraint_0_power_limit_uw
|
|
||||||
? ? ??? constraint_0_time_window_us
|
|
||||||
? ? ??? constraint_1_name
|
|
||||||
? ? ??? constraint_1_power_limit_uw
|
|
||||||
? ? ??? constraint_1_time_window_us
|
|
||||||
? ? ??? device -> ../../intel-rapl:1
|
|
||||||
? ? ??? energy_uj
|
|
||||||
? ? ??? max_energy_range_uj
|
|
||||||
? ? ??? name
|
|
||||||
? ? ??? enabled
|
|
||||||
? ? ??? power
|
|
||||||
? ? ? ??? async
|
|
||||||
? ? ? []
|
|
||||||
? ? ??? subsystem -> ../../../../../../class/power_cap
|
|
||||||
? ? ??? uevent
|
|
||||||
? ??? max_energy_range_uj
|
|
||||||
? ??? max_power_range_uw
|
|
||||||
? ??? name
|
|
||||||
? ??? enabled
|
|
||||||
? ??? power
|
|
||||||
? ? ??? async
|
|
||||||
? ? []
|
|
||||||
? ??? subsystem -> ../../../../../class/power_cap
|
|
||||||
? ??? uevent
|
|
||||||
??? power
|
|
||||||
? ??? async
|
|
||||||
? []
|
|
||||||
??? subsystem -> ../../../../class/power_cap
|
|
||||||
??? enabled
|
|
||||||
??? uevent
|
|
||||||
|
|
||||||
The above example illustrates a case in which the Intel RAPL technology,
|
|
||||||
available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one
|
|
||||||
control type called intel-rapl which contains two power zones, intel-rapl:0 and
|
|
||||||
intel-rapl:1, representing CPU packages. Each of these power zones contains
|
|
||||||
two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
|
|
||||||
"core" and the "uncore" parts of the given CPU package, respectively. All of
|
|
||||||
the zones and subzones contain energy monitoring attributes (energy_uj,
|
|
||||||
max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
|
|
||||||
to be applied (the constraints in the 'package' power zones apply to the whole
|
|
||||||
CPU packages and the subzone constraints only apply to the respective parts of
|
|
||||||
the given package individually). Since Intel RAPL doesn't provide instantaneous
|
|
||||||
power value, there is no power_uw attribute.
|
|
||||||
|
|
||||||
In addition to that, each power zone contains a name attribute, allowing the
|
|
||||||
part of the system represented by that zone to be identified.
|
|
||||||
For example:
|
|
||||||
|
|
||||||
cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name
|
|
||||||
package-0
|
|
||||||
|
|
||||||
The Intel RAPL technology allows two constraints, short term and long term,
|
|
||||||
with two different time windows to be applied to each power zone. Thus for
|
|
||||||
each zone there are 2 attributes representing the constraint names, 2 power
|
|
||||||
limits and 2 attributes representing the sizes of the time windows. Such that,
|
|
||||||
constraint_j_* attributes correspond to the jth constraint (j = 0,1).
|
|
||||||
|
|
||||||
For example:
|
|
||||||
constraint_0_name
|
|
||||||
constraint_0_power_limit_uw
|
|
||||||
constraint_0_time_window_us
|
|
||||||
constraint_1_name
|
|
||||||
constraint_1_power_limit_uw
|
|
||||||
constraint_1_time_window_us
|
|
||||||
|
|
||||||
Power Zone Attributes
|
|
||||||
=================================
|
|
||||||
Monitoring attributes
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
energy_uj (rw): Current energy counter in micro joules. Write "0" to reset.
|
|
||||||
If the counter can not be reset, then this attribute is read only.
|
|
||||||
|
|
||||||
max_energy_range_uj (ro): Range of the above energy counter in micro-joules.
|
|
||||||
|
|
||||||
power_uw (ro): Current power in micro watts.
|
|
||||||
|
|
||||||
max_power_range_uw (ro): Range of the above power value in micro-watts.
|
|
||||||
|
|
||||||
name (ro): Name of this power zone.
|
|
||||||
|
|
||||||
It is possible that some domains have both power ranges and energy counter ranges;
|
|
||||||
however, only one is mandatory.
|
|
||||||
|
|
||||||
Constraints
|
|
||||||
----------------
|
|
||||||
constraint_X_power_limit_uw (rw): Power limit in micro watts, which should be
|
|
||||||
applicable for the time window specified by "constraint_X_time_window_us".
|
|
||||||
|
|
||||||
constraint_X_time_window_us (rw): Time window in micro seconds.
|
|
||||||
|
|
||||||
constraint_X_name (ro): An optional name of the constraint
|
|
||||||
|
|
||||||
constraint_X_max_power_uw(ro): Maximum allowed power in micro watts.
|
|
||||||
|
|
||||||
constraint_X_min_power_uw(ro): Minimum allowed power in micro watts.
|
|
||||||
|
|
||||||
constraint_X_max_time_window_us(ro): Maximum allowed time window in micro seconds.
|
|
||||||
|
|
||||||
constraint_X_min_time_window_us(ro): Minimum allowed time window in micro seconds.
|
|
||||||
|
|
||||||
Except power_limit_uw and time_window_us other fields are optional.
|
|
||||||
|
|
||||||
Common zone and control type attributes
|
|
||||||
----------------------------------------
|
|
||||||
enabled (rw): Enable/Disable controls at zone level or for all zones using
|
|
||||||
a control type.
|
|
||||||
|
|
||||||
Power Cap Client Driver Interface
|
|
||||||
==================================
|
|
||||||
The API summary:
|
|
||||||
|
|
||||||
Call powercap_register_control_type() to register control type object.
|
|
||||||
Call powercap_register_zone() to register a power zone (under a given
|
|
||||||
control type), either as a top-level power zone or as a subzone of another
|
|
||||||
power zone registered earlier.
|
|
||||||
The number of constraints in a power zone and the corresponding callbacks have
|
|
||||||
to be defined prior to calling powercap_register_zone() to register that zone.
|
|
||||||
|
|
||||||
To Free a power zone call powercap_unregister_zone().
|
|
||||||
To free a control type object call powercap_unregister_control_type().
|
|
||||||
Detailed API can be generated using kernel-doc on include/linux/powercap.h.
|
|
|
@ -1,3 +1,4 @@
|
||||||
|
===================================
|
||||||
Regulator Consumer Driver Interface
|
Regulator Consumer Driver Interface
|
||||||
===================================
|
===================================
|
||||||
|
|
||||||
|
@ -8,73 +9,77 @@ Please see overview.txt for a description of the terms used in this text.
|
||||||
1. Consumer Regulator Access (static & dynamic drivers)
|
1. Consumer Regulator Access (static & dynamic drivers)
|
||||||
=======================================================
|
=======================================================
|
||||||
|
|
||||||
A consumer driver can get access to its supply regulator by calling :-
|
A consumer driver can get access to its supply regulator by calling ::
|
||||||
|
|
||||||
regulator = regulator_get(dev, "Vcc");
|
regulator = regulator_get(dev, "Vcc");
|
||||||
|
|
||||||
The consumer passes in its struct device pointer and power supply ID. The core
|
The consumer passes in its struct device pointer and power supply ID. The core
|
||||||
then finds the correct regulator by consulting a machine specific lookup table.
|
then finds the correct regulator by consulting a machine specific lookup table.
|
||||||
If the lookup is successful then this call will return a pointer to the struct
|
If the lookup is successful then this call will return a pointer to the struct
|
||||||
regulator that supplies this consumer.
|
regulator that supplies this consumer.
|
||||||
|
|
||||||
To release the regulator the consumer driver should call :-
|
To release the regulator the consumer driver should call ::
|
||||||
|
|
||||||
regulator_put(regulator);
|
regulator_put(regulator);
|
||||||
|
|
||||||
Consumers can be supplied by more than one regulator, e.g. a codec consumer with
|
Consumers can be supplied by more than one regulator, e.g. a codec consumer with
|
||||||
analog and digital supplies :-
|
analog and digital supplies ::
|
||||||
|
|
||||||
digital = regulator_get(dev, "Vcc"); /* digital core */
|
digital = regulator_get(dev, "Vcc"); /* digital core */
|
||||||
analog = regulator_get(dev, "Avdd"); /* analog */
|
analog = regulator_get(dev, "Avdd"); /* analog */
|
||||||
|
|
||||||
The regulator access functions regulator_get() and regulator_put() will
|
The regulator access functions regulator_get() and regulator_put() will
|
||||||
usually be called in your device driver's probe() and remove() respectively.
|
usually be called in your device driver's probe() and remove() respectively.
|
||||||
|
|
||||||
|
|
||||||
2. Regulator Output Enable & Disable (static & dynamic drivers)
|
2. Regulator Output Enable & Disable (static & dynamic drivers)
|
||||||
====================================================================
|
===============================================================
|
||||||
|
|
||||||
A consumer can enable its power supply by calling:-
|
|
||||||
|
|
||||||
int regulator_enable(regulator);
|
A consumer can enable its power supply by calling::
|
||||||
|
|
||||||
NOTE: The supply may already be enabled before regulator_enable() is called.
|
int regulator_enable(regulator);
|
||||||
This may happen if the consumer shares the regulator or the regulator has been
|
|
||||||
previously enabled by bootloader or kernel board initialization code.
|
|
||||||
|
|
||||||
A consumer can determine if a regulator is enabled by calling :-
|
NOTE:
|
||||||
|
The supply may already be enabled before regulator_enable() is called.
|
||||||
|
This may happen if the consumer shares the regulator or the regulator has been
|
||||||
|
previously enabled by bootloader or kernel board initialization code.
|
||||||
|
|
||||||
int regulator_is_enabled(regulator);
|
A consumer can determine if a regulator is enabled by calling::
|
||||||
|
|
||||||
|
int regulator_is_enabled(regulator);
|
||||||
|
|
||||||
This will return > zero when the regulator is enabled.
|
This will return > zero when the regulator is enabled.
|
||||||
|
|
||||||
|
|
||||||
A consumer can disable its supply when no longer needed by calling :-
|
A consumer can disable its supply when no longer needed by calling::
|
||||||
|
|
||||||
int regulator_disable(regulator);
|
int regulator_disable(regulator);
|
||||||
|
|
||||||
NOTE: This may not disable the supply if it's shared with other consumers. The
|
NOTE:
|
||||||
regulator will only be disabled when the enabled reference count is zero.
|
This may not disable the supply if it's shared with other consumers. The
|
||||||
|
regulator will only be disabled when the enabled reference count is zero.
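The reference-counting behaviour described in the notes above can be illustrated with a small user-space model. This is a toy sketch only: ``struct toy_supply`` and the ``toy_*`` functions are hypothetical stand-ins, not the kernel's struct regulator or its implementation, and the model ignores supplies that firmware left enabled at boot.

```c
#include <stdbool.h>

/* Toy stand-in for a shared supply; not the kernel's struct regulator. */
struct toy_supply {
    int use_count;  /* number of outstanding enable requests */
    bool output_on; /* modelled physical output state */
};

/* Model of an enable request: the output turns on with the first user. */
int toy_enable(struct toy_supply *s)
{
    s->use_count++;
    s->output_on = true;
    return 0;
}

/* Model of a disable request: the output turns off with the last user. */
int toy_disable(struct toy_supply *s)
{
    if (s->use_count == 0)
        return -1; /* unbalanced disable */

    if (--s->use_count == 0)
        s->output_on = false;

    return 0;
}

/* Returns a value greater than zero while the modelled output is on. */
int toy_is_enabled(const struct toy_supply *s)
{
    return s->output_on ? 1 : 0;
}
```

With two consumers enabled, the first ``toy_disable()`` call leaves the output on; only the second one, which drops the count to zero, switches it off.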
|
||||||
|
|
||||||
Finally, a regulator can be forcefully disabled in the case of an emergency :-
|
Finally, a regulator can be forcefully disabled in the case of an emergency::
|
||||||
|
|
||||||
int regulator_force_disable(regulator);
|
int regulator_force_disable(regulator);
|
||||||
|
|
||||||
NOTE: this will immediately and forcefully shut down the regulator output. All
|
NOTE:
|
||||||
consumers will be powered off.
|
This will immediately and forcefully shut down the regulator output. All
|
||||||
|
consumers will be powered off.
|
||||||
|
|
||||||
|
|
||||||
3. Regulator Voltage Control & Status (dynamic drivers)
|
3. Regulator Voltage Control & Status (dynamic drivers)
|
||||||
======================================================
|
=======================================================
|
||||||
|
|
||||||
Some consumer drivers need to be able to dynamically change their supply
|
Some consumer drivers need to be able to dynamically change their supply
|
||||||
voltage to match system operating points. e.g. CPUfreq drivers can scale
|
voltage to match system operating points. e.g. CPUfreq drivers can scale
|
||||||
voltage along with frequency to save power, SD drivers may need to select the
|
voltage along with frequency to save power, SD drivers may need to select the
|
||||||
correct card voltage, etc.
|
correct card voltage, etc.
|
||||||
|
|
||||||
Consumers can control their supply voltage by calling :-
|
Consumers can control their supply voltage by calling::
|
||||||
|
|
||||||
int regulator_set_voltage(regulator, min_uV, max_uV);
|
int regulator_set_voltage(regulator, min_uV, max_uV);
|
||||||
|
|
||||||
Where min_uV and max_uV are the minimum and maximum acceptable voltages in
|
Where min_uV and max_uV are the minimum and maximum acceptable voltages in
|
||||||
microvolts.
|
microvolts.
|
||||||
|
@ -84,47 +89,50 @@ when enabled, then the voltage changes instantly, otherwise the voltage
|
||||||
configuration changes and the voltage is physically set when the regulator is
|
configuration changes and the voltage is physically set when the regulator is
|
||||||
next enabled.
|
next enabled.
|
||||||
|
|
||||||
The regulator's configured voltage output can be found by calling :-
|
The regulator's configured voltage output can be found by calling::
|
||||||
|
|
||||||
int regulator_get_voltage(regulator);
|
int regulator_get_voltage(regulator);
|
||||||
|
|
||||||
NOTE: get_voltage() will return the configured output voltage whether the
|
NOTE:
|
||||||
regulator is enabled or disabled and should NOT be used to determine regulator
|
get_voltage() will return the configured output voltage whether the
|
||||||
output state. However this can be used in conjunction with is_enabled() to
|
regulator is enabled or disabled and should NOT be used to determine regulator
|
||||||
determine the regulator physical output voltage.
|
output state. However this can be used in conjunction with is_enabled() to
|
||||||
|
determine the regulator physical output voltage.
|
||||||
|
|
||||||
|
|
||||||
4. Regulator Current Limit Control & Status (dynamic drivers)
|
4. Regulator Current Limit Control & Status (dynamic drivers)
|
||||||
===========================================================
|
=============================================================
|
||||||
|
|
||||||
Some consumer drivers need to be able to dynamically change their supply
|
Some consumer drivers need to be able to dynamically change their supply
|
||||||
current limit to match system operating points. e.g. LCD backlight driver can
|
current limit to match system operating points. e.g. LCD backlight driver can
|
||||||
change the current limit to vary the backlight brightness, USB drivers may want
|
change the current limit to vary the backlight brightness, USB drivers may want
|
||||||
to set the limit to 500mA when supplying power.
|
to set the limit to 500mA when supplying power.
|
||||||
|
|
||||||
Consumers can control their supply current limit by calling :-
|
Consumers can control their supply current limit by calling::
|
||||||
|
|
||||||
int regulator_set_current_limit(regulator, min_uA, max_uA);
|
int regulator_set_current_limit(regulator, min_uA, max_uA);
|
||||||
|
|
||||||
Where min_uA and max_uA are the minimum and maximum acceptable current limit in
|
Where min_uA and max_uA are the minimum and maximum acceptable current limit in
|
||||||
microamps.
|
microamps.
|
||||||
|
|
||||||
NOTE: this can be called when the regulator is enabled or disabled. If called
|
NOTE:
|
||||||
when enabled, then the current limit changes instantly, otherwise the current
|
this can be called when the regulator is enabled or disabled. If called
|
||||||
limit configuration changes and the current limit is physically set when the
|
when enabled, then the current limit changes instantly, otherwise the current
|
||||||
regulator is next enabled.
|
limit configuration changes and the current limit is physically set when the
|
||||||
|
regulator is next enabled.
|
||||||
|
|
||||||
A regulator's current limit can be found by calling :-
|
A regulator's current limit can be found by calling::
|
||||||
|
|
||||||
int regulator_get_current_limit(regulator);
|
int regulator_get_current_limit(regulator);
|
||||||
|
|
||||||
NOTE: get_current_limit() will return the current limit whether the regulator
|
NOTE:
|
||||||
is enabled or disabled and should not be used to determine regulator current
|
get_current_limit() will return the current limit whether the regulator
|
||||||
load.
|
is enabled or disabled and should not be used to determine regulator current
|
||||||
|
load.
|
||||||
|
|
||||||
|
|
||||||
5. Regulator Operating Mode Control & Status (dynamic drivers)
|
5. Regulator Operating Mode Control & Status (dynamic drivers)
|
||||||
=============================================================
|
==============================================================
|
||||||
|
|
||||||
Some consumers can further save system power by changing the operating mode of
|
Some consumers can further save system power by changing the operating mode of
|
||||||
their supply regulator to be more efficient when the consumer's operating state
|
their supply regulator to be more efficient when the consumer's operating state
|
||||||
|
@ -135,9 +143,9 @@ Regulator operating mode can be changed indirectly or directly.
|
||||||
Indirect operating mode control.
|
Indirect operating mode control.
|
||||||
--------------------------------
|
--------------------------------
|
||||||
Consumer drivers can request a change in their supply regulator operating mode
|
Consumer drivers can request a change in their supply regulator operating mode
|
||||||
by calling :-
|
by calling::
|
||||||
|
|
||||||
int regulator_set_load(struct regulator *regulator, int load_uA);
|
int regulator_set_load(struct regulator *regulator, int load_uA);
|
||||||
|
|
||||||
This will cause the core to recalculate the total load on the regulator (based
|
This will cause the core to recalculate the total load on the regulator (based
|
||||||
on all its consumers) and change operating mode (if necessary and permitted)
|
on all its consumers) and change operating mode (if necessary and permitted)
|
||||||
|
@ -153,12 +161,13 @@ consumers.
|
||||||
|
|
||||||
Direct operating mode control.
|
Direct operating mode control.
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
||||||
Bespoke or tightly coupled drivers may want to directly control regulator
|
Bespoke or tightly coupled drivers may want to directly control regulator
|
||||||
operating mode depending on their operating point. This can be achieved by
|
operating mode depending on their operating point. This can be achieved by
|
||||||
calling :-
|
calling::
|
||||||
|
|
||||||
int regulator_set_mode(struct regulator *regulator, unsigned int mode);
|
int regulator_set_mode(struct regulator *regulator, unsigned int mode);
|
||||||
unsigned int regulator_get_mode(struct regulator *regulator);
|
unsigned int regulator_get_mode(struct regulator *regulator);
|
||||||
|
|
||||||
Direct mode will only be used by consumers that *know* about the regulator and
|
Direct mode will only be used by consumers that *know* about the regulator and
|
||||||
are not sharing the regulator with other consumers.
|
are not sharing the regulator with other consumers.
|
||||||
|
@ -166,24 +175,26 @@ are not sharing the regulator with other consumers.
|
||||||
|
|
||||||
6. Regulator Events
|
6. Regulator Events
|
||||||
===================
|
===================
|
||||||
|
|
||||||
Regulators can notify consumers of external events. Events could be received by
|
Regulators can notify consumers of external events. Events could be received by
|
||||||
consumers under regulator stress or failure conditions.
|
consumers under regulator stress or failure conditions.
|
||||||
|
|
||||||
Consumers can register interest in regulator events by calling :-
|
Consumers can register interest in regulator events by calling::
|
||||||
|
|
||||||
int regulator_register_notifier(struct regulator *regulator,
|
int regulator_register_notifier(struct regulator *regulator,
|
||||||
struct notifier_block *nb);
|
struct notifier_block *nb);
|
||||||
|
|
||||||
Consumers can unregister interest by calling :-
|
Consumers can unregister interest by calling::
|
||||||
|
|
||||||
int regulator_unregister_notifier(struct regulator *regulator,
|
int regulator_unregister_notifier(struct regulator *regulator,
|
||||||
struct notifier_block *nb);
|
struct notifier_block *nb);
|
||||||
|
|
||||||
Regulators use the kernel notifier framework to send events to their interested
|
Regulators use the kernel notifier framework to send events to their interested
|
||||||
consumers.
|
consumers.
|
||||||
|
|
||||||
7. Regulator Direct Register Access
|
7. Regulator Direct Register Access
|
||||||
===================================
|
===================================
|
||||||
|
|
||||||
Some kinds of power management hardware or firmware are designed such that
|
Some kinds of power management hardware or firmware are designed such that
|
||||||
they need to do low-level hardware access to regulators, with no involvement
|
they need to do low-level hardware access to regulators, with no involvement
|
||||||
from the kernel. Examples of such devices are:
|
from the kernel. Examples of such devices are:
|
||||||
|
@ -199,20 +210,20 @@ to it. The regulator framework provides the following helpers for querying
|
||||||
these details.
|
these details.
|
||||||
|
|
||||||
Bus-specific details, like I2C addresses or transfer rates are handled by the
|
Bus-specific details, like I2C addresses or transfer rates are handled by the
|
||||||
regmap framework. To get the regulator's regmap (if supported), use :-
|
regmap framework. To get the regulator's regmap (if supported), use::
|
||||||
|
|
||||||
struct regmap *regulator_get_regmap(struct regulator *regulator);
|
struct regmap *regulator_get_regmap(struct regulator *regulator);
|
||||||
|
|
||||||
To obtain the hardware register offset and bitmask for the regulator's voltage
|
To obtain the hardware register offset and bitmask for the regulator's voltage
|
||||||
selector register, use :-
|
selector register, use::
|
||||||
|
|
||||||
int regulator_get_hardware_vsel_register(struct regulator *regulator,
|
int regulator_get_hardware_vsel_register(struct regulator *regulator,
|
||||||
unsigned *vsel_reg,
|
unsigned *vsel_reg,
|
||||||
unsigned *vsel_mask);
|
unsigned *vsel_mask);
|
||||||
|
|
||||||
To convert a regulator framework voltage selector code (used by
|
To convert a regulator framework voltage selector code (used by
|
||||||
regulator_list_voltage) to a hardware-specific voltage selector that can be
|
regulator_list_voltage) to a hardware-specific voltage selector that can be
|
||||||
directly written to the voltage selector register, use :-
|
directly written to the voltage selector register, use::
|
||||||
|
|
||||||
int regulator_list_hardware_vsel(struct regulator *regulator,
|
int regulator_list_hardware_vsel(struct regulator *regulator,
|
||||||
unsigned selector);
|
unsigned selector);
|
|
@ -1,3 +1,4 @@
|
||||||
|
==========================
|
||||||
Regulator API design notes
|
Regulator API design notes
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
|
@ -14,7 +15,9 @@ Safety
|
||||||
have different power requirements, and not all components with power
|
have different power requirements, and not all components with power
|
||||||
requirements are visible to software.
|
requirements are visible to software.
|
||||||
|
|
||||||
=> The API should make no changes to the hardware state unless it has
|
.. note::
|
||||||
|
|
||||||
|
The API should make no changes to the hardware state unless it has
|
||||||
specific knowledge that these changes are safe to perform on this
|
specific knowledge that these changes are safe to perform on this
|
||||||
particular system.
|
particular system.
|
||||||
|
|
||||||
|
@ -28,6 +31,8 @@ Consumer use cases
|
||||||
- Many of the power supplies in the system will be shared between many
|
- Many of the power supplies in the system will be shared between many
|
||||||
different consumers.
|
different consumers.
|
||||||
|
|
||||||
=> The consumer API should be structured so that these use cases are
|
.. note::
|
||||||
|
|
||||||
|
The consumer API should be structured so that these use cases are
|
||||||
very easy to handle and so that consumers will work with shared
|
very easy to handle and so that consumers will work with shared
|
||||||
supplies without any additional effort.
|
supplies without any additional effort.
|
|
@ -1,10 +1,11 @@
|
||||||
|
==================================
|
||||||
Regulator Machine Driver Interface
|
Regulator Machine Driver Interface
|
||||||
===================================
|
==================================
|
||||||
|
|
||||||
The regulator machine driver interface is intended for board/machine specific
|
The regulator machine driver interface is intended for board/machine specific
|
||||||
initialisation code to configure the regulator subsystem.
|
initialisation code to configure the regulator subsystem.
|
||||||
|
|
||||||
Consider the following machine :-
|
Consider the following machine::
|
||||||
|
|
||||||
Regulator-1 -+-> Regulator-2 --> [Consumer A @ 1.8 - 2.0V]
|
Regulator-1 -+-> Regulator-2 --> [Consumer A @ 1.8 - 2.0V]
|
||||||
|
|
|
|
||||||
|
@ -13,31 +14,31 @@ Consider the following machine :-
|
||||||
The drivers for consumers A & B must be mapped to the correct regulator in
|
The drivers for consumers A & B must be mapped to the correct regulator in
|
||||||
order to control their power supplies. This mapping can be achieved in machine
|
order to control their power supplies. This mapping can be achieved in machine
|
||||||
initialisation code by creating a struct regulator_consumer_supply for
|
initialisation code by creating a struct regulator_consumer_supply for
|
||||||
each regulator.
|
each regulator::
|
||||||
|
|
||||||
struct regulator_consumer_supply {
|
struct regulator_consumer_supply {
|
||||||
const char *dev_name; /* consumer dev_name() */
|
const char *dev_name; /* consumer dev_name() */
|
||||||
const char *supply; /* consumer supply - e.g. "vcc" */
|
const char *supply; /* consumer supply - e.g. "vcc" */
|
||||||
};
|
};
|
||||||
|
|
||||||
e.g. for the machine above
|
e.g. for the machine above::
|
||||||
|
|
||||||
static struct regulator_consumer_supply regulator1_consumers[] = {
|
static struct regulator_consumer_supply regulator1_consumers[] = {
|
||||||
REGULATOR_SUPPLY("Vcc", "consumer B"),
|
REGULATOR_SUPPLY("Vcc", "consumer B"),
|
||||||
};
|
};
|
||||||
|
|
||||||
static struct regulator_consumer_supply regulator2_consumers[] = {
|
static struct regulator_consumer_supply regulator2_consumers[] = {
|
||||||
REGULATOR_SUPPLY("Vcc", "consumer A"),
|
REGULATOR_SUPPLY("Vcc", "consumer A"),
|
||||||
};
|
};
|
||||||
|
|
||||||
This maps Regulator-1 to the 'Vcc' supply for Consumer B and maps Regulator-2
|
This maps Regulator-1 to the 'Vcc' supply for Consumer B and maps Regulator-2
|
||||||
to the 'Vcc' supply for Consumer A.
|
to the 'Vcc' supply for Consumer A.
|
||||||
|
|
||||||
Constraints can now be registered by defining a struct regulator_init_data
|
Constraints can now be registered by defining a struct regulator_init_data
|
||||||
for each regulator power domain. This structure also maps the consumers
|
for each regulator power domain. This structure also maps the consumers
|
||||||
to their supply regulators :-
|
to their supply regulators::
|
||||||
|
|
||||||
static struct regulator_init_data regulator1_data = {
|
static struct regulator_init_data regulator1_data = {
|
||||||
.constraints = {
|
.constraints = {
|
||||||
.name = "Regulator-1",
|
.name = "Regulator-1",
|
||||||
.min_uV = 3300000,
|
.min_uV = 3300000,
|
||||||
|
@ -46,7 +47,7 @@ static struct regulator_init_data regulator1_data = {
|
||||||
},
|
},
|
||||||
.num_consumer_supplies = ARRAY_SIZE(regulator1_consumers),
|
.num_consumer_supplies = ARRAY_SIZE(regulator1_consumers),
|
||||||
.consumer_supplies = regulator1_consumers,
|
.consumer_supplies = regulator1_consumers,
|
||||||
};
|
};
|
||||||
|
|
||||||
The name field should be set to something that is usefully descriptive
|
The name field should be set to something that is usefully descriptive
|
||||||
for the board for configuration of supplies for other regulators and
|
for the board for configuration of supplies for other regulators and
|
||||||
|
@ -57,9 +58,9 @@ name is provided then the subsystem will choose one.
|
||||||
Regulator-1 supplies power to Regulator-2. This relationship must be registered
|
Regulator-1 supplies power to Regulator-2. This relationship must be registered
|
||||||
with the core so that Regulator-1 is also enabled when Consumer A enables its
|
with the core so that Regulator-1 is also enabled when Consumer A enables its
|
||||||
supply (Regulator-2). The supply regulator is set by the supply_regulator
|
supply (Regulator-2). The supply regulator is set by the supply_regulator
|
||||||
field below and co:-
|
field below and co::
|
||||||
|
|
||||||
static struct regulator_init_data regulator2_data = {
|
static struct regulator_init_data regulator2_data = {
|
||||||
.supply_regulator = "Regulator-1",
|
.supply_regulator = "Regulator-1",
|
||||||
.constraints = {
|
.constraints = {
|
||||||
.min_uV = 1800000,
|
.min_uV = 1800000,
|
||||||
|
@ -69,11 +70,11 @@ static struct regulator_init_data regulator2_data = {
|
||||||
},
|
},
|
||||||
.num_consumer_supplies = ARRAY_SIZE(regulator2_consumers),
|
.num_consumer_supplies = ARRAY_SIZE(regulator2_consumers),
|
||||||
.consumer_supplies = regulator2_consumers,
|
.consumer_supplies = regulator2_consumers,
|
||||||
};
|
};
|
||||||
|
|
||||||
Finally the regulator devices must be registered in the usual manner.
|
Finally the regulator devices must be registered in the usual manner::
|
||||||
|
|
||||||
static struct platform_device regulator_devices[] = {
|
static struct platform_device regulator_devices[] = {
|
||||||
{
|
{
|
||||||
.name = "regulator",
|
.name = "regulator",
|
||||||
.id = DCDC_1,
|
.id = DCDC_1,
|
||||||
|
@ -88,9 +89,9 @@ static struct platform_device regulator_devices[] = {
|
||||||
.platform_data = &regulator2_data,
|
.platform_data = &regulator2_data,
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
/* register regulator 1 device */
|
/* register regulator 1 device */
|
||||||
platform_device_register(&regulator_devices[0]);
|
platform_device_register(&regulator_devices[1 - 1]);
|
||||||
|
|
||||||
/* register regulator 2 device */
|
/* register regulator 2 device */
|
||||||
platform_device_register(&regulator_devices[1]);
|
platform_device_register(&regulator_devices[1]);
|
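The supply relationship described in the machine interface text (Regulator-1 must be enabled whenever Consumer A enables Regulator-2) can be pictured with a small userspace model. This is only a sketch: `struct model_regulator` and the `model_*` functions are hypothetical names, and the reference counting shown here is an assumption about how such propagation could work, not the regulator core's implementation.

```c
#include <stddef.h>

/* Hypothetical userspace model of a regulator node; not the kernel's struct. */
struct model_regulator {
	const char *name;
	struct model_regulator *supply;	/* parent regulator, or NULL */
	int use_count;			/* number of active enable requests */
};

/* Enable a regulator; the first enable also enables its supply chain. */
static void model_enable(struct model_regulator *r)
{
	if (r == NULL)
		return;
	if (r->use_count == 0)
		model_enable(r->supply);
	r->use_count++;
}

/* Drop one enable request; the last disable releases the supply chain. */
static void model_disable(struct model_regulator *r)
{
	if (r == NULL || r->use_count == 0)
		return;
	r->use_count--;
	if (r->use_count == 0)
		model_disable(r->supply);
}

/* Non-zero while at least one enable request is outstanding. */
static int model_is_enabled(const struct model_regulator *r)
{
	return r->use_count > 0;
}
```

With Regulator-2 modelled as having Regulator-1 as its supply, enabling Regulator-2 on behalf of Consumer A leaves both regulators enabled, and Regulator-1 stays on for Consumer B after Consumer A releases its supply.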
|
@ -1,3 +1,4 @@
|
||||||
|
=============================================
|
||||||
Linux voltage and current regulator framework
|
Linux voltage and current regulator framework
|
||||||
=============================================
|
=============================================
|
||||||
|
|
||||||
|
@ -13,26 +14,30 @@ regulators (where voltage output is controllable) and current sinks (where
|
||||||
current limit is controllable).
|
current limit is controllable).
|
||||||
|
|
||||||
(C) 2008 Wolfson Microelectronics PLC.
|
(C) 2008 Wolfson Microelectronics PLC.
|
||||||
|
|
||||||
Author: Liam Girdwood <lrg@slimlogic.co.uk>
|
Author: Liam Girdwood <lrg@slimlogic.co.uk>
|
||||||
|
|
||||||
|
|
||||||
Nomenclature
|
Nomenclature
|
||||||
============
|
============
|
||||||
|
|
||||||
Some terms used in this document:-
|
Some terms used in this document:
|
||||||
|
|
||||||
o Regulator - Electronic device that supplies power to other devices.
|
- Regulator
|
||||||
|
- Electronic device that supplies power to other devices.
|
||||||
Most regulators can enable and disable their output while
|
Most regulators can enable and disable their output while
|
||||||
some can control their output voltage and or current.
|
some can control their output voltage and or current.
|
||||||
|
|
||||||
Input Voltage -> Regulator -> Output Voltage
|
Input Voltage -> Regulator -> Output Voltage
|
||||||
|
|
||||||
|
|
||||||
o PMIC - Power Management IC. An IC that contains numerous regulators
|
- PMIC
|
||||||
and often contains other subsystems.
|
- Power Management IC. An IC that contains numerous
|
||||||
|
regulators and often contains other subsystems.
|
||||||
|
|
||||||
|
|
||||||
o Consumer - Electronic device that is supplied power by a regulator.
|
- Consumer
|
||||||
|
- Electronic device that is supplied power by a regulator.
|
||||||
Consumers can be classified into two types:-
|
Consumers can be classified into two types:-
|
||||||
|
|
||||||
Static: consumer does not change its supply voltage or
|
Static: consumer does not change its supply voltage or
|
||||||
|
@ -44,46 +49,48 @@ Some terms used in this document:-
|
||||||
current limit to meet operation demands.
|
current limit to meet operation demands.
|
||||||
|
|
||||||
|
|
||||||
o Power Domain - Electronic circuit that is supplied its input power by the
|
- Power Domain
|
||||||
|
- Electronic circuit that is supplied its input power by the
|
||||||
output power of a regulator, switch or by another power
|
output power of a regulator, switch or by another power
|
||||||
domain.
|
domain.
|
||||||
|
|
||||||
The supply regulator may be behind a switch(s). i.e.
|
The supply regulator may be behind a switch(s). i.e.::
|
||||||
|
|
||||||
Regulator -+-> Switch-1 -+-> Switch-2 --> [Consumer A]
|
Regulator -+-> Switch-1 -+-> Switch-2 --> [Consumer A]
|
||||||
| |
|
| |
|
||||||
| +-> [Consumer B], [Consumer C]
|
| +-> [Consumer B], [Consumer C]
|
||||||
|
|
|
|
||||||
+-> [Consumer D], [Consumer E]
|
+-> [Consumer D], [Consumer E]
|
||||||
|
|
||||||
That is one regulator and three power domains:
|
That is one regulator and three power domains:
|
||||||
|
|
||||||
Domain 1: Switch-1, Consumers D & E.
|
- Domain 1: Switch-1, Consumers D & E.
|
||||||
Domain 2: Switch-2, Consumers B & C.
|
- Domain 2: Switch-2, Consumers B & C.
|
||||||
Domain 3: Consumer A.
|
- Domain 3: Consumer A.
|
||||||
|
|
||||||
and this represents a "supplies" relationship:
|
and this represents a "supplies" relationship:
|
||||||
|
|
||||||
Domain-1 --> Domain-2 --> Domain-3.
|
Domain-1 --> Domain-2 --> Domain-3.
|
||||||
|
|
||||||
A power domain may have regulators that are supplied power
|
A power domain may have regulators that are supplied power
|
||||||
by other regulators. i.e.
|
by other regulators. i.e.::
|
||||||
|
|
||||||
Regulator-1 -+-> Regulator-2 -+-> [Consumer A]
|
Regulator-1 -+-> Regulator-2 -+-> [Consumer A]
|
||||||
|
|
|
|
||||||
+-> [Consumer B]
|
+-> [Consumer B]
|
||||||
|
|
||||||
This gives us two regulators and two power domains:
|
This gives us two regulators and two power domains:
|
||||||
|
|
||||||
Domain 1: Regulator-2, Consumer B.
|
- Domain 1: Regulator-2, Consumer B.
|
||||||
Domain 2: Consumer A.
|
- Domain 2: Consumer A.
|
||||||
|
|
||||||
and a "supplies" relationship:
|
and a "supplies" relationship:
|
||||||
|
|
||||||
Domain-1 --> Domain-2
|
Domain-1 --> Domain-2
|
||||||
|
|
||||||
|
|
||||||
o Constraints - Constraints are used to define power levels for performance
|
- Constraints
|
||||||
|
- Constraints are used to define power levels for performance
|
||||||
and hardware protection. Constraints exist at three levels:
|
and hardware protection. Constraints exist at three levels:
|
||||||
|
|
||||||
Regulator Level: This is defined by the regulator hardware
|
Regulator Level: This is defined by the regulator hardware
|
||||||
|
@ -141,7 +148,7 @@ relevant to non SoC devices and is split into the following four interfaces:-
|
||||||
limit. This also compiles out if not in use so drivers can be reused in
|
limit. This also compiles out if not in use so drivers can be reused in
|
||||||
systems with no regulator based power control.
|
systems with no regulator based power control.
|
||||||
|
|
||||||
See Documentation/power/regulator/consumer.txt
|
See Documentation/power/regulator/consumer.rst
|
||||||
|
|
||||||
2. Regulator driver interface.
|
2. Regulator driver interface.
|
||||||
|
|
||||||
|
@ -149,7 +156,7 @@ relevant to non SoC devices and is split into the following four interfaces:-
|
||||||
operations to the core. It also has a notifier call chain for propagating
|
operations to the core. It also has a notifier call chain for propagating
|
||||||
regulator events to clients.
|
regulator events to clients.
|
||||||
|
|
||||||
See Documentation/power/regulator/regulator.txt
|
See Documentation/power/regulator/regulator.rst
|
||||||
|
|
||||||
3. Machine interface.
|
3. Machine interface.
|
||||||
|
|
||||||
|
@ -160,7 +167,7 @@ relevant to non SoC devices and is split into the following four interfaces:-
|
||||||
allows the creation of a regulator tree whereby some regulators are
|
allows the creation of a regulator tree whereby some regulators are
|
||||||
supplied by others (similar to a clock tree).
|
supplied by others (similar to a clock tree).
|
||||||
|
|
||||||
See Documentation/power/regulator/machine.txt
|
See Documentation/power/regulator/machine.rst
|
||||||
|
|
||||||
4. Userspace ABI.
|
4. Userspace ABI.
|
||||||
|
|
|
@ -0,0 +1,32 @@
|
||||||
|
==========================
|
||||||
|
Regulator Driver Interface
|
||||||
|
==========================
|
||||||
|
|
||||||
|
The regulator driver interface is relatively simple and designed to allow
|
||||||
|
regulator drivers to register their services with the core framework.
|
||||||
|
|
||||||
|
|
||||||
|
Registration
|
||||||
|
============
|
||||||
|
|
||||||
|
Drivers can register a regulator by calling::
|
||||||
|
|
||||||
|
struct regulator_dev *regulator_register(struct regulator_desc *regulator_desc,
|
||||||
|
const struct regulator_config *config);
|
||||||
|
|
||||||
|
This will register the regulator's capabilities and operations to the regulator
|
||||||
|
core.
|
||||||
|
|
||||||
|
Regulators can be unregistered by calling::
|
||||||
|
|
||||||
|
void regulator_unregister(struct regulator_dev *rdev);
|
||||||
|
|
||||||
|
|
||||||
|
Regulator Events
|
||||||
|
================
|
||||||
|
|
||||||
|
Regulators can send events (e.g. overtemperature, undervoltage, etc) to
|
||||||
|
consumer drivers by calling::
|
||||||
|
|
||||||
|
int regulator_notifier_call_chain(struct regulator_dev *rdev,
|
||||||
|
unsigned long event, void *data);
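To make the event mechanism more concrete, here is a minimal userspace sketch of a notifier chain: listeners register a callback and every event is delivered to each of them in turn. All names (`model_chain`, `model_chain_register()`, `model_chain_call()`, the `MODEL_EVENT_*` values) are hypothetical. The kernel uses `struct notifier_block` and its own event constants, which this sketch does not try to reproduce.

```c
#include <stddef.h>

#define MODEL_MAX_LISTENERS 4

/* Hypothetical event bits, chosen for this sketch only. */
#define MODEL_EVENT_UNDER_VOLTAGE 0x01UL
#define MODEL_EVENT_OVER_TEMP     0x10UL

/* Hypothetical callback type for a consumer that wants regulator events. */
typedef int (*model_notifier_fn)(unsigned long event, void *data);

struct model_chain {
	model_notifier_fn fns[MODEL_MAX_LISTENERS];
	size_t count;
};

/* Add a listener; returns 0 on success or -1 if the chain is full. */
static int model_chain_register(struct model_chain *chain, model_notifier_fn fn)
{
	if (chain->count >= MODEL_MAX_LISTENERS)
		return -1;
	chain->fns[chain->count++] = fn;
	return 0;
}

/* Deliver one event to every listener in registration order. */
static size_t model_chain_call(const struct model_chain *chain,
			       unsigned long event, void *data)
{
	size_t i;

	for (i = 0; i < chain->count; i++)
		chain->fns[i](event, data);
	return chain->count;	/* number of listeners notified */
}

/* Example listener: accumulate the events seen into *data. */
static int model_record_event(unsigned long event, void *data)
{
	unsigned long *seen = data;

	*seen |= event;
	return 0;
}
```

A regulator driver plays the role of the caller of `model_chain_call()`, while consumer drivers correspond to the registered listeners.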
|
|
@ -1,30 +0,0 @@
|
||||||
Regulator Driver Interface
|
|
||||||
==========================
|
|
||||||
|
|
||||||
The regulator driver interface is relatively simple and designed to allow
|
|
||||||
regulator drivers to register their services with the core framework.
|
|
||||||
|
|
||||||
|
|
||||||
Registration
|
|
||||||
============
|
|
||||||
|
|
||||||
Drivers can register a regulator by calling :-
|
|
||||||
|
|
||||||
struct regulator_dev *regulator_register(struct regulator_desc *regulator_desc,
|
|
||||||
const struct regulator_config *config);
|
|
||||||
|
|
||||||
This will register the regulator's capabilities and operations to the regulator
|
|
||||||
core.
|
|
||||||
|
|
||||||
Regulators can be unregistered by calling :-
|
|
||||||
|
|
||||||
void regulator_unregister(struct regulator_dev *rdev);
|
|
||||||
|
|
||||||
|
|
||||||
Regulator Events
|
|
||||||
================
|
|
||||||
Regulators can send events (e.g. overtemperature, undervoltage, etc) to
|
|
||||||
consumer drivers by calling :-
|
|
||||||
|
|
||||||
int regulator_notifier_call_chain(struct regulator_dev *rdev,
|
|
||||||
unsigned long event, void *data);
|
|
|
@ -1,10 +1,15 @@
|
||||||
|
==================================================
|
||||||
Runtime Power Management Framework for I/O Devices
|
Runtime Power Management Framework for I/O Devices
|
||||||
|
==================================================
|
||||||
|
|
||||||
(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
|
(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
|
||||||
|
|
||||||
(C) 2010 Alan Stern <stern@rowland.harvard.edu>
|
(C) 2010 Alan Stern <stern@rowland.harvard.edu>
|
||||||
|
|
||||||
(C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
(C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||||||
|
|
||||||
1. Introduction
|
1. Introduction
|
||||||
|
===============
|
||||||
|
|
||||||
Support for runtime power management (runtime PM) of I/O devices is provided
|
Support for runtime power management (runtime PM) of I/O devices is provided
|
||||||
at the power management core (PM core) level by means of:
|
at the power management core (PM core) level by means of:
|
||||||
|
@ -33,16 +38,17 @@ fields of 'struct dev_pm_info' and the core helper functions provided for
|
||||||
runtime PM are described below.
|
runtime PM are described below.
|
||||||
|
|
||||||
2. Device Runtime PM Callbacks
|
2. Device Runtime PM Callbacks
|
||||||
|
==============================
|
||||||
|
|
||||||
There are three device runtime PM callbacks defined in 'struct dev_pm_ops':
|
There are three device runtime PM callbacks defined in 'struct dev_pm_ops'::
|
||||||
|
|
||||||
struct dev_pm_ops {
|
struct dev_pm_ops {
|
||||||
...
|
...
|
||||||
int (*runtime_suspend)(struct device *dev);
|
int (*runtime_suspend)(struct device *dev);
|
||||||
int (*runtime_resume)(struct device *dev);
|
int (*runtime_resume)(struct device *dev);
|
||||||
int (*runtime_idle)(struct device *dev);
|
int (*runtime_idle)(struct device *dev);
|
||||||
...
|
...
|
||||||
};
|
};
|
||||||
|
|
||||||
The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
|
The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
|
||||||
are executed by the PM core for the device's subsystem that may be either of
|
are executed by the PM core for the device's subsystem that may be either of
|
||||||
|
@ -112,7 +118,7 @@ low-power state during the execution of the suspend callback, it is expected
|
||||||
that remote wakeup will be enabled for the device. Generally, remote wakeup
|
that remote wakeup will be enabled for the device. Generally, remote wakeup
|
||||||
should be enabled for all input devices put into low-power states at run time.
|
should be enabled for all input devices put into low-power states at run time.
|
||||||
|
|
||||||
The subsystem-level resume callback, if present, is _entirely_ _responsible_ for
|
The subsystem-level resume callback, if present, is **entirely responsible** for
|
||||||
handling the resume of the device as appropriate, which may, but need not
|
handling the resume of the device as appropriate, which may, but need not
|
||||||
include executing the device driver's own ->runtime_resume() callback (from the
|
include executing the device driver's own ->runtime_resume() callback (from the
|
||||||
PM core's point of view it is not necessary to implement a ->runtime_resume()
|
PM core's point of view it is not necessary to implement a ->runtime_resume()
|
||||||
|
@ -197,95 +203,96 @@ rules:
|
||||||
except for scheduled autosuspends.
|
except for scheduled autosuspends.
|
||||||
|
|
||||||
3. Runtime PM Device Fields
|
3. Runtime PM Device Fields
|
||||||
|
===========================
|
||||||
|
|
||||||
The following device runtime PM fields are present in 'struct dev_pm_info', as
|
The following device runtime PM fields are present in 'struct dev_pm_info', as
|
||||||
defined in include/linux/pm.h:
|
defined in include/linux/pm.h:
|
||||||
|
|
||||||
struct timer_list suspend_timer;
|
`struct timer_list suspend_timer;`
|
||||||
- timer used for scheduling (delayed) suspend and autosuspend requests
|
- timer used for scheduling (delayed) suspend and autosuspend requests
|
||||||
|
|
||||||
unsigned long timer_expires;
|
`unsigned long timer_expires;`
|
||||||
- timer expiration time, in jiffies (if this is different from zero, the
|
- timer expiration time, in jiffies (if this is different from zero, the
|
||||||
timer is running and will expire at that time, otherwise the timer is not
|
timer is running and will expire at that time, otherwise the timer is not
|
||||||
running)
|
running)
|
||||||
|
|
||||||
struct work_struct work;
|
`struct work_struct work;`
|
||||||
- work structure used for queuing up requests (i.e. work items in pm_wq)
|
- work structure used for queuing up requests (i.e. work items in pm_wq)
|
||||||
|
|
||||||
wait_queue_head_t wait_queue;
|
`wait_queue_head_t wait_queue;`
|
||||||
- wait queue used if any of the helper functions needs to wait for another
|
- wait queue used if any of the helper functions needs to wait for another
|
||||||
one to complete
|
one to complete
|
||||||
|
|
||||||
spinlock_t lock;
|
`spinlock_t lock;`
|
||||||
- lock used for synchronization
|
- lock used for synchronization
|
||||||
|
|
||||||
atomic_t usage_count;
|
`atomic_t usage_count;`
|
||||||
- the usage counter of the device
|
- the usage counter of the device
|
||||||
|
|
||||||
atomic_t child_count;
|
`atomic_t child_count;`
|
||||||
- the count of 'active' children of the device
|
- the count of 'active' children of the device
|
||||||
|
|
||||||
unsigned int ignore_children;
|
`unsigned int ignore_children;`
|
||||||
- if set, the value of child_count is ignored (but still updated)
|
- if set, the value of child_count is ignored (but still updated)
|
||||||
|
|
||||||
unsigned int disable_depth;
|
`unsigned int disable_depth;`
|
||||||
- used for disabling the helper functions (they work normally if this is
|
- used for disabling the helper functions (they work normally if this is
|
||||||
equal to zero); the initial value of it is 1 (i.e. runtime PM is
|
equal to zero); the initial value of it is 1 (i.e. runtime PM is
|
||||||
initially disabled for all devices)
|
initially disabled for all devices)
|
||||||
|
|
||||||
int runtime_error;
|
`int runtime_error;`
|
||||||
- if set, there was a fatal error (one of the callbacks returned error code
|
- if set, there was a fatal error (one of the callbacks returned error code
|
||||||
as described in Section 2), so the helper functions will not work until
|
as described in Section 2), so the helper functions will not work until
|
||||||
this flag is cleared; this is the error code returned by the failing
|
this flag is cleared; this is the error code returned by the failing
|
||||||
callback
|
callback
|
||||||
|
|
||||||
unsigned int idle_notification;
|
`unsigned int idle_notification;`
|
||||||
- if set, ->runtime_idle() is being executed
|
- if set, ->runtime_idle() is being executed
|
||||||
|
|
||||||
unsigned int request_pending;
|
`unsigned int request_pending;`
|
||||||
- if set, there's a pending request (i.e. a work item queued up into pm_wq)
|
- if set, there's a pending request (i.e. a work item queued up into pm_wq)
|
||||||
|
|
||||||
enum rpm_request request;
|
`enum rpm_request request;`
|
||||||
- type of request that's pending (valid if request_pending is set)
|
- type of request that's pending (valid if request_pending is set)
|
||||||
|
|
||||||
unsigned int deferred_resume;
|
`unsigned int deferred_resume;`
|
||||||
- set if ->runtime_resume() is about to be run while ->runtime_suspend() is
|
- set if ->runtime_resume() is about to be run while ->runtime_suspend() is
|
||||||
being executed for that device and it is not practical to wait for the
|
being executed for that device and it is not practical to wait for the
|
||||||
suspend to complete; means "start a resume as soon as you've suspended"
|
suspend to complete; means "start a resume as soon as you've suspended"
|
||||||
|
|
||||||
enum rpm_status runtime_status;
|
`enum rpm_status runtime_status;`
|
||||||
- the runtime PM status of the device; this field's initial value is
|
- the runtime PM status of the device; this field's initial value is
|
||||||
RPM_SUSPENDED, which means that each device is initially regarded by the
|
RPM_SUSPENDED, which means that each device is initially regarded by the
|
||||||
PM core as 'suspended', regardless of its real hardware status
|
PM core as 'suspended', regardless of its real hardware status
|
||||||
|
|
||||||
unsigned int runtime_auto;
|
`unsigned int runtime_auto;`
|
||||||
- if set, indicates that the user space has allowed the device driver to
|
- if set, indicates that the user space has allowed the device driver to
|
||||||
power manage the device at run time via the /sys/devices/.../power/control
|
power manage the device at run time via the /sys/devices/.../power/control
|
||||||
interface; it may only be modified with the help of the pm_runtime_allow()
|
`interface;` it may only be modified with the help of the pm_runtime_allow()
|
||||||
and pm_runtime_forbid() helper functions
|
and pm_runtime_forbid() helper functions
|
||||||
|
|
||||||
unsigned int no_callbacks;
|
`unsigned int no_callbacks;`
|
||||||
- indicates that the device does not use the runtime PM callbacks (see
|
- indicates that the device does not use the runtime PM callbacks (see
|
||||||
Section 8); it may be modified only by the pm_runtime_no_callbacks()
|
Section 8); it may be modified only by the pm_runtime_no_callbacks()
|
||||||
helper function
|
helper function
|
||||||
|
|
||||||
unsigned int irq_safe;
|
`unsigned int irq_safe;`
|
||||||
- indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
|
- indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
|
||||||
will be invoked with the spinlock held and interrupts disabled
|
will be invoked with the spinlock held and interrupts disabled
|
||||||
|
|
||||||
unsigned int use_autosuspend;
|
`unsigned int use_autosuspend;`
|
||||||
- indicates that the device's driver supports delayed autosuspend (see
|
- indicates that the device's driver supports delayed autosuspend (see
|
||||||
Section 9); it may be modified only by the
|
Section 9); it may be modified only by the
|
||||||
pm_runtime{_dont}_use_autosuspend() helper functions
|
pm_runtime{_dont}_use_autosuspend() helper functions
|
||||||
|
|
||||||
unsigned int timer_autosuspends;
|
`unsigned int timer_autosuspends;`
|
||||||
- indicates that the PM core should attempt to carry out an autosuspend
|
- indicates that the PM core should attempt to carry out an autosuspend
|
||||||
when the timer expires rather than a normal suspend
|
when the timer expires rather than a normal suspend
|
||||||
|
|
||||||
int autosuspend_delay;
|
`int autosuspend_delay;`
|
||||||
- the delay time (in milliseconds) to be used for autosuspend
|
- the delay time (in milliseconds) to be used for autosuspend
|
||||||
|
|
||||||
unsigned long last_busy;
|
`unsigned long last_busy;`
|
||||||
- the time (in jiffies) when the pm_runtime_mark_last_busy() helper
|
- the time (in jiffies) when the pm_runtime_mark_last_busy() helper
|
||||||
function was last called for this device; used in calculating inactivity
|
function was last called for this device; used in calculating inactivity
|
||||||
periods for autosuspend
|
periods for autosuspend
|
||||||
|
@ -293,37 +300,38 @@ defined in include/linux/pm.h:
|
||||||
All of the above fields are members of the 'power' member of 'struct device'.
|
All of the above fields are members of the 'power' member of 'struct device'.
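How several of these fields interact can be sketched with a simplified userspace model. `struct model_pm_info` and both functions are hypothetical; the initial values of `disable_depth` and `runtime_status` follow the field descriptions above, while the zero initial counters, the order of the checks and the exact error codes are assumptions made for illustration rather than a copy of the PM core.

```c
#include <errno.h>

enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

/* Hypothetical subset of the runtime PM fields described above. */
struct model_pm_info {
	int usage_count;
	int child_count;
	unsigned int ignore_children;
	unsigned int disable_depth;
	int runtime_error;
	enum model_rpm_status runtime_status;
};

/* Initial state: runtime PM disabled, device regarded as suspended. */
static void model_pm_init(struct model_pm_info *p)
{
	p->usage_count = 0;		/* assumption: counters start at zero */
	p->child_count = 0;
	p->ignore_children = 0;
	p->disable_depth = 1;		/* helpers are disabled until this is 0 */
	p->runtime_error = 0;
	p->runtime_status = MODEL_RPM_SUSPENDED;
}

/*
 * Returns 0 if a suspend may proceed, 1 if the device is already
 * suspended, or a negative errno value explaining why it may not.
 */
static int model_suspend_allowed(const struct model_pm_info *p)
{
	if (p->runtime_error)
		return -EINVAL;		/* earlier fatal callback error */
	if (p->disable_depth > 0)
		return -EACCES;		/* runtime PM is disabled */
	if (p->usage_count > 0)
		return -EAGAIN;		/* device is still in use */
	if (!p->ignore_children && p->child_count > 0)
		return -EBUSY;		/* active children keep it awake */
	if (p->runtime_status == MODEL_RPM_SUSPENDED)
		return 1;
	return 0;
}
```

A freshly initialised device is rejected with -EACCES in this model, which matches the statement that runtime PM is initially disabled for all devices.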
|
||||||
|
|
||||||
4. Runtime PM Device Helper Functions
|
4. Runtime PM Device Helper Functions
|
||||||
|
=====================================
|
||||||
|
|
||||||
The following runtime PM helper functions are defined in
|
The following runtime PM helper functions are defined in
|
||||||
drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||||
|
|
||||||
void pm_runtime_init(struct device *dev);
|
`void pm_runtime_init(struct device *dev);`
|
||||||
- initialize the device runtime PM fields in 'struct dev_pm_info'
|
- initialize the device runtime PM fields in 'struct dev_pm_info'
|
||||||
|
|
||||||
void pm_runtime_remove(struct device *dev);
|
`void pm_runtime_remove(struct device *dev);`
|
||||||
- make sure that the runtime PM of the device will be disabled after
|
- make sure that the runtime PM of the device will be disabled after
|
||||||
removing the device from device hierarchy
|
removing the device from device hierarchy
|
||||||
|
|
||||||
int pm_runtime_idle(struct device *dev);
|
`int pm_runtime_idle(struct device *dev);`
|
||||||
- execute the subsystem-level idle callback for the device; returns an
|
- execute the subsystem-level idle callback for the device; returns an
|
||||||
error code on failure, where -EINPROGRESS means that ->runtime_idle() is
|
error code on failure, where -EINPROGRESS means that ->runtime_idle() is
|
||||||
already being executed; if there is no callback or the callback returns 0
|
already being executed; if there is no callback or the callback returns 0
|
||||||
then run pm_runtime_autosuspend(dev) and return its result
|
then run pm_runtime_autosuspend(dev) and return its result
|
||||||
|
|
||||||
int pm_runtime_suspend(struct device *dev);
|
`int pm_runtime_suspend(struct device *dev);`
|
||||||
- execute the subsystem-level suspend callback for the device; returns 0 on
|
- execute the subsystem-level suspend callback for the device; returns 0 on
|
||||||
success, 1 if the device's runtime PM status was already 'suspended', or
|
success, 1 if the device's runtime PM status was already 'suspended', or
|
||||||
error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
|
error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
|
||||||
to suspend the device again in future and -EACCES means that
|
to suspend the device again in future and -EACCES means that
|
||||||
'power.disable_depth' is different from 0
|
'power.disable_depth' is different from 0
|
||||||
|
|
||||||
int pm_runtime_autosuspend(struct device *dev);
|
`int pm_runtime_autosuspend(struct device *dev);`
|
||||||
- same as pm_runtime_suspend() except that the autosuspend delay is taken
|
- same as pm_runtime_suspend() except that the autosuspend delay is taken
|
||||||
into account; if pm_runtime_autosuspend_expiration() says the delay has
|
`into account;` if pm_runtime_autosuspend_expiration() says the delay has
|
||||||
not yet expired then an autosuspend is scheduled for the appropriate time
|
not yet expired then an autosuspend is scheduled for the appropriate time
|
||||||
and 0 is returned
|
and 0 is returned
|
||||||
|
|
||||||
int pm_runtime_resume(struct device *dev);
|
`int pm_runtime_resume(struct device *dev);`
|
||||||
- execute the subsystem-level resume callback for the device; returns 0 on
|
- execute the subsystem-level resume callback for the device; returns 0 on
|
||||||
success, 1 if the device's runtime PM status was already 'active' or
|
success, 1 if the device's runtime PM status was already 'active' or
|
||||||
error code on failure, where -EAGAIN means it may be safe to attempt to
|
error code on failure, where -EAGAIN means it may be safe to attempt to
|
||||||
|
@ -331,17 +339,17 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||||
checked additionally, and -EACCES means that 'power.disable_depth' is
|
checked additionally, and -EACCES means that 'power.disable_depth' is
|
||||||
different from 0
|
different from 0
|
||||||
|
|
||||||
int pm_request_idle(struct device *dev);
|
`int pm_request_idle(struct device *dev);`
|
||||||
- submit a request to execute the subsystem-level idle callback for the
|
- submit a request to execute the subsystem-level idle callback for the
|
||||||
device (the request is represented by a work item in pm_wq); returns 0 on
|
device (the request is represented by a work item in pm_wq); returns 0 on
|
||||||
success or error code if the request has not been queued up
|
success or error code if the request has not been queued up
|
||||||
|
|
||||||
int pm_request_autosuspend(struct device *dev);
|
`int pm_request_autosuspend(struct device *dev);`
|
||||||
- schedule the execution of the subsystem-level suspend callback for the
|
- schedule the execution of the subsystem-level suspend callback for the
|
||||||
device when the autosuspend delay has expired; if the delay has already
|
device when the autosuspend delay has expired; if the delay has already
|
||||||
expired then the work item is queued up immediately
|
expired then the work item is queued up immediately
|
||||||
|
|
||||||
int pm_schedule_suspend(struct device *dev, unsigned int delay);
|
`int pm_schedule_suspend(struct device *dev, unsigned int delay);`
|
||||||
- schedule the execution of the subsystem-level suspend callback for the
|
- schedule the execution of the subsystem-level suspend callback for the
|
||||||
device in future, where 'delay' is the time to wait before queuing up a
|
device in future, where 'delay' is the time to wait before queuing up a
|
||||||
suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
|
suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
|
||||||
|
@ -351,58 +359,58 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||||
->runtime_suspend() is already scheduled and not yet expired, the new
|
->runtime_suspend() is already scheduled and not yet expired, the new
|
||||||
value of 'delay' will be used as the time to wait
|
value of 'delay' will be used as the time to wait
|
||||||
|
|
||||||
int pm_request_resume(struct device *dev);
|
`int pm_request_resume(struct device *dev);`
|
||||||
- submit a request to execute the subsystem-level resume callback for the
|
- submit a request to execute the subsystem-level resume callback for the
|
||||||
device (the request is represented by a work item in pm_wq); returns 0 on
|
device (the request is represented by a work item in pm_wq); returns 0 on
|
||||||
success, 1 if the device's runtime PM status was already 'active', or
|
success, 1 if the device's runtime PM status was already 'active', or
|
||||||
error code if the request hasn't been queued up
|
error code if the request hasn't been queued up
|
||||||
|
|
||||||
void pm_runtime_get_noresume(struct device *dev);
|
`void pm_runtime_get_noresume(struct device *dev);`
|
||||||
- increment the device's usage counter
|
- increment the device's usage counter
|
||||||
|
|
||||||
int pm_runtime_get(struct device *dev);
|
`int pm_runtime_get(struct device *dev);`
|
||||||
- increment the device's usage counter, run pm_request_resume(dev) and
|
- increment the device's usage counter, run pm_request_resume(dev) and
|
||||||
return its result
|
return its result
|
||||||
|
|
||||||
int pm_runtime_get_sync(struct device *dev);
|
`int pm_runtime_get_sync(struct device *dev);`
|
||||||
- increment the device's usage counter, run pm_runtime_resume(dev) and
|
- increment the device's usage counter, run pm_runtime_resume(dev) and
|
||||||
return its result
|
return its result
|
||||||
|
|
||||||
int pm_runtime_get_if_in_use(struct device *dev);
|
`int pm_runtime_get_if_in_use(struct device *dev);`
|
||||||
- return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
|
- return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
|
||||||
runtime PM status is RPM_ACTIVE and the runtime PM usage counter is
|
runtime PM status is RPM_ACTIVE and the runtime PM usage counter is
|
||||||
nonzero, increment the counter and return 1; otherwise return 0 without
|
nonzero, increment the counter and return 1; otherwise return 0 without
|
||||||
changing the counter
|
changing the counter
|
||||||
|
|
||||||
void pm_runtime_put_noidle(struct device *dev);
|
`void pm_runtime_put_noidle(struct device *dev);`
|
||||||
- decrement the device's usage counter
|
- decrement the device's usage counter
|
||||||
|
|
||||||
int pm_runtime_put(struct device *dev);
|
`int pm_runtime_put(struct device *dev);`
|
||||||
- decrement the device's usage counter; if the result is 0 then run
|
- decrement the device's usage counter; if the result is 0 then run
|
||||||
pm_request_idle(dev) and return its result
|
pm_request_idle(dev) and return its result
|
||||||
|
|
||||||
int pm_runtime_put_autosuspend(struct device *dev);
|
`int pm_runtime_put_autosuspend(struct device *dev);`
|
||||||
- decrement the device's usage counter; if the result is 0 then run
|
- decrement the device's usage counter; if the result is 0 then run
|
||||||
pm_request_autosuspend(dev) and return its result
|
pm_request_autosuspend(dev) and return its result
|
||||||
|
|
||||||
int pm_runtime_put_sync(struct device *dev);
|
`int pm_runtime_put_sync(struct device *dev);`
|
||||||
- decrement the device's usage counter; if the result is 0 then run
|
- decrement the device's usage counter; if the result is 0 then run
|
||||||
pm_runtime_idle(dev) and return its result
|
pm_runtime_idle(dev) and return its result
|
||||||
|
|
||||||
int pm_runtime_put_sync_suspend(struct device *dev);
|
`int pm_runtime_put_sync_suspend(struct device *dev);`
|
||||||
- decrement the device's usage counter; if the result is 0 then run
|
- decrement the device's usage counter; if the result is 0 then run
|
||||||
pm_runtime_suspend(dev) and return its result
|
pm_runtime_suspend(dev) and return its result
|
||||||
|
|
||||||
int pm_runtime_put_sync_autosuspend(struct device *dev);
|
`int pm_runtime_put_sync_autosuspend(struct device *dev);`
|
||||||
- decrement the device's usage counter; if the result is 0 then run
|
- decrement the device's usage counter; if the result is 0 then run
|
||||||
pm_runtime_autosuspend(dev) and return its result
|
pm_runtime_autosuspend(dev) and return its result
|
||||||
|
|
||||||
void pm_runtime_enable(struct device *dev);
|
`void pm_runtime_enable(struct device *dev);`
|
||||||
- decrement the device's 'power.disable_depth' field; if that field is equal
|
- decrement the device's 'power.disable_depth' field; if that field is equal
|
||||||
to zero, the runtime PM helper functions can execute subsystem-level
|
to zero, the runtime PM helper functions can execute subsystem-level
|
||||||
callbacks described in Section 2 for the device
|
callbacks described in Section 2 for the device
|
||||||
|
|
||||||
int pm_runtime_disable(struct device *dev);
|
`int pm_runtime_disable(struct device *dev);`
|
||||||
- increment the device's 'power.disable_depth' field (if the value of that
|
- increment the device's 'power.disable_depth' field (if the value of that
|
||||||
field was previously zero, this prevents subsystem-level runtime PM
|
field was previously zero, this prevents subsystem-level runtime PM
|
||||||
callbacks from being run for the device), make sure that all of the
|
callbacks from being run for the device), make sure that all of the
|
||||||
|
@ -411,7 +419,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||||
necessary to execute the subsystem-level resume callback for the device
|
necessary to execute the subsystem-level resume callback for the device
|
||||||
to satisfy that request, otherwise 0 is returned
|
to satisfy that request, otherwise 0 is returned
|
||||||
|
|
||||||
int pm_runtime_barrier(struct device *dev);
|
`int pm_runtime_barrier(struct device *dev);`
|
||||||
- check if there's a resume request pending for the device and resume it
|
- check if there's a resume request pending for the device and resume it
|
||||||
(synchronously) in that case, cancel any other pending runtime PM requests
|
(synchronously) in that case, cancel any other pending runtime PM requests
|
||||||
regarding it and wait for all runtime PM operations on it in progress to
|
regarding it and wait for all runtime PM operations on it in progress to
|
||||||
|
@ -419,10 +427,10 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||||
necessary to execute the subsystem-level resume callback for the device to
|
necessary to execute the subsystem-level resume callback for the device to
|
||||||
satisfy that request, otherwise 0 is returned
|
satisfy that request, otherwise 0 is returned
|
||||||
|
|
||||||
void pm_suspend_ignore_children(struct device *dev, bool enable);
|
`void pm_suspend_ignore_children(struct device *dev, bool enable);`
|
||||||
- set/unset the power.ignore_children flag of the device
|
- set/unset the power.ignore_children flag of the device
|
||||||
|
|
||||||
int pm_runtime_set_active(struct device *dev);
|
`int pm_runtime_set_active(struct device *dev);`
|
||||||
- clear the device's 'power.runtime_error' flag, set the device's runtime
|
- clear the device's 'power.runtime_error' flag, set the device's runtime
|
||||||
PM status to 'active' and update its parent's counter of 'active'
|
PM status to 'active' and update its parent's counter of 'active'
|
||||||
children as appropriate (it is only valid to use this function if
|
children as appropriate (it is only valid to use this function if
|
||||||
|
@ -430,61 +438,61 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||||
zero); it will fail and return error code if the device has a parent
|
zero); it will fail and return error code if the device has a parent
|
||||||
which is not active and the 'power.ignore_children' flag of which is unset
|
which is not active and the 'power.ignore_children' flag of which is unset
|
||||||
|
|
||||||
void pm_runtime_set_suspended(struct device *dev);
|
`void pm_runtime_set_suspended(struct device *dev);`
|
||||||
- clear the device's 'power.runtime_error' flag, set the device's runtime
|
- clear the device's 'power.runtime_error' flag, set the device's runtime
|
||||||
PM status to 'suspended' and update its parent's counter of 'active'
|
PM status to 'suspended' and update its parent's counter of 'active'
|
||||||
children as appropriate (it is only valid to use this function if
|
children as appropriate (it is only valid to use this function if
|
||||||
'power.runtime_error' is set or 'power.disable_depth' is greater than
|
'power.runtime_error' is set or 'power.disable_depth' is greater than
|
||||||
zero)
|
zero)
|
||||||
|
|
||||||
bool pm_runtime_active(struct device *dev);
|
`bool pm_runtime_active(struct device *dev);`
|
||||||
- return true if the device's runtime PM status is 'active' or its
|
- return true if the device's runtime PM status is 'active' or its
|
||||||
'power.disable_depth' field is not equal to zero, or false otherwise
|
'power.disable_depth' field is not equal to zero, or false otherwise
|
||||||
|
|
||||||
bool pm_runtime_suspended(struct device *dev);
|
`bool pm_runtime_suspended(struct device *dev);`
|
||||||
- return true if the device's runtime PM status is 'suspended' and its
|
- return true if the device's runtime PM status is 'suspended' and its
|
||||||
'power.disable_depth' field is equal to zero, or false otherwise
|
'power.disable_depth' field is equal to zero, or false otherwise
|
||||||
|
|
||||||
bool pm_runtime_status_suspended(struct device *dev);
|
`bool pm_runtime_status_suspended(struct device *dev);`
|
||||||
- return true if the device's runtime PM status is 'suspended'
|
- return true if the device's runtime PM status is 'suspended'
|
||||||
|
|
||||||
void pm_runtime_allow(struct device *dev);
|
`void pm_runtime_allow(struct device *dev);`
|
||||||
- set the power.runtime_auto flag for the device and decrease its usage
|
- set the power.runtime_auto flag for the device and decrease its usage
|
||||||
counter (used by the /sys/devices/.../power/control interface to
|
counter (used by the /sys/devices/.../power/control interface to
|
||||||
effectively allow the device to be power managed at run time)
|
effectively allow the device to be power managed at run time)
|
||||||
|
|
||||||
void pm_runtime_forbid(struct device *dev);
|
`void pm_runtime_forbid(struct device *dev);`
|
||||||
- unset the power.runtime_auto flag for the device and increase its usage
|
- unset the power.runtime_auto flag for the device and increase its usage
|
||||||
counter (used by the /sys/devices/.../power/control interface to
|
counter (used by the /sys/devices/.../power/control interface to
|
||||||
effectively prevent the device from being power managed at run time)
|
effectively prevent the device from being power managed at run time)
|
||||||
|
|
||||||
void pm_runtime_no_callbacks(struct device *dev);
|
`void pm_runtime_no_callbacks(struct device *dev);`
|
||||||
- set the power.no_callbacks flag for the device and remove the runtime
|
- set the power.no_callbacks flag for the device and remove the runtime
|
||||||
PM attributes from /sys/devices/.../power (or prevent them from being
|
PM attributes from /sys/devices/.../power (or prevent them from being
|
||||||
added when the device is registered)
|
added when the device is registered)
|
||||||
|
|
||||||
void pm_runtime_irq_safe(struct device *dev);
|
`void pm_runtime_irq_safe(struct device *dev);`
|
||||||
- set the power.irq_safe flag for the device, causing the runtime-PM
|
- set the power.irq_safe flag for the device, causing the runtime-PM
|
||||||
callbacks to be invoked with interrupts off
|
callbacks to be invoked with interrupts off
|
||||||
|
|
||||||
bool pm_runtime_is_irq_safe(struct device *dev);
|
`bool pm_runtime_is_irq_safe(struct device *dev);`
|
||||||
- return true if power.irq_safe flag was set for the device, causing
|
- return true if power.irq_safe flag was set for the device, causing
|
||||||
the runtime-PM callbacks to be invoked with interrupts off
|
the runtime-PM callbacks to be invoked with interrupts off
|
||||||
|
|
||||||
void pm_runtime_mark_last_busy(struct device *dev);
|
`void pm_runtime_mark_last_busy(struct device *dev);`
|
||||||
- set the power.last_busy field to the current time
|
- set the power.last_busy field to the current time
|
||||||
|
|
||||||
void pm_runtime_use_autosuspend(struct device *dev);
|
`void pm_runtime_use_autosuspend(struct device *dev);`
|
||||||
- set the power.use_autosuspend flag, enabling autosuspend delays; call
|
- set the power.use_autosuspend flag, enabling autosuspend delays; call
|
||||||
pm_runtime_get_sync if the flag was previously cleared and
|
pm_runtime_get_sync if the flag was previously cleared and
|
||||||
power.autosuspend_delay is negative
|
power.autosuspend_delay is negative
|
||||||
|
|
||||||
void pm_runtime_dont_use_autosuspend(struct device *dev);
|
`void pm_runtime_dont_use_autosuspend(struct device *dev);`
|
||||||
- clear the power.use_autosuspend flag, disabling autosuspend delays;
|
- clear the power.use_autosuspend flag, disabling autosuspend delays;
|
||||||
decrement the device's usage counter if the flag was previously set and
|
decrement the device's usage counter if the flag was previously set and
|
||||||
power.autosuspend_delay is negative; call pm_runtime_idle
|
power.autosuspend_delay is negative; call pm_runtime_idle
|
||||||
|
|
||||||
void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
|
`void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);`
|
||||||
- set the power.autosuspend_delay value to 'delay' (expressed in
|
- set the power.autosuspend_delay value to 'delay' (expressed in
|
||||||
milliseconds); if 'delay' is negative then runtime suspends are
|
milliseconds); if 'delay' is negative then runtime suspends are
|
||||||
prevented; if power.use_autosuspend is set, pm_runtime_get_sync may be
|
prevented; if power.use_autosuspend is set, pm_runtime_get_sync may be
|
||||||
|
@ -493,7 +501,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||||
changed to or from a negative value; if power.use_autosuspend is clear,
|
changed to or from a negative value; if power.use_autosuspend is clear,
|
||||||
pm_runtime_idle is called
|
pm_runtime_idle is called
|
||||||
|
|
||||||
unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
|
`unsigned long pm_runtime_autosuspend_expiration(struct device *dev);`
|
||||||
- calculate the time when the current autosuspend delay period will expire,
|
- calculate the time when the current autosuspend delay period will expire,
|
||||||
based on power.last_busy and power.autosuspend_delay; if the delay time
|
based on power.last_busy and power.autosuspend_delay; if the delay time
|
||||||
is 1000 ms or larger then the expiration time is rounded up to the
|
is 1000 ms or larger then the expiration time is rounded up to the
|
||||||
|
@ -503,36 +511,37 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
|
||||||
|
|
||||||
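The usage-counter rules in the helper list above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct cnt_dev`, the `CNT_RPM_*` constants and every `cnt_*` function are hypothetical stand-ins for the `power.*` fields and the `pm_runtime_*` helpers, and the model ignores locking, work items and callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the fields named in the text. */
enum cnt_rpm_status { CNT_RPM_ACTIVE, CNT_RPM_SUSPENDED };

struct cnt_dev {
	int usage_count;            /* models the device's usage counter */
	unsigned int disable_depth; /* models 'power.disable_depth' */
	enum cnt_rpm_status status; /* models the runtime PM status */
};

/* Models pm_runtime_get_noresume(): increment the counter, nothing else. */
void cnt_get_noresume(struct cnt_dev *dev)
{
	dev->usage_count++;
}

/* Models pm_runtime_put_noidle(): decrement the counter, nothing else. */
void cnt_put_noidle(struct cnt_dev *dev)
{
	assert(dev->usage_count > 0); /* an unbalanced put is a caller bug */
	dev->usage_count--;
}

/*
 * Models pm_runtime_get_if_in_use() as described above: -EINVAL when
 * runtime PM is disabled, 1 (with the counter incremented) when the
 * device is active and already in use, 0 otherwise.
 */
int cnt_get_if_in_use(struct cnt_dev *dev)
{
	if (dev->disable_depth != 0)
		return -EINVAL;
	if (dev->status == CNT_RPM_ACTIVE && dev->usage_count != 0) {
		dev->usage_count++;
		return 1;
	}
	return 0;
}

/* Models pm_runtime_active(): 'active' status OR runtime PM disabled. */
bool cnt_active(const struct cnt_dev *dev)
{
	return dev->status == CNT_RPM_ACTIVE || dev->disable_depth != 0;
}

/* Models pm_runtime_suspended(): 'suspended' status AND runtime PM enabled. */
bool cnt_suspended(const struct cnt_dev *dev)
{
	return dev->status == CNT_RPM_SUSPENDED && dev->disable_depth == 0;
}
```

Note the asymmetry the model makes visible: while `disable_depth` is nonzero, the "active" predicate reports true regardless of the stored status, whereas the "suspended" predicate reports false.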
It is safe to execute the following helper functions from interrupt context:
|
It is safe to execute the following helper functions from interrupt context:
|
||||||
|
|
||||||
pm_request_idle()
|
- pm_request_idle()
|
||||||
pm_request_autosuspend()
|
- pm_request_autosuspend()
|
||||||
pm_schedule_suspend()
|
- pm_schedule_suspend()
|
||||||
pm_request_resume()
|
- pm_request_resume()
|
||||||
pm_runtime_get_noresume()
|
- pm_runtime_get_noresume()
|
||||||
pm_runtime_get()
|
- pm_runtime_get()
|
||||||
pm_runtime_put_noidle()
|
- pm_runtime_put_noidle()
|
||||||
pm_runtime_put()
|
- pm_runtime_put()
|
||||||
pm_runtime_put_autosuspend()
|
- pm_runtime_put_autosuspend()
|
||||||
pm_runtime_enable()
|
- pm_runtime_enable()
|
||||||
pm_suspend_ignore_children()
|
- pm_suspend_ignore_children()
|
||||||
pm_runtime_set_active()
|
- pm_runtime_set_active()
|
||||||
pm_runtime_set_suspended()
|
- pm_runtime_set_suspended()
|
||||||
pm_runtime_suspended()
|
- pm_runtime_suspended()
|
||||||
pm_runtime_mark_last_busy()
|
- pm_runtime_mark_last_busy()
|
||||||
pm_runtime_autosuspend_expiration()
|
- pm_runtime_autosuspend_expiration()
|
||||||
|
|
||||||
If pm_runtime_irq_safe() has been called for a device then the following helper
|
If pm_runtime_irq_safe() has been called for a device then the following helper
|
||||||
functions may also be used in interrupt context:
|
functions may also be used in interrupt context:
|
||||||
|
|
||||||
pm_runtime_idle()
|
- pm_runtime_idle()
|
||||||
pm_runtime_suspend()
|
- pm_runtime_suspend()
|
||||||
pm_runtime_autosuspend()
|
- pm_runtime_autosuspend()
|
||||||
pm_runtime_resume()
|
- pm_runtime_resume()
|
||||||
pm_runtime_get_sync()
|
- pm_runtime_get_sync()
|
||||||
pm_runtime_put_sync()
|
- pm_runtime_put_sync()
|
||||||
pm_runtime_put_sync_suspend()
|
- pm_runtime_put_sync_suspend()
|
||||||
pm_runtime_put_sync_autosuspend()
|
- pm_runtime_put_sync_autosuspend()
|
||||||
|
|
||||||
5. Runtime PM Initialization, Device Probing and Removal
|
5. Runtime PM Initialization, Device Probing and Removal
|
||||||
|
========================================================
|
||||||
|
|
||||||
Initially, the runtime PM is disabled for all devices, which means that the
|
Initially, the runtime PM is disabled for all devices, which means that the
|
||||||
majority of the runtime PM helper functions described in Section 4 will return
|
majority of the runtime PM helper functions described in Section 4 will return
|
||||||
|
@ -608,6 +617,7 @@ manage the device at run time, the driver may confuse it by using
|
||||||
pm_runtime_forbid() this way.
|
pm_runtime_forbid() this way.
|
||||||
|
|
||||||
6. Runtime PM and System Sleep
|
6. Runtime PM and System Sleep
|
||||||
|
==============================
|
||||||
|
|
||||||
Runtime PM and system sleep (i.e., system suspend and hibernation, also known
|
Runtime PM and system sleep (i.e., system suspend and hibernation, also known
|
||||||
as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
|
as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
|
||||||
|
@ -647,9 +657,9 @@ brought back to full power during resume, then its runtime PM status will have
|
||||||
to be updated to reflect the actual post-system sleep status. The way to do
|
to be updated to reflect the actual post-system sleep status. The way to do
|
||||||
this is:
|
this is:
|
||||||
|
|
||||||
pm_runtime_disable(dev);
|
- pm_runtime_disable(dev);
|
||||||
pm_runtime_set_active(dev);
|
- pm_runtime_set_active(dev);
|
||||||
pm_runtime_enable(dev);
|
- pm_runtime_enable(dev);
|
||||||
|
|
||||||
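A userspace sketch of why the three calls above come in that order. It is a simplified model, not the kernel implementation: `struct slp_dev` and the `slp_*` functions are hypothetical, parent accounting is left out, and the `-EAGAIN` returned for an invalid call is this sketch's own choice of error code.

```c
#include <assert.h>
#include <errno.h>

enum slp_rpm_status { SLP_RPM_ACTIVE, SLP_RPM_SUSPENDED };

/* Hypothetical stand-in for the runtime PM fields involved. */
struct slp_dev {
	unsigned int disable_depth; /* models 'power.disable_depth' */
	int runtime_error;          /* models 'power.runtime_error' */
	enum slp_rpm_status status; /* models the runtime PM status */
};

/* Models pm_runtime_disable(): bump the disable depth. */
void slp_disable(struct slp_dev *dev)
{
	dev->disable_depth++;
}

/* Models pm_runtime_enable(): drop the disable depth. */
void slp_enable(struct slp_dev *dev)
{
	assert(dev->disable_depth > 0); /* unbalanced enable is a caller bug */
	dev->disable_depth--;
}

/*
 * Models pm_runtime_set_active(): only valid while runtime PM is
 * disabled or the error flag is set. The error code is an assumption.
 */
int slp_set_active(struct slp_dev *dev)
{
	if (!dev->runtime_error && dev->disable_depth == 0)
		return -EAGAIN;
	dev->runtime_error = 0;
	dev->status = SLP_RPM_ACTIVE;
	return 0;
}

/* The disable / set_active / enable sequence from the text, as one step. */
int slp_mark_active_after_resume(struct slp_dev *dev)
{
	int ret;

	slp_disable(dev);
	ret = slp_set_active(dev);
	slp_enable(dev);
	return ret;
}
```

In the model, calling `slp_set_active()` on its own is rejected while runtime PM is enabled. Wrapping it in the disable/enable pair satisfies the precondition and leaves `disable_depth` where it started.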
The PM core always increments the runtime usage counter before calling the
|
The PM core always increments the runtime usage counter before calling the
|
||||||
->suspend() callback and decrements it after calling the ->resume() callback.
|
->suspend() callback and decrements it after calling the ->resume() callback.
|
||||||
|
@ -705,66 +715,66 @@ Subsystems may wish to conserve code space by using the set of generic power
|
||||||
management callbacks provided by the PM core, defined in
|
management callbacks provided by the PM core, defined in
|
||||||
driver/base/power/generic_ops.c:
|
driver/base/power/generic_ops.c:
|
||||||
|
|
||||||
int pm_generic_runtime_suspend(struct device *dev);
|
`int pm_generic_runtime_suspend(struct device *dev);`
|
||||||
- invoke the ->runtime_suspend() callback provided by the driver of this
|
- invoke the ->runtime_suspend() callback provided by the driver of this
|
||||||
device and return its result, or return 0 if not defined
|
device and return its result, or return 0 if not defined
|
||||||
|
|
||||||
int pm_generic_runtime_resume(struct device *dev);
|
`int pm_generic_runtime_resume(struct device *dev);`
|
||||||
- invoke the ->runtime_resume() callback provided by the driver of this
|
- invoke the ->runtime_resume() callback provided by the driver of this
|
||||||
device and return its result, or return 0 if not defined
|
device and return its result, or return 0 if not defined
|
||||||
|
|
||||||
int pm_generic_suspend(struct device *dev);
|
`int pm_generic_suspend(struct device *dev);`
|
||||||
- if the device has not been suspended at run time, invoke the ->suspend()
|
- if the device has not been suspended at run time, invoke the ->suspend()
|
||||||
callback provided by its driver and return its result, or return 0 if not
|
callback provided by its driver and return its result, or return 0 if not
|
||||||
defined
|
defined
|
||||||
|
|
||||||
int pm_generic_suspend_noirq(struct device *dev);
|
`int pm_generic_suspend_noirq(struct device *dev);`
|
||||||
- if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq()
|
- if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq()
|
||||||
callback provided by the device's driver and return its result, or return
|
callback provided by the device's driver and return its result, or return
|
||||||
0 if not defined
|
0 if not defined
|
||||||
|
|
||||||
int pm_generic_resume(struct device *dev);
|
`int pm_generic_resume(struct device *dev);`
|
||||||
- invoke the ->resume() callback provided by the driver of this device and,
|
- invoke the ->resume() callback provided by the driver of this device and,
|
||||||
if successful, change the device's runtime PM status to 'active'
|
if successful, change the device's runtime PM status to 'active'
|
||||||
|
|
||||||
int pm_generic_resume_noirq(struct device *dev);
|
`int pm_generic_resume_noirq(struct device *dev);`
|
||||||
- invoke the ->resume_noirq() callback provided by the driver of this device
|
- invoke the ->resume_noirq() callback provided by the driver of this device
|
||||||
|
|
||||||
int pm_generic_freeze(struct device *dev);
|
`int pm_generic_freeze(struct device *dev);`
|
||||||
- if the device has not been suspended at run time, invoke the ->freeze()
|
- if the device has not been suspended at run time, invoke the ->freeze()
|
||||||
callback provided by its driver and return its result, or return 0 if not
|
callback provided by its driver and return its result, or return 0 if not
|
||||||
defined
|
defined
|
||||||
|
|
||||||
int pm_generic_freeze_noirq(struct device *dev);
|
`int pm_generic_freeze_noirq(struct device *dev);`
|
||||||
- if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq()
|
- if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq()
|
||||||
callback provided by the device's driver and return its result, or return
|
callback provided by the device's driver and return its result, or return
|
||||||
0 if not defined
|
0 if not defined
|
||||||
|
|
||||||
int pm_generic_thaw(struct device *dev);
|
`int pm_generic_thaw(struct device *dev);`
|
||||||
- if the device has not been suspended at run time, invoke the ->thaw()
|
- if the device has not been suspended at run time, invoke the ->thaw()
|
||||||
callback provided by its driver and return its result, or return 0 if not
|
callback provided by its driver and return its result, or return 0 if not
|
||||||
defined
|
defined
|
||||||
|
|
||||||
int pm_generic_thaw_noirq(struct device *dev);
|
`int pm_generic_thaw_noirq(struct device *dev);`
|
||||||
- if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq()
|
- if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq()
|
||||||
callback provided by the device's driver and return its result, or return
|
callback provided by the device's driver and return its result, or return
|
||||||
0 if not defined
|
0 if not defined
|
||||||
|
|
||||||
int pm_generic_poweroff(struct device *dev);
|
`int pm_generic_poweroff(struct device *dev);`
|
||||||
- if the device has not been suspended at run time, invoke the ->poweroff()
|
- if the device has not been suspended at run time, invoke the ->poweroff()
|
||||||
callback provided by its driver and return its result, or return 0 if not
|
callback provided by its driver and return its result, or return 0 if not
|
||||||
defined
|
defined
|
||||||
|
|
||||||
int pm_generic_poweroff_noirq(struct device *dev);
|
`int pm_generic_poweroff_noirq(struct device *dev);`
|
||||||
- if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq()
|
- if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq()
|
||||||
callback provided by the device's driver and return its result, or return
|
callback provided by the device's driver and return its result, or return
|
||||||
0 if not defined
|
0 if not defined
|
||||||
|
|
||||||
int pm_generic_restore(struct device *dev);
|
`int pm_generic_restore(struct device *dev);`
|
||||||
- invoke the ->restore() callback provided by the driver of this device and,
|
- invoke the ->restore() callback provided by the driver of this device and,
|
||||||
if successful, change the device's runtime PM status to 'active'
|
if successful, change the device's runtime PM status to 'active'
|
||||||
|
|
||||||
int pm_generic_restore_noirq(struct device *dev);
|
`int pm_generic_restore_noirq(struct device *dev);`
|
||||||
- invoke the ->restore_noirq() callback provided by the device's driver
|
- invoke the ->restore_noirq() callback provided by the device's driver
|
||||||
|
|
||||||
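The "invoke the driver's callback and return its result, or return 0 if not defined" pattern shared by the generic runtime callbacks above can be sketched in userspace as follows. Everything here is hypothetical and for illustration only: `struct gen_dev`, `struct gen_pm_ops`, the `gen_*` wrappers and the `foo_*` driver are not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct gen_dev;

/* Hypothetical, cut-down analogue of a driver's PM callback table. */
struct gen_pm_ops {
	int (*runtime_suspend)(struct gen_dev *dev);
	int (*runtime_resume)(struct gen_dev *dev);
};

struct gen_dev {
	const struct gen_pm_ops *ops; /* NULL when the driver has no PM ops */
	int powered;                  /* toy hardware state for the example */
};

/* Models pm_generic_runtime_suspend(): delegate, or succeed trivially. */
int gen_runtime_suspend(struct gen_dev *dev)
{
	assert(dev != NULL);
	if (dev->ops && dev->ops->runtime_suspend)
		return dev->ops->runtime_suspend(dev);
	return 0;
}

/* Models pm_generic_runtime_resume(): delegate, or succeed trivially. */
int gen_runtime_resume(struct gen_dev *dev)
{
	assert(dev != NULL);
	if (dev->ops && dev->ops->runtime_resume)
		return dev->ops->runtime_resume(dev);
	return 0;
}

/* A made-up driver that only flips the toy power state. */
int foo_runtime_suspend(struct gen_dev *dev)
{
	dev->powered = 0;
	return 0;
}

int foo_runtime_resume(struct gen_dev *dev)
{
	dev->powered = 1;
	return 0;
}

const struct gen_pm_ops foo_pm_ops = {
	.runtime_suspend = foo_runtime_suspend,
	.runtime_resume = foo_runtime_resume,
};
```

A device with no callback table passes through both wrappers unchanged and reports success. This is the behavior that lets a subsystem point its own callbacks at the generic ones without checking each driver first.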
These functions are the defaults used by the PM core, if a subsystem doesn't
|
These functions are the defaults used by the PM core, if a subsystem doesn't
|
||||||
|
@ -781,6 +791,7 @@ UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
|
||||||
last argument to NULL).
|
last argument to NULL).
|
||||||
|
|
||||||
8. "No-Callback" Devices
|
8. "No-Callback" Devices
|
||||||
|
========================
|
||||||
|
|
||||||
Some "devices" are only logical sub-devices of their parent and cannot be
|
Some "devices" are only logical sub-devices of their parent and cannot be
|
||||||
power-managed on their own. (The prototype example is a USB interface. Entire
|
power-managed on their own. (The prototype example is a USB interface. Entire
|
||||||
|
@ -807,6 +818,7 @@ parent must take responsibility for telling the device's driver when the
|
||||||
parent's power state changes.
|
parent's power state changes.
|
||||||
|
|
||||||
9. Autosuspend, or automatically-delayed suspends
|
9. Autosuspend, or automatically-delayed suspends
|
||||||
|
=================================================
|
||||||
|
|
||||||
Changing a device's power state isn't free; it requires both time and energy.
|
Changing a device's power state isn't free; it requires both time and energy.
|
||||||
A device should be put in a low-power state only when there's some reason to
|
A device should be put in a low-power state only when there's some reason to
|
||||||
|
@ -832,8 +844,8 @@ registration the length should be controlled by user space, using the
|
||||||
|
|
||||||
In order to use autosuspend, subsystems or drivers must call
|
In order to use autosuspend, subsystems or drivers must call
|
||||||
pm_runtime_use_autosuspend() (preferably before registering the device), and
|
pm_runtime_use_autosuspend() (preferably before registering the device), and
|
||||||
thereafter they should use the various *_autosuspend() helper functions instead
|
thereafter they should use the various `*_autosuspend()` helper functions
|
||||||
of the non-autosuspend counterparts:
|
instead of the non-autosuspend counterparts::
|
||||||
|
|
||||||
Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
|
Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
|
||||||
Instead of: pm_schedule_suspend use: pm_request_autosuspend;
|
Instead of: pm_schedule_suspend use: pm_request_autosuspend;
|
||||||
|
@ -858,7 +870,7 @@ The implementation is well suited for asynchronous use in interrupt contexts.
|
||||||
However such use inevitably involves races, because the PM core can't
|
However such use inevitably involves races, because the PM core can't
|
||||||
synchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
|
synchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
|
||||||
This synchronization must be handled by the driver, using its private lock.
|
This synchronization must be handled by the driver, using its private lock.
|
||||||
Here is a schematic pseudo-code example:
|
Here is a schematic pseudo-code example::
|
||||||
|
|
||||||
foo_read_or_write(struct foo_priv *foo, void *data)
|
foo_read_or_write(struct foo_priv *foo, void *data)
|
||||||
{
|
{
|
|
@ -1,7 +1,9 @@
|
||||||
How to get s2ram working
|
========================
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
How to get s2ram working
|
||||||
2006 Linus Torvalds
|
========================
|
||||||
2006 Pavel Machek
|
|
||||||
|
2006 Linus Torvalds
|
||||||
|
2006 Pavel Machek
|
||||||
|
|
||||||
1) Check suspend.sf.net, program s2ram there has long whitelist of
|
1) Check suspend.sf.net, program s2ram there has long whitelist of
|
||||||
"known ok" machines, along with tricks to use on each one.
|
"known ok" machines, along with tricks to use on each one.
|
||||||
|
@ -12,8 +14,8 @@
|
||||||
|
|
||||||
3) You can use Linus' TRACE_RESUME infrastructure, described below.
|
3) You can use Linus' TRACE_RESUME infrastructure, described below.
|
||||||
|
|
||||||
Using TRACE_RESUME
|
Using TRACE_RESUME
|
||||||
~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
I've been working at making the machines I have able to STR, and almost
|
I've been working at making the machines I have able to STR, and almost
|
||||||
always it's a driver that is buggy. Thank God for the suspend/resume
|
always it's a driver that is buggy. Thank God for the suspend/resume
|
||||||
|
@ -27,7 +29,7 @@ machine that doesn't boot) is:
|
||||||
|
|
||||||
- enable PM_DEBUG, and PM_TRACE
|
- enable PM_DEBUG, and PM_TRACE
|
||||||
|
|
||||||
- use a script like this:
|
- use a script like this::
|
||||||
|
|
||||||
#!/bin/sh
|
#!/bin/sh
|
||||||
sync
|
sync
|
||||||
|
@ -38,7 +40,7 @@ machine that doesn't boot) is:
|
||||||
|
|
||||||
- if it doesn't come back up (which is usually the problem), reboot by
|
- if it doesn't come back up (which is usually the problem), reboot by
|
||||||
holding the power button down, and look at the dmesg output for things
|
holding the power button down, and look at the dmesg output for things
|
||||||
like
|
like::
|
||||||
|
|
||||||
Magic number: 4:156:725
|
Magic number: 4:156:725
|
||||||
hash matches drivers/base/power/resume.c:28
|
hash matches drivers/base/power/resume.c:28
|
||||||
|
@ -52,7 +54,7 @@ machine that doesn't boot) is:
|
||||||
If no device matches the hash (or any matches appear to be false positives),
|
If no device matches the hash (or any matches appear to be false positives),
|
||||||
the culprit may be a device from a loadable kernel module that is not loaded
|
the culprit may be a device from a loadable kernel module that is not loaded
|
||||||
until after the hash is checked. You can check the hash against the current
|
until after the hash is checked. You can check the hash against the current
|
||||||
devices again after more modules are loaded using sysfs:
|
devices again after more modules are loaded using sysfs::
|
||||||
|
|
||||||
cat /sys/power/pm_trace_dev_match
|
cat /sys/power/pm_trace_dev_match
|
||||||
|
|
|
@ -1,10 +1,15 @@
|
||||||
|
====================================================================
|
||||||
Interaction of Suspend code (S3) with the CPU hotplug infrastructure
|
Interaction of Suspend code (S3) with the CPU hotplug infrastructure
|
||||||
|
====================================================================
|
||||||
|
|
||||||
(C) 2011 - 2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
|
(C) 2011 - 2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
|
||||||
|
|
||||||
|
|
||||||
I. How does the regular CPU hotplug code differ from how the Suspend-to-RAM
|
I. Differences between CPU hotplug and Suspend-to-RAM
|
||||||
infrastructure uses it internally? And where do they share common code?
|
======================================================
|
||||||
|
|
||||||
|
How does the regular CPU hotplug code differ from how the Suspend-to-RAM
|
||||||
|
infrastructure uses it internally? And where do they share common code?
|
||||||
|
|
||||||
Well, a picture is worth a thousand words... So ASCII art follows :-)
|
Well, a picture is worth a thousand words... So ASCII art follows :-)
|
||||||
|
|
||||||
|
@ -16,13 +21,13 @@ of describing where they take different paths and where they share code.
|
||||||
What happens when regular CPU hotplug and Suspend-to-RAM race with each other
|
What happens when regular CPU hotplug and Suspend-to-RAM race with each other
|
||||||
is not depicted here.]
|
is not depicted here.]
|
||||||
|
|
||||||
On a high level, the suspend-resume cycle goes like this:
|
On a high level, the suspend-resume cycle goes like this::
|
||||||
|
|
||||||
|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
|
|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
|
||||||
|tasks | | cpus | | | | cpus | |tasks|
|
|tasks | | cpus | | | | cpus | |tasks|
|
||||||
|
|
||||||
|
|
||||||
More details follow:
|
More details follow::
|
||||||
|
|
||||||
Suspend call path
|
Suspend call path
|
||||||
-----------------
|
-----------------
|
||||||
|
@ -87,7 +92,9 @@ More details follow:
|
||||||
|
|
||||||
Resuming back is likewise, with the counterparts being (in the order of
|
Resuming back is likewise, with the counterparts being (in the order of
|
||||||
execution during resume):
|
execution during resume):
|
||||||
* enable_nonboot_cpus() which involves:
|
|
||||||
|
* enable_nonboot_cpus() which involves::
|
||||||
|
|
||||||
| Acquire cpu_add_remove_lock
|
| Acquire cpu_add_remove_lock
|
||||||
| Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug
|
| Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug
|
||||||
| Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
|
| Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
|
||||||
|
@ -103,6 +110,8 @@ It is to be noted here that the system_transition_mutex lock is acquired at the
|
||||||
beginning, when we are just starting out to suspend, and then released only
|
beginning, when we are just starting out to suspend, and then released only
|
||||||
after the entire cycle is complete (i.e., suspend + resume).
|
after the entire cycle is complete (i.e., suspend + resume).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Regular CPU hotplug call path
|
Regular CPU hotplug call path
|
||||||
|
@ -152,16 +161,16 @@ with the 'tasks_frozen' argument set to 1.
|
||||||
|
|
||||||
|
|
||||||
Important files and functions/entry points:
|
Important files and functions/entry points:
|
||||||
------------------------------------------
|
-------------------------------------------
|
||||||
|
|
||||||
kernel/power/process.c : freeze_processes(), thaw_processes()
|
- kernel/power/process.c : freeze_processes(), thaw_processes()
|
||||||
kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
|
- kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
|
||||||
kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
|
- kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
II. What are the issues involved in CPU hotplug?
|
II. What are the issues involved in CPU hotplug?
|
||||||
-------------------------------------------
|
------------------------------------------------
|
||||||
|
|
||||||
There are some interesting situations involving CPU hotplug and microcode
|
There are some interesting situations involving CPU hotplug and microcode
|
||||||
update on the CPUs, as discussed below:
|
update on the CPUs, as discussed below:
|
||||||
|
@ -243,8 +252,11 @@ d. Handling microcode update during suspend/hibernate:
|
||||||
cycles).
|
cycles).
|
||||||
|
|
||||||
|
|
||||||
III. Are there any known problems when regular CPU hotplug and suspend race
|
III. Known problems
|
||||||
with each other?
|
===================
|
||||||
|
|
||||||
|
Are there any known problems when regular CPU hotplug and suspend race
|
||||||
|
with each other?
|
||||||
|
|
||||||
Yes, they are listed below:
|
Yes, they are listed below:
|
||||||
|
|
|
@ -1,4 +1,6 @@
|
||||||
|
====================================
|
||||||
System Suspend and Device Interrupts
|
System Suspend and Device Interrupts
|
||||||
|
====================================
|
||||||
|
|
||||||
Copyright (C) 2014 Intel Corp.
|
Copyright (C) 2014 Intel Corp.
|
||||||
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
@ -1,4 +1,7 @@
|
||||||
|
===============================================
|
||||||
Using swap files with software suspend (swsusp)
|
Using swap files with software suspend (swsusp)
|
||||||
|
===============================================
|
||||||
|
|
||||||
(C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
|
(C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
|
||||||
|
|
||||||
The Linux kernel handles swap files almost in the same way as it handles swap
|
The Linux kernel handles swap files almost in the same way as it handles swap
|
||||||
|
@ -21,20 +24,20 @@ units.
|
||||||
|
|
||||||
In order to use a swap file with swsusp, you need to:
|
In order to use a swap file with swsusp, you need to:
|
||||||
|
|
||||||
1) Create the swap file and make it active, eg.
|
1) Create the swap file and make it active, eg.::
|
||||||
|
|
||||||
# dd if=/dev/zero of=<swap_file_path> bs=1024 count=<swap_file_size_in_k>
|
# dd if=/dev/zero of=<swap_file_path> bs=1024 count=<swap_file_size_in_k>
|
||||||
# mkswap <swap_file_path>
|
# mkswap <swap_file_path>
|
||||||
# swapon <swap_file_path>
|
# swapon <swap_file_path>
|
||||||
|
|
||||||
2) Use an application that will bmap the swap file with the help of the
|
2) Use an application that will bmap the swap file with the help of the
|
||||||
FIBMAP ioctl and determine the location of the file's swap header, as the
|
FIBMAP ioctl and determine the location of the file's swap header, as the
|
||||||
offset, in <PAGE_SIZE> units, from the beginning of the partition which
|
offset, in <PAGE_SIZE> units, from the beginning of the partition which
|
||||||
holds the swap file.
|
holds the swap file.
|
||||||
|
|
||||||
3) Add the following parameters to the kernel command line:
|
3) Add the following parameters to the kernel command line::
|
||||||
|
|
||||||
resume=<swap_file_partition> resume_offset=<swap_file_offset>
|
resume=<swap_file_partition> resume_offset=<swap_file_offset>
|
||||||
|
|
||||||
where <swap_file_partition> is the partition on which the swap file is located
|
where <swap_file_partition> is the partition on which the swap file is located
|
||||||
and <swap_file_offset> is the offset of the swap header determined by the
|
and <swap_file_offset> is the offset of the swap header determined by the
|
||||||
|
@ -46,7 +49,7 @@ OR
|
||||||
|
|
||||||
Use a userland suspend application that will set the partition and offset
|
Use a userland suspend application that will set the partition and offset
|
||||||
with the help of the SNAPSHOT_SET_SWAP_AREA ioctl described in
|
with the help of the SNAPSHOT_SET_SWAP_AREA ioctl described in
|
||||||
Documentation/power/userland-swsusp.txt (this is the only method to suspend
|
Documentation/power/userland-swsusp.rst (this is the only method to suspend
|
||||||
to a swap file allowing the resume to be initiated from an initrd or initramfs
|
to a swap file allowing the resume to be initiated from an initrd or initramfs
|
||||||
image).
|
image).
|
||||||
|
|
|
@ -1,13 +1,15 @@
|
||||||
|
=======================================
|
||||||
|
How to use dm-crypt and swsusp together
|
||||||
|
=======================================
|
||||||
|
|
||||||
Author: Andreas Steinmetz <ast@domdv.de>
|
Author: Andreas Steinmetz <ast@domdv.de>
|
||||||
|
|
||||||
|
|
||||||
How to use dm-crypt and swsusp together:
|
|
||||||
========================================
|
|
||||||
|
|
||||||
Some prerequisites:
|
Some prerequisites:
|
||||||
You know how dm-crypt works. If not, visit the following web page:
|
You know how dm-crypt works. If not, visit the following web page:
|
||||||
http://www.saout.de/misc/dm-crypt/
|
http://www.saout.de/misc/dm-crypt/
|
||||||
You have read Documentation/power/swsusp.txt and understand it.
|
You have read Documentation/power/swsusp.rst and understand it.
|
||||||
You did read Documentation/admin-guide/initrd.rst and know how an initrd works.
|
You did read Documentation/admin-guide/initrd.rst and know how an initrd works.
|
||||||
You know how to create or how to modify an initrd.
|
You know how to create or how to modify an initrd.
|
||||||
|
|
||||||
|
@ -29,23 +31,23 @@ a way that the swap device you suspend to/resume from has
|
||||||
always the same major/minor within the initrd as well as
|
always the same major/minor within the initrd as well as
|
||||||
within your running system. The easiest way to achieve this is
|
within your running system. The easiest way to achieve this is
|
||||||
to always set up this swap device first with dmsetup, so that
|
to always set up this swap device first with dmsetup, so that
|
||||||
it will always look like the following:
|
it will always look like the following::
|
||||||
|
|
||||||
brw------- 1 root root 254, 0 Jul 28 13:37 /dev/mapper/swap0
|
brw------- 1 root root 254, 0 Jul 28 13:37 /dev/mapper/swap0
|
||||||
|
|
||||||
Now set up your kernel to use /dev/mapper/swap0 as the default
|
Now set up your kernel to use /dev/mapper/swap0 as the default
|
||||||
resume partition, so your kernel .config contains:
|
resume partition, so your kernel .config contains::
|
||||||
|
|
||||||
CONFIG_PM_STD_PARTITION="/dev/mapper/swap0"
|
CONFIG_PM_STD_PARTITION="/dev/mapper/swap0"
|
||||||
|
|
||||||
Prepare your boot loader to use the initrd you will create or
|
Prepare your boot loader to use the initrd you will create or
|
||||||
modify. For lilo the simplest setup looks like the following
|
modify. For lilo the simplest setup looks like the following
|
||||||
lines:
|
lines::
|
||||||
|
|
||||||
image=/boot/vmlinuz
|
image=/boot/vmlinuz
|
||||||
initrd=/boot/initrd.gz
|
initrd=/boot/initrd.gz
|
||||||
label=linux
|
label=linux
|
||||||
append="root=/dev/ram0 init=/linuxrc rw"
|
append="root=/dev/ram0 init=/linuxrc rw"
|
||||||
|
|
||||||
Finally you need to create or modify your initrd. Lets assume
|
Finally you need to create or modify your initrd. Lets assume
|
||||||
you create an initrd that reads the required dm-crypt setup
|
you create an initrd that reads the required dm-crypt setup
|
||||||
|
@ -53,66 +55,66 @@ from a pcmcia flash disk card. The card is formatted with an ext2
|
||||||
fs which resides on /dev/hde1 when the card is inserted. The
|
fs which resides on /dev/hde1 when the card is inserted. The
|
||||||
card contains at least the encrypted swap setup in a file
|
card contains at least the encrypted swap setup in a file
|
||||||
named "swapkey". /etc/fstab of your initrd contains something
|
named "swapkey". /etc/fstab of your initrd contains something
|
||||||
like the following:
|
like the following::
|
||||||
|
|
||||||
/dev/hda1 /mnt ext3 ro 0 0
|
/dev/hda1 /mnt ext3 ro 0 0
|
||||||
none /proc proc defaults,noatime,nodiratime 0 0
|
none /proc proc defaults,noatime,nodiratime 0 0
|
||||||
none /sys sysfs defaults,noatime,nodiratime 0 0
|
none /sys sysfs defaults,noatime,nodiratime 0 0
|
||||||
|
|
||||||
/dev/hda1 contains an unencrypted mini system that sets up all
|
/dev/hda1 contains an unencrypted mini system that sets up all
|
||||||
of your crypto devices, again by reading the setup from the
|
of your crypto devices, again by reading the setup from the
|
||||||
pcmcia flash disk. What follows now is a /linuxrc for your
|
pcmcia flash disk. What follows now is a /linuxrc for your
|
||||||
initrd that allows you to resume from encrypted swap and that
|
initrd that allows you to resume from encrypted swap and that
|
||||||
continues boot with your mini system on /dev/hda1 if resume
|
continues boot with your mini system on /dev/hda1 if resume
|
||||||
does not happen:
|
does not happen::
|
||||||
|
|
||||||
#!/bin/sh
|
#!/bin/sh
|
||||||
PATH=/sbin:/bin:/usr/sbin:/usr/bin
|
PATH=/sbin:/bin:/usr/sbin:/usr/bin
|
||||||
mount /proc
|
mount /proc
|
||||||
mount /sys
|
mount /sys
|
||||||
mapped=0
|
mapped=0
|
||||||
noresume=`grep -c noresume /proc/cmdline`
|
noresume=`grep -c noresume /proc/cmdline`
|
||||||
if [ "$*" != "" ]
|
if [ "$*" != "" ]
|
||||||
then
|
|
||||||
noresume=1
|
|
||||||
fi
|
|
||||||
dmesg -n 1
|
|
||||||
/sbin/cardmgr -q
|
|
||||||
for i in 1 2 3 4 5 6 7 8 9 0
|
|
||||||
do
|
|
||||||
if [ -f /proc/ide/hde/media ]
|
|
||||||
then
|
then
|
||||||
usleep 500000
|
noresume=1
|
||||||
mount -t ext2 -o ro /dev/hde1 /mnt
|
fi
|
||||||
if [ -f /mnt/swapkey ]
|
dmesg -n 1
|
||||||
|
/sbin/cardmgr -q
|
||||||
|
for i in 1 2 3 4 5 6 7 8 9 0
|
||||||
|
do
|
||||||
|
if [ -f /proc/ide/hde/media ]
|
||||||
then
|
then
|
||||||
dmsetup create swap0 /mnt/swapkey > /dev/null 2>&1 && mapped=1
|
usleep 500000
|
||||||
|
mount -t ext2 -o ro /dev/hde1 /mnt
|
||||||
|
if [ -f /mnt/swapkey ]
|
||||||
|
then
|
||||||
|
dmsetup create swap0 /mnt/swapkey > /dev/null 2>&1 && mapped=1
|
||||||
|
fi
|
||||||
|
umount /mnt
|
||||||
|
break
|
||||||
fi
|
fi
|
||||||
umount /mnt
|
usleep 500000
|
||||||
break
|
done
|
||||||
fi
|
killproc /sbin/cardmgr
|
||||||
usleep 500000
|
dmesg -n 6
|
||||||
done
|
if [ $mapped = 1 ]
|
||||||
killproc /sbin/cardmgr
|
|
||||||
dmesg -n 6
|
|
||||||
if [ $mapped = 1 ]
|
|
||||||
then
|
|
||||||
if [ $noresume != 0 ]
|
|
||||||
then
|
then
|
||||||
mkswap /dev/mapper/swap0 > /dev/null 2>&1
|
if [ $noresume != 0 ]
|
||||||
|
then
|
||||||
|
mkswap /dev/mapper/swap0 > /dev/null 2>&1
|
||||||
|
fi
|
||||||
|
echo 254:0 > /sys/power/resume
|
||||||
|
dmsetup remove swap0
|
||||||
fi
|
fi
|
||||||
echo 254:0 > /sys/power/resume
|
umount /sys
|
||||||
dmsetup remove swap0
|
mount /mnt
|
||||||
fi
|
umount /proc
|
||||||
umount /sys
|
cd /mnt
|
||||||
mount /mnt
|
pivot_root . mnt
|
||||||
umount /proc
|
mount /proc
|
||||||
cd /mnt
|
umount -l /mnt
|
||||||
pivot_root . mnt
|
umount /proc
|
||||||
mount /proc
|
exec chroot . /sbin/init $* < dev/console > dev/console 2>&1
|
||||||
umount -l /mnt
|
|
||||||
umount /proc
|
|
||||||
exec chroot . /sbin/init $* < dev/console > dev/console 2>&1
|
|
||||||
|
|
||||||
Please don't mind the weird loop above, busybox's msh doesn't know
|
Please don't mind the weird loop above, busybox's msh doesn't know
|
||||||
the let statement. Now, what is happening in the script?
|
the let statement. Now, what is happening in the script?
|
|
@ -0,0 +1,501 @@
|
||||||
|
============
|
||||||
|
Swap suspend
|
||||||
|
============
|
||||||
|
|
||||||
|
Some warnings, first.
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
|
||||||
|
**BIG FAT WARNING**
|
||||||
|
|
||||||
|
If you touch anything on disk between suspend and resume...
|
||||||
|
...kiss your data goodbye.
|
||||||
|
|
||||||
|
If you do resume from initrd after your filesystems are mounted...
|
||||||
|
...bye bye root partition.
|
||||||
|
|
||||||
|
[this is actually same case as above]
|
||||||
|
|
||||||
|
If you have unsupported ( ) devices using DMA, you may have some
|
||||||
|
problems. If your disk driver does not support suspend... (IDE does),
|
||||||
|
it may cause some problems, too. If you change kernel command line
|
||||||
|
between suspend and resume, it may do something wrong. If you change
|
||||||
|
your hardware while the system is suspended... well, it was not a good idea;
|
||||||
|
but it will probably only crash.
|
||||||
|
|
||||||
|
( ) suspend/resume support is needed to make it safe.
|
||||||
|
|
||||||
|
If you have any filesystems on USB devices mounted before software suspend,
|
||||||
|
they won't be accessible after resume and you may lose data, as though
|
||||||
|
you have unplugged the USB devices with mounted filesystems on them;
|
||||||
|
see the FAQ below for details. (This is not true for more traditional
|
||||||
|
power states like "standby", which normally don't turn USB off.)
|
||||||
|
|
||||||
|
Swap partition:
|
||||||
|
You need to append resume=/dev/your_swap_partition to kernel command
|
||||||
|
line or specify it using /sys/power/resume.
|
||||||
|
|
||||||
|
Swap file:
|
||||||
|
If using a swapfile you can also specify a resume offset using
|
||||||
|
resume_offset=<number> on the kernel command line or specify it
|
||||||
|
in /sys/power/resume_offset.
|
||||||
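As a rough sketch of how the two swap-file parameters fit together, the helpers below convert a block number into an offset in PAGE_SIZE units and then print the matching command line fragment. This assumes the block number of the swap header is already known (for example from a FIBMAP-based tool) and that it is expressed in filesystem blocks; the function names and the sample values are illustrative only.

```shell
#!/bin/sh
# Convert a filesystem block number into an offset in PAGE_SIZE units.
# Arguments (all hypothetical inputs):
#   $1  block number of the swap header, in filesystem blocks
#   $2  filesystem block size in bytes
#   $3  page size in bytes
swap_offset_pages() {
    block=$1
    fs_block_size=$2
    page_size=$3
    echo $(( $block * $fs_block_size / $page_size ))
}

# Print the kernel command line fragment for resuming from a swap file.
#   $1  partition that holds the swap file
#   $2  offset of the swap header in PAGE_SIZE units
resume_cmdline() {
    partition=$1
    offset=$2
    echo "resume=$partition resume_offset=$offset"
}
```

When the filesystem block size equals the page size, the conversion leaves the block number unchanged. For example, `resume_cmdline /dev/sda2 "$(swap_offset_pages 69632 1024 4096)"` prints a fragment that can be appended to the boot loader configuration.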
|
|
||||||
|
After preparing, you suspend by::
|
||||||
|
|
||||||
|
echo shutdown > /sys/power/disk; echo disk > /sys/power/state
|
||||||
|
|
||||||
|
- If you feel ACPI works pretty well on your system, you might try::
|
||||||
|
|
||||||
|
echo platform > /sys/power/disk; echo disk > /sys/power/state
|
||||||
|
|
||||||
|
- If you would like to write the hibernation image to swap and then suspend
|
||||||
|
to RAM (provided your platform supports it), you can try::
|
||||||
|
|
||||||
|
echo suspend > /sys/power/disk; echo disk > /sys/power/state
|
||||||
|
|
||||||
|
- If you have SATA disks, you'll need recent kernels with SATA suspend
|
||||||
|
support. For suspend and resume to work, make sure your disk drivers
|
||||||
|
are built into the kernel -- not modules. [There's a way to make
|
||||||
|
suspend/resume with modular disk drivers, see FAQ, but you probably
|
||||||
|
should not do that.]
|
||||||
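The three variants above differ only in the value written to /sys/power/disk, so they can be wrapped in one small helper. This is a minimal sketch, not a supported tool: the function name and the POWER_DIR override are assumptions added here so the logic can be dry-run against a scratch directory, and only the three modes named above are accepted.

```shell
#!/bin/sh
# POWER_DIR is a hypothetical override for dry runs; on a real system the
# files live in /sys/power.
POWER_DIR=${POWER_DIR:-/sys/power}

# Select a hibernation mode, then start hibernation.
#   $1  one of: shutdown, platform, suspend
hibernate_with_mode() {
    mode=$1
    case "$mode" in
        shutdown|platform|suspend)
            ;;
        *)
            echo "unsupported mode: $mode" >&2
            return 1
            ;;
    esac
    # Same two writes as the one-liners above, but stop if the first fails.
    echo "$mode" > "$POWER_DIR/disk" || return 1
    echo disk > "$POWER_DIR/state"
}
```

Calling `hibernate_with_mode platform` as root performs the same two writes as the ACPI example; an unknown mode is rejected before anything is written.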
|
|
||||||
|
If you want to limit the suspend image size to N bytes, do::
|
||||||
|
|
||||||
|
echo N > /sys/power/image_size
|
||||||
|
|
||||||
|
before suspend (it is limited to around 2/5 of available RAM by default).
|
||||||
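To get a feel for the "around 2/5" default, the sketch below estimates it from the MemTotal figure in /proc/meminfo. Using total RAM as a stand-in for "available RAM" is an assumption made here for illustration, and both function names are hypothetical; the value the kernel actually uses can be read back from /sys/power/image_size.

```shell
#!/bin/sh
# Estimate 2/5 of RAM in bytes.
#   $1  memory size in kB (the unit used by MemTotal in /proc/meminfo)
approx_default_image_size() {
    kb=$1
    echo $(( $kb * 1024 * 2 / 5 ))
}

# Print MemTotal (in kB) from a meminfo-style file.
#   $1  optional path, defaults to /proc/meminfo
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

For example, `approx_default_image_size "$(mem_total_kb)"` prints the estimate for the running machine, which can be compared with the contents of /sys/power/image_size before choosing a smaller N.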
|
|
||||||
|
- The resume process checks for the presence of the resume device,
|
||||||
|
if found, it then checks the contents for the hibernation image signature.
|
||||||
|
If both are found, it resumes the hibernation image.
|
||||||
|
|
||||||
|
- The resume process may be triggered in two ways:
|
||||||
|
|
||||||
|
1) During lateinit: If resume=/dev/your_swap_partition is specified on
|
||||||
|
the kernel command line, lateinit runs the resume process. If the
|
||||||
|
resume device has not been probed yet, the resume process fails and
|
||||||
|
bootup continues.
|
||||||
|
2) Manually from an initrd or initramfs: May be run from
|
||||||
|
the init script by using the /sys/power/resume file. It is vital
|
||||||
|
that this be done prior to remounting any filesystems (even as
|
||||||
|
read-only) otherwise data may be corrupted.
|
||||||
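The manual trigger described in 2) can be sketched as a small init-script helper. It assumes the resume device is identified by its major:minor numbers (the form used in the dm-crypt example, 254:0); the function name and the optional second argument, which redirects the write to another file for dry runs, are hypothetical.

```shell
#!/bin/sh
# Write the resume device's MAJOR:MINOR numbers to the resume file.
# This must run before any filesystem is mounted, even read-only.
#   $1  device numbers, e.g. 254:0
#   $2  optional target file, defaults to /sys/power/resume
trigger_resume() {
    devnum=$1
    resume_file=${2:-/sys/power/resume}
    case "$devnum" in
        *[!0-9:]*|*:*:*|:*|*:|"")
            # Non-digit characters, too many colons, or an empty half.
            echo "expected MAJOR:MINOR, got '$devnum'" >&2
            return 1
            ;;
        *:*)
            ;;
        *)
            # Digits only, no colon.
            echo "expected MAJOR:MINOR, got '$devnum'" >&2
            return 1
            ;;
    esac
    echo "$devnum" > "$resume_file"
}
```

If a valid hibernation image is found, the write does not return and the saved system continues instead; if not, the init script simply carries on with a normal boot.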
|
|
||||||
|
Article about goals and implementation of Software Suspend for Linux
|
||||||
|
====================================================================
|
||||||
|
|
||||||
|
Author: Gábor Kuti
|
||||||
|
Last revised: 2003-10-20 by Pavel Machek
|
||||||
|
|
||||||
|
Idea and goals to achieve
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
Nowadays it is common in several laptops that they have a suspend button. It
|
||||||
|
saves the state of the machine to a filesystem or to a partition and switches
|
||||||
|
to standby mode. Later, when resuming the machine, the saved state is loaded back to
|
||||||
|
ram and the machine can continue its work. It has two real benefits. First we
|
||||||
|
save ourselves the time the machine takes to go down and later boot up; energy costs
|
||||||
|
are really high when running from batteries. The other gain is that we don't have
|
||||||
|
to interrupt our programs so processes that are calculating something for a long
|
||||||
|
time shouldn't need to be written interruptible.
|
||||||
|
|
||||||
|
swsusp saves the state of the machine into active swaps and then reboots or
|
||||||
|
powers down. You must explicitly specify the swap partition to resume from with
|
||||||
|
`resume=` kernel option. If the signature is found, it loads and restores the saved
|
||||||
|
state. If the option `noresume` is specified as a boot parameter, it skips
|
||||||
|
the resuming. If the option `hibernate=nocompress` is specified as a boot
|
||||||
|
parameter, it saves hibernation image without compression.
|
||||||
|
|
||||||
|
In the meantime while the system is suspended you should not add/remove any
|
||||||
|
of the hardware, write to the filesystems, etc.
|
||||||
|
|
||||||
|
Sleep states summary
|
||||||
|
====================
|
||||||
|
|
||||||
|
There are three different interfaces you can use, /proc/acpi should
|
||||||
|
work like this:
|
||||||
|
|
||||||
|
In a really perfect world::
|
||||||
|
|
||||||
|
echo 1 > /proc/acpi/sleep # for standby
|
||||||
|
echo 2 > /proc/acpi/sleep # for suspend to ram
|
||||||
|
echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative
|
||||||
|
echo 4 > /proc/acpi/sleep # for suspend to disk
|
||||||
|
echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system
|
||||||
|
|
||||||
|
and perhaps::
|
||||||
|
|
||||||
|
echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios
|
||||||
|
|
||||||
|
Frequently Asked Questions
|
||||||
|
==========================
|
||||||
|
|
||||||
|
Q:
|
||||||
|
well, suspending a server is IMHO a really stupid thing,
|
||||||
|
but... (Diego Zuccato):
|
||||||
|
|
||||||
|
A:
|
||||||
|
You bought a new UPS for your server. How do you install it without
|
||||||
|
bringing the machine down? Suspend to disk, rearrange power cables,
|
||||||
|
resume.
|
||||||
|
|
||||||
|
You have your server on UPS. Power died, and UPS is indicating 30
|
||||||
|
seconds to failure. What do you do? Suspend to disk.
|
||||||
|
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Maybe I'm missing something, but why don't the regular I/O paths work?
|
||||||
|
|
||||||
|
A:
|
||||||
|
We do use the regular I/O paths. However we cannot restore the data
|
||||||
|
to its original location as we load it. That would create an
|
||||||
|
inconsistent kernel state which would certainly result in an oops.
|
||||||
|
Instead, we load the image into unused memory and then atomically copy
|
||||||
|
it back to its original location. This implies, of course, a maximum
|
||||||
|
image size of half the amount of memory.
|
||||||
|
|
||||||
|
There are two solutions to this:
|
||||||
|
|
||||||
|
* require half of memory to be free during suspend. That way you can
|
||||||
|
read "new" data onto free spots, then cli and copy
|
||||||
|
|
||||||
|
* assume we had special "polling" ide driver that only uses memory
|
||||||
|
between 0-640KB. That way, I'd have to make sure that 0-640KB is free
|
||||||
|
during suspending, but otherwise it would work...
|
||||||
|
|
||||||
|
suspend2 shares this fundamental limitation, but does not include user
|
||||||
|
data and disk caches into "used memory" by saving them in
|
||||||
|
advance. That means that the limitation goes away in practice.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Does Linux support ACPI S4?
|
||||||
|
|
||||||
|
A:
|
||||||
|
Yes. That's what echo platform > /sys/power/disk does.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
What is 'suspend2'?
|
||||||
|
|
||||||
|
A:
|
||||||
|
suspend2 is 'Software Suspend 2', a forked implementation of
|
||||||
|
suspend-to-disk which is available as separate patches for 2.4 and 2.6
|
||||||
|
kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB
|
||||||
|
highmem and preemption. It also has an extensible architecture that
|
||||||
|
allows for arbitrary transformations on the image (compression,
|
||||||
|
encryption) and arbitrary backends for writing the image (eg to swap
|
||||||
|
or an NFS share[Work In Progress]). Questions regarding suspend2
|
||||||
|
should be sent to the mailing list available through the suspend2
|
||||||
|
website, and not to the Linux Kernel Mailing List. We are working
|
||||||
|
toward merging suspend2 into the mainline kernel.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
What is the freezing of tasks and why are we using it?
|
||||||
|
|
||||||
|
A:
|
||||||
|
The freezing of tasks is a mechanism by which user space processes and some
|
||||||
|
kernel threads are controlled during hibernation or system-wide suspend (on some
|
||||||
|
architectures). See freezing-of-tasks.txt for details.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
What is the difference between "platform" and "shutdown"?
|
||||||
|
|
||||||
|
A:
|
||||||
|
shutdown:
|
||||||
|
save state in linux, then tell bios to powerdown
|
||||||
|
|
||||||
|
platform:
|
||||||
|
save state in linux, then tell bios to powerdown and blink
|
||||||
|
"suspended led"
|
||||||
|
|
||||||
|
"platform" is actually right thing to do where supported, but
|
||||||
|
"shutdown" is most reliable (except on ACPI systems).
|
||||||
|
|
||||||
|
Q:
|
||||||
|
I do not understand why you have such strong objections to idea of
|
||||||
|
selective suspend.
|
||||||
|
|
||||||
|
A:
|
||||||
|
Do selective suspend during runtime power management, that's okay. But
|
||||||
|
it's useless for suspend-to-disk. (And I do not see how you could use
|
||||||
|
it for suspend-to-ram, I hope you do not want that).
|
||||||
|
|
||||||
|
Let's see, so you suggest to
|
||||||
|
|
||||||
|
* SUSPEND all but swap device and parents
|
||||||
|
* Snapshot
|
||||||
|
* Write image to disk
|
||||||
|
* SUSPEND swap device and parents
|
||||||
|
* Powerdown
|
||||||
|
|
||||||
|
Oh no, that does not work, if swap device or its parents use DMA,
|
||||||
|
you've corrupted data. You'd have to do
|
||||||
|
|
||||||
|
* SUSPEND all but swap device and parents
|
||||||
|
* FREEZE swap device and parents
|
||||||
|
* Snapshot
|
||||||
|
* UNFREEZE swap device and parents
|
||||||
|
* Write
|
||||||
|
* SUSPEND swap device and parents
|
||||||
|
|
||||||
|
Which means that you still need that FREEZE state, and you get more
|
||||||
|
complicated code. (And I have not yet introduced details like system
|
||||||
|
devices).
|
||||||
|
|
||||||
|
Q:
|
||||||
|
There don't seem to be any generally useful behavioral
|
||||||
|
distinctions between SUSPEND and FREEZE.
|
||||||
|
|
||||||
|
A:
|
||||||
|
Doing SUSPEND when you are asked to do FREEZE is always correct,
|
||||||
|
but it may be unnecessarily slow. If you want your driver to stay simple,
|
||||||
|
slowness may not matter to you. It can always be fixed later.
|
||||||
|
|
||||||
|
For devices like disk it does matter, you do not want to spindown for
|
||||||
|
FREEZE.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
After resuming, system is paging heavily, leading to very bad interactivity.
|
||||||
|
|
||||||
|
A:
|
||||||
|
Try running::
|
||||||
|
|
||||||
|
cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u | while read file
|
||||||
|
do
|
||||||
|
test -f "$file" && cat "$file" > /dev/null
|
||||||
|
done
|
||||||
|
|
||||||
|
after resume. swapoff -a; swapon -a may also be useful.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
What happens to devices during swsusp? They seem to be resumed
|
||||||
|
during system suspend?
|
||||||
|
|
||||||
|
A:
|
||||||
|
That's correct. We need to resume them if we want to write image to
|
||||||
|
disk. The whole sequence goes like this:
|
||||||
|
|
||||||
|
**Suspend part**
|
||||||
|
|
||||||
|
running system, user asks for suspend-to-disk
|
||||||
|
|
||||||
|
user processes are stopped
|
||||||
|
|
||||||
|
suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
|
||||||
|
with state snapshot
|
||||||
|
|
||||||
|
state snapshot: copy of whole used memory is taken with interrupts disabled
|
||||||
|
|
||||||
|
resume(): devices are woken up so that we can write image to swap
|
||||||
|
|
||||||
|
write image to swap
|
||||||
|
|
||||||
|
suspend(PMSG_SUSPEND): suspend devices so that we can power off
|
||||||
|
|
||||||
|
turn the power off
|
||||||
|
|
||||||
|
**Resume part**
|
||||||
|
|
||||||
|
(is actually pretty similar)
|
||||||
|
|
||||||
|
running system, user asks for suspend-to-disk
|
||||||
|
|
||||||
|
user processes are stopped (in common case there are none,
|
||||||
|
but with resume-from-initrd, no one knows)
|
||||||
|
|
||||||
|
read image from disk
|
||||||
|
|
||||||
|
suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
|
||||||
|
with image restoration
|
||||||
|
|
||||||
|
image restoration: rewrite memory with image
|
||||||
|
|
||||||
|
resume(): devices are woken up so that system can continue
|
||||||
|
|
||||||
|
thaw all user processes
|
||||||
|
|
||||||
|
Q:
|
||||||
|
What is this 'Encrypt suspend image' for?
|
||||||
|
|
||||||
|
A:
|
||||||
|
First of all: it is not a replacement for dm-crypt encrypted swap.
|
||||||
|
It cannot protect your computer while it is suspended. Instead it does
|
||||||
|
protect from leaking sensitive data after resume from suspend.
|
||||||
|
|
||||||
|
Think of the following: you suspend while an application is running
|
||||||
|
that keeps sensitive data in memory. The application itself prevents
|
||||||
|
the data from being swapped out. Suspend, however, must write these
|
||||||
|
data to swap to be able to resume later on. Without suspend encryption
|
||||||
|
your sensitive data are then stored in plaintext on disk. This means
|
||||||
|
that after resume your sensitive data are accessible to all
|
||||||
|
applications having direct access to the swap device which was used
|
||||||
|
for suspend. If you don't need swap after resume these data can remain
|
||||||
|
on disk virtually forever. Thus it can happen that your system gets
|
||||||
|
broken into weeks later and sensitive data which you thought were
|
||||||
|
encrypted and protected are retrieved and stolen from the swap device.
|
||||||
|
To prevent this situation you should use 'Encrypt suspend image'.
|
||||||
|
|
||||||
|
During suspend a temporary key is created and this key is used to
|
||||||
|
encrypt the data written to disk. When, during resume, the data was
|
||||||
|
read back into memory the temporary key is destroyed which simply
|
||||||
|
means that all data written to disk during suspend are then
|
||||||
|
inaccessible so they can't be stolen later on. The only thing that
|
||||||
|
you must then take care of is that you call 'mkswap' for the swap
|
||||||
|
partition used for suspend as early as possible during regular
|
||||||
|
boot. This ensures that any temporary key from an oopsed suspend or
|
||||||
|
from a failed or aborted resume is erased from the swap device.
|
||||||
|
|
||||||
|
As a rule of thumb use encrypted swap to protect your data while your
|
||||||
|
system is shut down or suspended. Additionally use the encrypted
|
||||||
|
suspend image to prevent sensitive data from being stolen after
|
||||||
|
resume.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Can I suspend to a swap file?
|
||||||
|
|
||||||
|
A:
|
||||||
|
Generally, yes, you can. However, it requires you to use the "resume=" and
|
||||||
|
"resume_offset=" kernel command line parameters, so the resume from a swap file
|
||||||
|
cannot be initiated from an initrd or initramfs image. See
|
||||||
|
swsusp-and-swap-files.txt for details.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Is there a maximum system RAM size that is supported by swsusp?
|
||||||
|
|
||||||
|
A:
|
||||||
|
It should work okay with highmem.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Does swsusp (to disk) use only one swap partition or can it use
|
||||||
|
multiple swap partitions (aggregate them into one logical space)?
|
||||||
|
|
||||||
|
A:
|
||||||
|
Only one swap partition, sorry.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
If my application(s) causes lots of memory & swap space to be used
|
||||||
|
(over half of the total system RAM), is it correct that it is likely
|
||||||
|
to be useless to try to suspend to disk while that app is running?
|
||||||
|
|
||||||
|
A:
|
||||||
|
No, it should work okay, as long as your app does not mlock()
|
||||||
|
it. Just prepare a big enough swap partition.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
What information is useful for debugging suspend-to-disk problems?
|
||||||
|
|
||||||
|
A:
|
||||||
|
Well, the last messages on the screen are always useful. If something
|
||||||
|
is broken, it is usually some kernel driver, so trying with as
|
||||||
|
few modules loaded as possible helps a lot. I also prefer people to
|
||||||
|
suspend from console, preferably without X running. Booting with
|
||||||
|
init=/bin/bash, then running swapon and starting the suspend sequence manually
|
||||||
|
usually does the trick. Then it is a good idea to try with the latest
|
||||||
|
vanilla kernel.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
How can distributions ship a swsusp-supporting kernel with modular
|
||||||
|
disk drivers (especially SATA)?
|
||||||
|
|
||||||
|
A:
|
||||||
|
Well, it can be done: load the drivers, then echo into the
|
||||||
|
/sys/power/resume file from initrd. Be sure not to mount
|
||||||
|
anything, not even read-only mount, or you are going to lose your
|
||||||
|
data.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
How do I make suspend more verbose?
|
||||||
|
|
||||||
|
A:
|
||||||
|
If you want to see any non-error kernel messages on the virtual
|
||||||
|
terminal the kernel switches to during suspend, you have to set the
|
||||||
|
kernel console loglevel to at least 4 (KERN_WARNING), for example by
|
||||||
|
doing::
|
||||||
|
|
||||||
|
    # save the old loglevel
    read LOGLEVEL DUMMY < /proc/sys/kernel/printk
    # set the loglevel so we see the progress bar.
    # if the level is higher than needed, we leave it alone.
    if [ "$LOGLEVEL" -lt 5 ]; then
            echo 5 > /proc/sys/kernel/printk
    fi

    IMG_SZ=0
    read IMG_SZ < /sys/power/image_size
    echo -n disk > /sys/power/state
    RET=$?
    #
    # the logic here is:
    # if image_size > 0 (without kernel support, IMG_SZ will be zero),
    # then try again with image_size set to zero.
    if [ "$RET" -ne 0 ] && [ "$IMG_SZ" -ne 0 ]; then # try again with minimal image size
            echo 0 > /sys/power/image_size
            echo -n disk > /sys/power/state
            RET=$?
    fi

    # restore previous loglevel
    echo "$LOGLEVEL" > /proc/sys/kernel/printk
    exit $RET
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Is it true that if I have a mounted filesystem on a USB device and
|
||||||
|
I suspend to disk, I can lose data unless the filesystem has been mounted
|
||||||
|
with "sync"?
|
||||||
|
|
||||||
|
A:
|
||||||
|
That's right ... if you disconnect that device, you may lose data.
|
||||||
|
In fact, even with "-o sync" you can lose data if your programs have
|
||||||
|
information in buffers they haven't written out to a disk you disconnect,
|
||||||
|
or if you disconnect before the device finished saving data you wrote.
|
||||||
|
|
||||||
|
Software suspend normally powers down USB controllers, which is equivalent
|
||||||
|
to disconnecting all USB devices attached to your system.
|
||||||
|
|
||||||
|
Your system might well support low-power modes for its USB controllers
|
||||||
|
while the system is asleep, maintaining the connection, using true sleep
|
||||||
|
modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the
|
||||||
|
/sys/power/state file; write "standby" or "mem".) We've not seen any
|
||||||
|
hardware that can use these modes through software suspend, although in
|
||||||
|
theory some systems might support "platform" modes that won't break the
|
||||||
|
USB connections.
|
||||||
|
|
||||||
|
Remember that it's always a bad idea to unplug a disk drive containing a
|
||||||
|
mounted filesystem. That's true even when your system is asleep! The
|
||||||
|
safest thing is to unmount all filesystems on removable media (such as USB,
|
||||||
|
Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
|
||||||
|
before suspending; then remount them after resuming.
|
||||||
|
|
||||||
|
There is a work-around for this problem. For more information, see
|
||||||
|
Documentation/driver-api/usb/persist.rst.
|
||||||
|
|
||||||
|
Q:
|
||||||
|
Can I suspend-to-disk using a swap partition under LVM?
|
||||||
|
|
||||||
|
A:
|
||||||
|
Yes and No. You can suspend successfully, but the kernel will not be able
|
||||||
|
to resume on its own. You need an initramfs that can recognize the resume
|
||||||
|
situation, activate the logical volume containing the swap volume (but not
|
||||||
|
touch any filesystems!), and eventually call::
|
||||||
|
|
||||||
|
echo -n "$major:$minor" > /sys/power/resume
|
||||||
|
|
||||||
|
where $major and $minor are the respective major and minor device numbers of
|
||||||
|
the swap volume.
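One way an initramfs script might obtain that pair is to read it from sysfs, as sketched below. This is only an illustration: the helper name `resume_devnum` and the `SYSFS_ROOT` override are hypothetical, and it assumes that `readlink -f` is available in the initramfs and that the activated volume appears under /sys/class/block.

```shell
# Hypothetical helper (not part of the kernel or of uswsusp): print the
# "major:minor" pair of a block device, e.g. an activated LVM swap volume.
# SYSFS_ROOT can be overridden for testing; it defaults to /sys.
resume_devnum() {
    # Resolve symlinks such as /dev/<vg>/<lv> to the real node (e.g. /dev/dm-3).
    dev=$(readlink -f "$1") || return 1
    name=${dev##*/}
    # sysfs exposes the device numbers as "major:minor" in the "dev" attribute.
    cat "${SYSFS_ROOT:-/sys}/class/block/$name/dev"
}
```

With the volume activated, the result could then be written with `echo -n "$(resume_devnum /dev/vg0/swap)" > /sys/power/resume`, where /dev/vg0/swap is a placeholder path.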
|
||||||
|
|
||||||
|
uswsusp works with LVM, too. See http://suspend.sourceforge.net/
|
||||||
|
|
||||||
|
Q:
|
||||||
|
I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
|
||||||
|
compiled with similar configuration files. Anyway, I found that
|
||||||
|
suspend to disk (and resume) is much slower on 2.6.16 compared to
|
||||||
|
2.6.15. Any idea for why that might happen or how can I speed it up?
|
||||||
|
|
||||||
|
A:
|
||||||
|
This is because the size of the suspend image is now greater than
|
||||||
|
for 2.6.15 (by saving more data we can get a more responsive system
|
||||||
|
after resume).
|
||||||
|
|
||||||
|
There's the /sys/power/image_size knob that controls the size of the
|
||||||
|
image. If you set it to 0 (e.g. by echo 0 > /sys/power/image_size as
|
||||||
|
root), the 2.6.15 behavior should be restored. If it is still too
|
||||||
|
slow, take a look at suspend.sf.net -- userland suspend is faster and
|
||||||
|
supports LZF compression to speed it up further.
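For a sense of scale when choosing a value for the knob, the following sketch estimates the default of roughly 2/5 of RAM, in bytes. It is an approximation under stated assumptions: the function name is made up, and MemTotal from /proc/meminfo is used as a stand-in for the amount of RAM the kernel bases its default on, so the kernel's own figure may differ.

```shell
# Hypothetical helper: estimate 2/5 of total RAM in bytes.
# Takes an optional meminfo-style file (defaults to /proc/meminfo).
estimate_default_image_size() {
    # MemTotal is reported in kB.
    kb=$(awk '$1 == "MemTotal:" { print $2; exit }' "${1:-/proc/meminfo}")
    [ -n "$kb" ] || return 1
    echo $(( kb * 1024 * 2 / 5 ))
}
```

Comparing the result with `cat /sys/power/image_size` shows how far the current setting is from that estimate.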
|
|
@ -1,446 +0,0 @@
|
||||||
Some warnings, first.
|
|
||||||
|
|
||||||
* BIG FAT WARNING *********************************************************
|
|
||||||
*
|
|
||||||
* If you touch anything on disk between suspend and resume...
|
|
||||||
* ...kiss your data goodbye.
|
|
||||||
*
|
|
||||||
* If you do resume from initrd after your filesystems are mounted...
|
|
||||||
* ...bye bye root partition.
|
|
||||||
* [this is actually same case as above]
|
|
||||||
*
|
|
||||||
* If you have unsupported (*) devices using DMA, you may have some
|
|
||||||
* problems. If your disk driver does not support suspend... (IDE does),
|
|
||||||
* it may cause some problems, too. If you change kernel command line
|
|
||||||
* between suspend and resume, it may do something wrong. If you change
|
|
||||||
* your hardware while the system is suspended... well, it was not a good idea;
|
|
||||||
* but it will probably only crash.
|
|
||||||
*
|
|
||||||
* (*) suspend/resume support is needed to make it safe.
|
|
||||||
*
|
|
||||||
* If you have any filesystems on USB devices mounted before software suspend,
|
|
||||||
* they won't be accessible after resume and you may lose data, as though
|
|
||||||
* you have unplugged the USB devices with mounted filesystems on them;
|
|
||||||
* see the FAQ below for details. (This is not true for more traditional
|
|
||||||
* power states like "standby", which normally don't turn USB off.)
|
|
||||||
|
|
||||||
Swap partition:
|
|
||||||
You need to append resume=/dev/your_swap_partition to kernel command
|
|
||||||
line or specify it using /sys/power/resume.
|
|
||||||
|
|
||||||
Swap file:
|
|
||||||
If using a swapfile you can also specify a resume offset using
|
|
||||||
resume_offset=<number> on the kernel command line or specify it
|
|
||||||
in /sys/power/resume_offset.
|
|
||||||
|
|
||||||
After preparing, you suspend by
|
|
||||||
|
|
||||||
echo shutdown > /sys/power/disk; echo disk > /sys/power/state
|
|
||||||
|
|
||||||
. If you feel ACPI works pretty well on your system, you might try
|
|
||||||
|
|
||||||
echo platform > /sys/power/disk; echo disk > /sys/power/state
|
|
||||||
|
|
||||||
. If you would like to write hibernation image to swap and then suspend
|
|
||||||
to RAM (provided your platform supports it), you can try
|
|
||||||
|
|
||||||
echo suspend > /sys/power/disk; echo disk > /sys/power/state
|
|
||||||
|
|
||||||
. If you have SATA disks, you'll need recent kernels with SATA suspend
|
|
||||||
support. For suspend and resume to work, make sure your disk drivers
|
|
||||||
are built into the kernel -- not modules. [There's a way to make
|
|
||||||
suspend/resume with modular disk drivers, see FAQ, but you probably
|
|
||||||
should not do that.]
|
|
||||||
|
|
||||||
If you want to limit the suspend image size to N bytes, do
|
|
||||||
|
|
||||||
echo N > /sys/power/image_size
|
|
||||||
|
|
||||||
before suspend (it is limited to around 2/5 of available RAM by default).
|
|
||||||
|
|
||||||
. The resume process checks for the presence of the resume device,
|
|
||||||
if found, it then checks the contents for the hibernation image signature.
|
|
||||||
If both are found, it resumes the hibernation image.
|
|
||||||
|
|
||||||
. The resume process may be triggered in two ways:
|
|
||||||
1) During lateinit: If resume=/dev/your_swap_partition is specified on
|
|
||||||
the kernel command line, lateinit runs the resume process. If the
|
|
||||||
resume device has not been probed yet, the resume process fails and
|
|
||||||
bootup continues.
|
|
||||||
2) Manually from an initrd or initramfs: May be run from
|
|
||||||
the init script by using the /sys/power/resume file. It is vital
|
|
||||||
that this be done prior to remounting any filesystems (even as
|
|
||||||
read-only) otherwise data may be corrupted.
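The ordering requirement in case 2 can be made explicit with a small guard in the init script. The sketch below is an illustration under assumptions, not what any particular initramfs does: `has_block_mounts` is a made-up name, and treating every mount whose source starts with /dev/ as a disk-backed filesystem is a simplification.

```shell
# Hypothetical guard: succeed (exit status 0) if the mount table lists any
# filesystem whose source is a /dev/ node, i.e. something already mounted
# from a block device. Takes an optional mounts file (default /proc/mounts).
has_block_mounts() {
    awk '$1 ~ /^\/dev\// { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Intended use in an init script, before any filesystem is mounted:
#   if has_block_mounts; then
#       echo "filesystems already mounted, not resuming" >&2
#   else
#       echo -n "$RESUME_DEVNUM" > /sys/power/resume   # RESUME_DEVNUM: placeholder
#   fi
```

Refusing to resume is the conservative choice here: a missed resume costs the saved session, while resuming over a mounted filesystem risks the corruption described above.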
|
|
||||||
|
|
||||||
Article about goals and implementation of Software Suspend for Linux
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
Author: Gábor Kuti
|
|
||||||
Last revised: 2003-10-20 by Pavel Machek
|
|
||||||
|
|
||||||
Idea and goals to achieve
|
|
||||||
|
|
||||||
Nowadays it is common in several laptops that they have a suspend button. It
|
|
||||||
saves the state of the machine to a filesystem or to a partition and switches
|
|
||||||
to standby mode. When the machine is later resumed, the saved state is loaded back to
|
|
||||||
ram and the machine can continue its work. It has two real benefits. First we
|
|
||||||
save ourselves the time the machine takes to go down and later boot up; energy costs
|
|
||||||
are really high when running from batteries. The other gain is that we don't have to
|
|
||||||
interrupt our programs so processes that are calculating something for a long
|
|
||||||
time shouldn't need to be written interruptible.
|
|
||||||
|
|
||||||
swsusp saves the state of the machine into active swaps and then reboots or
|
|
||||||
powers down. You must explicitly specify the swap partition to resume from with
|
|
||||||
``resume='' kernel option. If signature is found it loads and restores saved
|
|
||||||
state. If the option ``noresume'' is specified as a boot parameter, it skips
|
|
||||||
the resuming. If the option ``hibernate=nocompress'' is specified as a boot
|
|
||||||
parameter, it saves hibernation image without compression.
|
|
||||||
|
|
||||||
In the meantime while the system is suspended you should not add/remove any
|
|
||||||
of the hardware, write to the filesystems, etc.
|
|
||||||
|
|
||||||
Sleep states summary
|
|
||||||
====================
|
|
||||||
|
|
||||||
There are three different interfaces you can use, /proc/acpi should
|
|
||||||
work like this:
|
|
||||||
|
|
||||||
In a really perfect world:
|
|
||||||
echo 1 > /proc/acpi/sleep # for standby
|
|
||||||
echo 2 > /proc/acpi/sleep # for suspend to ram
|
|
||||||
echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative
|
|
||||||
echo 4 > /proc/acpi/sleep # for suspend to disk
|
|
||||||
echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system
|
|
||||||
|
|
||||||
and perhaps
|
|
||||||
echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios
|
|
||||||
|
|
||||||
Frequently Asked Questions
|
|
||||||
==========================
|
|
||||||
|
|
||||||
Q: well, suspending a server is IMHO a really stupid thing,
|
|
||||||
but... (Diego Zuccato):
|
|
||||||
|
|
||||||
A: You bought new UPS for your server. How do you install it without
|
|
||||||
bringing machine down? Suspend to disk, rearrange power cables,
|
|
||||||
resume.
|
|
||||||
|
|
||||||
You have your server on UPS. Power died, and UPS is indicating 30
|
|
||||||
seconds to failure. What do you do? Suspend to disk.
|
|
||||||
|
|
||||||
|
|
||||||
Q: Maybe I'm missing something, but why don't the regular I/O paths work?
|
|
||||||
|
|
||||||
A: We do use the regular I/O paths. However we cannot restore the data
|
|
||||||
to its original location as we load it. That would create an
|
|
||||||
inconsistent kernel state which would certainly result in an oops.
|
|
||||||
Instead, we load the image into unused memory and then atomically copy
|
|
||||||
it back to its original location. This implies, of course, a maximum
|
|
||||||
image size of half the amount of memory.
|
|
||||||
|
|
||||||
There are two solutions to this:
|
|
||||||
|
|
||||||
* require half of memory to be free during suspend. That way you can
|
|
||||||
read "new" data onto free spots, then cli and copy
|
|
||||||
|
|
||||||
* assume we had special "polling" ide driver that only uses memory
|
|
||||||
between 0-640KB. That way, I'd have to make sure that 0-640KB is free
|
|
||||||
during suspending, but otherwise it would work...
|
|
||||||
|
|
||||||
suspend2 shares this fundamental limitation, but does not include user
|
|
||||||
data and disk caches into "used memory" by saving them in
|
|
||||||
advance. That means that the limitation goes away in practice.
|
|
||||||
|
|
||||||
Q: Does linux support ACPI S4?
|
|
||||||
|
|
||||||
A: Yes. That's what echo platform > /sys/power/disk does.
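Whether the platform method is actually offered can be checked before selecting it. The sketch below assumes the sysfs convention that reading /sys/power/disk lists the available methods with the current one in square brackets; the function name and the optional file argument are illustrative only.

```shell
# Hypothetical helper: succeed if the given hibernation method (e.g.
# "platform") appears in /sys/power/disk. The currently selected method is
# shown in square brackets, so the brackets are stripped before comparing.
disk_mode_supported() {
    wanted=$1
    for m in $(sed 's/[][]//g' "${2:-/sys/power/disk}"); do
        [ "$m" = "$wanted" ] && return 0
    done
    return 1
}
```

A script could then run `disk_mode_supported platform && echo platform > /sys/power/disk` and fall back to `shutdown` otherwise.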
|
|
||||||
|
|
||||||
Q: What is 'suspend2'?
|
|
||||||
|
|
||||||
A: suspend2 is 'Software Suspend 2', a forked implementation of
|
|
||||||
suspend-to-disk which is available as separate patches for 2.4 and 2.6
|
|
||||||
kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB
|
|
||||||
highmem and preemption. It also has an extensible architecture that
|
|
||||||
allows for arbitrary transformations on the image (compression,
|
|
||||||
encryption) and arbitrary backends for writing the image (eg to swap
|
|
||||||
or an NFS share[Work In Progress]). Questions regarding suspend2
|
|
||||||
should be sent to the mailing list available through the suspend2
|
|
||||||
website, and not to the Linux Kernel Mailing List. We are working
|
|
||||||
toward merging suspend2 into the mainline kernel.
|
|
||||||
|
|
||||||
Q: What is the freezing of tasks and why are we using it?
|
|
||||||
|
|
||||||
A: The freezing of tasks is a mechanism by which user space processes and some
|
|
||||||
kernel threads are controlled during hibernation or system-wide suspend (on some
|
|
||||||
architectures). See freezing-of-tasks.txt for details.
|
|
||||||
|
|
||||||
Q: What is the difference between "platform" and "shutdown"?
|
|
||||||
|
|
||||||
A:
|
|
||||||
|
|
||||||
shutdown: save state in linux, then tell bios to powerdown
|
|
||||||
|
|
||||||
platform: save state in linux, then tell bios to powerdown and blink
|
|
||||||
"suspended led"
|
|
||||||
|
|
||||||
"platform" is actually right thing to do where supported, but
|
|
||||||
"shutdown" is most reliable (except on ACPI systems).
|
|
||||||
|
|
||||||
Q: I do not understand why you have such strong objections to idea of
|
|
||||||
selective suspend.
|
|
||||||
|
|
||||||
A: Do selective suspend during runtime power management, that's okay. But
|
|
||||||
it's useless for suspend-to-disk. (And I do not see how you could use
|
|
||||||
it for suspend-to-ram, I hope you do not want that).
|
|
||||||
|
|
||||||
Let's see, so you suggest to
|
|
||||||
|
|
||||||
* SUSPEND all but swap device and parents
|
|
||||||
* Snapshot
|
|
||||||
* Write image to disk
|
|
||||||
* SUSPEND swap device and parents
|
|
||||||
* Powerdown
|
|
||||||
|
|
||||||
Oh no, that does not work, if swap device or its parents uses DMA,
|
|
||||||
you've corrupted data. You'd have to do
|
|
||||||
|
|
||||||
* SUSPEND all but swap device and parents
|
|
||||||
* FREEZE swap device and parents
|
|
||||||
* Snapshot
|
|
||||||
* UNFREEZE swap device and parents
|
|
||||||
* Write
|
|
||||||
* SUSPEND swap device and parents
|
|
||||||
|
|
||||||
Which means that you still need that FREEZE state, and you get more
|
|
||||||
complicated code. (And I have not yet introduced details like system
|
|
||||||
devices).
|
|
||||||
|
|
||||||
Q: There don't seem to be any generally useful behavioral
|
|
||||||
distinctions between SUSPEND and FREEZE.
|
|
||||||
|
|
||||||
A: Doing SUSPEND when you are asked to do FREEZE is always correct,
|
|
||||||
but it may be unnecessarily slow. If you want your driver to stay simple,
|
|
||||||
slowness may not matter to you. It can always be fixed later.
|
|
||||||
|
|
||||||
For devices like disk it does matter, you do not want to spindown for
|
|
||||||
FREEZE.
|
|
||||||
|
|
||||||
Q: After resuming, system is paging heavily, leading to very bad interactivity.
|
|
||||||
|
|
||||||
A: Try running
|
|
||||||
|
|
||||||
cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u | while read file
|
|
||||||
do
|
|
||||||
test -f "$file" && cat "$file" > /dev/null
|
|
||||||
done
|
|
||||||
|
|
||||||
after resume. swapoff -a; swapon -a may also be useful.
|
|
||||||
|
|
||||||
Q: What happens to devices during swsusp? They seem to be resumed
|
|
||||||
during system suspend?
|
|
||||||
|
|
||||||
A: That's correct. We need to resume them if we want to write image to
|
|
||||||
disk. Whole sequence goes like
|
|
||||||
|
|
||||||
Suspend part
|
|
||||||
~~~~~~~~~~~~
|
|
||||||
running system, user asks for suspend-to-disk
|
|
||||||
|
|
||||||
user processes are stopped
|
|
||||||
|
|
||||||
suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
|
|
||||||
with state snapshot
|
|
||||||
|
|
||||||
state snapshot: copy of whole used memory is taken with interrupts disabled
|
|
||||||
|
|
||||||
resume(): devices are woken up so that we can write image to swap
|
|
||||||
|
|
||||||
write image to swap
|
|
||||||
|
|
||||||
suspend(PMSG_SUSPEND): suspend devices so that we can power off
|
|
||||||
|
|
||||||
turn the power off
|
|
||||||
|
|
||||||
Resume part
|
|
||||||
~~~~~~~~~~~
|
|
||||||
(is actually pretty similar)
|
|
||||||
|
|
||||||
running system, user asks for suspend-to-disk
|
|
||||||
|
|
||||||
user processes are stopped (in common case there are none, but with resume-from-initrd, no one knows)
|
|
||||||
|
|
||||||
read image from disk
|
|
||||||
|
|
||||||
suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
|
|
||||||
with image restoration
|
|
||||||
|
|
||||||
image restoration: rewrite memory with image
|
|
||||||
|
|
||||||
resume(): devices are woken up so that system can continue
|
|
||||||
|
|
||||||
thaw all user processes
|
|
||||||
|
|
||||||
Q: What is this 'Encrypt suspend image' for?
|
|
||||||
|
|
||||||
A: First of all: it is not a replacement for dm-crypt encrypted swap.
|
|
||||||
It cannot protect your computer while it is suspended. Instead it does
|
|
||||||
protect from leaking sensitive data after resume from suspend.
|
|
||||||
|
|
||||||
Think of the following: you suspend while an application is running
|
|
||||||
that keeps sensitive data in memory. The application itself prevents
|
|
||||||
the data from being swapped out. Suspend, however, must write these
|
|
||||||
data to swap to be able to resume later on. Without suspend encryption
|
|
||||||
your sensitive data are then stored in plaintext on disk. This means
|
|
||||||
that after resume your sensitive data are accessible to all
|
|
||||||
applications having direct access to the swap device which was used
|
|
||||||
for suspend. If you don't need swap after resume these data can remain
|
|
||||||
on disk virtually forever. Thus it can happen that your system gets
|
|
||||||
broken into weeks later and sensitive data which you thought were
|
|
||||||
encrypted and protected are retrieved and stolen from the swap device.
|
|
||||||
To prevent this situation you should use 'Encrypt suspend image'.
|
|
||||||
|
|
||||||
During suspend a temporary key is created and this key is used to
|
|
||||||
encrypt the data written to disk. When, during resume, the data was
|
|
||||||
read back into memory the temporary key is destroyed which simply
|
|
||||||
means that all data written to disk during suspend are then
|
|
||||||
inaccessible so they can't be stolen later on. The only thing that
|
|
||||||
you must then take care of is that you call 'mkswap' for the swap
|
|
||||||
partition used for suspend as early as possible during regular
|
|
||||||
boot. This asserts that any temporary key from an oopsed suspend or
|
|
||||||
from a failed or aborted resume is erased from the swap device.
|
|
||||||
|
|
||||||
As a rule of thumb use encrypted swap to protect your data while your
|
|
||||||
system is shut down or suspended. Additionally use the encrypted
|
|
||||||
suspend image to prevent sensitive data from being stolen after
|
|
||||||
resume.
|
|
||||||
|
|
||||||
Q: Can I suspend to a swap file?
|
|
||||||
|
|
||||||
A: Generally, yes, you can. However, it requires you to use the "resume=" and
|
|
||||||
"resume_offset=" kernel command line parameters, so the resume from a swap file
|
|
||||||
cannot be initiated from an initrd or initramfs image. See
|
|
||||||
swsusp-and-swap-files.txt for details.
|
|
||||||
|
|
||||||
Q: Is there a maximum system RAM size that is supported by swsusp?
|
|
||||||
|
|
||||||
A: It should work okay with highmem.
|
|
||||||
|
|
||||||
Q: Does swsusp (to disk) use only one swap partition or can it use
|
|
||||||
multiple swap partitions (aggregate them into one logical space)?
|
|
||||||
|
|
||||||
A: Only one swap partition, sorry.
|
|
||||||
|
|
||||||
Q: If my application(s) causes lots of memory & swap space to be used
|
|
||||||
(over half of the total system RAM), is it correct that it is likely
|
|
||||||
to be useless to try to suspend to disk while that app is running?
|
|
||||||
|
|
||||||
A: No, it should work okay, as long as your app does not mlock()
|
|
||||||
it. Just prepare a big enough swap partition.
|
|
||||||
|
|
||||||
Q: What information is useful for debugging suspend-to-disk problems?
|
|
||||||
|
|
||||||
A: Well, last messages on the screen are always useful. If something
|
|
||||||
is broken, it is usually some kernel driver, therefore trying with as
|
|
||||||
little as possible modules loaded helps a lot. I also prefer people to
|
|
||||||
suspend from console, preferably without X running. Booting with
|
|
||||||
init=/bin/bash, then swapon and starting suspend sequence manually
|
|
||||||
usually does the trick. Then it is good idea to try with latest
|
|
||||||
vanilla kernel.
|
|
||||||
|
|
||||||
Q: How can distributions ship a swsusp-supporting kernel with modular
|
|
||||||
disk drivers (especially SATA)?
|
|
||||||
|
|
||||||
A: Well, it can be done, load the drivers, then do echo into
|
|
||||||
/sys/power/resume file from initrd. Be sure not to mount
|
|
||||||
anything, not even read-only mount, or you are going to lose your
|
|
||||||
data.
|
|
||||||
|
|
||||||
Q: How do I make suspend more verbose?
|
|
||||||
|
|
||||||
A: If you want to see any non-error kernel messages on the virtual
|
|
||||||
terminal the kernel switches to during suspend, you have to set the
|
|
||||||
kernel console loglevel to at least 4 (KERN_WARNING), for example by
|
|
||||||
doing
|
|
||||||
|
|
||||||
# save the old loglevel
|
|
||||||
read LOGLEVEL DUMMY < /proc/sys/kernel/printk
|
|
||||||
# set the loglevel so we see the progress bar.
|
|
||||||
# if the level is higher than needed, we leave it alone.
|
|
||||||
if [ $LOGLEVEL -lt 5 ]; then
|
|
||||||
echo 5 > /proc/sys/kernel/printk
|
|
||||||
fi
|
|
||||||
|
|
||||||
IMG_SZ=0
|
|
||||||
read IMG_SZ < /sys/power/image_size
|
|
||||||
echo -n disk > /sys/power/state
|
|
||||||
RET=$?
|
|
||||||
#
|
|
||||||
# the logic here is:
|
|
||||||
# if image_size > 0 (without kernel support, IMG_SZ will be zero),
|
|
||||||
# then try again with image_size set to zero.
|
|
||||||
if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size
|
|
||||||
echo 0 > /sys/power/image_size
|
|
||||||
echo -n disk > /sys/power/state
|
|
||||||
RET=$?
|
|
||||||
fi
|
|
||||||
|
|
||||||
# restore previous loglevel
|
|
||||||
echo $LOGLEVEL > /proc/sys/kernel/printk
|
|
||||||
exit $RET
|
|
||||||
|
|
||||||
Q: Is it true that if I have a mounted filesystem on a USB device and
|
|
||||||
I suspend to disk, I can lose data unless the filesystem has been mounted
|
|
||||||
with "sync"?
|
|
||||||
|
|
||||||
A: That's right ... if you disconnect that device, you may lose data.
|
|
||||||
In fact, even with "-o sync" you can lose data if your programs have
|
|
||||||
information in buffers they haven't written out to a disk you disconnect,
|
|
||||||
or if you disconnect before the device finished saving data you wrote.
|
|
||||||
|
|
||||||
Software suspend normally powers down USB controllers, which is equivalent
|
|
||||||
to disconnecting all USB devices attached to your system.
|
|
||||||
|
|
||||||
Your system might well support low-power modes for its USB controllers
|
|
||||||
while the system is asleep, maintaining the connection, using true sleep
|
|
||||||
modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the
|
|
||||||
/sys/power/state file; write "standby" or "mem".) We've not seen any
|
|
||||||
hardware that can use these modes through software suspend, although in
|
|
||||||
theory some systems might support "platform" modes that won't break the
|
|
||||||
USB connections.
|
|
||||||
|
|
||||||
Remember that it's always a bad idea to unplug a disk drive containing a
|
|
||||||
mounted filesystem. That's true even when your system is asleep! The
|
|
||||||
safest thing is to unmount all filesystems on removable media (such as USB,
|
|
||||||
Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
|
|
||||||
before suspending; then remount them after resuming.
|
|
||||||
|
|
||||||
There is a work-around for this problem. For more information, see
|
|
||||||
Documentation/driver-api/usb/persist.rst.
|
|
||||||
|
|
||||||
Q: Can I suspend-to-disk using a swap partition under LVM?
|
|
||||||
|
|
||||||
A: Yes and No. You can suspend successfully, but the kernel will not be able
|
|
||||||
to resume on its own. You need an initramfs that can recognize the resume
|
|
||||||
situation, activate the logical volume containing the swap volume (but not
|
|
||||||
touch any filesystems!), and eventually call
|
|
||||||
|
|
||||||
echo -n "$major:$minor" > /sys/power/resume
|
|
||||||
|
|
||||||
where $major and $minor are the respective major and minor device numbers of
|
|
||||||
the swap volume.
|
|
||||||
|
|
||||||
uswsusp works with LVM, too. See http://suspend.sourceforge.net/
|
|
||||||
|
|
||||||
Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
|
|
||||||
compiled with the similar configuration files. Anyway I found that
|
|
||||||
suspend to disk (and resume) is much slower on 2.6.16 compared to
|
|
||||||
2.6.15. Any idea for why that might happen or how can I speed it up?
|
|
||||||
|
|
||||||
A: This is because the size of the suspend image is now greater than
|
|
||||||
for 2.6.15 (by saving more data we can get a more responsive system
|
|
||||||
after resume).
|
|
||||||
|
|
||||||
There's the /sys/power/image_size knob that controls the size of the
|
|
||||||
image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as
|
|
||||||
root), the 2.6.15 behavior should be restored. If it is still too
|
|
||||||
slow, take a look at suspend.sf.net -- userland suspend is faster and
|
|
||||||
supports LZF compression to speed it up further.
|
|
|
@ -1,5 +1,7 @@
|
||||||
swsusp/S3 tricks
|
================
|
||||||
~~~~~~~~~~~~~~~~
|
swsusp/S3 tricks
|
||||||
|
================
|
||||||
|
|
||||||
Pavel Machek <pavel@ucw.cz>
|
Pavel Machek <pavel@ucw.cz>
|
||||||
|
|
||||||
If you want to trick swsusp/S3 into working, you might want to try:
|
If you want to trick swsusp/S3 into working, you might want to try:
|
|
@ -1,4 +1,7 @@
|
||||||
|
=====================================================
|
||||||
Documentation for userland software suspend interface
|
Documentation for userland software suspend interface
|
||||||
|
=====================================================
|
||||||
|
|
||||||
(C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
|
(C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
|
||||||
|
|
||||||
First, the warnings at the beginning of swsusp.txt still apply.
|
First, the warnings at the beginning of swsusp.txt still apply.
|
||||||
|
@ -30,13 +33,16 @@ called.
|
||||||
|
|
||||||
The ioctl() commands recognized by the device are:
|
The ioctl() commands recognized by the device are:
|
||||||
|
|
||||||
SNAPSHOT_FREEZE - freeze user space processes (the current process is
|
SNAPSHOT_FREEZE
|
||||||
|
freeze user space processes (the current process is
|
||||||
not frozen); this is required for SNAPSHOT_CREATE_IMAGE
|
not frozen); this is required for SNAPSHOT_CREATE_IMAGE
|
||||||
and SNAPSHOT_ATOMIC_RESTORE to succeed
|
and SNAPSHOT_ATOMIC_RESTORE to succeed
|
||||||
|
|
||||||
SNAPSHOT_UNFREEZE - thaw user space processes frozen by SNAPSHOT_FREEZE
|
SNAPSHOT_UNFREEZE
|
||||||
|
thaw user space processes frozen by SNAPSHOT_FREEZE
|
||||||
|
|
||||||
SNAPSHOT_CREATE_IMAGE - create a snapshot of the system memory; the
|
SNAPSHOT_CREATE_IMAGE
|
||||||
|
create a snapshot of the system memory; the
|
||||||
last argument of ioctl() should be a pointer to an int variable,
|
last argument of ioctl() should be a pointer to an int variable,
|
||||||
the value of which will indicate whether the call returned after
|
the value of which will indicate whether the call returned after
|
||||||
creating the snapshot (1) or after restoring the system memory state
|
creating the snapshot (1) or after restoring the system memory state
|
||||||
|
@ -45,48 +51,59 @@ SNAPSHOT_CREATE_IMAGE - create a snapshot of the system memory; the
|
||||||
has been created the read() operation can be used to transfer
|
has been created the read() operation can be used to transfer
|
||||||
it out of the kernel
|
it out of the kernel
|
||||||
|
|
||||||
SNAPSHOT_ATOMIC_RESTORE - restore the system memory state from the
|
SNAPSHOT_ATOMIC_RESTORE
|
||||||
|
restore the system memory state from the
|
||||||
uploaded snapshot image; before calling it you should transfer
|
uploaded snapshot image; before calling it you should transfer
|
||||||
the system memory snapshot back to the kernel using the write()
|
the system memory snapshot back to the kernel using the write()
|
||||||
operation; this call will not succeed if the snapshot
|
operation; this call will not succeed if the snapshot
|
||||||
image is not available to the kernel
|
image is not available to the kernel
|
||||||
|
|
||||||
SNAPSHOT_FREE - free memory allocated for the snapshot image
|
SNAPSHOT_FREE
|
||||||
|
free memory allocated for the snapshot image
|
||||||
|
|
||||||
SNAPSHOT_PREF_IMAGE_SIZE - set the preferred maximum size of the image
|
SNAPSHOT_PREF_IMAGE_SIZE
|
||||||
|
set the preferred maximum size of the image
|
||||||
(the kernel will do its best to ensure the image size will not exceed
|
(the kernel will do its best to ensure the image size will not exceed
|
||||||
this number, but if it turns out to be impossible, the kernel will
|
this number, but if it turns out to be impossible, the kernel will
|
||||||
create the smallest image possible)
|
create the smallest image possible)
|
||||||
|
|
||||||
SNAPSHOT_GET_IMAGE_SIZE - return the actual size of the hibernation image
|
SNAPSHOT_GET_IMAGE_SIZE
|
||||||
|
return the actual size of the hibernation image
|
||||||
|
|
||||||
SNAPSHOT_AVAIL_SWAP_SIZE - return the amount of available swap in bytes (the
|
SNAPSHOT_AVAIL_SWAP_SIZE
|
||||||
|
return the amount of available swap in bytes (the
|
||||||
last argument should be a pointer to an unsigned int variable that will
|
last argument should be a pointer to an unsigned int variable that will
|
||||||
contain the result if the call is successful).
|
contain the result if the call is successful).
|
||||||
|
|
||||||
SNAPSHOT_ALLOC_SWAP_PAGE - allocate a swap page from the resume partition
|
SNAPSHOT_ALLOC_SWAP_PAGE
|
||||||
|
allocate a swap page from the resume partition
|
||||||
(the last argument should be a pointer to a loff_t variable that
|
(the last argument should be a pointer to a loff_t variable that
|
||||||
will contain the swap page offset if the call is successful)
|
will contain the swap page offset if the call is successful)
|
||||||
|
|
||||||
SNAPSHOT_FREE_SWAP_PAGES - free all swap pages allocated by
|
SNAPSHOT_FREE_SWAP_PAGES
|
||||||
|
free all swap pages allocated by
|
||||||
SNAPSHOT_ALLOC_SWAP_PAGE
|
SNAPSHOT_ALLOC_SWAP_PAGE
|
||||||
|
|
||||||
SNAPSHOT_SET_SWAP_AREA - set the resume partition and the offset (in <PAGE_SIZE>
|
SNAPSHOT_SET_SWAP_AREA
|
||||||
|
set the resume partition and the offset (in <PAGE_SIZE>
|
||||||
units) from the beginning of the partition at which the swap header is
|
units) from the beginning of the partition at which the swap header is
|
||||||
located (the last ioctl() argument should point to a struct
|
located (the last ioctl() argument should point to a struct
|
||||||
resume_swap_area, as defined in kernel/power/suspend_ioctls.h,
|
resume_swap_area, as defined in kernel/power/suspend_ioctls.h,
|
||||||
containing the resume device specification and the offset); for swap
|
containing the resume device specification and the offset); for swap
|
||||||
partitions the offset is always 0, but it is different from zero for
|
partitions the offset is always 0, but it is different from zero for
|
||||||
swap files (see Documentation/power/swsusp-and-swap-files.txt for
|
swap files (see Documentation/power/swsusp-and-swap-files.rst for
|
||||||
details).
|
details).
|
||||||
|
|
||||||
SNAPSHOT_PLATFORM_SUPPORT - enable/disable the hibernation platform support,
|
SNAPSHOT_PLATFORM_SUPPORT
|
||||||
|
enable/disable the hibernation platform support,
|
||||||
depending on the argument value (enable, if the argument is nonzero)
|
depending on the argument value (enable, if the argument is nonzero)
|
||||||
|
|
||||||
SNAPSHOT_POWER_OFF - make the kernel transition the system to the hibernation
|
SNAPSHOT_POWER_OFF
|
||||||
|
make the kernel transition the system to the hibernation
|
||||||
state (eg. ACPI S4) using the platform (eg. ACPI) driver
|
state (eg. ACPI S4) using the platform (eg. ACPI) driver
|
||||||
|
|
||||||
SNAPSHOT_S2RAM - suspend to RAM; using this call causes the kernel to
|
SNAPSHOT_S2RAM
|
||||||
|
suspend to RAM; using this call causes the kernel to
|
||||||
immediately enter the suspend-to-RAM state, so this call must always
|
immediately enter the suspend-to-RAM state, so this call must always
|
||||||
be preceded by the SNAPSHOT_FREEZE call and it is also necessary
|
be preceded by the SNAPSHOT_FREEZE call and it is also necessary
|
||||||
to use the SNAPSHOT_UNFREEZE call after the system wakes up. This call
|
to use the SNAPSHOT_UNFREEZE call after the system wakes up. This call
|
||||||
|
@ -98,10 +115,11 @@ SNAPSHOT_S2RAM - suspend to RAM; using this call causes the kernel to
|
||||||
|
|
||||||
The device's read() operation can be used to transfer the snapshot image from
|
The device's read() operation can be used to transfer the snapshot image from
|
||||||
the kernel. It has the following limitations:
|
the kernel. It has the following limitations:
|
||||||
|
|
||||||
- you cannot read() more than one virtual memory page at a time
|
- you cannot read() more than one virtual memory page at a time
|
||||||
- read()s across page boundaries are impossible (ie. if you read() 1/2 of
|
- read()s across page boundaries are impossible (ie. if you read() 1/2 of
|
||||||
a page in the previous call, you will only be able to read()
|
a page in the previous call, you will only be able to read()
|
||||||
_at_ _most_ 1/2 of the page in the next call)
|
**at most** 1/2 of the page in the next call)
|
||||||
|
|
||||||
The device's write() operation is used for uploading the system memory snapshot
|
The device's write() operation is used for uploading the system memory snapshot
|
||||||
into the kernel. It has the same limitations as the read() operation.
|
into the kernel. It has the same limitations as the read() operation.
|
||||||
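Note on the read()/write() limits in the hunk above: a caller has to cap each transfer at the bytes remaining in the current page. A minimal sketch of that arithmetic follows; the helper name and its parameters are illustrative and not part of the snapshot interface, and the page size is assumed to be obtained elsewhere (for example from sysconf(_SC_PAGESIZE)).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the kernel interface.
 *
 * pos       - number of image bytes already transferred
 * page_size - system page size, assumed to be supplied by the caller
 * want      - number of bytes the caller would like to transfer next
 *
 * Returns the largest request that neither exceeds one page nor
 * crosses a page boundary.
 */
static size_t snapshot_max_chunk(size_t pos, size_t page_size, size_t want)
{
	size_t left_in_page;

	assert(page_size > 0);

	/* Bytes remaining before the next page boundary. */
	left_in_page = page_size - (pos % page_size);

	return want < left_in_page ? want : left_in_page;
}
```

With a 4096-byte page, a caller that has already transferred half a page can request at most 2048 bytes next, which matches the "at most 1/2 of the page" wording in the text.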
|
@ -143,8 +161,10 @@ preferably using mlockall(), before calling SNAPSHOT_FREEZE.
|
||||||
The suspending utility MUST check the value stored by SNAPSHOT_CREATE_IMAGE
|
The suspending utility MUST check the value stored by SNAPSHOT_CREATE_IMAGE
|
||||||
in the memory location pointed to by the last argument of ioctl() and proceed
|
in the memory location pointed to by the last argument of ioctl() and proceed
|
||||||
in accordance with it:
|
in accordance with it:
|
||||||
|
|
||||||
1. If the value is 1 (ie. the system memory snapshot has just been
|
1. If the value is 1 (ie. the system memory snapshot has just been
|
||||||
created and the system is ready for saving it):
|
created and the system is ready for saving it):
|
||||||
|
|
||||||
(a) The suspending utility MUST NOT close the snapshot device
|
(a) The suspending utility MUST NOT close the snapshot device
|
||||||
_unless_ the whole suspend procedure is to be cancelled, in
|
_unless_ the whole suspend procedure is to be cancelled, in
|
||||||
which case, if the snapshot image has already been saved, the
|
which case, if the snapshot image has already been saved, the
|
||||||
|
@ -158,6 +178,7 @@ in accordance with it:
|
||||||
called. However, it MAY mount a file system that was not
|
called. However, it MAY mount a file system that was not
|
||||||
mounted at that time and perform some operations on it (eg.
|
mounted at that time and perform some operations on it (eg.
|
||||||
use it for saving the image).
|
use it for saving the image).
|
||||||
|
|
||||||
2. If the value is 0 (ie. the system state has just been restored from
|
2. If the value is 0 (ie. the system state has just been restored from
|
||||||
the snapshot image), the suspending utility MUST close the snapshot
|
the snapshot image), the suspending utility MUST close the snapshot
|
||||||
device. Afterwards it will be treated as a regular userland process,
|
device. Afterwards it will be treated as a regular userland process,
|
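Note on the SNAPSHOT_CREATE_IMAGE rule in the hunks above: the check of the stored value can be sketched as a small decision function. The enum and function names are hypothetical, and the "unexpected" branch is an assumption, since the text only describes the values 1 and 0.

```c
#include <assert.h>

/* Hypothetical names; the text only defines the meaning of 1 and 0. */
enum snapshot_next_step {
	SNAPSHOT_STEP_SAVE_IMAGE,   /* 1: image just created, keep the device open and save it */
	SNAPSHOT_STEP_CLOSE_DEVICE, /* 0: state just restored, the device must be closed */
	SNAPSHOT_STEP_UNEXPECTED    /* anything else: not covered by the text */
};

/*
 * in_suspend is the int that the last ioctl() argument pointed to
 * when SNAPSHOT_CREATE_IMAGE returned.
 */
static enum snapshot_next_step snapshot_next_step(int in_suspend)
{
	switch (in_suspend) {
	case 1:
		return SNAPSHOT_STEP_SAVE_IMAGE;
	case 0:
		return SNAPSHOT_STEP_CLOSE_DEVICE;
	default:
		return SNAPSHOT_STEP_UNEXPECTED;
	}
}
```

Keeping the decision separate from the ioctl() call makes the two MUST rules easy to check without access to the snapshot device.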
|
@ -1,7 +1,8 @@
|
||||||
|
===========================
|
||||||
|
Video issues with S3 resume
|
||||||
|
===========================
|
||||||
|
|
||||||
Video issues with S3 resume
|
2003-2006, Pavel Machek
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
2003-2006, Pavel Machek
|
|
||||||
|
|
||||||
During S3 resume, hardware needs to be reinitialized. For most
|
During S3 resume, hardware needs to be reinitialized. For most
|
||||||
devices, this is easy, and kernel driver knows how to do
|
devices, this is easy, and kernel driver knows how to do
|
||||||
|
@ -41,37 +42,37 @@ There are a few types of systems where video works after S3 resume:
|
||||||
(1) systems where video state is preserved over S3.
|
(1) systems where video state is preserved over S3.
|
||||||
|
|
||||||
(2) systems where it is possible to call the video BIOS during S3
|
(2) systems where it is possible to call the video BIOS during S3
|
||||||
resume. Unfortunately, it is not correct to call the video BIOS at
|
resume. Unfortunately, it is not correct to call the video BIOS at
|
||||||
that point, but it happens to work on some machines. Use
|
that point, but it happens to work on some machines. Use
|
||||||
acpi_sleep=s3_bios.
|
acpi_sleep=s3_bios.
|
||||||
|
|
||||||
(3) systems that initialize video card into vga text mode and where
|
(3) systems that initialize video card into vga text mode and where
|
||||||
the BIOS works well enough to be able to set video mode. Use
|
the BIOS works well enough to be able to set video mode. Use
|
||||||
acpi_sleep=s3_mode on these.
|
acpi_sleep=s3_mode on these.
|
||||||
|
|
||||||
(4) on some systems s3_bios kicks video into text mode, and
|
(4) on some systems s3_bios kicks video into text mode, and
|
||||||
acpi_sleep=s3_bios,s3_mode is needed.
|
acpi_sleep=s3_bios,s3_mode is needed.
|
||||||
|
|
||||||
(5) radeon systems, where X can soft-boot your video card. You'll need
|
(5) radeon systems, where X can soft-boot your video card. You'll need
|
||||||
a new enough X, and a plain text console (no vesafb or radeonfb). See
|
a new enough X, and a plain text console (no vesafb or radeonfb). See
|
||||||
http://www.doesi.gmxhome.de/linux/tm800s3/s3.html for more information.
|
http://www.doesi.gmxhome.de/linux/tm800s3/s3.html for more information.
|
||||||
Alternatively, you should use vbetool (6) instead.
|
Alternatively, you should use vbetool (6) instead.
|
||||||
|
|
||||||
(6) other radeon systems, where vbetool is enough to bring system back
|
(6) other radeon systems, where vbetool is enough to bring system back
|
||||||
to life. It needs text console to be working. Do vbetool vbestate
|
to life. It needs text console to be working. Do vbetool vbestate
|
||||||
save > /tmp/delme; echo 3 > /proc/acpi/sleep; vbetool post; vbetool
|
save > /tmp/delme; echo 3 > /proc/acpi/sleep; vbetool post; vbetool
|
||||||
vbestate restore < /tmp/delme; setfont <whatever>, and your video
|
vbestate restore < /tmp/delme; setfont <whatever>, and your video
|
||||||
should work.
|
should work.
|
||||||
|
|
||||||
(7) on some systems, it is possible to boot most of kernel, and then
|
(7) on some systems, it is possible to boot most of kernel, and then
|
||||||
POSTing bios works. Ole Rohne has patch to do just that at
|
POSTing bios works. Ole Rohne has patch to do just that at
|
||||||
http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2.
|
http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2.
|
||||||
|
|
||||||
(8) on some systems, you can use the video_post utility and or
|
(8) on some systems, you can use the video_post utility and or
|
||||||
do echo 3 > /sys/power/state && /usr/sbin/video_post - which will
|
do echo 3 > /sys/power/state && /usr/sbin/video_post - which will
|
||||||
initialize the display in console mode. If you are in X, you can switch
|
initialize the display in console mode. If you are in X, you can switch
|
||||||
to a virtual terminal and back to X using CTRL+ALT+F1 - CTRL+ALT+F7 to get
|
to a virtual terminal and back to X using CTRL+ALT+F1 - CTRL+ALT+F7 to get
|
||||||
the display working in graphical mode again.
|
the display working in graphical mode again.
|
||||||
|
|
||||||
Now, if you pass acpi_sleep=something, and it does not work with your
|
Now, if you pass acpi_sleep=something, and it does not work with your
|
||||||
bios, you'll get a hard crash during resume. Be careful. Also it is
|
bios, you'll get a hard crash during resume. Be careful. Also it is
|
||||||
|
@ -87,99 +88,126 @@ chance of working.
|
||||||
|
|
||||||
Table of known working notebooks:
|
Table of known working notebooks:
|
||||||
|
|
||||||
|
|
||||||
|
=============================== ===============================================
|
||||||
Model hack (or "how to do it")
|
Model hack (or "how to do it")
|
||||||
------------------------------------------------------------------------------
|
=============================== ===============================================
|
||||||
Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
|
Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
|
||||||
Acer TM 230 s3_bios (2)
|
Acer TM 230 s3_bios (2)
|
||||||
Acer TM 242FX vbetool (6)
|
Acer TM 242FX vbetool (6)
|
||||||
Acer TM C110 video_post (8)
|
Acer TM C110 video_post (8)
|
||||||
Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8)
|
Acer TM C300 vga=normal (only suspend on console, not in X),
|
||||||
|
vbetool (6) or video_post (8)
|
||||||
Acer TM 4052LCi s3_bios (2)
|
Acer TM 4052LCi s3_bios (2)
|
||||||
Acer TM 636Lci s3_bios,s3_mode (4)
|
Acer TM 636Lci s3_bios,s3_mode (4)
|
||||||
Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text console back
|
Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text
|
||||||
Acer TM 660 ??? (*)
|
console back
|
||||||
Acer TM 800 vga=normal, X patches, see webpage (5) or vbetool (6)
|
Acer TM 660 ??? [#f1]_
|
||||||
Acer TM 803 vga=normal, X patches, see webpage (5) or vbetool (6)
|
Acer TM 800 vga=normal, X patches, see webpage (5)
|
||||||
|
or vbetool (6)
|
||||||
|
Acer TM 803 vga=normal, X patches, see webpage (5)
|
||||||
|
or vbetool (6)
|
||||||
Acer TM 803LCi vga=normal, vbetool (6)
|
Acer TM 803LCi vga=normal, vbetool (6)
|
||||||
Arima W730a vbetool needed (6)
|
Arima W730a vbetool needed (6)
|
||||||
Asus L2400D s3_mode (3)(***) (S1 also works OK)
|
Asus L2400D s3_mode (3) [#f2]_ (S1 also works OK)
|
||||||
Asus L3350M (SiS 740) (6)
|
Asus L3350M (SiS 740) (6)
|
||||||
Asus L3800C (Radeon M7) s3_bios (2) (S1 also works OK)
|
Asus L3800C (Radeon M7) s3_bios (2) (S1 also works OK)
|
||||||
Asus M6887Ne vga=normal, s3_bios (2), use radeon driver instead of fglrx in x.org
|
Asus M6887Ne vga=normal, s3_bios (2), use radeon driver
|
||||||
|
instead of fglrx in x.org
|
||||||
Athlon64 desktop prototype s3_bios (2)
|
Athlon64 desktop prototype s3_bios (2)
|
||||||
Compal CL-50 ??? (*)
|
Compal CL-50 ??? [#f1]_
|
||||||
Compaq Armada E500 - P3-700 none (1) (S1 also works OK)
|
Compaq Armada E500 - P3-700 none (1) (S1 also works OK)
|
||||||
Compaq Evo N620c vga=normal, s3_bios (2)
|
Compaq Evo N620c vga=normal, s3_bios (2)
|
||||||
Dell 600m, ATI R250 Lf none (1), but needs xorg-x11-6.8.1.902-1
|
Dell 600m, ATI R250 Lf none (1), but needs xorg-x11-6.8.1.902-1
|
||||||
Dell D600, ATI RV250 vga=normal and X, or try vbestate (6)
|
Dell D600, ATI RV250 vga=normal and X, or try vbestate (6)
|
||||||
Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested)
|
Dell D610 vga=normal and X (possibly vbestate (6) too,
|
||||||
Dell Inspiron 4000 ??? (*)
|
but not tested)
|
||||||
Dell Inspiron 500m ??? (*)
|
Dell Inspiron 4000 ??? [#f1]_
|
||||||
|
Dell Inspiron 500m ??? [#f1]_
|
||||||
Dell Inspiron 510m ???
|
Dell Inspiron 510m ???
|
||||||
Dell Inspiron 5150 vbetool needed (6)
|
Dell Inspiron 5150 vbetool needed (6)
|
||||||
Dell Inspiron 600m ??? (*)
|
Dell Inspiron 600m ??? [#f1]_
|
||||||
Dell Inspiron 8200 ??? (*)
|
Dell Inspiron 8200 ??? [#f1]_
|
||||||
Dell Inspiron 8500 ??? (*)
|
Dell Inspiron 8500 ??? [#f1]_
|
||||||
Dell Inspiron 8600 ??? (*)
|
Dell Inspiron 8600 ??? [#f1]_
|
||||||
eMachines athlon64 machines vbetool needed (6) (someone please get me model #s)
|
eMachines athlon64 machines vbetool needed (6) (someone please get
|
||||||
HP NC6000 s3_bios, may not use radeonfb (2); or vbetool (6)
|
me model #s)
|
||||||
HP NX7000 ??? (*)
|
HP NC6000 s3_bios, may not use radeonfb (2);
|
||||||
HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X
|
or vbetool (6)
|
||||||
|
HP NX7000 ??? [#f1]_
|
||||||
|
HP Pavilion ZD7000 vbetool post needed, need open-source nv
|
||||||
|
driver for X
|
||||||
HP Omnibook XE3 athlon version none (1)
|
HP Omnibook XE3 athlon version none (1)
|
||||||
HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV
|
HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV
|
||||||
HP Omnibook XE3L-GF vbetool (6)
|
HP Omnibook XE3L-GF vbetool (6)
|
||||||
HP Omnibook 5150 none (1), (S1 also works OK)
|
HP Omnibook 5150 none (1), (S1 also works OK)
|
||||||
IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work.
|
IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294
|
||||||
IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(]
|
Savage/IX-MV, vesafb gets "interesting"
|
||||||
|
but X work.
|
||||||
|
IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with
|
||||||
|
BIOS 1.04 2002-08-23, but not at all with
|
||||||
|
BIOS 1.11 2004-11-05 :-(]
|
||||||
IBM TP R32 / Type 2658-MMG none (1)
|
IBM TP R32 / Type 2658-MMG none (1)
|
||||||
IBM TP R40 2722B3G ??? (*)
|
IBM TP R40 2722B3G ??? [#f1]_
|
||||||
IBM TP R50p / Type 1832-22U s3_bios (2)
|
IBM TP R50p / Type 1832-22U s3_bios (2)
|
||||||
IBM TP R51 none (1)
|
IBM TP R51 none (1)
|
||||||
IBM TP T30 236681A ??? (*)
|
IBM TP T30 236681A ??? [#f1]_
|
||||||
IBM TP T40 / Type 2373-MU4 none (1)
|
IBM TP T40 / Type 2373-MU4 none (1)
|
||||||
IBM TP T40p none (1)
|
IBM TP T40p none (1)
|
||||||
IBM TP R40p s3_bios (2)
|
IBM TP R40p s3_bios (2)
|
||||||
IBM TP T41p s3_bios (2), switch to X after resume
|
IBM TP T41p s3_bios (2), switch to X after resume
|
||||||
IBM TP T42 s3_bios (2)
|
IBM TP T42 s3_bios (2)
|
||||||
IBM ThinkPad T42p (2373-GTG) s3_bios (2)
|
IBM ThinkPad T42p (2373-GTG) s3_bios (2)
|
||||||
IBM TP X20 ??? (*)
|
IBM TP X20 ??? [#f1]_
|
||||||
IBM TP X30 s3_bios, s3_mode (4)
|
IBM TP X30 s3_bios, s3_mode (4)
|
||||||
IBM TP X31 / Type 2672-XXH none (1), use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
|
IBM TP X31 / Type 2672-XXH none (1), use radeontool
|
||||||
IBM TP X32 none (1), but backlight is on and video is trashed after long suspend. s3_bios,s3_mode (4) works too. Perhaps that gets better results?
|
(http://fdd.com/software/radeon/) to
|
||||||
|
turn off backlight.
|
||||||
|
IBM TP X32 none (1), but backlight is on and video is
|
||||||
|
trashed after long suspend. s3_bios,
|
||||||
|
s3_mode (4) works too. Perhaps that gets
|
||||||
|
better results?
|
||||||
IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4)
|
IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4)
|
||||||
IBM TP 600e none(1), but a switch to console and back to X is needed
|
IBM TP 600e none(1), but a switch to console and
|
||||||
Medion MD4220 ??? (*)
|
back to X is needed
|
||||||
|
Medion MD4220 ??? [#f1]_
|
||||||
Samsung P35 vbetool needed (6)
|
Samsung P35 vbetool needed (6)
|
||||||
Sharp PC-AR10 (ATI rage) none (1), backlight does not switch off
|
Sharp PC-AR10 (ATI rage) none (1), backlight does not switch off
|
||||||
Sony Vaio PCG-C1VRX/K s3_bios (2)
|
Sony Vaio PCG-C1VRX/K s3_bios (2)
|
||||||
Sony Vaio PCG-F403 ??? (*)
|
Sony Vaio PCG-F403 ??? [#f1]_
|
||||||
Sony Vaio PCG-GRT995MP none (1), works with 'nv' X driver
|
Sony Vaio PCG-GRT995MP none (1), works with 'nv' X driver
|
||||||
Sony Vaio PCG-GR7/K none (1), but needs radeonfb, use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
|
Sony Vaio PCG-GR7/K none (1), but needs radeonfb, use
|
||||||
Sony Vaio PCG-N505SN ??? (*)
|
radeontool (http://fdd.com/software/radeon/)
|
||||||
|
to turn off backlight.
|
||||||
|
Sony Vaio PCG-N505SN ??? [#f1]_
|
||||||
Sony Vaio vgn-s260 X or boot-radeon can init it (5)
|
Sony Vaio vgn-s260 X or boot-radeon can init it (5)
|
||||||
Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X.
|
Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will
|
||||||
|
be blank unless you return to X.
|
||||||
Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4)
|
Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4)
|
||||||
Toshiba Libretto L5 none (1)
|
Toshiba Libretto L5 none (1)
|
||||||
Toshiba Libretto 100CT/110CT vbetool (6)
|
Toshiba Libretto 100CT/110CT vbetool (6)
|
||||||
Toshiba Portege 3020CT s3_mode (3)
|
Toshiba Portege 3020CT s3_mode (3)
|
||||||
Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK)
|
Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK)
|
||||||
Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK)
|
Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK)
|
||||||
Toshiba Satellite 4090XCDT ??? (*)
|
Toshiba Satellite 4090XCDT ??? [#f1]_
|
||||||
Toshiba Satellite P10-554 s3_bios,s3_mode (4)(****)
|
Toshiba Satellite P10-554 s3_bios,s3_mode (4)[#f3]_
|
||||||
Toshiba M30 (2) xor X with nvidia driver using internal AGP
|
Toshiba M30 (2) xor X with nvidia driver using internal AGP
|
||||||
Uniwill 244IIO ??? (*)
|
Uniwill 244IIO ??? [#f1]_
|
||||||
|
=============================== ===============================================
|
||||||
|
|
||||||
Known working desktop systems
|
Known working desktop systems
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
=================== ============================= ========================
|
||||||
Mainboard Graphics card hack (or "how to do it")
|
Mainboard Graphics card hack (or "how to do it")
|
||||||
------------------------------------------------------------------------------
|
=================== ============================= ========================
|
||||||
Asus A7V8X nVidia RIVA TNT2 model 64 s3_bios,s3_mode (4)
|
Asus A7V8X nVidia RIVA TNT2 model 64 s3_bios,s3_mode (4)
|
||||||
|
=================== ============================= ========================
|
||||||
|
|
||||||
|
|
||||||
(*) from https://wiki.ubuntu.com/HoaryPMResults, not sure
|
.. [#f1] from https://wiki.ubuntu.com/HoaryPMResults, not sure
|
||||||
which options to use. If you know, please tell me.
|
which options to use. If you know, please tell me.
|
||||||
|
|
||||||
(***) To be tested with a newer kernel.
|
.. [#f2] To be tested with a newer kernel.
|
||||||
|
|
||||||
(****) Not with SMP kernel, UP only.
|
.. [#f3] Not with SMP kernel, UP only.
|
|
@ -117,7 +117,7 @@ PM support:
|
||||||
implemented") error. You should also try to make sure that your
|
implemented") error. You should also try to make sure that your
|
||||||
driver uses as little power as possible when it's not doing
|
driver uses as little power as possible when it's not doing
|
||||||
anything. For the driver testing instructions see
|
anything. For the driver testing instructions see
|
||||||
Documentation/power/drivers-testing.txt and for a relatively
|
Documentation/power/drivers-testing.rst and for a relatively
|
||||||
complete overview of the power management issues related to
|
complete overview of the power management issues related to
|
||||||
drivers see :ref:`Documentation/driver-api/pm/devices.rst <driverapi_pm_devices>`.
|
drivers see :ref:`Documentation/driver-api/pm/devices.rst <driverapi_pm_devices>`.
|
||||||
|
|
||||||
|
|
|
@ -22,7 +22,7 @@ the highest.
|
||||||
|
|
||||||
The actual EM used by EAS is _not_ maintained by the scheduler, but by a
|
The actual EM used by EAS is _not_ maintained by the scheduler, but by a
|
||||||
dedicated framework. For details about this framework and what it provides,
|
dedicated framework. For details about this framework and what it provides,
|
||||||
please refer to its documentation (see Documentation/power/energy-model.txt).
|
please refer to its documentation (see Documentation/power/energy-model.rst).
|
||||||
|
|
||||||
|
|
||||||
2. Background and Terminology
|
2. Background and Terminology
|
||||||
|
@ -81,7 +81,7 @@ through the arch_scale_cpu_capacity() callback.
|
||||||
|
|
||||||
The rest of platform knowledge used by EAS is directly read from the Energy
|
The rest of platform knowledge used by EAS is directly read from the Energy
|
||||||
Model (EM) framework. The EM of a platform is composed of a power cost table
|
Model (EM) framework. The EM of a platform is composed of a power cost table
|
||||||
per 'performance domain' in the system (see Documentation/power/energy-model.txt
|
per 'performance domain' in the system (see Documentation/power/energy-model.rst
|
||||||
for further details about performance domains).
|
for further details about performance domains).
|
||||||
|
|
||||||
The scheduler manages references to the EM objects in the topology code when the
|
The scheduler manages references to the EM objects in the topology code when the
|
||||||
|
@ -353,7 +353,7 @@ could be amended in the future if proven otherwise.
|
||||||
EAS uses the EM of a platform to estimate the impact of scheduling decisions on
|
EAS uses the EM of a platform to estimate the impact of scheduling decisions on
|
||||||
energy. So, your platform must provide power cost tables to the EM framework in
|
energy. So, your platform must provide power cost tables to the EM framework in
|
||||||
order to make EAS start. To do so, please refer to documentation of the
|
order to make EAS start. To do so, please refer to documentation of the
|
||||||
independent EM framework in Documentation/power/energy-model.txt.
|
independent EM framework in Documentation/power/energy-model.rst.
|
||||||
|
|
||||||
Please also note that the scheduling domains need to be re-built after the
|
Please also note that the scheduling domains need to be re-built after the
|
||||||
EM has been registered in order to start EAS.
|
EM has been registered in order to start EAS.
|
||||||
|
|
|
@ -151,7 +151,7 @@ At the runtime you can disable idle states with below methods:
|
||||||
|
|
||||||
It is possible to disable CPU idle states by way of the PM QoS
|
It is possible to disable CPU idle states by way of the PM QoS
|
||||||
subsystem, more specifically by using the "/dev/cpu_dma_latency"
|
subsystem, more specifically by using the "/dev/cpu_dma_latency"
|
||||||
interface (see Documentation/power/pm_qos_interface.txt for more
|
interface (see Documentation/power/pm_qos_interface.rst for more
|
||||||
details). As specified in the PM QoS documentation the requested
|
details). As specified in the PM QoS documentation the requested
|
||||||
parameter will stay in effect until the file descriptor is released.
|
parameter will stay in effect until the file descriptor is released.
|
||||||
For example:
|
For example:
|
||||||
|
|
|
@ -97,7 +97,7 @@ Linux 2.6:
|
||||||
函数定义成返回 -ENOSYS(功能未实现)错误。你还应该尝试确
|
函数定义成返回 -ENOSYS(功能未实现)错误。你还应该尝试确
|
||||||
保你的驱动在什么都不干的情况下将耗电降到最低。要获得驱动
|
保你的驱动在什么都不干的情况下将耗电降到最低。要获得驱动
|
||||||
程序测试的指导,请参阅
|
程序测试的指导,请参阅
|
||||||
Documentation/power/drivers-testing.txt。有关驱动程序电
|
Documentation/power/drivers-testing.rst。有关驱动程序电
|
||||||
源管理问题相对全面的概述,请参阅
|
源管理问题相对全面的概述,请参阅
|
||||||
Documentation/driver-api/pm/devices.rst。
|
Documentation/driver-api/pm/devices.rst。
|
||||||
|
|
||||||
|
|
|
@ -6548,7 +6548,7 @@ M: "Rafael J. Wysocki" <rjw@rjwysocki.net>
|
||||||
M: Pavel Machek <pavel@ucw.cz>
|
M: Pavel Machek <pavel@ucw.cz>
|
||||||
L: linux-pm@vger.kernel.org
|
L: linux-pm@vger.kernel.org
|
||||||
S: Supported
|
S: Supported
|
||||||
F: Documentation/power/freezing-of-tasks.txt
|
F: Documentation/power/freezing-of-tasks.rst
|
||||||
F: include/linux/freezer.h
|
F: include/linux/freezer.h
|
||||||
F: kernel/freezer.c
|
F: kernel/freezer.c
|
||||||
|
|
||||||
|
@ -11942,7 +11942,7 @@ S: Maintained
|
||||||
T: git git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git
|
T: git git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git
|
||||||
F: drivers/opp/
|
F: drivers/opp/
|
||||||
F: include/linux/pm_opp.h
|
F: include/linux/pm_opp.h
|
||||||
F: Documentation/power/opp.txt
|
F: Documentation/power/opp.rst
|
||||||
F: Documentation/devicetree/bindings/opp/
|
F: Documentation/devicetree/bindings/opp/
|
||||||
|
|
||||||
OPL4 DRIVER
|
OPL4 DRIVER
|
||||||
|
@ -12329,7 +12329,7 @@ M: Sam Bobroff <sbobroff@linux.ibm.com>
|
||||||
M: Oliver O'Halloran <oohall@gmail.com>
|
M: Oliver O'Halloran <oohall@gmail.com>
|
||||||
L: linuxppc-dev@lists.ozlabs.org
|
L: linuxppc-dev@lists.ozlabs.org
|
||||||
S: Supported
|
S: Supported
|
||||||
F: Documentation/PCI/pci-error-recovery.txt
|
F: Documentation/PCI/pci-error-recovery.rst
|
||||||
F: drivers/pci/pcie/aer.c
|
F: drivers/pci/pcie/aer.c
|
||||||
F: drivers/pci/pcie/dpc.c
|
F: drivers/pci/pcie/dpc.c
|
||||||
F: drivers/pci/pcie/err.c
|
F: drivers/pci/pcie/err.c
|
||||||
|
@ -12342,7 +12342,7 @@ PCI ERROR RECOVERY
|
||||||
M: Linas Vepstas <linasvepstas@gmail.com>
|
M: Linas Vepstas <linasvepstas@gmail.com>
|
||||||
L: linux-pci@vger.kernel.org
|
L: linux-pci@vger.kernel.org
|
||||||
S: Supported
|
S: Supported
|
||||||
F: Documentation/PCI/pci-error-recovery.txt
|
F: Documentation/PCI/pci-error-recovery.rst
|
||||||
|
|
||||||
PCI MSI DRIVER FOR ALTERA MSI IP
|
PCI MSI DRIVER FOR ALTERA MSI IP
|
||||||
M: Ley Foon Tan <lftan@altera.com>
|
M: Ley Foon Tan <lftan@altera.com>
|
||||||
|
|
|
@ -164,6 +164,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
|
||||||
struct acpi_pci_generic_root_info *ri;
|
struct acpi_pci_generic_root_info *ri;
|
||||||
struct pci_bus *bus, *child;
|
struct pci_bus *bus, *child;
|
||||||
struct acpi_pci_root_ops *root_ops;
|
struct acpi_pci_root_ops *root_ops;
|
||||||
|
struct pci_host_bridge *host;
|
||||||
|
|
||||||
ri = kzalloc(sizeof(*ri), GFP_KERNEL);
|
ri = kzalloc(sizeof(*ri), GFP_KERNEL);
|
||||||
if (!ri)
|
if (!ri)
|
||||||
|
@ -189,8 +190,16 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
|
||||||
if (!bus)
|
if (!bus)
|
||||||
return NULL;
|
return NULL;
|
||||||
|
|
||||||
pci_bus_size_bridges(bus);
|
/* If we must preserve the resource configuration, claim now */
|
||||||
pci_bus_assign_resources(bus);
|
host = pci_find_host_bridge(bus);
|
||||||
|
if (host->preserve_config)
|
||||||
|
pci_bus_claim_resources(bus);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Assign whatever was left unassigned. If we didn't claim above,
|
||||||
|
* this will reassign everything.
|
||||||
|
*/
|
||||||
|
pci_assign_unassigned_root_bus_resources(bus);
|
||||||
|
|
||||||
list_for_each_entry(child, &bus->children, node)
|
list_for_each_entry(child, &bus->children, node)
|
||||||
pcie_bus_configure_settings(child);
|
pcie_bus_configure_settings(child);
|
||||||
|
|
|
@ -2482,7 +2482,7 @@ menuconfig APM
|
||||||
machines with more than one CPU.
|
machines with more than one CPU.
|
||||||
|
|
||||||
In order to use APM, you will need supporting software. For location
|
In order to use APM, you will need supporting software. For location
|
||||||
and more information, read <file:Documentation/power/apm-acpi.txt>
|
and more information, read <file:Documentation/power/apm-acpi.rst>
|
||||||
and the Battery Powered Linux mini-HOWTO, available from
|
and the Battery Powered Linux mini-HOWTO, available from
|
||||||
<http://www.tldp.org/docs.html#howto>.
|
<http://www.tldp.org/docs.html#howto>.
|
||||||
|
|
||||||
|
|
|
@ -881,6 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
|
||||||
int node = acpi_get_node(device->handle);
|
int node = acpi_get_node(device->handle);
|
||||||
struct pci_bus *bus;
|
struct pci_bus *bus;
|
||||||
struct pci_host_bridge *host_bridge;
|
struct pci_host_bridge *host_bridge;
|
||||||
|
union acpi_object *obj;
|
||||||
|
|
||||||
info->root = root;
|
info->root = root;
|
||||||
info->bridge = device;
|
info->bridge = device;
|
||||||
|
@ -917,6 +918,17 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
|
||||||
if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
|
if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
|
||||||
host_bridge->native_ltr = 0;
|
host_bridge->native_ltr = 0;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Evaluate the "PCI Boot Configuration" _DSM Function. If it
|
||||||
|
* exists and returns 0, we must preserve any PCI resource
|
||||||
|
* assignments made by firmware for this host bridge.
|
||||||
|
*/
|
||||||
|
obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
|
||||||
|
IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
|
||||||
|
if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0)
|
||||||
|
host_bridge->preserve_config = 1;
|
||||||
|
ACPI_FREE(obj);
|
||||||
|
|
||||||
pci_scan_child_bus(bus);
|
pci_scan_child_bus(bus);
|
||||||
pci_set_host_bridge_release(host_bridge, acpi_pci_root_release_info,
|
pci_set_host_bridge_release(host_bridge, acpi_pci_root_release_info,
|
||||||
info);
|
info);
|
||||||
|
|
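The two ACPI hunks above work together: `acpi_pci_root_create()` records whether the "PCI Boot Configuration" _DSM asked for firmware assignments to be kept, and `pci_acpi_scan_root()` claims existing resources before assigning whatever is left. The user-space sketch below models only that decision. The types `dsm_result` and `fake_resource`, the helper names, and the address values are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the object returned by a _DSM evaluation. */
enum dsm_type { DSM_TYPE_INTEGER, DSM_TYPE_BUFFER };

struct dsm_result {
	enum dsm_type type;
	unsigned long long value;
};

/* Hypothetical resource: fw_addr is what firmware programmed (0 = nothing). */
struct fake_resource {
	unsigned long fw_addr;
	unsigned long addr;	/* address the OS ends up using */
	bool claimed;
};

/* Preserve only if the _DSM exists, returns an integer, and that integer is 0. */
static bool must_preserve_config(const struct dsm_result *obj)
{
	return obj && obj->type == DSM_TYPE_INTEGER && obj->value == 0;
}

/* Claim step: keep the firmware address wherever firmware programmed one. */
static void claim_resources(struct fake_resource *res, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (res[i].fw_addr) {
			res[i].addr = res[i].fw_addr;
			res[i].claimed = true;
		}
	}
}

/* Assign step: hand out fresh addresses to everything not yet claimed. */
static void assign_unassigned(struct fake_resource *res, size_t n,
			      unsigned long next_free)
{
	for (size_t i = 0; i < n; i++) {
		if (!res[i].claimed) {
			res[i].addr = next_free;
			res[i].claimed = true;
			next_free += 0x1000;
		}
	}
}

/* If nothing was claimed, the assign step ends up reassigning everything. */
static void setup_resources(struct fake_resource *res, size_t n,
			    const struct dsm_result *obj)
{
	assert(res != NULL || n == 0);

	if (must_preserve_config(obj))
		claim_resources(res, n);
	assign_unassigned(res, n, 0x100000);
}
```

With a _DSM result of integer 0, a resource that firmware placed at `0x8000` stays there. With any other result, or no result at all, it is moved like every other resource.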
|
@ -45,7 +45,7 @@ enum i915_drm_suspend_mode {
|
||||||
* to be disabled. This shouldn't happen and we'll print some error messages in
|
* to be disabled. This shouldn't happen and we'll print some error messages in
|
||||||
* case it happens.
|
* case it happens.
|
||||||
*
|
*
|
||||||
* For more, read the Documentation/power/runtime_pm.txt.
|
* For more, read the Documentation/power/runtime_pm.rst.
|
||||||
*/
|
*/
|
||||||
struct intel_runtime_pm {
|
struct intel_runtime_pm {
|
||||||
atomic_t wakeref_count;
|
atomic_t wakeref_count;
|
||||||
|
|
|
@ -11,4 +11,4 @@ config PM_OPP
|
||||||
OPP layer organizes the data internally using device pointers
|
OPP layer organizes the data internally using device pointers
|
||||||
representing individual voltage domains and provides SOC
|
representing individual voltage domains and provides SOC
|
||||||
implementations a ready to use framework to manage OPPs.
|
implementations a ready to use framework to manage OPPs.
|
||||||
For more information, read <file:Documentation/power/opp.txt>
|
For more information, read <file:Documentation/power/opp.rst>
|
||||||
|
|
|
@ -432,7 +432,7 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
|
||||||
* @pdev: PCI device structure
|
* @pdev: PCI device structure
|
||||||
*
|
*
|
||||||
* Returns negative value when PASID capability is not present.
|
* Returns negative value when PASID capability is not present.
|
||||||
* Otherwise it returns the numer of supported PASIDs.
|
* Otherwise it returns the number of supported PASIDs.
|
||||||
*/
|
*/
|
||||||
int pci_max_pasids(struct pci_dev *pdev)
|
int pci_max_pasids(struct pci_dev *pdev)
|
||||||
{
|
{
|
||||||
|
|
|
@ -174,14 +174,14 @@ config PCIE_IPROC_MSI
|
||||||
PCIe controller
|
PCIe controller
|
||||||
|
|
||||||
config PCIE_ALTERA
|
config PCIE_ALTERA
|
||||||
bool "Altera PCIe controller"
|
tristate "Altera PCIe controller"
|
||||||
depends on ARM || NIOS2 || ARM64 || COMPILE_TEST
|
depends on ARM || NIOS2 || ARM64 || COMPILE_TEST
|
||||||
help
|
help
|
||||||
Say Y here if you want to enable PCIe controller support on Altera
|
Say Y here if you want to enable PCIe controller support on Altera
|
||||||
FPGA.
|
FPGA.
|
||||||
|
|
||||||
config PCIE_ALTERA_MSI
|
config PCIE_ALTERA_MSI
|
||||||
bool "Altera PCIe MSI feature"
|
tristate "Altera PCIe MSI feature"
|
||||||
depends on PCIE_ALTERA
|
depends on PCIE_ALTERA
|
||||||
depends on PCI_MSI_IRQ_DOMAIN
|
depends on PCI_MSI_IRQ_DOMAIN
|
||||||
help
|
help
|
||||||
|
|
|
@ -90,7 +90,7 @@ config PCI_EXYNOS
|
||||||
|
|
||||||
config PCI_IMX6
|
config PCI_IMX6
|
||||||
bool "Freescale i.MX6/7/8 PCIe controller"
|
bool "Freescale i.MX6/7/8 PCIe controller"
|
||||||
depends on SOC_IMX6Q || SOC_IMX7D || (ARM64 && ARCH_MXC) || COMPILE_TEST
|
depends on ARCH_MXC || COMPILE_TEST
|
||||||
depends on PCI_MSI_IRQ_DOMAIN
|
depends on PCI_MSI_IRQ_DOMAIN
|
||||||
select PCIE_DW_HOST
|
select PCIE_DW_HOST
|
||||||
|
|
||||||
|
|
|
@ -26,6 +26,7 @@
|
||||||
#include <linux/types.h>
|
#include <linux/types.h>
|
||||||
#include <linux/mfd/syscon.h>
|
#include <linux/mfd/syscon.h>
|
||||||
#include <linux/regmap.h>
|
#include <linux/regmap.h>
|
||||||
|
#include <linux/gpio/consumer.h>
|
||||||
|
|
||||||
#include "../../pci.h"
|
#include "../../pci.h"
|
||||||
#include "pcie-designware.h"
|
#include "pcie-designware.h"
|
||||||
|
|
|
@ -25,10 +25,14 @@
|
||||||
|
|
||||||
#include "pcie-designware.h"
|
#include "pcie-designware.h"
|
||||||
|
|
||||||
|
#define ARMADA8K_PCIE_MAX_LANES PCIE_LNK_X4
|
||||||
|
|
||||||
struct armada8k_pcie {
|
struct armada8k_pcie {
|
||||||
struct dw_pcie *pci;
|
struct dw_pcie *pci;
|
||||||
struct clk *clk;
|
struct clk *clk;
|
||||||
struct clk *clk_reg;
|
struct clk *clk_reg;
|
||||||
|
struct phy *phy[ARMADA8K_PCIE_MAX_LANES];
|
||||||
|
unsigned int phy_count;
|
||||||
};
|
};
|
||||||
|
|
||||||
#define PCIE_VENDOR_REGS_OFFSET 0x8000
|
#define PCIE_VENDOR_REGS_OFFSET 0x8000
|
||||||
|
@ -55,7 +59,7 @@ struct armada8k_pcie {
|
||||||
#define PCIE_ARUSER_REG (PCIE_VENDOR_REGS_OFFSET + 0x5C)
|
#define PCIE_ARUSER_REG (PCIE_VENDOR_REGS_OFFSET + 0x5C)
|
||||||
#define PCIE_AWUSER_REG (PCIE_VENDOR_REGS_OFFSET + 0x60)
|
#define PCIE_AWUSER_REG (PCIE_VENDOR_REGS_OFFSET + 0x60)
|
||||||
/*
|
/*
|
||||||
* AR/AW Cache defauls: Normal memory, Write-Back, Read / Write
|
* AR/AW Cache defaults: Normal memory, Write-Back, Read / Write
|
||||||
* allocate
|
* allocate
|
||||||
*/
|
*/
|
||||||
#define ARCACHE_DEFAULT_VALUE 0x3511
|
#define ARCACHE_DEFAULT_VALUE 0x3511
|
||||||
|
@ -67,6 +71,76 @@ struct armada8k_pcie {
|
||||||
|
|
||||||
#define to_armada8k_pcie(x) dev_get_drvdata((x)->dev)
|
#define to_armada8k_pcie(x) dev_get_drvdata((x)->dev)
|
||||||
|
|
||||||
|
static void armada8k_pcie_disable_phys(struct armada8k_pcie *pcie)
|
||||||
|
{
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; i < ARMADA8K_PCIE_MAX_LANES; i++) {
|
||||||
|
phy_power_off(pcie->phy[i]);
|
||||||
|
phy_exit(pcie->phy[i]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static int armada8k_pcie_enable_phys(struct armada8k_pcie *pcie)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; i < ARMADA8K_PCIE_MAX_LANES; i++) {
|
||||||
|
ret = phy_init(pcie->phy[i]);
|
||||||
|
if (ret)
|
||||||
|
return ret;
|
||||||
|
|
||||||
|
ret = phy_set_mode_ext(pcie->phy[i], PHY_MODE_PCIE,
|
||||||
|
pcie->phy_count);
|
||||||
|
if (ret) {
|
||||||
|
phy_exit(pcie->phy[i]);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
ret = phy_power_on(pcie->phy[i]);
|
||||||
|
if (ret) {
|
||||||
|
phy_exit(pcie->phy[i]);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int armada8k_pcie_setup_phys(struct armada8k_pcie *pcie)
|
||||||
|
{
|
||||||
|
struct dw_pcie *pci = pcie->pci;
|
||||||
|
struct device *dev = pci->dev;
|
||||||
|
struct device_node *node = dev->of_node;
|
||||||
|
int ret = 0;
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; i < ARMADA8K_PCIE_MAX_LANES; i++) {
|
||||||
|
pcie->phy[i] = devm_of_phy_get_by_index(dev, node, i);
|
||||||
|
if (IS_ERR(pcie->phy[i]) &&
|
||||||
|
(PTR_ERR(pcie->phy[i]) == -EPROBE_DEFER))
|
||||||
|
return PTR_ERR(pcie->phy[i]);
|
||||||
|
|
||||||
|
if (IS_ERR(pcie->phy[i])) {
|
||||||
|
pcie->phy[i] = NULL;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
pcie->phy_count++;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Old bindings miss the PHY handle, so just warn if there is no PHY */
|
||||||
|
if (!pcie->phy_count)
|
||||||
|
dev_warn(dev, "No available PHY\n");
|
||||||
|
|
||||||
|
ret = armada8k_pcie_enable_phys(pcie);
|
||||||
|
if (ret)
|
||||||
|
dev_err(dev, "Failed to initialize PHY(s) (%d)\n", ret);
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
static int armada8k_pcie_link_up(struct dw_pcie *pci)
|
static int armada8k_pcie_link_up(struct dw_pcie *pci)
|
||||||
{
|
{
|
||||||
u32 reg;
|
u32 reg;
|
||||||
|
@ -249,14 +323,20 @@ static int armada8k_pcie_probe(struct platform_device *pdev)
|
||||||
goto fail_clkreg;
|
goto fail_clkreg;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
ret = armada8k_pcie_setup_phys(pcie);
|
||||||
|
if (ret)
|
||||||
|
goto fail_clkreg;
|
||||||
|
|
||||||
platform_set_drvdata(pdev, pcie);
|
platform_set_drvdata(pdev, pcie);
|
||||||
|
|
||||||
ret = armada8k_add_pcie_port(pcie, pdev);
|
ret = armada8k_add_pcie_port(pcie, pdev);
|
||||||
if (ret)
|
if (ret)
|
||||||
goto fail_clkreg;
|
goto disable_phy;
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
|
||||||
|
disable_phy:
|
||||||
|
armada8k_pcie_disable_phys(pcie);
|
||||||
fail_clkreg:
|
fail_clkreg:
|
||||||
clk_disable_unprepare(pcie->clk_reg);
|
clk_disable_unprepare(pcie->clk_reg);
|
||||||
fail:
|
fail:
|
||||||
|
|
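The PHY lookup loop in `armada8k_pcie_setup_phys()` treats a missing PHY as optional but propagates a probe deferral. A minimal sketch of that counting logic is shown below, outside the kernel. The `lookup[]` array stands in for the per-lane result of `devm_of_phy_get_by_index()`, and the error constants are illustrative values defined here, not taken from kernel headers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LANES 4
#define ERR_NOENT 2		/* illustrative errno-style values */
#define ERR_PROBE_DEFER 517

/*
 * lookup[i] is the (hypothetical) result of asking for PHY i:
 * 0 on success, a negative error otherwise.
 * On return, present[i] tells whether lane i has a usable PHY.
 * Returns the number of PHYs found, or -ERR_PROBE_DEFER if any
 * lookup asked to be retried later.
 */
static int count_phys(const int lookup[MAX_LANES], int present[MAX_LANES])
{
	int count = 0;

	assert(lookup != NULL && present != NULL);

	for (size_t i = 0; i < MAX_LANES; i++) {
		if (lookup[i] == -ERR_PROBE_DEFER)
			return -ERR_PROBE_DEFER;

		/* Any other error means "no PHY on this lane": skip it. */
		present[i] = (lookup[i] == 0);
		if (present[i])
			count++;
	}

	return count;
}
```

A return value of 0 corresponds to the "No available PHY" warning path in the hunk: old bindings without PHY handles still probe, they just proceed with no PHYs to enable.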
|
@ -311,6 +311,7 @@ void dw_pcie_msi_init(struct pcie_port *pp)
|
||||||
dw_pcie_wr_own_conf(pp, PCIE_MSI_ADDR_HI, 4,
|
dw_pcie_wr_own_conf(pp, PCIE_MSI_ADDR_HI, 4,
|
||||||
upper_32_bits(msi_target));
|
upper_32_bits(msi_target));
|
||||||
}
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(dw_pcie_msi_init);
|
||||||
|
|
||||||
int dw_pcie_host_init(struct pcie_port *pp)
|
int dw_pcie_host_init(struct pcie_port *pp)
|
||||||
{
|
{
|
||||||
|
@ -495,6 +496,16 @@ err_free_msi:
|
||||||
dw_pcie_free_msi(pp);
|
dw_pcie_free_msi(pp);
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(dw_pcie_host_init);
|
||||||
|
|
||||||
|
void dw_pcie_host_deinit(struct pcie_port *pp)
|
||||||
|
{
|
||||||
|
pci_stop_root_bus(pp->root_bus);
|
||||||
|
pci_remove_root_bus(pp->root_bus);
|
||||||
|
if (pci_msi_enabled() && !pp->ops->msi_host_init)
|
||||||
|
dw_pcie_free_msi(pp);
|
||||||
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(dw_pcie_host_deinit);
|
||||||
|
|
||||||
static int dw_pcie_access_other_conf(struct pcie_port *pp, struct pci_bus *bus,
|
static int dw_pcie_access_other_conf(struct pcie_port *pp, struct pci_bus *bus,
|
||||||
u32 devfn, int where, int size, u32 *val,
|
u32 devfn, int where, int size, u32 *val,
|
||||||
|
@ -687,3 +698,4 @@ void dw_pcie_setup_rc(struct pcie_port *pp)
|
||||||
val |= PORT_LOGIC_SPEED_CHANGE;
|
val |= PORT_LOGIC_SPEED_CHANGE;
|
||||||
dw_pcie_wr_own_conf(pp, PCIE_LINK_WIDTH_SPEED_CONTROL, 4, val);
|
dw_pcie_wr_own_conf(pp, PCIE_LINK_WIDTH_SPEED_CONTROL, 4, val);
|
||||||
}
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(dw_pcie_setup_rc);
|
||||||
|
|
|
@ -34,6 +34,7 @@ int dw_pcie_read(void __iomem *addr, int size, u32 *val)
|
||||||
|
|
||||||
return PCIBIOS_SUCCESSFUL;
|
return PCIBIOS_SUCCESSFUL;
|
||||||
}
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(dw_pcie_read);
|
||||||
|
|
||||||
int dw_pcie_write(void __iomem *addr, int size, u32 val)
|
int dw_pcie_write(void __iomem *addr, int size, u32 val)
|
||||||
{
|
{
|
||||||
|
@ -51,69 +52,97 @@ int dw_pcie_write(void __iomem *addr, int size, u32 val)
|
||||||
|
|
||||||
return PCIBIOS_SUCCESSFUL;
|
return PCIBIOS_SUCCESSFUL;
|
||||||
}
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(dw_pcie_write);
|
||||||
|
|
||||||
u32 __dw_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
|
u32 dw_pcie_read_dbi(struct dw_pcie *pci, u32 reg, size_t size)
|
||||||
size_t size)
|
|
||||||
{
|
{
|
||||||
int ret;
|
int ret;
|
||||||
u32 val;
|
u32 val;
|
||||||
|
|
||||||
if (pci->ops->read_dbi)
|
if (pci->ops->read_dbi)
|
||||||
return pci->ops->read_dbi(pci, base, reg, size);
|
return pci->ops->read_dbi(pci, pci->dbi_base, reg, size);
|
||||||
|
|
||||||
ret = dw_pcie_read(base + reg, size, &val);
|
ret = dw_pcie_read(pci->dbi_base + reg, size, &val);
|
||||||
if (ret)
|
if (ret)
|
||||||
dev_err(pci->dev, "Read DBI address failed\n");
|
dev_err(pci->dev, "Read DBI address failed\n");
|
||||||
|
|
||||||
return val;
|
return val;
|
||||||
}
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(dw_pcie_read_dbi);
|
||||||
|
|
||||||
void __dw_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
|
void dw_pcie_write_dbi(struct dw_pcie *pci, u32 reg, size_t size, u32 val)
|
||||||
size_t size, u32 val)
|
|
||||||
{
|
{
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
if (pci->ops->write_dbi) {
|
if (pci->ops->write_dbi) {
|
||||||
pci->ops->write_dbi(pci, base, reg, size, val);
|
pci->ops->write_dbi(pci, pci->dbi_base, reg, size, val);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
ret = dw_pcie_write(base + reg, size, val);
|
ret = dw_pcie_write(pci->dbi_base + reg, size, val);
|
||||||
if (ret)
|
if (ret)
|
||||||
dev_err(pci->dev, "Write DBI address failed\n");
|
dev_err(pci->dev, "Write DBI address failed\n");
|
||||||
}
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(dw_pcie_write_dbi);
|
||||||
|
|
||||||
u32 __dw_pcie_read_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
|
u32 dw_pcie_read_dbi2(struct dw_pcie *pci, u32 reg, size_t size)
|
||||||
size_t size)
|
|
||||||
{
|
{
|
||||||
int ret;
|
int ret;
|
||||||
u32 val;
|
u32 val;
|
||||||
|
|
||||||
if (pci->ops->read_dbi2)
|
if (pci->ops->read_dbi2)
|
||||||
return pci->ops->read_dbi2(pci, base, reg, size);
|
return pci->ops->read_dbi2(pci, pci->dbi_base2, reg, size);
|
||||||
|
|
||||||
ret = dw_pcie_read(base + reg, size, &val);
|
ret = dw_pcie_read(pci->dbi_base2 + reg, size, &val);
|
||||||
if (ret)
|
if (ret)
|
||||||
dev_err(pci->dev, "read DBI address failed\n");
|
dev_err(pci->dev, "read DBI address failed\n");
|
||||||
|
|
||||||
return val;
|
return val;
|
||||||
}
|
}
|
||||||
|
|
||||||
void __dw_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
|
void dw_pcie_write_dbi2(struct dw_pcie *pci, u32 reg, size_t size, u32 val)
|
||||||
size_t size, u32 val)
|
|
||||||
{
|
{
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
if (pci->ops->write_dbi2) {
|
if (pci->ops->write_dbi2) {
|
||||||
pci->ops->write_dbi2(pci, base, reg, size, val);
|
pci->ops->write_dbi2(pci, pci->dbi_base2, reg, size, val);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
ret = dw_pcie_write(base + reg, size, val);
|
ret = dw_pcie_write(pci->dbi_base2 + reg, size, val);
|
||||||
if (ret)
|
if (ret)
|
||||||
dev_err(pci->dev, "write DBI address failed\n");
|
dev_err(pci->dev, "write DBI address failed\n");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
u32 dw_pcie_read_atu(struct dw_pcie *pci, u32 reg, size_t size)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
u32 val;
|
||||||
|
|
||||||
|
if (pci->ops->read_dbi)
|
||||||
|
return pci->ops->read_dbi(pci, pci->atu_base, reg, size);
|
||||||
|
|
||||||
|
ret = dw_pcie_read(pci->atu_base + reg, size, &val);
|
||||||
|
if (ret)
|
||||||
|
dev_err(pci->dev, "Read ATU address failed\n");
|
||||||
|
|
||||||
|
return val;
|
||||||
|
}
|
||||||
|
|
||||||
|
void dw_pcie_write_atu(struct dw_pcie *pci, u32 reg, size_t size, u32 val)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
if (pci->ops->write_dbi) {
|
||||||
|
pci->ops->write_dbi(pci, pci->atu_base, reg, size, val);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
ret = dw_pcie_write(pci->atu_base + reg, size, val);
|
||||||
|
if (ret)
|
||||||
|
dev_err(pci->dev, "Write ATU address failed\n");
|
||||||
|
}
|
||||||
|
|
||||||
static u32 dw_pcie_readl_ob_unroll(struct dw_pcie *pci, u32 index, u32 reg)
|
static u32 dw_pcie_readl_ob_unroll(struct dw_pcie *pci, u32 index, u32 reg)
|
||||||
{
|
{
|
||||||
u32 offset = PCIE_GET_ATU_OUTB_UNR_REG_OFFSET(index);
|
u32 offset = PCIE_GET_ATU_OUTB_UNR_REG_OFFSET(index);
|
||||||
|
|
|
@ -254,14 +254,12 @@ struct dw_pcie {
|
||||||
int dw_pcie_read(void __iomem *addr, int size, u32 *val);
|
int dw_pcie_read(void __iomem *addr, int size, u32 *val);
|
||||||
int dw_pcie_write(void __iomem *addr, int size, u32 val);
|
int dw_pcie_write(void __iomem *addr, int size, u32 val);
|
||||||
|
|
||||||
u32 __dw_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
|
u32 dw_pcie_read_dbi(struct dw_pcie *pci, u32 reg, size_t size);
|
||||||
size_t size);
|
void dw_pcie_write_dbi(struct dw_pcie *pci, u32 reg, size_t size, u32 val);
|
||||||
void __dw_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base, u32 reg,
|
u32 dw_pcie_read_dbi2(struct dw_pcie *pci, u32 reg, size_t size);
|
||||||
size_t size, u32 val);
|
void dw_pcie_write_dbi2(struct dw_pcie *pci, u32 reg, size_t size, u32 val);
|
||||||
u32 __dw_pcie_read_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
|
u32 dw_pcie_read_atu(struct dw_pcie *pci, u32 reg, size_t size);
|
||||||
size_t size);
|
void dw_pcie_write_atu(struct dw_pcie *pci, u32 reg, size_t size, u32 val);
|
||||||
void __dw_pcie_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg,
|
|
||||||
size_t size, u32 val);
|
|
||||||
int dw_pcie_link_up(struct dw_pcie *pci);
|
int dw_pcie_link_up(struct dw_pcie *pci);
|
||||||
int dw_pcie_wait_for_link(struct dw_pcie *pci);
|
int dw_pcie_wait_for_link(struct dw_pcie *pci);
|
||||||
void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index,
|
void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index,
|
||||||
|
@ -275,52 +273,52 @@ void dw_pcie_setup(struct dw_pcie *pci);
|
||||||
|
|
||||||
static inline void dw_pcie_writel_dbi(struct dw_pcie *pci, u32 reg, u32 val)
|
static inline void dw_pcie_writel_dbi(struct dw_pcie *pci, u32 reg, u32 val)
|
||||||
{
|
{
|
||||||
__dw_pcie_write_dbi(pci, pci->dbi_base, reg, 0x4, val);
|
dw_pcie_write_dbi(pci, reg, 0x4, val);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline u32 dw_pcie_readl_dbi(struct dw_pcie *pci, u32 reg)
|
static inline u32 dw_pcie_readl_dbi(struct dw_pcie *pci, u32 reg)
|
||||||
{
|
{
|
||||||
return __dw_pcie_read_dbi(pci, pci->dbi_base, reg, 0x4);
|
return dw_pcie_read_dbi(pci, reg, 0x4);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline void dw_pcie_writew_dbi(struct dw_pcie *pci, u32 reg, u16 val)
|
static inline void dw_pcie_writew_dbi(struct dw_pcie *pci, u32 reg, u16 val)
|
||||||
{
|
{
|
||||||
__dw_pcie_write_dbi(pci, pci->dbi_base, reg, 0x2, val);
|
dw_pcie_write_dbi(pci, reg, 0x2, val);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline u16 dw_pcie_readw_dbi(struct dw_pcie *pci, u32 reg)
|
static inline u16 dw_pcie_readw_dbi(struct dw_pcie *pci, u32 reg)
|
||||||
{
|
{
|
||||||
return __dw_pcie_read_dbi(pci, pci->dbi_base, reg, 0x2);
|
return dw_pcie_read_dbi(pci, reg, 0x2);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline void dw_pcie_writeb_dbi(struct dw_pcie *pci, u32 reg, u8 val)
|
static inline void dw_pcie_writeb_dbi(struct dw_pcie *pci, u32 reg, u8 val)
|
||||||
{
|
{
|
||||||
__dw_pcie_write_dbi(pci, pci->dbi_base, reg, 0x1, val);
|
dw_pcie_write_dbi(pci, reg, 0x1, val);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline u8 dw_pcie_readb_dbi(struct dw_pcie *pci, u32 reg)
|
static inline u8 dw_pcie_readb_dbi(struct dw_pcie *pci, u32 reg)
|
||||||
{
|
{
|
||||||
return __dw_pcie_read_dbi(pci, pci->dbi_base, reg, 0x1);
|
return dw_pcie_read_dbi(pci, reg, 0x1);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline void dw_pcie_writel_dbi2(struct dw_pcie *pci, u32 reg, u32 val)
|
static inline void dw_pcie_writel_dbi2(struct dw_pcie *pci, u32 reg, u32 val)
|
||||||
{
|
{
|
||||||
__dw_pcie_write_dbi2(pci, pci->dbi_base2, reg, 0x4, val);
|
dw_pcie_write_dbi2(pci, reg, 0x4, val);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline u32 dw_pcie_readl_dbi2(struct dw_pcie *pci, u32 reg)
|
static inline u32 dw_pcie_readl_dbi2(struct dw_pcie *pci, u32 reg)
|
||||||
{
|
{
|
||||||
return __dw_pcie_read_dbi2(pci, pci->dbi_base2, reg, 0x4);
|
return dw_pcie_read_dbi2(pci, reg, 0x4);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline void dw_pcie_writel_atu(struct dw_pcie *pci, u32 reg, u32 val)
|
static inline void dw_pcie_writel_atu(struct dw_pcie *pci, u32 reg, u32 val)
|
||||||
{
|
{
|
||||||
__dw_pcie_write_dbi(pci, pci->atu_base, reg, 0x4, val);
|
dw_pcie_write_atu(pci, reg, 0x4, val);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline u32 dw_pcie_readl_atu(struct dw_pcie *pci, u32 reg)
|
static inline u32 dw_pcie_readl_atu(struct dw_pcie *pci, u32 reg)
|
||||||
{
|
{
|
||||||
return __dw_pcie_read_dbi(pci, pci->atu_base, reg, 0x4);
|
return dw_pcie_read_atu(pci, reg, 0x4);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline void dw_pcie_dbi_ro_wr_en(struct dw_pcie *pci)
|
static inline void dw_pcie_dbi_ro_wr_en(struct dw_pcie *pci)
|
||||||
|
@ -351,6 +349,7 @@ void dw_pcie_msi_init(struct pcie_port *pp);
|
||||||
void dw_pcie_free_msi(struct pcie_port *pp);
|
void dw_pcie_free_msi(struct pcie_port *pp);
|
||||||
void dw_pcie_setup_rc(struct pcie_port *pp);
|
void dw_pcie_setup_rc(struct pcie_port *pp);
|
||||||
int dw_pcie_host_init(struct pcie_port *pp);
|
int dw_pcie_host_init(struct pcie_port *pp);
|
||||||
|
void dw_pcie_host_deinit(struct pcie_port *pp);
|
||||||
int dw_pcie_allocate_domains(struct pcie_port *pp);
|
int dw_pcie_allocate_domains(struct pcie_port *pp);
|
||||||
#else
|
#else
|
||||||
static inline irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
|
static inline irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
|
||||||
|
@ -375,6 +374,10 @@ static inline int dw_pcie_host_init(struct pcie_port *pp)
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static inline void dw_pcie_host_deinit(struct pcie_port *pp)
|
||||||
|
{
|
||||||
|
}
|
||||||
|
|
||||||
static inline int dw_pcie_allocate_domains(struct pcie_port *pp)
|
static inline int dw_pcie_allocate_domains(struct pcie_port *pp)
|
||||||
{
|
{
|
||||||
return 0;
|
return 0;
|
||||||
|
|
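The DesignWare hunks above replace the `__dw_pcie_read_dbi(pci, base, ...)` helpers with per-region functions that pick the base address themselves, while still routing through an optional platform hook. The sketch below shows that dispatch pattern in plain user-space C. The `ctrl`, `ctrl_ops`, and `region_read` names are hypothetical, the byte arrays stand in for MMIO regions, and the NULL check on `ops` is an addition of this sketch rather than something the driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ctrl;

struct ctrl_ops {
	/* Optional platform override, called with the base of the region. */
	uint32_t (*read_dbi)(struct ctrl *c, const uint8_t *base,
			     uint32_t reg, size_t size);
};

struct ctrl {
	const uint8_t *dbi_base;
	const uint8_t *atu_base;
	const struct ctrl_ops *ops;
};

/* Default path: assemble a little-endian value of 1, 2 or 4 bytes. */
static uint32_t raw_read(const uint8_t *addr, size_t size)
{
	uint32_t val = 0;

	assert(size == 1 || size == 2 || size == 4);
	for (size_t i = 0; i < size; i++)
		val |= (uint32_t)addr[i] << (8 * i);
	return val;
}

/* Shared dispatch: use the hook if one is installed, else the default. */
static uint32_t region_read(struct ctrl *c, const uint8_t *base,
			    uint32_t reg, size_t size)
{
	if (c->ops && c->ops->read_dbi)
		return c->ops->read_dbi(c, base, reg, size);
	return raw_read(base + reg, size);
}

/* Callers no longer pass a base; each accessor knows its own region. */
static uint32_t ctrl_read_dbi(struct ctrl *c, uint32_t reg, size_t size)
{
	return region_read(c, c->dbi_base, reg, size);
}

/* As in the hunk, the ATU accessor reuses the read_dbi hook. */
static uint32_t ctrl_read_atu(struct ctrl *c, uint32_t reg, size_t size)
{
	return region_read(c, c->atu_base, reg, size);
}

/* Example override that ignores the memory and returns a marker value. */
static uint32_t marker_read(struct ctrl *c, const uint8_t *base,
			    uint32_t reg, size_t size)
{
	(void)c; (void)base; (void)reg; (void)size;
	return 0xdeadbeefu;
}
```

Keeping the base out of the public signature is what lets the functions be exported as ordinary symbols for modular drivers, with the inline `readl`/`writel` wrappers reduced to a fixed size argument.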
|
@ -2,7 +2,7 @@
|
||||||
/*
|
/*
|
||||||
* PCIe host controller driver for Kirin Phone SoCs
|
* PCIe host controller driver for Kirin Phone SoCs
|
||||||
*
|
*
|
||||||
* Copyright (C) 2017 Hilisicon Electronics Co., Ltd.
|
* Copyright (C) 2017 HiSilicon Electronics Co., Ltd.
|
||||||
* http://www.huawei.com
|
* http://www.huawei.com
|
||||||
*
|
*
|
||||||
* Author: Xiaowei Song <songxiaowei@huawei.com>
|
* Author: Xiaowei Song <songxiaowei@huawei.com>
|
||||||
|
|
|
@ -112,10 +112,10 @@ struct qcom_pcie_resources_2_3_2 {
|
||||||
struct regulator_bulk_data supplies[QCOM_PCIE_2_3_2_MAX_SUPPLY];
|
struct regulator_bulk_data supplies[QCOM_PCIE_2_3_2_MAX_SUPPLY];
|
||||||
};
|
};
|
||||||
|
|
||||||
|
#define QCOM_PCIE_2_4_0_MAX_CLOCKS 4
|
||||||
struct qcom_pcie_resources_2_4_0 {
|
struct qcom_pcie_resources_2_4_0 {
|
||||||
struct clk *aux_clk;
|
struct clk_bulk_data clks[QCOM_PCIE_2_4_0_MAX_CLOCKS];
|
||||||
struct clk *master_clk;
|
int num_clks;
|
||||||
struct clk *slave_clk;
|
|
||||||
struct reset_control *axi_m_reset;
|
struct reset_control *axi_m_reset;
|
||||||
struct reset_control *axi_s_reset;
|
struct reset_control *axi_s_reset;
|
||||||
struct reset_control *pipe_reset;
|
struct reset_control *pipe_reset;
|
||||||
|
@ -178,6 +178,8 @@ static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
|
||||||
|
|
||||||
static void qcom_ep_reset_deassert(struct qcom_pcie *pcie)
|
static void qcom_ep_reset_deassert(struct qcom_pcie *pcie)
|
||||||
{
|
{
|
||||||
|
/* Ensure that PERST has been asserted for at least 100 ms */
|
||||||
|
msleep(100);
|
||||||
gpiod_set_value_cansleep(pcie->reset, 0);
|
gpiod_set_value_cansleep(pcie->reset, 0);
|
||||||
usleep_range(PERST_DELAY_US, PERST_DELAY_US + 500);
|
usleep_range(PERST_DELAY_US, PERST_DELAY_US + 500);
|
||||||
}
|
}
|
||||||
|
@ -638,18 +640,20 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie *pcie)
|
||||||
struct qcom_pcie_resources_2_4_0 *res = &pcie->res.v2_4_0;
|
struct qcom_pcie_resources_2_4_0 *res = &pcie->res.v2_4_0;
|
||||||
struct dw_pcie *pci = pcie->pci;
|
struct dw_pcie *pci = pcie->pci;
|
||||||
struct device *dev = pci->dev;
|
struct device *dev = pci->dev;
|
||||||
|
bool is_ipq = of_device_is_compatible(dev->of_node, "qcom,pcie-ipq4019");
|
||||||
|
int ret;
|
||||||
|
|
||||||
res->aux_clk = devm_clk_get(dev, "aux");
|
res->clks[0].id = "aux";
|
||||||
if (IS_ERR(res->aux_clk))
|
res->clks[1].id = "master_bus";
|
||||||
return PTR_ERR(res->aux_clk);
|
res->clks[2].id = "slave_bus";
|
||||||
|
res->clks[3].id = "iface";
|
||||||
|
|
||||||
res->master_clk = devm_clk_get(dev, "master_bus");
|
/* qcom,pcie-ipq4019 is defined without "iface" */
|
||||||
if (IS_ERR(res->master_clk))
|
res->num_clks = is_ipq ? 3 : 4;
|
||||||
return PTR_ERR(res->master_clk);
|
|
||||||
|
|
||||||
res->slave_clk = devm_clk_get(dev, "slave_bus");
|
ret = devm_clk_bulk_get(dev, res->num_clks, res->clks);
|
||||||
if (IS_ERR(res->slave_clk))
|
if (ret < 0)
|
||||||
return PTR_ERR(res->slave_clk);
|
return ret;
|
||||||
|
|
||||||
res->axi_m_reset = devm_reset_control_get_exclusive(dev, "axi_m");
|
res->axi_m_reset = devm_reset_control_get_exclusive(dev, "axi_m");
|
||||||
if (IS_ERR(res->axi_m_reset))
|
if (IS_ERR(res->axi_m_reset))
|
||||||
|
@ -659,27 +663,33 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie *pcie)
|
||||||
if (IS_ERR(res->axi_s_reset))
|
if (IS_ERR(res->axi_s_reset))
|
||||||
return PTR_ERR(res->axi_s_reset);
|
return PTR_ERR(res->axi_s_reset);
|
||||||
|
|
||||||
res->pipe_reset = devm_reset_control_get_exclusive(dev, "pipe");
|
if (is_ipq) {
|
||||||
if (IS_ERR(res->pipe_reset))
|
/*
|
||||||
return PTR_ERR(res->pipe_reset);
|
* These resources relates to the PHY or are secure clocks, but
|
||||||
|
* are controlled here for IPQ4019
|
||||||
|
*/
|
||||||
|
res->pipe_reset = devm_reset_control_get_exclusive(dev, "pipe");
|
||||||
|
if (IS_ERR(res->pipe_reset))
|
||||||
|
return PTR_ERR(res->pipe_reset);
|
||||||
|
|
||||||
res->axi_m_vmid_reset = devm_reset_control_get_exclusive(dev,
|
res->axi_m_vmid_reset = devm_reset_control_get_exclusive(dev,
|
||||||
"axi_m_vmid");
|
"axi_m_vmid");
|
||||||
if (IS_ERR(res->axi_m_vmid_reset))
|
if (IS_ERR(res->axi_m_vmid_reset))
|
||||||
return PTR_ERR(res->axi_m_vmid_reset);
|
return PTR_ERR(res->axi_m_vmid_reset);
|
||||||
|
|
||||||
res->axi_s_xpu_reset = devm_reset_control_get_exclusive(dev,
|
res->axi_s_xpu_reset = devm_reset_control_get_exclusive(dev,
|
||||||
"axi_s_xpu");
|
"axi_s_xpu");
|
||||||
if (IS_ERR(res->axi_s_xpu_reset))
|
if (IS_ERR(res->axi_s_xpu_reset))
|
||||||
return PTR_ERR(res->axi_s_xpu_reset);
|
return PTR_ERR(res->axi_s_xpu_reset);
|
||||||
|
|
||||||
res->parf_reset = devm_reset_control_get_exclusive(dev, "parf");
|
res->parf_reset = devm_reset_control_get_exclusive(dev, "parf");
|
||||||
if (IS_ERR(res->parf_reset))
|
if (IS_ERR(res->parf_reset))
|
||||||
return PTR_ERR(res->parf_reset);
|
return PTR_ERR(res->parf_reset);
|
||||||
|
|
||||||
res->phy_reset = devm_reset_control_get_exclusive(dev, "phy");
|
res->phy_reset = devm_reset_control_get_exclusive(dev, "phy");
|
||||||
if (IS_ERR(res->phy_reset))
|
if (IS_ERR(res->phy_reset))
|
||||||
return PTR_ERR(res->phy_reset);
|
return PTR_ERR(res->phy_reset);
|
||||||
|
}
|
||||||
|
|
||||||
res->axi_m_sticky_reset = devm_reset_control_get_exclusive(dev,
|
res->axi_m_sticky_reset = devm_reset_control_get_exclusive(dev,
|
||||||
"axi_m_sticky");
|
"axi_m_sticky");
|
||||||
|
@ -699,9 +709,11 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie *pcie)
|
||||||
if (IS_ERR(res->ahb_reset))
|
if (IS_ERR(res->ahb_reset))
|
||||||
return PTR_ERR(res->ahb_reset);
|
return PTR_ERR(res->ahb_reset);
|
||||||
|
|
||||||
res->phy_ahb_reset = devm_reset_control_get_exclusive(dev, "phy_ahb");
|
if (is_ipq) {
|
||||||
if (IS_ERR(res->phy_ahb_reset))
|
res->phy_ahb_reset = devm_reset_control_get_exclusive(dev, "phy_ahb");
|
||||||
return PTR_ERR(res->phy_ahb_reset);
|
if (IS_ERR(res->phy_ahb_reset))
|
||||||
|
return PTR_ERR(res->phy_ahb_reset);
|
||||||
|
}
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
@ -719,9 +731,7 @@ static void qcom_pcie_deinit_2_4_0(struct qcom_pcie *pcie)
|
||||||
reset_control_assert(res->axi_m_sticky_reset);
|
reset_control_assert(res->axi_m_sticky_reset);
|
||||||
reset_control_assert(res->pwr_reset);
|
reset_control_assert(res->pwr_reset);
|
||||||
reset_control_assert(res->ahb_reset);
|
reset_control_assert(res->ahb_reset);
|
||||||
clk_disable_unprepare(res->aux_clk);
|
clk_bulk_disable_unprepare(res->num_clks, res->clks);
|
||||||
clk_disable_unprepare(res->master_clk);
|
|
||||||
clk_disable_unprepare(res->slave_clk);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
|
static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
|
||||||
|
@ -850,23 +860,9 @@ static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
|
||||||
|
|
||||||
usleep_range(10000, 12000);
|
usleep_range(10000, 12000);
|
||||||
|
|
||||||
ret = clk_prepare_enable(res->aux_clk);
|
ret = clk_bulk_prepare_enable(res->num_clks, res->clks);
|
||||||
if (ret) {
|
if (ret)
|
||||||
dev_err(dev, "cannot prepare/enable iface clock\n");
|
goto err_clks;
|
||||||
goto err_clk_aux;
|
|
||||||
}
|
|
||||||
|
|
||||||
ret = clk_prepare_enable(res->master_clk);
|
|
||||||
if (ret) {
|
|
||||||
dev_err(dev, "cannot prepare/enable core clock\n");
|
|
||||||
goto err_clk_axi_m;
|
|
||||||
}
|
|
||||||
|
|
||||||
ret = clk_prepare_enable(res->slave_clk);
|
|
||||||
if (ret) {
|
|
||||||
dev_err(dev, "cannot prepare/enable phy clock\n");
|
|
||||||
goto err_clk_axi_s;
|
|
||||||
}
|
|
||||||
|
|
||||||
/* enable PCIe clocks and resets */
|
/* enable PCIe clocks and resets */
|
||||||
val = readl(pcie->parf + PCIE20_PARF_PHY_CTRL);
|
val = readl(pcie->parf + PCIE20_PARF_PHY_CTRL);
|
||||||
|
@ -891,11 +887,7 @@ static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
|
||||||
err_clk_axi_s:
|
err_clks:
|
||||||
clk_disable_unprepare(res->master_clk);
|
|
||||||
err_clk_axi_m:
|
|
||||||
clk_disable_unprepare(res->aux_clk);
|
|
||||||
err_clk_aux:
|
|
||||||
reset_control_assert(res->ahb_reset);
|
reset_control_assert(res->ahb_reset);
|
||||||
err_rst_ahb:
|
err_rst_ahb:
|
||||||
reset_control_assert(res->pwr_reset);
|
reset_control_assert(res->pwr_reset);
|
||||||
|
@ -1289,6 +1281,7 @@ static const struct of_device_id qcom_pcie_match[] = {
|
||||||
{ .compatible = "qcom,pcie-msm8996", .data = &ops_2_3_2 },
|
{ .compatible = "qcom,pcie-msm8996", .data = &ops_2_3_2 },
|
||||||
{ .compatible = "qcom,pcie-ipq8074", .data = &ops_2_3_3 },
|
{ .compatible = "qcom,pcie-ipq8074", .data = &ops_2_3_3 },
|
||||||
{ .compatible = "qcom,pcie-ipq4019", .data = &ops_2_4_0 },
|
{ .compatible = "qcom,pcie-ipq4019", .data = &ops_2_4_0 },
|
||||||
|
{ .compatible = "qcom,pcie-qcs404", .data = &ops_2_4_0 },
|
||||||
{ }
|
{ }
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
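The Qualcomm 2.4.0 hunks replace three individually managed clocks with one table whose used length depends on the compatible string. The sketch below illustrates only that selection step. `pick_clock_ids` is a hypothetical helper, and a plain `strcmp()` stands in for `of_device_is_compatible()`, which does more than an exact string match in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CLOCKS 4

/*
 * Fill ids[] with clock names and return how many of them are used.
 * Mirrors the hunk: the ipq4019 binding is defined without "iface",
 * so only the first three entries are requested for it.
 */
static int pick_clock_ids(const char *compatible, const char *ids[MAX_CLOCKS])
{
	assert(compatible != NULL && ids != NULL);

	ids[0] = "aux";
	ids[1] = "master_bus";
	ids[2] = "slave_bus";
	ids[3] = "iface";

	return strcmp(compatible, "qcom,pcie-ipq4019") == 0 ? 3 : 4;
}
```

Because the optional clock sits last in the table, a single count is enough for the bulk get, enable, and disable calls, and the per-clock error labels collapse into one.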
|
@ -308,7 +308,7 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
|
||||||
|
|
||||||
advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_MASK_REG);
|
advk_writel(pcie, PCIE_ISR1_ALL_MASK, PCIE_ISR1_MASK_REG);
|
||||||
|
|
||||||
/* Unmask all MSI's */
|
/* Unmask all MSIs */
|
||||||
advk_writel(pcie, 0, PCIE_MSI_MASK_REG);
|
advk_writel(pcie, 0, PCIE_MSI_MASK_REG);
|
||||||
|
|
||||||
/* Enable summary interrupt for GIC SPI source */
|
/* Enable summary interrupt for GIC SPI source */
|
||||||
|
|
|
@ -1875,6 +1875,7 @@ static void hv_pci_devices_present(struct hv_pcibus_device *hbus,
|
||||||
static void hv_eject_device_work(struct work_struct *work)
|
static void hv_eject_device_work(struct work_struct *work)
|
||||||
{
|
{
|
||||||
struct pci_eject_response *ejct_pkt;
|
struct pci_eject_response *ejct_pkt;
|
||||||
|
struct hv_pcibus_device *hbus;
|
||||||
struct hv_pci_dev *hpdev;
|
struct hv_pci_dev *hpdev;
|
||||||
struct pci_dev *pdev;
|
struct pci_dev *pdev;
|
||||||
unsigned long flags;
|
unsigned long flags;
|
||||||
|
@ -1885,6 +1886,7 @@ static void hv_eject_device_work(struct work_struct *work)
|
||||||
} ctxt;
|
} ctxt;
|
||||||
|
|
||||||
hpdev = container_of(work, struct hv_pci_dev, wrk);
|
hpdev = container_of(work, struct hv_pci_dev, wrk);
|
||||||
|
hbus = hpdev->hbus;
|
||||||
|
|
||||||
WARN_ON(hpdev->state != hv_pcichild_ejecting);
|
WARN_ON(hpdev->state != hv_pcichild_ejecting);
|
||||||
|
|
||||||
|
@ -1895,8 +1897,7 @@ static void hv_eject_device_work(struct work_struct *work)
|
||||||
* because hbus->pci_bus may not exist yet.
|
* because hbus->pci_bus may not exist yet.
|
||||||
*/
|
*/
|
||||||
wslot = wslot_to_devfn(hpdev->desc.win_slot.slot);
|
wslot = wslot_to_devfn(hpdev->desc.win_slot.slot);
|
||||||
pdev = pci_get_domain_bus_and_slot(hpdev->hbus->sysdata.domain, 0,
|
pdev = pci_get_domain_bus_and_slot(hbus->sysdata.domain, 0, wslot);
|
||||||
wslot);
|
|
||||||
if (pdev) {
|
if (pdev) {
|
||||||
pci_lock_rescan_remove();
|
pci_lock_rescan_remove();
|
||||||
pci_stop_and_remove_bus_device(pdev);
|
pci_stop_and_remove_bus_device(pdev);
|
||||||
|
@ -1904,9 +1905,9 @@ static void hv_eject_device_work(struct work_struct *work)
|
||||||
pci_unlock_rescan_remove();
|
pci_unlock_rescan_remove();
|
||||||
}
|
}
|
||||||
|
|
||||||
spin_lock_irqsave(&hpdev->hbus->device_list_lock, flags);
|
spin_lock_irqsave(&hbus->device_list_lock, flags);
|
||||||
list_del(&hpdev->list_entry);
|
list_del(&hpdev->list_entry);
|
||||||
spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);
|
spin_unlock_irqrestore(&hbus->device_list_lock, flags);
|
||||||
|
|
||||||
if (hpdev->pci_slot)
|
if (hpdev->pci_slot)
|
||||||
pci_destroy_slot(hpdev->pci_slot);
|
pci_destroy_slot(hpdev->pci_slot);
|
||||||
|
@ -1915,7 +1916,7 @@ static void hv_eject_device_work(struct work_struct *work)
|
||||||
ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
|
ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
|
||||||
ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
|
ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
|
||||||
ejct_pkt->wslot.slot = hpdev->desc.win_slot.slot;
|
ejct_pkt->wslot.slot = hpdev->desc.win_slot.slot;
|
||||||
vmbus_sendpacket(hpdev->hbus->hdev->channel, ejct_pkt,
|
vmbus_sendpacket(hbus->hdev->channel, ejct_pkt,
|
||||||
sizeof(*ejct_pkt), (unsigned long)&ctxt.pkt,
|
sizeof(*ejct_pkt), (unsigned long)&ctxt.pkt,
|
||||||
VM_PKT_DATA_INBAND, 0);
|
VM_PKT_DATA_INBAND, 0);
|
||||||
|
|
||||||
|
@ -1924,7 +1925,9 @@ static void hv_eject_device_work(struct work_struct *work)
|
||||||
/* For the two refs got in new_pcichild_device() */
|
/* For the two refs got in new_pcichild_device() */
|
||||||
put_pcichild(hpdev);
|
put_pcichild(hpdev);
|
||||||
put_pcichild(hpdev);
|
put_pcichild(hpdev);
|
||||||
put_hvpcibus(hpdev->hbus);
|
/* hpdev has been freed. Do not use it any more. */
|
||||||
|
|
||||||
|
put_hvpcibus(hbus);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
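Note on the hv_eject_device_work() hunk above: the fix reads hpdev->hbus into a local before the two put_pcichild() calls, because the second put may free hpdev. The following is a minimal user-space sketch of that pattern, not the driver's code. `struct bus`, `struct child`, `put_bus()`, `put_child()`, `eject()` and the `child_frees` counter are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the bus and child device objects. */
struct bus {
	int refs;
};

struct child {
	int refs;
	struct bus *bus;
};

static int child_frees;	/* number of child objects freed so far */

static void put_bus(struct bus *b)
{
	b->refs--;
}

static void put_child(struct child *c)
{
	/* The last reference frees the object. */
	if (--c->refs == 0) {
		child_frees++;
		free(c);
	}
}

/* Drops both child references, then the bus reference. */
static void eject(struct child *c)
{
	struct bus *b = c->bus;	/* cache while c is still valid */

	put_child(c);
	put_child(c);
	/* c has been freed; only the cached pointer may be used now. */
	put_bus(b);
}
```

Writing `put_bus(c->bus)` as the last statement instead would dereference freed memory, which is the bug class the hunk removes.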
||||||
|
|
|
@ -17,6 +17,7 @@
|
||||||
#include <linux/debugfs.h>
|
#include <linux/debugfs.h>
|
||||||
#include <linux/delay.h>
|
#include <linux/delay.h>
|
||||||
#include <linux/export.h>
|
#include <linux/export.h>
|
||||||
|
#include <linux/gpio/consumer.h>
|
||||||
#include <linux/interrupt.h>
|
#include <linux/interrupt.h>
|
||||||
#include <linux/iopoll.h>
|
#include <linux/iopoll.h>
|
||||||
#include <linux/irq.h>
|
#include <linux/irq.h>
|
||||||
|
@ -30,6 +31,7 @@
|
||||||
#include <linux/of_platform.h>
|
#include <linux/of_platform.h>
|
||||||
#include <linux/pci.h>
|
#include <linux/pci.h>
|
||||||
#include <linux/phy/phy.h>
|
#include <linux/phy/phy.h>
|
||||||
|
#include <linux/pinctrl/consumer.h>
|
||||||
#include <linux/platform_device.h>
|
#include <linux/platform_device.h>
|
||||||
#include <linux/reset.h>
|
#include <linux/reset.h>
|
||||||
#include <linux/sizes.h>
|
#include <linux/sizes.h>
|
||||||
|
@ -95,7 +97,8 @@
|
||||||
#define AFI_MSI_EN_VEC7 0xa8
|
#define AFI_MSI_EN_VEC7 0xa8
|
||||||
|
|
||||||
#define AFI_CONFIGURATION 0xac
|
#define AFI_CONFIGURATION 0xac
|
||||||
#define AFI_CONFIGURATION_EN_FPCI (1 << 0)
|
#define AFI_CONFIGURATION_EN_FPCI (1 << 0)
|
||||||
|
#define AFI_CONFIGURATION_CLKEN_OVERRIDE (1 << 31)
|
||||||
|
|
||||||
#define AFI_FPCI_ERROR_MASKS 0xb0
|
#define AFI_FPCI_ERROR_MASKS 0xb0
|
||||||
|
|
||||||
|
@ -159,13 +162,14 @@
|
||||||
#define AFI_PCIE_CONFIG_SM2TMS0_XBAR_CONFIG_211 (0x1 << 20)
|
#define AFI_PCIE_CONFIG_SM2TMS0_XBAR_CONFIG_211 (0x1 << 20)
|
||||||
#define AFI_PCIE_CONFIG_SM2TMS0_XBAR_CONFIG_411 (0x2 << 20)
|
#define AFI_PCIE_CONFIG_SM2TMS0_XBAR_CONFIG_411 (0x2 << 20)
|
||||||
#define AFI_PCIE_CONFIG_SM2TMS0_XBAR_CONFIG_111 (0x2 << 20)
|
#define AFI_PCIE_CONFIG_SM2TMS0_XBAR_CONFIG_111 (0x2 << 20)
|
||||||
|
#define AFI_PCIE_CONFIG_PCIE_CLKREQ_GPIO(x) (1 << ((x) + 29))
|
||||||
|
#define AFI_PCIE_CONFIG_PCIE_CLKREQ_GPIO_ALL (0x7 << 29)
|
||||||
|
|
||||||
#define AFI_FUSE 0x104
|
#define AFI_FUSE 0x104
|
||||||
#define AFI_FUSE_PCIE_T0_GEN2_DIS (1 << 2)
|
#define AFI_FUSE_PCIE_T0_GEN2_DIS (1 << 2)
|
||||||
|
|
||||||
#define AFI_PEX0_CTRL 0x110
|
#define AFI_PEX0_CTRL 0x110
|
||||||
#define AFI_PEX1_CTRL 0x118
|
#define AFI_PEX1_CTRL 0x118
|
||||||
#define AFI_PEX2_CTRL 0x128
|
|
||||||
#define AFI_PEX_CTRL_RST (1 << 0)
|
#define AFI_PEX_CTRL_RST (1 << 0)
|
||||||
#define AFI_PEX_CTRL_CLKREQ_EN (1 << 1)
|
#define AFI_PEX_CTRL_CLKREQ_EN (1 << 1)
|
||||||
#define AFI_PEX_CTRL_REFCLK_EN (1 << 3)
|
#define AFI_PEX_CTRL_REFCLK_EN (1 << 3)
|
||||||
|
@ -177,20 +181,74 @@
|
||||||
|
|
||||||
#define AFI_PEXBIAS_CTRL_0 0x168
|
#define AFI_PEXBIAS_CTRL_0 0x168
|
||||||
|
|
||||||
|
#define RP_PRIV_XP_DL 0x00000494
|
||||||
|
#define RP_PRIV_XP_DL_GEN2_UPD_FC_TSHOLD (0x1ff << 1)
|
||||||
|
|
||||||
|
#define RP_RX_HDR_LIMIT 0x00000e00
|
||||||
|
#define RP_RX_HDR_LIMIT_PW_MASK (0xff << 8)
|
||||||
|
#define RP_RX_HDR_LIMIT_PW (0x0e << 8)
|
||||||
|
|
||||||
|
#define RP_ECTL_2_R1 0x00000e84
|
||||||
|
#define RP_ECTL_2_R1_RX_CTLE_1C_MASK 0xffff
|
||||||
|
|
||||||
|
#define RP_ECTL_4_R1 0x00000e8c
|
||||||
|
#define RP_ECTL_4_R1_RX_CDR_CTRL_1C_MASK (0xffff << 16)
|
||||||
|
#define RP_ECTL_4_R1_RX_CDR_CTRL_1C_SHIFT 16
|
||||||
|
|
||||||
|
#define RP_ECTL_5_R1 0x00000e90
|
||||||
|
#define RP_ECTL_5_R1_RX_EQ_CTRL_L_1C_MASK 0xffffffff
|
||||||
|
|
||||||
|
#define RP_ECTL_6_R1 0x00000e94
|
||||||
|
#define RP_ECTL_6_R1_RX_EQ_CTRL_H_1C_MASK 0xffffffff
|
||||||
|
|
||||||
|
#define RP_ECTL_2_R2 0x00000ea4
|
||||||
|
#define RP_ECTL_2_R2_RX_CTLE_1C_MASK 0xffff
|
||||||
|
|
||||||
|
#define RP_ECTL_4_R2 0x00000eac
|
||||||
|
#define RP_ECTL_4_R2_RX_CDR_CTRL_1C_MASK (0xffff << 16)
|
||||||
|
#define RP_ECTL_4_R2_RX_CDR_CTRL_1C_SHIFT 16
|
||||||
|
|
||||||
|
#define RP_ECTL_5_R2 0x00000eb0
|
||||||
|
#define RP_ECTL_5_R2_RX_EQ_CTRL_L_1C_MASK 0xffffffff
|
||||||
|
|
||||||
|
#define RP_ECTL_6_R2 0x00000eb4
|
||||||
|
#define RP_ECTL_6_R2_RX_EQ_CTRL_H_1C_MASK 0xffffffff
|
||||||
|
|
||||||
#define RP_VEND_XP 0x00000f00
|
#define RP_VEND_XP 0x00000f00
|
||||||
#define RP_VEND_XP_DL_UP (1 << 30)
|
#define RP_VEND_XP_DL_UP (1 << 30)
|
||||||
|
#define RP_VEND_XP_OPPORTUNISTIC_ACK (1 << 27)
|
||||||
|
#define RP_VEND_XP_OPPORTUNISTIC_UPDATEFC (1 << 28)
|
||||||
|
#define RP_VEND_XP_UPDATE_FC_THRESHOLD_MASK (0xff << 18)
|
||||||
|
|
||||||
|
#define RP_VEND_CTL0 0x00000f44
|
||||||
|
#define RP_VEND_CTL0_DSK_RST_PULSE_WIDTH_MASK (0xf << 12)
|
||||||
|
#define RP_VEND_CTL0_DSK_RST_PULSE_WIDTH (0x9 << 12)
|
||||||
|
|
||||||
|
#define RP_VEND_CTL1 0x00000f48
|
||||||
|
#define RP_VEND_CTL1_ERPT (1 << 13)
|
||||||
|
|
||||||
|
#define RP_VEND_XP_BIST 0x00000f4c
|
||||||
|
#define RP_VEND_XP_BIST_GOTO_L1_L2_AFTER_DLLP_DONE (1 << 28)
|
||||||
|
|
||||||
#define RP_VEND_CTL2 0x00000fa8
|
#define RP_VEND_CTL2 0x00000fa8
|
||||||
#define RP_VEND_CTL2_PCA_ENABLE (1 << 7)
|
#define RP_VEND_CTL2_PCA_ENABLE (1 << 7)
|
||||||
|
|
||||||
#define RP_PRIV_MISC 0x00000fe0
|
#define RP_PRIV_MISC 0x00000fe0
|
||||||
#define RP_PRIV_MISC_PRSNT_MAP_EP_PRSNT (0xe << 0)
|
#define RP_PRIV_MISC_PRSNT_MAP_EP_PRSNT (0xe << 0)
|
||||||
#define RP_PRIV_MISC_PRSNT_MAP_EP_ABSNT (0xf << 0)
|
#define RP_PRIV_MISC_PRSNT_MAP_EP_ABSNT (0xf << 0)
|
||||||
|
#define RP_PRIV_MISC_CTLR_CLK_CLAMP_THRESHOLD_MASK (0x7f << 16)
|
||||||
|
#define RP_PRIV_MISC_CTLR_CLK_CLAMP_THRESHOLD (0xf << 16)
|
||||||
|
#define RP_PRIV_MISC_CTLR_CLK_CLAMP_ENABLE (1 << 23)
|
||||||
|
#define RP_PRIV_MISC_TMS_CLK_CLAMP_THRESHOLD_MASK (0x7f << 24)
|
||||||
|
#define RP_PRIV_MISC_TMS_CLK_CLAMP_THRESHOLD (0xf << 24)
|
||||||
|
#define RP_PRIV_MISC_TMS_CLK_CLAMP_ENABLE (1 << 31)
|
||||||
|
|
||||||
#define RP_LINK_CONTROL_STATUS 0x00000090
|
#define RP_LINK_CONTROL_STATUS 0x00000090
|
||||||
#define RP_LINK_CONTROL_STATUS_DL_LINK_ACTIVE 0x20000000
|
#define RP_LINK_CONTROL_STATUS_DL_LINK_ACTIVE 0x20000000
|
||||||
#define RP_LINK_CONTROL_STATUS_LINKSTAT_MASK 0x3fff0000
|
#define RP_LINK_CONTROL_STATUS_LINKSTAT_MASK 0x3fff0000
|
||||||
|
|
||||||
|
#define RP_LINK_CONTROL_STATUS_2 0x000000b0
|
||||||
|
|
||||||
#define PADS_CTL_SEL 0x0000009c
|
#define PADS_CTL_SEL 0x0000009c
|
||||||
|
|
||||||
#define PADS_CTL 0x000000a0
|
#define PADS_CTL 0x000000a0
|
||||||
|
@ -226,6 +284,7 @@
|
||||||
#define PADS_REFCLK_CFG_DRVI_SHIFT 12 /* 15:12 */
|
#define PADS_REFCLK_CFG_DRVI_SHIFT 12 /* 15:12 */
|
||||||
|
|
||||||
#define PME_ACK_TIMEOUT 10000
|
#define PME_ACK_TIMEOUT 10000
|
||||||
|
#define LINK_RETRAIN_TIMEOUT 100000 /* in usec */
|
||||||
|
|
||||||
struct tegra_msi {
|
struct tegra_msi {
|
||||||
struct msi_controller chip;
|
struct msi_controller chip;
|
||||||
|
@ -249,10 +308,12 @@ struct tegra_pcie_soc {
|
||||||
unsigned int num_ports;
|
unsigned int num_ports;
|
||||||
const struct tegra_pcie_port_soc *ports;
|
const struct tegra_pcie_port_soc *ports;
|
||||||
unsigned int msi_base_shift;
|
unsigned int msi_base_shift;
|
||||||
|
unsigned long afi_pex2_ctrl;
|
||||||
u32 pads_pll_ctl;
|
u32 pads_pll_ctl;
|
||||||
u32 tx_ref_sel;
|
u32 tx_ref_sel;
|
||||||
u32 pads_refclk_cfg0;
|
u32 pads_refclk_cfg0;
|
||||||
u32 pads_refclk_cfg1;
|
u32 pads_refclk_cfg1;
|
||||||
|
u32 update_fc_threshold;
|
||||||
bool has_pex_clkreq_en;
|
bool has_pex_clkreq_en;
|
||||||
bool has_pex_bias_ctrl;
|
bool has_pex_bias_ctrl;
|
||||||
bool has_intr_prsnt_sense;
|
bool has_intr_prsnt_sense;
|
||||||
|
@ -260,6 +321,24 @@ struct tegra_pcie_soc {
|
||||||
bool has_gen2;
|
bool has_gen2;
|
||||||
bool force_pca_enable;
|
bool force_pca_enable;
|
||||||
bool program_uphy;
|
bool program_uphy;
|
||||||
|
bool update_clamp_threshold;
|
||||||
|
bool program_deskew_time;
|
||||||
|
bool raw_violation_fixup;
|
||||||
|
bool update_fc_timer;
|
||||||
|
bool has_cache_bars;
|
||||||
|
struct {
|
||||||
|
struct {
|
||||||
|
u32 rp_ectl_2_r1;
|
||||||
|
u32 rp_ectl_4_r1;
|
||||||
|
u32 rp_ectl_5_r1;
|
||||||
|
u32 rp_ectl_6_r1;
|
||||||
|
u32 rp_ectl_2_r2;
|
||||||
|
u32 rp_ectl_4_r2;
|
||||||
|
u32 rp_ectl_5_r2;
|
||||||
|
u32 rp_ectl_6_r2;
|
||||||
|
} regs;
|
||||||
|
bool enable;
|
||||||
|
} ectl;
|
||||||
};
|
};
|
||||||
|
|
||||||
static inline struct tegra_msi *to_tegra_msi(struct msi_controller *chip)
|
static inline struct tegra_msi *to_tegra_msi(struct msi_controller *chip)
|
||||||
|
@ -321,6 +400,8 @@ struct tegra_pcie_port {
|
||||||
unsigned int lanes;
|
unsigned int lanes;
|
||||||
|
|
||||||
struct phy **phys;
|
struct phy **phys;
|
||||||
|
|
||||||
|
struct gpio_desc *reset_gpio;
|
||||||
};
|
};
|
||||||
|
|
||||||
struct tegra_pcie_bus {
|
struct tegra_pcie_bus {
|
||||||
|
@ -440,6 +521,7 @@ static struct pci_ops tegra_pcie_ops = {
|
||||||
|
|
||||||
static unsigned long tegra_pcie_port_get_pex_ctrl(struct tegra_pcie_port *port)
|
static unsigned long tegra_pcie_port_get_pex_ctrl(struct tegra_pcie_port *port)
|
||||||
{
|
{
|
||||||
|
const struct tegra_pcie_soc *soc = port->pcie->soc;
|
||||||
unsigned long ret = 0;
|
unsigned long ret = 0;
|
||||||
|
|
||||||
switch (port->index) {
|
switch (port->index) {
|
||||||
|
@ -452,7 +534,7 @@ static unsigned long tegra_pcie_port_get_pex_ctrl(struct tegra_pcie_port *port)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case 2:
|
case 2:
|
||||||
ret = AFI_PEX2_CTRL;
|
ret = soc->afi_pex2_ctrl;
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -465,15 +547,162 @@ static void tegra_pcie_port_reset(struct tegra_pcie_port *port)
|
||||||
unsigned long value;
|
unsigned long value;
|
||||||
|
|
||||||
/* pulse reset signal */
|
/* pulse reset signal */
|
||||||
value = afi_readl(port->pcie, ctrl);
|
if (port->reset_gpio) {
|
||||||
value &= ~AFI_PEX_CTRL_RST;
|
gpiod_set_value(port->reset_gpio, 1);
|
||||||
afi_writel(port->pcie, value, ctrl);
|
} else {
|
||||||
|
value = afi_readl(port->pcie, ctrl);
|
||||||
|
value &= ~AFI_PEX_CTRL_RST;
|
||||||
|
afi_writel(port->pcie, value, ctrl);
|
||||||
|
}
|
||||||
|
|
||||||
usleep_range(1000, 2000);
|
usleep_range(1000, 2000);
|
||||||
|
|
||||||
value = afi_readl(port->pcie, ctrl);
|
if (port->reset_gpio) {
|
||||||
value |= AFI_PEX_CTRL_RST;
|
gpiod_set_value(port->reset_gpio, 0);
|
||||||
afi_writel(port->pcie, value, ctrl);
|
} else {
|
||||||
|
value = afi_readl(port->pcie, ctrl);
|
||||||
|
value |= AFI_PEX_CTRL_RST;
|
||||||
|
afi_writel(port->pcie, value, ctrl);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static void tegra_pcie_enable_rp_features(struct tegra_pcie_port *port)
|
||||||
|
{
|
||||||
|
const struct tegra_pcie_soc *soc = port->pcie->soc;
|
||||||
|
u32 value;
|
||||||
|
|
||||||
|
/* Enable AER capability */
|
||||||
|
value = readl(port->base + RP_VEND_CTL1);
|
||||||
|
value |= RP_VEND_CTL1_ERPT;
|
||||||
|
writel(value, port->base + RP_VEND_CTL1);
|
||||||
|
|
||||||
|
/* Optimal settings to enhance bandwidth */
|
||||||
|
value = readl(port->base + RP_VEND_XP);
|
||||||
|
value |= RP_VEND_XP_OPPORTUNISTIC_ACK;
|
||||||
|
value |= RP_VEND_XP_OPPORTUNISTIC_UPDATEFC;
|
||||||
|
writel(value, port->base + RP_VEND_XP);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* LTSSM will wait for DLLP to finish before entering L1 or L2,
|
||||||
|
* to avoid truncation of PM messages which results in receiver errors
|
||||||
|
*/
|
||||||
|
value = readl(port->base + RP_VEND_XP_BIST);
|
||||||
|
value |= RP_VEND_XP_BIST_GOTO_L1_L2_AFTER_DLLP_DONE;
|
||||||
|
writel(value, port->base + RP_VEND_XP_BIST);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_PRIV_MISC);
|
||||||
|
value |= RP_PRIV_MISC_CTLR_CLK_CLAMP_ENABLE;
|
||||||
|
value |= RP_PRIV_MISC_TMS_CLK_CLAMP_ENABLE;
|
||||||
|
|
||||||
|
if (soc->update_clamp_threshold) {
|
||||||
|
value &= ~(RP_PRIV_MISC_CTLR_CLK_CLAMP_THRESHOLD_MASK |
|
||||||
|
RP_PRIV_MISC_TMS_CLK_CLAMP_THRESHOLD_MASK);
|
||||||
|
value |= RP_PRIV_MISC_CTLR_CLK_CLAMP_THRESHOLD |
|
||||||
|
RP_PRIV_MISC_TMS_CLK_CLAMP_THRESHOLD;
|
||||||
|
}
|
||||||
|
|
||||||
|
writel(value, port->base + RP_PRIV_MISC);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void tegra_pcie_program_ectl_settings(struct tegra_pcie_port *port)
|
||||||
|
{
|
||||||
|
const struct tegra_pcie_soc *soc = port->pcie->soc;
|
||||||
|
u32 value;
|
||||||
|
|
||||||
|
value = readl(port->base + RP_ECTL_2_R1);
|
||||||
|
value &= ~RP_ECTL_2_R1_RX_CTLE_1C_MASK;
|
||||||
|
value |= soc->ectl.regs.rp_ectl_2_r1;
|
||||||
|
writel(value, port->base + RP_ECTL_2_R1);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_ECTL_4_R1);
|
||||||
|
value &= ~RP_ECTL_4_R1_RX_CDR_CTRL_1C_MASK;
|
||||||
|
value |= soc->ectl.regs.rp_ectl_4_r1 <<
|
||||||
|
RP_ECTL_4_R1_RX_CDR_CTRL_1C_SHIFT;
|
||||||
|
writel(value, port->base + RP_ECTL_4_R1);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_ECTL_5_R1);
|
||||||
|
value &= ~RP_ECTL_5_R1_RX_EQ_CTRL_L_1C_MASK;
|
||||||
|
value |= soc->ectl.regs.rp_ectl_5_r1;
|
||||||
|
writel(value, port->base + RP_ECTL_5_R1);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_ECTL_6_R1);
|
||||||
|
value &= ~RP_ECTL_6_R1_RX_EQ_CTRL_H_1C_MASK;
|
||||||
|
value |= soc->ectl.regs.rp_ectl_6_r1;
|
||||||
|
writel(value, port->base + RP_ECTL_6_R1);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_ECTL_2_R2);
|
||||||
|
value &= ~RP_ECTL_2_R2_RX_CTLE_1C_MASK;
|
||||||
|
value |= soc->ectl.regs.rp_ectl_2_r2;
|
||||||
|
writel(value, port->base + RP_ECTL_2_R2);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_ECTL_4_R2);
|
||||||
|
value &= ~RP_ECTL_4_R2_RX_CDR_CTRL_1C_MASK;
|
||||||
|
value |= soc->ectl.regs.rp_ectl_4_r2 <<
|
||||||
|
RP_ECTL_4_R2_RX_CDR_CTRL_1C_SHIFT;
|
||||||
|
writel(value, port->base + RP_ECTL_4_R2);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_ECTL_5_R2);
|
||||||
|
value &= ~RP_ECTL_5_R2_RX_EQ_CTRL_L_1C_MASK;
|
||||||
|
value |= soc->ectl.regs.rp_ectl_5_r2;
|
||||||
|
writel(value, port->base + RP_ECTL_5_R2);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_ECTL_6_R2);
|
||||||
|
value &= ~RP_ECTL_6_R2_RX_EQ_CTRL_H_1C_MASK;
|
||||||
|
value |= soc->ectl.regs.rp_ectl_6_r2;
|
||||||
|
writel(value, port->base + RP_ECTL_6_R2);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void tegra_pcie_apply_sw_fixup(struct tegra_pcie_port *port)
|
||||||
|
{
|
||||||
|
const struct tegra_pcie_soc *soc = port->pcie->soc;
|
||||||
|
u32 value;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Sometimes link speed change from Gen2 to Gen1 fails due to
|
||||||
|
* instability in deskew logic on lane-0. Increase the deskew
|
||||||
|
* retry time to resolve this issue.
|
||||||
|
*/
|
||||||
|
if (soc->program_deskew_time) {
|
||||||
|
value = readl(port->base + RP_VEND_CTL0);
|
||||||
|
value &= ~RP_VEND_CTL0_DSK_RST_PULSE_WIDTH_MASK;
|
||||||
|
value |= RP_VEND_CTL0_DSK_RST_PULSE_WIDTH;
|
||||||
|
writel(value, port->base + RP_VEND_CTL0);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Fixup for read after write violation. */
|
||||||
|
if (soc->raw_violation_fixup) {
|
||||||
|
value = readl(port->base + RP_RX_HDR_LIMIT);
|
||||||
|
value &= ~RP_RX_HDR_LIMIT_PW_MASK;
|
||||||
|
value |= RP_RX_HDR_LIMIT_PW;
|
||||||
|
writel(value, port->base + RP_RX_HDR_LIMIT);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_PRIV_XP_DL);
|
||||||
|
value |= RP_PRIV_XP_DL_GEN2_UPD_FC_TSHOLD;
|
||||||
|
writel(value, port->base + RP_PRIV_XP_DL);
|
||||||
|
|
||||||
|
value = readl(port->base + RP_VEND_XP);
|
||||||
|
value &= ~RP_VEND_XP_UPDATE_FC_THRESHOLD_MASK;
|
||||||
|
value |= soc->update_fc_threshold;
|
||||||
|
writel(value, port->base + RP_VEND_XP);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (soc->update_fc_timer) {
|
||||||
|
value = readl(port->base + RP_VEND_XP);
|
||||||
|
value &= ~RP_VEND_XP_UPDATE_FC_THRESHOLD_MASK;
|
||||||
|
value |= soc->update_fc_threshold;
|
||||||
|
writel(value, port->base + RP_VEND_XP);
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* PCIe link doesn't come up with few legacy PCIe endpoints if
|
||||||
|
* root port advertises both Gen-1 and Gen-2 speeds in Tegra.
|
||||||
|
* Hence, the strategy followed here is to initially advertise
|
||||||
|
* only Gen-1 and after link is up, retrain link to Gen-2 speed
|
||||||
|
*/
|
||||||
|
value = readl(port->base + RP_LINK_CONTROL_STATUS_2);
|
||||||
|
value &= ~PCI_EXP_LNKSTA_CLS;
|
||||||
|
value |= PCI_EXP_LNKSTA_CLS_2_5GB;
|
||||||
|
writel(value, port->base + RP_LINK_CONTROL_STATUS_2);
|
||||||
}
|
}
|
||||||
|
|
||||||
static void tegra_pcie_port_enable(struct tegra_pcie_port *port)
|
static void tegra_pcie_port_enable(struct tegra_pcie_port *port)
|
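Note on tegra_pcie_program_ectl_settings() above: each register update follows the same read, clear-field, OR-in-value, write sequence. The sketch below shows only the clear-and-set step on a plain integer. `FIELD_MASK`, `FIELD_SHIFT` and `update_field()` are hypothetical names, and the extra masking of the shifted value is a defensive assumption that the driver code above does not perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit field occupying bits 31:16 of a 32-bit register. */
#define FIELD_MASK  (0xffffu << 16)
#define FIELD_SHIFT 16

/* Returns reg with the field replaced and all other bits preserved. */
static uint32_t update_field(uint32_t reg, uint32_t field)
{
	reg &= ~FIELD_MASK;				/* clear the old field */
	reg |= (field << FIELD_SHIFT) & FIELD_MASK;	/* set the new value */
	return reg;
}
```

In a driver the input would come from readl() and the result would go back through writel(); both are omitted here so that the sketch runs in user space.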
||||||
|
@ -500,6 +729,13 @@ static void tegra_pcie_port_enable(struct tegra_pcie_port *port)
|
||||||
value |= RP_VEND_CTL2_PCA_ENABLE;
|
value |= RP_VEND_CTL2_PCA_ENABLE;
|
||||||
writel(value, port->base + RP_VEND_CTL2);
|
writel(value, port->base + RP_VEND_CTL2);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
tegra_pcie_enable_rp_features(port);
|
||||||
|
|
||||||
|
if (soc->ectl.enable)
|
||||||
|
tegra_pcie_program_ectl_settings(port);
|
||||||
|
|
||||||
|
tegra_pcie_apply_sw_fixup(port);
|
||||||
}
|
}
|
||||||
|
|
||||||
static void tegra_pcie_port_disable(struct tegra_pcie_port *port)
|
static void tegra_pcie_port_disable(struct tegra_pcie_port *port)
|
||||||
|
@ -521,6 +757,12 @@ static void tegra_pcie_port_disable(struct tegra_pcie_port *port)
|
||||||
|
|
||||||
value &= ~AFI_PEX_CTRL_REFCLK_EN;
|
value &= ~AFI_PEX_CTRL_REFCLK_EN;
|
||||||
afi_writel(port->pcie, value, ctrl);
|
afi_writel(port->pcie, value, ctrl);
|
||||||
|
|
||||||
|
/* disable PCIe port and set CLKREQ# as GPIO to allow PLLE power down */
|
||||||
|
value = afi_readl(port->pcie, AFI_PCIE_CONFIG);
|
||||||
|
value |= AFI_PCIE_CONFIG_PCIE_DISABLE(port->index);
|
||||||
|
value |= AFI_PCIE_CONFIG_PCIE_CLKREQ_GPIO(port->index);
|
||||||
|
afi_writel(port->pcie, value, AFI_PCIE_CONFIG);
|
||||||
}
|
}
|
||||||
|
|
||||||
static void tegra_pcie_port_free(struct tegra_pcie_port *port)
|
static void tegra_pcie_port_free(struct tegra_pcie_port *port)
|
||||||
|
@ -545,12 +787,15 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0bf1, tegra_pcie_fixup_class);
|
||||||
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0e1c, tegra_pcie_fixup_class);
|
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0e1c, tegra_pcie_fixup_class);
|
||||||
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0e1d, tegra_pcie_fixup_class);
|
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0e1d, tegra_pcie_fixup_class);
|
||||||
|
|
||||||
/* Tegra PCIE requires relaxed ordering */
|
/* Tegra20 and Tegra30 PCIE requires relaxed ordering */
|
||||||
static void tegra_pcie_relax_enable(struct pci_dev *dev)
|
static void tegra_pcie_relax_enable(struct pci_dev *dev)
|
||||||
{
|
{
|
||||||
pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
|
pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
|
||||||
}
|
}
|
||||||
DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, tegra_pcie_relax_enable);
|
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0bf0, tegra_pcie_relax_enable);
|
||||||
|
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0bf1, tegra_pcie_relax_enable);
|
||||||
|
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0e1c, tegra_pcie_relax_enable);
|
||||||
|
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0e1d, tegra_pcie_relax_enable);
|
||||||
|
|
||||||
static int tegra_pcie_request_resources(struct tegra_pcie *pcie)
|
static int tegra_pcie_request_resources(struct tegra_pcie *pcie)
|
||||||
{
|
{
|
||||||
|
@ -635,7 +880,7 @@ static irqreturn_t tegra_pcie_isr(int irq, void *arg)
|
||||||
* do not pollute kernel log with master abort reports since they
|
* do not pollute kernel log with master abort reports since they
|
||||||
* happen a lot during enumeration
|
* happen a lot during enumeration
|
||||||
*/
|
*/
|
||||||
if (code == AFI_INTR_MASTER_ABORT)
|
if (code == AFI_INTR_MASTER_ABORT || code == AFI_INTR_PE_PRSNT_SENSE)
|
||||||
dev_dbg(dev, "%s, signature: %08x\n", err_msg[code], signature);
|
dev_dbg(dev, "%s, signature: %08x\n", err_msg[code], signature);
|
||||||
else
|
else
|
||||||
dev_err(dev, "%s, signature: %08x\n", err_msg[code], signature);
|
dev_err(dev, "%s, signature: %08x\n", err_msg[code], signature);
|
||||||
|
@ -704,11 +949,13 @@ static void tegra_pcie_setup_translations(struct tegra_pcie *pcie)
|
||||||
afi_writel(pcie, 0, AFI_AXI_BAR5_SZ);
|
afi_writel(pcie, 0, AFI_AXI_BAR5_SZ);
|
||||||
afi_writel(pcie, 0, AFI_FPCI_BAR5);
|
afi_writel(pcie, 0, AFI_FPCI_BAR5);
|
||||||
|
|
||||||
/* map all upstream transactions as uncached */
|
if (pcie->soc->has_cache_bars) {
|
||||||
afi_writel(pcie, 0, AFI_CACHE_BAR0_ST);
|
/* map all upstream transactions as uncached */
|
||||||
afi_writel(pcie, 0, AFI_CACHE_BAR0_SZ);
|
afi_writel(pcie, 0, AFI_CACHE_BAR0_ST);
|
||||||
afi_writel(pcie, 0, AFI_CACHE_BAR1_ST);
|
afi_writel(pcie, 0, AFI_CACHE_BAR0_SZ);
|
||||||
afi_writel(pcie, 0, AFI_CACHE_BAR1_SZ);
|
afi_writel(pcie, 0, AFI_CACHE_BAR1_ST);
|
||||||
|
afi_writel(pcie, 0, AFI_CACHE_BAR1_SZ);
|
||||||
|
}
|
||||||
|
|
||||||
/* MSI translations are setup only when needed */
|
/* MSI translations are setup only when needed */
|
||||||
afi_writel(pcie, 0, AFI_MSI_FPCI_BAR_ST);
|
afi_writel(pcie, 0, AFI_MSI_FPCI_BAR_ST);
|
||||||
|
@ -852,7 +1099,6 @@ static int tegra_pcie_port_phy_power_off(struct tegra_pcie_port *port)
|
||||||
static int tegra_pcie_phy_power_on(struct tegra_pcie *pcie)
|
static int tegra_pcie_phy_power_on(struct tegra_pcie *pcie)
|
||||||
{
|
{
|
||||||
struct device *dev = pcie->dev;
|
struct device *dev = pcie->dev;
|
||||||
const struct tegra_pcie_soc *soc = pcie->soc;
|
|
||||||
struct tegra_pcie_port *port;
|
struct tegra_pcie_port *port;
|
||||||
int err;
|
int err;
|
||||||
|
|
||||||
|
@ -878,12 +1124,6 @@ static int tegra_pcie_phy_power_on(struct tegra_pcie *pcie)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Configure the reference clock driver */
|
|
||||||
pads_writel(pcie, soc->pads_refclk_cfg0, PADS_REFCLK_CFG0);
|
|
||||||
|
|
||||||
if (soc->num_ports > 2)
|
|
||||||
pads_writel(pcie, soc->pads_refclk_cfg1, PADS_REFCLK_CFG1);
|
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -918,13 +1158,11 @@ static int tegra_pcie_phy_power_off(struct tegra_pcie *pcie)
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
static int tegra_pcie_enable_controller(struct tegra_pcie *pcie)
|
static void tegra_pcie_enable_controller(struct tegra_pcie *pcie)
|
||||||
{
|
{
|
||||||
struct device *dev = pcie->dev;
|
|
||||||
const struct tegra_pcie_soc *soc = pcie->soc;
|
const struct tegra_pcie_soc *soc = pcie->soc;
|
||||||
struct tegra_pcie_port *port;
|
struct tegra_pcie_port *port;
|
||||||
unsigned long value;
|
unsigned long value;
|
||||||
int err;
|
|
||||||
|
|
||||||
/* enable PLL power down */
|
/* enable PLL power down */
|
||||||
if (pcie->phy) {
|
if (pcie->phy) {
|
||||||
|
@ -942,9 +1180,12 @@ static int tegra_pcie_enable_controller(struct tegra_pcie *pcie)
|
||||||
value = afi_readl(pcie, AFI_PCIE_CONFIG);
|
value = afi_readl(pcie, AFI_PCIE_CONFIG);
|
||||||
value &= ~AFI_PCIE_CONFIG_SM2TMS0_XBAR_CONFIG_MASK;
|
value &= ~AFI_PCIE_CONFIG_SM2TMS0_XBAR_CONFIG_MASK;
|
||||||
value |= AFI_PCIE_CONFIG_PCIE_DISABLE_ALL | pcie->xbar_config;
|
value |= AFI_PCIE_CONFIG_PCIE_DISABLE_ALL | pcie->xbar_config;
|
||||||
|
value |= AFI_PCIE_CONFIG_PCIE_CLKREQ_GPIO_ALL;
|
||||||
|
|
||||||
list_for_each_entry(port, &pcie->ports, list)
|
list_for_each_entry(port, &pcie->ports, list) {
|
||||||
value &= ~AFI_PCIE_CONFIG_PCIE_DISABLE(port->index);
|
value &= ~AFI_PCIE_CONFIG_PCIE_DISABLE(port->index);
|
||||||
|
value &= ~AFI_PCIE_CONFIG_PCIE_CLKREQ_GPIO(port->index);
|
||||||
|
}
|
||||||
|
|
||||||
afi_writel(pcie, value, AFI_PCIE_CONFIG);
|
afi_writel(pcie, value, AFI_PCIE_CONFIG);
|
||||||
|
|
||||||
|
@ -958,20 +1199,10 @@ static int tegra_pcie_enable_controller(struct tegra_pcie *pcie)
|
||||||
afi_writel(pcie, value, AFI_FUSE);
|
afi_writel(pcie, value, AFI_FUSE);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (soc->program_uphy) {
|
/* Disable AFI dynamic clock gating and enable PCIe */
|
||||||
err = tegra_pcie_phy_power_on(pcie);
|
|
||||||
if (err < 0) {
|
|
||||||
dev_err(dev, "failed to power on PHY(s): %d\n", err);
|
|
||||||
return err;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/* take the PCIe interface module out of reset */
|
|
||||||
reset_control_deassert(pcie->pcie_xrst);
|
|
||||||
|
|
||||||
/* finally enable PCIe */
|
|
||||||
value = afi_readl(pcie, AFI_CONFIGURATION);
|
value = afi_readl(pcie, AFI_CONFIGURATION);
|
||||||
value |= AFI_CONFIGURATION_EN_FPCI;
|
value |= AFI_CONFIGURATION_EN_FPCI;
|
||||||
|
value |= AFI_CONFIGURATION_CLKEN_OVERRIDE;
|
||||||
afi_writel(pcie, value, AFI_CONFIGURATION);
|
afi_writel(pcie, value, AFI_CONFIGURATION);
|
||||||
|
|
||||||
value = AFI_INTR_EN_INI_SLVERR | AFI_INTR_EN_INI_DECERR |
|
value = AFI_INTR_EN_INI_SLVERR | AFI_INTR_EN_INI_DECERR |
|
||||||
|
@ -989,22 +1220,6 @@ static int tegra_pcie_enable_controller(struct tegra_pcie *pcie)
|
||||||
|
|
||||||
/* disable all exceptions */
|
/* disable all exceptions */
|
||||||
afi_writel(pcie, 0, AFI_FPCI_ERROR_MASKS);
|
afi_writel(pcie, 0, AFI_FPCI_ERROR_MASKS);
|
||||||
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
static void tegra_pcie_disable_controller(struct tegra_pcie *pcie)
|
|
||||||
{
|
|
||||||
int err;
|
|
||||||
|
|
||||||
reset_control_assert(pcie->pcie_xrst);
|
|
||||||
|
|
||||||
if (pcie->soc->program_uphy) {
|
|
||||||
err = tegra_pcie_phy_power_off(pcie);
|
|
||||||
if (err < 0)
|
|
||||||
dev_err(pcie->dev, "failed to power off PHY(s): %d\n",
|
|
||||||
err);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
static void tegra_pcie_power_off(struct tegra_pcie *pcie)
|
static void tegra_pcie_power_off(struct tegra_pcie *pcie)
|
||||||
|
@ -1014,13 +1229,11 @@ static void tegra_pcie_power_off(struct tegra_pcie *pcie)
|
||||||
int err;
|
int err;
|
||||||
|
|
||||||
reset_control_assert(pcie->afi_rst);
|
reset_control_assert(pcie->afi_rst);
|
||||||
reset_control_assert(pcie->pex_rst);
|
|
||||||
|
|
||||||
clk_disable_unprepare(pcie->pll_e);
|
clk_disable_unprepare(pcie->pll_e);
|
||||||
if (soc->has_cml_clk)
|
if (soc->has_cml_clk)
|
||||||
clk_disable_unprepare(pcie->cml_clk);
|
clk_disable_unprepare(pcie->cml_clk);
|
||||||
clk_disable_unprepare(pcie->afi_clk);
|
clk_disable_unprepare(pcie->afi_clk);
|
||||||
clk_disable_unprepare(pcie->pex_clk);
|
|
||||||
|
|
||||||
if (!dev->pm_domain)
|
if (!dev->pm_domain)
|
||||||
tegra_powergate_power_off(TEGRA_POWERGATE_PCIE);
|
tegra_powergate_power_off(TEGRA_POWERGATE_PCIE);
|
||||||
|
@ -1048,46 +1261,66 @@ static int tegra_pcie_power_on(struct tegra_pcie *pcie)
|
||||||
if (err < 0)
|
if (err < 0)
|
||||||
dev_err(dev, "failed to enable regulators: %d\n", err);
|
dev_err(dev, "failed to enable regulators: %d\n", err);
|
||||||
|
|
||||||
if (dev->pm_domain) {
|
if (!dev->pm_domain) {
|
||||||
err = clk_prepare_enable(pcie->pex_clk);
|
err = tegra_powergate_power_on(TEGRA_POWERGATE_PCIE);
|
||||||
if (err) {
|
if (err) {
|
||||||
dev_err(dev, "failed to enable PEX clock: %d\n", err);
|
dev_err(dev, "failed to power ungate: %d\n", err);
|
||||||
return err;
|
goto regulator_disable;
|
||||||
}
|
}
|
||||||
reset_control_deassert(pcie->pex_rst);
|
err = tegra_powergate_remove_clamping(TEGRA_POWERGATE_PCIE);
|
||||||
} else {
|
|
||||||
err = tegra_powergate_sequence_power_up(TEGRA_POWERGATE_PCIE,
|
|
||||||
pcie->pex_clk,
|
|
||||||
pcie->pex_rst);
|
|
||||||
if (err) {
|
if (err) {
|
||||||
dev_err(dev, "powerup sequence failed: %d\n", err);
|
dev_err(dev, "failed to remove clamp: %d\n", err);
|
||||||
return err;
|
goto powergate;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
reset_control_deassert(pcie->afi_rst);
|
|
||||||
|
|
||||||
err = clk_prepare_enable(pcie->afi_clk);
|
err = clk_prepare_enable(pcie->afi_clk);
|
||||||
if (err < 0) {
|
if (err < 0) {
|
||||||
dev_err(dev, "failed to enable AFI clock: %d\n", err);
|
dev_err(dev, "failed to enable AFI clock: %d\n", err);
|
||||||
return err;
|
goto powergate;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (soc->has_cml_clk) {
|
if (soc->has_cml_clk) {
|
||||||
err = clk_prepare_enable(pcie->cml_clk);
|
err = clk_prepare_enable(pcie->cml_clk);
|
||||||
if (err < 0) {
|
if (err < 0) {
|
||||||
dev_err(dev, "failed to enable CML clock: %d\n", err);
|
dev_err(dev, "failed to enable CML clock: %d\n", err);
|
||||||
return err;
|
goto disable_afi_clk;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
err = clk_prepare_enable(pcie->pll_e);
|
err = clk_prepare_enable(pcie->pll_e);
|
||||||
if (err < 0) {
|
if (err < 0) {
|
||||||
dev_err(dev, "failed to enable PLLE clock: %d\n", err);
|
dev_err(dev, "failed to enable PLLE clock: %d\n", err);
|
||||||
return err;
|
goto disable_cml_clk;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
reset_control_deassert(pcie->afi_rst);
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
|
||||||
|
disable_cml_clk:
|
||||||
|
if (soc->has_cml_clk)
|
||||||
|
clk_disable_unprepare(pcie->cml_clk);
|
||||||
|
disable_afi_clk:
|
||||||
|
clk_disable_unprepare(pcie->afi_clk);
|
||||||
|
powergate:
|
||||||
|
if (!dev->pm_domain)
|
||||||
|
tegra_powergate_power_off(TEGRA_POWERGATE_PCIE);
|
||||||
|
regulator_disable:
|
||||||
|
regulator_bulk_disable(pcie->num_supplies, pcie->supplies);
|
||||||
|
|
||||||
|
return err;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void tegra_pcie_apply_pad_settings(struct tegra_pcie *pcie)
|
||||||
|
{
|
||||||
|
const struct tegra_pcie_soc *soc = pcie->soc;
|
||||||
|
|
||||||
|
/* Configure the reference clock driver */
|
||||||
|
pads_writel(pcie, soc->pads_refclk_cfg0, PADS_REFCLK_CFG0);
|
||||||
|
|
||||||
|
if (soc->num_ports > 2)
|
||||||
|
pads_writel(pcie, soc->pads_refclk_cfg1, PADS_REFCLK_CFG1);
|
||||||
}
|
}
|
||||||
|
|
||||||
static int tegra_pcie_clocks_get(struct tegra_pcie *pcie)
|
static int tegra_pcie_clocks_get(struct tegra_pcie *pcie)
|
||||||
|
@ -1647,6 +1880,15 @@ static int tegra_pcie_disable_msi(struct tegra_pcie *pcie)
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void tegra_pcie_disable_interrupts(struct tegra_pcie *pcie)
|
||||||
|
{
|
||||||
|
u32 value;
|
||||||
|
|
||||||
|
value = afi_readl(pcie, AFI_INTR_MASK);
|
||||||
|
value &= ~AFI_INTR_MASK_INT_MASK;
|
||||||
|
afi_writel(pcie, value, AFI_INTR_MASK);
|
||||||
|
}
|
||||||
|
|
||||||
static int tegra_pcie_get_xbar_config(struct tegra_pcie *pcie, u32 lanes,
|
static int tegra_pcie_get_xbar_config(struct tegra_pcie *pcie, u32 lanes,
|
||||||
u32 *xbar)
|
u32 *xbar)
|
||||||
{
|
{
|
||||||
|
@ -1990,6 +2232,7 @@ static int tegra_pcie_parse_dt(struct tegra_pcie *pcie)
|
||||||
struct tegra_pcie_port *rp;
|
struct tegra_pcie_port *rp;
|
||||||
unsigned int index;
|
unsigned int index;
|
||||||
u32 value;
|
u32 value;
|
||||||
|
char *label;
|
||||||
|
|
||||||
err = of_pci_get_devfn(port);
|
err = of_pci_get_devfn(port);
|
||||||
if (err < 0) {
|
if (err < 0) {
|
||||||
|
@ -2048,6 +2291,31 @@ static int tegra_pcie_parse_dt(struct tegra_pcie *pcie)
|
||||||
if (IS_ERR(rp->base))
|
if (IS_ERR(rp->base))
|
||||||
return PTR_ERR(rp->base);
|
return PTR_ERR(rp->base);
|
||||||
|
|
||||||
|
label = devm_kasprintf(dev, GFP_KERNEL, "pex-reset-%u", index);
|
||||||
|
if (!label) {
|
||||||
|
dev_err(dev, "failed to create reset GPIO label\n");
|
||||||
|
return -ENOMEM;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Returns -ENOENT if reset-gpios property is not populated
|
||||||
|
* and in this case fall back to using AFI per port register
|
||||||
|
* to toggle PERST# SFIO line.
|
||||||
|
*/
|
||||||
|
rp->reset_gpio = devm_gpiod_get_from_of_node(dev, port,
|
||||||
|
"reset-gpios", 0,
|
||||||
|
GPIOD_OUT_LOW,
|
||||||
|
label);
|
||||||
|
if (IS_ERR(rp->reset_gpio)) {
|
||||||
|
if (PTR_ERR(rp->reset_gpio) == -ENOENT) {
|
||||||
|
rp->reset_gpio = NULL;
|
||||||
|
} else {
|
||||||
|
dev_err(dev, "failed to get reset GPIO: %d\n",
|
||||||
|
PTR_ERR(rp->reset_gpio));
|
||||||
|
return PTR_ERR(rp->reset_gpio);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
list_add_tail(&rp->list, &pcie->ports);
|
list_add_tail(&rp->list, &pcie->ports);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -2095,7 +2363,7 @@ static bool tegra_pcie_port_check_link(struct tegra_pcie_port *port)
|
||||||
} while (--timeout);
|
} while (--timeout);
|
||||||
|
|
||||||
if (!timeout) {
|
if (!timeout) {
|
||||||
dev_err(dev, "link %u down, retrying\n", port->index);
|
dev_dbg(dev, "link %u down, retrying\n", port->index);
|
||||||
goto retry;
|
goto retry;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -2117,6 +2385,64 @@ retry:
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void tegra_pcie_change_link_speed(struct tegra_pcie *pcie)
|
||||||
|
{
|
||||||
|
struct device *dev = pcie->dev;
|
||||||
|
struct tegra_pcie_port *port;
|
||||||
|
ktime_t deadline;
|
||||||
|
u32 value;
|
||||||
|
|
||||||
|
list_for_each_entry(port, &pcie->ports, list) {
|
||||||
|
/*
|
||||||
|
* "Supported Link Speeds Vector" in "Link Capabilities 2"
|
||||||
|
* is not supported by Tegra. tegra_pcie_change_link_speed()
|
||||||
|
* is called only for Tegra chips which support Gen2.
|
||||||
|
* So there is no harm if supported link speed is not verified.
|
||||||
|
*/
|
||||||
|
value = readl(port->base + RP_LINK_CONTROL_STATUS_2);
|
||||||
|
value &= ~PCI_EXP_LNKSTA_CLS;
|
||||||
|
value |= PCI_EXP_LNKSTA_CLS_5_0GB;
|
||||||
|
writel(value, port->base + RP_LINK_CONTROL_STATUS_2);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Poll until link comes back from recovery to avoid race
|
||||||
|
* condition.
|
||||||
|
*/
|
||||||
|
deadline = ktime_add_us(ktime_get(), LINK_RETRAIN_TIMEOUT);
|
||||||
|
|
||||||
|
while (ktime_before(ktime_get(), deadline)) {
|
||||||
|
value = readl(port->base + RP_LINK_CONTROL_STATUS);
|
||||||
|
if ((value & PCI_EXP_LNKSTA_LT) == 0)
|
||||||
|
break;
|
||||||
|
|
||||||
|
usleep_range(2000, 3000);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (value & PCI_EXP_LNKSTA_LT)
|
||||||
|
dev_warn(dev, "PCIe port %u link is in recovery\n",
|
||||||
|
port->index);
|
||||||
|
|
||||||
|
/* Retrain the link */
|
||||||
|
value = readl(port->base + RP_LINK_CONTROL_STATUS);
|
||||||
|
value |= PCI_EXP_LNKCTL_RL;
|
||||||
|
writel(value, port->base + RP_LINK_CONTROL_STATUS);
|
||||||
|
|
||||||
|
deadline = ktime_add_us(ktime_get(), LINK_RETRAIN_TIMEOUT);
|
||||||
|
|
||||||
|
while (ktime_before(ktime_get(), deadline)) {
|
||||||
|
value = readl(port->base + RP_LINK_CONTROL_STATUS);
|
||||||
|
if ((value & PCI_EXP_LNKSTA_LT) == 0)
|
||||||
|
break;
|
||||||
|
|
||||||
|
usleep_range(2000, 3000);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (value & PCI_EXP_LNKSTA_LT)
|
||||||
|
dev_err(dev, "failed to retrain link of port %u\n",
|
||||||
|
port->index);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
static void tegra_pcie_enable_ports(struct tegra_pcie *pcie)
|
static void tegra_pcie_enable_ports(struct tegra_pcie *pcie)
|
||||||
{
|
{
|
||||||
struct device *dev = pcie->dev;
|
struct device *dev = pcie->dev;
|
||||||
|
@ -2127,7 +2453,12 @@ static void tegra_pcie_enable_ports(struct tegra_pcie *pcie)
|
||||||
port->index, port->lanes);
|
port->index, port->lanes);
|
||||||
|
|
||||||
tegra_pcie_port_enable(port);
|
tegra_pcie_port_enable(port);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Start LTSSM from Tegra side */
|
||||||
|
reset_control_deassert(pcie->pcie_xrst);
|
||||||
|
|
||||||
|
list_for_each_entry_safe(port, tmp, &pcie->ports, list) {
|
||||||
if (tegra_pcie_port_check_link(port))
|
if (tegra_pcie_port_check_link(port))
|
||||||
continue;
|
continue;
|
||||||
|
|
||||||
|
@ -2136,12 +2467,17 @@ static void tegra_pcie_enable_ports(struct tegra_pcie *pcie)
|
||||||
tegra_pcie_port_disable(port);
|
tegra_pcie_port_disable(port);
|
||||||
tegra_pcie_port_free(port);
|
tegra_pcie_port_free(port);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (pcie->soc->has_gen2)
|
||||||
|
tegra_pcie_change_link_speed(pcie);
|
||||||
}
|
}
|
||||||
|
|
||||||
static void tegra_pcie_disable_ports(struct tegra_pcie *pcie)
|
static void tegra_pcie_disable_ports(struct tegra_pcie *pcie)
|
||||||
{
|
{
|
||||||
struct tegra_pcie_port *port, *tmp;
|
struct tegra_pcie_port *port, *tmp;
|
||||||
|
|
||||||
|
reset_control_assert(pcie->pcie_xrst);
|
||||||
|
|
||||||
list_for_each_entry_safe(port, tmp, &pcie->ports, list)
|
list_for_each_entry_safe(port, tmp, &pcie->ports, list)
|
||||||
tegra_pcie_port_disable(port);
|
tegra_pcie_port_disable(port);
|
||||||
}
|
}
|
||||||
|
@ -2155,6 +2491,7 @@ static const struct tegra_pcie_soc tegra20_pcie = {
|
||||||
.num_ports = 2,
|
.num_ports = 2,
|
||||||
.ports = tegra20_pcie_ports,
|
.ports = tegra20_pcie_ports,
|
||||||
.msi_base_shift = 0,
|
.msi_base_shift = 0,
|
||||||
|
.afi_pex2_ctrl = 0x128,
|
||||||
.pads_pll_ctl = PADS_PLL_CTL_TEGRA20,
|
.pads_pll_ctl = PADS_PLL_CTL_TEGRA20,
|
||||||
.tx_ref_sel = PADS_PLL_CTL_TXCLKREF_DIV10,
|
.tx_ref_sel = PADS_PLL_CTL_TXCLKREF_DIV10,
|
||||||
.pads_refclk_cfg0 = 0xfa5cfa5c,
|
.pads_refclk_cfg0 = 0xfa5cfa5c,
|
||||||
|
@ -2165,6 +2502,12 @@ static const struct tegra_pcie_soc tegra20_pcie = {
|
||||||
.has_gen2 = false,
|
.has_gen2 = false,
|
||||||
.force_pca_enable = false,
|
.force_pca_enable = false,
|
||||||
.program_uphy = true,
|
.program_uphy = true,
|
||||||
|
.update_clamp_threshold = false,
|
||||||
|
.program_deskew_time = false,
|
||||||
|
.raw_violation_fixup = false,
|
||||||
|
.update_fc_timer = false,
|
||||||
|
.has_cache_bars = true,
|
||||||
|
.ectl.enable = false,
|
||||||
};
|
};
|
||||||
|
|
||||||
static const struct tegra_pcie_port_soc tegra30_pcie_ports[] = {
|
static const struct tegra_pcie_port_soc tegra30_pcie_ports[] = {
|
||||||
|
@ -2188,6 +2531,12 @@ static const struct tegra_pcie_soc tegra30_pcie = {
|
||||||
.has_gen2 = false,
|
.has_gen2 = false,
|
||||||
.force_pca_enable = false,
|
.force_pca_enable = false,
|
||||||
.program_uphy = true,
|
.program_uphy = true,
|
||||||
|
.update_clamp_threshold = false,
|
||||||
|
.program_deskew_time = false,
|
||||||
|
.raw_violation_fixup = false,
|
||||||
|
.update_fc_timer = false,
|
||||||
|
.has_cache_bars = false,
|
||||||
|
.ectl.enable = false,
|
||||||
};
|
};
|
||||||
|
|
||||||
static const struct tegra_pcie_soc tegra124_pcie = {
|
static const struct tegra_pcie_soc tegra124_pcie = {
|
||||||
|
@ -2197,6 +2546,8 @@ static const struct tegra_pcie_soc tegra124_pcie = {
|
||||||
.pads_pll_ctl = PADS_PLL_CTL_TEGRA30,
|
.pads_pll_ctl = PADS_PLL_CTL_TEGRA30,
|
||||||
.tx_ref_sel = PADS_PLL_CTL_TXCLKREF_BUF_EN,
|
.tx_ref_sel = PADS_PLL_CTL_TXCLKREF_BUF_EN,
|
||||||
.pads_refclk_cfg0 = 0x44ac44ac,
|
.pads_refclk_cfg0 = 0x44ac44ac,
|
||||||
|
/* FC threshold is bit[25:18] */
|
||||||
|
.update_fc_threshold = 0x03fc0000,
|
||||||
.has_pex_clkreq_en = true,
|
.has_pex_clkreq_en = true,
|
||||||
.has_pex_bias_ctrl = true,
|
.has_pex_bias_ctrl = true,
|
||||||
.has_intr_prsnt_sense = true,
|
.has_intr_prsnt_sense = true,
|
||||||
|
@ -2204,6 +2555,12 @@ static const struct tegra_pcie_soc tegra124_pcie = {
|
||||||
.has_gen2 = true,
|
.has_gen2 = true,
|
||||||
.force_pca_enable = false,
|
.force_pca_enable = false,
|
||||||
.program_uphy = true,
|
.program_uphy = true,
|
||||||
|
.update_clamp_threshold = true,
|
||||||
|
.program_deskew_time = false,
|
||||||
|
.raw_violation_fixup = true,
|
||||||
|
.update_fc_timer = false,
|
||||||
|
.has_cache_bars = false,
|
||||||
|
.ectl.enable = false,
|
||||||
};
|
};
|
||||||
|
|
||||||
static const struct tegra_pcie_soc tegra210_pcie = {
|
static const struct tegra_pcie_soc tegra210_pcie = {
|
||||||
|
@ -2213,6 +2570,8 @@ static const struct tegra_pcie_soc tegra210_pcie = {
|
||||||
.pads_pll_ctl = PADS_PLL_CTL_TEGRA30,
|
.pads_pll_ctl = PADS_PLL_CTL_TEGRA30,
|
||||||
.tx_ref_sel = PADS_PLL_CTL_TXCLKREF_BUF_EN,
|
.tx_ref_sel = PADS_PLL_CTL_TXCLKREF_BUF_EN,
|
||||||
.pads_refclk_cfg0 = 0x90b890b8,
|
.pads_refclk_cfg0 = 0x90b890b8,
|
||||||
|
/* FC threshold is bit[25:18] */
|
||||||
|
.update_fc_threshold = 0x01800000,
|
||||||
.has_pex_clkreq_en = true,
|
.has_pex_clkreq_en = true,
|
||||||
.has_pex_bias_ctrl = true,
|
.has_pex_bias_ctrl = true,
|
||||||
.has_intr_prsnt_sense = true,
|
.has_intr_prsnt_sense = true,
|
||||||
|
@ -2220,6 +2579,24 @@ static const struct tegra_pcie_soc tegra210_pcie = {
|
||||||
.has_gen2 = true,
|
.has_gen2 = true,
|
||||||
.force_pca_enable = true,
|
.force_pca_enable = true,
|
||||||
.program_uphy = true,
|
.program_uphy = true,
|
||||||
|
.update_clamp_threshold = true,
|
||||||
|
.program_deskew_time = true,
|
||||||
|
.raw_violation_fixup = false,
|
||||||
|
.update_fc_timer = true,
|
||||||
|
.has_cache_bars = false,
|
||||||
|
.ectl = {
|
||||||
|
.regs = {
|
||||||
|
.rp_ectl_2_r1 = 0x0000000f,
|
||||||
|
.rp_ectl_4_r1 = 0x00000067,
|
||||||
|
.rp_ectl_5_r1 = 0x55010000,
|
||||||
|
.rp_ectl_6_r1 = 0x00000001,
|
||||||
|
.rp_ectl_2_r2 = 0x0000008f,
|
||||||
|
.rp_ectl_4_r2 = 0x000000c7,
|
||||||
|
.rp_ectl_5_r2 = 0x55010000,
|
||||||
|
.rp_ectl_6_r2 = 0x00000001,
|
||||||
|
},
|
||||||
|
.enable = true,
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
static const struct tegra_pcie_port_soc tegra186_pcie_ports[] = {
|
static const struct tegra_pcie_port_soc tegra186_pcie_ports[] = {
|
||||||
|
@ -2232,6 +2609,7 @@ static const struct tegra_pcie_soc tegra186_pcie = {
|
||||||
.num_ports = 3,
|
.num_ports = 3,
|
||||||
.ports = tegra186_pcie_ports,
|
.ports = tegra186_pcie_ports,
|
||||||
.msi_base_shift = 8,
|
.msi_base_shift = 8,
|
||||||
|
.afi_pex2_ctrl = 0x19c,
|
||||||
.pads_pll_ctl = PADS_PLL_CTL_TEGRA30,
|
.pads_pll_ctl = PADS_PLL_CTL_TEGRA30,
|
||||||
.tx_ref_sel = PADS_PLL_CTL_TXCLKREF_BUF_EN,
|
.tx_ref_sel = PADS_PLL_CTL_TXCLKREF_BUF_EN,
|
||||||
.pads_refclk_cfg0 = 0x80b880b8,
|
.pads_refclk_cfg0 = 0x80b880b8,
|
||||||
|
@ -2243,6 +2621,12 @@ static const struct tegra_pcie_soc tegra186_pcie = {
|
||||||
.has_gen2 = true,
|
.has_gen2 = true,
|
||||||
.force_pca_enable = false,
|
.force_pca_enable = false,
|
||||||
.program_uphy = false,
|
.program_uphy = false,
|
||||||
|
.update_clamp_threshold = false,
|
||||||
|
.program_deskew_time = false,
|
||||||
|
.raw_violation_fixup = false,
|
||||||
|
.update_fc_timer = false,
|
||||||
|
.has_cache_bars = false,
|
||||||
|
.ectl.enable = false,
|
||||||
};
|
};
|
||||||
|
|
||||||
static const struct of_device_id tegra_pcie_of_match[] = {
|
static const struct of_device_id tegra_pcie_of_match[] = {
|
||||||
|
@ -2485,16 +2869,32 @@ static int __maybe_unused tegra_pcie_pm_suspend(struct device *dev)
|
||||||
{
|
{
|
||||||
struct tegra_pcie *pcie = dev_get_drvdata(dev);
|
struct tegra_pcie *pcie = dev_get_drvdata(dev);
|
||||||
struct tegra_pcie_port *port;
|
struct tegra_pcie_port *port;
|
||||||
|
int err;
|
||||||
|
|
||||||
list_for_each_entry(port, &pcie->ports, list)
|
list_for_each_entry(port, &pcie->ports, list)
|
||||||
tegra_pcie_pme_turnoff(port);
|
tegra_pcie_pme_turnoff(port);
|
||||||
|
|
||||||
tegra_pcie_disable_ports(pcie);
|
tegra_pcie_disable_ports(pcie);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* AFI_INTR is unmasked in tegra_pcie_enable_controller(), mask it to
|
||||||
|
* avoid unwanted interrupts raised by AFI after pex_rst is asserted.
|
||||||
|
*/
|
||||||
|
tegra_pcie_disable_interrupts(pcie);
|
||||||
|
|
||||||
|
if (pcie->soc->program_uphy) {
|
||||||
|
err = tegra_pcie_phy_power_off(pcie);
|
||||||
|
if (err < 0)
|
||||||
|
dev_err(dev, "failed to power off PHY(s): %d\n", err);
|
||||||
|
}
|
||||||
|
|
||||||
|
reset_control_assert(pcie->pex_rst);
|
||||||
|
clk_disable_unprepare(pcie->pex_clk);
|
||||||
|
|
||||||
if (IS_ENABLED(CONFIG_PCI_MSI))
|
if (IS_ENABLED(CONFIG_PCI_MSI))
|
||||||
tegra_pcie_disable_msi(pcie);
|
tegra_pcie_disable_msi(pcie);
|
||||||
|
|
||||||
tegra_pcie_disable_controller(pcie);
|
pinctrl_pm_select_idle_state(dev);
|
||||||
tegra_pcie_power_off(pcie);
|
tegra_pcie_power_off(pcie);
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
@ -2510,20 +2910,45 @@ static int __maybe_unused tegra_pcie_pm_resume(struct device *dev)
|
||||||
dev_err(dev, "tegra pcie power on fail: %d\n", err);
|
dev_err(dev, "tegra pcie power on fail: %d\n", err);
|
||||||
return err;
|
return err;
|
||||||
}
|
}
|
||||||
err = tegra_pcie_enable_controller(pcie);
|
|
||||||
if (err) {
|
err = pinctrl_pm_select_default_state(dev);
|
||||||
dev_err(dev, "tegra pcie controller enable fail: %d\n", err);
|
if (err < 0) {
|
||||||
|
dev_err(dev, "failed to disable PCIe IO DPD: %d\n", err);
|
||||||
goto poweroff;
|
goto poweroff;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
tegra_pcie_enable_controller(pcie);
|
||||||
tegra_pcie_setup_translations(pcie);
|
tegra_pcie_setup_translations(pcie);
|
||||||
|
|
||||||
if (IS_ENABLED(CONFIG_PCI_MSI))
|
if (IS_ENABLED(CONFIG_PCI_MSI))
|
||||||
tegra_pcie_enable_msi(pcie);
|
tegra_pcie_enable_msi(pcie);
|
||||||
|
|
||||||
|
err = clk_prepare_enable(pcie->pex_clk);
|
||||||
|
if (err) {
|
||||||
|
dev_err(dev, "failed to enable PEX clock: %d\n", err);
|
||||||
|
goto pex_dpd_enable;
|
||||||
|
}
|
||||||
|
|
||||||
|
reset_control_deassert(pcie->pex_rst);
|
||||||
|
|
||||||
|
if (pcie->soc->program_uphy) {
|
||||||
|
err = tegra_pcie_phy_power_on(pcie);
|
||||||
|
if (err < 0) {
|
||||||
|
dev_err(dev, "failed to power on PHY(s): %d\n", err);
|
||||||
|
goto disable_pex_clk;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
tegra_pcie_apply_pad_settings(pcie);
|
||||||
tegra_pcie_enable_ports(pcie);
|
tegra_pcie_enable_ports(pcie);
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
|
||||||
|
disable_pex_clk:
|
||||||
|
reset_control_assert(pcie->pex_rst);
|
||||||
|
clk_disable_unprepare(pcie->pex_clk);
|
||||||
|
pex_dpd_enable:
|
||||||
|
pinctrl_pm_select_idle_state(dev);
|
||||||
poweroff:
|
poweroff:
|
||||||
tegra_pcie_power_off(pcie);
|
tegra_pcie_power_off(pcie);
|
||||||
|
|
||||||
|
|
|
@ -10,6 +10,7 @@
|
||||||
#include <linux/interrupt.h>
|
#include <linux/interrupt.h>
|
||||||
#include <linux/irqchip/chained_irq.h>
|
#include <linux/irqchip/chained_irq.h>
|
||||||
#include <linux/init.h>
|
#include <linux/init.h>
|
||||||
|
#include <linux/module.h>
|
||||||
#include <linux/msi.h>
|
#include <linux/msi.h>
|
||||||
#include <linux/of_address.h>
|
#include <linux/of_address.h>
|
||||||
#include <linux/of_irq.h>
|
#include <linux/of_irq.h>
|
||||||
|
@ -288,4 +289,13 @@ static int __init altera_msi_init(void)
|
||||||
{
|
{
|
||||||
return platform_driver_register(&altera_msi_driver);
|
return platform_driver_register(&altera_msi_driver);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void __exit altera_msi_exit(void)
|
||||||
|
{
|
||||||
|
platform_driver_unregister(&altera_msi_driver);
|
||||||
|
}
|
||||||
|
|
||||||
subsys_initcall(altera_msi_init);
|
subsys_initcall(altera_msi_init);
|
||||||
|
MODULE_DEVICE_TABLE(of, altera_msi_of_match);
|
||||||
|
module_exit(altera_msi_exit);
|
||||||
|
MODULE_LICENSE("GPL v2");
|
||||||
|
|
|
@ -10,6 +10,7 @@
|
||||||
#include <linux/interrupt.h>
|
#include <linux/interrupt.h>
|
||||||
#include <linux/irqchip/chained_irq.h>
|
#include <linux/irqchip/chained_irq.h>
|
||||||
#include <linux/init.h>
|
#include <linux/init.h>
|
||||||
|
#include <linux/module.h>
|
||||||
#include <linux/of_address.h>
|
#include <linux/of_address.h>
|
||||||
#include <linux/of_device.h>
|
#include <linux/of_device.h>
|
||||||
#include <linux/of_irq.h>
|
#include <linux/of_irq.h>
|
||||||
|
@ -43,6 +44,8 @@
|
||||||
#define S10_RP_RXCPL_STATUS 0x200C
|
#define S10_RP_RXCPL_STATUS 0x200C
|
||||||
#define S10_RP_CFG_ADDR(pcie, reg) \
|
#define S10_RP_CFG_ADDR(pcie, reg) \
|
||||||
(((pcie)->hip_base) + (reg) + (1 << 20))
|
(((pcie)->hip_base) + (reg) + (1 << 20))
|
||||||
|
#define S10_RP_SECONDARY(pcie) \
|
||||||
|
readb(S10_RP_CFG_ADDR(pcie, PCI_SECONDARY_BUS))
|
||||||
|
|
||||||
/* TLP configuration type 0 and 1 */
|
/* TLP configuration type 0 and 1 */
|
||||||
#define TLP_FMTTYPE_CFGRD0 0x04 /* Configuration Read Type 0 */
|
#define TLP_FMTTYPE_CFGRD0 0x04 /* Configuration Read Type 0 */
|
||||||
|
@ -54,14 +57,9 @@
|
||||||
#define TLP_WRITE_TAG 0x10
|
#define TLP_WRITE_TAG 0x10
|
||||||
#define RP_DEVFN 0
|
#define RP_DEVFN 0
|
||||||
#define TLP_REQ_ID(bus, devfn) (((bus) << 8) | (devfn))
|
#define TLP_REQ_ID(bus, devfn) (((bus) << 8) | (devfn))
|
||||||
#define TLP_CFGRD_DW0(pcie, bus) \
|
#define TLP_CFG_DW0(pcie, cfg) \
|
||||||
((((bus == pcie->root_bus_nr) ? pcie->pcie_data->cfgrd0 \
|
(((cfg) << 24) | \
|
||||||
: pcie->pcie_data->cfgrd1) << 24) | \
|
TLP_PAYLOAD_SIZE)
|
||||||
TLP_PAYLOAD_SIZE)
|
|
||||||
#define TLP_CFGWR_DW0(pcie, bus) \
|
|
||||||
((((bus == pcie->root_bus_nr) ? pcie->pcie_data->cfgwr0 \
|
|
||||||
: pcie->pcie_data->cfgwr1) << 24) | \
|
|
||||||
TLP_PAYLOAD_SIZE)
|
|
||||||
#define TLP_CFG_DW1(pcie, tag, be) \
|
#define TLP_CFG_DW1(pcie, tag, be) \
|
||||||
(((TLP_REQ_ID(pcie->root_bus_nr, RP_DEVFN)) << 16) | (tag << 8) | (be))
|
(((TLP_REQ_ID(pcie->root_bus_nr, RP_DEVFN)) << 16) | (tag << 8) | (be))
|
||||||
#define TLP_CFG_DW2(bus, devfn, offset) \
|
#define TLP_CFG_DW2(bus, devfn, offset) \
|
||||||
|
@ -321,14 +319,31 @@ static void s10_tlp_write_packet(struct altera_pcie *pcie, u32 *headers,
|
||||||
s10_tlp_write_tx(pcie, data, RP_TX_EOP);
|
s10_tlp_write_tx(pcie, data, RP_TX_EOP);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void get_tlp_header(struct altera_pcie *pcie, u8 bus, u32 devfn,
|
||||||
|
int where, u8 byte_en, bool read, u32 *headers)
|
||||||
|
{
|
||||||
|
u8 cfg;
|
||||||
|
u8 cfg0 = read ? pcie->pcie_data->cfgrd0 : pcie->pcie_data->cfgwr0;
|
||||||
|
u8 cfg1 = read ? pcie->pcie_data->cfgrd1 : pcie->pcie_data->cfgwr1;
|
||||||
|
u8 tag = read ? TLP_READ_TAG : TLP_WRITE_TAG;
|
||||||
|
|
||||||
|
if (pcie->pcie_data->version == ALTERA_PCIE_V1)
|
||||||
|
cfg = (bus == pcie->root_bus_nr) ? cfg0 : cfg1;
|
||||||
|
else
|
||||||
|
cfg = (bus > S10_RP_SECONDARY(pcie)) ? cfg0 : cfg1;
|
||||||
|
|
||||||
|
headers[0] = TLP_CFG_DW0(pcie, cfg);
|
||||||
|
headers[1] = TLP_CFG_DW1(pcie, tag, byte_en);
|
||||||
|
headers[2] = TLP_CFG_DW2(bus, devfn, where);
|
||||||
|
}
|
||||||
|
|
||||||
static int tlp_cfg_dword_read(struct altera_pcie *pcie, u8 bus, u32 devfn,
|
static int tlp_cfg_dword_read(struct altera_pcie *pcie, u8 bus, u32 devfn,
|
||||||
int where, u8 byte_en, u32 *value)
|
int where, u8 byte_en, u32 *value)
|
||||||
{
|
{
|
||||||
u32 headers[TLP_HDR_SIZE];
|
u32 headers[TLP_HDR_SIZE];
|
||||||
|
|
||||||
headers[0] = TLP_CFGRD_DW0(pcie, bus);
|
get_tlp_header(pcie, bus, devfn, where, byte_en, true,
|
||||||
headers[1] = TLP_CFG_DW1(pcie, TLP_READ_TAG, byte_en);
|
headers);
|
||||||
headers[2] = TLP_CFG_DW2(bus, devfn, where);
|
|
||||||
|
|
||||||
pcie->pcie_data->ops->tlp_write_pkt(pcie, headers, 0, false);
|
pcie->pcie_data->ops->tlp_write_pkt(pcie, headers, 0, false);
|
||||||
|
|
||||||
|
@ -341,9 +356,8 @@ static int tlp_cfg_dword_write(struct altera_pcie *pcie, u8 bus, u32 devfn,
|
||||||
u32 headers[TLP_HDR_SIZE];
|
u32 headers[TLP_HDR_SIZE];
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
headers[0] = TLP_CFGWR_DW0(pcie, bus);
|
get_tlp_header(pcie, bus, devfn, where, byte_en, false,
|
||||||
headers[1] = TLP_CFG_DW1(pcie, TLP_WRITE_TAG, byte_en);
|
headers);
|
||||||
headers[2] = TLP_CFG_DW2(bus, devfn, where);
|
|
||||||
|
|
||||||
/* check alignment to Qword */
|
/* check alignment to Qword */
|
||||||
if ((where & 0x7) == 0)
|
if ((where & 0x7) == 0)
|
||||||
|
@ -705,6 +719,13 @@ static int altera_pcie_init_irq_domain(struct altera_pcie *pcie)
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void altera_pcie_irq_teardown(struct altera_pcie *pcie)
|
||||||
|
{
|
||||||
|
irq_set_chained_handler_and_data(pcie->irq, NULL, NULL);
|
||||||
|
irq_domain_remove(pcie->irq_domain);
|
||||||
|
irq_dispose_mapping(pcie->irq);
|
||||||
|
}
|
||||||
|
|
||||||
static int altera_pcie_parse_dt(struct altera_pcie *pcie)
|
static int altera_pcie_parse_dt(struct altera_pcie *pcie)
|
||||||
{
|
{
|
||||||
struct device *dev = &pcie->pdev->dev;
|
struct device *dev = &pcie->pdev->dev;
|
||||||
|
@ -798,6 +819,7 @@ static int altera_pcie_probe(struct platform_device *pdev)
|
||||||
|
|
||||||
pcie = pci_host_bridge_priv(bridge);
|
pcie = pci_host_bridge_priv(bridge);
|
||||||
pcie->pdev = pdev;
|
pcie->pdev = pdev;
|
||||||
|
platform_set_drvdata(pdev, pcie);
|
||||||
|
|
||||||
match = of_match_device(altera_pcie_of_match, &pdev->dev);
|
match = of_match_device(altera_pcie_of_match, &pdev->dev);
|
||||||
if (!match)
|
if (!match)
|
||||||
|
@ -855,13 +877,28 @@ static int altera_pcie_probe(struct platform_device *pdev)
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static int altera_pcie_remove(struct platform_device *pdev)
|
||||||
|
{
|
||||||
|
struct altera_pcie *pcie = platform_get_drvdata(pdev);
|
||||||
|
struct pci_host_bridge *bridge = pci_host_bridge_from_priv(pcie);
|
||||||
|
|
||||||
|
pci_stop_root_bus(bridge->bus);
|
||||||
|
pci_remove_root_bus(bridge->bus);
|
||||||
|
pci_free_resource_list(&pcie->resources);
|
||||||
|
altera_pcie_irq_teardown(pcie);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
static struct platform_driver altera_pcie_driver = {
|
static struct platform_driver altera_pcie_driver = {
|
||||||
.probe = altera_pcie_probe,
|
.probe = altera_pcie_probe,
|
||||||
|
.remove = altera_pcie_remove,
|
||||||
.driver = {
|
.driver = {
|
||||||
.name = "altera-pcie",
|
.name = "altera-pcie",
|
||||||
.of_match_table = altera_pcie_of_match,
|
.of_match_table = altera_pcie_of_match,
|
||||||
.suppress_bind_attrs = true,
|
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
builtin_platform_driver(altera_pcie_driver);
|
MODULE_DEVICE_TABLE(of, altera_pcie_of_match);
|
||||||
|
module_platform_driver(altera_pcie_driver);
|
||||||
|
MODULE_LICENSE("GPL v2");
|
||||||
|
|
|
@ -87,7 +87,7 @@ static int iproc_pcie_pltfm_probe(struct platform_device *pdev)
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* DT nodes are not used by all platforms that use the iProc PCIe
|
* DT nodes are not used by all platforms that use the iProc PCIe
|
||||||
* core driver. For platforms that require explict inbound mapping
|
* core driver. For platforms that require explicit inbound mapping
|
||||||
* configuration, "dma-ranges" would have been present in DT
|
* configuration, "dma-ranges" would have been present in DT
|
||||||
*/
|
*/
|
||||||
pcie->need_ib_cfg = of_property_read_bool(np, "dma-ranges");
|
pcie->need_ib_cfg = of_property_read_bool(np, "dma-ranges");
|
||||||
|
|
|
@ -163,7 +163,7 @@ enum iproc_pcie_ib_map_type {
|
||||||
* @size_unit: inbound mapping region size unit, could be SZ_1K, SZ_1M, or
|
* @size_unit: inbound mapping region size unit, could be SZ_1K, SZ_1M, or
|
||||||
* SZ_1G
|
* SZ_1G
|
||||||
* @region_sizes: list of supported inbound mapping region sizes in KB, MB, or
|
* @region_sizes: list of supported inbound mapping region sizes in KB, MB, or
|
||||||
* GB, depedning on the size unit
|
* GB, depending on the size unit
|
||||||
* @nr_sizes: number of supported inbound mapping region sizes
|
* @nr_sizes: number of supported inbound mapping region sizes
|
||||||
* @nr_windows: number of supported inbound mapping windows for the region
|
* @nr_windows: number of supported inbound mapping windows for the region
|
||||||
* @imap_addr_offset: register offset between the upper and lower 32-bit
|
* @imap_addr_offset: register offset between the upper and lower 32-bit
|
||||||
|
|
|
@ -31,56 +31,61 @@
|
||||||
* translation tables are grouped into windows, each window registers are
|
* translation tables are grouped into windows, each window registers are
|
||||||
* grouped into blocks of 4 or 16 registers each
|
* grouped into blocks of 4 or 16 registers each
|
||||||
*/
|
*/
|
||||||
#define PAB_REG_BLOCK_SIZE 16
|
#define PAB_REG_BLOCK_SIZE 16
|
||||||
#define PAB_EXT_REG_BLOCK_SIZE 4
|
#define PAB_EXT_REG_BLOCK_SIZE 4
|
||||||
|
|
||||||
#define PAB_REG_ADDR(offset, win) (offset + (win * PAB_REG_BLOCK_SIZE))
|
#define PAB_REG_ADDR(offset, win) \
|
||||||
#define PAB_EXT_REG_ADDR(offset, win) (offset + (win * PAB_EXT_REG_BLOCK_SIZE))
|
(offset + (win * PAB_REG_BLOCK_SIZE))
|
||||||
|
#define PAB_EXT_REG_ADDR(offset, win) \
|
||||||
|
(offset + (win * PAB_EXT_REG_BLOCK_SIZE))
|
||||||
|
|
||||||
#define LTSSM_STATUS 0x0404
|
#define LTSSM_STATUS 0x0404
|
||||||
#define LTSSM_STATUS_L0_MASK 0x3f
|
#define LTSSM_STATUS_L0_MASK 0x3f
|
||||||
#define LTSSM_STATUS_L0 0x2d
|
#define LTSSM_STATUS_L0 0x2d
|
||||||
|
|
||||||
#define PAB_CTRL 0x0808
|
#define PAB_CTRL 0x0808
|
||||||
#define AMBA_PIO_ENABLE_SHIFT 0
|
#define AMBA_PIO_ENABLE_SHIFT 0
|
||||||
#define PEX_PIO_ENABLE_SHIFT 1
|
#define PEX_PIO_ENABLE_SHIFT 1
|
||||||
#define PAGE_SEL_SHIFT 13
|
#define PAGE_SEL_SHIFT 13
|
||||||
#define PAGE_SEL_MASK 0x3f
|
#define PAGE_SEL_MASK 0x3f
|
||||||
#define PAGE_LO_MASK 0x3ff
|
#define PAGE_LO_MASK 0x3ff
|
||||||
#define PAGE_SEL_EN 0xc00
|
#define PAGE_SEL_OFFSET_SHIFT 10
|
||||||
#define PAGE_SEL_OFFSET_SHIFT 10
|
|
||||||
|
|
||||||
#define PAB_AXI_PIO_CTRL 0x0840
|
#define PAB_AXI_PIO_CTRL 0x0840
|
||||||
#define APIO_EN_MASK 0xf
|
#define APIO_EN_MASK 0xf
|
||||||
|
|
||||||
#define PAB_PEX_PIO_CTRL 0x08c0
|
#define PAB_PEX_PIO_CTRL 0x08c0
|
||||||
#define PIO_ENABLE_SHIFT 0
|
#define PIO_ENABLE_SHIFT 0
|
||||||
|
|
||||||
#define PAB_INTP_AMBA_MISC_ENB 0x0b0c
|
#define PAB_INTP_AMBA_MISC_ENB 0x0b0c
|
||||||
#define PAB_INTP_AMBA_MISC_STAT 0x0b1c
|
#define PAB_INTP_AMBA_MISC_STAT 0x0b1c
|
||||||
#define PAB_INTP_INTX_MASK 0x01e0
|
#define PAB_INTP_INTX_MASK 0x01e0
|
||||||
#define PAB_INTP_MSI_MASK 0x8
|
#define PAB_INTP_MSI_MASK 0x8
|
||||||
|
|
||||||
#define PAB_AXI_AMAP_CTRL(win) PAB_REG_ADDR(0x0ba0, win)
|
#define PAB_AXI_AMAP_CTRL(win) PAB_REG_ADDR(0x0ba0, win)
|
||||||
#define WIN_ENABLE_SHIFT 0
|
#define WIN_ENABLE_SHIFT 0
|
||||||
#define WIN_TYPE_SHIFT 1
|
#define WIN_TYPE_SHIFT 1
|
||||||
|
#define WIN_TYPE_MASK 0x3
|
||||||
|
#define WIN_SIZE_MASK 0xfffffc00
|
||||||
|
|
||||||
#define PAB_EXT_AXI_AMAP_SIZE(win) PAB_EXT_REG_ADDR(0xbaf0, win)
|
#define PAB_EXT_AXI_AMAP_SIZE(win) PAB_EXT_REG_ADDR(0xbaf0, win)
|
||||||
|
|
||||||
|
#define PAB_EXT_AXI_AMAP_AXI_WIN(win) PAB_EXT_REG_ADDR(0x80a0, win)
|
||||||
#define PAB_AXI_AMAP_AXI_WIN(win) PAB_REG_ADDR(0x0ba4, win)
|
#define PAB_AXI_AMAP_AXI_WIN(win) PAB_REG_ADDR(0x0ba4, win)
|
||||||
#define AXI_WINDOW_ALIGN_MASK 3
|
#define AXI_WINDOW_ALIGN_MASK 3
|
||||||
|
|
||||||
#define PAB_AXI_AMAP_PEX_WIN_L(win) PAB_REG_ADDR(0x0ba8, win)
|
#define PAB_AXI_AMAP_PEX_WIN_L(win) PAB_REG_ADDR(0x0ba8, win)
|
||||||
#define PAB_BUS_SHIFT 24
|
#define PAB_BUS_SHIFT 24
|
||||||
#define PAB_DEVICE_SHIFT 19
|
#define PAB_DEVICE_SHIFT 19
|
||||||
#define PAB_FUNCTION_SHIFT 16
|
#define PAB_FUNCTION_SHIFT 16
|
||||||
|
|
||||||
#define PAB_AXI_AMAP_PEX_WIN_H(win) PAB_REG_ADDR(0x0bac, win)
|
#define PAB_AXI_AMAP_PEX_WIN_H(win) PAB_REG_ADDR(0x0bac, win)
|
||||||
#define PAB_INTP_AXI_PIO_CLASS 0x474
|
#define PAB_INTP_AXI_PIO_CLASS 0x474
|
||||||
|
|
||||||
#define PAB_PEX_AMAP_CTRL(win) PAB_REG_ADDR(0x4ba0, win)
|
#define PAB_PEX_AMAP_CTRL(win) PAB_REG_ADDR(0x4ba0, win)
|
||||||
#define AMAP_CTRL_EN_SHIFT 0
|
#define AMAP_CTRL_EN_SHIFT 0
|
||||||
#define AMAP_CTRL_TYPE_SHIFT 1
|
#define AMAP_CTRL_TYPE_SHIFT 1
|
||||||
|
#define AMAP_CTRL_TYPE_MASK 3
|
||||||
|
|
||||||
#define PAB_EXT_PEX_AMAP_SIZEN(win) PAB_EXT_REG_ADDR(0xbef0, win)
|
#define PAB_EXT_PEX_AMAP_SIZEN(win) PAB_EXT_REG_ADDR(0xbef0, win)
|
||||||
#define PAB_PEX_AMAP_AXI_WIN(win) PAB_REG_ADDR(0x4ba4, win)
|
#define PAB_PEX_AMAP_AXI_WIN(win) PAB_REG_ADDR(0x4ba4, win)
|
||||||
|
@ -88,34 +93,40 @@
|
||||||
#define PAB_PEX_AMAP_PEX_WIN_H(win) PAB_REG_ADDR(0x4bac, win)
|
#define PAB_PEX_AMAP_PEX_WIN_H(win) PAB_REG_ADDR(0x4bac, win)
|
||||||
|
|
||||||
/* starting offset of INTX bits in status register */
|
/* starting offset of INTX bits in status register */
|
||||||
#define PAB_INTX_START 5
|
#define PAB_INTX_START 5
|
||||||
|
|
||||||
/* supported number of MSI interrupts */
|
/* supported number of MSI interrupts */
|
||||||
#define PCI_NUM_MSI 16
|
#define PCI_NUM_MSI 16
|
||||||
|
|
||||||
/* MSI registers */
|
/* MSI registers */
|
||||||
#define MSI_BASE_LO_OFFSET 0x04
|
#define MSI_BASE_LO_OFFSET 0x04
|
||||||
#define MSI_BASE_HI_OFFSET 0x08
|
#define MSI_BASE_HI_OFFSET 0x08
|
||||||
#define MSI_SIZE_OFFSET 0x0c
|
#define MSI_SIZE_OFFSET 0x0c
|
||||||
#define MSI_ENABLE_OFFSET 0x14
|
#define MSI_ENABLE_OFFSET 0x14
|
||||||
#define MSI_STATUS_OFFSET 0x18
|
#define MSI_STATUS_OFFSET 0x18
|
||||||
#define MSI_DATA_OFFSET 0x20
|
#define MSI_DATA_OFFSET 0x20
|
||||||
#define MSI_ADDR_L_OFFSET 0x24
|
#define MSI_ADDR_L_OFFSET 0x24
|
||||||
#define MSI_ADDR_H_OFFSET 0x28
|
#define MSI_ADDR_H_OFFSET 0x28
|
||||||
|
|
||||||
/* outbound and inbound window definitions */
|
/* outbound and inbound window definitions */
|
||||||
#define WIN_NUM_0 0
|
#define WIN_NUM_0 0
|
||||||
#define WIN_NUM_1 1
|
#define WIN_NUM_1 1
|
||||||
#define CFG_WINDOW_TYPE 0
|
#define CFG_WINDOW_TYPE 0
|
||||||
#define IO_WINDOW_TYPE 1
|
#define IO_WINDOW_TYPE 1
|
||||||
#define MEM_WINDOW_TYPE 2
|
#define MEM_WINDOW_TYPE 2
|
||||||
#define IB_WIN_SIZE ((u64)256 * 1024 * 1024 * 1024)
|
#define IB_WIN_SIZE ((u64)256 * 1024 * 1024 * 1024)
|
||||||
#define MAX_PIO_WINDOWS 8
|
#define MAX_PIO_WINDOWS 8
|
||||||
|
|
||||||
/* Parameters for the waiting for link up routine */
|
/* Parameters for the waiting for link up routine */
|
||||||
#define LINK_WAIT_MAX_RETRIES 10
|
#define LINK_WAIT_MAX_RETRIES 10
|
||||||
#define LINK_WAIT_MIN 90000
|
#define LINK_WAIT_MIN 90000
|
||||||
#define LINK_WAIT_MAX 100000
|
#define LINK_WAIT_MAX 100000
|
||||||
|
|
||||||
|
#define PAGED_ADDR_BNDRY 0xc00
|
||||||
|
#define OFFSET_TO_PAGE_ADDR(off) \
|
||||||
|
((off & PAGE_LO_MASK) | PAGED_ADDR_BNDRY)
|
||||||
|
#define OFFSET_TO_PAGE_IDX(off) \
|
||||||
|
((off >> PAGE_SEL_OFFSET_SHIFT) & PAGE_SEL_MASK)
|
||||||
|
|
||||||
struct mobiveil_msi { /* MSI information */
|
struct mobiveil_msi { /* MSI information */
|
||||||
struct mutex lock; /* protect bitmap variable */
|
struct mutex lock; /* protect bitmap variable */
|
||||||
|
@ -145,15 +156,119 @@ struct mobiveil_pcie {
|
||||||
struct mobiveil_msi msi;
|
struct mobiveil_msi msi;
|
||||||
};
|
};
|
||||||
|
|
||||||
static inline void csr_writel(struct mobiveil_pcie *pcie, const u32 value,
|
/*
|
||||||
const u32 reg)
|
* mobiveil_pcie_sel_page - routine to access paged register
|
||||||
|
*
|
||||||
|
* Registers at or above PAGED_ADDR_BNDRY (0xc00) are paged. To access one,
|
||||||
|
* the upper 6 bits of the offset are written to the pg_sel field of the
|
||||||
|
* PAB_CTRL register, and the lower 10 bits, ORed with PAGED_ADDR_BNDRY,
|
||||||
|
* are used as the offset of the register.
|
||||||
|
*/
|
||||||
|
static void mobiveil_pcie_sel_page(struct mobiveil_pcie *pcie, u8 pg_idx)
|
||||||
{
|
{
|
||||||
writel_relaxed(value, pcie->csr_axi_slave_base + reg);
|
u32 val;
|
||||||
|
|
||||||
|
val = readl(pcie->csr_axi_slave_base + PAB_CTRL);
|
||||||
|
val &= ~(PAGE_SEL_MASK << PAGE_SEL_SHIFT);
|
||||||
|
val |= (pg_idx & PAGE_SEL_MASK) << PAGE_SEL_SHIFT;
|
||||||
|
|
||||||
|
writel(val, pcie->csr_axi_slave_base + PAB_CTRL);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline u32 csr_readl(struct mobiveil_pcie *pcie, const u32 reg)
|
static void *mobiveil_pcie_comp_addr(struct mobiveil_pcie *pcie, u32 off)
|
||||||
{
|
{
|
||||||
return readl_relaxed(pcie->csr_axi_slave_base + reg);
|
if (off < PAGED_ADDR_BNDRY) {
|
||||||
|
/* For directly accessed registers, clear the pg_sel field */
|
||||||
|
mobiveil_pcie_sel_page(pcie, 0);
|
||||||
|
return pcie->csr_axi_slave_base + off;
|
||||||
|
}
|
||||||
|
|
||||||
|
mobiveil_pcie_sel_page(pcie, OFFSET_TO_PAGE_IDX(off));
|
||||||
|
return pcie->csr_axi_slave_base + OFFSET_TO_PAGE_ADDR(off);
|
||||||
|
}
|
||||||
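The paged-access arithmetic used by mobiveil_pcie_comp_addr() can be checked in isolation. The following is only a sketch: PAGED_ADDR_BNDRY is taken from the diff, while the values of PAGE_SEL_MASK, PAGE_LO_MASK and PAGE_SEL_OFFSET_SHIFT are assumptions inferred from the "6 bits" / "10 bits" wording of the comment (their definitions are outside this excerpt), and struct paged_access / split_offset() are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Taken from the diff. */
#define PAGED_ADDR_BNDRY       0xc00u

/*
 * Assumed values: a 6-bit page index and a 10-bit in-page offset, as the
 * driver comment describes. The real definitions are not shown here.
 */
#define PAGE_SEL_MASK          0x3fu
#define PAGE_LO_MASK           0x3ffu
#define PAGE_SEL_OFFSET_SHIFT  10

/* Hypothetical helper type: which page to select and where to access. */
struct paged_access {
	uint8_t  page;        /* value for the pg_sel field of PAB_CTRL */
	uint32_t window_off;  /* offset actually used from the CSR base */
};

/* Mirrors OFFSET_TO_PAGE_IDX() / OFFSET_TO_PAGE_ADDR() without any MMIO. */
static struct paged_access split_offset(uint32_t off)
{
	struct paged_access a;

	if (off < PAGED_ADDR_BNDRY) {
		/* Directly accessed register: pg_sel is cleared. */
		a.page = 0;
		a.window_off = off;
		return a;
	}

	a.page = (off >> PAGE_SEL_OFFSET_SHIFT) & PAGE_SEL_MASK;
	a.window_off = (off & PAGE_LO_MASK) | PAGED_ADDR_BNDRY;

	/* A paged access always lands in the window at or above 0xc00. */
	assert(a.window_off >= PAGED_ADDR_BNDRY);
	return a;
}
```

Under these assumed masks, offset 0x4ba4 (the base used by PAB_PEX_AMAP_AXI_WIN) splits into page 18 and window offset 0xfa4, while an offset below 0xc00, such as 0x04, is left unchanged with page 0.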
|
|
||||||
|
static int mobiveil_pcie_read(void __iomem *addr, int size, u32 *val)
|
||||||
|
{
|
||||||
|
if ((uintptr_t)addr & (size - 1)) {
|
||||||
|
*val = 0;
|
||||||
|
return PCIBIOS_BAD_REGISTER_NUMBER;
|
||||||
|
}
|
||||||
|
|
||||||
|
switch (size) {
|
||||||
|
case 4:
|
||||||
|
*val = readl(addr);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
*val = readw(addr);
|
||||||
|
break;
|
||||||
|
case 1:
|
||||||
|
*val = readb(addr);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
*val = 0;
|
||||||
|
return PCIBIOS_BAD_REGISTER_NUMBER;
|
||||||
|
}
|
||||||
|
|
||||||
|
return PCIBIOS_SUCCESSFUL;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int mobiveil_pcie_write(void __iomem *addr, int size, u32 val)
|
||||||
|
{
|
||||||
|
if ((uintptr_t)addr & (size - 1))
|
||||||
|
return PCIBIOS_BAD_REGISTER_NUMBER;
|
||||||
|
|
||||||
|
switch (size) {
|
||||||
|
case 4:
|
||||||
|
writel(val, addr);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
writew(val, addr);
|
||||||
|
break;
|
||||||
|
case 1:
|
||||||
|
writeb(val, addr);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
return PCIBIOS_BAD_REGISTER_NUMBER;
|
||||||
|
}
|
||||||
|
|
||||||
|
return PCIBIOS_SUCCESSFUL;
|
||||||
|
}
|
||||||
|
|
||||||
|
static u32 csr_read(struct mobiveil_pcie *pcie, u32 off, size_t size)
|
||||||
|
{
|
||||||
|
void *addr;
|
||||||
|
u32 val;
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
addr = mobiveil_pcie_comp_addr(pcie, off);
|
||||||
|
|
||||||
|
ret = mobiveil_pcie_read(addr, size, &val);
|
||||||
|
if (ret)
|
||||||
|
dev_err(&pcie->pdev->dev, "read CSR address failed\n");
|
||||||
|
|
||||||
|
return val;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void csr_write(struct mobiveil_pcie *pcie, u32 val, u32 off, size_t size)
|
||||||
|
{
|
||||||
|
void *addr;
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
addr = mobiveil_pcie_comp_addr(pcie, off);
|
||||||
|
|
||||||
|
ret = mobiveil_pcie_write(addr, size, val);
|
||||||
|
if (ret)
|
||||||
|
dev_err(&pcie->pdev->dev, "write CSR address failed\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
static u32 csr_readl(struct mobiveil_pcie *pcie, u32 off)
|
||||||
|
{
|
||||||
|
return csr_read(pcie, off, 0x4);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void csr_writel(struct mobiveil_pcie *pcie, u32 val, u32 off)
|
||||||
|
{
|
||||||
|
csr_write(pcie, val, off, 0x4);
|
||||||
}
|
}
|
||||||
|
|
||||||
static bool mobiveil_pcie_link_up(struct mobiveil_pcie *pcie)
|
static bool mobiveil_pcie_link_up(struct mobiveil_pcie *pcie)
|
||||||
|
@ -174,7 +289,7 @@ static bool mobiveil_pcie_valid_device(struct pci_bus *bus, unsigned int devfn)
|
||||||
* Do not read more than one device on the bus directly
|
* Do not read more than one device on the bus directly
|
||||||
* attached to RC
|
* attached to RC
|
||||||
*/
|
*/
|
||||||
if ((bus->primary == pcie->root_bus_nr) && (devfn > 0))
|
if ((bus->primary == pcie->root_bus_nr) && (PCI_SLOT(devfn) > 0))
|
||||||
return false;
|
return false;
|
||||||
|
|
||||||
return true;
|
return true;
|
||||||
|
@ -185,17 +300,17 @@ static bool mobiveil_pcie_valid_device(struct pci_bus *bus, unsigned int devfn)
|
||||||
* root port or endpoint
|
* root port or endpoint
|
||||||
*/
|
*/
|
||||||
static void __iomem *mobiveil_pcie_map_bus(struct pci_bus *bus,
|
static void __iomem *mobiveil_pcie_map_bus(struct pci_bus *bus,
|
||||||
unsigned int devfn, int where)
|
unsigned int devfn, int where)
|
||||||
{
|
{
|
||||||
struct mobiveil_pcie *pcie = bus->sysdata;
|
struct mobiveil_pcie *pcie = bus->sysdata;
|
||||||
|
u32 value;
|
||||||
|
|
||||||
if (!mobiveil_pcie_valid_device(bus, devfn))
|
if (!mobiveil_pcie_valid_device(bus, devfn))
|
||||||
return NULL;
|
return NULL;
|
||||||
|
|
||||||
if (bus->number == pcie->root_bus_nr) {
|
/* RC config access */
|
||||||
/* RC config access */
|
if (bus->number == pcie->root_bus_nr)
|
||||||
return pcie->csr_axi_slave_base + where;
|
return pcie->csr_axi_slave_base + where;
|
||||||
}
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* EP config access (in Config/APIO space)
|
* EP config access (in Config/APIO space)
|
||||||
|
@ -203,10 +318,12 @@ static void __iomem *mobiveil_pcie_map_bus(struct pci_bus *bus,
|
||||||
* (BDF) in PAB_AXI_AMAP_PEX_WIN_L0 Register.
|
* (BDF) in PAB_AXI_AMAP_PEX_WIN_L0 Register.
|
||||||
* Relies on pci_lock serialization
|
* Relies on pci_lock serialization
|
||||||
*/
|
*/
|
||||||
csr_writel(pcie, bus->number << PAB_BUS_SHIFT |
|
value = bus->number << PAB_BUS_SHIFT |
|
||||||
PCI_SLOT(devfn) << PAB_DEVICE_SHIFT |
|
PCI_SLOT(devfn) << PAB_DEVICE_SHIFT |
|
||||||
PCI_FUNC(devfn) << PAB_FUNCTION_SHIFT,
|
PCI_FUNC(devfn) << PAB_FUNCTION_SHIFT;
|
||||||
PAB_AXI_AMAP_PEX_WIN_L(WIN_NUM_0));
|
|
||||||
|
csr_writel(pcie, value, PAB_AXI_AMAP_PEX_WIN_L(WIN_NUM_0));
|
||||||
|
|
||||||
return pcie->config_axi_slave_base + where;
|
return pcie->config_axi_slave_base + where;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -241,24 +358,29 @@ static void mobiveil_pcie_isr(struct irq_desc *desc)
|
||||||
|
|
||||||
/* Handle INTx */
|
/* Handle INTx */
|
||||||
if (intr_status & PAB_INTP_INTX_MASK) {
|
if (intr_status & PAB_INTP_INTX_MASK) {
|
||||||
shifted_status = csr_readl(pcie, PAB_INTP_AMBA_MISC_STAT) >>
|
shifted_status = csr_readl(pcie, PAB_INTP_AMBA_MISC_STAT);
|
||||||
PAB_INTX_START;
|
shifted_status &= PAB_INTP_INTX_MASK;
|
||||||
|
shifted_status >>= PAB_INTX_START;
|
||||||
do {
|
do {
|
||||||
for_each_set_bit(bit, &shifted_status, PCI_NUM_INTX) {
|
for_each_set_bit(bit, &shifted_status, PCI_NUM_INTX) {
|
||||||
virq = irq_find_mapping(pcie->intx_domain,
|
virq = irq_find_mapping(pcie->intx_domain,
|
||||||
bit + 1);
|
bit + 1);
|
||||||
if (virq)
|
if (virq)
|
||||||
generic_handle_irq(virq);
|
generic_handle_irq(virq);
|
||||||
else
|
else
|
||||||
dev_err_ratelimited(dev,
|
dev_err_ratelimited(dev, "unexpected IRQ, INT%d\n",
|
||||||
"unexpected IRQ, INT%d\n", bit);
|
bit);
|
||||||
|
|
||||||
/* clear interrupt */
|
/* clear interrupt handled */
|
||||||
csr_writel(pcie,
|
csr_writel(pcie, 1 << (PAB_INTX_START + bit),
|
||||||
shifted_status << PAB_INTX_START,
|
PAB_INTP_AMBA_MISC_STAT);
|
||||||
PAB_INTP_AMBA_MISC_STAT);
|
|
||||||
}
|
}
|
||||||
} while ((shifted_status >> PAB_INTX_START) != 0);
|
|
||||||
|
shifted_status = csr_readl(pcie,
|
||||||
|
PAB_INTP_AMBA_MISC_STAT);
|
||||||
|
shifted_status &= PAB_INTP_INTX_MASK;
|
||||||
|
shifted_status >>= PAB_INTX_START;
|
||||||
|
} while (shifted_status != 0);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* read extra MSI status register */
|
/* read extra MSI status register */
|
||||||
|
@ -266,8 +388,7 @@ static void mobiveil_pcie_isr(struct irq_desc *desc)
|
||||||
|
|
||||||
/* handle MSI interrupts */
|
/* handle MSI interrupts */
|
||||||
while (msi_status & 1) {
|
while (msi_status & 1) {
|
||||||
msi_data = readl_relaxed(pcie->apb_csr_base
|
msi_data = readl_relaxed(pcie->apb_csr_base + MSI_DATA_OFFSET);
|
||||||
+ MSI_DATA_OFFSET);
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* MSI_STATUS_OFFSET register gets updated to zero
|
* MSI_STATUS_OFFSET register gets updated to zero
|
||||||
|
@ -276,18 +397,18 @@ static void mobiveil_pcie_isr(struct irq_desc *desc)
|
||||||
* two dummy reads.
|
* two dummy reads.
|
||||||
*/
|
*/
|
||||||
msi_addr_lo = readl_relaxed(pcie->apb_csr_base +
|
msi_addr_lo = readl_relaxed(pcie->apb_csr_base +
|
||||||
MSI_ADDR_L_OFFSET);
|
MSI_ADDR_L_OFFSET);
|
||||||
msi_addr_hi = readl_relaxed(pcie->apb_csr_base +
|
msi_addr_hi = readl_relaxed(pcie->apb_csr_base +
|
||||||
MSI_ADDR_H_OFFSET);
|
MSI_ADDR_H_OFFSET);
|
||||||
dev_dbg(dev, "MSI registers, data: %08x, addr: %08x:%08x\n",
|
dev_dbg(dev, "MSI registers, data: %08x, addr: %08x:%08x\n",
|
||||||
msi_data, msi_addr_hi, msi_addr_lo);
|
msi_data, msi_addr_hi, msi_addr_lo);
|
||||||
|
|
||||||
virq = irq_find_mapping(msi->dev_domain, msi_data);
|
virq = irq_find_mapping(msi->dev_domain, msi_data);
|
||||||
if (virq)
|
if (virq)
|
||||||
generic_handle_irq(virq);
|
generic_handle_irq(virq);
|
||||||
|
|
||||||
msi_status = readl_relaxed(pcie->apb_csr_base +
|
msi_status = readl_relaxed(pcie->apb_csr_base +
|
||||||
MSI_STATUS_OFFSET);
|
MSI_STATUS_OFFSET);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Clear the interrupt status */
|
/* Clear the interrupt status */
|
||||||
|
@ -304,7 +425,7 @@ static int mobiveil_pcie_parse_dt(struct mobiveil_pcie *pcie)
|
||||||
|
|
||||||
/* map config resource */
|
/* map config resource */
|
||||||
res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
|
res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
|
||||||
"config_axi_slave");
|
"config_axi_slave");
|
||||||
pcie->config_axi_slave_base = devm_pci_remap_cfg_resource(dev, res);
|
pcie->config_axi_slave_base = devm_pci_remap_cfg_resource(dev, res);
|
||||||
if (IS_ERR(pcie->config_axi_slave_base))
|
if (IS_ERR(pcie->config_axi_slave_base))
|
||||||
return PTR_ERR(pcie->config_axi_slave_base);
|
return PTR_ERR(pcie->config_axi_slave_base);
|
||||||
|
@ -312,7 +433,7 @@ static int mobiveil_pcie_parse_dt(struct mobiveil_pcie *pcie)
|
||||||
|
|
||||||
/* map csr resource */
|
/* map csr resource */
|
||||||
res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
|
res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
|
||||||
"csr_axi_slave");
|
"csr_axi_slave");
|
||||||
pcie->csr_axi_slave_base = devm_pci_remap_cfg_resource(dev, res);
|
pcie->csr_axi_slave_base = devm_pci_remap_cfg_resource(dev, res);
|
||||||
if (IS_ERR(pcie->csr_axi_slave_base))
|
if (IS_ERR(pcie->csr_axi_slave_base))
|
||||||
return PTR_ERR(pcie->csr_axi_slave_base);
|
return PTR_ERR(pcie->csr_axi_slave_base);
|
||||||
|
@ -337,92 +458,50 @@ static int mobiveil_pcie_parse_dt(struct mobiveil_pcie *pcie)
|
||||||
return -ENODEV;
|
return -ENODEV;
|
||||||
}
|
}
|
||||||
|
|
||||||
irq_set_chained_handler_and_data(pcie->irq, mobiveil_pcie_isr, pcie);
|
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
|
||||||
* select_paged_register - routine to access paged register of root complex
|
|
||||||
*
|
|
||||||
* registers of RC are paged, for this scheme to work
|
|
||||||
* extracted higher 6 bits of the offset will be written to pg_sel
|
|
||||||
* field of PAB_CTRL register and rest of the lower 10 bits enabled with
|
|
||||||
* PAGE_SEL_EN are used as offset of the register.
|
|
||||||
*/
|
|
||||||
static void select_paged_register(struct mobiveil_pcie *pcie, u32 offset)
|
|
||||||
{
|
|
||||||
int pab_ctrl_dw, pg_sel;
|
|
||||||
|
|
||||||
/* clear pg_sel field */
|
|
||||||
pab_ctrl_dw = csr_readl(pcie, PAB_CTRL);
|
|
||||||
pab_ctrl_dw = (pab_ctrl_dw & ~(PAGE_SEL_MASK << PAGE_SEL_SHIFT));
|
|
||||||
|
|
||||||
/* set pg_sel field */
|
|
||||||
pg_sel = (offset >> PAGE_SEL_OFFSET_SHIFT) & PAGE_SEL_MASK;
|
|
||||||
pab_ctrl_dw |= ((pg_sel << PAGE_SEL_SHIFT));
|
|
||||||
csr_writel(pcie, pab_ctrl_dw, PAB_CTRL);
|
|
||||||
}
|
|
||||||
|
|
||||||
static void write_paged_register(struct mobiveil_pcie *pcie,
|
|
||||||
u32 val, u32 offset)
|
|
||||||
{
|
|
||||||
u32 off = (offset & PAGE_LO_MASK) | PAGE_SEL_EN;
|
|
||||||
|
|
||||||
select_paged_register(pcie, offset);
|
|
||||||
csr_writel(pcie, val, off);
|
|
||||||
}
|
|
||||||
|
|
||||||
static u32 read_paged_register(struct mobiveil_pcie *pcie, u32 offset)
|
|
||||||
{
|
|
||||||
u32 off = (offset & PAGE_LO_MASK) | PAGE_SEL_EN;
|
|
||||||
|
|
||||||
select_paged_register(pcie, offset);
|
|
||||||
return csr_readl(pcie, off);
|
|
||||||
}
|
|
||||||
|
|
||||||
static void program_ib_windows(struct mobiveil_pcie *pcie, int win_num,
|
static void program_ib_windows(struct mobiveil_pcie *pcie, int win_num,
|
||||||
int pci_addr, u32 type, u64 size)
|
u64 pci_addr, u32 type, u64 size)
|
||||||
{
|
{
|
||||||
int pio_ctrl_val;
|
u32 value;
|
||||||
int amap_ctrl_dw;
|
|
||||||
u64 size64 = ~(size - 1);
|
u64 size64 = ~(size - 1);
|
||||||
|
|
||||||
if ((pcie->ib_wins_configured + 1) > pcie->ppio_wins) {
|
if (win_num >= pcie->ppio_wins) {
|
||||||
dev_err(&pcie->pdev->dev,
|
dev_err(&pcie->pdev->dev,
|
||||||
"ERROR: max inbound windows reached !\n");
|
"ERROR: max inbound windows reached !\n");
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
pio_ctrl_val = csr_readl(pcie, PAB_PEX_PIO_CTRL);
|
value = csr_readl(pcie, PAB_PEX_AMAP_CTRL(win_num));
|
||||||
csr_writel(pcie,
|
value &= ~(AMAP_CTRL_TYPE_MASK << AMAP_CTRL_TYPE_SHIFT | WIN_SIZE_MASK);
|
||||||
pio_ctrl_val | (1 << PIO_ENABLE_SHIFT), PAB_PEX_PIO_CTRL);
|
value |= type << AMAP_CTRL_TYPE_SHIFT | 1 << AMAP_CTRL_EN_SHIFT |
|
||||||
amap_ctrl_dw = read_paged_register(pcie, PAB_PEX_AMAP_CTRL(win_num));
|
(lower_32_bits(size64) & WIN_SIZE_MASK);
|
||||||
amap_ctrl_dw = (amap_ctrl_dw | (type << AMAP_CTRL_TYPE_SHIFT));
|
csr_writel(pcie, value, PAB_PEX_AMAP_CTRL(win_num));
|
||||||
amap_ctrl_dw = (amap_ctrl_dw | (1 << AMAP_CTRL_EN_SHIFT));
|
|
||||||
|
|
||||||
write_paged_register(pcie, amap_ctrl_dw | lower_32_bits(size64),
|
csr_writel(pcie, upper_32_bits(size64),
|
||||||
PAB_PEX_AMAP_CTRL(win_num));
|
PAB_EXT_PEX_AMAP_SIZEN(win_num));
|
||||||
|
|
||||||
write_paged_register(pcie, upper_32_bits(size64),
|
csr_writel(pcie, pci_addr, PAB_PEX_AMAP_AXI_WIN(win_num));
|
||||||
PAB_EXT_PEX_AMAP_SIZEN(win_num));
|
|
||||||
|
|
||||||
write_paged_register(pcie, pci_addr, PAB_PEX_AMAP_AXI_WIN(win_num));
|
csr_writel(pcie, lower_32_bits(pci_addr),
|
||||||
write_paged_register(pcie, pci_addr, PAB_PEX_AMAP_PEX_WIN_L(win_num));
|
PAB_PEX_AMAP_PEX_WIN_L(win_num));
|
||||||
write_paged_register(pcie, 0, PAB_PEX_AMAP_PEX_WIN_H(win_num));
|
csr_writel(pcie, upper_32_bits(pci_addr),
|
||||||
|
PAB_PEX_AMAP_PEX_WIN_H(win_num));
|
||||||
|
|
||||||
|
pcie->ib_wins_configured++;
|
||||||
}
|
}
|
||||||
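Both window-programming routines derive the size field from `size64 = ~(size - 1)` and write its two 32-bit halves to separate registers. The sketch below reproduces only that arithmetic in plain C; it assumes the window size is a power of two, and the helper names win_size_mask(), lower_32() and upper_32() are hypothetical stand-ins (the driver uses the kernel's lower_32_bits()/upper_32_bits() and additionally ANDs the low half with WIN_SIZE_MASK, whose value is not shown in this excerpt and is not modelled here).

```c
#include <stdint.h>

/* From the diff: 256 GiB inbound window. */
#define IB_WIN_SIZE ((uint64_t)256 * 1024 * 1024 * 1024)

/*
 * For a power-of-two size, ~(size - 1) sets every bit at or above
 * log2(size) and clears the bits below it.
 */
static uint64_t win_size_mask(uint64_t size)
{
	return ~(size - 1);
}

/* Stand-in for the kernel's lower_32_bits(). */
static uint32_t lower_32(uint64_t v)
{
	return (uint32_t)v;
}

/* Stand-in for the kernel's upper_32_bits(). */
static uint32_t upper_32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}
```

For IB_WIN_SIZE (2^38 bytes) the mask is 0xffffffc000000000, so the low half is 0 and the high half, written to the extended size register, is 0xffffffc0. For a 1 MiB window the halves are 0xfff00000 and 0xffffffff.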
|
|
||||||
/*
|
/*
|
||||||
* routine to program the outbound windows
|
* routine to program the outbound windows
|
||||||
*/
|
*/
|
||||||
static void program_ob_windows(struct mobiveil_pcie *pcie, int win_num,
|
static void program_ob_windows(struct mobiveil_pcie *pcie, int win_num,
|
||||||
u64 cpu_addr, u64 pci_addr, u32 config_io_bit, u64 size)
|
u64 cpu_addr, u64 pci_addr, u32 type, u64 size)
|
||||||
{
|
{
|
||||||
|
u32 value;
|
||||||
u32 value, type;
|
|
||||||
u64 size64 = ~(size - 1);
|
u64 size64 = ~(size - 1);
|
||||||
|
|
||||||
if ((pcie->ob_wins_configured + 1) > pcie->apio_wins) {
|
if (win_num >= pcie->apio_wins) {
|
||||||
dev_err(&pcie->pdev->dev,
|
dev_err(&pcie->pdev->dev,
|
||||||
"ERROR: max outbound windows reached !\n");
|
"ERROR: max outbound windows reached !\n");
|
||||||
return;
|
return;
|
||||||
|
@ -432,28 +511,27 @@ static void program_ob_windows(struct mobiveil_pcie *pcie, int win_num,
|
||||||
* program Enable Bit to 1, Type Bit to (00) base 2, AXI Window Size Bit
|
* program Enable Bit to 1, Type Bit to (00) base 2, AXI Window Size Bit
|
||||||
* to 4 KB in PAB_AXI_AMAP_CTRL register
|
* to 4 KB in PAB_AXI_AMAP_CTRL register
|
||||||
*/
|
*/
|
||||||
type = config_io_bit;
|
|
||||||
value = csr_readl(pcie, PAB_AXI_AMAP_CTRL(win_num));
|
value = csr_readl(pcie, PAB_AXI_AMAP_CTRL(win_num));
|
||||||
csr_writel(pcie, 1 << WIN_ENABLE_SHIFT | type << WIN_TYPE_SHIFT |
|
value &= ~(WIN_TYPE_MASK << WIN_TYPE_SHIFT | WIN_SIZE_MASK);
|
||||||
lower_32_bits(size64), PAB_AXI_AMAP_CTRL(win_num));
|
value |= 1 << WIN_ENABLE_SHIFT | type << WIN_TYPE_SHIFT |
|
||||||
|
(lower_32_bits(size64) & WIN_SIZE_MASK);
|
||||||
|
csr_writel(pcie, value, PAB_AXI_AMAP_CTRL(win_num));
|
||||||
|
|
||||||
write_paged_register(pcie, upper_32_bits(size64),
|
csr_writel(pcie, upper_32_bits(size64), PAB_EXT_AXI_AMAP_SIZE(win_num));
|
||||||
PAB_EXT_AXI_AMAP_SIZE(win_num));
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* program AXI window base with appropriate value in
|
* program AXI window base with appropriate value in
|
||||||
* PAB_AXI_AMAP_AXI_WIN0 register
|
* PAB_AXI_AMAP_AXI_WIN0 register
|
||||||
*/
|
*/
|
||||||
value = csr_readl(pcie, PAB_AXI_AMAP_AXI_WIN(win_num));
|
csr_writel(pcie, lower_32_bits(cpu_addr) & (~AXI_WINDOW_ALIGN_MASK),
|
||||||
csr_writel(pcie, cpu_addr & (~AXI_WINDOW_ALIGN_MASK),
|
PAB_AXI_AMAP_AXI_WIN(win_num));
|
||||||
PAB_AXI_AMAP_AXI_WIN(win_num));
|
csr_writel(pcie, upper_32_bits(cpu_addr),
|
||||||
|
PAB_EXT_AXI_AMAP_AXI_WIN(win_num));
|
||||||
value = csr_readl(pcie, PAB_AXI_AMAP_PEX_WIN_H(win_num));
|
|
||||||
|
|
||||||
csr_writel(pcie, lower_32_bits(pci_addr),
|
csr_writel(pcie, lower_32_bits(pci_addr),
|
||||||
PAB_AXI_AMAP_PEX_WIN_L(win_num));
|
PAB_AXI_AMAP_PEX_WIN_L(win_num));
|
||||||
csr_writel(pcie, upper_32_bits(pci_addr),
|
csr_writel(pcie, upper_32_bits(pci_addr),
|
||||||
PAB_AXI_AMAP_PEX_WIN_H(win_num));
|
PAB_AXI_AMAP_PEX_WIN_H(win_num));
|
||||||
|
|
||||||
pcie->ob_wins_configured++;
|
pcie->ob_wins_configured++;
|
||||||
}
|
}
|
||||||
|
@ -469,7 +547,9 @@ static int mobiveil_bringup_link(struct mobiveil_pcie *pcie)
|
||||||
|
|
||||||
usleep_range(LINK_WAIT_MIN, LINK_WAIT_MAX);
|
usleep_range(LINK_WAIT_MIN, LINK_WAIT_MAX);
|
||||||
}
|
}
|
||||||
|
|
||||||
dev_err(&pcie->pdev->dev, "link never came up\n");
|
dev_err(&pcie->pdev->dev, "link never came up\n");
|
||||||
|
|
||||||
return -ETIMEDOUT;
|
return -ETIMEDOUT;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -482,50 +562,55 @@ static void mobiveil_pcie_enable_msi(struct mobiveil_pcie *pcie)
|
||||||
msi->msi_pages_phys = (phys_addr_t)msg_addr;
|
msi->msi_pages_phys = (phys_addr_t)msg_addr;
|
||||||
|
|
||||||
writel_relaxed(lower_32_bits(msg_addr),
|
writel_relaxed(lower_32_bits(msg_addr),
|
||||||
pcie->apb_csr_base + MSI_BASE_LO_OFFSET);
|
pcie->apb_csr_base + MSI_BASE_LO_OFFSET);
|
||||||
writel_relaxed(upper_32_bits(msg_addr),
|
writel_relaxed(upper_32_bits(msg_addr),
|
||||||
pcie->apb_csr_base + MSI_BASE_HI_OFFSET);
|
pcie->apb_csr_base + MSI_BASE_HI_OFFSET);
|
||||||
writel_relaxed(4096, pcie->apb_csr_base + MSI_SIZE_OFFSET);
|
writel_relaxed(4096, pcie->apb_csr_base + MSI_SIZE_OFFSET);
|
||||||
writel_relaxed(1, pcie->apb_csr_base + MSI_ENABLE_OFFSET);
|
writel_relaxed(1, pcie->apb_csr_base + MSI_ENABLE_OFFSET);
|
||||||
}
|
}
|
||||||
|
|
||||||
static int mobiveil_host_init(struct mobiveil_pcie *pcie)
|
static int mobiveil_host_init(struct mobiveil_pcie *pcie)
|
||||||
{
|
{
|
||||||
u32 value, pab_ctrl, type = 0;
|
u32 value, pab_ctrl, type;
|
||||||
int err;
|
struct resource_entry *win;
|
||||||
struct resource_entry *win, *tmp;
|
|
||||||
|
|
||||||
err = mobiveil_bringup_link(pcie);
|
/* setup bus numbers */
|
||||||
if (err) {
|
value = csr_readl(pcie, PCI_PRIMARY_BUS);
|
||||||
dev_info(&pcie->pdev->dev, "link bring-up failed\n");
|
value &= 0xff000000;
|
||||||
return err;
|
value |= 0x00ff0100;
|
||||||
}
|
csr_writel(pcie, value, PCI_PRIMARY_BUS);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* program Bus Master Enable Bit in Command Register in PAB Config
|
* program Bus Master Enable Bit in Command Register in PAB Config
|
||||||
* Space
|
* Space
|
||||||
*/
|
*/
|
||||||
value = csr_readl(pcie, PCI_COMMAND);
|
value = csr_readl(pcie, PCI_COMMAND);
|
||||||
csr_writel(pcie, value | PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
|
value |= PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
|
||||||
PCI_COMMAND_MASTER, PCI_COMMAND);
|
csr_writel(pcie, value, PCI_COMMAND);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* program PIO Enable Bit to 1 (and PEX PIO Enable to 1) in PAB_CTRL
|
* program PIO Enable Bit to 1 (and PEX PIO Enable to 1) in PAB_CTRL
|
||||||
* register
|
* register
|
||||||
*/
|
*/
|
||||||
pab_ctrl = csr_readl(pcie, PAB_CTRL);
|
pab_ctrl = csr_readl(pcie, PAB_CTRL);
|
||||||
csr_writel(pcie, pab_ctrl | (1 << AMBA_PIO_ENABLE_SHIFT) |
|
pab_ctrl |= (1 << AMBA_PIO_ENABLE_SHIFT) | (1 << PEX_PIO_ENABLE_SHIFT);
|
||||||
(1 << PEX_PIO_ENABLE_SHIFT), PAB_CTRL);
|
csr_writel(pcie, pab_ctrl, PAB_CTRL);
|
||||||
|
|
||||||
csr_writel(pcie, (PAB_INTP_INTX_MASK | PAB_INTP_MSI_MASK),
|
csr_writel(pcie, (PAB_INTP_INTX_MASK | PAB_INTP_MSI_MASK),
|
||||||
PAB_INTP_AMBA_MISC_ENB);
|
PAB_INTP_AMBA_MISC_ENB);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* program PIO Enable Bit to 1 and Config Window Enable Bit to 1 in
|
* program PIO Enable Bit to 1 and Config Window Enable Bit to 1 in
|
||||||
* PAB_AXI_PIO_CTRL Register
|
* PAB_AXI_PIO_CTRL Register
|
||||||
*/
|
*/
|
||||||
value = csr_readl(pcie, PAB_AXI_PIO_CTRL);
|
value = csr_readl(pcie, PAB_AXI_PIO_CTRL);
|
||||||
csr_writel(pcie, value | APIO_EN_MASK, PAB_AXI_PIO_CTRL);
|
value |= APIO_EN_MASK;
|
||||||
|
csr_writel(pcie, value, PAB_AXI_PIO_CTRL);
|
||||||
|
|
||||||
|
/* Enable PCIe PIO master */
|
||||||
|
value = csr_readl(pcie, PAB_PEX_PIO_CTRL);
|
||||||
|
value |= 1 << PIO_ENABLE_SHIFT;
|
||||||
|
csr_writel(pcie, value, PAB_PEX_PIO_CTRL);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* we'll program one outbound window for config reads and
|
* we'll program one outbound window for config reads and
|
||||||
|
@ -535,32 +620,38 @@ static int mobiveil_host_init(struct mobiveil_pcie *pcie)
|
||||||
*/
|
*/
|
||||||
|
|
||||||
/* config outbound translation window */
|
/* config outbound translation window */
|
||||||
program_ob_windows(pcie, pcie->ob_wins_configured,
|
program_ob_windows(pcie, WIN_NUM_0, pcie->ob_io_res->start, 0,
|
||||||
pcie->ob_io_res->start, 0, CFG_WINDOW_TYPE,
|
CFG_WINDOW_TYPE, resource_size(pcie->ob_io_res));
|
||||||
resource_size(pcie->ob_io_res));
|
|
||||||
|
|
||||||
/* memory inbound translation window */
|
/* memory inbound translation window */
|
||||||
program_ib_windows(pcie, WIN_NUM_1, 0, MEM_WINDOW_TYPE, IB_WIN_SIZE);
|
program_ib_windows(pcie, WIN_NUM_0, 0, MEM_WINDOW_TYPE, IB_WIN_SIZE);
|
||||||
|
|
||||||
/* Get the I/O and memory ranges from DT */
|
/* Get the I/O and memory ranges from DT */
|
||||||
resource_list_for_each_entry_safe(win, tmp, &pcie->resources) {
|
resource_list_for_each_entry(win, &pcie->resources) {
|
||||||
type = 0;
|
|
||||||
if (resource_type(win->res) == IORESOURCE_MEM)
|
if (resource_type(win->res) == IORESOURCE_MEM)
|
||||||
type = MEM_WINDOW_TYPE;
|
type = MEM_WINDOW_TYPE;
|
||||||
if (resource_type(win->res) == IORESOURCE_IO)
|
else if (resource_type(win->res) == IORESOURCE_IO)
|
||||||
type = IO_WINDOW_TYPE;
|
type = IO_WINDOW_TYPE;
|
||||||
if (type) {
|
else
|
||||||
/* configure outbound translation window */
|
continue;
|
||||||
program_ob_windows(pcie, pcie->ob_wins_configured,
|
|
||||||
win->res->start, 0, type,
|
/* configure outbound translation window */
|
||||||
resource_size(win->res));
|
program_ob_windows(pcie, pcie->ob_wins_configured,
|
||||||
}
|
win->res->start,
|
||||||
|
win->res->start - win->offset,
|
||||||
|
type, resource_size(win->res));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* fixup for PCIe class register */
|
||||||
|
value = csr_readl(pcie, PAB_INTP_AXI_PIO_CLASS);
|
||||||
|
value &= 0xff;
|
||||||
|
value |= (PCI_CLASS_BRIDGE_PCI << 16);
|
||||||
|
csr_writel(pcie, value, PAB_INTP_AXI_PIO_CLASS);
|
||||||
|
|
||||||
/* setup MSI hardware registers */
|
/* setup MSI hardware registers */
|
||||||
mobiveil_pcie_enable_msi(pcie);
|
mobiveil_pcie_enable_msi(pcie);
|
||||||
|
|
||||||
return err;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
static void mobiveil_mask_intx_irq(struct irq_data *data)
|
static void mobiveil_mask_intx_irq(struct irq_data *data)
|
||||||
|
@ -574,7 +665,8 @@ static void mobiveil_mask_intx_irq(struct irq_data *data)
|
||||||
mask = 1 << ((data->hwirq + PAB_INTX_START) - 1);
|
mask = 1 << ((data->hwirq + PAB_INTX_START) - 1);
|
||||||
raw_spin_lock_irqsave(&pcie->intx_mask_lock, flags);
|
raw_spin_lock_irqsave(&pcie->intx_mask_lock, flags);
|
||||||
shifted_val = csr_readl(pcie, PAB_INTP_AMBA_MISC_ENB);
|
shifted_val = csr_readl(pcie, PAB_INTP_AMBA_MISC_ENB);
|
||||||
csr_writel(pcie, (shifted_val & (~mask)), PAB_INTP_AMBA_MISC_ENB);
|
shifted_val &= ~mask;
|
||||||
|
csr_writel(pcie, shifted_val, PAB_INTP_AMBA_MISC_ENB);
|
||||||
raw_spin_unlock_irqrestore(&pcie->intx_mask_lock, flags);
|
raw_spin_unlock_irqrestore(&pcie->intx_mask_lock, flags);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -589,7 +681,8 @@ static void mobiveil_unmask_intx_irq(struct irq_data *data)
|
||||||
mask = 1 << ((data->hwirq + PAB_INTX_START) - 1);
|
mask = 1 << ((data->hwirq + PAB_INTX_START) - 1);
|
||||||
raw_spin_lock_irqsave(&pcie->intx_mask_lock, flags);
|
raw_spin_lock_irqsave(&pcie->intx_mask_lock, flags);
|
||||||
shifted_val = csr_readl(pcie, PAB_INTP_AMBA_MISC_ENB);
|
shifted_val = csr_readl(pcie, PAB_INTP_AMBA_MISC_ENB);
|
||||||
csr_writel(pcie, (shifted_val | mask), PAB_INTP_AMBA_MISC_ENB);
|
shifted_val |= mask;
|
||||||
|
csr_writel(pcie, shifted_val, PAB_INTP_AMBA_MISC_ENB);
|
||||||
raw_spin_unlock_irqrestore(&pcie->intx_mask_lock, flags);
|
raw_spin_unlock_irqrestore(&pcie->intx_mask_lock, flags);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -603,10 +696,11 @@ static struct irq_chip intx_irq_chip = {
|
||||||
|
|
||||||
/* routine to setup the INTx related data */
|
/* routine to setup the INTx related data */
|
||||||
static int mobiveil_pcie_intx_map(struct irq_domain *domain, unsigned int irq,
|
static int mobiveil_pcie_intx_map(struct irq_domain *domain, unsigned int irq,
|
||||||
irq_hw_number_t hwirq)
|
irq_hw_number_t hwirq)
|
||||||
{
|
{
|
||||||
irq_set_chip_and_handler(irq, &intx_irq_chip, handle_level_irq);
|
irq_set_chip_and_handler(irq, &intx_irq_chip, handle_level_irq);
|
||||||
irq_set_chip_data(irq, domain->host_data);
|
irq_set_chip_data(irq, domain->host_data);
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -623,7 +717,7 @@ static struct irq_chip mobiveil_msi_irq_chip = {
|
||||||
|
|
||||||
static struct msi_domain_info mobiveil_msi_domain_info = {
|
static struct msi_domain_info mobiveil_msi_domain_info = {
|
||||||
.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
|
.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
|
||||||
MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX),
|
MSI_FLAG_PCI_MSIX),
|
||||||
.chip = &mobiveil_msi_irq_chip,
|
.chip = &mobiveil_msi_irq_chip,
|
||||||
};
|
};
|
||||||
|
|
||||||
|
@ -641,7 +735,7 @@ static void mobiveil_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
|
||||||
}
|
}
|
||||||
|
|
||||||
static int mobiveil_msi_set_affinity(struct irq_data *irq_data,
|
static int mobiveil_msi_set_affinity(struct irq_data *irq_data,
|
||||||
const struct cpumask *mask, bool force)
|
const struct cpumask *mask, bool force)
|
||||||
{
|
{
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
}
|
}
|
||||||
|
@ -653,7 +747,8 @@ static struct irq_chip mobiveil_msi_bottom_irq_chip = {
|
||||||
};
|
};
|
||||||
|
|
||||||
static int mobiveil_irq_msi_domain_alloc(struct irq_domain *domain,
|
static int mobiveil_irq_msi_domain_alloc(struct irq_domain *domain,
|
||||||
unsigned int virq, unsigned int nr_irqs, void *args)
|
unsigned int virq,
|
||||||
|
unsigned int nr_irqs, void *args)
|
||||||
{
|
{
|
||||||
struct mobiveil_pcie *pcie = domain->host_data;
|
struct mobiveil_pcie *pcie = domain->host_data;
|
||||||
struct mobiveil_msi *msi = &pcie->msi;
|
struct mobiveil_msi *msi = &pcie->msi;
|
||||||
|
@ -673,13 +768,13 @@ static int mobiveil_irq_msi_domain_alloc(struct irq_domain *domain,
|
||||||
mutex_unlock(&msi->lock);
|
mutex_unlock(&msi->lock);
|
||||||
|
|
||||||
irq_domain_set_info(domain, virq, bit, &mobiveil_msi_bottom_irq_chip,
|
irq_domain_set_info(domain, virq, bit, &mobiveil_msi_bottom_irq_chip,
|
||||||
domain->host_data, handle_level_irq,
|
domain->host_data, handle_level_irq, NULL, NULL);
|
||||||
NULL, NULL);
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
static void mobiveil_irq_msi_domain_free(struct irq_domain *domain,
|
static void mobiveil_irq_msi_domain_free(struct irq_domain *domain,
|
||||||
unsigned int virq, unsigned int nr_irqs)
|
unsigned int virq,
|
||||||
|
unsigned int nr_irqs)
|
||||||
{
|
{
|
||||||
struct irq_data *d = irq_domain_get_irq_data(domain, virq);
|
struct irq_data *d = irq_domain_get_irq_data(domain, virq);
|
||||||
struct mobiveil_pcie *pcie = irq_data_get_irq_chip_data(d);
|
struct mobiveil_pcie *pcie = irq_data_get_irq_chip_data(d);
|
||||||
|
@ -687,12 +782,11 @@ static void mobiveil_irq_msi_domain_free(struct irq_domain *domain,
|
||||||
|
|
||||||
mutex_lock(&msi->lock);
|
mutex_lock(&msi->lock);
|
||||||
|
|
||||||
if (!test_bit(d->hwirq, msi->msi_irq_in_use)) {
|
if (!test_bit(d->hwirq, msi->msi_irq_in_use))
|
||||||
dev_err(&pcie->pdev->dev, "trying to free unused MSI#%lu\n",
|
dev_err(&pcie->pdev->dev, "trying to free unused MSI#%lu\n",
|
||||||
d->hwirq);
|
d->hwirq);
|
||||||
} else {
|
else
|
||||||
__clear_bit(d->hwirq, msi->msi_irq_in_use);
|
__clear_bit(d->hwirq, msi->msi_irq_in_use);
|
||||||
}
|
|
||||||
|
|
||||||
mutex_unlock(&msi->lock);
|
mutex_unlock(&msi->lock);
|
||||||
}
|
}
|
||||||
|
@ -716,12 +810,14 @@ static int mobiveil_allocate_msi_domains(struct mobiveil_pcie *pcie)
|
||||||
}
|
}
|
||||||
|
|
||||||
msi->msi_domain = pci_msi_create_irq_domain(fwnode,
|
msi->msi_domain = pci_msi_create_irq_domain(fwnode,
|
||||||
&mobiveil_msi_domain_info, msi->dev_domain);
|
&mobiveil_msi_domain_info,
|
||||||
|
msi->dev_domain);
|
||||||
if (!msi->msi_domain) {
|
if (!msi->msi_domain) {
|
||||||
dev_err(dev, "failed to create MSI domain\n");
|
dev_err(dev, "failed to create MSI domain\n");
|
||||||
irq_domain_remove(msi->dev_domain);
|
irq_domain_remove(msi->dev_domain);
|
||||||
return -ENOMEM;
|
return -ENOMEM;
|
||||||
}
|
}
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -732,12 +828,12 @@ static int mobiveil_pcie_init_irq_domain(struct mobiveil_pcie *pcie)
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
/* setup INTx */
|
/* setup INTx */
|
||||||
pcie->intx_domain = irq_domain_add_linear(node,
|
pcie->intx_domain = irq_domain_add_linear(node, PCI_NUM_INTX,
|
||||||
PCI_NUM_INTX, &intx_domain_ops, pcie);
|
&intx_domain_ops, pcie);
|
||||||
|
|
||||||
if (!pcie->intx_domain) {
|
if (!pcie->intx_domain) {
|
||||||
dev_err(dev, "Failed to get a INTx IRQ domain\n");
|
dev_err(dev, "Failed to get a INTx IRQ domain\n");
|
||||||
return -ENODEV;
|
return -ENOMEM;
|
||||||
}
|
}
|
||||||
|
|
||||||
raw_spin_lock_init(&pcie->intx_mask_lock);
|
raw_spin_lock_init(&pcie->intx_mask_lock);
|
||||||
|
@ -763,11 +859,9 @@ static int mobiveil_pcie_probe(struct platform_device *pdev)
|
||||||
/* allocate the PCIe port */
|
/* allocate the PCIe port */
|
||||||
bridge = devm_pci_alloc_host_bridge(dev, sizeof(*pcie));
|
bridge = devm_pci_alloc_host_bridge(dev, sizeof(*pcie));
|
||||||
if (!bridge)
|
if (!bridge)
|
||||||
return -ENODEV;
|
return -ENOMEM;
|
||||||
|
|
||||||
pcie = pci_host_bridge_priv(bridge);
|
pcie = pci_host_bridge_priv(bridge);
|
||||||
if (!pcie)
|
|
||||||
return -ENOMEM;
|
|
||||||
|
|
||||||
pcie->pdev = pdev;
|
pcie->pdev = pdev;
|
||||||
|
|
||||||
|
@ -784,7 +878,7 @@ static int mobiveil_pcie_probe(struct platform_device *pdev)
|
||||||
&pcie->resources, &iobase);
|
&pcie->resources, &iobase);
|
||||||
if (ret) {
|
if (ret) {
|
||||||
dev_err(dev, "Getting bridge resources failed\n");
|
dev_err(dev, "Getting bridge resources failed\n");
|
||||||
return -ENOMEM;
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -797,9 +891,6 @@ static int mobiveil_pcie_probe(struct platform_device *pdev)
|
||||||
goto error;
|
goto error;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* fixup for PCIe class register */
|
|
||||||
csr_writel(pcie, 0x060402ab, PAB_INTP_AXI_PIO_CLASS);
|
|
||||||
|
|
||||||
/* initialize the IRQ domains */
|
/* initialize the IRQ domains */
|
||||||
ret = mobiveil_pcie_init_irq_domain(pcie);
|
ret = mobiveil_pcie_init_irq_domain(pcie);
|
||||||
if (ret) {
|
if (ret) {
|
||||||
|
@ -807,6 +898,8 @@ static int mobiveil_pcie_probe(struct platform_device *pdev)
|
||||||
goto error;
|
goto error;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
irq_set_chained_handler_and_data(pcie->irq, mobiveil_pcie_isr, pcie);
|
||||||
|
|
||||||
ret = devm_request_pci_bus_resources(dev, &pcie->resources);
|
ret = devm_request_pci_bus_resources(dev, &pcie->resources);
|
||||||
if (ret)
|
if (ret)
|
||||||
goto error;
|
goto error;
|
||||||
|
@ -820,6 +913,12 @@ static int mobiveil_pcie_probe(struct platform_device *pdev)
|
||||||
bridge->map_irq = of_irq_parse_and_map_pci;
|
bridge->map_irq = of_irq_parse_and_map_pci;
|
||||||
bridge->swizzle_irq = pci_common_swizzle;
|
bridge->swizzle_irq = pci_common_swizzle;
|
||||||
|
|
||||||
|
ret = mobiveil_bringup_link(pcie);
|
||||||
|
if (ret) {
|
||||||
|
dev_info(dev, "link bring-up failed\n");
|
||||||
|
goto error;
|
||||||
|
}
|
||||||
|
|
||||||
/* setup the kernel resources for the newly added PCIe root bus */
|
/* setup the kernel resources for the newly added PCIe root bus */
|
||||||
ret = pci_scan_root_bus_bridge(bridge);
|
ret = pci_scan_root_bus_bridge(bridge);
|
||||||
if (ret)
|
if (ret)
|
||||||
|
@ -848,10 +947,10 @@ MODULE_DEVICE_TABLE(of, mobiveil_pcie_of_match);
|
||||||
static struct platform_driver mobiveil_pcie_driver = {
|
static struct platform_driver mobiveil_pcie_driver = {
|
||||||
.probe = mobiveil_pcie_probe,
|
.probe = mobiveil_pcie_probe,
|
||||||
.driver = {
|
.driver = {
|
||||||
.name = "mobiveil-pcie",
|
.name = "mobiveil-pcie",
|
||||||
.of_match_table = mobiveil_pcie_of_match,
|
.of_match_table = mobiveil_pcie_of_match,
|
||||||
.suppress_bind_attrs = true,
|
.suppress_bind_attrs = true,
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
builtin_platform_driver(mobiveil_pcie_driver);
|
builtin_platform_driver(mobiveil_pcie_driver);
|
||||||
|
|
|
@ -482,15 +482,13 @@ static int nwl_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
|
||||||
int i;
|
int i;
|
||||||
|
|
||||||
mutex_lock(&msi->lock);
|
mutex_lock(&msi->lock);
|
||||||
bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
|
bit = bitmap_find_free_region(msi->bitmap, INT_PCI_MSI_NR,
|
||||||
nr_irqs, 0);
|
get_count_order(nr_irqs));
|
||||||
if (bit >= INT_PCI_MSI_NR) {
|
if (bit < 0) {
|
||||||
mutex_unlock(&msi->lock);
|
mutex_unlock(&msi->lock);
|
||||||
return -ENOSPC;
|
return -ENOSPC;
|
||||||
}
|
}
|
||||||
|
|
||||||
bitmap_set(msi->bitmap, bit, nr_irqs);
|
|
||||||
|
|
||||||
for (i = 0; i < nr_irqs; i++) {
|
for (i = 0; i < nr_irqs; i++) {
|
||||||
irq_domain_set_info(domain, virq + i, bit + i, &nwl_irq_chip,
|
irq_domain_set_info(domain, virq + i, bit + i, &nwl_irq_chip,
|
||||||
domain->host_data, handle_simple_irq,
|
domain->host_data, handle_simple_irq,
|
||||||
|
@ -508,7 +506,8 @@ static void nwl_irq_domain_free(struct irq_domain *domain, unsigned int virq,
|
||||||
struct nwl_msi *msi = &pcie->msi;
|
struct nwl_msi *msi = &pcie->msi;
|
||||||
|
|
||||||
mutex_lock(&msi->lock);
|
mutex_lock(&msi->lock);
|
||||||
bitmap_clear(msi->bitmap, data->hwirq, nr_irqs);
|
bitmap_release_region(msi->bitmap, data->hwirq,
|
||||||
|
get_count_order(nr_irqs));
|
||||||
mutex_unlock(&msi->lock);
|
mutex_unlock(&msi->lock);
|
||||||
}
|
}
|
||||||
|
|
||||||
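Reviewer note on the nwl hunks above: the allocator moves from bitmap_find_next_zero_area() with no alignment mask to bitmap_find_free_region() with get_count_order(nr_irqs), so each Multi-MSI block is a power-of-two size and naturally aligned. That matches how Multi-MSI works, since the function varies the low-order bits of the message data to select a vector. The following is a minimal user-space sketch of that allocation policy, not the kernel implementation. The names count_order, region_mask, find_free_region, release_region and the 64-entry pool are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VECTORS 64u	/* hypothetical pool size; must not exceed 64 here */

/* Smallest order such that (1 << order) >= count, for 1 <= count <= NR_VECTORS. */
static int count_order(unsigned int count)
{
	int order = 0;

	assert(count >= 1 && count <= NR_VECTORS);
	while ((1u << order) < count)
		order++;
	return order;
}

/* Mask with the low (1 << order) bits set. */
static uint64_t region_mask(int order)
{
	unsigned int len = 1u << order;

	return len >= 64 ? ~0ULL : (1ULL << len) - 1;
}

/*
 * Claim a free, naturally aligned block of (1 << order) vectors.
 * Returns the first vector number, or -1 if no aligned block is free.
 */
static int find_free_region(uint64_t *bitmap, int order)
{
	unsigned int len = 1u << order;

	/* Only positions that are multiples of the block size are candidates. */
	for (unsigned int pos = 0; pos + len <= NR_VECTORS; pos += len) {
		uint64_t mask = region_mask(order) << pos;

		if ((*bitmap & mask) == 0) {
			*bitmap |= mask;
			return (int)pos;
		}
	}
	return -1;
}

/* Release a block previously returned by find_free_region(). */
static void release_region(uint64_t *bitmap, unsigned int pos, int order)
{
	assert(pos % (1u << order) == 0);
	*bitmap &= ~(region_mask(order) << pos);
}
```

In this sketch a request for three vectors is rounded up to a block of four, and the free path has to use the same order as the allocation, which is why both hunks pass get_count_order(nr_irqs).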
|
|
|
@ -627,7 +627,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
|
||||||
* 32-bit resources. __pci_assign_resource() enforces that
|
* 32-bit resources. __pci_assign_resource() enforces that
|
||||||
* artificial restriction to make sure everything will fit.
|
* artificial restriction to make sure everything will fit.
|
||||||
*
|
*
|
||||||
* The only way we could use a 64-bit non-prefechable MEMBAR is
|
* The only way we could use a 64-bit non-prefetchable MEMBAR is
|
||||||
* if its address is <4GB so that we can convert it to a 32-bit
|
* if its address is <4GB so that we can convert it to a 32-bit
|
||||||
* resource. To be visible to the host OS, all VMD endpoints must
|
* resource. To be visible to the host OS, all VMD endpoints must
|
||||||
* be initially configured by platform BIOS, which includes setting
|
* be initially configured by platform BIOS, which includes setting
|
||||||
|
|
|
@ -381,15 +381,15 @@ static void pci_epf_test_unbind(struct pci_epf *epf)
|
||||||
epf_bar = &epf->bar[bar];
|
epf_bar = &epf->bar[bar];
|
||||||
|
|
||||||
if (epf_test->reg[bar]) {
|
if (epf_test->reg[bar]) {
|
||||||
pci_epf_free_space(epf, epf_test->reg[bar], bar);
|
|
||||||
pci_epc_clear_bar(epc, epf->func_no, epf_bar);
|
pci_epc_clear_bar(epc, epf->func_no, epf_bar);
|
||||||
|
pci_epf_free_space(epf, epf_test->reg[bar], bar);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
static int pci_epf_test_set_bar(struct pci_epf *epf)
|
static int pci_epf_test_set_bar(struct pci_epf *epf)
|
||||||
{
|
{
|
||||||
int bar;
|
int bar, add;
|
||||||
int ret;
|
int ret;
|
||||||
struct pci_epf_bar *epf_bar;
|
struct pci_epf_bar *epf_bar;
|
||||||
struct pci_epc *epc = epf->epc;
|
struct pci_epc *epc = epf->epc;
|
||||||
|
@ -400,8 +400,14 @@ static int pci_epf_test_set_bar(struct pci_epf *epf)
|
||||||
|
|
||||||
epc_features = epf_test->epc_features;
|
epc_features = epf_test->epc_features;
|
||||||
|
|
||||||
for (bar = BAR_0; bar <= BAR_5; bar++) {
|
for (bar = BAR_0; bar <= BAR_5; bar += add) {
|
||||||
epf_bar = &epf->bar[bar];
|
epf_bar = &epf->bar[bar];
|
||||||
|
/*
|
||||||
|
* pci_epc_set_bar() sets PCI_BASE_ADDRESS_MEM_TYPE_64
|
||||||
|
* if the specific implementation required a 64-bit BAR,
|
||||||
|
* even if we only requested a 32-bit BAR.
|
||||||
|
*/
|
||||||
|
add = (epf_bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_64) ? 2 : 1;
|
||||||
|
|
||||||
if (!!(epc_features->reserved_bar & (1 << bar)))
|
if (!!(epc_features->reserved_bar & (1 << bar)))
|
||||||
continue;
|
continue;
|
||||||
|
@ -413,13 +419,6 @@ static int pci_epf_test_set_bar(struct pci_epf *epf)
|
||||||
if (bar == test_reg_bar)
|
if (bar == test_reg_bar)
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
/*
|
|
||||||
* pci_epc_set_bar() sets PCI_BASE_ADDRESS_MEM_TYPE_64
|
|
||||||
* if the specific implementation required a 64-bit BAR,
|
|
||||||
* even if we only requested a 32-bit BAR.
|
|
||||||
*/
|
|
||||||
if (epf_bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
|
|
||||||
bar++;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
@ -431,13 +430,19 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf)
|
||||||
struct device *dev = &epf->dev;
|
struct device *dev = &epf->dev;
|
||||||
struct pci_epf_bar *epf_bar;
|
struct pci_epf_bar *epf_bar;
|
||||||
void *base;
|
void *base;
|
||||||
int bar;
|
int bar, add;
|
||||||
enum pci_barno test_reg_bar = epf_test->test_reg_bar;
|
enum pci_barno test_reg_bar = epf_test->test_reg_bar;
|
||||||
const struct pci_epc_features *epc_features;
|
const struct pci_epc_features *epc_features;
|
||||||
|
size_t test_reg_size;
|
||||||
|
|
||||||
epc_features = epf_test->epc_features;
|
epc_features = epf_test->epc_features;
|
||||||
|
|
||||||
base = pci_epf_alloc_space(epf, sizeof(struct pci_epf_test_reg),
|
if (epc_features->bar_fixed_size[test_reg_bar])
|
||||||
|
test_reg_size = bar_size[test_reg_bar];
|
||||||
|
else
|
||||||
|
test_reg_size = sizeof(struct pci_epf_test_reg);
|
||||||
|
|
||||||
|
base = pci_epf_alloc_space(epf, test_reg_size,
|
||||||
test_reg_bar, epc_features->align);
|
test_reg_bar, epc_features->align);
|
||||||
if (!base) {
|
if (!base) {
|
||||||
dev_err(dev, "Failed to allocated register space\n");
|
dev_err(dev, "Failed to allocated register space\n");
|
||||||
|
@ -445,8 +450,10 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf)
|
||||||
}
|
}
|
||||||
epf_test->reg[test_reg_bar] = base;
|
epf_test->reg[test_reg_bar] = base;
|
||||||
|
|
||||||
for (bar = BAR_0; bar <= BAR_5; bar++) {
|
for (bar = BAR_0; bar <= BAR_5; bar += add) {
|
||||||
epf_bar = &epf->bar[bar];
|
epf_bar = &epf->bar[bar];
|
||||||
|
add = (epf_bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_64) ? 2 : 1;
|
||||||
|
|
||||||
if (bar == test_reg_bar)
|
if (bar == test_reg_bar)
|
||||||
continue;
|
continue;
|
||||||
|
|
||||||
|
@ -459,8 +466,6 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf)
|
||||||
dev_err(dev, "Failed to allocate space for BAR%d\n",
|
dev_err(dev, "Failed to allocate space for BAR%d\n",
|
||||||
bar);
|
bar);
|
||||||
epf_test->reg[bar] = base;
|
epf_test->reg[bar] = base;
|
||||||
if (epf_bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
|
|
||||||
bar++;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
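Reviewer note on the pci-epf-test hunks above: the loop stride is now computed at the top of each iteration (bar += add) instead of by a trailing bar++. In the old form, a `continue` taken for a reserved BAR or for the test register BAR would skip the trailing increment, so the upper half of a 64-bit BAR could be visited as if it were a BAR of its own. The sketch below illustrates the new loop shape under simplified assumptions. struct bar_desc and walk_bars are hypothetical stand-ins, and MEM_TYPE_64 mirrors the value of PCI_BASE_ADDRESS_MEM_TYPE_64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BARS	6
#define MEM_TYPE_64	0x04u	/* same value as PCI_BASE_ADDRESS_MEM_TYPE_64 */

/* Hypothetical stand-in for the fields of struct pci_epf_bar used here. */
struct bar_desc {
	unsigned int flags;
	bool reserved;
};

/*
 * Walk the BARs the way the new loops do and record every index that
 * gets processed. Returns the number of indices written to visited[].
 */
static int walk_bars(const struct bar_desc bars[NUM_BARS], int visited[NUM_BARS])
{
	int n = 0;
	int add;

	assert(bars != NULL && visited != NULL);

	for (int bar = 0; bar < NUM_BARS; bar += add) {
		/* Decide the stride first so that `continue` cannot bypass it. */
		add = (bars[bar].flags & MEM_TYPE_64) ? 2 : 1;

		if (bars[bar].reserved)
			continue;	/* the upper half of a 64-bit BAR is still skipped */

		visited[n++] = bar;
	}
	return n;
}
```

With BAR0 marked both 64-bit and reserved, this walk goes straight from BAR0 to BAR2 and never treats BAR1 as an independent BAR.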
|
|
|
@ -519,11 +519,12 @@ void pci_epc_remove_epf(struct pci_epc *epc, struct pci_epf *epf)
|
||||||
{
|
{
|
||||||
unsigned long flags;
|
unsigned long flags;
|
||||||
|
|
||||||
if (!epc || IS_ERR(epc))
|
if (!epc || IS_ERR(epc) || !epf)
|
||||||
return;
|
return;
|
||||||
|
|
||||||
spin_lock_irqsave(&epc->lock, flags);
|
spin_lock_irqsave(&epc->lock, flags);
|
||||||
list_del(&epf->list);
|
list_del(&epf->list);
|
||||||
|
epf->epc = NULL;
|
||||||
spin_unlock_irqrestore(&epc->lock, flags);
|
spin_unlock_irqrestore(&epc->lock, flags);
|
||||||
}
|
}
|
||||||
EXPORT_SYMBOL_GPL(pci_epc_remove_epf);
|
EXPORT_SYMBOL_GPL(pci_epc_remove_epf);
|
||||||
|
|
|
@ -132,8 +132,6 @@ static void pci_read_vf_config_common(struct pci_dev *virtfn)
|
||||||
&physfn->sriov->subsystem_vendor);
|
&physfn->sriov->subsystem_vendor);
|
||||||
pci_read_config_word(virtfn, PCI_SUBSYSTEM_ID,
|
pci_read_config_word(virtfn, PCI_SUBSYSTEM_ID,
|
||||||
&physfn->sriov->subsystem_device);
|
&physfn->sriov->subsystem_device);
|
||||||
|
|
||||||
physfn->sriov->cfg_size = pci_cfg_space_size(virtfn);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
int pci_iov_add_virtfn(struct pci_dev *dev, int id)
|
int pci_iov_add_virtfn(struct pci_dev *dev, int id)
|
||||||
|
|
|
@ -73,7 +73,7 @@ int pci_mmap_resource_range(struct pci_dev *pdev, int bar,
|
||||||
#elif defined(HAVE_PCI_MMAP) /* && !ARCH_GENERIC_PCI_MMAP_RESOURCE */
|
#elif defined(HAVE_PCI_MMAP) /* && !ARCH_GENERIC_PCI_MMAP_RESOURCE */
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Legacy setup: Impement pci_mmap_resource_range() as a wrapper around
|
* Legacy setup: Implement pci_mmap_resource_range() as a wrapper around
|
||||||
* the architecture's pci_mmap_page_range(), converting to "user visible"
|
* the architecture's pci_mmap_page_range(), converting to "user visible"
|
||||||
* addresses as necessary.
|
* addresses as necessary.
|
||||||
*/
|
*/
|
||||||
|
|
|
@ -237,7 +237,7 @@ static void msi_set_mask_bit(struct irq_data *data, u32 flag)
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* pci_msi_mask_irq - Generic irq chip callback to mask PCI/MSI interrupts
|
* pci_msi_mask_irq - Generic IRQ chip callback to mask PCI/MSI interrupts
|
||||||
* @data: pointer to irqdata associated to that interrupt
|
* @data: pointer to irqdata associated to that interrupt
|
||||||
*/
|
*/
|
||||||
void pci_msi_mask_irq(struct irq_data *data)
|
void pci_msi_mask_irq(struct irq_data *data)
|
||||||
|
@ -247,7 +247,7 @@ void pci_msi_mask_irq(struct irq_data *data)
|
||||||
EXPORT_SYMBOL_GPL(pci_msi_mask_irq);
|
EXPORT_SYMBOL_GPL(pci_msi_mask_irq);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* pci_msi_unmask_irq - Generic irq chip callback to unmask PCI/MSI interrupts
|
* pci_msi_unmask_irq - Generic IRQ chip callback to unmask PCI/MSI interrupts
|
||||||
* @data: pointer to irqdata associated to that interrupt
|
* @data: pointer to irqdata associated to that interrupt
|
||||||
*/
|
*/
|
||||||
void pci_msi_unmask_irq(struct irq_data *data)
|
void pci_msi_unmask_irq(struct irq_data *data)
|
||||||
|
@ -588,11 +588,11 @@ static int msi_verify_entries(struct pci_dev *dev)
|
||||||
* msi_capability_init - configure device's MSI capability structure
|
* msi_capability_init - configure device's MSI capability structure
|
||||||
* @dev: pointer to the pci_dev data structure of MSI device function
|
* @dev: pointer to the pci_dev data structure of MSI device function
|
||||||
* @nvec: number of interrupts to allocate
|
* @nvec: number of interrupts to allocate
|
||||||
* @affd: description of automatic irq affinity assignments (may be %NULL)
|
* @affd: description of automatic IRQ affinity assignments (may be %NULL)
|
||||||
*
|
*
|
||||||
* Setup the MSI capability structure of the device with the requested
|
* Setup the MSI capability structure of the device with the requested
|
||||||
* number of interrupts. A return value of zero indicates the successful
|
* number of interrupts. A return value of zero indicates the successful
|
||||||
* setup of an entry with the new MSI irq. A negative return value indicates
|
* setup of an entry with the new MSI IRQ. A negative return value indicates
|
||||||
* an error, and a positive return value indicates the number of interrupts
|
* an error, and a positive return value indicates the number of interrupts
|
||||||
* which could have been allocated.
|
* which could have been allocated.
|
||||||
*/
|
*/
|
||||||
|
@ -609,7 +609,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
|
||||||
if (!entry)
|
if (!entry)
|
||||||
return -ENOMEM;
|
return -ENOMEM;
|
||||||
|
|
||||||
/* All MSIs are unmasked by default, Mask them all */
|
/* All MSIs are unmasked by default; mask them all */
|
||||||
mask = msi_mask(entry->msi_attrib.multi_cap);
|
mask = msi_mask(entry->msi_attrib.multi_cap);
|
||||||
msi_mask_irq(entry, mask, mask);
|
msi_mask_irq(entry, mask, mask);
|
||||||
|
|
||||||
|
@ -637,7 +637,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Set MSI enabled bits */
|
/* Set MSI enabled bits */
|
||||||
pci_intx_for_msi(dev, 0);
|
pci_intx_for_msi(dev, 0);
|
||||||
pci_msi_set_enable(dev, 1);
|
pci_msi_set_enable(dev, 1);
|
||||||
dev->msi_enabled = 1;
|
dev->msi_enabled = 1;
|
||||||
|
@ -729,11 +729,11 @@ static void msix_program_entries(struct pci_dev *dev,
|
||||||
* @dev: pointer to the pci_dev data structure of MSI-X device function
|
* @dev: pointer to the pci_dev data structure of MSI-X device function
|
||||||
* @entries: pointer to an array of struct msix_entry entries
|
* @entries: pointer to an array of struct msix_entry entries
|
||||||
* @nvec: number of @entries
|
* @nvec: number of @entries
|
||||||
* @affd: Optional pointer to enable automatic affinity assignement
|
* @affd: Optional pointer to enable automatic affinity assignment
|
||||||
*
|
*
|
||||||
* Setup the MSI-X capability structure of device function with a
|
* Setup the MSI-X capability structure of device function with a
|
||||||
* single MSI-X irq. A return of zero indicates the successful setup of
|
* single MSI-X IRQ. A return of zero indicates the successful setup of
|
||||||
* requested MSI-X entries with allocated irqs or non-zero for otherwise.
|
* requested MSI-X entries with allocated IRQs or non-zero for otherwise.
|
||||||
**/
|
**/
|
||||||
static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
|
static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
|
||||||
int nvec, struct irq_affinity *affd)
|
int nvec, struct irq_affinity *affd)
|
||||||
|
@ -789,7 +789,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
|
||||||
out_avail:
|
out_avail:
|
||||||
if (ret < 0) {
|
if (ret < 0) {
|
||||||
/*
|
/*
|
||||||
* If we had some success, report the number of irqs
|
* If we had some success, report the number of IRQs
|
||||||
* we succeeded in setting up.
|
* we succeeded in setting up.
|
||||||
*/
|
*/
|
||||||
struct msi_desc *entry;
|
struct msi_desc *entry;
|
||||||
|
@ -812,7 +812,7 @@ out_free:
|
||||||
/**
|
/**
|
||||||
* pci_msi_supported - check whether MSI may be enabled on a device
|
* pci_msi_supported - check whether MSI may be enabled on a device
|
||||||
* @dev: pointer to the pci_dev data structure of MSI device function
|
* @dev: pointer to the pci_dev data structure of MSI device function
|
||||||
* @nvec: how many MSIs have been requested ?
|
* @nvec: how many MSIs have been requested?
|
||||||
*
|
*
|
||||||
* Look at global flags, the device itself, and its parent buses
|
* Look at global flags, the device itself, and its parent buses
|
||||||
* to determine if MSI/-X are supported for the device. If MSI/-X is
|
* to determine if MSI/-X are supported for the device. If MSI/-X is
|
||||||
|
@ -896,7 +896,7 @@ static void pci_msi_shutdown(struct pci_dev *dev)
|
||||||
/* Keep cached state to be restored */
|
/* Keep cached state to be restored */
|
||||||
__pci_msi_desc_mask_irq(desc, mask, ~mask);
|
__pci_msi_desc_mask_irq(desc, mask, ~mask);
|
||||||
|
|
||||||
/* Restore dev->irq to its default pin-assertion irq */
|
/* Restore dev->irq to its default pin-assertion IRQ */
|
||||||
dev->irq = desc->msi_attrib.default_irq;
|
dev->irq = desc->msi_attrib.default_irq;
|
||||||
pcibios_alloc_irq(dev);
|
pcibios_alloc_irq(dev);
|
||||||
}
|
}
|
||||||
|
@ -958,7 +958,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Check whether driver already requested for MSI irq */
|
/* Check whether driver already requested for MSI IRQ */
|
||||||
if (dev->msi_enabled) {
|
if (dev->msi_enabled) {
|
||||||
pci_info(dev, "can't enable MSI-X (MSI IRQ already assigned)\n");
|
pci_info(dev, "can't enable MSI-X (MSI IRQ already assigned)\n");
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
@ -1026,7 +1026,7 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
|
||||||
if (!pci_msi_supported(dev, minvec))
|
if (!pci_msi_supported(dev, minvec))
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
|
||||||
/* Check whether driver already requested MSI-X irqs */
|
/* Check whether driver already requested MSI-X IRQs */
|
||||||
if (dev->msix_enabled) {
|
if (dev->msix_enabled) {
|
||||||
pci_info(dev, "can't enable MSI (MSI-X already enabled)\n");
|
pci_info(dev, "can't enable MSI (MSI-X already enabled)\n");
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
@ -1113,8 +1113,8 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
|
||||||
* pci_enable_msix_range - configure device's MSI-X capability structure
|
* pci_enable_msix_range - configure device's MSI-X capability structure
|
||||||
* @dev: pointer to the pci_dev data structure of MSI-X device function
|
* @dev: pointer to the pci_dev data structure of MSI-X device function
|
||||||
* @entries: pointer to an array of MSI-X entries
|
* @entries: pointer to an array of MSI-X entries
|
||||||
* @minvec: minimum number of MSI-X irqs requested
|
* @minvec: minimum number of MSI-X IRQs requested
|
||||||
* @maxvec: maximum number of MSI-X irqs requested
|
* @maxvec: maximum number of MSI-X IRQs requested
|
||||||
*
|
*
|
||||||
* Setup the MSI-X capability structure of device function with a maximum
|
* Setup the MSI-X capability structure of device function with a maximum
|
||||||
* possible number of interrupts in the range between @minvec and @maxvec
|
* possible number of interrupts in the range between @minvec and @maxvec
|
||||||
|
@ -1179,7 +1179,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
|
||||||
return msi_vecs;
|
return msi_vecs;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* use legacy irq if allowed */
|
/* use legacy IRQ if allowed */
|
||||||
if (flags & PCI_IRQ_LEGACY) {
|
if (flags & PCI_IRQ_LEGACY) {
|
||||||
if (min_vecs == 1 && dev->irq) {
|
if (min_vecs == 1 && dev->irq) {
|
||||||
/*
|
/*
|
||||||
|
@ -1248,7 +1248,7 @@ int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
|
||||||
EXPORT_SYMBOL(pci_irq_vector);
|
EXPORT_SYMBOL(pci_irq_vector);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* pci_irq_get_affinity - return the affinity of a particular msi vector
|
* pci_irq_get_affinity - return the affinity of a particular MSI vector
|
||||||
* @dev: PCI device to operate on
|
* @dev: PCI device to operate on
|
||||||
* @nr: device-relative interrupt vector index (0-based).
|
* @nr: device-relative interrupt vector index (0-based).
|
||||||
*/
|
*/
|
||||||
|
@ -1280,7 +1280,7 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
|
||||||
EXPORT_SYMBOL(pci_irq_get_affinity);
|
EXPORT_SYMBOL(pci_irq_get_affinity);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* pci_irq_get_node - return the numa node of a particular msi vector
|
* pci_irq_get_node - return the NUMA node of a particular MSI vector
|
||||||
* @pdev: PCI device to operate on
|
* @pdev: PCI device to operate on
|
||||||
* @vec: device-relative interrupt vector index (0-based).
|
* @vec: device-relative interrupt vector index (0-based).
|
||||||
*/
|
*/
|
||||||
|
@ -1330,7 +1330,7 @@ void pci_msi_domain_write_msg(struct irq_data *irq_data, struct msi_msg *msg)
|
||||||
/**
|
/**
|
||||||
* pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source
|
* pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source
|
||||||
* @dev: Pointer to the PCI device
|
* @dev: Pointer to the PCI device
|
||||||
* @desc: Pointer to the msi descriptor
|
* @desc: Pointer to the MSI descriptor
|
||||||
*
|
*
|
||||||
* The ID number is only used within the irqdomain.
|
* The ID number is only used within the irqdomain.
|
||||||
*/
|
*/
|
||||||
|
@ -1348,7 +1348,8 @@ static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* pci_msi_domain_check_cap - Verify that @domain supports the capabilities for @dev
|
* pci_msi_domain_check_cap - Verify that @domain supports the capabilities
|
||||||
|
* for @dev
|
||||||
* @domain: The interrupt domain to check
|
* @domain: The interrupt domain to check
|
||||||
* @info: The domain info for verification
|
* @info: The domain info for verification
|
||||||
* @dev: The device to check
|
* @dev: The device to check
|
||||||
|
|
|
@ -195,7 +195,7 @@ EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Note this function returns the parent PCI device with a
|
* Note this function returns the parent PCI device with a
|
||||||
* reference taken. It is the caller's responsibily to drop
|
* reference taken. It is the caller's responsibility to drop
|
||||||
* the reference.
|
* the reference.
|
||||||
*/
|
*/
|
||||||
static struct pci_dev *find_parent_pci_dev(struct device *dev)
|
static struct pci_dev *find_parent_pci_dev(struct device *dev)
|
||||||
|
@ -355,7 +355,7 @@ static int upstream_bridge_distance(struct pci_dev *provider,
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Allow the connection if both devices are on a whitelisted root
|
* Allow the connection if both devices are on a whitelisted root
|
||||||
* complex, but add an arbitary large value to the distance.
|
* complex, but add an arbitrary large value to the distance.
|
||||||
*/
|
*/
|
||||||
if (root_complex_whitelist(provider) &&
|
if (root_complex_whitelist(provider) &&
|
||||||
root_complex_whitelist(client))
|
root_complex_whitelist(client))
|
||||||
|
@ -414,7 +414,7 @@ static int upstream_bridge_distance_warn(struct pci_dev *provider,
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* pci_p2pdma_distance_many - Determive the cumulative distance between
|
* pci_p2pdma_distance_many - Determine the cumulative distance between
|
||||||
* a p2pdma provider and the clients in use.
|
* a p2pdma provider and the clients in use.
|
||||||
* @provider: p2pdma provider to check against the client list
|
* @provider: p2pdma provider to check against the client list
|
||||||
* @clients: array of devices to check (NULL-terminated)
|
* @clients: array of devices to check (NULL-terminated)
|
||||||
|
@ -443,6 +443,14 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
|
||||||
return -1;
|
return -1;
|
||||||
|
|
||||||
for (i = 0; i < num_clients; i++) {
|
for (i = 0; i < num_clients; i++) {
|
||||||
|
if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) &&
|
||||||
|
clients[i]->dma_ops == &dma_virt_ops) {
|
||||||
|
if (verbose)
|
||||||
|
dev_warn(clients[i],
|
||||||
|
"cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
pci_client = find_parent_pci_dev(clients[i]);
|
pci_client = find_parent_pci_dev(clients[i]);
|
||||||
if (!pci_client) {
|
if (!pci_client) {
|
||||||
if (verbose)
|
if (verbose)
|
||||||
|
@ -721,7 +729,7 @@ int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
|
||||||
* p2pdma mappings are not compatible with devices that use
|
* p2pdma mappings are not compatible with devices that use
|
||||||
* dma_virt_ops. If the upper layers do the right thing
|
* dma_virt_ops. If the upper layers do the right thing
|
||||||
* this should never happen because it will be prevented
|
* this should never happen because it will be prevented
|
||||||
* by the check in pci_p2pdma_add_client()
|
* by the check in pci_p2pdma_distance_many()
|
||||||
*/
|
*/
|
||||||
if (WARN_ON_ONCE(IS_ENABLED(CONFIG_DMA_VIRT_OPS) &&
|
if (WARN_ON_ONCE(IS_ENABLED(CONFIG_DMA_VIRT_OPS) &&
|
||||||
dev->dma_ops == &dma_virt_ops))
|
dev->dma_ops == &dma_virt_ops))
|
||||||
|
|
|
@ -305,7 +305,7 @@ int pci_bridge_emul_init(struct pci_bridge_emul *bridge,
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Cleanup a pci_bridge_emul structure that was previously initilized
|
* Cleanup a pci_bridge_emul structure that was previously initialized
|
||||||
* using pci_bridge_emul_init().
|
* using pci_bridge_emul_init().
|
||||||
*/
|
*/
|
||||||
void pci_bridge_emul_cleanup(struct pci_bridge_emul *bridge)
|
void pci_bridge_emul_cleanup(struct pci_bridge_emul *bridge)
|
||||||
|
|
|
@ -399,7 +399,8 @@ void __weak pcibios_free_irq(struct pci_dev *dev)
|
||||||
#ifdef CONFIG_PCI_IOV
|
#ifdef CONFIG_PCI_IOV
|
||||||
static inline bool pci_device_can_probe(struct pci_dev *pdev)
|
static inline bool pci_device_can_probe(struct pci_dev *pdev)
|
||||||
{
|
{
|
||||||
return (!pdev->is_virtfn || pdev->physfn->sriov->drivers_autoprobe);
|
return (!pdev->is_virtfn || pdev->physfn->sriov->drivers_autoprobe ||
|
||||||
|
pdev->driver_override);
|
||||||
}
|
}
|
||||||
#else
|
#else
|
||||||
static inline bool pci_device_can_probe(struct pci_dev *pdev)
|
static inline bool pci_device_can_probe(struct pci_dev *pdev)
|
||||||
|
@ -414,6 +415,9 @@ static int pci_device_probe(struct device *dev)
|
||||||
struct pci_dev *pci_dev = to_pci_dev(dev);
|
struct pci_dev *pci_dev = to_pci_dev(dev);
|
||||||
struct pci_driver *drv = to_pci_driver(dev->driver);
|
struct pci_driver *drv = to_pci_driver(dev->driver);
|
||||||
|
|
||||||
|
if (!pci_device_can_probe(pci_dev))
|
||||||
|
return -ENODEV;
|
||||||
|
|
||||||
pci_assign_irq(pci_dev);
|
pci_assign_irq(pci_dev);
|
||||||
|
|
||||||
error = pcibios_alloc_irq(pci_dev);
|
error = pcibios_alloc_irq(pci_dev);
|
||||||
|
@ -421,12 +425,10 @@ static int pci_device_probe(struct device *dev)
|
||||||
return error;
|
return error;
|
||||||
|
|
||||||
pci_dev_get(pci_dev);
|
pci_dev_get(pci_dev);
|
||||||
if (pci_device_can_probe(pci_dev)) {
|
error = __pci_device_probe(drv, pci_dev);
|
||||||
error = __pci_device_probe(drv, pci_dev);
|
if (error) {
|
||||||
if (error) {
|
pcibios_free_irq(pci_dev);
|
||||||
pcibios_free_irq(pci_dev);
|
pci_dev_put(pci_dev);
|
||||||
pci_dev_put(pci_dev);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
return error;
|
return error;
|
||||||
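Reviewer note on the pci-driver hunks above: two things change together. pci_device_can_probe() now also allows a VF to probe when a driver_override is set, and pci_device_probe() performs the check before any IRQ setup and returns -ENODEV when probing is not allowed. The sketch below models only that decision logic. struct fake_dev, can_probe and probe are hypothetical names, and the irq_allocs counter stands in for the IRQ setup calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pci_dev fields the check reads. */
struct fake_dev {
	bool is_virtfn;			/* device is an SR-IOV VF */
	bool pf_drivers_autoprobe;	/* PF's sriov drivers_autoprobe knob */
	const char *driver_override;	/* NULL when no override is set */
};

/* Mirrors the new condition: PFs always, VFs if autoprobe or override. */
static bool can_probe(const struct fake_dev *d)
{
	assert(d != NULL);
	return !d->is_virtfn || d->pf_drivers_autoprobe ||
	       d->driver_override != NULL;
}

/* The check runs first, so a refused probe never touches IRQ setup. */
static int probe(const struct fake_dev *d, int *irq_allocs)
{
	if (!can_probe(d))
		return -ENODEV;

	(*irq_allocs)++;	/* stands in for the IRQ assignment step */
	return 0;
}
```

Under these assumptions a VF whose PF has autoprobe disabled is refused with -ENODEV, while the same VF with an override set proceeds to the normal probe path.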
|
|
|
@ -1,7 +1,7 @@
|
||||||
// SPDX-License-Identifier: GPL-2.0
|
// SPDX-License-Identifier: GPL-2.0
|
||||||
/* pci-pf-stub - simple stub driver for PCI SR-IOV PF device
|
/* pci-pf-stub - simple stub driver for PCI SR-IOV PF device
|
||||||
*
|
*
|
||||||
* This driver is meant to act as a "whitelist" for devices that provde
|
* This driver is meant to act as a "whitelist" for devices that provide
|
||||||
* SR-IOV functionality while at the same time not actually needing a
|
* SR-IOV functionality while at the same time not actually needing a
|
||||||
* driver of their own.
|
* driver of their own.
|
||||||
*/
|
*/
|
||||||
|
|
|
@ -182,6 +182,9 @@ static ssize_t current_link_speed_show(struct device *dev,
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
|
||||||
switch (linkstat & PCI_EXP_LNKSTA_CLS) {
|
switch (linkstat & PCI_EXP_LNKSTA_CLS) {
|
||||||
|
case PCI_EXP_LNKSTA_CLS_32_0GB:
|
||||||
|
speed = "32 GT/s";
|
||||||
|
break;
|
||||||
case PCI_EXP_LNKSTA_CLS_16_0GB:
|
case PCI_EXP_LNKSTA_CLS_16_0GB:
|
||||||
speed = "16 GT/s";
|
speed = "16 GT/s";
|
||||||
break;
|
break;
|
||||||
|
@ -477,7 +480,7 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
|
||||||
pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));
|
pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));
|
||||||
return count;
|
return count;
|
||||||
}
|
}
|
||||||
static struct device_attribute dev_remove_attr = __ATTR(remove,
|
static struct device_attribute dev_remove_attr = __ATTR_IGNORE_LOCKDEP(remove,
|
||||||
(S_IWUSR|S_IWGRP),
|
(S_IWUSR|S_IWGRP),
|
||||||
NULL, remove_store);
|
NULL, remove_store);
|
||||||
|
|
||||||
|
|
|
@ -4535,7 +4535,7 @@ static int pci_af_flr(struct pci_dev *dev, int probe)
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Wait for Transaction Pending bit to clear. A word-aligned test
|
* Wait for Transaction Pending bit to clear. A word-aligned test
|
||||||
* is used, so we use the conrol offset rather than status and shift
|
* is used, so we use the control offset rather than status and shift
|
||||||
* the test bit to match.
|
* the test bit to match.
|
||||||
*/
|
*/
|
||||||
if (!pci_wait_for_pending(dev, pos + PCI_AF_CTRL,
|
if (!pci_wait_for_pending(dev, pos + PCI_AF_CTRL,
|
||||||
|
@ -5669,7 +5669,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
|
||||||
*/
|
*/
|
||||||
pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
|
pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
|
||||||
if (lnkcap2) { /* PCIe r3.0-compliant */
|
if (lnkcap2) { /* PCIe r3.0-compliant */
|
||||||
if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_16_0GB)
|
if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_32_0GB)
|
||||||
|
return PCIE_SPEED_32_0GT;
|
||||||
|
else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_16_0GB)
|
||||||
return PCIE_SPEED_16_0GT;
|
return PCIE_SPEED_16_0GT;
|
||||||
else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
|
else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
|
||||||
return PCIE_SPEED_8_0GT;
|
return PCIE_SPEED_8_0GT;
|
||||||
|
|
|
@@ -298,7 +298,6 @@ struct pci_sriov {
|
||||||
u16 driver_max_VFs; /* Max num VFs driver supports */
|
u16 driver_max_VFs; /* Max num VFs driver supports */
|
||||||
struct pci_dev *dev; /* Lowest numbered PF */
|
struct pci_dev *dev; /* Lowest numbered PF */
|
||||||
struct pci_dev *self; /* This PF */
|
struct pci_dev *self; /* This PF */
|
||||||
u32 cfg_size; /* VF config space size */
|
|
||||||
u32 class; /* VF device */
|
u32 class; /* VF device */
|
||||||
u8 hdr_type; /* VF header type */
|
u8 hdr_type; /* VF header type */
|
||||||
u16 subsystem_vendor; /* VF subsystem vendor */
|
u16 subsystem_vendor; /* VF subsystem vendor */
|
||||||
|
|
|
@@ -2,7 +2,7 @@
|
||||||
/*
|
/*
|
||||||
* PCIe AER software error injection support.
|
* PCIe AER software error injection support.
|
||||||
*
|
*
|
||||||
* Debuging PCIe AER code is quite difficult because it is hard to
|
* Debugging PCIe AER code is quite difficult because it is hard to
|
||||||
* trigger various real hardware errors. Software based error
|
* trigger various real hardware errors. Software based error
|
||||||
* injection can fake almost all kinds of errors with the help of a
|
* injection can fake almost all kinds of errors with the help of a
|
||||||
* user space helper tool aer-inject, which can be gotten from:
|
* user space helper tool aer-inject, which can be gotten from:
|
||||||
|
|
|
@@ -668,7 +668,7 @@ const unsigned char pcie_link_speed[] = {
|
||||||
PCIE_SPEED_5_0GT, /* 2 */
|
PCIE_SPEED_5_0GT, /* 2 */
|
||||||
PCIE_SPEED_8_0GT, /* 3 */
|
PCIE_SPEED_8_0GT, /* 3 */
|
||||||
PCIE_SPEED_16_0GT, /* 4 */
|
PCIE_SPEED_16_0GT, /* 4 */
|
||||||
PCI_SPEED_UNKNOWN, /* 5 */
|
PCIE_SPEED_32_0GT, /* 5 */
|
||||||
PCI_SPEED_UNKNOWN, /* 6 */
|
PCI_SPEED_UNKNOWN, /* 6 */
|
||||||
PCI_SPEED_UNKNOWN, /* 7 */
|
PCI_SPEED_UNKNOWN, /* 7 */
|
||||||
PCI_SPEED_UNKNOWN, /* 8 */
|
PCI_SPEED_UNKNOWN, /* 8 */
|
||||||
|
@@ -1555,17 +1555,6 @@ static int pci_cfg_space_size_ext(struct pci_dev *dev)
|
||||||
return PCI_CFG_SPACE_EXP_SIZE;
|
return PCI_CFG_SPACE_EXP_SIZE;
|
||||||
}
|
}
|
||||||
|
|
||||||
#ifdef CONFIG_PCI_IOV
|
|
||||||
static bool is_vf0(struct pci_dev *dev)
|
|
||||||
{
|
|
||||||
if (pci_iov_virtfn_devfn(dev->physfn, 0) == dev->devfn &&
|
|
||||||
pci_iov_virtfn_bus(dev->physfn, 0) == dev->bus->number)
|
|
||||||
return true;
|
|
||||||
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|
||||||
int pci_cfg_space_size(struct pci_dev *dev)
|
int pci_cfg_space_size(struct pci_dev *dev)
|
||||||
{
|
{
|
||||||
int pos;
|
int pos;
|
||||||
|
@@ -1573,9 +1562,18 @@ int pci_cfg_space_size(struct pci_dev *dev)
|
||||||
u16 class;
|
u16 class;
|
||||||
|
|
||||||
#ifdef CONFIG_PCI_IOV
|
#ifdef CONFIG_PCI_IOV
|
||||||
/* Read cached value for all VFs except for VF0 */
|
/*
|
||||||
if (dev->is_virtfn && !is_vf0(dev))
|
* Per the SR-IOV specification (rev 1.1, sec 3.5), VFs are required to
|
||||||
return dev->physfn->sriov->cfg_size;
|
* implement a PCIe capability and therefore must implement extended
|
||||||
|
* config space. We can skip the NO_EXTCFG test below and the
|
||||||
|
* reachability/aliasing test in pci_cfg_space_size_ext() by virtue of
|
||||||
|
* the fact that the SR-IOV capability on the PF resides in extended
|
||||||
|
* config space and must be accessible and non-aliased to have enabled
|
||||||
|
* support for this VF. This is a micro performance optimization for
|
||||||
|
* systems supporting many VFs.
|
||||||
|
*/
|
||||||
|
if (dev->is_virtfn)
|
||||||
|
return PCI_CFG_SPACE_EXP_SIZE;
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
if (dev->bus->bus_flags & PCI_BUS_FLAGS_NO_EXTCFG)
|
if (dev->bus->bus_flags & PCI_BUS_FLAGS_NO_EXTCFG)
|
||||||
|
|