This patch adds runtime PM support to PCIe port. This is needed by
PCIe D3cold support, where PCIe device without ACPI node may be
powered on/off by PCIe port.
Because runtime suspend is broken for some chipsets, a black list is
used to disable runtime PM support for these chipsets.
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
For a valid pci_dev, dev->bus != NULL always, so remove this
unnecessary test.
Found by Coverity (CID 101680).
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_enable_obff() and pci_enable_ltr() incorrectly check "dev->bus" instead
of "dev->bus->self" to determine whether the upstream device is a P2P
bridge or a host bridge. For devices on the root bus, the upstream device
is a host bridge, "dev->bus != NULL" and "dev->bus->self == NULL", and we
panic with a null pointer dereference.
These functions should previously have panicked when called on devices
supporting OBFF or LTR, so they should be regarded as untested.
Found by Coverity (CID 143038 and CID 143039).
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_intx_mask_supported() assumes INTx masking is supported if the
PCI_COMMAND_INTX_DISABLE bit is writable. But when that bit is set,
some devices don't actually mask INTx or update PCI_STATUS_INTERRUPT
as we expect.
This patch adds a way for quirks to identify these broken devices.
[bhelgaas: split out from Chelsio quirk addition]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Replace the struct pci_bus secondary/subordinate members with the
struct resource busn_res. Later we'll build a resource tree of these
bus numbers.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In a PCI environment, transactions aren't always required to reach
the root bus before being re-routed. Intermediate switches between
an endpoint and the root bus can redirect DMA back downstream before
things like IOMMUs have a chance to intervene. Legacy PCI is always
susceptible to this as it operates on a shared bus. PCIe added a
new capability to describe and control this behavior, Access Control
Services, or ACS.
The utility function pci_acs_enabled() allows us to test the ACS
capabilities of an individual devices against a set of flags while
pci_acs_path_enabled() tests a complete path from a given downstream
device up to the specified upstream device. We also include the
ability to add device specific tests as it's likely we'll see
devices that do not implement ACS, but want to indicate support
for various capabilities in this space.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Unlike PCI Express v1's Capabilities Structure, v2's requires the entire
structure to be implemented. In v2 structures, register fields that
are not implemented are present but hardwired to 0x0. These may
include: Link Capabilities, Status, and Control; Slot Capabilities,
Status, and Control; Root Capabilities, Status, and Control; and all of
the '2' (Device, Link, and Slot) Capabilities, Status, and Control
registers.
This patch removes the redundant capability checks corresponding to the
Link 2's and Slot 2's, Capabilities, Status, and Control registers as they
will be present if Device Capabilities 2's registers are (which explains
why the macros for each of the three are identical).
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This patch resolves potential issues when accessing PCI Express
Capability structures. The makeup of the capability varies
substantially between v1 and v2:
Version 1 of the PCI Express Capability (defined by PCI Express
1.0 and 1.1 base) neither requires the endpoint to implement the
entire PCIe capability structure nor specifies default values of
registers that are not implemented by the device.
Version 2 of the PCI Express Capability (defined by PCIe 1.1
Capability Structure Expansion ECN, PCIe 2.0, 2.1, and 3.0) added
additional registers to the structure and requires all registers
to be either implemented or hardwired to 0.
Due to the differences in the capability structures, code dealing with
capability features must be careful not to access the additional
registers introduced with v2 unless the device is specifically known to
be a v2 capable device. Otherwise, attempts to access non-existant
registers will occur. This is a subtle issue that is hard to track down
when it occurs (and it has - see commit 864d296cf9).
To try and help mitigate such occurrences, this patch introduces
pci_pcie_cap2() which is similar to pci_pcie_cap() but also checks
that the PCIe capability version is >= 2. pci_pcie_cap2() should be
used for qualifying PCIe capability features introduced after v1.
Suggested by Don Dutile.
Acked-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There are a number of redundant pci_is_pcie() checks in various PCI
Express capabilities related routines like the following:
if (!pci_is_pcie(dev))
return false;
pos = pci_pcie_cap(dev);
if (!pos)
return false;
The current pci_is_pcie() implementation is merely:
static inline bool pci_is_pcie(struct pci_dev *dev)
{
return !!pci_pcie_cap(dev);
}
so we can just drop the pci_is_pcie() test in such cases.
Acked-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The PCI Express Latency Tolerance Reporting (LTR) feature's
pci_ltr_supported() routine is currently only used within
drivers/pci/pci.c so make it static.
Acked-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_max_busnr() has been commented out for years (since 54c762fe62), and
this patch removes it completely.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Pull MIPS updates from Ralf Baechle:
"The whole series has been sitting in -next for quite a while with no
complaints. The last change to the series was before the weekend the
removal of an SPI patch which Grant - even though previously acked by
himself - appeared to raise objections. So I removed it until the
situation is clarified. Other than that all the patches have the acks
from their respective maintainers, all MIPS and x86 defconfigs are
building fine and I'm not aware of any problems introduced by this
series.
Among the key features for this patch series is a sizable patchset for
Lantiq which among other things introduces support for Lantiq's
flagship product, the FALCON SOC. It also means that the opensource
developers behind this patchset have overtaken Lantiq's competing
inhouse development team that was working behind closed doors.
Less noteworthy the ath79 patchset which adds support for a few more
chip variants, cleanups and fixes. Finally the usual dose of tweaking
of generic code."
Fix up trivial conflicts in arch/mips/lantiq/xway/gpio_{ebu,stp}.c where
printk spelling fixes clashed with file move and eventual removal of the
printk.
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (81 commits)
MIPS: lantiq: remove orphaned code
MIPS: Remove all -Wall and almost all -Werror usage from arch/mips.
MIPS: lantiq: implement support for FALCON soc
MTD: MIPS: lantiq: verify that the NOR interface is available on falcon soc
MTD: MIPS: lantiq: implement OF support
watchdog: MIPS: lantiq: implement OF support and minor fixes
SERIAL: MIPS: lantiq: implement OF support
GPIO: MIPS: lantiq: convert gpio-stp-xway to OF
GPIO: MIPS: lantiq: convert gpio-mm-lantiq to OF and of_mm_gpio
GPIO: MIPS: lantiq: move gpio-stp and gpio-ebu to the subsystem folder
MIPS: pci: convert lantiq driver to OF
MIPS: lantiq: convert dma to platform driver
MIPS: lantiq: implement support for clkdev api
MIPS: lantiq: drop ltq_gpio_request() and gpio_to_irq()
OF: MIPS: lantiq: implement irq_domain support
OF: MIPS: lantiq: implement OF support
MIPS: lantiq: drop mips_machine support
OF: PCI: const usage needed by MIPS
MIPS: Cavium: Remove smp_reserve_lock.
MIPS: Move cache setup to setup_arch().
...
On MIPS we want to call of_irq_map_pci from inside
arch/mips/include/asm/pci.h:extern int pcibios_map_irq(
const struct pci_dev *dev, u8 slot, u8 pin);
For this to work we need to change several functions to const usage.
Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: linux-pci@vger.kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Cc: linux-mips@linux-mips.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Patchwork: https://patchwork.linux-mips.org/patch/3710/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The intent of git commit 6fbf9e7a90
"PCI: Introduce __pci_reset_function_locked to be used when holding
device_lock." was to have a non-locking function that would call
pci_dev_reset function.
But it fell short of that by just probing and not actually reseting
the device. To make that work we need a way to move the lock
around device_lock to not be in pci_dev_reset (as the caller of
__pci_reset_function_locked already holds said lock). We do this by
renaming pci_dev_reset to __pci_dev_reset and bubbling said mutex out
of __pci_dev_reset to pci_dev_reset (a wrapper around __pci_dev_reset).
The __pci_reset_function_locked can now call __pci_dev_reset without
having to worry about the dead-lock.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
A PCIe downstream port is a P2P bridge. Its secondary interface is
a link that should lead only to device 0 (unless ARI is enabled)[1], so
we don't probe for non-zero device numbers.
Some Stratus ftServer systems have a PCIe downstream port (02:00.0) that
leads to both an upstream port (03:00.0) and a downstream port (03:01.0),
and 03:01.0 has important devices below it:
[0000:02]-+-00.0-[03-3c]--+-00.0-[04-09]--...
\-01.0-[0a-0d]--+-[USB]
+-[NIC]
+-...
Previously, we didn't enumerate device 03:01.0, so USB and the network
didn't work. This patch adds a DMI quirk to scan all device numbers,
not just 0, below a downstream port.
Based on a patch by Prarit Bhargava.
[1] PCIe spec r3.0, sec 7.3.1
CC: Myron Stowe <mstowe@redhat.com>
CC: Don Dutile <ddutile@redhat.com>
CC: James Paradis <james.paradis@stratus.com>
CC: Matthew Wilcox <matthew.r.wilcox@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Some shortcomings introduced into pci_restore_state() by commit
26f41062f2 ("PCI: check for pci bar restore completion and retry")
have been fixed by recent commit ebfc5b802f ("PCI: Fix regression in
pci_restore_state(), v3"), but that commit treats all PCI devices as
those with Type 0 configuration headers.
That is not entirely correct, because Type 1 and Type 2 headers have
different layouts. In particular, the area occupied by BARs in Type 0
config headers contains the secondary status register in Type 1 ones and
it doesn't make sense to retry the restoration of that register even if
the value read back from it after a write is not the same as the written
one (it very well may be different).
For this reason, make pci_restore_state() only retry the restoration
of BARs for Type 0 config headers. This effectively makes it behave
as before commit 26f41062f2 for all header types except for Type 0.
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 26f41062f2 ("PCI: check for pci bar restore completion and
retry") attempted to address problems with PCI BAR restoration on
systems where FLR had not been completed before pci_restore_state() was
called, but it did that in an utterly wrong way.
First off, instead of retrying the writes for the BAR registers only, it
did that for all of the PCI config space of the device, including the
status register (whose value after the write quite obviously need not be
the same as the written one). Second, it added arbitrary delay to
pci_restore_state() even for systems where the PCI config space
restoration was successful at first attempt. Finally, the mdelay(10) it
added to every iteration of the writing loop was way too much of a delay
for any reasonable device.
All of this actually caused resume failures for some devices on Mikko's
system.
To fix the regression, make pci_restore_state() only retry the writes
for BAR registers and only wait if the first read from the register
doesn't return the written value. Additionaly, make it wait for 1 ms,
instead of 10 ms, after every failing attempt to write into config
space.
Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are PCIe devices on the market that report ARI support but
then fail to initialize correctly when ARI is actually used. This
leads to situations in which kernels 2.6.34 and newer fail to handle
systems where the previous kernels worked without any apparent
problems. Unfortunately, it is currently unknown how many such
devices are there.
For this reason, introduce a new kernel command line option,
pci=noari, allowing users to disable PCIe ARI altogether if they
see problems with PCIe device initialization.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This isn't really a quirk; calling it directly from pci_add_device makes
more sense.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Let the user could enable and disable with pci=realloc=on or pci=realloc=off
Also
1. move variable and functions near the place they are used.
2. change macro to function
3. change related functions and variable to static and _init
4. update parameter description accordingly.
This will let us add a config option to control default behavior, and
still allow the user to turn off automatic reallocation if it fails on
their platform until a permanent solution is found.
-v2: still honor pci=realloc, and treat it as pci=realloc=on
also use enum instead of ...
-v3: update kernel-paramenters.txt according to Jesse.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Only one user in driver/pci/pci.c, so we don't need to put it in global
pci.h
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
On some OEM systems, pci_restore_state() is called while FLR has not yet
completed. As a result, PCI BAR register restore is not successful. This fix
reads back the restored value and compares it with saved value and re-tries 10
times before giving up.
Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
Signed-off-by: Eric Chanudet <eric.chanudet@citrix.com>
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The use case of this is when a driver wants to call FLR when a device
is attached to it using the SysFS "bind" or "unbind" functionality.
The call chain when a user does "bind" looks as so:
echo "0000:01.07.0" > /sys/bus/pci/drivers/XXXX/bind
and ends up calling:
driver_bind:
device_lock(dev); <=== TAKES LOCK
XXXX_probe:
.. pci_enable_device()
...__pci_reset_function(), which calls
pci_dev_reset(dev, 0):
if (!0) {
device_lock(dev) <==== DEADLOCK
The __pci_reset_function_locked function allows the the drivers
'probe' function to call the "pci_reset_function" while still holding
the driver mutex lock.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fix new kernel-doc warnings:
Warning(drivers/pci/pci.c:2811): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2811): Excess function parameter 'pdev' description in 'pci_intx_mask_supported'
Warning(drivers/pci/pci.c:2894): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2894): Excess function parameter 'pdev' description in 'pci_check_and_mask_intx'
Warning(drivers/pci/pci.c:2908): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2908): Excess function parameter 'pdev' description in 'pci_check_and_unmask_intx'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During S3 or S4 resume or PCI reset, ATS regs aren't restored correctly.
This patch enables ATS at the device state restore if PCI device has ATS
capability.
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When the runtime PM is activated on PCI, if a device switches state
frequently (e.g. an EHCI controller with autosuspending USB devices
connected) the PCI configuration traces might be very verbose in the
kernel log. Let's guard those traces with DEBUG condition.
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The latency timer is read-only and hardwired to zero for all PCIe
devices, both Type 0 and Type 1, so don't bother trying to update it
and cluttering the dmesg log with meaningless "setting latency timer
to 64" messages.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The 'latency timer' of PCI devices, both Type 0 and Type 1,
is setup in architecture-specific code [see: 'pcibios_set_master()'].
There are two approaches being taken by all the architectures - check
if the 'latency timer' is currently set between 16 and 255 and if not
bring it within bounds, or, do nothing (and then there is the
gratuitously different PA-RISC implementation).
There is nothing architecture-specific about PCI's 'latency timer' so
this patch pulls its setup functionality up into the PCI core by
creating a generic 'pcibios_set_master()' function using the '__weak'
attribute which can be used by all architectures as a default which,
if necessary, can then be over-ridden by architecture-specific code.
No functional change.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
These new PCI services allow to probe for 2.3-compliant INTx masking
support and then use the feature from PCI interrupt handlers. The
services are properly synchronized with concurrent config space access
via sysfs or on device reset.
This enables generic PCI device drivers like uio_pci_generic or KVM's
device assignment to implement the necessary kernel-side IRQ handling
without any knowledge about device-specific interrupt status and control
registers.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_block_user_cfg_access was designed for the use case that a single
context, the IPR driver, temporarily delays user space accesses to the
config space via sysfs. This assumption became invalid by the time
pci_dev_reset was added as locking instance. Today, if you run two loops
in parallel that reset the same device via sysfs, you end up with a
kernel BUG as pci_block_user_cfg_access detect the broken assumption.
This reworks the pci_block_user_cfg_access to a sleeping service
pci_cfg_access_lock and an atomic-compatible variant called
pci_cfg_access_trylock. The former not only blocks user space access as
before but also waits if access was already locked. The latter service
just returns false in this case, allowing the caller to resolve the
conflict instead of raising a BUG.
Adaptions of the ipr driver were originally written by Brian King.
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
I noticed that hotplug of one setup does not work with recent change in
pci tree.
After checking the bridge conf setup, I noticed that the bridges get
assigned but do not get enabled.
The reason is the following commit, while simply ignores bridge
resources when enabling a pci device:
| commit bbef98ab0f
| Author: Ram Pai <linuxram@us.ibm.com>
| Date: Sun Nov 6 10:33:10 2011 +0800
|
| PCI: defer enablement of SRIOV BARS
|...
| NOTE: Note, there is subtle change in the pci_enable_device() API. Any
| driver that depends on SRIOV BARS to be enabled in pci_enable_device()
| can fail.
Put back bridge resource and ROM resource checking to fix the problem.
That should fix regression like BIOS does not assign correct resource to
bridge.
Discussion can be found at:
http://www.spinics.net/lists/linux-pci/msg12874.html
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During test of one IB card with guest VM, found that, msi is not
initialized properly.
It turns out __write_msi_msg will do nothing if device current_state is
not PCI_D0. And, that pci device does not have pm_cap in guest VM.
There is an error in setting of power state to PCI_D0 in
pci_enable_device(), but error is not returned for this. Following is
code flow:
pci_enable_device() --> __pci_enable_device_flags() -->
do_pci_enable_device() --> pci_set_power_state() -->
__pci_start_power_transition()
We have following condition inside __pci_start_power_transition():
if (platform_pci_power_manageable(dev)) {
error = platform_pci_set_power_state(dev, state);
if (!error)
pci_update_current_state(dev, state);
} else {
error = -ENODEV;
/* Fall back to PCI_D0 if native PM is not supported */
if (!dev->pm_cap)
dev->current_state = PCI_D0;
}
Here, from platform_pci_set_power_state(), acpi_pci_set_power_state() is
getting called and that is failing with ENODEV because of following
condition:
if (!handle || ACPI_SUCCESS(acpi_get_handle(handle, "_EJ0",&tmp)))
return -ENODEV;
Because of that, pci_update_current_state() is not getting called.
With this patch, if device power state can not be set via
platform_pci_set_power_state and that device does not have native pm
support, then PCI device power state will be set to PCI_D0.
-v2: This also reverts 47e9037ac1, as it's
not needed after this change.
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ajaykumar Hotchandani<ajaykumar.hotchandani@oracle.com>
Signed-off-by: Yinghai Lu<yinghai.lu@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
All the PCI BARs of a device are enabled when the device is enabled
using pci_enable_device(). This unnecessarily enables SRIOV BARs of the
device.
On some platforms, which do not support SRIOV as yet, the
pci_enable_device() fails to enable the device if its SRIOV BARs are not
allocated resources correctly.
The following patch fixes the above problem. The SRIOV BARs are now
enabled when IOV capability of the device is enabled in sriov_enable().
NOTE: Note, there is subtle change in the pci_enable_device() API. Any
driver that depends on SRIOV BARS to be enabled in pci_enable_device()
can fail.
The patch has been touch tested on power and x86 platform.
Tested-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When configuring the PCIe settings for "performance", we allow parents
to have a larger Max Payload Size than children and rely on children
Max Read Request Size to not be larger than their own MPS to avoid
having the host bridge generate responses they can't cope with.
However, various drivers in Linux call pci_set_readrq() with arbitrary
values, assuming this to be a simple performance tweak. This breaks
under our "performance" configuration.
Fix that by making sure the value programmed by pcie_set_readrq() is
never larger than the configured MPS for that device.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jon Mason <mason@myri.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The land of PCI power management is a land of sorrow and ugliness,
especially in the area of signaling events by devices. There are
devices that set their PME Status bits, but don't really bother
to send a PME message or assert PME#. There are hardware vendors
who don't connect PME# lines to the system core logic (they know
who they are). There are PCI Express Root Ports that don't bother
to trigger interrupts when they receive PME messages from the devices
below. There are ACPI BIOSes that forget to provide _PRW methods for
devices capable of signaling wakeup. Finally, there are BIOSes that
do provide _PRW methods for such devices, but then don't bother to
call Notify() for those devices from the corresponding _Lxx/_Exx
GPE-handling methods. In all of these cases the kernel doesn't have
a chance to receive a proper notification that it should wake up a
device, so devices stay in low-power states forever. Worse yet, in
some cases they continuously send PME Messages that are silently
ignored, because the kernel simply doesn't know that it should clear
the device's PME Status bit.
This problem was first observed for "parallel" (non-Express) PCI
devices on add-on cards and Matthew Garrett addressed it by adding
code that polls PME Status bits of such devices, if they are enabled
to signal PME, to the kernel. Recently, however, it has turned out
that PCI Express devices are also affected by this issue and that it
is not limited to add-on devices, so it seems necessary to extend
the PME polling to all PCI devices, including PCI Express and planar
ones. Still, it would be wasteful to poll the PME Status bits of
devices that are known to receive proper PME notifications, so make
the kernel (1) poll the PME Status bits of all PCI and PCIe devices
enabled to signal PME and (2) disable the PME Status polling for
devices for which correct PME notifications are received.
Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.
Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has
massive negative ramifications on some devices. Without knowing which
devices have this issue, do not modify from the default value when
walking the PCI-E bus in pcie_bus_safe mode. Also, make pcie_bus_safe
the default procedure.
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Simon Kirby <sim@hostway.ca>
Tested-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-and-tested-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
References: https://bugzilla.kernel.org/show_bug.cgi?id=42162
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix new kernel-doc warning in pci.c:
Warning(drivers/pci/pci.c:3259): No description found for parameter 'mps'
Warning(drivers/pci/pci.c:3259): Excess function parameter 'rq' description in 'pcie_set_mps'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a given PCI-E fabric, each device, bridge, and root port can have a
different PCI-E maximum payload size. There is a sizable performance
boost for having the largest possible maximum payload size on each PCI-E
device. However, if improperly configured, fatal bus errors can occur.
Thus, it is important to ensure that PCI-E payloads sends by a device
are never larger than the MPS setting of all devices on the way to the
destination.
This can be achieved two ways:
- A conservative approach is to use the smallest common denominator of
the entire tree below a root complex for every device on that fabric.
This means for example that having a 128 bytes MPS USB controller on one
leg of a switch will dramatically reduce performances of a video card or
10GE adapter on another leg of that same switch.
It also means that any hierarchy supporting hotplug slots (including
expresscard or thunderbolt I suppose, dbl check that) will have to be
entirely clamped to 128 bytes since we cannot predict what will be
plugged into those slots, and we cannot change the MPS on a "live"
system.
- A more optimal way is possible, if it falls within a couple of
constraints:
* The top-level host bridge will never generate packets larger than the
smallest TLP (or if it can be controlled independently from its MPS at
least)
* The device will never generate packets larger than MPS (which can be
configured via MRRS)
* No support of direct PCI-E <-> PCI-E transfers between devices without
some additional code to specifically deal with that case
Then we can use an approach that basically ignores downstream requests
and focuses exclusively on upstream requests. In that case, all we need
to care about is that a device MPS is no larger than its parent MPS,
which allows us to keep all switches/bridges to the max MPS supported by
their parent and eventually the PHB.
In this case, your USB controller would no longer "starve" your 10GE
Ethernet and your hotplug slots won't affect your global MPS.
Additionally, the hotplugged devices themselves can be configured to a
larger MPS up to the value configured in the hotplug bridge.
To choose between the two available options, two PCI kernel boot args
have been added to the PCI calls. "pcie_bus_safe" will provide the
former behavior, while "pcie_bus_perf" will perform the latter behavior.
By default, the latter behavior is used.
NOTE: due to the location of the enablement, each arch will need to add
calls to this function. This patch only enables x86.
This patch includes a number of changes recommended by Benjamin
Herrenschmidt.
Tested-by: Jordan_Hargrave@dell.com
Signed-off-by: Jon Mason <mason@myri.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When setting the PCI-E MRRS, pcie_set_readrq queries the current
settings via a pci_read_config_word call but writes the modified result
via a pci_write_config_dword. This results in writing 16 more bits than
were queried.
Also, the function description comment is slightly incorrect.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The function pci_enable_ari() may mistakenly set the downstream port
of a v1 PCIe switch in ARI Forwarding mode. This is a PCIe v2 feature,
and with an SR-IOV device on that switch port believing the switch above
is ARI capable it may attempt to use functions 8-255, translating into
invalid (non-zero) device numbers for that bus. This has been seen
to cause Completion Timeouts and general misbehaviour including hangs
and panics.
Cc: stable@kernel.org
Acked-by: Don Dutile <ddutile@redhat.com>
Tested-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Multiple attempts to dynamically reallocate pci resources have
unfortunately lead to regressions. Though we continue to fix the
regressions and fine tune the dynamic-reallocation behavior, we have not
reached a acceptable state yet.
This patch provides a interim solution. It disables dynamic reallocation
by default, but adds the ability to enable it through pci=realloc kernel
command line parameter.
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/PCI/ACPI: fix type mismatch
PCI: fix new kernel-doc warning
PCI: Fix warning in drivers/pci/probe.c on sparc64
When I added 3448a19da4
I forgot about the special uv handling code for this, so this
patch fixes it up.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Ingo Molnar
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix pci.c kernel-doc warnings:
Warning(drivers/pci/pci.c:3292): No description found for parameter 'flags'
Warning(drivers/pci/pci.c:3292): Excess function parameter 'change_bridge_flags' description in 'pci_set_vga_state'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (169 commits)
drivers/gpu/drm/radeon/atom.c: fix warning
drm/radeon/kms: bump kms version number
drm/radeon/kms: properly set num banks for fusion asics
drm/radeon/kms/atom: move dig phy init out of modesetting
drm/radeon/kms/cayman: fix typo in register mask
drm/radeon/kms: fix typo in spread spectrum code
drm/radeon/kms: fix tile_config value reported to userspace on cayman.
drm/radeon/kms: fix incorrect comparison in cayman setup code.
drm/radeon/kms: add wait idle ioctl for eg->cayman
drm/radeon/cayman: setup hdp to invalidate and flush when asked
drm/radeon/evergreen/btc/fusion: setup hdp to invalidate and flush when asked
agp/uninorth: Fix lockups with radeon KMS and >1x.
drm/radeon/kms: the SS_Id field in the LCD table if for LVDS only
drm/radeon/kms: properly set the CLK_REF bit for DCE3 devices
drm/radeon/kms: fixup eDP connector handling
drm/radeon/kms: bail early for eDP in hotplug callback
drm/radeon/kms: simplify hotplug handler logic
drm/radeon/kms: rewrite DP handling
drm/radeon/kms/atom: add support for setting DP panel mode
drm/radeon/kms: atombios.h updates for DP panel mode
...
For KVM device assignment, we'd like to save off the state of a device
prior to passing it to the guest and restore it later. We also want
to allow pci_reset_funciton() to be called while the device is owned
by the guest. This however overwrites and invalidates the struct pci_dev
buffers, so we can't just manually call save and restore. Add generic
interfaces for the saved state to be stored and reloaded back into
struct pci_dev at a later time.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This will allow us to store and load it later.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Latency tolerance reporting allows devices to send messages to the root
complex indicating their latency tolerance for snooped & unsnooped
memory transactions. Add support for enabling & disabling this
feature, along with a routine to set the max latencies a device should
send upstream.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
OBFF (optimized buffer flush/fill), where supported, can help improve
energy efficiency by giving devices information about when interrupts
and other activity will have a reduced power impact. It requires
support from both the device and system (i.e. not only does the device
need to respond to OBFF messages, but the platform must be capable of
generating and routing them to the end point).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add support to allow drivers to enable/disable ID-based ordering. Where
supported, ID-based ordering can significantly improve the latency of
individual requests by preventing them from queueing up behind unrelated
traffic.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The pci_pm_reset() function is not a very nice interface due to its
limitations and conditional behavior (e.g. it doesn't affect devices
in low-power states), but it cannot be simply dropped, because
existing device drivers may depend on it. However, its behavior and
limitations should be well documented, so add an appropriate
kerneldoc comment to it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
So in a lot of modern systems, a GPU will always be below a parent bridge that won't share with any other GPUs. This means VGA arbitration on those GPUs can be controlled by using the bridge routing instead of io/mem decodes.
The problem is locating which GPUs share which upstream bridges. This patch attempts to identify all the GPUs which can be controlled via bridges, and ones that can't. This patch endeavours to work out the bridge sharing semantics.
When disabling GPUs via a bridge, it doesn't do irq callbacks or touch the io/mem decodes for the gpu.
Signed-off-by: Dave Airlie <airlied@redhat.com>
v3 -> v2: Moved ASPM enabling logic to pci_set_power_state()
v2 -> v1: Preserved the logic in pci_raw_set_power_state()
: Added ASPM enabling logic after scanning Root Bridge
: http://marc.info/?l=linux-pci&m=130046996216391&w=2
v1 : http://marc.info/?l=linux-pci&m=130013164703283&w=2
The assumption made in commit 41cd766b06
(PCI: Don't enable aspm before drivers have had a chance to veto it) that
pci_enable_device() will result in re-configuring ASPM when aspm_policy is
POWERSAVE is no longer valid. This is due to commit
97c145f7c8 (PCI: read current power state
at enable time) which resets dev->current_state to D0. Due to this the
call to pcie_aspm_pm_state_change() is never made. Note the equality check
(below) that returns early:
./drivers/pci/pci.c: pci_raw_set_pci_power_state()
546 /* Check if we're already there */
547 if (dev->current_state == state)
548 return 0;
Therefore OSPM never configures the PCIe links for ASPM to turn them "on".
Fix it by configuring ASPM from the pci_enable_device() code path. This
also allows a driver such as the e1000e networking driver a chance to
disable ASPM (L0s, L1), if need be, prior to enabling the device. A
driver may perform this action if the device is known to mis-behave
wrt ASPM.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Make wakeup events be reported by the PCI subsystem before attempting to
resume devices or queuing up runtime resume requests for them, because
wakeup events should be reported as soon as they have been detected.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After recent changes related to wakeup events pm_wakeup_event()
automatically checks if the given device is configured to signal wakeup,
so pci_wakeup_event() may be a static inline function calling
pm_wakeup_event() directly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_restore_state only ever returns 0, thus there is no benefit in
having it return any value. Also, a large majority of the callers do
not check the return code of pci_restore_state. Make the
pci_restore_state a void return and avoid the overhead.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When we enable a PCI device, we avoid doing a lot of the initial setup
work if the device's enable count is non-zero. If we don't fetch the
power state though, we may later fail to set up MSI due to the unknown
status. So pick it up before we short circuit the rest due to a
pre-existing enable or mismatched enable/disable pair (as happens with
VGA devices, which are special in a special way).
Tested-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Reported-by: Dave Airlie <airlied@linux.ie>
Tested-by: Dave Airlie <airlied@linux.ie>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Not all hardware vendors hook up the PME line for legacy PCI devices,
meaning that wakeup events get lost. The only way around this is to poll
the devices to see if their state has changed, so add support for doing
that on legacy PCI devices that aren't part of the core chipset.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Indent the branch of an if.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r disable braces4@
position p1,p2;
statement S1,S2;
@@
(
if (...) { ... }
|
if (...) S1@p1 S2@p2
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
if (p1[0].column == p2[0].column):
cocci.print_main("branch",p1)
cocci.print_secs("after",p2)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (30 commits)
PCI: update for owner removal from struct device_attribute
PCI: Fix warnings when CONFIG_DMI unset
PCI: Do not run NVidia quirks related to MSI with MSI disabled
x86/PCI: use for_each_pci_dev()
PCI: use for_each_pci_dev()
PCI: MSI: Restore read_msi_msg_desc(); add get_cached_msi_msg_desc()
PCI: export SMBIOS provided firmware instance and label to sysfs
PCI: Allow read/write access to sysfs I/O port resources
x86/PCI: use host bridge _CRS info on ASRock ALiveSATA2-GLAN
PCI: remove unused HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_{SIZE|BOUNDARY}
PCI: disable mmio during bar sizing
PCI: MSI: Remove unsafe and unnecessary hardware access
PCI: Default PCIe ASPM control to on and require !EMBEDDED to disable
PCI: kernel oops on access to pci proc file while hot-removal
PCI: pci-sysfs: remove casts from void*
ACPI: Disable ASPM if the platform won't provide _OSC control for PCIe
PCI hotplug: make sure child bridges are enabled at hotplug time
PCI hotplug: shpchp: Removed check for hotplug of display devices
PCI hotplug: pciehp: Fixed return value sign for pciehp_unconfigure_device
PCI: Don't enable aspm before drivers have had a chance to veto it
...
In 2.6.34, we transformed the PCI DMA API into the generic device
mode. The PCI DMA API is just the wrapper of the DMA API.
So we don't need HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_SIZE or
HAVE_ARCH_PCI_SET_DMA_SEGMENT_BOUNDARY (which enable architectures to
have the own implementations). Both haven't been used anyway.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
One of the arguments during the suspend blockers discussion was that
the mainline kernel didn't contain any mechanisms making it possible
to avoid races between wakeup and system suspend.
Generally, there are two problems in that area. First, if a wakeup
event occurs exactly when /sys/power/state is being written to, it
may be delivered to user space right before the freezer kicks in, so
the user space consumer of the event may not be able to process it
before the system is suspended. Second, if a wakeup event occurs
after user space has been frozen, it is not generally guaranteed that
the ongoing transition of the system into a sleep state will be
aborted.
To address these issues introduce a new global sysfs attribute,
/sys/power/wakeup_count, associated with a running counter of wakeup
events and three helper functions, pm_stay_awake(), pm_relax(), and
pm_wakeup_event(), that may be used by kernel subsystems to control
the behavior of this attribute and to request the PM core to abort
system transitions into a sleep state already in progress.
The /sys/power/wakeup_count file may be read from or written to by
user space. Reads will always succeed (unless interrupted by a
signal) and return the current value of the wakeup events counter.
Writes, however, will only succeed if the written number is equal to
the current value of the wakeup events counter. If a write is
successful, it will cause the kernel to save the current value of the
wakeup events counter and to abort the subsequent system transition
into a sleep state if any wakeup events are reported after the write
has returned.
[The assumption is that before writing to /sys/power/state user space
will first read from /sys/power/wakeup_count. Next, user space
consumers of wakeup events will have a chance to acknowledge or
veto the upcoming system transition to a sleep state. Finally, if
the transition is allowed to proceed, /sys/power/wakeup_count will
be written to and if that succeeds, /sys/power/state will be written
to as well. Still, if any wakeup events are reported to the PM core
by kernel subsystems after that point, the transition will be
aborted.]
Additionally, put a wakeup events counter into struct dev_pm_info and
make these per-device wakeup event counters available via sysfs,
so that it's possible to check the activity of various wakeup event
sources within the kernel.
To illustrate how subsystems can use pm_wakeup_event(), make the
low-level PCI runtime PM wakeup-handling code use it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
virtio-pci resets the device at startup by writing to the status
register, but this does not clear the pci config space,
specifically msi enable status which affects register
layout.
This breaks things like kdump when they try to use e.g. virtio-blk.
Fix by forcing msi off at startup. Since pci.c already has
a routine to do this, we export and use it instead of duplicating code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
vlynq: make whole Kconfig-menu dependant on architecture
add descriptive comment for TIF_MEMDIE task flag declaration.
EEPROM: max6875: Header file cleanup
EEPROM: 93cx6: Header file cleanup
EEPROM: Header file cleanup
agp: use NULL instead of 0 when pointer is needed
rtc-v3020: make bitfield unsigned
PCI: make bitfield unsigned
jbd2: use NULL instead of 0 when pointer is needed
cciss: fix shadows sparse warning
doc: inode uses a mutex instead of a semaphore.
uml: i386: Avoid redefinition of NR_syscalls
fix "seperate" typos in comments
cocbalt_lcdfb: correct sections
doc: Change urls for sparse
Powerpc: wii: Fix typo in comment
i2o: cleanup some exit paths
Documentation/: it's -> its where appropriate
UML: Fix compiler warning due to missing task_struct declaration
UML: add kernel.h include to signal.c
...
This fixes all occurrences of pci_enable_device and pci_disable_device
in all comments. There are no code changes involved.
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch (as1353) removes a couple of unnecessary assignments from
the PCI core. The should_wakeup flag is naturally initialized to 0;
there's no need to clear it.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If the firmware puts a device back into D0 state at resume time, we'll
update its state in resume_noirq and thus skip the platform resume code.
Calling that code twice should be safe and we ought to avoid getting to
that point anyway, so remove the check and also allow the platform pci
code to be called for D0.
Fixes USB not being powered after resume on recent Lenovo machines.
Acked-by: Alex Chiang <achiang@canonical.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
pcix_get_mmrbc() returns the maximum memory read byte count (mmrbc), if
successful, or an appropriate error value, if not.
Distinguishing errors from correct values and understanding the meaning of an
error can be somewhat confusing in that:
correct values: 512, 1024, 2048, 4096
errors: -EINVAL -22
PCIBIOS_FUNC_NOT_SUPPORTED 0x81
PCIBIOS_BAD_VENDOR_ID 0x83
PCIBIOS_DEVICE_NOT_FOUND 0x86
PCIBIOS_BAD_REGISTER_NUMBER 0x87
PCIBIOS_SET_FAILED 0x88
PCIBIOS_BUFFER_TOO_SMALL 0x89
The PCIBIOS_ errors are returned from the PCI functions generated by the
PCI_OP_READ() and PCI_OP_WRITE() macros.
In a similar manner, pcix_set_mmrbc() also returns the PCIBIOS_ error values
returned from pci_read_config_[word|dword]() and pci_write_config_word().
Following pcix_get_max_mmrbc()'s example, the following patch simply returns
-EINVAL for all PCIBIOS_ errors encountered by pcix_get_mmrbc(), and -EINVAL
or -EIO for those encountered by pcix_set_mmrbc().
This simplification was chosen in light of the fact that none of the current
callers of these functions are interested in the specific type of error
encountered. In the future, should this change, one could simply create a
function that maps each PCIBIOS_ error to a corresponding unique errno value,
which could be called by pcix_get_max_mmrbc(), pcix_get_mmrbc(), and
pcix_set_mmrbc().
Additionally, this patch eliminates some unnecessary variables.
Cc: stable@kernel.org
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
An e1000 driver on a system with a PCI-X bus was always being returned
a value of 135 from both pcix_get_mmrbc() and pcix_set_mmrbc(). This
value reflects an error return of PCIBIOS_BAD_REGISTER_NUMBER from
pci_bus_read_config_dword(,, cap + PCI_X_CMD,).
This is because for a dword, the following portion of the PCI_OP_READ()
macro:
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER;
expands to:
if (pos & 3) return PCIBIOS_BAD_REGISTER_NUMBER;
And is always true for 'cap + PCI_X_CMD', which is 0xe4 + 2 = 0xe6. ('cap' is
the result of calling pci_find_capability(, PCI_CAP_ID_PCIX).)
The same problem exists for pci_bus_write_config_dword(,, cap + PCI_X_CMD,).
In both cases, instead of calling _dword(), _word() should be called.
Cc: stable@kernel.org
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When pci_register_set_vga_state() was made __init, the EXPORT_SYMBOL() was
retained, which now leaves us with a section mismatch.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For the PCI_X_STATUS register, pcix_get_max_mmrbc() is returning an incorrect
value, which is based on:
(stat & PCI_X_STATUS_MAX_READ) >> 12
Valid return values are 512, 1024, 2048, 4096, which correspond to a 'stat'
(masked and right shifted by 21) of 0, 1, 2, 3, respectively.
A right shift by 11 would generate the correct return value when 'stat' (masked
and right shifted by 21) has a value of 1 or 2. But for a value of 0 or 3 it's
not possible to generate the correct return value by only right shifting.
Fix is based on pcix_get_mmrbc()'s similar dealings with the PCI_X_CMD register.
Cc: stable@kernel.org
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We can use pci-dma-compat.h to implement pci_set_dma_mask and
pci_set_consistent_dma_mask as we do with the other PCI DMA API.
We can remove HAVE_ARCH_PCI_SET_DMA_MASK too.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_set_coherent_mask corresponds to pci_set_consistent_dma_mask. This is
necessary to move to the generic device model DMA API from the PCI bus
specific API in the long term.
dma_set_coherent_mask works in the exact same way that
pci_set_consistent_dma_mask does. So this patch also changes
pci_set_consistent_dma_mask to call dma_set_coherent_mask.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes pci_set_dma_mask to call the generic DMA API, dma_set_mask.
pci_set_dma_mask (in drivers/pci/pci.c) does the same things that
dma_set_mask does on all the architectures that use pci_set_dma_mask;
calls dma_supprted and sets dev->dma_mask. So we safely change
pci_set_dma_mask to simply call dma_set_mask.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the future, we are going to be changing the lock type for struct
device (once we get the lockdep infrastructure properly worked out) To
make that changeover easier, and to possibly burry the lock in a
different part of struct device, let's create some functions to lock and
unlock a device so that no out-of-core code needs to be changed in the
future.
This patch creates the device_lock/unlock/trylock() functions, and
converts all in-tree users to them.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Cc: Magnus Damm <damm@igel.co.jp>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Alex Chiang <achiang@hp.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Patterson <andrew.patterson@hp.com>
Cc: Yu Zhao <yu.zhao@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: CHENG Renquan <rqcheng@smu.edu.sg>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: Frans Pop <elendil@planet.nl>
Cc: David Vrabel <david.vrabel@csr.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make the run-time power management of PCI devices be inactive by
default by calling pm_runtime_forbid() for each PCI device during its
initialization. This setting may be overriden by the user space with
the help of the /sys/devices/.../power/control interface.
That's necessary to avoid breakage on systems where ACPI-based
wake-up is known to fail for some devices.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'x86-pci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Enable NMI on all cpus on UV
vgaarb: Add user selectability of the number of GPUS in a system
vgaarb: Fix VGA arbiter to accept PCI domains other than 0
x86, uv: Update UV arch to target Legacy VGA I/O correctly.
pci: Update pci_set_vga_state() to call arch functions
Set power.async_suspend for all PCI devices and PCIe port services,
so that they can be suspended and resumed in parallel with other
devices they don't depend on in a known way (i.e. devices which are
not their parents or children).
This only affects the "regular" suspend and resume stages, which
means in particular that the restoration of the PCI devices' standard
configuration registers during resume will still be carried out
synchronously (at the "early" resume stage).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
No functional change; this converts loops that iterate from 0 to
PCI_BUS_NUM_RESOURCES through pci_bus resource[] table to use the
pci_bus_for_each_resource() iterator instead.
This doesn't change the way resources are stored; it merely removes
dependencies on the fact that they're in a table.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce run-time PM callbacks for the PCI bus type. Make the new
callbacks work in analogy with the existing system sleep PM
callbacks, so that the drivers already converted to struct dev_pm_ops
can use their suspend and resume routines for run-time PM without
modifications.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Although the majority of PCI devices can generate PMEs that in
principle may be used to wake up devices suspended at run time,
platform support is generally necessary to convert PMEs into wake-up
events that can be delivered to the kernel. If ACPI is used for this
purpose, PME signals generated by a PCI device will trigger the ACPI
GPE associated with the device to generate an ACPI wake-up event that
we can set up a handler for, provided that everything is configured
correctly.
Unfortunately, the subset of PCI devices that have GPEs associated
with them is quite limited. The devices without dedicated GPEs have
to rely on the GPEs associated with other devices (in the majority of
cases their upstream bridges and, possibly, the root bridge) to
generate ACPI wake-up events in response to PME signals from them.
Add ACPI platform support for PCI PME wake-up:
o Add a framework making is possible to use ACPI system notify
handlers for run-time PM.
o Add new PCI platform callback ->run_wake() to struct
pci_platform_pm_ops allowing us to enable/disable the platform to
generate wake-up events for given device. Implemet this callback
for the ACPI platform.
o Define ACPI wake-up handlers for PCI devices and PCI root buses and
make the PCI-ACPI binding code register wake-up notifiers for all
PCI devices present in the ACPI tables.
o Add function pci_dev_run_wake() which can be used by PCI drivers to
check if given device is capable of generating wake-up events at
run time.
Developed in cooperation with Matthew Garrett <mjg@redhat.com>.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add function pci_check_pme_status() that will check the PME status
bit of given device and clear it along with the PME enable bit. It
will be necessary for PCI run-time power management.
Based on a patch from Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Currently, drivers/pci/quirks.c is built unconditionally, but if
CONFIG_PCI_QUIRKS is unset, the only things actually built in this
file are definitions of global variables and empty functions (due to
the #ifdef CONFIG_PCI_QUIRKS embracing all of the code inside the
file). This is not particularly nice and if someone overlooks
the #ifdef CONFIG_PCI_QUIRKS, build errors are introduced.
To clean that up, move the definitions of the global variables in
quirks.c that are always built to pci.c, move the definitions of
the empty functions (compiled when CONFIG_PCI_QUIRKS is unset) to
headers (additionally make these functions static inline) and modify
drivers/pci/Makefile so that quirks.c is only built if
CONFIG_PCI_QUIRKS is set.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For use by code that needs to walk extended capability lists before
pci_dev structures are set up.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFD@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Update pci_set_vga_state to call arch dependent functions to enable Legacy
VGA I/O transactions to be redirected to correct target.
[akpm@linux-foundation.org: make pci_register_set_vga_state() __init]
Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <201002022238.o12McE1J018723@imap1.linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
It turns out that some PCI devices require extra delays when changing
power state from D3 to D0 (and the other way around). Although this
is against the PCI specification, we can handle it quite easily by
allowing drivers to define arbitrary D3 delays for devices known to
require extra time for switching power states.
Introduce additional field d3_delay in struct pci_dev and use it to
store the value of the device's D0->D3 delay, in miliseconds. Make
the PCI PM core code use the per-device d3_delay unless
pci_pm_d3_delay is greater (in which case the latter is used).
[This also allows the driver to specify d3_delay shorter than the
10 ms required by the PCI standard if the device is known to be able
to handle that.]
Make the sky2 driver set d3_delay to 150 for devices handled by it.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14730 which is a
listed regression from 2.6.30.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After commit b9c3b26641 ("PCI: support
device-specific reset methods") the kernel build is broken if
CONFIG_PCI_QUIRKS is unset.
Fix this by moving pci_dev_specific_reset() to drivers/pci/quirks.c and
providing an empty replacement for !CONFIG_PCI_QUIRKS builds.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cardbus code creates PCI devices without ever going through the
necessary fixup bits and pieces that normal PCI devices go through.
There's in fact a commented out call to pcibios_fixup_bus() in there,
it's commented because ... it doesn't work.
I could make pcibios_fixup_bus() do the right thing on powerpc easily
but I felt it cleaner instead to provide a specific hook pci_fixup_cardbus
for which a weak empty implementation is provided by the PCI core.
This fixes cardbus on powerbooks and probably all other PowerPC
platforms which was broken completely for ever on some platforms and
since 2.6.31 on others such as PowerBooks when we made the DMA ops
mandatory (since those are setup by the fixups).
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add a new type of quirk for resetting devices at pci_dev_reset time.
This is necessary to handle device with nonstandard reset procedures,
especially useful for guest drivers.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Remove a stray space in pci_save_state().
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit ae21ee65e8 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.
Add a function to pci core to allow an IOMMU to request that ACS
be enabled. The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order; iommu has only been detected not initialized.
Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.
Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The pcie_flr routine writes the device control register with the FLR bit
set clearing all other fields for the FLR duration. Among other fields,
the Max_Payload_Size is also cleared which can cause errors if there are
transactions lurking in the HW pipeline. The patch replaces the blank
write with read-modify-write of the control register keeping the other
fields intact.
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Change for PCI core to use pci_is_pcie() instead of checking
pci_dev->is_pcie.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI core code. This avoids unnecessary search in PCI
configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
I'm not entirely sure it needs to go into 32, but it's probably the right
thing to do. Another way of explaining the patch is:
- we currently pick the _first_ exactly matching bus resource entry, but
the _last_ inexactly matching one. Normally first/last shouldn't
matter, but bus resource entries aren't actually all created equal: in
a transparent bus, the last resources will be the parent resources,
which we should generally try to avoid unless we have no choice. So
"first matching" is the thing we should always aim for.
- the patch is a bit bigger than it needs to be, because I simplified the
logic at the same time. It used to be a fairly incomprehensible
if ((res->flags & IORESOURCE_PREFETCH) && !(r->flags & IORESOURCE_PREFETCH))
best = r; /* Approximating prefetchable by non-prefetchable */
and technically, all the patch did was to make that complex choice be
even more complex (it basically added a "&& !best" to say that if we
already gound a non-prefetchable window for the prefetchable resource,
then we won't override an earlier one with that later one: remember
"first matching").
- So instead of that complex one with three separate conditionals in one,
I split it up a bit, and am taking advantage of the fact that we
already handled the exact case, so if 'res->flags' has the PREFETCH
bit, then we already know that 'r->flags' will _not_ have it. So the
simplified code drops the redundant test, and does the new '!best' test
separately. It also uses 'continue' as a way to ignore the bus
resource we know doesn't work (ie a prefetchable bus resource is _not_
acceptable for anything but an exact match), so it turns into:
/* We can't insert a non-prefetch resource inside a prefetchable parent .. */
if (r->flags & IORESOURCE_PREFETCH)
continue;
/* .. but we can put a prefetchable resource inside a non-prefetchable one */
if (!best)
best = r;
instead. With the comments, it's now six lines instead of two, but it's
conceptually simpler, and I _could_ have written it as two lines:
if ((res->flags & IORESOURCE_PREFETCH) && !best)
best = r; /* Approximating prefetchable by non-prefetchable */
but I thought that was too damn subtle.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
SPIN_LOCK_UNLOCKED is deprecated. Use DEFINE_SPINLOCK instead.
Make the lock static while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This makes PCI resource management messages more consistent and adds a few
new messages to aid debugging.
Whenever we assign resources to a device, update a BAR, or change a
bridge aperture, it's worth noting it.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Messages about PME# being supported and enabled/disabled are probably
useful for debug, but maybe don't need to be on the console.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jesse accidentally applied v1 [1] of the patchset instead of v2 [2]. This
is the diff between v1 and v2.
The changes in this patch are:
- tidied vsprintf stack buffer to shrink and compute size more
accurately
- use %pR for decoding and %pr for "raw" (with type and flags) instead
of adding %pRt and %pRf
[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Note: dom0 checking in v4 has been separated out into 2/2.
This patch enables P2P upstream forwarding in ACS capable PCIe switches.
It solves two potential problems in virtualization environment where a PCIe
device is assigned to a guest domain using a HW iommu such as VT-d:
1) Unintentional failure caused by guest physical address programmed
into the device's DMA that happens to match the memory address range
of other downstream ports in the same PCIe switch. This causes the PCI
transaction to go to the matching downstream port instead of go to the
root complex to get translated by VT-d as it should be.
2) Malicious guest software intentionally attacks another downstream
PCIe device by programming the DMA address into the assigned device
that matches memory address range of the downstream PCIe port.
We are in process of implementing device filtering software in KVM/XEN
management software to allow device assignment of PCIe devices behind a PCIe
switch only if it has ACS capability and with the P2P upstream forwarding bits
enabled. This patch is intended to work for both KVM and Xen environments.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Reviewed-by: Mathew Wilcox <willy@linux.intel.com>
Reviewed-by: Chris Wright <chris@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_dfl_cache_line_size is marked as __initdata but referenced by
pci_init() which is __devinit. Make it __devinitdata instead of
__initdata.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For non hotplug PCI devices, the system firmware usually configures
CLS correctly. For pccard devices system firmware can't do it and
Linux PCI layer doesn't do it either. Unfortunately this leads to
poor performance for certain devices (sata_sil). Unless MWI, which
requires separate configuration, is to be used, CLS doesn't affect
correctness, so the configuration should be harmless.
This patch makes pci_set_cacheline_size() always built and export it
and make pccard call it during attach.
Please note that some other PCI hotplug drivers (shpchp and pciehp)
also configure CLS on hotplug.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Ritz <daniel.ritz@gmx.ch>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg KH <greg@kroah.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Cc: Axel Birndt <towerlexa@gmx.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
sparc64 is now the only user of PCI_CACHE_LINE_BYTES. Drop it and set
pci_dfl_cache_line_size from pcibios_init() instead and drop
PCI_CACHE_LINE_BYTES handling from generic pci code.
Orignally-From: David Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Till now, CLS has been determined either by arch code or as
L1_CACHE_BYTES. Only x86 and ia64 set CLS explicitly and x86 doesn't
always get it right. On most configurations, the chance is that
firmware configures the correct value during boot.
This patch makes pci_init() determine CLS by looking at what firmware
has configured. It scans all devices and if all non-zero values
agree, the value is used. If none is configured or there is a
disagreement, pci_dfl_cache_line_size is used. arch can set the dfl
value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
override the actual one.
ia64, x86 and sparc64 updated to set the default cls instead of the
actual one.
While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
in pci.h and drop private declarations from arch code.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Greg KH <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* git://git.infradead.org/~dwmw2/iommu-2.6.32:
x86: Move pci_iommu_init to rootfs_initcall()
Run pci_apply_final_quirks() sooner.
Mark pci_apply_final_quirks() __init rather than __devinit
Rename pci_init() to pci_apply_final_quirks(), move it to quirks.c
intel-iommu: Yet another BIOS workaround: Isoch DMAR unit with no TLB space
intel-iommu: Decode (and ignore) RHSA entries
intel-iommu: Make "Unknown DMAR structure" message more informative
This function may have done more in the past, but all it does now is
apply the PCI_FIXUP_FINAL quirks. So name it sensibly and put it where
it belongs.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
After attempting to change the power state of a PCI device
pci_raw_set_power_state() doesn't check if the value it wrote into
the device's PCI_PM_CTRL register has been stored in there, but
unconditionally modifies the device's current_state field to reflect
the change. This may cause problems to happen if the power state of
the device hasn't been changed in fact, because it will make the PCI
PM core make a wrong assumption.
To prevent such situations from happening modify
pci_raw_set_power_state() so that it reads the device's PCI_PM_CTRL
register after writing into it and uses the value read from the
register to update the device's current_state field. Also make it
print a message saying that the device refused to change its power
state as requested (returning an error code in such cases would cause
suspend regressions to appear on some systems, where device drivers'
suspend routines return error codes if pci_set_power_state() fails).
Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some PCI devices fail if their standard configuration registers are
restored twice in a row. Prevent this from happening by making
pci_restore_state() clear the saved_state flag of the device right
after the device's standard configuration registers have been
populated with the previously saved values.
Simplify PCI PM callbacks by removing the direct clearing of
state_saved from them, as it shouldn't be necessary any more (except
in pci_pm_thaw(), where it has to be cleared, so that the values saved
during the "freeze" phase of hibernation are not used later by mistake).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce a new PCI device flag, wakeup_prepared, to prevent PCI
wake-up preparation code from being executed twice in a row for the
same device and for the same purpose.
Reviewed-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Rework the PCI wake-up code so that it's easier to read without
changing the functionality.
Reviewed-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In general a BIOS may goof or we may hotplug in a hotplug controller.
In either case the kernel needs to reserve resources for plugging
in more devices in the future instead of creating a minimal resource
assignment.
We already do this for cardbus bridges I am just adding a variant
for pcie bridges.
v2: Make testing for pcie hotplug bridges based on a flag.
So far we only set the flag for pcie but a header_quirk
could easily be added for the non-standard pci hotplug
bridges.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Background:
Graphic devices are accessed through ranges in I/O or memory space. While most
modern devices allow relocation of such ranges, some "Legacy" VGA devices
implemented on PCI will typically have the same "hard-decoded" addresses as
they did on ISA. For more details see "PCI Bus Binding to IEEE Std 1275-1994
Standard for Boot (Initialization Configuration) Firmware Revision 2.1"
Section 7, Legacy Devices.
The Resource Access Control (RAC) module inside the X server currently does
the task of arbitration when more than one legacy device co-exists on the same
machine. But the problem happens when these devices are trying to be accessed
by different userspace clients (e.g. two server in parallel). Their address
assignments conflict. Therefore an arbitration scheme _outside_ of the X
server is needed to control the sharing of these resources. This document
introduces the operation of the VGA arbiter implemented for Linux kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Tiago Vignatti <tiago.vignatti@nokia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some devices allow an individual function to be reset without affecting
other functions in the same device: that's what pci_reset_function does.
For devices that have this support, expose reset attribite in sysfs.
This is useful e.g. for virtualization, where a qemu userspace
process wants to reset the device when the guest is reset,
to emulate machine reboot as closely as possible.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Without the check, the config space may be filled with zeros. Though
the driver should try to avoid call restoring before saving, but the
pci layer also should check this.
Also removes the existing check in pci_restore_standard_config, since
it's superfluous with the new check in restore_state.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For many purposes, including interrupt-swizzling, devices with ARI
enabled behave as if they have one device (number 0) and 256 functions.
This probably hasn't bitten us in practice because all ARI devices I've
seen are also IOV devices, and IOV devices are required to use MSI.
This isn't guaranteed, and there are legitimate reasons to use ARI
without IOV, and hence potentially use pin-based interrupts.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For devices attached to the root bus, we can't trigger Secondary Bus
Reset because there is no bridge device associated with the bus. So
need to check bus->self again NULL first before using it.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (74 commits)
PCI: make msi_free_irqs() to use msix_mask_irq() instead of open coded write
PCI: Fix the NIU MSI-X problem in a better way
PCI ASPM: remove get_root_port_link
PCI ASPM: cleanup pcie_aspm_sanity_check
PCI ASPM: remove has_switch field
PCI ASPM: cleanup calc_Lx_latency
PCI ASPM: cleanup pcie_aspm_get_cap_device
PCI ASPM: cleanup clkpm checks
PCI ASPM: cleanup __pcie_aspm_check_state_one
PCI ASPM: cleanup initialization
PCI ASPM: cleanup change input argument of aspm functions
PCI ASPM: cleanup misc in struct pcie_link_state
PCI ASPM: cleanup clkpm state in struct pcie_link_state
PCI ASPM: cleanup latency field in struct pcie_link_state
PCI ASPM: cleanup aspm state field in struct pcie_link_state
PCI ASPM: fix typo in struct pcie_link_state
PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS
PCI: remove redundant __msi_set_enable()
PCI PM: consistently use type bool for wake enable variable
x86/ACPI: Correct maximum allowed _CRS returned resources and warn if exceeded
...
Other functions use type bool, so use that for pci_enable_wake as well.
Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If a PCI device is not power-manageable either by the platform, or
with the help of the native PCI PM interface, pci_target_state() will
return either PCI_D3hot, or PCI_POWER_ERROR for it, depending on
whether or not the device is configured to wake up the system. Alas,
none of these return values is correct, because each of them causes
pci_prepare_to_sleep() to return error code, although it should
complete successfully in such a case.
Fix this problem by making pci_target_state() always return PCI_D0
for devices that cannot be power managed.
Cc: stable@kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI-to-PCI Bridge 1.2 specifies that the Secondary Bus Reset bit can
force the assertion of RST# on the secondary interface, which can be
used to reset all devices including subordinates under this bus. This
can be used to reset a function if this function is the only device
under this bus.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI PM 1.2 specifies that the device will perform an internal reset upon
transitioning from D3hot to D0 when the NO_SOFT_RESET bit is clear. This
method can be used to reset a function if neither PCIe FLR nor PCI AF FLR
are supported.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch enhances the FLR functions:
1) remove disable_irq() so the shared IRQ won't be disabled.
2) replace the 1s wait with 100, 200 and 400ms wait intervals
for the Pending Transaction.
3) replace mdelay() with msleep().
4) add might_sleep().
5) lock the device to prevent PM suspend from accessing the CSRs
during the reset.
6) coding style fixes.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_is_root_bus() in pci_common_swizzle() for checking if the pci
bus is root, for code consistency.
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_is_root_bus() in pci_get_interrupt_pin() for checking if the
pci bus is root, for code consistency.
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch (as1235) adds an array of PCI power-state names, together
with a simple inline accessor routine.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adds support for PCI Express transaction layer end-to-end CRC checking
(ECRC). This patch will enable/disable ECRC checking by setting/clearing
the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
support ECRC.
The ECRC setting is controlled by the "pci=ecrc=<policy>" command-line
option. If this option is not set or is set to 'bios", the enable and
generation bits are left in whatever state that firmware/BIOS set them to.
The "off" setting turns them off, and the "on" option turns them on (if the
device supports it).
Turning ECRC on or off can be a data integrity versus performance
tradeoff. In theory, turning it on will catch more data errors, turning
it off means possibly better performance since CRC does not need to be
calculated by the PCIe hardware and packet sizes are reduced.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
According to the PCI PM specification (PCI Bus Power Management
Interface Specification, Rev. 1.2, Section 5.4.1) we are supposed to
reinitialize devices that have PCI_PM_CTRL_NO_SOFT_RESET clear during
all transitions from PCI_D3hot to PCI_D0, but we only do it if the
device's current_state field is equal to PCI_UNKNOWN.
This may lead to problems if a device with PCI_PM_CTRL_NO_SOFT_RESET
unset is put into PCI_D3hot at run time by its driver and
pci_set_power_state() is used to put it back into PCI_D0, because in
that case the device will remain uninitialized after
pci_set_power_state() has returned. Prevent that from happening by
modifying pci_raw_set_power_state() to reinitialize devices with
PCI_PM_CTRL_NO_SOFT_RESET unset during all transitions from D3 to D0.
Cc: stable@kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Recent PCI PM changes introduced a bug that causes some devices to be
mishandled after kexec and during early initialization. The failure
scenario in the kexec case is the following:
* Assume a PCI device is not power-manageable by the platform and has
PCI_PM_CTRL_NO_SOFT_RESET set in PMCSR.
* The device is put into D3 before kexec (using the native PCI PM).
* After kexec, pci_setup_device() sets the device's power state to
PCI_UNKNOWN.
* pci_set_power_state(dev, PCI_D0) is called by the device's driver.
* __pci_start_power_transition(dev, PCI_D0) is called and since the
device is not power-manageable by the platform, it causes
pci_update_current_state(dev, PCI_D0) to be called. As a result
the device's current_state field is updated to PCI_D3, in
accordance with the contents of its PCI PM registers.
* pci_raw_set_power_state() is called and it changes the device power
state to D0. *However*, it should also call pci_restore_bars() to
reinitialize the device, but it doesn't, because the device's
current_state field has been modified earlier.
To prevent this from happening, modify pci_platform_power_transition()
so that it doesn't use pci_update_current_state() to update the
current_state field for devices that aren't power-manageable by the
platform. Instead, this field should be updated directly for devices
that don't support the native PCI PM.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCIe 1.1 base neither requires the endpoint to implement the entire
PCIe capability structure nor specifies default values of registers
that are not implemented by the device. So we only save and restore
registers that must be implemented by different device types if the
device PCIe capability version is 1.
PCIe 1.1 Capability Structure Expansion ECN and PCIe 2.0 requires
all registers in the PCIe capability to be either implemented or
hardwired to 0. Their PCIe capability version is 2.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch sets up disabled bridges even if buses have already been
added.
pci_assign_unassigned_resources is called after buses are added.
pci_assign_unassigned_resources calls pci_bus_assign_resources.
pci_bus_assign_resources calls pci_setup_bridge to configure BARs of
bridges.
Currently pci_setup_bridge returns immediately if the bus have already
been added. So pci_assign_unassigned_resources can't configure BARs of
bridges that were added in a disabled state; this patch fixes the issue.
On logical hot-add, we need to prevent the kernel from re-initializing
bridges that have already been initialized. To achieve this,
pci_setup_bridge returns immediately if the bridge have already been
enabled.
We don't need to check whether the specified bus is a root bus or not.
pci_setup_bridge is not called on a root bus, because a root bus does
not have a bridge.
The patch adds a new helper function, pci_is_enabled. I made the
function name similar to pci_is_managed. The codes which use
enable_cnt directly are changed to use pci_is_enabled.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
trivial: Update my email address
trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
trivial: Fix misspelling of "Celsius".
trivial: remove unused variable 'path' in alloc_file()
trivial: fix a pdlfush -> pdflush typo in comment
trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
trivial: wusb: Storage class should be before const qualifier
trivial: drivers/char/bsr.c: Storage class should be before const qualifier
trivial: h8300: Storage class should be before const qualifier
trivial: fix where cgroup documentation is not correctly referred to
trivial: Give the right path in Documentation example
trivial: MTD: remove EOL from MODULE_DESCRIPTION
trivial: Fix typo in bio_split()'s documentation
trivial: PWM: fix of #endif comment
trivial: fix typos/grammar errors in Kconfig texts
trivial: Fix misspelling of firmware
trivial: cgroups: documentation typo and spelling corrections
trivial: Update contact info for Jochen Hein
trivial: fix typo "resgister" -> "register"
...
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
If the device is not supposed to wake up the system, ie. when
device_may_wakeup(&dev->dev) returns 'false', pci_prepare_to_sleep()
should pass 'false' to pci_enable_wake() so that it calls the
platform to disable the wake-up capability of the device.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The radeonfb driver needs to program the device's PMCSR directly due
to some quirky hardware it has to handle (see
http://bugzilla.kernel.org/show_bug.cgi?id=12846 for details) and
after doing that it needs to call the platform (usually ACPI) to
finish the power transition of the device. Currently it uses
pci_set_power_state() for this purpose, however making a specific
assumption about the internal behavior of this function, which has
changed recently so that this assumption is no longer satisfied.
For this reason, introduce __pci_complete_power_transition() that may
be called by the radeonfb driver to complete the power transition of
the device. For symmetry, introduce __pci_start_power_transition().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is a problem with PCI devices without any PM support (either
native or through the platform) that pci_set_power_state() always
returns error code for them, even if they are being put into D0.
However, such devices are always in D0, so pci_set_power_state()
should return success when attempting to put such a device into D0.
It also should update the current_state field for these devices as
appropriate. This modification is necessary so that the standard
configuration registers of these devices are successfully restored by
pci_restore_standard_config() during the "early" phase of resume.
In addition, pci_set_power_state() should check the value of
current_state before calling the platform to change the power state
of the device to avoid doing that unnecessarily.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Move pci_restore_standard_config() from pci.c to pci-driver.c and
make it static.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Once we have allowed timer interrupts to be enabled during the early
phase of resuming devices, we are now able to use the generic
pci_set_power_state() to put PCI devices into D0 at that time. Then,
the platform-specific PM code will have a chance to handle devices
that don't implement the native PCI PM or that require some
additional, platform-specific operations to be carried out to power
them up. Also, by doing this we can simplify the code quite a bit.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCIe 2.0 defines several new registers (Device Control 2, Link Control 2,
and Slot Control 2). Save and retore them in pci_save_pcie_state() and
pci_restore_pcie_state().
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Restore the volatile registers in the SR-IOV capability after the
D3->D0 transition.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If a device has the SR-IOV capability, initialize it (set the ARI
Capable Hierarchy in the lowest numbered PF if necessary; calculate
the System Page Size for the VF MMIO, probe the VF Offset, Stride
and BARs). A lock for the VF bus allocation is also initialized if
a PF is the lowest numbered PF.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch allows memory resources to be assigned with a specified
alignment at boot-time or run-time. The patch is useful when we use PCI
pass-through, because page-aligned memory resources are required to
securely share PCI resources with guest drivers.
If you want to assign the resource at boot time, please set
"pci=resource_alignment=" boot parameter.
This is format of "pci=resource_alignment=" boot parameter:
[<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
Specifies alignment and device to reassign
aligned memory resources.
If <order of align> is not specified, PAGE_SIZE is
used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
This is example:
pci=resource_alignment=20@07:00.0;18@0f:00.0;00:1d.7
If you want to assign the resource at run-time, please set
"/sys/bus/pci/resource_alignment" file, and hot-remove the device and
hot-add the device. For this purpose, fakephp or PCI hotplug interfaces
can be used.
The format of "/sys/bus/pci/resource_alignment" file is the same with
boot parameter. You can use "," instead of ";".
For example:
# cd /sys/bus/pci
# echo -n 20@12:00.0 > resource_alignment
# echo 1 > devices/0000:12:00.0/remove
# echo 1 > rescan
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_common_swizzle() seems to have a assumption that
pci_bus->self is NULL on the pci root bus. But it might not be true on
some platforms. Because of this wrong assumption, pci_common_swizzle()
might cause endless loop. We must check pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_get_interrupt_pin() seems to have an assumption that
pci_bus->self is NULL on the root pci bus. But it might not be true on
some platforms. Because of this wrong assumption, current
pci_get_interrupt_pin() might cause endless loop. We must check
pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For all devices need to do function level reset, currently we need wait for
at least 200ms, which can be too long if we have lots of devices...
The patch checked pending bit before msleep() to skip some unnecessary
sleeping interval.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fix pci kernel-doc parameter missing notation, correct
function name, and fix typo:
Warning(linux-2.6.28-git10//drivers/pci/pci.c:1511): No description found for parameter 'exclusive'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_restore_standard_config() unconditionally changes current_state
to PCI_D0 after attempting to change the device's power state, but
it should rather read the actual current power state from the
device.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Check if the standard configuration registers of a PCI device have
been saved during suspend before trying to restore them during
resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-By: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_restore_standard_config() adds extra delay for PCI buses in
low power states (B2 or B3), but this is only correct for buses in
B2, because the buses in B3 are reset when they are put back into
B0. Thus we should wait for such buses to settle after the reset,
but it's not a good idea to wait that long (1.1 s) with interrupts
off.
On the other hand, we have never waited for buses in B2 and B3
during resume and it seems reasonable to go back to this well
tested behaviour.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Devices that have MSI-X enabled before suspend to RAM or hibernation
and that are in a low power state during resume will not be handled
correctly by pci_restore_standard_config(). Namely, it first calls
pci_restore_state() which calls pci_restore_msi_state(), which in turn
executes __pci_restore_msix_state() that accesses the device's memory
space to restore the contents of the MSI-X table. However, if the
device is in a low power state at this point, it's memory space is
not accessible.
The easiest way to fix this potential problem is to make
pci_restore_standard_config() call pci_restore_state() after
it has put the device into the full power state, D0. Fortunately,
all of this is done with interrupts off, so the change of ordering
should not cause any trouble.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is a problem in our handling of suspend-resume of PCI devices that
many of them have their standard config registers restored with
interrupts enabled and they are put into the full power state with
interrupts enabled as well. This may lead to the following scenario:
* an interrupt vector is shared between two or more devices
* one device is resumed earlier and generates an interrupt
* the interrupt handler of another device tries to handle it and
attempts to access the device the config space of which hasn't been
restored yet and/or which still is in a low power state
* the system crashes as a result
To prevent this from happening we should restore the standard
configuration registers of all devices with interrupts disabled and we
should put them into the D0 power state right after that.
Unfortunately, this cannot be done using the existing
pci_set_power_state(), because it can sleep. Also, to do it we have to
make sure that the config spaces of all devices were actually saved
during suspend.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This reverts commit 98e6e286d7, as Yinghai
Lu reports that it breaks kexec with at least the e1000 and e1000e
drivers. The reason is that the shutdown sequence puts the hardware
into D3 sleep, and the commit causes us to claim that it then is in D0
(running) state just because we don't understand the PM capabilities.
Which then later makes "pci_set_power_state()" not do anything, and the
device never wakes up properly and just returns 0xff to everything.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: From: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the observation that the power state of a PCI device can be
loaded into its pci_dev structure as soon as pci_pm_init() is run for
it and make that happen.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
It generally is better to avoid accessing devices behind bridges that
may not be in the D0 power state, because in that case the bridges'
secondary buses may not be accessible. For this reason, during the
early phase of resume (ie. with interrupts disabled), before
restoring the standard config registers of a device, check the power
state of the bridge the device is behind and postpone the restoration
of the device's config space, as well as any other operations that
would involve accessing the device, if that state is not D0.
In such cases the restoration of the device's config space will be
retried during the "normal" phase of resume (ie. with interrupts
enabled), so that the bridge can be put into D0 before that happens.
Also, save standard configuration registers of PCI devices during the
"normal" phase of suspend (ie. with interrupts enabled), so that the
bridges the devices are behind can be put into low power states (we
don't put bridges into low power states at the moment, but we may
want to do it in the future and it seems reasonable to design for
that).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI devices without drivers are not disabled during suspend and
hibernation, but they are enabled during resume, with the help of
pci_reenable_device(), so there is an unbalanced execution of
pcibios_enable_device() in the resume code path.
To correct this introduce function pci_disable_enabled_device()
that will disable the argument device, if it is enabled when the
function is being run, without updating the device's pci_dev
structure and use it in the suspend code path to balance the
pci_reenable_device() executed during resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
During an online device reset it may be useful to disable bus-mastering.
pci_disable_device() does that, and far more besides, so is not suitable
for an online reset.
Add pci_clear_master() which does just this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds pci_common_swizzle(), which swizzles INTx values all the
way up to a root bridge.
This common implementation can replace several architecture-specific
ones. This should someday be combined with pci_get_interrupt_pin(),
but I left it separate for now to make reviewing easier.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Currently, PCI devices without the PM capability that are power
manageable by the platform (eg. ACPI) are not handled correctly
by pci_set_power_state(), because their current_state field is not
updated to reflect the new power state of the device. Fix this by
making pci_update_current_state() accept additional argument
representing the power state of the device as set by the platform.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When PCI devices are initialized, we check whether they support PCI PM
caps and set the device can_wakeup flag if so. However, some devices
may have platform provided wakeup events rather than PCI PME signals, so
we need to set can_wakeup in that case too. Doing so should allow
wakeups from many more devices, especially on cost constrained systems.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Joseph Chan <JosephChan@via.com.tw>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add a function to map a given resource number to a corresponding
register so drivers can get the offset and type of device specific BARs.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Remove the unnecessary number of resources condition checks because
the pci_update_resource() will check availability of the resources.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This cleanup removes unnecessary argument 'struct resource *res' in
pci_update_resource(), so it takes same arguments as other companion
functions (pci_assign_resource(), etc.).
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
It's too large to be inlined.
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch (as1186) fixes a minor mistake in pci_enable_wake(). When
the routine is asked to disable remote wakeup, it should not return an
error merely because the device is not allowed to do wakeups!
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds pci_swizzle_interrupt_pin(), which implements the
INTx swizzling algorithm specified in Table 9-1 of the "PCI-to-PCI
Bridge Architecture Specification," revision 1.2.
There are many architecture-specific implementations of this
swizzle that can be replaced by this common one.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch makes pci_get_interrupt_pin() return values encoded
the same way as the "Interrupt Pin" value in PCI config space,
i.e., 1=INTA, ..., 4=INTD.
pirq_bios_set() is the only in-tree caller of pci_get_interrupt_pin()
and pci_get_interrupt_pin() is not exported.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since interrupts will soon be disabled at PCI resume time, we need to
pre-allocate memory to save/restore PCI config space (or use GFP_ATOMIC,
but this is safer).
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Device drivers that use pci_request_regions() (and similar APIs) have a
reasonable expectation that they are the only ones accessing their device.
As part of the e1000e hunt, we were afraid that some userland (X or some
bootsplash stuff) was mapping the MMIO region that the driver thought it
had exclusively via /dev/mem or via various sysfs resource mappings.
This patch adds the option for device drivers to cause their reserved
regions to the "banned from /dev/mem use" list, so now both kernel memory
and device-exclusive MMIO regions are banned.
NOTE: This is only active when CONFIG_STRICT_DEVMEM is set.
In addition to the config option, a kernel parameter iomem=relaxed is
provided for the cases where developers want to diagnose, in the field,
drivers issues from userspace.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The _OSC capability OSC_MSI_SUPPORT is set when the root bridge is added
with pci_acpi_osc_support(), so we no longer need to do it in the PCI
MSI driver. Also adds the function pci_msi_enabled, which returns true
if pci=nomsi is not on the kernel command-line.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The _OSC capability OSC_EXT_PCI_CONFIG_SUPPORT is set when the root
bridge is added with pci_acpi_osc_support() if we can access PCI
extended config space.
This adds the function pci_ext_cfg_avail which returns true if we can
access PCI extended config space (offset greater than 0xff). It
currently only returns false if arch=x86 and raw_pci_ext_ops is not set
(which might happen if pci=nommcfg is set on the kernel command-line).
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some PCI devices implement PCI Advanced Features, which means they
support Function Level Reset(FLR). Implement support for that in
pci_reset_function.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Separate out function level reset so that pci_reset_function can be more
easily extended.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
for fsck sake, it's used only when parsing kernel command line...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before initialization, dev->irq may be zero. Make sure we don't disable
it at reset time in that case.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The original ARI support code has a compatibility problem with non-ARI
devices. If a device doesn't support ARI, turning on ARI forwarding on
its upper level bridge will cause undefined behavior.
This fix turns on ARI forwarding only when the subordinate devices
support it.
Tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Sometimes, it's necessary to enable software's ability to quiesce and
reset endpoint hardware with function-level granularity, so provide
support for it.
The patch implement Function Level Reset(FLR) feature following PCI-e
spec. And this is the first step. We would add more generic method, like
D0/D3, to allow more devices support this function.
The patch contains two functions. pcie_reset_function() is the new
driver API, and, contains some action to quiesce a device. The other
function is a helper: pcie_execute_reset_function() just executes the
reset for a particular device function.
Current the usage model is in KVM. Function reset is necessary for
assigning device to a guest, or moving it between partitions.
For Function Level Reset(FLR), please refer to PCI Express spec chapter
6.6.2.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Currently linux doesn't have any code to set the "MSI supported" bit in
Support Fireld of _OSC. This patch adds the code for that.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds support for PCI Express Alternative Routing-ID
Interpretation (ARI) capability.
The ARI capability extends the Function Number field of the PCI Express
Endpoint by reusing the Device Number which is otherwise hardwired to 0.
With ARI, an Endpoint can have up to 256 functions.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This is a cleanup that changes all PCI configuration space size
representations to the macros (PCI_CFG_SPACE_SIZE and
PCI_CFG_SPACE_EXP_SIZE). And the macros are also moved from
drivers/pci/probe.c to drivers/pci/pci.h.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>