Allow PCI host bridge drivers to use the new host bridge interfaces to
register their host bridge.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Provide a way to allocate driver-specific data along with a PCI host bridge
structure. The bridge's ->private field points to this data.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Make the existing pci_host_bridge structure a proper device that is usable
by PCI host drivers in a more standard way. In addition to the existing
pci_scan_bus(), pci_scan_root_bus(), pci_scan_root_bus_msi(), and
pci_create_root_bus() interfaces, this unfortunately means having to add
yet another interface doing basically the same thing, and add some extra
code in the initial step.
However, this time it's more likely to be extensible enough that we won't
have to do another one again in the future, and we should be able to reduce
code much more as a result.
The main idea is to pull the allocation of 'struct pci_host_bridge' out of
the registration, and let individual host drivers and architecture code
fill the members before calling the registration function.
There are a number of things we can do based on this:
* Use a single memory allocation for the driver-specific structure
and the generic PCI host bridge
* consolidate the contents of driver-specific structures by moving
them into pci_host_bridge
* Add a consistent interface for removing a PCI host bridge again
when unloading a host driver module
* Replace the architecture specific __weak pcibios_*() functions with
callbacks in a pci_host_bridge device
* Move common boilerplate code from host drivers into the generic
function, based on contents of the structure
* Extend pci_host_bridge with additional members when needed without
having to add arguments to pci_scan_*().
* Move members of struct pci_bus into pci_host_bridge to avoid
having lots of identical copies.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Remove the assumption that IORESOURCE_ROM_ENABLE == PCI_ROM_ADDRESS_ENABLE.
PCI_ROM_ADDRESS_ENABLE is the ROM enable bit defined by the PCI spec, so if
we're reading or writing a BAR register value, that's what we should use.
IORESOURCE_ROM_ENABLE is a corresponding bit in struct resource flags.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
On DT based systems, the of_dma_configure() API implements DMA
configuration for a given device. On ACPI systems an API equivalent to
of_dma_configure() is missing which implies that it is currently not
possible to set-up DMA operations for devices through the ACPI generic
kernel layer.
This patch fills the gap by introducing acpi_dma_configure/deconfigure()
calls that for now are just wrappers around arch_setup_dma_ops() and
arch_teardown_dma_ops() and also updates ACPI and PCI core code to use
the newly introduced acpi_dma_configure/acpi_dma_deconfigure functions.
Since acpi_dma_configure() is used to configure DMA operations, the
function initializes the dma/coherent_dma masks to sane default values
if the current masks are uninitialized (also to keep the default values
consistent with DT systems) to make sure the device has a complete
default DMA set-up.
The DMA range size passed to arch_setup_dma_ops() is sized according
to the device coherent_dma_mask (starting at address 0x0), mirroring the
DT probing path behaviour when a dma-ranges property is not provided
for the device being probed; this changes the current arch_setup_dma_ops()
call parameters in the ACPI probing case, but since arch_setup_dma_ops()
is a NOP on all architectures but ARM/ARM64 this patch does not change
the current kernel behaviour on them.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [pci]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Tomasz Nowicki <tn@semihalf.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Tomasz Nowicki <tn@semihalf.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB)
determines the naturally aligned address boundaries on which a Read Request
may be serviced with multiple Completions:
- For a Root Complex, RCB is 64 bytes or 128 bytes
This value is reported in the Link Control Register
Note: Bridges and Endpoints may implement a corresponding command bit
which may be set by system software to indicate the RCB value for the
Root Complex, allowing the Bridge/Endpoint to optimize its behavior
when the Root Complex’s RCB is 128 bytes.
- For all other system elements, RCB is 128 bytes
Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all
downstream devices must be clear, indicating an RCB of 64 bytes. If the
Root Port supports a 128-byte RCB, we may optionally set the RCB of
downstream devices so they know they can generate larger Completions.
Some BIOSes supply an _HPX that tells us to set RCB, even though the Root
Port doesn't have RCB set, which may lead to Malformed TLP errors if the
Endpoint generates completions larger than the Root Port can handle.
The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies
such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize:
mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
mlx4_core 0000:41:00.0: device is going to be reset
mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
mlx4_core 0000:41:00.0: Fail to reset HCA
------------[ cut here ]------------
kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!
After 6cd33649fa ("PCI: Add pci_configure_device() during enumeration")
and 7a1562d4f2 ("PCI: Apply _HPX Link Control settings to all devices
with a link"), we apply _HPX settings to *all* devices, not just those
hot-added after boot.
Before 7a1562d4f2, we didn't touch the Mellanox RCB, and the device
worked. After 7a1562d4f2, we set its RCB to 128, and it failed.
Set the RCB to 128 iff the Root Port supports a 128-byte RCB. Otherwise,
set RCB to 64 bytes. This effectively ignores what _HPX tells us about
RCB.
Note that this change only affects _HPX handling. If we have no _HPX, this
does nothing with RCB.
[bhelgaas: changelog, clear RCB if not set for Root Port]
Fixes: 6cd33649fa ("PCI: Add pci_configure_device() during enumeration")
Fixes: 7a1562d4f2 ("PCI: Apply _HPX Link Control settings to all devices with a link")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781
Tested-by: Frank Danapfel <fdanapfe@redhat.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
CC: stable@vger.kernel.org # v3.18+
Save the position of the error reporting capability so it doesn't need to
be rediscovered during error handling.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Lukas Wunner <lukas@wunner.de>
Add Precision Time Measurement (PTM) support (see PCIe r3.1, sec 6.22).
Enable PTM on PTM Root devices and switch ports. This does not enable PTM
on endpoints.
There currently are no PTM-capable devices on the market, but it is
expected to be supported by the Intel Apollo Lake platform.
[bhelgaas: complete rework]
Signed-off-by: Jonathan Yong <jonathan.yong@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/aspm:
PCI/ASPM: Remove redundant check of pcie_set_clkpm
* pci/dpc:
PCI: Remove DPC tristate module option
PCI: Bind DPC to Root Ports as well as Downstream Ports
PCI: Fix whitespace in struct dpc_dev
PCI: Convert Downstream Port Containment driver to use devm_* functions
* pci/hotplug:
PCI: Allow additional bus numbers for hotplug bridges
* pci/misc:
PCI: Include <asm/dma.h> for isa_dma_bridge_buggy
PCI: Make bus_attr_resource_alignment static
MAINTAINERS: Add file patterns for PCI device tree bindings
PCI: Fix comment typo
* pci/msi:
PCI/MSI: irqchip: Fix PCI_MSI dependencies
* pci/pm:
PCI: pciehp: Ignore interrupts during D3cold
PCI: Document connection between pci_power_t and hardware PM capability
PCI: Add runtime PM support for PCIe ports
ACPI / hotplug / PCI: Runtime resume bridge before rescan
PCI: Power on bridges before scanning new devices
PCI: Put PCIe ports into D3 during suspend
PCI: Don't clear d3cold_allowed for PCIe ports
PCI / PM: Enforce type casting for pci_power_t
* pci/virtualization:
PCI: Add ACS quirk for Solarflare SFC9220
PCI: Add DMA alias quirk for Adaptec 3805
PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset
PCI: Add function 1 DMA alias quirk for Marvell 88SE9182
A user may hot add a switch requiring more than one bus to enumerate. This
previously required a system reboot if BIOS did not sufficiently pad the
bus resource, which they frequently don't do.
Add a kernel parameter so a user can specify the minimum number of bus
numbers to reserve for a hotplug bridge's subordinate buses so rebooting
won't be necessary.
The default is 1, which is equivalent to previous behavior.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When a PCI device is removed through sysfs interface, the upstream bridge
(PCIe port) can be runtime suspended if it was the last device on that bus.
Now, if the bridge is in D3 we cannot find devices below the bridge
anymore. For example following fails to find the removed device again:
# echo 1 > /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/remove
# echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan
Where 0000:00:01.0 is the bridge device.
In order to be able to rescan devices below the bridge add
pm_runtime_get_sync()/pm_runtime_put() calls to pci_scan_bridge(). This
should keep bridges powered on while their children devices are being
scanned.
Reported-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Instead of assigning bus->domain_nr inside pci_bus_assign_domain_nr(),
return the domain and let the caller do the assignment. Rename
pci_bus_assign_domain_nr() to pci_bus_find_domain_nr() to reflect this.
No functional change intended.
[bhelgaas: changelog]
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
* pci/hotplug:
PCI: Use cached copy of PCI_EXP_SLTCAP_HPC bit
* pci/resource:
PCI: Disable all BAR sizing for devices with non-compliant BARs
x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs
PCI: Identify Enhanced Allocation (EA) BAR Equivalent resources in sysfs
b84106b4e2 ("PCI: Disable IO/MEM decoding for devices with non-compliant
BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with
the PCI spec. But it didn't do anything for expansion ROM BARs, so we
still try to size them, resulting in warnings like this on Broadwell-EP:
pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
Move the non-compliant BAR check from __pci_read_base() up to
pci_read_bases() so it applies to the expansion ROM BAR as well as
to BARs 0-5.
Note that direct callers of __pci_read_base(), like sriov_init(), will now
bypass this check. We haven't had reports of devices with broken SR-IOV
BARs yet.
[bhelgaas: changelog]
Fixes: b84106b4e2 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andi Kleen <ak@linux.intel.com>
Solve IOMMU support issues with PCIe non-transparent bridges that use
Requester ID look-up tables (RID-LUT), e.g., the PEX8733.
The NTB connects devices in two independent PCI domains. Devices separated
by the NTB are not able to discover each other. A PCI packet being
forwared from one domain to another has to have its RID modified so it
appears on correct bus and completions are forwarded back to the original
domain through the NTB. The RID is translated using a preprogrammed table
(LUT) and the PCI packet propagates upstream away from the NTB. If the
destination system has IOMMU enabled, the packet will be discarded because
the new RID is unknown to the IOMMU. Adding a DMA alias for the new RID
allows IOMMU to properly recognize the packet.
Each device behind the NTB has a unique RID assigned in the RID-LUT. The
current DMA alias implementation supports only a single alias, so it's not
possible to support mutiple devices behind the NTB when IOMMU is enabled.
Enable all possible aliases on a given bus (256) that are stored in a
bitset. Alias devfn is directly translated to a bit number. The bitset is
not allocated for devices that have no need for DMA aliases.
More details can be found in the following article:
http://www.plxtech.com/files/pdf/technical/expresslane/RTC_Enabling%20MulitHostSystemDesigns.pdf
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
* pci/resource:
PCI: Simplify pci_create_attr() control flow
PCI: Don't leak memory if sysfs_create_bin_file() fails
PCI: Simplify sysfs ROM cleanup
PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource
MIPS: Loongson 3: Use temporary struct resource * to avoid repetition
ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource
ia64/PCI: Use ioremap() instead of open-coded equivalent
ia64/PCI: Use temporary struct resource * to avoid repetition
PCI: Clean up pci_map_rom() whitespace
PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs
PCI: Set ROM shadow location in arch code, not in PCI core
PCI: Don't enable/disable ROM BAR if we're using a RAM shadow copy
PCI: Don't assign or reassign immutable resources
PCI: Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED
x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs
PCI: Disable IO/MEM decoding for devices with non-compliant BARs
* pci/host-hv:
PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs
PCI: Look up IRQ domain by fwnode_handle
PCI: Add fwnode_handle to x86 pci_sysdata
* pci/aer:
PCI/AER: Log aer_inject error injections
PCI/AER: Log actual error causes in aer_inject
PCI/AER: Use dev_warn() in aer_inject
PCI/AER: Fix aer_inject error codes
* pci/enumeration:
PCI: Fix broken URL for Dell biosdevname
* pci/kconfig:
PCI: Cleanup pci/pcie/Kconfig whitespace
PCI: Include pci/hotplug Kconfig directly from pci/Kconfig
PCI: Include pci/pcie/Kconfig directly from pci/Kconfig
* pci/misc:
PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
PCI: Add QEMU top-level IDs for (sub)vendor & device
unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition
PCI: Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h
PCI: Move pci_dma_* helpers to common code
frv/PCI: Remove stray pci_{alloc,free}_consistent() declaration
* pci/virtualization:
PCI: Wait for up to 1000ms after FLR reset
PCI: Support SR-IOV on any function type
* pci/vpd:
PCI: Prevent VPD access for buggy devices
PCI: Sleep rather than busy-wait for VPD access completion
PCI: Fold struct pci_vpd_pci22 into struct pci_vpd
PCI: Rename VPD symbols to remove unnecessary "pci22"
PCI: Remove struct pci_vpd_ops.release function pointer
PCI: Move pci_vpd_release() from header file to pci/access.c
PCI: Move pci_read_vpd() and pci_write_vpd() close to other VPD code
PCI: Determine actual VPD size on first access
PCI: Use bitfield instead of bool for struct pci_vpd_pci22.busy
PCI: Allow access to VPD attributes with size 0
PCI: Update VPD definitions
Add pci_ops.{add,remove}_bus() callbacks, which will be called on every
newly created bus and when a bus is being removed, respectively. This can
be used by drivers to implement driver-specific initialization and teardown
of the bus, in addition to the architecture-specifics implemented by the
pcibios_add_bus() and the pcibios_remove_bus() functions.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There's only one kind of VPD, so we don't need to qualify it as "the
version described by PCI spec rev 2.2."
Rename the following symbols to remove unnecessary "pci22":
PCI_VPD_PCI22_SIZE -> PCI_VPD_MAX_SIZE
pci_vpd_pci22_size() -> pci_vpd_size()
pci_vpd_pci22_wait() -> pci_vpd_wait()
pci_vpd_pci22_read() -> pci_vpd_read()
pci_vpd_pci22_write() -> pci_vpd_write()
pci_vpd_pci22_ops -> pci_vpd_ops
pci_vpd_pci22_init() -> pci_vpd_init()
Tested-by: Shane Seymour <shane.seymour@hpe.com>
Tested-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
The PCI config header (first 64 bytes of each device's config space) is
defined by the PCI spec so generic software can identify the device and
manage its usage of I/O, memory, and IRQ resources.
Some non-spec-compliant devices put registers other than BARs where the
BARs should be. When the PCI core sizes these "BARs", the reads and writes
it does may have unwanted side effects, and the "BAR" may appear to
describe non-sensical address space.
Add a flag bit to mark non-compliant devices so we don't touch their BARs.
Turn off IO/MEM decoding to prevent the devices from consuming address
space, since we can't read the BARs to find out what that address space
would be.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
CC: stable@vger.kernel.org
If pci_host_bridge_msi_domain() can't find an IRQ domain through the OF
tree, try to look it up directly through the fwnode_handle.
[bhelgaas: changelog]
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
include/asm-generic/pci-bridge.h is now empty, so remove every #include of
it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Will Deacon <will.deacon@arm.com> (arm64)
The PCI flag management constants and functions were previously declared in
include/asm-generic/pci-bridge.h. But they are not specific to bridges,
and arches did not include pci-bridge.h consistently.
Move the following interfaces and related constants to include/linux/pci.h
and remove pci-bridge.h:
pci_set_flags()
pci_add_flags()
pci_clear_flags()
pci_has_flag()
This fixes these warnings when building for some arches:
drivers/pci/host/pcie-designware.c:562:20: error: 'PCI_PROBE_ONLY' undeclared (first use in this function)
drivers/pci/host/pcie-designware.c:562:7: error: implicit declaration of function 'pci_has_flag' [-Werror=implicit-function-declaration]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This patch introduces pci_msi_register_fwnode_provider() for irqchip
to register a callback, to provide a way to determine appropriate MSI
domain for a pci device.
It also introduces pci_host_bridge_acpi_msi_domain(), which returns
the MSI domain of the specified PCI host bridge with DOMAIN_BUS_PCI_MSI
bus token. Then, it is assigned to pci device.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* pci/aspm:
PCI/ASPM: Make sysfs link_state_store() consistent with link_state_show()
* pci/hotplug:
PCI: pciehp: Always protect pciehp_disable_slot() with hotplug mutex
* pci/misc:
x86/PCI: Simplify pci_bios_{read,write}
PCI: Simplify config space size computation
PCI: Limit config space size for Netronome NFP6000 family
PCI: Add Netronome vendor and device IDs
PCI: Support PCIe devices with short cfg_size
x86/PCI: Clarify AMD Fam10h config access restrictions comment
PCI: Print warnings for all invalid expansion ROM headers
PCI: Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask
* pci/msi:
PCI/MSI: Remove empty pci_msi_init_pci_dev()
PCI/MSI: Initialize MSI capability for all architectures
Restructure the logic so we return the config space size as soon as we know
it. This reduces indentation, removes negations, and removes gotos.
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
4a7cc83167 ("genirq/MSI: Move msi_list from struct pci_dev to struct
device") removed the contents of pci_msi_init_pci_dev(). All
implementation of it are now empty, so remove it completely.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
1851617cd2 ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't
support MSI") moved dev->msi_cap and dev->msix_cap initialization from the
pci_init_capabilities() path (used on all architectures) to the
pci_setup_device() path (not used on Open Firmware architectures).
This broke MSI or MSI-X on Open Firmware machines. 4d9aac397a
("powerpc/PCI: Disable MSI/MSI-X interrupts at PCI probe time in OF case")
fixed it for PowerPC but not for SPARC.
Set up MSI and MSI-X (initialize msi_cap and msix_cap and disable MSI and
MSI-X) in pci_init_capabilities() so all architectures do it the same way.
This reverts 4d9aac397a since this patch fixes the problem generically
for both PowerPC and SPARC.
[bhelgaas: changelog, make pci_msi_setup_pci_dev() static]
Fixes: 1851617cd2 ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* acpi-smbus:
Revert "ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook"
ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code
* acpi-ec:
ACPI-EC: Drop unnecessary check made before calling acpi_ec_delete_query()
* acpi-pci:
PCI: Fix OF logic in pci_dma_configure()
This patch fixes a bug introduced by previous commit,
which incorrectly checkes the of_node of the end-point device.
Instead, it should check the of_node of the host bridge.
Fixes: 50230713b6 ("PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Support for the ACPI _CCA configuration object intended to tell
the OS whether or not a bus master device supports hardware
managed cache coherency and a new set of functions to allow
drivers to check the cache coherency support for devices in a
platform firmware interface agnostic way (Suravee Suthikulpanit,
Jeremy Linton).
- ACPI backlight quirks for ESPRIMO Mobile M9410 and Dell XPS L421X
(Aaron Lu, Hans de Goede).
- Fixes for the arm_big_little and s5pv210-cpufreq cpufreq drivers
(Jon Medhurst, Nicolas Pitre).
- kfree()-related fixup for the recently introduced CPPC cpufreq
frontend (Markus Elfring).
- intel_pstate fix reducing kernel log noise on systems where
P-states are managed by hardware (Prarit Bhargava).
- intel_pstate maintainers information update (Srinivas Pandruvada).
- cpufreq core optimization related to the handling of delayed work
items used by governors (Viresh Kumar).
- Locking fixes and cleanups of the Operating Performance Points
(OPP) framework (Viresh Kumar).
- Generic power domains framework cleanups (Lina Iyer).
- cpupower tool updates (Jacob Tanenbaum, Sriram Raghunathan,
Thomas Renninger).
- turbostat tool updates (Len Brown).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJWQ96OAAoJEILEb/54YlRxyYYQALJ1HXu76SvYX1re2aawOw6Y
WgzF3Ly7JX034E1VvA2xP6wgkWpBRBDcpnRDeltNA4dYXPBDei/eTcRZTLX12N3g
AfFRGjGWTtLJfpNPecNMmUyF5xHjgDgMIQRabY+Is5NfP5STkPHJeqULnEpvTtx8
bd0lnC5jc4vuZiPEh1xVb+ClYDqWS8YQPyFJVjV/BaIf8Qwe5+oRX36byMBaKc9D
ZgmvmCk5n/HLQQ1uQsqe4xnhFLHN2rypt2BLvFrOtlnSz9VNNpQyB+OIW1mgCD4f
LhpKIwjP8NhZNQUq8HFu7nDlm8ciQtWmeMPB5NdGQ+OESu7yfKAOzQ+3U6Gl2Gaf
66zVGyV6SOJJwfDVJ3qKTtroWps9QV7ZClOJ+zJGgiujwU+tJ3pDQyZM9pa7CL3C
s7ZAUsI6IigSBjD3nJVOyG4DO0a8KQFCIE1mDmyqId45Qz8xJoOrYP33/ZnDuOdo
2OtL/emyfWsz9ixbHVfwIhb7EC6aoaUxQrhSWmNraaQS43YfioZR7h4we8gwenph
X4E1KY4SdML+uFf2VKIcd45NM3IBprCxx5UgFAJdrqe8+otqPNF2AVosG4iqhg/b
k4nxwuIvw2a8Fm77U9ytyXDYMItU/wIlAHMbnmgx+oTwRv6AbZ07MHkyfuQLYuhD
tq5Y14qSiTS7prNacx98
=XZiP
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management and ACPI updates from Rafael Wysocki:
"The only new feature in this batch is support for the ACPI _CCA device
configuration object, which it a pre-requisite for future ACPI PCI
support on ARM64, but should not affect the other architectures.
The rest is fixes and cleanups, mostly in cpufreq (including
intel_pstate), the Operating Performace Points (OPP) framework and
tools (cpupower and turbostat).
Specifics:
- Support for the ACPI _CCA configuration object intended to tell the
OS whether or not a bus master device supports hardware managed
cache coherency and a new set of functions to allow drivers to
check the cache coherency support for devices in a platform
firmware interface agnostic way (Suravee Suthikulpanit, Jeremy
Linton).
- ACPI backlight quirks for ESPRIMO Mobile M9410 and Dell XPS L421X
(Aaron Lu, Hans de Goede).
- Fixes for the arm_big_little and s5pv210-cpufreq cpufreq drivers
(Jon Medhurst, Nicolas Pitre).
- kfree()-related fixup for the recently introduced CPPC cpufreq
frontend (Markus Elfring).
- intel_pstate fix reducing kernel log noise on systems where
P-states are managed by hardware (Prarit Bhargava).
- intel_pstate maintainers information update (Srinivas Pandruvada).
- cpufreq core optimization related to the handling of delayed work
items used by governors (Viresh Kumar).
- Locking fixes and cleanups of the Operating Performance Points
(OPP) framework (Viresh Kumar).
- Generic power domains framework cleanups (Lina Iyer).
- cpupower tool updates (Jacob Tanenbaum, Sriram Raghunathan, Thomas
Renninger).
- turbostat tool updates (Len Brown)"
* tag 'pm+acpi-4.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
PCI: ACPI: Add support for PCI device DMA coherency
PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()
of/pci: Fix pci_get_host_bridge_device leak
device property: ACPI: Remove unused DMA APIs
device property: ACPI: Make use of the new DMA Attribute APIs
device property: Adding DMA Attribute APIs for Generic Devices
ACPI: Adding DMA Attribute APIs for ACPI Device
device property: Introducing enum dev_dma_attr
ACPI: Honor ACPI _CCA attribute setting
cpufreq: CPPC: Delete an unnecessary check before the function call kfree()
PM / OPP: Add opp_rcu_lockdep_assert() to _find_device_opp()
PM / OPP: Hold dev_opp_list_lock for writers
PM / OPP: Protect updates to list_dev with mutex
PM / OPP: Propagate error properly from dev_pm_opp_set_sharing_cpus()
cpufreq: s5pv210-cpufreq: fix wrong do_div() usage
MAINTAINERS: update for intel P-state driver
Creating a common structure initialization pattern for struct option
cpupower: Enable disabled Cstates if they are below max latency
cpupower: Remove debug message when using cpupower idle-set -D switch
cpupower: cpupower monitor reports uninitialized values for offline cpus
...
This patch adds support for setting up PCI device DMA coherency from
ACPI _CCA object that should normally be specified in the DSDT node
of its PCI host bridge.
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch move of_pci_dma_configure() to a more generic
pci_dma_configure(), which can be extended by non-OF code (e.g. ACPI).
This has no functional change.
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Acked-by: Rob Herring <robh+dt@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pci/aer:
PCI/AER: Clear error status registers during enumeration and restore
* pci/hotplug:
PCI: pciehp: Queue power work requests in dedicated function
* pci/misc:
PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum
x86/PCI: Make pci_subsys_init() static
PCI: Add builtin_pci_driver() to avoid registration boilerplate
PCI: Remove unnecessary "if" statement
* pci/msi:
x86/PCI: Don't alloc pcibios-irq when MSI is enabled
PCI/MSI: Export all remapped MSIs to sysfs attributes
PCI: Disable MSI on SiS 761
* pci/resource:
sparc/PCI: Add mem64 resource parsing for root bus
PCI: Expand Enhanced Allocation BAR output
PCI: Make Enhanced Allocation bitmasks more obvious
PCI: Handle Enhanced Allocation capability for SR-IOV devices
PCI: Add support for Enhanced Allocation devices
PCI: Add Enhanced Allocation register entries
PCI: Handle IORESOURCE_PCI_FIXED when assigning resources
PCI: Handle IORESOURCE_PCI_FIXED when sizing resources
PCI: Clear IORESOURCE_UNSET when reverting to firmware-assigned address
* pci/virtualization:
PCI: Fix sriov_enable() error path for pcibios_enable_sriov() failures
PCI: Wait 1 second between disabling VFs and clearing NumVFs
PCI: Reorder pcibios_sriov_disable()
PCI: Remove VFs in reverse order if virtfn_add() fails
PCI: Remove redundant validation of SR-IOV offset/stride registers
PCI: Set SR-IOV NumVFs to zero after enumeration
PCI: Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs
PCI: Don't try to restore VF BARs
Add support for devices using Enhanced Allocation entries instead of BARs.
This allows the kernel to parse the EA Extended Capability structure in PCI
config space and claim the BAR-equivalent resources.
See https://pcisig.com/sites/default/files/specification_documents/ECN_Enhanced_Allocation_23_Oct_2014_Final.pdf
[bhelgaas: add spec URL, s/pci_ea_set_flags/pci_ea_flags/, consolidate
declarations, print unknown property in hex to match spec]
Signed-off-by: Sean O. Stalley <sean.stalley@intel.com>
[david.daney@cavium.com: Add more support/checking for Entry Properties,
allow EA behind bridges, rewrite some error messages.]
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
So far, we've always considered that for a given PCI device, its
MSI controller was either set by the architecture-specific
pcibios hook, or simply inherited from the host bridge.
This doesn't cover things like firmware-defined topologies like
msi-map (DT) or IORT (ACPI), which can provide information about
which MSI controller to use on a per-device basis.
This patch adds the necessary hook into the MSI code to allow this
feature, and provides the msi-map functionnality as a first
implementation.
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
So far, we have considered that the MSI domain for a device was
either set via the architecture-dependent pcibios implementation
or inherited from the host bridge.
As we're about to break that assumption, add pci_dev_msi_domain
which is the equivalent of pci_host_bridge_msi_domain, but for
a single device.
Other than moving things around a bit, this patch on its own
has no effect.
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
SR-IOV creates a virtual bus where bus->self is NULL. When we add VFs and
scan for an MSI domain, pci_set_bus_msi_domain() dereferences bus->self,
which causes a kernel NULL pointer dereference oops.
Scan up to the parent bus until we find a real bridge where we can get the
MSI domain.
[bhelgaas: changelog]
Fixes: 44aa0c657e ("PCI/MSI: Add hooks to populate the msi_domain field")
Tested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
AER errors might be recorded when powering-on devices. These errors can be
ignored, so firmware usually clears them before the OS enumerates devices.
However, firmware is not involved when devices are added via hotplug, so
the OS may discover power-up errors that should be ignored. The same may
happen when powering up devices when resuming after suspend.
Clear the AER error status registers during enumeration and resume.
[bhelgaas: changelog, remove repetitive comments]
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Revert dff22d2054 ("PCI: Call pci_read_bridge_bases() from core instead
of arch code").
Reading PCI bridge windows is not arch-specific in itself, but there is PCI
core code that doesn't work correctly if we read them too early. For
example, Hannes found this case on an ARM Freescale i.mx6 board:
pci_bus 0000:00: root bus resource [mem 0x01000000-0x01efffff]
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
pci 0000:00:00.0: BAR 8: no space for [mem size 0x01000000] (mem window)
pci 0000:01:00.0: BAR 2: failed to assign [mem size 0x00200000]
pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x00004000]
pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x00000100]
The 00:00.0 mem window needs to be at least 3MB: the 01:00.0 device needs
0x204100 of space, and mem windows are megabyte-aligned.
Bus sizing can increase a bridge window size, but never *decrease* it (see
d65245c329 ("PCI: don't shrink bridge resources")). Prior to
dff22d2054, ARM didn't read bridge windows at all, so the "original size"
was zero, and we assigned a 3MB window.
After dff22d2054, we read the bridge windows before sizing the bus. The
firmware programmed a 16MB window (size 0x01000000) in 00:00.0, and since
we never decrease the size, we kept 16MB even though we only needed 3MB.
But 16MB doesn't fit in the host bridge aperture, so we failed to assign
space for the window and the downstream devices.
I think this is a defect in the PCI core: we shouldn't rely on the firmware
to assign sensible windows.
Ray reported a similar problem, also on ARM, with Broadcom iProc.
Issues like this are too hard to fix right now, so revert dff22d2054.
Reported-by: Hannes <oe5hpm@gmail.com>
Reported-by: Ray Jui <rjui@broadcom.com>
Link: http://lkml.kernel.org/r/CAAa04yFQEUJm7Jj1qMT57-LG7ZGtnhNDBe=PpSRa70Mj+XhW-A@mail.gmail.com
Link: http://lkml.kernel.org/r/55F75BB8.4070405@broadcom.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
Pull irq updates from Thomas Gleixner:
"This updated pull request does not contain the last few GIC related
patches which were reported to cause a regression. There is a fix
available, but I let it breed for a couple of days first.
The irq departement provides:
- new infrastructure to support non PCI based MSI interrupts
- a couple of new irq chip drivers
- the usual pile of fixlets and updates to irq chip drivers
- preparatory changes for removal of the irq argument from interrupt
flow handlers
- preparatory changes to remove IRQF_VALID"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
irqchip/imx-gpcv2: IMX GPCv2 driver for wakeup sources
irqchip: Add bcm2836 interrupt controller for Raspberry Pi 2
irqchip: Add documentation for the bcm2836 interrupt controller
irqchip/bcm2835: Add support for being used as a second level controller
irqchip/bcm2835: Refactor handle_IRQ() calls out of MAKE_HWIRQ
PCI: xilinx: Fix typo in function name
irqchip/gic: Ensure gic_cpu_if_up/down() programs correct GIC instance
irqchip/gic: Only allow the primary GIC to set the CPU map
PCI/MSI: pci-xgene-msi: Consolidate chained IRQ handler install/remove
unicore32/irq: Prepare puv3_gpio_handler for irq argument removal
tile/pci_gx: Prepare trio_handle_level_irq for irq argument removal
m68k/irq: Prepare irq handlers for irq argument removal
C6X/megamode-pic: Prepare megamod_irq_cascade for irq argument removal
blackfin: Prepare irq handlers for irq argument removal
arc/irq: Prepare idu_cascade_isr for irq argument removal
sparc/irq: Use access helper irq_data_get_affinity_mask()
sparc/irq: Use helper irq_data_get_irq_handler_data()
parisc/irq: Use access helper irq_data_get_affinity_mask()
mn10300/irq: Use access helper irq_data_get_affinity_mask()
irqchip/i8259: Prepare i8259_irq_dispatch for irq argument removal
...
Commit 1851617cd2 ("PCI/MSI: Disable MSI at enumeration even if kernel
doesn't support MSI") changed the location of the code that initialises
dev->msi_cap/msix_cap and then disables MSI/MSI-X interrupts at PCI
probe time in devices that have this flag set. It moved the code from
pci_msi_init_pci_dev() to a new function named pci_msi_setup_pci_dev(),
called by pci_setup_device().
The pseries PCI probing code does not call pci_setup_device(), so since
the aforementioned commit the function pci_msi_setup_pci_dev() is not
called and MSI/MSI-X interrupts are left enabled. Additionally because
dev->msi_cap/msix_cap are not initialised no driver can ever enable
MSI/MSI-X.
To fix this, the pseries PCI probe should manually call
pci_msi_setup_pci_dev(), so this patch makes it non-static.
Fixes: 1851617cd2 ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI")
[mpe: Update change log to mention dev->msi_cap/msix_cap]
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Firmware typically configures the PCIe fabric with a consistent Max Payload
Size setting based on the devices present at boot. A hot-added device
typically has the power-on default MPS setting (128 bytes), which may not
match the fabric.
The previous Linux default, in the absence of any "pci=pcie_bus_*" options,
was PCIE_BUS_TUNE_OFF, in which we never touch MPS, even for hot-added
devices.
Add a new default setting, PCIE_BUS_DEFAULT, in which we make sure every
device's MPS setting matches the upstream bridge. This makes it more
likely that a hot-added device will work in a system with optimized MPS
configuration.
Note that if we hot-add a device that only supports 128-byte MPS, it still
likely won't work because we don't reconfigure the rest of the fabric.
Booting with "pci=pcie_bus_peer2peer" is a workaround for this because it
sets MPS to 128 for everything.
[bhelgaas: changelog, new default, rework for pci_configure_device() path]
Tested-by: Keith Busch <keith.busch@intel.com>
Tested-by: Jordan Hargrave <jharg93@gmail.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Previously we checked for invalid MPS settings, i.e., a device with MPS
different than its upstream bridge, in pcie_bus_detect_mps(). We only did
this if the arch or hotplug driver called pcie_bus_configure_settings(),
and then only if PCIe bus tuning was disabled (PCIE_BUS_TUNE_OFF).
Move the MPS checking code to pci_configure_device(), so we do it in the
pci_device_add() path for every device.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add a pci_scan_root_bus_msi() interface so an arch can specify the MSI
controller up front. This removes the need for a pcibios callback to set
the MSI controller later.
This is not exported because I'd like to replace the variety of "scan root
bus" interfaces with a single, more extensible interface that can handle
the MSI controller, domain, pci_ops, resources, etc. I hope this interface
is temporary.
[bhelgaas: changelog, split into separate patch]
Suggested-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jingoo Han <jingoohan1@gmail.com>
We should not assume any particular hardware topology. Commit d0751b98df
("PCI: Add dev->has_secondary_link to track downstream PCIe links") relied
on the assumption that every PCIe hierarchy is rooted at a Root Port. But
we can't rely on any assumption about what hardware we will find; we just
have to deal with the world as it is.
On some platforms, PCIe devices (endpoints, switch upstream ports, etc.)
appear directly on the root bus, and there is no Root Port in the PCI bus
hierarchy. For example, Meelis observed these top-level devices on a
Sparc V245:
0000:02:00.0 PCI bridge to [bus 03-0d] Switch Upstream Port
0001:02:00.0 PCI bridge to [bus 03] PCIe to PCI/PCI-X Bridge
These devices *look* like they have links going upstream, but there really
are no upstream devices.
In set_pcie_port_type(), we used the parent device to figure out which side
of a switch port has a link, so if the parent device did not exist, we
dereferenced a NULL parent pointer.
Check whether the parent device exists before dereferencing it.
Meelis observed this oops on Sparc V245 and T2000. Ben Herrenschmidt says
this is also possible on IBM PowerVM guests on PowerPC.
[bhelgaas: changelog, comment]
Link: http://lkml.kernel.org/r/alpine.LRH.2.20.1508122118210.18637@math.ut.ee
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
* pci/hotplug:
PCI: pciehp: Remove ignored MRL sensor interrupt events
PCI: pciehp: Remove unused interrupt events
PCI: pciehp: Handle invalid data when reading from non-existent devices
PCI: Hold pci_slot_mutex while searching bus->slots list
PCI: Protect pci_bus->slots with pci_slot_mutex, not pci_bus_sem
PCI: pciehp: Simplify pcie_poll_cmd()
PCI: Use "slot" and "pci_slot" for struct hotplug_slot and struct pci_slot
* pci/iommu:
PCI: Remove pci_ats_enabled()
PCI: Stop caching ATS Invalidate Queue Depth
PCI: Move ATS declarations to linux/pci.h so they're all together
PCI: Clean up ATS error handling
PCI: Use pci_physfn() rather than looking up physfn by hand
PCI: Inline the ATS setup code into pci_ats_init()
PCI: Rationalize pci_ats_queue_depth() error checking
PCI: Reduce size of ATS structure elements
PCI: Embed ATS info directly into struct pci_dev
PCI: Allocate ATS struct during enumeration
iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth
* pci/irq:
PCI: Kill off set_irq_flags() usage
* pci/virtualization:
PCI: Add ACS quirks for Intel I219-LM/V
Previously, we allocated pci_ats structures when an IOMMU driver called
pci_enable_ats(). An SR-IOV VF shares the STU setting with its PF, so when
enabling ATS on the VF, we allocated a pci_ats struct for the PF if it
didn't already have one. We held the sriov->lock to serialize threads
concurrently enabling ATS on several VFS so only one would allocate the PF
pci_ats.
Gregor reported a deadlock here:
pci_enable_sriov
sriov_enable
virtfn_add
mutex_lock(dev->sriov->lock) # acquire sriov->lock
pci_device_add
device_add
BUS_NOTIFY_ADD_DEVICE notifier chain
iommu_bus_notifier
amd_iommu_add_device # iommu_ops.add_device
init_iommu_group
iommu_group_get_for_dev
iommu_group_add_device
__iommu_attach_device
amd_iommu_attach_device # iommu_ops.attach_device
attach_device
pci_enable_ats
mutex_lock(dev->sriov->lock) # deadlock
There's no reason to delay allocating the pci_ats struct, and if we
allocate it for each device at enumeration-time, there's no need for
locking in pci_enable_ats().
Allocate pci_ats struct during enumeration, when we initialize other
capabilities.
Note that this implementation requires ATS to be enabled on the PF first,
before on any of the VFs because the PF controls the STU for all the VFs.
Link: http://permalink.gmane.org/gmane.linux.kernel.iommu/9433
Reported-by: Gregor Dick <gdick@solarflare.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Quoting Arnd:
I was thinking the opposite approach and basically removing all uses
of IORESOURCE_CACHEABLE from the kernel. There are only a handful of
them.and we can probably replace them all with hardcoded
ioremap_cached() calls in the cases they are actually useful.
All existing usages of IORESOURCE_CACHEABLE call ioremap() instead of
ioremap_nocache() if the resource is cacheable, however ioremap() is
uncached by default. Clearly none of the existing usages care about the
cacheability. Particularly devm_ioremap_resource() never worked as
advertised since it always fell back to plain ioremap().
Clean this up as the new direction we want is to convert
ioremap_<type>() usages to memremap(..., flags).
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* pci/irq:
PCI/MSI: Free legacy IRQ when enabling MSI/MSI-X
PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed
PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()
PCI: Add pcibios_alloc_irq() and pcibios_free_irq()
* pci/misc:
PCI: Remove unused "pci_probe" flags
PCI: Add VPD function 0 quirk for Intel Ethernet devices
PCI: Add dev_flags bit to access VPD through function 0
PCI / ACPI: Fix pci_acpi_optimize_delay() comment
PCI: Remove a broken link in quirks.c
PCI: Remove useless redundant code
PCI: Simplify pci_find_(ext_)capability() return value checks
PCI: Move PCI_FIND_CAP_TTL to pci.h and use it in quirks
PCI: Add pcie_downstream_port() (true for Root and Switch Downstream Ports)
PCI: Fix pcie_port_device_resume() comment
PCI: Shift PCI_CLASS_NOT_DEFINED consistently with other classes
PCI: Revert aeb30016fe ("PCI: add Intel USB specific reset method")
PCI: Fix TI816X class code quirk
PCI: Fix generic NCR 53c810 class code quirk
PCI: Use PCI_CLASS_SERIAL_USB instead of bare number
PCI: Add quirk for Intersil/Techwell TW686[4589] AV capture cards
PCI: Remove Intel Cherrytrail D3 delays
* pci/resource:
PCI: Call pci_read_bridge_bases() from core instead of arch code
* pci/virtualization:
PCI: Restore ACS configuration as part of pci_restore_state()
Previously, pci_setup_device() and similar functions searched the
pci_bus->slots list without any locking. It was possible for another
thread to update the list while we searched it.
Add pci_dev_assign_slot() to search the list while holding pci_slot_mutex.
[bhelgaas: changelog, fold in CONFIG_SYSFS fix]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In order to populate the PCI host bridge msi_domain, use the
"msi-parent" attribute to lookup a corresponding irq domain.
If found, this is our MSI domain.
This gets plugged into the core PCI code.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Ma Jun <majun258@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Duc Dang <dhdang@apm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1438091186-10244-6-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In order to be able to populate the device msi_domain field,
add the necessary hooks to propagate the host bridge msi_domain
across secondary busses to devices.
So far, nobody populates the initial msi_domain.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Ma Jun <majun258@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Duc Dang <dhdang@apm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1438091186-10244-5-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When we scan a PCI bus, we read PCI-PCI bridge window registers with
pci_read_bridge_bases() so we can validate the resource hierarchy. Most
architectures call pci_read_bridge_bases() from pcibios_fixup_bus(), but
PCI-PCI bridges are not arch-specific, so this doesn't need to be in
arch-specific code.
Call pci_read_bridge_bases() directly from the PCI core instead of from
arch code.
For alpha and mips, we now call pci_read_bridge_bases() always; previously
we only called it if PCI_PROBE_ONLY was set.
[bhelgaas: changelog]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Ralf Baechle <ralf@linux-mips.org>
CC: James E.J. Bottomley <jejb@parisc-linux.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: David Howells <dhowells@redhat.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Tony Luck <tony.luck@intel.com>
CC: David S. Miller <davem@davemloft.net>
CC: Ingo Molnar <mingo@redhat.com>
CC: Guenter Roeck <linux@roeck-us.net>
CC: Michal Simek <monstr@monstr.eu>
CC: Chris Zankel <chris@zankel.net>
The PCI class in dev->class is a three-byte value comprising a base class,
sub-class, and interface type. PCI_CLASS_NOT_DEFINED includes the base
class and sub-class, but not the interface type, so it should be shifted to
make space for the interface. It happens that PCI_CLASS_NOT_DEFINED is
zero, so it doesn't matter in the end, but we should still use it
consistently with other class definitions.
Treat PCI_CLASS_NOT_DEFINED as a base class/sub-class value that should
appear in bits 8-23 of dev->class.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
No one uses pci_scan_bus_parented() any more, remove it.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
David Ahern reported that d63e2e1f3d ("sparc/PCI: Clip bridge windows
to fit in upstream windows") fails to boot on sparc/T5-8:
pci 0000:06:00.0: reg 0x184: can't handle BAR above 4GB (bus address 0x110204000)
The problem is that sparc64 assumed that dma_addr_t only needed to hold DMA
addresses, i.e., bus addresses returned via the DMA API (dma_map_single(),
etc.), while the PCI core assumed dma_addr_t could hold *any* bus address,
including raw BAR values. On sparc64, all DMA addresses fit in 32 bits, so
dma_addr_t is a 32-bit type. However, BAR values can be 64 bits wide, so
they don't fit in a dma_addr_t. d63e2e1f3d added new checking that
tripped over this mismatch.
Add pci_bus_addr_t, which is wide enough to hold any PCI bus address,
including both raw BAR values and DMA addresses. This will be 64 bits
on 64-bit platforms and on platforms with a 64-bit dma_addr_t. Then
dma_addr_t only needs to be wide enough to hold addresses from the DMA API.
[bhelgaas: changelog, bugzilla, Kconfig to ensure pci_bus_addr_t is at
least as wide as dma_addr_t, documentation]
Fixes: d63e2e1f3d ("sparc/PCI: Clip bridge windows to fit in upstream windows")
Fixes: 23b13bc76f ("PCI: Fail safely if we can't handle BARs larger than 4GB")
Link: http://lkml.kernel.org/r/CAE9FiQU1gJY1LYrxs+ma5LCTEEe4xmtjRG0aXJ9K_Tsu+m9Wuw@mail.gmail.com
Link: http://lkml.kernel.org/r/1427857069-6789-1-git-send-email-yinghai@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96231
Reported-by: David Ahern <david.ahern@oracle.com>
Tested-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
CC: stable@vger.kernel.org # v3.19+
Previously we assumed that PCIe Root Ports and Downstream Ports had Links
on their secondary side. That is true in most systems, but it is possible
to connect a switch with either an Upstream or a Downstream Port leading
downstream.
Instead of relying on the component type to identify devices that have
links leading downstream, use the "dev->has_secondary_link" field.
[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
A PCIe Port is an interface to a Link. A Root Port is a PCI-PCI bridge in
a Root Complex and has a Link on its secondary (downstream) side. For
other Ports, the Link may be on either the upstream (closer to the Root
Complex) or downstream side of the Port.
The usual topology has a Root Port connected to an Upstream Port. We
previously assumed this was the only possible topology, and that a
Downstream Port's Link was always on its downstream side, like this:
+---------------------+
+------+ | Downstream |
| Root | | Upstream Port +--Link--
| Port +--Link--+ Port |
+------+ | Downstream |
| Port +--Link--
+---------------------+
But systems do exist (see URL below) where the Root Port is connected to a
Downstream Port. In this case, a Downstream Port's Link may be on either
the upstream or downstream side:
+---------------------+
+------+ | Upstream |
| Root | | Downstream Port +--Link--
| Port +--Link--+ Port |
+------+ | Downstream |
| Port +--Link--
+---------------------+
We can't use the Port type to determine which side the Link is on, so add a
bit in struct pci_dev to keep track.
A Root Port's Link is always on the Port's secondary side. A component
(Endpoint or Port) on the other end of the Link obviously has the Link on
its upstream side. If that component is a Port, it is part of a Switch or
a Bridge. A Bridge has a PCI or PCI-X bus on its secondary side, not a
Link. The internal bus of a Switch connects the Port to another Port whose
Link is on the downstream side.
[bhelgaas: changelog, comment, cache "type", use if/else]
Link: http://lkml.kernel.org/r/54EB81B2.4050904@pobox.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94361
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If we enable MSI, then kexec a new kernel, the new kernel may receive MSIs
it is not prepared for. Commit d5dea7d95c ("PCI: msi: Disable msi
interrupts when we initialize a pci device") prevents this, but only if the
new kernel is built with CONFIG_PCI_MSI=y.
Move the "disable MSI" functionality from drivers/pci/msi.c to a new
pci_msi_setup_pci_dev() in drivers/pci/probe.c so we can disable MSIs when
we enumerate devices even if the kernel doesn't include full MSI support.
[bhelgaas: changelog, disable MSIs in pci_setup_device(), put
pci_msi_setup_pci_dev() at its final destination]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Export the following symbols so they can be referenced by a PCI host bridge
driver compiled as a kernel loadable module:
pci_common_swizzle
pci_create_root_bus
pci_stop_root_bus
pci_remove_root_bus
pci_assign_unassigned_bus_resources
pci_fixup_irqs
Signed-off-by: Ray Jui <rjui@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Previously, pci_scan_root_bus() created a root PCI bus, enumerated the
devices on it, and called pci_bus_add_devices(), which made the devices
available for drivers to claim them.
Most callers assigned resources to devices after pci_scan_root_bus()
returns, which may be after drivers have claimed the devices. This is
incorrect; the PCI core should not change device resources while a driver
is managing the device.
Remove pci_bus_add_devices() from pci_scan_root_bus() and do it after any
resource assignment in the callers.
Note that ARM's pci_common_init_dev() already called pci_bus_add_devices()
after pci_scan_root_bus(), so we only need to remove the first call:
pci_common_init_dev
pcibios_init_hw
pci_scan_root_bus
pci_bus_add_devices # first call
pci_bus_assign_resources
pci_bus_add_devices # second call
[bhelgaas: changelog, drop "root_bus" var in alpha common_init_pci(),
return failure earlier in mn10300, add "return" in x86 pcibios_scan_root(),
return early if xtensa platform_pcibios_fixup() fails]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
CC: Matt Turner <mattst88@gmail.com>
CC: David Howells <dhowells@redhat.com>
CC: Tony Luck <tony.luck@intel.com>
CC: Michal Simek <monstr@monstr.eu>
CC: Ralf Baechle <ralf@linux-mips.org>
CC: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Chris Metcalf <cmetcalf@ezchip.com>
CC: Chris Zankel <chris@zankel.net>
CC: Max Filippov <jcmvbkbc@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Previously, pci_scan_bus() created a root PCI bus, enumerated the devices
on it, and called pci_bus_add_devices(), which made the devices available
for drivers to claim them.
Most callers assigned resources to devices after pci_scan_bus() returns,
which may be after drivers have claimed the devices. This is incorrect;
the PCI core should not change device resources while a driver is managing
the device.
Remove pci_bus_add_devices() from pci_scan_bus() and do it after any
resource assignment in the callers.
[bhelgaas: changelog, check for failure in mcf_pci_init()]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Guan Xuetao <gxt@mprc.pku.edu.cn>
CC: Richard Henderson <rth@twiddle.net>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
CC: Matt Turner <mattst88@gmail.com>
If there is a DT node available for the root bridge's parent device, use
the DMA configuration from that device node. For example, Keystone PCI
devices would require dma_pfn_offset to be set correctly in the device
structure of the PCI device in order to have the correct DMA mask. The DT
node will have dma-ranges defined for this. Also support using the DT
property dma-coherent to allow coherent DMA operation by the PCI device.
Use the new helper function of_pci_dma_configure() to update the device DMA
configuration. This fixes DMA on systems where DMA addresses are a
constant offset from CPU physical addresses.
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> (AMD Seattle)
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: Grant Likely <grant.likely@linaro.org>
CC: Rob Herring <robh+dt@kernel.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Arnd Bergmann <arnd@arndb.de>
Use common resource list management data structure and interfaces
instead of private implementation.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As a consequence of restoring the detection of invalid BARs, add a new
informational printk like the following when such occurrences are
encountered.
pci ssss:bb:dd.f: [Firmware Bug]: reg 0xXX: invalid BAR (can't size)
Reported-by: William Unruh <unruh@physics.ubc.ca>
Reported-by: Martin Lucina <martin@lucina.net>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>
Aaron reported that a 32-bit x86 kernel with Physical Address Extension
(PAE) support complains about bridge prefetchable memory windows above 4GB:
pci_bus 0000:00: root bus resource [mem 0x380000000000-0x383fffffffff]
...
pci 0000:03:00.0: reg 0x10: [mem 0x383fffc00000-0x383fffdfffff 64bit pref]
pci 0000:03:00.0: reg 0x20: [mem 0x383fffe04000-0x383fffe07fff 64bit pref]
pci 0000:03:00.1: reg 0x10: [mem 0x383fffa00000-0x383fffbfffff 64bit pref]
pci 0000:03:00.1: reg 0x20: [mem 0x383fffe00000-0x383fffe03fff 64bit pref]
pci 0000:00:02.2: PCI bridge to [bus 03-04]
pci 0000:00:02.2: bridge window [io 0x1000-0x1fff]
pci 0000:00:02.2: bridge window [mem 0x91900000-0x91cfffff]
pci 0000:00:02.2: can't handle 64-bit address space for bridge
In this kernel, unsigned long is 32 bits and dma_addr_t is 64 bits.
Previously we used "unsigned long" to hold the bridge window address. But
this is a bus address, so we should use dma_addr_t instead.
Use dma_addr_t to hold the bridge window base and limit.
The question of whether the CPU can actually *address* the window is
separate and depends on what the physical address space of the CPU is and
whether the host bridge does any address translation.
[bhelgaas: fix "shift count > width of type", changelog, stable tag]
Fixes: d56dbf5bab ("PCI: Allocate 64-bit BARs above 4G when possible")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=88131
Reported-by: Aaron Ma <mapengyu@gmail.com>
Tested-by: Aaron Ma <mapengyu@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.14+
Previously we applied _HPX type 2 record Link Control register settings
only to bridges with a subordinate bus. But it's better to apply them to
all devices with a link because if the subordinate bus has not been
allocated yet, we won't apply settings to the device.
Use pcie_cap_has_lnkctl() to determine whether the device has a Link
Control register instead of looking at dev->subordinate.
[bhelgaas: changelog]
Fixes: 6cd33649fa ("PCI: Add pci_configure_device() during enumeration")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The functions pci_dev_put(), pci_pme_wakeup_bus(), and put_device() return
immediately if their argument is NULL. Thus the test before the call is
not needed.
Remove these unnecessary tests.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
__pci_read_base() disables decoding while sizing device BARs. We can't
print while decoding is disabled, which leads to some rather messy exit
logic.
Coalesce the sizing logic to minimize the time decoding is disabled. This
lets us print errors where they're detected.
The refactoring also takes advantage of the symmetry of obtaining the BAR's
extent (pci_size) and storing the result as the 'region' for both the
32-bit and 64-bit BARs, consolidating both cases.
No functional change intended.
[bhelgaas: move pci_size() up, per Thomas Petazzoni, Thierry Reding, Kevin Hilman]
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>
Commit 6ac665c63d ("PCI: rewrite PCI BAR reading code") masked off
low-order bits from 'l', but not from 'sz'. Both are passed to pci_size(),
which compares 'base == maxbase' to check for read-only BARs. The masking
of 'l' means that comparison will never be 'true', so the check for
read-only BARs no longer works.
Resolve this by also masking off the low-order bits of 'sz' before passing
it into pci_size() as 'maxbase'. With this change, pci_size() will once
again catch the problems that have been encountered to date:
- AGP aperture BAR of AMD-7xx host bridges: if the AGP window is
disabled, this BAR is read-only and read as 0x00000008 [1]
- BARs 0-4 of ALi IDE controllers can be non-zero and read-only [1]
- Intel Sandy Bridge - Thermal Management Controller [8086:0103];
BAR 0 returning 0xfed98004 [2]
- Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
Bar 0 returning 0x00001a [3]
Link: [1] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9 ("PCI: probing read-only BARs" (pre-git))
Link: [2] https://bugzilla.kernel.org/show_bug.cgi?id=43331
Link: [3] https://bugzilla.kernel.org/show_bug.cgi?id=85991
Reported-by: William Unruh <unruh@physics.ubc.ca>
Reported-by: Martin Lucina <martin@lucina.net>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>
CC: stable@vger.kernel.org # v2.6.27+
* pci/host-generic:
arm64: Add architectural support for PCI
PCI: Add pci_remap_iospace() to map bus I/O resources
of/pci: Add support for parsing PCI host bridge resources from DT
of/pci: Add pci_get_new_domain_nr() and of_get_pci_domain_nr()
PCI: Add generic domain handling
of/pci: Fix the conversion of IO ranges into IO resources
of/pci: Move of_pci_range_to_resource() to of/address.c
ARM: Define PCI_IOBASE as the base of virtual PCI IO space
of/pci: Add pci_register_io_range() and pci_pio_to_address()
asm-generic/io.h: Fix ioport_map() for !CONFIG_GENERIC_IOMAP
Conflicts:
drivers/pci/host/pci-tegra.c
The handling of PCI domains (or PCI segments in ACPI speak) is usually a
straightforward affair but its implementation is currently left to the
architectural code, with pci_domain_nr(b) querying the value of the domain
associated with bus b.
This patch introduces CONFIG_PCI_DOMAINS_GENERIC as an option that can be
selected if an architecture wants a simple implementation where the value
of the domain associated with a bus is stored in struct pci_bus.
The architectures that select CONFIG_PCI_DOMAINS_GENERIC will then have to
implement pci_bus_assign_domain_nr() as a way of setting the domain number
associated with a root bus. All child buses except the root bus will
inherit the domain_nr value from their parent.
Signed-off-by: Catalin Marinas <Catalin.Marinas@arm.com>
[Renamed pci_set_domain_nr() to pci_bus_assign_domain_nr()]
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Arnd Bergmann <arnd@arndb.de>
This reverts commit 1820ffdccb ("PCI: Make sure bus number resources stay
within their parents bounds") because it breaks some systems with LSI Logic
FC949ES Fibre Channel Adapters, apparently by exposing a defect in those
adapters.
Dirk tested a Tyan VX50 (B4985) with this device that worked like this
prior to 1820ffdccb9b:
bus: [bus 00-7f] on node 0 link 1
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07])
pci 0000:00:0e.0: PCI bridge to [bus 0a]
pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07])
pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter)
Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS
defect in the PCI0 _CRS method. But prior to 1820ffdccb, we didn't
enforce that aperture, and the FC adapter worked fine at 0a:00.0.
After 1820ffdccb, we notice that 00:0e.0's aperture is not contained in
the root bridge's aperture, so we reconfigure it so it *is* contained:
pci 0000:00:0e.0: bridge configuration invalid ([bus 0a-0a]), reconfiguring
pci 0000:00:0e.0: PCI bridge to [bus 06-07]
This effectively moves the FC device from 0a:00.0 to 07:00.0, which should
be legal. But when we enumerate bus 06, the FC device doesn't respond, so
we don't find anything. This is probably a defect in the FC device.
Possible fixes (due to Yinghai):
1) Add a quirk to fix the _CRS information based on what amd_bus.c read
from the hardware
2) Reset the FC device after we change its bus number
3) Revert 1820ffdccb
Fix 1 would be relatively easy, but it does sweep the LSI FC issue under
the rug. We might want to reconfigure bus numbers in the future for some
other reason, e.g., hotplug, and then we could trip over this again.
For that reason, I like fix 2, but we don't know whether it actually works,
and we don't have a patch for it yet.
This revert is fix 3, which also sweeps the LSI FC issue under the rug.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=84281
Reported-by: Dirk Gouders <dirk@gouders.net>
Tested-by: Dirk Gouders <dirk@gouders.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.15+
CC: Yinghai Lu <yinghai@kernel.org>
This reverts commit fc1b253141 ("PCI: Don't scan random busses in
pci_scan_bridge()") because it breaks CardBus on some machines.
David tested a Dell Latitude D505 that worked like this prior to
fc1b253141b3:
pci 0000:00:1e.0: PCI bridge to [bus 01]
pci 0000:01:01.0: CardBus bridge to [bus 02-05]
Note that the 01:01.0 CardBus bridge has a bus number aperture of
[bus 02-05], but those buses are all outside the 00:1e.0 PCI bridge bus
number aperture, so accesses to buses 02-05 never reach CardBus. This is
later patched up by yenta_fixup_parent_bridge(), which changes the
subordinate bus number of the 00:1e.0 PCI bridge:
pci_bus 0000:01: Raising subordinate bus# of parent bus (#01) from #01 to #05
With fc1b253141, pci_scan_bridge() fails immediately when it notices that
we can't allocate a valid secondary bus number for the CardBus bridge, and
CardBus doesn't work at all:
pci 0000:01:01.0: can't allocate child bus 01 from [bus 01]
I'd prefer to fix this by integrating the yenta_fixup_parent_bridge() logic
into pci_scan_bridge() so we fix the bus number apertures up front. But
I don't think we can do that before v3.17, so I'm going to revert this to
avoid the problem while we're working on the long-term fix.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=83441
Link: http://lkml.kernel.org/r/1409303414-5196-1-git-send-email-david.henningsson@canonical.com
Reported-by: David Henningsson <david.henningsson@canonical.com>
Tested-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.15+
There's not really a good way to determine whether firmware has already
configured a device with _HPP/_HPX settings. On legacy systems, the BIOS
has probably configured everything, but on UEFI systems it is not required
to do so.
Per the PCI Firmware Specification, rev 3.1, sec 3.5, if PCI_COMMAND_IO or
PCI_COMMAND_MEMORY is set, we can assume firmware has set the corresponding
BARs and maybe we can assume it has configured the rest of the device. And
if a bridge has PCI_COMMAND_PARITY or PCI_COMMAND_SERR set, we can assume
firmware has configured the bridge. But we can't tell much about devices
without BARs.
I think it should be safe to apply _HPP and _HPX settings anyway, even if
firmware has already configured the device, so configure everything we
find.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Linux manages MPS and MRRS settings to keep them consistent across the PCIe
fabric. BIOS doesn't participate in this Linux management, so ignore that
part of any _HPX settings it supplies.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
We currently apply _HPP settings only to:
- non-bridge devices, and
- PCI-to-PCI bridges
i.e., we do not apply them to PCI-to-ISA bridges and the like. It has been
that way since _HPP support was added by 40abb96c51 ("pciehp: Fix
programming hotplug parameters"), but I don't think there's any reason to
exclude these other bridges.
Apply _HPP settings to hot-added PCI devices of any type.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Do not clear PCI_COMMAND_SERR or PCI_COMMAND_PARITY based on _HPP. The
spec (ACPI rev 5.0, sec 6.2.7) says that when "Enable SERR" is set to 1,
we should enable SERR in the command register. It says nothing about
*disabling* SERR or PERR; in fact, the example in 6.2.7.1 says we should
leave PERR alone unless "Enable PERR" is 1.
For hot-added devices, this probably doesn't matter because they power up
with these bits cleared. But in addition to hot-plugged devices, the spec
allows the platform to use _HPP for "configuration of PCI devices not
configured by the BIOS at system boot," and it may make a difference for
devices present at boot.
This change means that if BIOS enables SERR or PERR on a device, and it
supplies _HPP or _HPX with the SERR or PERR bits *cleared*, we will now
leave SERR or PERR reporting enabled on that device instead of disabling it
as we previously did.
See also 40abb96c51 ("pciehp: Fix programming hotplug parameters"), where
this code was first added.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
The ACPI _HPP method was defined before PCIe existed, so its documentation
only mentions PCI. The _HPX Type 0 setting record is essentially identical
to _HPP, but the spec (ACPI rev 5.0, sec 6.2.8.1) says it should be applied
to PCI, PCI-X, and PCIe devices, with settings being ignored if they are
not applicable.
Some platforms with both conventional PCI and PCIe devices provide only
_HPP (not _HPX), so treat _HPP the same way as an _HPX Type 0 record and
apply it to PCIe devices as well as PCI and PCI-X.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
All pci_configure_slot() uses have been removed, so remove the definition
as well.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Some platforms can tell the OS how to configure PCI devices, e.g., how to
set cache line size, error reporting enables, etc. ACPI defines _HPP and
_HPX methods for this purpose.
This configuration was previously done by some of the hotplug drivers using
pci_configure_slot(). But not all hotplug drivers did this, and per the
spec (ACPI rev 5.0, sec 6.2.7), we can also do it for "devices not
configured by the BIOS at system boot."
Move this configuration into the PCI core by adding pci_configure_device()
and calling it from pci_device_add(), so we do this for all devices as we
enumerate them.
This is based on pci_configure_slot(), which is used by hotplug drivers.
I omitted:
- pcie_bus_configure_settings() because it configures MPS and MRRS, which
requires global knowledge of the fabric and must be done later, and
- configuration of subordinate devices; that will happen when we call
pci_device_add() for those devices.
Because pci_configure_slot() was only done by hotplug drivers, this initial
version of pci_configure_device() only configures hot-added devices,
ignoring anything added during boot.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Move pci_configure_slot() and related functions from
drivers/pci/hotplug/pcihp_slot to drivers/pci/probe.c.
This is to prepare for doing device configuration during the normal
enumeration process instead of just after hot-add.
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Per PCIe r3.0, sec 2.3.2, an endpoint may respond to a Configuration
Request with a Completion with Configuration Request Retry Status (CRS).
This terminates the Configuration Request.
When the CRS Software Visibility feature is disabled (as it is by default),
a Root Complex must handle a CRS Completion by re-issuing the Configuration
Request. This is invisible to software. From the CPU's point of view, an
endpoint that always responds with CRS causes a hang because the Root
Complex never supplies data to complete the CPU read.
When CRS Software Visibility is enabled, a Root Complex that receives a CRS
Completion for a read of the Vendor ID must return data of 0x0001. The
Vendor ID of 0x0001 indicates to software that the endpoint is not ready.
We now have more devices that require CRS Software Visibility. For
example, a PLX 8713 NT bridge may respond with CRS until it has been
configured via I2C, and the I2C configuration is completely independent of
PCI enumeration.
Enable CRS Software Visibility if it is supported. This allows a system
with such a device to work (though the PCI core times out waiting for it to
become ready, and we have to rescan the bus after it is ready).
This essentially reverts ad7edfe049 ("[PCI] Do not enable CRS Software
Visibility by default"). The failures that led to ad7edfe049 should be
addressed by 89665a6a71 ("PCI: Check only the Vendor ID to identify
Configuration Request Retry").
[bhelgaas: changelog]
Link: http://lkml.kernel.org/r/20071029061532.5d10dfc6@snowcone
Link: http://lkml.kernel.org/r/alpine.LFD.0.9999.0712271023090.21557@woody.linux-foundation.org
Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Guenter Roeck <groeck@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Per PCIe r3.0, sec 2.3.2, if a Root Complex
- has Configuration Request Retry Status Software Visibility enabled,
- issues a Configuration Read of both bytes of the Vendor ID, and
- receives a Completion with Configuration Request Retry Status (CRS),
it must complete the request to the host by fabricating data of 0x0001 for
the Vendor ID and 0xff for any additional bytes in the request.
Linux issues a single config read for the four bytes containing the Vendor
ID and the Device ID. Previously we checked all four bytes for 0xffff0001
to identify CRS.
However, it is only the Vendor ID that really indicates CRS, because it's
sufficient to read only those two bytes. Checking the Device ID verifies
spec compliance but doesn't add any information.
Some Root Complexes appear to indicate CRS by returning 0x0001 for the
Vendor ID along with the actual the Device ID. Previously we interpreted
that as a valid Vendor/Device ID pair, although 0x0001 is reserved and
cannot be a valid Vendor ID.
[bhelgaas: changelog]
Link: http://lkml.kernel.org/r/4729FC36.3040000@gmail.com
Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Guenter Roeck <groeck@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Merge quoted strings that are broken across lines into a single entity.
The compiler merges them anyway, but checkpatch complains about it, and
merging them makes it easier to grep for strings.
No functional change.
[bhelgaas: changelog, do the same for everything under drivers/pci]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Fix various whitespace errors.
No functional change.
[bhelgaas: fix other similar problems]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Move EXPORT_SYMBOL so it immediately follows the function or variable.
No functional change.
[bhelgaas: squash similar changes, fix hotplug, probe, rom, search, too]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/misc:
PCI: Fix return value from pci_user_{read,write}_config_*()
PCI: Turn pcibios_penalize_isa_irq() into a weak function
PCI: Test for std config alias when testing extended config space
* pci/hotplug:
PCI: cpqphp: Fix possible null pointer dereference
NVMe: Implement PCIe reset notification callback
PCI: Notify driver before and after device reset
* pci/pci_is_bridge:
pcmcia: Use pci_is_bridge() to simplify code
PCI: pciehp: Use pci_is_bridge() to simplify code
PCI: acpiphp: Use pci_is_bridge() to simplify code
PCI: cpcihp: Use pci_is_bridge() to simplify code
PCI: shpchp: Use pci_is_bridge() to simplify code
PCI: rpaphp: Use pci_is_bridge() to simplify code
sparc/PCI: Use pci_is_bridge() to simplify code
powerpc/PCI: Use pci_is_bridge() to simplify code
ia64/PCI: Use pci_is_bridge() to simplify code
x86/PCI: Use pci_is_bridge() to simplify code
PCI: Use pci_is_bridge() to simplify code
PCI: Add new pci_is_bridge() interface
PCI: Rename pci_is_bridge() to pci_has_subordinate()
* pci/virtualization:
PCI: Introduce new device binding path using pci_dev.driver_override
Conflicts:
drivers/pci/pci-sysfs.c
The driver_override field allows us to specify the driver for a device
rather than relying on the driver to provide a positive match of the
device. This shortcuts the existing process of looking up the vendor and
device ID, adding them to the driver new_id, binding the device, then
removing the ID, but it also provides a couple advantages.
First, the above existing process allows the driver to bind to any device
matching the new_id for the window where it's enabled. This is often not
desired, such as the case of trying to bind a single device to a meta
driver like pci-stub or vfio-pci. Using driver_override we can do this
deterministically using:
echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
Previously we could not invoke drivers_probe after adding a device to
new_id for a driver as we get non-deterministic behavior whether the driver
we intend or the standard driver will claim the device. Now it becomes a
deterministic process, only the driver matching driver_override will probe
the device.
To return the device to the standard driver, we simply clear the
driver_override and reprobe the device:
echo > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
Another advantage to this approach is that we can specify a driver override
to force a specific binding or prevent any binding. For instance when an
IOMMU group is exposed to userspace through VFIO we require that all
devices within that group are owned by VFIO. However, devices can be
hot-added into an IOMMU group, in which case we want to prevent the device
from binding to any driver (override driver = "none") or perhaps have it
automatically bind to vfio-pci. With driver_override it's a simple matter
for this field to be set internally when the device is first discovered to
prevent driver matches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a PCI-to-PCIe bridge is stacked on a PCIe-to-PCI bridge, we can have
PCIe endpoints masked by a conventional PCI bus. This makes the extended
config space of the PCIe endpoint inaccessible. The PCIe-to-PCI bridge is
supposed to handle any type 1 configuration transactions where the extended
config offset bits are non-zero as an Unsupported Request rather than
forward it to the secondary interface. As noted here, there are a couple
known offenders to this rule. These bridges drop the extended offset bits,
resulting in the conventional config space being aliased many times across
the extended config space. For Intel NICs, this alias often seems to
expose a bogus SR-IOV cap.
Stacking bridges may seem like an uncommon scenario, but note that any
conventional PCI slot in a modern PC is already the secondary interface of
an onboard PCIe-to-PCI bridge. The user need only add a PCI-to-PCIe
adapter and PCIe device to encounter this problem.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use pci_is_bridge() to simplify code. No functional change.
Requires: 326c1cdae7 PCI: Rename pci_is_bridge() to pci_has_subordinate()
Requires: 1c86438c94 PCI: Add new pci_is_bridge() interface
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* dma-api:
iommu/exynos: Remove unnecessary "&" from function pointers
DMA-API: Update dma_pool_create ()and dma_pool_alloc() descriptions
DMA-API: Fix duplicated word in DMA-API-HOWTO.txt
DMA-API: Capitalize "CPU" consistently
sh/PCI: Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
DMA-API: Change dma_declare_coherent_memory() CPU address to phys_addr_t
DMA-API: Clarify physical/bus address distinction
* pci/virtualization:
PCI: Mark RTL8110SC INTx masking as broken
* pci/msi:
PCI/MSI: Remove pci_enable_msi_block()
* pci/misc:
PCI: Remove pcibios_add_platform_entries()
s390/pci: use pdev->dev.groups for attribute creation
PCI: Move Open Firmware devspec attribute to PCI common code
* pci/resource:
PCI: Add resource allocation comments
PCI: Simplify __pci_assign_resource() coding style
PCI: Change pbus_size_mem() return values to be more conventional
PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources
PCI: Support BAR sizes up to 8GB
resources: Clarify sanity check message
PCI: Don't add disabled subtractive decode bus resources
PCI: Don't print anything while decoding is disabled
PCI: Don't set BAR to zero if dma_addr_t is too small
PCI: Don't convert BAR address to resource if dma_addr_t is too small
PCI: Reject BAR above 4GB if dma_addr_t is too small
PCI: Fail safely if we can't handle BARs larger than 4GB
x86/gart: Tidy messages and add bridge device info
x86/gart: Replace printk() with pr_info()
x86/PCI: Move pcibios_assign_resources() annotation to definition
x86/PCI: Mark ATI SBx00 HPET BAR as IORESOURCE_PCI_FIXED
x86/PCI: Don't try to move IORESOURCE_PCI_FIXED resources
x86/PCI: Fix Broadcom CNB20LE unintended sign extension
For a subtractive decode bridge, we previously added and printed all
resources of the primary bus, even if they were not valid. In the example
below, the bridge 00:1c.3 has no windows enabled, so there are no valid
resources on bus 02. But since 02:00.0 is subtractive decode bridge, we
add and print all those invalid resources, which don't really make sense:
pci 0000:00:1c.3: PCI bridge to [bus 02-03]
pci 0000:02:00.0: PCI bridge to [bus 03] (subtractive decode)
pci 0000:02:00.0: bridge window [??? 0x00000000 flags 0x0] (subtractive decode)
Add and print the subtractively-decoded resources only if they are valid.
There's an example in the dmesg log attached to the bugzilla below (but
this patch doesn't fix the bug reported there).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=73141
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If the console is a PCI device, and we try to print to it while its
decoding is disabled, the system will hang. This particular printk hasn't
caused a problem yet, but it could, so this fixes it.
See also 0ff9514b57 ("PCI: Don't print anything while decoding is
disabled").
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If a BAR is above 4GB and our dma_addr_t is too small, don't clear the BAR
to zero: that doesn't disable the BAR, and it makes it more likely that the
BAR will conflict with things if we turn on the memory enable bit (as we
will at "out:" if the device was already enabled at the handoff).
We should also print the BAR info and its original size so we can follow
the process when we try to assign space to it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If dma_addr_t is too small to represent the BAR value,
pcibios_bus_to_resource() will fail, so just remember the BAR size directly
in the resource. The resource is already marked UNSET, so we know the
address isn't valid anyway.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
We can only handle BARs above 4GB if dma_addr_t (not resource_size_t) is 64
bits wide. If we have a 64-bit resource_size_t and a 32-bit dma_addr_t,
we can't deal with BARs above 4GB.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
We can only handle BARs larger than 4GB if both dma_addr_t and
resource_size_t are 64 bits wide. If dma_addr_t is 32 bits, we can't
represent all the bus addresses, and if resource_size_t is 32 bits, we
can't represent all the CPU addresses.
Previously we cleared res->flags (at "fail:") for resources that were too
large. That means we think the BAR doesn't exist at all, which in turn
means that we could enable the device even though we can't keep track of
where the BAR is and we can't make sure it doesn't overlap something else.
This preserves the type flags (MEM/IO) so we can keep from enabling the
device.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If "pcie_bus_config == PCIE_BUS_PERFORMANCE", we don't initialize "smpss",
so we pass a pointer to garbage into pcie_bus_configure_set(), where we
compute "mps" based on the garbage. We then pass the garbage "mps" to
pcie_write_mps(), which ignores it in the PCIE_BUS_PERFORMANCE case.
Coverity isn't smart enough to deduce that we ignore the garbage (it's a
lot to expect from a human, too), so initialize "smpss" to a safe value in
all cases.
Found by Coverity (CID 146454).
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Some PCI functions used to be marked __devinit. When CONFIG_HOTPLUG was
not set, these functions were discarded after boot. A few callers of these
__devinit functions were marked __ref to indicate that they could safely
call the __devinit functions even though the callers were not __devinit.
But CONFIG_HOTPLUG and __devinit are now gone, and the need for the __ref
annotations is also gone, so remove them. Relevant historical commits:
54b956b903 Remove __dev* markings from init.h
a8e4b9c101 PCI: add generic pci_hp_add_bridge()
0ab2b57f8d PCI: fix section mismatch warning in pci_scan_child_bus
451124a7cc PCI: fix 4x section mismatch warnings
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/resource: (26 commits)
Revert "[PATCH] Insert GART region into resource map"
PCI: Log IDE resource quirk in dmesg
PCI: Change pci_bus_alloc_resource() type_mask to unsigned long
PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
resources: Set type in __request_region()
PCI: Don't check resource_size() in pci_bus_alloc_resource()
s390/PCI: Use generic pci_enable_resources()
tile PCI RC: Use default pcibios_enable_device()
sparc/PCI: Use default pcibios_enable_device() (Leon only)
sh/PCI: Use default pcibios_enable_device()
microblaze/PCI: Use default pcibios_enable_device()
alpha/PCI: Use default pcibios_enable_device()
PCI: Add "weak" generic pcibios_enable_device() implementation
PCI: Don't enable decoding if BAR hasn't been assigned an address
PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit
PCI: Don't try to claim IORESOURCE_UNSET resources
PCI: Check IORESOURCE_UNSET before updating BAR
PCI: Don't clear IORESOURCE_UNSET when updating BAR
PCI: Mark resources as IORESOURCE_UNSET if we can't assign them
PCI: Remove pci_find_parent_resource() use for allocation
...
Make a note in dmesg when we overwrite legacy IDE BAR info. We previously
logged something like this:
pci 0000:00:1f.1: reg 0x10: [io 0x0000-0x0007]
and then silently overwrote the resource. There's an example in the
bugzilla below. This doesn't fix the bugzilla; it just makes what's going
on more obvious.
No functional change; merely adds some dev_info() calls.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=48451
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If we don't support 64-bit addresses, i.e., CONFIG_PHYS_ADDR_T_64BIT is not
set, we can't deal with BARs above 4GB. In this case we already pretend
the BAR contained zero; this patch also sets IORESOURCE_UNSET so we can try
to reallocate it later.
I don't think this is exactly correct: what we care about here are *bus*
addresses, not CPU addresses, so the tests of sizeof(resource_size_t)
probably should be on sizeof(dma_addr_t) instead. But this is what's been
in -next, so we'll fix that later.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When assigning a new bus number in pci_scan_bridge we check whether
max+1 is free by calling pci_find_bus. If it does already exist then we
assume that we are rescanning and that this is the right bus to scan.
This is fragile. If max+1 lies outside of bus->busn_res.end then we will
rescan some random bus from somewhere else in the hierachy. This patch
checks for this case and prints a warning.
[bhelgaas: add parent/child bus number info to dev_warn()]
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_scan_child_bus can (potentially) return a bus number higher than the
subordinate value of the child bus. Possible reasons are that bus numbers
are reserved for SR-IOV or for CardBus (SR-IOV is done without checks and
the CardBus checks are sketchy at best).
We clamp the returned value to the actual subordinate value and print a
warning if too many bus numbers are reserved.
[bhelgaas: whitespace]
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The function has no effect.
If pcibios_assign_all_busses() is not set then the function does nothing.
If it is set then in pci_scan_bridge we are always in the branch where
we assign the bus numbers ourselves and the subordinate values of all
parent busses will be set to 0xff since that is what they inherited from
their parent bus and ultimately from the root bus.
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Right now we use 0xff for busn_res.end when probing and later reduce it to
the value that is actually used. This does not work if a parent bridge has
already a lower subordinate value. For example during hotplug of a new
bridge below an already-configured bridge the following message is printed
from pci_bus_insert_busn_res():
pci_bus 0000:06: busn_res: can not insert [bus 06-ff] under [bus 05-9b] (conflicts with (null) [bus 05-9b])
This patch clamps the bus range to that of the parent and also ensures that
we do not exceed the parents range when assigning the final subordinate
value.
We also check that busses configured by the firmware fit into their parents
bounds.
[bhelgaas: reword dev_warn() and fix printk format warning]
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If a conflict happens during insert_resource_conflict() and all conflicts
fit within the newly inserted resource then they will become children of
the new resource. This is almost certainly not what we want for bus
numbers.
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Right now the CardBus code in pci_scan_bridge() is executed during both
passes. Since we always allocate the bus number ourselves it makes sense
to put it into the second pass.
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Initially when we encountered a bus that was already present we skipped
it. Since 74710ded8e 'PCI: always scan child buses' we continue
scanning in order to allow user triggered rescans of already existing
busses.
The old comment suggested that the reason for continuing the scan is a
bug in the i450NX chipset. This is not the case.
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This patch fixes two small issues:
- If pci_add_new_bus() fails, max must not be incremented. Otherwise
an incorrect value is returned from pci_scan_bridge().
- If the bus is already present, max must be incremented. I think
that this case should only be hit if we trigger a manual rescan of a
CardBus bridge.
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Revert commit ef83b0781a "PCI: Remove from bus_list and release
resources in pci_release_dev()" that made some nasty race conditions
become possible. For example, if a Thunderbolt link is unplugged
and then replugged immediately, the pci_release_dev() resulting from
the hot-remove code path may be racing with the hot-add code path
which after that commit causes various kinds of breakage to happen
(up to and including a hard crash of the whole system).
Moreover, the problem that commit ef83b0781a attempted to address
cannot happen any more after commit 8a4c5c329d "PCI: Check parent
kobject in pci_destroy_dev()", because pci_destroy_dev() will now
return immediately if it has already been executed for the given
device.
Note, however, that the invocation of msi_remove_pci_irq_vectors()
removed by commit ef83b0781a from pci_free_resources() along with
the other changes made by it is not added back because of subsequent
code changes depending on that modification.
Fixes: ef83b0781a (PCI: Remove from bus_list and release resources in pci_release_dev())
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are multiple PCI device addition and removal code paths that may be
run concurrently with the generic PCI bus rescan and device removal that
can be triggered via sysfs. If that happens, it may lead to multiple
different, potentially dangerous race conditions.
The most straightforward way to address those problems is to run
the code in question under the same lock that is used by the
generic rescan/remove code in pci-sysfs.c. To prepare for those
changes, move the definition of the global PCI remove/rescan lock
to probe.c and provide global wrappers, pci_lock_rescan_remove()
and pci_unlock_rescan_remove(), allowing drivers to manipulate
that lock. Also provide pci_stop_and_remove_bus_device_locked()
for the callers of pci_stop_and_remove_bus_device() who only need
to hold the rescan/remove lock around it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Using 'make namespacecheck' identify code which should be declared static.
Checked for users in other driver/archs as well. Compile tested only.
This stops exporting the following interfaces to modules:
pci_target_state()
pci_load_saved_state()
[bhelgaas: retained pci_find_next_ext_capability() and pci_cfg_space_size()]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This removes this unused and deprecated interface:
alloc_pci_dev()
[bhelgaas: split to separate patch]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/resource:
PCI: Allocate 64-bit BARs above 4G when possible
PCI: Enforce bus address limits in resource allocation
PCI: Split out bridge window override of minimum allocation address
agp/ati: Use PCI_COMMAND instead of hard-coded 4
agp/intel: Use CPU physical address, not bus address, for ioremap()
agp/intel: Use pci_bus_address() to get GTTADR bus address
agp/intel: Use pci_bus_address() to get MMADR bus address
agp/intel: Support 64-bit GMADR
agp/intel: Rename gtt_bus_addr to gtt_phys_addr
drm/i915: Rename gtt_bus_addr to gtt_phys_addr
agp: Use pci_resource_start() to get CPU physical address for BAR
agp: Support 64-bit APBASE
PCI: Add pci_bus_address() to get bus address of a BAR
PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev
PCI: Change pci_bus_region addresses to dma_addr_t
These interfaces:
pcibios_resource_to_bus(struct pci_dev *dev, *bus_region, *resource)
pcibios_bus_to_resource(struct pci_dev *dev, *resource, *bus_region)
took a pci_dev, but they really depend only on the pci_bus. And we want to
use them in resource allocation paths where we have the bus but not a
device, so this patch converts them to take the pci_bus instead of the
pci_dev:
pcibios_resource_to_bus(struct pci_bus *bus, *bus_region, *resource)
pcibios_bus_to_resource(struct pci_bus *bus, *resource, *bus_region)
In fact, with standard PCI-PCI bridges, they only depend on the host
bridge, because that's the only place address translation occurs, but
we aren't going that far yet.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously we removed the pci_dev from the bus_list and released its
resources in pci_destroy_dev(). But that's too early: it's possible to
call pci_destroy_dev() twice for the same device (e.g., via sysfs), and
that will cause an oops when we try to remove it from bus_list the second
time.
We should remove it from the bus_list only when the last reference to the
pci_dev has been released, i.e., in pci_release_dev().
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
4f535093cf ("PCI: Put pci_dev in device tree as early as possible")
moved pci_proc_attach_device() from pci_bus_add_device() to
pci_device_add().
This moves it back to pci_bus_add_device(), essentially reverting that
part of 4f535093cf. This makes it symmetric with pci_stop_dev(),
where we call pci_proc_detach_device() and pci_remove_sysfs_dev_files()
and set dev->is_added = 0.
[bhelgaas: changelog, create sysfs then attach proc for symmetry]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Fix whitespace, capitalization, and spelling errors. No functional change.
I know "busses" is not an error, but "buses" was more common, so I used it
consistently.
Signed-off-by: Marta Rybczynska <rybczynska@gmail.com> (pci_reset_bridge_secondary_bus())
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pci/misc:
PCI: Remove unused PCI_MSIX_FLAGS_BIRMASK definition
PCI: acpiphp_ibm: Convert to dynamic debug
PCI: acpiphp: Convert to dynamic debug
PCI: Remove Intel Haswell D3 delays
PCI: Pass type, width, and prefetchability for window alignment
PCI: Document reason for using pci_is_root_bus()
PCI: Use pci_is_root_bus() to check for root bus
PCI: Remove unused "is_pcie" from pci_dev structure
PCI: Update pci_find_slot() description in pci.txt
[SCSI] qla2xxx: Use standard PCIe Capability Link register field names
PCI: Fix comment typo, remove unnecessary !! in pci_is_pcie()
PCI: Drop "setting latency timer" messages
No one uses "is_pcie" now; remove this obsolete member.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use pci_is_pcie() instead of pci_find_capability() to simplify code.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This branch contains mostly additions and changes to platform enablement
and SoC-level drivers. Since there's sometimes a dependency on device-tree
changes, there's also a fair amount of those in this branch.
Pieces worth mentioning are:
- Mbus driver for Marvell platforms, allowing kernel configuration
and resource allocation of on-chip peripherals.
- Enablement of the mbus infrastructure from Marvell PCI-e drivers.
- Preparation of MSI support for Marvell platforms.
- Addition of new PCI-e host controller driver for Tegra platforms
- Some churn caused by sharing of macro names between i.MX 6Q and 6DL
platforms in the device tree sources and header files.
- Various suspend/PM updates for Tegra, including LP1 support.
- Versatile Express support for MCPM, part of big little support.
- Allwinner platform support for A20 and A31 SoCs (dual and quad Cortex-A7)
- OMAP2+ support for DRA7, a new Cortex-A15-based SoC.
The code that touches other architectures are patches moving
MSI arch-specific functions over to weak symbols and removal of
ARCH_SUPPORTS_MSI, acked by PCI maintainers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJSKhYmAAoJEIwa5zzehBx322AP/1ONYs8o8f7/Gzq6lZvTN6T3
0pBTApg6Jfioi3lwKvUAEIcsW82YKQ+UZkbW66GQH6+Ri4aZJKZHuz0+JPU67OJ4
LtSLuzVWrymy2VOOUvAnS/SXkOZw/pHhU4cLNHn1dMndhUL1Uqp9/XwuiHEQyFsP
uOkpcBtIu0EWElov0PKKZ5SWBg8JJs2vy5ydiViGelWHCrZvDDZkWzIsDcBQxJLQ
juzT4+JE+KOu7vKmfw78o6iHoCS2TBRAN9YUCajRb8Wl+out1hrTahHnDWaZ5Mce
EskcQNkJROqFbjD4k3ABN4XGTv2VDmrztIwFe0SEQ7Dz/9ypCrBGT69uI9xIqTXr
GwVRIwAUFTpMupK0gy93z1ajV3N0CXV79out9+jQNUQybYE+czp8QOyhmuc1tZx0
8fn9jlBQe9Vy6yrs39gEcE7nUwrayeyQ+6UvqqwsE2pWZabNAnCMSPX5+QIu+T/3
tQ7+jYmfFeserp1sIDOHOnxfhtW9EI6U9d1h/DUCwrsuFdkL9ha4M/vh9Pwgye98
tBdz0T4yE39AJQwwFWRkv1jcQKcGu6WqJanmvS4KRBksGwuLWxy+ewOnkz2ifS25
ZYSyxAryZRBvQRqlOK11rXPfRcbGcY0MG9lkKX96rGcyWEizgE1DdjxXD8HoIleN
R8heV6GX5OzlFLGX2tKK
=fJ5x
-----END PGP SIGNATURE-----
Merge tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC platform changes from Olof Johansson:
"This branch contains mostly additions and changes to platform
enablement and SoC-level drivers. Since there's sometimes a
dependency on device-tree changes, there's also a fair amount of
those in this branch.
Pieces worth mentioning are:
- Mbus driver for Marvell platforms, allowing kernel configuration
and resource allocation of on-chip peripherals.
- Enablement of the mbus infrastructure from Marvell PCI-e drivers.
- Preparation of MSI support for Marvell platforms.
- Addition of new PCI-e host controller driver for Tegra platforms
- Some churn caused by sharing of macro names between i.MX 6Q and 6DL
platforms in the device tree sources and header files.
- Various suspend/PM updates for Tegra, including LP1 support.
- Versatile Express support for MCPM, part of big little support.
- Allwinner platform support for A20 and A31 SoCs (dual and quad
Cortex-A7)
- OMAP2+ support for DRA7, a new Cortex-A15-based SoC.
The code that touches other architectures are patches moving MSI
arch-specific functions over to weak symbols and removal of
ARCH_SUPPORTS_MSI, acked by PCI maintainers"
* tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (266 commits)
tegra-cpuidle: provide stub when !CONFIG_CPU_IDLE
PCI: tegra: replace devm_request_and_ioremap by devm_ioremap_resource
ARM: tegra: Drop ARCH_SUPPORTS_MSI and sort list
ARM: dts: vf610-twr: enable i2c0 device
ARM: dts: i.MX51: Add one more I2C2 pinmux entry
ARM: dts: i.MX51: Move pins configuration under "iomuxc" label
ARM: dtsi: imx6qdl-sabresd: Add USB OTG vbus pin to pinctrl_hog
ARM: dtsi: imx6qdl-sabresd: Add USB host 1 VBUS regulator
ARM: dts: imx27-phytec-phycore-som: Enable AUDMUX
ARM: dts: i.MX27: Disable AUDMUX in the template
ARM: dts: wandboard: Add support for SDIO bcm4329
ARM: i.MX5 clocks: Remove optional clock setup (CKIH1) from i.MX51 template
ARM: dts: imx53-qsb: Make USBH1 functional
ARM i.MX6Q: dts: Enable I2C1 with EEPROM and PMIC on Phytec phyFLEX-i.MX6 Ouad module
ARM i.MX6Q: dts: Enable SPI NOR flash on Phytec phyFLEX-i.MX6 Ouad module
ARM: dts: imx6qdl-sabresd: Add touchscreen support
ARM: imx: add ocram clock for imx53
ARM: dts: imx: ocram size is different between imx6q and imx6dl
ARM: dts: imx27-phytec-phycore-som: Fix regulator settings
ARM: dts: i.MX27: Remove clock name from CPU node
...
Pull networking changes from David Miller:
"Noteworthy changes this time around:
1) Multicast rejoin support for team driver, from Jiri Pirko.
2) Centralize and simplify TCP RTT measurement handling in order to
reduce the impact of bad RTO seeding from SYN/ACKs. Also, when
both timestamps and local RTT measurements are available prefer
the later because there are broken middleware devices which
scramble the timestamp.
From Yuchung Cheng.
3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel
memory consumed to queue up unsend user data. From Eric Dumazet.
4) Add a "physical port ID" abstraction for network devices, from
Jiri Pirko.
5) Add a "suppress" operation to influence fib_rules lookups, from
Stefan Tomanek.
6) Add a networking development FAQ, from Paul Gortmaker.
7) Extend the information provided by tcp_probe and add ipv6 support,
from Daniel Borkmann.
8) Use RCU locking more extensively in openvswitch data paths, from
Pravin B Shelar.
9) Add SCTP support to openvswitch, from Joe Stringer.
10) Add EF10 chip support to SFC driver, from Ben Hutchings.
11) Add new SYNPROXY netfilter target, from Patrick McHardy.
12) Compute a rate approximation for sending in TCP sockets, and use
this to more intelligently coalesce TSO frames. Furthermore, add
a new packet scheduler which takes advantage of this estimate when
available. From Eric Dumazet.
13) Allow AF_PACKET fanouts with random selection, from Daniel
Borkmann.
14) Add ipv6 support to vxlan driver, from Cong Wang"
Resolved conflicts as per discussion.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits)
openvswitch: Fix alignment of struct sw_flow_key.
netfilter: Fix build errors with xt_socket.c
tcp: Add missing braces to do_tcp_setsockopt
caif: Add missing braces to multiline if in cfctrl_linkup_request
bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize
vxlan: Fix kernel panic on device delete.
net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
icplus: Use netif_running to determine device state
ethernet/arc/arc_emac: Fix huge delays in large file copies
tuntap: orphan frags before trying to set tx timestamp
tuntap: purge socket error queue on detach
qlcnic: use standard NAPI weights
ipv6:introduce function to find route for redirect
bnx2x: VF RSS support - VF side
bnx2x: VF RSS support - PF side
vxlan: Notify drivers for listening UDP port changes
net: usbnet: update addr_assign_type if appropriate
driver/net: enic: update enic maintainers and driver
driver/net: enic: Exposing symbols for Cisco's low latency driver
...
* pci/misc:
PCI: Remove pcie_cap_has_devctl()
PCI: Support PCIe Capability Slot registers only for ports with slots
PCI: Remove PCIe Capability version checks
PCI: Allow PCIe Capability link-related register access for switches
PCI: Add offsets of PCIe capability registers
PCI: Tidy bitmasks and spacing of PCIe capability definitions
PCI: Remove obsolete comment reference to pci_pcie_cap2()
PCI: Clarify PCI_EXP_TYPE_PCI_BRIDGE comment
PCI: Rename PCIe capability definitions to follow convention
PCI: Disable decoding for BAR sizing only when it was actually enabled
PCI: Add comment about needing pci_msi_off() even when CONFIG_PCI_MSI=n
PCI: Add pcibios_pm_ops for optional arch-specific hibernate functionality
* pci/yinghai-assign-unassigned-v6:
PCI: Assign resources for hot-added host bridge more aggressively
PCI: Move resource reallocation code to non-__init
PCI: Delay enabling bridges until they're needed
PCI: Assign resources on a per-bus basis
PCI: Enable unassigned resource reallocation on per-bus basis
PCI: Turn on reallocation for unassigned resources with host bridge offset
PCI: Look for unassigned resources on per-bus basis
PCI: Drop temporary variable in pci_assign_unassigned_resources()
If a BIOS configures MPS incorrectly, devices may not work normally.
For example, if a bridge has MPS set larger than an endpoint below it,
the endpoint may discard packets.
To help diagnose this issue, print a warning if we find an endpoint
MPS setting different than that of the upstream bridge.
[bhelgaas: changelog, "bridge" temporary, warning text]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60799
Reported-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jon Mason <jdmason@kudzu.us>