OpenCloudOS-Kernel/include/linux/pci.h

2503 lines
90 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
/*
* pci.h
*
* PCI defines and function prototypes
* Copyright 1994, Drew Eckhardt
* Copyright 1997--1999 Martin Mares <mj@ucw.cz>
*
* PCI Express ASPM defines and function prototypes
* Copyright (c) 2007 Intel Corp.
* Zhang Yanmin (yanmin.zhang@intel.com)
* Shaohua Li (shaohua.li@intel.com)
*
* For more information, please consult the following manuals (look at
* http://www.pcisig.com/ for how to get them):
*
* PCI BIOS Specification
* PCI Local Bus Specification
* PCI to PCI Bridge Specification
* PCI Express Specification
* PCI System Design Guide
*/
#ifndef LINUX_PCI_H
#define LINUX_PCI_H
#include <linux/mod_devicetable.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/compiler.h>
#include <linux/errno.h>
PCI: introduce pci_slot Currently, /sys/bus/pci/slots/ only exposes hotplug attributes when a hotplug driver is loaded, but PCI slots have attributes such as address, speed, width, etc. that are not related to hotplug at all. Introduce pci_slot as the primary data structure and kobject model. Hotplug attributes described in hotplug_slot become a secondary structure associated with the pci_slot. This patch only creates the infrastructure that allows the separation of PCI slot attributes and hotplug attributes. In this patch, the PCI hotplug core remains the only user of this infrastructure, and thus, /sys/bus/pci/slots/ will still only become populated when a hotplug driver is loaded. A later patch in this series will add a second user of this new infrastructure and demonstrate splitting the task of exposing pci_slot attributes from hotplug_slot attributes. - Make pci_slot the primary sysfs entity. hotplug_slot becomes a subsidiary structure. o pci_create_slot() creates and registers a slot with the PCI core o pci_slot_add_hotplug() gives it hotplug capability - Change the prototype of pci_hp_register() to take the bus and slot number (on parent bus) as parameters. - Remove all the ->get_address methods since this functionality is now handled by pci_slot directly. [achiang@hp.com: rpaphp-correctly-pci_hp_register-for-empty-pci-slots] Tested-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: make headers_check happy] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: fix typo in #include] Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Matthew Wilcox <matthew@wil.cx> Cc: Greg KH <greg@kroah.com> Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-11 05:28:50 +08:00
#include <linux/kobject.h>
#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/resource_ext.h>
#include <uapi/linux/pci.h>
#include <linux/pci_ids.h>
#define PCI_STATUS_ERROR_BITS (PCI_STATUS_DETECTED_PARITY | \
PCI_STATUS_SIG_SYSTEM_ERROR | \
PCI_STATUS_REC_MASTER_ABORT | \
PCI_STATUS_REC_TARGET_ABORT | \
PCI_STATUS_SIG_TARGET_ABORT | \
PCI_STATUS_PARITY)
/* Number of reset methods used in pci_reset_fn_methods array in pci.c */
#define PCI_NUM_RESET_METHODS 7
#define PCI_RESET_PROBE true
#define PCI_RESET_DO_RESET false
/*
* The PCI interface treats multi-function devices as independent
* devices. The slot/function address of each device is encoded
* in a single byte as follows:
*
* 7:3 = slot
* 2:0 = function
*
* PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() are defined in uapi/linux/pci.h.
* In the interest of not exposing interfaces to user-space unnecessarily,
* the following kernel-only defines are being added here.
*/
#define PCI_DEVID(bus, devfn) ((((u16)(bus)) << 8) | (devfn))
/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
#define PCI_BUS_NUM(x) (((x) >> 8) & 0xff)
PCI: introduce pci_slot Currently, /sys/bus/pci/slots/ only exposes hotplug attributes when a hotplug driver is loaded, but PCI slots have attributes such as address, speed, width, etc. that are not related to hotplug at all. Introduce pci_slot as the primary data structure and kobject model. Hotplug attributes described in hotplug_slot become a secondary structure associated with the pci_slot. This patch only creates the infrastructure that allows the separation of PCI slot attributes and hotplug attributes. In this patch, the PCI hotplug core remains the only user of this infrastructure, and thus, /sys/bus/pci/slots/ will still only become populated when a hotplug driver is loaded. A later patch in this series will add a second user of this new infrastructure and demonstrate splitting the task of exposing pci_slot attributes from hotplug_slot attributes. - Make pci_slot the primary sysfs entity. hotplug_slot becomes a subsidiary structure. o pci_create_slot() creates and registers a slot with the PCI core o pci_slot_add_hotplug() gives it hotplug capability - Change the prototype of pci_hp_register() to take the bus and slot number (on parent bus) as parameters. - Remove all the ->get_address methods since this functionality is now handled by pci_slot directly. [achiang@hp.com: rpaphp-correctly-pci_hp_register-for-empty-pci-slots] Tested-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: make headers_check happy] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: fix typo in #include] Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Matthew Wilcox <matthew@wil.cx> Cc: Greg KH <greg@kroah.com> Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-11 05:28:50 +08:00
/* pci_slot represents a physical slot */
struct pci_slot {
struct pci_bus *bus; /* Bus this slot is on */
struct list_head list; /* Node in list of slots */
struct hotplug_slot *hotplug; /* Hotplug info (move here) */
unsigned char number; /* PCI_SLOT(pci_dev->devfn) */
struct kobject kobj;
PCI: introduce pci_slot Currently, /sys/bus/pci/slots/ only exposes hotplug attributes when a hotplug driver is loaded, but PCI slots have attributes such as address, speed, width, etc. that are not related to hotplug at all. Introduce pci_slot as the primary data structure and kobject model. Hotplug attributes described in hotplug_slot become a secondary structure associated with the pci_slot. This patch only creates the infrastructure that allows the separation of PCI slot attributes and hotplug attributes. In this patch, the PCI hotplug core remains the only user of this infrastructure, and thus, /sys/bus/pci/slots/ will still only become populated when a hotplug driver is loaded. A later patch in this series will add a second user of this new infrastructure and demonstrate splitting the task of exposing pci_slot attributes from hotplug_slot attributes. - Make pci_slot the primary sysfs entity. hotplug_slot becomes a subsidiary structure. o pci_create_slot() creates and registers a slot with the PCI core o pci_slot_add_hotplug() gives it hotplug capability - Change the prototype of pci_hp_register() to take the bus and slot number (on parent bus) as parameters. - Remove all the ->get_address methods since this functionality is now handled by pci_slot directly. [achiang@hp.com: rpaphp-correctly-pci_hp_register-for-empty-pci-slots] Tested-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: make headers_check happy] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: fix typo in #include] Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Matthew Wilcox <matthew@wil.cx> Cc: Greg KH <greg@kroah.com> Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-11 05:28:50 +08:00
};
static inline const char *pci_slot_name(const struct pci_slot *slot)
{
return kobject_name(&slot->kobj);
}
/* File state for mmap()s on /proc/bus/pci/X/Y */
enum pci_mmap_state {
pci_mmap_io,
pci_mmap_mem
};
/* For PCI devices, the region numbers are assigned this way: */
enum {
/* #0-5: standard PCI resources */
PCI_STD_RESOURCES,
PCI_STD_RESOURCE_END = PCI_STD_RESOURCES + PCI_STD_NUM_BARS - 1,
/* #6: expansion ROM resource */
PCI_ROM_RESOURCE,
/* Device-specific resources */
#ifdef CONFIG_PCI_IOV
PCI_IOV_RESOURCES,
PCI_IOV_RESOURCE_END = PCI_IOV_RESOURCES + PCI_SRIOV_NUM_BARS - 1,
#endif
/* PCI-to-PCI (P2P) bridge windows */
#define PCI_BRIDGE_IO_WINDOW (PCI_BRIDGE_RESOURCES + 0)
#define PCI_BRIDGE_MEM_WINDOW (PCI_BRIDGE_RESOURCES + 1)
#define PCI_BRIDGE_PREF_MEM_WINDOW (PCI_BRIDGE_RESOURCES + 2)
/* CardBus bridge windows */
#define PCI_CB_BRIDGE_IO_0_WINDOW (PCI_BRIDGE_RESOURCES + 0)
#define PCI_CB_BRIDGE_IO_1_WINDOW (PCI_BRIDGE_RESOURCES + 1)
#define PCI_CB_BRIDGE_MEM_0_WINDOW (PCI_BRIDGE_RESOURCES + 2)
#define PCI_CB_BRIDGE_MEM_1_WINDOW (PCI_BRIDGE_RESOURCES + 3)
/* Total number of bridge resources for P2P and CardBus */
#define PCI_BRIDGE_RESOURCE_NUM 4
/* Resources assigned to buses behind the bridge */
PCI_BRIDGE_RESOURCES,
PCI_BRIDGE_RESOURCE_END = PCI_BRIDGE_RESOURCES +
PCI_BRIDGE_RESOURCE_NUM - 1,
/* Total resources associated with a PCI device */
PCI_NUM_RESOURCES,
/* Preserve this for compatibility */
DEVICE_COUNT_RESOURCE = PCI_NUM_RESOURCES,
};
/**
* enum pci_interrupt_pin - PCI INTx interrupt values
* @PCI_INTERRUPT_UNKNOWN: Unknown or unassigned interrupt
* @PCI_INTERRUPT_INTA: PCI INTA pin
* @PCI_INTERRUPT_INTB: PCI INTB pin
* @PCI_INTERRUPT_INTC: PCI INTC pin
* @PCI_INTERRUPT_INTD: PCI INTD pin
*
* Corresponds to values for legacy PCI INTx interrupts, as can be found in the
* PCI_INTERRUPT_PIN register.
*/
enum pci_interrupt_pin {
PCI_INTERRUPT_UNKNOWN,
PCI_INTERRUPT_INTA,
PCI_INTERRUPT_INTB,
PCI_INTERRUPT_INTC,
PCI_INTERRUPT_INTD,
};
/* The number of legacy PCI INTx interrupts */
#define PCI_NUM_INTX 4
/*
* Reading from a device that doesn't respond typically returns ~0. A
* successful read from a device may also return ~0, so you need additional
* information to reliably identify errors.
*/
#define PCI_ERROR_RESPONSE (~0ULL)
#define PCI_SET_ERROR_RESPONSE(val) (*(val) = ((typeof(*(val))) PCI_ERROR_RESPONSE))
#define PCI_POSSIBLE_ERROR(val) ((val) == ((typeof(val)) PCI_ERROR_RESPONSE))
/*
* pci_power_t values must match the bits in the Capabilities PME_Support
* and Control/Status PowerState fields in the Power Management capability.
*/
typedef int __bitwise pci_power_t;
#define PCI_D0 ((pci_power_t __force) 0)
#define PCI_D1 ((pci_power_t __force) 1)
#define PCI_D2 ((pci_power_t __force) 2)
#define PCI_D3hot ((pci_power_t __force) 3)
#define PCI_D3cold ((pci_power_t __force) 4)
#define PCI_UNKNOWN ((pci_power_t __force) 5)
#define PCI_POWER_ERROR ((pci_power_t __force) -1)
/* Remember to update this when the list above changes! */
extern const char *pci_power_names[];
static inline const char *pci_power_name(pci_power_t state)
{
return pci_power_names[1 + (__force int) state];
}
/**
* typedef pci_channel_state_t
*
* The pci_channel state describes connectivity between the CPU and
* the PCI device. If some PCI bus between here and the PCI device
* has crashed or locked up, this info is reflected here.
*/
typedef unsigned int __bitwise pci_channel_state_t;
enum {
/* I/O channel is in normal state */
pci_channel_io_normal = (__force pci_channel_state_t) 1,
/* I/O to channel is blocked */
pci_channel_io_frozen = (__force pci_channel_state_t) 2,
/* PCI card is dead */
pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
};
typedef unsigned int __bitwise pcie_reset_state_t;
enum pcie_reset_state {
/* Reset is NOT asserted (Use to deassert reset) */
pcie_deassert_reset = (__force pcie_reset_state_t) 1,
/* Use #PERST to reset PCIe device */
pcie_warm_reset = (__force pcie_reset_state_t) 2,
/* Use PCIe Hot Reset to reset device */
pcie_hot_reset = (__force pcie_reset_state_t) 3
};
typedef unsigned short __bitwise pci_dev_flags_t;
enum pci_dev_flags {
/* INTX_DISABLE in PCI_COMMAND register disables MSI too */
PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG = (__force pci_dev_flags_t) (1 << 0),
/* Device configuration is irrevocably lost if disabled into D3 */
PCI_DEV_FLAGS_NO_D3 = (__force pci_dev_flags_t) (1 << 1),
/* Provide indication device is assigned by a Virtual Machine Manager */
PCI_DEV_FLAGS_ASSIGNED = (__force pci_dev_flags_t) (1 << 2),
/* Flag for quirk use to store if quirk-specific ACS is enabled */
PCI_DEV_FLAGS_ACS_ENABLED_QUIRK = (__force pci_dev_flags_t) (1 << 3),
/* Use a PCIe-to-PCI bridge alias even if !pci_is_pcie */
PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS = (__force pci_dev_flags_t) (1 << 5),
/* Do not use bus resets for device */
PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6),
/* Do not use PM reset even if device advertises NoSoftRst- */
PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
/* Get VPD from function 0 VPD */
PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
/* A non-root bridge where translation occurs, stop alias search here */
PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
/* Do not use FLR even if device advertises PCI_AF_CAP */
PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
/* Don't use Relaxed Ordering for TLPs directed at this device */
PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
/* Device does honor MSI masking despite saying otherwise */
PCI_DEV_FLAGS_HAS_MSI_MASKING = (__force pci_dev_flags_t) (1 << 12),
};
enum pci_irq_reroute_variant {
INTEL_IRQ_REROUTE_VARIANT = 1,
MAX_IRQ_REROUTE_VARIANTS = 3
};
typedef unsigned short __bitwise pci_bus_flags_t;
enum pci_bus_flags {
PCI_BUS_FLAGS_NO_MSI = (__force pci_bus_flags_t) 1,
PCI_BUS_FLAGS_NO_MMRBC = (__force pci_bus_flags_t) 2,
PCI_BUS_FLAGS_NO_AERSID = (__force pci_bus_flags_t) 4,
PCI: Check whether bridges allow access to extended config space Even if a device supports extended config space, i.e., it is a PCI-X Mode 2 or a PCI Express device, the extended space may not be accessible if there's a conventional PCI bus in the path to it. We currently figure that out in pci_cfg_space_size() by reading the first dword of extended config space. On most platforms that returns ~0 data if the space is inaccessible, but it may set error bits in PCI status registers, and on some platforms it causes exceptions that we currently don't recover from. For example, a PCIe-to-conventional PCI bridge treats config transactions with a non-zero Extended Register Address as an Unsupported Request on PCIe and a received Master-Abort on the destination bus (see PCI Express to PCI/PCI-X Bridge spec, r1.0, sec 4.1.3). A sample case is a LS1043A CPU (NXP QorIQ Layerscape) platform with the following bus topology: LS1043 PCIe Root Port -> PEX8112 PCIe-to-PCI bridge (doesn't support ext cfg on PCI side) -> PMC slot connector (for legacy PMC modules) With a PMC module topology as follows: PMC connector -> PCI-to-PCIe bridge -> PCIe switch (4 ports) -> 4 PCIe devices (one on each port) The PCIe devices on the PMC module support extended config space, but we can't reach it because the PEX8112 can't generate accesses to the extended space on its secondary bus. Attempts to access it cause Unsupported Request errors, which result in synchronous aborts on this platform. To avoid these errors, check whether bridges are capable of generating extended config space addresses on their secondary interfaces. If they can't, we restrict devices below the bridge to only the 256-byte PCI-compatible config space. Signed-off-by: Gilles Buloz <gilles.buloz@kontron.com> [bhelgaas: changelog, rework patch so bus_flags testing is all in pci_bridge_child_ext_cfg_accessible()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-05-04 04:21:44 +08:00
PCI_BUS_FLAGS_NO_EXTCFG = (__force pci_bus_flags_t) 8,
};
/* Values from Link Status register, PCIe r3.1, sec 7.8.8 */
enum pcie_link_width {
PCIE_LNK_WIDTH_RESRV = 0x00,
PCIE_LNK_X1 = 0x01,
PCIE_LNK_X2 = 0x02,
PCIE_LNK_X4 = 0x04,
PCIE_LNK_X8 = 0x08,
PCIE_LNK_X12 = 0x0c,
PCIE_LNK_X16 = 0x10,
PCIE_LNK_X32 = 0x20,
PCIE_LNK_WIDTH_UNKNOWN = 0xff,
};
/* See matching string table in pci_speed_string() */
enum pci_bus_speed {
PCI_SPEED_33MHz = 0x00,
PCI_SPEED_66MHz = 0x01,
PCI_SPEED_66MHz_PCIX = 0x02,
PCI_SPEED_100MHz_PCIX = 0x03,
PCI_SPEED_133MHz_PCIX = 0x04,
PCI_SPEED_66MHz_PCIX_ECC = 0x05,
PCI_SPEED_100MHz_PCIX_ECC = 0x06,
PCI_SPEED_133MHz_PCIX_ECC = 0x07,
PCI_SPEED_66MHz_PCIX_266 = 0x09,
PCI_SPEED_100MHz_PCIX_266 = 0x0a,
PCI_SPEED_133MHz_PCIX_266 = 0x0b,
AGP_UNKNOWN = 0x0c,
AGP_1X = 0x0d,
AGP_2X = 0x0e,
AGP_4X = 0x0f,
AGP_8X = 0x10,
PCI_SPEED_66MHz_PCIX_533 = 0x11,
PCI_SPEED_100MHz_PCIX_533 = 0x12,
PCI_SPEED_133MHz_PCIX_533 = 0x13,
PCIE_SPEED_2_5GT = 0x14,
PCIE_SPEED_5_0GT = 0x15,
PCIE_SPEED_8_0GT = 0x16,
PCIE_SPEED_16_0GT = 0x17,
PCIE_SPEED_32_0GT = 0x18,
PCIE_SPEED_64_0GT = 0x19,
PCI_SPEED_UNKNOWN = 0xff,
};
enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
struct pci_vpd {
struct mutex lock;
unsigned int len;
u8 cap;
};
struct irq_affinity;
struct pcie_link_state;
struct pci_sriov;
PCI/P2PDMA: Support peer-to-peer memory Some PCI devices may have memory mapped in a BAR space that's intended for use in peer-to-peer transactions. To enable such transactions the memory must be registered with ZONE_DEVICE pages so it can be used by DMA interfaces in existing drivers. Add an interface for other subsystems to find and allocate chunks of P2P memory as necessary to facilitate transfers between two PCI peers: struct pci_dev *pci_p2pmem_find[_many](); int pci_p2pdma_distance[_many](); void *pci_alloc_p2pmem(); The new interface requires a driver to collect a list of client devices involved in the transaction then call pci_p2pmem_find() to obtain any suitable P2P memory. Alternatively, if the caller knows a device which provides P2P memory, they can use pci_p2pdma_distance() to determine if it is usable. With a suitable p2pmem device, memory can then be allocated with pci_alloc_p2pmem() for use in DMA transactions. Depending on hardware, using peer-to-peer memory may reduce the bandwidth of the transfer but can significantly reduce pressure on system memory. This may be desirable in many cases: for example a system could be designed with a small CPU connected to a PCIe switch by a small number of lanes which would maximize the number of lanes available to connect to NVMe devices. The code is designed to only utilize the p2pmem device if all the devices involved in a transfer are behind the same PCI bridge. This is because we have no way of knowing whether peer-to-peer routing between PCIe Root Ports is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P transfers that go through the RC is limited to only reducing DRAM usage and, in some cases, coding convenience. The PCI-SIG may be exploring adding a new capability bit to advertise whether this is possible for future hardware. This commit includes significant rework and feedback from Christoph Hellwig. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: fold in fix from Keith Busch <keith.busch@intel.com>: https://lore.kernel.org/linux-pci/20181012155920.15418-1-keith.busch@intel.com, to address comment from Dan Carpenter <dan.carpenter@oracle.com>, fold in https://lore.kernel.org/linux-pci/20181017160510.17926-1-logang@deltatee.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-05 05:27:35 +08:00
struct pci_p2pdma;
struct rcec_ea;
/* The pci_dev structure describes PCI devices */
struct pci_dev {
struct list_head bus_list; /* Node in per-bus list */
struct pci_bus *bus; /* Bus this device is on */
struct pci_bus *subordinate; /* Bus this device bridges to */
void *sysdata; /* Hook for sys-specific extension */
struct proc_dir_entry *procent; /* Device entry in /proc/bus/pci */
PCI: introduce pci_slot Currently, /sys/bus/pci/slots/ only exposes hotplug attributes when a hotplug driver is loaded, but PCI slots have attributes such as address, speed, width, etc. that are not related to hotplug at all. Introduce pci_slot as the primary data structure and kobject model. Hotplug attributes described in hotplug_slot become a secondary structure associated with the pci_slot. This patch only creates the infrastructure that allows the separation of PCI slot attributes and hotplug attributes. In this patch, the PCI hotplug core remains the only user of this infrastructure, and thus, /sys/bus/pci/slots/ will still only become populated when a hotplug driver is loaded. A later patch in this series will add a second user of this new infrastructure and demonstrate splitting the task of exposing pci_slot attributes from hotplug_slot attributes. - Make pci_slot the primary sysfs entity. hotplug_slot becomes a subsidiary structure. o pci_create_slot() creates and registers a slot with the PCI core o pci_slot_add_hotplug() gives it hotplug capability - Change the prototype of pci_hp_register() to take the bus and slot number (on parent bus) as parameters. - Remove all the ->get_address methods since this functionality is now handled by pci_slot directly. [achiang@hp.com: rpaphp-correctly-pci_hp_register-for-empty-pci-slots] Tested-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: make headers_check happy] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: fix typo in #include] Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Matthew Wilcox <matthew@wil.cx> Cc: Greg KH <greg@kroah.com> Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-11 05:28:50 +08:00
struct pci_slot *slot; /* Physical slot this device is in */
unsigned int devfn; /* Encoded device & function index */
unsigned short vendor;
unsigned short device;
unsigned short subsystem_vendor;
unsigned short subsystem_device;
unsigned int class; /* 3 bytes: (base,sub,prog-if) */
u8 revision; /* PCI revision, low byte of class word */
u8 hdr_type; /* PCI header type (`multi' flag masked out) */
#ifdef CONFIG_PCIEAER
u16 aer_cap; /* AER capability offset */
struct aer_stats *aer_stats; /* AER stats for this device */
#endif
#ifdef CONFIG_PCIEPORTBUS
struct rcec_ea *rcec_ea; /* RCEC cached endpoint association */
struct pci_dev *rcec; /* Associated RCEC device */
#endif
u32 devcap; /* PCIe Device Capabilities */
u8 pcie_cap; /* PCIe capability offset */
u8 msi_cap; /* MSI capability offset */
u8 msix_cap; /* MSI-X capability offset */
u8 pcie_mpss:3; /* PCIe Max Payload Size Supported */
u8 rom_base_reg; /* Config register controlling ROM */
u8 pin; /* Interrupt pin this device uses */
u16 pcie_flags_reg; /* Cached PCIe Capabilities Register */
unsigned long *dma_alias_mask;/* Mask of enabled devfn aliases */
struct pci_driver *driver; /* Driver bound to this device */
u64 dma_mask; /* Mask of the bits of bus address this
device implements. Normally this is
0xffffffff. You only need to change
this if your device has broken DMA
or supports 64-bit transfers. */
struct device_dma_parameters dma_parms;
pci_power_t current_state; /* Current operating state. In ACPI,
this is D0-D3, D0 being fully
functional, and D3 being off. */
unsigned int imm_ready:1; /* Supports Immediate Readiness */
u8 pm_cap; /* PM capability offset */
unsigned int pme_support:5; /* Bitmask of states from which PME#
can be generated */
PCI / PM: Extend PME polling to all PCI devices The land of PCI power management is a land of sorrow and ugliness, especially in the area of signaling events by devices. There are devices that set their PME Status bits, but don't really bother to send a PME message or assert PME#. There are hardware vendors who don't connect PME# lines to the system core logic (they know who they are). There are PCI Express Root Ports that don't bother to trigger interrupts when they receive PME messages from the devices below. There are ACPI BIOSes that forget to provide _PRW methods for devices capable of signaling wakeup. Finally, there are BIOSes that do provide _PRW methods for such devices, but then don't bother to call Notify() for those devices from the corresponding _Lxx/_Exx GPE-handling methods. In all of these cases the kernel doesn't have a chance to receive a proper notification that it should wake up a device, so devices stay in low-power states forever. Worse yet, in some cases they continuously send PME Messages that are silently ignored, because the kernel simply doesn't know that it should clear the device's PME Status bit. This problem was first observed for "parallel" (non-Express) PCI devices on add-on cards and Matthew Garrett addressed it by adding code that polls PME Status bits of such devices, if they are enabled to signal PME, to the kernel. Recently, however, it has turned out that PCI Express devices are also affected by this issue and that it is not limited to add-on devices, so it seems necessary to extend the PME polling to all PCI devices, including PCI Express and planar ones. Still, it would be wasteful to poll the PME Status bits of devices that are known to receive proper PME notifications, so make the kernel (1) poll the PME Status bits of all PCI and PCIe devices enabled to signal PME and (2) disable the PME Status polling for devices for which correct PME notifications are received. Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-10-04 05:16:33 +08:00
unsigned int pme_poll:1; /* Poll device's PME status bit */
unsigned int d1_support:1; /* Low power state D1 is supported */
unsigned int d2_support:1; /* Low power state D2 is supported */
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 10:23:51 +08:00
unsigned int no_d1d2:1; /* D1 and D2 are forbidden */
unsigned int no_d3cold:1; /* D3cold is forbidden */
PCI: Put PCIe ports into D3 during suspend Currently the Linux PCI core does not touch power state of PCI bridges and PCIe ports when system suspend is entered. Leaving them in D0 consumes power unnecessarily and may prevent the CPU from entering deeper C-states. With recent PCIe hardware we can power down the ports to save power given that we take into account few restrictions: - The PCIe port hardware is recent enough, starting from 2015. - Devices connected to PCIe ports are effectively in D3cold once the port is transitioned to D3 (the config space is not accessible anymore and the link may be powered down). - Devices behind the PCIe port need to be allowed to transition to D3cold and back. There is a way both drivers and userspace can forbid this. - If the device behind the PCIe port is capable of waking the system it needs to be able to do so from D3cold. This patch adds a new flag to struct pci_device called 'bridge_d3'. This flag is set and cleared by the PCI core whenever there is a change in power management state of any of the devices behind the PCIe port. When system later on is suspended we only need to check this flag and if it is true transition the port to D3 otherwise we leave it in D0. Also provide override mechanism via command line parameter "pcie_port_pm=[off|force]" that can be used to disable or enable the feature regardless of the BIOS manufacturing date. Tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-02 16:17:12 +08:00
unsigned int bridge_d3:1; /* Allow D3 for bridge */
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 10:23:51 +08:00
unsigned int d3cold_allowed:1; /* D3cold is allowed by user */
unsigned int mmio_always_on:1; /* Disallow turning off io/mem
decoding during BAR sizing */
unsigned int wakeup_prepared:1;
unsigned int skip_bus_pm:1; /* Internal: Skip bus-level PM */
PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold), normally generates a hot-remove event that unbinds the driver. Some drivers expect to remain bound to a device even while they power it off and back on again. This can be dangerous, because if the device is removed or replaced while it is powered off, the driver doesn't know that anything changed. But some drivers accept that risk. Add pci_ignore_hotplug() for use by drivers that know their device cannot be removed. Using pci_ignore_hotplug() tells the PCI core that hot-plug events for the device should be ignored. The radeon and nouveau drivers use this to switch between a low-power, integrated GPU and a higher-power, higher-performance discrete GPU. They power off the unused GPU, but they want to remain bound to it. This is a reimplementation of f244d8b623da ("ACPIPHP / radeon / nouveau: Fix VGA switcheroo problem related to hotplug") but extends it to work with both acpiphp and pciehp. This fixes a problem where systems with dual GPUs using the radeon drivers become unusable, freezing every few seconds (see bugzillas below). The resume of the radeon device may also fail, e.g., This fixes problems on dual GPU systems where the radeon driver becomes unusable because of problems while suspending the device, as in bug 79701: [drm] radeon: finishing device. radeon 0000:01:00.0: Userspace still has active objects ! radeon 0000:01:00.0: ffff8800cb4ec288 ffff8800cb4ec000 16384 4294967297 force free ... WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]() trying to unbind memory from uninitialized GART ! or while resuming it, as in bug 77261: radeon 0000:01:00.0: ring 0 stalled for more than 10158msec radeon 0000:01:00.0: GPU lockup ... radeon 0000:01:00.0: GPU pci config reset pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1) radeon 0000:01:00.0: GPU reset succeeded, trying to resume *ERROR* radeon: dpm resume failed radeon 0000:01:00.0: Wait for MC idle timedout ! Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261 Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701 Reported-by: Shawn Starr <shawn.starr@rogers.com> Reported-by: Jose P. <lbdkmjdf@sharklasers.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajat Jain <rajatxjain@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Airlie <airlied@redhat.com> CC: stable@vger.kernel.org # v3.15+
2014-09-11 03:45:01 +08:00
unsigned int ignore_hotplug:1; /* Ignore hotplug events */
unsigned int hotplug_user_indicators:1; /* SlotCtl indicators
controlled exclusively by
user sysfs */
unsigned int clear_retrain_link:1; /* Need to clear Retrain Link
bit manually */
unsigned int d3hot_delay; /* D3hot->D0 transition time in ms */
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 10:23:51 +08:00
unsigned int d3cold_delay; /* D3cold->D0 transition time in ms */
#ifdef CONFIG_PCIEASPM
struct pcie_link_state *link_state; /* ASPM link state */
unsigned int ltr_path:1; /* Latency Tolerance Reporting
supported from root to here */
u16 l1ss; /* L1SS Capability pointer */
#endif
unsigned int pasid_no_tlp:1; /* PASID works without TLP Prefix */
unsigned int eetlp_prefix_path:1; /* End-to-End TLP Prefix */
pci_channel_state_t error_state; /* Current connectivity state */
struct device dev; /* Generic device interface */
int cfg_size; /* Size of config space */
/*
* Instead of touching interrupt line and base address registers
* directly, use the values stored here. They might be different!
*/
unsigned int irq;
struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
bool match_driver; /* Skip attaching driver */
unsigned int transparent:1; /* Subtractive decode bridge */
PCI: Probe bridge window attributes once at enumeration-time pci_bridge_check_ranges() determines whether a bridge supports the optional I/O and prefetchable memory windows and sets the flag bits in the bridge resources. This *could* be done once during enumeration except that the resource allocation code completely clears the flag bits, e.g., in the pci_assign_unassigned_bridge_resources() path. The problem with pci_bridge_check_ranges() in the resource allocation path is that we may allocate resources after devices have been claimed by drivers, and pci_bridge_check_ranges() *changes* the window registers to determine whether they're writable. This may break concurrent accesses to devices behind the bridge. Add a new pci_read_bridge_windows() to determine whether a bridge supports the optional windows, call it once during enumeration, remember the results, and change pci_bridge_check_ranges() so it doesn't touch the bridge windows but sets the flag bits based on those remembered results. Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-wangzhou1@hisilicon.com Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html Reported-by: Yandong Xu <xuyandong2@huawei.com> Tested-by: Yandong Xu <xuyandong2@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Ofer Hayut <ofer@lightbitslabs.com> Cc: Roy Shterman <roys@lightbitslabs.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Zhou Wang <wangzhou1@hisilicon.com>
2019-01-20 01:35:04 +08:00
unsigned int io_window:1; /* Bridge has I/O window */
unsigned int pref_window:1; /* Bridge has pref mem window */
unsigned int pref_64_window:1; /* Pref mem window is 64-bit */
unsigned int multifunction:1; /* Multi-function device */
unsigned int is_busmaster:1; /* Is busmaster */
unsigned int no_msi:1; /* May not use MSI */
unsigned int no_64bit_msi:1; /* May only use 32-bit MSIs */
unsigned int block_cfg_access:1; /* Config space access blocked */
unsigned int broken_parity_status:1; /* Generates false positive parity */
unsigned int irq_reroute_variant:2; /* Needs IRQ rerouting variant */
unsigned int msi_enabled:1;
unsigned int msix_enabled:1;
unsigned int ari_enabled:1; /* ARI forwarding */
unsigned int ats_enabled:1; /* Address Translation Svc */
unsigned int pasid_enabled:1; /* Process Address Space ID */
unsigned int pri_enabled:1; /* Page Request Interface */
unsigned int is_managed:1; /* Managed via devres */
unsigned int is_msi_managed:1; /* MSI release via devres installed */
unsigned int needs_freset:1; /* Requires fundamental reset */
unsigned int state_saved:1;
unsigned int is_physfn:1;
unsigned int is_virtfn:1;
unsigned int is_hotplug_bridge:1;
unsigned int shpc_managed:1; /* SHPC owned by shpchp */
unsigned int is_thunderbolt:1; /* Thunderbolt controller */
/*
* Devices marked being untrusted are the ones that can potentially
* execute DMA attacks and similar. They are typically connected
* through external ports such as Thunderbolt but not limited to
* that. When an IOMMU is enabled they should be getting full
* mappings to make sure they cannot access arbitrary memory.
*/
unsigned int untrusted:1;
/*
* Info from the platform, e.g., ACPI or device tree, may mark a
* device as "external-facing". An external-facing device is
* itself internal but devices downstream from it are external.
*/
unsigned int external_facing:1;
unsigned int broken_intx_masking:1; /* INTx masking can't be used */
unsigned int io_window_1k:1; /* Intel bridge 1K I/O windows */
x86, irq: Keep balance of IOAPIC pin reference count To keep balance of IOAPIC pin reference count, we need to protect pirq_enable_irq(), acpi_pci_irq_enable() and intel_mid_pci_irq_enable() from reentrance. There are two cases which will cause reentrance. The first case is caused by suspend/hibernation. If pcibios_disable_irq is called during suspending/hibernating, we don't release the assigned IRQ number, otherwise it may break the suspend/hibernation. So late when pcibios_enable_irq is called during resume, we shouldn't allocate IRQ number again. The second case is that function acpi_pci_irq_enable() may be called twice for PCI devices present at boot time as below: 1) pci_acpi_init() --> acpi_pci_irq_enable() if pci_routeirq is true 2) pci_enable_device() --> pcibios_enable_device() --> acpi_pci_irq_enable() We can't kill kernel parameter pci_routeirq yet because it's still needed for debugging purpose. So flag irq_managed is introduced to track whether IRQ number is assigned by OS and to protect pirq_enable_irq(), acpi_pci_irq_enable() and intel_mid_pci_irq_enable() from reentrance. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1414387308-27148-13-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-27 13:21:42 +08:00
unsigned int irq_managed:1;
unsigned int non_compliant_bars:1; /* Broken BARs; ignore them */
unsigned int is_probed:1; /* Device probing in progress */
unsigned int link_active_reporting:1;/* Device capable of reporting link active */
unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */
unsigned int no_command_memory:1; /* No PCI_COMMAND_MEMORY */
PCI: Work around Intel I210 ROM BAR overlap defect Per PCIe r5, sec 7.5.1.2.4, a device must not claim accesses to its Expansion ROM unless both the Memory Space Enable and the Expansion ROM Enable bit are set. But apparently some Intel I210 NICs don't work correctly if the ROM BAR overlaps another BAR, even if the Expansion ROM is disabled. Michael reported that on a Kontron SMARC-sAL28 ARM64 system with U-Boot v2021.01-rc3, the ROM BAR overlaps BAR 3, and networking doesn't work at all: BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M] BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K] ROM: 0x40200000 (disabled) [size=1M] NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed out Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT) igb 0002:01:00.0 enP2p1s0: Reset adapter Previously, pci_std_update_resource() wrote the assigned ROM address to the BAR only when the ROM was enabled. This meant that the I210 ROM BAR could be left with an address assigned by firmware, which might overlap with other BARs. Quirk these I210 devices so pci_std_update_resource() always writes the assigned address to the ROM BAR, whether or not the ROM is enabled. Link: https://lore.kernel.org/r/20211223163754.GA1267351@bhelgaas Link: https://lore.kernel.org/r/20201230185317.30915-1-michael@walle.cc Link: https://bugzilla.kernel.org/show_bug.cgi?id=211105 Reported-by: Michael Walle <michael@walle.cc> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-12-22 00:45:07 +08:00
unsigned int rom_bar_overlap:1; /* ROM BAR disable broken */
pci_dev_flags_t dev_flags;
PCI: switch pci_{enable,disable}_device() to be nestable Changes the pci_{enable,disable}_device() functions to work in a nested basis, so that eg, three calls to enable_device() require three calls to disable_device(). The reason for this is to simplify PCI drivers for multi-interface/capability devices. These are devices that cram more than one interface in a single function. A relevant example of that is the Wireless [USB] Host Controller Interface (similar to EHCI) [see http://www.intel.com/technology/comms/wusb/whci.htm]. In these kind of devices, multiple interfaces are accessed through a single bar and IRQ line. For that, the drivers map only the smallest area of the bar to access their register banks and use shared IRQ handlers. However, because the order at which those drivers load cannot be known ahead of time, the sequence in which the calls to pci_enable_device() and pci_disable_device() cannot be predicted. Thus: 1. driverA starts pci_enable_device() 2. driverB starts pci_enable_device() 3. driverA shutdown pci_disable_device() 4. driverB shutdown pci_disable_device() between steps 3 and 4, driver B would loose access to it's device, even if it didn't intend to. By using this modification, the device won't be disabled until all the callers to enable() have called disable(). This is implemented by replacing 'struct pci_dev->is_enabled' from a bitfield to an atomic use count. Each caller to enable increments it, each caller to disable decrements it. When the count increments from 0 to 1, __pci_enable_device() is called to actually enable the device. When it drops to zero, pci_disable_device() actually does the disabling. We keep the backend __pci_enable_device() for pci_default_resume() to use and also change the sysfs method implementation, so that userspace enabling/disabling the device doesn't disable it one time too much. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-11-23 04:40:31 +08:00
atomic_t enable_cnt; /* pci_enable_device has been called */
u32 saved_config_space[16]; /* Config space saved at suspend time */
struct hlist_head saved_cap_space;
int rom_attr_enabled; /* Display of ROM attribute enabled? */
struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
PCI: pciehp: Add quirk for Command Completed errata Several PCIe hotplug controllers have errata that mean they do not set the Command Completed bit unless writes to the Slot Command register change "Control" bits. Command Completed is never set for writes that only change software notification "Enable" bits. This results in timeouts like this: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago) When this erratum is present, avoid these timeouts by marking commands "completed" immediately unless they change the "Control" bits. Here's the text of the Intel erratum CF118. We assume this applies to all Intel parts: CF118 PCIe Slot Status Register Command Completed bit not always updated on any configuration write to the Slot Control Register Problem: For PCIe root ports (devices 0 - 10) supporting hot-plug, the Slot Status Register (offset AAh) Command Completed (bit[4]) status is updated under the following condition: IOH will set Command Completed bit after delivering the new commands written in the Slot Controller register (offset A8h) to VPP. The IOH detects new commands written in Slot Control register by checking the change of value for Power Controller Control (bit[10]), Power Indicator Control (bits[9:8]), Attention Indicator Control (bits[7:6]), or Electromechanical Interlock Control (bit[11]) fields. Any other configuration writes to the Slot Control register without changing the values of these fields will not cause Command Completed bit to be set. The PCIe Base Specification Revision 2.0 or later describes the “Slot Control Register” in section 7.8.10, as follows (Reference section 7.8.10, Slot Control Register, Offset 18h). In hot-plug capable Downstream Ports, a write to the Slot Control register must cause a hot-plug command to be generated (see Section 6.7.3.2 for details on hot-plug commands). A write to the Slot Control register in a Downstream Port that is not hotplug capable must not cause a hot-plug command to be executed. The PCIe Spec intended that every write to the Slot Control Register is a command and expected a command complete status to abstract the VPP implementation specific nuances from the OS software. IOH PCIe Slot Control Register implementation is not fully conforming to the PCIe Specification in this respect. Implication: Software checking on the Command Completed status after writing to the Slot Control register may time out. Workaround: Software can read the Slot Control register and compare the existing and new values to determine if it should check the Command Completed status after writing to the Slot Control register. Per Sinan, the Qualcomm QDF2400 controller also does not set the Command Completed bit unless writes to the Slot Command register change "Control" bits. Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60 Tested-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60 Signed-off-by: Sinan Kaya <okaya@codeaurora.org> # Qcom quirk Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-05-04 07:39:38 +08:00
#ifdef CONFIG_HOTPLUG_PCI_PCIE
unsigned int broken_cmd_compl:1; /* No compl for some cmds */
#endif
#ifdef CONFIG_PCIE_PTM
unsigned int ptm_root:1;
unsigned int ptm_enabled:1;
u8 ptm_granularity;
#endif
#ifdef CONFIG_PCI_MSI
void __iomem *msix_base;
raw_spinlock_t msi_lock;
#endif
struct pci_vpd vpd;
#ifdef CONFIG_PCIE_DPC
u16 dpc_cap;
unsigned int dpc_rp_extensions:1;
u8 dpc_rp_log_size;
#endif
#ifdef CONFIG_PCI_ATS
union {
struct pci_sriov *sriov; /* PF: SR-IOV info */
struct pci_dev *physfn; /* VF: related PF */
};
u16 ats_cap; /* ATS Capability offset */
u8 ats_stu; /* ATS Smallest Translation Unit */
#endif
#ifdef CONFIG_PCI_PRI
u16 pri_cap; /* PRI Capability offset */
u32 pri_reqs_alloc; /* Number of PRI requests allocated */
unsigned int pasid_required:1; /* PRG Response PASID Required */
#endif
#ifdef CONFIG_PCI_PASID
u16 pasid_cap; /* PASID Capability offset */
u16 pasid_features;
PCI/P2PDMA: Support peer-to-peer memory Some PCI devices may have memory mapped in a BAR space that's intended for use in peer-to-peer transactions. To enable such transactions the memory must be registered with ZONE_DEVICE pages so it can be used by DMA interfaces in existing drivers. Add an interface for other subsystems to find and allocate chunks of P2P memory as necessary to facilitate transfers between two PCI peers: struct pci_dev *pci_p2pmem_find[_many](); int pci_p2pdma_distance[_many](); void *pci_alloc_p2pmem(); The new interface requires a driver to collect a list of client devices involved in the transaction then call pci_p2pmem_find() to obtain any suitable P2P memory. Alternatively, if the caller knows a device which provides P2P memory, they can use pci_p2pdma_distance() to determine if it is usable. With a suitable p2pmem device, memory can then be allocated with pci_alloc_p2pmem() for use in DMA transactions. Depending on hardware, using peer-to-peer memory may reduce the bandwidth of the transfer but can significantly reduce pressure on system memory. This may be desirable in many cases: for example a system could be designed with a small CPU connected to a PCIe switch by a small number of lanes which would maximize the number of lanes available to connect to NVMe devices. The code is designed to only utilize the p2pmem device if all the devices involved in a transfer are behind the same PCI bridge. This is because we have no way of knowing whether peer-to-peer routing between PCIe Root Ports is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P transfers that go through the RC is limited to only reducing DRAM usage and, in some cases, coding convenience. The PCI-SIG may be exploring adding a new capability bit to advertise whether this is possible for future hardware. This commit includes significant rework and feedback from Christoph Hellwig. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: fold in fix from Keith Busch <keith.busch@intel.com>: https://lore.kernel.org/linux-pci/20181012155920.15418-1-keith.busch@intel.com, to address comment from Dan Carpenter <dan.carpenter@oracle.com>, fold in https://lore.kernel.org/linux-pci/20181017160510.17926-1-logang@deltatee.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-05 05:27:35 +08:00
#endif
#ifdef CONFIG_PCI_P2PDMA
struct pci_p2pdma __rcu *p2pdma;
#endif
u16 acs_cap; /* ACS Capability offset */
phys_addr_t rom; /* Physical address if not from BAR */
size_t romlen; /* Length if not from BAR */
/*
* Driver name to force a match. Do not set directly, because core
* frees it. Use driver_set_override() to set or clear it.
*/
const char *driver_override;
unsigned long priv_flags; /* Private flags for the PCI driver */
/* These methods index pci_reset_fn_methods[] */
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
};
static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
{
#ifdef CONFIG_PCI_IOV
if (dev->is_virtfn)
dev = dev->physfn;
#endif
return dev;
}
struct pci_dev *pci_alloc_dev(struct pci_bus *bus);
#define to_pci_dev(n) container_of(n, struct pci_dev, dev)
#define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
static inline int pci_channel_offline(struct pci_dev *pdev)
{
return (pdev->error_state != pci_channel_io_normal);
}
/*
* Currently in ACPI spec, for each PCI host bridge, PCI Segment
* Group number is limited to a 16-bit value, therefore (int)-1 is
* not a valid PCI domain number, and can be used as a sentinel
* value indicating ->domain_nr is not set by the driver (and
* CONFIG_PCI_DOMAINS_GENERIC=y archs will set it with
* pci_bus_find_domain_nr()).
*/
#define PCI_DOMAIN_NR_NOT_SET (-1)
struct pci_host_bridge {
struct device dev;
struct pci_bus *bus; /* Root bus */
struct pci_ops *ops;
struct pci_ops *child_ops;
void *sysdata;
int busnr;
int domain_nr;
struct list_head windows; /* resource_entry */
struct list_head dma_ranges; /* dma ranges resource list */
u8 (*swizzle_irq)(struct pci_dev *, u8 *); /* Platform IRQ swizzler */
int (*map_irq)(const struct pci_dev *, u8, u8);
void (*release_fn)(struct pci_host_bridge *);
void *release_data;
unsigned int ignore_reset_delay:1; /* For entire hierarchy */
unsigned int no_ext_tags:1; /* No Extended Tags */
unsigned int native_aer:1; /* OS may use PCIe AER */
unsigned int native_pcie_hotplug:1; /* OS may use PCIe hotplug */
unsigned int native_shpc_hotplug:1; /* OS may use SHPC hotplug */
unsigned int native_pme:1; /* OS may use PCIe PME */
unsigned int native_ltr:1; /* OS may use PCIe LTR */
PCI/DPC: Add Error Disconnect Recover (EDR) support Error Disconnect Recover (EDR) is a feature that allows ACPI firmware to notify OSPM that a device has been disconnected due to an error condition (ACPI v6.3, sec 5.6.6). OSPM advertises its support for EDR on PCI devices via _OSC (see [1], sec 4.5.1, table 4-4). The OSPM EDR notify handler should invalidate software state associated with disconnected devices and may attempt to recover them. OSPM communicates the status of recovery to the firmware via _OST (sec 6.3.5.2). For PCIe, firmware may use Downstream Port Containment (DPC) to support EDR. Per [1], sec 4.5.1, table 4-6, even if firmware has retained control of DPC, OSPM may read/write DPC control and status registers during the EDR notification processing window, i.e., from the time it receives an EDR notification until it clears the DPC Trigger Status. Note that per [1], sec 4.5.1 and 4.5.2.4, 1. If the OS supports EDR, it should advertise that to firmware by setting OSC_PCI_EDR_SUPPORT in _OSC Support. 2. If the OS sets OSC_PCI_EXPRESS_DPC_CONTROL in _OSC Control to request control of the DPC capability, it must also set OSC_PCI_EDR_SUPPORT in _OSC Support. Add an EDR notify handler to attempt recovery. [1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/12888 [bhelgaas: squash add/enable patches into one] Link: https://lore.kernel.org/r/90f91fe6d25c13f9d2255d2ce97ca15be307e1bb.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org>
2020-03-24 08:26:07 +08:00
unsigned int native_dpc:1; /* OS may use PCIe DPC */
unsigned int preserve_config:1; /* Preserve FW resource setup */
unsigned int size_windows:1; /* Enable root bus sizing */
unsigned int msi_domain:1; /* Bridge wants MSI domain */
/* Resource alignment requirements */
resource_size_t (*align_resource)(struct pci_dev *dev,
const struct resource *res,
resource_size_t start,
resource_size_t size,
resource_size_t align);
PCI: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these as a flexible array member [1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero. [1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type [1]. There are some instances of code in which the sizeof() operator is being incorrectly/erroneously applied to zero-length arrays, and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Link: https://lore.kernel.org/r/20200507190544.GA15633@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-05-08 03:05:44 +08:00
unsigned long private[] ____cacheline_aligned;
};
#define to_pci_host_bridge(n) container_of(n, struct pci_host_bridge, dev)
static inline void *pci_host_bridge_priv(struct pci_host_bridge *bridge)
{
return (void *)bridge->private;
}
static inline struct pci_host_bridge *pci_host_bridge_from_priv(void *priv)
{
return container_of(priv, struct pci_host_bridge, private);
}
struct pci_host_bridge *pci_alloc_host_bridge(size_t priv);
struct pci_host_bridge *devm_pci_alloc_host_bridge(struct device *dev,
size_t priv);
void pci_free_host_bridge(struct pci_host_bridge *bridge);
struct pci_host_bridge *pci_find_host_bridge(struct pci_bus *bus);
void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
void (*release_fn)(struct pci_host_bridge *),
void *release_data);
ACPI / PCI: Set root bridge ACPI handle in advance The ACPI handles of PCI root bridges need to be known to acpi_bind_one(), so that it can create the appropriate "firmware_node" and "physical_node" files for them, but currently the way it gets to know those handles is not exactly straightforward (to put it lightly). This is how it works, roughly: 1. acpi_bus_scan() finds the handle of a PCI root bridge, creates a struct acpi_device object for it and passes that object to acpi_pci_root_add(). 2. acpi_pci_root_add() creates a struct acpi_pci_root object, populates its "device" field with its argument's address (device->handle is the ACPI handle found in step 1). 3. The struct acpi_pci_root object created in step 2 is passed to pci_acpi_scan_root() and used to get resources that are passed to pci_create_root_bus(). 4. pci_create_root_bus() creates a struct pci_host_bridge object and passes its "dev" member to device_register(). 5. platform_notify(), which for systems with ACPI is set to acpi_platform_notify(), is called. So far, so good. Now it starts to be "interesting". 6. acpi_find_bridge_device() is used to find the ACPI handle of the given device (which is the PCI root bridge) and executes acpi_pci_find_root_bridge(), among other things, for the given device object. 7. acpi_pci_find_root_bridge() uses the name (sic!) of the given device object to extract the segment and bus numbers of the PCI root bridge and passes them to acpi_get_pci_rootbridge_handle(). 8. acpi_get_pci_rootbridge_handle() browses the list of ACPI PCI root bridges and finds the one that matches the given segment and bus numbers. Its handle is then used to initialize the ACPI handle of the PCI root bridge's device object by acpi_bind_one(). However, this is *exactly* the ACPI handle we started with in step 1. Needless to say, this is quite embarassing, but it may be avoided thanks to commit f3fd0c8 (ACPI: Allow ACPI handles of devices to be initialized in advance), which makes it possible to initialize the ACPI handle of a device before passing it to device_register(). Accordingly, add a new __weak routine, pcibios_root_bridge_prepare(), defaulting to an empty implementation that can be replaced by the interested architecutres (x86 and ia64 at the moment) with functions that will set the root bridge's ACPI handle before its dev member is passed to device_register(). Make both x86 and ia64 provide such implementations of pcibios_root_bridge_prepare() and remove acpi_pci_find_root_bridge() and acpi_get_pci_rootbridge_handle() that aren't necessary any more. Included is a fix for breakage on systems with non-ACPI PCI host bridges from Bjorn Helgaas. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-10 05:33:37 +08:00
int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
/*
* The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
* to P2P or CardBus bridge windows) go in a table. Additional ones (for
* buses below host bridges or subtractive decode bridges) go in the list.
* Use pci_bus_for_each_resource() to iterate through all the resources.
*/
/*
* PCI_SUBTRACTIVE_DECODE means the bridge forwards the window implicitly
* and there's no way to program the bridge with the details of the window.
* This does not apply to ACPI _CRS windows, even with the _DEC subtractive-
* decode bit set, because they are explicit and can be programmed with _SRS.
*/
#define PCI_SUBTRACTIVE_DECODE 0x1
struct pci_bus_resource {
struct list_head list;
struct resource *res;
unsigned int flags;
};
#define PCI_REGION_FLAG_MASK 0x0fU /* These bits of resource flags tell us the PCI region flags */
struct pci_bus {
struct list_head node; /* Node in list of buses */
struct pci_bus *parent; /* Parent bus this bridge is on */
struct list_head children; /* List of child buses */
struct list_head devices; /* List of devices on this bus */
struct pci_dev *self; /* Bridge device as seen by parent */
struct list_head slots; /* List of slots on this bus;
protected by pci_slot_mutex */
struct resource *resource[PCI_BRIDGE_RESOURCE_NUM];
struct list_head resources; /* Address space routed to this bus */
struct resource busn_res; /* Bus numbers routed to this bus */
struct pci_ops *ops; /* Configuration access functions */
void *sysdata; /* Hook for sys-specific extension */
struct proc_dir_entry *procdir; /* Directory entry in /proc/bus/pci */
unsigned char number; /* Bus number */
unsigned char primary; /* Number of primary bridge */
unsigned char max_bus_speed; /* enum pci_bus_speed */
unsigned char cur_bus_speed; /* enum pci_bus_speed */
#ifdef CONFIG_PCI_DOMAINS_GENERIC
int domain_nr;
#endif
char name[48];
unsigned short bridge_ctl; /* Manage NO_ISA/FBB/et al behaviors */
pci_bus_flags_t bus_flags; /* Inherited by child buses */
struct device *bridge;
struct device dev;
struct bin_attribute *legacy_io; /* Legacy I/O for this bus */
struct bin_attribute *legacy_mem; /* Legacy mem */
unsigned int is_added:1;
unsigned int unsafe_warn:1; /* warned about RW1C config write */
};
#define to_pci_bus(n) container_of(n, struct pci_bus, dev)
static inline u16 pci_dev_id(struct pci_dev *dev)
{
return PCI_DEVID(dev->bus->number, dev->devfn);
}
/*
* Returns true if the PCI bus is root (behind host-PCI bridge),
* false otherwise
*
* Some code assumes that "bus->self == NULL" means that bus is a root bus.
* This is incorrect because "virtual" buses added for SR-IOV (via
* virtfn_add_bus()) have "bus->self == NULL" but are not root buses.
*/
static inline bool pci_is_root_bus(struct pci_bus *pbus)
{
return !(pbus->parent);
}
/**
* pci_is_bridge - check if the PCI device is a bridge
* @dev: PCI device
*
* Return true if the PCI device is bridge whether it has subordinate
* or not.
*/
static inline bool pci_is_bridge(struct pci_dev *dev)
{
return dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
dev->hdr_type == PCI_HEADER_TYPE_CARDBUS;
}
#define for_each_pci_bridge(dev, bus) \
list_for_each_entry(dev, &bus->devices, bus_list) \
if (!pci_is_bridge(dev)) {} else
static inline struct pci_dev *pci_upstream_bridge(struct pci_dev *dev)
{
dev = pci_physfn(dev);
if (pci_is_root_bus(dev->bus))
return NULL;
return dev->bus->self;
}
#ifdef CONFIG_PCI_MSI
static inline bool pci_dev_msi_enabled(struct pci_dev *pci_dev)
{
return pci_dev->msi_enabled || pci_dev->msix_enabled;
}
#else
static inline bool pci_dev_msi_enabled(struct pci_dev *pci_dev) { return false; }
#endif
/* Error values that may be returned by PCI functions */
#define PCIBIOS_SUCCESSFUL 0x00
#define PCIBIOS_FUNC_NOT_SUPPORTED 0x81
#define PCIBIOS_BAD_VENDOR_ID 0x83
#define PCIBIOS_DEVICE_NOT_FOUND 0x86
#define PCIBIOS_BAD_REGISTER_NUMBER 0x87
#define PCIBIOS_SET_FAILED 0x88
#define PCIBIOS_BUFFER_TOO_SMALL 0x89
/* Translate above to generic errno for passing back through non-PCI code */
static inline int pcibios_err_to_errno(int err)
{
if (err <= PCIBIOS_SUCCESSFUL)
return err; /* Assume already errno */
switch (err) {
case PCIBIOS_FUNC_NOT_SUPPORTED:
return -ENOENT;
case PCIBIOS_BAD_VENDOR_ID:
return -ENOTTY;
case PCIBIOS_DEVICE_NOT_FOUND:
return -ENODEV;
case PCIBIOS_BAD_REGISTER_NUMBER:
return -EFAULT;
case PCIBIOS_SET_FAILED:
return -EIO;
case PCIBIOS_BUFFER_TOO_SMALL:
return -ENOSPC;
}
return -ERANGE;
}
/* Low-level architecture-dependent routines */
struct pci_ops {
int (*add_bus)(struct pci_bus *bus);
void (*remove_bus)(struct pci_bus *bus);
void __iomem *(*map_bus)(struct pci_bus *bus, unsigned int devfn, int where);
int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
/*
* ACPI needs to be able to access PCI config space before we've done a
* PCI bus scan and created pci_bus structures.
*/
int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *val);
int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 val);
#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
PCI: Add pci_bus_addr_t David Ahern reported that d63e2e1f3df9 ("sparc/PCI: Clip bridge windows to fit in upstream windows") fails to boot on sparc/T5-8: pci 0000:06:00.0: reg 0x184: can't handle BAR above 4GB (bus address 0x110204000) The problem is that sparc64 assumed that dma_addr_t only needed to hold DMA addresses, i.e., bus addresses returned via the DMA API (dma_map_single(), etc.), while the PCI core assumed dma_addr_t could hold *any* bus address, including raw BAR values. On sparc64, all DMA addresses fit in 32 bits, so dma_addr_t is a 32-bit type. However, BAR values can be 64 bits wide, so they don't fit in a dma_addr_t. d63e2e1f3df9 added new checking that tripped over this mismatch. Add pci_bus_addr_t, which is wide enough to hold any PCI bus address, including both raw BAR values and DMA addresses. This will be 64 bits on 64-bit platforms and on platforms with a 64-bit dma_addr_t. Then dma_addr_t only needs to be wide enough to hold addresses from the DMA API. [bhelgaas: changelog, bugzilla, Kconfig to ensure pci_bus_addr_t is at least as wide as dma_addr_t, documentation] Fixes: d63e2e1f3df9 ("sparc/PCI: Clip bridge windows to fit in upstream windows") Fixes: 23b13bc76f35 ("PCI: Fail safely if we can't handle BARs larger than 4GB") Link: http://lkml.kernel.org/r/CAE9FiQU1gJY1LYrxs+ma5LCTEEe4xmtjRG0aXJ9K_Tsu+m9Wuw@mail.gmail.com Link: http://lkml.kernel.org/r/1427857069-6789-1-git-send-email-yinghai@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=96231 Reported-by: David Ahern <david.ahern@oracle.com> Tested-by: David Ahern <david.ahern@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: David S. Miller <davem@davemloft.net> CC: stable@vger.kernel.org # v3.19+
2015-05-28 08:23:51 +08:00
typedef u64 pci_bus_addr_t;
#else
typedef u32 pci_bus_addr_t;
#endif
struct pci_bus_region {
pci_bus_addr_t start;
pci_bus_addr_t end;
};
struct pci_dynids {
spinlock_t lock; /* Protects list, index */
struct list_head list; /* For IDs added at runtime */
};
/*
* PCI Error Recovery System (PCI-ERS). If a PCI device driver provides
* a set of callbacks in struct pci_error_handlers, that device driver
* will be notified of PCI bus errors, and will be driven to recovery
* when an error occurs.
*/
typedef unsigned int __bitwise pci_ers_result_t;
enum pci_ers_result {
/* No result/none/not supported in device driver */
PCI_ERS_RESULT_NONE = (__force pci_ers_result_t) 1,
/* Device driver can recover without slot reset */
PCI_ERS_RESULT_CAN_RECOVER = (__force pci_ers_result_t) 2,
/* Device driver wants slot to be reset */
PCI_ERS_RESULT_NEED_RESET = (__force pci_ers_result_t) 3,
/* Device has completely failed, is unrecoverable */
PCI_ERS_RESULT_DISCONNECT = (__force pci_ers_result_t) 4,
/* Device driver is fully recovered and operational */
PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
PCI/AER: Report success only when every device has AER-aware driver When an error is detected on a PCIe device which does not have an AER-aware driver, prevent AER infrastructure from reporting successful error recovery. This is because the report_error_detected() function that gets called in the first phase of recovery process allows forward progress even when the driver for the device does not have AER capabilities. It seems that all callbacks (in pci_error_handlers structure) registered by drivers that gets called during error recovery are not mandatory. So the intention of the infrastructure design seems to be to allow forward progress even when a specific callback has not been registered by a driver. However, if error handler structure itself has not been registered, it doesn't make sense to allow forward progress. As a result of the current design, in the case of a single device having an AER-unaware driver or in the case of any function in a multi-function card having an AER-unaware driver, a successful recovery is reported. Typical scenario this happens is when a PCI device is detached from a KVM host and the pci-stub driver on the host claims the device. The pci-stub driver does not have error handling capabilities but the AER infrastructure still reports that the device recovered successfully. The changes proposed here leaves the device(s)in an unrecovered state if the driver for the device or for any device in the subtree does not have error handler structure registered. This reflects the true state of the device and prevents any partial recovery (or no recovery at all) reported as successful. [bhelgaas: changelog] Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Linas Vepstas <linasvepstas@gmail.com> Reviewed-by: Myron Stowe <myron.stowe@redhat.com>
2012-11-17 19:47:18 +08:00
/* No AER capabilities registered for the driver */
PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
};
/* PCI bus error event callbacks */
struct pci_error_handlers {
/* PCI bus error detected on this device */
pci_ers_result_t (*error_detected)(struct pci_dev *dev,
pci_channel_state_t error);
/* MMIO has been re-enabled, but not DMA */
pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev);
/* PCI slot has been reset */
pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
/* PCI function reset prepare or completed */
void (*reset_prepare)(struct pci_dev *dev);
void (*reset_done)(struct pci_dev *dev);
/* Device driver may resume normal operations */
void (*resume)(struct pci_dev *dev);
};
struct module;
/**
* struct pci_driver - PCI driver structure
* @node: List of driver structures.
* @name: Driver name.
* @id_table: Pointer to table of device IDs the driver is
* interested in. Most drivers should export this
* table using MODULE_DEVICE_TABLE(pci,...).
* @probe: This probing function gets called (during execution
* of pci_register_driver() for already existing
* devices or later if a new device gets inserted) for
* all PCI devices which match the ID table and are not
* "owned" by the other drivers yet. This function gets
* passed a "struct pci_dev \*" for each device whose
* entry in the ID table matches the device. The probe
* function returns zero when the driver chooses to
* take "ownership" of the device or an error code
* (negative number) otherwise.
* The probe function always gets called from process
* context, so it can sleep.
* @remove: The remove() function gets called whenever a device
* being handled by this driver is removed (either during
* deregistration of the driver or when it's manually
* pulled out of a hot-pluggable slot).
* The remove function always gets called from process
* context, so it can sleep.
* @suspend: Put device into low power state.
* @resume: Wake device from low power state.
* (Please see Documentation/power/pci.rst for descriptions
* of PCI Power Management and the related functions.)
* @shutdown: Hook into reboot_notifier_list (kernel/sys.c).
* Intended to stop any idling DMA operations.
* Useful for enabling wake-on-lan (NIC) or changing
* the power state of a device before reboot.
* e.g. drivers/net/e100.c.
* @sriov_configure: Optional driver callback to allow configuration of
* number of VFs to enable via sysfs "sriov_numvfs" file.
PCI/IOV: Add sysfs MSI-X vector assignment interface A typical cloud provider SR-IOV use case is to create many VFs for use by guest VMs. The VFs may not be assigned to a VM until a customer requests a VM of a certain size, e.g., number of CPUs. A VF may need MSI-X vectors proportional to the number of CPUs in the VM, but there is no standard way to change the number of MSI-X vectors supported by a VF. Some Mellanox ConnectX devices support dynamic assignment of MSI-X vectors to SR-IOV VFs. This can be done by the PF driver after VFs are enabled, and it can be done without affecting VFs that are already in use. The hardware supports a limited pool of MSI-X vectors that can be assigned to the PF or to individual VFs. This is device-specific behavior that requires support in the PF driver. Add a read-only "sriov_vf_total_msix" sysfs file for the PF and a writable "sriov_vf_msix_count" file for each VF. Management software may use these to learn how many MSI-X vectors are available and to dynamically assign them to VFs before the VFs are passed through to a VM. If the PF driver implements the ->sriov_get_vf_total_msix() callback, "sriov_vf_total_msix" contains the total number of MSI-X vectors available for distribution among VFs. If no driver is bound to the VF, writing "N" to "sriov_vf_msix_count" uses the PF driver ->sriov_set_msix_vec_count() callback to assign "N" MSI-X vectors to the VF. When a VF driver subsequently reads the MSI-X Message Control register, it will see the new Table Size "N". Link: https://lore.kernel.org/linux-pci/20210314124256.70253-2-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-04-04 15:22:18 +08:00
* @sriov_set_msix_vec_count: PF Driver callback to change number of MSI-X
* vectors on a VF. Triggered via sysfs "sriov_vf_msix_count".
* This will change MSI-X Table Size in the VF Message Control
* registers.
* @sriov_get_vf_total_msix: PF driver callback to get the total number of
* MSI-X vectors available for distribution to the VFs.
* @err_handler: See Documentation/PCI/pci-error-recovery.rst
* @groups: Sysfs attribute groups.
* @dev_groups: Attributes attached to the device that will be
* created once it is bound to the driver.
* @driver: Driver model structure.
* @dynids: List of dynamically added device IDs.
bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers with the device DMA managed by kernel drivers or user-space applications. Unfortunately, multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. The DMA on these devices must either be entirely under kernel control or userspace control, never a mixture. Otherwise the driver integrity is not guaranteed because they could access each other through the peer-to-peer accesses which by-pass the IOMMU protection. This checks and sets the default DMA mode during driver binding, and cleanups during driver unbinding. In the default mode, the device DMA is managed by the device driver which handles DMA operations through the kernel DMA APIs (see Documentation/core-api/dma-api.rst). For cases where the devices are assigned for userspace control through the userspace driver framework(i.e. VFIO), the drivers(for example, vfio_pci/ vfio_platfrom etc.) may set a new flag (driver_managed_dma) to skip this default setting in the assumption that the drivers know what they are doing with the device DMA. Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure is currently a problem. As things stand, the IOMMU driver ignored the initial iommu_probe_device() call when the device was added, since at that point it had no fwspec yet. In this situation, {of,acpi}_iommu_configure() are retriggering iommu_probe_device() after the IOMMU driver has seen the firmware data via .of_xlate to learn that it actually responsible for the given device. As the result, before that gets fixed, iommu_use_default_domain() goes at the end, and calls arch_teardown_dma_ops() if it fails. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:53 +08:00
* @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA.
* For most device drivers, no need to care about this flag
* as long as all DMAs are handled through the kernel DMA API.
* For some special ones, for example VFIO drivers, they know
* how to manage the DMA themselves and set this flag so that
* the IOMMU layer will allow them to setup and manage their
* own I/O address space.
*/
struct pci_driver {
struct list_head node;
const char *name;
const struct pci_device_id *id_table; /* Must be non-NULL for probe to be called */
int (*probe)(struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
void (*remove)(struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */
int (*suspend)(struct pci_dev *dev, pm_message_t state); /* Device suspended */
int (*resume)(struct pci_dev *dev); /* Device woken up */
void (*shutdown)(struct pci_dev *dev);
int (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
PCI/IOV: Add sysfs MSI-X vector assignment interface A typical cloud provider SR-IOV use case is to create many VFs for use by guest VMs. The VFs may not be assigned to a VM until a customer requests a VM of a certain size, e.g., number of CPUs. A VF may need MSI-X vectors proportional to the number of CPUs in the VM, but there is no standard way to change the number of MSI-X vectors supported by a VF. Some Mellanox ConnectX devices support dynamic assignment of MSI-X vectors to SR-IOV VFs. This can be done by the PF driver after VFs are enabled, and it can be done without affecting VFs that are already in use. The hardware supports a limited pool of MSI-X vectors that can be assigned to the PF or to individual VFs. This is device-specific behavior that requires support in the PF driver. Add a read-only "sriov_vf_total_msix" sysfs file for the PF and a writable "sriov_vf_msix_count" file for each VF. Management software may use these to learn how many MSI-X vectors are available and to dynamically assign them to VFs before the VFs are passed through to a VM. If the PF driver implements the ->sriov_get_vf_total_msix() callback, "sriov_vf_total_msix" contains the total number of MSI-X vectors available for distribution among VFs. If no driver is bound to the VF, writing "N" to "sriov_vf_msix_count" uses the PF driver ->sriov_set_msix_vec_count() callback to assign "N" MSI-X vectors to the VF. When a VF driver subsequently reads the MSI-X Message Control register, it will see the new Table Size "N". Link: https://lore.kernel.org/linux-pci/20210314124256.70253-2-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-04-04 15:22:18 +08:00
int (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
u32 (*sriov_get_vf_total_msix)(struct pci_dev *pf);
const struct pci_error_handlers *err_handler;
const struct attribute_group **groups;
const struct attribute_group **dev_groups;
struct device_driver driver;
struct pci_dynids dynids;
bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers with the device DMA managed by kernel drivers or user-space applications. Unfortunately, multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. The DMA on these devices must either be entirely under kernel control or userspace control, never a mixture. Otherwise the driver integrity is not guaranteed because they could access each other through the peer-to-peer accesses which by-pass the IOMMU protection. This checks and sets the default DMA mode during driver binding, and cleanups during driver unbinding. In the default mode, the device DMA is managed by the device driver which handles DMA operations through the kernel DMA APIs (see Documentation/core-api/dma-api.rst). For cases where the devices are assigned for userspace control through the userspace driver framework(i.e. VFIO), the drivers(for example, vfio_pci/ vfio_platfrom etc.) may set a new flag (driver_managed_dma) to skip this default setting in the assumption that the drivers know what they are doing with the device DMA. Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure is currently a problem. As things stand, the IOMMU driver ignored the initial iommu_probe_device() call when the device was added, since at that point it had no fwspec yet. In this situation, {of,acpi}_iommu_configure() are retriggering iommu_probe_device() after the IOMMU driver has seen the firmware data via .of_xlate to learn that it actually responsible for the given device. As the result, before that gets fixed, iommu_use_default_domain() goes at the end, and calls arch_teardown_dma_ops() if it fails. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-18 08:49:53 +08:00
bool driver_managed_dma;
};
static inline struct pci_driver *to_pci_driver(struct device_driver *drv)
{
return drv ? container_of(drv, struct pci_driver, driver) : NULL;
}
/**
* PCI_DEVICE - macro used to describe a specific PCI device
* @vend: the 16 bit PCI Vendor ID
* @dev: the 16 bit PCI Device ID
*
* This macro is used to create a struct pci_device_id that matches a
* specific device. The subvendor and subdevice fields will be set to
* PCI_ANY_ID.
*/
#define PCI_DEVICE(vend,dev) \
.vendor = (vend), .device = (dev), \
.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID
/**
* PCI_DEVICE_DRIVER_OVERRIDE - macro used to describe a PCI device with
* override_only flags.
* @vend: the 16 bit PCI Vendor ID
* @dev: the 16 bit PCI Device ID
* @driver_override: the 32 bit PCI Device override_only
*
* This macro is used to create a struct pci_device_id that matches only a
* driver_override device. The subvendor and subdevice fields will be set to
* PCI_ANY_ID.
*/
#define PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, driver_override) \
.vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \
.subdevice = PCI_ANY_ID, .override_only = (driver_override)
PCI / VFIO: Add 'override_only' support for VFIO PCI sub system Expose an 'override_only' helper macro (i.e. PCI_DRIVER_OVERRIDE_DEVICE_VFIO) for VFIO PCI sub system and add the required code to prefix its matching entries with "vfio_" in modules.alias file. It allows VFIO device drivers to include match entries in the modules.alias file produced by kbuild that are not used for normal driver autoprobing and module autoloading. Drivers using these match entries can be connected to the PCI device manually, by userspace, using the existing driver_override sysfs. For example the resulting modules.alias may have: alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci In this example mlx5_core and mlx5_vfio_pci match to the same PCI device. The kernel will autoload and autobind to mlx5_core but the kernel and udev mechanisms will ignore mlx5_vfio_pci. When userspace wants to change a device to the VFIO subsystem it can implement a generic algorithm: 1) Identify the sysfs path to the device: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 2) Get the modalias string from the kernel: $ cat /sys/bus/pci/devices/0000:01:00.0/modalias pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 3) Prefix it with vfio_: vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 4) Search modules.alias for the above string and select the entry that has the fewest *'s: alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci 5) modprobe the matched module name: $ modprobe mlx5_vfio_pci 6) cat the matched module name to driver_override: echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override 7) unbind device from original module echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind 8) probe PCI drivers (or explicitly bind to mlx5_vfio_pci) echo 0000:01:00.0 > /sys/bus/pci/drivers_probe The algorithm is independent of bus type. In future the other buses with VFIO device drivers, like platform and ACPI, can use this algorithm as well. This patch is the infrastructure to provide the information in the modules.alias to userspace. Convert the only VFIO pci_driver which results in one new line in the modules.alias: alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci Later series introduce additional HW specific VFIO PCI drivers, such as mlx5_vfio_pci. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # for pci.h Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-11-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26 18:39:09 +08:00
/**
* PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO
* "driver_override" PCI device.
* @vend: the 16 bit PCI Vendor ID
* @dev: the 16 bit PCI Device ID
*
* This macro is used to create a struct pci_device_id that matches a
* specific device. The subvendor and subdevice fields will be set to
* PCI_ANY_ID and the driver_override will be set to
* PCI_ID_F_VFIO_DRIVER_OVERRIDE.
*/
#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \
PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE)
/**
* PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem
* @vend: the 16 bit PCI Vendor ID
* @dev: the 16 bit PCI Device ID
* @subvend: the 16 bit PCI Subvendor ID
* @subdev: the 16 bit PCI Subdevice ID
*
* This macro is used to create a struct pci_device_id that matches a
* specific device with subsystem information.
*/
#define PCI_DEVICE_SUB(vend, dev, subvend, subdev) \
.vendor = (vend), .device = (dev), \
.subvendor = (subvend), .subdevice = (subdev)
/**
* PCI_DEVICE_CLASS - macro used to describe a specific PCI device class
* @dev_class: the class, subclass, prog-if triple for this device
* @dev_class_mask: the class mask for this device
*
* This macro is used to create a struct pci_device_id that matches a
* specific PCI class. The vendor, device, subvendor, and subdevice
* fields will be set to PCI_ANY_ID.
*/
#define PCI_DEVICE_CLASS(dev_class,dev_class_mask) \
.class = (dev_class), .class_mask = (dev_class_mask), \
.vendor = PCI_ANY_ID, .device = PCI_ANY_ID, \
.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID
/**
* PCI_VDEVICE - macro used to describe a specific PCI device in short form
* @vend: the vendor name
* @dev: the 16 bit PCI Device ID
*
* This macro is used to create a struct pci_device_id that matches a
* specific PCI device. The subvendor, and subdevice fields will be set
* to PCI_ANY_ID. The macro allows the next field to follow as the device
* private data.
*/
#define PCI_VDEVICE(vend, dev) \
.vendor = PCI_VENDOR_ID_##vend, .device = (dev), \
.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, 0, 0
/**
* PCI_DEVICE_DATA - macro used to describe a specific PCI device in very short form
* @vend: the vendor name (without PCI_VENDOR_ID_ prefix)
* @dev: the device name (without PCI_DEVICE_ID_<vend>_ prefix)
* @data: the driver data to be filled
*
* This macro is used to create a struct pci_device_id that matches a
* specific PCI device. The subvendor, and subdevice fields will be set
* to PCI_ANY_ID.
*/
#define PCI_DEVICE_DATA(vend, dev, data) \
.vendor = PCI_VENDOR_ID_##vend, .device = PCI_DEVICE_ID_##vend##_##dev, \
.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, 0, 0, \
.driver_data = (kernel_ulong_t)(data)
enum {
PCI_REASSIGN_ALL_RSRC = 0x00000001, /* Ignore firmware setup */
PCI_REASSIGN_ALL_BUS = 0x00000002, /* Reassign all bus numbers */
PCI_PROBE_ONLY = 0x00000004, /* Use existing setup */
PCI_CAN_SKIP_ISA_ALIGN = 0x00000008, /* Don't do ISA alignment */
PCI_ENABLE_PROC_DOMAINS = 0x00000010, /* Enable domains in /proc */
PCI_COMPAT_DOMAIN_0 = 0x00000020, /* ... except domain 0 */
PCI_SCAN_ALL_PCIE_DEVS = 0x00000040, /* Scan all, not just dev 0 */
};
#define PCI_IRQ_LEGACY (1 << 0) /* Allow legacy interrupts */
#define PCI_IRQ_MSI (1 << 1) /* Allow MSI interrupts */
#define PCI_IRQ_MSIX (1 << 2) /* Allow MSI-X interrupts */
#define PCI_IRQ_AFFINITY (1 << 3) /* Auto-assign affinity */
/* These external functions are only available when PCI support is enabled */
#ifdef CONFIG_PCI
extern unsigned int pci_flags;
static inline void pci_set_flags(int flags) { pci_flags = flags; }
static inline void pci_add_flags(int flags) { pci_flags |= flags; }
static inline void pci_clear_flags(int flags) { pci_flags &= ~flags; }
static inline int pci_has_flag(int flag) { return pci_flags & flag; }
void pcie_bus_configure_settings(struct pci_bus *bus);
PCI: Set PCI-E Max Payload Size on fabric On a given PCI-E fabric, each device, bridge, and root port can have a different PCI-E maximum payload size. There is a sizable performance boost for having the largest possible maximum payload size on each PCI-E device. However, if improperly configured, fatal bus errors can occur. Thus, it is important to ensure that PCI-E payloads sends by a device are never larger than the MPS setting of all devices on the way to the destination. This can be achieved two ways: - A conservative approach is to use the smallest common denominator of the entire tree below a root complex for every device on that fabric. This means for example that having a 128 bytes MPS USB controller on one leg of a switch will dramatically reduce performances of a video card or 10GE adapter on another leg of that same switch. It also means that any hierarchy supporting hotplug slots (including expresscard or thunderbolt I suppose, dbl check that) will have to be entirely clamped to 128 bytes since we cannot predict what will be plugged into those slots, and we cannot change the MPS on a "live" system. - A more optimal way is possible, if it falls within a couple of constraints: * The top-level host bridge will never generate packets larger than the smallest TLP (or if it can be controlled independently from its MPS at least) * The device will never generate packets larger than MPS (which can be configured via MRRS) * No support of direct PCI-E <-> PCI-E transfers between devices without some additional code to specifically deal with that case Then we can use an approach that basically ignores downstream requests and focuses exclusively on upstream requests. In that case, all we need to care about is that a device MPS is no larger than its parent MPS, which allows us to keep all switches/bridges to the max MPS supported by their parent and eventually the PHB. In this case, your USB controller would no longer "starve" your 10GE Ethernet and your hotplug slots won't affect your global MPS. Additionally, the hotplugged devices themselves can be configured to a larger MPS up to the value configured in the hotplug bridge. To choose between the two available options, two PCI kernel boot args have been added to the PCI calls. "pcie_bus_safe" will provide the former behavior, while "pcie_bus_perf" will perform the latter behavior. By default, the latter behavior is used. NOTE: due to the location of the enablement, each arch will need to add calls to this function. This patch only enables x86. This patch includes a number of changes recommended by Benjamin Herrenschmidt. Tested-by: Jordan_Hargrave@dell.com Signed-off-by: Jon Mason <mason@myri.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-21 04:20:54 +08:00
enum pcie_bus_config_types {
PCIE_BUS_TUNE_OFF, /* Don't touch MPS at all */
PCIE_BUS_DEFAULT, /* Ensure MPS matches upstream bridge */
PCIE_BUS_SAFE, /* Use largest MPS boot-time devices support */
PCIE_BUS_PERFORMANCE, /* Use MPS and MRRS for best performance */
PCIE_BUS_PEER2PEER, /* Set MPS = 128 for all devices */
PCI: Set PCI-E Max Payload Size on fabric On a given PCI-E fabric, each device, bridge, and root port can have a different PCI-E maximum payload size. There is a sizable performance boost for having the largest possible maximum payload size on each PCI-E device. However, if improperly configured, fatal bus errors can occur. Thus, it is important to ensure that PCI-E payloads sends by a device are never larger than the MPS setting of all devices on the way to the destination. This can be achieved two ways: - A conservative approach is to use the smallest common denominator of the entire tree below a root complex for every device on that fabric. This means for example that having a 128 bytes MPS USB controller on one leg of a switch will dramatically reduce performances of a video card or 10GE adapter on another leg of that same switch. It also means that any hierarchy supporting hotplug slots (including expresscard or thunderbolt I suppose, dbl check that) will have to be entirely clamped to 128 bytes since we cannot predict what will be plugged into those slots, and we cannot change the MPS on a "live" system. - A more optimal way is possible, if it falls within a couple of constraints: * The top-level host bridge will never generate packets larger than the smallest TLP (or if it can be controlled independently from its MPS at least) * The device will never generate packets larger than MPS (which can be configured via MRRS) * No support of direct PCI-E <-> PCI-E transfers between devices without some additional code to specifically deal with that case Then we can use an approach that basically ignores downstream requests and focuses exclusively on upstream requests. In that case, all we need to care about is that a device MPS is no larger than its parent MPS, which allows us to keep all switches/bridges to the max MPS supported by their parent and eventually the PHB. In this case, your USB controller would no longer "starve" your 10GE Ethernet and your hotplug slots won't affect your global MPS. Additionally, the hotplugged devices themselves can be configured to a larger MPS up to the value configured in the hotplug bridge. To choose between the two available options, two PCI kernel boot args have been added to the PCI calls. "pcie_bus_safe" will provide the former behavior, while "pcie_bus_perf" will perform the latter behavior. By default, the latter behavior is used. NOTE: due to the location of the enablement, each arch will need to add calls to this function. This patch only enables x86. This patch includes a number of changes recommended by Benjamin Herrenschmidt. Tested-by: Jordan_Hargrave@dell.com Signed-off-by: Jon Mason <mason@myri.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-21 04:20:54 +08:00
};
extern enum pcie_bus_config_types pcie_bus_config;
extern struct bus_type pci_bus_type;
/* Do NOT directly access these two variables, unless you are arch-specific PCI
* code, or PCI core code. */
extern struct list_head pci_root_buses; /* List of all known PCI buses */
/* Some device drivers need know if PCI is initiated */
int no_pci_devices(void);
void pcibios_resource_survey_bus(struct pci_bus *bus);
void pcibios_bus_add_device(struct pci_dev *pdev);
void pcibios_add_bus(struct pci_bus *bus);
void pcibios_remove_bus(struct pci_bus *bus);
void pcibios_fixup_bus(struct pci_bus *);
int __must_check pcibios_enable_device(struct pci_dev *, int mask);
/* Architecture-specific versions may override this (weak) */
char *pcibios_setup(char *str);
/* Used only when drivers/pci/setup.c is used */
resource_size_t pcibios_align_resource(void *, const struct resource *,
resource_size_t,
resource_size_t);
/* Weak but can be overridden by arch */
void pci_fixup_cardbus(struct pci_bus *);
/* Generic PCI functions used internally */
void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
struct resource *res);
void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
struct pci_bus_region *region);
void pcibios_scan_specific_bus(int busn);
struct pci_bus *pci_find_bus(int domain, int busnr);
void pci_bus_add_devices(const struct pci_bus *bus);
struct pci_bus *pci_scan_bus(int bus, struct pci_ops *ops, void *sysdata);
struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
struct pci_ops *ops, void *sysdata,
struct list_head *resources);
int pci_host_probe(struct pci_host_bridge *bridge);
int pci_bus_insert_busn_res(struct pci_bus *b, int bus, int busmax);
int pci_bus_update_busn_res_end(struct pci_bus *b, int busmax);
void pci_bus_release_busn_res(struct pci_bus *b);
struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
struct pci_ops *ops, void *sysdata,
struct list_head *resources);
int pci_scan_root_bus_bridge(struct pci_host_bridge *bridge);
struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
int busnr);
PCI: introduce pci_slot Currently, /sys/bus/pci/slots/ only exposes hotplug attributes when a hotplug driver is loaded, but PCI slots have attributes such as address, speed, width, etc. that are not related to hotplug at all. Introduce pci_slot as the primary data structure and kobject model. Hotplug attributes described in hotplug_slot become a secondary structure associated with the pci_slot. This patch only creates the infrastructure that allows the separation of PCI slot attributes and hotplug attributes. In this patch, the PCI hotplug core remains the only user of this infrastructure, and thus, /sys/bus/pci/slots/ will still only become populated when a hotplug driver is loaded. A later patch in this series will add a second user of this new infrastructure and demonstrate splitting the task of exposing pci_slot attributes from hotplug_slot attributes. - Make pci_slot the primary sysfs entity. hotplug_slot becomes a subsidiary structure. o pci_create_slot() creates and registers a slot with the PCI core o pci_slot_add_hotplug() gives it hotplug capability - Change the prototype of pci_hp_register() to take the bus and slot number (on parent bus) as parameters. - Remove all the ->get_address methods since this functionality is now handled by pci_slot directly. [achiang@hp.com: rpaphp-correctly-pci_hp_register-for-empty-pci-slots] Tested-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: make headers_check happy] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: fix typo in #include] Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Matthew Wilcox <matthew@wil.cx> Cc: Greg KH <greg@kroah.com> Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-11 05:28:50 +08:00
struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
const char *name,
struct hotplug_slot *hotplug);
PCI: introduce pci_slot Currently, /sys/bus/pci/slots/ only exposes hotplug attributes when a hotplug driver is loaded, but PCI slots have attributes such as address, speed, width, etc. that are not related to hotplug at all. Introduce pci_slot as the primary data structure and kobject model. Hotplug attributes described in hotplug_slot become a secondary structure associated with the pci_slot. This patch only creates the infrastructure that allows the separation of PCI slot attributes and hotplug attributes. In this patch, the PCI hotplug core remains the only user of this infrastructure, and thus, /sys/bus/pci/slots/ will still only become populated when a hotplug driver is loaded. A later patch in this series will add a second user of this new infrastructure and demonstrate splitting the task of exposing pci_slot attributes from hotplug_slot attributes. - Make pci_slot the primary sysfs entity. hotplug_slot becomes a subsidiary structure. o pci_create_slot() creates and registers a slot with the PCI core o pci_slot_add_hotplug() gives it hotplug capability - Change the prototype of pci_hp_register() to take the bus and slot number (on parent bus) as parameters. - Remove all the ->get_address methods since this functionality is now handled by pci_slot directly. [achiang@hp.com: rpaphp-correctly-pci_hp_register-for-empty-pci-slots] Tested-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: make headers_check happy] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: fix typo in #include] Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Matthew Wilcox <matthew@wil.cx> Cc: Greg KH <greg@kroah.com> Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-11 05:28:50 +08:00
void pci_destroy_slot(struct pci_slot *slot);
#ifdef CONFIG_SYSFS
void pci_dev_assign_slot(struct pci_dev *dev);
#else
static inline void pci_dev_assign_slot(struct pci_dev *dev) { }
#endif
int pci_scan_slot(struct pci_bus *bus, int devfn);
struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
unsigned int pci_scan_child_bus(struct pci_bus *bus);
void pci_bus_add_device(struct pci_dev *dev);
void pci_read_bridge_bases(struct pci_bus *child);
struct resource *pci_find_parent_resource(const struct pci_dev *dev,
struct resource *res);
u8 pci_swizzle_interrupt_pin(const struct pci_dev *dev, u8 pin);
int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
struct pci_dev *pci_dev_get(struct pci_dev *dev);
void pci_dev_put(struct pci_dev *dev);
void pci_remove_bus(struct pci_bus *b);
void pci_stop_and_remove_bus_device(struct pci_dev *dev);
void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
void pci_stop_root_bus(struct pci_bus *bus);
void pci_remove_root_bus(struct pci_bus *bus);
void pci_setup_cardbus(struct pci_bus *bus);
void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type);
void pci_sort_breadthfirst(void);
#define dev_is_pci(d) ((d)->bus == &pci_bus_type)
#define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false))
/* Generic PCI functions exported to card drivers */
u8 pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);
u8 pci_find_capability(struct pci_dev *dev, int cap);
u8 pci_find_next_capability(struct pci_dev *dev, u8 pos, int cap);
u8 pci_find_ht_capability(struct pci_dev *dev, int ht_cap);
u8 pci_find_next_ht_capability(struct pci_dev *dev, u8 pos, int ht_cap);
u16 pci_find_ext_capability(struct pci_dev *dev, int cap);
u16 pci_find_next_ext_capability(struct pci_dev *dev, u16 pos, int cap);
struct pci_bus *pci_find_next_bus(const struct pci_bus *from);
u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap);
u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec);
u64 pci_get_dsn(struct pci_dev *dev);
struct pci_dev *pci_get_device(unsigned int vendor, unsigned int device,
struct pci_dev *from);
struct pci_dev *pci_get_subsys(unsigned int vendor, unsigned int device,
unsigned int ss_vendor, unsigned int ss_device,
struct pci_dev *from);
struct pci_dev *pci_get_slot(struct pci_bus *bus, unsigned int devfn);
struct pci_dev *pci_get_domain_bus_and_slot(int domain, unsigned int bus,
unsigned int devfn);
struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
int pci_dev_present(const struct pci_device_id *ids);
int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
int where, u8 *val);
int pci_bus_read_config_word(struct pci_bus *bus, unsigned int devfn,
int where, u16 *val);
int pci_bus_read_config_dword(struct pci_bus *bus, unsigned int devfn,
int where, u32 *val);
int pci_bus_write_config_byte(struct pci_bus *bus, unsigned int devfn,
int where, u8 val);
int pci_bus_write_config_word(struct pci_bus *bus, unsigned int devfn,
int where, u16 val);
int pci_bus_write_config_dword(struct pci_bus *bus, unsigned int devfn,
int where, u32 val);
int pci_generic_config_read(struct pci_bus *bus, unsigned int devfn,
int where, int size, u32 *val);
int pci_generic_config_write(struct pci_bus *bus, unsigned int devfn,
int where, int size, u32 val);
int pci_generic_config_read32(struct pci_bus *bus, unsigned int devfn,
int where, int size, u32 *val);
int pci_generic_config_write32(struct pci_bus *bus, unsigned int devfn,
int where, int size, u32 val);
struct pci_ops *pci_bus_set_ops(struct pci_bus *bus, struct pci_ops *ops);
int pci_read_config_byte(const struct pci_dev *dev, int where, u8 *val);
int pci_read_config_word(const struct pci_dev *dev, int where, u16 *val);
int pci_read_config_dword(const struct pci_dev *dev, int where, u32 *val);
int pci_write_config_byte(const struct pci_dev *dev, int where, u8 val);
int pci_write_config_word(const struct pci_dev *dev, int where, u16 val);
int pci_write_config_dword(const struct pci_dev *dev, int where, u32 val);
int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val);
int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val);
int pcie_capability_write_word(struct pci_dev *dev, int pos, u16 val);
int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val);
int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
u16 clear, u16 set);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set);
static inline int pcie_capability_set_word(struct pci_dev *dev, int pos,
u16 set)
{
return pcie_capability_clear_and_set_word(dev, pos, 0, set);
}
static inline int pcie_capability_set_dword(struct pci_dev *dev, int pos,
u32 set)
{
return pcie_capability_clear_and_set_dword(dev, pos, 0, set);
}
static inline int pcie_capability_clear_word(struct pci_dev *dev, int pos,
u16 clear)
{
return pcie_capability_clear_and_set_word(dev, pos, clear, 0);
}
static inline int pcie_capability_clear_dword(struct pci_dev *dev, int pos,
u32 clear)
{
return pcie_capability_clear_and_set_dword(dev, pos, clear, 0);
}
/* User-space driven config access */
int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 *val);
int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 val);
int __must_check pci_enable_device(struct pci_dev *dev);
int __must_check pci_enable_device_io(struct pci_dev *dev);
int __must_check pci_enable_device_mem(struct pci_dev *dev);
int __must_check pci_reenable_device(struct pci_dev *);
int __must_check pcim_enable_device(struct pci_dev *pdev);
void pcim_pin_device(struct pci_dev *pdev);
static inline bool pci_intx_mask_supported(struct pci_dev *pdev)
{
/*
* INTx masking is supported if PCI_COMMAND_INTX_DISABLE is
* writable and no quirk has marked the feature broken.
*/
return !pdev->broken_intx_masking;
}
static inline int pci_is_enabled(struct pci_dev *pdev)
{
return (atomic_read(&pdev->enable_cnt) > 0);
}
static inline int pci_is_managed(struct pci_dev *pdev)
{
return pdev->is_managed;
}
void pci_disable_device(struct pci_dev *dev);
extern unsigned int pcibios_max_latency;
void pci_set_master(struct pci_dev *dev);
void pci_clear_master(struct pci_dev *dev);
int pci_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state);
int pci_set_cacheline_size(struct pci_dev *dev);
int __must_check pci_set_mwi(struct pci_dev *dev);
int __must_check pcim_set_mwi(struct pci_dev *dev);
int pci_try_set_mwi(struct pci_dev *dev);
void pci_clear_mwi(struct pci_dev *dev);
void pci_disable_parity(struct pci_dev *dev);
void pci_intx(struct pci_dev *dev, int enable);
bool pci_check_and_mask_intx(struct pci_dev *dev);
bool pci_check_and_unmask_intx(struct pci_dev *dev);
int pci_wait_for_pending(struct pci_dev *dev, int pos, u16 mask);
int pci_wait_for_pending_transaction(struct pci_dev *dev);
int pcix_get_max_mmrbc(struct pci_dev *dev);
int pcix_get_mmrbc(struct pci_dev *dev);
int pcix_set_mmrbc(struct pci_dev *dev, int mmrbc);
int pcie_get_readrq(struct pci_dev *dev);
int pcie_set_readrq(struct pci_dev *dev, int rq);
PCI: Set PCI-E Max Payload Size on fabric On a given PCI-E fabric, each device, bridge, and root port can have a different PCI-E maximum payload size. There is a sizable performance boost for having the largest possible maximum payload size on each PCI-E device. However, if improperly configured, fatal bus errors can occur. Thus, it is important to ensure that PCI-E payloads sends by a device are never larger than the MPS setting of all devices on the way to the destination. This can be achieved two ways: - A conservative approach is to use the smallest common denominator of the entire tree below a root complex for every device on that fabric. This means for example that having a 128 bytes MPS USB controller on one leg of a switch will dramatically reduce performances of a video card or 10GE adapter on another leg of that same switch. It also means that any hierarchy supporting hotplug slots (including expresscard or thunderbolt I suppose, dbl check that) will have to be entirely clamped to 128 bytes since we cannot predict what will be plugged into those slots, and we cannot change the MPS on a "live" system. - A more optimal way is possible, if it falls within a couple of constraints: * The top-level host bridge will never generate packets larger than the smallest TLP (or if it can be controlled independently from its MPS at least) * The device will never generate packets larger than MPS (which can be configured via MRRS) * No support of direct PCI-E <-> PCI-E transfers between devices without some additional code to specifically deal with that case Then we can use an approach that basically ignores downstream requests and focuses exclusively on upstream requests. In that case, all we need to care about is that a device MPS is no larger than its parent MPS, which allows us to keep all switches/bridges to the max MPS supported by their parent and eventually the PHB. In this case, your USB controller would no longer "starve" your 10GE Ethernet and your hotplug slots won't affect your global MPS. Additionally, the hotplugged devices themselves can be configured to a larger MPS up to the value configured in the hotplug bridge. To choose between the two available options, two PCI kernel boot args have been added to the PCI calls. "pcie_bus_safe" will provide the former behavior, while "pcie_bus_perf" will perform the latter behavior. By default, the latter behavior is used. NOTE: due to the location of the enablement, each arch will need to add calls to this function. This patch only enables x86. This patch includes a number of changes recommended by Benjamin Herrenschmidt. Tested-by: Jordan_Hargrave@dell.com Signed-off-by: Jon Mason <mason@myri.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-21 04:20:54 +08:00
int pcie_get_mps(struct pci_dev *dev);
int pcie_set_mps(struct pci_dev *dev, int mps);
u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
enum pci_bus_speed *speed,
enum pcie_link_width *width);
void pcie_print_link_status(struct pci_dev *dev);
int pcie_reset_flr(struct pci_dev *dev, bool probe);
int pcie_flr(struct pci_dev *dev);
int __pci_reset_function_locked(struct pci_dev *dev);
int pci_reset_function(struct pci_dev *dev);
int pci_reset_function_locked(struct pci_dev *dev);
int pci_try_reset_function(struct pci_dev *dev);
int pci_probe_reset_slot(struct pci_slot *slot);
int pci_probe_reset_bus(struct pci_bus *bus);
int pci_reset_bus(struct pci_dev *dev);
void pci_reset_secondary_bus(struct pci_dev *dev);
void pcibios_reset_secondary_bus(struct pci_dev *dev);
void pci_update_resource(struct pci_dev *dev, int resno);
int __must_check pci_assign_resource(struct pci_dev *dev, int i);
int __must_check pci_reassign_resource(struct pci_dev *dev, int i, resource_size_t add_size, resource_size_t align);
void pci_release_resource(struct pci_dev *dev, int resno);
static inline int pci_rebar_bytes_to_size(u64 bytes)
{
bytes = roundup_pow_of_two(bytes);
/* Return BAR size as defined in the resizable BAR specification */
return max(ilog2(bytes), 20) - 20;
}
u32 pci_rebar_get_possible_sizes(struct pci_dev *pdev, int bar);
int __must_check pci_resize_resource(struct pci_dev *dev, int i, int size);
int pci_select_bars(struct pci_dev *dev, unsigned long flags);
bool pci_device_is_present(struct pci_dev *pdev);
void pci_ignore_hotplug(struct pci_dev *dev);
struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
int pci_status_get_and_clear_errors(struct pci_dev *pdev);
int __printf(6, 7) pci_request_irq(struct pci_dev *dev, unsigned int nr,
irq_handler_t handler, irq_handler_t thread_fn, void *dev_id,
const char *fmt, ...);
void pci_free_irq(struct pci_dev *dev, unsigned int nr, void *dev_id);
/* ROM control related routines */
int pci_enable_rom(struct pci_dev *pdev);
void pci_disable_rom(struct pci_dev *pdev);
void __iomem __must_check *pci_map_rom(struct pci_dev *pdev, size_t *size);
void pci_unmap_rom(struct pci_dev *pdev, void __iomem *rom);
/* Power management related routines */
int pci_save_state(struct pci_dev *dev);
void pci_restore_state(struct pci_dev *dev);
struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev);
int pci_load_saved_state(struct pci_dev *dev,
struct pci_saved_state *state);
int pci_load_and_free_saved_state(struct pci_dev *dev,
struct pci_saved_state **state);
int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state);
int pci_set_power_state(struct pci_dev *dev, pci_power_t state);
pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state);
bool pci_pme_capable(struct pci_dev *dev, pci_power_t state);
void pci_pme_active(struct pci_dev *dev, bool enable);
int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable);
int pci_wake_from_d3(struct pci_dev *dev, bool enable);
int pci_prepare_to_sleep(struct pci_dev *dev);
int pci_back_from_sleep(struct pci_dev *dev);
PCI / ACPI / PM: Platform support for PCI PME wake-up Although the majority of PCI devices can generate PMEs that in principle may be used to wake up devices suspended at run time, platform support is generally necessary to convert PMEs into wake-up events that can be delivered to the kernel. If ACPI is used for this purpose, PME signals generated by a PCI device will trigger the ACPI GPE associated with the device to generate an ACPI wake-up event that we can set up a handler for, provided that everything is configured correctly. Unfortunately, the subset of PCI devices that have GPEs associated with them is quite limited. The devices without dedicated GPEs have to rely on the GPEs associated with other devices (in the majority of cases their upstream bridges and, possibly, the root bridge) to generate ACPI wake-up events in response to PME signals from them. Add ACPI platform support for PCI PME wake-up: o Add a framework making is possible to use ACPI system notify handlers for run-time PM. o Add new PCI platform callback ->run_wake() to struct pci_platform_pm_ops allowing us to enable/disable the platform to generate wake-up events for given device. Implemet this callback for the ACPI platform. o Define ACPI wake-up handlers for PCI devices and PCI root buses and make the PCI-ACPI binding code register wake-up notifiers for all PCI devices present in the ACPI tables. o Add function pci_dev_run_wake() which can be used by PCI drivers to check if given device is capable of generating wake-up events at run time. Developed in cooperation with Matthew Garrett <mjg@redhat.com>. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-18 06:44:09 +08:00
bool pci_dev_run_wake(struct pci_dev *dev);
PCI: Put PCIe ports into D3 during suspend Currently the Linux PCI core does not touch power state of PCI bridges and PCIe ports when system suspend is entered. Leaving them in D0 consumes power unnecessarily and may prevent the CPU from entering deeper C-states. With recent PCIe hardware we can power down the ports to save power given that we take into account few restrictions: - The PCIe port hardware is recent enough, starting from 2015. - Devices connected to PCIe ports are effectively in D3cold once the port is transitioned to D3 (the config space is not accessible anymore and the link may be powered down). - Devices behind the PCIe port need to be allowed to transition to D3cold and back. There is a way both drivers and userspace can forbid this. - If the device behind the PCIe port is capable of waking the system it needs to be able to do so from D3cold. This patch adds a new flag to struct pci_device called 'bridge_d3'. This flag is set and cleared by the PCI core whenever there is a change in power management state of any of the devices behind the PCIe port. When system later on is suspended we only need to check this flag and if it is true transition the port to D3 otherwise we leave it in D0. Also provide override mechanism via command line parameter "pcie_port_pm=[off|force]" that can be used to disable or enable the feature regardless of the BIOS manufacturing date. Tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-02 16:17:12 +08:00
void pci_d3cold_enable(struct pci_dev *dev);
void pci_d3cold_disable(struct pci_dev *dev);
bool pcie_relaxed_ordering_enabled(struct pci_dev *dev);
void pci_resume_bus(struct pci_bus *bus);
void pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state);
/* For use by arch with custom probe code */
void set_pcie_port_type(struct pci_dev *pdev);
void set_pcie_hotplug_bridge(struct pci_dev *pdev);
/* Functions for PCI Hotplug drivers to use */
unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge);
unsigned int pci_rescan_bus(struct pci_bus *bus);
void pci_lock_rescan_remove(void);
void pci_unlock_rescan_remove(void);
/* Vital Product Data routines */
ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
ssize_t pci_read_vpd_any(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
ssize_t pci_write_vpd_any(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
/* Helper functions for low-level code (drivers/pci/setup-[bus,res].c) */
resource_size_t pcibios_retrieve_fw_addr(struct pci_dev *dev, int idx);
void pci_bus_assign_resources(const struct pci_bus *bus);
void pci_bus_claim_resources(struct pci_bus *bus);
void pci_bus_size_bridges(struct pci_bus *bus);
int pci_claim_resource(struct pci_dev *, int);
PCI: Add pci_claim_bridge_resource() to clip window if necessary Add pci_claim_bridge_resource() to claim a PCI-PCI bridge window. This is like regular pci_claim_resource(), except that if we fail to claim the window, we check to see if we can reduce the size of the window and try again. This is for scenarios like this: pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff] pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref] pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref] The 00:01.0 window is illegal: it starts before the host bridge window, so we have to assume the [0xbdf00000-0xbfffffff] region is inaccessible. We can make it legal by clipping it to [mem 0xc0000000-0xddefffff 64bit pref]. Previously we discarded the 00:01.0 window and tried to reassign that part of the hierarchy from scratch. That is a problem because Linux doesn't always assign things optimally. For example, in this case, BIOS put the 01:00.0 device in a prefetchable window below 4GB, but after 5b28541552ef, Linux puts the prefetchable window above 4GB where the 32-bit 01:00.0 device can't use it. Clipping the 00:01.0 window is less intrusive than completely reassigning things and is sufficient to let us use most of the BIOS configuration. Of course, it's possible that devices below 00:01.0 will no longer fit. If that's the case, we'll have to reassign things. But that's a separate problem. [bhelgaas: changelog, split into separate patch] Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491 Reported-by: Marek Kordik <kordikmarek@gmail.com> Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources") Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.16+
2015-01-16 06:21:49 +08:00
int pci_claim_bridge_resource(struct pci_dev *bridge, int i);
void pci_assign_unassigned_resources(void);
void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge);
void pci_assign_unassigned_bus_resources(struct pci_bus *bus);
void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus);
int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type);
void pdev_enable_device(struct pci_dev *);
int pci_enable_resources(struct pci_dev *, int mask);
void pci_assign_irq(struct pci_dev *dev);
struct resource *pci_find_resource(struct pci_dev *dev, struct resource *res);
#define HAVE_PCI_REQ_REGIONS 2
int __must_check pci_request_regions(struct pci_dev *, const char *);
int __must_check pci_request_regions_exclusive(struct pci_dev *, const char *);
void pci_release_regions(struct pci_dev *);
int __must_check pci_request_region(struct pci_dev *, int, const char *);
void pci_release_region(struct pci_dev *, int);
int pci_request_selected_regions(struct pci_dev *, int, const char *);
int pci_request_selected_regions_exclusive(struct pci_dev *, int, const char *);
void pci_release_selected_regions(struct pci_dev *, int);
/* drivers/pci/bus.c */
void pci_add_resource(struct list_head *resources, struct resource *res);
void pci_add_resource_offset(struct list_head *resources, struct resource *res,
resource_size_t offset);
void pci_free_resource_list(struct list_head *resources);
void pci_bus_add_resource(struct pci_bus *bus, struct resource *res,
unsigned int flags);
struct resource *pci_bus_resource_n(const struct pci_bus *bus, int n);
void pci_bus_remove_resources(struct pci_bus *bus);
int devm_request_pci_bus_resources(struct device *dev,
struct list_head *resources);
/* Temporary until new and working PCI SBR API in place */
int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
#define pci_bus_for_each_resource(bus, res, i) \
for (i = 0; \
(res = pci_bus_resource_n(bus, i)) || i < PCI_BRIDGE_RESOURCE_NUM; \
i++)
int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
struct resource *res, resource_size_t size,
resource_size_t align, resource_size_t min,
unsigned long type_mask,
resource_size_t (*alignf)(void *,
const struct resource *,
resource_size_t,
resource_size_t),
void *alignf_data);
int pci_register_io_range(struct fwnode_handle *fwnode, phys_addr_t addr,
resource_size_t size);
unsigned long pci_address_to_pio(phys_addr_t addr);
phys_addr_t pci_pio_to_address(unsigned long pio);
int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);
PCI: OF: Fix I/O space page leak When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY driver was left disabled, the kernel crashed with this BUG: kernel BUG at lib/ioremap.c:72! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092 Hardware name: Renesas Condor board based on r8a77980 (DT) Workqueue: events deferred_probe_work_func pstate: 80000005 (Nzcv daif -PAN -UAO) pc : ioremap_page_range+0x370/0x3c8 lr : ioremap_page_range+0x40/0x3c8 sp : ffff000008da39e0 x29: ffff000008da39e0 x28: 00e8000000000f07 x27: ffff7dfffee00000 x26: 0140000000000000 x25: ffff7dfffef00000 x24: 00000000000fe100 x23: ffff80007b906000 x22: ffff000008ab8000 x21: ffff000008bb1d58 x20: ffff7dfffef00000 x19: ffff800009c30fb8 x18: 0000000000000001 x17: 00000000000152d0 x16: 00000000014012d0 x15: 0000000000000000 x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720 x11: 0720072007300730 x10: 00000000000000ae x9 : 0000000000000000 x8 : ffff7dffff000000 x7 : 0000000000000000 x6 : 0000000000000100 x5 : 0000000000000000 x4 : 000000007b906000 x3 : ffff80007c61a880 x2 : ffff7dfffeefffff x1 : 0000000040000000 x0 : 00e80000fe100f07 Process kworker/0:1 (pid: 39, stack limit = 0x (ptrval)) Call trace: ioremap_page_range+0x370/0x3c8 pci_remap_iospace+0x7c/0xac pci_parse_request_of_pci_ranges+0x13c/0x190 rcar_pcie_probe+0x4c/0xb04 platform_drv_probe+0x50/0xbc driver_probe_device+0x21c/0x308 __device_attach_driver+0x98/0xc8 bus_for_each_drv+0x54/0x94 __device_attach+0xc4/0x12c device_initial_probe+0x10/0x18 bus_probe_device+0x90/0x98 deferred_probe_work_func+0xb0/0x150 process_one_work+0x12c/0x29c worker_thread+0x200/0x3fc kthread+0x108/0x134 ret_from_fork+0x10/0x18 Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000) It turned out that pci_remap_iospace() wasn't undone when the driver's probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER, the probe was retried, finally causing the BUG due to trying to remap already remapped pages. Introduce the devm_pci_remap_iospace() managed API and replace the pci_remap_iospace() call with it to fix the bug. Fixes: dbf9826d5797 ("PCI: generic: Convert to DT resource parsing API") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> [lorenzo.pieralisi@arm.com: split commit/updated the commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-19 04:40:26 +08:00
int devm_pci_remap_iospace(struct device *dev, const struct resource *res,
phys_addr_t phys_addr);
void pci_unmap_iospace(struct resource *res);
void __iomem *devm_pci_remap_cfgspace(struct device *dev,
resource_size_t offset,
resource_size_t size);
void __iomem *devm_pci_remap_cfg_resource(struct device *dev,
struct resource *res);
PCI: Add pci_bus_addr_t David Ahern reported that d63e2e1f3df9 ("sparc/PCI: Clip bridge windows to fit in upstream windows") fails to boot on sparc/T5-8: pci 0000:06:00.0: reg 0x184: can't handle BAR above 4GB (bus address 0x110204000) The problem is that sparc64 assumed that dma_addr_t only needed to hold DMA addresses, i.e., bus addresses returned via the DMA API (dma_map_single(), etc.), while the PCI core assumed dma_addr_t could hold *any* bus address, including raw BAR values. On sparc64, all DMA addresses fit in 32 bits, so dma_addr_t is a 32-bit type. However, BAR values can be 64 bits wide, so they don't fit in a dma_addr_t. d63e2e1f3df9 added new checking that tripped over this mismatch. Add pci_bus_addr_t, which is wide enough to hold any PCI bus address, including both raw BAR values and DMA addresses. This will be 64 bits on 64-bit platforms and on platforms with a 64-bit dma_addr_t. Then dma_addr_t only needs to be wide enough to hold addresses from the DMA API. [bhelgaas: changelog, bugzilla, Kconfig to ensure pci_bus_addr_t is at least as wide as dma_addr_t, documentation] Fixes: d63e2e1f3df9 ("sparc/PCI: Clip bridge windows to fit in upstream windows") Fixes: 23b13bc76f35 ("PCI: Fail safely if we can't handle BARs larger than 4GB") Link: http://lkml.kernel.org/r/CAE9FiQU1gJY1LYrxs+ma5LCTEEe4xmtjRG0aXJ9K_Tsu+m9Wuw@mail.gmail.com Link: http://lkml.kernel.org/r/1427857069-6789-1-git-send-email-yinghai@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=96231 Reported-by: David Ahern <david.ahern@oracle.com> Tested-by: David Ahern <david.ahern@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: David S. Miller <davem@davemloft.net> CC: stable@vger.kernel.org # v3.19+
2015-05-28 08:23:51 +08:00
static inline pci_bus_addr_t pci_bus_address(struct pci_dev *pdev, int bar)
{
struct pci_bus_region region;
pcibios_resource_to_bus(pdev->bus, &region, &pdev->resource[bar]);
return region.start;
}
/* Proper probing supporting hot-pluggable devices */
int __must_check __pci_register_driver(struct pci_driver *, struct module *,
const char *mod_name);
2008-07-31 03:07:04 +08:00
/* pci_register_driver() must be a macro so KBUILD_MODNAME can be expanded */
2008-07-31 03:07:04 +08:00
#define pci_register_driver(driver) \
__pci_register_driver(driver, THIS_MODULE, KBUILD_MODNAME)
void pci_unregister_driver(struct pci_driver *dev);
/**
* module_pci_driver() - Helper macro for registering a PCI driver
* @__pci_driver: pci_driver struct
*
* Helper macro for PCI drivers which do not do anything special in module
* init/exit. This eliminates a lot of boilerplate. Each module may only
* use this macro once, and calling it replaces module_init() and module_exit()
*/
#define module_pci_driver(__pci_driver) \
module_driver(__pci_driver, pci_register_driver, pci_unregister_driver)
/**
* builtin_pci_driver() - Helper macro for registering a PCI driver
* @__pci_driver: pci_driver struct
*
* Helper macro for PCI drivers which do not do anything special in their
* init code. This eliminates a lot of boilerplate. Each driver may only
* use this macro once, and calling it replaces device_initcall(...)
*/
#define builtin_pci_driver(__pci_driver) \
builtin_driver(__pci_driver, pci_register_driver)
struct pci_driver *pci_dev_driver(const struct pci_dev *dev);
int pci_add_dynid(struct pci_driver *drv,
unsigned int vendor, unsigned int device,
unsigned int subvendor, unsigned int subdevice,
unsigned int class, unsigned int class_mask,
unsigned long driver_data);
const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
struct pci_dev *dev);
int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
int pass);
void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
void *userdata);
int pci_cfg_space_size(struct pci_dev *dev);
unsigned char pci_bus_max_busnr(struct pci_bus *bus);
void pci_setup_bridge(struct pci_bus *bus);
resource_size_t pcibios_window_alignment(struct pci_bus *bus,
unsigned long type);
#define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
#define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
int pci_set_vga_state(struct pci_dev *pdev, bool decode,
unsigned int command_bits, u32 flags);
/*
* Virtual interrupts allow for more interrupts to be allocated
* than the device has interrupts for. These are not programmed
* into the device's MSI-X table and must be handled by some
* other driver means.
*/
#define PCI_IRQ_VIRTUAL (1 << 4)
#define PCI_IRQ_ALL_TYPES \
(PCI_IRQ_LEGACY | PCI_IRQ_MSI | PCI_IRQ_MSIX)
#include <linux/dmapool.h>
struct msix_entry {
u32 vector; /* Kernel uses to write allocated vector */
u16 entry; /* Driver uses to specify entry, OS writes */
};
#ifdef CONFIG_PCI_MSI
int pci_msi_vec_count(struct pci_dev *dev);
void pci_disable_msi(struct pci_dev *dev);
int pci_msix_vec_count(struct pci_dev *dev);
void pci_disable_msix(struct pci_dev *dev);
void pci_restore_msi_state(struct pci_dev *dev);
int pci_msi_enabled(void);
int pci_enable_msi(struct pci_dev *dev);
int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
int minvec, int maxvec);
static inline int pci_enable_msix_exact(struct pci_dev *dev,
struct msix_entry *entries, int nvec)
{
int rc = pci_enable_msix_range(dev, entries, nvec, nvec);
if (rc < 0)
return rc;
return 0;
}
int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags,
genirq/affinity: Add new callback for (re)calculating interrupt sets The interrupt affinity spreading mechanism supports to spread out affinities for one or more interrupt sets. A interrupt set contains one or more interrupts. Each set is mapped to a specific functionality of a device, e.g. general I/O queues and read I/O queus of multiqueue block devices. The number of interrupts per set is defined by the driver. It depends on the total number of available interrupts for the device, which is determined by the PCI capabilites and the availability of underlying CPU resources, and the number of queues which the device provides and the driver wants to instantiate. The driver passes initial configuration for the interrupt allocation via a pointer to struct irq_affinity. Right now the allocation mechanism is complex as it requires to have a loop in the driver to determine the maximum number of interrupts which are provided by the PCI capabilities and the underlying CPU resources. This loop would have to be replicated in every driver which wants to utilize this mechanism. That's unwanted code duplication and error prone. In order to move this into generic facilities it is required to have a mechanism, which allows the recalculation of the interrupt sets and their size, in the core code. As the core code does not have any knowledge about the underlying device, a driver specific callback is required in struct irq_affinity, which can be invoked by the core code. The callback gets the number of available interupts as an argument, so the driver can calculate the corresponding number and size of interrupt sets. At the moment the struct irq_affinity pointer which is handed in from the driver and passed through to several core functions is marked 'const', but for the callback to be able to modify the data in the struct it's required to remove the 'const' qualifier. Add the optional callback to struct irq_affinity, which allows drivers to recalculate the number and size of interrupt sets and remove the 'const' qualifier. For simple invocations, which do not supply a callback, a default callback is installed, which just sets nr_sets to 1 and transfers the number of spreadable vectors to the set_size array at index 0. This is for now guarded by a check for nr_sets != 0 to keep the NVME driver working until it is converted to the callback mechanism. To make sure that the driver configuration is correct under all circumstances the callback is invoked even when there are no interrupts for queues left, i.e. the pre/post requirements already exhaust the numner of available interrupts. At the PCI layer irq_create_affinity_masks() has to be invoked even for the case where the legacy interrupt is used. That ensures that the callback is invoked and the device driver can adjust to that situation. [ tglx: Fixed the simple case (no sets required). Moved the sanity check for nr_sets after the invocation of the callback so it catches broken drivers. Fixed the kernel doc comments for struct irq_affinity and de-'This patch'-ed the changelog ] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: Keith Busch <keith.busch@intel.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> Link: https://lkml.kernel.org/r/20190216172228.512444498@linutronix.de
2019-02-17 01:13:09 +08:00
struct irq_affinity *affd);
void pci_free_irq_vectors(struct pci_dev *dev);
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev, int vec);
#else
static inline int pci_msi_vec_count(struct pci_dev *dev) { return -ENOSYS; }
static inline void pci_disable_msi(struct pci_dev *dev) { }
static inline int pci_msix_vec_count(struct pci_dev *dev) { return -ENOSYS; }
static inline void pci_disable_msix(struct pci_dev *dev) { }
static inline void pci_restore_msi_state(struct pci_dev *dev) { }
static inline int pci_msi_enabled(void) { return 0; }
static inline int pci_enable_msi(struct pci_dev *dev)
{ return -ENOSYS; }
static inline int pci_enable_msix_range(struct pci_dev *dev,
struct msix_entry *entries, int minvec, int maxvec)
{ return -ENOSYS; }
static inline int pci_enable_msix_exact(struct pci_dev *dev,
struct msix_entry *entries, int nvec)
{ return -ENOSYS; }
static inline int
pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags,
genirq/affinity: Add new callback for (re)calculating interrupt sets The interrupt affinity spreading mechanism supports to spread out affinities for one or more interrupt sets. A interrupt set contains one or more interrupts. Each set is mapped to a specific functionality of a device, e.g. general I/O queues and read I/O queus of multiqueue block devices. The number of interrupts per set is defined by the driver. It depends on the total number of available interrupts for the device, which is determined by the PCI capabilites and the availability of underlying CPU resources, and the number of queues which the device provides and the driver wants to instantiate. The driver passes initial configuration for the interrupt allocation via a pointer to struct irq_affinity. Right now the allocation mechanism is complex as it requires to have a loop in the driver to determine the maximum number of interrupts which are provided by the PCI capabilities and the underlying CPU resources. This loop would have to be replicated in every driver which wants to utilize this mechanism. That's unwanted code duplication and error prone. In order to move this into generic facilities it is required to have a mechanism, which allows the recalculation of the interrupt sets and their size, in the core code. As the core code does not have any knowledge about the underlying device, a driver specific callback is required in struct irq_affinity, which can be invoked by the core code. The callback gets the number of available interupts as an argument, so the driver can calculate the corresponding number and size of interrupt sets. At the moment the struct irq_affinity pointer which is handed in from the driver and passed through to several core functions is marked 'const', but for the callback to be able to modify the data in the struct it's required to remove the 'const' qualifier. Add the optional callback to struct irq_affinity, which allows drivers to recalculate the number and size of interrupt sets and remove the 'const' qualifier. For simple invocations, which do not supply a callback, a default callback is installed, which just sets nr_sets to 1 and transfers the number of spreadable vectors to the set_size array at index 0. This is for now guarded by a check for nr_sets != 0 to keep the NVME driver working until it is converted to the callback mechanism. To make sure that the driver configuration is correct under all circumstances the callback is invoked even when there are no interrupts for queues left, i.e. the pre/post requirements already exhaust the numner of available interrupts. At the PCI layer irq_create_affinity_masks() has to be invoked even for the case where the legacy interrupt is used. That ensures that the callback is invoked and the device driver can adjust to that situation. [ tglx: Fixed the simple case (no sets required). Moved the sanity check for nr_sets after the invocation of the callback so it catches broken drivers. Fixed the kernel doc comments for struct irq_affinity and de-'This patch'-ed the changelog ] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: Keith Busch <keith.busch@intel.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> Link: https://lkml.kernel.org/r/20190216172228.512444498@linutronix.de
2019-02-17 01:13:09 +08:00
struct irq_affinity *aff_desc)
{
if ((flags & PCI_IRQ_LEGACY) && min_vecs == 1 && dev->irq)
return 1;
return -ENOSPC;
}
static inline void pci_free_irq_vectors(struct pci_dev *dev)
{
}
static inline int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
{
if (WARN_ON_ONCE(nr > 0))
return -EINVAL;
return dev->irq;
}
static inline const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev,
int vec)
{
return cpu_possible_mask;
}
#endif
PCI: Add pci_irqd_intx_xlate() Legacy PCI INTx interrupts are represented in the PCI_INTERRUPT_PIN register using the range 1-4, which matches our enum pci_interrupt_pin. This is however not ideal for an IRQ domain, where with 4 interrupts we would ideally have a domain of size 4 & hwirq numbers in the range 0-3. Different PCI host controller drivers have handled this in different ways. Of those under drivers/pci/ which register an INTx IRQ domain, we have: - pcie-altera uses the range 1-4 in device trees and an IRQ domain of size 5 to cover that range, with entry 0 wasted. - pcie-xilinx & pcie-xilinx-nwl use the range 1-4 in device trees but register an IRQ domain of size 4, which doesn't cover the hwirq=4/INTD case leading to that interrupt being broken. - pci-ftpci100 & pci-aardvark use the range 0-3 in both device trees & as hwirq numbering in the driver & IRQ domain. In order to introduce some level of consistency in at least the hwirq numbering used by the drivers & IRQ domains, this patch introduces a new pci_irqd_intx_xlate() helper function which drivers using the 1-4 range in device trees can assign as the xlate callback for their INTx IRQ domain. This translates the 1-4 range into a 0-3 range, allowing us to use an IRQ domain of size 4 & avoid a wasted entry. Further patches will make use of this in drivers to allow them to use an IRQ domain of size 4 for legacy INTx interrupts without breaking INTD. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-16 03:02:17 +08:00
/**
* pci_irqd_intx_xlate() - Translate PCI INTx value to an IRQ domain hwirq
* @d: the INTx IRQ domain
* @node: the DT node for the device whose interrupt we're translating
* @intspec: the interrupt specifier data from the DT
* @intsize: the number of entries in @intspec
* @out_hwirq: pointer at which to write the hwirq number
* @out_type: pointer at which to write the interrupt type
*
* Translate a PCI INTx interrupt number from device tree in the range 1-4, as
* stored in the standard PCI_INTERRUPT_PIN register, to a value in the range
* 0-3 suitable for use in a 4 entry IRQ domain. That is, subtract one from the
* INTx value to obtain the hwirq number.
*
* Returns 0 on success, or -EINVAL if the interrupt specifier is out of range.
*/
static inline int pci_irqd_intx_xlate(struct irq_domain *d,
struct device_node *node,
const u32 *intspec,
unsigned int intsize,
unsigned long *out_hwirq,
unsigned int *out_type)
{
const u32 intx = intspec[0];
if (intx < PCI_INTERRUPT_INTA || intx > PCI_INTERRUPT_INTD)
return -EINVAL;
*out_hwirq = intx - PCI_INTERRUPT_INTA;
return 0;
}
#ifdef CONFIG_PCIEPORTBUS
extern bool pcie_ports_disabled;
extern bool pcie_ports_native;
#else
#define pcie_ports_disabled true
#define pcie_ports_native false
#endif
#define PCIE_LINK_STATE_L0S BIT(0)
#define PCIE_LINK_STATE_L1 BIT(1)
#define PCIE_LINK_STATE_CLKPM BIT(2)
#define PCIE_LINK_STATE_L1_1 BIT(3)
#define PCIE_LINK_STATE_L1_2 BIT(4)
#define PCIE_LINK_STATE_L1_1_PCIPM BIT(5)
#define PCIE_LINK_STATE_L1_2_PCIPM BIT(6)
#ifdef CONFIG_PCIEASPM
int pci_disable_link_state(struct pci_dev *pdev, int state);
int pci_disable_link_state_locked(struct pci_dev *pdev, int state);
void pcie_no_aspm(void);
bool pcie_aspm_support_enabled(void);
bool pcie_aspm_enabled(struct pci_dev *pdev);
#else
static inline int pci_disable_link_state(struct pci_dev *pdev, int state)
{ return 0; }
static inline int pci_disable_link_state_locked(struct pci_dev *pdev, int state)
{ return 0; }
static inline void pcie_no_aspm(void) { }
static inline bool pcie_aspm_support_enabled(void) { return false; }
static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
#endif
#ifdef CONFIG_PCIEAER
bool pci_aer_available(void);
#else
static inline bool pci_aer_available(void) { return false; }
#endif
bool pci_ats_disabled(void);
#ifdef CONFIG_PCIE_PTM
int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
bool pcie_ptm_enabled(struct pci_dev *dev);
#else
static inline int pci_enable_ptm(struct pci_dev *dev, u8 *granularity)
{ return -EINVAL; }
static inline bool pcie_ptm_enabled(struct pci_dev *dev)
{ return false; }
#endif
void pci_cfg_access_lock(struct pci_dev *dev);
bool pci_cfg_access_trylock(struct pci_dev *dev);
void pci_cfg_access_unlock(struct pci_dev *dev);
void pci_dev_lock(struct pci_dev *dev);
int pci_dev_trylock(struct pci_dev *dev);
void pci_dev_unlock(struct pci_dev *dev);
/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),
* a PCI domain is defined to be a set of PCI buses which share
* configuration space.
*/
#ifdef CONFIG_PCI_DOMAINS
extern int pci_domains_supported;
#else
enum { pci_domains_supported = 0 };
static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
#endif /* CONFIG_PCI_DOMAINS */
/*
* Generic implementation for PCI domain support. If your
* architecture does not need custom management of PCI
* domains then this implementation will be used
*/
#ifdef CONFIG_PCI_DOMAINS_GENERIC
static inline int pci_domain_nr(struct pci_bus *bus)
{
return bus->domain_nr;
}
#ifdef CONFIG_ACPI
int acpi_pci_bus_find_domain_nr(struct pci_bus *bus);
#else
static inline int acpi_pci_bus_find_domain_nr(struct pci_bus *bus)
{ return 0; }
#endif
int pci_bus_find_domain_nr(struct pci_bus *bus, struct device *parent);
#endif
/* Some architectures require additional setup to direct VGA traffic */
typedef int (*arch_set_vga_state_t)(struct pci_dev *pdev, bool decode,
unsigned int command_bits, u32 flags);
void pci_register_set_vga_state(arch_set_vga_state_t func);
static inline int
pci_request_io_regions(struct pci_dev *pdev, const char *name)
{
return pci_request_selected_regions(pdev,
pci_select_bars(pdev, IORESOURCE_IO), name);
}
static inline void
pci_release_io_regions(struct pci_dev *pdev)
{
return pci_release_selected_regions(pdev,
pci_select_bars(pdev, IORESOURCE_IO));
}
static inline int
pci_request_mem_regions(struct pci_dev *pdev, const char *name)
{
return pci_request_selected_regions(pdev,
pci_select_bars(pdev, IORESOURCE_MEM), name);
}
static inline void
pci_release_mem_regions(struct pci_dev *pdev)
{
return pci_release_selected_regions(pdev,
pci_select_bars(pdev, IORESOURCE_MEM));
}
#else /* CONFIG_PCI is not enabled */
static inline void pci_set_flags(int flags) { }
static inline void pci_add_flags(int flags) { }
static inline void pci_clear_flags(int flags) { }
static inline int pci_has_flag(int flag) { return 0; }
/*
* If the system does not have PCI, clearly these return errors. Define
* these as simple inline functions to avoid hair in drivers.
*/
#define _PCI_NOP(o, s, t) \
static inline int pci_##o##_config_##s(struct pci_dev *dev, \
int where, t val) \
{ return PCIBIOS_FUNC_NOT_SUPPORTED; }
#define _PCI_NOP_ALL(o, x) _PCI_NOP(o, byte, u8 x) \
_PCI_NOP(o, word, u16 x) \
_PCI_NOP(o, dword, u32 x)
_PCI_NOP_ALL(read, *)
_PCI_NOP_ALL(write,)
static inline struct pci_dev *pci_get_device(unsigned int vendor,
unsigned int device,
struct pci_dev *from)
{ return NULL; }
static inline struct pci_dev *pci_get_subsys(unsigned int vendor,
unsigned int device,
unsigned int ss_vendor,
unsigned int ss_device,
struct pci_dev *from)
{ return NULL; }
static inline struct pci_dev *pci_get_class(unsigned int class,
struct pci_dev *from)
{ return NULL; }
static inline int pci_dev_present(const struct pci_device_id *ids)
{ return 0; }
#define no_pci_devices() (1)
#define pci_dev_put(dev) do { } while (0)
static inline void pci_set_master(struct pci_dev *dev) { }
static inline int pci_enable_device(struct pci_dev *dev) { return -EIO; }
static inline void pci_disable_device(struct pci_dev *dev) { }
crypto: inside-secure - Remove #ifdef checks When both PCI and OF are disabled, no drivers are registered, and we get some unused-function warnings: drivers/crypto/inside-secure/safexcel.c:1221:13: error: unused function 'safexcel_unregister_algorithms' [-Werror,-Wunused-function] static void safexcel_unregister_algorithms(struct safexcel_crypto_priv *priv) drivers/crypto/inside-secure/safexcel.c:1307:12: error: unused function 'safexcel_probe_generic' [-Werror,-Wunused-function] static int safexcel_probe_generic(void *pdev, drivers/crypto/inside-secure/safexcel.c:1531:13: error: unused function 'safexcel_hw_reset_rings' [-Werror,-Wunused-function] static void safexcel_hw_reset_rings(struct safexcel_crypto_priv *priv) It's better to make the compiler see what is going on and remove such ifdef checks completely. In case of PCI, this is trivial since pci_register_driver() is defined to an empty function that makes the compiler subsequently drop all unused code silently. The global pcireg_rc/ofreg_rc variables are not actually needed here since the driver registration does not fail in ways that would make it helpful. For CONFIG_OF, an IS_ENABLED() check is still required, since platform drivers can exist both with and without it. A little change to linux/pci.h is needed to ensure that pcim_enable_device() is visible to the driver. Moving the declaration outside of ifdef would be sufficient here, but for consistency with the rest of the file, adding an inline helper is probably best. Fixes: 212ef6f29e5b ("crypto: inside-secure - Fix unused variable warning when CONFIG_PCI=n") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci.h Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-30 20:14:35 +08:00
static inline int pcim_enable_device(struct pci_dev *pdev) { return -EIO; }
static inline int pci_assign_resource(struct pci_dev *dev, int i)
{ return -EBUSY; }
static inline int __must_check __pci_register_driver(struct pci_driver *drv,
struct module *owner,
const char *mod_name)
{ return 0; }
static inline int pci_register_driver(struct pci_driver *drv)
{ return 0; }
static inline void pci_unregister_driver(struct pci_driver *drv) { }
static inline u8 pci_find_capability(struct pci_dev *dev, int cap)
{ return 0; }
static inline int pci_find_next_capability(struct pci_dev *dev, u8 post,
int cap)
{ return 0; }
static inline int pci_find_ext_capability(struct pci_dev *dev, int cap)
{ return 0; }
static inline u64 pci_get_dsn(struct pci_dev *dev)
{ return 0; }
/* Power management related routines */
static inline int pci_save_state(struct pci_dev *dev) { return 0; }
static inline void pci_restore_state(struct pci_dev *dev) { }
static inline int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
{ return 0; }
static inline int pci_wake_from_d3(struct pci_dev *dev, bool enable)
{ return 0; }
static inline pci_power_t pci_choose_state(struct pci_dev *dev,
pm_message_t state)
{ return PCI_D0; }
static inline int pci_enable_wake(struct pci_dev *dev, pci_power_t state,
int enable)
{ return 0; }
static inline struct resource *pci_find_resource(struct pci_dev *dev,
struct resource *res)
{ return NULL; }
static inline int pci_request_regions(struct pci_dev *dev, const char *res_name)
{ return -EIO; }
static inline void pci_release_regions(struct pci_dev *dev) { }
static inline int pci_register_io_range(struct fwnode_handle *fwnode,
phys_addr_t addr, resource_size_t size)
{ return -EINVAL; }
static inline unsigned long pci_address_to_pio(phys_addr_t addr) { return -1; }
static inline struct pci_bus *pci_find_next_bus(const struct pci_bus *from)
{ return NULL; }
static inline struct pci_dev *pci_get_slot(struct pci_bus *bus,
unsigned int devfn)
{ return NULL; }
static inline struct pci_dev *pci_get_domain_bus_and_slot(int domain,
unsigned int bus, unsigned int devfn)
{ return NULL; }
static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
static inline struct pci_dev *pci_dev_get(struct pci_dev *dev) { return NULL; }
#define dev_is_pci(d) (false)
#define dev_is_pf(d) (false)
static inline bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
{ return false; }
static inline int pci_irqd_intx_xlate(struct irq_domain *d,
struct device_node *node,
const u32 *intspec,
unsigned int intsize,
unsigned long *out_hwirq,
unsigned int *out_type)
{ return -EINVAL; }
static inline const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
struct pci_dev *dev)
{ return NULL; }
static inline bool pci_ats_disabled(void) { return true; }
static inline int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
{
return -EINVAL;
}
static inline int
pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags,
struct irq_affinity *aff_desc)
{
return -ENOSPC;
}
#endif /* CONFIG_PCI */
static inline int
pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags)
{
return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
NULL);
}
/* Include architecture-dependent settings and functions */
#include <asm/pci.h>
PCI: Remove pci_mmap_page_range() wrapper The ARCH_GENERIC_PCI_MMAP_RESOURCE symbol came up in a recent discussion, and I noticed that this was left behind by an unfinished cleanup from 2017. The only architecture that still relies on providing its own pci_mmap_page_range() helper instead of using the generic pci_mmap_resource_range() is sparc. Presumably the reasons for this have not changed, but at least this can be simplified by converting sparc to use the same interface as the others. The only difference between the two is the device-specific offset that gets added to or subtracted from vma->vm_pgoff. Change the only caller of pci_mmap_page_range() in common code to subtract this offset and call the modern interface, while adding it back in the sparc implementation to preserve the existing behavior. This removes the complexities of the dual interfaces from the common code, and keeps it all specific to the sparc architecture code. According to David Miller, the sparc code lets user space poke into the VGA I/O port registers by mmapping the I/O space of the parent bridge device, which is something that the generic pci_mmap_resource_range() code apparently does not. Link: https://lore.kernel.org/lkml/1519887203.622.3.camel@infradead.org/t/ Link: https://lore.kernel.org/lkml/20220714214657.2402250-3-shorne@gmail.com/ Link: https://lore.kernel.org/r/20220715153617.3393420-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Stafford Horne <shorne@gmail.com>
2022-07-15 23:36:16 +08:00
/*
* pci_mmap_resource_range() maps a specific BAR, and vm->vm_pgoff
* is expected to be an offset within that region.
*
*/
int pci_mmap_resource_range(struct pci_dev *dev, int bar,
struct vm_area_struct *vma,
enum pci_mmap_state mmap_state, int write_combine);
#ifndef arch_can_pci_mmap_wc
#define arch_can_pci_mmap_wc() 0
#endif
#ifndef arch_can_pci_mmap_io
#define arch_can_pci_mmap_io() 0
#define pci_iobar_pfn(pdev, bar, vma) (-EINVAL)
#else
int pci_iobar_pfn(struct pci_dev *pdev, int bar, struct vm_area_struct *vma);
#endif
#ifndef pci_root_bus_fwnode
#define pci_root_bus_fwnode(bus) NULL
#endif
/*
* These helpers provide future and backwards compatibility
* for accessing popular PCI BAR info
*/
#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start)
#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end)
#define pci_resource_flags(dev, bar) ((dev)->resource[(bar)].flags)
#define pci_resource_len(dev,bar) \
((pci_resource_end((dev), (bar)) == 0) ? 0 : \
\
(pci_resource_end((dev), (bar)) - \
pci_resource_start((dev), (bar)) + 1))
/*
* Similar to the helpers above, these manipulate per-pci_dev
* driver-specific data. They are really just a wrapper around
* the generic device structure functions of these calls.
*/
static inline void *pci_get_drvdata(struct pci_dev *pdev)
{
return dev_get_drvdata(&pdev->dev);
}
static inline void pci_set_drvdata(struct pci_dev *pdev, void *data)
{
dev_set_drvdata(&pdev->dev, data);
}
static inline const char *pci_name(const struct pci_dev *pdev)
{
return dev_name(&pdev->dev);
}
void pci_resource_to_user(const struct pci_dev *dev, int bar,
const struct resource *rsrc,
resource_size_t *start, resource_size_t *end);
/*
* The world is not perfect and supplies us with broken PCI devices.
* For at least a part of these bugs we need a work-around, so both
* generic (drivers/pci/quirks.c) and per-architecture code can define
* fixup hooks to be called for particular buggy devices.
*/
struct pci_fixup {
u16 vendor; /* Or PCI_ANY_ID */
u16 device; /* Or PCI_ANY_ID */
u32 class; /* Or PCI_ANY_ID */
unsigned int class_shift; /* should be 0, 8, 16 */
#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
int hook_offset;
#else
void (*hook)(struct pci_dev *dev);
#endif
};
enum pci_fixup_pass {
pci_fixup_early, /* Before probing BARs */
pci_fixup_header, /* After reading configuration header */
pci_fixup_final, /* Final phase of device fixups */
pci_fixup_enable, /* pci_enable_device() time */
pci_fixup_resume, /* pci_device_resume() */
pci_fixup_suspend, /* pci_device_suspend() */
pci_fixup_resume_early, /* pci_device_resume_early() */
pci_fixup_suspend_late, /* pci_device_suspend_late() */
};
#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
#define ___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
class_shift, hook) \
__ADDRESSABLE(hook) \
asm(".section " #sec ", \"a\" \n" \
".balign 16 \n" \
".short " #vendor ", " #device " \n" \
".long " #class ", " #class_shift " \n" \
".long " #hook " - . \n" \
".previous \n");
/*
* Clang's LTO may rename static functions in C, but has no way to
* handle such renamings when referenced from inline asm. To work
* around this, create global C stubs for these cases.
*/
#ifdef CONFIG_LTO_CLANG
#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
class_shift, hook, stub) \
void __cficanonical stub(struct pci_dev *dev); \
void __cficanonical stub(struct pci_dev *dev) \
{ \
hook(dev); \
} \
___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
class_shift, stub)
#else
#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
class_shift, hook, stub) \
___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
class_shift, hook)
#endif
#define DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
class_shift, hook) \
__DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
class_shift, hook, __UNIQUE_ID(hook))
#else
/* Anonymous variables would be nice... */
#define DECLARE_PCI_FIXUP_SECTION(section, name, vendor, device, class, \
class_shift, hook) \
static const struct pci_fixup __PASTE(__pci_fixup_##name,__LINE__) __used \
__attribute__((__section__(#section), aligned((sizeof(void *))))) \
= { vendor, device, class, class_shift, hook };
#endif
#define DECLARE_PCI_FIXUP_CLASS_EARLY(vendor, device, class, \
class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_early, \
hook, vendor, device, class, class_shift, hook)
#define DECLARE_PCI_FIXUP_CLASS_HEADER(vendor, device, class, \
class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_header, \
hook, vendor, device, class, class_shift, hook)
#define DECLARE_PCI_FIXUP_CLASS_FINAL(vendor, device, class, \
class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_final, \
hook, vendor, device, class, class_shift, hook)
#define DECLARE_PCI_FIXUP_CLASS_ENABLE(vendor, device, class, \
class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_enable, \
hook, vendor, device, class, class_shift, hook)
#define DECLARE_PCI_FIXUP_CLASS_RESUME(vendor, device, class, \
class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_resume, \
resume##hook, vendor, device, class, class_shift, hook)
#define DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY(vendor, device, class, \
class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_resume_early, \
resume_early##hook, vendor, device, class, class_shift, hook)
#define DECLARE_PCI_FIXUP_CLASS_SUSPEND(vendor, device, class, \
class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_suspend, \
suspend##hook, vendor, device, class, class_shift, hook)
#define DECLARE_PCI_FIXUP_CLASS_SUSPEND_LATE(vendor, device, class, \
class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_suspend_late, \
suspend_late##hook, vendor, device, class, class_shift, hook)
#define DECLARE_PCI_FIXUP_EARLY(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_early, \
hook, vendor, device, PCI_ANY_ID, 0, hook)
#define DECLARE_PCI_FIXUP_HEADER(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_header, \
hook, vendor, device, PCI_ANY_ID, 0, hook)
#define DECLARE_PCI_FIXUP_FINAL(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_final, \
hook, vendor, device, PCI_ANY_ID, 0, hook)
#define DECLARE_PCI_FIXUP_ENABLE(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_enable, \
hook, vendor, device, PCI_ANY_ID, 0, hook)
#define DECLARE_PCI_FIXUP_RESUME(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_resume, \
resume##hook, vendor, device, PCI_ANY_ID, 0, hook)
#define DECLARE_PCI_FIXUP_RESUME_EARLY(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_resume_early, \
resume_early##hook, vendor, device, PCI_ANY_ID, 0, hook)
#define DECLARE_PCI_FIXUP_SUSPEND(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_suspend, \
suspend##hook, vendor, device, PCI_ANY_ID, 0, hook)
#define DECLARE_PCI_FIXUP_SUSPEND_LATE(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_suspend_late, \
suspend_late##hook, vendor, device, PCI_ANY_ID, 0, hook)
#ifdef CONFIG_PCI_QUIRKS
void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev);
#else
static inline void pci_fixup_device(enum pci_fixup_pass pass,
struct pci_dev *dev) { }
#endif
void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen);
void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr);
void __iomem * const *pcim_iomap_table(struct pci_dev *pdev);
int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name);
int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
const char *name);
void pcim_iounmap_regions(struct pci_dev *pdev, int mask);
extern int pci_pci_problems;
#define PCIPCI_FAIL 1 /* No PCI PCI DMA */
#define PCIPCI_TRITON 2
#define PCIPCI_NATOMA 4
#define PCIPCI_VIAETBF 8
#define PCIPCI_VSFX 16
#define PCIPCI_ALIMAGIK 32 /* Need low latency setting */
#define PCIAGP_FAIL 64 /* No PCI to AGP DMA */
extern unsigned long pci_cardbus_io_size;
extern unsigned long pci_cardbus_mem_size;
extern u8 pci_dfl_cache_line_size;
extern u8 pci_cache_line_size;
/* Architecture-specific versions may override these (weak) */
void pcibios_disable_device(struct pci_dev *dev);
void pcibios_set_master(struct pci_dev *dev);
int pcibios_set_pcie_reset_state(struct pci_dev *dev,
enum pcie_reset_state state);
int pcibios_device_add(struct pci_dev *dev);
void pcibios_release_device(struct pci_dev *dev);
#ifdef CONFIG_PCI
void pcibios_penalize_isa_irq(int irq, int active);
#else
static inline void pcibios_penalize_isa_irq(int irq, int active) {}
#endif
int pcibios_alloc_irq(struct pci_dev *dev);
void pcibios_free_irq(struct pci_dev *dev);
resource_size_t pcibios_default_alignment(void);
#if defined(CONFIG_PCI_MMCONFIG) || defined(CONFIG_ACPI_MCFG)
void __init pci_mmcfg_early_init(void);
void __init pci_mmcfg_late_init(void);
x86: validate against acpi motherboard resources This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. If the MMCONFIG table is found to be reserved in ACPI, we don't bother checking the E820 table. The PCI Express firmware spec apparently tells BIOS developers that reservation in ACPI is required and E820 reservation is optional, so checking against ACPI first makes sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though it is perfectly functional, the existing check needlessly disables MMCONFIG in these cases. In order to do this, MMCONFIG setup has been split into two phases. If PCI configuration type 1 is not available then MMCONFIG is enabled early as before. Otherwise, it is enabled later after the ACPI interpreter is enabled, since we need to be able to execute control methods in order to check the ACPI reserved resources. Presently this is just triggered off the end of ACPI interpreter initialization. There are a few other behavioral changes here: - Validate all MMCONFIG configurations provided, not just the first one. - Validate the entire required length of each configuration according to the provided ending bus number is reserved, not just the minimum required allocation. - Validate that the area is reserved even if we read it from the chipset directly and not from the MCFG table. This catches the case where the BIOS didn't set the location properly in the chipset and has mapped it over other things it shouldn't have. This also cleans up the MMCONFIG initialization functions so that they simply do nothing if MMCONFIG is not compiled in. Based on an original patch by Rajesh Shah from Intel. [akpm@linux-foundation.org: many fixes and cleanups] Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andi Kleen <ak@suse.de> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-15 17:27:20 +08:00
#else
static inline void pci_mmcfg_early_init(void) { }
x86: validate against acpi motherboard resources This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. If the MMCONFIG table is found to be reserved in ACPI, we don't bother checking the E820 table. The PCI Express firmware spec apparently tells BIOS developers that reservation in ACPI is required and E820 reservation is optional, so checking against ACPI first makes sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though it is perfectly functional, the existing check needlessly disables MMCONFIG in these cases. In order to do this, MMCONFIG setup has been split into two phases. If PCI configuration type 1 is not available then MMCONFIG is enabled early as before. Otherwise, it is enabled later after the ACPI interpreter is enabled, since we need to be able to execute control methods in order to check the ACPI reserved resources. Presently this is just triggered off the end of ACPI interpreter initialization. There are a few other behavioral changes here: - Validate all MMCONFIG configurations provided, not just the first one. - Validate the entire required length of each configuration according to the provided ending bus number is reserved, not just the minimum required allocation. - Validate that the area is reserved even if we read it from the chipset directly and not from the MCFG table. This catches the case where the BIOS didn't set the location properly in the chipset and has mapped it over other things it shouldn't have. This also cleans up the MMCONFIG initialization functions so that they simply do nothing if MMCONFIG is not compiled in. Based on an original patch by Rajesh Shah from Intel. [akpm@linux-foundation.org: many fixes and cleanups] Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andi Kleen <ak@suse.de> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-15 17:27:20 +08:00
static inline void pci_mmcfg_late_init(void) { }
#endif
int pci_ext_cfg_avail(void);
void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
PCI: Add pci_ioremap_wc_bar() This lets drivers take advantage of PAT when available. It should help with the transition of converting video drivers over to ioremap_wc() to help with the goal of eventually using _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache(), see: de33c442ed2a ("x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()") Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: <syrjala@sci.fi> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Ville Syrjälä <syrjala@sci.fi> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: airlied@linux.ie Cc: benh@kernel.crashing.org Cc: dan.j.williams@intel.com Cc: konrad.wilk@oracle.com Cc: linux-fbdev@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: mst@redhat.com Cc: vinod.koul@intel.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1440443613-13696-2-git-send-email-mcgrof@do-not-panic.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 03:13:23 +08:00
void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
#ifdef CONFIG_PCI_IOV
int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
int pci_iov_vf_id(struct pci_dev *dev);
PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata of a PF There are some cases where a SR-IOV VF driver will need to reach into and interact with the PF driver. This requires accessing the drvdata of the PF. Provide a function pci_iov_get_pf_drvdata() to return this PF drvdata in a safe way. Normally accessing a drvdata of a foreign struct device would be done using the device_lock() to protect against device driver probe()/remove() races. However, due to the design of pci_enable_sriov() this will result in a ABBA deadlock on the device_lock as the PF's device_lock is held during PF sriov_configure() while calling pci_enable_sriov() which in turn holds the VF's device_lock while calling VF probe(), and similarly for remove. This means the VF driver can never obtain the PF's device_lock. Instead use the implicit locking created by pci_enable/disable_sriov(). A VF driver can access its PF drvdata only while its own driver is attached, and the PF driver can control access to its own drvdata based on when it calls pci_enable/disable_sriov(). To use this API the PF driver will setup the PF drvdata in the probe() function. pci_enable_sriov() is only called from sriov_configure() which cannot happen until probe() completes, ensuring no VF races with drvdata setup. For removal, the PF driver must call pci_disable_sriov() in its remove function before destroying any of the drvdata. This ensures that all VF drivers are unbound before returning, fencing concurrent access to the drvdata. The introduction of a new function to do this access makes clear the special locking scheme and the documents the requirements on the PF/VF drivers using this. Link: https://lore.kernel.org/all/20220224142024.147653-5-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-02-24 22:20:13 +08:00
void *pci_iov_get_pf_drvdata(struct pci_dev *dev, struct pci_driver *pf_driver);
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
void pci_disable_sriov(struct pci_dev *dev);
int pci_iov_sysfs_link(struct pci_dev *dev, struct pci_dev *virtfn, int id);
int pci_iov_add_virtfn(struct pci_dev *dev, int id);
void pci_iov_remove_virtfn(struct pci_dev *dev, int id);
int pci_num_vf(struct pci_dev *dev);
int pci_vfs_assigned(struct pci_dev *dev);
int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
int pci_sriov_get_totalvfs(struct pci_dev *dev);
int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn);
resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
/* Arch may override these (weak) */
int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs);
int pcibios_sriov_disable(struct pci_dev *pdev);
resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno);
#else
static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
{
return -ENOSYS;
}
static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
{
return -ENOSYS;
}
static inline int pci_iov_vf_id(struct pci_dev *dev)
{
return -ENOSYS;
}
PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata of a PF There are some cases where a SR-IOV VF driver will need to reach into and interact with the PF driver. This requires accessing the drvdata of the PF. Provide a function pci_iov_get_pf_drvdata() to return this PF drvdata in a safe way. Normally accessing a drvdata of a foreign struct device would be done using the device_lock() to protect against device driver probe()/remove() races. However, due to the design of pci_enable_sriov() this will result in a ABBA deadlock on the device_lock as the PF's device_lock is held during PF sriov_configure() while calling pci_enable_sriov() which in turn holds the VF's device_lock while calling VF probe(), and similarly for remove. This means the VF driver can never obtain the PF's device_lock. Instead use the implicit locking created by pci_enable/disable_sriov(). A VF driver can access its PF drvdata only while its own driver is attached, and the PF driver can control access to its own drvdata based on when it calls pci_enable/disable_sriov(). To use this API the PF driver will setup the PF drvdata in the probe() function. pci_enable_sriov() is only called from sriov_configure() which cannot happen until probe() completes, ensuring no VF races with drvdata setup. For removal, the PF driver must call pci_disable_sriov() in its remove function before destroying any of the drvdata. This ensures that all VF drivers are unbound before returning, fencing concurrent access to the drvdata. The introduction of a new function to do this access makes clear the special locking scheme and the documents the requirements on the PF/VF drivers using this. Link: https://lore.kernel.org/all/20220224142024.147653-5-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-02-24 22:20:13 +08:00
static inline void *pci_iov_get_pf_drvdata(struct pci_dev *dev,
struct pci_driver *pf_driver)
{
return ERR_PTR(-EINVAL);
}
static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
{ return -ENODEV; }
static inline int pci_iov_sysfs_link(struct pci_dev *dev,
struct pci_dev *virtfn, int id)
{
return -ENODEV;
}
static inline int pci_iov_add_virtfn(struct pci_dev *dev, int id)
{
return -ENOSYS;
}
static inline void pci_iov_remove_virtfn(struct pci_dev *dev,
int id) { }
static inline void pci_disable_sriov(struct pci_dev *dev) { }
static inline int pci_num_vf(struct pci_dev *dev) { return 0; }
static inline int pci_vfs_assigned(struct pci_dev *dev)
{ return 0; }
static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
{ return 0; }
static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
{ return 0; }
#define pci_sriov_configure_simple NULL
static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
{ return 0; }
static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { }
#endif
#if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
void pci_hp_create_module_link(struct pci_slot *pci_slot);
void pci_hp_remove_module_link(struct pci_slot *pci_slot);
#endif
/**
* pci_pcie_cap - get the saved PCIe capability offset
* @dev: PCI device
*
* PCIe capability offset is calculated at PCI device initialization
* time and saved in the data structure. This function returns saved
* PCIe capability offset. Using this instead of pci_find_capability()
* reduces unnecessary search in the PCI configuration space. If you
* need to calculate PCIe capability offset from raw device for some
* reasons, please use pci_find_capability() instead.
*/
static inline int pci_pcie_cap(struct pci_dev *dev)
{
return dev->pcie_cap;
}
/**
* pci_is_pcie - check if the PCI device is PCI Express capable
* @dev: PCI device
*
* Returns: true if the PCI device is PCI Express capable, false otherwise.
*/
static inline bool pci_is_pcie(struct pci_dev *dev)
{
return pci_pcie_cap(dev);
}
/**
* pcie_caps_reg - get the PCIe Capabilities Register
* @dev: PCI device
*/
static inline u16 pcie_caps_reg(const struct pci_dev *dev)
{
return dev->pcie_flags_reg;
}
/**
* pci_pcie_type - get the PCIe device/port type
* @dev: PCI device
*/
static inline int pci_pcie_type(const struct pci_dev *dev)
{
return (pcie_caps_reg(dev) & PCI_EXP_FLAGS_TYPE) >> 4;
}
/**
* pcie_find_root_port - Get the PCIe root port device
* @dev: PCI device
*
* Traverse up the parent chain and return the PCIe Root Port PCI Device
* for a given PCI/PCIe Device.
*/
static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
{
while (dev) {
if (pci_is_pcie(dev) &&
pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
return dev;
dev = pci_upstream_bridge(dev);
}
return NULL;
}
void pci_request_acs(void);
bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
bool pci_acs_path_enabled(struct pci_dev *start,
struct pci_dev *end, u16 acs_flags);
int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask);
#define PCI_VPD_LRDT 0x80 /* Large Resource Data Type */
#define PCI_VPD_LRDT_ID(x) ((x) | PCI_VPD_LRDT)
/* Large Resource Data Type Tag Item Names */
#define PCI_VPD_LTIN_ID_STRING 0x02 /* Identifier String */
#define PCI_VPD_LTIN_RO_DATA 0x10 /* Read-Only Data */
#define PCI_VPD_LTIN_RW_DATA 0x11 /* Read-Write Data */
#define PCI_VPD_LRDT_ID_STRING PCI_VPD_LRDT_ID(PCI_VPD_LTIN_ID_STRING)
#define PCI_VPD_LRDT_RO_DATA PCI_VPD_LRDT_ID(PCI_VPD_LTIN_RO_DATA)
#define PCI_VPD_LRDT_RW_DATA PCI_VPD_LRDT_ID(PCI_VPD_LTIN_RW_DATA)
#define PCI_VPD_RO_KEYWORD_PARTNO "PN"
#define PCI_VPD_RO_KEYWORD_SERIALNO "SN"
#define PCI_VPD_RO_KEYWORD_MFR_ID "MN"
#define PCI_VPD_RO_KEYWORD_VENDOR0 "V0"
#define PCI_VPD_RO_KEYWORD_CHKSUM "RV"
/**
* pci_vpd_alloc - Allocate buffer and read VPD into it
* @dev: PCI device
* @size: pointer to field where VPD length is returned
*
* Returns pointer to allocated buffer or an ERR_PTR in case of failure
*/
void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size);
/**
* pci_vpd_find_id_string - Locate id string in VPD
* @buf: Pointer to buffered VPD data
* @len: The length of the buffer area in which to search
* @size: Pointer to field where length of id string is returned
*
* Returns the index of the id string or -ENOENT if not found.
*/
int pci_vpd_find_id_string(const u8 *buf, unsigned int len, unsigned int *size);
/**
* pci_vpd_find_ro_info_keyword - Locate info field keyword in VPD RO section
* @buf: Pointer to buffered VPD data
* @len: The length of the buffer area in which to search
* @kw: The keyword to search for
* @size: Pointer to field where length of found keyword data is returned
*
* Returns the index of the information field keyword data or -ENOENT if
* not found.
*/
int pci_vpd_find_ro_info_keyword(const void *buf, unsigned int len,
const char *kw, unsigned int *size);
/**
* pci_vpd_check_csum - Check VPD checksum
* @buf: Pointer to buffered VPD data
* @len: VPD size
*
* Returns 1 if VPD has no checksum, otherwise 0 or an errno
*/
int pci_vpd_check_csum(const void *buf, unsigned int len);
/* PCI <-> OF binding helpers */
#ifdef CONFIG_OF
struct device_node;
struct irq_domain;
struct irq_domain *pci_host_bridge_of_msi_domain(struct pci_bus *bus);
bool pci_host_of_has_msi_map(struct device *dev);
/* Arch may override this (weak) */
struct device_node *pcibios_get_phb_of_node(struct pci_bus *bus);
#else /* CONFIG_OF */
static inline struct irq_domain *
pci_host_bridge_of_msi_domain(struct pci_bus *bus) { return NULL; }
static inline bool pci_host_of_has_msi_map(struct device *dev) { return false; }
#endif /* CONFIG_OF */
static inline struct device_node *
pci_device_to_OF_node(const struct pci_dev *pdev)
{
return pdev ? pdev->dev.of_node : NULL;
}
static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus)
{
return bus ? bus->dev.of_node : NULL;
}
#ifdef CONFIG_ACPI
struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus);
void
pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *));
bool pci_pr3_present(struct pci_dev *pdev);
#else
static inline struct irq_domain *
pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; }
static inline bool pci_pr3_present(struct pci_dev *pdev) { return false; }
#endif
#ifdef CONFIG_EEH
static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
{
return pdev->dev.archdata.edev;
}
#endif
void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, unsigned nr_devfns);
PCI: Add support for multiple DMA aliases Solve IOMMU support issues with PCIe non-transparent bridges that use Requester ID look-up tables (RID-LUT), e.g., the PEX8733. The NTB connects devices in two independent PCI domains. Devices separated by the NTB are not able to discover each other. A PCI packet being forwared from one domain to another has to have its RID modified so it appears on correct bus and completions are forwarded back to the original domain through the NTB. The RID is translated using a preprogrammed table (LUT) and the PCI packet propagates upstream away from the NTB. If the destination system has IOMMU enabled, the packet will be discarded because the new RID is unknown to the IOMMU. Adding a DMA alias for the new RID allows IOMMU to properly recognize the packet. Each device behind the NTB has a unique RID assigned in the RID-LUT. The current DMA alias implementation supports only a single alias, so it's not possible to support mutiple devices behind the NTB when IOMMU is enabled. Enable all possible aliases on a given bus (256) that are stored in a bitset. Alias devfn is directly translated to a bit number. The bitset is not allocated for devices that have no need for DMA aliases. More details can be found in the following article: http://www.plxtech.com/files/pdf/technical/expresslane/RTC_Enabling%20MulitHostSystemDesigns.pdf Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Joerg Roedel <jroedel@suse.de>
2016-03-03 22:38:02 +08:00
bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2);
int pci_for_each_dma_alias(struct pci_dev *pdev,
int (*fn)(struct pci_dev *pdev,
u16 alias, void *data), void *data);
/* Helper functions for operation of device flag */
static inline void pci_set_dev_assigned(struct pci_dev *pdev)
{
pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
}
static inline void pci_clear_dev_assigned(struct pci_dev *pdev)
{
pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED;
}
static inline bool pci_is_dev_assigned(struct pci_dev *pdev)
{
return (pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED) == PCI_DEV_FLAGS_ASSIGNED;
}
/**
* pci_ari_enabled - query ARI forwarding status
* @bus: the PCI bus
*
* Returns true if ARI forwarding is enabled.
*/
static inline bool pci_ari_enabled(struct pci_bus *bus)
{
return bus->self && bus->self->ari_enabled;
}
PCI: Recognize Thunderbolt devices Detect on probe whether a PCI device is part of a Thunderbolt controller. Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234 on such devices. Detect presence of this VSEC and cache it in a newly added is_thunderbolt bit in struct pci_dev. Also, add a helper to check whether a given PCI device is situated on a Thunderbolt daisy chain (i.e., below a PCI device with is_thunderbolt set). The necessity arises from the following: * If an external Thunderbolt GPU is connected to a dual GPU laptop, that GPU is currently registered with vga_switcheroo even though it can neither drive the laptop's panel nor be powered off by the platform. To vga_switcheroo it will appear as if two discrete GPUs are present. As a result, when the external GPU is runtime suspended, vga_switcheroo will cut power to the internal discrete GPU which may not be runtime suspended at all at this moment. The solution is to not register external GPUs with vga_switcheroo, which necessitates a way to recognize if they're on a Thunderbolt daisy chain. * Dual GPU MacBook Pros introduced 2011+ can no longer switch external DisplayPort ports between GPUs. (They're no longer just used for DP but have become combined DP/Thunderbolt ports.) The driver to switch the ports, drivers/platform/x86/apple-gmux.c, needs to detect presence of a Thunderbolt controller and, if found, keep external ports permanently switched to the discrete GPU. v2: Make kerneldoc for pci_is_thunderbolt_attached() more precise, drop portion of commit message pertaining to separate series. (Bjorn Helgaas) Cc: Andreas Noever <andreas.noever@gmail.com> Cc: Michael Jamet <michael.jamet@intel.com> Cc: Tomas Winkler <tomas.winkler@intel.com> Cc: Amir Levy <amir.jer.levy@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: http://patchwork.freedesktop.org/patch/msgid/0ab165a4a35c0b60f29d4c306c653ead14fcd8f9.1489145162.git.lukas@wunner.de
2017-03-11 04:23:45 +08:00
/**
* pci_is_thunderbolt_attached - whether device is on a Thunderbolt daisy chain
* @pdev: PCI device to check
*
* Walk upwards from @pdev and check for each encountered bridge if it's part
* of a Thunderbolt controller. Reaching the host bridge means @pdev is not
* Thunderbolt-attached. (But rather soldered to the mainboard usually.)
*/
static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
{
struct pci_dev *parent = pdev;
if (pdev->is_thunderbolt)
return true;
while ((parent = pci_upstream_bridge(parent)))
if (parent->is_thunderbolt)
return true;
return false;
}
#if defined(CONFIG_PCIEPORTBUS) || defined(CONFIG_EEH)
void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type);
#endif
#include <linux/dma-mapping.h>
#define pci_printk(level, pdev, fmt, arg...) \
dev_printk(level, &(pdev)->dev, fmt, ##arg)
#define pci_emerg(pdev, fmt, arg...) dev_emerg(&(pdev)->dev, fmt, ##arg)
#define pci_alert(pdev, fmt, arg...) dev_alert(&(pdev)->dev, fmt, ##arg)
#define pci_crit(pdev, fmt, arg...) dev_crit(&(pdev)->dev, fmt, ##arg)
#define pci_err(pdev, fmt, arg...) dev_err(&(pdev)->dev, fmt, ##arg)
#define pci_warn(pdev, fmt, arg...) dev_warn(&(pdev)->dev, fmt, ##arg)
#define pci_notice(pdev, fmt, arg...) dev_notice(&(pdev)->dev, fmt, ##arg)
#define pci_info(pdev, fmt, arg...) dev_info(&(pdev)->dev, fmt, ##arg)
#define pci_dbg(pdev, fmt, arg...) dev_dbg(&(pdev)->dev, fmt, ##arg)
#define pci_notice_ratelimited(pdev, fmt, arg...) \
dev_notice_ratelimited(&(pdev)->dev, fmt, ##arg)
#define pci_info_ratelimited(pdev, fmt, arg...) \
dev_info_ratelimited(&(pdev)->dev, fmt, ##arg)
#define pci_WARN(pdev, condition, fmt, arg...) \
WARN(condition, "%s %s: " fmt, \
dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg)
#define pci_WARN_ONCE(pdev, condition, fmt, arg...) \
WARN_ONCE(condition, "%s %s: " fmt, \
dev_driver_string(&(pdev)->dev), pci_name(pdev), ##arg)
#endif /* LINUX_PCI_H */