NUMA systems with ACPI normally describe the physical topology via _PXM
methods. But many BIOSes don't implement _PXM, which leaves the kernel
with no way to discover the device topology, which reduces performance
because we can't put memory and processes close to the device.
The NUMA node of a PCI device is already exported in the sysfs "numa_node"
file. Make that file writable so users can workaround the lack of _PXM
methods in the BIOS. For example:
echo 3 > /sys/devices/pci0000:ff/0000:03:1f.3/numa_node
sets the node for PCI device 0000:03:1f.3.
Writing the file emits a FW_BUG warning to encourage users to request
firmware updates. It also taints the kernel with TAINT_FIRMWARE_WORKAROUND
because overriding the node incorrectly can cause performance issues.
[bhelgaas: changelog, documentation text]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Myron Stowe <mstowe@redhat.com>
CC: Alexander Ducyk <alexander.h.duyck@redhat.com>
CC: Jiang Liu <jiang.liu@linux.intel.com>
The "msi_bus" sysfs file for bridges sets a bus flag to allow or disallow
future driver requests for MSI or MSI-X. Previously, the sysfs file
existed for endpoints but did nothing.
Add "msi_bus" support for endpoints, so an administrator can prevent the
use of MSI and MSI-X for individual devices.
Note that as for bridges, these changes only affect future driver requests
for MSI or MSI-X, so drivers may need to be reloaded.
Add documentation for the "msi_bus" sysfs file.
[bhelgaas: changelog, comments, add "subordinate", add endpoint printk,
rework bus_flags setting, make bus_flags printk unconditional]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/hotplug:
PCI: cpqphp: Fix possible null pointer dereference
NVMe: Implement PCIe reset notification callback
PCI: Notify driver before and after device reset
* pci/pci_is_bridge:
pcmcia: Use pci_is_bridge() to simplify code
PCI: pciehp: Use pci_is_bridge() to simplify code
PCI: acpiphp: Use pci_is_bridge() to simplify code
PCI: cpcihp: Use pci_is_bridge() to simplify code
PCI: shpchp: Use pci_is_bridge() to simplify code
PCI: rpaphp: Use pci_is_bridge() to simplify code
sparc/PCI: Use pci_is_bridge() to simplify code
powerpc/PCI: Use pci_is_bridge() to simplify code
ia64/PCI: Use pci_is_bridge() to simplify code
x86/PCI: Use pci_is_bridge() to simplify code
PCI: Use pci_is_bridge() to simplify code
PCI: Add new pci_is_bridge() interface
PCI: Rename pci_is_bridge() to pci_has_subordinate()
* pci/virtualization:
PCI: Introduce new device binding path using pci_dev.driver_override
Conflicts:
drivers/pci/pci-sysfs.c
The driver_override field allows us to specify the driver for a device
rather than relying on the driver to provide a positive match of the
device. This shortcuts the existing process of looking up the vendor and
device ID, adding them to the driver new_id, binding the device, then
removing the ID, but it also provides a couple advantages.
First, the above existing process allows the driver to bind to any device
matching the new_id for the window where it's enabled. This is often not
desired, such as the case of trying to bind a single device to a meta
driver like pci-stub or vfio-pci. Using driver_override we can do this
deterministically using:
echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
Previously we could not invoke drivers_probe after adding a device to
new_id for a driver as we get non-deterministic behavior whether the driver
we intend or the standard driver will claim the device. Now it becomes a
deterministic process, only the driver matching driver_override will probe
the device.
To return the device to the standard driver, we simply clear the
driver_override and reprobe the device:
echo > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
Another advantage to this approach is that we can specify a driver override
to force a specific binding or prevent any binding. For instance when an
IOMMU group is exposed to userspace through VFIO we require that all
devices within that group are owned by VFIO. However, devices can be
hot-added into an IOMMU group, in which case we want to prevent the device
from binding to any driver (override driver = "none") or perhaps have it
automatically bind to vfio-pci. With driver_override it's a simple matter
for this field to be set internally when the device is first discovered to
prevent driver matches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PCI MSI sysfs code is a mess with kobjects for things that don't really
need to be kobjects. This patch creates attributes dynamically for the MSI
interrupts instead of using kobjects.
Note, this removes a directory from sysfs. Old MSI kobjects:
pci_device
└── msi_irqs
└── 40
└── mode
New MSI attributes:
pci_device
└── msi_irqs
└── 40
As there was only one file "mode" with the kobject model, the interrupt
number is now a file that returns the "mode" of the interrupt (msi vs.
msix).
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Ever since commit 45f035ab9b ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off. Remove all the remaining references to it.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add documentation of new sysfs files and new pci_driver SRIOV
configuration interface.
[bhelgaas: changelog]
Signed-off: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This patch adds ABI document for the following sysfs file:
/sys/bus/pci/devices/.../d3cold_allowed
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
This patch adds a per-pci-device subdirectory in sysfs called:
/sys/bus/pci/devices/<device>/msi_irqs
This sub-directory exports the set of msi vectors allocated by a given
pci device, by creating a numbered sub-directory for each vector beneath
msi_irqs. For each vector various attributes can be exported.
Currently the only attribute is called mode, which tracks the
operational mode of that vector (msi vs. msix)
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After remove the device from /sys, we have to rescan all or
find out the bridge and access /sys../device/rescan there.
this patch add /sys/.../pci_bus/.../rescan. So user can rescan more easy.
that is more clean and easy to understand.
like after remove 0000:c4:00.0, you can rescan 0000:c4 directly.
-v2: According to Jesse, use function instead of exposing attr, so could hide
#ifdef in header file.
also add code to remove rescan file in remove path.
-v3: GregKH pointed out that we should use dev_attrs to avoid racing.
So add pcibus_attrs and make it to be member of pcibus_attrs.
-v4: Change name to pcibus_dev_attrs according to GregKH
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch exports ACPI _DSM (Device Specific Method) provided firmware
instance number and string name of PCI devices as defined by 'PCI
Firmware Specification Revision 3.1' section 4.6.7.( DSM for Naming a
PCI or PCI Express Device Under Operating Systems) to sysfs.
New files created are:
/sys/bus/pci/devices/.../label which contains the firmware name for
the device in question, and
/sys/bus/pci/devices/.../acpi_index which contains the firmware device type
instance for the given device.
cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/acpi_index
1
cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/label
Embedded Broadcom 5709C NIC 1
cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/acpi_index
2
cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/label
Embedded Broadcom 5709C NIC 2
The ACPI _DSM provided firmware 'instance number' and 'string name' will
be given priority if the firmware also provides 'SMBIOS type 41 device
type instance and string'.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch exports SMBIOS provided firmware instance and label of
onboard PCI devices to sysfs. New files are:
/sys/bus/pci/devices/.../label which contains the firmware name for
the device in question, and
/sys/bus/pci/devices/.../index which contains the firmware device type
instance for the given device.
Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This reverts commit 75568f8094.
Since they're just a convenience anyway, remove these symlinks since
they're causing duplicate filename errors in the wild.
Acked-by: Alex Chiang <achiang@canonical.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Create convenience symlinks in sysfs, linking slots to device
functions, and vice versa. These links make it easier for users to
figure out which devices actually live in what slots.
For example:
sapphire:/sys/bus/pci/slots # ls
1 10 2 3 4 5 6 7 8 9
sapphire:/sys/bus/pci/slots # ls -l 3
total 0
-r--r--r-- 1 root root 65536 Aug 18 14:10 address
lrwxrwxrwx 1 root root 0 Aug 18 14:10 function0 ->
../../../../devices/pci0000:23/0000:23:01.0
lrwxrwxrwx 1 root root 0 Aug 18 14:10 function1 ->
../../../../devices/pci0000:23/0000:23:01.1
sapphire:/sys/bus/pci/slots # ls -l 3/function0/slot
lrwxrwxrwx 1 root root 0 Aug 18 14:13 3/function0/slot ->
../../../bus/pci/slots/3
The original form of this patch was written by Matthew Wilcox,
and was enhanced to include links from the sysfs slots/ directory
pointing back at the device functions.
Cc: willy@linux.intel.com
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some devices allow an individual function to be reset without affecting
other functions in the same device: that's what pci_reset_function does.
For devices that have this support, expose reset attribite in sysfs.
This is useful e.g. for virtualization, where a qemu userspace
process wants to reset the device when the guest is reset,
to emulate machine reboot as closely as possible.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Create symbolic link to hotplug driver module in the PCI slot
directory (/sys/bus/pci/slots/<SLOT#>). In the past, we need to load
hotplug drivers one by one to identify the hotplug driver that handles
the slot, and it was very inconvenient especially for trouble shooting.
With this change, we can easily identify the hotplug driver.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This interface allows the user to force a rescan of the device's
parent bus and all subordinate buses, and rediscover devices removed
earlier from this part of the device tree.
Cc: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds an attribute named "remove" to a PCI device's sysfs
directory. Writing a non-zero value to this attribute will remove the PCI
device and any children of it.
Trent Piepho wrote the original implementation and documentation.
Thanks to Vegard Nossum for testing under kmemcheck and finding locking
issues with the sysfs interface.
Cc: Trent Piepho <xyzzy@speakeasy.org>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This interface allows the user to force a rescan of all PCI buses
in system, and rediscover devices that have been removed earlier.
pci_bus_attrs implementation from Trent Piepho.
Thanks to Vegard Nossum for discovering locking issues with the
sysfs interface.
Cc: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This adds a remove_id sysfs entry to allow users of new_id to later
remove the added dynid. One use case is management tools that want to
dynamically bind/unbind devices to pci-stub driver while devices are
assigned to KVM guests. Rather than having to track which driver was
originally bound to the driver, a mangement tool can simply:
Guest uses device
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add sysfs ABI docs for driver entries bind, unbind and new_id. These
entries are pretty old, from 2.6.0 onwards AFAIK, so this documents
current behaviour.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
Vital Product Data (VPD) may be exposed by PCI devices in several
ways. It is generally unsafe to read this information through the
existing interfaces to user-land because of stateful interfaces.
This adds:
- abstract operations for VPD access (struct pci_vpd_ops)
- VPD state information in struct pci_dev (struct pci_vpd)
- an implementation of the VPD access method specified in PCI 2.2
(in access.c)
- a 'vpd' binary file in sysfs directories for PCI devices with VPD
operations defined
It adds a probe for PCI 2.2 VPD in pci_scan_device() and release of
VPD state in pci_release_dev().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>