Return error immediately to simplify the control flow in pci_create_attr().
No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If sysfs_create_bin_file() fails, pci_create_attr() leaks the struct
bin_attribute it allocated previously.
Free the struct bin_attribute if pci_create_attr() fails.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The value of pdev->rom_attr is the definitive indicator of the fact that
we're created a sysfs attribute. Check that rather than rom_size, which is
only used incidentally when deciding whether to create a sysfs attribute.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When pci_create_sysfs_dev_files() created the "rom" sysfs file, it set the
sysfs file size to the actual size of a ROM BAR, or if there was no ROM BAR
but the platform provided a shadow copy in RAM, to 0x20000. 0x20000 is an
arch-specific length that should not be baked into the PCI core.
Every place that sets IORESOURCE_ROM_SHADOW also sets the size of the
PCI_ROM_RESOURCE, so use the resource length always.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI-2.2 VPD entries have a maximum size of 32k, but might actually be
smaller than that. To figure out the actual size one has to read the VPD
area until the 'end marker' is reached.
Per spec, reading outside of the VPD space is "not allowed." In practice,
it may cause simple read errors or even crash the card. To make matters
worse not every PCI card implements this properly, leaving us with no 'end'
marker or even completely invalid data.
Try to determine the size of the VPD data when it's first accessed. If no
valid data can be read an I/O error will be returned when reading or
writing the sysfs attribute.
As the amount of VPD data is unknown initially the size of the sysfs
attribute will always be set to '0'.
[bhelgaas: changelog, use 0/1 (not false/true) for bitfield, tweak
pci_vpd_pci22_read() error checking]
Tested-by: Shane Seymour <shane.seymour@hpe.com>
Tested-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
It is not always possible to determine the actual size of the VPD
data, so allow access to them if the size is set to '0'.
Tested-by: Shane Seymour <shane.seymour@hpe.com>
Tested-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
* pci/trivial:
PCI: shpchp: Constify hpc_ops structure
PCI: Use kobj_to_dev() instead of open-coding it
PCI: Use to_pci_dev() instead of open-coding it
PCI: Fix all whitespace issues
PCI/MSI: Fix typos in <linux/msi.h>
If a device quirk modifies the pci_dev->cfg_size to be less than
PCI_CFG_SPACE_EXP_SIZE (4096), but greater than PCI_CFG_SPACE_SIZE (256),
the PCI sysfs interface truncates the readable size to PCI_CFG_SPACE_SIZE.
Allow sysfs access to config space up to cfg_size, even if the device
doesn't support the entire 4096-byte PCIe config space.
Note that pci_read_config() and pci_write_config() limit access to
dev->cfg_size even though pcie_config_attr contains 4096 (the maximum
size).
Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
[bhelgaas: more changelog edits]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Commit 1266963170 ("PCI: Prevent out of bounds access in numa_node
override") missed that the user-provided node could also be negative.
Handle this case as well to avoid out-of-bounds accesses to the
node_states[] array. However, allow the special value -1, i.e.
NUMA_NO_NODE, to be able to set the 'no specific node' configuration.
Fixes: 1266963170 ("PCI: Prevent out of bounds access in numa_node override")
Fixes: 63692df103 ("PCI: Allow numa_node override via sysfs")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sasha Levin <sasha.levin@oracle.com>
CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org # v3.19+
63692df103 ("PCI: Allow numa_node override via sysfs") didn't check that
the numa node provided by userspace is valid. Passing a node number too
high would attempt to access invalid memory and trigger a kernel panic.
Fixes: 63692df103 ("PCI: Allow numa_node override via sysfs")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.19+
When printing the driver_override parameter when it is 4095 and 4094 bytes
long, the printing code would access invalid memory because we need count+1
bytes for printing.
Fixes: 782a985d7a ("PCI: Introduce new device binding path using pci_dev.driver_override")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
CC: stable@vger.kernel.org # v3.16+
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Alexander Graf <agraf@suse.de>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes, just
removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There are
some ath9k patches coming in through this tree that have been acked by
the wireless maintainers as they relied on the debugfs changes.
Everything has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
53kAoLeteByQ3iVwWurwwseRPiWa8+MI
=OVRS
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
This time we have some more new material than we used to have during
the last couple of development cycles.
The most important part of it to me is the introduction of a unified
interface for accessing device properties provided by platform
firmware. It works with Device Trees and ACPI in a uniform way and
drivers using it need not worry about where the properties come
from as long as the platform firmware (either DT or ACPI) makes
them available. It covers both devices and "bare" device node
objects without struct device representation as that turns out to
be necessary in some cases. This has been in the works for quite
a few months (and development cycles) and has been approved by
all of the relevant maintainers.
On top of that, some drivers are switched over to the new interface
(at25, leds-gpio, gpio_keys_polled) and some additional changes are
made to the core GPIO subsystem to allow device drivers to manipulate
GPIOs in the "canonical" way on platforms that provide GPIO information
in their ACPI tables, but don't assign names to GPIO lines (in which
case the driver needs to do that on the basis of what it knows about
the device in question). That also has been approved by the GPIO
core maintainers and the rfkill driver is now going to use it.
Second is support for hardware P-states in the intel_pstate driver.
It uses CPUID to detect whether or not the feature is supported by
the processor in which case it will be enabled by default. However,
it can be disabled entirely from the kernel command line if necessary.
Next is support for a platform firmware interface based on ACPI
operation regions used by the PMIC (Power Management Integrated
Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
That interface is used for manipulating power resources and for
thermal management: sensor temperature reporting, trip point setting
and so on.
Also the ACPI core is now going to support the _DEP configuration
information in a limited way. Basically, _DEP it supposed to reflect
off-the-hierarchy dependencies between devices which may be very
indirect, like when AML for one device accesses locations in an
operation region handled by another device's driver (usually, the
device depended on this way is a serial bus or GPIO controller).
The support added this time is sufficient to make the ACPI battery
driver work on Asus T100A, but it is general enough to be able to
cover some other use cases in the future.
Finally, we have a new cpufreq driver for the Loongson1B processor.
In addition to the above, there are fixes and cleanups all over the
place as usual and a traditional ACPICA update to a recent upstream
release.
As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver
for Intel platforms should be able to handle power management of
the DMA engine correctly, the cpufreq-dt driver should interact
with the thermal subsystem in a better way and the ACPI backlight
driver should handle some more corner cases, among other things.
On top of the ACPICA update there are fixes for race conditions
in the ACPICA's interrupt handling code which might lead to some
random and strange looking failures on some systems.
In the cleanups department the most visible part is the series
of commits targeted at getting rid of the CONFIG_PM_RUNTIME
configuration option. That was triggered by a discussion
regarding the generic power domains code during which we realized
that trying to support certain combinations of PM config options
was painful and not really worth it, because nobody would use them
in production anyway. For this reason, we decided to make
CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the
conclusion that the latter became redundant and CONFIG_PM could
be used instead of it. The material here makes that replacement
in a major part of the tree, but there will be at least one more
batch of that in the second part of the merge window.
Specifics:
- Support for retrieving device properties information from ACPI
_DSD device configuration objects and a unified device properties
interface for device drivers (and subsystems) on top of that.
As stated above, this works with Device Trees and ACPI and allows
device drivers to be written in a platform firmware (DT or ACPI)
agnostic way. The at25, leds-gpio and gpio_keys_polled drivers
are now going to use this new interface and the GPIO subsystem
is additionally modified to allow device drivers to assign names
to GPIO resources returned by ACPI _CRS objects (in case _DSD is
not present or does not provide the expected data). The changes
in this set are mostly from Mika Westerberg, Rafael J Wysocki,
Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam,
Geert Uytterhoeven).
- Support for Hardware Managed Performance States (HWP) as described
in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
driver. CPUID is used to detect whether or not the feature is
supported by the processor. If supported, it will be enabled
automatically unless the intel_pstate=no_hwp switch is present in
the kernel command line. From Dirk Brandewie.
- New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
- Support for firmware interface based on ACPI operation regions
used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
platforms for power resource control and thermal management
(Aaron Lu).
- Limited support for retrieving off-the-hierarchy dependencies
between devices from ACPI _DEP device configuration objects
and deferred probing support for the ACPI battery driver based
on the _DEP information to make that driver work on Asus T100A
(Lan Tianyu).
- New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
- ACPICA update to upstream revision 20141107 which only affects
tools (Bob Moore).
- Fixes for race conditions in the ACPICA's interrupt handling
code and in the ACPI code related to system suspend and resume
(Lv Zheng and Rafael J Wysocki).
- ACPI core fix for an RCU-related issue in the ioremap() regions
management code that slowed down significantly after CPUs had
been allowed to enter idle states even if they'd had RCU callbakcs
queued and triggered some problems in certain proprietary graphics
driver (and elsewhere). The fix replaces synchronize_rcu() in
that code with synchronize_rcu_expedited() which makes the issue
go away. From Konstantin Khlebnikov.
- ACPI LPSS (Low-Power Subsystem) driver fix to handle power
management of the DMA engine included into the LPSS correctly.
The problem is that the DMA engine doesn't have ACPI PM support
of its own and it simply is turned off when the last LPSS device
having ACPI PM support goes into D3cold. To work around that,
the PM domain used by the ACPI LPSS driver is redesigned so at
least one device with ACPI PM support will be on as long as the
DMA engine is in use. From Andy Shevchenko.
- ACPI backlight driver fix to avoid using it on "Win8-compatible"
systems where it doesn't work and where it was used by default by
mistake (Aaron Lu).
- Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and
Ashwin Chaugule (mostly related to the upcoming ARM64 support).
- Intel RAPL (Running Average Power Limit) power capping driver
fixes and improvements including new processor IDs (Jacob Pan).
- Generic power domains modification to power up domains after
attaching devices to them to meet the expectations of device
drivers and bus types assuming devices to be accessible at
probe time (Ulf Hansson).
- Preliminary support for controlling device clocks from the
generic power domains core code and modifications of the
ARM/shmobile platform to use that feature (Ulf Hansson).
- Assorted minor fixes and cleanups of the generic power
domains core code (Ulf Hansson, Geert Uytterhoeven).
- Assorted minor fixes and cleanups of the device clocks control
code in the PM core (Geert Uytterhoeven, Grygorii Strashko).
- Consolidation of device power management Kconfig options by making
CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
which is now redundant (Rafael J Wysocki and Kevin Hilman). That
is the first batch of the changes needed for this purpose.
- Core device runtime power management support code cleanup related
to the execution of callbacks (Andrzej Hajda).
- cpuidle ARM support improvements (Lorenzo Pieralisi).
- cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and
a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
Bartlomiej Zolnierkiewicz).
- New cpufreq driver callback (->ready) to be executed when the
cpufreq core is ready to use a given policy object and cpufreq-dt
driver modification to use that callback for cooling device
registration (Viresh Kumar).
- cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu,
James Geboski, Tomeu Vizoso).
- Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
Stefan Wahren, Petr Cvek).
- OPP (Operating Performance Points) framework modification to
allow OPPs to be removed too and update of a few cpufreq drivers
(cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
during initialization) on driver removal (Viresh Kumar).
- Hibernation core fixes and cleanups (Tina Ruchandani and
Markus Elfring).
- PM Kconfig fix related to CPU power management (Pankaj Dubey).
- cpupower tool fix (Prarit Bhargava).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUhj6JAAoJEILEb/54YlRxTM4P/j5g5SfqvY0QKsn7sR7MGZ6v
nsgCBhJAqTw3ocNC7EAs8z9h2GWy1KbKpakKYWAh9Fs1yZoey7tFSlcv/Rgjlp70
uU5sDQHtpE9mHKiymdsowiQuWgpl962L4k+k8hUslhlvgk1PvVbpajR6OqG8G+pD
asuIW9eh1APNkLyXmRJ3ZPomzs0VmRdZJ0NEs0lKX9mJskqEvxPIwdaxq3iaJq9B
Fo0J345zUDcJnxWblDRdHlOigCimglElfN5qJwaC4KpwUKuBvLRKbp4f69+wfT0c
kYFiR29X5KjJ2kLfP/wKsLyuDCYYXRq3tCia5M1tAqOjZ+UA89H/GDftx/5lntmv
qUlBa35VfdS1SX4HyApZitOHiLgo+It/hl8Z9bJnhyVw66NxmMQ8JYN2imb8Lhqh
XCLR7BxLTah82AapLJuQ0ZDHPzZqMPG2veC2vAzRMYzVijict/p4Y2+qBqONltER
4rs9uRVn+hamX33lCLg8BEN8zqlnT3rJFIgGaKjq/wXHAU/zpE9CjOrKMQcAg9+s
t51XMNPwypHMAYyGVhEL89ImjXnXxBkLRuquhlmEpvQchIhR+mR3dLsarGn7da44
WPIQJXzcsojXczcwwfqsJCR4I1FTFyQIW+UNh02GkDRgRovQqo+Jk762U7vQwqH+
LBdhvVaS1VW4v+FWXEoZ
=5dox
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
"This time we have some more new material than we used to have during
the last couple of development cycles.
The most important part of it to me is the introduction of a unified
interface for accessing device properties provided by platform
firmware. It works with Device Trees and ACPI in a uniform way and
drivers using it need not worry about where the properties come from
as long as the platform firmware (either DT or ACPI) makes them
available. It covers both devices and "bare" device node objects
without struct device representation as that turns out to be necessary
in some cases. This has been in the works for quite a few months (and
development cycles) and has been approved by all of the relevant
maintainers.
On top of that, some drivers are switched over to the new interface
(at25, leds-gpio, gpio_keys_polled) and some additional changes are
made to the core GPIO subsystem to allow device drivers to manipulate
GPIOs in the "canonical" way on platforms that provide GPIO
information in their ACPI tables, but don't assign names to GPIO lines
(in which case the driver needs to do that on the basis of what it
knows about the device in question). That also has been approved by
the GPIO core maintainers and the rfkill driver is now going to use
it.
Second is support for hardware P-states in the intel_pstate driver.
It uses CPUID to detect whether or not the feature is supported by the
processor in which case it will be enabled by default. However, it
can be disabled entirely from the kernel command line if necessary.
Next is support for a platform firmware interface based on ACPI
operation regions used by the PMIC (Power Management Integrated
Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
That interface is used for manipulating power resources and for
thermal management: sensor temperature reporting, trip point setting
and so on.
Also the ACPI core is now going to support the _DEP configuration
information in a limited way. Basically, _DEP it supposed to reflect
off-the-hierarchy dependencies between devices which may be very
indirect, like when AML for one device accesses locations in an
operation region handled by another device's driver (usually, the
device depended on this way is a serial bus or GPIO controller). The
support added this time is sufficient to make the ACPI battery driver
work on Asus T100A, but it is general enough to be able to cover some
other use cases in the future.
Finally, we have a new cpufreq driver for the Loongson1B processor.
In addition to the above, there are fixes and cleanups all over the
place as usual and a traditional ACPICA update to a recent upstream
release.
As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for
Intel platforms should be able to handle power management of the DMA
engine correctly, the cpufreq-dt driver should interact with the
thermal subsystem in a better way and the ACPI backlight driver should
handle some more corner cases, among other things.
On top of the ACPICA update there are fixes for race conditions in the
ACPICA's interrupt handling code which might lead to some random and
strange looking failures on some systems.
In the cleanups department the most visible part is the series of
commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration
option. That was triggered by a discussion regarding the generic
power domains code during which we realized that trying to support
certain combinations of PM config options was painful and not really
worth it, because nobody would use them in production anyway. For
this reason, we decided to make CONFIG_PM_SLEEP select
CONFIG_PM_RUNTIME and that lead to the conclusion that the latter
became redundant and CONFIG_PM could be used instead of it. The
material here makes that replacement in a major part of the tree, but
there will be at least one more batch of that in the second part of
the merge window.
Specifics:
- Support for retrieving device properties information from ACPI _DSD
device configuration objects and a unified device properties
interface for device drivers (and subsystems) on top of that. As
stated above, this works with Device Trees and ACPI and allows
device drivers to be written in a platform firmware (DT or ACPI)
agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are
now going to use this new interface and the GPIO subsystem is
additionally modified to allow device drivers to assign names to
GPIO resources returned by ACPI _CRS objects (in case _DSD is not
present or does not provide the expected data). The changes in
this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron
Lu, and Darren Hart with some fixes from others (Fabio Estevam,
Geert Uytterhoeven).
- Support for Hardware Managed Performance States (HWP) as described
in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
driver. CPUID is used to detect whether or not the feature is
supported by the processor. If supported, it will be enabled
automatically unless the intel_pstate=no_hwp switch is present in
the kernel command line. From Dirk Brandewie.
- New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
- Support for firmware interface based on ACPI operation regions used
by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
platforms for power resource control and thermal management (Aaron
Lu).
- Limited support for retrieving off-the-hierarchy dependencies
between devices from ACPI _DEP device configuration objects and
deferred probing support for the ACPI battery driver based on the
_DEP information to make that driver work on Asus T100A (Lan
Tianyu).
- New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
- ACPICA update to upstream revision 20141107 which only affects
tools (Bob Moore).
- Fixes for race conditions in the ACPICA's interrupt handling code
and in the ACPI code related to system suspend and resume (Lv Zheng
and Rafael J Wysocki).
- ACPI core fix for an RCU-related issue in the ioremap() regions
management code that slowed down significantly after CPUs had been
allowed to enter idle states even if they'd had RCU callbakcs
queued and triggered some problems in certain proprietary graphics
driver (and elsewhere). The fix replaces synchronize_rcu() in that
code with synchronize_rcu_expedited() which makes the issue go
away. From Konstantin Khlebnikov.
- ACPI LPSS (Low-Power Subsystem) driver fix to handle power
management of the DMA engine included into the LPSS correctly. The
problem is that the DMA engine doesn't have ACPI PM support of its
own and it simply is turned off when the last LPSS device having
ACPI PM support goes into D3cold. To work around that, the PM
domain used by the ACPI LPSS driver is redesigned so at least one
device with ACPI PM support will be on as long as the DMA engine is
in use. From Andy Shevchenko.
- ACPI backlight driver fix to avoid using it on "Win8-compatible"
systems where it doesn't work and where it was used by default by
mistake (Aaron Lu).
- Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin
Chaugule (mostly related to the upcoming ARM64 support).
- Intel RAPL (Running Average Power Limit) power capping driver fixes
and improvements including new processor IDs (Jacob Pan).
- Generic power domains modification to power up domains after
attaching devices to them to meet the expectations of device
drivers and bus types assuming devices to be accessible at probe
time (Ulf Hansson).
- Preliminary support for controlling device clocks from the generic
power domains core code and modifications of the ARM/shmobile
platform to use that feature (Ulf Hansson).
- Assorted minor fixes and cleanups of the generic power domains core
code (Ulf Hansson, Geert Uytterhoeven).
- Assorted minor fixes and cleanups of the device clocks control code
in the PM core (Geert Uytterhoeven, Grygorii Strashko).
- Consolidation of device power management Kconfig options by making
CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
which is now redundant (Rafael J Wysocki and Kevin Hilman). That
is the first batch of the changes needed for this purpose.
- Core device runtime power management support code cleanup related
to the execution of callbacks (Andrzej Hajda).
- cpuidle ARM support improvements (Lorenzo Pieralisi).
- cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a
new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
Bartlomiej Zolnierkiewicz).
- New cpufreq driver callback (->ready) to be executed when the
cpufreq core is ready to use a given policy object and cpufreq-dt
driver modification to use that callback for cooling device
registration (Viresh Kumar).
- cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James
Geboski, Tomeu Vizoso).
- Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
Stefan Wahren, Petr Cvek).
- OPP (Operating Performance Points) framework modification to allow
OPPs to be removed too and update of a few cpufreq drivers
(cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
during initialization) on driver removal (Viresh Kumar).
- Hibernation core fixes and cleanups (Tina Ruchandani and Markus
Elfring).
- PM Kconfig fix related to CPU power management (Pankaj Dubey).
- cpupower tool fix (Prarit Bhargava)"
* tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits)
i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c
dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
tools: cpupower: fix return checks for sysfs_get_idlestate_count()
drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
leds: leds-gpio: Fix multiple instances registration without 'label' property
iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
hwrandom / exynos / PM: Use CONFIG_PM in #ifdef
block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
USB / PM: Drop CONFIG_PM_RUNTIME from the USB core
PM: Merge the SET*_RUNTIME_PM_OPS() macros
...
After commit b2b49ccbdd (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so quite a few
depend on CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM in the PCI core code.
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Many sysfs *_show function use cpu{list,mask}_scnprintf to copy cpumap
to the buffer aligned to PAGE_SIZE, append '\n' and '\0' to return null
terminated buffer with newline.
This patch creates a new helper function cpumap_print_to_pagebuf in
cpumask.h using newly added bitmap_print_to_pagebuf and consolidates
most of those sysfs functions using the new helper function.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NUMA systems with ACPI normally describe the physical topology via _PXM
methods. But many BIOSes don't implement _PXM, which leaves the kernel
with no way to discover the device topology, which reduces performance
because we can't put memory and processes close to the device.
The NUMA node of a PCI device is already exported in the sysfs "numa_node"
file. Make that file writable so users can workaround the lack of _PXM
methods in the BIOS. For example:
echo 3 > /sys/devices/pci0000:ff/0000:03:1f.3/numa_node
sets the node for PCI device 0000:03:1f.3.
Writing the file emits a FW_BUG warning to encourage users to request
firmware updates. It also taints the kernel with TAINT_FIRMWARE_WORKAROUND
because overriding the node incorrectly can cause performance issues.
[bhelgaas: changelog, documentation text]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Myron Stowe <mstowe@redhat.com>
CC: Alexander Ducyk <alexander.h.duyck@redhat.com>
CC: Jiang Liu <jiang.liu@linux.intel.com>
Back in commit 5136b2da77 ("PCI: convert bus code to use dev_groups"),
I misstyped the 'enable' sysfs filename as 'enabled', which broke the
userspace API. This patch fixes that issue by renaming the file back.
Fixes: 5136b2da77 ("PCI: convert bus code to use dev_groups")
Reported-by: Jeff Epler <jepler@unpythonic.net>
Tested-by: Jeff Epler <jepler@unpythonic.net> # on v3.14-rt
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # 3.13
The "msi_bus" sysfs file for bridges sets a bus flag to allow or disallow
future driver requests for MSI or MSI-X. Previously, the sysfs file
existed for endpoints but did nothing.
Add "msi_bus" support for endpoints, so an administrator can prevent the
use of MSI and MSI-X for individual devices.
Note that as for bridges, these changes only affect future driver requests
for MSI or MSI-X, so drivers may need to be reloaded.
Add documentation for the "msi_bus" sysfs file.
[bhelgaas: changelog, comments, add "subordinate", add endpoint printk,
rework bus_flags setting, make bus_flags printk unconditional]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Some implementations of modprobe fail to load the driver for a PCI device
automatically because the "interface" part of the modalias from the kernel
is lowercase, and the modalias from file2alias is uppercase.
The "interface" is the low-order byte of the Class Code, defined in PCI
r3.0, Appendix D. Most interface types defined in the spec do not use
alpha characters, so they won't be affected. For example, 00h, 01h, 10h,
20h, etc. are unaffected.
Print the "interface" byte of the Class Code in uppercase hex, as we
already do for the Vendor ID, Device ID, Class, etc.
[bhelgaas: changelog]
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: stable@vger.kernel.org
Merge quoted strings that are broken across lines into a single entity.
The compiler merges them anyway, but checkpatch complains about it, and
merging them makes it easier to grep for strings.
No functional change.
[bhelgaas: changelog, do the same for everything under drivers/pci]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Fix various whitespace errors.
No functional change.
[bhelgaas: fix other similar problems]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/hotplug:
PCI: cpqphp: Fix possible null pointer dereference
NVMe: Implement PCIe reset notification callback
PCI: Notify driver before and after device reset
* pci/pci_is_bridge:
pcmcia: Use pci_is_bridge() to simplify code
PCI: pciehp: Use pci_is_bridge() to simplify code
PCI: acpiphp: Use pci_is_bridge() to simplify code
PCI: cpcihp: Use pci_is_bridge() to simplify code
PCI: shpchp: Use pci_is_bridge() to simplify code
PCI: rpaphp: Use pci_is_bridge() to simplify code
sparc/PCI: Use pci_is_bridge() to simplify code
powerpc/PCI: Use pci_is_bridge() to simplify code
ia64/PCI: Use pci_is_bridge() to simplify code
x86/PCI: Use pci_is_bridge() to simplify code
PCI: Use pci_is_bridge() to simplify code
PCI: Add new pci_is_bridge() interface
PCI: Rename pci_is_bridge() to pci_has_subordinate()
* pci/virtualization:
PCI: Introduce new device binding path using pci_dev.driver_override
Conflicts:
drivers/pci/pci-sysfs.c
The driver_override field allows us to specify the driver for a device
rather than relying on the driver to provide a positive match of the
device. This shortcuts the existing process of looking up the vendor and
device ID, adding them to the driver new_id, binding the device, then
removing the ID, but it also provides a couple advantages.
First, the above existing process allows the driver to bind to any device
matching the new_id for the window where it's enabled. This is often not
desired, such as the case of trying to bind a single device to a meta
driver like pci-stub or vfio-pci. Using driver_override we can do this
deterministically using:
echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
Previously we could not invoke drivers_probe after adding a device to
new_id for a driver as we get non-deterministic behavior whether the driver
we intend or the standard driver will claim the device. Now it becomes a
deterministic process, only the driver matching driver_override will probe
the device.
To return the device to the standard driver, we simply clear the
driver_override and reprobe the device:
echo > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
Another advantage to this approach is that we can specify a driver override
to force a specific binding or prevent any binding. For instance when an
IOMMU group is exposed to userspace through VFIO we require that all
devices within that group are owned by VFIO. However, devices can be
hot-added into an IOMMU group, in which case we want to prevent the device
from binding to any driver (override driver = "none") or perhaps have it
automatically bind to vfio-pci. With driver_override it's a simple matter
for this field to be set internally when the device is first discovered to
prevent driver matches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove pcibios_add_platform_entries(). Architecture-specific attributes
can be achieved by setting pdev->dev.groups.
Link: https://lkml.kernel.org/r/alpine.LFD.2.11.1404141101500.1529@denkbrett
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the devspec OF attribute to PCI common code's set of device attributes
since it's not architecture dependent. As a side effect microblaze and
powerpc no longer need to use pcibios_add_platform_entries().
[bhelgaas: fold in #include for compile error]
Link: https://lkml.kernel.org/r/alpine.LFD.2.11.1404141101500.1529@denkbrett
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
driver-core now supports synchrnous self-deletion of attributes and
the asynchrnous removal mechanism is scheduled for removal. Use it
instead of device_schedule_callback(). This makes "remove" behave
synchronously.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are multiple PCI device addition and removal code paths that may be
run concurrently with the generic PCI bus rescan and device removal that
can be triggered via sysfs. If that happens, it may lead to multiple
different, potentially dangerous race conditions.
The most straightforward way to address those problems is to run
the code in question under the same lock that is used by the
generic rescan/remove code in pci-sysfs.c. To prepare for those
changes, move the definition of the global PCI remove/rescan lock
to probe.c and provide global wrappers, pci_lock_rescan_remove()
and pci_unlock_rescan_remove(), allowing drivers to manipulate
that lock. Also provide pci_stop_and_remove_bus_device_locked()
for the callers of pci_stop_and_remove_bus_device() who only need
to hold the rescan/remove lock around it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Fix whitespace, capitalization, and spelling errors. No functional change.
I know "busses" is not an error, but "buses" was more common, so I used it
consistently.
Signed-off-by: Marta Rybczynska <rybczynska@gmail.com> (pci_reset_bridge_secondary_bus())
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
local_cpus_show() and local_cpulist_show() are almost the same.
This adds a new helper function, pci_dev_show_local_cpu(), to simplify
code.
The same strategy is already used by cpuaffinity_show() and
cpulistaffinity_show().
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Local variables used only in this file are made static.
[bhelgaas: also make pci_dev_attrs[] static (from Fengguang)]
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The dev_attrs field of struct bus_type is going away soon, dev_groups
should be used instead. This converts the PCI bus code to use the
correct field.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The bus_attrs field of struct bus_type is going away soon, dev_groups
should be used instead. This converts the PCI bus code to use the
correct field.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dev_attrs field of struct class is going away soon, dev_groups
should be used instead. This converts the PCI class code to use the
correct field.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI devices for SR-IOV virtual functions should only be created/
destroyed by pci_enable_sriov()/pci_disable_sriov() because special
data structures are associated with SR-IOV virtual functions.
So hide hotplug related sysfs interfaces "remove" and "rescan" for
SR-IOV virtual functions, otherwise it may cause memory leakage
and other issues.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.
[bhelgaas: "#define strict_strtoul kstrtoul", so no functional change]
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(*->vm_end - *->vm_start) >> PAGE_SHIFT operation is implemented
as an inline funcion vma_pages() in linux/mm.h, so use it.
Signed-off-by: Libin <huawei.libin@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If we request "num_vfs" and the driver's sriov_configure() method enables
exactly that number ("num_vfs_enabled"), we complain "Invalid value for
number of VFs to enable" and return an error. We should silently return
success instead.
Also, use kstrtou16() since numVFs is defined to be a 16-bit field and
rework to simplify control flow.
Reported-by: Greg Rose <gregory.v.rose@intel.com>
Reference: http://lkml.kernel.org/r/20121214101911.00002f59@unknown
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Host bridge hotplug:
- Untangle _PRT from struct pci_bus (Bjorn Helgaas)
- Request _OSC control before scanning root bus (Taku Izumi)
- Assign resources when adding host bridge (Yinghai Lu)
- Remove root bus when removing host bridge (Yinghai Lu)
- Remove _PRT during hot remove (Yinghai Lu)
SRIOV
- Add sysfs knobs to control numVFs (Don Dutile)
Power management
- Notify devices when power resource turned on (Huang Ying)
Bug fixes
- Work around broken _SEG on HP xw9300 (Bjorn Helgaas)
- Keep runtime PM enabled for unbound PCI devices (Huang Ying)
- Fix Optimus dual-GPU runtime D3 suspend issue (Dave Airlie)
- Fix xen frontend shutdown issue (David Vrabel)
- Work around PLX PCI 9050 BAR alignment erratum (Ian Abbott)
Miscellaneous
- Add GPL license for drivers/pci/ioapic (Andrew Cooks)
- Add standard PCI-X, PCIe ASPM register #defines (Bjorn Helgaas)
- NumaChip remote PCI support (Daniel Blueman)
- Fix PCIe Link Capabilities Supported Link Speed definition (Jingoo Han)
- Convert dev_printk() to dev_info(), etc (Joe Perches)
- Add support for non PCI BAR ROM data (Matthew Garrett)
- Add x86 support for host bridge translation offset (Mike Yoknis)
- Report success only when every driver supports AER (Vijay Pandarathil)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJQyKwSAAoJEPGMOI97Hn6zScgQAJZK2VDfCv74mKrgSDNokIzH
5nVDrc9AHKJm7CUODs6keJK5d4TD/za3Zao68zrYHsJJKes2ni2Z3W34HP2RXKK2
eOmePXOHYPPZMlimP9r9cVxNu1ZJCyp/yWSBcsPF4zUgWhBWLRaSj85I049gQ0sz
+05nZYfLjVd3HNiaXsG4CQyMrNF46XEsLhF9vs+Nr2GHPwrpzhfScgYv63oDS86C
3ICKsjmiRUZcNelxIFYmyxa5u89QdW5XHjzc9eHGQuus24Vxw+TZzsdfc17sUJEE
HTyXY+RjDpOVhdtwwUjrCEOiyZYvy3g9+3sKxoxgt/76ghdUaR7fxITwB97qVMFD
T0ESlKjSV/Qv5QYdyy5uP4zwNs/PXCWXkTg/L1m71F30BxKWDa7tgiA6uK7Z7fl5
1aokKBdk3mtJJJIDJG1YkxPXx/JItTGCNYrx7CcFj49rSjrUWLQdmrYahersRIsB
3wiD2xTi9e4dXeP/+VGzGOWB/sHk+73jvrvZe/REa1FCnMINDz4+9V9WaGROMqyq
MQ8kX0KfYcNVNxy1GOXjU5wLpMN/t/QbvI7gwzRP1DAUCJPoOgFy7AjvSTVG3zuy
8CtdOFttVkUn5dqsbQR0gVbyQVTS3PGSKz5XC/s8kVDWhja0xZTBYwrskM/4zdSD
Xf48OyYV5EjpC3FYUSiU
=OE3Q
-----END PGP SIGNATURE-----
Merge tag 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI update from Bjorn Helgaas:
"Host bridge hotplug:
- Untangle _PRT from struct pci_bus (Bjorn Helgaas)
- Request _OSC control before scanning root bus (Taku Izumi)
- Assign resources when adding host bridge (Yinghai Lu)
- Remove root bus when removing host bridge (Yinghai Lu)
- Remove _PRT during hot remove (Yinghai Lu)
SRIOV
- Add sysfs knobs to control numVFs (Don Dutile)
Power management
- Notify devices when power resource turned on (Huang Ying)
Bug fixes
- Work around broken _SEG on HP xw9300 (Bjorn Helgaas)
- Keep runtime PM enabled for unbound PCI devices (Huang Ying)
- Fix Optimus dual-GPU runtime D3 suspend issue (Dave Airlie)
- Fix xen frontend shutdown issue (David Vrabel)
- Work around PLX PCI 9050 BAR alignment erratum (Ian Abbott)
Miscellaneous
- Add GPL license for drivers/pci/ioapic (Andrew Cooks)
- Add standard PCI-X, PCIe ASPM register #defines (Bjorn Helgaas)
- NumaChip remote PCI support (Daniel Blueman)
- Fix PCIe Link Capabilities Supported Link Speed definition (Jingoo
Han)
- Convert dev_printk() to dev_info(), etc (Joe Perches)
- Add support for non PCI BAR ROM data (Matthew Garrett)
- Add x86 support for host bridge translation offset (Mike Yoknis)
- Report success only when every driver supports AER (Vijay
Pandarathil)"
Fix up trivial conflicts.
* tag 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
PCI: Use phys_addr_t for physical ROM address
x86/PCI: Add NumaChip remote PCI support
ath9k: Use standard #defines for PCIe Capability ASPM fields
iwlwifi: Use standard #defines for PCIe Capability ASPM fields
iwlwifi: collapse wrapper for pcie_capability_read_word()
iwlegacy: Use standard #defines for PCIe Capability ASPM fields
iwlegacy: collapse wrapper for pcie_capability_read_word()
cxgb3: Use standard #defines for PCIe Capability ASPM fields
PCI: Add standard PCIe Capability Link ASPM field names
PCI/portdrv: Use PCI Express Capability accessors
PCI: Use standard PCIe Capability Link register field names
x86: Use PCI setup data
PCI: Add support for non-BAR ROMs
PCI: Add pcibios_add_device
EFI: Stash ROMs if they're not in the PCI BAR
PCI: Add and use standard PCI-X Capability register names
PCI/PM: Keep runtime PM enabled for unbound PCI devices
xen-pcifront: Handle backend CLOSED without CLOSING
PCI: SRIOV control and status via sysfs (documentation)
PCI/AER: Report success only when every device has AER-aware driver
...
Remove conditional code based on CONFIG_HOTPLUG being false. It's
always on now in preparation of it going away as an option.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some implementations of SRIOV provide a capability structure
value of TotalVFs that is greater than what the software can support.
Provide a method to reduce the capability structure reported value
to the value the driver can support.
This ensures sysfs reports the current capability of the system,
hardware and software.
Example for its use: igb & ixgbe -- report 8 & 64 as TotalVFs,
but drivers only support 7 & 63 maximum.
Signed-off-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Provide files under sysfs to determine the maximum number of VFs
an SR-IOV-capable PCIe device supports, and methods to enable and
disable the VFs on a per-device basis.
Currently, VF enablement by SR-IOV-capable PCIe devices is done
via driver-specific module parameters. If not setup in modprobe files,
it requires admin to unload & reload PF drivers with number of desired
VFs to enable. Additionally, the enablement is system wide: all
devices controlled by the same driver have the same number of VFs
enabled. Although the latter is probably desired, there are PCI
configurations setup by system BIOS that may not enable that to occur.
Two files are created for the PF of PCIe devices with SR-IOV support:
sriov_totalvfs Contains the maximum number of VFs the device
could support as reported by the TotalVFs register
in the SR-IOV extended capability.
sriov_numvfs Contains the number of VFs currently enabled on
this device as reported by the NumVFs register in
the SR-IOV extended capability.
Writing zero to this file disables all VFs.
Writing a positive number to this file enables that
number of VFs.
These files are readable for all SR-IOV PF devices. Writes to the
sriov_numvfs file are effective only if a driver that supports the
sriov_configure() method is attached.
Signed-off-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Should make pci_create_sysfs_dev_files() simpler. Also fix possible
memleak in remove path.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Need type filled in device structure so it can be used for visible
attribute control in sysfs for pci_dev.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In https://bugzilla.kernel.org/show_bug.cgi?id=48981
Peter reported that /proc/bus/pci/??/??.? does not work for 3.6.
This is because the device configuration space registers are
not accessible if the corresponding parent bridge is suspended or
the device is put into D3cold state.
This is the same as /sys/bus/pci/devices/0000:??:??.?/config access
issue. So the function used to solve sysfs issue is used to solve
this issue.
This patch moves pci_config_pm_runtime_get()/_put() from pci/pci-sysfs.c
to pci/pci.c and makes them extern so they can be used by both the
sysfs and proc paths.
[bhelgaas: changelog, references, reporters]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=48981
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=49031
Reported-by: Forrest Loomis <cybercyst@gmail.com>
Reported-by: Peter <lekensteyn@gmail.com>
Reported-by: Micael Dias <kam1kaz3@gmail.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v3.6+
This patch fixes the following bug:
http://marc.info/?l=linux-pci&m=134338059022620&w=2
Where lspci does not work properly if a device and the corresponding
parent bridge (such as PCIe port) is suspended. This is because the
device configuration space registers will be not accessible if the
corresponding parent bridge is suspended or the device is put into
D3cold state.
To solve the issue, the bridge/PCIe port connected to the device is
put into active state before read/write configuration space registers.
If the device is in D3cold state, it will be put into active state
too.
To avoid resume/suspend PCIe port for each configuration register
read/write, a small delay is added before the PCIe port to go
suspended.
Reported-by: Bjorn Mork <bjorn@mork.no>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
* topic/huang-d3cold-v7:
PCI/PM: add PCIe runtime D3cold support
PCI: do not call pci_set_power_state with PCI_D3cold
PCI/PM: add runtime PM support to PCIe port
ACPI/PM: specify lowest allowed state for device sleep state
This patch adds runtime D3cold support and corresponding ACPI platform
support. This patch only enables runtime D3cold support; it does not
enable D3cold support during system suspend/hibernate.
D3cold is the deepest power saving state for a PCIe device, where its main
power is removed. While it is in D3cold, you can't access the device at
all, not even its configuration space (which is still accessible in D3hot).
Therefore the PCI PM registers can not be used to transition into/out of
the D3cold state; that must be done by platform logic such as ACPI _PR3.
To support wakeup from D3cold, a system may provide auxiliary power, which
allows a device to request wakeup using a Beacon or the sideband WAKE#
signal. WAKE# is usually connected to platform logic such as ACPI GPE.
This is quite different from other power saving states, where devices
request wakeup via a PME message on the PCIe link.
Some devices, such as those in plug-in slots, have no direct platform
logic. For example, there is usually no ACPI _PR3 for them. D3cold
support for these devices can be done via the PCIe Downstream Port leading
to the device. When the PCIe port is powered on/off, the device is powered
on/off too. Wakeup events from the device will be notified to the
corresponding PCIe port.
For more information about PCIe D3cold and corresponding ACPI support,
please refer to:
- PCI Express Base Specification Revision 2.0
- Advanced Configuration and Power Interface Specification Revision 5.0
[bhelgaas: changelog]
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Originally-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The default VGA device is a somewhat fluid concept on platforms with
multiple GPUs. Add support for setting it so switching code can update
things appropriately, and make sure that the sysfs code returns the right
device if it's changed.
v2: Updated to fix builds when __ARCH_HAS_VGA_DEFAULT_DEVICE is false.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: benh@kernel.crashing.org
Cc: airlied@redhat.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
The old pci_remove_bus_device actually did stop and remove.
Make the name reflect that to reduce confusion.
This patch is done by sed scripts and changes back some incorrect
__pci_remove_bus_device changes.
Suggested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current rescan will not touch bridge MMIO and IO.
Try to reuse pci_assign_unassigned_bridge_resources(bridge) to update bridge
resources, if child devices need more resources.
Only do that for bridges whose children are all removed already; i.e. don't
release resources that could already be in use by drivers on child devices.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
capabilities: remove __cap_full_set definition
security: remove the security_netlink_recv hook as it is equivalent to capable()
ptrace: do not audit capability check when outputing /proc/pid/stat
capabilities: remove task_ns_* functions
capabitlies: ns_capable can use the cap helpers rather than lsm call
capabilities: style only - move capable below ns_capable
capabilites: introduce new has_ns_capabilities_noaudit
capabilities: call has_ns_capability from has_capability
capabilities: remove all _real_ interfaces
capabilities: introduce security_capable_noaudit
capabilities: reverse arguments to security_capable
capabilities: remove the task from capable LSM hook entirely
selinux: sparse fix: fix several warnings in the security server cod
selinux: sparse fix: fix warnings in netlink code
selinux: sparse fix: eliminate warnings for selinuxfs
selinux: sparse fix: declare selinux_disable() in security.h
selinux: sparse fix: move selinux_complete_init
selinux: sparse fix: make selinux_secmark_refcount static
SELinux: Fix RCU deref check warning in sel_netport_insert()
Manually fix up a semantic mis-merge wrt security_netlink_recv():
- the interface was removed in commit fd77846152 ("security: remove
the security_netlink_recv hook as it is equivalent to capable()")
- a new user of it appeared in commit a38f7907b9 ("crypto: Add
userspace configuration API")
causing no automatic merge conflict, but Eric Paris pointed out the
issue.
security_capable takes ns, cred, cap. But the LSM capable() hook takes
cred, ns, cap. The capability helper functions also take cred, ns, cap.
Rather than flip argument order just to flip it back, leave them alone.
Heck, this should be a little faster since argument will be in the right
place!
Signed-off-by: Eric Paris <eparis@redhat.com>
They were implicitly getting it from device.h --> module.h but
we want to clean that up. So add the minimal header for these
macros.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Requested by Greg KH to fix a race condition in the creating of PCI bus
cpuaffinity files.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After remove the device from /sys, we have to rescan all or
find out the bridge and access /sys../device/rescan there.
this patch add /sys/.../pci_bus/.../rescan. So user can rescan more easy.
that is more clean and easy to understand.
like after remove 0000:c4:00.0, you can rescan 0000:c4 directly.
-v2: According to Jesse, use function instead of exposing attr, so could hide
#ifdef in header file.
also add code to remove rescan file in remove path.
-v3: GregKH pointed out that we should use dev_attrs to avoid racing.
So add pcibus_attrs and make it to be member of pcibus_attrs.
-v4: Change name to pcibus_dev_attrs according to GregKH
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
- Introduce ns_capable to test for a capability in a non-default
user namespace.
- Teach cap_capable to handle capabilities in a non-default
user namespace.
The motivation is to get to the unprivileged creation of new
namespaces. It looks like this gets us 90% of the way there, with
only potential uid confusion issues left.
I still need to handle getting all caps after creation but otherwise I
think I have a good starter patch that achieves all of your goals.
Changelog:
11/05/2010: [serge] add apparmor
12/14/2010: [serge] fix capabilities to created user namespaces
Without this, if user serge creates a user_ns, he won't have
capabilities to the user_ns he created. THis is because we
were first checking whether his effective caps had the caps
he needed and returning -EPERM if not, and THEN checking whether
he was the creator. Reverse those checks.
12/16/2010: [serge] security_real_capable needs ns argument in !security case
01/11/2011: [serge] add task_ns_capable helper
01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion
02/16/2011: [serge] fix a logic bug: the root user is always creator of
init_user_ns, but should not always have capabilities to
it! Fix the check in cap_capable().
02/21/2011: Add the required user_ns parameter to security_capable,
fixing a compile failure.
02/23/2011: Convert some macros to functions as per akpm comments. Some
couldn't be converted because we can't easily forward-declare
them (they are inline if !SECURITY, extern if SECURITY). Add
a current_user_ns function so we can use it in capability.h
without #including cred.h. Move all forward declarations
together to the top of the #ifdef __KERNEL__ section, and use
kernel-doc format.
02/23/2011: Per dhowells, clean up comment in cap_capable().
02/23/2011: Per akpm, remove unreachable 'return -EPERM' in cap_capable.
(Original written and signed off by Eric; latest, modified version
acked by him)
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: export current_user_ns() for ecryptfs]
[serge.hallyn@canonical.com: remove unneeded extra argument in selinux's task_has_capability]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: label: remove #include of ACPI header to avoid warnings
PCI: label: Fix compilation error when CONFIG_ACPI is unset
PCI: pre-allocate additional resources to devices only after successful allocation of essential resources.
PCI: introduce reset_resource()
PCI: data structure agnostic free list function
PCI: refactor io size calculation code
PCI: do not create quirk I/O regions below PCIBIOS_MIN_IO for ICH
PCI hotplug: acpiphp: set current_state to D0 in register_slot
PCI: Export ACPI _DSM provided firmware instance number and string name to sysfs
PCI: add more checking to ICH region quirks
PCI: aer-inject: Override PCIe AER Mask Registers
PCI: fix tlan build when CONFIG_PCI is not enabled
PCI: remove quirk for pre-production systems
PCI: Avoid potential NULL pointer dereference in pci_scan_bridge
PCI/lpc: irq and pci_ids patch for Intel DH89xxCC DeviceIDs
PCI: sysfs: Fix failure path for addition of "vpd" attribute
This reintroduces commit 47970b1b which was subsequently reverted
as f00eaeea. The original change was broken and caused X startup
failures and generally made privileged processes incapable of reading
device dependent config space. The normal capable() interface returns
true on success, but the LSM interface returns 0 on success. This thinko
is now fixed in this patch, and has been confirmed to work properly.
So, once again...Eric Paris noted that commit de139a3 ("pci: check caps
from sysfs file open to read device dependent config space") caused the
capability check to bypass security modules and potentially auditing.
Rectify this by calling security_capable() when checking the open file's
capabilities for config space reads.
Reported-by: Eric Paris <eparis@redhat.com>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Alex Riesen <raa.lkml@gmail.com>
Cc: Sedat Dilek <sedat.dilek@googlemail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: James Morris <jmorris@namei.org>
This reverts commit 47970b1b2a.
It turns out it breaks several distributions. Looks like the stricter
selinux checks fail due to selinux policies not being set to allow the
access - breaking X, but also lspci.
So while the change was clearly the RightThing(tm) to do in theory, in
practice we have backwards compatibility issues making it not work.
Reported-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: David Airlie <airlied@linux.ie>
Acked-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris noted that commit de139a3 ("pci: check caps from sysfs file
open to read device dependent config space") caused the capability check
to bypass security modules and potentially auditing. Rectify this by
calling security_capable() when checking the open file's capabilities
for config space reads.
Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: James Morris <jmorris@namei.org>
Commit 280c73d ("PCI: centralize the capabilities code in
pci-sysfs.c") changed the initialisation of the "rom" and "vpd"
attributes, and made the failure path for the "vpd" attribute
incorrect. We must free the new attribute structure (attr), but
instead we currently free dev->vpd->attr. That will normally be NULL,
resulting in a memory leak, but it might be a stale pointer, resulting
in a double-free.
Found by inspection; compile-tested only.
Cc: stable@kernel.org
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The PCI sysfs ROM interface requires an enabling write to access the ROM
image, but the default file mode is 0400. The original proposed patch
adding sysfs ROM support was a true read-only interface, with the
enabling bit coming in as a feature request. I suspect it was simply an
oversight that the file mode didn't get updated to match the API.
Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
I just loaded 2.6.37-rc2 on my machines, and I noticed that X no longer starts.
Running an strace of the X server shows that it's doing this:
open("/sys/bus/pci/devices/0000:07:00.0/resource0", O_RDWR) = 10
mmap(NULL, 16777216, PROT_READ|PROT_WRITE, MAP_SHARED, 10, 0) = -1 EINVAL (Invalid argument)
This code seems to be asking for a shared read/write mapping of 16MB worth of
BAR0 starting at file offset 0, and letting the kernel assign a starting
address. Unfortunately, this -EINVAL causes X not to start. Looking into
dmesg, there's a complaint like so:
process "Xorg" tried to map 0x01000000 bytes at page 0x00000000 on 0000:07:00.0 BAR 0 (start 0x 96000000, size 0x 1000000)
...with the following code in pci_mmap_fits:
pci_start = (mmap_api == PCI_MMAP_SYSFS) ?
pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
if (start >= pci_start && start < pci_start + size &&
start + nr <= pci_start + size)
It looks like the logic here is set up such that when the mmap call comes via
sysfs, the check in pci_mmap_fits wants vma->vm_pgoff to be between the
resource's start and end address, and the end of the vma to be no farther than
the end. However, the sysfs PCI resource files always start at offset zero,
which means that this test always fails for programs that mmap the sysfs files.
Given the comment in the original commit
3b519e4ea6, I _think_ the old procfs files
require that the file offset be equal to the resource's base address when
mmapping.
I think what we want here is for pci_start to be 0 when mmap_api ==
PCI_MMAP_PROCFS. The following patch makes that change, after which the Matrox
and Mach64 X drivers work again.
Acked-by: Martin Wilck <martin.wilck@ts.fujitsu.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cast pci_resource_start() and pci_resource_len() to u64 for printk.
drivers/pci/pci-sysfs.c:753: warning: format '%16Lx' expects type 'long long unsigned int', but argument 9 has type 'resource_size_t'
drivers/pci/pci-sysfs.c:753: warning: format '%16Lx' expects type 'long long unsigned int', but argument 10 has type 'resource_size_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The checks for valid mmaps of PCI resources made through /proc/bus/pci files
that were introduced in 9eff02e204 have several
problems:
1. mmap() calls on /proc/bus/pci files are made with real file offsets > 0,
whereas under /sys/bus/pci/devices, the start of the resource corresponds
to offset 0. This may lead to false negatives in pci_mmap_fits(), which
implicitly assumes the /sys/bus/pci/devices layout.
2. The loop in proc_bus_pci_mmap doesn't skip empty resouces. This leads
to false positives, because pci_mmap_fits() doesn't treat empty resources
correctly (the calculated size is 1 << (8*sizeof(resource_size_t)-PAGE_SHIFT)
in this case!).
3. If a user maps resources with BAR > 0, pci_mmap_fits will emit bogus
WARNINGS for the first resources that don't fit until the correct one is found.
On many controllers the first 2-4 BARs are used, and the others are empty.
In this case, an mmap attempt will first fail on the non-empty BARs
(including the "right" BAR because of 1.) and emit bogus WARNINGS because
of 3., and finally succeed on the first empty BAR because of 2.
This is certainly not the intended behaviour.
This patch addresses all 3 issues.
Updated with an enum type for the additional parameter for pci_mmap_fits().
Cc: stable@kernel.org
Signed-off-by: Martin Wilck <martin.wilck@ts.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch exports SMBIOS provided firmware instance and label of
onboard PCI devices to sysfs. New files are:
/sys/bus/pci/devices/.../label which contains the firmware name for
the device in question, and
/sys/bus/pci/devices/.../index which contains the firmware device type
instance for the given device.
Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI sysfs resource files currently only allow mmap'ing. On x86 this
works fine for memory backed BARs, but doesn't work at all for I/O
port backed BARs. Add read/write to I/O port PCI sysfs resource
files to allow userspace access to these device regions.
Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This reverts commit 75568f8094.
Since they're just a convenience anyway, remove these symlinks since
they're causing duplicate filename errors in the wild.
Acked-by: Alex Chiang <achiang@canonical.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN
check to verify privileges before allowing a user to read device
dependent config space. This is meant to protect from an unprivileged
user potentially locking up the box.
When assigning a PCI device directly to a guest with libvirt and KVM,
the sysfs config space file is chown'd to the unprivileged user that
the KVM guest will run as. The guest needs to have full access to the
device's config space since it's responsible for driving the device.
However, despite being the owner of the sysfs file, the CAP_SYS_ADMIN
check will not allow read access beyond the config header.
With this patch we check privileges against the capabilities used when
openining the sysfs file. The allows a privileged process to open the
file and hand it to an unprivileged process, and the unprivileged process
can still read all of the config space.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allows bin_attr->read,write,mmap callbacks to check file specific data
(such as inode owner) as part of any privilege validation.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A successful write() to the "reset" sysfs attribute should return the
number of bytes written, not 0. Otherwise userspace (bash) retries the
write over and over again.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Create convenience symlinks in sysfs, linking slots to device
functions, and vice versa. These links make it easier for users to
figure out which devices actually live in what slots.
For example:
sapphire:/sys/bus/pci/slots # ls
1 10 2 3 4 5 6 7 8 9
sapphire:/sys/bus/pci/slots # ls -l 3
total 0
-r--r--r-- 1 root root 65536 Aug 18 14:10 address
lrwxrwxrwx 1 root root 0 Aug 18 14:10 function0 ->
../../../../devices/pci0000:23/0000:23:01.0
lrwxrwxrwx 1 root root 0 Aug 18 14:10 function1 ->
../../../../devices/pci0000:23/0000:23:01.1
sapphire:/sys/bus/pci/slots # ls -l 3/function0/slot
lrwxrwxrwx 1 root root 0 Aug 18 14:13 3/function0/slot ->
../../../bus/pci/slots/3
The original form of this patch was written by Matthew Wilcox,
and was enhanced to include links from the sysfs slots/ directory
pointing back at the device functions.
Cc: willy@linux.intel.com
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
PPC64 is failing to boot the latest mmotm due to an uninitialised pointer in
pci_create_legacy_files(). The surprise is that machines boot at all and it
would appear to affect current mainline as well. This patch fixes the problem.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:
drivers/pci/pci-sysfs.c: In function 'pci_create_legacy_files':
drivers/pci/pci-sysfs.c:645: error: lvalue required as unary '&' operand
drivers/pci/pci-sysfs.c:658: error: lvalue required as unary '&' operand
Caused by commit "sysfs: Use sysfs_attr_init and sysfs_bin_attr_init on
dynamic attributes" interacting with commit "sysfs: Use one lockdep
class per sysfs attribute") both from the driver-core tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These are the non-static sysfs attributes that exist on
my test machine. Fix them to use sysfs_attr_init or
sysfs_bin_attr_init as appropriate. It simply requires
making a sysfs attribute present to see this. So this
is a little bit tedious but otherwise not too bad.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus'
NUMA information" causes an null pointer dereference when reading from
the sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
proximity info, since dev->numa_node gets set to -1 for all PCI devices,
which then gets passed to cpumask_of_node.
Add a check to prevent this.
Signed-off-by: David John <davidjon@xenontk.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
So we can catch if the driver sets an incorrect dma_mask.
Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In case of AMD CPU northbridge functions this NUMA information might
differ. Here is an example from a 4-socket system.
Currently Linux shows
root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat numa_node
0
root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat local_cpu*
0-3
00000000,0000000f
which is not correct for northbridge functions as the local CPUs
are those of the same socket.
With this patch and a quirk for AMD CPU NB functions Linux can
do better and correctly show
root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat numa_node
2
root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat local_cpu*
8-11
00000000,00000f00
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some devices allow an individual function to be reset without affecting
other functions in the same device: that's what pci_reset_function does.
For devices that have this support, expose reset attribite in sysfs.
This is useful e.g. for virtualization, where a qemu userspace
process wants to reset the device when the guest is reset,
to emulate machine reboot as closely as possible.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add drivers/pci/*.c source files to DocBook/kernel-api.tmpl
and update those pci/*.c source files that need kernel-doc fixes.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is no reason to prevent removal of root bus devices. A subsequent
rescan will find them just fine.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch sets up disabled bridges even if buses have already been
added.
pci_assign_unassigned_resources is called after buses are added.
pci_assign_unassigned_resources calls pci_bus_assign_resources.
pci_bus_assign_resources calls pci_setup_bridge to configure BARs of
bridges.
Currently pci_setup_bridge returns immediately if the bus have already
been added. So pci_assign_unassigned_resources can't configure BARs of
bridges that were added in a disabled state; this patch fixes the issue.
On logical hot-add, we need to prevent the kernel from re-initializing
bridges that have already been initialized. To achieve this,
pci_setup_bridge returns immediately if the bridge have already been
enabled.
We don't need to check whether the specified bus is a root bus or not.
pci_setup_bridge is not called on a root bus, because a root bus does
not have a bridge.
The patch adds a new helper function, pci_is_enabled. I made the
function name similar to pci_is_managed. The codes which use
enable_cnt directly are changed to use pci_is_enabled.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This interface allows the user to force a rescan of the device's
parent bus and all subordinate buses, and rediscover devices removed
earlier from this part of the device tree.
Cc: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds an attribute named "remove" to a PCI device's sysfs
directory. Writing a non-zero value to this attribute will remove the PCI
device and any children of it.
Trent Piepho wrote the original implementation and documentation.
Thanks to Vegard Nossum for testing under kmemcheck and finding locking
issues with the sysfs interface.
Cc: Trent Piepho <xyzzy@speakeasy.org>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This interface allows the user to force a rescan of all PCI buses
in system, and rediscover devices that have been removed earlier.
pci_bus_attrs implementation from Trent Piepho.
Thanks to Vegard Nossum for discovering locking issues with the
sysfs interface.
Cc: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
X really would like to know which VGA device was considered the boot
device by the system. The x86 PCI fixups have support for discovering
this but we provide no way to expose it to userspace.
This adds a sysfs file per VGA class device which has the value 0 for
non the boot device or unknown, and 1 if the VGA device is the boot
device.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This closes http://bugzilla.kernel.org/show_bug.cgi?id=10893
which is a showstopper for X development on alpha.
The generic HAVE_PCI_MMAP code (drivers/pci-sysfs.c) is not
very useful since we have to deal with three different types
of MMIO address spaces: sparse and dense mappings for old
ev4/ev5 machines and "normal" 1:1 MMIO space (bwx) for ev56 and
later.
Also "write combine" mappings are meaningless on alpha - roughly
speaking, alpha does write combining, IO reordering and other
optimizations by default, unless user splits IO accesses
with memory barriers.
I think the cleanest way to deal with resource files on alpha
is to convert the default no-op pci_create_resource_files() and
pci_remove_resource_files() for !HAVE_PCI_MMAP case into __weak
functions and override them with alpha specific ones.
Another alpha hook is needed for "legacy_" resource files
to handle sparse addressing (pci_adjust_legacy_attr).
With the "standard" resourceN files on ev56/ev6 libpciaccess
works "out of the box". Handling of resourceN_sparse/resourceN_dense
files on older machines obviously requires some userland work.
Sparse/dense stuff has been tested on sx164 (pca56/pyxis, normally
uses bwx IO) with the kernel hacked into "cia compatible" mode.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>