Langwell devices are not true pci devices, they are not subject to the 10 ms
d3 to d0 delay required by pci spec. This patch assigns d3_delay to 0 for all
langwell pci devices.
We can also power off devices that are not really used by the OS
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add a parameter to avoid using MSI/MSI-X for PCIe native hotplug; it's
known to be buggy on some platforms.
In my environment, while shutting down, following stack trace is shown
sometimes.
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 1081, comm: reboot Not tainted 3.2.0 #1
Call Trace:
<IRQ> [<ffffffff810cec1d>] __report_bad_irq+0x3d/0xe0
[<ffffffff810cee1c>] note_interrupt+0x15c/0x210
[<ffffffff810cc485>] handle_irq_event_percpu+0xb5/0x210
[<ffffffff810cc621>] handle_irq_event+0x41/0x70
[<ffffffff810cf675>] handle_fasteoi_irq+0x55/0xc0
[<ffffffff81015356>] handle_irq+0x46/0xb0
[<ffffffff814fbe9d>] do_IRQ+0x5d/0xe0
[<ffffffff814f146e>] common_interrupt+0x6e/0x6e
[<ffffffff8106b040>] ? __do_softirq+0x60/0x210
[<ffffffff8108aeb1>] ? hrtimer_interrupt+0x151/0x240
[<ffffffff814fb5ec>] call_softirq+0x1c/0x30
[<ffffffff810152d5>] do_softirq+0x65/0xa0
[<ffffffff8106ae9d>] irq_exit+0xbd/0xe0
[<ffffffff814fbf8e>] smp_apic_timer_interrupt+0x6e/0x99
[<ffffffff814f9e5e>] apic_timer_interrupt+0x6e/0x80
<EOI> [<ffffffff814f0fb1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[<ffffffff812629fc>] pci_bus_write_config_word+0x6c/0x80
[<ffffffff81266fc2>] pci_intx+0x52/0xa0
[<ffffffff8127de3d>] pci_intx_for_msi+0x1d/0x30
[<ffffffff8127e4fb>] pci_msi_shutdown+0x7b/0x110
[<ffffffff81269d34>] pci_device_shutdown+0x34/0x50
[<ffffffff81326c4f>] device_shutdown+0x2f/0x140
[<ffffffff8107b981>] kernel_restart_prepare+0x31/0x40
[<ffffffff8107b9e6>] kernel_restart+0x16/0x60
[<ffffffff8107bbfd>] sys_reboot+0x1ad/0x220
[<ffffffff814f4b90>] ? do_page_fault+0x1e0/0x460
[<ffffffff811942d0>] ? __sync_filesystem+0x90/0x90
[<ffffffff8105c9aa>] ? __cond_resched+0x2a/0x40
[<ffffffff814ef090>] ? _cond_resched+0x30/0x40
[<ffffffff81169e17>] ? iterate_supers+0xb7/0xd0
[<ffffffff814f9382>] system_call_fastpath+0x16/0x1b
handlers:
[<ffffffff8138a0f0>] usb_hcd_irq
[<ffffffff8138a0f0>] usb_hcd_irq
[<ffffffff8138a0f0>] usb_hcd_irq
Disabling IRQ #16
An un-wanted interrupt is generated when PCI driver switches from
MSI/MSI-X to INTx while shutting down the device. The interrupt does
not happen if MSI/MSI-X is not used on the device.
I confirmed that this problem does not happen if pcie_hp=nomsi was
specified and hotplug operation worked fine as usual.
v2: Automatically disable MSI/MSI-X against following device:
PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02)
v3: Based on the review comment, combile the if statements.
v4: Removed module parameter.
Move some code to build pciehp as a module.
Move device specific code to driver/pci/quirks.c.
v5: Drop a device specific code until getting a vendor statement.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Only one user in driver/pci/pci.c, so we don't need to put it in global
pci.h
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Found debug print of class is shifted.
| pci 0000:f8:15.2: [8086:2b56] type 0 class 0x000600
Code is trying to print class with 6 digits, but use shifted class with
4 digits valid value as variable.
Change to original dev->class directly.
Also remove not needed calculating of local variable class, because it
will be updated after pci_fixup_device(pci_fixup_early...)
Also unify type print out when class and header is not matched.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Otherwise when rescan is used for cardbus, assigned resources will get
cleared.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We should not set the requested size to -2; that will confuse the
resource list sorting with align when SIZEALIGN is used.
Change to STARTALIGN and pass align from start; we are safe to do that
just as we do that regular pci bridge. In the long run, we should just
treat cardbus like a regular pci bridge.
Also fix the case when realloc_head is not passed: we should keep the
requested size.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some BIOSes enable prefetch on both MEM0 and MEM1. But the cardbus code
assumes MEM1 is non-pref...
Discussion could be found at:
https://lkml.org/lkml/2012/1/12/1https://bugzilla.kernel.org/show_bug.cgi?id=41622#c23
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If a PCI device is enabled to generate wakeup signals (PME) when put
into a low-power state by runtime PM, it will be still enabled to
generate those signals after the system shutdown, unless its driver's
.shutdown() callback takes care of the wakeup signals generation
setting. Moreover, there are devices that are not enabled to wake
up the system and that are configured by runtime PM to generate
wakeup signals so that (runtime) remote wakeup works with them.
Those devices should be reconfigured during system shutdown so that
they don't generate wakeup signals, but at least some drivers don't
do that. However, that very well may be done by the PCI core so
that drivers don't have to worry about it. For this reason, modify
pci_device_shutdown() to disable the generation of wakeup events for
devices not supposed to wake up the system.
References: https://bugzilla.kernel.org/show_bug.cgi?id=37952
Reported-and-tested-by: Kamil Iskra <kamil.54002@iskra.name>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
sysfs is a bit stricter now and emits warnings in more cases.
For SRIOV hotplug, we are calling pci_stop_dev() for each VF first
(after we update pci_stop_bus_devices) which remove each VF subdir. So
double check the VF dir in /sys before trying to remove the physfn link.
Signed-of-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Distributions may wish to provide different defaults for PCIE ASPM
depending on their target audience. Provide a configuration option for
choosing the default policy.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some BIOS implementations leave the Intel GPU interrupts enabled,
even though no one is handling them (f.e. i915 driver is never loaded).
Additionally the interrupt destination is not set up properly
and the interrupt ends up -somewhere-.
These spurious interrupts are "sticky" and the kernel disables
the (shared) interrupt line after 100.000+ generated interrupts.
Fix it by disabling the still enabled interrupts.
This resolves crashes often seen on monitor unplug.
Tested on the following boards:
- Intel DH61CR: Affected
- Intel DH67BL: Affected
- Intel S1200KP server board: Affected
- Asus P8H61-M LE: Affected, but system does not crash.
Probably the IRQ ends up somewhere unnoticed.
According to reports on the net, the Intel DH61WW board is also affected.
Many thanks to Jesse Barnes from Intel for helping
with the register configuration and to Intel in general
for providing public hardware documentation.
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Charlie Suffin <charlie.suffin@stratus.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
While diagnosing some boot time issues on a platform, all that I
could see in the bootgraph/dmesg was that the system was spending
a lot of time in applying one or more PCI quirks... which
was virtually undebuggable.
This patch adds printk's in "initcall_debug" style to the dmesg,
which are added when the user asks for the initcall_debug
(the nr one tool to use when debugging boot hangs or boot time issues)
kernel command line option.
v2: add #includes so quirks can build on non-x86
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fix debug variable from module parameter to be really bool to
fix 'warning: return from incompatible pointer type'.
Acked-by: Scott Murray <scott@spiteful.org>
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
On some OEM systems, pci_restore_state() is called while FLR has not yet
completed. As a result, PCI BAR register restore is not successful. This fix
reads back the restored value and compares it with saved value and re-tries 10
times before giving up.
Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
Signed-off-by: Eric Chanudet <eric.chanudet@citrix.com>
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
On a system with a repeater on the system board to support gen2 hotplug,
we found that when an ExpressModule is removed from some slots,
/var/log/messages will be full of "card present/not present" warnings.
It turns out the root complex is continually trying to train the link to
the repeater because the repeater has not been reset.
This patch will disable the link at removal time to allow the repeater
to be reset properly. This also prevents a potential AER message at
removal time.
Also, when testing hotplug on a system under development, we found if we
boot the system without an EM installed, and later hot-add an EM, it
does not work with Linux, but another OS is ok. The root cause is that
BIOS left link disabled when slot was empty at boot time, and other OS
is modifying the link disable bit in link ctrl during power on/off.
So we should do the same thing to disable/enable link during power off/on.
-v2: check link DLLA bit instead of 100ms waiting.
Separate link disable/enable functions to another patch.
Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
A few changes:
- remove the 'inline' and let the complier decide
- return a bool to indicate whether the link was active
- add a debug message to indicate link state when it beocmes active
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
During reviewing
| PCI: pciehp: wait 1000 ms before Link Training check
Linus said:
>...
> That's a *long* time, and it's irritating to the user. It makes the
> user think "the machine is slow".
>...
> And quite frankly, an unconditional one-second delay here seems bad.
>Two seconds was unacceptable, one second is just bad.
Try to access the pci conf of a pci device that is supposed to show up
in 1s. If we can read back a valid vendor/device id, we can return
early.
Related discussion could be found:
https://lkml.org/lkml/2011/12/6/339
-v2: seperate code to pci_bus_read_dev_vendor_id() from pci_scan_device()
and reuse it from pciehp code. Suggested by Matthew Wilcox.
-v3: According to Kenj, don't use array in stack, and don't wait too long
for crs, also return fail status if not found.
Also separate pci_bus_dev_read_vendor_id() change to another patch.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We can reuse it for pciehp probing.
-v2: according to Kenji, fix crs timeout checking, and export the function
for later use when pciehp is compiled as a module.
Suggested-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When hot removing a pci express module that has a pcie switch and supports
SRIOV, we got:
[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain🚌device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain🚌dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267] [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180] [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264] [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866] [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123] [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206] [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...
+-[0000:80]-+-00.0-[81-8f]--
| +-01.0-[90-9f]--
| +-02.0-[a0-af]--
| +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
| | | +-00.1 Device
| | | +-00.2 Device
| | | \-00.3 Device
| | \-03.0-[b3]--+-00.0 Device
| | +-00.1 Device
| | +-00.2 Device
| | \-00.3 Device
root complex: 80:02.2
pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
end devices are b2:00.0 and b3.00.0.
VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3
Root cause: when doing pci_stop_bus_device() with phys fn, it will stop
virt fn and remove the fn, so
list_for_each_safe(l, n, &bus->devices)
will have problem to refer freed n that is pointed to vf entry.
Solution is just replacing list_for_each_safe() with
list_for_each_prev_safe(). This will make sure we can get valid n pointer
to PF instead of the freed VF pointer (because newly added devices are
inserted to the bus->devices list tail).
During reviewing the patch, Bjorn said:
| The PCI hot-remove path calls pci_stop_bus_devices() via
| pci_remove_bus_device().
|
| pci_stop_bus_devices() traverses the bus->devices list (point A below),
| stopping each device in turn, which calls the driver remove() method. When
| the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
| also uses pci_remove_bus_device() to remove the VF devices from the
| bus->devices list (point B).
|
| pci_remove_bus_device
| pci_stop_bus_device
| pci_stop_bus_devices(subordinate)
| list_for_each(bus->devices) <-- A
| pci_stop_bus_device(PF)
| ...
| driver->remove
| pci_disable_sriov
| ...
| pci_remove_bus_device(VF)
| <remove from bus_list> <-- B
|
| At B, we're changing the same list we're iterating through at A, so when
| the driver remove() method returns, the pci_stop_bus_devices() iterator has
| a pointer to a list entry that has already been freed.
Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141https://lkml.org/lkml/2012/1/23/360
-v5: According to Linus to make remove more robust, Change to
list_for_each_prev_safe instead. That is more reasonable, because
those devices are added to tail of the list before.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After merging struct pci_dev_resource_x and pci_dev_resource,
We can use a function instead of macro now.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Linus says don't use dev_res_x because it doesn't communicate anything
about usage. Rename them to add_res or fail_res etc according to
context.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_dev_resource_x is a superset of pci_dev_resource and they're just
temp structs used during resource reallocation.
pci_dev_resource usage is quite limted.
So just use pci_dev_resource_x, and rename it as new pci_dev_resource.
-v2: According to Linus, Separate free_list change to another patch
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
So we can use helper functions for generic list. This makes the
resource re-allocation code much more readable.
-v2: Use list_add_tail instead of adding list_insert_before, Pointed out
by Linus.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
No user outside of setup-bus.c now. Later patches will convert
resource_list to a regular list.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This allows us to move the definition of struct resource_list to
setup_bus.c and later convert resource_list to a regular list.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
On a system with devices that support SRIOV connected to a pcie switch
to pcie root port:
+-[0000:80]-+-00.0-[81-8f]--
| +-01.0-[90-9f]--
| +-02.0-[a0-af]----00.0-[a1-a3]--+-02.0-[a2]--+-00.0 Oracle Corporation Device 207a
| | \-03.0-[a3]--+-00.0 Oracle Corporation Device 207a
| +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Oracle Corporation Device 207a
| | \-03.0-[b3]--+-00.0 Oracle Corporation Device 207a
When the BIOS does not assign resources for SRIOV BARs, kernel pci
reallocation only goes up one bridge and then gives up, failing to to
get resources for all sSRIOV BARs, even though the range is large enough
in the peer root bus.
Specifically, only the bridge at the a1:02.0 level has its resources
cleared and reallocated. The kernel does not go up to clear the bridge
at the 80:02.0 level.
To make it go to upper levels, during retry, we need to treat "good to have"
resources as "must have".
Only on the last try will we treat good to have resources as optional.
At that time, parent bridge resources will already have been released so
we'll have a chance to get everything assigned with must_have plus
good_to_have for all child devices.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This allows us to allocate resources to hotplug bridges during
remove/rescan.
We need to move the function to setup-bus.c so it can use
__pci_bus_size_bridges and __pci_bus_assign_resources directly to take
the add_list resource tracking list.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current rescan will not touch bridge MMIO and IO.
Try to reuse pci_assign_unassigned_bridge_resources(bridge) to update bridge
resources, if child devices need more resources.
Only do that for bridges whose children are all removed already; i.e. don't
release resources that could already be in use by drivers on child devices.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We need add size for hot plug path when pluging in hotplug chassis
without cards.
-v2: change descriptions. make it applicable after "pci: Check bridge
resources after resource allocation."
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Need to call it from __assign_resources_sorted() later and we'd like to
avoid a forward declaraion.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Will be used for resource_list_x duplication when trying
requested+optional at first.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
During debug of one SRIOV enabled hotplug device, we found found that
add_size is not passed properly.
The device has devices under two level bridges:
+-[0000:80]-+-00.0-[81-8f]--
| +-01.0-[90-9f]--
| +-02.0-[a0-af]----00.0-[a1-a3]--+-02.0-[a2]--+-00.0 Oracle Corporation Device
| | \-03.0-[a3]--+-00.0 Oracle Corporation Device
Which means later the parent bridge will not try to add a big enough range:
[ 557.455077] pci 0000:a0:00.0: BAR 14: assigned [mem 0xf9000000-0xf93fffff]
[ 557.461974] pci 0000:a0:00.0: BAR 15: assigned [mem 0xf6000000-0xf61fffff pref]
[ 557.469340] pci 0000:a1:02.0: BAR 14: assigned [mem 0xf9000000-0xf91fffff]
[ 557.476231] pci 0000:a1:02.0: BAR 15: assigned [mem 0xf6000000-0xf60fffff pref]
[ 557.483582] pci 0000:a1:03.0: BAR 14: assigned [mem 0xf9200000-0xf93fffff]
[ 557.490468] pci 0000:a1:03.0: BAR 15: assigned [mem 0xf6100000-0xf61fffff pref]
[ 557.497833] pci 0000:a1:03.0: BAR 14: can't assign mem (size 0x200000)
[ 557.504378] pci 0000:a1:03.0: failed to add optional resources res=[mem 0xf9200000-0xf93fffff]
[ 557.513026] pci 0000:a1:02.0: BAR 14: can't assign mem (size 0x200000)
[ 557.519578] pci 0000:a1:02.0: failed to add optional resources res=[mem 0xf9000000-0xf91fffff]
It turns out we did not calculate size1 properly.
static resource_size_t calculate_memsize(resource_size_t size,
resource_size_t min_size,
resource_size_t size1,
resource_size_t old_size,
resource_size_t align)
{
if (size < min_size)
size = min_size;
if (old_size == 1 )
old_size = 0;
if (size < old_size)
size = old_size;
size = ALIGN(size + size1, align);
return size;
}
We should not pass add_size with min_size in calculate_memsize since
that will make add_size not contribute final add_size.
So just pass add_size with size1 to calculate_memsize().
With this change, we should have chance to remove extra addon in
pci_reassign_resource.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Host bridges that lead to things like the Uncore need not have any
I/O port or MMIO apertures. For example, in this case:
ACPI: PCI Root Bridge [UNC1] (domain 0000 [bus ff])
PCI: root bus ff: using default resources
PCI host bridge to bus 0000:ff
pci_bus 0000:ff: root bus resource [io 0x0000-0xffff]
pci_bus 0000:ff: root bus resource [mem 0x00000000-0x3fffffffffff]
we should not pretend those default resources are available on bus ff.
CC: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We use the __pci_reset_function_locked to perform the action.
Also on attaching ("bind") and detaching ("unbind") we save and
restore the configuration states. When the device is disconnected
from a guest we use the "pci_reset_function" to also reset the
device before being passed to another guest.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The use case of this is when a driver wants to call FLR when a device
is attached to it using the SysFS "bind" or "unbind" functionality.
The call chain when a user does "bind" looks as so:
echo "0000:01.07.0" > /sys/bus/pci/drivers/XXXX/bind
and ends up calling:
driver_bind:
device_lock(dev); <=== TAKES LOCK
XXXX_probe:
.. pci_enable_device()
...__pci_reset_function(), which calls
pci_dev_reset(dev, 0):
if (!0) {
device_lock(dev) <==== DEADLOCK
The __pci_reset_function_locked function allows the the drivers
'probe' function to call the "pci_reset_function" while still holding
the driver mutex lock.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add missing iounmap in error handling code, in a case where the function
already preforms iounmap on some other execution path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e;
statement S,S1;
int ret;
@@
e = \(ioremap\|ioremap_nocache\)(...)
... when != iounmap(e)
if (<+...e...+>) S
... when any
when != iounmap(e)
*if (...)
{ ... when != iounmap(e)
return ...; }
... when any
iounmap(e);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Boot up a KVM guest, and hotplug multifunction
devices(func1,func2,func0,func3) to guest.
for i in 1 2 0 3;do
qemu-img create /tmp/resize$i.qcow2 1G -f qcow2
(qemu) drive_add 0x11.$i id=drv11$i,if=none,file=/tmp/resize$i.qcow2
(qemu) device_add virtio-blk-pci,id=dev11$i,drive=drv11$i,addr=0x11.$i,multifunction=on
done
In linux kernel, when func0 of the slot is hot-added, the whole
slot will be marked as 'enabled', then driver will ignore other new
hotadded funcs.
But in Win7 & WinXP, we can continaully add other funcs after adding
func0, all funcs will be added in guest.
drivers/pci/hotplug/acpiphp_glue.c:
static int acpiphp_check_bridge(struct acpiphp_bridge *bridge)
{
....
for (slot = bridge->slots; slot; slot = slot->next) {
if (slot->flags & SLOT_ENABLED) {
acpiphp_disable_slot()
else
acpiphp_enable_slot()
.... |
} v
enable_device()
|
v
//only don't enable slot if func0 is not added
list_for_each_entry(func, &slot->funcs, sibling) {
...
}
slot->flags |= SLOT_ENABLED; //mark slot to 'enabled'
This patch just make pci driver can continaully add funcs after adding
func 0. Only mark slot to 'enabled' when all funcs are added.
For pci multifunction hotplug, we can add functions one by one(func 0 is
necessary), and all functions will be removed in one time.
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch converts the underlying maintenance aspects of FW-assigned
BIOS BAR values from a statically allocated array within struct pci_dev
to a list of temporary, stand alone, entries.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit 58c84eda07 introduced functionality to try and reinstate the
original BIOS BAR addresses of a PCI device when normal resource
assignment attempts fail. To keep track of the BIOS BAR addresses,
struct pci_dev was augmented with an array to hold the BAR addresses
of the PCI device: 'resource_size_t fw_addr[DEVICE_COUNT_RESOURCE]'.
The reinstatement of BAR addresses is an uncommon event leaving the
'fw_addr' array unused under normal circumstances. This functionality
is also currently architecture specific with an implementation limited
to x86. As the use of struct pci_dev is so prevalent, having the
'fw_addr' array residing within such seems somewhat wasteful.
This patch introduces a stand alone data structure and interfacing
routines for maintaining a list of FW-assigned BIOS BAR value entries.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_revert_fw_address() is used to reinstate a PCI device's original
FW-assigned BIOS BAR value(s) if normal resource assignment fails.
When attempting to reinstate an address, the point within the resource
tree from which to attempt the new resource request should be the parent
resource corresponding to the device, not the base of the resource tree
(ioport_resource or iomem_resource). For PCI devices this would
typically be the resource corresponding to the upstream PCI host bridge
or P2P bridge aperture.
This patch sets the point within the resource tree to attempt a new
resource assignment request to the PCI device's parent resource and only
if that fails does it fall back to the base ioport_resource or
iomem_resource.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
During test busn_res allocation with cardbus, found pci card removal is not
working anymore, and it turns out it is broken by:
|commit 79cc9601c3
|Date: Tue Nov 22 21:06:53 2011 -0800
|
| PCI: Only call pci_stop_bus_device() one time for child devices at remove
The above changed the behavior of pci_remove_behind_bridge that
yenta_cardbus depended on. So restore the old behavoir of
pci_remove_behind_bridge (which requires stopping and removing of all
devices) by:
1. rename pci_remove_behind_bridge to __pci_remove_behind_bridge, and let
__pci_remove_bus_device() call it instead.
2. add pci_stop_behind_bridge that will stop devices behind a bridge
3. add back pci_remove_behind_bridge that will stop and remove devices
under bridge.
-v2: update commit description a little bit.
Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For an SRIOV device, PCI_SRIOV_SYS_PGSIZE should be set before
the PCI_SRIOV_BAR are queried. The sys pagesize defaults to 4k,
so this change is required on powerpc box with 64k base page size.
This is a regression caused due to moving SRIOV init to sriov_enable().
| commit afd24ece5c
| Author: Ram Pai <linuxram@us.ibm.com>
| PCI: delay configuration of SRIOV capability
| The SRIOV capability, namely page size and total_vfs of a device are
| configured during enumeration phase of the device. This can potentially
| interfere with the PCI operations of the platform, if the IOV capability
| of the device is not enabled.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>