Commit Graph

2130 Commits

Author SHA1 Message Date
David John 6be954d1f9 PCI: Check the node argument passed to cpumask_of_node
Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus'
NUMA information" causes an null pointer dereference when reading from
the sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
proximity info, since dev->numa_node gets set to -1 for all PCI devices,
which then gets passed to cpumask_of_node.

Add a check to prevent this.

Signed-off-by: David John <davidjon@xenontk.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-01-04 15:10:56 -08:00
Youquan,Song 46256f83d0 PCI: AER: fix aer inject result in kernel oops
If the BIOS does not export _OSC to allow OS take over the PCIe AER, the
pcie aer driver will not initialize the aer service. However, the
aer_inject driver does not check this scenario, which results in a kernel
oops when injecting an aer error into OS.  For example:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000350
IP: [<ffffffff812e08f7>] _spin_lock_irqsave+0xc/0x23
PGD 155c41067 PUD 157fe0067 PMD 0
Oops: 0002 [#1] SMP
Pid: 5119, comm: aer-inject Not tainted 2.6.32-rc8-mce #2
RIP: 0010:[<ffffffff812e08f7>]  [<ffffffff812e08f7>] _spin_lock_irqsave+0xc/0x23
RSP: 0018:ffff880157f81e28  EFLAGS: 00010096
RAX: 0000000000000296 RBX: 0000000000000000 RCX: 0000000000000100
RDX: 0000000000010000 RSI: 0000000000000246 RDI: 0000000000000350
RBP: ffff880157f81e28 R08: 0000000000000004 R09: ffff880157f81dac
R10: ffff88015a666f60 R11: ffff88015a666f40 R12: ffff88015758cc00
R13: 0000000000000350 R14: 0000000000000000 R15: 0000000000000100
FS:  00007f4d4a66e6f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000350 CR3: 000000015661a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process aer-inject (pid: 5119, threadinfo ffff880157f80000, task ffff8801585f4340)
Stack:
 ffff880157f81e78 ffffffff811b1615 ffff880157f81e78 ffffffff81222823
Call Trace:
 [<ffffffff811b1615>] aer_irq+0x38/0x117
 [<ffffffff81222823>] ? device_for_each_child+0x5f/0x6f
 [<ffffffffa00967bf>] aer_inject_write+0x409/0x45e [aer_inject]
 [<ffffffff810eb80e>] vfs_write+0xae/0x16a
 [<ffffffff810eb98e>] sys_write+0x47/0x6e
 [<ffffffff8100ba2b>] system_call_fastpath+0x16/0x1b
RIP  [<ffffffff812e08f7>] _spin_lock_irqsave+0xc/0x23
 RSP <ffff880157f81e28>
CR2: 0000000000000350

So check the _OSC before assuming that AER is available to the OS.

Signed-off-by: Youquan, Song <youquan.song@intel.com>
Acked-by: Ying, Huang <ying.huang@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-01-04 08:31:46 -08:00
Hidetoshi Seto 40da4186a5 PCI: pcie portdrv: style cleanup
No change in logic.

Before:
  drivers/pci/pcie/portdrv_core.c:
    total: 7 errors, 2 warnings, 508 lines checked
  drivers/pci/pcie/portdrv_pci.c:
    total: 4 errors, 2 warnings, 300 lines checked

After:
  drivers/pci/pcie/portdrv_core.c:
    total: 0 errors, 0 warnings, 506 lines checked
  drivers/pci/pcie/portdrv_pci.c:
    total: 0 errors, 0 warnings, 299 lines checked

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-01-04 08:29:37 -08:00
Linus Torvalds df9d1e8a43 pci: avoid compiler warning in quirks.c
Introduced by commit 5b889bf23 ("PCI: Fix build if quirks are not
enabled"), which made the pci_dev_reset_methods[] array static and
'const', but didn't then change the code to match, and use a const
pointer when moving it to quirks.c.

Trivially fixed by just adding the required 'const' to the iterator
variable.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-31 16:44:43 -08:00
Rafael J. Wysocki 5b889bf237 PCI: Fix build if quirks are not enabled
After commit b9c3b26641 ("PCI: support
device-specific reset methods") the kernel build is broken if
CONFIG_PCI_QUIRKS is unset.

Fix this by moving pci_dev_specific_reset() to drivers/pci/quirks.c and
providing an empty replacement for !CONFIG_PCI_QUIRKS builds.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-31 12:00:45 -08:00
Linus Torvalds d661d76b02 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI/cardbus: Add a fixup hook and fix powerpc
  PCI: change PCI nomenclature in drivers/pci/ (non-comment changes)
  PCI: change PCI nomenclature in drivers/pci/ (comment changes)
  PCI: fix section mismatch on update_res()
  PCI: add Intel 82599 Virtual Function specific reset method
  PCI: add Intel USB specific reset method
  PCI: support device-specific reset methods
  PCI: Handle case when no pci device can provide cache line size hint
  PCI/PM: Propagate wake-up enable for PCIe devices too
  vgaarbiter: fix a typo in the vgaarbiter Documentation
2009-12-30 13:13:24 -08:00
Benjamin Herrenschmidt 2d1c861871 PCI/cardbus: Add a fixup hook and fix powerpc
The cardbus code creates PCI devices without ever going through the
necessary fixup bits and pieces that normal PCI devices go through.

There's in fact a commented out call to pcibios_fixup_bus() in there,
it's commented because ... it doesn't work.

I could make pcibios_fixup_bus() do the right thing on powerpc easily
but I felt it cleaner instead to provide a specific hook pci_fixup_cardbus
for which a weak empty implementation is provided by the PCI core.

This fixes cardbus on powerbooks and probably all other PowerPC
platforms which was broken completely for ever on some platforms and
since 2.6.31 on others such as PowerBooks when we made the DMA ops
mandatory (since those are setup by the fixups).

Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 18:55:51 -08:00
Stefan Assmann 7e8af37a9a PCI: change PCI nomenclature in drivers/pci/ (non-comment changes)
Changing occurrences of variants of PCI-X and PCIe to the PCI-SIG
terms listed in the "Trademark and Logo Usage Guidelines".
http://www.pcisig.com/developers/procedures/logos/Trademark_and_Logo_Usage_Guidelines_updated_112206.pdf

Patch is limited to drivers/pci/ and changes concern non-comment parts or
anything that might be visible to the user.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:54 -08:00
Stefan Assmann 45e829ea41 PCI: change PCI nomenclature in drivers/pci/ (comment changes)
Changing occurrences of variants of PCI-X and PCIe to the PCI-SIG
terms listed in the "Trademark and Logo Usage Guidelines".
http://www.pcisig.com/developers/procedures/logos/Trademark_and_Logo_Usage_Guidelines_updated_112206.pdf

Patch is limited to drivers/pci/ and changes concern comments only.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:53 -08:00
Dexuan Cui c763e7b58f PCI: add Intel 82599 Virtual Function specific reset method
Handle device specific timeout and use FLR.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:52 -08:00
Dexuan Cui aeb30016fe PCI: add Intel USB specific reset method
Handle device specific reset requirements (i.e. vendor reg for reset
along with appropriate timeout).

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:51 -08:00
Dexuan Cui b9c3b26641 PCI: support device-specific reset methods
Add a new type of quirk for resetting devices at pci_dev_reset time.
This is necessary to handle device with nonstandard reset procedures,
especially useful for guest drivers.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:50 -08:00
Csaba Henk 2820f333e3 PCI: Handle case when no pci device can provide cache line size hint
Prior to this patch, if pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, ...)
returns 0 for all dev, pci_cache_line_size ends up set to zero
(instead of pci_dfl_cache_line_size).

This patch ensures the pci_cache_line_size = pci_dfl_cache_line_size
setting in the above scenario.

This happens in case of a kvm-88 guest (where, consequently, the rtl8139
NIC failed to initialize).

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:50 -08:00
Rafael J. Wysocki dc1a94ae17 PCI/PM: Propagate wake-up enable for PCIe devices too
Having read the PM part of the PCIe 2.0 specification more carefully
I think that it was a mistake to restrict the wake-up enable
propagation to non-PCIe devices, because if we do not request
control of the root ports' PME registers via OSC, PCIe PME is
supposed to be handled by the platform, just like the non-PCIe PME.
Even if we do that, the wake-up propagation is done to allow the
devices to wake up the system from sleep states which involves the
platform anyway, so it won't hurt.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:49 -08:00
Linus Torvalds a79960e576 Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
  implement early_io{re,un}map for ia64
  Revert "Intel IOMMU: Avoid memory allocation failures in dma map api calls"
  intel-iommu: ignore page table validation in pass through mode
  intel-iommu: Fix oops with intel_iommu=igfx_off
  intel-iommu: Check for an RMRR which ends before it starts.
  intel-iommu: Apply BIOS sanity checks for interrupt remapping too.
  intel-iommu: Detect DMAR in hyperspace at probe time.
  dmar: Fix build failure without NUMA, warn on bogus RHSA tables and don't abort
  iommu: Allocate dma-remapping structures using numa locality info
  intr_remap: Allocate intr-remapping table using numa locality info
  dmar: Allocate queued invalidation structure using numa locality info
  dmar: support for parsing Remapping Hardware Static Affinity structure
2009-12-16 10:11:38 -08:00
Alexey Dobriyan 471452104b const: constify remaining dev_pm_ops
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:25 -08:00
Linus Torvalds 11bd04f6f3 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (109 commits)
  PCI: fix coding style issue in pci_save_state()
  PCI: add pci_request_acs
  PCI: fix BUG_ON triggered by logical PCIe root port removal
  PCI: remove ifdefed pci_cleanup_aer_correct_error_status
  PCI: unconditionally clear AER uncorr status register during cleanup
  x86/PCI: claim SR-IOV BARs in pcibios_allocate_resource
  PCI: portdrv: remove redundant definitions
  PCI: portdrv: remove unnecessary struct pcie_port_data
  PCI: portdrv: minor cleanup for pcie_port_device_register
  PCI: portdrv: add missing irq cleanup
  PCI: portdrv: enable device before irq initialization
  PCI: portdrv: cleanup service irqs initialization
  PCI: portdrv: check capabilities first
  PCI: portdrv: move PME capability check
  PCI: portdrv: remove redundant pcie type calculation
  PCI: portdrv: cleanup pcie_device registration
  PCI: portdrv: remove redundant pcie_port_device_probe
  PCI: Always set prefetchable base/limit upper32 registers
  PCI: read-modify-write the pcie device control register when initiating pcie flr
  PCI: show dma_mask bits in /sys
  ...

Fixed up conflicts in:
	arch/x86/kernel/amd_iommu_init.c
	drivers/pci/dmar.c
	drivers/pci/hotplug/acpiphp_glue.c
2009-12-11 12:18:16 -08:00
Linus Torvalds 3067e02f8f Merge branch 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPICA: Update version to 20091112.
  ACPICA: Add additional module-level code support
  ACPICA: Deploy new create integer interface where appropriate
  ACPICA: New internal utility function to create Integer objects
  ACPICA: Add repair for predefined methods that must return sorted lists
  ACPICA: Fix possible fault if return Package objects contain NULL elements
  ACPICA: Add post-order callback to acpi_walk_namespace
  ACPICA: Change package length error message to an info message
  ACPICA: Reduce severity of predefined repair messages, Warning to Info
  ACPICA: Update version to 20091013
  ACPICA: Fix possible memory leak for Scope ASL operator
  ACPICA: Remove possibility of executing _REG methods twice
  ACPICA: Add repair for bad _MAT buffers
  ACPICA: Add repair for bad _BIF/_BIX packages
2009-12-09 19:57:06 -08:00
Linus Torvalds 849e8dea09 Merge branch 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: hpet: Make WARN_ON understandable
  x86: arch specific support for remapping HPET MSIs
  intr-remap: generic support for remapping HPET MSIs
  x86, hpet: Simplify the HPET code
  x86, hpet: Disable per-cpu hpet timer if ARAT is supported
2009-12-08 19:26:55 -08:00
KOSAKI Motohiro 354bb65e6e Revert "Intel IOMMU: Avoid memory allocation failures in dma map api calls"
commit eb3fa7cb51 said Intel IOMMU

    Intel IOMMU driver needs memory during DMA map calls to setup its
    internal page tables and for other data structures.  As we all know
    that these DMA map calls are mostly called in the interrupt context
    or with the spinlock held by the upper level drivers(network/storage
    drivers), so in order to avoid any memory allocation failure due to
    low memory issues, this patch makes memory allocation by temporarily
    setting PF_MEMALLOC flags for the current task before making memory
    allocation calls.

    We evaluated mempools as a backup when kmem_cache_alloc() fails
    and found that mempools are really not useful here because
     1) We don't know for sure how much to reserve in advance
     2) And mempools are not useful for GFP_ATOMIC case (as we call
        memory alloc functions with GFP_ATOMIC)

    (akpm: point 2 is wrong...)

The above description doesn't justify to waste system emergency memory
at all. Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need
few memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.

Plus, akpm already pointed out what we should do.

Then, this patch revert it.

Cc: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-12-08 10:12:04 +00:00
Chris Wright 1672af1164 intel-iommu: ignore page table validation in pass through mode
We are seeing a bug when booting w/ iommu=pt with current upstream
(bisect blames 19943b0e30 "intel-iommu:
Unify hardware and software passthrough support).

The issue is specific to this loop during identity map initialization
of each device:

domain_context_mapping_one(si_domain, ..., CONTEXT_TT_PASS_THROUGH)
...
		/* Skip top levels of page tables for
		* iommu which has less agaw than default.
		*/
		for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
			pgd = phys_to_virt(dma_pte_addr(pgd));
			if (!dma_pte_present(pgd)) {      <------ failing here
				spin_unlock_irqrestore(&iommu->lock, flags);
			return -ENOMEM;
		}

This box has 2 iommu's in it.  The catchall iommu has MGAW == 48, and
SAGAW == 4.  The other iommu has MGAW == 39, SAGAW == 2.

The device that's failing the above pgd test is the only device connected
to the non-catchall iommu, which has a smaller address width than the
domain default.  This test is not necessary since the context is in PT
mode and the ASR is ignored.

Thanks to Don Dutile for discovering and debugging this one.

Cc: stable@kernel.org
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-12-08 10:03:25 +00:00
David Woodhouse 44cd613c0e intel-iommu: Fix oops with intel_iommu=igfx_off
The hotplug notifier will call find_domain() to see if the device in
question has been assigned an IOMMU domain. However, this should never
be called for devices with a "dummy" domain, such as graphics devices
when intel_iommu=igfx_off is set and the corresponding IOMMU isn't even
initialised. If you do that, it'll oops as it dereferences the (-1)
pointer.

The notifier function should check iommu_no_mapping() for the
device before doing anything else.

Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-12-08 10:03:06 +00:00
David Woodhouse 5595b528b4 intel-iommu: Check for an RMRR which ends before it starts.
Some HP BIOSes report an RMRR region (a region which needs a 1:1 mapping
in the IOMMU for a given device) which has an end address lower than its
start address. Detect that and warn, rather than triggering the
BUG() in dma_pte_clear_range().

Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-12-08 10:02:52 +00:00
David Woodhouse 6ecbf01c7c intel-iommu: Apply BIOS sanity checks for interrupt remapping too.
The BIOS errors where an IOMMU is reported either at zero or a bogus
address are causing problems even when the IOMMU is disabled -- because
interrupt remapping uses the same hardware. Ensure that the checks get
applied for the interrupt remapping initialisation too.

Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-12-08 10:02:39 +00:00
Chris Wright 2c99220810 intel-iommu: Detect DMAR in hyperspace at probe time.
Many BIOSes will lie to us about the existence of an IOMMU, and claim
that there is one at an address which actually returns all 0xFF.

We need to detect this early, so that we know we don't have a viable
IOMMU and can set up swiotlb before it's too late.

Cc: stable@kernel.org
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-12-08 10:02:15 +00:00
David Woodhouse ec20849193 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Merge the BIOS workarounds from 2.6.32, and the swiotlb fallback on failure.
2009-12-08 09:59:24 +00:00
Linus Torvalds 7b626acb8f Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
  x86/amd-iommu: Remove amd_iommu_pd_table
  x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
  x86/amd-iommu: Cleanup DTE flushing code
  x86/amd-iommu: Introduce iommu_flush_device() function
  x86/amd-iommu: Cleanup attach/detach_device code
  x86/amd-iommu: Keep devices per domain in a list
  x86/amd-iommu: Add device bind reference counting
  x86/amd-iommu: Use dev->arch->iommu to store iommu related information
  x86/amd-iommu: Remove support for domain sharing
  x86/amd-iommu: Rearrange dma_ops related functions
  x86/amd-iommu: Move some pte allocation functions in the right section
  x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
  x86/amd-iommu: Use get_device_id and check_device where appropriate
  x86/amd-iommu: Move find_protection_domain to helper functions
  x86/amd-iommu: Simplify get_device_resources()
  x86/amd-iommu: Let domain_for_device handle aliases
  x86/amd-iommu: Remove iommu specific handling from dma_ops path
  x86/amd-iommu: Remove iommu parameter from __(un)map_single
  x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
  ...
2009-12-05 09:49:07 -08:00
Kleber Sacilotto de Souza 9e0b5b2c44 PCI: fix coding style issue in pci_save_state()
Remove a stray space in pci_save_state().

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 16:21:02 -08:00
Chris Wright 5d990b6275 PCI: add pci_request_acs
Commit ae21ee65e8 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.

Add a function to pci core to allow an IOMMU to request that ACS
be enabled.  The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order;  iommu has only been detected not initialized.

Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.

Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 16:19:24 -08:00
Kenji Kaneshige b26a34aa47 PCI: fix BUG_ON triggered by logical PCIe root port removal
This problem happened when removing PCIe root port using PCI logical
hotplug operation.

The immediate cause of this problem is that the pointer to invalid
data structure is passed to pcie_update_aspm_capable() by
pcie_aspm_exit_link_state(). When pcie_aspm_exit_link_state() received
a pointer to root port link, it unconfigures the root port link and
frees its data structure at first. At this point, there are not links
to configure under the root port and the data structure for root port
link is already freed. So pcie_aspm_exit_link_state() must not call
pcie_update_aspm_capable() and pcie_config_aspm_path().

This patch fixes the problem by changing pcie_aspm_exit_link_state()
not to call pcie_update_aspm_capable() and pcie_config_aspm_path() if
the specified link is root port link.

------------[ cut here ]------------
kernel BUG at drivers/pci/pcie/aspm.c:606!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:40/0000:40:13.0/remove
CPU 1
Modules linked in: shpchp
Pid: 9345, comm: sysfsd Not tainted 2.6.32-rc5 #98 ProLiant DL785 G6
RIP: 0010:[<ffffffff811df69b>]  [<ffffffff811df69b>] pcie_update_aspm_capable+0x15/0xbe
RSP: 0018:ffff88082a2f5ca0  EFLAGS: 00010202
RAX: 0000000000000e77 RBX: ffff88182cc3e000 RCX: ffff88082a33d006
RDX: 0000000000000001 RSI: ffffffff811dff4a RDI: ffff88182cc3e000
RBP: ffff88082a2f5cc0 R08: ffff88182cc3e000 R09: 0000000000000000
R10: ffff88182fc00180 R11: ffff88182fc00198 R12: ffff88182cc3e000
R13: 0000000000000000 R14: ffff88182cc3e000 R15: ffff88082a2f5e20
FS:  00007f259a64b6f0(0000) GS:ffff880864600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007feb53f73da0 CR3: 000000102cc94000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sysfsd (pid: 9345, threadinfo ffff88082a2f4000, task ffff88082a33cf00)
Stack:
 ffff88182cc3e000 ffff88182cc3e000 0000000000000000 ffff88082a33cf00
<0> ffff88082a2f5cf0 ffffffff811dff52 ffff88082a2f5cf0 ffff88082c525168
<0> ffff88402c9fd2f8 ffff88402c9fd2f8 ffff88082a2f5d20 ffffffff811d7db2
Call Trace:
 [<ffffffff811dff52>] pcie_aspm_exit_link_state+0xf5/0x11e
 [<ffffffff811d7db2>] pci_stop_bus_device+0x76/0x7e
 [<ffffffff811d7d67>] pci_stop_bus_device+0x2b/0x7e
 [<ffffffff811d7e4f>] pci_remove_bus_device+0x15/0xb9
 [<ffffffff811dcb8c>] remove_callback+0x29/0x3a
 [<ffffffff81135aeb>] sysfs_schedule_callback_work+0x15/0x6d
 [<ffffffff81072790>] worker_thread+0x19d/0x298
 [<ffffffff8107273b>] ? worker_thread+0x148/0x298
 [<ffffffff81135ad6>] ? sysfs_schedule_callback_work+0x0/0x6d
 [<ffffffff810765c0>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff810725f3>] ? worker_thread+0x0/0x298
 [<ffffffff8107629e>] kthread+0x7d/0x85
 [<ffffffff8102eafa>] child_rip+0xa/0x20
 [<ffffffff8102e4bc>] ? restore_args+0x0/0x30
 [<ffffffff81076221>] ? kthread+0x0/0x85
 [<ffffffff8102eaf0>] ? child_rip+0x0/0x20
Code: 89 e5 8a 50 48 31 c0 c0 ea 03 83 e2 07 e8 b2 de fe ff c9 48 98 c3 55 48 89 e5 41 56 49 89 fe 41 55 41 54 53 48 83 7f 10 00 74 04 <0f> 0b eb fe 48 8b 05 da 7d 63 00 4c 8d 60 e8 4c 89 e1 eb 24 4c
RIP  [<ffffffff811df69b>] pcie_update_aspm_capable+0x15/0xbe
 RSP <ffff88082a2f5ca0>
---[ end trace 6ae0f65bdeab8555 ]---

Reported-by: Alex Chiang <achiang@hp.com>
Tested-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 16:09:59 -08:00
Andrew Patterson 638bba0828 PCI: remove ifdefed pci_cleanup_aer_correct_error_status
The pci_cleanup_aer_correct_error_status() function has been
#if 0'd out since 2.6.25.  Time to remove the dead code.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 16:03:19 -08:00
Andrew Patterson 6cdfd995a6 PCI: unconditionally clear AER uncorr status register during cleanup
The current implementation of pci_cleanup_aer_uncorrect_error_status
only clears either fatal or non-fatal error status bits depending
on the state of the I/O channel. This implementation will then often
leave some bits set after PCI error recovery completes.  The uncleared bit
settings will then be falsely reported the next time an AER interrupt is
generated for that hierarchy. An easy way to illustrate this issue is to
use the aer-inject module to simultaneously inject both an uncorrectable
non-fatal and uncorrectable fatal error.  One of the errors will not be
cleared.

This patch resolves this issue by unconditionally clearing all bits in
the AER uncorrectable status register. All settings and corrective action
strategies are saved and determined before
pci_cleanup_aer_uncorrect_error_status is called, so this change should not
affect errory handling functionality.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 16:03:11 -08:00
Kenji Kaneshige f9f45604ed PCI: portdrv: remove redundant definitions
Remove unnecessary definitions from portdrv.h and use generic
definitions instead.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:56:24 -08:00
Kenji Kaneshige 694f88ef7a PCI: portdrv: remove unnecessary struct pcie_port_data
Remove 'port_type' field in struct pcie_port_data(), because we can
get port type information from struct pci_dev. With this change, this
patch also does followings:

 - Remove struct pcie_port_data because it no longer has any field.
 - Remove portdrv private definitions about port type (PCIE_RC_PORT,
   PCIE_SW_UPSTREAM_PORT and PCIE_SW_DOWNSTREAM_PORT), and use generic
   definitions instead.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:56:19 -08:00
Kenji Kaneshige 40717c39b1 PCI: portdrv: minor cleanup for pcie_port_device_register
Minor cleanups for pcie_port_device_register().

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:56:10 -08:00
Kenji Kaneshige fbb5de70bb PCI: portdrv: add missing irq cleanup
Add missing service irqs cleanup in the error code path of
pcie_port_device_register().

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:56:06 -08:00
Kenji Kaneshige 1ce5e83063 PCI: portdrv: enable device before irq initialization
Call pci_enable_device() before initializing service irqs, because
legacy interrupt is initialized in pci_enable_device() on some
architectures.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:55:59 -08:00
Kenji Kaneshige dc5351784e PCI: portdrv: cleanup service irqs initialization
This patch cleans up the service irqs initialization as follows:

 - Remove 'irq_mode' field in pcie_port_data and related definitions,
   which is not needed because we can get the same information from
   'is_msix', 'is_msi' and 'pin' fields in struct pci_dev.

 - Change the name of 'vectors' argument of assign_interrupt_mode() to
   'irqs' because it holds irq numbers actually. People might confuse
   it with CPU vector or MSI/MSI-X vector.

 - Change function name assign_interrupt_mode() to init_service_irqs()
   becasuse we no longer have 'irq_mode' data structure, and new name
   is more straightforward (IMO).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:55:51 -08:00
Kenji Kaneshige d013598d9a PCI: portdrv: check capabilities first
Move capability check capability to the beginning of
pcie_port_device_register() prevents redundant execution path.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:55:44 -08:00
Kenji Kaneshige 9e5d0b16da PCI: portdrv: move PME capability check
No reason to check PME capability outside get_port_device_capability().
Do it in get_port_device_capability().

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:55:37 -08:00
Kenji Kaneshige 2dd60e96b4 PCI: portdrv: remove redundant pcie type calculation
PCIe port type is already stored in 'pcie_type' field of struct
pci_dev. So we don't need to get it from pci configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:55:26 -08:00
Kenji Kaneshige 52a0f24bea PCI: portdrv: cleanup pcie_device registration
In the current port bus driver implementation, pcie_device allocation,
initialization and registration are done in separated functions. Doing
those in one function make the code simple and easier to read.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:55:18 -08:00
Kenji Kaneshige 898294c975 PCI: portdrv: remove redundant pcie_port_device_probe
We don't need pcie_port_device_probe() because we can get pci
device/port type using pci_is_pcie() and 'pcie_type' fields in struct
pci_dev. Remove pcie_port_device_probe().

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:55:12 -08:00
Alex Williamson 59353ea30e PCI: Always set prefetchable base/limit upper32 registers
Prior to 1f82de10 we always initialized the upper 32bits of the
prefetchable memory window, regardless of the address range used.
Now we only touch it for a >32bit address, which means the upper32
registers remain whatever the BIOS initialized them too.

It's valid for the BIOS to set the upper32 base/limit to
0xffffffff/0x00000000, which makes us program prefetchable ranges
like 0xffffffffabc00000 - 0x00000000abc00000

Revert the chunk of 1f82de10 that made this conditional so we always
write the upper32 registers and remove now unused pref_mem64 variable.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:52:43 -08:00
Shmulik Ravid 04b55c4732 PCI: read-modify-write the pcie device control register when initiating pcie flr
The pcie_flr routine writes the device control register with the FLR bit
set clearing all other fields for the FLR duration. Among other fields,
the Max_Payload_Size is also cleared which can cause errors if there are
transactions lurking in the HW pipeline. The patch replaces the blank
write with read-modify-write of the control register keeping the other
fields intact.

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:49:44 -08:00
Yinghai Lu bb965401fd PCI: show dma_mask bits in /sys
So we can catch if the driver sets an incorrect dma_mask.

Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:47:50 -08:00
Yinghai Lu c6a415761c PCI: add debug output for DMA mask info
This allows us to find out what DMA mask is used for each PCI device at boot
time; useful for debugging.

After the patch:
ehci_hcd 0000:00:02.1: using 31bit consistent DMA mask
e1000 0000:0b:01.0: using 64bit DMA mask
e1000 0000:0b:01.0: using 64bit consistent DMA mask
e1000e 0000:04:00.0: using 64bit DMA mask
e1000e 0000:04:00.0: using 64bit consistent DMA mask
ixgb 0000:0c:01.0: using 64bit DMA mask
ixgb 0000:0c:01.0: using 64bit consistent DMA mask
aacraid 0000:86:00.0: using 32bit DMA mask
aacraid 0000:86:00.0: using 32bit consistent DMA mask
aacraid 0000:86:00.0: using 64bit DMA mask
aacraid 0000:86:00.0: using 64bit consistent DMA mask
qla2xxx 0000:0c:02.0: using 64bit consistent DMA mask
qla2xxx 0000:0c:02.1: using 64bit consistent DMA mask
lpfc 0000:06:00.0: using 64bit DMA mask
lpfc 0000:06:00.1: using 64bit DMA mask
pata_amd 0000:00:06.0: using 32bit DMA mask
pata_amd 0000:00:06.0: using 32bit consistent DMA mask
mptsas 0000:0c:04.0: using 64bit DMA mask
mptsas 0000:0c:04.0: using 64bit consistent DMA mask

forcedeth 0000:00:08.0: using 39bit DMA mask
forcedeth 0000:00:08.0: using 39bit consistent DMA mask
niu 0000:02:00.0: using 44bit DMA mask
niu 0000:02:00.0: using 44bit consistent DMA mask
sata_nv 0000:00:05.0: using 32bit DMA mask
sata_nv 0000:00:05.0: using 32bit consistent DMA mask
ib_mthca 0000:03:00.0: using 64bit DMA mask
ib_mthca 0000:03:00.0: using 64bit consistent DMA mask

Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:46:20 -08:00
Jesse Barnes 5c788a695a PCI: ibmphp_hpc: don't release hw sem twice if kthread stops
If we stop the kthread, we may end up up'ing the sem twice, which seems
unintended.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 15:18:01 -08:00
Lin Ming 2263576cfc ACPICA: Add post-order callback to acpi_walk_namespace
The existing interface only has a pre-order callback. This change
adds an additional parameter for a post-order callback which will
be more useful for bus scans. ACPICA BZ 779.

Also update the external calls to acpi_walk_namespace.

http://www.acpica.org/bugzilla/show_bug.cgi?id=779

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-11-24 21:31:10 -05:00
Kenji Kaneshige 5651c48cfa PCI pciehp: fix power fault interrupt storm problem
Enabling power fault detected event notification in current pciehp
might cause power fault interrupt storm on some machines. On those
machines. On those machines, power fault detected bit in the slot
status register was set again immediately when it is cleared in the
interrupt service routine, and next power fault detected interrupt was
notified again. Therefore, disable power fault detected event
notification for now.

This patch also removes unnecessary handling for power fault cleared
event because this event is not supported by PCIe spec.

Tested-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:19 -08:00
Kenji Kaneshige 13598378f2 PCI hotplug: use pci_is_pcie()
Change for PCI hotplug to use pci_is_pcie() instead of checking
pci_dev->is_pcie.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:18 -08:00
Kenji Kaneshige b44d7db364 PCIe AER: use pci_is_pcie()
Changes for PCIe AER driver to use pci_is_pcie() instead of checking
pci_dev->is_pcie.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:17 -08:00
Kenji Kaneshige 8b06477dc4 PCIe ASPM: use pci_is_pcie()
Change for PCIe ASPM driver to use pci_is_pcie() instead of checking
pci_dev->is_pcie.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:17 -08:00
Kenji Kaneshige 5f4d91a122 PCI: use pci_is_pcie() in pci core
Change for PCI core to use pci_is_pcie() instead of checking
pci_dev->is_pcie.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:16 -08:00
Kenji Kaneshige 1518c17ab7 pciehp: use pci_pcie_cap()
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in pciehp driver. This avoids unnecessary search in PCI
configuration space. This patch also removes 'cap_base' field in
struct controller, that was used to hold PCIe capability offset by
pciehp itself.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:15 -08:00
Kenji Kaneshige d3ccc4091f PCI hotplug: use pci_pcie_cap()
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI hotplug core. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:14 -08:00
Kenji Kaneshige db9538a749 PCIe ASPM: use pci_pcie_cap()
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCIe ASPM driver. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:14 -08:00
Kenji Kaneshige dba90dfe48 PCIe port bus: use pci_pcie_cap()
Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI Express Port Bus driver. This avoids unnecessary serarch
in PCI configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:13 -08:00
Kenji Kaneshige 39a53062cb PCIe AER: use pci_pcie_cap()
Use pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCIe AER driver. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:13 -08:00
Kenji Kaneshige 06a1cbafb2 PCI: use pci_pcie_cap() in pci core
Use pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI core code. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-24 15:25:12 -08:00
David Woodhouse 5854d9c8d1 Fix handling of the HP/Acer 'DMAR at zero' BIOS error for machines with <4GiB RAM.
Commit 86cf898e1d ("intel-iommu: Check for
'DMAR at zero' BIOS error earlier.") was supposed to work by pretending
not to detect an IOMMU if it was actually being reported by the BIOS at
physical address zero.

However, the intel_iommu_init() function is called unconditionally, as
are the corresponding functions for other IOMMU hardware.

So the patch only worked if you have RAM above the 4GiB boundary. It
caused swiotlb to be initialised when no IOMMU was detected during early
boot, and thus the later IOMMU init would refuse to run.

But if you have less RAM than that, swiotlb wouldn't get set up and the
IOMMU _would_ still end up being initialised, even though we never
claimed to detect it.

This patch also sets the dmar_disabled flag when the error is detected
during the initial detection phase -- so that the later call to
intel_iommu_init() will return without doing anything, regardless of
whether swiotlb is used or not.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-19 13:42:02 -08:00
Ingo Molnar 99f4c9de2b Merge commit 'v2.6.32-rc7' into core/iommu
Merge reason: Add fixes we'll depend on.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 07:51:07 +01:00
Linus Torvalds a9366e61b0 Merge git://git.infradead.org/users/dwmw2/iommu-2.6.32
* git://git.infradead.org/users/dwmw2/iommu-2.6.32:
  intel-iommu: Support PCIe hot-plug
  intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough
  intel-iommu: Check for 'DMAR at zero' BIOS error earlier.
2009-11-14 13:05:27 -08:00
Fenghua Yu 99dcadede4 intel-iommu: Support PCIe hot-plug
To support PCIe hot plug in IOMMU, we register a notifier to respond to device
change action.

When the notifier gets BUS_NOTIFY_UNBOUND_DRIVER, it removes the device
from its DMAR domain.

A hot added device will be added into an IOMMU domain when it first does IOMMU
op. So there is no need to add more code for hot add.

Without the patch, after a hot-remove, a hot-added device on the same
slot will not work.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-11-12 02:28:45 +00:00
Alex Williamson e8bb910d1b intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough
The model for IOMMU passthrough is that decent devices that can cope
with DMA to all of memory get passthrough; crappy devices with a limited
dma_mask don't -- they get to use the IOMMU anyway.

This is done on the basis that IOMMU passthrough is usually wanted for
performance reasons, and it's only the decent PCI devices that you
really care about performance for, while the crappy 32-bit ones like
your USB controller can just use the IOMMU and you won't really care.

Unfortunately, the check for this was only looking at dev->dma_mask, not
at dev->coherent_dma_mask. And some devices have a 32-bit
coherent_dma_mask even though they have a full 64-bit dma_mask.

Even more unfortunately, fixing that simple oversight would upset
certain broken HP devices. Not only do they have a 32-bit
coherent_dma_mask, but they also have a tendency to do stray DMA to
unmapped addresses. And then they die when they take the DMA fault they
so richly deserve.

So if we do the 'correct' fix, it'll mean that affected users have to
disable IOMMU support completely on "a large percentage of servers from
a major vendor."

Personally, I have little sympathy -- given that this is the _same_
'major vendor' who is shipping machines which claim to have IOMMU
support but have obviously never _once_ booted a VT-d capable OS to do
any form of QA. But strictly speaking, it _would_ be a regression even
though it only ever worked by fluke.

For 2.6.33, we'll come up with a quirk which gives swiotlb support
for this particular device, and other devices with an inadequate
coherent_dma_mask will just get normal IOMMU mapping.

The simplest fix for 2.6.32, though, is just to jump through some hoops
to try to allocate coherent DMA memory for such devices in a place that
they can reach. We'd use dma_generic_alloc_coherent() for this if it
existed on IA64.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-11-12 02:10:34 +00:00
Linus Torvalds 8c8def26bf PCI: allow matching of prefetchable resources to non-prefetchable windows
I'm not entirely sure it needs to go into 32, but it's probably the right
thing to do. Another way of explaining the patch is:

 - we currently pick the _first_ exactly matching bus resource entry, but
   the _last_ inexactly matching one. Normally first/last shouldn't
   matter, but bus resource entries aren't actually all created equal: in
   a transparent bus, the last resources will be the parent resources,
   which we should generally try to avoid unless we have no choice. So
   "first matching" is the thing we should always aim for.

 - the patch is a bit bigger than it needs to be, because I simplified the
   logic at the same time. It used to be a fairly incomprehensible

	if ((res->flags & IORESOURCE_PREFETCH) && !(r->flags & IORESOURCE_PREFETCH))
		best = r;       /* Approximating prefetchable by non-prefetchable */

   and technically, all the patch did was to make that complex choice be
   even more complex (it basically added a "&& !best" to say that if we
   already gound a non-prefetchable window for the prefetchable resource,
   then we won't override an earlier one with that later one: remember
   "first matching").

 - So instead of that complex one with three separate conditionals in one,
   I split it up a bit, and am taking advantage of the fact that we
   already handled the exact case, so if 'res->flags' has the PREFETCH
   bit, then we already know that 'r->flags' will _not_ have it. So the
   simplified code drops the redundant test, and does the new '!best' test
   separately. It also uses 'continue' as a way to ignore the bus
   resource we know doesn't work (ie a prefetchable bus resource is _not_
   acceptable for anything but an exact match), so it turns into:

	/* We can't insert a non-prefetch resource inside a prefetchable parent .. */
	if (r->flags & IORESOURCE_PREFETCH)
		continue;
	/* .. but we can put a prefetchable resource inside a non-prefetchable one */
	if (!best)
		best = r;

   instead. With the comments, it's now six lines instead of two, but it's
   conceptually simpler, and I _could_ have written it as two lines:

	if ((res->flags & IORESOURCE_PREFETCH) && !best)
		best = r;	/* Approximating prefetchable by non-prefetchable */

   but I thought that was too damn subtle.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-11 08:19:52 +00:00
FUJITA Tomonori 75f1cdf1dd x86: Handle HW IOMMU initialization failure gracefully
If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. It doesn't
work for the majority since nowadays we have more than 4GB
memory so we must use swiotlb instead of nommu.

The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier from bootmem allocator. Chris explained the issue in
detail:

  http://marc.info/?l=linux-kernel&m=125657444317079&w=2

The current x86 IOMMU initialization sequence is too complicated
and handling the above issue makes it more hacky.

This patch changes x86 IOMMU initialization sequence to handle
the above issue cleanly.

The new x86 IOMMU initialization sequence are:

1. we initialize the swiotlb (and setting swiotlb to 1) in the case
   of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
   swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by
   the boot option, we finish here.

2. we call the detection functions of all the IOMMUs

3. the detection function sets x86_init.iommu.iommu_init to the
   IOMMU initialization function (so we can avoid calling the
   initialization functions of all the IOMMUs needlessly).

4. if the IOMMU initialization function doesn't need to swiotlb
   then sets swiotlb to zero (e.g. the initialization is
   sucessful).

5. if we find that swiotlb is set to zero, we free swiotlb
   resource.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:32:07 +01:00
FUJITA Tomonori 9d5ce73a64 x86: intel-iommu: Convert detect_intel_iommu to use iommu_init hook
This changes detect_intel_iommu() to set intel_iommu_init() to
iommu_init hook if detect_intel_iommu() finds the IOMMU.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-6-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: build fix for the !CONFIG_DMAR case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:31:36 +01:00
David Woodhouse 86cf898e1d intel-iommu: Check for 'DMAR at zero' BIOS error earlier.
Chris Wright has some patches which let us fall back to swiotlb nicely
if IOMMU initialisation fails. But those are a bit much for 2.6.32.

Instead, let's shift the check for the biggest problem, the HP and Acer
BIOS bug which reports a DMAR at physical address zero. That one can
actually be checked much earlier -- before we even admit to having
detected an IOMMU in the first place. So the swiotlb init goes ahead as
we want.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-11-09 22:15:15 +00:00
Thomas Gleixner e9d1e4921d PCI: Replace old style lock initializer
SPIN_LOCK_UNLOCKED is deprecated. Use DEFINE_SPINLOCK instead.

Make the lock static while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-06 15:06:27 -08:00
Kenji Kaneshige 9b536e0b61 PCI hotplug: fix oshp evaluation
If firmware doesn't grant over native hotplug control through ACPI
_OSC method, we must not evaluate OSHP.

Acked-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-06 14:13:32 -08:00
Andreas Herrmann e0cd516034 PCI: derive nearby CPUs from device's instead of bus' NUMA information
In case of AMD CPU northbridge functions this NUMA information might
differ.  Here is an example from a 4-socket system.

Currently Linux shows

  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat numa_node
  0
  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat local_cpu*
  0-3
  00000000,0000000f

which is not correct for northbridge functions as the local CPUs
are those of the same socket.

With this patch and a quirk for AMD CPU NB functions Linux can
do better and correctly show

  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat numa_node
  2
  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat local_cpu*
  8-11
  00000000,00000f00

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-06 14:09:15 -08:00
Kenji Kaneshige 761434a318 PCI ASPM: fix oops on root port removal
Fix the following BUG_ON() problem reported by Alex Chiang.

This problem happened when removing PCIe root port using PCI logical
hotplug operation.

The immediate cause of this problem is that the pointer to invalid
data structure is passed to pcie_update_aspm_capable() by
pcie_aspm_exit_link_state(). When pcie_aspm_exit_link_state() received
a pointer to root port link, it unconfigures the root port link and
frees its data structure at first. At this point, there are not links
to configure under the root port and the data structure for root port
link is already freed. So pcie_aspm_exit_link_state() must not call
pcie_update_aspm_capable() and pcie_config_aspm_path().

This patch fixes the problem by changing pcie_aspm_exit_link_state()
not to call pcie_update_aspm_capable() and pcie_config_aspm_path() if
the specified link is root port link.

------------[ cut here ]------------
kernel BUG at drivers/pci/pcie/aspm.c:606!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:40/0000:40:13.0/remove
CPU 1
Modules linked in: shpchp
Pid: 9345, comm: sysfsd Not tainted 2.6.32-rc5 #98 ProLiant DL785 G6
RIP: 0010:[<ffffffff811df69b>]  [<ffffffff811df69b>] pcie_update_aspm_capable+0x15/0xbe
RSP: 0018:ffff88082a2f5ca0  EFLAGS: 00010202
RAX: 0000000000000e77 RBX: ffff88182cc3e000 RCX: ffff88082a33d006
RDX: 0000000000000001 RSI: ffffffff811dff4a RDI: ffff88182cc3e000
RBP: ffff88082a2f5cc0 R08: ffff88182cc3e000 R09: 0000000000000000
R10: ffff88182fc00180 R11: ffff88182fc00198 R12: ffff88182cc3e000
R13: 0000000000000000 R14: ffff88182cc3e000 R15: ffff88082a2f5e20
FS:  00007f259a64b6f0(0000) GS:ffff880864600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007feb53f73da0 CR3: 000000102cc94000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sysfsd (pid: 9345, threadinfo ffff88082a2f4000, task ffff88082a33cf00)
Stack:
 ffff88182cc3e000 ffff88182cc3e000 0000000000000000 ffff88082a33cf00
<0> ffff88082a2f5cf0 ffffffff811dff52 ffff88082a2f5cf0 ffff88082c525168
<0> ffff88402c9fd2f8 ffff88402c9fd2f8 ffff88082a2f5d20 ffffffff811d7db2
Call Trace:
 [<ffffffff811dff52>] pcie_aspm_exit_link_state+0xf5/0x11e
 [<ffffffff811d7db2>] pci_stop_bus_device+0x76/0x7e
 [<ffffffff811d7d67>] pci_stop_bus_device+0x2b/0x7e
 [<ffffffff811d7e4f>] pci_remove_bus_device+0x15/0xb9
 [<ffffffff811dcb8c>] remove_callback+0x29/0x3a
 [<ffffffff81135aeb>] sysfs_schedule_callback_work+0x15/0x6d
 [<ffffffff81072790>] worker_thread+0x19d/0x298
 [<ffffffff8107273b>] ? worker_thread+0x148/0x298
 [<ffffffff81135ad6>] ? sysfs_schedule_callback_work+0x0/0x6d
 [<ffffffff810765c0>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff810725f3>] ? worker_thread+0x0/0x298
 [<ffffffff8107629e>] kthread+0x7d/0x85
 [<ffffffff8102eafa>] child_rip+0xa/0x20
 [<ffffffff8102e4bc>] ? restore_args+0x0/0x30
 [<ffffffff81076221>] ? kthread+0x0/0x85
 [<ffffffff8102eaf0>] ? child_rip+0x0/0x20
Code: 89 e5 8a 50 48 31 c0 c0 ea 03 83 e2 07 e8 b2 de fe ff c9 48 98 c3 55 48 89 e5 41 56 49 89 fe 41 55 41 54 53 48 83 7f 10 00 74 04 <0f> 0b eb fe 48 8b 05 da 7d 63 00 4c 8d 60 e8 4c 89 e1 eb 24 4c
RIP  [<ffffffff811df69b>] pcie_update_aspm_capable+0x15/0xbe
 RSP <ffff88082a2f5ca0>
---[ end trace 6ae0f65bdeab8555 ]---

Reported-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-06 14:01:23 -08:00
Kenji Kaneshige 0efea00063 PCI: cache PCIe capability offset
There are a lot of codes that searches PCI express capability offset
in the PCI configuration space using pci_find_capability(). Caching it
in the struct pci_dev will reduce unncecessary search. This patch adds
an additional 'pcie_cap' fields into struct pci_dev, which is
initialized at pci device scan time (in set_pcie_port_type()).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-06 13:59:02 -08:00
Bjorn Helgaas 865df576e8 PCI: improve discovery/configuration messages
This makes PCI resource management messages more consistent and adds a few
new messages to aid debugging.

Whenever we assign resources to a device, update a BAR, or change a
bridge aperture, it's worth noting it.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:44 -08:00
Bjorn Helgaas 0207c356ef PCI: replace pr_debug with dev_dbg
Since we have a struct device, we might as well use dev_printk.  Note that
both pr_debug() and dev_dbg() are completely compiled out unless DEBUG or
DYNAMIC_DEBUG is defined.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:44 -08:00
Bjorn Helgaas 10c3d71d42 PCI: make PME# messages KERN_DEBUG
Messages about PME# being supported and enabled/disabled are probably
useful for debug, but maybe don't need to be on the console.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:42 -08:00
Thadeu Lima de Souza Cascardo 8d6cfdcdb5 PCI: remove pci_find_slot from PCI_LEGACY config description
Commit 3b073eda has removed pci_find_slot, so there's no point in
mentioning it in the config description as one of the deprecated APIs
there are enabled by PCI_LEGACY and still used by some drivers.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:42 -08:00
Bjorn Helgaas c7dabef8a2 vsprintf: use %pR, %pr instead of %pRt, %pRf
Jesse accidentally applied v1 [1] of the patchset instead of v2 [2].  This
is the diff between v1 and v2.

The changes in this patch are:
    - tidied vsprintf stack buffer to shrink and compute size more
      accurately
    - use %pR for decoding and %pr for "raw" (with type and flags) instead
      of adding %pRt and %pRf

[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:41 -08:00
Stefan Assmann 4fd8bdc567 PCI: avoid boot interrupt quirk for AMD 813x B1 devices
AMD 813x rev. B1 (like rev. B2) devices generate no interrupts if
quirk_disable_amd_813x_boot_interrupt is executed, add an exception.
http://bugzilla.kernel.org/show_bug.cgi?id=14159

Patch also adds missing cases for DECLARE_PCI_FIXUP_RESUME and
DECLARE_PCI_FIXUP_FINAL calls to quirk_disable_amd_813x_boot_interrupt.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Tested-by: Gabriele Giorgetti <g.giorgetti@teamsystem.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:40 -08:00
Alex Chiang 58c08628c4 PCI Hotplug: acpiphp: clean up list traversals
Using list_for_each_entry instead of list_for_each allows us to
enhance readability and minorly reduce some stack usage.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:40 -08:00
Bjorn Helgaas 204d49a561 PCI hotplug: move IOAPIC support from acpiphp to ioapic driver
This patch moves PCI I/O APIC support from acpiphp to a separate driver.

Like pciehp and shpchp, acpiphp handles PCI hotplug, i.e., addition and
removal of PCI adapters.  But in addition, acpiphp handles some ACPI
hotplug, such as the addition of new host bridges, and the I/O APIC
support was tangled up with that.

I don't think the I/O APIC support needs to be in acpiphp; PCI I/O APICs
usually appear as a function on a PCI host bridge, and we'll enumerate the
APIC before any of the devices behind the bridge that use it.

As far as I know, nobody actually uses I/O APIC hotplug.  It depends on
acpi_register_ioapic(), which is only implemented for ia64, and I don't
think any vendors have supported I/O chassis hotplug yet.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
CC: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
CC: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:39 -08:00
Andrew Patterson 476f644edf PCI: fix memory leak in aer_inject
Fixed probable typo in aer_inject cleanup code resulting in a memory
leak.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:38 -08:00
Andrew Patterson 1d02435594 PCI: use better error return values in aer_inject
Replaced some error return values in aer_inject. Use -ENODEV when we
can't find a device and -ENOTTY when the device does not support PCIe AER.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:38 -08:00
Andrew Patterson cc5d153a0c PCI: add support for PCI domains to aer_inject
Add support for PCI domains (segments) to aer_inject.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:37 -08:00
Andrew Patterson 3c299dc226 PCI: add pci_get_domain_bus_and_slot function
Added the pci_get_domain_and_slot_function which is analogous to
pci_get_bus_and_slot. It returns a pci_dev given a domain (segment) number,
bus number, and devnr. Like pci_get_bus_and_slot,
pci_get_domain_bus_and_slot holds a reference to the returned pci_dev.

Converted pci_get_bus_and_slot to a wrapper that calls
pci_get_domain_bus_and_slot with the domain hard-coded to 0.

This routine was patterned off code suggested by Bjorn Helgaas.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:36 -08:00
Gabe Black bc577d2bb9 PCI: populate subsystem vendor and device IDs for PCI bridges
Change to populate the subsystem vendor and subsytem device IDs for
PCI-PCI bridges that implement the PCI Subsystem Vendor ID capability.
Previously bridges left subsystem vendor IDs unpopulated.

Signed-off-by: Gabe Black <gabe.black@ni.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:36 -08:00
Matt Domsch 0584396157 PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode
Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated.  This
correctly handles PCI-X bridges, PCIe root ports and endpoints, and
prints debug messages when invalid/reserved types are found in the
HEST.  PCI devices not in domain/segment 0 are not represented in
HEST, thus will be ignored.

Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
to every PCIe root port for which BIOS reports it should, via ACPI
_OSC.

However, _OSC alone is insufficient for newer BIOSes.  Part of ACPI
4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
for OS and BIOS to handshake over which errors for which components
each will handle.  One table in ACPI 4.0 is the Hardware Error Source
Table (HEST), where BIOS can define that errors for certain PCIe
devices (or all devices), should be handled by BIOS ("Firmware First
mode"), rather than be handled by the OS.

Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
that it may manage such errors, log them to the System Event Log, and
possibly take other actions.  The aer driver should honor this, and
not attach itself to devices noted as such.

Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
registers when respecting Firmware First mode.  Platform firmware is
expected to manage these, and if changes to them are allowed, it could
break that firmware's behavior.

The HEST parsing code may be replaced in the future by a more
feature-rich implementation.  This patch provides the minimum needed
to prevent breakage until that implementation is available.

Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 13:06:25 -08:00
Kenji Kaneshige 8792e11f1c PCI: pciehp: prevent unnecessary power off
Prevent unnecessary power off at initialization time. If slot power
is already off, we don't need to power off the slot.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 09:02:35 -08:00
Kenji Kaneshige 65b947bc5f PCI: pciehp: fix typo in pciehp_probe
Fix typo that might cause memory leak in pciehp_probe().

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 09:02:13 -08:00
Kenji Kaneshige 445f798555 PCI: pciehp: return error on read/write failure
Current pciehp returns successfully on read/write failure with dummy
state values. It should return error instead.

With this patch, pciehp no longer uses hotplug_slot_info data
structure. So this also removes hotplug_slot_info related code. But
note that it still allocates hotplug_slot_info because it is required
by pci hotplug core.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 09:01:59 -08:00
Kenji Kaneshige 586f1d6688 PCI: pciehp: create files only for existing capabilities
Current pciehp driver creates 'attention' and 'latch' files even if
the controller doesn't support them. In this case, the contents of
those files are meaningless and unpredictable. Those files should be
created only if the controller has the corresponding capabilities.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 09:01:44 -08:00
Kenji Kaneshige 3c3a1b1759 PCI: pciehp: remove wrong workaround for bad DLLP
Remove wrong workaround for BAD DLLP error, which confused surprise
down error with DLL errors.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 09:01:28 -08:00
Kenji Kaneshige f22daf1fb9 PCI: pciehp: disable DLL state changed event notification
Current pciehp doesn't handle Data Link Layer State Changed Event
notification. So it needs to be disabled at initialization time,
otherwise other event notifications are not generated.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 09:01:12 -08:00
Michael S. Tsirkin 1ed6743918 PCI: fix nit in ROM BAR size probing
When probing for ROM BAR size, we should not change bits 1:10 in this
BAR, because these bits are marked as "reserved for future use" in PCI
spec, so changing them might have side effects.

No such issue for I/O or memory, as there is an implementation note in
PCI spec which explicitly allows writing 0xfffffffff there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:59:40 -08:00
Allen Kay df0e97c6f1 PCI: add xen dom0 checking before ACS initialization
This patch is predicated on Jeremy's patch in include/xen/xen.h.  It'll
prevent ACS init unless the platform has both an IOMMU and we're running
as dom0.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:26 -08:00
Allen Kay ae21ee65e8 PCI: acs p2p upsteram forwarding enabling
Note: dom0 checking in v4 has been separated out into 2/2.

This patch enables P2P upstream forwarding in ACS capable PCIe switches.
It solves two potential problems in virtualization environment where a PCIe
device is assigned to a guest domain using a HW iommu such as VT-d:

1) Unintentional failure caused by guest physical address programmed
   into the device's DMA that happens to match the memory address range
   of other downstream ports in the same PCIe switch.  This causes the PCI
   transaction to go to the matching downstream port instead of go to the
   root complex to get translated by VT-d as it should be.

2) Malicious guest software intentionally attacks another downstream
   PCIe device by programming the DMA address into the assigned device
   that matches memory address range of the downstream PCIe port.

We are in process of implementing device filtering software in KVM/XEN
management software to allow device assignment of PCIe devices behind a PCIe
switch only if it has ACS capability and with the P2P upstream forwarding bits
enabled.  This patch is intended to work for both KVM and Xen environments.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Reviewed-by: Mathew Wilcox <willy@linux.intel.com>
Reviewed-by: Chris Wright <chris@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:25 -08:00
Bjorn Helgaas a369c791e8 PCI: print resources consistently with %pRt
This uses %pRt to print additional resource information (type, size,
prefetchability, etc.) consistently.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:18 -08:00
Matthew Garrett 3368dd2958 PCI hotplug: acpiphp should be linked after vendor drivers
As a followup to 71a082efc9, it's conceivable
that some vendors may expose PCI hotplug functionality through both vendor
mechanisms and ACPI. The native mechanism will generally be a superset of
any functionality provided via ACPI, so the acpiphp driver should always
be initialised after any others. Change the link order such that acpiphp
will not be initialised until any other statically linked drivers have had
an opportunity to claim the hardware.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:14 -08:00
Stefan Assmann 17d6715279 PCI hotplug: change PCI nomenclature
Change PCI nomenclature according to
http://www.pcisig.com/developers/procedures/logos/Trademark_and_Logo_Usage_Guidelines_updated_112206.pdf.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:13 -08:00