Instead of calling domain_pfn_mapping() repeatedly with single or
small numbers of pages, just pass the sglist in. It can optimise the
number of cache flushes like domain_pfn_mapping() does, and gives a huge
speedup for large scatterlists.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There's no need for the separate iommu_alloc_iova() function, and
certainly not for it to be global. Remove the underscores while we're at
it.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
As with dma_pte_clear_range(), don't keep flushing a single PTE at a
time. And also micro-optimise the setting of PTE values rather than
using the helper functions to do all the masking.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It's a bit silly to repeatedly call domain_flush_cache() for each PTE
individually, as we clear it. Instead, batch them up and flush a whole
range at a time. We might as well refrain from recalculating the PTE
address from scratch each time round the loop too.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This is fairly broken anyway -- it doesn't take hotplug into account.
We should probably be checking page_is_ram() instead.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Most of its callers are having to shift for themselves anyway, so we might
as well do it in iommu_flush_iotlb_psi().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
... and use it in the trivial cases; the other callers want individual
(and bisectable) attention, since I screwed them up the first time...
Make the BUG_ON() happen on too-large virtual address rather than
physical address, too. That's the one we care about.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use unaligned address for domain->max_addr. That algorithm isn't ideal
anyway -- we should probably just look at the last iova in the tree.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
With some cleanup of intel_unmap_page(), intel_unmap_sg() and
vm_domain_exit() to no longer play with 64-bit addresses.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Add some helpers for converting between VT-d and normal system pfns,
since system pages can be larger than VT-d pages.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There's no need for the GFX workaround now we have 'iommu=pt' for the
cases where people really care about performance. There's no need to
have a special case for just one type of device.
This also speeds up the iommu=pt path and reduces memory usage by
setting up the si_domain _once_ and then using it for all devices,
rather than giving each device its own private page tables.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In caching mode, domain ID 0 is reserved for non-present to present
mapping flush. Device IOTLB doesn't need to be flushed in this case.
Previously we were avoiding the flush for domain zero, even if the IOMMU
wasn't in caching mode and domain zero wasn't special.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Drop the e820 scanning and use existing function for finding valid
RAM regions to add to 1:1 mapping.
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Identity mapping for IOMMU defines a single domain to 1:1 map all PCI
devices to all usable memory.
This reduces map/unmap overhead in DMA API's and improve IOMMU
performance. On 10Gb network cards, Netperf shows no performance
degradation compared to non-IOMMU performance.
This method may lose some of DMA remapping benefits like isolation.
The patch sets up identity mapping for all PCI devices to all usable
memory. In the DMA API, there is no overhead to maintain page tables,
invalidate iotlb, flush cache etc.
32 bit DMA devices don't use identity mapping domain, in order to access
memory beyond 4GiB.
When kernel option iommu=pt, pass through is first tried. If pass
through succeeds, IOMMU goes to pass through. If pass through is not
supported in hw or fail for whatever reason, IOMMU goes to identity
mapping.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.infradead.org/~dwmw2/iommu-2.6.31:
intel-iommu: Fix one last ia64 build problem in Pass Through Support
VT-d: support the device IOTLB
VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
VT-d: add device IOTLB invalidation support
VT-d: parse ATSR in DMA Remapping Reporting Structure
PCI: handle Virtual Function ATS enabling
PCI: support the ATS capability
intel-iommu: dmar_set_interrupt return error value
intel-iommu: Tidy up iommu->gcmd handling
intel-iommu: Fix tiny theoretical race in write-buffer flush.
intel-iommu: Clean up handling of "caching mode" vs. IOTLB flushing.
intel-iommu: Clean up handling of "caching mode" vs. context flushing.
VT-d: fix invalid domain id for KVM context flush
Fix !CONFIG_DMAR build failure introduced by Intel IOMMU Pass Through Support
Intel IOMMU Pass Through Support
Fix up trivial conflicts in drivers/pci/{intel-iommu.c,intr_remapping.c}
Conflicts:
arch/mips/sibyte/bcm1480/irq.c
arch/mips/sibyte/sb1250/irq.c
Merge reason: we gathered a few conflicts plus update to latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Enable the device IOTLB (i.e. ATS) for both the bare metal and KVM
environments.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Make iommu_flush_iotlb_psi() and flush_unmaps() more readable.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
PAGE_MASK is 0xFFFFF000 on i386 -- even with PAE.
So it's not sufficient to ensure that you use phys_addr_t or uint64_t
everywhere you handle physical addresses -- you also have to avoid using
the construct 'addr & PAGE_MASK', because that will strip the high 32
bits of the address.
This patch avoids that problem by using PHYSICAL_PAGE_MASK instead of
PAGE_MASK where appropriate. It leaves '& PAGE_MASK' in a few instances
that don't matter -- where it's being used on the virtual bus addresses
we're dishing out, which are 32-bit anyway.
Since PHYSICAL_PAGE_MASK is not present on other architectures, we have
to define it (to PAGE_MASK) if it's not already defined.
Maybe it would be better just to fix PAGE_MASK for i386/PAE?
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In iommu_flush_write_buffer() we read iommu->gcmd before taking the
register_lock, and then we mask in the WBF bit and write it to the
register.
There is a tiny chance that something else could have _changed_
iommu->gcmd before we take the lock, but after we read it. So we could
be undoing that change.
Never actually going to have happened in practice, since nothing else
changes that register at runtime -- aside from the write-buffer flush
it's only ever touched at startup for enabling translation, etc.
But worth fixing anyway.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
As we just did for context cache flushing, clean up the logic around
whether we need to flush the iotlb or just the write-buffer, depending
on caching mode.
Fix the same bug in qi_flush_iotlb() that qi_flush_context() had -- it
isn't supposed to be returning an error; it's supposed to be returning a
flag which triggers a write-buffer flush.
Remove some superfluous conditional write-buffer flushes which could
never have happened because they weren't for non-present-to-present
mapping changes anyway.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It really doesn't make a lot of sense to have some of the logic to
handle caching vs. non-caching mode duplicated in qi_flush_context() and
__iommu_flush_context(), while the return value indicates whether the
caller should take other action which depends on the same thing.
Especially since qi_flush_context() thought it was returning something
entirely different anyway.
This patch makes qi_flush_context() and __iommu_flush_context() both
return void, removes the 'non_present_entry_flush' argument and makes
the only call site which _set_ that argument to 1 do the right thing.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The domain->id is a sequence number associated with the KVM guest
and should not be used for the context flush. This patch replaces
the domain->id with a proper id value for both bare metal and KVM.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Acked-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This updated patch should fix the compiling errors and remove the extern
iommu_pass_through from drivers/pci/intel-iommu.c file.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The patch adds kernel parameter intel_iommu=pt to set up pass through
mode in context mapping entry. This disables DMAR in linux kernel; but
KVM still runs on VT-d and interrupt remapping still works.
In this mode, kernel uses swiotlb for DMA API functions but other VT-d
functionalities are enabled for KVM. KVM always uses multi level
translation page table in VT-d. By default, pass though mode is disabled
in kernel.
This is useful when people don't want to enable VT-d DMAR in kernel but
still want to use KVM and interrupt remapping for reasons like DMAR
performance concern or debug purpose.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Weidong Han <weidong@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Currently, when x2apic is not enabled, interrupt remapping
will be enabled in init_dmars(), where it is too late to remap
ioapic interrupts, that is, ioapic interrupts are really in
compatibility mode, not remappable mode.
This patch always enables interrupt remapping before ioapic
setup, it guarantees all interrupts will be remapped when
interrupt remapping is enabled. Thus it doesn't need to set
the compatibility interrupt bit.
[ Impact: refactor intr-remap init sequence, enable fuller remap mode ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-4-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This issue was pointed out by Linus.
In dma_pte_clear_range() in intel-iommu.c
start = PAGE_ALIGN(start);
end &= PAGE_MASK;
npages = (end - start) / VTD_PAGE_SIZE;
In partial page case, start could be bigger than end and npages will be
negative.
Currently the issue doesn't show up as a real bug in because start and
end have been aligned to page boundary already by all callers. So the
issue has been hidden. But it is dangerous programming practice.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It's possible for a device in the drhd->devices[] array to be NULL if
it wasn't found at boot time, which means we have to check for that
case.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We were comparing {bus,devfn} and assuming that a match meant it was the
same device. It doesn't -- the same {bus,devfn} can exist in
multiple PCI domains. Include domain number in device identification
(and call it 'segment' in most places, because there's already a lot of
references to 'domain' which means something else, and this code is
infected with ACPI thinking already).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When the DMAR table identifies that a PCI-PCI bridge belongs to a given
IOMMU, that means that the bridge and all devices behind it should be
associated with the IOMMU. Not just the bridge itself.
This fixes the device_to_iommu() function accordingly.
(It's broken if you have the same PCI bus numbers in multiple domains,
but this function was always broken in that way; I'll be dealing with
that later).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
interrupt remapping must be enabled before enabling x2apic, but
interrupt remapping doesn't depend on x2apic, it can be used
separately. Enable interrupt remapping in init_dmars even x2apic
is not supported.
[dwmw2: Update Kconfig accordingly, fix build with INTR_REMAP && !X2APIC]
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch implements the suspend and resume feature for Intel IOMMU
DMAR. It hooks to kernel suspend and resume interface. When suspend happens, it
saves necessary hardware registers. When resume happens, it restores the
registers and restarts IOMMU by enabling translation, setting up root entry, and
re-enabling queued invalidation.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.infradead.org/iommu-2.6:
intel-iommu: Fix address wrap on 32-bit kernel.
intel-iommu: Enable DMAR on 32-bit kernel.
intel-iommu: fix PCI device detach from virtual machine
intel-iommu: VT-d page table to support snooping control bit
iommu: Add domain_has_cap iommu_ops
intel-iommu: Snooping control support
Fixed trivial conflicts in arch/x86/Kconfig and drivers/pci/intel-iommu.c
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
* 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
dma-debug: make memory range checks more consistent
dma-debug: warn of unmapping an invalid dma address
dma-debug: fix dma_debug_add_bus() definition for !CONFIG_DMA_API_DEBUG
dma-debug/x86: register pci bus for dma-debug leak detection
dma-debug: add a check dma memory leaks
dma-debug: add checks for kernel text and rodata
dma-debug: print stacktrace of mapping path on unmap error
dma-debug: Documentation update
dma-debug: x86 architecture bindings
dma-debug: add function to dump dma mappings
dma-debug: add checks for sync_single_sg_*
dma-debug: add checks for sync_single_range_*
dma-debug: add checks for sync_single_*
dma-debug: add checking for [alloc|free]_coherent
dma-debug: add add checking for map/unmap_sg
dma-debug: add checking for map/unmap_page/single
dma-debug: add core checking functions
dma-debug: add debugfs interface
dma-debug: add kernel command line parameters
dma-debug: add initialization code
...
Fix trivial conflicts due to whitespace changes in arch/x86/kernel/pci-nommu.c
The problem is in dma_pte_clear_range and dma_pte_free_pagetable. When
intel_unmap_single and intel_unmap_sg call them, the end address may be
zero if the 'start_addr + size' rounds up. So no PTE gets cleared. The
uncleared PTE fires the BUG_ON when it's used again to create new mappings.
After I modified dma_pte_clear_range a bit, the BUG_ON is gone.
Tested both 32 and 32 PAE modes on Intel X58 and Q35 platforms.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
If we fix a few highmem-related thinkos and a couple of printk format
warnings, the Intel IOMMU driver works fine in a 32-bit kernel.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When assign a device behind conventional PCI bridge or PCIe to
PCI/PCI-x bridge to a domain, it must assign its bridge and may
also need to assign secondary interface to the same domain.
Dependent assignment is already there, but dependent
deassignment is missed when detach device from virtual machine.
This results in conventional PCI device assignment failure after
it has been assigned once. This patch addes dependent
deassignment, and fixes the issue.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The user can request to enable snooping control through VT-d page table.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This iommu_op can tell if domain have a specific capability, like snooping
control for Intel IOMMU, which can be used by other components of kernel to
adjust the behaviour.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Snooping control enabled IOMMU to guarantee DMA cache coherency and thus reduce
software effort (VMM) in maintaining effective memory type.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
According to kerneljanitors todo list all printk calls (beginning
a new line) should have an according KERN_* constant.
Those are the missing pieces here for the pci subsystem.
Signed-off-by: Frank Seidel <frank@f-seidel.de>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Impact: cleanup/sanitization
Start from a sane state while enabling dma and interrupt-remapping, by
clearing the previous recorded faults and disabling previously
enabled queued invalidation and interrupt-remapping.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: interface augmentation (not yet used)
Enable fault handling flow for intr-remapping aswell. Fault handling
code now shared by both dma-remapping and intr-remapping.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: code movement
Move page fault handling code to dmar.c
This will be shared both by DMA-remapping and Intr-remapping code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This is the cause of the DMA faults and disk corruption that people have
been seeing. Some chipsets neglect to report the RWBF "capability" --
the flag which says that we need to flush the chipset write-buffer when
changing the DMA page tables, to ensure that the change is visible to
the IOMMU.
Override that bit on the affected chipsets, and everything is happy
again.
Thanks to Chris and Bhavesh and others for helping to debug.
Should resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=479996http://bugzilla.kernel.org/show_bug.cgi?id=12578
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Tested-and-acked-by: Chris Wright <chrisw@sous-sol.org>
Reviewed-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Due to recurring issues with DMAR support on certain platforms.
There's a number of filesystem corruption incidents reported:
https://bugzilla.redhat.com/show_bug.cgi?id=479996http://bugzilla.kernel.org/show_bug.cgi?id=12578
Provide a Kconfig option to change whether it is enabled by
default.
If disabled, it can still be reenabled by passing intel_iommu=on to the
kernel. Keep the .config option off by default.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The dma ops unification enables X86 and IA64 to share intel_dma_ops so
we can make dma mapping functions static. This also remove unused
intel_map_single().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dma_mapping_error is used to see if dma_map_single and dma_map_page
succeed. IA64 VT-d dma_mapping_error always says that dma_map_single
is successful even though it could fail. Note that X86 VT-d works
properly in this regard.
This patch fixes IA64 VT-d dma_mapping_error by adding VT-d's own
dma_mapping_error() that works for both X86_64 and IA64. VT-d uses
zero as an error dma address so VT-d's dma_mapping_error returns 1 if
a passed dma address is zero (as x86's VT-d dma_mapping_error does
now).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With some broken BIOSs when VT-d is enabled, the data structures are
filled incorrectly. This can cause a NULL pointer dereference in very
early boot.
Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
Acked-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This converts X86 and IA64 to use include/linux/dma-mapping.h.
It's a bit large but pretty boring. The major change for X86 is
converting 'int dir' to 'enum dma_data_direction dir' in DMA mapping
operations. The major changes for IA64 is using map_page and
unmap_page instead of map_single and unmap_single.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch converts dma_map_single and dma_unmap_single to use
map_page and unmap_page respectively and removes unnecessary
map_single and unmap_single in struct dma_mapping_ops.
This leaves intel-iommu's dma_map_single and dma_unmap_single since
IA64 uses them. They will be removed after the unification.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a preparation of struct dma_mapping_ops unification. We use
map_page and unmap_page instead of map_single and unmap_single.
This uses a temporary workaround, ifdef X86_64 to avoid IA64
build. The workaround will be removed after the unification. Well,
changing x86's struct dma_mapping_ops could break IA64. It's just
wrong. It's one of problems that this patchset fixes.
We will remove map_single and unmap_single hooks in the last patch in
this patchset.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When domain is related to multiple iommus, need to check if the minimum agaw is sufficient for the mapped memory
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
vm_domid won't be set in context, find available domain id for a device from its iommu.
For a virtual machine domain, a default agaw will be set, and skip top levels of page tables for iommu which has less agaw than default.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
virtual machine domain is different from native DMA-API domain, implement separate allocation and free functions for virtual machine domain.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Because virtual machine domain may have multiple devices from different iommus, it cannot use __iommu_flush_cache.
In some common low level functions, use domain_flush_cache instead of __iommu_flush_cache. On the other hand, in some functions, iommu can is specified or domain cannot be got, still use __iommu_flush_cache
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add iommu reference count in domain, and add a lock to protect iommu setting including iommu_bmp, iommu_count and iommu_coherency.
virtual machine domain may have multiple devices from different iommus, so it needs to do more things when add/remove domain device info. Thus implement separate these functions for virtual machine domain.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add this flag for VT-d used in virtual machine, like KVM.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
In dmar_domain, more than one iommus may be included in iommu_bmp. Due to "Coherency" capability may be different across iommus, set this variable to indicate iommu access is coherent or not. Only when all related iommus in a dmar_domain are all coherent, iommu access of this domain is coherent.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
"SAGAW" capability may be different across iommus. Use a default agaw, but if default agaw is not supported in some iommus, choose a less supported agaw.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
In order to support assigning multiple devices from different iommus to a domain, iommu bitmap is used to keep all iommus the domain are related to.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
deferred_flush[] uses the iommu seq_id to index, so its iommu is fixed and can get it from g_iommus.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
It's random number after the domain is allocated by kmem_cache_alloc
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Some macros were unused, so I just dropped them:
context_fault_disable
context_translation_type
context_address_root
context_address_width
context_domain_id
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We keep the struct root_entry forward declaration for the
pointer in struct intel_iommu.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
init_dmars() is not used outside of drivers/pci/intel-iommu.c
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The seg, saved_msg and sysdev fields appear to be unused since
before the code was first merged.
linux/msi.h is not needed in linux/intel-iommu.h anymore since
there is no longer a reference to struct msi_msg. The MSI code
in drivers/pci/intel-iommu.c still has linux/msi.h included
via linux/dmar.h.
linux/sysdev.h isn't needed because there is no reference to
struct sys_device.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Impact: cleanup
I got the following warnings on IA64:
linux-2.6/drivers/pci/intel-iommu.c: In function 'init_dmars':
linux-2.6/drivers/pci/intel-iommu.c:1658: warning: format '%Lx' expects type 'long long unsigned int', but argument 2 has type 'u64'
linux-2.6/drivers/pci/intel-iommu.c:1663: warning: format '%Lx' expects type 'long long unsigned int', but argument 2 has type 'u64'
Another victim of int-ll64.h versus int-l64.h confusion between platforms.
->reg_base_addr has a type of u64 - which can only be printed out
consistently if we cast its type up to LL.
[ Eventually reg_base_addr should be converted to phys_addr_t, for which
we have the %pR printk helper - but that is out of the scope of late
-rc's. ]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix for a typo and and replacing incorrect word in the comment.
Signed-off-by: Ameya Palande <2ameya@gmail.com>
Cc: "Ashok Raj" <ashok.raj@intel.com>
Cc: "Shaohua Li" <shaohua.li@intel.com>
Cc: "Anil S Keshavamurthy" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes intel-iommu to use dev->coherent_dma_mask in
alloc_coherent. Currently, intel-iommu uses dev->dma_mask in
alloc_coherent but alloc_coherent is supposed to use
coherent_dma_mask. It could break drivers that uses smaller
coherent_dma_mask than dma_mask (though the current code works for the
majority that use the same mask for coherent_dma_mask and dma_mask).
[dwmw2: dma_mask can be bigger than 'unsigned long']
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The current Intel IOMMU code assumes that both host page size and Intel
IOMMU page size are 4KiB. The first patch supports variable page size.
This provides support for IA64 which has multiple page sizes.
This patch also adds some other code hooks for IA64 platform including
DMAR_OPERATION_TIMEOUT definition.
[dwmw2: some cleanup]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
If queued invalidation interface is available and enabled, queued invalidation
interface will be used instead of the register based interface.
According to Vt-d2 specification, when queued invalidation is enabled,
invalidation command submit works only through invalidation queue and not
through the command registers interface.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch extends the VT-d driver to support KVM
[Ben: fixed memory pinning]
[avi: move dma_remapping.h as well]
Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
Queued invalidation (part of Intel Virtualization Technology for
Directed I/O architecture) infrastructure.
This will be used for invalidating the interrupt entry cache in the
case of Interrupt-remapping and IOTLB invalidation in the case
of DMA-remapping.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allocate the iommu during the parse of DMA remapping hardware
definition structures. And also, introduce routines for device
scope initialization which will be explicitly called during
dma-remapping initialization.
These will be used for enabling interrupt remapping separately from the
existing DMA-remapping enabling sequence.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clean up the intel-iommu code related to deferred iommu flush logic. There is
no need to allocate all the iommu's as a sequential array.
This will be used later in the interrupt-remapping patch series to
allocate iommu much early and individually for each device remapping
hardware unit.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
code reorganization of the generic Intel vt-d parsing related routines and linux
iommu routines specific to Intel vt-d.
drivers/pci/dmar.c now contains the generic vt-d parsing related routines
drivers/pci/intel_iommu.c contains the iommu routines specific to vt-d
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
gart.h has only GART-specific stuff. Only GART code needs it. Other
IOMMU stuff should include iommu.h instead of gart.h.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
want to remove arch_get_ram_range, and use early_node_map instead.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The destination of goto error also does a kfree(g_iommus), so it is not
correct to do one here.
This was found using Coccinelle (http://www.emn.fr/x-info/coccinelle/).
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The following patch changes the intel-iommu.c code to use the TSC
instead of jiffies for detecting bad DMAR functionality. Some systems
with bad bios's have been seen to hang in early boot spinning in the
IOMMU_WAIT_IO macro. This patch will replace the infinite loop with a call to
panic.
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Stephen Rothwell noticed that:
Commit 2be621498d ("x86: dma-ops on highmem
fix") in Linus' tree introduced a new warning (noticed in the x86_64
allmodconfig build of linux-next):
drivers/pci/intel-iommu.c:2240: warning: initialization from incompatible pointer type
Which points at an instance of map_single that needs updating.
Fix it to the new prototype.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The following patch is an update to use an array instead of a list of
IOVA's in the implementation of defered iotlb flushes. It takes
inspiration from sba_iommu.c
I like this implementation better as it encapsulates the batch process
within intel-iommu.c, and no longer touches iova.h (which is shared)
Performance data: Netperf 32byte UDP streaming
2.6.25-rc3-mm1:
IOMMU-strict : 58Mps @ 62% cpu
NO-IOMMU : 71Mbs @ 41% cpu
List-based IOMMU-default-batched-IOTLB flush: 66Mbps @ 57% cpu
with this patch:
IOMMU-strict : 73Mps @ 75% cpu
NO-IOMMU : 74Mbs @ 42% cpu
Array-based IOMMU-default-batched-IOTLB flush: 72Mbps @ 62% cpu
Signed-off-by: <mgross@linux.intel.com>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch is for batching up the flushing of the IOTLB for the DMAR
implementation found in the Intel VT-d hardware. It works by building a list
of to be flushed IOTLB entries and a bitmap list of which DMAR engine they are
from.
After either a high water mark (250 accessible via debugfs) or 10ms the list
of iova's will be reclaimed and the DMAR engines associated are IOTLB-flushed.
This approach recovers 15 to 20% of the performance lost when using the IOMMU
for my netperf udp stream benchmark with small packets. It can be disabled
with a kernel boot parameter "intel_iommu=strict".
Its use does weaken the IOMMU protections a bit.
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
lockdep goes off on the iova copy_reserved_iova() because it and a function
it calls grabs locks in the from, and the to of the copy operation.
The function grab locks of the same lock classes triggering the warning. The
first lock grabbed is for the constant reserved areas that is never accessed
after early boot. Technically you could do without grabbing the locks for the
"from" structure its copying reserved areas from.
But dropping the from locks to me looks wrong, even though it would be ok.
The affected code only runs in early boot as its setting up the DMAR
engines.
This patch gives the reserved_ioval_list locks special lockdep classes.
Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The following is a clean up and correction of the copyright holding
entities for the files associated with the intel iommu code.
Signed-off-by: <mgross@linux.intel.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix an off by one bug in the fault reason string reporting function, and
clean up some of the code around this buglet.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: mark gross <mgross@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for protected memory enable bits by clearing them if they are
set at startup time. Some future boot loaders or firmware could have this
bit set after it loads the kernel, and it needs to be cleared if DMA's are
going to happen effectively.
Signed-off-by: mark gross <mgross@intel.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I would like to potentially move the sparc64 IOMMU code over to using
the nice new drivers/pci/iova.[ch] code for free area management..
In order to do that we have to detach the IOMMU page size assumptions
which only really need to exist in the intel-iommu.[ch] code.
This patch attempts to implement that.
[akpm@linux-foundation.org: build fix]
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make
clear to which IOMMU implementation it belongs. The patch also adds "GART" to
the Kconfig line.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- off by one in dmar_get_fault_reason() (maximal index in array is
ARRAY_SIZE()-1, not ARRAY_SIZE())
- NULL noise removal
- __iomem annotation fix
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't
support sg chaining, we will get data corruption.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pci_dev's->sysdata is highly overloaded and currently IOMMU is broken due
to IOMMU code depending on this field.
This patch introduces new field in pci_dev's dev.archdata struct to hold
IOMMU specific per device IOMMU private data.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds PageSelectiveInvalidation support replacing existing
DomainSelectiveInvalidation for intel_{map/unmap}_sg() calls and also
enables to mapping one big contiguous DMA virtual address which is mapped
to discontiguous physical address for SG map/unmap calls.
"Doamin selective invalidations" wipes out the IOMMU address translation
cache based on domain ID where as "Page selective invalidations" wipes out
the IOMMU address translation cache for that address mask range which is
more cache friendly when compared to Domain selective invalidations.
Here is how it is done.
1) changes to iova.c
alloc_iova() now takes a bool size_aligned argument, which
when when set, returns the io virtual address that is
naturally aligned to 2 ^ x, where x is the order
of the size requested.
Returning this io vitual address which is naturally
aligned helps iommu to do the "page selective
invalidations" which is IOMMU cache friendly
over "domain selective invalidations".
2) Changes to driver/pci/intel-iommu.c
Clean up intel_{map/unmap}_{single/sg} () calls so that
s/g map/unamp calls is no more dependent on
intel_{map/unmap}_single()
intel_map_sg() now computes the total DMA virtual address
required and allocates the size aligned total DMA virtual address
and maps the discontiguous physical address to the allocated
contiguous DMA virtual address.
In the intel_unmap_sg() case since the DMA virtual address
is contiguous and size_aligned, PageSelectiveInvalidation
is used replacing earlier DomainSelectiveInvalidations.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Suresh B <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the floppy device so
that the floppy device which does not use DMA api's will continue to work.
Once the floppy driver starts using DMA api's this config option can be turn
off or this patch can be yanked out of kernel at that time.
[akpm@linux-foundation.org: cleanups, rename things, build fix]
[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we fix all the opensource gfx drivers to use the DMA api's, at that time
we can yank this config options out.
[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MSI interrupt handler registrations and fault handling support for Intel-IOMMU
hadrware.
This patch enables the MSI interrupts for the DMA remapping units and in the
interrupt handler read the fault cause and outputs the same on to the console.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Intel IOMMU driver needs memory during DMA map calls to setup its internal
page tables and for other data structures. As we all know that these DMA map
calls are mostly called in the interrupt context or with the spinlock held by
the upper level drivers(network/storage drivers), so in order to avoid any
memory allocation failure due to low memory issues, this patch makes memory
allocation by temporarily setting PF_MEMALLOC flags for the current task
before making memory allocation calls.
We evaluated mempools as a backup when kmem_cache_alloc() fails
and found that mempools are really not useful here because
1) We don't know for sure how much to reserve in advance
2) And mempools are not useful for GFP_ATOMIC case (as we call
memory alloc functions with GFP_ATOMIC)
(akpm: point 2 is wrong...)
With PF_MEMALLOC flag set in the current->flags, the VM subsystem avoids any
watermark checks before allocating memory thus guarantee'ing the memory till
the last free page. Further, looking at the code in mm/page_alloc.c in
__alloc_pages() function, looks like this flag is useful only in the
non-interrupt context.
If we are in the interrupt context and memory allocation in IOMMU driver fails
for some reason, then the DMA map api's will return failure and it is up to
the higher level drivers to retry. Suppose, if upper level driver programs
the controller with the buggy DMA virtual address, the IOMMU will block that
DMA transaction when that happens thus preventing any corruption to main
memory.
So far in our test scenario, we were unable to create any memory allocation
failure inside dma map api calls.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Actual intel IOMMU driver. Hardware spec can be found at:
http://www.intel.com/technology/virtualization
This driver sets X86_64 'dma_ops', so hook into standard DMA APIs. In this
way, PCI driver will get virtual DMA address. This change is transparent to
PCI drivers.
[akpm@linux-foundation.org: remove unneeded cast]
[akpm@linux-foundation.org: build fix]
[bunk@stusta.de: fix duplicate CONFIG_DMAR Makefile line]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>