Commit Graph

534 Commits

Author SHA1 Message Date
Zhen Lei aa3ac9469c iommu/iova: Make dma_32bit_pfn implicit
Now that the cached node optimisation can apply to all allocations, the
couple of users which were playing tricks with dma_32bit_pfn in order to
benefit from it can stop doing so. Conversely, there is also no need for
all the other users to explicitly calculate a 'real' 32-bit PFN, when
init_iova_domain() can happily do that itself from the page granularity.

CC: Thierry Reding <thierry.reding@gmail.com>
CC: Jonathan Hunter <jonathanh@nvidia.com>
CC: David Airlie <airlied@linux.ie>
CC: Sudeep Dutt <sudeep.dutt@intel.com>
CC: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: use iova_shift(), rewrote commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-27 17:09:57 +02:00
Linus Torvalds 4dfc278803 IOMMU Updates for Linux v4.14
Slightly more changes than usual this time:
 
 	- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now
 	  tries to preserve the mappings of the kernel so that master
 	  aborts for devices are avoided. Master aborts cause some
 	  devices to fail in the kdump kernel, so this code makes the
 	  dump more likely to succeed when AMD IOMMU is enabled.
 
 	- Common flush queue implementation for IOVA code users. The
 	  code is still optional, but AMD and Intel IOMMU drivers had
 	  their own implementation which is now unified.
 
 	- Finish support for iommu-groups. All drivers implement this
 	  feature now so that IOMMU core code can rely on it.
 
 	- Finish support for 'struct iommu_device' in iommu drivers. All
 	  drivers now use the interface.
 
 	- New functions in the IOMMU-API for explicit IO/TLB flushing.
 	  This will help to reduce the number of IO/TLB flushes when
 	  IOMMU drivers support this interface.
 
 	- Support for mt2712 in the Mediatek IOMMU driver
 
 	- New IOMMU driver for QCOM hardware
 
 	- System PM support for ARM-SMMU
 
 	- Shutdown method for ARM-SMMU-v3
 
 	- Some constification patches
 
 	- Various other small improvements and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZtCFNAAoJECvwRC2XARrjZnQP/AxC/ezQpq82HbegF4sM/cVE
 Ep7TeTqodEl75FS/6txe2wU0pwodqk/LB9ajfQZUbE1w8vKsNEqi5qf4ZYHGoxYI
 5bWyjJBzKIlwENH5lsBpQNt6XLevrYmRsFy7F0tRYy+qPQq8k+js2i7/XkCL3q7L
 3xklF847RRoITaTOhhaROx1pF23dSMEsS2XGuWHcZfjORtep4wcFKzd/2SvlCWCo
 P2bRU7jBzfJuuGSA80gaiUbDmrULTUfYuZNp7njASzCgsDmagERtvDEpdoXPNNSp
 u6s4LjU1Dp3fgr6g6cFRO7B6JUbWd619nwo9so/c/wZN54yEngBF9EyeeF3mv2O5
 ZbM2mOW3RlZcjxFT/AC8G4cZwwP6MpCEQOdqknoqc6ZQwcDqwN0o9I4+po0wsiAU
 89ijZZe9Mx0p9lNpihaBEB1erAUWPo5Obh62zo80W3h6x9WzkGQWM+PyFK2DYoaC
 8biEZzcc21sLEHvXQkcEGJSKrihHr9sluOqvxmCw5QAkYIFAeZRoeH7JtZWjVCnr
 T3XvaG1G1Aw6tS7ErxufdKawREAGki0Rm9i1baiH9sqNj5rllM01Y+PgU6E21Nbg
 iZp9gJLjfwM4vhYLlovvQK5PRoOBsCkyCpEI4GJqjLeam5p/WN06CFFf0ifQofYr
 qDPCVDkWHWV8nugFFKE7
 =EVh9
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "Slightly more changes than usual this time:

   - KDump Kernel IOMMU take-over code for AMD IOMMU. The code now tries
     to preserve the mappings of the kernel so that master aborts for
     devices are avoided. Master aborts cause some devices to fail in
     the kdump kernel, so this code makes the dump more likely to
     succeed when AMD IOMMU is enabled.

   - common flush queue implementation for IOVA code users. The code is
     still optional, but AMD and Intel IOMMU drivers had their own
     implementation which is now unified.

   - finish support for iommu-groups. All drivers implement this feature
     now so that IOMMU core code can rely on it.

   - finish support for 'struct iommu_device' in iommu drivers. All
     drivers now use the interface.

   - new functions in the IOMMU-API for explicit IO/TLB flushing. This
     will help to reduce the number of IO/TLB flushes when IOMMU drivers
     support this interface.

   - support for mt2712 in the Mediatek IOMMU driver

   - new IOMMU driver for QCOM hardware

   - system PM support for ARM-SMMU

   - shutdown method for ARM-SMMU-v3

   - some constification patches

   - various other small improvements and fixes"

* tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits)
  iommu/vt-d: Don't be too aggressive when clearing one context entry
  iommu: Introduce Interface for IOMMU TLB Flushing
  iommu/s390: Constify iommu_ops
  iommu/vt-d: Avoid calling virt_to_phys() on null pointer
  iommu/vt-d: IOMMU Page Request needs to check if address is canonical.
  arm/tegra: Call bus_set_iommu() after iommu_device_register()
  iommu/exynos: Constify iommu_ops
  iommu/ipmmu-vmsa: Make ipmmu_gather_ops const
  iommu/ipmmu-vmsa: Rereserving a free context before setting up a pagetable
  iommu/amd: Rename a few flush functions
  iommu/amd: Check if domain is NULL in get_domain() and return -EBUSY
  iommu/mediatek: Fix a build warning of BIT(32) in ARM
  iommu/mediatek: Fix a build fail of m4u_type
  iommu: qcom: annotate PM functions as __maybe_unused
  iommu/pamu: Fix PAMU boot crash
  memory: mtk-smi: Degrade SMI init to module_init
  iommu/mediatek: Enlarge the validate PA range for 4GB mode
  iommu/mediatek: Disable iommu clock when system suspend
  iommu/mediatek: Move pgtable allocation into domain_alloc
  iommu/mediatek: Merge 2 M4U HWs into one iommu domain
  ...
2017-09-09 15:03:24 -07:00
Linus Torvalds 0d519f2d1e pci-v4.14-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
 vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
 64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
 XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
 0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
 B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
 nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
 c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
 4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
 h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
 GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
 EG76ridTolUFVV+wzJD9
 =9cLP
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add enhanced Downstream Port Containment support, which prints more
   details about Root Port Programmed I/O errors (Dongdong Liu)

 - add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)

 - add MediaTek MT2712 and MT7622 support (Ryder Lee)

 - add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)

 - add Qualcom IPQ8074 support (Varadarajan Narayanan)

 - add R-Car r8a7743/5 device tree support (Biju Das)

 - add Rockchip per-lane PHY support for better power management (Shawn
   Lin)

 - fix IRQ mapping for hot-added devices by replacing the
   pci_fixup_irqs() boot-time design with a host bridge hook called at
   probe-time (Lorenzo Pieralisi, Matthew Minter)

 - fix race when enabling two devices that results in upstream bridge
   not being enabled correctly (Srinath Mannam)

 - fix pciehp power fault infinite loop (Keith Busch)

 - fix SHPC bridge MSI hotplug events by enabling bus mastering
   (Aleksandr Bezzubikov)

 - fix a VFIO issue by correcting PCIe capability sizes (Alex
   Williamson)

 - fix an INTD issue on Xilinx and possibly other drivers by unifying
   INTx IRQ domain support (Paul Burton)

 - avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
   Roedel)

 - allow APM X-Gene device assignment to guests by adding an ACS quirk
   (Feng Kan)

 - fix driver crashes by disabling Extended Tags on Broadcom HT2100
   (Extended Tags support is required for PCIe Receivers but not
   Requesters, and we now enable them by default when Requesters support
   them) (Sinan Kaya)

 - fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
   use the real Requester ID (not a phantom RID) (Robin Murphy)

 - prevent assignment of Intel VMD children to guests (which may be
   supported eventually, but isn't yet) by not associating an IOMMU with
   them (Jon Derrick)

 - fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
   Bauer)

 - fix a Function-Level Reset issue with Intel 750 NVMe by waiting
   longer (up to 60sec instead of 1sec) for device to become ready
   (Sinan Kaya)

 - fix a Function-Level Reset issue on iProc Stingray by working around
   hardware defects in the CRS implementation (Oza Pawandeep)

 - fix an issue with Intel NVMe P3700 after an iProc reset by adding a
   delay during shutdown (Oza Pawandeep)

 - fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
   in compose_msi_msg() (Stephen Hemminger)

 - fix a wireless LAN driver timeout by clearing DesignWare MSI
   interrupt status after it is handled, not before (Faiz Abbas)

 - fix DesignWare ATU enable checking (Jisheng Zhang)

 - reduce Layerscape dependencies on the bootloader by doing more
   initialization in the driver (Hou Zhiqiang)

 - improve Intel VMD performance allowing allocation of more IRQ vectors
   than present CPUs (Keith Busch)

 - improve endpoint framework support for initial DMA mask, different
   BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
   Vijay Abraham I, Stan Drozd)

 - rework CRS support to add periodic messages while we poll during
   enumeration and after Function-Level Reset and prepare for possible
   other uses of CRS (Sinan Kaya)

 - clean up Root Port AER handling by removing unnecessary code and
   moving error handler methods to struct pcie_port_service_driver
   (Christoph Hellwig)

 - clean up error handling paths in various drivers (Bjorn Andersson,
   Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
   Lorenzo Pieralisi, Sergei Shtylyov)

 - clean up SR-IOV resource handling by disabling VF decoding before
   updating the corresponding resource structs (Gavin Shan)

 - clean up DesignWare-based drivers by unifying quirks to update Class
   Code and Interrupt Pin and related handling of write-protected
   registers (Hou Zhiqiang)

 - clean up by adding empty generic pcibios_align_resource() and
   pcibios_fixup_bus() and removing empty arch-specific implementations
   (Palmer Dabbelt)

 - request exclusive reset control for several drivers to allow cleanup
   elsewhere (Philipp Zabel)

 - constify various structures (Arvind Yadav, Bhumika Goyal)

 - convert from full_name() to %pOF (Rob Herring)

 - remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
   Lin)

* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
  PCI: xgene: Clean up whitespace
  PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
  PCI: xgene: Fix platform_get_irq() error handling
  PCI: xilinx-nwl: Fix platform_get_irq() error handling
  PCI: rockchip: Fix platform_get_irq() error handling
  PCI: altera: Fix platform_get_irq() error handling
  PCI: spear13xx: Fix platform_get_irq() error handling
  PCI: artpec6: Fix platform_get_irq() error handling
  PCI: armada8k: Fix platform_get_irq() error handling
  PCI: dra7xx: Fix platform_get_irq() error handling
  PCI: exynos: Fix platform_get_irq() error handling
  PCI: iproc: Clean up whitespace
  PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
  PCI: iproc: Add 500ms delay during device shutdown
  PCI: Fix typos and whitespace errors
  PCI: Remove unused "res" variable from pci_resource_io()
  PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
  PCI/AER: Reformat AER register definitions
  iommu/vt-d: Prevent VMD child devices from being remapping targets
  x86/PCI: Use is_vmd() rather than relying on the domain number
  ...
2017-09-08 15:47:43 -07:00
Joerg Roedel 47b59d8e40 Merge branches 'arm/exynos', 'arm/renesas', 'arm/rockchip', 'arm/omap', 'arm/mediatek', 'arm/tegra', 'arm/qcom', 'arm/smmu', 'ppc/pamu', 'x86/vt-d', 'x86/amd', 's390' and 'core' into next 2017-09-01 11:31:42 +02:00
Filippo Sironi 5082219b6a iommu/vt-d: Don't be too aggressive when clearing one context entry
Previously, we were invalidating context cache and IOTLB globally when
clearing one context entry.  This is a tad too aggressive.
Invalidate the context cache and IOTLB for the interested device only.

Signed-off-by: Filippo Sironi <sironi@amazon.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-09-01 11:30:41 +02:00
Jon Derrick 5823e330b5 iommu/vt-d: Prevent VMD child devices from being remapping targets
VMD child devices must use the VMD endpoint's ID as the requester.  Because
of this, there needs to be a way to link the parent VMD endpoint's IOMMU
group and associated mappings to the VMD child devices such that attaching
and detaching child devices modify the endpoint's mappings, while
preventing early detaching on a singular device removal or unbinding.

The reassignment of individual VMD child devices devices to VMs is outside
the scope of VMD, but may be implemented in the future. For now it is best
to prevent any such attempts.

Prevent VMD child devices from returning an IOMMU, which prevents it from
exposing an iommu_group sysfs directory and allowing subsequent binding by
userspace-access drivers such as VFIO.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-30 16:43:58 -05:00
Ashok Raj 11b93ebfa0 iommu/vt-d: Avoid calling virt_to_phys() on null pointer
New kernels with debug show panic() from __phys_addr() checks. Avoid
calling virt_to_phys()  when pasid_state_tbl pointer is null

To: Joerg Roedel <joro@8bytes.org>
To: linux-kernel@vger.kernel.org>
Cc: iommu@lists.linux-foundation.org
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>

Fixes: 2f26e0a9c9 ('iommu/vt-d: Add basic SVM PASID support')
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-08-30 17:48:18 +02:00
Joerg Roedel 13cf017446 iommu/vt-d: Make use of iova deferred flushing
Remove the deferred flushing implementation in the Intel
VT-d driver and use the one from the common iova code
instead.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-08-15 18:23:53 +02:00
Joerg Roedel 2926a2aa5c iommu: Fix wrong freeing of iommu_device->dev
The struct iommu_device has a 'struct device' embedded into
it, not as a pointer, but the whole struct. In the
conversion of the iommu drivers to use struct iommu_device
it was forgotten that the relase function for that struct
device simply calls kfree() on the pointer.

This frees memory that was never allocated and causes memory
corruption.

To fix this issue, use a pointer to struct device instead of
embedding the whole struct. This needs some updates in the
iommu sysfs code as well as the Intel VT-d and AMD IOMMU
driver.

Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: 39ab9555c2 ('iommu: Add sysfs bindings for struct iommu_device')
Cc: stable@vger.kernel.org # >= v4.11
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-08-15 13:58:48 +02:00
David Dillow bc24c57159 iommu/vt-d: Don't free parent pagetable of the PTE we're adding
When adding a large scatterlist entry that covers more than the L3
superpage size (1GB) but has an alignment such that we must use L2
superpages (2MB) , we give dma_pte_free_level() a range that causes it
to free the L3 pagetable we're about to populate. We fix this by telling
dma_pte_free_pagetable() about the pagetable level we're about to populate
to prevent freeing it.

For example, mapping a scatterlist with entry lengths 854MB and 1194MB
at IOVA 0xffff80000000 would, when processing the 2MB-aligned second
entry, cause pfn_to_dma_pte() to create a L3 directory to hold L2
superpages for the mapping at IOVA 0xffffc0000000. We would previously
call dma_pte_free_pagetable(domain, 0xffffc0000, 0xfffffffff), which
would free the L3 directory pfn_to_dma_pte() just created for IO PFN
0xffffc0000. Telling dma_pte_free_pagetable() to retain the L3
directories while using L2 superpages avoids the erroneous free.

Signed-off-by: David Dillow <dillow@google.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-07-26 11:12:58 +02:00
Linus Torvalds fb4e3beeff IOMMU Updates for Linux v4.13
This update comes with:
 
 	* Support for lockless operation in the ARM io-pgtable code.
 	  This is an important step to solve the scalability problems in
 	  the common dma-iommu code for ARM
 
 	* Some Errata workarounds for ARM SMMU implemenations
 
 	* Rewrite of the deferred IO/TLB flush code in the AMD IOMMU
 	  driver. The code suffered from very high flush rates, with the
 	  new implementation the flush rate is down to ~1% of what it
 	  was before
 
 	* Support for amd_iommu=off when booting with kexec. Problem
 	  here was that the IOMMU driver bailed out early without
 	  disabling the iommu hardware, if it was enabled in the old
 	  kernel
 
 	* The Rockchip IOMMU driver is now available on ARM64
 
 	* Align the return value of the iommu_ops->device_group
 	  call-backs to not miss error values
 
 	* Preempt-disable optimizations in the Intel VT-d and common
 	  IOVA code to help Linux-RT
 
 	* Various other small cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZZgddAAoJECvwRC2XARrjurgQANO338GIBr2ZkA0oectidDpZ
 Y4yu7W9RH6NyhupJG/Xooya7daBWFjbaA1AVJ3ZZNlMERh69AmehVfRUfVMzF2w+
 buma58HQgiJWN1zFD8xdeMzYKms9P77whA88C/9QvrK/klB3LipWP2SC0yvvvyxJ
 mMCDpgt+D+CGnIDqbRuyLDQoRu3yjAkAvYb6OzL8DPJVP1Y5oLffGwGnHzJbJnOf
 eWJwYHM5ai0uF/Qqy6RNNekacObjVaOLihjugGvokH6ipXfOrSSNriXW9pZiWR5m
 S91898YTP3KuWWsJM+N93UAjvc6pL9PqL/OvbB9zdYpzu+5PtUpFXHYcOebKyEEO
 4j9CaRzubsWFTFjbYItJnR4WgXQRf4NKOGfTfHMHA+dY8aODYnlXNVdQDAA2aFgn
 TUBvHq5xb0zZ3nbPwtTDyW06oDMVfBBarLx2yFI1aQSSh+eg/GtIi5KP28gyFZNz
 4gWj0q3g/e3y7WEwNbYV7L3TS0d/p8VUYFtUp7PUCddnWoY+4cJzgidub5xIViZD
 Ql0nZzga9pXXIE/kE5Pf74WqrG7JJzZsvK2ABy4+XGrMq6RclJf+0pXbSqiXDpXL
 quw8t0oXw0ZEeavQ31Za8mjXBvo5ocM5iintl1wrl2BujHEO3oKqbGsIOaRcLnlN
 Ukehbl4OEKzZpD3oLPPk
 =pmBf
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "This update comes with:

   - Support for lockless operation in the ARM io-pgtable code.

     This is an important step to solve the scalability problems in the
     common dma-iommu code for ARM

   - Some Errata workarounds for ARM SMMU implemenations

   - Rewrite of the deferred IO/TLB flush code in the AMD IOMMU driver.

     The code suffered from very high flush rates, with the new
     implementation the flush rate is down to ~1% of what it was before

   - Support for amd_iommu=off when booting with kexec.

     The problem here was that the IOMMU driver bailed out early without
     disabling the iommu hardware, if it was enabled in the old kernel

   - The Rockchip IOMMU driver is now available on ARM64

   - Align the return value of the iommu_ops->device_group call-backs to
     not miss error values

   - Preempt-disable optimizations in the Intel VT-d and common IOVA
     code to help Linux-RT

   - Various other small cleanups and fixes"

* tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits)
  iommu/vt-d: Constify intel_dma_ops
  iommu: Warn once when device_group callback returns NULL
  iommu/omap: Return ERR_PTR in device_group call-back
  iommu: Return ERR_PTR() values from device_group call-backs
  iommu/s390: Use iommu_group_get_for_dev() in s390_iommu_add_device()
  iommu/vt-d: Don't disable preemption while accessing deferred_flush()
  iommu/iova: Don't disable preempt around this_cpu_ptr()
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
  iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
  ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
  iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
  iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
  iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
  iommu/arm-smmu-v3: Remove io-pgtable spinlock
  iommu/arm-smmu: Remove io-pgtable spinlock
  iommu/io-pgtable-arm-v7s: Support lockless operation
  iommu/io-pgtable-arm: Support lockless operation
  iommu/io-pgtable: Introduce explicit coherency
  iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
  ...
2017-07-12 10:00:04 -07:00
Linus Torvalds f72e24a124 This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
 code related to dma-mapping, and will further consolidate arch code
 into common helpers.
 
 This pull request contains:
 
  - removal of the DMA_ERROR_CODE macro, replacing it with calls
    to ->mapping_error so that the dma_map_ops instances are
    more self contained and can be shared across architectures (me)
  - removal of the ->set_dma_mask method, which duplicates the
    ->dma_capable one in terms of functionality, but requires more
    duplicate code.
  - various updates for the coherent dma pool and related arm code
    (Vladimir)
  - various smaller cleanups (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG
 oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X
 VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh
 YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H
 1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml
 LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B
 GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl
 PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4
 LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5
 +i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R
 rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT
 R4o=
 =0Fso
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping infrastructure from Christoph Hellwig:
 "This is the first pull request for the new dma-mapping subsystem

  In this new subsystem we'll try to properly maintain all the generic
  code related to dma-mapping, and will further consolidate arch code
  into common helpers.

  This pull request contains:

   - removal of the DMA_ERROR_CODE macro, replacing it with calls to
     ->mapping_error so that the dma_map_ops instances are more self
     contained and can be shared across architectures (me)

   - removal of the ->set_dma_mask method, which duplicates the
     ->dma_capable one in terms of functionality, but requires more
     duplicate code.

   - various updates for the coherent dma pool and related arm code
     (Vladimir)

   - various smaller cleanups (me)"

* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
  ARM: dma-mapping: Remove traces of NOMMU code
  ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
  ARM: NOMMU: Introduce dma operations for noMMU
  drivers: dma-mapping: allow dma_common_mmap() for NOMMU
  drivers: dma-coherent: Introduce default DMA pool
  drivers: dma-coherent: Account dma_pfn_offset when used with device tree
  dma: Take into account dma_pfn_offset
  dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
  dma-mapping: remove dmam_free_noncoherent
  crypto: qat - avoid an uninitialized variable warning
  au1100fb: remove a bogus dma_free_nonconsistent call
  MAINTAINERS: add entry for dma mapping helpers
  powerpc: merge __dma_set_mask into dma_set_mask
  dma-mapping: remove the set_dma_mask method
  powerpc/cell: use the dma_supported method for ops switching
  powerpc/cell: clean up fixed mapping dma_ops initialization
  tile: remove dma_supported and mapping_error methods
  xen-swiotlb: remove xen_swiotlb_set_dma_mask
  arm: implement ->dma_supported instead of ->set_dma_mask
  mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
  ...
2017-07-06 19:20:54 -07:00
Christoph Hellwig 5860acc1a9 x86: remove arch specific dma_supported implementation
And instead wire it up as method for all the dma_map_ops instances.

Note that this also means the arch specific check will be fully instead
of partially applied in the AMD iommu driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-28 06:54:46 -07:00
Joerg Roedel 6a7086431f Merge branches 'iommu/fixes', 'arm/rockchip', 'arm/renesas', 'arm/smmu', 'arm/core', 'x86/vt-d', 'x86/amd', 's390' and 'core' into next 2017-06-28 14:45:02 +02:00
Arvind Yadav 01e1932a17 iommu/vt-d: Constify intel_dma_ops
Most dma_map_ops structures are never modified. Constify these
structures such that these can be write-protected.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 14:17:12 +02:00
Sebastian Andrzej Siewior 58c4a95f90 iommu/vt-d: Don't disable preemption while accessing deferred_flush()
get_cpu() disables preemption and returns the current CPU number. The
CPU number is only used once while retrieving the address of the local's
CPU deferred_flush pointer.
We can instead use raw_cpu_ptr() while we remain preemptible. The worst
thing that can happen is that flush_unmaps_timeout() is invoked multiple
times: once by taskA after seeing HIGH_WATER_MARK and then preempted to
another CPU and then by taskB which saw HIGH_WATER_MARK on the same CPU
as taskA. It is also likely that ->size got from HIGH_WATER_MARK to 0
right after its read because another CPU invoked flush_unmaps_timeout()
for this CPU.
The access to flush_data is protected by a spinlock so even if we get
migrated to another CPU or preempted - the data structure is protected.

While at it, I marked deferred_flush static since I can't find a
reference to it outside of this file.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 12:24:40 +02:00
Peter Xu b316d02a13 iommu/vt-d: Unwrap __get_valid_domain_for_dev()
We do find_domain() in __get_valid_domain_for_dev(), while we do the
same thing in get_valid_domain_for_dev().  No need to do it twice.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-05-30 11:41:32 +02:00
Thomas Gleixner b608fe356f iommu/vt-d: Adjust system_state checks
To enable smp_processor_id() and might_sleep() debug checks earlier, it's
required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

Adjust the system_state checks in dmar_parse_one_atsr() and
dmar_iommu_notify_scope_dev() to handle the extra states.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170516184735.712365947@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-23 10:01:36 +02:00
KarimAllah Ahmed f73a7eee90 iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings
Ever since commit 091d42e43d ("iommu/vt-d: Copy translation tables from
old kernel") the kdump kernel copies the IOMMU context tables from the
previous kernel. Each device mappings will be destroyed once the driver
for the respective device takes over.

This unfortunately breaks the workflow of mapping and unmapping a new
context to the IOMMU. The mapping function assumes that either:

1) Unmapping did the proper IOMMU flushing and it only ever flush if the
   IOMMU unit supports caching invalid entries.
2) The system just booted and the initialization code took care of
   flushing all IOMMU caches.

This assumption is not true for the kdump kernel since the context
tables have been copied from the previous kernel and translations could
have been cached ever since. So make sure to flush the IOTLB as well
when we destroy these old copied mappings.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org  v4.2+
Fixes: 091d42e43d ("iommu/vt-d: Copy translation tables from old kernel")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-05-17 14:44:47 +02:00
Joerg Roedel 2c0248d688 Merge branches 'arm/exynos', 'arm/omap', 'arm/rockchip', 'arm/mediatek', 'arm/smmu', 'arm/core', 'x86/vt-d', 'x86/amd' and 'core' into next 2017-05-04 18:06:17 +02:00
Shaohua Li bfd20f1cc8 x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
IOMMU harms performance signficantly when we run very fast networking
workloads. It's 40GB networking doing XDP test. Software overhead is
almost unaware, but it's the IOTLB miss (based on our analysis) which
kills the performance. We observed the same performance issue even with
software passthrough (identity mapping), only the hardware passthrough
survives. The pps with iommu (with software passthrough) is only about
~30% of that without it. This is a limitation in hardware based on our
observation, so we'd like to disable the IOMMU force on, but we do want
to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I
must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not
eabling IOMMU is totally ok.

So introduce a new boot option to disable the force on. It's kind of
silly we need to run into intel_iommu_init even without force on, but we
need to disable TBOOT PMR registers. For system without the boot option,
nothing is changed.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-26 23:57:53 +02:00
Joerg Roedel 161b28aae1 iommu/vt-d: Make sure IOMMUs are off when intel_iommu=off
When booting into a kexec kernel with intel_iommu=off, and
the previous kernel had intel_iommu=on, the IOMMU hardware
is still enabled and gets not disabled by the new kernel.

This causes the boot to fail because DMA is blocked by the
hardware. Disable the IOMMUs when we find it enabled in the
kexec kernel and boot with intel_iommu=off.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-29 17:02:00 +02:00
Robin Murphy 9d3a4de4cb iommu: Disambiguate MSI region types
The introduction of reserved regions has left a couple of rough edges
which we could do with sorting out sooner rather than later. Since we
are not yet addressing the potential dynamic aspect of software-managed
reservations and presenting them at arbitrary fixed addresses, it is
incongruous that we end up displaying hardware vs. software-managed MSI
regions to userspace differently, especially since ARM-based systems may
actually require one or the other, or even potentially both at once,
(which iommu-dma currently has no hope of dealing with at all). Let's
resolve the former user-visible inconsistency ASAP before the ABI has
been baked into a kernel release, in a way that also lays the groundwork
for the latter shortcoming to be addressed by follow-up patches.

For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
IOMMU_RESV_MSI to describe the hardware type, and document everything a
little bit. Since the x86 MSI remapping hardware falls squarely under
this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
so that we tell the same story to userspace across all platforms.

Secondly, as the various region types require quite different handling,
and it really makes little sense to ever try combining them, convert the
bitfield-esque #defines to a plain enum in the process before anyone
gets the wrong impression.

Fixes: d30ddcaa7b ("iommu: Add a new type field in iommu_resv_region")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: David Woodhouse <dwmw2@infradead.org>
CC: kvm@vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-22 16:16:17 +01:00
Koos Vriezen 5003ae1e73 iommu/vt-d: Fix NULL pointer dereference in device_to_iommu
The function device_to_iommu() in the Intel VT-d driver
lacks a NULL-ptr check, resulting in this oops at boot on
some platforms:

 BUG: unable to handle kernel NULL pointer dereference at 00000000000007ab
 IP: [<ffffffff8132234a>] device_to_iommu+0x11a/0x1a0
 PGD 0

 [...]

 Call Trace:
   ? find_or_alloc_domain.constprop.29+0x1a/0x300
   ? dw_dma_probe+0x561/0x580 [dw_dmac_core]
   ? __get_valid_domain_for_dev+0x39/0x120
   ? __intel_map_single+0x138/0x180
   ? intel_alloc_coherent+0xb6/0x120
   ? sst_hsw_dsp_init+0x173/0x420 [snd_soc_sst_haswell_pcm]
   ? mutex_lock+0x9/0x30
   ? kernfs_add_one+0xdb/0x130
   ? devres_add+0x19/0x60
   ? hsw_pcm_dev_probe+0x46/0xd0 [snd_soc_sst_haswell_pcm]
   ? platform_drv_probe+0x30/0x90
   ? driver_probe_device+0x1ed/0x2b0
   ? __driver_attach+0x8f/0xa0
   ? driver_probe_device+0x2b0/0x2b0
   ? bus_for_each_dev+0x55/0x90
   ? bus_add_driver+0x110/0x210
   ? 0xffffffffa11ea000
   ? driver_register+0x52/0xc0
   ? 0xffffffffa11ea000
   ? do_one_initcall+0x32/0x130
   ? free_vmap_area_noflush+0x37/0x70
   ? kmem_cache_alloc+0x88/0xd0
   ? do_init_module+0x51/0x1c4
   ? load_module+0x1ee9/0x2430
   ? show_taint+0x20/0x20
   ? kernel_read_file+0xfd/0x190
   ? SyS_finit_module+0xa3/0xb0
   ? do_syscall_64+0x4a/0xb0
   ? entry_SYSCALL64_slow_path+0x25/0x25
 Code: 78 ff ff ff 4d 85 c0 74 ee 49 8b 5a 10 0f b6 9b e0 00 00 00 41 38 98 e0 00 00 00 77 da 0f b6 eb 49 39 a8 88 00 00 00 72 ce eb 8f <41> f6 82 ab 07 00 00 04 0f 85 76 ff ff ff 0f b6 4d 08 88 0e 49
 RIP  [<ffffffff8132234a>] device_to_iommu+0x11a/0x1a0
  RSP <ffffc90001457a78>
 CR2: 00000000000007ab
 ---[ end trace 16f974b6d58d0aad ]---

Add the missing pointer check.

Fixes: 1c387188c6 ("iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions")
Signed-off-by: Koos Vriezen <koos.vriezen@gmail.com>
Cc: stable@vger.kernel.org # 4.8.15+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-22 12:01:42 +01:00
Joerg Roedel a7fdb6e648 iommu/vt-d: Fix crash when accessing VT-d sysfs entries
The link between the iommu sysfs-device and the struct
intel_iommu is no longer stored as driver-data. Update the
code to use the new access method.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: 39ab9555c2 ('iommu: Add sysfs bindings for struct iommu_device')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-02-28 15:48:15 +01:00
Lucas Stach 712c604dcd mm: wire up GFP flag passing in dma_alloc_from_contiguous
The callers of the DMA alloc functions already provide the proper
context GFP flags.  Make sure to pass them through to the CMA allocator,
to make the CMA compaction context aware.

Link: http://lkml.kernel.org/r/20170127172328.18574-3-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexander Graf <agraf@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Joerg Roedel 8d2932dd06 Merge branches 'iommu/fixes', 'arm/exynos', 'arm/renesas', 'arm/smmu', 'arm/mediatek', 'arm/core', 'x86/vt-d' and 'core' into next 2017-02-10 15:13:10 +01:00
Joerg Roedel e3d10af112 iommu: Make iommu_device_link/unlink take a struct iommu_device
This makes the interface more consistent with
iommu_device_sysfs_add/remove.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-02-10 13:44:57 +01:00
Joerg Roedel 39ab9555c2 iommu: Add sysfs bindings for struct iommu_device
There is currently support for iommu sysfs bindings, but
those need to be implemented in the IOMMU drivers. Add a
more generic version of this by adding a struct device to
struct iommu_device and use that for the sysfs bindings.

Also convert the AMD and Intel IOMMU driver to make use of
it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-02-10 13:44:57 +01:00
Joerg Roedel b0119e8708 iommu: Introduce new 'struct iommu_device'
This struct represents one hardware iommu in the iommu core
code. For now it only has the iommu-ops associated with it,
but that will be extended soon.

The register/unregister interface is also added, as well as
making use of it in the Intel and AMD IOMMU drivers.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-02-10 13:44:57 +01:00
David Dillow f7116e115a iommu/vt-d: Don't over-free page table directories
dma_pte_free_level() recurses down the IOMMU page tables and frees
directory pages that are entirely contained in the given PFN range.
Unfortunately, it incorrectly calculates the starting address covered
by the PTE under consideration, which can lead to it clearing an entry
that is still in use.

This occurs if we have a scatterlist with an entry that has a length
greater than 1026 MB and is aligned to 2 MB for both the IOMMU and
physical addresses. For example, if __domain_mapping() is asked to map a
two-entry scatterlist with 2 MB and 1028 MB segments to PFN 0xffff80000,
it will ask if dma_pte_free_pagetable() is asked to PFNs from
0xffff80200 to 0xffffc05ff, it will also incorrectly clear the PFNs from
0xffff80000 to 0xffff801ff because of this issue. The current code will
set level_pfn to 0xffff80200, and 0xffff80200-0xffffc01ff fits inside
the range being cleared. Properly setting the level_pfn for the current
level under consideration catches that this PTE is outside of the range
being cleared.

This patch also changes the value passed into dma_pte_free_level() when
it recurses. This only affects the first PTE of the range being cleared,
and is handled by the existing code that ensures we start our cursor no
lower than start_pfn.

This was found when using dma_map_sg() to map large chunks of contiguous
memory, which immediatedly led to faults on the first access of the
erroneously-deleted mappings.

Fixes: 3269ee0bd6 ("intel-iommu: Fix leaks in pagetable freeing")
Reviewed-by: Benjamin Serebrin <serebrin@google.com>
Signed-off-by: David Dillow <dillow@google.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-01-31 12:50:05 +01:00
Ashok Raj 21e722c4c8 iommu/vt-d: Tylersburg isoch identity map check is done too late.
The check to set identity map for tylersburg is done too late. It needs
to be done before the check for identity_map domain is done.

To: Joerg Roedel <joro@8bytes.org>
To: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: Ashok Raj <ashok.raj@intel.com>

Fixes: 86080ccc22 ("iommu/vt-d: Allocate si_domain in init_dmars()")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Reported-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-01-31 12:50:05 +01:00
Eric Auger 0659b8dc45 iommu/vt-d: Implement reserved region get/put callbacks
This patch registers the [FEE0_0000h - FEF0_000h] 1MB MSI
range as a reserved region and RMRR regions as direct regions.

This will allow to report those reserved regions in the
iommu-group sysfs.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23 11:48:17 +00:00
Jacob Pan 65ca7f5f7d iommu/vt-d: Fix pasid table size encoding
Different encodings are used to represent supported PASID bits
and number of PASID table entries.
The current code assigns ecap_pss directly to extended context
table entry PTS which is wrong and could result in writing
non-zero bits to the reserved fields. IOMMU fault reason
11 will be reported when reserved bits are nonzero.
This patch converts ecap_pss to extend context entry pts encoding
based on VT-d spec. Chapter 9.4 as follows:
 - number of PASID bits = ecap_pss + 1
 - number of PASID table entries = 2^(pts + 5)
Software assigned limit of pasid_max value is also respected to
match the allocation limitation of PASID table.

cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Fixes: 2f26e0a9c9 ('iommu/vt-d: Add basic SVM PASID support')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-01-04 15:18:57 +01:00
Xunlei Pang aec0e86172 iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped
We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
under kdump, it can be steadily reproduced on several different machines,
the dmesg log is like:
HP HPSA Driver (v 3.4.16-0)
hpsa 0000:02:00.0: using doorbell to reset controller
hpsa 0000:02:00.0: board ready after hard reset.
hpsa 0000:02:00.0: Waiting for controller to respond to no-op
DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [02:00.0] fault addr fffff000 [fault reason 06] PTE Read access is not set
hpsa 0000:02:00.0: controller message 03:00 timed out
hpsa 0000:02:00.0: no-op failed; re-trying

After some debugging, we found that the fault addr is from DMA initiated at
the driver probe stage after reset(not in-flight DMA), and the corresponding
pte entry value is correct, the fault is likely due to the old iommu caches
of the in-flight DMA before it.

Thus we need to flush the old cache after context mapping is setup for the
device, where the device is supposed to finish reset at its driver probe
stage and no in-flight DMA exists hereafter.

I'm not sure if the hardware is responsible for invalidating all the related
caches allocated in the iommu hardware before, but seems not the case for hpsa,
actually many device drivers have problems in properly resetting the hardware.
Anyway flushing (again) by software in kdump kernel when the device gets context
mapped which is a quite infrequent operation does little harm.

With this patch, the problematic machine can survive the kdump tests.

CC: Myron Stowe <myron.stowe@gmail.com>
CC: Joseph Szczypek <jszczype@redhat.com>
CC: Don Brace <don.brace@microsemi.com>
CC: Baoquan He <bhe@redhat.com>
CC: Dave Young <dyoung@redhat.com>
Fixes: 091d42e43d ("iommu/vt-d: Copy translation tables from old kernel")
Fixes: dbcd861f25 ("iommu/vt-d: Do not re-use domain-ids from the old kernel")
Fixes: cf484d0e69 ("iommu/vt-d: Mark copied context entries")
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-01-04 15:14:04 +01:00
Linus Torvalds e71c3978d6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the final round of converting the notifier mess to the state
  machine. The removal of the notifiers and the related infrastructure
  will happen around rc1, as there are conversions outstanding in other
  trees.

  The whole exercise removed about 2000 lines of code in total and in
  course of the conversion several dozen bugs got fixed. The new
  mechanism allows to test almost every hotplug step standalone, so
  usage sites can exercise all transitions extensively.

  There is more room for improvement, like integrating all the
  pointlessly different architecture mechanisms of synchronizing,
  setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-12 19:25:04 -08:00
Anna-Maria Gleixner 21647615db iommu/vt-d: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: rt@linutronix.de
Cc: David Woodhouse <dwmw2@infradead.org>
Link: http://lkml.kernel.org/r/20161126231350.10321-14-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:37 +01:00
Linus Torvalds 105ecadc6d Merge git://git.infradead.org/intel-iommu
Pull IOMMU fixes from David Woodhouse:
 "Two minor fixes.

  The first fixes the assignment of SR-IOV virtual functions to the
  correct IOMMU unit, and the second fixes the excessively large (and
  physically contiguous) PASID tables used with SVM"

* git://git.infradead.org/intel-iommu:
  iommu/vt-d: Fix PASID table allocation
  iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions
2016-11-27 08:24:46 -08:00
Joerg Roedel bea64033dd iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
It turns out that the disable_dmar_iommu() code-path tried
to get the device_domain_lock recursivly, which will
dead-lock when this code runs on dmar removal. Fix both
code-paths that could lead to the dead-lock.

Fixes: 55d940430a ('iommu/vt-d: Get rid of domain->iommu_lock')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-11-08 15:08:26 +01:00
Ashok Raj 1c387188c6 iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions
The VT-d specification (§8.3.3) says:
    ‘Virtual Functions’ of a ‘Physical Function’ are under the scope
    of the same remapping unit as the ‘Physical Function’.

The BIOS is not required to list all the possible VFs in the scope
tables, and arguably *shouldn't* make any attempt to do so, since there
could be a huge number of them.

This has been broken basically for ever — the VF is never going to match
against a specific unit's scope, so it ends up being assigned to the
INCLUDE_ALL IOMMU. Which was always actually correct by coincidence, but
now we're looking at Root-Complex integrated devices with SR-IOV support
it's going to start being wrong.

Fix it to simply use pci_physfn() before doing the lookup for PCI devices.

Cc: stable@vger.kernel.org
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2016-10-30 05:32:51 -06:00
Joerg Roedel 1c5ebba95b iommu/vt-d: Make sure RMRRs are mapped before domain goes public
When a domain is allocated through the get_valid_domain_for_dev
path, it will be context-mapped before the RMRR regions are
mapped in the page-table. This opens a short time window
where device-accesses to these regions fail and causing DMAR
faults.

Fix this by mapping the RMRR regions before the domain is
context-mapped.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-09-05 13:00:28 +02:00
Joerg Roedel 76208356a0 iommu/vt-d: Split up get_domain_for_dev function
Split out the search for an already existing domain and the
context mapping of the device to the new domain.

This allows to map possible RMRR regions into the domain
before it is context mapped.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-09-05 13:00:28 +02:00
Krzysztof Kozlowski 00085f1efa dma-mapping: use unsigned long for dma_attrs
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Linus Torvalds dd9671172a IOMMU Updates for Linux v4.8
In the updates:
 
 	* Big endian support and preparation for defered probing for the
 	  Exynos IOMMU driver
 
 	* Simplifications in iommu-group id handling
 
 	* Support for Mediatek generation one IOMMU hardware
 
 	* Conversion of the AMD IOMMU driver to use the generic IOVA
 	  allocator. This driver now also benefits from the recent
 	  scalability improvements in the IOVA code.
 
 	* Preparations to use generic DMA mapping code in the Rockchip
 	  IOMMU driver
 
 	* Device tree adaption and conversion to use generic page-table
 	  code for the MSM IOMMU driver
 
 	* An iova_to_phys optimization in the ARM-SMMU driver to greatly
 	  improve page-table teardown performance with VFIO
 
 	* Various other small fixes and conversions
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXl3e+AAoJECvwRC2XARrjMIgP/1Mm9qIfcaAxKY4ByqbVfrH8
 313PO6rpwUhhywUmnf/1F/x+JbuLv8MmRXfSc106mdB1rq9NXpkORYKrqVxs0cSq
 6u6TzZWbF6WN1ipqXxDITNFBSy7u97K1VuFaKyYFfLbg8xrkcdkMZJ7BqM2xIEdk
 rnRKcfHo6wsmCXJ6InsUPmKAqU6AfMewZTGjO+v77Gce0rZEbsJ8n7BRKC9vO2bc
 akvN2W+zzEUSyhbuyYQBG+agpmC5GJvz4u+6QvAP5sxTWfAsnwAoPpP4xxR+/KjT
 eicHlja4v0YK6Hr4AJaMxoKfKIrCdqpWm0D2tg/edyWZCeg98AW/w7/s0I8OD3ao
 Otj6IqC8nPk0pYciOeEPQ7aqPbvKAqU2FYWt7lWamrdr98u2R3p2nXGl0KthoAj6
 JqzrCZXvBS7sj1IPLlGpj939yvbKbjpE0p7y1qhI1VEBXoBWFNvlKydkYx76BTGK
 F6paGVqn2Zwy00AqAsylTEkvIK063zwShZ6nPqz4bMdVlgzjrjCzdDecjfbHr8Ic
 6D2oCwyF+RJ8qw+Ecm9EmWFik80sgb+iUTeeYEXNf+YzLYt5McIj7fi3N+sUPel3
 YJ4S4x0sIpgUZZ1i+rOo8ZPAFHRU6SRPYV+ewaeYKrMt+Un5dTn9SddpqrJdbiUu
 YrF36BaQjc123IRGKrSd
 =xiS2
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

 - big-endian support and preparation for defered probing for the Exynos
   IOMMU driver

 - simplifications in iommu-group id handling

 - support for Mediatek generation one IOMMU hardware

 - conversion of the AMD IOMMU driver to use the generic IOVA allocator.
   This driver now also benefits from the recent scalability
   improvements in the IOVA code.

 - preparations to use generic DMA mapping code in the Rockchip IOMMU
   driver

 - device tree adaption and conversion to use generic page-table code
   for the MSM IOMMU driver

 - an iova_to_phys optimization in the ARM-SMMU driver to greatly
   improve page-table teardown performance with VFIO

 - various other small fixes and conversions

* tag 'iommu-updates-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (59 commits)
  iommu/amd: Initialize dma-ops domains with 3-level page-table
  iommu/amd: Update Alias-DTE in update_device_table()
  iommu/vt-d: Return error code in domain_context_mapping_one()
  iommu/amd: Use container_of to get dma_ops_domain
  iommu/amd: Flush iova queue before releasing dma_ops_domain
  iommu/amd: Handle IOMMU_DOMAIN_DMA in ops->domain_free call-back
  iommu/amd: Use dev_data->domain in get_domain()
  iommu/amd: Optimize map_sg and unmap_sg
  iommu/amd: Introduce dir2prot() helper
  iommu/amd: Implement timeout to flush unmap queues
  iommu/amd: Implement flush queue
  iommu/amd: Allow NULL pointer parameter for domain_flush_complete()
  iommu/amd: Set up data structures for flush queue
  iommu/amd: Remove align-parameter from __map_single()
  iommu/amd: Remove other remains of old address allocator
  iommu/amd: Make use of the generic IOVA allocator
  iommu/amd: Remove special mapping code for dma_ops path
  iommu/amd: Pass gfp-flags to iommu_map_page()
  iommu/amd: Implement apply_dm_region call-back
  iommu/amd: Create a list of reserved iova addresses
  ...
2016-08-01 07:25:10 -04:00
Linus Torvalds 194dc870a5 Add braces to avoid "ambiguous ‘else’" compiler warnings
Some of our "for_each_xyz()" macro constructs make gcc unhappy about
lack of braces around if-statements inside or outside the loop, because
the loop construct itself has a "if-then-else" statement inside of it.

The resulting warnings look something like this:

  drivers/gpu/drm/i915/i915_debugfs.c: In function ‘i915_dump_lrc’:
  drivers/gpu/drm/i915/i915_debugfs.c:2103:6: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wparentheses]
     if (ctx != dev_priv->kernel_context)
        ^

even if the code itself is fine.

Since the warning is fairly easy to avoid by adding a braces around the
if-statement near the for_each_xyz() construct, do so, rather than
disabling the otherwise potentially useful warning.

(The if-then-else statements used in the "for_each_xyz()" constructs are
designed to be inherently safe even with no braces, but in this case
it's quite understandable that gcc isn't really able to tell that).

This finally leaves the standard "allmodconfig" build with just a
handful of remaining warnings, so new and valid warnings hopefully will
stand out.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-27 20:03:31 -07:00
Joerg Roedel f360d3241f Merge branches 'x86/amd', 'x86/vt-d', 'arm/exynos', 'arm/mediatek', 'arm/msm', 'arm/rockchip', 'arm/smmu' and 'core' into next 2016-07-26 16:02:37 +02:00
Wei Yang 5c365d18a7 iommu/vt-d: Return error code in domain_context_mapping_one()
In 'commit <55d940430ab9> ("iommu/vt-d: Get rid of domain->iommu_lock")',
the error handling path is changed a little, which makes the function
always return 0.

This path fixes this.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Fixes: 55d940430a ('iommu/vt-d: Get rid of domain->iommu_lock')
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-07-14 10:26:30 +02:00
Aaron Campbell 0caa7616a6 iommu/vt-d: Fix infinite loop in free_all_cpu_cached_iovas
Per VT-d spec Section 10.4.2 ("Capability Register"), the maximum
number of possible domains is 64K; indeed this is the maximum value
that the cap_ndoms() macro will expand to.  Since the value 65536
will not fix in a u16, the 'did' variable must be promoted to an
int, otherwise the test for < 65536 will always be true and the
loop will never end.

The symptom, in my case, was a hung machine during suspend.

Fixes: 3bd4f9112f ("iommu/vt-d: Fix overflow of iommu->domains array")
Signed-off-by: Aaron Campbell <aaron@monkey.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-07-04 13:34:52 +02:00
Jan Niehusmann 3bd4f9112f iommu/vt-d: Fix overflow of iommu->domains array
The valid range of 'did' in get_iommu_domain(*iommu, did)
is 0..cap_ndoms(iommu->cap), so don't exceed that
range in free_all_cpu_cached_iovas().

The user-visible impact of the out-of-bounds access is the machine
hanging on suspend-to-ram. It is, in fact, a kernel panic, but due
to already suspended devices, that's often not visible to the user.

Fixes: 22e2f9fa63 ("iommu/vt-d: Use per-cpu IOVA caching")
Signed-off-by: Jan Niehusmann <jan@gondor.com>
Tested-By: Marius Vlad <marius.c.vlad@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-06-27 13:21:37 +02:00
Joerg Roedel a4c34ff1c0 iommu/vt-d: Enable QI on all IOMMUs before setting root entry
This seems to be required on some X58 chipsets on systems
with more than one IOMMU. QI does not work until it is
enabled on all IOMMUs in the system.

Reported-by: Dheeraj CVR <cvr.dheeraj@gmail.com>
Tested-by: Dheeraj CVR <cvr.dheeraj@gmail.com>
Fixes: 5f0a7f7614 ('iommu/vt-d: Make root entry visible for hardware right after allocation')
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-06-17 11:29:48 +02:00
Wei Yang 86f004c77c iommu/vt-d: Reduce extra first level entry in iommu->domains
In commit <8bf478163e69> ("iommu/vt-d: Split up iommu->domains array"), it
it splits iommu->domains in two levels. Each first level contains 256
entries of second level. In case of the ndomains is exact a multiple of
256, it would have one more extra first level entry for current
implementation.

This patch refines this calculation to reduce the extra first level entry.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-06-15 13:36:58 +02:00
Linus Torvalds 2566278551 Merge git://git.infradead.org/intel-iommu
Pull intel IOMMU updates from David Woodhouse:
 "This patchset improves the scalability of the Intel IOMMU code by
  resolving two spinlock bottlenecks and eliminating the linearity of
  the IOVA allocator, yielding up to ~5x performance improvement and
  approaching 'iommu=off' performance"

* git://git.infradead.org/intel-iommu:
  iommu/vt-d: Use per-cpu IOVA caching
  iommu/iova: introduce per-cpu caching to iova allocation
  iommu/vt-d: change intel-iommu to use IOVA frame numbers
  iommu/vt-d: avoid dev iotlb logic for domains with no dev iotlbs
  iommu/vt-d: only unmap mapped entries
  iommu/vt-d: correct flush_unmaps pfn usage
  iommu/vt-d: per-cpu deferred invalidation queues
  iommu/vt-d: refactoring of deferred flush entries
2016-05-27 13:49:24 -07:00
Joerg Roedel 6c0b43df74 Merge branches 'arm/io-pgtable', 'arm/rockchip', 'arm/omap', 'x86/vt-d', 'ppc/pamu', 'core' and 'x86/amd' into next 2016-05-09 19:39:17 +02:00
Omer Peleg 22e2f9fa63 iommu/vt-d: Use per-cpu IOVA caching
Commit 9257b4a2 ('iommu/iova: introduce per-cpu caching to iova allocation')
introduced per-CPU IOVA caches to massively improve scalability. Use them.

Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
[dwmw2: split out VT-d part into a separate patch]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2016-04-20 15:44:48 -04:00
Omer Peleg 2aac630429 iommu/vt-d: change intel-iommu to use IOVA frame numbers
Make intel-iommu map/unmap/invalidate work with IOVA pfns instead of
pointers to "struct iova". This avoids using the iova struct from the IOVA
red-black tree and the resulting explicit find_iova() on unmap.

This patch will allow us to cache IOVAs in the next patch, in order to
avoid rbtree operations for the majority of map/unmap operations.

Note: In eliminating the find_iova() operation, we have also eliminated
the sanity check previously done in the unmap flow. Arguably, this was
overhead that is better avoided in production code, but it could be
brought back as a debug option for driver development.

Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased, fixed to not break iova api, and reworded
 the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2016-04-20 15:07:22 -04:00
Omer Peleg 0824c5920b iommu/vt-d: avoid dev iotlb logic for domains with no dev iotlbs
This patch avoids taking the device_domain_lock in iommu_flush_dev_iotlb()
for domains with no dev iotlb devices.

Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[gvdl@google.com: fixed locking issues]
Signed-off-by: Godfrey van der Linden <gvdl@google.com>
[mad@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2016-04-20 15:06:15 -04:00
Omer Peleg 769530e4ba iommu/vt-d: only unmap mapped entries
Current unmap implementation unmaps the entire area covered by the IOVA
range, which is a power-of-2 aligned region. The corresponding map,
however, only maps those pages originally mapped by the user. This
discrepancy can lead to unmapping of already unmapped entries, which is
unneeded work.

With this patch, only mapped pages are unmapped. This is also a baseline
for a map/unmap implementation based on IOVAs and not iova structures,
which will allow caching.

Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2016-04-20 15:06:01 -04:00
Omer Peleg f5c0c08b1e iommu/vt-d: correct flush_unmaps pfn usage
Change flush_unmaps() to correctly pass iommu_flush_iotlb_psi()
dma addresses.  (x86_64 mm and dma have the same size for pages
at the moment, but this usage improves consistency.)

Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2016-04-20 15:05:56 -04:00
Omer Peleg aa4732406e iommu/vt-d: per-cpu deferred invalidation queues
The IOMMU's IOTLB invalidation is a costly process.  When iommu mode
is not set to "strict", it is done asynchronously. Current code
amortizes the cost of invalidating IOTLB entries by batching all the
invalidations in the system and performing a single global invalidation
instead. The code queues pending invalidations in a global queue that
is accessed under the global "async_umap_flush_lock" spinlock, which
can result is significant spinlock contention.

This patch splits this deferred queue into multiple per-cpu deferred
queues, and thus gets rid of the "async_umap_flush_lock" and its
contention.  To keep existing deferred invalidation behavior, it still
invalidates the pending invalidations of all CPUs whenever a CPU
reaches its watermark or a timeout occurs.

Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2016-04-20 15:05:24 -04:00
Omer Peleg 314f1dc140 iommu/vt-d: refactoring of deferred flush entries
Currently, deferred flushes' info is striped between several lists in
the flush tables. Instead, move all information about a specific flush
to a single entry in this table.

This patch does not introduce any functional change.

Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2016-04-20 15:05:20 -04:00
Dan Carpenter 0b74ecdfbe iommu/vt-d: Silence an uninitialized variable warning
My static checker complains that "dma_alias" is uninitialized unless we
are dealing with a pci device.  This is true but harmless.  Anyway, we
can flip the condition around to silence the warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-04-07 14:51:47 +02:00
Michael S. Tsirkin 3d1a2442d2 x86/vt-d: Fix comment for dma_pte_free_pagetable()
dma_pte_free_pagetable no longer depends on last level ptes
being clear, it clears them itself.  Fix up the comment to
match.

Cc: Jiang Liu <jiang.liu@linux.intel.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-04-05 17:00:37 +02:00
Joerg Roedel e6a8c9b337 iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path
In the PCI hotplug path of the Intel IOMMU driver, replace
the usage of the BUS_NOTIFY_DEL_DEVICE notifier, which is
executed before the driver is unbound from the device, with
BUS_NOTIFY_REMOVED_DEVICE, which runs after that.

This fixes a kernel BUG being triggered in the VT-d code
when the device driver tries to unmap DMA buffers and the
VT-d driver already destroyed all mappings.

Reported-by: Stefani Seibold <stefani@seibold.net>
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-02-29 23:55:16 +01:00
Jeremy McNicoll da972fb13b iommu/vt-d: Don't skip PCI devices when disabling IOTLB
Fix a simple typo when disabling IOTLB on PCI(e) devices.

Fixes: b16d0cb9e2 ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
Cc: stable@vger.kernel.org  # v4.4
Signed-off-by: Jeremy McNicoll <jmcnicol@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-01-29 12:18:13 +01:00
Dan Williams 3e6110fd54 Revert "scatterlist: use sg_phys()"
commit db0fa0cb01 "scatterlist: use sg_phys()" did replacements of
the form:

    phys_addr_t phys = page_to_phys(sg_page(s));
    phys_addr_t phys = sg_phys(s) & PAGE_MASK;

However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long).  Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.

Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: db0fa0cb01 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-12-15 12:54:06 -08:00
Mel Gorman d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Linus Torvalds 39cf7c3981 IOMMU Updates for Linux v4.4
This time including:
 
 	* A new IOMMU driver for s390 pci devices
 
 	* Common dma-ops support based on iommu-api for ARM64. The plan is to
 	  use this as a basis for ARM32 and hopefully other architectures as
 	  well in the future.
 
 	* MSI support for ARM-SMMUv3
 
 	* Cleanups and dead code removal in the AMD IOMMU driver
 
 	* Better RMRR handling for the Intel VT-d driver
 
 	* Various other cleanups and small fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n
 pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81
 qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo
 1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji
 12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q
 K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO
 EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS
 ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q
 gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots
 kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu
 nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD
 SecikoTtH1Q4RVtqOcAQ
 =jLlB
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "This time including:

   - A new IOMMU driver for s390 pci devices

   - Common dma-ops support based on iommu-api for ARM64.  The plan is
     to use this as a basis for ARM32 and hopefully other architectures
     as well in the future.

   - MSI support for ARM-SMMUv3

   - Cleanups and dead code removal in the AMD IOMMU driver

   - Better RMRR handling for the Intel VT-d driver

   - Various other cleanups and small fixes"

* tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
  iommu/vt-d: Fix return value check of parse_ioapics_under_ir()
  iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope()
  iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir
  iommu: Move default domain allocation to iommu_group_get_for_dev()
  iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev
  iommu/arm-smmu: Switch to device_group call-back
  iommu/fsl: Convert to device_group call-back
  iommu: Add device_group call-back to x86 iommu drivers
  iommu: Add generic_device_group() function
  iommu: Export and rename iommu_group_get_for_pci_dev()
  iommu: Revive device_group iommu-ops call-back
  iommu/amd: Remove find_last_devid_on_pci()
  iommu/amd: Remove first/last_device handling
  iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL
  iommu/amd: Cleanup buffer allocation
  iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu
  iommu/amd: Align DTE flag definitions
  iommu/amd: Remove old alias handling code
  iommu/amd: Set alias DTE in do_attach/do_detach
  iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled
  ...
2015-11-05 16:12:10 -08:00
Linus Torvalds ab1228e42e Merge git://git.infradead.org/intel-iommu
Pull intel iommu updates from David Woodhouse:
 "This adds "Shared Virtual Memory" (aka PASID support) for the Intel
  IOMMU.  This allows devices to do DMA using process address space,
  translated through the normal CPU page tables for the relevant mm.

  With corresponding support added to the i915 driver, this has been
  tested with the graphics device on Skylake.  We don't have the
  required TLP support in our PCIe root ports for supporting discrete
  devices yet, so it's only integrated devices that can do it so far"

* git://git.infradead.org/intel-iommu: (23 commits)
  iommu/vt-d: Fix rwxp flags in SVM device fault callback
  iommu/vt-d: Expose struct svm_dev_ops without CONFIG_INTEL_IOMMU_SVM
  iommu/vt-d: Clean up pasid_enabled() and ecs_enabled() dependencies
  iommu/vt-d: Handle Caching Mode implementations of SVM
  iommu/vt-d: Fix SVM IOTLB flush handling
  iommu/vt-d: Use dev_err(..) in intel_svm_device_to_iommu(..)
  iommu/vt-d: fix a loop in prq_event_thread()
  iommu/vt-d: Fix IOTLB flushing for global pages
  iommu/vt-d: Fix address shifting in page request handler
  iommu/vt-d: shift wrapping bug in prq_event_thread()
  iommu/vt-d: Fix NULL pointer dereference in page request error case
  iommu/vt-d: Implement SVM_FLAG_SUPERVISOR_MODE for kernel access
  iommu/vt-d: Implement SVM_FLAG_PRIVATE_PASID to allocate unique PASIDs
  iommu/vt-d: Add callback to device driver on page faults
  iommu/vt-d: Implement page request handling
  iommu/vt-d: Generalise DMAR MSI setup to allow for page request events
  iommu/vt-d: Implement deferred invalidate for SVM
  iommu/vt-d: Add basic SVM PASID support
  iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS
  iommu/vt-d: Add initial support for PASID tables
  ...
2015-11-05 16:06:52 -08:00
Joerg Roedel b67ad2f7c7 Merge branches 'x86/vt-d', 'arm/omap', 'arm/smmu', 's390', 'core' and 'x86/amd' into next
Conflicts:
	drivers/iommu/amd_iommu_types.h
2015-11-02 20:03:34 +09:00
David Woodhouse d42fde7084 iommu/vt-d: Clean up pasid_enabled() and ecs_enabled() dependencies
When booted with intel_iommu=ecs_off we were still allocating the PASID
tables even though we couldn't actually use them. We really want to make
the pasid_enabled() macro depend on ecs_enabled().

Which is unfortunate, because currently they're the other way round to
cope with the Broadwell/Skylake problems with ECS.

Instead of having ecs_enabled() depend on pasid_enabled(), which was never
something that made me happy anyway, make it depend in the normal case
on the "broken PASID" bit 28 *not* being set.

Then pasid_enabled() can depend on ecs_enabled() as it should. And we also
don't need to mess with it if we ever see an implementation that has some
features requiring ECS (like PRI) but which *doesn't* have PASID support.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-24 21:33:01 +02:00
Joerg Roedel a960fadbe6 iommu: Add device_group call-back to x86 iommu drivers
Set the device_group call-back to pci_device_group() for the
Intel VT-d and the AMD IOMMU driver.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-10-22 00:00:49 +02:00
Linus Torvalds 8a70dd2669 Merge tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu
Pull intel-iommu bugfix from David Woodhouse:
 "This contains a single fix, for when the IOMMU API is used to overlay
  an existing mapping comprised of 4KiB pages, with a mapping that can
  use superpages.

  For the *first* superpage in the new mapping, we were correctly¹
  freeing the old bottom-level page table page and clearing the link to
  it, before installing the superpage.  For subsequent superpages,
  however, we weren't.  This causes a memory leak, and a warning about
  setting a PTE which is already set.

  ¹ Well, not *entirely* correctly.  We just free the page table pages
    right there and then, which is wrong.  In fact they should only be
    freed *after* the IOTLB is flushed so we know the hardware will no
    longer be looking at them....  and in fact I note that the IOTLB
    flush is completely missing from the intel_iommu_map() code path,
    although it needs to be there if it's permitted to overwrite
    existing mappings.

    Fixing those is somewhat more intrusive though, and will probably
    need to wait for 4.4 at this point"

* tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu:
  iommu/vt-d: fix range computation when making room for large pages
2015-10-22 06:32:48 +09:00
Sudeep Dutt b9997e385e iommu/vt-d: Use dev_err(..) in intel_svm_device_to_iommu(..)
This will give a little bit of assistance to those developing drivers
using SVM. It might cause a slight annoyance to end-users whose kernel
disables the IOMMU when drivers are trying to use it. But the fix there
is to fix the kernel to enable the IOMMU.

Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-19 15:03:00 +01:00
David Woodhouse a222a7f0bb iommu/vt-d: Implement page request handling
Largely based on the driver-mode implementation by Jesse Barnes.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 15:35:19 +01:00
David Woodhouse 907fea3491 iommu/vt-d: Implement deferred invalidate for SVM
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 13:22:35 +01:00
David Woodhouse 2f26e0a9c9 iommu/vt-d: Add basic SVM PASID support
This provides basic PASID support for endpoint devices, tested with a
version of the i915 driver.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 12:55:45 +01:00
David Woodhouse b16d0cb9e2 iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS
The behaviour if you enable PASID support after ATS is undefined. So we
have to enable it first, even if we don't know whether we'll need it.

This is safe enough; unless we set up a context that permits it, the device
can't actually *do* anything with it.

Also shift the feature detction to dmar_insert_one_dev_info() as it only
needs to happen once.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 12:05:39 +01:00
David Woodhouse 8a94ade4ce iommu/vt-d: Add initial support for PASID tables
Add CONFIG_INTEL_IOMMU_SVM, and allocate PASID tables on supported hardware.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 11:24:51 +01:00
David Woodhouse ae853ddb9a iommu/vt-d: Introduce intel_iommu=pasid28, and pasid_enabled() macro
As long as we use an identity mapping to work around the worst of the
hardware bugs which caused us to defeature it and change the definition
of the capability bit, we *can* use PASID support on the devices which
advertised it in bit 28 of the Extended Capability Register.

Allow people to do so with 'intel_iommu=pasid28' on the command line.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 11:24:45 +01:00
David Woodhouse d14053b3c7 iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints
The VT-d specification says that "Software must enable ATS on endpoint
devices behind a Root Port only if the Root Port is reported as
supporting ATS transactions."

We walk up the tree to find a Root Port, but for integrated devices we
don't find one — we get to the host bridge. In that case we *should*
allow ATS. Currently we don't, which means that we are incorrectly
failing to use ATS for the integrated graphics. Fix that.

We should never break out of this loop "naturally" with bus==NULL,
since we'll always find bridge==NULL in that case (and now return 1).

So remove the check for (!bridge) after the loop, since it can never
happen. If it did, it would be worthy of a BUG_ON(!bridge). But since
it'll oops anyway in that case, that'll do just as well.

Cc: stable@vger.kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 09:28:56 +01:00
Dan Williams dfddb969ed iommu/vt-d: Switch from ioremap_cache to memremap
In preparation for deprecating ioremap_cache() convert its usage in
intel-iommu to memremap.  This also eliminates the mishandling of the
__iomem annotation in the implementation.

Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-10-14 15:22:06 +02:00
Christian Zander ba2374fd2b iommu/vt-d: fix range computation when making room for large pages
In preparation for the installation of a large page, any small page
tables that may still exist in the target IOV address range are
removed.  However, if a scatter/gather list entry is large enough to
fit more than one large page, the address space for any subsequent
large pages is not cleared of conflicting small page tables.

This can cause legitimate mapping requests to fail with errors of the
form below, potentially followed by a series of IOMMU faults:

ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083)

In this example, a 4MiB scatter/gather list entry resulted in the
successful installation of a large page @ vPFN 0xfdc00, followed by
a failed attempt to install another large page @ vPFN 0xfde00, due to
the presence of a pointer to a small page table @ 0x7f83a4000.

To address this problem, compute the number of large pages that fit
into a given scatter/gather list entry, and use it to derive the
last vPFN covered by the large page(s).

Cc: stable@vger.kernel.org
Signed-off-by: Christian Zander <christian@nervanasys.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-13 20:32:50 +01:00
Linus Torvalds 7554225312 IOMMU Fixes for Linux v4.3-rc5
A few fixes piled up:
 
 	* Fix for a suspend/resume issue where PCI probing code overwrote
 	  dev->irq for the MSI irq of the AMD IOMMU.
 
 	* Fix for a kernel crash when a 32 bit PCI device was assigned to a KVM
 	  guest.
 
 	* Fix for a possible memory leak in the VT-d driver
 
 	* A couple of fixes for the ARM-SMMU driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJWHNdbAAoJECvwRC2XARrjB/YQAKouJaRMjBaehx6kbaZMhMJy
 hXDsh8Xl6TtCe6kLD2uXrvjLZAdu32kjrtzhhcM21EO5Ms2Weq6A60/98LwnJ4Eg
 AqftjfxQsIwf2G1PvHb+xepgcFxIAhW6a3nORzx6d2AGrNWmMtUhbLTSncYjmojf
 Td4dscuRmRPenJUV1JhcJQBR62QonknIHV99QmevaCSAoUdyuMH+t5kQVEgPjx7C
 GlMPNEZZmGl7J3NXSWRtDSkUxFZ1OU8MTKc1LmPPHHAOZk37wbePihQbLLySlHPH
 v4G1R05e2hG7C66yu959fyOleL87lDToUXhwQNFJMqEc+e7IzBzZsB3ANEHjpLQH
 UJC9COU+sf8mPafja4ge/KbyGDmgDg/OMQJDhU6+DSXUflwymeWJmXr7sLFQex6O
 nZO/SVzkbKj+PKxV7UnGD0sTeAAk0X6vfhFCL0l/acPpQg0T6Fpky5D5fUMv5dWS
 xxxvxfwBcDoI44fxWBhfPYvmLFT9f5da+bpbzeeGjVSNezOkPJ65AJcVk5An4kQu
 PRzJGoq3XpZHOeg5+O7IKzeuJ+3qc7Tz4wAzMxcaNFpVBl2qp1RUkTbmS9/YV1b5
 ZOcIFBMLuUROE1ExsU19c5Uo0j1Bvh9jtdy6lNFCagQYzihtA0Jk19ucllx1jIjD
 sdv2hgDIauRToKF1d9xz
 =v5G4
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "A few fixes piled up:

   - Fix for a suspend/resume issue where PCI probing code overwrote
     dev->irq for the MSI irq of the AMD IOMMU.

   - Fix for a kernel crash when a 32 bit PCI device was assigned to a
     KVM guest.

   - Fix for a possible memory leak in the VT-d driver

   - A couple of fixes for the ARM-SMMU driver"

* tag 'iommu-fixes-v4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix NULL pointer deref on device detach
  iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices
  iommu/vt-d: Fix memory leak in dmar_insert_one_dev_info()
  iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA
  iommu/arm-smmu: Ensure IAS is set correctly for AArch32-capable SMMUs
  iommu/io-pgtable-arm: Don't use dma_to_phys()
2015-10-13 10:09:59 -07:00
Joerg Roedel b1ce5b79ae iommu/vt-d: Create RMRR mappings in newly allocated domains
Currently the RMRR entries are created only at boot time.
This means they will vanish when the domain allocated at
boot time is destroyed.
This patch makes sure that also newly allocated domains will
get RMRR mappings.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-10-05 17:39:21 +02:00
Joerg Roedel d66ce54b46 iommu/vt-d: Split iommu_prepare_identity_map
Split the part of the function that fetches the domain out
and put the rest into into a domain_prepare_identity_map, so
that the code can also be used with when the domain is
already known.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-10-05 17:38:47 +02:00
Linus Torvalds 8c25ab8b5a Merge git://git.infradead.org/intel-iommu
Pull IOVA fixes from David Woodhouse:
 "The main fix here is the first one, fixing the over-allocation of
   size-aligned requests.  The other patches simply make the existing
  IOVA code available to users other than the Intel VT-d driver, with no
  functional change.

  I concede the latter really *should* have been submitted during the
  merge window, but since it's basically risk-free and people are
  waiting to build on top of it and it's my fault I didn't get it in, I
  (and they) would be grateful if you'd take it"

* git://git.infradead.org/intel-iommu:
  iommu: Make the iova library a module
  iommu: iova: Export symbols
  iommu: iova: Move iova cache management to the iova library
  iommu/iova: Avoid over-allocating when size-aligned
2015-10-02 07:59:29 -04:00
Sudip Mukherjee 499f3aa432 iommu/vt-d: Fix memory leak in dmar_insert_one_dev_info()
We are returning NULL if we are not able to attach the iommu
to the domain but while returning we missed freeing info.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-09-29 15:45:50 +02:00
Linus Torvalds 9a9952bbd7 IOMMU Updates for Linux v4.3
This time the IOMMU updates are mostly cleanups or fixes. No big new
 features or drivers this time. In particular the changes include:
 
 	* Bigger cleanup of the Domain<->IOMMU data structures and the
 	  code that manages them in the Intel VT-d driver. This makes
 	  the code easier to understand and maintain, and also easier to
 	  keep the data structures in sync. It is also a preparation
 	  step to make use of default domains from the IOMMU core in the
 	  Intel VT-d driver.
 
 	* Fixes for a couple of DMA-API misuses in ARM IOMMU drivers,
 	  namely in the ARM and Tegra SMMU drivers.
 
 	* Fix for a potential buffer overflow in the OMAP iommu driver's
 	  debug code
 
 	* A couple of smaller fixes and cleanups in various drivers
 
 	* One small new feature: Report domain-id usage in the Intel
 	  VT-d driver to easier detect bugs where these are leaked.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJV7sCEAAoJECvwRC2XARrjz3YP/Au4IIfqykfPvmI0cmPhVnAV
 Q72tltwkbK2u2iP+pHheveaMngJtAshsZrnhBon4KJRIt/KTLZQvsFplHDaRhPfY
 yw3LIxhC5kLG/S6irY9Ozb0+uTMdQ3BU2uS23pyoFVfCz+RngBrAwDBcTKqZDCDG
 8dNd+T21XlzxuyeGr58h9upz2VFtq6feoGFhLU5PNxTlf4JWZe77D7NlbSvx6Nwy
 7Ai8dVRgpV9ciUP7w8FXrCUvbMZQDIoTMiWGNSlogVMgA0dllGES91UZYhWf3pil
 abuX6DeFul/cOhEOnH2xa+j5zz2O/upe9stU4wAFw6IhPiAELTHc2NKlWAhwb0SY
 bpDRf7dgLnUfqpmZLpWjTwN4jllc0qS2MIHj+eUu0uhdFi4Z0BuH2wSCdbR7xkqk
 u5u0Jq7hDNKs5FmQTSsWSiAdjakMsRjIN7jMrBbOeZnBSmUnLx74KGPLTb63ncR3
 WIOi4Iyu+LSXBIvZDiLu3lIIh7Atzd+y7IDnb8KXdyqfy+h53OZZOJNbP/qTWHgT
 ZUdm/qrqjIQpTQfleOEadC7vY/y3fR5sBtOQHUamfntni3oYCc4AMRlNdf3eV9lb
 Tyss6F699mU7d/vennTaIToBgVwaXdLYtmvGWjnoT/kqOMclyDf3cIUtZGtp2rJR
 ddmzDA3vBUC5pGj8Hd8R
 =yoGE
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates for from Joerg Roedel:
 "This time the IOMMU updates are mostly cleanups or fixes.  No big new
  features or drivers this time.  In particular the changes include:

   - Bigger cleanup of the Domain<->IOMMU data structures and the code
     that manages them in the Intel VT-d driver.  This makes the code
     easier to understand and maintain, and also easier to keep the data
     structures in sync.  It is also a preparation step to make use of
     default domains from the IOMMU core in the Intel VT-d driver.

   - Fixes for a couple of DMA-API misuses in ARM IOMMU drivers, namely
     in the ARM and Tegra SMMU drivers.

   - Fix for a potential buffer overflow in the OMAP iommu driver's
     debug code

   - A couple of smaller fixes and cleanups in various drivers

   - One small new feature: Report domain-id usage in the Intel VT-d
     driver to easier detect bugs where these are leaked"

* tag 'iommu-updates-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (83 commits)
  iommu/vt-d: Really use upper context table when necessary
  x86/vt-d: Fix documentation of DRHD
  iommu/fsl: Really fix init section(s) content
  iommu/io-pgtable-arm: Unmap and free table when overwriting with block
  iommu/io-pgtable-arm: Move init-fn declarations to io-pgtable.h
  iommu/msm: Use BUG_ON instead of if () BUG()
  iommu/vt-d: Access iomem correctly
  iommu/vt-d: Make two functions static
  iommu/vt-d: Use BUG_ON instead of if () BUG()
  iommu/vt-d: Return false instead of 0 in irq_remapping_cap()
  iommu/amd: Use BUG_ON instead of if () BUG()
  iommu/amd: Make a symbol static
  iommu/amd: Simplify allocation in irq_remapping_alloc()
  iommu/tegra-smmu: Parameterize number of TLB lines
  iommu/tegra-smmu: Factor out tegra_smmu_set_pde()
  iommu/tegra-smmu: Extract tegra_smmu_pte_get_use()
  iommu/tegra-smmu: Use __GFP_ZERO to allocate zeroed pages
  iommu/tegra-smmu: Remove PageReserved manipulation
  iommu/tegra-smmu: Convert to use DMA API
  iommu/tegra-smmu: smmu_flush_ptc() wants device addresses
  ...
2015-09-08 17:22:35 -07:00
Linus Torvalds d975f309a8 Merge branch 'for-4.3/sg' of git://git.kernel.dk/linux-block
Pull SG updates from Jens Axboe:
 "This contains a set of scatter-gather related changes/fixes for 4.3:

   - Add support for limited chaining of sg tables even for
     architectures that do not set ARCH_HAS_SG_CHAIN.  From Christoph.

   - Add sg chain support to target_rd.  From Christoph.

   - Fixup open coded sg->page_link in crypto/omap-sham.  From
     Christoph.

   - Fixup open coded crypto ->page_link manipulation.  From Dan.

   - Also from Dan, automated fixup of manual sg_unmark_end()
     manipulations.

   - Also from Dan, automated fixup of open coded sg_phys()
     implementations.

   - From Robert Jarzmik, addition of an sg table splitting helper that
     drivers can use"

* 'for-4.3/sg' of git://git.kernel.dk/linux-block:
  lib: scatterlist: add sg splitting function
  scatterlist: use sg_phys()
  crypto/omap-sham: remove an open coded access to ->page_link
  scatterlist: remove open coded sg_unmark_end instances
  crypto: replace scatterwalk_sg_chain with sg_chain
  target/rd: always chain S/G list
  scatterlist: allow limited chaining without ARCH_HAS_SG_CHAIN
2015-09-02 13:22:38 -07:00
Linus Torvalds 26f8b7edc9 PCI changes for the v4.3 merge window:
Enumeration
     Allocate ATS struct during enumeration (Bjorn Helgaas)
     Embed ATS info directly into struct pci_dev (Bjorn Helgaas)
     Reduce size of ATS structure elements (Bjorn Helgaas)
     Stop caching ATS Invalidate Queue Depth (Bjorn Helgaas)
     iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth (Bjorn Helgaas)
     Move MPS configuration check to pci_configure_device() (Bjorn Helgaas)
     Set MPS to match upstream bridge (Keith Busch)
     ARM/PCI: Set MPS before pci_bus_add_devices() (Murali Karicheri)
     Add pci_scan_root_bus_msi() (Lorenzo Pieralisi)
     ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi() (Lorenzo Pieralisi)
 
   Resource management
     Call pci_read_bridge_bases() from core instead of arch code (Lorenzo Pieralisi)
 
   PCI device hotplug
     pciehp: Remove unused interrupt events (Bjorn Helgaas)
     pciehp: Remove ignored MRL sensor interrupt events (Bjorn Helgaas)
     pciehp: Handle invalid data when reading from non-existent devices (Jarod Wilson)
     pciehp: Simplify pcie_poll_cmd() (Yijing Wang)
     Use "slot" and "pci_slot" for struct hotplug_slot and struct pci_slot (Yijing Wang)
     Protect pci_bus->slots with pci_slot_mutex, not pci_bus_sem (Yijing Wang)
     Hold pci_slot_mutex while searching bus->slots list (Yijing Wang)
 
   Power management
     Disable async suspend/resume for JMicron multi-function SATA/AHCI (Zhang Rui)
 
   Virtualization
     Add ACS quirks for Intel I219-LM/V (Alex Williamson)
     Restore ACS configuration as part of pci_restore_state() (Alexander Duyck)
 
   MSI
     Add pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu)
     x86: Implement pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu)
     Add helpers to manage pci_dev->irq and pci_dev->irq_managed (Jiang Liu)
     Free legacy IRQ when enabling MSI/MSI-X (Jiang Liu)
     ARM/PCI: Remove msi_controller from struct pci_sys_data (Lorenzo Pieralisi)
     Remove unused pcibios_msi_controller() hook (Lorenzo Pieralisi)
 
   Generic host bridge driver
     Remove dependency on ARM-specific struct hw_pci (Jayachandran C)
     Build setup-irq.o for arm64 (Jayachandran C)
     Add arm64 support (Jayachandran C)
 
   APM X-Gene host bridge driver
     Add APM X-Gene PCIe 64-bit prefetchable window (Duc Dang)
     Add support for a 64-bit prefetchable memory window (Duc Dang)
     Drop owner assignment from platform_driver (Krzysztof Kozlowski)
 
   Broadcom iProc host bridge driver
     Allow BCMA bus driver to be built as module (Hauke Mehrtens)
     Delete unnecessary checks before phy calls (Markus Elfring)
     Add arm64 support (Ray Jui)
 
   Synopsys DesignWare host bridge driver
     Don't complain missing *config* reg space if va_cfg0 is set (Murali Karicheri)
 
   TI DRA7xx host bridge driver
     Disable pm_runtime on get_sync failure (Kishon Vijay Abraham I)
     Add PM support (Kishon Vijay Abraham I)
     Clear MSE bit during suspend so clocks will idle (Kishon Vijay Abraham I)
     Add support to make GPIO drive PERST# line (Kishon Vijay Abraham I)
 
   Xilinx AXI host bridge driver
     Check for MSI interrupt flag before handling as INTx (Russell Joyce)
 
   Miscellaneous
     Fix Intersil/Techwell TW686[4589] AV capture class code (Krzysztof Hałasa)
     Use PCI_CLASS_SERIAL_USB instead of bare number (Bjorn Helgaas)
     Fix generic NCR 53c810 class code quirk (Bjorn Helgaas)
     Fix TI816X class code quirk (Bjorn Helgaas)
     Remove unused "pci_probe" flags (Bjorn Helgaas)
     Host bridge driver code simplifications (Fabio Estevam)
     Add dev_flags bit to access VPD through function 0 (Mark Rustad)
     Add VPD function 0 quirk for Intel Ethernet devices (Mark Rustad)
     Kill off set_irq_flags() usage (Rob Herring)
     Remove Intel Cherrytrail D3 delays (Srinidhi Kasagar)
     Clean up pci_find_capability() (Wei Yang)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV5FE/AAoJEFmIoMA60/r8I2QP/R9b9MrvH2i9tN98/lTDl7g3
 czE58ZM1d4kMYtW3Pm/DrYI6y6RprAaB4ZEp5rHxlFLqBPZEQwWodA19NkjECcb6
 g5qKWOdIWA4T6Jaab6a/yCmAFa0jni7iAmmTYqca9o3Xj7tFovxDxqPSYkh+rer0
 v+1sAr/4HXSiN339KR6teEF3VZqLFp6ewMydQlVS+R7kAOHHYQDqoo9WF6JnIoL5
 PO3Kbmr1WN3fZY3s98yLq1x6XmLrLlmGdJI+2r+KewO4r/05CL6wTVP/oTMi+Eti
 dueseeISlOTcTAUhk87Vap23uJPeB/rJbYoFdCr7+0AkZGe/U/E2dpZm2wyMcCvq
 OrATuFymgzIuJm5uUPsdH4lzsX97U9BcDccracfC38rYnP5u3bqHCjw8HJzANR7p
 VYbFBzc5ZCCUYtQAjyrKt2820AvTFo+Bu+z75IsJO8LQQgv/zGtQQ8grIQeAjH+l
 sAe3xOTwzZnq6Obl4qb/GElHmIGUbQ1X4Dx1mliiijKMKkhYHOA0iFnB/OBILmEZ
 wHzKU8chWcI9lip0aaX8q9i/qovdVUt2+rdo/N40l7YY66x4jkNgQQXZX+FSKk6H
 stTvEBQgK28EKCHDxMsgzTGIqllSyk4DnRMA7ij1hRWqdUbGk7wOPTvm9QSwNDWe
 SokuWzAQD9YeMRGdsYjZ
 =DX1r
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v4.3 merge window:

  Enumeration:
   - Allocate ATS struct during enumeration (Bjorn Helgaas)
   - Embed ATS info directly into struct pci_dev (Bjorn Helgaas)
   - Reduce size of ATS structure elements (Bjorn Helgaas)
   - Stop caching ATS Invalidate Queue Depth (Bjorn Helgaas)
   - iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth (Bjorn Helgaas)
   - Move MPS configuration check to pci_configure_device() (Bjorn Helgaas)
   - Set MPS to match upstream bridge (Keith Busch)
   - ARM/PCI: Set MPS before pci_bus_add_devices() (Murali Karicheri)
   - Add pci_scan_root_bus_msi() (Lorenzo Pieralisi)
   - ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi() (Lorenzo Pieralisi)

  Resource management:
   - Call pci_read_bridge_bases() from core instead of arch code (Lorenzo Pieralisi)

  PCI device hotplug:
   - pciehp: Remove unused interrupt events (Bjorn Helgaas)
   - pciehp: Remove ignored MRL sensor interrupt events (Bjorn Helgaas)
   - pciehp: Handle invalid data when reading from non-existent devices (Jarod Wilson)
   - pciehp: Simplify pcie_poll_cmd() (Yijing Wang)
   - Use "slot" and "pci_slot" for struct hotplug_slot and struct pci_slot (Yijing Wang)
   - Protect pci_bus->slots with pci_slot_mutex, not pci_bus_sem (Yijing Wang)
   - Hold pci_slot_mutex while searching bus->slots list (Yijing Wang)

  Power management:
   - Disable async suspend/resume for JMicron multi-function SATA/AHCI (Zhang Rui)

  Virtualization:
   - Add ACS quirks for Intel I219-LM/V (Alex Williamson)
   - Restore ACS configuration as part of pci_restore_state() (Alexander Duyck)

  MSI:
   - Add pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu)
   - x86: Implement pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu)
   - Add helpers to manage pci_dev->irq and pci_dev->irq_managed (Jiang Liu)
   - Free legacy IRQ when enabling MSI/MSI-X (Jiang Liu)
   - ARM/PCI: Remove msi_controller from struct pci_sys_data (Lorenzo Pieralisi)
   - Remove unused pcibios_msi_controller() hook (Lorenzo Pieralisi)

  Generic host bridge driver:
   - Remove dependency on ARM-specific struct hw_pci (Jayachandran C)
   - Build setup-irq.o for arm64 (Jayachandran C)
   - Add arm64 support (Jayachandran C)

  APM X-Gene host bridge driver:
   - Add APM X-Gene PCIe 64-bit prefetchable window (Duc Dang)
   - Add support for a 64-bit prefetchable memory window (Duc Dang)
   - Drop owner assignment from platform_driver (Krzysztof Kozlowski)

  Broadcom iProc host bridge driver:
   - Allow BCMA bus driver to be built as module (Hauke Mehrtens)
   - Delete unnecessary checks before phy calls (Markus Elfring)
   - Add arm64 support (Ray Jui)

  Synopsys DesignWare host bridge driver:
   - Don't complain missing *config* reg space if va_cfg0 is set (Murali Karicheri)

  TI DRA7xx host bridge driver:
   - Disable pm_runtime on get_sync failure (Kishon Vijay Abraham I)
   - Add PM support (Kishon Vijay Abraham I)
   - Clear MSE bit during suspend so clocks will idle (Kishon Vijay Abraham I)
   - Add support to make GPIO drive PERST# line (Kishon Vijay Abraham I)

  Xilinx AXI host bridge driver:
   - Check for MSI interrupt flag before handling as INTx (Russell Joyce)

  Miscellaneous:
   - Fix Intersil/Techwell TW686[4589] AV capture class code (Krzysztof Hałasa)
   - Use PCI_CLASS_SERIAL_USB instead of bare number (Bjorn Helgaas)
   - Fix generic NCR 53c810 class code quirk (Bjorn Helgaas)
   - Fix TI816X class code quirk (Bjorn Helgaas)
   - Remove unused "pci_probe" flags (Bjorn Helgaas)
   - Host bridge driver code simplifications (Fabio Estevam)
   - Add dev_flags bit to access VPD through function 0 (Mark Rustad)
   - Add VPD function 0 quirk for Intel Ethernet devices (Mark Rustad)
   - Kill off set_irq_flags() usage (Rob Herring)
   - Remove Intel Cherrytrail D3 delays (Srinidhi Kasagar)
   - Clean up pci_find_capability() (Wei Yang)"

* tag 'pci-v4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (72 commits)
  PCI: Disable async suspend/resume for JMicron multi-function SATA/AHCI
  PCI: Set MPS to match upstream bridge
  PCI: Move MPS configuration check to pci_configure_device()
  PCI: Drop references acquired by of_parse_phandle()
  PCI/MSI: Remove unused pcibios_msi_controller() hook
  ARM/PCI: Remove msi_controller from struct pci_sys_data
  ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi()
  PCI: Add pci_scan_root_bus_msi()
  ARM/PCI: Replace panic with WARN messages on failures
  PCI: generic: Add arm64 support
  PCI: Build setup-irq.o for arm64
  PCI: generic: Remove dependency on ARM-specific struct hw_pci
  PCI: imx6: Simplify a trivial if-return sequence
  PCI: spear: Use BUG_ON() instead of condition followed by BUG()
  PCI: dra7xx: Remove unneeded use of IS_ERR_VALUE()
  PCI: Remove pci_ats_enabled()
  PCI: Stop caching ATS Invalidate Queue Depth
  PCI: Move ATS declarations to linux/pci.h so they're all together
  PCI: Clean up ATS error handling
  PCI: Use pci_physfn() rather than looking up physfn by hand
  ...
2015-08-31 17:14:39 -07:00
Joerg Roedel 4df4eab168 iommu/vt-d: Really use upper context table when necessary
There is a bug in iommu_context_addr() which will always use
the lower context table, even when the upper context table
needs to be used. Fix this issue.

Fixes: 03ecc32c52 ("iommu/vt-d: support extended root and context entries")
Reported-by: Xiao, Nan <nan.xiao@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-25 11:39:27 +02:00
Dan Williams db0fa0cb01 scatterlist: use sg_phys()
Coccinelle cleanup to replace open coded sg to physical address
translations.  This is in preparation for introducing scatterlists that
reference __pfn_t.

// sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg)
// usage: make coccicheck COCCI=sg_phys.cocci MODE=patch

virtual patch

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg)) + sg->offset
+ sg_phys(sg)

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg))
+ sg_phys(sg) & PAGE_MASK

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-17 08:13:26 -06:00
Joerg Roedel 543c8dcf1d iommu/vt-d: Access iomem correctly
This fixes wrong accesses to iomem introduced by the kdump
fixing code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-13 19:49:56 +02:00
Joerg Roedel b690420a40 iommu/vt-d: Make two functions static
These functions are only used in that file and can be
static.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-13 19:49:51 +02:00
Joerg Roedel dc02e46e8d iommu/vt-d: Use BUG_ON instead of if () BUG()
Found by a coccicheck script.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-13 19:49:46 +02:00
Joerg Roedel f303e50766 iommu/vt-d: Avoid duplicate device_domain_info structures
When a 'struct device_domain_info' is created as an alias
for another device, this struct will not be re-used when the
real device is encountered. Fix that to avoid duplicate
device_domain_info structures being added.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:37 +02:00
Joerg Roedel 08a7f456a7 iommu/vt-d: Only insert alias dev_info if there is an alias
For devices without an PCI alias there will be two
device_domain_info structures added. Prevent that by
checking if the alias is different from the device.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:36 +02:00
Joerg Roedel 127c761598 iommu/vt-d: Pass device_domain_info to __dmar_remove_one_dev_info
This struct contains all necessary information for the
function already. Also handle the info->dev == NULL case
while at it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:36 +02:00
Joerg Roedel 2309bd793e iommu/vt-d: Remove dmar_global_lock from device_notifier
The code in the locked section does not touch anything
protected by the dmar_global_lock. Remove it from there.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:36 +02:00
Joerg Roedel 55d940430a iommu/vt-d: Get rid of domain->iommu_lock
When this lock is held the device_domain_lock is also
required to make sure the device_domain_info does not vanish
while in use. So this lock can be removed as it gives no
additional protection.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:36 +02:00
Joerg Roedel de7e888646 iommu/vt-d: Only call domain_remove_one_dev_info to detach old domain
There is no need to make a difference here between VM and
non-VM domains, so simplify this code here.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:36 +02:00
Joerg Roedel d160aca527 iommu/vt-d: Unify domain->iommu attach/detachment
Move the code to attach/detach domains to iommus and vice
verce into a single function to make sure there are no
dangling references.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:36 +02:00
Joerg Roedel c6c2cebd66 iommu/vt-d: Establish domain<->iommu link in dmar_insert_one_dev_info
This makes domain attachment more synchronous with domain
deattachment. The domain<->iommu link is released in
dmar_remove_one_dev_info.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:36 +02:00
Joerg Roedel dc534b25d1 iommu/vt-d: Pass an iommu pointer to domain_init()
This allows to do domain->iommu attachment after domain_init
has run.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:36 +02:00
Joerg Roedel 2452d9db12 iommu/vt-d: Rename iommu_detach_dependent_devices()
Rename this function and the ones further down its
call-chain to domain_context_clear_*. In particular this
means:

	iommu_detach_dependent_devices -> domain_context_clear
		   iommu_detach_dev_cb -> domain_context_clear_one_cb
		      iommu_detach_dev -> domain_context_clear_one

These names match a lot better with its
domain_context_mapping counterparts.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:35 +02:00
Joerg Roedel e6de0f8dfc iommu/vt-d: Rename domain_remove_one_dev_info()
Rename the function to dmar_remove_one_dev_info to match is
name better with its dmar_insert_one_dev_info counterpart.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:35 +02:00
Joerg Roedel 5db31569e9 iommu/vt-d: Rename dmar_insert_dev_info()
Rename this function to dmar_insert_one_dev_info() to match
the name better with its counter part function
domain_remove_one_dev_info().

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:35 +02:00
Joerg Roedel cc4e2575cc iommu/vt-d: Move context-mapping into dmar_insert_dev_info
Do the context-mapping of devices from a single place in the
call-path and clean up the other call-sites.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:35 +02:00
Joerg Roedel 76f45fe35c iommu/vt-d: Simplify domain_remove_dev_info()
Just call domain_remove_one_dev_info() for all devices in
the domain instead of reimplementing the functionality.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:35 +02:00
Joerg Roedel b608ac3b6d iommu/vt-d: Simplify domain_remove_one_dev_info()
Simplify this function as much as possible with the new
iommu_refcnt field.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:34 +02:00
Joerg Roedel 42e8c186b5 iommu/vt-d: Simplify io/tlb flushing in intel_iommu_unmap
We don't need to do an expensive search for domain-ids
anymore, as we keep track of per-iommu domain-ids.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:34 +02:00
Joerg Roedel 29a27719ab iommu/vt-d: Replace iommu_bmp with a refcount
This replaces the dmar_domain->iommu_bmp with a similar
reference count array. This allows us to keep track of how
many devices behind each iommu are attached to the domain.

This is necessary for further simplifications and
optimizations to the iommu<->domain attachment code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:34 +02:00
Joerg Roedel af1089ce38 iommu/vt-d: Kill dmar_domain->id
This field is now obsolete because all places use the
per-iommu domain-ids. Kill the remaining uses of this field
and remove it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:34 +02:00
Joerg Roedel 0dc7971594 iommu/vt-d: Don't pre-allocate domain ids for si_domain
There is no reason for this special handling of the
si_domain. The per-iommu domain-id can be allocated
on-demand like for any other domain. So remove the
pre-allocation code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:34 +02:00
Joerg Roedel a1ddcbe930 iommu/vt-d: Pass dmar_domain directly into iommu_flush_iotlb_psi
This function can figure out the domain-id to use itself
from the iommu_did array. This is more reliable over
different domain types and brings us one step further to
remove the domain->id field.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:34 +02:00
Joerg Roedel de24e55395 iommu/vt-d: Simplify domain_context_mapping_one
Get rid of the special cases for VM domains vs. non-VM
domains and simplify the code further to just handle the
hardware passthrough vs. page-table case.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:33 +02:00
Joerg Roedel 28ccce0d95 iommu/vt-d: Calculate translation in domain_context_mapping_one
There is no reason to pass the translation type through
multiple layers. It can also be determined in the
domain_context_mapping_one function directly.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:33 +02:00
Joerg Roedel e2411427f7 iommu/vt-d: Get rid of iommu_attach_vm_domain()
The special case for VM domains is not needed, as other
domains could be attached to the iommu in the same way. So
get rid of this special case.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:33 +02:00
Joerg Roedel 8bf478163e iommu/vt-d: Split up iommu->domains array
This array is indexed by the domain-id and contains the
pointers to the domains attached to this iommu. Modern
systems support 65536 domain ids, so that this array has a
size of 512kb, per iommu.

This is a huge waste of space, as the array is usually
sparsely populated. This patch makes the array
two-dimensional and allocates the memory for the domain
pointers on-demand.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:33 +02:00
Joerg Roedel 9452d5bfe5 iommu/vt-d: Add access functions for iommu->domains
This makes it easier to change the layout of the data
structure later.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:33 +02:00
Joerg Roedel c0e8a6c803 iommu/vt-d: Keep track of per-iommu domain ids
Instead of searching in the domain array for already
allocated domain ids, keep track of them explicitly.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-12 16:23:32 +02:00
Alex Williamson 2238c0827a iommu/vt-d: Report domain usage in sysfs
Debugging domain ID leakage typically requires long running tests in
order to exhaust the domain ID space or kernel instrumentation to
track the setting and clearing of bits.  A couple trivial intel-iommu
specific sysfs extensions make it much easier to expose the IOMMU
capabilities and current usage.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-03 16:30:57 +02:00
Kees Cook 2439d4aa92 iommu/vt-d: Avoid format string leaks into iommu_device_create
This makes sure it won't be possible to accidentally leak format
strings into iommu device names. Current name allocations are safe,
but this makes the "%s" explicit.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-08-03 16:15:47 +02:00
Sakari Ailus ae1ff3d623 iommu: iova: Move iova cache management to the iova library
This is necessary to separate intel-iommu from the iova library.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-07-28 15:47:58 +01:00
Robin Murphy 8f6429c7cb iommu/iova: Avoid over-allocating when size-aligned
Currently, allocating a size-aligned IOVA region quietly adjusts the
actual allocation size in the process, returning a rounded-up
power-of-two-sized allocation. This results in mismatched behaviour in
the IOMMU driver if the original size was not a power of two, where the
original size is mapped, but the rounded-up IOVA size is unmapped.

Whilst some IOMMUs will happily unmap already-unmapped pages, others
consider this an error, so fix it by computing the necessary alignment
padding without altering the actual allocation size. Also clean up by
making pad_size unsigned, since its callers always pass unsigned values
and negative padding makes little sense here anyway.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-07-28 15:47:56 +01:00
Alex Williamson 46ebb7af7b iommu/vt-d: Fix VM domain ID leak
This continues the attempt to fix commit fb170fb4c5 ("iommu/vt-d:
Introduce helper functions to make code symmetric for readability").
The previous attempt in commit 7168440690 ("iommu/vt-d: Detach
domain *only* from attached iommus") overlooked the fact that
dmar_domain.iommu_bmp gets cleared for VM domains when devices are
detached:

intel_iommu_detach_device
  domain_remove_one_dev_info
    domain_detach_iommu

The domain is detached from the iommu, but the iommu is still attached
to the domain, for whatever reason.  Thus when we get to domain_exit(),
we can't rely on iommu_bmp for VM domains to find the active iommus,
we must check them all.  Without that, the corresponding bit in
intel_iommu.domain_ids doesn't get cleared and repeated VM domain
creation and destruction will run out of domain IDs.  Meanwhile we
still can't call iommu_detach_domain() on arbitrary non-VM domains or
we risk clearing in-use domain IDs, as 7168440690 attempted to
address.

It's tempting to modify iommu_detach_domain() to test the domain
iommu_bmp, but the call ordering from domain_remove_one_dev_info()
prevents it being able to work as fb170fb4c5 seems to have intended.
Caching of unused VM domains on the iommu object seems to be the root
of the problem, but this code is far too fragile for that kind of
rework to be proposed for stable, so we simply revert this chunk to
its state prior to fb170fb4c5.

Fixes: fb170fb4c5 ("iommu/vt-d: Introduce helper functions to make
                      code symmetric for readability")
Fixes: 7168440690 ("iommu/vt-d: Detach domain *only* from attached
                      iommus")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-07-23 14:17:39 +02:00
Bjorn Helgaas fb0cc3aa55 iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth
We check the ATS state (enabled/disabled) and fetch the PCI ATS Invalidate
Queue Depth in performance-sensitive paths.  It's easy to cache these,
which removes dependencies on PCI.

Remember the ATS enabled state.  When enabling, read the queue depth once
and cache it in the device_domain_info struct.  This is similar to what
amd_iommu.c does.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
2015-07-20 11:49:46 -05:00
Joerg Roedel 8939ddf6d6 iommu/vt-d: Enable Translation only if it was previously disabled
Do not touch the TE bit unless we know translation is
disabled.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:35 +02:00
Joerg Roedel 60b523ecfe iommu/vt-d: Don't disable translation prior to OS handover
For all the copy-translation code to run, we have to keep
translation enabled in intel_iommu_init(). So remove the
code disabling it.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:35 +02:00
Joerg Roedel c3361f2f6e iommu/vt-d: Don't copy translation tables if RTT bit needs to be changed
We can't change the RTT bit when translation is enabled, so
don't copy translation tables when we would change the bit
with our new root entry.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:35 +02:00
Joerg Roedel a87f491890 iommu/vt-d: Don't do early domain assignment if kdump kernel
When we copied over context tables from an old kernel, we
need to defer assignment of devices to domains until the
device driver takes over. So skip this part of
initialization when we copied over translation tables from
the old kernel.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:35 +02:00
Joerg Roedel 86080ccc22 iommu/vt-d: Allocate si_domain in init_dmars()
This seperates the allocation of the si_domain from its
assignment to devices. It makes sure that the iommu=pt case
still works in the kdump kernel, when we have to defer the
assignment of devices to domains to device driver
initialization time.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:34 +02:00
Joerg Roedel cf484d0e69 iommu/vt-d: Mark copied context entries
Mark the context entries we copied over from the old kernel,
so that we don't detect them as present in other code paths.
This makes sure we safely overwrite old context entries when
a new domain is assigned.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:34 +02:00
Joerg Roedel dbcd861f25 iommu/vt-d: Do not re-use domain-ids from the old kernel
Mark all domain-ids we find as reserved, so that there could
be no collision between domains from the previous kernel and
our domains in the IOMMU TLB.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:34 +02:00
Joerg Roedel 091d42e43d iommu/vt-d: Copy translation tables from old kernel
If we are in a kdump kernel and find translation enabled in
the iommu, try to copy the translation tables from the old
kernel to preserve the mappings until the device driver
takes over.
This supports old and the extended root-entry and
context-table formats.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:34 +02:00
Joerg Roedel 4158c2eca3 iommu/vt-d: Detect pre enabled translation
Add code to detect whether translation is already enabled in
the IOMMU. Save this state in a flags field added to
struct intel_iommu.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:34 +02:00
Joerg Roedel 5f0a7f7614 iommu/vt-d: Make root entry visible for hardware right after allocation
In case there was an old root entry, make our new one
visible immediately after it was allocated.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:34 +02:00
Joerg Roedel b63d80d1e0 iommu/vt-d: Init QI before root entry is allocated
QI needs to be available when we write the root entry into
hardware because flushes might be necessary after this.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:33 +02:00
Joerg Roedel 9f10e5bf62 iommu/vt-d: Cleanup log messages
Give them a common prefix that can be grepped for and
improve the wording here and there.

Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-16 10:59:33 +02:00
David Woodhouse c83b2f20fd iommu/vt-d: Only enable extended context tables if PASID is supported
Although the extended tables are theoretically a completely orthogonal
feature to PASID and anything else that *uses* the newly-available bits,
some of the early hardware has problems even when all we do is enable
them and use only the same bits that were in the old context tables.

For now, there's no motivation to support extended tables unless we're
going to use PASID support to do SVM. So just don't use them unless
PASID support is advertised too. Also add a command-line bailout just in
case later chips also have issues.

The equivalent problem for PASID support has already been fixed with the
upcoming VT-d spec update and commit bd00c606a ("iommu/vt-d: Change
PASID support to bit 40 of Extended Capability Register"), because the
problematic platforms use the old definition of the PASID-capable bit,
which is now marked as reserved and meaningless.

So with this change, we'll magically start using ECS again only when we
see the new hardware advertising "hey, we have PASID support and we
actually tested it this time" on bit 40.

The VT-d hardware architect has promised that we are not going to have
any reason to support ECS *without* PASID any time soon, and he'll make
sure he checks with us before changing that.

In the future, if hypothetical new features also use new bits in the
context tables and can be seen on implementations *without* PASID support,
we might need to add their feature bits to the ecs_enabled() macro.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-06-12 11:31:25 +01:00
David Woodhouse 4ed6a540fa iommu/vt-d: Fix passthrough mode with translation-disabled devices
When we use 'intel_iommu=igfx_off' to disable translation for the
graphics, and when we discover that the BIOS has misconfigured the DMAR
setup for I/OAT, we use a special DUMMY_DEVICE_DOMAIN_INFO value in
dev->archdata.iommu to indicate that translation is disabled.

With passthrough mode, we were attempting to dereference that as a
normal pointer to a struct device_domain_info when setting up an
identity mapping for the affected device.

This fixes the problem by making device_to_iommu() explicitly check for
the special value and indicate that no IOMMU was found to handle the
devices in question.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org (which means you can pick up 18436afdc now too)
2015-05-11 14:59:20 +01:00
Linus Torvalds 9f86262dcc Merge git://git.infradead.org/intel-iommu
Pull intel iommu updates from David Woodhouse:
 "This lays a little of the groundwork for upcoming Shared Virtual
  Memory support — fixing some bogus #defines for capability bits and
  adding the new ones, and starting to use the new wider page tables
  where we can, in anticipation of actually filling in the new fields
  therein.

  It also allows graphics devices to be assigned to VM guests again.
  This got broken in 3.17 by disallowing assignment of RMRR-afflicted
  devices.  Like USB, we do understand why there's an RMRR for graphics
  devices — and unlike USB, it's actually sane.  So we can make an
  exception for graphics devices, just as we do USB controllers.

  Finally, tone down the warning about the X2APIC_OPT_OUT bit, due to
  persistent requests.  X2APIC_OPT_OUT was added to the spec as a nasty
  hack to allow broken BIOSes to forbid us from using X2APIC when they
  do stupid and invasive things and would break if we did.

  Someone noticed that since Windows doesn't have full IOMMU support for
  DMA protection, setting the X2APIC_OPT_OUT bit made Windows avoid
  initialising the IOMMU on the graphics unit altogether.

  This means that it would be available for use in "driver mode", where
  the IOMMU registers are made available through a BAR of the graphics
  device and the graphics driver can do SVM all for itself.

  So they started setting the X2APIC_OPT_OUT bit on *all* platforms with
  SVM capabilities.  And even the platforms which *might*, if the
  planets had been aligned correctly, possibly have had SVM capability
  but which in practice actually don't"

* git://git.infradead.org/intel-iommu:
  iommu/vt-d: support extended root and context entries
  iommu/vt-d: Add new extended capabilities from v2.3 VT-d specification
  iommu/vt-d: Allow RMRR on graphics devices too
  iommu/vt-d: Print x2apic opt out info instead of printing a warning
  iommu/vt-d: kill bogus ecap_niotlb_iunits()
2015-04-26 17:47:46 -07:00
Linus Torvalds 79319a052c IOMMU Updates for Linux v4.1
Not much this time, but the changes include:
 
 	* Moving domain allocation into the iommu drivers to prepare for
 	  the introduction of default domains for devices
 
 	* Fixing the IO page-table code in the AMD IOMMU driver to
 	  correctly encode large page sizes
 
 	* Extension of the PCI support in the ARM-SMMU driver
 
 	* Various fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVNFIPAAoJECvwRC2XARrj4v8QAMVsPJ+kmnLvqGDkO9v2i9z6
 sFX27h55HhK3Pgb5aEmEhvZd0Eec22KtuADr92LsRSjskgA4FgrzzSlo8w7+MbwM
 dtowij+5Bzx/jEeexM5gog0ZA9Brl725KSYBmwJIAroKAtl3YXsIA4TO7X/JtXJm
 0qWbCxLs9CX5uWyJawkeDl8UAaZYb8AHKv1UhJt8Z5yajM/qITMULi51g2Bgh8kx
 YaRHeZNj+mFQqb6IlNkmOhILN+dbTdxQREp+aJs1alGdkBGlJyfo6eK4weNOpA4x
 gc8EXUWZzj1GEPyWMpA/ZMzPzCbj9M6wTeXqRiTq31AMV10zcy545uYcLWks680M
 CYvWTmjeCvwsbuaj9cn+efa47foH2UoeXxBmXWOJDv4WxcjE1ejmlmSd8WYfwkh9
 hIkMzD8tW2iZf3ssnjCeQLa7f6ydL2P4cpnK2JH+N7hN9VOASAlciezroFxtCjU+
 18T7ozgUTbOXZZomBX7OcGQ8ElXMiHB/uaCyNO64yVzApsUnQfpHzcRI5OavOYn5
 dznjrzvNLCwHs3QFI4R7rsmIfPkOM0g5nY5drGwJ23+F+rVpLmpWVPR5hqT7a1HM
 tJVmzces6HzOu7P1Mo0IwvNbZEmNBGTHYjGtWs6e79MQxdriFT4I+DwvFOy7GUq/
 Is2b+HPwWhiWJHQXLTT2
 =FMxH
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "Not much this time, but the changes include:

   - moving domain allocation into the iommu drivers to prepare for the
     introduction of default domains for devices

   - fixing the IO page-table code in the AMD IOMMU driver to correctly
     encode large page sizes

   - extension of the PCI support in the ARM-SMMU driver

   - various fixes and cleanups"

* tag 'iommu-updates-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (34 commits)
  iommu/amd: Correctly encode huge pages in iommu page tables
  iommu/amd: Optimize amd_iommu_iova_to_phys for new fetch_pte interface
  iommu/amd: Optimize alloc_new_range for new fetch_pte interface
  iommu/amd: Optimize iommu_unmap_page for new fetch_pte interface
  iommu/amd: Return the pte page-size in fetch_pte
  iommu/amd: Add support for contiguous dma allocator
  iommu/amd: Don't allocate with __GFP_ZERO in alloc_coherent
  iommu/amd: Ignore BUS_NOTIFY_UNBOUND_DRIVER event
  iommu/amd: Use BUS_NOTIFY_REMOVED_DEVICE
  iommu/tegra: smmu: Compute PFN mask at runtime
  iommu/tegra: gart: Set aperture at domain initialization time
  iommu/tegra: Setup aperture
  iommu: Remove domain_init and domain_free iommu_ops
  iommu/fsl: Make use of domain_alloc and domain_free
  iommu/rockchip: Make use of domain_alloc and domain_free
  iommu/ipmmu-vmsa: Make use of domain_alloc and domain_free
  iommu/shmobile: Make use of domain_alloc and domain_free
  iommu/msm: Make use of domain_alloc and domain_free
  iommu/tegra-gart: Make use of domain_alloc and domain_free
  iommu/tegra-smmu: Make use of domain_alloc and domain_free
  ...
2015-04-20 10:50:05 -07:00
Rafael J. Wysocki 9a9ca16e7a Merge branch 'device-properties'
* device-properties:
  device property: Introduce firmware node type for platform data
  device property: Make it possible to use secondary firmware nodes
  driver core: Implement device property accessors through fwnode ones
  driver core: property: Update fwnode_property_read_string_array()
  driver core: Add comments about returning array counts
  ACPI: Introduce has_acpi_companion()
  driver core / ACPI: Represent ACPI companions using fwnode_handle
2015-04-13 00:35:54 +02:00
Joerg Roedel 7f65ef01e1 Merge branches 'iommu/fixes', 'x86/vt-d', 'x86/amd', 'arm/smmu', 'arm/tegra' and 'core' into next
Conflicts:
	drivers/iommu/amd_iommu.c
	drivers/iommu/tegra-gart.c
	drivers/iommu/tegra-smmu.c
2015-04-02 13:33:19 +02:00
Joerg Roedel 00a77deb0f iommu/vt-d: Make use of domain_alloc and domain_free
Get rid of domain_init and domain_destroy and implement
domain_alloc/domain_free instead.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-03-31 15:32:02 +02:00
David Woodhouse 03ecc32c52 iommu/vt-d: support extended root and context entries
Add a new function iommu_context_addr() which takes care of the
differences and returns a pointer to a context entry which may be
in either format. The formats are binary compatible for all the old
fields anyway; the new one is just larger and some of the reserved
bits in the original 128 are now meaningful.

So far, nothing actually uses the new fields in the extended context
entry. Modulo hardware bugs with interpreting the new-style tables,
this should basically be a no-op.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-03-25 15:46:13 +00:00
David Woodhouse 18436afdc1 iommu/vt-d: Allow RMRR on graphics devices too
Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API
domains") prevents certain options for devices with RMRRs. This even
prevents those devices from getting a 1:1 mapping with 'iommu=pt',
because we don't have the code to handle *preserving* the RMRR regions
when moving the device between domains.

There's already an exclusion for USB devices, because we know the only
reason for RMRRs there is a misguided desire to keep legacy
keyboard/mouse emulation running in some theoretical OS which doesn't
have support for USB in its own right... but which *does* enable the
IOMMU.

Add an exclusion for graphics devices too, so that 'iommu=pt' works
there. We should be able to successfully assign graphics devices to
guests too, as long as the initial handling of stolen memory is
reconfigured appropriately. This has certainly worked in the past.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
2015-03-25 15:36:35 +00:00
Alex Williamson 509fca899d iommu/vt-d: Remove unused variable
Unused after commit 7168440690 ("iommu/vt-d: Detach domain *only*
from attached iommus").  Reported by 0-day builder.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-03-24 15:39:26 +01:00
Alex Williamson 7168440690 iommu/vt-d: Detach domain *only* from attached iommus
Device domains never span IOMMU hardware units, which allows the
domain ID space for each IOMMU to be an independent address space.
Therefore we can have multiple, independent domains, each with the
same domain->id, but attached to different hardware units.  This is
also why we need to do a heavy-weight search for VM domains since
they can span multiple IOMMUs hardware units and we don't require a
single global ID to use for all hardware units.

Therefore, if we call iommu_detach_domain() across all active IOMMU
hardware units for a non-VM domain, the result is that we clear domain
IDs that are not associated with our domain, allowing them to be
re-allocated and causing apparent coherency issues when the device
cannot access IOVAs for the intended domain.

This bug was introduced in commit fb170fb4c5 ("iommu/vt-d: Introduce
helper functions to make code symmetric for readability"), but is
significantly exacerbated by the more recent commit 62c22167dd
("iommu/vt-d: Fix dmar_domain leak in iommu_attach_device") which calls
domain_exit() more frequently to resolve a domain leak.

Fixes: fb170fb4c5 ("iommu/vt-d: Introduce helper functions to make code symmetric for readability")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-03-23 15:22:08 +01:00
Rafael J. Wysocki ca5b74d267 ACPI: Introduce has_acpi_companion()
Now that the ACPI companions of devices are represented by pointers
to struct fwnode_handle, it is not quite efficient to check whether
or not an ACPI companion of a device is present by evaluating the
ACPI_COMPANION() macro.

For this reason, introduce a special static inline routine for that,
has_acpi_companion(), and update the code to use it where applicable.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-03-16 23:49:08 +01:00
Quentin Lambert 2f119c7895 iommu/vt-d: Convert non-returned local variable to boolean when relevant
This patch was produced using Coccinelle. A simplified version of the
semantic patch is:

@r exists@
identifier f;
local idexpression u8 x;
identifier xname;
@@

f(...) {
...when any
(
  x@xname = 1;
|
  x@xname = 0;
)
...when any
}

@bad exists@
identifier r.f;
local idexpression u8 r.x
expression e1 != {0, 1}, e2;
@@

f(...) {
...when any
(
  x = e1;
|
  x + e2
)
...when any
}

@depends on !bad@
identifier r.f;
local idexpression u8 r.x;
identifier r.xname;
@@

f(...) {
...
++ bool xname;
- int xname;
<...
(
  x =
- 1
+ true
|
  x =
- -1
+ false
)
...>

}

Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-03-03 14:13:20 +01:00
Robin Murphy 0fb5fe874c iommu: Make IOVA domain page size explicit
Systems may contain heterogeneous IOMMUs supporting differing minimum
page sizes, which may also not be common with the CPU page size.
Thus it is practical to have an explicit notion of IOVA granularity
to simplify handling of mapping and allocation constraints.

As an initial step, move the IOVA page granularity from an implicit
compile-time constant to a per-domain property so we can make use
of it in IOVA domain context at runtime. To keep the abstraction tidy,
extend the little API of inline iova_* helpers to parallel some of the
equivalent PAGE_* macros.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-01-19 14:55:22 +01:00
Robin Murphy 1b72250076 iommu: Make IOVA domain low limit flexible
To share the IOVA allocator with other architectures, it needs to
accommodate more general aperture restrictions; move the lower limit
from a compile-time constant to a runtime domain property to allow
IOVA domains with different requirements to co-exist.

Also reword the slightly unclear description of alloc_iova since we're
touching it anyway.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-01-19 14:55:22 +01:00
Robin Murphy 85b4545629 iommu: Consolidate IOVA allocator code
In order to share the IOVA allocator with other architectures, break
the unnecssary dependency on the Intel IOMMU driver and move the
remaining IOVA internals to iova.c

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-01-19 14:55:22 +01:00
Joerg Roedel 6d1b9cc9ee iommu/vt-d: Remove dead code in device_notifier
This code only runs when action == BUS_NOTIFY_REMOVED_DEVICE,
so it can't be BUS_NOTIFY_DEL_DEVICE.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-01-05 12:23:38 +01:00
Joerg Roedel 62c22167dd iommu/vt-d: Fix dmar_domain leak in iommu_attach_device
Since commit 1196c2f a domain is only destroyed in the
notifier path if it is hot-unplugged. This caused a
domain leakage in iommu_attach_device when a driver was
unbound from the device and bound to VFIO. In this case the
device is attached to a new domain and unlinked from the old
domain. At this point nothing points to the old domain
anymore and its memory is leaked.
Fix this by explicitly freeing the old domain in
iommu_attach_domain.

Fixes: 1196c2f (iommu/vt-d: Fix dmar_domain leak in iommu_attach_device)
Cc: stable@vger.kernel.org # v3.18
Tested-by: Jerry Hoemann <jerry.hoemann@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-01-05 12:23:38 +01:00
Joerg Roedel 76771c938e Merge branches 'arm/omap', 'arm/msm', 'arm/rockchip', 'arm/renesas', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next
Conflicts:
	drivers/iommu/arm-smmu.c
2014-12-02 13:07:13 +01:00
Jiang Liu cc4f14aa17 iommu/vt-d: Fix an off-by-one bug in __domain_mapping()
There's an off-by-one bug in function __domain_mapping(), which may
trigger the BUG_ON(nr_pages < lvl_pages) when
	(nr_pages + 1) & superpage_mask == 0

The issue was introduced by commit 9051aa0268 "intel-iommu: Combine
domain_pfn_mapping() and domain_sg_mapping()", which sets sg_res to
"nr_pages + 1" to avoid some of the 'sg_res==0' code paths.

It's safe to remove extra "+1" because sg_res is only used to calculate
page size now.

Reported-And-Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: <stable@vger.kernel.org> # >= 3.0
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-12-02 13:03:09 +01:00
Jiang Liu ffebeb46dd iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
Implement required callback functions for intel-iommu driver
to support DMAR unit hotplug.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-11-18 11:18:36 +01:00
Jiang Liu 6b1972493a iommu/vt-d: Implement DMAR unit hotplug framework
On Intel platforms, an IO Hub (PCI/PCIe host bridge) may contain DMAR
units, so we need to support DMAR hotplug when supporting PCI host
bridge hotplug on Intel platforms.

According to Section 8.8 "Remapping Hardware Unit Hot Plug" in "Intel
Virtualization Technology for Directed IO Architecture Specification
Rev 2.2", ACPI BIOS should implement ACPI _DSM method under the ACPI
object for the PCI host bridge to support DMAR hotplug.

This patch introduces interfaces to parse ACPI _DSM method for
DMAR unit hotplug. It also implements state machines for DMAR unit
hot-addition and hot-removal.

The PCI host bridge hotplug driver should call dmar_hotplug_hotplug()
before scanning PCI devices connected for hot-addition and after
destroying all PCI devices for hot-removal.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-11-18 11:18:35 +01:00
Jiang Liu 78d8e70461 iommu/vt-d: Dynamically allocate and free seq_id for DMAR units
Introduce functions to support dynamic IOMMU seq_id allocating and
releasing, which will be used to support DMAR hotplug.

Also rename IOMMU_UNITS_SUPPORTED as DMAR_UNITS_SUPPORTED.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-11-18 11:18:35 +01:00
Jiang Liu c2a0b538d2 iommu/vt-d: Introduce helper function dmar_walk_resources()
Introduce helper function dmar_walk_resources to walk resource entries
in DMAR table and ACPI buffer object returned by ACPI _DSM method
for IOMMU hot-plug.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-11-18 11:18:35 +01:00
Li, Zhen-Hua 1a2262f90f x86/vt-d: Fix incorrect bit operations in setting values
The function context_set_address_root() and set_root_value are setting new
address in a wrong way, and this patch is trying to fix this problem.

According to Intel Vt-d specs(Feb 2011, Revision 1.3), Chapter 9.1 and 9.2,
field ctp in root entry is using bits 12:63, field asr in context entry is
using bits 12:63.

To set these fields, the following functions are used:
static inline void context_set_address_root(struct context_entry *context,
        unsigned long value);
and
static inline void set_root_value(struct root_entry *root, unsigned long value)

But they are using an invalid method to set these fields, in current code, only
a '|' operator is used to set it. This will not set the asr to the expected
value if it has an old value.

For example:
Before calling this function,
	context->lo = 0x3456789012111;
	value = 0x123456789abcef12;

After we call context_set_address_root(context, value), expected result is
	context->lo == 0x123456789abce111;

But the actual result is:
	context->lo == 0x1237577f9bbde111;

So we need to clear bits 12:63 before setting the new value, this will fix
this problem.

Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-11-06 14:40:10 +01:00
Olav Haugan 315786ebbf iommu: Add iommu_map_sg() function
Mapping and unmapping are more often than not in the critical path.
map_sg allows IOMMU driver implementations to optimize the process
of mapping buffers into the IOMMU page tables.

Instead of mapping a buffer one page at a time and requiring potentially
expensive TLB operations for each page, this function allows the driver
to map all pages in one go and defer TLB maintenance until after all
pages have been mapped.

Additionally, the mapping operation would be faster in general since
clients does not have to keep calling map API over and over again for
each physically contiguous chunk of memory that needs to be mapped to a
virtually contiguous region.

Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-11-04 14:53:36 +01:00
Joerg Roedel 09b5269a1b Merge branches 'arm/exynos', 'arm/omap', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next
Conflicts:
	drivers/iommu/arm-smmu.c
2014-10-02 12:24:45 +02:00
Joerg Roedel 1196c2fb04 iommu/vt-d: Only remove domain when device is removed
This makes sure any RMRR mappings stay in place when the
driver is unbound from the device.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Jerry Hoemann <jerry.hoemann@hp.com>
2014-10-02 11:18:58 +02:00
Joerg Roedel 5d587b8de5 iommu/vt-d: Convert to iommu_capable() API function
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-09-25 15:47:37 +02:00
Joerg Roedel e7f9fa5498 iommu/vt-d: Defer domain removal if device is assigned to a driver
When the BUS_NOTIFY_DEL_DEVICE event is received the device
might still be attached to a driver. In this case the domain
can't be released as the mappings might still be in use.

Defer the domain removal in this case until we receivce the
BUS_NOTIFY_UNBOUND_DRIVER event.

Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org   # v3.15, v3.16
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-08-18 13:37:56 +02:00
Alex Williamson c875d2c1b8 iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains
The user of the IOMMU API domain expects to have full control of
the IOVA space for the domain.  RMRRs are fundamentally incompatible
with that idea.  We can neither map the RMRR into the IOMMU API
domain, nor can we guarantee that the device won't continue DMA with
the area described by the RMRR as part of the new domain.  Therefore
we must prevent such devices from being used by the IOMMU API.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-29 17:38:31 +02:00
Jiang Liu 161f693460 iommu/vt-d: Fix issue in computing domain's iommu_snooping flag
IOMMU units may dynamically attached to/detached from domains,
so we should scan all active IOMMU units when computing iommu_snooping
flag for a domain instead of only scanning IOMMU units associated
with the domain.

Also check snooping and superpage capabilities when hot-adding DMAR units.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:47 +02:00
Jiang Liu a156ef99e8 iommu/vt-d: Introduce helper function iova_size() to improve code readability
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:47 +02:00
Jiang Liu 162d1b10d4 iommu/vt-d: Introduce helper domain_pfn_within_range() to simplify code
Introduce helper function domain_pfn_within_range() to simplify code
and improve readability.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:47 +02:00
Jiang Liu d41a4adb1b iommu/vt-d: Simplify intel_unmap_sg() and kill duplicated code
Introduce intel_unmap() to reduce duplicated code in intel_unmap_sg()
and intel_unmap_page().

Also let dma_pte_free_pagetable() to call dma_pte_clear_range() directly,
so caller only needs to call dma_pte_free_pagetable().

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:47 +02:00
Jiang Liu 2a41ccee2f iommu/vt-d: Change iommu_enable/disable_translation to return void
Simplify error handling path by changing iommu_{enable|disable}_translation
to return void.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:47 +02:00
Jiang Liu 129ad28100 iommu/vt-d: Avoid freeing virtual machine domain in free_dmar_iommu()
Virtual machine domains are created by intel_iommu_domain_init() and
should be destroyed by intel_iommu_domain_destroy(). So avoid freeing
virtual machine domain data structure in free_dmar_iommu() when
doamin->iommu_count reaches zero, otherwise it may cause invalid
memory access because the IOMMU framework still holds references
to the domain structure.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:47 +02:00
Jiang Liu 2a46ddf77c iommu/vt-d: Fix possible invalid memory access caused by free_dmar_iommu()
Static identity and virtual machine domains may be cached in
iommu->domain_ids array after corresponding IOMMUs have been removed
from domain->iommu_bmp. So we should check domain->iommu_bmp before
decreasing domain->iommu_count in function free_dmar_iommu(), otherwise
it may cause free of inuse domain data structure.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:47 +02:00
Jiang Liu 44bde61428 iommu/vt-d: Allocate dynamic domain id for virtual domains only
Check the same domain id is allocated for si_domain on each IOMMU,
otherwise the IOTLB flush for si_domain will fail.

Now the rules to allocate and manage domain id are:
1) For normal and static identity domains, domain id is allocated
   when creating domain structure. And this id will be written into
   context entry.
2) For virtual machine domain, a virtual id is allocated when creating
   domain. And when binding virtual machine domain to an iommu, a real
   domain id is allocated on demand and this domain id will be written
   into context entry. So domain->id for virtual machine domain may be
   different from the domain id written into context entry(used by
   hardware).

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:47 +02:00
Jiang Liu fb170fb4c5 iommu/vt-d: Introduce helper functions to make code symmetric for readability
Introduce domain_attach_iommu()/domain_detach_iommu() and refine
iommu_attach_domain()/iommu_detach_domain() to make code symmetric
and improve readability.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:46 +02:00
Jiang Liu ab8dfe2515 iommu/vt-d: Introduce helper functions to improve code readability
Introduce domain_type_is_vm() and domain_type_is_vm_or_si() to improve
code readability.

Also kill useless macro DOMAIN_FLAG_P2P_MULTIPLE_DEVICES.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:46 +02:00
Jiang Liu 18fd779a41 iommu/vt-d: Use correct domain id to flush virtual machine domains
For virtual machine domains, domain->id is a virtual id, and the real
domain id written into context entry is dynamically allocated.
So use the real domain id instead of domain->id when flushing iotlbs
for virtual machine domains.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:46 +02:00
Jiang Liu c3b497c6bb iommu/vt-d: Match segment number when searching for dev_iotlb capable devices
For virtual machine and static identity domains, there may be devices
from different PCI segments associated with the same domain.
So function iommu_support_dev_iotlb() should also match PCI segment
number (iommu unit) when searching for dev_iotlb capable devices.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-23 16:04:46 +02:00
Joerg Roedel cbb24a25a8 Merge branch 'core' into x86/vt-d
Conflicts:
	drivers/iommu/intel-iommu.c
2014-07-23 16:04:37 +02:00
Thierry Reding b22f6434cf iommu: Constify struct iommu_ops
This structure is read-only data and should never be modified.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-07 10:36:59 +02:00
Alex Williamson a5459cfece iommu/vt-d: Make use of IOMMU sysfs support
Register our DRHD IOMMUs, cross link devices, and provide a base set
of attributes for the IOMMU.  Note that IRQ remapping support parses
the DMAR table very early in boot, well before the iommu_class can
reasonably be setup, so our registration is split between
intel_iommu_init(), which occurs later, and alloc_iommu(), which
typically occurs much earlier, but may happen at any time later
with IOMMU hot-add support.

On a typical desktop system, this provides the following (pruned):

$ find /sys | grep dmar
/sys/devices/virtual/iommu/dmar0
/sys/devices/virtual/iommu/dmar0/devices
/sys/devices/virtual/iommu/dmar0/devices/0000:00:02.0
/sys/devices/virtual/iommu/dmar0/intel-iommu
/sys/devices/virtual/iommu/dmar0/intel-iommu/cap
/sys/devices/virtual/iommu/dmar0/intel-iommu/ecap
/sys/devices/virtual/iommu/dmar0/intel-iommu/address
/sys/devices/virtual/iommu/dmar0/intel-iommu/version
/sys/devices/virtual/iommu/dmar1
/sys/devices/virtual/iommu/dmar1/devices
/sys/devices/virtual/iommu/dmar1/devices/0000:00:00.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:01.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:16.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:1a.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:1b.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:1c.0
...
/sys/devices/virtual/iommu/dmar1/intel-iommu
/sys/devices/virtual/iommu/dmar1/intel-iommu/cap
/sys/devices/virtual/iommu/dmar1/intel-iommu/ecap
/sys/devices/virtual/iommu/dmar1/intel-iommu/address
/sys/devices/virtual/iommu/dmar1/intel-iommu/version
/sys/class/iommu/dmar0
/sys/class/iommu/dmar1

(devices also link back to the dmar units)

This makes address, version, capabilities, and extended capabilities
available, just like printed on boot.  I've tried not to duplicate
data that can be found in the DMAR table, with the exception of the
address, which provides an easy way to associate the sysfs device with
a DRHD entry in the DMAR.  It's tempting to add scopes and RMRR data
here, but the full DMAR table is already exposed under /sys/firmware/
and therefore already provides a way for userspace to learn such
details.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 12:35:59 +02:00
Alex Williamson 579305f75d iommu/vt-d: Update to use PCI DMA aliases
VT-d code currently makes use of pci_find_upstream_pcie_bridge() in
order to find the topology based alias of a device.  This function has
a few problems.  First, it doesn't check the entire alias path of the
device to the root bus, therefore if a PCIe device is masked upstream,
the wrong result is produced.  Also, it's known to get confused and
give up when it crosses a bridge from a conventional PCI bus to a PCIe
bus that lacks a PCIe capability.  The PCI-core provided DMA alias
support solves both of these problems and additionally adds support
for DMA function quirks allowing VT-d to work with devices like
Marvell and Ricoh with known broken requester IDs.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 12:35:58 +02:00
Alex Williamson e17f9ff413 iommu/vt-d: Use iommu_group_get_for_dev()
The IOMMU code now provides a common interface for finding or
creating an IOMMU group for a device on PCI buses.  Make use of it
and remove piles of code.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 12:35:58 +02:00
Yijing Wang aa4d066a2a iommu/vt-d: Suppress compiler warnings
suppress compiler warnings:
drivers/iommu/intel-iommu.c: In function ‘device_to_iommu’:
drivers/iommu/intel-iommu.c:673: warning: ‘segment’ may be used uninitialized in this function
drivers/iommu/intel-iommu.c: In function ‘get_domain_for_dev.clone.3’:
drivers/iommu/intel-iommu.c:2217: warning: ‘bridge_bus’ may be used uninitialized in this function
drivers/iommu/intel-iommu.c:2217: warning: ‘bridge_devfn’ may be used uninitialized in this function

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 11:34:37 +02:00
Yijing Wang effad4b59f iommu/vt-d: Remove the useless dma_pte_addr
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 11:34:20 +02:00
Joerg Roedel c3c75eb7fa iommu/vt-d: Don't use magic number in dma_pte_superpage
Use the already defined DMA_PTE_LARGE_PAGE for testing
instead of hardcoding the value again.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 11:34:18 +02:00
Yijing Wang 9b27e82d20 iommu/vt-d: Fix reference count in iommu_prepare_isa
Decrease the device reference count avoid memory leak.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 11:34:13 +02:00
Yijing Wang e16922af9d iommu/vt-d: Use inline function dma_pte_superpage instead of macros
Use inline function dma_pte_superpage() instead of macro for
better readability.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 11:34:07 +02:00
Yijing Wang 8f9d41b430 iommu/vt-d: Clear the redundant assignment for domain->nid
Alloc_domain() will initialize domain->nid to -1. So the
initialization for domain->nid in md_domain_init() is redundant,
clear it.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 11:34:00 +02:00
Yijing Wang 3a74ca0140 iommu/vt-d: Use list_for_each_safe() to simplify code
Use list_for_each_entry_safe() instead of list_entry()
to simplify code.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-07-04 11:16:20 +02:00
Jiang Liu 27e249501c iommu/vt-d: fix bug in handling multiple RMRRs for the same PCI device
Function dmar_iommu_notify_scope_dev() makes a wrong assumption that
there's one RMRR for each PCI device at most, which causes DMA failure
on some HP platforms. So enhance dmar_iommu_notify_scope_dev() to
handle multiple RMRRs for the same PCI device.

Fixbug: https://bugzilla.novell.com/show_bug.cgi?id=879482

Cc: <stable@vger.kernel.org> # 3.15
Reported-by: Tom Mingarelli <thomas.mingarelli@hp.com>
Tested-by: Linda Knippers <linda.knippers@hp.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-06-20 14:18:04 +02:00
Akinobu Mita 3674643625 intel-iommu: integrate DMA CMA
This adds support for the DMA Contiguous Memory Allocator for
intel-iommu.  This change enables dma_alloc_coherent() to allocate big
contiguous memory.

It is achieved in the same way as nommu_dma_ops currently does, i.e.
trying to allocate memory by dma_alloc_from_contiguous() and
alloc_pages() is used as a fallback.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:57 -07:00
David Woodhouse 9f05d3fb64 iommu/vt-d: Fix get_domain_for_dev() handling of upstream PCIe bridges
Commit 146922ec79 ("iommu/vt-d: Make get_domain_for_dev() take struct
device") introduced new variables bridge_bus and bridge_devfn to
identify the upstream PCIe to PCI bridge responsible for the given
target device. Leaving the original bus/devfn variables to identify
the target device itself, now that it is no longer assumed to be PCI
and we can no longer trivially find that information.

However, the patch failed to correctly use the new variables in all
cases; instead using the as-yet-uninitialised 'bus' and 'devfn'
variables.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-04-14 22:01:30 -07:00
Jiang Liu adeb25905c iommu/vt-d: fix memory leakage caused by commit ea8ea46
Commit ea8ea46 "iommu/vt-d: Clean up and fix page table clear/free
behaviour" introduces possible leakage of DMA page tables due to:
        for (pte = page_address(pg); !first_pte_in_page(pte); pte++) {
                if (dma_pte_present(pte) && !dma_pte_superpage(pte))
                        freelist = dma_pte_list_pagetables(domain, level - 1,
                                                           pte, freelist);
        }

For the first pte in a page, first_pte_in_page(pte) will always be true,
thus dma_pte_list_pagetables() will never be called and leak DMA page
tables if level is bigger than 1.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-04-13 13:07:56 +01:00
Dan Carpenter 14d4056996 iommu/vt-d: returning free pointer in get_domain_for_dev()
If we hit this error condition then we want to return a NULL pointer and
not a freed variable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-28 11:31:39 +00:00
David Woodhouse cf04eee8bf iommu/vt-d: Include ACPI devices in iommu=pt
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:08:10 +00:00
David Woodhouse 66077edc97 iommu/vt-d: Finally enable translation for non-PCI devices
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:08:08 +00:00
David Woodhouse 46333e375f iommu/vt-d: Remove to_pci_dev() in intel_map_page()
It might not be...

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:08:07 +00:00
David Woodhouse 7207d8f925 iommu/vt-d: Remove pdev from intel_iommu_attach_device()
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:08:05 +00:00
David Woodhouse ecb509ec2b iommu/vt-d: Remove pdev from iommu_no_mapping()
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:08:04 +00:00
David Woodhouse 5913c9bf0e iommu/vt-d: Make domain_add_dev_info() take struct device
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:08:03 +00:00
David Woodhouse bf9c9eda71 iommu/vt-d: Make domain_remove_one_dev_info() take struct device
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:08:01 +00:00
David Woodhouse 5040a918bd iommu/vt-d: Rename 'hwdev' variables to 'dev' now that that's the norm
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:08:00 +00:00
David Woodhouse 207e35920d iommu/vt-d: Remove some pointless to_pci_dev() calls
Mostly made redundant by using dev_name() instead of pci_name(), and one
instance of using *dev->dma_mask instead of pdev->dma_mask.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:58 +00:00
David Woodhouse d4b709f48e iommu/vt-d: Make get_valid_domain_for_dev() take struct device
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:57 +00:00
David Woodhouse 3bdb259116 iommu/vt-d: Make iommu_should_identity_map() take struct device
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:55 +00:00
David Woodhouse 0b9d975315 iommu/vt-d: Handle RMRRs for non-PCI devices
Should hopefully never happen (RMRRs are an abomination) but while we're
busy eliminating all the PCI assumptions, we might as well do it.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:54 +00:00
David Woodhouse 146922ec79 iommu/vt-d: Make get_domain_for_dev() take struct device
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:52 +00:00
David Woodhouse e1f167f3fd iommu/vt-d: Make domain_context_mapp{ed,ing}() take struct device
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:51 +00:00
David Woodhouse 156baca8d3 iommu/vt-d: Make device_to_iommu() cope with non-PCI devices
Pass the struct device to it, and also make it return the bus/devfn to use,
since that is also stored in the DMAR table.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:49 +00:00
David Woodhouse 9b226624bb iommu/vt-d: Make identity_mapping() take struct device not struct pci_dev
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:48 +00:00
David Woodhouse 41e80dca52 iommu/vt-d: Remove segment from struct device_domain_info()
It's accessible via info->iommu->segment so this is redundant.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:07:46 +00:00
David Woodhouse 7c7faa11ec iommu/vt-d: Remove device_to_iommu() call from domain_remove_dev_info()
This was problematic because it works by domain/bus/devfn and we want
to make device_to_iommu() use only a struct device * (for handling non-PCI
devices). Now that the iommu pointer is reliably stored in the
device_domain_info, we don't need to look it up.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:53 +00:00
David Woodhouse 8bbc441012 iommu/vt-d: Simplify iommu check in domain_remove_one_dev_info()
Now we store the iommu in the device_domain_info, we don't need to do a
lookup.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:51 +00:00
David Woodhouse 5a8f40e8c8 iommu/vt-d: Always store iommu in device_domain_info
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:44 +00:00
David Woodhouse e2f8c5f6d4 iommu/vt-d: Use domain_remove_one_dev_info() in domain_add_dev_info() error path
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:42 +00:00
David Woodhouse 0ac7266485 iommu/vt-d: use dmar_insert_dev_info() from dma_add_dev_info()
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:41 +00:00
David Woodhouse b718cd3d84 iommu/vt-d: Stop dmar_insert_dev_info() freeing domains on losing race
By moving this into get_domain_for_dev() we can make dmar_insert_dev_info()
suitable for use with "special" domains such as the si_domain, which
currently use domain_add_dev_info().

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:39 +00:00
David Woodhouse 64ae892bfe iommu/vt-d: Pass iommu to domain_context_mapping_one() and iommu_support_dev_iotlb()
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:37 +00:00
David Woodhouse 0bcb3e28c3 iommu/vt-d: Use struct device in device_domain_info, not struct pci_dev
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:36 +00:00
David Woodhouse 1525a29a7d iommu/vt-d: Make dmar_insert_dev_info() take struct device instead of struct pci_dev
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:34 +00:00
David Woodhouse 3d89194a94 iommu/vt-d: Make iommu_dummy() take struct device instead of struct pci_dev
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:33 +00:00
David Woodhouse 832bd85867 iommu/vt-d: Change scope lists to struct device, bus, devfn
It's not only for PCI devices any more, and the scope information for an
ACPI device provides the bus and devfn so that has to be stored here too.

It is the device pointer itself which needs to be protected with RCU,
so the __rcu annotation follows it into the definition of struct
dmar_dev_scope, since we're no longer just passing arrays of device
pointers around.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:05:08 +00:00
David Woodhouse d050196087 iommu/vt-d: Be less pessimistic about domain coherency where possible
In commit 2e12bc29 ("intel-iommu: Default to non-coherent for domains
unattached to iommus") we decided to err on the side of caution and
always assume that it's possible that a device will be attached which is
behind a non-coherent IOMMU.

In some cases, however, that just *cannot* happen. If there *are* no
IOMMUs in the system which are non-coherent, then we don't need to do
it. And flushing the dcache is a *significant* performance hit.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-19 17:25:48 +00:00
David Woodhouse 214e39aa36 iommu/vt-d: Honour intel_iommu=sp_off for non-VMM domains
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-19 17:22:13 +00:00
David Woodhouse ea8ea460c9 iommu/vt-d: Clean up and fix page table clear/free behaviour
There is a race condition between the existing clear/free code and the
hardware. The IOMMU is actually permitted to cache the intermediate
levels of the page tables, and doesn't need to walk the table from the
very top of the PGD each time. So the existing back-to-back calls to
dma_pte_clear_range() and dma_pte_free_pagetable() can lead to a
use-after-free where the IOMMU reads from a freed page table.

When freeing page tables we actually need to do the IOTLB flush, with
the 'invalidation hint' bit clear to indicate that it's not just a
leaf-node flush, after unlinking each page table page from the next level
up but before actually freeing it.

So in the rewritten domain_unmap() we just return a list of pages (using
pg->freelist to make a list of them), and then the caller is expected to
do the appropriate IOTLB flush (or tear down the domain completely,
whatever), before finally calling dma_free_pagelist() to free the pages.

As an added bonus, we no longer need to flush the CPU's data cache for
pages which are about to be *removed* from the page table hierarchy anyway,
in the non-cache-coherent case. This drastically improves the performance
of large unmaps.

As a side-effect of all these changes, this also fixes the fact that
intel_iommu_unmap() was neglecting to free the page tables for the range
in question after clearing them.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-19 17:21:41 +00:00
David Woodhouse 5cf0a76fa2 iommu/vt-d: Clean up size handling for intel_iommu_unmap()
We have this horrid API where iommu_unmap() can unmap more than it's asked
to, if the IOVA in question happens to be mapped with a large page.

Instead of propagating this nonsense to the point where we end up returning
the page order from dma_pte_clear_range(), let's just do it once and adjust
the 'size' parameter accordingly.

Augment pfn_to_dma_pte() to return the level at which the PTE was found,
which will also be useful later if we end up changing the API for
iommu_iova_to_phys() to behave the same way as is being discussed upstream.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-19 17:21:32 +00:00
Jiang Liu 75f05569d0 iommu/vt-d: Update IOMMU state when memory hotplug happens
If static identity domain is created, IOMMU driver needs to update
si_domain page table when memory hotplug event happens. Otherwise
PCI device DMA operations can't access the hot-added memory regions.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:06 +01:00
Jiang Liu 2e45528930 iommu/vt-d: Unify the way to process DMAR device scope array
Now we have a PCI bus notification based mechanism to update DMAR
device scope array, we could extend the mechanism to support boot
time initialization too, which will help to unify and simplify
the implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:06 +01:00
Jiang Liu 59ce0515cd iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens
Current Intel DMAR/IOMMU driver assumes that all PCI devices associated
with DMAR/RMRR/ATSR device scope arrays are created at boot time and
won't change at runtime, so it caches pointers of associated PCI device
object. That assumption may be wrong now due to:
1) introduction of PCI host bridge hotplug
2) PCI device hotplug through sysfs interfaces.

Wang Yijing has tried to solve this issue by caching <bus, dev, func>
tupple instead of the PCI device object pointer, but that's still
unreliable because PCI bus number may change in case of hotplug.
Please refer to http://lkml.org/lkml/2013/11/5/64
Message from Yingjing's mail:
after remove and rescan a pci device
[  611.857095] dmar: DRHD: handling fault status reg 2
[  611.857109] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff7000
[  611.857109] DMAR:[fault reason 02] Present bit in context entry is clear
[  611.857524] dmar: DRHD: handling fault status reg 102
[  611.857534] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff6000
[  611.857534] DMAR:[fault reason 02] Present bit in context entry is clear
[  611.857936] dmar: DRHD: handling fault status reg 202
[  611.857947] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff5000
[  611.857947] DMAR:[fault reason 02] Present bit in context entry is clear
[  611.858351] dmar: DRHD: handling fault status reg 302
[  611.858362] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff4000
[  611.858362] DMAR:[fault reason 02] Present bit in context entry is clear
[  611.860819] IPv6: ADDRCONF(NETDEV_UP): eth3: link is not ready
[  611.860983] dmar: DRHD: handling fault status reg 402
[  611.860995] dmar: INTR-REMAP: Request device [[86:00.3] fault index a4
[  611.860995] INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

This patch introduces a new mechanism to update the DRHD/RMRR/ATSR device scope
caches by hooking PCI bus notification.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:06 +01:00
Jiang Liu 0e242612d9 iommu/vt-d: Use RCU to protect global resources in interrupt context
Global DMA and interrupt remapping resources may be accessed in
interrupt context, so use RCU instead of rwsem to protect them
in such cases.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:05 +01:00
Jiang Liu 3a5670e8ac iommu/vt-d: Introduce a rwsem to protect global data structures
Introduce a global rwsem dmar_global_lock, which will be used to
protect DMAR related global data structures from DMAR/PCI/memory
device hotplug operations in process context.

DMA and interrupt remapping related data structures are read most,
and only change when memory/PCI/DMAR hotplug event happens.
So a global rwsem solution is adopted for balance between simplicity
and performance.

For interrupt remapping driver, function intel_irq_remapping_supported(),
dmar_table_init(), intel_enable_irq_remapping(), disable_irq_remapping(),
reenable_irq_remapping() and enable_drhd_fault_handling() etc
are called during booting, suspending and resuming with interrupt
disabled, so no need to take the global lock.

For interrupt remapping entry allocation, the locking model is:
	down_read(&dmar_global_lock);
	/* Find corresponding iommu */
	iommu = map_hpet_to_ir(id);
	if (iommu)
		/*
		 * Allocate remapping entry and mark entry busy,
		 * the IOMMU won't be hot-removed until the
		 * allocated entry has been released.
		 */
		index = alloc_irte(iommu, irq, 1);
	up_read(&dmar_global_lock);

For DMA remmaping driver, we only uses the dmar_global_lock rwsem to
protect functions which are only called in process context. For any
function which may be called in interrupt context, we will use RCU
to protect them in following patches.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:05 +01:00
Jiang Liu b683b230a2 iommu/vt-d: Introduce macro for_each_dev_scope() to walk device scope entries
Introduce for_each_dev_scope()/for_each_active_dev_scope() to walk
{active} device scope entries. This will help following RCU lock
related patches.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:04 +01:00
Jiang Liu b5f82ddf22 iommu/vt-d: Fix error in detect ATS capability
Current Intel IOMMU driver only matches a PCIe root port with the first
DRHD unit with the samge segment number. It will report false result
if there are multiple DRHD units with the same segment number, thus fail
to detect ATS capability for some PCIe devices.

This patch refines function dmar_find_matched_atsr_unit() to search all
DRHD units with the same segment number.

An example DMAR table entries as below:
[1D0h 0464  2]                Subtable Type : 0002 <Root Port ATS Capability>
[1D2h 0466  2]                       Length : 0028
[1D4h 0468  1]                        Flags : 00
[1D5h 0469  1]                     Reserved : 00
[1D6h 0470  2]           PCI Segment Number : 0000

[1D8h 0472  1]      Device Scope Entry Type : 02
[1D9h 0473  1]                 Entry Length : 08
[1DAh 0474  2]                     Reserved : 0000
[1DCh 0476  1]               Enumeration ID : 00
[1DDh 0477  1]               PCI Bus Number : 00
[1DEh 0478  2]                     PCI Path : [02, 00]

[1E0h 0480  1]      Device Scope Entry Type : 02
[1E1h 0481  1]                 Entry Length : 08
[1E2h 0482  2]                     Reserved : 0000
[1E4h 0484  1]               Enumeration ID : 00
[1E5h 0485  1]               PCI Bus Number : 00
[1E6h 0486  2]                     PCI Path : [03, 00]

[1E8h 0488  1]      Device Scope Entry Type : 02
[1E9h 0489  1]                 Entry Length : 08
[1EAh 0490  2]                     Reserved : 0000
[1ECh 0492  1]               Enumeration ID : 00
[1EDh 0493  1]               PCI Bus Number : 00
[1EEh 0494  2]                     PCI Path : [03, 02]

[1F0h 0496  1]      Device Scope Entry Type : 02
[1F1h 0497  1]                 Entry Length : 08
[1F2h 0498  2]                     Reserved : 0000
[1F4h 0500  1]               Enumeration ID : 00
[1F5h 0501  1]               PCI Bus Number : 00
[1F6h 0502  2]                     PCI Path : [03, 03]

[1F8h 0504  2]                Subtable Type : 0002 <Root Port ATS Capability>
[1FAh 0506  2]                       Length : 0020
[1FCh 0508  1]                        Flags : 00
[1FDh 0509  1]                     Reserved : 00
[1FEh 0510  2]           PCI Segment Number : 0000

[200h 0512  1]      Device Scope Entry Type : 02
[201h 0513  1]                 Entry Length : 08
[202h 0514  2]                     Reserved : 0000
[204h 0516  1]               Enumeration ID : 00
[205h 0517  1]               PCI Bus Number : 40
[206h 0518  2]                     PCI Path : [02, 00]

[208h 0520  1]      Device Scope Entry Type : 02
[209h 0521  1]                 Entry Length : 08
[20Ah 0522  2]                     Reserved : 0000
[20Ch 0524  1]               Enumeration ID : 00
[20Dh 0525  1]               PCI Bus Number : 40
[20Eh 0526  2]                     PCI Path : [02, 02]

[210h 0528  1]      Device Scope Entry Type : 02
[211h 0529  1]                 Entry Length : 08
[212h 0530  2]                     Reserved : 0000
[214h 0532  1]               Enumeration ID : 00
[215h 0533  1]               PCI Bus Number : 40
[216h 0534  2]                     PCI Path : [03, 00]

[218h 0536  2]                Subtable Type : 0002 <Root Port ATS Capability>
[21Ah 0538  2]                       Length : 0020
[21Ch 0540  1]                        Flags : 00
[21Dh 0541  1]                     Reserved : 00
[21Eh 0542  2]           PCI Segment Number : 0000

[220h 0544  1]      Device Scope Entry Type : 02
[221h 0545  1]                 Entry Length : 08
[222h 0546  2]                     Reserved : 0000
[224h 0548  1]               Enumeration ID : 00
[225h 0549  1]               PCI Bus Number : 80
[226h 0550  2]                     PCI Path : [02, 00]

[228h 0552  1]      Device Scope Entry Type : 02
[229h 0553  1]                 Entry Length : 08
[22Ah 0554  2]                     Reserved : 0000
[22Ch 0556  1]               Enumeration ID : 00
[22Dh 0557  1]               PCI Bus Number : 80
[22Eh 0558  2]                     PCI Path : [02, 02]

[230h 0560  1]      Device Scope Entry Type : 02
[231h 0561  1]                 Entry Length : 08
[232h 0562  2]                     Reserved : 0000
[234h 0564  1]               Enumeration ID : 00
[235h 0565  1]               PCI Bus Number : 80
[236h 0566  2]                     PCI Path : [03, 00]

[238h 0568  2]                Subtable Type : 0002 <Root Port ATS Capability>
[23Ah 0570  2]                       Length : 0020
[23Ch 0572  1]                        Flags : 00
[23Dh 0573  1]                     Reserved : 00
[23Eh 0574  2]           PCI Segment Number : 0000

[240h 0576  1]      Device Scope Entry Type : 02
[241h 0577  1]                 Entry Length : 08
[242h 0578  2]                     Reserved : 0000
[244h 0580  1]               Enumeration ID : 00
[245h 0581  1]               PCI Bus Number : C0
[246h 0582  2]                     PCI Path : [02, 00]

[248h 0584  1]      Device Scope Entry Type : 02
[249h 0585  1]                 Entry Length : 08
[24Ah 0586  2]                     Reserved : 0000
[24Ch 0588  1]               Enumeration ID : 00
[24Dh 0589  1]               PCI Bus Number : C0
[24Eh 0590  2]                     PCI Path : [02, 02]

[250h 0592  1]      Device Scope Entry Type : 02
[251h 0593  1]                 Entry Length : 08
[252h 0594  2]                     Reserved : 0000
[254h 0596  1]               Enumeration ID : 00
[255h 0597  1]               PCI Bus Number : C0
[256h 0598  2]                     PCI Path : [03, 00]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:04 +01:00
Jiang Liu a4eaa86c0c iommu/vt-d: Check for NULL pointer when freeing IOMMU data structure
Domain id 0 will be assigned to invalid translation without allocating
domain data structure if DMAR unit supports caching mode. So in function
free_dmar_iommu(), we should check whether the domain pointer is NULL,
otherwise it will cause system crash as below:
[    6.790519] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
[    6.799520] IP: [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430
[    6.806493] PGD 0
[    6.817972] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[    6.823303] Modules linked in:
[    6.826862] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1+ #126
[    6.834252] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.R00.1402050741 02/05/2014
[    6.845951] task: ffff880455a80000 ti: ffff880455a88000 task.ti: ffff880455a88000
[    6.854437] RIP: 0010:[<ffffffff810e2dc8>]  [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430
[    6.864154] RSP: 0000:ffff880455a89ce0  EFLAGS: 00010046
[    6.870179] RAX: 0000000000000046 RBX: 0000000000000002 RCX: 0000000000000000
[    6.878249] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c8
[    6.886318] RBP: ffff880455a89d40 R08: 0000000000000002 R09: 0000000000000001
[    6.894387] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880455a80000
[    6.902458] R13: 0000000000000000 R14: 00000000000000c8 R15: 0000000000000000
[    6.910520] FS:  0000000000000000(0000) GS:ffff88045b800000(0000) knlGS:0000000000000000
[    6.919687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.926198] CR2: 00000000000000c8 CR3: 0000000001e0e000 CR4: 00000000001407f0
[    6.934269] Stack:
[    6.936588]  ffffffffffffff10 ffffffff810f59db 0000000000000010 0000000000000246
[    6.945219]  ffff880455a89d10 0000000000000000 ffffffff82bcb980 0000000000000046
[    6.953850]  0000000000000000 0000000000000000 0000000000000002 0000000000000000
[    6.962482] Call Trace:
[    6.965300]  [<ffffffff810f59db>] ? vprintk_emit+0x4fb/0x5a0
[    6.971716]  [<ffffffff810e3185>] lock_acquire+0x185/0x200
[    6.977941]  [<ffffffff821fbbee>] ? init_dmars+0x839/0xa1d
[    6.984167]  [<ffffffff81870b06>] _raw_spin_lock_irqsave+0x56/0x90
[    6.991158]  [<ffffffff821fbbee>] ? init_dmars+0x839/0xa1d
[    6.997380]  [<ffffffff821fbbee>] init_dmars+0x839/0xa1d
[    7.003410]  [<ffffffff8147d575>] ? pci_get_dev_by_id+0x75/0xd0
[    7.010119]  [<ffffffff821fc146>] intel_iommu_init+0x2f0/0x502
[    7.016735]  [<ffffffff821a7947>] ? iommu_setup+0x27d/0x27d
[    7.023056]  [<ffffffff821a796f>] pci_iommu_init+0x28/0x52
[    7.029282]  [<ffffffff81002162>] do_one_initcall+0xf2/0x220
[    7.035702]  [<ffffffff810a4a29>] ? parse_args+0x2c9/0x450
[    7.041919]  [<ffffffff8219d1b1>] kernel_init_freeable+0x1c9/0x25b
[    7.048919]  [<ffffffff8219c8d2>] ? do_early_param+0x8a/0x8a
[    7.055336]  [<ffffffff8184d3f0>] ? rest_init+0x150/0x150
[    7.061461]  [<ffffffff8184d3fe>] kernel_init+0xe/0x100
[    7.067393]  [<ffffffff8187b5fc>] ret_from_fork+0x7c/0xb0
[    7.073518]  [<ffffffff8184d3f0>] ? rest_init+0x150/0x150
[    7.079642] Code: 01 76 18 89 05 46 04 36 01 41 be 01 00 00 00 e9 2f 02 00 00 0f 1f 80 00 00 00 00 41 be 01 00 00 00 e9 1d 02 00 00 0f 1f 44 00 00 <49> 81 3e c0 31 34 82 b8 01 00 00 00 0f 44 d8 41 83 ff 01 0f 87
[    7.104944] RIP  [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430
[    7.112008]  RSP <ffff880455a89ce0>
[    7.115988] CR2: 00000000000000c8
[    7.119784] ---[ end trace 13d756f0f462c538 ]---
[    7.125034] note: swapper/0[1] exited with preempt_count 1
[    7.131285] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    7.131285]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:03 +01:00
Jiang Liu 9ebd682e5a iommu/vt-d: Fix incorrect iommu_count for si_domain
The iommu_count field in si_domain(static identity domain) is
initialized to zero and never increases. It will underflow
when tearing down iommu unit in function free_dmar_iommu()
and leak memory. So refine code to correctly manage
si_domain->iommu_count.

Warning message caused by si_domain memory leak:
[   14.609681] IOMMU: Setting RMRR:
[   14.613496] Ignoring identity map for HW passthrough device 0000:00:1a.0 [0xbdcfd000 - 0xbdd1dfff]
[   14.623809] Ignoring identity map for HW passthrough device 0000:00:1d.0 [0xbdcfd000 - 0xbdd1dfff]
[   14.634162] IOMMU: Prepare 0-16MiB unity mapping for LPC
[   14.640329] Ignoring identity map for HW passthrough device 0000:00:1f.0 [0x0 - 0xffffff]
[   14.673360] IOMMU: dmar init failed
[   14.678157] kmem_cache_destroy iommu_devinfo: Slab cache still has objects
[   14.686076] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[   14.694176] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   14.707412]  0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff880c2cc37c00
[   14.716407]  ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[   14.725468]  ffffffff81dc7a6a ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[   14.734464] Call Trace:
[   14.737453]  [<ffffffff8156223d>] dump_stack+0x4d/0x66
[   14.743430]  [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[   14.750279]  [<ffffffff81dc7a6a>] intel_iommu_init+0x122/0x56a
[   14.757035]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[   14.763491]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[   14.769846]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
[   14.776506]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[   14.782866]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[   14.789994]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[   14.796556]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   14.802626]  [<ffffffff8154ffce>] kernel_init+0xe/0x130
[   14.808698]  [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[   14.814963]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   14.821640] kmem_cache_destroy iommu_domain: Slab cache still has objects
[   14.829456] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[   14.837562] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   14.850803]  0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff88102c1ee3c0
[   14.861222]  ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[   14.870284]  ffffffff81dc7a76 ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[   14.879271] Call Trace:
[   14.882227]  [<ffffffff8156223d>] dump_stack+0x4d/0x66
[   14.888197]  [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[   14.895034]  [<ffffffff81dc7a76>] intel_iommu_init+0x12e/0x56a
[   14.901781]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[   14.908238]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[   14.914594]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
[   14.921244]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[   14.927598]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[   14.934738]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[   14.941309]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   14.947380]  [<ffffffff8154ffce>] kernel_init+0xe/0x130
[   14.953430]  [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[   14.959689]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   14.966299] kmem_cache_destroy iommu_iova: Slab cache still has objects
[   14.973923] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[   14.982020] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   14.995263]  0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff88042cb5c980
[   15.004265]  ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[   15.013322]  ffffffff81dc7a82 ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[   15.022318] Call Trace:
[   15.025238]  [<ffffffff8156223d>] dump_stack+0x4d/0x66
[   15.031202]  [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[   15.038038]  [<ffffffff81dc7a82>] intel_iommu_init+0x13a/0x56a
[   15.044786]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[   15.051242]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[   15.057601]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
[   15.064254]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[   15.070608]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[   15.077747]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[   15.084300]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   15.090362]  [<ffffffff8154ffce>] kernel_init+0xe/0x130
[   15.096431]  [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[   15.102693]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   15.189273] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:02 +01:00
Jiang Liu 92d03cc8d0 iommu/vt-d: Reduce duplicated code to handle virtual machine domains
Reduce duplicated code to handle virtual machine domains, there's no
functionality changes. It also improves code readability.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:01 +01:00
Jiang Liu e85bb5d4d1 iommu/vt-d: Free resources if failed to create domain for PCIe endpoint
Enhance function get_domain_for_dev() to release allocated resources
if failed to create domain for PCIe endpoint, otherwise the allocated
resources will get lost.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:01 +01:00
Jiang Liu 745f2586e7 iommu/vt-d: Simplify function get_domain_for_dev()
Function get_domain_for_dev() is a little complex, simplify it
by factoring out dmar_search_domain_by_dev_info() and
dmar_insert_dev_info().

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:01 +01:00
Jiang Liu b94e4117f8 iommu/vt-d: Move private structures and variables into intel-iommu.c
Move private structures and variables into intel-iommu.c, which will
help to simplify locking policy for hotplug. Also delete redundant
declarations.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:00 +01:00
Jiang Liu 7e7dfab71a iommu/vt-d: Avoid caching stale domain_device_info when hot-removing PCI device
Function device_notifier() in intel-iommu.c only remove domain_device_info
data structure associated with a PCI device when handling PCI device
driver unbinding events. If a PCI device has never been bound to a PCI
device driver, there won't be BUS_NOTIFY_UNBOUND_DRIVER event when
hot-removing the PCI device. So associated domain_device_info data
structure may get lost.

On the other hand, if iommu_pass_through is enabled, function
iommu_prepare_static_indentify_mapping() will create domain_device_info
data structure for each PCIe to PCIe bridge and PCIe endpoint,
no matter whether there are drivers associated with those PCIe devices
or not. So those domain_device_info data structures will get lost when
hot-removing the assocated PCIe devices if they have never bound to
any PCI device driver.

To be even worse, it's not only an memory leak issue, but also an
caching of stale information bug because the memory are kept in
device_domain_list and domain->devices lists.

Fix the bug by trying to remove domain_device_info data structure when
handling BUS_NOTIFY_DEL_DEVICE event.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:00 +01:00
Jiang Liu 816997d03b iommu/vt-d: Avoid caching stale domain_device_info and fix memory leak
Function device_notifier() in intel-iommu.c fails to remove
device_domain_info data structures for PCI devices if they are
associated with si_domain because iommu_no_mapping() returns true
for those PCI devices. This will cause memory leak and caching of
stale information in domain->devices list.

So fix the issue by not calling iommu_no_mapping() and skipping check
of iommu_pass_through.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:50:59 +01:00
Jiang Liu 989d51fc99 iommu/vt-d: Avoid double free of g_iommus on error recovery path
Array 'g_iommus' may be freed twice on error recovery path in function
init_dmars() and free_dmar_iommu(), thus cause random system crash as
below.

[    6.774301] IOMMU: dmar init failed
[    6.778310] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    6.785615] software IO TLB [mem 0x76bcf000-0x7abcf000] (64MB) mapped at [ffff880076bcf000-ffff88007abcefff]
[    6.796887] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[    6.804173] Modules linked in:
[    6.807731] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1+ #108
[    6.815122] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.R00.1402050741 02/05/2014
[    6.836000] task: ffff880455a80000 ti: ffff880455a88000 task.ti: ffff880455a88000
[    6.844487] RIP: 0010:[<ffffffff8143eea6>]  [<ffffffff8143eea6>] memcpy+0x6/0x110
[    6.853039] RSP: 0000:ffff880455a89cc8  EFLAGS: 00010293
[    6.859064] RAX: ffff006568636163 RBX: ffff00656863616a RCX: 0000000000000005
[    6.867134] RDX: 0000000000000005 RSI: ffffffff81cdc439 RDI: ffff006568636163
[    6.875205] RBP: ffff880455a89d30 R08: 000000000001bc3b R09: 0000000000000000
[    6.883275] R10: 0000000000000000 R11: ffffffff81cdc43e R12: ffff880455a89da8
[    6.891338] R13: ffff006568636163 R14: 0000000000000005 R15: ffffffff81cdc439
[    6.899408] FS:  0000000000000000(0000) GS:ffff88045b800000(0000) knlGS:0000000000000000
[    6.908575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.915088] CR2: ffff88047e1ff000 CR3: 0000000001e0e000 CR4: 00000000001407f0
[    6.923160] Stack:
[    6.925487]  ffffffff8143c904 ffff88045b407e00 ffff006568636163 ffff006568636163
[    6.934113]  ffffffff8120a1a9 ffffffff81cdc43e 0000000000000007 0000000000000000
[    6.942747]  ffff880455a89da8 ffff006568636163 0000000000000007 ffffffff81cdc439
[    6.951382] Call Trace:
[    6.954197]  [<ffffffff8143c904>] ? vsnprintf+0x124/0x6f0
[    6.960323]  [<ffffffff8120a1a9>] ? __kmalloc_track_caller+0x169/0x360
[    6.967716]  [<ffffffff81440e1b>] kvasprintf+0x6b/0x80
[    6.973552]  [<ffffffff81432bf1>] kobject_set_name_vargs+0x21/0x70
[    6.980552]  [<ffffffff8143393d>] kobject_init_and_add+0x4d/0x90
[    6.987364]  [<ffffffff812067c9>] ? __kmalloc+0x169/0x370
[    6.993492]  [<ffffffff8102dbbc>] ? cache_add_dev+0x17c/0x4f0
[    7.000005]  [<ffffffff8102ddfa>] cache_add_dev+0x3ba/0x4f0
[    7.006327]  [<ffffffff821a87ca>] ? i8237A_init_ops+0x14/0x14
[    7.012842]  [<ffffffff821a87f8>] cache_sysfs_init+0x2e/0x61
[    7.019260]  [<ffffffff81002162>] do_one_initcall+0xf2/0x220
[    7.025679]  [<ffffffff810a4a29>] ? parse_args+0x2c9/0x450
[    7.031903]  [<ffffffff8219d1b1>] kernel_init_freeable+0x1c9/0x25b
[    7.038904]  [<ffffffff8219c8d2>] ? do_early_param+0x8a/0x8a
[    7.045322]  [<ffffffff8184d5e0>] ? rest_init+0x150/0x150
[    7.051447]  [<ffffffff8184d5ee>] kernel_init+0xe/0x100
[    7.057380]  [<ffffffff8187b87c>] ret_from_fork+0x7c/0xb0
[    7.063503]  [<ffffffff8184d5e0>] ? rest_init+0x150/0x150
[    7.069628] Code: 89 e5 53 48 89 fb 75 16 80 7f 3c 00 75 05 e8 d2 f9 ff ff 48 8b 43 58 48 2b 43 50 88 43 4e 5b 5d c3 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b
[    7.094960] RIP  [<ffffffff8143eea6>] memcpy+0x6/0x110
[    7.100856]  RSP <ffff880455a89cc8>
[    7.104864] ---[ end trace b5d3fdc6c6c28083 ]---
[    7.110142] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    7.110142]
[    7.120540] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:50:59 +01:00
Linus Torvalds b3a4bcaa5a IOMMU Updates for Linux v3.14
A few patches have been queued up for this merge window:
 
 	* Improvements for the ARM-SMMU driver
 	  (IOMMU_EXEC support, IOMMU group support)
 	* Updates and fixes for the shmobile IOMMU driver
 	* Various fixes to generic IOMMU code and the
 	  Intel IOMMU driver
 	* Some cleanups in IOMMU drivers (dev_is_pci() usage)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJS6XrCAAoJECvwRC2XARrjgy4P/itemtg2U+603Ldje8WcPo0E
 OCO/0VVSmCTYKUJDZY0hiVwqmhe5gFL3Hm/gGwkS0+UJenFXMmi+aVaPp4pCpgH+
 dL2HD3dIEvi14bisrdxG/8MdR6mIx0qzKtnZLkKSR4LXwucyLHvC/DaCoOytb7Yk
 7s+eEuo0hj0jAkiqSG/zLEtKElTEnoAAkLOjMy46orecJ5q4HusPZekLtWZs2ETe
 x3NS63Unb9g1iSQJWIA7HnQlxWIr2+iynoamHHJRiVFzqRF0W0sGvQY3auG0DSCn
 70WRNE1rKfEkfXMJxosRQ4394YUQdAkt8MBENNcJcC6E1n5PBi0cEZXH6mCnEIlG
 jXzIKUY9fz68ZboaqIxXv4Hb+JLlPXCvPBvQzIQiKRgVxd8nncEjn5I9MHdf+je5
 BmJlzJLJvP4cFvW8Hc8k2Oq101b1kEcSCLARWWvE9/bk9xIUyrqBkR4XjC0vb6qq
 1HbKVdZ7KFKCkBHy9xMpr7CUjKiDiiLeUmqlhyjcK9spicuNIZQnC11HemL6/USP
 oR6Ext9RGhvz+ch656+5+L6f6FURVP8/ywKiJ3RjmvXV5/fCYo3WMitOB2qzlWCy
 SYXAczAOMOdOo+1Dxbghrr+7HzUWPqgfPmntZEPGMZhfuZ6xXr+7pGLjAhHb4vcR
 SZxqkDo1cprqrR9KFAWC
 =YKLk
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU Updates from Joerg Roedel:
 "A few patches have been queued up for this merge window:

   - improvements for the ARM-SMMU driver (IOMMU_EXEC support, IOMMU
     group support)
   - updates and fixes for the shmobile IOMMU driver
   - various fixes to generic IOMMU code and the Intel IOMMU driver
   - some cleanups in IOMMU drivers (dev_is_pci() usage)"

* tag 'iommu-updates-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
  iommu/vt-d: Fix signedness bug in alloc_irte()
  iommu/vt-d: free all resources if failed to initialize DMARs
  iommu/vt-d, trivial: clean sparse warnings
  iommu/vt-d: fix wrong return value of dmar_table_init()
  iommu/vt-d: release invalidation queue when destroying IOMMU unit
  iommu/vt-d: fix access after free issue in function free_dmar_iommu()
  iommu/vt-d: keep shared resources when failed to initialize iommu devices
  iommu/vt-d: fix invalid memory access when freeing DMAR irq
  iommu/vt-d, trivial: simplify code with existing macros
  iommu/vt-d, trivial: use defined macro instead of hardcoding
  iommu/vt-d: mark internal functions as static
  iommu/vt-d, trivial: clean up unused code
  iommu/vt-d, trivial: check suitable flag in function detect_intel_iommu()
  iommu/vt-d, trivial: print correct domain id of static identity domain
  iommu/vt-d, trivial: refine support of 64bit guest address
  iommu/vt-d: fix resource leakage on error recovery path in iommu_init_domains()
  iommu/vt-d: fix a race window in allocating domain ID for virtual machines
  iommu/vt-d: fix PCI device reference leakage on error recovery path
  drm/msm: Fix link error with !MSM_IOMMU
  iommu/vt-d: use dedicated bitmap to track remapping entry allocation status
  ...
2014-01-29 20:00:13 -08:00
Alex Williamson 08336fd218 intel-iommu: fix off-by-one in pagetable freeing
dma_pte_free_level() has an off-by-one error when checking whether a pte
is completely covered by a range.  Take for example the case of
attempting to free pfn 0x0 - 0x1ff, ie.  512 entries covering the first
2M superpage.

The level_size() is 0x200 and we test:

  static void dma_pte_free_level(...
	...

	if (!(0 > 0 || 0x1ff < 0 + 0x200)) {
		...
	}

Clearly the 2nd test is true, which means we fail to take the branch to
clear and free the pagetable entry.  As a result, we're leaking
pagetables and failing to install new pages over the range.

This was found with a PCI device assigned to a QEMU guest using vfio-pci
without a VGA device present.  The first 1M of guest address space is
mapped with various combinations of 4K pages, but eventually the range
is entirely freed and replaced with a 2M contiguous mapping.
intel-iommu errors out with something like:

  ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083)

In this case 5c2b8003 is the pointer to the previous leaf page that was
neither freed nor cleared and 849c00083 is the superpage entry that
we're trying to replace it with.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Jiang Liu 9bdc531ec6 iommu/vt-d: free all resources if failed to initialize DMARs
Enhance intel_iommu_init() to free all resources if failed to
initialize DMAR hardware.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:44:30 +01:00