Commit 4d4bbd8526 ("mm, oom_reaper: skip mm structs with mmu
notifiers") prevented the oom reaper from unmapping private anonymous
memory with the oom reaper when the oom victim mm had mmu notifiers
registered.
The rationale is that doing mmu_notifier_invalidate_range_{start,end}()
around the unmap_page_range(), which is needed, can block and the oom
killer will stall forever waiting for the victim to exit, which may not
be possible without reaping.
That concern is real, but only true for mmu notifiers that have
blockable invalidate_range_{start,end}() callbacks. This patch adds a
"flags" field to mmu notifier ops that can set a bit to indicate that
these callbacks do not block.
The implementation is steered toward an expensive slowpath, such as
after the oom reaper has grabbed mm->mmap_sem of a still alive oom
victim.
[rientjes@google.com: mmu_notifier_invalidate_range_end() can also call the invalidate_range() must not block, fix comment]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801091339570.240101@chino.kir.corp.google.com
[akpm@linux-foundation.org: make mm_has_blockable_invalidate_notifiers() return bool, use rwsem_is_locked()]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1712141329500.74052@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that no more drivers rely on arbitrary early initialisation via an
of_iommu_init_fn hook, let's clean up the redundant remnants. The
IOMMU_OF_DECLARE() macro needs to remain for now, as the probe-deferral
mechanism has no other nice way to detect built-in drivers before they
have registered themselves, such that it can make the right decision.
Reviewed-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Having of_iommu_init() call ipmmu_init() via ipmmu_vmsa_iommu_of_setup()
does nothing that the subsys_initcall wouldn't do slightly later anyway,
since probe-deferral of masters means it is no longer critical to
register the driver super-early. Clean it up.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since the MSM IOMMU driver now probes via DT exclusively rather than
platform data, dependent masters should be deferred until the IOMMU
itself is ready. Thus we can do away with the early initialisation
hook to unconditionally claim the bus ops, and instead do that only
once an IOMMU is actually probed. Furthermore, this should also make
the driver safe for multiplatform kernels on non-MSM SoCs.
Reviewed-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If the CPU has support for 5-level paging enabled and the IOMMU also
supports 5-level paging then enable the 5-level paging mode for first-
level translations - used when SVM is enabled.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a check to verify IOMMU 5-level paging support. If the CPU supports
supports 5-level paging but the IOMMU does not support it then disable
SVM by not allocating PASID tables.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a check to verify IOMMU 1GB page support. If the CPU supports 1GB
pages but the IOMMU does not support it then disable SVM by not
allocating PASID tables.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Update the IOMMU default domain address width to 57 bits. This would
enable the IOMMU to do upto 5-levels of paging for second level
translations - IOVA translation requests without PASID.
Even though the maximum supported address width is being increased to
57, __iommu_calculate_agaw() would set the actual supported address
width to the maximum support available in IOMMU hardware.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Removing the early device registration hook overlooked the fact that
it only ran conditionally on a compatible device being present in the
DT. With exynos_iommu_init() now running as an unconditional initcall,
problems arise on non-Exynos systems when other IOMMU drivers find
themselves unable to install their ops on the platform bus, or at worst
the Exynos ops get called with someone else's domain and all hell breaks
loose.
The global ops/cache setup could probably all now be triggered from the
first IOMMU probe, as with dma_dev assigment, but for the time being the
simplest fix is to resurrect the logic from commit a7b67cd5d9
("iommu/exynos: Play nice in multi-platform builds") to explicitly check
the DT for the presence of an Exynos IOMMU before trying anything.
Fixes: 928055a01b ("iommu/exynos: Remove custom platform device registration code")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When exposing data access through debugfs, the correct
debugfs_create_*() functions must be used, depending on data type.
Remove all casts from data pointers passed to debugfs_create_*()
functions, as such casts prevent the compiler from flagging bugs.
omap_iommu.nr_tlb_entries is "int", hence casting to "u8 *" exposes only
a part of it. Fix this by using debugfs_create_u32() instead.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the few remaining bits of swiotlb glue towards their callers,
and remove the pointless on ia64 swiotlb variable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.
Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().
Hard-code the domain number as 0 for the AMD IOMMU driver.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Reviewed-by: Gary R Hook <gary.hook@amd.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
For PCI devices behind an aliasing PCIe-to-PCI/X bridge, the bridge
alias to DevFn 0.0 on the subordinate bus may match the original RID of
the device, resulting in the same SID being present in the device's
fwspec twice. This causes trouble later in arm_smmu_write_strtab_ent()
when we wind up visiting the STE a second time and find it already live.
Avoid the issue by giving arm_smmu_install_ste_for_dev() the cleverness
to skip over duplicates. It seems mildly counterintuitive compared to
preventing the duplicates from existing in the first place, but since
the DT and ACPI probe paths build their fwspecs differently, this is
actually the cleanest and most self-contained way to deal with it.
Cc: <stable@vger.kernel.org>
Fixes: 8f78515425 ("iommu/arm-smmu: Implement of_xlate() for SMMUv3")
Reported-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Tested-by: Jayachandran C. <jnair@caviumnetworks.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Kasan reports a double free when finalise_stage_fn fails: the io_pgtable
ops are freed by arm_smmu_domain_finalise and then again by
arm_smmu_domain_free. Prevent this by leaving pgtbl_ops empty on failure.
Cc: <stable@vger.kernel.org>
Fixes: 48ec83bcbc ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The 'early' argument of irq_domain_activate_irq() is actually used to
denote reservation mode. To avoid confusion, rename it before abuse
happens.
No functional change.
Fixes: 7249164346 ("genirq/irqdomain: Update irq_domain_ops.activate() signature")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Mikael Pettersson <mikpelinux@gmail.com>
Cc: Josh Poulson <jopoulso@microsoft.com>
Cc: Mihai Costache <v-micos@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-pci@vger.kernel.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: devel@linuxdriverproject.org
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Sakari Ailus <sakari.ailus@intel.com>,
Cc: linux-media@vger.kernel.org
The Tegra memory controller driver will now instruct the SMMU driver to
create groups, which will make it easier for device drivers to share an
IOMMU domain between multiple devices.
Initial Tegra186 support is also added in a separate driver.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEiOrDCAFJzPfAjcif3SOs138+s6EFAlo6tlMTHHRyZWRpbmdA
bnZpZGlhLmNvbQAKCRDdI6zXfz6zoR7MEACpoyPkV5fGz3z8rdsYaVZSA+upYEHw
CC4v5XslNCucmk57XIYTbAZRu7W5MuzP3hq+DoYM5B5EHxgt1a2YHNtjjg3eLx1T
U+8hjoOPOYu7FBVmUf4V6Buvmws90DxtfvqcfIScJJEAPgKsT/wfGXNstc88kDnO
c+huQxk5QZsyGBgt1sS61zlymaLpDlSoE6kScXnjM1K6XtUdZ+8zmoMdZ7Ka8DUv
kwBoYXgUpVminWpSrfuV7f3/8BtTSJ/b2rUPz7Ren0JJTFrhIi/ckeQ5YhTa46Dn
oqR0f6MuLRGADT+E5DfoF7otvE2Fvq0Uq/YlypIf9gtiit5mhPM7ZXb3LOwDGiCp
0mwljW9Uov7ZpFnb8b9SlVf1b+zDeeLxtARq4qzgky5OkWZp4hXZw+Os3ebXQYW3
d/MUfwdqgNKOccGRsRkk9OEdUiFGpGtInb0xM9OWBCHvdrXhbK9llZ3wnrE+M3SY
BELM+94YorqVPx5JTmiM7kTozVwpkvZQG6kU2J81Y55FI1ELQF7Mk0yzFs562tG5
2abc07xkSUCdC6zsrgNOy+AtuRH8oHccUfZRACLUhzHNCrvfr3MrEcEv8l4VRJp4
ZLfR8UjmtTI8YllBKdSdUtFB4nN5oyoIPCNsgwzXcTnKV5SzAyA/KuDW6IWoNU1L
6ekAxpeiHVnLKA==
=ncAl
-----END PGP SIGNATURE-----
Merge tag 'tegra-for-4.16-memory' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tegra/linux into next/drivers
Pull "memory: tegra: Changes for v4.16-rc1" from Thierry Reding:
The Tegra memory controller driver will now instruct the SMMU driver to
create groups, which will make it easier for device drivers to share an
IOMMU domain between multiple devices.
Initial Tegra186 support is also added in a separate driver.
* tag 'tegra-for-4.16-memory' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
iommu/tegra-smmu: Fix return value check in tegra_smmu_group_get()
iommu/tegra: Allow devices to be grouped
memory: tegra: Create SMMU display groups
memory: tegra: Add Tegra186 support
dt-bindings: memory: Add Tegra186 support
dt-bindings: misc: Add Tegra186 MISC registers bindings
In case of error, the function iommu_group_alloc() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().
Fixes: 7f4c9176f7 ("iommu/tegra: Allow devices to be grouped")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The result of iommu_group_get() was being blindly used in both
attach and detach which results in a dereference when trying
to work with an unknown device.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The AMD IOMMU specification Rev 3.00 (December 2016) introduces a
new Enhanced PPR Handling Support (EPHSup) bit in the MMIO register
offset 0030h (IOMMU Extended Feature Register).
When EPHSup=1, the IOMMU hardware requires the PPR bit of the
device table entry (DTE) to be set in order to support PPR for a
particular endpoint device.
Please see https://support.amd.com/TechDocs/48882_IOMMU.pdf for
this revision of the AMD IOMMU specification.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When an unknown type event occurs, the default information written to
the syslog should dump raw event data. This could provide insight into
the event that occurred.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Implement the ->device_group() and ->of_xlate() callbacks which are used
in order to group devices. Each group can then share a single domain.
This is implemented primarily in order to achieve the same semantics on
Tegra210 and earlier as on Tegra186 where the Tegra SMMU was replaced by
an ARM SMMU. Users of the IOMMU API can now use the same code to share
domains between devices, whereas previously they used to attach each
device individually.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The intel-iommu DMA ops fail to correctly handle scatterlists where
sg->offset is greater than PAGE_SIZE - the IOVA allocation is computed
appropriately based on the page-aligned portion of the offset, but the
mapping is set up relative to sg->page, which means it fails to actually
cover the whole buffer (and in the worst case doesn't cover it at all):
(sg->dma_address + sg->dma_len) ----+
sg->dma_address ---------+ |
iov_pfn------+ | |
| | |
v v v
iova: a b c d e f
|--------|--------|--------|--------|--------|
<...calculated....>
[_____mapped______]
pfn: 0 1 2 3 4 5
|--------|--------|--------|--------|--------|
^ ^ ^
| | |
sg->page ----+ | |
sg->offset --------------+ |
(sg->offset + sg->length) ----------+
As a result, the caller ends up overrunning the mapping into whatever
lies beyond, which usually goes badly:
[ 429.645492] DMAR: DRHD: handling fault status reg 2
[ 429.650847] DMAR: [DMA Write] Request device [02:00.4] fault addr f2682000 ...
Whilst this is a fairly rare occurrence, it can happen from the result
of intermediate scatterlist processing such as scatterwalk_ffwd() in the
crypto layer. Whilst that particular site could be fixed up, it still
seems worthwhile to bring intel-iommu in line with other DMA API
implementations in handling this robustly.
To that end, fix the intel_map_sg() path to line up the mapping
correctly (in units of MM pages rather than VT-d pages to match the
aligned_nrpages() calculation) regardless of the offset, and use
sg_phys() consistently for clarity.
Reported-by: Harsh Jain <Harsh@chelsio.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed by: Ashok Raj <ashok.raj@intel.com>
Tested by: Jacob Pan <jacob.jun.pan@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* Enforce MSI multiple IRQ alignment in AMD IOMMU
* VT-d PASID error handling fixes
* Add r8a7795 IPMMU support
* Manage runtime PM links on exynos at {add,remove}_device callbacks
* Fix Mediatek driver name to avoid conflict
* Add terminate support to qcom fault handler
* 64-bit IOVA optimizations
* Simplfy IOVA domain destruction, better use of rcache, and
skip anchor nodes on copy
* Convert to IOMMU TLB sync API in io-pgtable-arm{-v7s}
* Drop command queue lock when waiting for CMD_SYNC completion on
ARM SMMU implementations supporting MSI to cacheable memory
* iomu-vmsa cleanup inspired by missed IOTLB sync callbacks
* Fix sleeping lock with preemption disabled for RT
* Dual MMU support for TI DRA7xx DSPs
* Optional flush option on IOVA allocation avoiding overhead when
caller can try other options
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJaCgbBAAoJECObm247sIsiUp0P/16GwExsZFKqD0B7XBiBq+hZ
Q03XiPEnfnMBt05bw4dH8lnfpDPR1+6I3OzrxByc3VurfsBZ6IUh6oXPREcwdkNE
+iyS7/1RlXd7EJGsaY91fiL1Gm7iPQS/MJ50GcL288gdF68cAHygDoLPR0r1bHBC
PzjLhmgSYcqYQwRgnoclUe4/sNEIWWu3Z0trltNmiQ9eFEXW4WCc1/OzCoAmfxY7
EPiKynP+vCBTvzZJ/7m4oBpfciO/WuF8KhnNDjmL/lKGAQGGChiUyQq77pkdaUzm
6MjGfLAzO1xJ3DYM1ozDGBnVXR3AlDE2f+MlveT9bwVHkFLB3WF5atb70Cx70Zju
Lp2ZC2sm42AlHN0WqLGU9Q1pGPhZwXGkoF5kAxcsSh1rWC+bjfjtelpIg9PLngMi
rBAgRLLveGMM3jQcWv4259dZkByJOyFurQwYzHToacj+bE0hGzLv5cFWS/AD8kIE
rdxFBdNWlMARDWoJO+jYOzEr/QTDXovLgWsNvS2C2QDfmdXL1s0yqjdU9ahC3wWJ
pi//PydoXoyuEbN3M04ccjDNCZLX+JYIvvaZAxF5gGVt99VCKhf2Rw2wtLPviPNN
K+8NmUp1xAZX7Ak47HDNHIS3ZVB6CgP5pmoBvqZW1k5yyf5kvbvi2axQta/cyr6a
4ey4GeRpF+iad1n1U/WX
=o4qF
-----END PGP SIGNATURE-----
Merge tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio
Pull IOMMU updates from Alex Williamson:
"As Joerg mentioned[1], he's out on paternity leave through the end of
the year and I'm filling in for him in the interim:
- Enforce MSI multiple IRQ alignment in AMD IOMMU
- VT-d PASID error handling fixes
- Add r8a7795 IPMMU support
- Manage runtime PM links on exynos at {add,remove}_device callbacks
- Fix Mediatek driver name to avoid conflict
- Add terminate support to qcom fault handler
- 64-bit IOVA optimizations
- Simplfy IOVA domain destruction, better use of rcache, and skip
anchor nodes on copy
- Convert to IOMMU TLB sync API in io-pgtable-arm{-v7s}
- Drop command queue lock when waiting for CMD_SYNC completion on ARM
SMMU implementations supporting MSI to cacheable memory
- iomu-vmsa cleanup inspired by missed IOTLB sync callbacks
- Fix sleeping lock with preemption disabled for RT
- Dual MMU support for TI DRA7xx DSPs
- Optional flush option on IOVA allocation avoiding overhead when
caller can try other options
[1] https://lkml.org/lkml/2017/10/22/72"
* tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio: (54 commits)
iommu/iova: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->fq
iommu/mediatek: Fix driver name
iommu/ipmmu-vmsa: Hook up r8a7795 DT matching code
iommu/ipmmu-vmsa: Allow two bit SL0
iommu/ipmmu-vmsa: Make IMBUSCTR setup optional
iommu/ipmmu-vmsa: Write IMCTR twice
iommu/ipmmu-vmsa: IPMMU device is 40-bit bus master
iommu/ipmmu-vmsa: Make use of IOMMU_OF_DECLARE()
iommu/ipmmu-vmsa: Enable multi context support
iommu/ipmmu-vmsa: Add optional root device feature
iommu/ipmmu-vmsa: Introduce features, break out alias
iommu/ipmmu-vmsa: Unify ipmmu_ops
iommu/ipmmu-vmsa: Clean up struct ipmmu_vmsa_iommu_priv
iommu/ipmmu-vmsa: Simplify group allocation
iommu/ipmmu-vmsa: Unify domain alloc/free
iommu/ipmmu-vmsa: Fix return value check in ipmmu_find_group_dma()
iommu/vt-d: Clear pasid table entry when memory unbound
iommu/vt-d: Clear Page Request Overflow fault bit
iommu/vt-d: Missing checks for pasid tables if allocation fails
iommu/amd: Limit the IOVA page range to the specified addresses
...
get_cpu_ptr() disabled preemption and returns the ->fq object of the
current CPU. raw_cpu_ptr() does the same except that it not disable
preemption which means the scheduler can move it to another CPU after it
obtained the per-CPU object.
In this case this is not bad because the data structure itself is
protected with a spin_lock. This change shouldn't matter however on RT
it does because the sleeping lock can't be accessed with disabled
preemption.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Reported-by: vinadhy@gmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
There exist two Mediatek iommu drivers for the two different
generations of the device. But both drivers have the same name
"mtk-iommu". This breaks the registration of the second driver:
Error: Driver 'mtk-iommu' is already registered, aborting...
Fix this by changing the name for first generation to
"mtk-iommu-v1".
Fixes: b17336c55d ("iommu/mediatek: add support for mtk iommu generation one HW")
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tie in r8a7795 features and update the IOMMU_OF_DECLARE
compat string to include the updated compat string.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Introduce support for two bit SL0 bitfield in IMTTBCR
by using a separate feature flag.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Introduce a feature to allow opt-out of setting up
IMBUSCR. The default case is unchanged.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Write IMCTR both in the root device and the leaf node.
To allow access of IMCTR introduce the following function:
- ipmmu_ctx_write_all()
While at it also rename context functions:
- ipmmu_ctx_read() -> ipmmu_ctx_read_root()
- ipmmu_ctx_write() -> ipmmu_ctx_write_root()
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The r8a7795 IPMMU supports 40-bit bus mastering. Both
the coherent DMA mask and the streaming DMA mask are
set to unlock the 40-bit address space for coherent
allocations and streaming operations.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Hook up IOMMU_OF_DECLARE() support in case CONFIG_IOMMU_DMA
is enabled. The only current supported case for 32-bit ARM
is disabled, however for 64-bit ARM usage of OF is required.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add support for up to 8 contexts. Each context is mapped to one
domain. One domain is assigned one or more slave devices. Contexts
are allocated dynamically and slave devices are grouped together
based on which IPMMU device they are connected to. This makes slave
devices tied to the same IPMMU device share the same IOVA space.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add root device handling to the IPMMU driver by allowing certain
DT compat strings to enable has_cache_leaf_nodes that in turn will
support both root devices with interrupts and leaf devices that
face the actual IPMMU consumer devices.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Introduce struct ipmmu_features to track various hardware
and software implementation changes inside the driver for
different kinds of IPMMU hardware. Add use_ns_alias_offset
as a first example of a feature to control if the secure
register bank offset should be used or not.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The remaining difference between the ARM-specific and iommu-dma ops is
in the {add,remove}_device implementations, but even those have some
overlap and duplication. By stubbing out the few arm_iommu_*() calls,
we can get rid of the rest of the inline #ifdeffery to both simplify the
code and improve build coverage.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that the IPMMU instance pointer is the only thing remaining in the
private data structure, we no longer need the extra level of indirection
and can simply stash that directlty in the fwspec.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We go through quite the merry dance in order to find masters behind the
same IPMMU instance, so that we can ensure they are grouped together.
None of which is really necessary, since the master's private data
already points to the particular IPMMU it is associated with, and that
IPMMU instance data is the perfect place to keep track of a per-instance
group directly.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We have two implementations for ipmmu_ops->alloc depending on
CONFIG_IOMMU_DMA, the difference being whether they accept the
IOMMU_DOMAIN_DMA type or not. However, iommu_dma_get_cookie() is
guaranteed to return an error when !CONFIG_IOMMU_DMA, so if
ipmmu_domain_alloc_dma() was actually checking and handling the return
value correctly, it would behave the same as ipmmu_domain_alloc()
anyway.
Similarly for freeing; iommu_put_dma_cookie() is robust by design.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In case of error, the function iommu_group_get() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should be
replaced with NULL test.
Fixes: 3ae4729202 ("iommu/ipmmu-vmsa: Add new IOMMU_DOMAIN_DMA ops")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In intel_svm_unbind_mm(), pasid table entry must be cleared during
svm free. Otherwise, hardware may be set up with a wild pointer.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Currently Page Request Overflow bit in IOMMU Fault Status register
is not cleared. Not clearing this bit would mean that any future
page-request is going to be automatically dropped by IOMMU.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
intel_svm_alloc_pasid_tables() might return an error but never be
checked by the callers. Later when intel_svm_bind_mm() is called,
there are no checks for valid pasid tables before enabling them.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The extent of pages specified when applying a reserved region should
include up to the last page of the range, but not the page following
the range.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Fixes: 8d54d6c8b8 ('iommu/amd: Implement apply_dm_region call-back')
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This is quite useful for debugging. Currently, always TERMINATE the
translation when the fault handler returns (since this is all we need
for debugging drivers). But I expect the SVM work should eventually
let us do something more clever.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Variable flush_addr is being assigned but is never read; it
is redundant and can be removed. Cleans up the clang warning:
drivers/iommu/amd_iommu.c:2388:2: warning: Value stored to 'flush_addr'
is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
On an is_allocated() interrupt index, we ALIGN() the current index and
then increment it via the for loop, guaranteeing that it is no longer
aligned for alignments >1. We instead need to align the next index,
to guarantee forward progress, moving the increment-only to the case
where the index was found to be unallocated.
Fixes: 37946d95fc ('iommu/amd: Add align parameter to alloc_irq_index()')
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While CMD_SYNC is unlikely to complete immediately such that we never go
round the polling loop, with a lightly-loaded queue it may still do so
long before the delay period is up. If we have no better completion
notifier, use similar logic as we have for SMMUv2 to spin a number of
times before each backoff, so that we have more chance of catching syncs
which complete relatively quickly and avoid delaying unnecessarily.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We have separate (identical) timeout values for polling for a queue to
drain and waiting for an MSI to signal CMD_SYNC completion. In reality,
we only wait for the command queue to drain if we're waiting on a sync,
so just merged these two timeouts into a single constant.
Signed-off-by: Will Deacon <will.deacon@arm.com>
arm_smmu_cmdq_issue_sync is a little unwieldy now that it supports both
MSI and event-based polling, so split it into two functions to make things
easier to follow.
Signed-off-by: Will Deacon <will.deacon@arm.com>
As an IRQ, the CMD_SYNC interrupt is not particularly useful, not least
because we often need to wait for sync completion within someone else's
IRQ handler anyway. However, when the SMMU is both coherent and supports
MSIs, we can have a lot more fun by not using it as an interrupt at all.
Following the example suggested in the architecture and using a write
targeting normal memory, we can let callers wait on a status variable
outside the lock instead of having to stall the entire queue or even
touch MMIO registers. Since multiple sync commands are guaranteed to
complete in order, a simple incrementing sequence count is all we need
to unambiguously support any realistic number of overlapping waiters.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The cmdq-sync interrupt is never going to be particularly useful, since
for stage 1 DMA at least we'll often need to wait for sync completion
within someone else's IRQ handler, thus have to implement polling
anyway. Beyond that, the overhead of taking an interrupt, then still
having to grovel around in the queue to figure out *which* sync command
completed, doesn't seem much more attractive than simple polling either.
Furthermore, if an implementation both has wired interrupts and supports
MSIs, then we don't want to be taking the IRQ unnecessarily if we're
using the MSI write to update memory. Let's just make life simpler by
not even bothering to claim it in the first place.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
CMD_SYNC already has a bit of special treatment here and there, but as
we're about to extend it with more functionality for completing outside
the CMDQ lock, things are going to get rather messy if we keep trying to
cram everything into a single generic command interface. Instead, let's
break out the issuing of CMD_SYNC into its own specific helper where
upcoming changes will have room to breathe.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Slightly confusingly, when reporting a mismatch of the ID register
value, we still refer to the IORT COHACC override flag as the
"dma-coherent property" if we booted with ACPI. Update the message
to be firmware-agnostic in line with SMMUv2.
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
According to Spec, it is ILLEGAL to set STE.S1STALLD if STALL_MODEL
is not 0b00, which means we should not disable stall mode if stall
or terminate mode is not configuable.
Meanwhile, it is also ILLEGAL when STALL_MODEL==0b10 && CD.S==0 which
means if stall mode is force we should always set CD.S.
As Jean-Philippe's suggestion, this patch introduce a feature bit
ARM_SMMU_FEAT_STALL_FORCE, which means smmu only supports stall force.
Therefore, we can avoid the ILLEGAL setting of STE.S1STALLD.by checking
ARM_SMMU_FEAT_STALL_FORCE.
This patch keeps the ARM_SMMU_FEAT_STALLS as the meaning of stall supported
(force or configuable) to easy to expand the future function, i.e. we can
only use ARM_SMMU_FEAT_STALLS to check whether we should register fault
handle or enable master can_stall, etc to supporte platform SVM.
The feature bit, STE.S1STALLD and CD.S setting will be like:
STALL_MODEL FEATURE S1STALLD CD.S
0b00 ARM_SMMU_FEAT_STALLS 0b1 0b0
0b01 !ARM_SMMU_FEAT_STALLS && !ARM_SMMU_FEAT_STALL_FORCE 0b0 0b0
0b10 ARM_SMMU_FEAT_STALLS && ARM_SMMU_FEAT_STALL_FORCE 0b0 0b1
after apply this patch.
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM SMMU identity mapping performance was poor compared with the
DMA mode. It was found that enable caching would restore the performance
back to normal. The S2CRB_TLBEN bit in the ACR register would allow for
caching of the stream to context register bypass transaction information.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Feng Kan <fkan@apm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SMMUv3 architecture permits caching of data structures deemed to be
"reachable" by the SMU, which includes STEs marked as invalid. When
transitioning an STE to a bypass/fault configuration at init or detach
time, we mistakenly elide the CMDQ_OP_CFGI_STE operation in some cases,
therefore potentially leaving the old STE state cached in the SMMU.
This patch fixes the problem by ensuring that we perform the
CMDQ_OP_CFGI_STE operation irrespective of the validity of the previous
STE.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the kernel headers have synced with the relevant upstream
ACPICA updates, it's time to clean up the temporary local definitions.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The function only sends the flush command to the IOMMU(s),
but does not wait for its completion when it returns. Fix
that.
Fixes: 601367d76b ('x86/amd-iommu: Remove iommu_flush_domain function')
Cc: stable@vger.kernel.org # >= 2.6.33
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since IOVA allocation failure is not unusual case we need to flush
CPUs' rcache in hope we will succeed in next round.
However, it is useful to decide whether we need rcache flush step because
of two reasons:
- Not scalability. On large system with ~100 CPUs iterating and flushing
rcache for each CPU becomes serious bottleneck so we may want to defer it.
- free_cpu_cached_iovas() does not care about max PFN we are interested in.
Thus we may flush our rcaches and still get no new IOVA like in the
commonly used scenario:
if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);
if (!iova)
iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
PCI devices a SAC address
2. alloc_iova() fails due to full 32-bit space
3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
throws entries away for nothing and alloc_iova() fails again
4. Next alloc_iova_fast() call cannot take advantage of rcache since we
have just defeated caches. In this case we pick the slowest option
to proceed.
This patch reworks flushed_rcache local flag to be additional function
argument instead and control rcache flush step. Also, it updates all users
to do the flush as the last chance.
Signed-off-by: Tomasz Nowicki <Tomasz.Nowicki@caviumnetworks.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Exynos SYSMMU registers standard platform device with sysmmu_of_match
table, what means that this table is accessed every time a new platform
device is registered in a system. This might happen also after the boot,
so the table must not be attributed as initconst to avoid potential kernel
oops caused by access to freed memory.
Fixes: 6b21a5db36 ("iommu/exynos: Support for device tree")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When SME memory encryption is active it will rely on SWIOTLB to handle
DMA for devices that cannot support the addressing requirements of
having the encryption mask set in the physical address. The IOMMU
currently disables SWIOTLB if it is not running in passthrough mode.
This is not desired as non-PCI devices attempting DMA may fail. Update
the code to check if SME is active and not disable SWIOTLB.
Fixes: 2543a786aa ("iommu/amd: Allow the AMD IOMMU to work with memory encryption")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make use of the new alignment capability of
alloc_irq_index() to enforce IRQ index alignment
for MSI.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2b32450634 ('iommu/amd: Add routines to manage irq remapping tables')
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For multi-MSI IRQ ranges the IRQ index needs to be aligned
to the power-of-two of the requested IRQ count. Extend the
alloc_irq_index() function to allow such an allocation.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2b32450634 ('iommu/amd: Add routines to manage irq remapping tables')
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Variable did_old is unsigned so checking whether it is
greater or equal to zero is not necessary.
Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The notifier function will take the dmar_global_lock too, so
lockdep complains about inverse locking order when the
notifier is registered under the dmar_global_lock.
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Fixes: 59ce0515cd ('iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that the core API issues its own post-unmap TLB sync call, push that
operation out from the io-pgtable-arm-v7s internals into the users. For
now, we leave the invalidation implicit in the unmap operation, since
none of the current users would benefit much from any change to that.
Note that the conversion of msm_iommu is implicit, since that apparently
has no specific TLB sync operation anyway.
CC: Yong Wu <yong.wu@mediatek.com>
CC: Rob Clark <robdclark@gmail.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that the core API issues its own post-unmap TLB sync call, push that
operation out from the io-pgtable-arm internals into the users. For now,
we leave the invalidation implicit in the unmap operation, since none of
the current users would benefit much from any change to that.
CC: Magnus Damm <damm+renesas@opensource.se>
CC: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Anchor nodes are not reserved IOVAs in the way that copy_reserved_iova()
cares about - while the failure from reserve_iova() is benign since the
target domain will already have its own anchor, we still don't want to
be triggering spurious warnings.
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Fixes: bb68b2fbfb ('iommu/iova: Add rbtree anchor node')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When devices with different DMA masks are using the same domain, or for
PCI devices where we usually try a speculative 32-bit allocation first,
there is a fair possibility that the top PFN of the rcache stack at any
given time may be unsuitable for the lower limit, prompting a fallback
to allocating anew from the rbtree. Consequently, we may end up
artifically increasing pressure on the 32-bit IOVA space as unused IOVAs
accumulate lower down in the rcache stacks, while callers with 32-bit
masks also impose unnecessary rbtree overhead.
In such cases, let's try a bit harder to satisfy the allocation locally
first - scanning the whole stack should still be relatively inexpensive.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When popping a pfn from an rcache, we are currently checking it directly
against limit_pfn for viability. Since this represents iova->pfn_lo, it
is technically possible for the corresponding iova->pfn_hi to be greater
than limit_pfn. Although we generally get away with it in practice since
limit_pfn is typically a power-of-two boundary and the IOVAs are
size-aligned, it's pretty trivial to make the iova_rcache_get() path
take the allocation size into account for complete safety.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All put_iova_domain() should have to worry about is freeing memory - by
that point the domain must no longer be live, so the act of cleaning up
doesn't need to be concurrency-safe or maintain the rbtree in a
self-consistent state. There's no need to waste time with locking or
emptying the rcache magazines, and we can just use the postorder
traversal helper to clear out the remaining rbtree entries in-place.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The logic of __get_cached_rbnode() is a little obtuse, but then
__get_prev_node_of_cached_rbnode_or_last_node_and_update_limit_pfn()
wouldn't exactly roll off the tongue...
Now that we have the invariant that there is always a valid node to
start searching downwards from, everything gets a bit easier to follow
if we simplify that function to do what it says on the tin and return
the cached node (or anchor node as appropriate) directly. In turn, we
can then deduplicate the rb_prev() and limit_pfn logic into the main
loop itself, further reduce the amount of code under the lock, and
generally make the inner workings a bit less subtle.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a permanent dummy IOVA reservation to the rbtree, such that we can
always access the top of the address space instantly. The immediate
benefit is that we remove the overhead of the rb_last() traversal when
not using the cached node, but it also paves the way for further
simplifications.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that the cached node optimisation can apply to all allocations, the
couple of users which were playing tricks with dma_32bit_pfn in order to
benefit from it can stop doing so. Conversely, there is also no need for
all the other users to explicitly calculate a 'real' 32-bit PFN, when
init_iova_domain() can happily do that itself from the page granularity.
CC: Thierry Reding <thierry.reding@gmail.com>
CC: Jonathan Hunter <jonathanh@nvidia.com>
CC: David Airlie <airlied@linux.ie>
CC: Sudeep Dutt <sudeep.dutt@intel.com>
CC: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: use iova_shift(), rewrote commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.
Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.
Inspired by work by Zhen Lei <thunder.leizhen@huawei.com>.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The mask for calculating the padding size doesn't change, so there's no
need to recalculate it every loop iteration. Furthermore, Once we've
done that, it becomes clear that we don't actually need to calculate a
padding size at all - by flipping the arithmetic around, we can just
combine the upper limit, size, and mask directly to check against the
lower limit.
For an arm64 build, this alone knocks 20% off the object code size of
the entire alloc_iova() function!
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: simplified more of the arithmetic, rewrote commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Checking the IOVA bounds separately before deciding which direction to
continue the search (if necessary) results in redundantly comparing both
pfns twice each. GCC can already determine that the final comparison op
is redundant and optimise it down to 3 in total, but we can go one
further with a little tweak of the ordering (which makes the intent of
the code that much cleaner as a bonus).
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Zhen Lei <thunder.leizhen@huawei.com>
Tested-by: Nate Watterson <nwatters@codeaurora.org>
[rm: rewrote commit message to clarify]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
pr_err() messages should end with a new-line to avoid other messages
being concatenated. So replace '/n' with '\n'.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Fixes: 45a01c4293 ('iommu/amd: Add function copy_dev_tables()')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Fix the commit 81b3c25218 ("iommu/io-pgtable: Introduce explicit
coherency"). If there is no IO_PGTABLE_QUIRK_NO_DMA, we should call
dma_sync_single_for_device for cache synchronization.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Fixes: 81b3c25218 ('iommu/io-pgtable: Introduce explicit coherency')
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With the upcoming reservation/management scheme, early activation will
assign a special vector. The final activation at request_irq() assigns a
real vector, which needs to be updated in the tables.
Split out the reconfiguration code in set_affinity and use it for
reactivation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213155.944883733@linutronix.de
With the upcoming reservation/management scheme, early activation will
assign a special vector. The final activation at request_irq() assigns a
real vector, which needs to be updated in the tables.
Split out the reconfiguration code in set_affinity and use it for
reactivation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213155.853028808@linutronix.de
The irq_domain_ops.activate() callback has no return value and no way to
tell the function that the activation is early.
The upcoming changes to support a reservation scheme which allows to assign
interrupt vectors on x86 only when the interrupt is actually requested
requires:
- A return value, so activation can fail at request_irq() time
- Information that the activate invocation is early, i.e. before
request_irq().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213152.848490816@linutronix.de
of_pci_iommu_init() tries to be clever and stop its alias walk at the
device represented by master_np, in case of weird PCI topologies where
the bridge to the IOMMU and the rest of the system is not at the root.
It turns out this is a bit short-sighted, since there are plenty of
other callers of pci_for_each_dma_alias() which would also need the same
behaviour in that situation, and the only platform so far with such a
topology (Cavium ThunderX2) already solves it more generally via a PCI
quirk. As this check is effectively redundant, and returning a boolean
value as an int is a bit broken anyway, let's just get rid of it.
Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Fixes: d87beb7492 ("iommu/of: Handle PCI aliases properly")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If NO_DMA=y:
warning: (IPMMU_VMSA && ARM_SMMU && ARM_SMMU_V3 && QCOM_IOMMU) selects IOMMU_IO_PGTABLE_LPAE which has unmet direct dependencies (IOMMU_SUPPORT && HAS_DMA && (ARM || ARM64 || COMPILE_TEST && !GENERIC_ATOMIC64))
and
drivers/iommu/io-pgtable-arm.o: In function `__arm_lpae_sync_pte':
io-pgtable-arm.c:(.text+0x206): undefined reference to `bad_dma_ops'
drivers/iommu/io-pgtable-arm.o: In function `__arm_lpae_free_pages':
io-pgtable-arm.c:(.text+0x6a6): undefined reference to `bad_dma_ops'
drivers/iommu/io-pgtable-arm.o: In function `__arm_lpae_alloc_pages':
io-pgtable-arm.c:(.text+0x812): undefined reference to `bad_dma_ops'
io-pgtable-arm.c:(.text+0x81c): undefined reference to `bad_dma_ops'
io-pgtable-arm.c:(.text+0x862): undefined reference to `bad_dma_ops'
drivers/iommu/io-pgtable-arm.o: In function `arm_lpae_run_tests':
io-pgtable-arm.c:(.init.text+0x86): undefined reference to `alloc_io_pgtable_ops'
io-pgtable-arm.c:(.init.text+0x47c): undefined reference to `free_io_pgtable_ops'
drivers/iommu/qcom_iommu.o: In function `qcom_iommu_init_domain':
qcom_iommu.c:(.text+0x1ce): undefined reference to `alloc_io_pgtable_ops'
drivers/iommu/qcom_iommu.o: In function `qcom_iommu_domain_free':
qcom_iommu.c:(.text+0x754): undefined reference to `free_io_pgtable_ops'
QCOM_IOMMU selects IOMMU_IO_PGTABLE_LPAE, which bypasses its dependency
on HAS_DMA. Make QCOM_IOMMU depend on HAS_DMA to fix this.
Fixes: 0ae349a0f3 ("iommu/qcom: Add qcom_iommu")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
add_device is a bit more suitable for establishing runtime PM links than
the xlate callback. This change also makes it possible to implement proper
cleanup - in remove_device callback.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Building with gcc-4.6 results in this warning due to
dmar_table_print_dmar_entry being inlined as in newer compiler versions:
WARNING: vmlinux.o(.text+0x5c8bee): Section mismatch in reference from the function dmar_walk_remapping_entries() to the function .init.text:dmar_table_print_dmar_entry()
The function dmar_walk_remapping_entries() references
the function __init dmar_table_print_dmar_entry().
This is often because dmar_walk_remapping_entries lacks a __init
annotation or the annotation of dmar_table_print_dmar_entry is wrong.
This removes the __init annotation to avoid the warning. On compilers
that don't show the warning today, this should have no impact since the
function gets inlined anyway.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
parisc:allmodconfig, xtensa:allmodconfig, and possibly others generate
the following Kconfig warning.
warning: (IPMMU_VMSA && ARM_SMMU && ARM_SMMU_V3 && QCOM_IOMMU) selects
IOMMU_IO_PGTABLE_LPAE which has unmet direct dependencies (IOMMU_SUPPORT &&
HAS_DMA && (ARM || ARM64 || COMPILE_TEST && !GENERIC_ATOMIC64))
IOMMU_IO_PGTABLE_LPAE depends on (COMPILE_TEST && !GENERIC_ATOMIC64),
so any configuration option selecting it needs to have the same dependencies.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A client user instantiates and attaches to an iommu_domain to
program the OMAP IOMMU associated with the domain. The iommus
programmed by a client user are bound with the iommu_domain
through the user's device archdata. The OMAP IOMMU driver
currently supports only one IOMMU per IOMMU domain per user.
The OMAP IOMMU driver has been enhanced to support allowing
multiple IOMMUs to be programmed by a single client user. This
support is being added mainly to handle the DSP subsystems on
the DRA7xx SoCs, which have two MMUs within the same subsystem.
These MMUs provide translations for a processor core port and
an internal EDMA port. This support allows both the MMUs to
be programmed together, but with each one retaining it's own
internal state objects. The internal EDMA block is managed by
the software running on the DSPs, and this design provides
on-par functionality with previous generation OMAP DSPs where
the EDMA and the DSP core shared the same MMU.
The multiple iommus are expected to be provided through a
sentinel terminated array of omap_iommu_arch_data objects
through the client user's device archdata. The OMAP driver
core is enhanced to loop through the array of attached
iommus and program them for all common operations. The
sentinel-terminated logic is used so as to not change the
omap_iommu_arch_data structure.
NOTE:
1. The IOMMU group and IOMMU core registration is done only for
the DSP processor core MMU even though both MMUs are represented
by their own platform device and are probed individually. The
IOMMU device linking uses this registered MMU device. The struct
iommu_device for the second MMU is not used even though memory
for it is allocated.
2. The OMAP IOMMU debugfs code still continues to operate on
individual IOMMU objects.
Signed-off-by: Suman Anna <s-anna@ti.com>
[t-kristo@ti.com: ported support to 4.13 based kernel]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The OMAP IOMMU driver allows only a single device (eg: a rproc
device) to be attached per domain. The current attach detection
logic relies on a check for an attached iommu for the respective
client device. Change this logic to use the client device pointer
instead in preparation for supporting multiple iommu devices to be
bound to a single iommu domain, and thereby to a client device.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Slightly more changes than usual this time:
- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now
tries to preserve the mappings of the kernel so that master
aborts for devices are avoided. Master aborts cause some
devices to fail in the kdump kernel, so this code makes the
dump more likely to succeed when AMD IOMMU is enabled.
- Common flush queue implementation for IOVA code users. The
code is still optional, but AMD and Intel IOMMU drivers had
their own implementation which is now unified.
- Finish support for iommu-groups. All drivers implement this
feature now so that IOMMU core code can rely on it.
- Finish support for 'struct iommu_device' in iommu drivers. All
drivers now use the interface.
- New functions in the IOMMU-API for explicit IO/TLB flushing.
This will help to reduce the number of IO/TLB flushes when
IOMMU drivers support this interface.
- Support for mt2712 in the Mediatek IOMMU driver
- New IOMMU driver for QCOM hardware
- System PM support for ARM-SMMU
- Shutdown method for ARM-SMMU-v3
- Some constification patches
- Various other small improvements and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZtCFNAAoJECvwRC2XARrjZnQP/AxC/ezQpq82HbegF4sM/cVE
Ep7TeTqodEl75FS/6txe2wU0pwodqk/LB9ajfQZUbE1w8vKsNEqi5qf4ZYHGoxYI
5bWyjJBzKIlwENH5lsBpQNt6XLevrYmRsFy7F0tRYy+qPQq8k+js2i7/XkCL3q7L
3xklF847RRoITaTOhhaROx1pF23dSMEsS2XGuWHcZfjORtep4wcFKzd/2SvlCWCo
P2bRU7jBzfJuuGSA80gaiUbDmrULTUfYuZNp7njASzCgsDmagERtvDEpdoXPNNSp
u6s4LjU1Dp3fgr6g6cFRO7B6JUbWd619nwo9so/c/wZN54yEngBF9EyeeF3mv2O5
ZbM2mOW3RlZcjxFT/AC8G4cZwwP6MpCEQOdqknoqc6ZQwcDqwN0o9I4+po0wsiAU
89ijZZe9Mx0p9lNpihaBEB1erAUWPo5Obh62zo80W3h6x9WzkGQWM+PyFK2DYoaC
8biEZzcc21sLEHvXQkcEGJSKrihHr9sluOqvxmCw5QAkYIFAeZRoeH7JtZWjVCnr
T3XvaG1G1Aw6tS7ErxufdKawREAGki0Rm9i1baiH9sqNj5rllM01Y+PgU6E21Nbg
iZp9gJLjfwM4vhYLlovvQK5PRoOBsCkyCpEI4GJqjLeam5p/WN06CFFf0ifQofYr
qDPCVDkWHWV8nugFFKE7
=EVh9
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"Slightly more changes than usual this time:
- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now tries
to preserve the mappings of the kernel so that master aborts for
devices are avoided. Master aborts cause some devices to fail in
the kdump kernel, so this code makes the dump more likely to
succeed when AMD IOMMU is enabled.
- common flush queue implementation for IOVA code users. The code is
still optional, but AMD and Intel IOMMU drivers had their own
implementation which is now unified.
- finish support for iommu-groups. All drivers implement this feature
now so that IOMMU core code can rely on it.
- finish support for 'struct iommu_device' in iommu drivers. All
drivers now use the interface.
- new functions in the IOMMU-API for explicit IO/TLB flushing. This
will help to reduce the number of IO/TLB flushes when IOMMU drivers
support this interface.
- support for mt2712 in the Mediatek IOMMU driver
- new IOMMU driver for QCOM hardware
- system PM support for ARM-SMMU
- shutdown method for ARM-SMMU-v3
- some constification patches
- various other small improvements and fixes"
* tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits)
iommu/vt-d: Don't be too aggressive when clearing one context entry
iommu: Introduce Interface for IOMMU TLB Flushing
iommu/s390: Constify iommu_ops
iommu/vt-d: Avoid calling virt_to_phys() on null pointer
iommu/vt-d: IOMMU Page Request needs to check if address is canonical.
arm/tegra: Call bus_set_iommu() after iommu_device_register()
iommu/exynos: Constify iommu_ops
iommu/ipmmu-vmsa: Make ipmmu_gather_ops const
iommu/ipmmu-vmsa: Rereserving a free context before setting up a pagetable
iommu/amd: Rename a few flush functions
iommu/amd: Check if domain is NULL in get_domain() and return -EBUSY
iommu/mediatek: Fix a build warning of BIT(32) in ARM
iommu/mediatek: Fix a build fail of m4u_type
iommu: qcom: annotate PM functions as __maybe_unused
iommu/pamu: Fix PAMU boot crash
memory: mtk-smi: Degrade SMI init to module_init
iommu/mediatek: Enlarge the validate PA range for 4GB mode
iommu/mediatek: Disable iommu clock when system suspend
iommu/mediatek: Move pgtable allocation into domain_alloc
iommu/mediatek: Merge 2 M4U HWs into one iommu domain
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
EG76ridTolUFVV+wzJD9
=9cLP
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add enhanced Downstream Port Containment support, which prints more
details about Root Port Programmed I/O errors (Dongdong Liu)
- add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)
- add MediaTek MT2712 and MT7622 support (Ryder Lee)
- add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)
- add Qualcom IPQ8074 support (Varadarajan Narayanan)
- add R-Car r8a7743/5 device tree support (Biju Das)
- add Rockchip per-lane PHY support for better power management (Shawn
Lin)
- fix IRQ mapping for hot-added devices by replacing the
pci_fixup_irqs() boot-time design with a host bridge hook called at
probe-time (Lorenzo Pieralisi, Matthew Minter)
- fix race when enabling two devices that results in upstream bridge
not being enabled correctly (Srinath Mannam)
- fix pciehp power fault infinite loop (Keith Busch)
- fix SHPC bridge MSI hotplug events by enabling bus mastering
(Aleksandr Bezzubikov)
- fix a VFIO issue by correcting PCIe capability sizes (Alex
Williamson)
- fix an INTD issue on Xilinx and possibly other drivers by unifying
INTx IRQ domain support (Paul Burton)
- avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
Roedel)
- allow APM X-Gene device assignment to guests by adding an ACS quirk
(Feng Kan)
- fix driver crashes by disabling Extended Tags on Broadcom HT2100
(Extended Tags support is required for PCIe Receivers but not
Requesters, and we now enable them by default when Requesters support
them) (Sinan Kaya)
- fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
use the real Requester ID (not a phantom RID) (Robin Murphy)
- prevent assignment of Intel VMD children to guests (which may be
supported eventually, but isn't yet) by not associating an IOMMU with
them (Jon Derrick)
- fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
Bauer)
- fix a Function-Level Reset issue with Intel 750 NVMe by waiting
longer (up to 60sec instead of 1sec) for device to become ready
(Sinan Kaya)
- fix a Function-Level Reset issue on iProc Stingray by working around
hardware defects in the CRS implementation (Oza Pawandeep)
- fix an issue with Intel NVMe P3700 after an iProc reset by adding a
delay during shutdown (Oza Pawandeep)
- fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
in compose_msi_msg() (Stephen Hemminger)
- fix a wireless LAN driver timeout by clearing DesignWare MSI
interrupt status after it is handled, not before (Faiz Abbas)
- fix DesignWare ATU enable checking (Jisheng Zhang)
- reduce Layerscape dependencies on the bootloader by doing more
initialization in the driver (Hou Zhiqiang)
- improve Intel VMD performance allowing allocation of more IRQ vectors
than present CPUs (Keith Busch)
- improve endpoint framework support for initial DMA mask, different
BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
Vijay Abraham I, Stan Drozd)
- rework CRS support to add periodic messages while we poll during
enumeration and after Function-Level Reset and prepare for possible
other uses of CRS (Sinan Kaya)
- clean up Root Port AER handling by removing unnecessary code and
moving error handler methods to struct pcie_port_service_driver
(Christoph Hellwig)
- clean up error handling paths in various drivers (Bjorn Andersson,
Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
Lorenzo Pieralisi, Sergei Shtylyov)
- clean up SR-IOV resource handling by disabling VF decoding before
updating the corresponding resource structs (Gavin Shan)
- clean up DesignWare-based drivers by unifying quirks to update Class
Code and Interrupt Pin and related handling of write-protected
registers (Hou Zhiqiang)
- clean up by adding empty generic pcibios_align_resource() and
pcibios_fixup_bus() and removing empty arch-specific implementations
(Palmer Dabbelt)
- request exclusive reset control for several drivers to allow cleanup
elsewhere (Philipp Zabel)
- constify various structures (Arvind Yadav, Bhumika Goyal)
- convert from full_name() to %pOF (Rob Herring)
- remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
Lin)
* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
PCI: xgene: Clean up whitespace
PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
PCI: xgene: Fix platform_get_irq() error handling
PCI: xilinx-nwl: Fix platform_get_irq() error handling
PCI: rockchip: Fix platform_get_irq() error handling
PCI: altera: Fix platform_get_irq() error handling
PCI: spear13xx: Fix platform_get_irq() error handling
PCI: artpec6: Fix platform_get_irq() error handling
PCI: armada8k: Fix platform_get_irq() error handling
PCI: dra7xx: Fix platform_get_irq() error handling
PCI: exynos: Fix platform_get_irq() error handling
PCI: iproc: Clean up whitespace
PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
PCI: iproc: Add 500ms delay during device shutdown
PCI: Fix typos and whitespace errors
PCI: Remove unused "res" variable from pci_resource_io()
PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
PCI/AER: Reformat AER register definitions
iommu/vt-d: Prevent VMD child devices from being remapping targets
x86/PCI: Use is_vmd() rather than relying on the domain number
...
Pull x86 mm changes from Ingo Molnar:
"PCID support, 5-level paging support, Secure Memory Encryption support
The main changes in this cycle are support for three new, complex
hardware features of x86 CPUs:
- Add 5-level paging support, which is a new hardware feature on
upcoming Intel CPUs allowing up to 128 PB of virtual address space
and 4 PB of physical RAM space - a 512-fold increase over the old
limits. (Supercomputers of the future forecasting hurricanes on an
ever warming planet can certainly make good use of more RAM.)
Many of the necessary changes went upstream in previous cycles,
v4.14 is the first kernel that can enable 5-level paging.
This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
default.
(By Kirill A. Shutemov)
- Add 'encrypted memory' support, which is a new hardware feature on
upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
RAM to be encrypted and decrypted (mostly) transparently by the
CPU, with a little help from the kernel to transition to/from
encrypted RAM. Such RAM should be more secure against various
attacks like RAM access via the memory bus and should make the
radio signature of memory bus traffic harder to intercept (and
decrypt) as well.
This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
by default.
(By Tom Lendacky)
- Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
hardware feature that attaches an address space tag to TLB entries
and thus allows to skip TLB flushing in many cases, even if we
switch mm's.
(By Andy Lutomirski)
All three of these features were in the works for a long time, and
it's coincidence of the three independent development paths that they
are all enabled in v4.14 at once"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
x86/mm: Use pr_cont() in dump_pagetable()
x86/mm: Fix SME encryption stack ptr handling
kvm/x86: Avoid clearing the C-bit in rsvd_bits()
x86/CPU: Align CR3 defines
x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
acpi, x86/mm: Remove encryption mask from ACPI page protection type
x86/mm, kexec: Fix memory corruption with SME on successive kexecs
x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
x86/mm: Allow userspace have mappings above 47-bit
x86/mm: Prepare to expose larger address space to userspace
x86/mpx: Do not allow MPX if we have mappings above 47-bit
x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
x86/mm/dump_pagetables: Fix printout of p4d level
x86/mm/dump_pagetables: Generalize address normalization
x86/boot: Fix memremap() related build failure
...
Previously, we were invalidating context cache and IOTLB globally when
clearing one context entry. This is a tad too aggressive.
Invalidate the context cache and IOTLB for the interested device only.
Signed-off-by: Filippo Sironi <sironi@amazon.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VMD child devices must use the VMD endpoint's ID as the requester. Because
of this, there needs to be a way to link the parent VMD endpoint's IOMMU
group and associated mappings to the VMD child devices such that attaching
and detaching child devices modify the endpoint's mappings, while
preventing early detaching on a singular device removal or unbinding.
The reassignment of individual VMD child devices devices to VMs is outside
the scope of VMD, but may be implemented in the future. For now it is best
to prevent any such attempts.
Prevent VMD child devices from returning an IOMMU, which prevents it from
exposing an iommu_group sysfs directory and allowing subsequent binding by
userspace-access drivers such as VFIO.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
With the current IOMMU-API the hardware TLBs have to be
flushed in every iommu_ops->unmap() call-back.
For unmapping large amounts of address space, like it
happens when a KVM domain with assigned devices is
destroyed, this causes thousands of unnecessary TLB flushes
in the IOMMU hardware because the unmap call-back runs for
every unmapped physical page.
With the TLB Flush Interface and the new iommu_unmap_fast()
function introduced here the need to clean the hardware TLBs
is removed from the unmapping code-path. Users of
iommu_unmap_fast() have to explicitly call the TLB-Flush
functions to sync the page-table changes to the hardware.
Three functions for TLB-Flushes are introduced:
* iommu_flush_tlb_all() - Flushes all TLB entries
associated with that
domain. TLBs entries are
flushed when this function
returns.
* iommu_tlb_range_add() - This will add a given
range to the flush queue
for this domain.
* iommu_tlb_sync() - Flushes all queued ranges from
the hardware TLBs. Returns when
the flush is finished.
The semantic of this interface is intentionally similar to
the iommu_gather_ops from the io-pgtable code.
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
iommu_ops are not supposed to change at runtime.
Functions 'bus_set_iommu' working with const iommu_ops provided
by <linux/iommu.h>. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
New kernels with debug show panic() from __phys_addr() checks. Avoid
calling virt_to_phys() when pasid_state_tbl pointer is null
To: Joerg Roedel <joro@8bytes.org>
To: linux-kernel@vger.kernel.org>
Cc: iommu@lists.linux-foundation.org
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Fixes: 2f26e0a9c9 ('iommu/vt-d: Add basic SVM PASID support')
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Page Request from devices that support device-tlb would request translation
to pre-cache them in device to avoid overhead of IOMMU lookups.
IOMMU needs to check for canonicallity of the address before performing
page-fault processing.
To: Joerg Roedel <joro@8bytes.org>
To: linux-kernel@vger.kernel.org>
Cc: iommu@lists.linux-foundation.org
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Reported-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The bus_set_iommu() function will call the add_device()
call-back which needs the iommu to be registered.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 0b480e4470 ('iommu/tegra: Add support for struct iommu_device')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
iommu_ops are not supposed to change at runtime.
Functions 'iommu_device_set_ops' and 'bus_set_iommu' working with
const iommu_ops provided by <linux/iommu.h>. So mark the non-const
structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reserving a free context is both quicker and more likely to fail
(due to limited hardware resources) than setting up a pagetable.
What is more the pagetable init/cleanup code could require
the context to be set up.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Robin Murphy <robin.murphy@arm.com>
CC: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
CC: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename a few iommu cache-flush functions that start with
iommu_ to start with amd_iommu now. This is to prevent name
collisions with generic iommu code later on.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In get_domain(), 'domain' could be NULL before it's passed to dma_ops_domain()
to dereference. And the current code calling get_domain() can't deal with the
returned 'domain' well if its value is NULL.
So before dma_ops_domain() calling, check if 'domain' is NULL, If yes just return
ERR_PTR(-EBUSY) directly.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: df3f7a6e8e ('iommu/amd: Use is_attach_deferred call-back')
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The commit ("iommu/mediatek: Enlarge the validate PA range
for 4GB mode") introduce the following build warning while ARCH=arm:
drivers/iommu/mtk_iommu.c: In function 'mtk_iommu_iova_to_phys':
include/linux/bitops.h:6:24: warning: left shift count >= width
of type [-Wshift-count-overflow]
#define BIT(nr) (1UL << (nr))
^
>> drivers/iommu/mtk_iommu.c:407:9: note: in expansion of macro 'BIT'
pa |= BIT(32);
drivers/iommu/mtk_iommu.c: In function 'mtk_iommu_probe':
include/linux/bitops.h:6:24: warning: left shift count >= width
of type [-Wshift-count-overflow]
#define BIT(nr) (1UL << (nr))
^
drivers/iommu/mtk_iommu.c:589:35: note: in expansion of macro 'BIT'
data->enable_4GB = !!(max_pfn > (BIT(32) >> PAGE_SHIFT));
Use BIT_ULL instead of BIT.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The commit ("iommu/mediatek: Enlarge the validate PA range
for 4GB mode") introduce the following build error:
drivers/iommu/mtk_iommu.c: In function 'mtk_iommu_hw_init':
>> drivers/iommu/mtk_iommu.c:536:30: error: 'const struct mtk_iommu_data'
has no member named 'm4u_type'; did you mean 'm4u_dom'?
if (data->enable_4GB && data->m4u_type != M4U_MT8173) {
This patch fix it, use "m4u_plat" instead of "m4u_type".
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The qcom_iommu_disable_clocks() function is only called from PM
code that is hidden in an #ifdef, causing a harmless warning without
CONFIG_PM:
drivers/iommu/qcom_iommu.c:601:13: error: 'qcom_iommu_disable_clocks' defined but not used [-Werror=unused-function]
static void qcom_iommu_disable_clocks(struct qcom_iommu_dev *qcom_iommu)
drivers/iommu/qcom_iommu.c:581:12: error: 'qcom_iommu_enable_clocks' defined but not used [-Werror=unused-function]
static int qcom_iommu_enable_clocks(struct qcom_iommu_dev *qcom_iommu)
Replacing that #ifdef with __maybe_unused annotations lets the compiler
drop the functions silently instead.
Fixes: 0ae349a0f3 ("iommu/qcom: Add qcom_iommu")
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit 68a17f0be6 introduced an initialization order
problem, where devices are linked against an iommu which is
not yet initialized. Fix it by initializing the iommu-device
before the iommu-ops are registered against the bus.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 68a17f0be6 ('iommu/pamu: Add support for generic iommu-device')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch is for 4GB mode, mainly for 4 issues:
1) Fix a 4GB bug:
if the dram base is 0x4000_0000, the dram size is 0xc000_0000.
then the code just meet a corner case because max_pfn is
0x10_0000.
data->enable_4GB = !!(max_pfn > (0xffffffffUL >> PAGE_SHIFT));
It's true at the case above. That is unexpected.
2) In mt2712, there is a new register for the 4GB PA range(0x118)
we should enlarge the max PA range, or the HW will report
error.
The dram range is from 0x1_0000_0000 to 0x1_ffff_ffff in the 4GB
mode, we cut out the bit[32:30] of the SA(Start address) and
EA(End address) into this REG_MMU_VLD_PA_RNG(0x118).
3) In mt2712, the register(0x13c) is extended for 4GB mode.
bit[7:6] indicate the valid PA[32:33]. Thus, we don't mask the
value and print it directly for debug.
4) if 4GB is enabled, the dram PA range is from 0x1_0000_0000 to
0x1_ffff_ffff. Thus, the PA from iova_to_pa should also '|' BIT(32)
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When system suspend, infra power domain may be off, and the iommu's
clock must be disabled when system off, or the iommu's bclk clock maybe
disabled after system resume.
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After adding the global list for M4U HW, We get a chance to
move the pagetable allocation into the mtk_iommu_domain_alloc.
Let the domain_alloc do the right thing.
This patch is for fixing this problem[1].
[1]: https://patchwork.codeaurora.org/patch/53987/
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In theory, If there are 2 M4U HWs, there should be 2 IOMMU domains.
But one IOMMU domain(4GB iova range) is enough for us currently,
It's unnecessary to maintain 2 pagetables.
Besides, This patch can simplify our consumer code largely. They don't
need map a iova range from one domain into another, They can share the
iova address easily.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The M4U IP blocks in mt2712 is MTK's generation2 M4U which use the
ARM Short-descriptor like mt8173, and most of the HW registers are
the same.
The difference is that there are 2 M4U HWs in mt2712 while there's
only one in mt8173. The purpose of 2 M4U HWs is for balance the
bandwidth.
Normally if there are 2 M4U HWs, there should be 2 iommu domains,
each M4U has a iommu domain.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The definition of MTK_M4U_TO_LARB and MTK_M4U_TO_PORT are shared by
all the gen2 M4U HWs. Thus, Move them out from mt8173-larb-port.h,
and put them into the c file.
Suggested-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Extend the driver to make use of iommu_device_sysfs_add()/remove()
functions to hook up initial sysfs support.
Suggested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The variable amd_iommu_pre_enabled is used in non-init
code-paths, so remove the __initdata annotation.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 3ac3e5ee5e ('iommu/amd: Copy old trans table from old kernel')
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This was reported by the kbuild bot. The condition in which
entry would be used uninitialized can not happen, because
when there is no iommu this function would never be called.
But its no fast-path, so fix the warning anyway.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
have been a little misleading, since that check is still worthwhile even
when groups *are* universal. We have a few IOMMU-aware drivers which
only care whether their device is already attached to an existing domain
or not, for which the previous behaviour of iommu_get_domain_for_dev()
was ideal, and who now crash if their device does not have an IOMMU.
With IOMMU groups now serving as a reliable indicator of whether a
device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
mode), drivers could arguably do this:
group = iommu_group_get(dev);
if (group) {
domain = iommu_get_domain_for_dev(dev);
iommu_group_put(group);
}
However, rather than duplicate that code across multiple callsites,
particularly when it's still only the domain they care about, let's skip
straight to the next step and factor out the check into the common place
it applies - in iommu_get_domain_for_dev() itself. Sure, it ends up
looking rather familiar, but now it's backed by the reasoning of having
a robust API able to do the expected thing for all devices regardless.
Fixes: 05f80300dc ("iommu: Finish making iommu_group support mandatory")
Reported-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a struct iommu_device to each tegra-gart and register it
with the iommu-core. Also link devices added to the driver
to their respective hardware iommus.
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a struct iommu_device to each tegra-smmu and register it
with the iommu-core. Also link devices added to the driver
to their respective hardware iommus.
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With all our hardware state tracked in such a way that we can naturally
restore it as part of the necessary reset, resuming is trivial, and
there's nothing to do on suspend at all.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Echoing what we do for Stream Map Entries, maintain a software shadow
state for context bank configuration. With this in place, we are mere
moments away from blissfully easy suspend/resume support.
Reviewed-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[will: fix sparse warning by only clearing .cfg during domain destruction]
Signed-off-by: Will Deacon <will.deacon@arm.com>
The shutdown method disables the SMMU to avoid corrupting a new kernel
started with kexec.
Signed-off-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Remove the deferred flushing implementation in the Intel
VT-d driver and use the one from the common iova code
instead.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The shift qi_flush_dev_iotlb() is done on an int, which
limits the mask to 32 bits. Make the mask 64 bits wide so
that more than 4GB of address range can be flushed at once.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a timer to flush entries from the Flush-Queues every
10ms. This makes sure that no stale TLB entries remain for
too long after an IOVA has been unmapped.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The lock is taken from the same CPU most of the time. But
having it allows to flush the queue also from another CPU if
necessary.
This will be used by a timer to regularily flush any pending
IOVAs from the Flush-Queues.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There are two counters:
* fq_flush_start_cnt - Increased when a TLB flush
is started.
* fq_flush_finish_cnt - Increased when a TLB flush
is finished.
The fq_flush_start_cnt is assigned to every Flush-Queue
entry on its creation. When freeing entries from the
Flush-Queue, the value in the entry is compared to the
fq_flush_finish_cnt. The entry can only be freed when its
value is less than the value of fq_flush_finish_cnt.
The reason for these counters it to take advantage of IOMMU
TLB flushes that happened on other CPUs. These already
flushed the TLB for Flush-Queue entries on other CPUs so
that they can already be freed without flushing the TLB
again.
This makes it less likely that the Flush-Queue is full and
saves IOMMU TLB flushes.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a function to add entries to the Flush-Queue ring
buffer. If the buffer is full, call the flush-callback and
free the entries.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch adds the basic data-structures to implement
flush-queues in the generic IOVA code. It also adds the
initialization and destroy routines for these data
structures.
The initialization routine is designed so that the use of
this feature is optional for the users of IOVA code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Sudeep reports that the logic got slightly broken when a PCI iommu-map
entry targets an IOMMU marked as disabled in DT, since of_pci_map_rid()
succeeds in following a phandle, and of_iommu_xlate() doesn't return an
error value, but we miss checking whether ops was actually non-NULL.
Whilst this could be solved with a point fix in of_pci_iommu_init(), it
suggests that all the juggling of ERR_PTR values through the ops pointer
is proving rather too complicated for its own good, so let's instead
simplify the whole flow (with a side-effect of eliminating the cause of
the bug).
The fact that we now rely on iommu_fwspec means that we no longer need
to pass around an iommu_ops pointer at all - we can simply propagate a
regular int return value until we know whether we have a viable IOMMU,
then retrieve the ops from the fwspec if and when we actually need them.
This makes everything a bit more uniform and certainly easier to follow.
Fixes: d87beb7492 ("iommu/of: Handle PCI aliases properly")
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add support for the iommu_device_register interface to make
the s390 hardware iommus visible to the iommu core and in
sysfs.
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>