The updates include:
* Small code cleanups in the AMD IOMMUv2 driver
* Scalability improvements for the DMA-API implementation of the
AMD IOMMU driver. This is just a starting point, but already
showed some good improvements in my tests.
* Removal of the unused Renesas IPMMU/IPMMUI driver
* Updates for ARM-SMMU include:
* Some fixes to get the driver working nicely on
Broadcom hardware
* A change to the io-pgtable API to indicate the unit in
which to flush (all callers converted, with Ack from
Laurent)
* Use of devm_* for allocating/freeing the SMMUv3
buffers
* Some other small fixes and improvements for other drivers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWnkkIAAoJECvwRC2XARrjpD8QAMCYfoqiq35QLQYn7Jh/LA5E
ZotdNv6hONwCahQiNSSsaoP2f8IBpyvo7Nrgz1Fj3SEQYzAiBn6mWgXFu7WdQarD
kw1SLwUIUweF/qjgpOvGD29F7mC4XIRYfPOFbLEPvkBwx6Vm4NSJkclfMZJeNCFm
ghWxGdva+7HFyJX+gMS1flihfUzN31U5hKWRxQqHXcLbHuVOdEnL1by5ozbpcJNI
vkpbkCcWaD1uKju918akFYJultcwMGb7Wm6HwKB2EjG2aOoe2Siw61MrJ1DUreOh
J0fJubltaZwkMxFUTuNwrP9E+FH6arPtJBmvpMMz8ZQeLyQQQnBcHKDZFAgHu23Z
/wOkjoA5uG5iy2XiPWbUFJQKp4q+Dlkp8LqT1RAKvp8kVbrrsSGUXQzBIf+DE5F7
U0ghAWB70g6fREys/cvs0q7huX42Cuf3M82JKP9rksLj9ArWoT4TtkI2nvbNyKE8
KhX57xj4OSROriZV8+XmaU/W7bK6BVXr7B0aVOCvf5y7GsIYhf6zSH+0cP/TmLuQ
ZLtOr2UHFzvjZq7LHgRfEs1CYn+PhKw6kUM3rxjm/QZxiBSft7ABhhxJZKlMyE/f
jTnPS5DH2XT+UKtt8D0nfS558h0kxqwXzhICQHC30lpJLIoWj9ulZcmOXdzlY1xM
R5+4TTJ4l1tovPtQ9nUW
=bm5E
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The updates include:
- Small code cleanups in the AMD IOMMUv2 driver
- Scalability improvements for the DMA-API implementation of the AMD
IOMMU driver. This is just a starting point, but already showed
some good improvements in my tests.
- Removal of the unused Renesas IPMMU/IPMMUI driver
- Updates for ARM-SMMU include:
* Some fixes to get the driver working nicely on Broadcom hardware
* A change to the io-pgtable API to indicate the unit in which to
flush (all callers converted, with Ack from Laurent)
* Use of devm_* for allocating/freeing the SMMUv3 buffers
- Some other small fixes and improvements for other drivers"
* tag 'iommu-updates-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (46 commits)
iommu/vt-d: Fix up error handling in alloc_iommu
iommu/vt-d: Check the return value of iommu_device_create()
iommu/amd: Remove an unneeded condition
iommu/amd: Preallocate dma_ops apertures based on dma_mask
iommu/amd: Use trylock to aquire bitmap_lock
iommu/amd: Make dma_ops_domain->next_index percpu
iommu/amd: Relax locking in dma_ops path
iommu/amd: Initialize new aperture range before making it visible
iommu/amd: Build io page-tables with cmpxchg64
iommu/amd: Allocate new aperture ranges in dma_ops_alloc_addresses
iommu/amd: Optimize dma_ops_free_addresses
iommu/amd: Remove need_flush from struct dma_ops_domain
iommu/amd: Iterate over all aperture ranges in dma_ops_area_alloc
iommu/amd: Flush iommu tlb in dma_ops_free_addresses
iommu/amd: Rename dma_ops_domain->next_address to next_index
iommu/amd: Remove 'start' parameter from dma_ops_area_alloc
iommu/amd: Flush iommu tlb in dma_ops_aperture_alloc()
iommu/amd: Retry address allocation within one aperture
iommu/amd: Move aperture_range.offset to another cache-line
iommu/amd: Add dma_ops_aperture_alloc() function
...
Only check for error when iommu->iommu_dev has been assigned
and only assign drhd->iommu when the function can't fail
anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This adds the proper check to alloc_iommu to make sure that
the call to iommu_device_create has completed successfully
and if not return the error code to the caller after freeing
up resources allocated previously.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When mapping a non-page-aligned scatterlist entry, we copy the original
offset to the output DMA address before aligning it to hand off to
iommu_map_sg(), then later adding the IOVA page address portion to get
the final mapped address. However, when the IOVA page size is smaller
than the CPU page size, it is the offset within the IOVA page we want,
not that within the CPU page, which can easily be larger than an IOVA
page and thus result in an incorrect final address.
Fix the bug by taking only the IOVA-aligned part of the offset as the
basis of the DMA address, not the whole thing.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
get_device_id() returns an unsigned short device id. It never fails and
it never returns a negative so we can remove this condition.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Preallocate between 4 and 8 apertures when a device gets it
dma_mask. With more apertures we reduce the lock contention
of the domain lock significantly.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make this pointer percpu so that we start searching for new
addresses in the range we last stopped and which is has a
higher probability of being still in the cache.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This allows to build up the page-tables without holding any
locks. As a consequence it removes the need to pre-populate
dma_ops page-tables.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Don't flush the iommu tlb when we free something behind the
current next_bit pointer. Update the next_bit pointer
instead and let the flush happen on the next wraparound in
the allocation path.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The flushing of iommu tlbs is now done on a per-range basis.
So there is no need anymore for domain-wide flush tracking.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It points to the next aperture index to allocate from. We
don't need the full address anymore because this is now
tracked in struct aperture_range.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Moving it before the pte_pages array puts in into the same
cache-line as the spin-lock and the bitmap array pointer.
This should safe a cache-miss.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There have been present PTEs which in theory could have made
it to the IOMMU TLB. Flush the addresses out on the error
path to make sure no stale entries remain.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If CONFIG_PHYS_ADDR_T_64BIT=n:
drivers/iommu/ipmmu-vmsa.c: In function 'ipmmu_domain_init_context':
drivers/iommu/ipmmu-vmsa.c:434:2: warning: right shift count >= width of type
ipmmu_ctx_write(domain, IMTTUBR0, ttbr >> 32);
^
As io_pgtable_cfg.arm_lpae_s1_cfg.ttbr[] is an array of u64s, assigning
it to a phys_addr_t may truncates it. Make ttbr u64 to fix this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Doug reports that the equivalent page allocator on 32-bit ARM exhibits
particularly pathalogical behaviour under memory pressure when
fragmentation is high, where allocating a 4MB buffer takes tens of
seconds and the number of calls to alloc_pages() is over 9000![1]
We can drastically improve that situation without losing the other
benefits of high-order allocations when they would succeed, by assuming
memory pressure is relatively constant over the course of an allocation,
and not retrying allocations at orders we know to have failed before.
This way, the best-case behaviour remains unchanged, and in the worst
case we should see at most a dozen or so (MAX_ORDER - 1) failed attempts
before falling back to single pages for the remainder of the buffer.
[1]:http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/394660.html
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
dma-iommu.c was naughtily relying on an implicit transitive #include of
linux/vmalloc.h, which is apparently not present on some architectures.
Add that, plus a couple more headers for other functions which are used
similarly.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.
The remaining ones need more careful inspection before a conversion can
happen. On the TODO.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Two similar fixes for the Intel and AMD IOMMU drivers to add
proper access checks before calling handle_mm_fault.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWdCp7AAoJECvwRC2XARrjjAIP/0ihW2zF4R622RgY1C1Cm62j
0eb/R4UqjI3PG0KsURgDHcIm9JP5Z//dgKTOtNX9KOkHlXLcO9MMSD5chVBd4HKG
+Mgx7RM+Mr7f6ElRUa6s1GY1tcJlGf43fW5cMQ44BJIqVXlE47go4U09D86DVgXy
KgyBxQldeOrkXZvAG82WLjGgkdGALQjbDlI8ktmfYWXAvIRWNGJqWY16BwAYOWfb
9d3+1JPekSSBWHC6H+qbkDb8ueO69/Ux0HL5z2Q0zchqGjBb1gnfwLcz865KZpOB
qUwsKFSXTl+jPCrAaLYJnVqAnH4qqKaF6WKAJSIHObTSVqXKHpFHrQrlGVzOvYNn
s3216KIMsxG2nnvSgXCOFGqM/810MH2MSo8YcF5A3celrka3j2Gj08mxInrZXN7D
3p51HSwq8ePo4i5jppT5ldOBSjNV9N3wKWcjDb4OL+OfkJc/u2VbSHNQtpvTclsV
V6VSfWLDC8BCmUveMH2TrawQWkKOz0LqgqfQPX+VvSCIM7tgkrgVsTJrijPtGOs1
zid/A/cfqMdBezSVALrZfB4OVBaM2UL2LJmmLJgApYV+N55Oxmx+nxnMr0aT5KlY
crjcnVaypkq3rG1Wjpt+nTTwtllB0yXNEywQcu2edeswmaQCqsEgQRsDqi6S2/+S
c8l9JKoTrB4+vToYjXyW
=qrAB
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Two similar fixes for the Intel and AMD IOMMU drivers to add proper
access checks before calling handle_mm_fault"
* tag 'iommu-fixes-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Do access checks before calling handle_mm_fault()
iommu/amd: Do proper access checking before calling handle_mm_fault()
When tearing down page tables, we return early for the final level
since we know that we won't have any table pointers to follow.
Unfortunately, this also means that we forget to free the final level,
so we end up leaking memory.
Fix the issue by always freeing the current level, but just don't bother
to iterate over the ptes if we're at the final level.
Cc: <stable@vger.kernel.org>
Reported-by: Zhang Bo <zhangbo_a@xiaomi.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It is ILLEGAL to set STE.S1STALLD to 1 if stage 1 is enabled and
either the stall or terminate models are not supported.
This patch fixes the STALLD check and ensures that we don't set STALLD
in the STE when it is not supported.
Signed-off-by: Prem Mallappa <pmallapp@broadcom.com>
[will: consistently use IDR0_STALL_MODEL_* prefix]
Signed-off-by: Will Deacon <will.deacon@arm.com>
When acknowledging global errors, the GERRORN register should be written
with the original GERROR value so that active errors are toggled.
This patch fixed the driver to write the original GERROR value to
GERRORN, instead of an active error mask.
Signed-off-by: Prem Mallappa <pmallapp@broadcom.com>
[will: reworked use of active bits and fixed commit log]
Signed-off-by: Will Deacon <will.deacon@arm.com>
There is no need to keep a useful accessor for a public structure hidden
away in a private implementation. Move it out alongside the structure
definition so that other implementations may reuse it.
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When invalidating an IOVA range potentially spanning multiple pages,
such as when removing an entire intermediate-level table, we currently
only issue an invalidation for the first IOVA of that range. Since the
architecture specifies that address-based TLB maintenance operations
target a single entry, an SMMU could feasibly retain live entries for
subsequent pages within that unmapped range, which is not good.
Make sure we hit every possible entry by iterating over the whole range
at the granularity provided by the pagetable implementation.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[will: added missing semicolons...]
Signed-off-by: Will Deacon <will.deacon@arm.com>
IOMMU hardware with range-based TLB maintenance commands can work
happily with the iova and size arguments passed via the tlb_add_flush
callback, but for IOMMUs which require separate commands per entry in
the range, it is not straightforward to infer the necessary granularity
when it comes to issuing the actual commands.
Add an additional argument indicating the granularity for the benefit
of drivers needing to know, and update the ARM LPAE code appropriately
(for non-leaf invalidations we currently just assume the worst-case
page granularity rather than walking the table to check).
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In the case of corrupted page tables, or when an invalid size is given,
__arm_lpae_unmap() may recurse beyond the maximum number of levels.
Unfortunately the detection of this error condition only happens *after*
calculating a nonsense offset from something which might not be a valid
table pointer and dereferencing that to see if it is a valid PTE.
Make things a little more robust by checking the level is valid before
doing anything which depends on it being so.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Whilst the architecture only defines a few of the possible CERROR values,
we should handle unknown values gracefully rather than go out of bounds
trying to print out an error description.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The basic flow for add a device:
arm_smmu_add_device
|->iommu_group_get_for_dev
|->iommu_group_get
return group; (1)
|->ops->device_group : Init/increase reference count to/by 1.
|->iommu_group_add_device : Increase reference count by 1.
return group (2)
|->return 0;
Since we are adding one device, the flow is (2) and the group reference
count will be increased by 2. So, we need to add iommu_group_put at the
end of arm_smmu_add_device to decrease the count by 1.
Also take the failure path into consideration when fail to add a device.
Signed-off-by: Peng Fan <van.freenix@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When we initialise a bypass STE, we memset the structure to zero and
set the Valid and Config fields to indicate that the stream should
bypass the SMMU. Unfortunately, this results in an SHCFG field of 0
which means that the shareability of any incoming transactions is
overridden with non-shareable, leading to potential coherence problems
down the line.
This patch fixes the issue by initialising bypass STEs to use the
incoming shareability attributes. When translation is in effect at
either stage 1 or stage 2, the shareability is determined by the
page tables.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The free_io_pgtable_ops() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM SMMUv3 driver uses dma_{alloc,free}_coherent to manage its
queues and configuration data structures.
This patch converts the driver to the managed (dmam_*) API, so that
resources are freed automatically on device teardown. This greatly
simplifies the failure paths and allows us to remove a bunch of
handcrafted freeing code.
Signed-off-by: Will Deacon <will.deacon@arm.com>
PRIQ_0_OF has been removed from the SMMUv3 architecture, so remove its
corresponding (and unused) #define from the driver.
Signed-off-by: Will Deacon <will.deacon@arm.com>
commit db0fa0cb01 "scatterlist: use sg_phys()" did replacements of
the form:
phys_addr_t phys = page_to_phys(sg_page(s));
phys_addr_t phys = sg_phys(s) & PAGE_MASK;
However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long). Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.
Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: db0fa0cb01 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>