We try to enforce protection keys in software the same way that we
do in hardware. (See long example below).
But, we only want to do this when accessing our *own* process's
memory. If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
tried to PTRACE_POKE a target process which just happened to have
some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
debugger access to that memory. PKRU is fundamentally a
thread-local structure and we do not want to enforce it on access
to _another_ thread's data.
This gets especially tricky when we have workqueues or other
delayed-work mechanisms that might run in a random process's context.
We can check that we only enforce pkeys when operating on our *own* mm,
but delayed work gets performed when a random user context is active.
We might end up with a situation where a delayed-work gup fails when
running randomly under its "own" task but succeeds when running under
another process. We want to avoid that.
To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
fault flag: FAULT_FLAG_REMOTE. They indicate that we are
walking an mm which is not guranteed to be the same as
current->mm and should not be subject to protection key
enforcement.
Thanks to Jerome Glisse for pointing out this scenario.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Until all upstream devices have their DMA ops swizzled to point at the
SMMU, we need to treat the IOMMU_DOMAIN_DMA domain as bypass to avoid
putting devices into an empty address space when detaching from VFIO.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM SMMU attach_dev implementations returns -EEXIST if the device
being attached is already attached to a domain. This doesn't play nicely
with the default domain, resulting in splats such as:
WARNING: at drivers/iommu/iommu.c:1257
Modules linked in:
CPU: 3 PID: 1939 Comm: virtio-net-tx Tainted: G S 4.5.0-rc4+ #1
Hardware name: FVP Base (DT)
task: ffffffc87a9d0000 ti: ffffffc07a278000 task.ti: ffffffc07a278000
PC is at __iommu_detach_group+0x68/0xe8
LR is at __iommu_detach_group+0x48/0xe8
This patch fixes the problem by forcefully detaching the device from
its old domain, if present, when attaching to a new one. The unused
->detach_dev callback is also removed the iommu_ops structures.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
debugging/security feature so that unmatched stream IDs (i.e. devices
not attached to an IOMMU domain) may be configured to fault.
Rather than introduce unsightly inconsistency, or repeat the existing
unnecessary use of module_param_named(), fix that as well in passing.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We are saving pointer to iommu DT node in of_iommu_set_ops()
hence we should increment DT node ref count.
Reviewed-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With DMA mapping ops provided by the iommu-dma code, only a minimal
contribution from the IOMMU driver is needed to create a suitable
DMA-API domain for them to use. Implement this for the ARM SMMUs.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The IOMMU API has no concept of privilege so assumes all devices and
mappings are equal, and indeed most non-CPU master devices on an AMBA
interconnect make little use of the attribute bits on the bus thus by
default perform unprivileged data accesses.
Some devices, however, believe themselves more equal than others, such
as programmable DMA controllers whose 'master' thread issues bus
transactions marked as privileged instruction fetches, while the data
accesses of its channel threads (under the control of Linux, at least)
are marked as unprivileged. This poses a problem for implementing the
DMA API on an IOMMU conforming to ARM VMSAv8, under which a page that is
unprivileged-writeable is also implicitly privileged-execute-never.
Given that, there is no one set of attributes with which iommu_map() can
implement, say, dma_alloc_coherent() that will allow every possible type
of access without something running into unexecepted permission faults.
Fortunately the SMMU architecture provides a means to mitigate such
issues by overriding the incoming attributes of a transaction; make use
of that to strip the privileged/unprivileged status off incoming
transactions, leaving just the instruction/data dichotomy which the
IOMMU API does at least understand; Four states good, two states better.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As the number of io-pgtable implementations grows beyond 1, it's time
to rationalise the quirks mechanism before things have a chance to
start getting really ugly and out-of-hand.
To that end:
- Indicate exactly which quirks each format can/does support.
- Fail creating a table if a caller wants unsupported quirks.
- Properly document where each quirk applies and why.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In certain unmapping situations it is quite possible to end up issuing
back-to-back TLB synchronisations, which at best is a waste of time and
effort, and at worst causes some hardware to get rather confused. Whilst
the pagetable implementations, or the IOMMU drivers, or both, could keep
track of things to avoid this happening, it seems to make the most sense
to prevent code duplication and add some simple state tracking in the
common interface between the two.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add some simple wrappers to avoid having the guts of the TLB operations
spilled all over the page table implementations, and to provide a point
to implement extra common functionality.
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add a nearly-complete ARMv7 short descriptor implementation, omitting
only a few legacy and CPU-centric aspects which shouldn't be necessary
for IOMMU API use anyway.
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Tested-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Minor register size and interrupt acknowledgement fixes which only showed
up in testing on newer hardware, but mostly a fix to the MM refcount
handling to prevent a recursive refcount issue when mmap() is used on
the file descriptor associated with a bound PASID.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iEYEABECAAYFAlbC/gAACgkQdwG7hYl686OY8QCfUPH+IB0zou9/MH3JNMz1ujot
I6wAoK0R4KiOFXvjNeNPy+XroZ9xKqv/
=RM+0
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu
Pull IOMMU SVM fixes from David Woodhouse:
"Minor register size and interrupt acknowledgement fixes which only
showed up in testing on newer hardware, but mostly a fix to the MM
refcount handling to prevent a recursive refcount issue when mmap() is
used on the file descriptor associated with a bound PASID"
* tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu:
iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users
According to the VT-d specification we need to clear the PPR bit in
the Page Request Status register when handling page requests, or the
hardware won't generate any more interrupts.
This wasn't actually necessary on SKL/KBL (which may well be the
subject of a hardware erratum, although it's harmless enough). But
other implementations do appear to get it right, and we only ever get
one interrupt unless we clear the PPR bit.
Reported-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
In below commit alias DTE is set when its peripheral is
setting DTE. However there's a code bug here to wrongly
set the alias DTE, correct it in this patch.
commit e25bfb56ea
Author: Joerg Roedel <jroedel@suse.de>
Date: Tue Oct 20 17:33:38 2015 +0200
iommu/amd: Set alias DTE in do_attach/do_detach
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Mark Hounschell <markh@compro.net>
Cc: stable@vger.kernel.org # v4.4
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There are some IPs, such as video encoder/decoder, contains 2 slave iommus,
one for reading and the other for writing. They share the same irq and
clock with master.
This patch reconstructs to support this case by making them share the same
Page Directory, Page Tables and even the register operations.
That means every instruction to the reading MMU registers would be
duplicated to the writing MMU and vice versa.
Signed-off-by: ZhengShunQian <zhengsq@rock-chips.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Trying to build a kernel for ARC with both options CONFIG_COMPILE_TEST
and CONFIG_IOMMU_IO_PGTABLE_LPAE enabled (e.g. as a result of "make
allyesconfig") results in the following build failure:
| CC drivers/iommu/io-pgtable-arm.o
| linux/drivers/iommu/io-pgtable-arm.c: In
| function ‘__arm_lpae_alloc_pages’:
| linux/drivers/iommu/io-pgtable-arm.c:221:3:
| error: implicit declaration of function ‘dma_map_single’
| [-Werror=implicit-function-declaration]
| dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
| ^
| linux/drivers/iommu/io-pgtable-arm.c:221:42:
| error: ‘DMA_TO_DEVICE’ undeclared (first use in this function)
| dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
| ^
Since IOMMU_IO_PGTABLE_LPAE depends on DMA API, io-pgtable-arm.c should
include linux/dma-mapping.h. This fixes the reported failure.
Cc: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Lada Trimasova <ltrimas@synopsys.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The updates include:
* Small code cleanups in the AMD IOMMUv2 driver
* Scalability improvements for the DMA-API implementation of the
AMD IOMMU driver. This is just a starting point, but already
showed some good improvements in my tests.
* Removal of the unused Renesas IPMMU/IPMMUI driver
* Updates for ARM-SMMU include:
* Some fixes to get the driver working nicely on
Broadcom hardware
* A change to the io-pgtable API to indicate the unit in
which to flush (all callers converted, with Ack from
Laurent)
* Use of devm_* for allocating/freeing the SMMUv3
buffers
* Some other small fixes and improvements for other drivers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWnkkIAAoJECvwRC2XARrjpD8QAMCYfoqiq35QLQYn7Jh/LA5E
ZotdNv6hONwCahQiNSSsaoP2f8IBpyvo7Nrgz1Fj3SEQYzAiBn6mWgXFu7WdQarD
kw1SLwUIUweF/qjgpOvGD29F7mC4XIRYfPOFbLEPvkBwx6Vm4NSJkclfMZJeNCFm
ghWxGdva+7HFyJX+gMS1flihfUzN31U5hKWRxQqHXcLbHuVOdEnL1by5ozbpcJNI
vkpbkCcWaD1uKju918akFYJultcwMGb7Wm6HwKB2EjG2aOoe2Siw61MrJ1DUreOh
J0fJubltaZwkMxFUTuNwrP9E+FH6arPtJBmvpMMz8ZQeLyQQQnBcHKDZFAgHu23Z
/wOkjoA5uG5iy2XiPWbUFJQKp4q+Dlkp8LqT1RAKvp8kVbrrsSGUXQzBIf+DE5F7
U0ghAWB70g6fREys/cvs0q7huX42Cuf3M82JKP9rksLj9ArWoT4TtkI2nvbNyKE8
KhX57xj4OSROriZV8+XmaU/W7bK6BVXr7B0aVOCvf5y7GsIYhf6zSH+0cP/TmLuQ
ZLtOr2UHFzvjZq7LHgRfEs1CYn+PhKw6kUM3rxjm/QZxiBSft7ABhhxJZKlMyE/f
jTnPS5DH2XT+UKtt8D0nfS558h0kxqwXzhICQHC30lpJLIoWj9ulZcmOXdzlY1xM
R5+4TTJ4l1tovPtQ9nUW
=bm5E
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The updates include:
- Small code cleanups in the AMD IOMMUv2 driver
- Scalability improvements for the DMA-API implementation of the AMD
IOMMU driver. This is just a starting point, but already showed
some good improvements in my tests.
- Removal of the unused Renesas IPMMU/IPMMUI driver
- Updates for ARM-SMMU include:
* Some fixes to get the driver working nicely on Broadcom hardware
* A change to the io-pgtable API to indicate the unit in which to
flush (all callers converted, with Ack from Laurent)
* Use of devm_* for allocating/freeing the SMMUv3 buffers
- Some other small fixes and improvements for other drivers"
* tag 'iommu-updates-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (46 commits)
iommu/vt-d: Fix up error handling in alloc_iommu
iommu/vt-d: Check the return value of iommu_device_create()
iommu/amd: Remove an unneeded condition
iommu/amd: Preallocate dma_ops apertures based on dma_mask
iommu/amd: Use trylock to aquire bitmap_lock
iommu/amd: Make dma_ops_domain->next_index percpu
iommu/amd: Relax locking in dma_ops path
iommu/amd: Initialize new aperture range before making it visible
iommu/amd: Build io page-tables with cmpxchg64
iommu/amd: Allocate new aperture ranges in dma_ops_alloc_addresses
iommu/amd: Optimize dma_ops_free_addresses
iommu/amd: Remove need_flush from struct dma_ops_domain
iommu/amd: Iterate over all aperture ranges in dma_ops_area_alloc
iommu/amd: Flush iommu tlb in dma_ops_free_addresses
iommu/amd: Rename dma_ops_domain->next_address to next_index
iommu/amd: Remove 'start' parameter from dma_ops_area_alloc
iommu/amd: Flush iommu tlb in dma_ops_aperture_alloc()
iommu/amd: Retry address allocation within one aperture
iommu/amd: Move aperture_range.offset to another cache-line
iommu/amd: Add dma_ops_aperture_alloc() function
...
This is a 32-bit register. Apparently harmless on real hardware, but
causing justified warnings in simulation.
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
Holding mm_users works OK for graphics, which was the first user of SVM
with VT-d. However, it works less well for other devices, where we actually
do a mmap() from the file descriptor to which the SVM PASID state is tied.
In this case on process exit we end up with a recursive reference count:
- The MM remains alive until the file is closed and the driver's release()
call ends up unbinding the PASID.
- The VMA corresponding to the mmap() remains intact until the MM is
destroyed.
- Thus the file isn't closed, even when exit_files() runs, because the
VMA is still holding a reference to it. And the MM remains alive…
To address this issue, we *stop* holding mm_users while the PASID is bound.
We already hold mm_count by virtue of the MMU notifier, and that can be
made to be sufficient.
It means that for a period during process exit, the fun part of mmput()
has happened and exit_mmap() has been called so the MM is basically
defunct. But the PGD still exists and the PASID is still bound to it.
During this period, we have to be very careful — exit_mmap() doesn't use
mm->mmap_sem because it doesn't expect anyone else to be touching the MM
(quite reasonably, since mm_users is zero). So we also need to fix the
fault handler to just report failure if mm_users is already zero, and to
temporarily bump mm_users while handling any faults.
Additionally, exit_mmap() calls mmu_notifier_release() *before* it tears
down the page tables, which is too early for us to flush the IOTLB for
this PASID. And __mmu_notifier_release() removes every notifier from the
list, so when exit_mmap() finally *does* tear down the mappings and
clear the page tables, we don't get notified. So we work around this by
clearing the PASID table entry in our MMU notifier release() callback.
That way, the hardware *can't* get any pages back from the page tables
before they get cleared.
Hardware designers have confirmed that the resulting 'PASID not present'
faults should be handled just as gracefully as 'page not present' faults,
the important criterion being that they don't perturb the operation for
any *other* PASID in the system.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
Only check for error when iommu->iommu_dev has been assigned
and only assign drhd->iommu when the function can't fail
anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This adds the proper check to alloc_iommu to make sure that
the call to iommu_device_create has completed successfully
and if not return the error code to the caller after freeing
up resources allocated previously.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When mapping a non-page-aligned scatterlist entry, we copy the original
offset to the output DMA address before aligning it to hand off to
iommu_map_sg(), then later adding the IOVA page address portion to get
the final mapped address. However, when the IOVA page size is smaller
than the CPU page size, it is the offset within the IOVA page we want,
not that within the CPU page, which can easily be larger than an IOVA
page and thus result in an incorrect final address.
Fix the bug by taking only the IOVA-aligned part of the offset as the
basis of the DMA address, not the whole thing.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
get_device_id() returns an unsigned short device id. It never fails and
it never returns a negative so we can remove this condition.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Preallocate between 4 and 8 apertures when a device gets it
dma_mask. With more apertures we reduce the lock contention
of the domain lock significantly.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make this pointer percpu so that we start searching for new
addresses in the range we last stopped and which is has a
higher probability of being still in the cache.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This allows to build up the page-tables without holding any
locks. As a consequence it removes the need to pre-populate
dma_ops page-tables.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Don't flush the iommu tlb when we free something behind the
current next_bit pointer. Update the next_bit pointer
instead and let the flush happen on the next wraparound in
the allocation path.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The flushing of iommu tlbs is now done on a per-range basis.
So there is no need anymore for domain-wide flush tracking.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It points to the next aperture index to allocate from. We
don't need the full address anymore because this is now
tracked in struct aperture_range.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Moving it before the pte_pages array puts in into the same
cache-line as the spin-lock and the bitmap array pointer.
This should safe a cache-miss.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There have been present PTEs which in theory could have made
it to the IOMMU TLB. Flush the addresses out on the error
path to make sure no stale entries remain.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If CONFIG_PHYS_ADDR_T_64BIT=n:
drivers/iommu/ipmmu-vmsa.c: In function 'ipmmu_domain_init_context':
drivers/iommu/ipmmu-vmsa.c:434:2: warning: right shift count >= width of type
ipmmu_ctx_write(domain, IMTTUBR0, ttbr >> 32);
^
As io_pgtable_cfg.arm_lpae_s1_cfg.ttbr[] is an array of u64s, assigning
it to a phys_addr_t may truncates it. Make ttbr u64 to fix this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Doug reports that the equivalent page allocator on 32-bit ARM exhibits
particularly pathalogical behaviour under memory pressure when
fragmentation is high, where allocating a 4MB buffer takes tens of
seconds and the number of calls to alloc_pages() is over 9000![1]
We can drastically improve that situation without losing the other
benefits of high-order allocations when they would succeed, by assuming
memory pressure is relatively constant over the course of an allocation,
and not retrying allocations at orders we know to have failed before.
This way, the best-case behaviour remains unchanged, and in the worst
case we should see at most a dozen or so (MAX_ORDER - 1) failed attempts
before falling back to single pages for the remainder of the buffer.
[1]:http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/394660.html
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
dma-iommu.c was naughtily relying on an implicit transitive #include of
linux/vmalloc.h, which is apparently not present on some architectures.
Add that, plus a couple more headers for other functions which are used
similarly.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.
The remaining ones need more careful inspection before a conversion can
happen. On the TODO.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Two similar fixes for the Intel and AMD IOMMU drivers to add
proper access checks before calling handle_mm_fault.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWdCp7AAoJECvwRC2XARrjjAIP/0ihW2zF4R622RgY1C1Cm62j
0eb/R4UqjI3PG0KsURgDHcIm9JP5Z//dgKTOtNX9KOkHlXLcO9MMSD5chVBd4HKG
+Mgx7RM+Mr7f6ElRUa6s1GY1tcJlGf43fW5cMQ44BJIqVXlE47go4U09D86DVgXy
KgyBxQldeOrkXZvAG82WLjGgkdGALQjbDlI8ktmfYWXAvIRWNGJqWY16BwAYOWfb
9d3+1JPekSSBWHC6H+qbkDb8ueO69/Ux0HL5z2Q0zchqGjBb1gnfwLcz865KZpOB
qUwsKFSXTl+jPCrAaLYJnVqAnH4qqKaF6WKAJSIHObTSVqXKHpFHrQrlGVzOvYNn
s3216KIMsxG2nnvSgXCOFGqM/810MH2MSo8YcF5A3celrka3j2Gj08mxInrZXN7D
3p51HSwq8ePo4i5jppT5ldOBSjNV9N3wKWcjDb4OL+OfkJc/u2VbSHNQtpvTclsV
V6VSfWLDC8BCmUveMH2TrawQWkKOz0LqgqfQPX+VvSCIM7tgkrgVsTJrijPtGOs1
zid/A/cfqMdBezSVALrZfB4OVBaM2UL2LJmmLJgApYV+N55Oxmx+nxnMr0aT5KlY
crjcnVaypkq3rG1Wjpt+nTTwtllB0yXNEywQcu2edeswmaQCqsEgQRsDqi6S2/+S
c8l9JKoTrB4+vToYjXyW
=qrAB
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Two similar fixes for the Intel and AMD IOMMU drivers to add proper
access checks before calling handle_mm_fault"
* tag 'iommu-fixes-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Do access checks before calling handle_mm_fault()
iommu/amd: Do proper access checking before calling handle_mm_fault()
When tearing down page tables, we return early for the final level
since we know that we won't have any table pointers to follow.
Unfortunately, this also means that we forget to free the final level,
so we end up leaking memory.
Fix the issue by always freeing the current level, but just don't bother
to iterate over the ptes if we're at the final level.
Cc: <stable@vger.kernel.org>
Reported-by: Zhang Bo <zhangbo_a@xiaomi.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It is ILLEGAL to set STE.S1STALLD to 1 if stage 1 is enabled and
either the stall or terminate models are not supported.
This patch fixes the STALLD check and ensures that we don't set STALLD
in the STE when it is not supported.
Signed-off-by: Prem Mallappa <pmallapp@broadcom.com>
[will: consistently use IDR0_STALL_MODEL_* prefix]
Signed-off-by: Will Deacon <will.deacon@arm.com>
When acknowledging global errors, the GERRORN register should be written
with the original GERROR value so that active errors are toggled.
This patch fixed the driver to write the original GERROR value to
GERRORN, instead of an active error mask.
Signed-off-by: Prem Mallappa <pmallapp@broadcom.com>
[will: reworked use of active bits and fixed commit log]
Signed-off-by: Will Deacon <will.deacon@arm.com>
There is no need to keep a useful accessor for a public structure hidden
away in a private implementation. Move it out alongside the structure
definition so that other implementations may reuse it.
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When invalidating an IOVA range potentially spanning multiple pages,
such as when removing an entire intermediate-level table, we currently
only issue an invalidation for the first IOVA of that range. Since the
architecture specifies that address-based TLB maintenance operations
target a single entry, an SMMU could feasibly retain live entries for
subsequent pages within that unmapped range, which is not good.
Make sure we hit every possible entry by iterating over the whole range
at the granularity provided by the pagetable implementation.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[will: added missing semicolons...]
Signed-off-by: Will Deacon <will.deacon@arm.com>
IOMMU hardware with range-based TLB maintenance commands can work
happily with the iova and size arguments passed via the tlb_add_flush
callback, but for IOMMUs which require separate commands per entry in
the range, it is not straightforward to infer the necessary granularity
when it comes to issuing the actual commands.
Add an additional argument indicating the granularity for the benefit
of drivers needing to know, and update the ARM LPAE code appropriately
(for non-leaf invalidations we currently just assume the worst-case
page granularity rather than walking the table to check).
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In the case of corrupted page tables, or when an invalid size is given,
__arm_lpae_unmap() may recurse beyond the maximum number of levels.
Unfortunately the detection of this error condition only happens *after*
calculating a nonsense offset from something which might not be a valid
table pointer and dereferencing that to see if it is a valid PTE.
Make things a little more robust by checking the level is valid before
doing anything which depends on it being so.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Whilst the architecture only defines a few of the possible CERROR values,
we should handle unknown values gracefully rather than go out of bounds
trying to print out an error description.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The basic flow for add a device:
arm_smmu_add_device
|->iommu_group_get_for_dev
|->iommu_group_get
return group; (1)
|->ops->device_group : Init/increase reference count to/by 1.
|->iommu_group_add_device : Increase reference count by 1.
return group (2)
|->return 0;
Since we are adding one device, the flow is (2) and the group reference
count will be increased by 2. So, we need to add iommu_group_put at the
end of arm_smmu_add_device to decrease the count by 1.
Also take the failure path into consideration when fail to add a device.
Signed-off-by: Peng Fan <van.freenix@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When we initialise a bypass STE, we memset the structure to zero and
set the Valid and Config fields to indicate that the stream should
bypass the SMMU. Unfortunately, this results in an SHCFG field of 0
which means that the shareability of any incoming transactions is
overridden with non-shareable, leading to potential coherence problems
down the line.
This patch fixes the issue by initialising bypass STEs to use the
incoming shareability attributes. When translation is in effect at
either stage 1 or stage 2, the shareability is determined by the
page tables.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The free_io_pgtable_ops() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM SMMUv3 driver uses dma_{alloc,free}_coherent to manage its
queues and configuration data structures.
This patch converts the driver to the managed (dmam_*) API, so that
resources are freed automatically on device teardown. This greatly
simplifies the failure paths and allows us to remove a bunch of
handcrafted freeing code.
Signed-off-by: Will Deacon <will.deacon@arm.com>
PRIQ_0_OF has been removed from the SMMUv3 architecture, so remove its
corresponding (and unused) #define from the driver.
Signed-off-by: Will Deacon <will.deacon@arm.com>
commit db0fa0cb01 "scatterlist: use sg_phys()" did replacements of
the form:
phys_addr_t phys = page_to_phys(sg_page(s));
phys_addr_t phys = sg_phys(s) & PAGE_MASK;
However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long). Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.
Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: db0fa0cb01 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
As of commit 44d88c754e ("ARM: shmobile: Remove legacy SoC code
for R-Mobile A1"), the Renesas IPMMU/IPMMUI driver is no longer used.
In theory it could still be used on SH-Mobile AG5 and R-Mobile A1 SoCs,
but that requires adding DT support to the driver, which is not
planned.
Remove the driver, it can be resurrected from git history when needed.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This mmu_notifier_ops structure is never modified, so declare it as
const, like the other mmu_notifier_ops structures.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Get rid of the three error paths that look the same and move
error handling to a single place.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Instead of just checking for a write access, calculate the
flags that are passed to handle_mm_fault() more precisly and
use the pre-defined macros.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Not doing so is a bug and might trigger a BUG_ON in
handle_mm_fault(). So add the proper permission checks
before calling into mm code.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The handle_mm_fault function expects the caller to do the
access checks. Not doing so and calling the function with
wrong permissions is a bug (catched by a BUG_ON).
So fix this bug by adding proper access checking to the io
page-fault code in the AMD IOMMUv2 driver.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Fix these warnings:
CHECK drivers/iommu/s390-iommu.c
drivers/iommu/s390-iommu.c:52:21: warning: symbol 's390_domain_alloc' was not declared. Should it be static?
drivers/iommu/s390-iommu.c:76:6: warning: symbol 's390_domain_free' was not declared. Should it be static?
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We use lazy allocation for translation table entries but don't handle
allocation (and other) failures during translation table updates.
Handle these failures and undo translation table updates when it's
meaningful.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Kconfig: remove BE-only platforms from LE kernel build from Boqun Feng
- Refresh ps3_defconfig from Geoff Levand
- Emit GNU & SysV hashes for the vdso from Michael Ellerman
- Define an enum for the bolted SLB indexes from Anshuman Khandual
- Use a local to avoid multiple calls to get_slb_shadow() from Michael Ellerman
- Add gettimeofday() benchmark from Michael Neuling
- Avoid link stack corruption in __get_datapage() from Michael Neuling
- Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar K.V
- Add ppc64le_defconfig from Michael Ellerman
- pseries: extract of_helpers module from Andy Shevchenko
- Correct string length in pseries_of_derive_parent() from Nathan Fontenot
- Free the MSI bitmap if it was slab allocated from Denis Kirjanov
- Shorten irq_chip name for the SIU from Christophe Leroy
- Wait 1s for secondaries to enter OPAL during kexec from Samuel Mendoza-Jonas
- Fix _ALIGN_* errors due to type difference. from Aneesh Kumar K.V
- powerpc/pseries/hvcserver: don't memset pi_buff if it is null from Colin Ian King
- Disable hugepd for 64K page size. from Aneesh Kumar K.V
- Differentiate between hugetlb and THP during page walk from Aneesh Kumar K.V
- Make PCI non-optional for pseries from Michael Ellerman
- Individual System V IPC system calls from Sam bobroff
- Add selftest of unmuxed IPC calls from Michael Ellerman
- discard .exit.data at runtime from Stephen Rothwell
- Delete old orphaned PrPMC 280/2800 DTS and boot file. from Paul Gortmaker
- Use of_get_next_parent to simplify code from Christophe Jaillet
- Paginate some xmon output from Sam bobroff
- Add some more elements to the xmon PACA dump from Michael Ellerman
- Allow the tm-syscall selftest to build with old headers from Michael Ellerman
- Run EBB selftests only on POWER8 from Denis Kirjanov
- Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael Ellerman
- Avoid reference to potentially freed memory in prom.c from Christophe Jaillet
- Quieten boot wrapper output with run_cmd from Geoff Levand
- EEH fixes and cleanups from Gavin Shan
- Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
- Use of_get_next_parent() in of_get_ibm_chip_id() from Michael Ellerman
- Fix section mismatch warning in msi_bitmap_alloc() from Denis Kirjanov
- Fix ps3-lpm white space from Rudhresh Kumar J
- Fix ps3-vuart null dereference from Colin King
- nvram: Add missing kfree in error path from Christophe Jaillet
- nvram: Fix function name in some errors messages. from Christophe Jaillet
- drivers/macintosh: adb: fix misleading Kconfig help text from Aaro Koskinen
- agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
- cxl: Free virtual PHB when removing from Andrew Donnellan
- scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from Michael Ellerman
- scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O= from Michael Ellerman
- Freescale updates from Scott: Highlights include 64-bit book3e kexec/kdump
support, a rework of the qoriq clock driver, device tree changes including
qoriq fman nodes, support for a new 85xx board, and some fixes.
- MPC5xxx updates from Anatolij: Highlights include a driver for MPC512x
LocalPlus Bus FIFO with its device tree binding documentation, mpc512x
device tree updates and some minor fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWPEZgAAoJEFHr6jzI4aWANjYQAKX2Q/95hqKfCuF5FBcUmtMC
Pu/Nff027MVzxZ2ApDcvvLGps5Nz2bn3nIhc9zjkXc5E8DuL6X3Yl8ce7qyNcc3g
cJJ8RvtUo6J1OMWetXFehtPYniAAwKMhZYKnj0+WnLr2SyH/Vhl3ehDkFbGyPtuH
r+2E7krFjfVgU+bzciIFnOaDekFuFN/pXWMb6e6zQyBJe9N8ZIp96uouGCebKVd0
VDLItzdaKErT8JFfbymMPvZm3V0rMVx4WWu3kAbQX8LrD5a18NF1zrjAOHRXc61n
kkk8/DPuNOon1PbXXyiS5BcFyZRe+KE3VBnoW5sOMqMIRg5WdO1oU3e2pEfXMO8+
leXYwFLXiKzUZuOgQG2QiUhrzD2yC1o6/TJWATv0dSl9AwrecgPX+Vj6X357slAf
A9E3eMy5tgnpndBWZmvZS3W7YDKH+NkeZ+Q40+NErAlqr++ErrTcKVndk5vWlYTT
7mMZeTXagX66al/k5ATKqwB7iUSpnYHSAa9fcUYPSM2FnXsDxPyeJGkBbcoOmkGj
QrpgNYOvJaUJd076goZCV39v0c1xpfV9/9kyVch8HUadf6JcjpVZwYnbGw2qlJjh
ZanuBG2VOeSwaKQqXiRBSBetnpAg8CVpFjDmX9wOBfSek2wxEJqDX/vQExdbIDQQ
pUs7vnUxLzhmW/x+ygOI
=YwcM
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Kconfig: remove BE-only platforms from LE kernel build from Boqun
Feng
- Refresh ps3_defconfig from Geoff Levand
- Emit GNU & SysV hashes for the vdso from Michael Ellerman
- Define an enum for the bolted SLB indexes from Anshuman Khandual
- Use a local to avoid multiple calls to get_slb_shadow() from Michael
Ellerman
- Add gettimeofday() benchmark from Michael Neuling
- Avoid link stack corruption in __get_datapage() from Michael Neuling
- Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar
K.V
- Add ppc64le_defconfig from Michael Ellerman
- pseries: extract of_helpers module from Andy Shevchenko
- Correct string length in pseries_of_derive_parent() from Nathan
Fontenot
- Free the MSI bitmap if it was slab allocated from Denis Kirjanov
- Shorten irq_chip name for the SIU from Christophe Leroy
- Wait 1s for secondaries to enter OPAL during kexec from Samuel
Mendoza-Jonas
- Fix _ALIGN_* errors due to type difference, from Aneesh Kumar K.V
- powerpc/pseries/hvcserver: don't memset pi_buff if it is null from
Colin Ian King
- Disable hugepd for 64K page size, from Aneesh Kumar K.V
- Differentiate between hugetlb and THP during page walk from Aneesh
Kumar K.V
- Make PCI non-optional for pseries from Michael Ellerman
- Individual System V IPC system calls from Sam bobroff
- Add selftest of unmuxed IPC calls from Michael Ellerman
- discard .exit.data at runtime from Stephen Rothwell
- Delete old orphaned PrPMC 280/2800 DTS and boot file, from Paul
Gortmaker
- Use of_get_next_parent to simplify code from Christophe Jaillet
- Paginate some xmon output from Sam bobroff
- Add some more elements to the xmon PACA dump from Michael Ellerman
- Allow the tm-syscall selftest to build with old headers from Michael
Ellerman
- Run EBB selftests only on POWER8 from Denis Kirjanov
- Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael
Ellerman
- Avoid reference to potentially freed memory in prom.c from Christophe
Jaillet
- Quieten boot wrapper output with run_cmd from Geoff Levand
- EEH fixes and cleanups from Gavin Shan
- Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
- Use of_get_next_parent() in of_get_ibm_chip_id() from Michael
Ellerman
- Fix section mismatch warning in msi_bitmap_alloc() from Denis
Kirjanov
- Fix ps3-lpm white space from Rudhresh Kumar J
- Fix ps3-vuart null dereference from Colin King
- nvram: Add missing kfree in error path from Christophe Jaillet
- nvram: Fix function name in some errors messages, from Christophe
Jaillet
- drivers/macintosh: adb: fix misleading Kconfig help text from Aaro
Koskinen
- agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
- cxl: Free virtual PHB when removing from Andrew Donnellan
- scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from
Michael Ellerman
- scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building
with O= from Michael Ellerman
- Freescale updates from Scott: Highlights include 64-bit book3e
kexec/kdump support, a rework of the qoriq clock driver, device tree
changes including qoriq fman nodes, support for a new 85xx board, and
some fixes.
- MPC5xxx updates from Anatolij: Highlights include a driver for
MPC512x LocalPlus Bus FIFO with its device tree binding
documentation, mpc512x device tree updates and some minor fixes.
* tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (106 commits)
powerpc/msi: Fix section mismatch warning in msi_bitmap_alloc()
powerpc/prom: Use of_get_next_parent() in of_get_ibm_chip_id()
powerpc/pseries: Correct string length in pseries_of_derive_parent()
powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry
powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)
powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan
powerpc/fsl: Add #clock-cells and clockgen label to clockgen nodes
powerpc: handle error case in cpm_muram_alloc()
powerpc: mpic: use IRQCHIP_SKIP_SET_WAKE instead of redundant mpic_irq_set_wake
powerpc/book3e-64: Enable kexec
powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop
powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
powerpc/book3e-64/kexec: Enable SMP release
powerpc/book3e-64/kexec: create an identity TLB mapping
powerpc/book3e-64: Don't limit paca to 256 MiB
powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
powerpc/book3e: support CONFIG_RELOCATABLE
powerpc/booke64: Fix args to copy_and_flush
powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts
powerpc/e6500: kexec: Handle hardware threads
...
handling.
PPC: Mostly bug fixes.
ARM: No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite for
IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86: quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new component (in
virt/lib/) that connects VFIO and KVM together. The same infrastructure
will be used for ARM interrupt forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic interrupt
controller will have to wait for 4.5. These will let KVM expose Hyper-V
devices.
- nested virtualization now supports VPID (same as PCID but for vCPUs)
which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for clflushopt,
clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
userspace, which reduces the attack surface of the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten to not
require help from the hypervisor.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
=iv2z
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.4.
s390:
A bunch of fixes and optimizations for interrupt and time handling.
PPC:
Mostly bug fixes.
ARM:
No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite
for IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86:
Quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new
component (in virt/lib/) that connects VFIO and KVM together.
The same infrastructure will be used for ARM interrupt
forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic
interrupt controller will have to wait for 4.5. These will let
KVM expose Hyper-V devices.
- nested virtualization now supports VPID (same as PCID but for
vCPUs) which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for
clflushopt, clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel +
IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten
to not require help from the hypervisor"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
KVM: VMX: Fix commit which broke PML
KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
KVM: x86: allow RSM from 64-bit mode
KVM: VMX: fix SMEP and SMAP without EPT
KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
KVM: device assignment: remove pointless #ifdefs
KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
KVM: x86: zero apic_arb_prio on reset
drivers/hv: share Hyper-V SynIC constants with userspace
KVM: x86: handle SMBASE as physical address in RSM
KVM: x86: add read_phys to x86_emulate_ops
KVM: x86: removing unused variable
KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
KVM: arm/arm64: Optimize away redundant LR tracking
KVM: s390: use simple switch statement as multiplexer
KVM: s390: drop useless newline in debugging data
KVM: s390: SCA must not cross page boundaries
KVM: arm: Do not indent the arguments of DECLARE_BITMAP
...
This time including:
* A new IOMMU driver for s390 pci devices
* Common dma-ops support based on iommu-api for ARM64. The plan is to
use this as a basis for ARM32 and hopefully other architectures as
well in the future.
* MSI support for ARM-SMMUv3
* Cleanups and dead code removal in the AMD IOMMU driver
* Better RMRR handling for the Intel VT-d driver
* Various other cleanups and small fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n
pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81
qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo
1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji
12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q
K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO
EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS
ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q
gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots
kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu
nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD
SecikoTtH1Q4RVtqOcAQ
=jLlB
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"This time including:
- A new IOMMU driver for s390 pci devices
- Common dma-ops support based on iommu-api for ARM64. The plan is
to use this as a basis for ARM32 and hopefully other architectures
as well in the future.
- MSI support for ARM-SMMUv3
- Cleanups and dead code removal in the AMD IOMMU driver
- Better RMRR handling for the Intel VT-d driver
- Various other cleanups and small fixes"
* tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
iommu/vt-d: Fix return value check of parse_ioapics_under_ir()
iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope()
iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir
iommu: Move default domain allocation to iommu_group_get_for_dev()
iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev
iommu/arm-smmu: Switch to device_group call-back
iommu/fsl: Convert to device_group call-back
iommu: Add device_group call-back to x86 iommu drivers
iommu: Add generic_device_group() function
iommu: Export and rename iommu_group_get_for_pci_dev()
iommu: Revive device_group iommu-ops call-back
iommu/amd: Remove find_last_devid_on_pci()
iommu/amd: Remove first/last_device handling
iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL
iommu/amd: Cleanup buffer allocation
iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu
iommu/amd: Align DTE flag definitions
iommu/amd: Remove old alias handling code
iommu/amd: Set alias DTE in do_attach/do_detach
iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled
...
Pull intel iommu updates from David Woodhouse:
"This adds "Shared Virtual Memory" (aka PASID support) for the Intel
IOMMU. This allows devices to do DMA using process address space,
translated through the normal CPU page tables for the relevant mm.
With corresponding support added to the i915 driver, this has been
tested with the graphics device on Skylake. We don't have the
required TLP support in our PCIe root ports for supporting discrete
devices yet, so it's only integrated devices that can do it so far"
* git://git.infradead.org/intel-iommu: (23 commits)
iommu/vt-d: Fix rwxp flags in SVM device fault callback
iommu/vt-d: Expose struct svm_dev_ops without CONFIG_INTEL_IOMMU_SVM
iommu/vt-d: Clean up pasid_enabled() and ecs_enabled() dependencies
iommu/vt-d: Handle Caching Mode implementations of SVM
iommu/vt-d: Fix SVM IOTLB flush handling
iommu/vt-d: Use dev_err(..) in intel_svm_device_to_iommu(..)
iommu/vt-d: fix a loop in prq_event_thread()
iommu/vt-d: Fix IOTLB flushing for global pages
iommu/vt-d: Fix address shifting in page request handler
iommu/vt-d: shift wrapping bug in prq_event_thread()
iommu/vt-d: Fix NULL pointer dereference in page request error case
iommu/vt-d: Implement SVM_FLAG_SUPERVISOR_MODE for kernel access
iommu/vt-d: Implement SVM_FLAG_PRIVATE_PASID to allocate unique PASIDs
iommu/vt-d: Add callback to device driver on page faults
iommu/vt-d: Implement page request handling
iommu/vt-d: Generalise DMAR MSI setup to allow for page request events
iommu/vt-d: Implement deferred invalidate for SVM
iommu/vt-d: Add basic SVM PASID support
iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS
iommu/vt-d: Add initial support for PASID tables
...
Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch of
debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlY6ePQACgkQMUfUDdst+ymNTgCgpP0CZw57GpwF/Hp2L/lMkVeo
Kx8AoKhEi4iqD5fdCQS9qTfomB+2/M6g
=g7ZO
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch
of debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time"
* tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
debugfs: Add debugfs_create_ulong()
of: to support binding numa node to specified device in devicetree
debugfs: Add read-only/write-only bool file ops
debugfs: Add read-only/write-only size_t file ops
debugfs: Add read-only/write-only x64 file ops
debugfs: Consolidate file mode checks in debugfs_create_*()
Revert "mm: Check if section present during memory block (un)registering"
driver-core: platform: Provide helpers for multi-driver modules
mm: Check if section present during memory block (un)registering
devres: fix a for loop bounds check
CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit
base/platform: assert that dev_pm_domain callbacks are called unconditionally
sysfs: correctly handle short reads on PREALLOC attrs.
base: soc: siplify ida usage
kobject: move EXPORT_SYMBOL() macros next to corresponding definitions
kobject: explain what kobject's sd field is
debugfs: document that debugfs_remove*() accepts NULL and error values
debugfs: Pass bool pointer to debugfs_create_bool()
ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'
This is the downside of using bitfields in the struct definition, rather
than doing all the explicit masking and shifting.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Two late fixes for the AMD IOMMU driver:
* One adds an additional check to the io page-fault handler to
avoid a BUG_ON being hit in handle_mm_fault()
* Second patch fixes a problem with devices writing to the
system management area and were blocked by the IOMMU because
the driver wrongly cleared out the DTE flags allowing that
access.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWLeEzAAoJECvwRC2XARrjD/gQAIAihdsuv33iQQAfwvaztNOu
l5WqQ6gr54xQXKebkY9MgiX5qXIqyNPHku08WGp63kWkfKSkTgvqQw/WNoRcpE2+
GrMKl4EvJyvOp9S3tbtx3QNeb19wGEHyOFN6UuhNBhcXzxsqMN4c3D77upZ2N81k
hYzYYjWNL+5NwsUtK8oZSZUwhmGr4Iuim9mJDMMGhZYw88/dQICIQQtPWc8/ritJ
v/sBJA3KdyRvStxuba64NOWnByYXYnzyrJvBtVPMfPfFjfcyC0D0dwfXe3jvyjh3
nDSRoXqGsd35MBwDfVIf3HUKP4Wxwd+5pbSyrTfD5b4anEFL62ifdTb6/lpMFY/X
uo88xn9oTSMHO0TOPJw2XaBB8Y2OwW1FE7BVpa0CYFMDwQ/vaIGXAoBcxGL98a26
O+xd+pcMVELwOT0XFS5ue7eaZZdLCooj52s1ik8tMIB2qu6lFzd4JyWA7O3LbnMU
qT7YvZKbATjBvIaP0fHpZuZv6iyE2L9pdrvDIGeBb2TqE7r89JfRIYZcYfSvtrdA
TwlijU/w3eMdUoDVDCSlVT9UVfFyPZNwVS1qT3iU4OeVS8MPTnM/lNnWokruAoY0
hcbOOZ7EEtT6o/1GXyWDVaBKbAuWRNhbEEUMEBUlgPN7k9AW6dQIZDPftXwQ05tQ
XzUgi1ueNHcw1br6PItG
=4Bny
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"Two late fixes for the AMD IOMMU driver:
- add an additional check to the io page-fault handler to avoid a
BUG_ON being hit in handle_mm_fault()
- fix a problem with devices writing to the system management area
and were blocked by the IOMMU because the driver wrongly cleared
out the DTE flags allowing that access"
* tag 'iommu-fixes-v4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Don't clear DTE flags when modifying it
iommu/amd: Fix BUG when faulting a PROT_NONE VMA
When booted with intel_iommu=ecs_off we were still allocating the PASID
tables even though we couldn't actually use them. We really want to make
the pasid_enabled() macro depend on ecs_enabled().
Which is unfortunate, because currently they're the other way round to
cope with the Broadwell/Skylake problems with ECS.
Instead of having ecs_enabled() depend on pasid_enabled(), which was never
something that made me happy anyway, make it depend in the normal case
on the "broken PASID" bit 28 *not* being set.
Then pasid_enabled() can depend on ecs_enabled() as it should. And we also
don't need to mess with it if we ever see an implementation that has some
features requiring ECS (like PRI) but which *doesn't* have PASID support.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Not entirely clear why, but it seems we need to reserve PASID zero and
flush it when we make a PASID entry present.
Quite we we couldn't use the true PASID value, isn't clear.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Propagate the error-value from the function ir_parse_ioapic_hpet_scope()
in parse_ioapics_under_ir() and cleanup its calling loop.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Adjust the return value of parse_ioapics_under_ir as
negative value representing failure and "0" representing
succcess. Just make it consistent with other function
implementations, and we can judge if calling is successfull
by if (!parse_ioapics_under_ir()) style.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that the iommu core support for iommu groups is not
pci-centric anymore, we can move default domain allocation
to the bus independent iommu_group_get_for_dev() function.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All callers of iommu_group_get_for_dev() provide a
device_group call-back now, so this fall-back is no longer
needed.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This converts the ARM SMMU and the SMMUv3 driver to use the
new device_group call-back.
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Convert the fsl pamu driver to make use of the new
device_group call-back.
Cc: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename that function to pci_device_group() and export it, so
that IOMMU drivers can use it as their device_group
call-back.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
That call-back is currently unused, change it into a
call-back function for finding the right IOMMU group for a
device.
This is a first step to remove the hard-coded PCI dependency
in the iommu-group code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull intel-iommu bugfix from David Woodhouse:
"This contains a single fix, for when the IOMMU API is used to overlay
an existing mapping comprised of 4KiB pages, with a mapping that can
use superpages.
For the *first* superpage in the new mapping, we were correctly¹
freeing the old bottom-level page table page and clearing the link to
it, before installing the superpage. For subsequent superpages,
however, we weren't. This causes a memory leak, and a warning about
setting a PTE which is already set.
¹ Well, not *entirely* correctly. We just free the page table pages
right there and then, which is wrong. In fact they should only be
freed *after* the IOTLB is flushed so we know the hardware will no
longer be looking at them.... and in fact I note that the IOTLB
flush is completely missing from the intel_iommu_map() code path,
although it needs to be there if it's permitted to overwrite
existing mappings.
Fixing those is somewhat more intrusive though, and will probably
need to wait for 4.4 at this point"
* tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu:
iommu/vt-d: fix range computation when making room for large pages
The code is buggy and the values read from PCI are not
reliable anyway, so it is the best to just remove this code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Clean up the functions to allocate the command, event and
ppr-log buffers. Remove redundant code and change the return
value to int.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The driver always uses a constant size for these buffers
anyway, so there is no need to waste memory to store the
sizes.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This mostly removes the code to create dev_data structures
for alias device ids. They are not necessary anymore, as
they were only created for device ids which have no struct
pci_dev associated with it. But these device ids are
handled in a simpler way now, so there is no need for this
code anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With this we don't have to create dev_data entries for
non-existent devices (which only exist as request-ids).
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The alias list is handled aleady by iommu core code. No need
anymore to handle it in this part of the AMD IOMMU code
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The condition in the BUG_ON is an indicator of a BUG, but no
reason to kill the code path. Turn it into a WARN_ON and
bail out if it is hit.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
During device assignment/deassignment the flags in the DTE
get lost, which might cause spurious faults, for example
when the device tries to access the system management range.
Fix this by not clearing the flags with the rest of the DTE.
Reported-by: G. Richard Bellamy <rbellamy@pteradigm.com>
Tested-by: G. Richard Bellamy <rbellamy@pteradigm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Change the 'pages' parameter to 'unsigned long' to avoid overflow.
Fix the device-IOTLB flush parameter calculation — the size of the IOTLB
flush is indicated by the position of the least significant zero bit in
the address field. For example, a value of 0x12345f000 will flush from
0x123440000 to 0x12347ffff (256KiB).
Finally, the cap_pgsel_inv() is not relevant to SVM; the spec says that
*all* implementations must support page-selective invaliation for
"first-level" translations. So don't check for it.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This will give a little bit of assistance to those developing drivers
using SVM. It might cause a slight annoyance to end-users whose kernel
disables the IOMMU when drivers are trying to use it. But the fix there
is to fix the kernel to enable the IOMMU.
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There is an extra semi-colon on this if statement so we always break on
the first iteration.
Fixes: 0204a49609 ('iommu/vt-d: Add callback to device driver on page faults')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This really should be VTD_PAGE_SHIFT, not PAGE_SHIFT. Not that we ever
really anticipate seeing this used on IA64, but we should get it right
anyway.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The "req->addr" variable is a bit field declared as "u64 addr:52;".
The "address" variable is a u64. We need to cast "req->addr" to a u64
before the shift or the result is truncated to 52 bits.
Fixes: a222a7f0bb ('iommu/vt-d: Implement page request handling')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Dan Carpenter pointed out an error path which could lead to us
dereferencing the 'svm' pointer after we know it to be NULL because the
PASID lookup failed. Fix that, and make it less likely to happen again.
Fixes: a222a7f0bb ('iommu/vt-d: Implement page request handling')
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Despite being a platform device, the SMMUv3 is capable of signaling
interrupts using MSIs. Hook it into the platform MSI framework and
enjoy faults being reported in a new and exciting way.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[will: tidied up the binding example and reworked most of the code]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit 1463fe44fd ("iommu/arm-smmu: Don't use VMIDs for stage-1
translations"), we don't need the GR0 base address when initialising a
context bank, so remove the useless local variable and its init code.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The bitmap allocator returns an int, which is one of the standard
negative values on failure. Rather than assigning this straight to a
u16 (like we do for the ASID and VMID callers), which means that we
won't detect failure correctly, use an int for the purposes of error
checking.
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This is only usable for the static 1:1 mapping of physical memory.
Any access to vmalloc or module regions will require some way of doing
an IOTLB flush. It's theoretically possible to hook into the
tlb_flush_kernel_range() function, but that seems like overkill — most
of the addresses accessed through a kernel PASID *will* be in the 1:1
mapping.
If we really need to allow access to more interesting kernel regions,
then the answer will probably be an explicit IOTLB flush call after use,
akin to the DMA API's unmap function.
In fact, it might be worth introducing that sooner rather than later, and
making it just BUG() if the address isn't in the static 1:1 mapping.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
Since associating an IOVA allocator with an IOMMU domain is a fairly
common need, rather than introduce yet another private structure just to
do this for ourselves, extend the top-level struct iommu_domain with the
notion. A simple opaque cookie allows reuse by other IOMMU API users
with their various different incompatible allocator types.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If IRTE is in posted format, the 'pda' field goes across the 64-bit
boundary, we need use cmpxchg16b to atomically update it. We only
expose posted-interrupt when X86_FEATURE_CX16 is supported and use
to update it atomically.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
handle_mm_fault indirectly triggers a BUG in do_numa_page
when given a VMA without read/write/execute access. Check
this condition in do_fault.
do_fault -> handle_mm_fault -> handle_pte_fault -> do_numa_page
mm/memory.c
3147 static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
....
3159 /* A PROT_NONE fault should not end up here */
3160 BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
Signed-off-by: Jay Cornwall <jay@jcornwall.me>
Cc: <stable@vger.kernel.org> # v4.1+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This provides basic PASID support for endpoint devices, tested with a
version of the i915 driver.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The behaviour if you enable PASID support after ATS is undefined. So we
have to enable it first, even if we don't know whether we'll need it.
This is safe enough; unless we set up a context that permits it, the device
can't actually *do* anything with it.
Also shift the feature detction to dmar_insert_one_dev_info() as it only
needs to happen once.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
As long as we use an identity mapping to work around the worst of the
hardware bugs which caused us to defeature it and change the definition
of the capability bit, we *can* use PASID support on the devices which
advertised it in bit 28 of the Extended Capability Register.
Allow people to do so with 'intel_iommu=pasid28' on the command line.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The VT-d specification says that "Software must enable ATS on endpoint
devices behind a Root Port only if the Root Port is reported as
supporting ATS transactions."
We walk up the tree to find a Root Port, but for integrated devices we
don't find one — we get to the host bridge. In that case we *should*
allow ATS. Currently we don't, which means that we are incorrectly
failing to use ATS for the integrated graphics. Fix that.
We should never break out of this loop "naturally" with bus==NULL,
since we'll always find bridge==NULL in that case (and now return 1).
So remove the check for (!bridge) after the loop, since it can never
happen. If it did, it would be worthy of a BUG_ON(!bridge). But since
it'll oops anyway in that case, that'll do just as well.
Cc: stable@vger.kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In preparation for deprecating ioremap_cache() convert its usage in
intel-iommu to memremap. This also eliminates the mishandling of the
__iomem annotation in the implementation.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The SMMU architecture defines two different behaviors when 64-bit
registers are written with 32-bit writes. The first behavior causes
zero extension into the upper 32-bits. The second behavior splits a
64-bit register into "normal" 32-bit register pairs.
On some buggy implementations, registers incorrectly zero extended
when they should instead behave as normal 32-bit register pairs.
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
[will: removed redundant macro parameters]
Signed-off-by: Will Deacon <will.deacon@arm.com>
'%pad' automatically prints with '0x', so remove the explicit '0x'
annotation.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Rather than keep a private list of struct arm_smmu_device and searching
this whenever we need to look up the correct SMMU instance, instead use
the drvdata field in the struct device to take care of the mapping for
us.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The DSP MMUs on DRA7xx SoC requires configuring an additional
MMU_CONFIG register present in the DSP_SYSTEM sub module. This
setting dictates whether the DSP Core's MDMA and EDMA traffic
is routed through the respective MMU or not. Add the support
to the OMAP iommu driver so that the traffic is not bypassed
when enabling the MMUs.
The MMU_CONFIG register has two different bits for enabling
each of these two MMUs present in the DSP processor sub-system
on DRA7xx. An id field is added to the OMAP iommu object to
identify and enable each IOMMU. The id information and the
DSP_SYSTEM.MMU_CONFIG register programming is achieved through
the processing of the optional "ti,syscon-mmuconfig" property.
A proper value is assigned to the id field only when this
property is present.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In preparation for the installation of a large page, any small page
tables that may still exist in the target IOV address range are
removed. However, if a scatter/gather list entry is large enough to
fit more than one large page, the address space for any subsequent
large pages is not cleared of conflicting small page tables.
This can cause legitimate mapping requests to fail with errors of the
form below, potentially followed by a series of IOMMU faults:
ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083)
In this example, a 4MiB scatter/gather list entry resulted in the
successful installation of a large page @ vPFN 0xfdc00, followed by
a failed attempt to install another large page @ vPFN 0xfde00, due to
the presence of a pointer to a small page table @ 0x7f83a4000.
To address this problem, compute the number of large pages that fit
into a given scatter/gather list entry, and use it to derive the
last vPFN covered by the large page(s).
Cc: stable@vger.kernel.org
Signed-off-by: Christian Zander <christian@nervanasys.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
A few fixes piled up:
* Fix for a suspend/resume issue where PCI probing code overwrote
dev->irq for the MSI irq of the AMD IOMMU.
* Fix for a kernel crash when a 32 bit PCI device was assigned to a KVM
guest.
* Fix for a possible memory leak in the VT-d driver
* A couple of fixes for the ARM-SMMU driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWHNdbAAoJECvwRC2XARrjB/YQAKouJaRMjBaehx6kbaZMhMJy
hXDsh8Xl6TtCe6kLD2uXrvjLZAdu32kjrtzhhcM21EO5Ms2Weq6A60/98LwnJ4Eg
AqftjfxQsIwf2G1PvHb+xepgcFxIAhW6a3nORzx6d2AGrNWmMtUhbLTSncYjmojf
Td4dscuRmRPenJUV1JhcJQBR62QonknIHV99QmevaCSAoUdyuMH+t5kQVEgPjx7C
GlMPNEZZmGl7J3NXSWRtDSkUxFZ1OU8MTKc1LmPPHHAOZk37wbePihQbLLySlHPH
v4G1R05e2hG7C66yu959fyOleL87lDToUXhwQNFJMqEc+e7IzBzZsB3ANEHjpLQH
UJC9COU+sf8mPafja4ge/KbyGDmgDg/OMQJDhU6+DSXUflwymeWJmXr7sLFQex6O
nZO/SVzkbKj+PKxV7UnGD0sTeAAk0X6vfhFCL0l/acPpQg0T6Fpky5D5fUMv5dWS
xxxvxfwBcDoI44fxWBhfPYvmLFT9f5da+bpbzeeGjVSNezOkPJ65AJcVk5An4kQu
PRzJGoq3XpZHOeg5+O7IKzeuJ+3qc7Tz4wAzMxcaNFpVBl2qp1RUkTbmS9/YV1b5
ZOcIFBMLuUROE1ExsU19c5Uo0j1Bvh9jtdy6lNFCagQYzihtA0Jk19ucllx1jIjD
sdv2hgDIauRToKF1d9xz
=v5G4
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"A few fixes piled up:
- Fix for a suspend/resume issue where PCI probing code overwrote
dev->irq for the MSI irq of the AMD IOMMU.
- Fix for a kernel crash when a 32 bit PCI device was assigned to a
KVM guest.
- Fix for a possible memory leak in the VT-d driver
- A couple of fixes for the ARM-SMMU driver"
* tag 'iommu-fixes-v4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix NULL pointer deref on device detach
iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices
iommu/vt-d: Fix memory leak in dmar_insert_one_dev_info()
iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA
iommu/arm-smmu: Ensure IAS is set correctly for AArch32-capable SMMUs
iommu/io-pgtable-arm: Don't use dma_to_phys()
When a device group is detached from its domain, the iommu
core code calls into the iommu driver to detach each device
individually.
Before this functionality went into the iommu core code, it
was implemented in the drivers, also in the AMD IOMMU
driver as the device alias handling code.
This code is still present, as there might be aliases that
don't exist as real PCI devices (and are therefore invisible
to the iommu core code).
Unfortunatly it might happen now, that a device is unbound
multiple times from its domain, first by the alias handling
code and then by the iommu core code (or vice verca).
This ends up in the do_detach function which dereferences
the dev_data->domain pointer. When the device is already
detached, this pointer is NULL and we get a kernel oops.
Removing the alias code completly is not an option, as that
would also remove the code which handles invisible aliases.
The code could be simplified, but this is too big of a
change outside the merge window.
For now, just check the dev_data->domain pointer in
do_detach and bail out if it is NULL.
Reported-by: Andreas Hartmann <andihartmann@freenet.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
AMD IOMMU driver makes use of IOMMU PCI devices, so prevent binding other
PCI drivers to IOMMU PCI devices.
This fixes a bug reported by Boris that system suspend/resume gets broken
on AMD platforms. For more information, please refer to:
https://lkml.org/lkml/2015/9/26/89
Fixes: 991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This adds an IOMMU API implementation for s390 PCI devices.
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>