Print the warning about the fall-back to IOMMU_DOMAIN_DMA in
iommu_group_get_for_dev() only when such a domain was
actually allocated.
Otherwise the user will get misleading warnings in the
kernel log when the IOMMU driver in use supports neither
IOMMU_DOMAIN_DMA nor IOMMU_DOMAIN_IDENTITY.
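A hedged sketch of the intended logic (the identifiers follow
iommu_group_get_for_dev(); the exact warning text is illustrative):

  if (!group->default_domain) {
      struct iommu_domain *dom;

      dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
      if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
          dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
          if (dom) {
              /* Warn only once the fallback domain was really allocated */
              dev_warn(dev,
                       "failed to allocate default IOMMU domain of type %u - falling back to IOMMU_DOMAIN_DMA\n",
                       iommu_def_domain_type);
          }
      }
  }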
Fixes: fccb4e3b8a ('iommu: Allow default domain type to be set on the kernel command line')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use dev_printk() when possible so the IOMMU messages are more consistent
with other messages related to the device.
E.g., I think these messages related to surprise hotplug:
pciehp 0000:80:10.0:pcie004: Slot(36): Link Down
iommu: Removing device 0000:87:00.0 from group 12
pciehp 0000:80:10.0:pcie004: Slot(36): Card present
pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
would be easier to read as these (also requires some PCI changes not
included here):
pci 0000:80:10.0: Slot(36): Link Down
pci 0000:87:00.0: Removing from iommu group 12
pci 0000:80:10.0: Slot(36): Card present
pci 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce an iotlb_sync_map() callback that is invoked at the end of
iommu_map(). This new callback allows IOMMU drivers to avoid syncing
after the mapping of each contiguous chunk and to sync only once the
whole mapping is completed, optimizing the performance of the mapping
operation.
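A hedged sketch of how a driver might wire this up (the callback
signature is assumed from the description above; the my_iommu_* names
and driver-internal flush helpers are illustrative):

  /* Sync once for the whole mapping instead of after every chunk */
  static void my_iommu_iotlb_sync_map(struct iommu_domain *domain)
  {
      /* driver-specific flush of the domain's TLB/caches */
      my_iommu_flush_tlb(to_my_iommu_domain(domain));
  }

  static const struct iommu_ops my_iommu_ops = {
      /* .map() no longer needs to flush each chunk it maps */
      .iotlb_sync_map = my_iommu_iotlb_sync_map,
  };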
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This check needs to be there and got lost at some point
during development. Add it again.
Fixes: 641fb0efbf ('iommu/of: Don't call iommu_ops->add_device directly')
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reported-by: kernelci.org bot <bot@kernelci.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These wrappers will be used to easily change the location of
the field later when all users are converted.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.
The advantage in removing such instances is that module.h itself
sources about 15 other headers, adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.
Since module.h might have been the implicit source for init.h
(for __init) and for export.h (for EXPORT_SYMBOL) we consider each
instance for the presence of either and replace as needed.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The original motivation for iommu_map_sg() was to give IOMMU drivers the
chance to map an IOVA-contiguous scatterlist as efficiently as they
could. It turns out that there isn't really much driver-specific
business involved there, so now that the default implementation is
mandatory let's just improve that - the main thing we're after is to use
larger pages wherever possible, and as long as domain->pgsize_bitmap
reflects reality, iommu_map() can already do that in a generic way. All
we need to do is detect physically-contiguous segments and batch them
into a single map operation, since whatever we do here is transparent to
our caller and not bound by any segment-length restrictions on the list
itself.
Speaking of efficiency, there's really very little point in duplicating
the checks that iommu_map() is going to do anyway, so those get cleared
up in the process.
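The batching idea, as a hedged sketch rather than the exact kernel
code (example_map_sg() is illustrative and error handling is
simplified):

  static size_t example_map_sg(struct iommu_domain *domain, unsigned long iova,
                               struct scatterlist *sg, unsigned int nents, int prot)
  {
      size_t len = 0, mapped = 0;
      phys_addr_t start = 0;
      unsigned int i;

      for (i = 0; i < nents; i++, sg = sg_next(sg)) {
          phys_addr_t s_phys = sg_phys(sg);

          /* Flush the batch when the next segment is not contiguous */
          if (len && s_phys != start + len) {
              if (iommu_map(domain, iova + mapped, start, len, prot))
                  goto out_err;
              mapped += len;
              len = 0;
          }
          if (len == 0)
              start = s_phys;
          len += sg->length;
      }
      if (len && iommu_map(domain, iova + mapped, start, len, prot))
          goto out_err;
      return mapped + len;

  out_err:
      /* Undo whatever was mapped so far and report failure as 0 */
      iommu_unmap(domain, iova, mapped);
      return 0;
  }

Letting iommu_map() see each physically-contiguous run as a single
request is what allows it to pick the largest page sizes that
domain->pgsize_bitmap advertises.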
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a generic command line option to enable lazy unmapping via IOVA
flush queues, which will initially be supported by iommu-dma. This echoes
the semantics of "intel_iommu=strict" (albeit with the opposite default
value), but in the driver-agnostic fashion of "iommu.passthrough".
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[rm: move handling out of SMMUv3 driver, clean up documentation]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[will: dropped broken printk when parsing command-line option]
Signed-off-by: Will Deacon <will.deacon@arm.com>
The external interface to get/set window attributes is already
abstracted behind iommu_domain_{get,set}_attr(), so there's no real
reason for the internal interface to be different. Since we only have
one window-based driver anyway, clean up the core code by just moving
the DOMAIN_ATTR_WINDOWS handling directly into the PAMU driver.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
API callers to retrieve the domain pointer, for DMA ops domains it
doesn't scale well for large systems and multi-queue devices, since the
momentary refcount adjustment will lead to exclusive cacheline contention
when multiple CPUs are operating in parallel on different mappings for
the same device.
In the case of DMA ops domains, however, this refcounting is actually
unnecessary, since they already imply that the group exists and is
managed by platform code and IOMMU internals (by virtue of
iommu_group_get_for_dev()) such that a reference will already be held
for the lifetime of the device. Thus we can avoid the bottleneck by
providing a fast lookup specifically for the DMA code to retrieve the
default domain it already knows it has set up - a simple read-only
dereference plays much nicer with cache-coherency protocols.
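A hedged sketch of what such a fast lookup can boil down to (the
helper name is illustrative; it assumes the group's default domain
pointer is the one the DMA code set up):

  /* No refcounting: the group and its default domain are guaranteed
   * to outlive the device's use of the DMA ops. */
  static struct iommu_domain *example_get_dma_domain(struct device *dev)
  {
      return dev->iommu_group->default_domain;
  }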
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Implement bus specific support for the fsl-mc bus including
registering arm_smmu_ops and bus specific device add operations.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All iommu drivers use the default_iommu_map_sg implementation, and there
is no good reason to ever override it. Just expose it as iommu_map_sg
directly and remove the indirection, especially in our post-Spectre world
where indirect calls are horribly expensive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This allows the default behavior to be controlled by a kernel config
option instead of changing the commandline for the kernel to include
"iommu.passthrough=on" or "iommu=pt" on machines where this is desired.
Likewise, for machines where this config option is enabled, it can be
disabled at boot time with "iommu.passthrough=off" or "iommu=nopt".
Also corrected iommu=pt documentation for IA-64, since it has no code that
parses iommu= at all.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
While we could print it at setup time, this is an easier way to match
each device to its default IOMMU allocation type.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Provide base enablement for using debugfs to expose internal data of an
IOMMU driver. When called, create the /sys/kernel/debug/iommu directory.
Emit a strong warning at boot time to indicate that this feature is
enabled.
This function is called from iommu_init, and creates the initial debugfs
directory. Drivers may then call iommu_debugfs_new_driver_dir() to
instantiate a device-specific directory to expose internal data.
It will return a pointer to the new dentry structure created in
/sys/kernel/debug/iommu, or NULL in the event of a failure.
Since the IOMMU driver can not be removed from the running system, there
is no need for an "off" function.
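A hedged sketch of a driver hooking into this (the exact return type
and argument of iommu_debugfs_new_driver_dir() are assumed from the
description above; the my_* names and "my-iommu" are illustrative):

  static u32 my_iommu_error_count;
  static struct dentry *my_iommu_debugfs_dir;

  static void my_iommu_debugfs_setup(void)
  {
      /* Creates /sys/kernel/debug/iommu/my-iommu */
      my_iommu_debugfs_dir = iommu_debugfs_new_driver_dir("my-iommu");
      if (!my_iommu_debugfs_dir)
          return;

      /* Expose whatever internal state is useful for debugging */
      debugfs_create_u32("error_count", 0444, my_iommu_debugfs_dir,
                         &my_iommu_error_count);
  }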
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
strtobool() already checks for a NULL parameter, so there is no need
to repeat the check.
While here, switch to kstrtobool() and unshadow actual error code
(which is still -EINVAL).
No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, iommu_unmap, iommu_unmap_fast and iommu_map_sg return
size_t. However, some of the return values are error codes (< 0),
which can be misinterpreted as a large size. Therefore, return size 0
instead to signify failure to map/unmap.
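Callers therefore check for zero rather than for a negative value; a
minimal hedged sketch (the error code returned on failure is
illustrative):

  size_t unmapped = iommu_unmap(domain, iova, size);

  if (unmapped == 0)    /* 0 now signals failure, not an errno cast to size_t */
      return -EFAULT;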
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The result of iommu_group_get() was being blindly used in both
attach and detach, which results in a NULL pointer dereference when
trying to work with an unknown device.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
With the current IOMMU-API the hardware TLBs have to be
flushed in every iommu_ops->unmap() call-back.
For unmapping large amounts of address space, like it
happens when a KVM domain with assigned devices is
destroyed, this causes thousands of unnecessary TLB flushes
in the IOMMU hardware because the unmap call-back runs for
every unmapped physical page.
With the TLB Flush Interface and the new iommu_unmap_fast()
function introduced here the need to clean the hardware TLBs
is removed from the unmapping code-path. Users of
iommu_unmap_fast() have to explicitly call the TLB-Flush
functions to sync the page-table changes to the hardware.
Three functions for TLB-Flushes are introduced:

  * iommu_flush_tlb_all() - Flushes all TLB entries associated with
                            that domain. TLB entries are flushed when
                            this function returns.

  * iommu_tlb_range_add() - This will add a given range to the flush
                            queue for this domain.

  * iommu_tlb_sync()      - Flushes all queued ranges from the
                            hardware TLBs. Returns when the flush is
                            finished.
The semantic of this interface is intentionally similar to
the iommu_gather_ops from the io-pgtable code.
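A hedged usage sketch of the new interface (example_teardown() is
illustrative; the point is a single sync after the whole range has
been unmapped):

  static void example_teardown(struct iommu_domain *domain,
                               unsigned long iova, size_t size)
  {
      size_t unmapped;

      /* Tear down the mappings without flushing the hardware TLBs */
      unmapped = iommu_unmap_fast(domain, iova, size);

      /* Queue the range for flushing */
      iommu_tlb_range_add(domain, iova, unmapped);

      /* Flush everything queued above in one go */
      iommu_tlb_sync(domain);
  }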
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
have been a little misleading, since that check is still worthwhile even
when groups *are* universal. We have a few IOMMU-aware drivers which
only care whether their device is already attached to an existing domain
or not, for which the previous behaviour of iommu_get_domain_for_dev()
was ideal, and who now crash if their device does not have an IOMMU.
With IOMMU groups now serving as a reliable indicator of whether a
device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
mode), drivers could arguably do this:
  group = iommu_group_get(dev);
  if (group) {
      domain = iommu_get_domain_for_dev(dev);
      iommu_group_put(group);
  }
However, rather than duplicate that code across multiple callsites,
particularly when it's still only the domain they care about, let's skip
straight to the next step and factor out the check into the common place
it applies - in iommu_get_domain_for_dev() itself. Sure, it ends up
looking rather familiar, but now it's backed by the reasoning of having
a robust API able to do the expected thing for all devices regardless.
Fixes: 05f80300dc ("iommu: Finish making iommu_group support mandatory")
Reported-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This new call-back will be used to check whether the domain attach needs
to be deferred for now. If so, the domain attach/detach will return directly.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that all the drivers properly implementing the IOMMU API support
groups (I'm ignoring the etnaviv GPU MMUs which seemingly only do just
enough to convince the ARM DMA mapping ops), we can remove the FIXME
workarounds from the core code. In the process, it also seems logical to
make the .device_group callback non-optional for drivers calling
iommu_group_get_for_dev() - the current callers all implement it anyway,
and it doesn't make sense for any future callers not to either.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This callback should never return NULL. Print a warning if
that happens so that we notice and can fix it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The generic device_group call-backs in iommu.c return NULL
in case of error. Since they are getting ERR_PTR values from
iommu_group_alloc(), just pass them up instead.
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The function is not in any fast path, so there is no need for it to
be a static inline in a header file. This also removes the
need to include iommu trace-points in iommu.h.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In iommu_bus_notifier(), when the action is
BUS_NOTIFY_ADD_DEVICE, the return value of ops->add_device(dev)
is returned directly. But ops->add_device() can return an error
value, such as -ENODEV, and such a value makes
notifier_call_chain() stop traversing the remaining nodes in the
notifier_block list.
This patch revises iommu_bus_notifier() to return
NOTIFY_DONE when an error happens in ops->add_device().
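A hedged sketch of the intended behaviour (not necessarily the exact
upstream diff; example_iommu_bus_notifier() is illustrative):

  static int example_iommu_bus_notifier(struct notifier_block *nb,
                                        unsigned long action, void *data)
  {
      struct device *dev = data;
      const struct iommu_ops *ops = dev->bus->iommu_ops;

      if (action == BUS_NOTIFY_ADD_DEVICE && ops->add_device) {
          if (ops->add_device(dev))
              return NOTIFY_DONE;   /* error, but keep the chain going */
          return NOTIFY_OK;
      }

      return NOTIFY_DONE;
  }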
Signed-off-by: zhichang.yuan <yuanzhichang@hisilicon.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The IOMMU core currently initialises the default domain for each group
to IOMMU_DOMAIN_DMA, under the assumption that devices will use
IOMMU-backed DMA ops by default. However, in some cases it is desirable
for the DMA ops to bypass the IOMMU for performance reasons, reserving
use of translation for subsystems such as VFIO that require it for
enforcing device isolation.
Rather than modify each IOMMU driver to provide different semantics for
DMA domains, instead we introduce a command line parameter that can be
used to change the type of the default domain. Passthrough can then be
specified using "iommu.passthrough=1" on the kernel command line.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The introduction of reserved regions has left a couple of rough edges
which we could do with sorting out sooner rather than later. Since we
are not yet addressing the potential dynamic aspect of software-managed
reservations and presenting them at arbitrary fixed addresses, it is
incongruous that we end up displaying hardware vs. software-managed MSI
regions to userspace differently, especially since ARM-based systems may
actually require one or the other, or even potentially both at once
(which iommu-dma currently has no hope of dealing with at all). Let's
resolve the former user-visible inconsistency ASAP before the ABI has
been baked into a kernel release, in a way that also lays the groundwork
for the latter shortcoming to be addressed by follow-up patches.
For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
IOMMU_RESV_MSI to describe the hardware type, and document everything a
little bit. Since the x86 MSI remapping hardware falls squarely under
this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
so that we tell the same story to userspace across all platforms.
Secondly, as the various region types require quite different handling,
and it really makes little sense to ever try combining them, convert the
bitfield-esque #defines to a plain enum in the process before anyone
gets the wrong impression.
Fixes: d30ddcaa7b ("iommu: Add a new type field in iommu_resv_region")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: David Woodhouse <dwmw2@infradead.org>
CC: kvm@vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
And also move its remaining functionality to
iommu_device_register() and 'struct iommu_device'.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: devicetree@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This struct represents one hardware iommu in the iommu core
code. For now it only has the iommu-ops associated with it,
but that will be extended soon.
The register/unregister interface is also added, as well as
making use of it in the Intel and AMD IOMMU drivers.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The struct is used to link devices to iommu-groups, so
'struct group_device' is a better name. Further this makes
the name iommu_device available for a struct representing
hardware iommus.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename the function to iommu_ops_from_fwnode(), because that
is what the function actually does; the new name is much
more descriptive.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In case the device reserved region list is empty, the returned value
of iommu_insert_device_resv_regions() is uninitialized. Let's return 0
in that case.
This fixes commit 6c65fb318e ("iommu: iommu_get_group_resv_regions").
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the assignment statement into the if branch above, which is the
only place where it is needed.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A new iommu-group sysfs attribute file is introduced. It contains
the list of reserved regions for the iommu-group. Each reserved
region is described on a separate line:
- first field is the start IOVA address,
- second is the end IOVA address,
- third is the type.
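Reading the new file might then produce output along these lines (the
addresses are illustrative and the type names follow the reserved
region types introduced in this series):

  0x0000000000000000 0x0000000000ffffff direct
  0x00000000fee00000 0x00000000feefffff msi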
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Introduce iommu_get_group_resv_regions whose role consists in
enumerating all devices from the group and collecting their
reserved regions. The list is sorted and overlaps between
regions of the same type are handled by merging the regions.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As we introduced new reserved region types which do not require
mapping, let's make sure we only map direct mapped regions.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Introduce a new helper serving the purpose of allocating a reserved
region. This will be used in IOMMU drivers implementing reserved
region callbacks.
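A hedged sketch of a driver using the helper from its reserved region
callback (the argument order is assumed to be start, size, prot and
type; the MSI window address and values shown are illustrative):

  static void example_get_resv_regions(struct device *dev, struct list_head *head)
  {
      struct iommu_resv_region *region;

      /* Advertise a 1MB MSI window as a reserved region */
      region = iommu_alloc_resv_region(0x08000000, SZ_1M, IOMMU_WRITE,
                                       IOMMU_RESV_MSI);
      if (!region)
          return;

      list_add_tail(&region->list, head);
  }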
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We want to extend the callbacks used for dm regions and
use them for reserved regions. Reserved regions can be
- directly mapped regions
- regions that cannot be iommu mapped (PCI host bridge windows, ...)
- MSI regions (because they belong to another address space or because
they are not translated by the IOMMU and need special handling)
So let's rename the struct and also the callbacks.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We wouldn't normally expect ops->attach_dev() to fail, but on IOMMUs
with limited hardware resources, or generally misconfigured systems,
it is certainly possible. We report failure correctly from the external
iommu_attach_device() interface, but do not do so in iommu_group_add()
when attaching to the default domain. The result of failure there is
that the device, group and domain all get left in a broken,
part-configured state which leads to weird errors and misbehaviour down
the line when IOMMU API calls sort-of-but-don't-quite work.
Check the return value of __iommu_attach_device() on the default domain,
and refactor the error handling paths to cope with its failure and clean
up correctly in such cases.
Fixes: e39cb8a3aa ("iommu: Make sure a device is always attached to a domain")
Reported-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The of_iommu_{set/get}_ops() API is used to associate a device
tree node with a specific set of IOMMU operations. The same
kernel interface is required on systems booting with ACPI, where
devices are not associated with a device tree node, therefore
the interface requires generalization.
The struct device fwnode member represents the fwnode token associated
with the device. The structure it points at is firmware specific, but
it is initialized on both ACPI and DT systems, which makes it an ideal
candidate for associating a set of IOMMU operations with a given
device through its struct device.fwnode member pointer, paving
the way for representing per-device iommu_ops (i.e. an IOMMU instance
associated with a device).
Convert the DT specific of_iommu_{set/get}_ops() interface to
use struct device.fwnode as a look-up token, making the interface
usable on ACPI systems and rename the data structures and the
registration API so that they are made to represent their usage
more clearly.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Tomasz Nowicki <tn@semihalf.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>