Prepare for adding bankid, also no functional change.
In the previous SoC, each a iova_region is a domain; In the multi-banks
case, each a bank is a domain, then the original function name
"mtk_iommu_get_domain_id" is not proper. Use "iova_region_id" instead of
"domain_id".
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-32-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The mt8195 IOMMU HW max support 5 banks, and regarding the banks'
registers, it looks like:
----------------------------------------
|bank0 | bank1 | bank2 | bank3 | bank4|
----------------------------------------
|global |
|control| null
|regs |
-----------------------------------------
|bank |bank |bank |bank |bank |
|regs |regs |regs |regs |regs |
| | | | | |
-----------------------------------------
Each bank has some special bank registers and it share bank0's global
control registers. this patch initialise the bank hw with the bankid.
In the hw_init, we always initialise bank0's control register since
we don't know if the bank0 is initialised.
Additionally, About each bank's register base, always delta 0x1000.
like bank[x + 1] = bank[x] + 0x1000.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-31-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Prepare for supporting multi-banks for the IOMMU HW, No functional change.
Add a new structure(mtk_iommu_bank_data) for each a bank. Each a bank have
the independent HW base/IRQ/tlb-range ops, and each a bank has its special
iommu-domain(independent pgtable), thus, also move the domain information
into it.
In previous SoC, we have only one bank which could be treated as bank0(
bankid always is 0 for the previous SoC).
After adding this structure, the tlb operations and irq could use
bank_data as parameter.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-30-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
No functional change. Just rename this for readable. Differentiate this
from mtk_iommu.c
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-29-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently there is a suspend structure in the header file. It's no need
to keep a header file only for this. Move these into the c file and rm
this header file.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-28-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Prepare for adding the structure "mtk_iommu_bank_data". No functional
change. The mtk_iommu_domain in v1 and v2 are different, we could not add
current data as bank[0] in v1 simplistically.
Currently we have no plan to add new SoC for v1, in order to avoid affect
v1 when we add many new features for v2, I totally separate v1 and v2 in
this patch, there are many structures only for v2.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-27-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
No functional change too, prepare for mt8195 IOMMU support bank functions.
Some global control settings are in bank0 while the other banks have
their bank independent setting. Here only move the global control
settings and the independent registers together.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-26-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
No functional change. Use "base" instead of the data->base. This is
avoid to touch too many lines in the next patches.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-25-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
mt8195 has 3 IOMMU, containing 2 MM IOMMUs, one is for vdo, the other
is for vpp. and 1 INFRA IOMMU.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-24-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently the code for of_iommu_configure_dev_id is like this:
static int of_iommu_configure_dev_id(struct device_node *master_np,
struct device *dev,
const u32 *id)
{
struct of_phandle_args iommu_spec = { .args_count = 1 };
err = of_map_id(master_np, *id, "iommu-map",
"iommu-map-mask", &iommu_spec.np,
iommu_spec.args);
...
}
It supports only one id output. BUT our PCIe HW has two ID(one is for
writing, the other is for reading). I'm not sure if we should change
of_map_id to support output MAX_PHANDLE_ARGS.
Here add the solution in ourselve drivers. If it's pcie case, enable one
more bit.
Not all infra iommu support PCIe, thus add a PCIe support flag here.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-23-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The infra iommu enable bits in mt8195 is in the pericfg register segment,
use regmap to update it.
If infra iommu master translation fault, It doesn't have the larbid/portid,
thus print out the whole register value.
Since regmap_update_bits may fail, add return value for mtk_iommu_config.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-22-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The power/clock of infra iommu is always on, and it doesn't have the
device link with the master devices, then the infra iommu device's PM
status is not active, thus we add A PM_CLK_AO flag for infra iommu.
The tlb operation is a bit not clear here, there are 2 special cases.
Comment them in the code.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-21-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Allow the type IOMMU_DOMAIN_UNMANAGED since vfio_iommu_type1.c always call
iommu_domain_alloc. The PCIe EP works ok when going through vfio.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-20-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For MM IOMMU, We always add device link between smi-common and IOMMU HW.
In mt8195, we add smi-sub-common. Thus, if the node is sub-common, we still
need find again to get smi-common, then do device link.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-19-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Prepare for supporting INFRA_IOMMU, and APU_IOMMU later.
For Infra IOMMU/APU IOMMU, it doesn't have the "larb""port". thus, Use
the MM flag contain the MM_IOMMU special flow, Also, it moves a big
chunk code about parsing the mediatek,larbs into a function, this is
only needed for MM IOMMU. and all the current SoC are MM_IOMMU.
The device link between iommu consumer device and smi-larb device only
is needed in MM iommu case.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-18-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add IOMMU_TYPE definition. In the mt8195, we have another IOMMU_TYPE:
infra iommu, also there will be another APU_IOMMU, thus, use 2bits for the
IOMMU_TYPE.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-17-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In prevous SoC, the sub common id occupy 2 bits. the mt8195's sub common
id has 3bits. Add a new flag for this. and rename the previous flag to
_2BITS. For readable, I put these two flags together, then move the
other flags. no functional change.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-16-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently the output PA[32:33] is contained by the flag IOVA_34.
This is not right. the iova_34 has no relation with pa[32:33], the 32bits
iova still could map to pa[32:33]. Move it out from the flag.
No need fix tag since currently only mt8192 use the calulation and it
always has this IOVA_34 flag.
Prepare for the IOMMU that still use IOVA 32bits but its dram size may be
over 4GB.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-15-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The MediaTek IOMMU doesn't care about granule when tlb flushing.
Remove this variable.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-14-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a new flag STD_AXI_MODE which is prepared for infra and apu iommu
which use the standard axi mode. All the current SoC don't use this flag.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-13-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In the infra iommu, we should disable DCM. add a new flag for this.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-12-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In mt8192, we preassign 0-4G; 4G-8G; 8G-12G for different multimedia
engines. This depends on the "dma-ranges=" in the iommu consumer's dtsi
node.
Adds 12G-16G region here. and reword the previous comment. we don't limit
which master locate in which region.
CCU still is 8G-12G. Don't change it here.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-11-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In previous mt2712, Both IOMMUs are MM IOMMU, and they will share pgtable.
However in the latest SoC, another is infra IOMMU, there is no reason to
share pgtable between MM with INFRA IOMMU. This patch manage to
implement the two case(sharing and non-sharing pgtable).
Currently we use for_each_m4u to loop the 2 HWs. Add the list_head into
this macro.
In the sharing pgtable case, the list_head is the global "m4ulist".
In the non-sharing pgtable case, the list_head is hw_list_head which is a
variable in the "data". then for_each_m4u will only loop itself.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-10-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Same with the previous patch, add a mutex for the "data" in the
mtk_iommu_domain. Just improve the safety for multi devices
enter attach_device at the same time. We don't get the real issue
for this.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-9-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a mutex to protect the data in the structure mtk_iommu_data,
like ->"m4u_group" ->"m4u_dom". For the internal data, we should
protect it in ourselves driver. Add a mutex for this.
This could be a fix for the multi-groups support.
Fixes: c3045f3924 ("iommu/mediatek: Support for multi domains")
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-8-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lack the list_del in the mtk_iommu_remove, and remove
bus_set_iommu(*, NULL) since there may be several iommu HWs.
we can not bus_set_iommu null when one iommu driver unbind.
This could be a fix for mt2712 which support 2 M4U HW and list them.
Fixes: 7c3a2ec028 ("iommu/mediatek: Merge 2 M4U HWs into one iommu domain")
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-6-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In the commit 4f956c97d2 ("iommu/mediatek: Move domain_finalise into
attach_device"), I overlooked the sharing pgtable case.
After that commit, the "data" in the mtk_iommu_domain_finalise always is
the data of the current IOMMU HW. Fix this for the sharing pgtable case.
Only affect mt2712 which is the only SoC that share pgtable currently.
Fixes: 4f956c97d2 ("iommu/mediatek: Move domain_finalise into attach_device")
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20220503071427.2285-5-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This is required to make loading this as a module work.
Signed-off-by: Hector Martin <marcan@marcan.st>
Fixes: 46d1fb072e ("iommu/dart: Add DART iommu driver")
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20220502092238.30486-1-marcan@marcan.st
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Smatch static checker warns:
drivers/iommu/amd/iommu_v2.c:133 free_device_state()
warn: sleeping in atomic context
Fixes by storing the list of struct device_state in a temporary
list, and then free the memory after releasing the spinlock.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 9f968fc70d ("iommu/amd: Improve amd_iommu_v2_exit()")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20220314024321.37411-1-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
While the comment was correct that this flag was intended to convey the
block no-snoop support in the IOMMU, it has become widely implemented and
used to mean the IOMMU supports IOMMU_CACHE as a map flag. Only the Intel
driver was different.
Now that the Intel driver is using enforce_cache_coherency() update the
comment to make it clear that IOMMU_CAP_CACHE_COHERENCY is only about
IOMMU_CACHE. Fix the Intel driver to return true since IOMMU_CACHE always
works.
The two places that test this flag, usnic and vdpa, are both assigning
userspace pages to a driver controlled iommu_domain and require
IOMMU_CACHE behavior as they offer no way for userspace to synchronize
caches.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be cache
coherent" and is used by the DMA API. The definition allows for special
non-coherent DMA to exist - ie processing of the no-snoop flag in PCIe
TLPs - so long as this behavior is opt-in by the device driver.
The flag is mainly used by the DMA API to synchronize the IOMMU setting
with the expected cache behavior of the DMA master. eg based on
dev_is_dma_coherent() in some case.
For Intel IOMMU IOMMU_CACHE was redefined to mean 'force all DMA to be
cache coherent' which has the practical effect of causing the IOMMU to
ignore the no-snoop bit in a PCIe TLP.
x86 platforms are always IOMMU_CACHE, so Intel should ignore this flag.
Instead use the new domain op enforce_cache_coherency() which causes every
IOPTE created in the domain to have the no-snoop blocking behavior.
Reconfigure VFIO to always use IOMMU_CACHE and call
enforce_cache_coherency() to operate the special Intel behavior.
Remove the IOMMU_CACHE test from Intel IOMMU.
Ultimately VFIO plumbs the result of enforce_cache_coherency() back into
the x86 platform code through kvm_arch_register_noncoherent_dma() which
controls if the WBINVD instruction is available in the guest. No other
archs implement kvm_arch_register_noncoherent_dma() nor are there any
other known consumers of VFIO_DMA_CC_IOMMU that might be affected by the
user visible result change on non-x86 archs.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and
IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.
Currently only Intel and AMD IOMMUs are known to support this
feature. They both implement it as an IOPTE bit, that when set, will cause
PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
the no-snoop bit was clear.
The new API is triggered by calling enforce_cache_coherency() before
mapping any IOVA to the domain which globally switches on no-snoop
blocking. This allows other implementations that might block no-snoop
globally and outside the IOPTE - AMD also documents such a HW capability.
Leave AMD out of sync with Intel and have it block no-snoop even for
in-kernel users. This can be trivially resolved in a follow up patch.
Only VFIO needs to call this API because it does not have detailed control
over the device to avoid requesting no-snoop behavior at the device
level. Other places using domains with real kernel drivers should simply
avoid asking their devices to set the no-snoop bit.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu group changes notifer is not referenced in the tree. Remove it
to avoid dead code.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220418005000.897664-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Multiple devices may be placed in the same IOMMU group because they
cannot be isolated from each other. These devices must either be
entirely under kernel control or userspace control, never a mixture.
This adds dma ownership management in iommu core and exposes several
interfaces for the device drivers and the device userspace assignment
framework (i.e. VFIO), so that any conflict between user and kernel
controlled dma could be detected at the beginning.
The device driver oriented interfaces are,
int iommu_device_use_default_domain(struct device *dev);
void iommu_device_unuse_default_domain(struct device *dev);
By calling iommu_device_use_default_domain(), the device driver tells
the iommu layer that the device dma is handled through the kernel DMA
APIs. The iommu layer will manage the IOVA and use the default domain
for DMA address translation.
The device user-space assignment framework oriented interfaces are,
int iommu_group_claim_dma_owner(struct iommu_group *group,
void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
The device userspace assignment must be disallowed if the DMA owner
claiming interface returns failure.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
kzalloc() is a memory allocation function which can return NULL when
some internal memory errors happen. So it is better to check it to
prevent potential wrong memory access.
Besides, to propagate the error to the caller, the type of
insert_iommu_master() is changed to `int`. Several instructions related
to it are also updated.
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Link: https://lore.kernel.org/r/tencent_EDB94B1C7E14B4E1974A66FF4D2029CC6D08@qq.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It will cause null-ptr-deref in resource_size(), if platform_get_resource()
returns NULL, move calling resource_size() after devm_ioremap_resource() that
will check 'res' to avoid null-ptr-deref.
And use devm_platform_get_and_ioremap_resource() to simplify code.
Fixes: 46d1fb072e ("iommu/dart: Add DART iommu driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20220425090826.2532165-1-yangyingliang@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- Fix off-by-one in SMMUv3 SVA TLB invalidation
- Disable large mappings to workaround nvidia erratum
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmJiivoQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNKvGB/42IgImPOaHgybA+pfWcYkesdprUuo400wz
xE2tnaZxv7z2dzGlB028DspkQit/kbToA8aPQkWwVeHU6h9SiTAPi0OUctcmFDDd
WWE2357a7epFGp913+zsl61sNHrogq6joNA/7XPu8kBHCb7VOIRu+/2JSkwqaPqv
xHpfWn1mcfTdqUtFoBgikp3h/KpznHi0DJLYKs9qNGstLYftIbOCubvMpTiweZb1
dHT7Oq2SR64x+s3gPUYjRMNYF4XgKZ0iVflv1i1EZJXgAOpQNF1r4oBMbXH+H+l8
XfNTBDogWVaOyxzvq9ruRoeufmQSpALXwAQcltFcZIHgulMvnKIR
=ml9k
-----END PGP SIGNATURE-----
Merge tag 'arm-smmu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into iommu/fixes
Arm SMMU fixes for 5.18
- Fix off-by-one in SMMUv3 SVA TLB invalidation
- Disable large mappings to workaround nvidia erratum
The page fault handling framework in the IOMMU core explicitly states
that it doesn't handle PCI PASID Stop Marker and the IOMMU drivers must
discard them before reporting faults. This handles Stop Marker messages
in prq_event_thread() before reporting events to the core.
The VT-d driver explicitly drains the pending page requests when a CPU
page table (represented by a mm struct) is unbound from a PASID according
to the procedures defined in the VT-d spec. The Stop Marker messages do
not need a response. Hence, it is safe to drop the Stop Marker messages
silently if any of them is found in the page request queue.
Fixes: d5b9e4bfe0 ("iommu/vt-d: Report prq to io-pgfault framework")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220421113558.3504874-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220423082330.3897867-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Calculate the appropriate mask for non-size-aligned page selective
invalidation. Since psi uses the mask value to mask out the lower order
bits of the target address, properly flushing the iotlb requires using a
mask value such that [pfn, pfn+pages) all lie within the flushed
size-aligned region. This is not normally an issue because iova.c
always allocates iovas that are aligned to their size. However, iovas
which come from other sources (e.g. userspace via VFIO) may not be
aligned.
To properly flush the IOTLB, both the start and end pfns need to be
equal after applying the mask. That means that the most efficient mask
to use is the index of the lowest bit that is equal where all higher
bits are also equal. For example, if pfn=0x17f and pages=3, then
end_pfn=0x181, so the smallest mask we can use is 8. Any differences
above the highest bit of pages are due to carrying, so by xnor'ing pfn
and end_pfn and then masking out the lower order bits based on pages, we
get 0xffffff00, where the first set bit is the mask we want to use.
Fixes: 6fe1010d6d ("vfio/type1: DMA unmap chunking")
Cc: stable@vger.kernel.org
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220401022430.1262215-1-stevensd@google.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220410013533.3959168-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
VT-d's dmar_platform_optin() actually represents a combination of
properties fairly well standardised by Microsoft as "Pre-boot DMA
Protection" and "Kernel DMA Protection"[1]. As such, we can provide
interested consumers with an abstracted capability rather than
driver-specific interfaces that won't scale. We name it for the former
aspect since that's what external callers are most likely to be
interested in; the latter is for the IOMMU layer to handle itself.
[1] https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/d6218dff2702472da80db6aec2c9589010684551.1650878781.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
iommu_capable() only really works for systems where all IOMMU instances
are completely homogeneous, and all devices are IOMMU-mapped. Implement
the new variant which will be able to give a more accurate answer for
whichever device the caller is actually interested in, and even more so
once all the external users have been converted and we can reliably pass
the device pointer through the internal driver interface too.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/8407eb9586677995b7a9fd70d0fd82d85929a9bb.1650878781.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If the IOMMU is in use and an untrusted device is connected to an external
facing port but the address requested isn't page aligned will cause the
kernel to attempt to use bounce buffers.
If for some reason the bounce buffers have not been allocated this is a
problem that should be made apparent to the user.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220404204723.9767-3-mario.limonciello@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Previously the AMD IOMMU would only enable SWIOTLB in certain
circumstances:
* IOMMU in passthrough mode
* SME enabled
This logic however doesn't work when an untrusted device is plugged in
that doesn't do page aligned DMA transactions. The expectation is
that a bounce buffer is used for those transactions.
This fails like this:
swiotlb buffer is full (sz: 4096 bytes), total 0 (slots), used 0 (slots)
That happens because the bounce buffers have been allocated, followed by
freed during startup but the bounce buffering code expects that all IOMMUs
have left it enabled.
Remove the criteria to set up bounce buffers on AMD systems to ensure
they're always available for supporting untrusted devices.
Fixes: 82612d66d5 ("iommu: Allow the dma-iommu api to use bounce buffers")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220404204723.9767-2-mario.limonciello@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
entries to not be invalidated correctly. The problem is that the walk
cache index generated for IOVA is not same across translation and
invalidation requests. This is leading to page faults when PMD entry is
released during unmap and populated with new PTE table during subsequent
map request. Disabling large page mappings avoids the release of PMD
entry and avoid translations seeing stale PMD entry in walk cache.
Fix this by limiting the page mappings to PAGE_SIZE for Tegra194 and
Tegra234 devices. This is recommended fix from Tegra hardware design
team.
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
Co-developed-by: Pritesh Raithatha <praithatha@nvidia.com>
Signed-off-by: Pritesh Raithatha <praithatha@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Link: https://lore.kernel.org/r/20220421081504.24678-1-amhetre@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
The arm_smmu_mm_invalidate_range function is designed to be called
by mm core for Shared Virtual Addressing purpose between IOMMU and
CPU MMU. However, the ways of two subsystems defining their "end"
addresses are slightly different. IOMMU defines its "end" address
using the last address of an address range, while mm core defines
that using the following address of an address range:
include/linux/mm_types.h:
unsigned long vm_end;
/* The first byte after our end address ...
This mismatch resulted in an incorrect calculation for size so it
failed to be page-size aligned. Further, it caused a dead loop at
"while (iova < end)" check in __arm_smmu_tlb_inv_range function.
This patch fixes the issue by doing the calculation correctly.
Fixes: 2f7e8c553e ("iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20220419210158.21320-1-nicolinc@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
The IOMMU table tries to separate the different IOMMUs into different
backends, but actually requires various cross calls.
Rewrite the code to do the generic swiotlb/swiotlb-xen setup directly
in pci-dma.c and then just call into the IOMMU drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Sync up with v5.18-rc1, in particular to get 5e3094cfd9
("drm/i915/xehpsdv: Add has_flat_ccs to device info").
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Commit 3f6634d997 ("iommu: Use right way to retrieve iommu_ops") started
triggering a NULL pointer dereference for some omap variants:
__iommu_probe_device from probe_iommu_group+0x2c/0x38
probe_iommu_group from bus_for_each_dev+0x74/0xbc
bus_for_each_dev from bus_iommu_probe+0x34/0x2e8
bus_iommu_probe from bus_set_iommu+0x80/0xc8
bus_set_iommu from omap_iommu_init+0x88/0xcc
omap_iommu_init from do_one_initcall+0x44/0x24
This is caused by omap iommu probe returning 0 instead of ERR_PTR(-ENODEV)
as noted by Jason Gunthorpe <jgg@ziepe.ca>.
Looks like the regression already happened with an earlier commit
6785eb9105 ("iommu/omap: Convert to probe/release_device() call-backs")
that changed the function return type and missed converting one place.
Cc: Drew Fustini <dfustini@baylibre.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Suman Anna <s-anna@ti.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Fixes: 6785eb9105 ("iommu/omap: Convert to probe/release_device() call-backs")
Fixes: 3f6634d997 ("iommu: Use right way to retrieve iommu_ops")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-by: Drew Fustini <dfustini@baylibre.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220331062301.24269-1-tony@atomide.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- do not zero buffer in set_memory_decrypted (Kirill A. Shutemov)
- fix return value of dma-debug __setup handlers (Randy Dunlap)
- swiotlb cleanups (Robin Murphy)
- remove most remaining users of the pci-dma-compat.h API
(Christophe JAILLET)
- share the ABI header for the DMA map_benchmark with userspace
(Tian Tao)
- update the maintainer for DMA MAPPING BENCHMARK (Xiang Chen)
- remove CONFIG_DMA_REMAP (me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmJDDgsLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYM9oBAAxm93DZCXsqektM2qJ34o1KCyfAhvTvZ1r38ab+cl
wJwmMPF6/S9MCj6XZEnCzUnXL//TnhcuYVztNpPTWqhx6QaqWmmx9yJKjoYAnHce
svVMef7iipn35w7hAPpiVR/AVwWyxQCkSC+5sgp6XX8mp7l7I3ajfO0fZ52JCcxw
12d4k1E0yjC096Kw8wXQv+rzmCAoQcK9Jj20COUO3rkgOr68ZIXse2HXUJjn76Fy
wym2rJfqJ9mdKrDHqphe1ntIzkcQNWx9xR0UVh7/e4p7Si5H8Lp8QWwC7Zw6Y2Gb
paeotIMu1uTKkcZI4K54J8PXRLA7PLrDSDFdxnKOsWNZU/inIwt9b11kr9FOaYqR
BLJ+w6bF1/PmM6q2gkOwNuoiJD5YQfwF7y+wi84VyaauM0J8ssIHYnVrCWXn0m1E
4veAkWasAYb1oaoNlDhmZEbpI+kcN3xwDyK1WbtHuGvR00oSvxl0d1viGTVXYfDA
k5rBjb7CovK8JIrFIJoMiDM4TvdauxL66IlEL7ohLDh6l1f09Q0+gsdVcAM0ObX6
zOkoulyHCFqkePvoH/xpyIrZZ9cHA228fZYC7QiBcxdWlD3dFMWkKvhajiSDQJSW
SAz94CeEDWn64Q462N+ecivKlLwz7j/TqOig5xU+/6UoMC/2a7+HIim+p6bjh8Pc
5Gg=
=C+Es
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- do not zero buffer in set_memory_decrypted (Kirill A. Shutemov)
- fix return value of dma-debug __setup handlers (Randy Dunlap)
- swiotlb cleanups (Robin Murphy)
- remove most remaining users of the pci-dma-compat.h API
(Christophe JAILLET)
- share the ABI header for the DMA map_benchmark with userspace
(Tian Tao)
- update the maintainer for DMA MAPPING BENCHMARK (Xiang Chen)
- remove CONFIG_DMA_REMAP (me)
* tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: extract a common header file for map_benchmark definition
dma-debug: fix return value of __setup handlers
dma-mapping: remove CONFIG_DMA_REMAP
media: v4l2-pci-skeleton: Remove usage of the deprecated "pci-dma-compat.h" API
rapidio/tsi721: Remove usage of the deprecated "pci-dma-compat.h" API
sparc: Remove usage of the deprecated "pci-dma-compat.h" API
agp/intel: Remove usage of the deprecated "pci-dma-compat.h" API
alpha: Remove usage of the deprecated "pci-dma-compat.h" API
MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
swiotlb: simplify array allocation
swiotlb: tidy up includes
swiotlb: simplify debugfs setup
swiotlb: do not zero buffer in set_memory_decrypted()
Here is the set of driver core changes for 5.18-rc1.
Not much here, primarily it was a bunch of cleanups and small updates:
- kobj_type cleanups for default_groups
- documentation updates
- firmware loader minor changes
- component common helper added and take advantage of it in many
drivers (the largest part of this pull request).
There will be a merge conflict in drivers/power/supply/ab8500_chargalg.c
with your tree, the merge conflict should be easy (take all the
changes).
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkG6PA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylMFwCfSIyAU4oLEgj+/Rfmx4o45cAVIWMAnit3zbdU
wUUCGqKcOnTJEcW6dMPh
=1VVi
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of driver core changes for 5.18-rc1.
Not much here, primarily it was a bunch of cleanups and small updates:
- kobj_type cleanups for default_groups
- documentation updates
- firmware loader minor changes
- component common helper added and take advantage of it in many
drivers (the largest part of this pull request).
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits)
Documentation: update stable review cycle documentation
drivers/base/dd.c : Remove the initial value of the global variable
Documentation: update stable tree link
Documentation: add link to stable release candidate tree
devres: fix typos in comments
Documentation: add note block surrounding security patch note
samples/kobject: Use sysfs_emit instead of sprintf
base: soc: Make soc_device_match() simpler and easier to read
driver core: dd: fix return value of __setup handler
driver core: Refactor sysfs and drv/bus remove hooks
driver core: Refactor multiple copies of device cleanup
scripts: get_abi.pl: Fix typo in help message
kernfs: fix typos in comments
kernfs: remove unneeded #if 0 guard
ALSA: hda/realtek: Make use of the helper component_compare_dev_name
video: omapfb: dss: Make use of the helper component_compare_dev
power: supply: ab8500: Make use of the helper component_compare_dev
ASoC: codecs: wcd938x: Make use of the helper component_compare/release_of
iommu/mediatek: Make use of the helper component_compare/release_of
drm: of: Make use of the helper component_release_of
...
Including:
- IOMMU Core changes:
- Removal of aux domain related code as it is basically dead
and will be replaced by iommu-fd framework
- Split of iommu_ops to carry domain-specific call-backs
separatly
- Cleanup to remove useless ops->capable implementations
- Improve 32-bit free space estimate in iova allocator
- Intel VT-d updates:
- Various cleanups of the driver
- Support for ATS of SoC-integrated devices listed in
ACPI/SATC table
- ARM SMMU updates:
- Fix SMMUv3 soft lockup during continuous stream of events
- Fix error path for Qualcomm SMMU probe()
- Rework SMMU IRQ setup to prepare the ground for PMU support
- Minor cleanups and refactoring
- AMD IOMMU driver:
- Some minor cleanups and error-handling fixes
- Rockchip IOMMU driver:
- Use standard driver registration
- MSM IOMMU driver:
- Minor cleanup and change to standard driver registration
- Mediatek IOMMU driver:
- Fixes for IOTLB flushing logic
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmI8OUkACgkQK/BELZcB
GuNz9xAAvlgEg3byMx1y6LY9IctVGyLsegweVGM4+m6XR7qvT5Llc1E2Yw4Gooe4
EAceOihDKW2T9VnMlz9g/cG7Modrx60chcB22KKfxDXPl6yF3R89EMd7DE43T6n/
KPrP9+EsBnI8QSXyYu9ZowioX4CYwWhWD0dKHKAwDvw0BWHHUJ4hTaoHqEoIqLdP
vubeHziIok/g1sylSpJjTzV7r/Na8Q3TGcb/Mi5qC8uiyiyx40vtaduMGNW+/ToN
EqOKszxPmHfHv/xf0IHo0eUZ2L/JAe0mAlZzOb09f5F2sXJrbC05jlmRaDmSjT+u
iEc1r2By/0xo6iOuQC3wD6kTvwwO/ecpNYGhXYXdTbtLquYfL5PSXjRHEU9gf2BO
i/llPDsnytPvm/hnmbi26ChNR6yrDPz5bkoCUl5mnB1jZcaZtIURN7cRlEPPZUWo
62VDNdqWDB6AvALc1/SwYdJX/i5eaBf+niS7/BJ/IkLp2oJxFzrGsU8SRJFHNYsa
zdFIUUoTw647Ul6derSpGzHow169/RwVKYPiXMsaA8/viPNjpBOtfg56abn1WLW6
N4CtwNu6tt+sPfftFdFx2cDEMW2zpWg5doMddBfEx9FAk0HJ4WLZiTpaO2PxcLyd
kCAsGHj+ViAZHINVKFV4nQN/V9yQtcIc4UPmSGJBtKCK3KUYujw=
=bcqr
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- IOMMU Core changes:
- Removal of aux domain related code as it is basically dead and
will be replaced by iommu-fd framework
- Split of iommu_ops to carry domain-specific call-backs separatly
- Cleanup to remove useless ops->capable implementations
- Improve 32-bit free space estimate in iova allocator
- Intel VT-d updates:
- Various cleanups of the driver
- Support for ATS of SoC-integrated devices listed in ACPI/SATC
table
- ARM SMMU updates:
- Fix SMMUv3 soft lockup during continuous stream of events
- Fix error path for Qualcomm SMMU probe()
- Rework SMMU IRQ setup to prepare the ground for PMU support
- Minor cleanups and refactoring
- AMD IOMMU driver:
- Some minor cleanups and error-handling fixes
- Rockchip IOMMU driver:
- Use standard driver registration
- MSM IOMMU driver:
- Minor cleanup and change to standard driver registration
- Mediatek IOMMU driver:
- Fixes for IOTLB flushing logic
* tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (47 commits)
iommu/amd: Improve amd_iommu_v2_exit()
iommu/amd: Remove unused struct fault.devid
iommu/amd: Clean up function declarations
iommu/amd: Call memunmap in error path
iommu/arm-smmu: Account for PMU interrupts
iommu/vt-d: Enable ATS for the devices in SATC table
iommu/vt-d: Remove unused function intel_svm_capable()
iommu/vt-d: Add missing "__init" for rmrr_sanity_check()
iommu/vt-d: Move intel_iommu_ops to header file
iommu/vt-d: Fix indentation of goto labels
iommu/vt-d: Remove unnecessary prototypes
iommu/vt-d: Remove unnecessary includes
iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO
iommu/vt-d: Remove domain and devinfo mempool
iommu/vt-d: Remove iova_cache_get/put()
iommu/vt-d: Remove finding domain in dmar_insert_one_dev_info()
iommu/vt-d: Remove intel_iommu::domains
iommu/mediatek: Always tlb_flush_all when each PM resume
iommu/mediatek: Add tlb_lock in tlb_flush_all
iommu/mediatek: Remove the power status checking in tlb flush all
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmI5jiEACgkQCF8+vY7k
4RWfnw/9FSBVrFgzoDwM4choQu997T6GSsEuqJFbLdDLPbKZifl9UsCPmenFp0aS
4D2EG4A1nF/HQTHJ6vPSWjgVP9zhCAX/DvHH+9DiSAWQoSIVmUZGoEhbAHlbE12K
PUs0MEIR8o8k3IBvMD6buH1FpnIgZO1ULi1Cx/5YH1GaRshdZrLcgz0YioXomLKE
KvNokrhLYzJFIWl34KZ+92RluPOy7DlEJpRNbCTYkaLYfSYqLs/FTisuEUt3gEso
tjgUaBxJ/k3AOgU4XXoeVlqTFuK1TY70aA0aqmVYPqZ7eCO2Btbm11h8WoYO/SgY
N3P57LP86WWUHNA13argVv/pQo0x8iX5RnYObLDMGGrUQyQT7BcjMGCrKIVyMRAz
06dZbnGnbsOOph9D7wwQ+xJQwUqyrllVVhRdMIWXJQjKqAP9mmgIB/dcwrrP5Ziw
y0fmuaXZ/ZmvD63yq2iWwV6niWvNa5XMnR3NxceOV60WOe9LS6aio/duwfaZ5ic1
qzTAtc/+3FuIgRD35eILrjymu53gW6pt6vS0pHP/+xvHq5Yp7u8Pc5+jFxLYRM8e
AOglA7ZxGGz1uL/LUJ4DD8BQ55wr0EH63Lm7Pfy4JmmzqI/TQwEQifT/H8mDNP+G
DCmod3ZyCsHH6vsN0afa4ZxqyCDToVHVwvko4mzOnl4hED5JteI=
=Bc0l
-----END PGP SIGNATURE-----
Merge tag 'media/v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- a major reorg at platform Kconfig/Makefile files, organizing them per
vendor. The other media Kconfig/Makefile files also sorted
- New sensor drivers: hi847, isl7998x, ov08d10
- New Amphion vpu decoder stateful driver
- New Atmel microchip csi2dc driver
- tegra-vde driver promoted from staging
- atomisp: some fixes for it to work on BYT
- imx7-mipi-csis driver promoted from staging and renamed
- camss driver got initial support for VFE hardware version Titan 480
- mtk-vcodec has gained support for MT8192
- lots of driver changes, fixes and improvements
* tag 'media/v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (417 commits)
media: nxp: Restrict VIDEO_IMX_MIPI_CSIS to ARCH_MXC or COMPILE_TEST
media: amphion: cleanup media device if register it fail
media: amphion: fix some issues to improve robust
media: amphion: fix some error related with undefined reference to __divdi3
media: amphion: fix an issue that using pm_runtime_get_sync incorrectly
media: vidtv: use vfree() for memory allocated with vzalloc()
media: m5mols/m5mols.h: document new reset field
media: pixfmt-yuv-planar.rst: fix PIX_FMT labels
media: platform: Remove unnecessary print function dev_err()
media: amphion: Add missing of_node_put() in vpu_core_parse_dt()
media: mtk-vcodec: Add missing of_node_put() in mtk_vdec_hw_prob_done()
media: platform: amphion: Fix build error without MAILBOX
media: spi: Kconfig: Place SPI drivers on a single menu
media: i2c: Kconfig: move camera drivers to the top
media: atomisp: fix bad usage at error handling logic
media: platform: rename mediatek/mtk-jpeg/ to mediatek/jpeg/
media: media/*/Kconfig: sort entries
media: Kconfig: cleanup VIDEO_DEV dependencies
media: platform/*/Kconfig: make manufacturer menus more uniform
media: platform: Create vendor/{Makefile,Kconfig} files
...
- Simplify the PASID handling to allocate the PASID once, associate it to
the mm of a process and free it on mm_exit(). The previous attempt of
refcounted PASIDs and dynamic alloc()/free() turned out to be error
prone and too complex. The PASID space is 20bits, so the case of
resource exhaustion is a pure academic concern.
- Populate the PASID MSR on demand via #GP to avoid racy updates via IPIs.
- Reenable ENQCMD and let objtool check for the forbidden usage of ENQCMD
in the kernel.
- Update the documentation for Shared Virtual Addressing accordingly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmI4WpETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUfnD/0bY94rgEX4Uuy/mFQ1W8X8XlcyKrha
0/cRATb+4QV/pwJgGr2nClKhGlFMYPdJLvKMC1TCUPCVrLD1RNmluIZoFzeqXwhm
jDdCcFOuGZ2D4ujDPWwOOpKBT1ytovnQa7+lH6QJyKkEqdcC2ncOvGJQoiRxRQIG
8wTVs/OUvQJ5ZhSZQMKQN4uMWMyHEjhbroYS30/uNi/598jTPgzlEoa14XocQ9Os
nS6ALvjuc9MsJ34F61etMaJU1ZMI3Wx75u9QjEvX6hmJs87YdvgwE7lzJUKFDEuh
gewM0wp2fTa8/azzP0eMiHTin56PqFdmllzRqXmilbZMEPOeI29dZVArCdpKcAn0
r9p1kJUT3Xl2G3Oir/OdCaaQHcznD1Y5ZFOyh12wgEucZ/rdeSr7nq7n5HoOL5Bw
Q2o6YvTkE9DOL0nTN1lSXGiPspou7fzX0uUcRBrbJUS3sBv4zGIlaJXUaTVnSdAt
VZj4LeOK7v2BjyeiOY0iaaIQd3xjmLUF0UjozXS5M13SoVcToZRbyWqhDzPvNuKA
imQb/dnFpXhABgmuqAiJLeqM0VtGMFNc780OURkcsBSPng+iSEdV4DzuhK0jpU8x
Uk1RuGMd/vgmrlDFBrw+orQQiiKR1ixpI0LiHfcOBycfJhqTwcnrNZvAN5/do28Z
E23+QzlUbZF0cw==
=Dy8V
-----END PGP SIGNATURE-----
Merge tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PASID support from Thomas Gleixner:
"Reenable ENQCMD/PASID support:
- Simplify the PASID handling to allocate the PASID once, associate
it to the mm of a process and free it on mm_exit().
The previous attempt of refcounted PASIDs and dynamic
alloc()/free() turned out to be error prone and too complex. The
PASID space is 20bits, so the case of resource exhaustion is a pure
academic concern.
- Populate the PASID MSR on demand via #GP to avoid racy updates via
IPIs.
- Reenable ENQCMD and let objtool check for the forbidden usage of
ENQCMD in the kernel.
- Update the documentation for Shared Virtual Addressing accordingly"
* tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/x86: Update documentation for SVA (Shared Virtual Addressing)
tools/objtool: Check for use of the ENQCMD instruction in the kernel
x86/cpufeatures: Re-enable ENQCMD
x86/traps: Demand-populate PASID MSR via #GP
sched: Define and initialize a flag to identify valid PASID in the task
x86/fpu: Clear PASID when copying fpstate
iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
kernel/fork: Initialize mm's PASID
iommu/ioasid: Introduce a helper to check for valid PASIDs
mm: Change CONFIG option for mm->pasid field
iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
- Fix SMMUv3 soft lockup during continuous stream of events
- Fix error path for Qualcomm SMMU probe()
- Rework SMMU IRQ setup to prepare the ground for PMU support
- Minor cleanups and refactoring
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmImd1sQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNO0SB/9JFjMNKFxKEfGktraXce2J0f9n8tlDPywZ
EmLm3Zdksb/B+n3uQog8DGPKLOfR3GzkSSjlOC0l+twSWGK2tvhc93xXUN73lF5z
qGHdhaDtYjKT/YTCB9Zo0AtQrmTQY1tgjpoT1QFS18xCwa4KzkAbnx5+98Xng+43
RvHu90xdPZf5Iq0zRen4gkdttuoFS8oFn4JidS6hG6JZ0l3AExxq81H4/X7Beg1/
/WV3odq+/yDYQeB0l5VeBNgFN3uIBJLcm3l3gWNCRcpoQcKpw1ooRc0nOYrIY1H1
60i6czltXgzu9hnGjnZ91a/t6vUBeZYGt+QAVrKiP40XwMvMdqv5
=i/y5
-----END PGP SIGNATURE-----
Merge tag 'arm-smmu-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
Arm SMMU updates for 5.18
- Fix SMMUv3 soft lockup during continuous stream of events
- Fix error path for Qualcomm SMMU probe()
- Rework SMMU IRQ setup to prepare the ground for PMU support
- Minor cleanups and refactoring
During module exit, the current logic loops through all possible
16-bit device ID space to search for existing devices and clean up
device state structures. This can be simplified by looping through
the device state list.
Also, refactor various clean up logic into free_device_state()
for better reusability.
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20220301085626.87680-6-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This variable has not been used since it was introduced.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20220301085626.87680-5-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In preparation for SMMUv2 PMU support, rejig our IRQ setup code to
account for PMU interrupts as additional resources. We can simplify the
whole flow by only storing the context IRQs, since the global IRQs are
devres-managed and we never refer to them beyond the initial request.
CC: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b2a40caaf1622eb35c555074a0d72f4f0513cff9.1645106346.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Starting from Intel VT-d v3.2, Intel platform BIOS can provide additional
SATC table structure. SATC table includes a list of SoC integrated devices
that support ATC (Address translation cache).
Enabling ATC (via ATS capability) can be a functional requirement for SATC
device operation or optional to enhance device performance/functionality.
This is determined by the bit of ATC_REQUIRED in SATC table. When IOMMU is
working in scalable mode, software chooses to always enable ATS for every
device in SATC table because Intel SoC devices in SATC table are trusted to
use ATS.
On the other hand, if IOMMU is in legacy mode, ATS of SATC capable devices
can work transparently to software and be automatically enabled by IOMMU
hardware. As the result, there is no need for software to enable ATS on
these devices.
This also removes dmar_find_matched_atsr_unit() helper as it becomes dead
code now.
Signed-off-by: Yian Chen <yian.chen@intel.com>
Link: https://lore.kernel.org/r/20220222185416.1722611-1-yian.chen@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220301020159.633356-13-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Allocate and set the per-device iommu private data during iommu device
probe. This makes the per-device iommu private data always available
during iommu_probe_device() and iommu_release_device(). With this changed,
the dummy DEFER_DEVICE_DOMAIN_INFO pointer could be removed. The wrappers
for getting the private data and domain are also cleaned.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220301020159.633356-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Prepare for 2 HWs that sharing pgtable in different power-domains.
When there are 2 M4U HWs, it may has problem in the flush_range in which
we get the pm_status via the m4u dev, BUT that function don't reflect the
real power-domain status of the HW since there may be other HW also use
that power-domain.
DAM allocation is often done while the allocating device is runtime
suspended. In such a case the iommu will also be suspended and partial
flushing of the tlb will not be executed.
Therefore, we add a tlb_flush_all in the pm_runtime_resume to make
sure the tlb is always clean.
In other case, the iommu's power should be active via device
link with smi.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
[move the call to mtk_iommu_tlb_flush_all to the bottom of resume cb, improve doc/log]
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20211208120744.2415-6-dafna.hirschfeld@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The tlb_flush_all touches the registers controlling tlb operations.
Protect it with the tlb_lock spinlock.
This also require the range_sync func to release that spinlock before
calling tlb_flush_all.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
[refactor commit log]
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20211208120744.2415-5-dafna.hirschfeld@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
To simplify the code, Remove the power status checking in the
tlb_flush_all, remove this:
if (pm_runtime_get_if_in_use(data->dev) <= 0)
continue;
The mtk_iommu_tlb_flush_all is called from
a) isr
b) tlb flush range fail case
c) iommu_create_device_direct_mappings
In first two cases, the power and clock are always enabled.
In the third case tlb flush is unnecessary because in a later patch
in the series a full flush from the pm_runtime_resume callback is added.
In addition, writing the tlb control register when the iommu is not resumed
is ok and the write is ignored.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
[refactor commit log]
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20211208120744.2415-4-dafna.hirschfeld@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In case of v4l2_reqbufs() it is possible, that a TLB flush is done
without runtime PM being enabled. In that case the "Partial TLB flush
timed out, falling back to full flush" warning is printed.
Commit c0b57581b7 ("iommu/mediatek: Add power-domain operation")
introduced has_pm as optimization to avoid checking runtime PM
when there is no power domain attached. But without the PM domain
there is still the device driver's runtime PM suspend handler, which
disables the clock. Thus flushing should also be avoided when there
is no PM domain involved.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Link: https://lore.kernel.org/r/20211208120744.2415-3-dafna.hirschfeld@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The tlb_sync_all is called from these three functions:
a) flush_iotlb_all: it will be called for each a iommu HW.
b) tlb_flush_range_sync: it already has for_each_m4u.
c) in irq: When IOMMU HW translation fault, Only need flush itself.
Thus, No need for_each_m4u in this tlb_sync_all. Remove it.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20211208120744.2415-2-dafna.hirschfeld@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For various reasons based on the allocator behaviour and typical
use-cases at the time, when the max32_alloc_size optimisation was
introduced it seemed reasonable to couple the reset of the tracked
size to the update of cached32_node upon freeing a relevant IOVA.
However, since subsequent optimisations focused on helping genuine
32-bit devices make best use of even more limited address spaces, it
is now a lot more likely for cached32_node to be anywhere in a "full"
32-bit address space, and as such more likely for space to become
available from IOVAs below that node being freed.
At this point, the short-cut in __cached_rbnode_delete_update() really
doesn't hold up any more, and we need to fix the logic to reliably
provide the expected behaviour. We still want cached32_node to only move
upwards, but we should reset the allocation size if *any* 32-bit space
has become available.
Reported-by: Yunfei Wang <yf.wang@mediatek.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Link: https://lore.kernel.org/r/033815732d83ca73b13c11485ac39336f15c3b40.1646318408.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
CONFIG_DMA_REMAP is used to build a few helpers around the core
vmalloc code, and to use them in case there is a highmem page in
dma-direct, and to make dma coherent allocations be able to use
non-contiguous pages allocations for DMA allocations in the dma-iommu
layer.
Right now it needs to be explicitly selected by architectures, and
is only done so by architectures that require remapping to deal
with devices that are not DMA coherent. Make it unconditional for
builds with CONFIG_MMU as it is very little extra code, but makes
it much more likely that large DMA allocations succeed on x86.
This fixes hot plugging a NVMe thunderbolt SSD for me, which tries
to allocate a 1MB buffer that is otherwise hard to obtain due to
memory fragmentation on a heavily used laptop.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:
Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.
Unfortunately, some integrated graphic devices fail to do
so after some kind of power state transition. As the
result, the system might stuck in iommu_disable_translati
on(), waiting for the completion of TE transition.
This adds RPLS to a quirk list for those devices and skips
TE disabling if the qurik hits.
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Tested-by: Raviteja Goud Talla <ravitejax.goud.talla@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejaskumarx.surendrakumar.upadhyay@intel.com
The reference taken by 'of_find_device_by_node()' must be released when
not needed anymore.
Add the corresponding 'put_device()' in the error handling path.
Fixes: 765a9d1d02 ("iommu/tegra-smmu: Fix mc errors on tegra124-nyan")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20220107080915.12686-1-linmq006@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When enabling VMD and IOMMU scalable mode, the following kernel panic
call trace/kernel log is shown in Eagle Stream platform (Sapphire Rapids
CPU) during booting:
pci 0000:59:00.5: Adding to iommu group 42
...
vmd 0000:59:00.5: PCI host bridge to bus 10000:80
pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:01.0: enabling Extended Tags
pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:03.0: enabling Extended Tags
pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
Workqueue: events work_for_cpu_fn
RIP: 0010:__list_add_valid.cold+0x26/0x3f
Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
9e e8 8b b1 fe
RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
FS: 0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
intel_pasid_alloc_table+0x9c/0x1d0
dmar_insert_one_dev_info+0x423/0x540
? device_to_iommu+0x12d/0x2f0
intel_iommu_attach_device+0x116/0x290
__iommu_attach_device+0x1a/0x90
iommu_group_add_device+0x190/0x2c0
__iommu_probe_device+0x13e/0x250
iommu_probe_device+0x24/0x150
iommu_bus_notifier+0x69/0x90
blocking_notifier_call_chain+0x5a/0x80
device_add+0x3db/0x7b0
? arch_memremap_can_ram_remap+0x19/0x50
? memremap+0x75/0x140
pci_device_add+0x193/0x1d0
pci_scan_single_device+0xb9/0xf0
pci_scan_slot+0x4c/0x110
pci_scan_child_bus_extend+0x3a/0x290
vmd_enable_domain.constprop.0+0x63e/0x820
vmd_probe+0x163/0x190
local_pci_probe+0x42/0x80
work_for_cpu_fn+0x13/0x20
process_one_work+0x1e2/0x3b0
worker_thread+0x1c4/0x3a0
? rescuer_thread+0x370/0x370
kthread+0xc7/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
The following 'lspci' output shows devices '10000:80:*' are subdevices of
the VMD device 0000:59:00.5:
$ lspci
...
0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
...
10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
The symptom 'list_add double add' is caused by the following failure
message:
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
Device 10000:80:01.0 is the subdevice of the VMD device 0000:59:00.5,
so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD
device 0000:59:00.5. Here is call path:
intel_pasid_alloc_table
pci_for_each_dma_alias
get_alias_pasid_table
search_pasid_table
pci_real_dma_dev() in pci_for_each_dma_alias() gets the real dma device
which is the VMD device 0000:59:00.5. However, pte of the VMD device
0000:59:00.5 has been configured during this message "pci 0000:59:00.5:
Adding to iommu group 42". So, the status -EBUSY is returned when
configuring pasid entry for device 10000:80:01.0.
It then invokes dmar_remove_one_dev_info() to release
'struct device_domain_info *' from iommu_devinfo_cache. But, the pasid
table is not released because of the following statement in
__dmar_remove_one_dev_info():
if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
...
intel_pasid_free_table(info->dev);
}
The subsequent dmar_insert_one_dev_info() operation of device
10000:80:03.0 allocates 'struct device_domain_info *' from
iommu_devinfo_cache. The allocated address is the same address that
is released previously for device 10000:80:01.0. Finally, invoking
device_attach_pasid_table() causes the issue.
`git bisect` points to the offending commit 474dd1c650 ("iommu/vt-d:
Fix clearing real DMA device's scalable-mode context entries"), which
releases the pasid table if the device is not the subdevice by
checking the returned status of dev_is_real_dma_subdevice().
Reverting the offending commit can work around the issue.
The solution is to prevent from allocating pasid table if those
devices are subdevices of the VMD device.
Fixes: 474dd1c650 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/20220216091307.703-1-adrianhuang0701@gmail.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220221053348.262724-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the domain specific operations out of struct iommu_ops into a new
structure that only has domain specific operations. This solves the
problem of needing to know if the method vector for a given operation
needs to be retrieved from the device or the domain. Logically the domain
ops are the ones that make sense for external subsystems and endpoint
drivers to use, while device ops, with the sole exception of domain_alloc,
are IOMMU API internals.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The is_attach_deferred iommu_ops callback is a device op. The domain
argument is unnecessary and never used. Remove it to make code clean.
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The common iommu_ops is hooked to both device and domain. When a helper
has both device and domain pointer, the way to get the iommu_ops looks
messy in iommu core. This sorts out the way to get iommu_ops. The device
related helpers go through device pointer, while the domain related ones
go through domain pointer.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The apply_resv_region callback in iommu_ops was introduced to reserve an
IOVA range in the given DMA domain when the IOMMU driver manages the IOVA
by itself. As all drivers converted to use dma-iommu in the core, there's
no driver using this anymore. Remove it to avoid dead code.
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The aux-domain related interfaces and iommu_ops are not referenced
anywhere in the tree. We've also reached a consensus to redesign it
based the new iommufd framework. Remove them to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The aux-domain related callbacks are not called in the tree. Remove them
to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The guest pasid related uapi interfaces and definitions are not referenced
anywhere in the tree. We've also reached a consensus to replace them with
a new iommufd design. Remove them to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The guest pasid related callbacks are not called in the tree. Remove them
to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
PASIDs are process-wide. It was attempted to use refcounted PASIDs to
free them when the last thread drops the refcount. This turned out to
be complex and error prone. Given the fact that the PASID space is 20
bits, which allows up to 1M processes to have a PASID associated
concurrently, PASID resource exhaustion is not a realistic concern.
Therefore, it was decided to simplify the approach and stick with lazy
on demand PASID allocation, but drop the eager free approach and make an
allocated PASID's lifetime bound to the lifetime of the process.
Get rid of the refcounting mechanisms and replace/rename the interfaces
to reflect this new approach.
[ bp: Massage commit message. ]
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20220207230254.3342514-6-fenghua.yu@intel.com