This is more logically correct and will also allow us to
to use mask/unmask logic to restore INTX setttings after
the resume from s3/s4.
Fixes: 6692981295 ("iommu/amd: Add support for X2APIC IOMMU interrupts")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20211123161038.48009-4-mlevitsk@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This will give IOMMU GA log a chance to work after resume
from s3/s4.
Fixes: 8bda0cfbdc ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20211123161038.48009-2-mlevitsk@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- Intel IOMMU Updates fro Lu Baolu:
- Dump DMAR translation structure when DMA fault occurs
- An optimization in the page table manipulation code
- Use second level for GPA->HPA translation
- Various cleanups
- Arm SMMU Updates from Will
- Minor optimisations to SMMUv3 command creation and submission
- Numerous new compatible string for Qualcomm SMMUv2 implementations
- Fixes for the SWIOTLB based implemenation of dma-iommu code for
untrusted devices
- Add support for r8a779a0 to the Renesas IOMMU driver and DT matching
code for r8a77980
- A couple of cleanups and fixes for the Apple DART IOMMU driver
- Make use of generic report_iommu_fault() interface in the AMD IOMMU
driver
- Various smaller fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmGD6NQACgkQK/BELZcB
GuOSfg/9FKXl5ym86BP3tAS1fREKH7p59JRGZrrIR89NyHAcEUjtNG3YLPao+YxU
3CDgLkru+vlDpYY54QoyqcY5FgIHT3Cna/Cdk4zekRmSO/14gHp47jtZRheOUzLF
rvwfaplcbbtT8akpsVFzvw8YpQLGSDiDQSl7xL2+40Z9hiYX/gS9Af+PH98tAXsa
yZKZj6gU+JXM58VihO3M7umyE06tovyBaYgcsBZtbf66bGc0ySu+fe75UVWbueRt
Z8jwqa7TUfVXiYC8h+LqtGET6gtzNSsxAU3VllRe7Brf6K8i/yaRs/TO2Hp83d7/
q/fcK3vNQ5v3aDNci/DjBB8SEySzCmRz/9ocCOCx8ByuRp+5lwVRPPq3WcUMtsZY
QpYo9Fk7luFz2Gj5LObKAVBvOoeBZ5Km3oPs4HVmQ6epxn/rVckJDnJnVSLJuATq
tSZC2heRfFlg1dT6WFaynCTP2RI1LlNEdKhHirV6L368rSjmF0ZdQxdTpHULsHr1
yMjqL21OfcSkLW91rvfb3g68EsIwDbCPGTOlQWZLmAtwOWtHSCLPgwwEG7WefZbH
yaslpmlUTOurUnFmpxlfLicy5sqsBL2ASzGJkEKrgunw82Ke96zzkRzi+9j9HeS6
g0AyIWMi1cUAjONVUZtV4yjImXh63HIPiKx730a9teodusoxm+Q=
=waUR
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- Intel IOMMU Updates fro Lu Baolu:
- Dump DMAR translation structure when DMA fault occurs
- An optimization in the page table manipulation code
- Use second level for GPA->HPA translation
- Various cleanups
- Arm SMMU Updates from Will
- Minor optimisations to SMMUv3 command creation and submission
- Numerous new compatible string for Qualcomm SMMUv2 implementations
- Fixes for the SWIOTLB based implemenation of dma-iommu code for
untrusted devices
- Add support for r8a779a0 to the Renesas IOMMU driver and DT matching
code for r8a77980
- A couple of cleanups and fixes for the Apple DART IOMMU driver
- Make use of generic report_iommu_fault() interface in the AMD IOMMU
driver
- Various smaller fixes and cleanups
* tag 'iommu-updates-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (35 commits)
iommu/dma: Fix incorrect error return on iommu deferred attach
iommu/dart: Initialize DART_STREAMS_ENABLE
iommu/dma: Use kvcalloc() instead of kvzalloc()
iommu/tegra-smmu: Use devm_bitmap_zalloc when applicable
iommu/dart: Use kmemdup instead of kzalloc and memcpy
iommu/vt-d: Avoid duplicate removing in __domain_mapping()
iommu/vt-d: Convert the return type of first_pte_in_page to bool
iommu/vt-d: Clean up unused PASID updating functions
iommu/vt-d: Delete dev_has_feat callback
iommu/vt-d: Use second level for GPA->HPA translation
iommu/vt-d: Check FL and SL capability sanity in scalable mode
iommu/vt-d: Remove duplicate identity domain flag
iommu/vt-d: Dump DMAR translation structure when DMA fault occurs
iommu/vt-d: Do not falsely log intel_iommu is unsupported kernel option
iommu/arm-smmu-qcom: Request direct mapping for modem device
iommu: arm-smmu-qcom: Add compatible for QCM2290
dt-bindings: arm-smmu: Add compatible for QCM2290 SoC
iommu/arm-smmu-qcom: Add SM6350 SMMU compatible
dt-bindings: arm-smmu: Add compatible for SM6350 SoC
iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd()
...
The end goal of the current buffer overflow detection work[0] is to gain
full compile-time and run-time coverage of all detectable buffer overflows
seen via array indexing or memcpy(), memmove(), and memset(). The str*()
family of functions already have full coverage.
While much of the work for these changes have been on-going for many
releases (i.e. 0-element and 1-element array replacements, as well as
avoiding false positives and fixing discovered overflows[1]), this series
contains the foundational elements of several related buffer overflow
detection improvements by providing new common helpers and FORTIFY_SOURCE
changes needed to gain the introspection required for compiler visibility
into array sizes. Also included are a handful of already Acked instances
using the helpers (or related clean-ups), with many more waiting at the
ready to be taken via subsystem-specific trees[2]. The new helpers are:
- struct_group() for gaining struct member range introspection.
- memset_after() and memset_startat() for clearing to the end of structures.
- DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in structs.
Also included is the beginning of the refactoring of FORTIFY_SOURCE to
support memcpy() introspection, fix missing and regressed coverage under
GCC, and to prepare to fix the currently broken Clang support. Finishing
this work is part of the larger series[0], but depends on all the false
positives and buffer overflow bug fixes to have landed already and those
that depend on this series to land.
As part of the FORTIFY_SOURCE refactoring, a set of both a compile-time
and run-time tests are added for FORTIFY_SOURCE and the mem*()-family
functions respectively. The compile time tests have found a legitimate
(though corner-case) bug[6] already.
Please note that the appearance of "panic" and "BUG" in the
FORTIFY_SOURCE refactoring are the result of relocating existing code,
and no new use of those code-paths are expected nor desired.
Finally, there are two tree-wide conversions for 0-element arrays and
flexible array unions to gain sane compiler introspection coverage that
result in no known object code differences.
After this series (and the changes that have now landed via netdev
and usb), we are very close to finally being able to build with
-Warray-bounds and -Wzero-length-bounds. However, due corner cases in
GCC[3] and Clang[4], I have not included the last two patches that turn
on these options, as I don't want to introduce any known warnings to
the build. Hopefully these can be solved soon.
[0] https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE
[2] https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/
[3] https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/
[4] https://bugs.llvm.org/show_bug.cgi?id=51682
[5] https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/
[6] https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmGAFWcWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJmKFD/45MJdnvW5MhIEeW5tc5UjfcIPS
ae+YvlEX/2ZwgSlTxocFVocE6hz7b6eCiX3dSAChPkPxsSfgeiuhjxsU+4ROnELR
04RqTA/rwT6JXfJcXbDPXfxDL4huUkgktAW3m1sT771AZspeap2GrSwFyttlTqKA
+kTiZ3lXJVFcw10uyhfp3Lk6eFJxdf5iOjuEou5kBOQfpNKEOduRL2K15hSowOwB
lARiAC+HbmN+E+npvDE7YqK4V7ZQ0/dtB0BlfqgTkn1spQz8N21kBAMpegV5vvIk
A+qGHc7q2oyk4M14TRTidQHGQ4juW1Kkvq3NV6KzwQIVD+mIfz0ESn3d4tnp28Hk
Y+OXTI1BRFlApQU9qGWv33gkNEozeyqMLDRLKhDYRSFPA9UKkpgXQRzeTzoLKyrQ
4B6n5NnUGcu7I6WWhpyZQcZLDsHGyy0vHzjQGs/NXtb1PzXJ5XIGuPdmx9pVMykk
IVKnqRcWyGWahfh3asOnoXvdhi1No4NSHQ/ZHfUM+SrIGYjBMaUisw66qm3Fe8ZU
lbO2CFkCsfGSoKNPHf0lUEGlkyxAiDolazOfflDNxdzzlZo2X1l/a7O/yoO4Pqul
cdL0eDjiNoQ2YR2TSYPnXq5KSL1RI0tlfS8pH8k1hVhZsQx0wpAQ+qki0S+fLePV
PdA9XB82G2tmqKc9cQ==
=9xbT
-----END PGP SIGNATURE-----
Merge tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull overflow updates from Kees Cook:
"The end goal of the current buffer overflow detection work[0] is to
gain full compile-time and run-time coverage of all detectable buffer
overflows seen via array indexing or memcpy(), memmove(), and
memset(). The str*() family of functions already have full coverage.
While much of the work for these changes have been on-going for many
releases (i.e. 0-element and 1-element array replacements, as well as
avoiding false positives and fixing discovered overflows[1]), this
series contains the foundational elements of several related buffer
overflow detection improvements by providing new common helpers and
FORTIFY_SOURCE changes needed to gain the introspection required for
compiler visibility into array sizes. Also included are a handful of
already Acked instances using the helpers (or related clean-ups), with
many more waiting at the ready to be taken via subsystem-specific
trees[2].
The new helpers are:
- struct_group() for gaining struct member range introspection
- memset_after() and memset_startat() for clearing to the end of
structures
- DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in
structs
Also included is the beginning of the refactoring of FORTIFY_SOURCE to
support memcpy() introspection, fix missing and regressed coverage
under GCC, and to prepare to fix the currently broken Clang support.
Finishing this work is part of the larger series[0], but depends on
all the false positives and buffer overflow bug fixes to have landed
already and those that depend on this series to land.
As part of the FORTIFY_SOURCE refactoring, a set of both a
compile-time and run-time tests are added for FORTIFY_SOURCE and the
mem*()-family functions respectively. The compile time tests have
found a legitimate (though corner-case) bug[6] already.
Please note that the appearance of "panic" and "BUG" in the
FORTIFY_SOURCE refactoring are the result of relocating existing code,
and no new use of those code-paths are expected nor desired.
Finally, there are two tree-wide conversions for 0-element arrays and
flexible array unions to gain sane compiler introspection coverage
that result in no known object code differences.
After this series (and the changes that have now landed via netdev and
usb), we are very close to finally being able to build with
-Warray-bounds and -Wzero-length-bounds.
However, due corner cases in GCC[3] and Clang[4], I have not included
the last two patches that turn on these options, as I don't want to
introduce any known warnings to the build. Hopefully these can be
solved soon"
Link: https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/ [0]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE [1]
Link: https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/ [2]
Link: https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/ [3]
Link: https://bugs.llvm.org/show_bug.cgi?id=51682 [4]
Link: https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/ [5]
Link: https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/ [6]
* tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
fortify: strlen: Avoid shadowing previous locals
compiler-gcc.h: Define __SANITIZE_ADDRESS__ under hwaddress sanitizer
treewide: Replace 0-element memcpy() destinations with flexible arrays
treewide: Replace open-coded flex arrays in unions
stddef: Introduce DECLARE_FLEX_ARRAY() helper
btrfs: Use memset_startat() to clear end of struct
string.h: Introduce memset_startat() for wiping trailing members and padding
xfrm: Use memset_after() to clear padding
string.h: Introduce memset_after() for wiping trailing members/padding
lib: Introduce CONFIG_MEMCPY_KUNIT_TEST
fortify: Add compile-time FORTIFY_SOURCE tests
fortify: Allow strlen() and strnlen() to pass compile-time known lengths
fortify: Prepare to improve strnlen() and strlen() warnings
fortify: Fix dropped strcpy() compile-time write overflow check
fortify: Explicitly disable Clang support
fortify: Move remaining fortify helpers into fortify-string.h
lib/string: Move helper functions out of string.c
compiler_types.h: Remove __compiletime_object_size()
cm4000_cs: Use struct_group() to zero struct cm4000_dev region
can: flexcan: Use struct_group() to zero struct flexcan_regs regions
...
Replace uses of mem_encrypt_active() with calls to cc_platform_has() with
the CC_ATTR_MEM_ENCRYPT attribute.
Remove the implementation of mem_encrypt_active() across all arches.
For s390, since the default implementation of the cc_platform_has()
matches the s390 implementation of mem_encrypt_active(), cc_platform_has()
does not need to be implemented in s390 (the config option
ARCH_HAS_CC_PLATFORM is not set).
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-9-bp@alien8.de
Replace uses of sme_active() with the more generic cc_platform_has()
using CC_ATTR_HOST_MEM_ENCRYPT. If future support is added for other
memory encryption technologies, the use of CC_ATTR_HOST_MEM_ENCRYPT
can be updated, as required.
This also replaces two usages of sev_active() that are really geared
towards detecting if SME is active.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-6-bp@alien8.de
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:
1) It allows device drivers to register a callback to be notified of
I/O page faults, via the iommu_set_fault_handler() API.
2) It triggers the io_page_fault tracepoint in report_iommu_fault()
when an I/O page fault occurs.
The latter point is the main aim of this patch, as it allows
rasdaemon-like daemons to be notified of I/O page faults, and to
possibly initiate corrective action in response.
A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already trigger this
tracepoint -- but this isn't yet the case for AMD-Vi and Intel DMAR.
The AMD IOMMU specification suggests that the bit in an I/O page fault
event log entry that signals whether an I/O page fault was for a read
request or for a write request is only meaningful when the faulting
access was to a present page, but some testing on a Ryzen 3700X suggests
that this bit encodes the correct value even for I/O page faults to
non-present pages, and therefore, this patch passes the R/W information
up the stack even for I/O page faults to non-present pages.
Signed-off-by: Lennert Buytenhek <buytenh@arista.com>
Link: https://lore.kernel.org/r/YVLyBW97vZLpOaAp@wantstofly.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.
Use struct_group() in struct ivhd_entry around members ext and hidh, so
they can be referenced together. This will allow memcpy() and sizeof()
to more easily reason about sizes, improve readability, and avoid future
warnings about writing beyond the end of ext.
"pahole" shows no size nor member offset changes to struct ivhd_entry.
"objdump -d" shows no object code changes.
Cc: Will Deacon <will@kernel.org>
Cc: iommu@lists.linux-foundation.org
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Since the function has been simplified and only call iommu_init_ga_log(),
remove the function and replace with iommu_init_ga_log() instead.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20210820202957.187572-4-suravee.suthikulpanit@amd.com
Fixes: 8bda0cfbdc ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, iommu_init_ga() checks and disables IOMMU VAPIC support
(i.e. AMD AVIC support in IOMMU) when GAMSup feature bit is not set.
However it forgets to clear IRQ_POSTING_CAP from the previously set
amd_iommu_irq_ops.capability.
This triggers an invalid page fault bug during guest VM warm reboot
if AVIC is enabled since the irq_remapping_cap(IRQ_POSTING_CAP) is
incorrectly set, and crash the system with the following kernel trace.
BUG: unable to handle page fault for address: 0000000000400dd8
RIP: 0010:amd_iommu_deactivate_guest_mode+0x19/0xbc
Call Trace:
svm_set_pi_irte_mode+0x8a/0xc0 [kvm_amd]
? kvm_make_all_cpus_request_except+0x50/0x70 [kvm]
kvm_request_apicv_update+0x10c/0x150 [kvm]
svm_toggle_avic_for_irq_window+0x52/0x90 [kvm_amd]
svm_enable_irq_window+0x26/0xa0 [kvm_amd]
vcpu_enter_guest+0xbbe/0x1560 [kvm]
? avic_vcpu_load+0xd5/0x120 [kvm_amd]
? kvm_arch_vcpu_load+0x76/0x240 [kvm]
? svm_get_segment_base+0xa/0x10 [kvm_amd]
kvm_arch_vcpu_ioctl_run+0x103/0x590 [kvm]
kvm_vcpu_ioctl+0x22a/0x5d0 [kvm]
__x64_sys_ioctl+0x84/0xc0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes by moving the initializing of AMD IOMMU interrupt remapping mode
(amd_iommu_guest_ir) earlier before setting up the
amd_iommu_irq_ops.capability with appropriate IRQ_POSTING_CAP flag.
[joro: Squashed the two patches and limited
check_features_on_all_iommus() to CONFIG_IRQ_REMAP
to fix a compile warning.]
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20210820202957.187572-2-suravee.suthikulpanit@amd.com
Link: https://lore.kernel.org/r/20210820202957.187572-3-suravee.suthikulpanit@amd.com
Fixes: 8bda0cfbdc ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Remove the new use of the variable introduced in the AMD driver branch.
The variable was removed already in the iommu core branch, causing build
errors when the brances are merged.
Cc: Nadav Amit <namit@vmware.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210802150643.3634-1-joro@8bytes.org
When running on an AMD vIOMMU, it is better to avoid TLB flushes
of unmodified PTEs. vIOMMUs require the hypervisor to synchronize the
virtualized IOMMU's PTEs with the physical ones. This process induce
overheads.
AMD IOMMU allows us to flush any range that is aligned to the power of
2. So when running on top of a vIOMMU, break the range into sub-ranges
that are naturally aligned, and flush each one separately. This apporach
is better when running with a vIOMMU, but on physical IOMMUs, the
penalty of IOTLB misses due to unnecessary flushed entries is likely to
be low.
Repurpose (i.e., keeping the name, changing the logic)
domain_flush_pages() so it is used to choose whether to perform one
flush of the whole range or multiple ones to avoid flushing unnecessary
ranges. Use NpCache, as usual, to infer whether the IOMMU is physical or
virtual.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210723093209.714328-8-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
On virtual machines, software must flush the IOTLB after each page table
entry update.
The iommu_map_sg() code iterates through the given scatter-gather list
and invokes iommu_map() for each element in the scatter-gather list,
which calls into the vendor IOMMU driver through iommu_ops callback. As
the result, a single sg mapping may lead to multiple IOTLB flushes.
Fix this by adding amd_iotlb_sync_map() callback and flushing at this
point after all sg mappings we set.
This commit is followed and inspired by commit 933fcd01e9
("iommu/vt-d: Add iotlb_sync_map callback").
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210723093209.714328-7-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
This is in contrast, for instnace, to Intel IOMMUs that have a limit on
the number of pages that can be flushed in a single flush. In addition,
AMD's IOMMU do not care about the page-size, so changes of the page size
do not need to trigger a TLB flush.
So in most cases, a TLB flush due to disjoint range is not needed for
AMD. Yet, vIOMMUs require the hypervisor to synchronize the virtualized
IOMMU's PTEs with the physical ones. This process induce overheads, so
it is better not to cause unnecessary flushes, i.e., flushes of PTEs
that were not modified.
Implement and use amd_iommu_iotlb_gather_add_page() and use it instead
of the generic iommu_iotlb_gather_add_page(). Ignore disjoint regions
unless "non-present cache" feature is reported by the IOMMU
capabilities, as this is an indication we are running on a physical
IOMMU. A similar indication is used by VT-d (see "caching mode"). The
new logic retains the same flushing behavior that we had before the
introduction of page-selective IOTLB flushes for AMD.
On virtualized environments, check if the newly flushed region and the
gathered one are disjoint and flush if it is.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210723093209.714328-6-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Do not use flush-queue on virtualized environments, where the NpCache
capability of the IOMMU is set. This is required to reduce
virtualization overheads.
This change follows a similar change to Intel's VT-d and a detailed
explanation as for the rationale is described in commit 29b3283972
("iommu/vt-d: Do not use flush-queue when caching-mode is on").
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210723093209.714328-3-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Recent patch attempted to enable selective page flushes on AMD IOMMU but
neglected to adapt amd_iommu_iotlb_sync() to use the selective flushes.
Adapt amd_iommu_iotlb_sync() to use selective flushes and change
amd_iommu_unmap() to collect the flushes. As a defensive measure, to
avoid potential issues as those that the Intel IOMMU driver encountered
recently, flush the page-walk caches by always setting the "pde"
parameter. This can be removed later.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210723093209.714328-2-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For the printing of RMP_HW_ERROR / RMP_PAGE_FAULT / IO_PAGE_FAULT
events, the AMD IOMMU code uses such logic:
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);
if (dev_data && __ratelimit(&dev_data->rs)) {
pci_err(pdev, ...
} else {
printk_ratelimit() / pr_err{,_ratelimited}(...
}
This means that if we receive an event for a PCI devid which actually
does have a struct pci_dev and an attached struct iommu_dev_data, but
rate limiting kicks in, we'll fall back to the non-PCI branch of the
test, and print the event in a different format.
Fix this by changing the logic to:
if (dev_data) {
if (__ratelimit(&dev_data->rs)) {
pci_err(pdev, ...
}
} else {
pr_err_ratelimited(...
}
Suggested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/YPgk1dD1gPMhJXgY@wantstofly.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
refcount_t type and corresponding API can protect refcounters from
accidental underflow and overflow and further use-after-free situations.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/1626683578-64214-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If people are going to insist on calling iommu_iova_to_phys()
pointlessly and expecting it to work, we can at least do ourselves a
favour by handling those cases in the core code, rather than repeatedly
across an inconsistent handful of drivers.
Since all the existing drivers implement the internal callback, and any
future ones are likely to want to work with iommu-dma which relies on
iova_to_phys a fair bit, we may as well remove that currently-redundant
check as well and consider it mandatory.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/f564f3f6ff731b898ff7a898919bf871c2c7745a.1626354264.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We only ever now set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-7-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make IOMMU_DEFAULT_LAZY default for when AMD_IOMMU config is set, which
matches current behaviour.
For "fullflush" param, just call iommu_set_dma_strict(true) directly.
Since we get a strict vs lazy mode print already in iommu_subsys_init(),
and maintain a deprecation print when "fullflush" param is passed, drop the
prints in amd_iommu_init_dma_ops().
Finally drop global flag amd_iommu_unmap_flush, as it has no longer has any
purpose.
[jpg: Rebase for relocated file and drop amd_iommu_unmap_flush]
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1626088340-5838-6-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that the x86 drivers support iommu.strict, deprecate the custom
methods.
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-2-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.
The base and limit parameters are being phased out, because:
* they are redundant for x86 callers. dma-iommu already reserves the
first page, and the upper limit is already in domain->geometry.
* they can now be obtained from dev->dma_range_map on Arm.
But removing them on Arm isn't completely straightforward so is left for
future work. As an intermediate step, simplify the x86 callers by
passing dummy limits.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210618152059.1194210-5-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A recent commit introduced this section mismatch warning:
WARNING: modpost: vmlinux.o(.text.unlikely+0x22a1f): Section mismatch in reference from the function detect_ivrs() to the variable .init.data:amd_iommu_force_enable
The reason is that detect_ivrs() is not marked __init while it should
be, because it is only called from another __init function. Mark
detect_ivrs() __init to get rid of the warning.
Fixes: b1e650db2c ("iommu/amd: Add amd_iommu=force_enable option")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210608122843.8413-1-joro@8bytes.org
Now that DMA ops are part of the core API via iommu-dma, fold the
vestigial remains of the IOMMU_DMA_OPS init state into the IOMMU API
phase, and clean up a few other leftovers. This should also close the
race window wherein bus_set_iommu() effectively makes the DMA ops state
visible before its nominal initialisation - it seems this was previously
fairly benign, but since commit a250c23f15 ("iommu: remove
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE") it can now lead to the strict flush
queue policy inadvertently being picked for default domains allocated
during that window, with a corresponding unexpected perfomance impact.
Reported-by: Jussi Maki <joamaki@gmail.com>
Tested-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Fixes: a250c23f15 ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE")
Link: https://lore.kernel.org/r/665db61e23ff8d54ac5eb391bef520b3a803fcb9.1622727974.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add this option to enable the IOMMU on platforms like AMD Stoney,
where the kernel usually disables it because it may cause problems in
some scenarios.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20210603130203.29016-1-joro@8bytes.org
print_iommu_info prints the EFR register and then the decoded list of
features on a separate line:
pci 0000:00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade):
PPR X2APIC NX GT IA GA PC GA_vAPIC
The second line is emitted via 'pr_cont', which causes it to have a
different ('warn') loglevel compared to the previous line ('info').
Commit 9a295ff0ff attempted to rectify this by removing the newline
from the pci_info format string, but this doesn't work, as pci_info
calls implicitly append a newline anyway.
Printing the decoded features on the same line would make it quite long.
Instead, change pci_info() to pr_info() to omit PCI bus location info,
which is also shown in the preceding message. This results in:
pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
AMD-Vi: Extended features (0x206d73ef22254ade): PPR X2APIC NX GT IA GA PC GA_vAPIC
AMD-Vi: Interrupt remapping enabled
Fixes: 9a295ff0ff ("iommu/amd: Print extended features in one line to fix divergent log levels")
Link: https://lore.kernel.org/lkml/alpine.LNX.2.20.13.2104112326460.11104@monopod.intra.ispras.ru
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/20210504102220.1793-1-amonakov@ispras.ru
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The logic to determine the mask of page-specific invalidations was
tested in userspace. As the code was copied into the kernel, the
parentheses were mistakenly set in the wrong place, resulting in the
wrong mask.
Fix it.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Fixes: 268aa45482 ("iommu/amd: Page-specific invalidations for more than one page")
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210502070001.1559127-2-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since commit 08a27c1c3e ("iommu: Add support to change default domain
of an iommu group") a user can switch a device between IOMMU and direct
DMA through sysfs. This doesn't work for AMD IOMMU at the moment because
dev->dma_ops is not cleared when switching from a DMA to an identity
IOMMU domain. The DMA layer thus attempts to use the dma-iommu ops on an
identity domain, causing an oops:
# echo 0000:00:05.0 > /sys/sys/bus/pci/drivers/e1000e/unbind
# echo identity > /sys/bus/pci/devices/0000:00:05.0/iommu_group/type
# echo 0000:00:05.0 > /sys/sys/bus/pci/drivers/e1000e/bind
...
BUG: kernel NULL pointer dereference, address: 0000000000000028
...
Call Trace:
iommu_dma_alloc
e1000e_setup_tx_resources
e1000e_open
Since iommu_change_dev_def_domain() calls probe_finalize() again, clear
the dma_ops there like Vt-d does.
Fixes: 08a27c1c3e ("iommu: Add support to change default domain of an iommu group")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20210422094216.2282097-1-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rather than have separate opaque setter functions that are easy to
overlook and lead to repetitive boilerplate in drivers, let's pass the
relevant initialisation parameters directly to iommu_device_register().
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ab001b87c533b6f4db71eb90db6f888953986c36.1617285386.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In early AMD desktop/mobile platforms (during 2013), when the IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9c ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. The result was reading of the counter always return zero.
At the time, the suggested workaround was to add a test logic prior
to initializing the PMC feature to check if the counters can be programmed
and read back the same value. This has been working fine until the more
recent desktop/mobile platforms start enabling power gating for the PMC,
which prevents access to the counters. This results in the PMC support
being disabled unnecesarily.
Unfortunatly, there is no documentation of since which generation
of hardware the original PMC HW bug was fixed. Although, it was fixed
soon after the first introduction of the PMC. Base on this, we assume
that the buggy platforms are less likely to be in used, and it should
be relatively safe to remove this legacy logic.
Link: https://lore.kernel.org/linux-iommu/alpine.LNX.3.20.13.2006030935570.3181@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) <ml.linux@elloe.vision>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Alexander Monakov <amonakov@ispras.ru>
Cc: David Coe <david.coe@live.co.uk>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210409085848.3908-3-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This reverts commit 6778ff5b21.
The original commit tries to address an issue, where PMC power-gating
causing the IOMMU PMC pre-init test to fail on certain desktop/mobile
platforms where the power-gating is normally enabled.
There have been several reports that the workaround still does not
guarantee to work, and can add up to 100 ms (on the worst case)
to the boot process on certain platforms such as the MSI B350M MORTAR
with AMD Ryzen 3 2200G.
Therefore, revert this commit as a prelude to removing the pre-init
test.
Link: https://lore.kernel.org/linux-iommu/alpine.LNX.3.20.13.2006030935570.3181@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) <ml.linux@elloe.vision>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Alexander Monakov <amonakov@ispras.ru>
Cc: David Coe <david.coe@live.co.uk>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20210409085848.3908-2-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
'devid' has been checked in function check_device, no need to double
check and clean up this.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/1617939040-35579-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, IOMMU invalidations and device-IOTLB invalidations using
AMD IOMMU fall back to full address-space invalidation if more than a
single page need to be flushed.
Full flushes are especially inefficient when the IOMMU is virtualized by
a hypervisor, since it requires the hypervisor to synchronize the entire
address-space.
AMD IOMMUs allow to provide a mask to perform page-specific
invalidations for multiple pages that match the address. The mask is
encoded as part of the address, and the first zero bit in the address
(in bits [51:12]) indicates the mask size.
Use this hardware feature to perform selective IOMMU and IOTLB flushes.
Combine the logic between both for better code reuse.
The IOMMU invalidations passed a smoke-test. The device IOTLB
invalidations are untested.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20210323210619.513069-1-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A few functions that were intentended for the perf events support are
currently declared in arch/x86/events/amd/iommu.h, which mens they are
not in scope for the actual function definition. Also amdkfd has started
using a few of them using externs in a .c file. End that misery by
moving the prototypes to the proper header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210402143312.372386-5-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Remove exports for functions that are only used in the AMD IOMMU driver
itself, or the also always builtin perf events support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210402143312.372386-4-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Instead make the global iommu_dma_strict paramete in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.
This make sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86. As those default to lazy flushing a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overriden by an explicit parameter.
[ported on top of the other iommu_attr changes and added a few small
missing bits]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210401155256.298656-19-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
increase_address_space() calls get_zeroed_page(gfp) under spin_lock with
disabled interrupts. gfp flags passed to increase_address_space() may allow
sleeping, so it comes to this:
BUG: sleeping function called from invalid context at mm/page_alloc.c:4342
in_atomic(): 1, irqs_disabled(): 1, pid: 21555, name: epdcbbf1qnhbsd8
Call Trace:
dump_stack+0x66/0x8b
___might_sleep+0xec/0x110
__alloc_pages_nodemask+0x104/0x300
get_zeroed_page+0x15/0x40
iommu_map_page+0xdd/0x3e0
amd_iommu_map+0x50/0x70
iommu_map+0x106/0x220
vfio_iommu_type1_ioctl+0x76e/0x950 [vfio_iommu_type1]
do_vfs_ioctl+0xa3/0x6f0
ksys_ioctl+0x66/0x70
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x4e/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by moving get_zeroed_page() out of spin_lock/unlock section.
Fixes: 754265bcab ("iommu/amd: Fix race in increase_address_space()")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210217143004.19165-1-arbn@yandex-team.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Certain AMD platforms enable power gating feature for IOMMU PMC,
which prevents the IOMMU driver from updating the counter while
trying to validate the PMC functionality in the init_iommu_perf_ctr().
This results in disabling PMC support and the following error message:
"AMD-Vi: Unable to read/write to IOMMU perf counter"
To workaround this issue, disable power gating temporarily by programming
the counter source to non-zero value while validating the counter,
and restore the prior state afterward.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Tj (Elloe Linux) <ml.linux@elloe.vision>
Link: https://lore.kernel.org/r/20210208122712.5048-1-suravee.suthikulpanit@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These implement map and unmap for AMD IOMMU v1 pagetable, which
will be used by the IO pagetable framework.
Also clean up unused extern function declarations.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20201215073705.123786-13-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since the IO page table root and mode parameters have been moved into
the struct amd_io_pg, the function is no longer needed. Therefore,
remove it along with the struct domain_pgtable.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20201215073705.123786-9-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
And move declaration to header file so that they can be included across
multiple files. There is no functional change.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20201215073705.123786-6-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
IOMMU Extended Feature Register (EFR) is used to communicate
the supported features for each IOMMU to the IOMMU driver.
This is normally read from the PCI MMIO register offset 0x30,
and used by the iommu_feature() helper function.
However, there are certain scenarios where the information is needed
prior to PCI initialization, and the iommu_feature() function is used
prematurely w/o warning. This has caused incorrect initialization of IOMMU.
This is the case for the commit 6d39bdee23 ("iommu/amd: Enforce 4k
mapping for certain IOMMU data structures")
Since, the EFR is also available in the IVHD header, and is available to
the driver prior to PCI initialization. Therefore, default to using
the IVHD EFR instead.
Fixes: 6d39bdee23 ("iommu/amd: Enforce 4k mapping for certain IOMMU data structures")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20210120135002.2682-1-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
See Documentation/core-api/printk-formats.rst.
commit cbacb5ab0a ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]")
Standard integer promotion is already done and %hx and %hhx is useless
so do not encourage the use of %hh[xudi] or %h[xudi].
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20201215213021.2090698-1-trix@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Link: https://lore.kernel.org/r/20201228135112.28621-1-zhengyongjun3@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
From: Adrian Huang <ahuang12@lenovo.com>
The values of local variables are assigned after local variables
are declared, so no need to assign the initial value during the
variable declaration.
And, no need to assign NULL for the local variable 'ivrs_base'
after invoking acpi_put_table().
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201210021330.2022-1-adrianhuang0701@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The AMD IOMMU initialisation registers the IRQ remapping domain for
each IOMMU before doing the final sanity check that every I/OAPIC is
covered.
This means that the AMD irq_remapping_select() function gets invoked
even when IRQ remapping has been disabled, eventually leading to a NULL
pointer dereference in alloc_irq_table().
Unfortunately, the IVRS isn't fully parsed early enough that the sanity
check can be done in time to registering the IRQ domain altogether.
Doing that would be nice, but is a larger and more error-prone task. The
simple fix is just for irq_remapping_select() to refuse to report a
match when IRQ remapping has disabled.
Link: https://lore.kernel.org/lkml/ed4be9b4-24ac-7128-c522-7ef359e8185d@gmx.at
Fixes: a1a785b572 ("iommu/amd: Implement select() method on remapping irqdomain")
Reported-by: Johnathan Smithinovic <johnathan.smithinovic@gmx.at>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/04bbe8bca87f81a3cfa93ec4299e53f47e00e5b3.camel@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
When I made the INTCAPXT support stop gratuitously pretending to be MSI,
I missed the fact that iommu_setup_msi() also sets the ->int_enabled
flag. I missed this in the iommu_setup_intcapxt() code path, which means
that a resume from suspend will try to allocate the IRQ domains again,
accidentally re-enabling interrupts as it does, resulting in much sadness.
Lift out the bit which sets iommu->int_enabled into the iommu_init_irq()
function which is also where it gets checked.
Link: https://lore.kernel.org/r/20210104132250.GE32151@zn.tnic/
Fixes: d1adcfbb52 ("iommu/amd: Fix IOMMU interrupt generation in X2APIC mode")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/50cd5f55be8ead0937ac315cd2f5b89364f6a9a5.camel@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
- IOVA allocation optimisations and removal of unused code
- Introduction of DOMAIN_ATTR_IO_PGTABLE_CFG for parameterising the
page-table of an IOMMU domain
- Support for changing the default domain type in sysfs
- Optimisation to the way in which identity-mapped regions are created
- Driver updates:
* Arm SMMU updates, including continued work on Shared Virtual Memory
* Tegra SMMU updates, including support for PCI devices
* Intel VT-D updates, including conversion to the IOMMU-DMA API
- Cleanup, kerneldoc and minor refactoring
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl/XWy8QHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNPejB/46QsXATkWt7hbDPIxlUvzUG8VP/FBNJ6A3
/4Z+4KBXR3zhvZJOEqTarnm6Uc22tWkYpNS3QAOuRW0EfVeD8H+og4SOA2iri5tR
x3GZUCng93APWpHdDtJP7kP/xuU47JsBblY/Ip9aJKYoXi9c9svtssAqKr008wxr
knv/xv/awQ0O7CNc3gAoz7mUagQxG/no+HMXMT3Fz9KWRzzvTi6s+7ZDm2faI0hO
GEJygsKbXxe1qbfeGqKTP/67EJVqjTGsLCF2zMogbnnD7DxadJ2hP0oNg5tvldT/
oDj9YWG6oLMfIVCwDVQXuWNfKxd7RGORMbYwKNAaRSvmkli6625h
=KFOO
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull IOMMU updates from Will Deacon:
"There's a good mixture of improvements to the core code and driver
changes across the board.
One thing worth pointing out is that this includes a quirk to work
around behaviour in the i915 driver (see 65f746e828 ("iommu: Add
quirk for Intel graphic devices in map_sg")), which otherwise
interacts badly with the conversion of the intel IOMMU driver over to
the DMA-IOMMU APU but has being fixed properly in the DRM tree.
We'll revert the quirk later this cycle once we've confirmed that
things don't fall apart without it.
Summary:
- IOVA allocation optimisations and removal of unused code
- Introduction of DOMAIN_ATTR_IO_PGTABLE_CFG for parameterising the
page-table of an IOMMU domain
- Support for changing the default domain type in sysfs
- Optimisation to the way in which identity-mapped regions are
created
- Driver updates:
* Arm SMMU updates, including continued work on Shared Virtual
Memory
* Tegra SMMU updates, including support for PCI devices
* Intel VT-D updates, including conversion to the IOMMU-DMA API
- Cleanup, kerneldoc and minor refactoring"
* tag 'iommu-updates-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (50 commits)
iommu/amd: Add sanity check for interrupt remapping table length macros
dma-iommu: remove __iommu_dma_mmap
iommu/io-pgtable: Remove tlb_flush_leaf
iommu: Stop exporting free_iova_mem()
iommu: Stop exporting alloc_iova_mem()
iommu: Delete split_and_remove_iova()
iommu/io-pgtable-arm: Remove unused 'level' parameter from iopte_type() macro
iommu: Defer the early return in arm_(v7s/lpae)_map
iommu: Improve the performance for direct_mapping
iommu: avoid taking iova_rbtree_lock twice
iommu/vt-d: Avoid GFP_ATOMIC where it is not needed
iommu/vt-d: Remove set but not used variable
iommu: return error code when it can't get group
iommu: Fix htmldocs warnings in sysfs-kernel-iommu_groups
iommu: arm-smmu-impl: Add a space before open parenthesis
iommu: arm-smmu-impl: Use table to list QCOM implementations
iommu/arm-smmu: Move non-strict mode to use io_pgtable_domain_attr
iommu/arm-smmu: Add support for pagetable config domain attribute
iommu: Document usage of "/sys/kernel/iommu_groups/<grp_id>/type" file
iommu: Take lock before reading iommu group default domain type
...
- Simplification and distangling of the MSI related functionality
- Let IO/APIC construct the RTE entries from an MSI message instead of
having IO/APIC specific code in the interrupt remapping drivers
- Make the retrieval of the parent interrupt domain (vector or remap
unit) less hardcoded and use the relevant irqdomain callbacks for
selection.
- Allow the handling of more than 255 CPUs without a virtualized IOMMU
when the hypervisor supports it. This has made been possible by the
above modifications and also simplifies the existing workaround in the
HyperV specific virtual IOMMU.
- Cleanup of the historical timer_works() irq flags related
inconsistencies.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/Xxd8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYpOD/9C5TppNlPMUyx2SflH6bxt37pJEpln
+hYTKsk+jSThntr5mfj+GifGvgmHOVBTGnlDUnUnrpN7TQmLFBzwTOtnBLW53AO2
16/u0+Xci4LNCtEkaymf0Rq4MfsfriXHPJr0A/CnZ0tpHSf5QKHAiitSiGujdMlb
gbq43+zXd+jNkH7vkOLPX/7dZVI1hNASQEevJu2tRR4xYTuXFdBxvLgYkHtYKKrK
R1sbs6nI6yIzye2u4m4xGu29SxgUft+zdUf+UehJKM3yFmf51d9qpkX+kLaTWuaL
VPsMItbn0kdvxwXQWO6DYnIAAnVKCklyHQJTZCoNq9Fe91OoByak1CEVspSOa1av
JmycNSch4IYWasR4vVCB1gbb+V9SejcKu5SV3CDrEDqwkOIpfiqpriUXSCJTLlFd
QOEDOLuuk/79Qs//J/tb/nJ4IuKv8WPudDfIlMro8wUsAr67DjD4mnXprZ+svwWx
Ct/0/Memk+BSa0cw6pvg24BUZGN6zrufkBu2HKT9GOXRUdNkdLkiPhT8mK4T/O0l
f90QCLjPSOJ/K/pLEWdUHEPmgC5Q9RsXOmwVGqX+RbjfP7mYTJXlmWnBb+cFNch0
xFIH3SxVGylxxT06NX3SkvinrHj10CoAlmneefBlLtx6dF+2P84DAMZSF0OFToVI
c2KMg5zoesI4bg==
=8Gfs
-----END PGP SIGNATURE-----
Merge tag 'x86-apic-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
"Yet another large set of x86 interrupt management updates:
- Simplification and distangling of the MSI related functionality
- Let IO/APIC construct the RTE entries from an MSI message instead
of having IO/APIC specific code in the interrupt remapping drivers
- Make the retrieval of the parent interrupt domain (vector or remap
unit) less hardcoded and use the relevant irqdomain callbacks for
selection.
- Allow the handling of more than 255 CPUs without a virtualized
IOMMU when the hypervisor supports it. This has made been possible
by the above modifications and also simplifies the existing
workaround in the HyperV specific virtual IOMMU.
- Cleanup of the historical timer_works() irq flags related
inconsistencies"
* tag 'x86-apic-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/ioapic: Cleanup the timer_works() irqflags mess
iommu/hyper-v: Remove I/O-APIC ID check from hyperv_irq_remapping_select()
iommu/amd: Fix IOMMU interrupt generation in X2APIC mode
iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled
iommu/amd: Fix union of bitfields in intcapxt support
x86/ioapic: Correct the PCI/ISA trigger type selection
x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index
x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it
x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected
iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available
x86/apic: Support 15 bits of APIC ID in MSI where available
x86/ioapic: Handle Extended Destination ID field in RTE
iommu/vt-d: Simplify intel_irq_remapping_select()
x86: Kill all traces of irq_remapping_get_irq_domain()
x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
iommu/hyper-v: Implement select() method on remapping irqdomain
iommu/vt-d: Implement select() method on remapping irqdomain
iommu/amd: Implement select() method on remapping irqdomain
x86/apic: Add select() method on vector irqdomain
...
Currently, macros related to the interrupt remapping table length are
defined separately. This has resulted in an oversight in which one of
the macros were missed when changing the length. To prevent this,
redefine the macros to add built-in sanity check.
Also, rename macros to use the name of the DTE[IntTabLen] field as
specified in the AMD IOMMU specification. There is no functional change.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20201210162436.126321-1-suravee.suthikulpanit@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
According to the AMD IOMMU spec, the commit 73db2fc595
("iommu/amd: Increase interrupt remapping table limit to 512 entries")
also requires the interrupt table length (IntTabLen) to be set to 9
(power of 2) in the device table mapping entry (DTE).
Fixes: 73db2fc595 ("iommu/amd: Increase interrupt remapping table limit to 512 entries")
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20201207091920.3052-1-suravee.suthikulpanit@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
AMD IOMMU requires 4k-aligned pages for the event log, the PPR log,
and the completion wait write-back regions. However, when allocating
the pages, they could be part of large mapping (e.g. 2M) page.
This causes #PF due to the SNP RMP hardware enforces the check based
on the page level for these data structures.
So, fix by calling set_memory_4k() on the allocated pages.
Fixes: c69d89aff3 ("iommu/amd: Use 4K page for completion wait write-back semaphore")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: https://lore.kernel.org/r/20201105145832.3065-1-suravee.suthikulpanit@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
The AMD IOMMU has two modes for generating its own interrupts.
The first is very much based on PCI MSI, and can be configured by Linux
precisely that way. But like legacy unmapped PCI MSI it's limited to
8 bits of APIC ID.
The second method does not use PCI MSI at all in hardawre, and instead
configures the INTCAPXT registers in the IOMMU directly with the APIC ID
and vector.
In the latter case, the IOMMU driver would still use pci_enable_msi(),
read back (through MMIO) the MSI message that Linux wrote to the PCI MSI
table, then swizzle those bits into the appropriate register.
Historically, this worked because__irq_compose_msi_msg() would silently
generate an invalid MSI message with the high bits of the APIC ID in the
high bits of the MSI address. That hack was intended only for the Intel
IOMMU, and I recently enforced that, introducing a warning in
__irq_msi_compose_msg() if it was invoked with an APIC ID above 255.
Fix the AMD IOMMU not to depend on that hack any more, by having its own
irqdomain and directly putting the bits from the irq_cfg into the right
place in its ->activate() method.
Fixes: 47bea873cf "x86/msi: Only use high bits of MSI address for DMAR unit")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/05e3a5ba317f5ff48d2f8356f19e617f8b9d23a4.camel@infradead.org
Registering the remapping irq domain unconditionally is potentially
allowing I/O-APIC and MSI interrupts to be parented in the IOMMU IR domain
even when IR is disabled. Don't do that.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201111144322.1659970-1-dwmw2@infradead.org
All the bitfields in here are overlaid on top of each other since
they're a union. Change the second u64 to be in a struct so it does
the intended thing.
Fixes: b5c3786ee3 ("iommu/amd: Use msi_msg shadow structs")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201111144322.1659970-2-dwmw2@infradead.org
Certain device drivers allocate IO queues on a per-cpu basis.
On AMD EPYC platform, which can support up-to 256 cpu threads,
this can exceed the current MAX_IRQ_PER_TABLE limit of 256,
and result in the error message:
AMD-Vi: Failed to allocate IRTE
This has been observed with certain NVME devices.
AMD IOMMU hardware can actually support upto 512 interrupt
remapping table entries. Therefore, update the driver to
match the hardware limit.
Please note that this also increases the size of interrupt remapping
table to 8KB per device when using the 128-bit IRTE format.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20201015025002.87997-1-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The I/O-APIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense, but
now is just a bunch of bits which get passed through in some seemingly
arbitrary order.
Instead of making IRQ remapping drivers directly frob the I/OA-PIC RTE, let
them just do their job and generate an MSI message. The bit swizzling to
turn that MSI message into the I/O-APIC's RTE is the same in all cases,
since it's a function of the I/O-APIC hardware. The IRQ remappers have no
real need to get involved with that.
The only slight caveat is that the I/OAPIC is interpreting some of those
fields too, and it does want the 'vector' field to be unique to make EOI
work. The AMD IOMMU happens to put its IRTE index in the bits that the
I/O-APIC thinks are the vector field, and accommodates this requirement by
reserving the first 32 indices for the I/O-APIC. The Intel IOMMU doesn't
actually use the bits that the I/O-APIC thinks are the vector field, so it
fills in the 'pin' value there instead.
[ tglx: Replaced the unreadably macro maze with the cleaned up RTE/msi_msg
bitfields and added commentry to explain the mapping magic ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-22-dwmw2@infradead.org
Having two seperate structs for the I/O-APIC RTE entries (non-remapped and
DMAR remapped) requires type casts and makes it hard to map.
Combine them in IO_APIC_routing_entry by defining a union of two 64bit
bitfields. Use naming which reflects which bits are shared and which bits
are actually different for the operating modes.
[dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for
register access into the same union and eliminating a few more
places where bits were accessed through masks and shifts.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-21-dwmw2@infradead.org
'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
the trigger type (edge/level) and the active low/high configuration. While
there are defines for initializing these variables and struct members, they
are not used consequently and the meaning of 'trigger' and 'polarity' is
opaque and confusing at best.
Rename them to 'is_level' and 'active_low' and make them boolean in various
structs so it's entirely clear what the meaning is.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-20-dwmw2@infradead.org
Get rid of the macro mess and use the shadow structs for the x86 specific
MSI message format. Convert the intcapxt setup to use named bitfields as
well while touching it anyway.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-15-dwmw2@infradead.org
apic::irq_dest_mode is actually a boolean, but defined as u32 and named in
a way which does not explain what it means.
Make it a boolean and rename it to 'dest_mode_logical'
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-9-dwmw2@infradead.org
The enum ioapic_irq_destination_types and the enumerated constants starting
with 'dest_' are gross misnomers because they describe the delivery mode.
Rename then enum and the constants so they actually make sense.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-6-dwmw2@infradead.org
- rework the non-coherent DMA allocator
- move private definitions out of <linux/dma-mapping.h>
- lower CMA_ALIGNMENT (Paul Cercueil)
- remove the omap1 dma address translation in favor of the common
code
- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
- support per-node DMA CMA areas (Barry Song)
- increase the default seg boundary limit (Nicolin Chen)
- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
- various cleanups
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl+IiPwLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPKEQ//TM8vxjucnRl/pklpMin49dJorwiVvROLhQqLmdxw
286ZKpVzYYAPc7LnNqwIBugnFZiXuHu8xPKQkIiOa2OtNDTwhKNoBxOAmOJaV6DD
8JfEtZYeX5mKJ/Nqd2iSkIqOvCwZ9Wzii+aytJ2U88wezQr1fnyF4X49MegETEey
FHWreSaRWZKa0MMRu9AQ0QxmoNTHAQUNaPc0PeqEtPULybfkGOGw4/ghSB7WcKrA
gtKTuooNOSpVEHkTas2TMpcBp6lxtOjFqKzVN0ml+/nqq5NeTSDx91VOCX/6Cj76
mXIg+s7fbACTk/BmkkwAkd0QEw4fo4tyD6Bep/5QNhvEoAriTuSRbhvLdOwFz0EF
vhkF0Rer6umdhSK7nPd7SBqn8kAnP4vBbdmB68+nc3lmkqysLyE4VkgkdH/IYYQI
6TJ0oilXWFmU6DT5Rm4FBqCvfcEfU2dUIHJr5wZHqrF2kLzoZ+mpg42fADoG4GuI
D/oOsz7soeaRe3eYfWybC0omGR6YYPozZJ9lsfftcElmwSsFrmPsbO1DM5IBkj1B
gItmEbOB9ZK3RhIK55T/3u1UWY3Uc/RVr+kchWvADGrWnRQnW0kxYIqDgiOytLFi
JZNH8uHpJIwzoJAv6XXSPyEUBwXTG+zK37Ce769HGbUEaUrE71MxBbQAQsK8mDpg
7fM=
=Bkf/
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- rework the non-coherent DMA allocator
- move private definitions out of <linux/dma-mapping.h>
- lower CMA_ALIGNMENT (Paul Cercueil)
- remove the omap1 dma address translation in favor of the common code
- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
- support per-node DMA CMA areas (Barry Song)
- increase the default seg boundary limit (Nicolin Chen)
- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
- various cleanups
* tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
ARM/ixp4xx: add a missing include of dma-map-ops.h
dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
dma-direct: factor out a dma_direct_alloc_from_pool helper
dma-direct check for highmem pages in dma_direct_alloc_pages
dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h>
dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma
dma-mapping: move dma-debug.h to kernel/dma/
dma-mapping: remove <asm/dma-contiguous.h>
dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>
dma-contiguous: remove dma_contiguous_set_default
dma-contiguous: remove dev_set_cma_area
dma-contiguous: remove dma_declare_contiguous
dma-mapping: split <linux/dma-mapping.h>
cma: decrease CMA_ALIGNMENT lower limit to 2
firewire-ohci: use dma_alloc_pages
dma-iommu: implement ->alloc_noncoherent
dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
dma-mapping: add a new dma_alloc_pages API
dma-mapping: remove dma_cache_sync
53c700: convert to dma_alloc_noncoherent
...
Including:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with
CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel
command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will
fault when a device tries DMA on memory owned by a guest. This
needs new fault-types as well as a rewrite of the IOMMU memory
semaphore for command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to
still be used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can
access address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl+Fy9MACgkQK/BELZcB
GuNxtRAA0TdYHXt6XyLWmvRAX/ySZSz6KOneZWWwpsQ9wh2/iv1PtBsrV0ltf+6g
CaX4ROZUVRbV9wPD+7maBRbzxrG3QhfEaaV+K45Q2J/QE1wjkyV8qj1eORWTUUoc
nis4FhGDKk2ER/Gsajy2Hjs4+6i43gdWG/+ghVGaCRo8mCZyoz1/6AyMQyN3deuO
NqWOv9E7hsavZjRs/w/LXG7eSE20cZwtt//kPVJF0r9eQqC6i1eJDQj48iRqJVqd
R0dwBQZaLz++qQptyKebDNlmH/3aAsb+A8nCeS7ZwHqWC1QujTWOUYWpFyPPbOmC
KVsQXzTzRfnVTDECF1Pk5d3yi45KILLU3B4zDJfUJjbL3KDYjuVUvhHF/pcGcjC3
H1LWJqHSAL8sJwHvKhpi0VtQ5SOxXnLO5fGG/CZT/Xb4QyM+mkwkFLdn1TryZTR/
M4XA+QuI96TzY7HQUJdSoEDANxoBef6gPnxdDKOnK1v4hfNsPAl7o8hZkM3w0DK8
GoFZUV+vjBhFcymGcQegSNiea28Hfi+hBe+PPHCmw+tJm47cketD5uP5jJ5NGaUe
MKU/QXWXc6oqeBTQT6ki5zJbJXKttbPa8eEmp+FrMatc9kruvBVhQoMbj7Vd3CA1
dC4zK9Awy7yj24ZhZfnAFx2DboCmBTUI3QKjDt9K5PRZyMeyoP8=
=C0Sg
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will fault when
a device tries DMA on memory owned by a guest. This needs new
fault-types as well as a rewrite of the IOMMU memory semaphore for
command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to still be
used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can access
address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
* tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (57 commits)
iommu/vt-d: Gracefully handle DMAR units with no supported address widths
iommu/vt-d: Check UAPI data processed by IOMMU core
iommu/uapi: Handle data and argsz filled by users
iommu/uapi: Rename uapi functions
iommu/uapi: Use named union for user data
iommu/uapi: Add argsz for user filled data
docs: IOMMU user API
iommu/qcom: add missing put_device() call in qcom_iommu_of_xlate()
iommu/arm-smmu-v3: Add SVA device feature
iommu/arm-smmu-v3: Check for SVA features
iommu/arm-smmu-v3: Seize private ASID
iommu/arm-smmu-v3: Share process page tables
iommu/arm-smmu-v3: Move definitions to a header
iommu/io-pgtable-arm: Move some definitions to a header
iommu/arm-smmu-v3: Ensure queue is read after updating prod pointer
iommu/amd: Re-purpose Exclusion range registers to support SNP CWWB
iommu/amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR
iommu/amd: Use 4K page for completion wait write-back semaphore
iommu/tegra-smmu: Allow to group clients in same swgroup
iommu/tegra-smmu: Fix iova->phys translation
...
devices which require non-PCI based MSI handling.
- Cleanup historical leftovers all over the place
- Rework the code to utilize more core functionality
- Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
assignment to PCI devices possible.
- Assign irqdomains to PCI devices at initialization time which allows
to utilize the full functionality of hierarchical irqdomains.
- Remove arch_.*_msi_irq() functions from X86 and utilize the irqdomain
which is assigned to the device for interrupt management.
- Make the arch_.*_msi_irq() support conditional on a config switch and
let the last few users select it.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+EUxcTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoagLEACGp5U7a4mk24GsOZJDhrua1PHR/fhb
enn/5yOPpxDXdYmtFHIjV5qrNjDTV/WqDlI96KOi+oinG1Eoj0O/MA1AcSRhp6nf
jVdAuK1X0DHDUTEeTAP0JFwqd2j0KlIOphBrIMgeWIf1CRKlYiJaO+ioF9fKgwZ/
/HigOTSykGYMPggm3JXnWTWtJkKSGFxeADBvVHt5RpVmbWtrI4YoSBxKEMtvjyeM
5+GsqbCad1CnFYTN74N+QWVGmgGnUWGEzWsPYnJ9hW+yyjad1kWx3n6NcCWhssaC
E4vAXl6JuCPntL7jBFkbfUkQsgq12ThMZYWpCq8pShJA9O2tDKkxIGasHWrIt4cz
nYrESiv6hM7edjtOvBc086Gd0A2EyGOM879goHyaNVaTO4rI6jfZG7PlW1HHWibS
mf/bdTXBtULGNgEt7T8Qnb8sZ+D01WqzLrq/wm645jIrTzvNHUEpOhT1aH/g4TFQ
cNHD5PcM9OTmiBir9srNd47+1s2mpfwdMYHKBt2QgiXMO8fRgdtr6WLQE4vJjmG8
sA0yGGsgdTKeg2wW1ERF1pWL0Lt05Iaa42Skm0D3BwcOG2n5ltkBHzVllto9cTUh
kIldAOgxGE6QeCnnlrnbHz5mvzt/3Ih/PIKqPSUAC94Kx1yvVHRYuOvDExeO8DFB
P+f0TkrscZObSg==
=JlqV
-----END PGP SIGNATURE-----
Merge tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq updates from Thomas Gleixner:
"Surgery of the MSI interrupt handling to prepare the support of
upcoming devices which require non-PCI based MSI handling:
- Cleanup historical leftovers all over the place
- Rework the code to utilize more core functionality
- Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
assignment to PCI devices possible.
- Assign irqdomains to PCI devices at initialization time which
allows to utilize the full functionality of hierarchical
irqdomains.
- Remove arch_.*_msi_irq() functions from X86 and utilize the
irqdomain which is assigned to the device for interrupt management.
- Make the arch_.*_msi_irq() support conditional on a config switch
and let the last few users select it"
* tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS
x86/apic/msi: Unbreak DMAR and HPET MSI
iommu/amd: Remove domain search for PCI/MSI
iommu/vt-d: Remove domain search for PCI/MSI[X]
x86/irq: Make most MSI ops XEN private
x86/irq: Cleanup the arch_*_msi_irqs() leftovers
PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
x86/pci: Set default irq domain in pcibios_add_device()
iommm/amd: Store irq domain in struct device
iommm/vt-d: Store irq domain in struct device
x86/xen: Wrap XEN MSI management into irqdomain
irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
x86/xen: Consolidate XEN-MSI init
x86/xen: Rework MSI teardown
x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
PCI_vmd_Mark_VMD_irqdomain_with_DOMAIN_BUS_VMD_MSI
irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
x86/irq: Initialize PCI/MSI domain at PCI init time
x86/pci: Reducde #ifdeffery in PCI init code
...
devices which doesn't need pinning of pages for DMA anymore. Add support
for the command submission to devices using new x86 instructions like
ENQCMD{,S} and MOVDIR64B. In addition, add support for process address
space identifiers (PASIDs) which are referenced by those command
submission instructions along with the handling of the PASID state on
context switch as another extended state. Work by Fenghua Yu, Ashok Raj,
Yu-cheng Yu and Dave Jiang.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl996DIACgkQEsHwGGHe
VUqM4A/+JDI3GxNyMyBpJR0nQ2vs23ru1o3OxvxhYtcacZ0cNwkaO7g3TLQxH+LZ
k1QtvEd4jqI6BXV4de+HdZFDcqzikJf0KHnUflLTx956/Eop5rtxzMWVo69ZmYs8
QrW0mLhyh8eq19cOHbQBb4M/HFc1DXBw+l7Ft3MeA1divOVESRB/uNxjA25K4PvV
y+pipyUxqKSNhmBFf2bV8OVZloJiEtg3H6XudP0g/rZgjYe3qWxa+2iv6D08yBNe
g7NpMDMql2uo1bcFON7se2oF34poAi49BfiIQb5G4m9pnPyvVEMOCijxCx2FHYyF
nukyxt8g3Uq+UJYoolLNoWijL1jgBWeTBg1uuwsQOqWSARJx8nr859z0GfGyk2RP
GNoYE4rrWBUMEqWk4xeiPPgRDzY0cgcGh0AeuWqNhgBfbbZeGL0t0m5kfytk5i1s
W0YfRbz+T8+iYbgVfE/Zpthc7rH7iLL7/m34JC13+pzhPVTT32ECLJov2Ac8Tt15
X+fOe6kmlDZa4GIhKRzUoR2aEyLpjufZ+ug50hznBQjGrQfcx7zFqRAU4sJx0Yyz
rxUOJNZZlyJpkyXzc12xUvShaZvTcYenHGpxXl8TU3iMbY2otxk1Xdza8pc1LGQ/
qneYgILgKa+hSBzKhXCPAAgSYtPlvQrRizArS8Y0k/9rYaKCfBU=
=K9X4
-----END PGP SIGNATURE-----
Merge tag 'x86_pasid_for_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PASID updates from Borislav Petkov:
"Initial support for sharing virtual addresses between the CPU and
devices which doesn't need pinning of pages for DMA anymore.
Add support for the command submission to devices using new x86
instructions like ENQCMD{,S} and MOVDIR64B. In addition, add support
for process address space identifiers (PASIDs) which are referenced by
those command submission instructions along with the handling of the
PASID state on context switch as another extended state.
Work by Fenghua Yu, Ashok Raj, Yu-cheng Yu and Dave Jiang"
* tag 'x86_pasid_for_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction
x86/asm: Carve out a generic movdir64b() helper for general usage
x86/mmu: Allocate/free a PASID
x86/cpufeatures: Mark ENQCMD as disabled when configured out
mm: Add a pasid member to struct mm_struct
x86/msr-index: Define an IA32_PASID MSR
x86/fpu/xstate: Add supervisor PASID state for ENQCMD
x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
Documentation/x86: Add documentation for SVA (Shared Virtual Addressing)
iommu/vt-d: Change flags type to unsigned int in binding mm
drm, iommu: Change type of pasid to u32
Merge dma-contiguous.h into dma-map-ops.h, after removing the comment
describing the contiguous allocator into kernel/dma/contigous.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit 387caf0b75 ("iommu/amd: Treat per-device exclusion
ranges as r/w unity-mapped regions") accidentally overwrites
the 'flags' field in IVMD (struct ivmd_header) when the I/O
virtualization memory definition is associated with the
exclusion range entry. This leads to the corrupted IVMD table
(incorrect checksum). The kdump kernel reports the invalid checksum:
ACPI BIOS Warning (bug): Incorrect checksum in table [IVRS] - 0x5C, should be 0x60 (20200717/tbprint-177)
AMD-Vi: [Firmware Bug]: IVRS invalid checksum
Fix the above-mentioned issue by modifying the 'struct unity_map_entry'
member instead of the IVMD header.
Cleanup: The *exclusion_range* functions are not used anymore, so
get rid of them.
Fixes: 387caf0b75 ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions")
Reported-and-tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20200926102602.19177-1-adrianhuang0701@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When the IOMMU SNP support bit is set in the IOMMU Extended Features
register, hardware re-purposes the following registers:
1. IOMMU Exclusion Base register (MMIO offset 0020h) to
Completion Wait Write-Back (CWWB) Base register
2. IOMMU Exclusion Range Limit (MMIO offset 0028h) to
Completion Wait Write-Back (CWWB) Range Limit register
and requires the IOMMU CWWB semaphore base and range to be programmed
in the register offset 0020h and 0028h accordingly.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: https://lore.kernel.org/r/20200923121347.25365-4-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
IOMMU SNP support requires the completion wait write-back semaphore to be
implemented using a 4K-aligned page, where the page address is to be
programmed into the newly introduced MMIO base/range registers.
This new scheme uses a per-iommu atomic variable to store the current
semaphore value, which is incremented for every completion wait command.
Since this new scheme is also compatible with non-SNP mode,
generalize the driver to use 4K page for completion-wait semaphore in
both modes.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: https://lore.kernel.org/r/20200923121347.25365-2-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit e52d58d54a ("iommu/amd: Use cmpxchg_double() when updating
128-bit IRTE") removed an assumption that modify_irte_ga always set
the valid bit, which requires the callers to set the appropriate value
for the struct irte_ga.valid bit before calling the function.
Similar to the commit 26e495f341 ("iommu/amd: Restore IRTE.RemapEn
bit after programming IRTE"), which is for the function
amd_iommu_deactivate_guest_mode().
The same change is also needed for the amd_iommu_activate_guest_mode().
Otherwise, this could trigger IO_PAGE_FAULT for the VFIO based VMs with
AVIC enabled.
Fixes: e52d58d54a ("iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20200916111720.43913-1-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After commit 26e495f341 ("iommu/amd: Restore IRTE.RemapEn bit after
programming IRTE"), smatch warns:
drivers/iommu/amd/iommu.c:3870 amd_iommu_deactivate_guest_mode()
warn: variable dereferenced before check 'entry' (see line 3867)
Fix this by moving the @valid assignment to after @entry has been checked
for NULL.
Fixes: 26e495f341 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20200910171621.12879-1-joao.m.martins@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".
No PASID type change in uapi although it defines PASID as __u64 in
some places.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1600187413-163670-2-git-send-email-fenghua.yu@intel.com
As the next step to make X86 utilize the direct MSI irq domain operations
store the irq domain pointer in the device struct when a device is probed.
It only overrides the irqdomain of devices which are handled by a regular
PCI/MSI irq domain which protects PCI devices behind special busses like
VMD which have their own irq domain.
No functional change.
It just avoids the redirection through arch_*_msi_irqs() and allows the
PCI/MSI core to directly invoke the irq domain alloc/free functions instead
of having to look up the irq domain for every single MSI interupt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112333.806328762@linutronix.de
Convert the interrupt remap drivers to retrieve the pci device from the msi
descriptor and use info::hwirq.
This is the first step to prepare x86 for using the generic MSI domain ops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112332.466405395@linutronix.de
Move the IOAPIC specific fields into their own struct and reuse the common
devid. Get rid of the #ifdeffery as it does not matter at all whether the
alloc info is a couple of bytes longer or not.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112332.054367732@linutronix.de
Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN
types, consolidate the two getter functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.741909337@linutronix.de
The irq domain request mode is now indicated in irq_alloc_info::type.
Consolidate the two getter functions into one.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.634777249@linutronix.de
irq_remapping_ir_irq_domain() is used to retrieve the remapping parent
domain for an allocation type. irq_remapping_irq_domain() is for retrieving
the actual device domain for allocating interrupts for a device.
The two functions are similar and can be unified by using explicit modes
for parent irq domain retrieval.
Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu
implementations. Drop the parent domain retrieval for PCI_MSI/X as that is
unused.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.436350257@linutronix.de
Dereferencing irq_data before checking it for NULL is suboptimal.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
When memory encryption is active the device is likely not in a direct
mapped domain. Forbid using IOMMUv2 functionality for now until finer
grained checks for this have been implemented.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200824105415.21000-3-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Do not force devices supporting IOMMUv2 to be direct mapped when memory
encryption is active. This might cause them to be unusable because their
DMA mask does not include the encryption bit.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200824105415.21000-2-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode),
current driver disables interrupt remapping when it updates the IRTE
so that the upper and lower 64-bit values can be updated safely.
However, this creates a small window, where the interrupt could
arrive and result in IO_PAGE_FAULT (for interrupt) as shown below.
IOMMU Driver Device IRQ
============ ===========
irte.RemapEn=0
...
change IRTE IRQ from device ==> IO_PAGE_FAULT !!
...
irte.RemapEn=1
This scenario has been observed when changing irq affinity on a system
running I/O-intensive workload, in which the destination APIC ID
in the IRTE is updated.
Instead, use cmpxchg_double() to update the 128-bit IRTE at once without
disabling the interrupt remapping. However, this means several features,
which require GA (128-bit IRTE) support will also be affected if cmpxchg16b
is not supported (which is unprecedented for AMD processors w/ IOMMU).
Fixes: 880ac60e25 ("iommu/amd: Introduce interrupt remapping ops structure")
Reported-by: Sean Osborne <sean.m.osborne@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Erik Rockstrom <erik.rockstrom@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20200903093822.52012-3-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, the RemapEn (valid) bit is accidentally cleared when
programming IRTE w/ guestMode=0. It should be restored to
the prior state.
Fixes: b9fc6b56f4 ("iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode for pass-through devices")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20200903093822.52012-2-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Fix W=1 compile warnings (invalid kerneldoc):
drivers/iommu/amd/init.c:1586: warning: Function parameter or member 'ivrs' not described in 'get_highest_supported_ivhd_type'
drivers/iommu/amd/init.c:1938: warning: Function parameter or member 'iommu' not described in 'iommu_update_intcapxt'
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200728170859.28143-1-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Few exported functions from AMD IOMMU driver are missing prototypes.
They have declaration in arch/x86/events/amd/iommu.h but this file
cannot be included in the driver. Add prototypes to fix W=1 warnings
like:
drivers/iommu/amd/init.c:3066:19: warning:
no previous prototype for 'get_amd_iommu' [-Wmissing-prototypes]
3066 | struct amd_iommu *get_amd_iommu(unsigned int idx)
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200727183631.16744-1-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Merge more updates from Andrew Morton:
- most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
memory-hotplug, cleanups, uaccess, migration, gup, pagemap),
- various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
exec, kdump, rapidio, panic, kcov, kgdb, ipc).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits)
mm/gup: remove task_struct pointer for all gup code
mm: clean up the last pieces of page fault accountings
mm/xtensa: use general page fault accounting
mm/x86: use general page fault accounting
mm/sparc64: use general page fault accounting
mm/sparc32: use general page fault accounting
mm/sh: use general page fault accounting
mm/s390: use general page fault accounting
mm/riscv: use general page fault accounting
mm/powerpc: use general page fault accounting
mm/parisc: use general page fault accounting
mm/openrisc: use general page fault accounting
mm/nios2: use general page fault accounting
mm/nds32: use general page fault accounting
mm/mips: use general page fault accounting
mm/microblaze: use general page fault accounting
mm/m68k: use general page fault accounting
mm/ia64: use general page fault accounting
mm/hexagon: use general page fault accounting
mm/csky: use general page fault accounting
...
Patch series "mm: Page fault accounting cleanups", v5.
This is v5 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report on an issue a week ago regarding to incorrect page fault
accountings for retried page fault after commit 4064b98270 ("mm: allow
VM_FAULT_RETRY for multiple times"):
https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
What this series did:
- Correct page fault accounting: we do accounting for a page fault
(no matter whether it's from #PF handling, or gup, or anything else)
only with the one that completed the fault. For example, page fault
retries should not be counted in page fault counters. Same to the
perf events.
- Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
event is used in an adhoc way across different archs.
Case (1): for many archs it's done at the entry of a page fault
handler, so that it will also cover e.g. errornous faults.
Case (2): for some other archs, it is only accounted when the page
fault is resolved successfully.
Case (3): there're still quite some archs that have not enabled
this perf event.
Since this series will touch merely all the archs, we unify this
perf event to always follow case (1), which is the one that makes most
sense. And since we moved the accounting into handle_mm_fault, the
other two MAJ/MIN perf events are well taken care of naturally.
- Unify definition of "major faults": the definition of "major
fault" is slightly changed when used in accounting (not
VM_FAULT_MAJOR). More information in patch 1.
- Always account the page fault onto the one that triggered the page
fault. This does not matter much for #PF handlings, but mostly for
gup. More information on this in patch 25.
Patchset layout:
Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
Patch 25: Cleanup GUP task_struct pointer since it's not needed any more
This patch (of 25):
This is a preparation patch to move page fault accountings into the
general code in handle_mm_fault(). This includes both the per task
flt_maj/flt_min counters, and the major/minor page fault perf events. To
do this, the pt_regs pointer is passed into handle_mm_fault().
PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.
So far, all the pt_regs pointer that passed into handle_mm_fault() is
NULL, which means this patch should have no intented functional change.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move AMD Kconfig and Makefile bits down into the amd directory
with the rest of the AMD specific files.
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20200630200636.48600-3-jsnitsel@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- Make the handling of the firmware node consistent and do not free the
node after the domain has been created successfully. The core code
stores a pointer to it which can lead to a use after free or double
free.
This used to "work" because the pointer was not stored when the initial
code was written, but at some point later it was required to store
it. Of course nobody noticed that the existing users break that way.
- Handle affinity setting on inactive interrupts correctly when
hierarchical irq domains are enabled. When interrupts are inactive with
the modern hierarchical irqdomain design, the interrupt chips are not
necessarily in a state where affinity changes can be handled. The legacy
irq chip design allowed this because interrupts are immediately fully
initialized at allocation time. X86 has a hacky workaround for this, but
other implementations do not. This cased malfunction on GIC-V3. Instead
of playing whack a mole to find all affected drivers, change the core
code to store the requested affinity setting and then establish it when
the interrupt is allocated, which makes the X86 hack go away.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8UP+4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoSuZD/9tNPR4fIDt4mC9ciSvwSqGTV+q1y1D
zhXSDro4cJNjzy/9D475IJqOlvchaF9Nfun55b60Q6vnA4VN8G+kABEaG8uwr8mV
ijTB4f0qKfW/9kUDTJRScq3nNmC3miqg8ZFgFEn6Ecxj3NHmwidATIi5sF6f/XVG
DdhL0Jys7ycNeGBf7yIKbT5/NOULMHYy9rK1NDAeBo9u3klvmrwrHgdNsiDDhEaU
KlHtwuQLCdjFY3Lf67YpSah+Hx/gXPI1VHUxDDFRoFmC4RlB0VjyXGydjsisOrSQ
Cl2gnkQ6VOlLaLbN38nmia9nyb6npzE5iK1h9EDcaRhBACG9O23Bdo+YZYxl6BOP
mXuyIVKJYczJEp7j1fGHW/aNCoEqC8dGVyN7toxMVfGZmF12JzMSt4SYItPeSjFC
bPNPRCscpiMOQdgwgO0woK1764V46g1BlmxXtJRdWB4iwWgXcryaz65xzSfNeZF4
0+TvdYs2FYjxwwIyWj8xJ3Npe1lKhH+06DA6gziwJt1u4it8rl82UcqMFyf/ty1w
o5LHyMBWYm7SJXSeaZZj+nv7moJKJnmRYKnpry21cUzsK/vQEPX0vqhwh4dSFN3O
BaBocDsOk+9wkmUwi6haP+6+vpadAFQrsqVhURtwc6OVSWn2/vsf2ZH5P36xwFWD
tlFanb8hX9y2NQ==
=elM3
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull irq fixes from Thomas Gleixner:
"Two fixes for the interrupt subsystem:
- Make the handling of the firmware node consistent and do not free
the node after the domain has been created successfully. The core
code stores a pointer to it which can lead to a use after free or
double free.
This used to "work" because the pointer was not stored when the
initial code was written, but at some point later it was required
to store it. Of course nobody noticed that the existing users break
that way.
- Handle affinity setting on inactive interrupts correctly when
hierarchical irq domains are enabled.
When interrupts are inactive with the modern hierarchical irqdomain
design, the interrupt chips are not necessarily in a state where
affinity changes can be handled. The legacy irq chip design allowed
this because interrupts are immediately fully initialized at
allocation time. X86 has a hacky workaround for this, but other
implementations do not.
This cased malfunction on GIC-V3. Instead of playing whack a mole
to find all affected drivers, change the core code to store the
requested affinity setting and then establish it when the interrupt
is allocated, which makes the X86 hack go away"
* tag 'irq-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/affinity: Handle affinity setting on inactive interrupts correctly
irqdomain/treewide: Keep firmware node unconditionally allocated
Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type
IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after
creating the irqdomain. The only purpose of these FW nodes is to convey
name information. When this was introduced the core code did not store the
pointer to the node in the irqdomain. A recent change stored the firmware
node pointer in irqdomain for other reasons and missed to notice that the
usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence
are broken by this. Storing a dangling pointer is dangerous itself, but in
case that the domain is destroyed later on this leads to a double free.
Remove the freeing of the firmware node after creating the irqdomain from
all affected call sites to cure this.
Fixes: 711419e504 ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de
At least the version in the header file to fix a compile warning about
the function being unused.
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200630124611.23153-1-joro@8bytes.org
Do not call atomic64_set() directly to update the domain page-table
root and use two new helper functions.
This makes it easier to implement additional work necessary when
the page-table is updated.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200626080547.24865-2-joro@8bytes.org
Currently, Linux logs the two messages below.
[ 0.979142] pci 0000:00:00.2: AMD-Vi: Extended features (0xf77ef22294ada):
[ 0.979546] PPR NX GT IA GA PC GA_vAPIC
The log level of these lines differs though. The first one has level
*info*, while the second has level *warn*, which is confusing.
$ dmesg -T --level=info | grep "Extended features"
[Tue Jun 16 21:46:58 2020] pci 0000:00:00.2: AMD-Vi: Extended features (0xf77ef22294ada):
$ dmesg -T --level=warn | grep "PPR"
[Tue Jun 16 21:46:58 2020] PPR NX GT IA GA PC GA_vAPIC
The problem is, that commit 3928aa3f57 ("iommu/amd: Detect and enable
guest vAPIC support") introduced a newline, causing `pr_cont()`, used to
print the features, to default back to the default log level.
/**
* pr_cont - Continues a previous log message in the same line.
* @fmt: format string
* @...: arguments for the format string
*
* This macro expands to a printk with KERN_CONT loglevel. It should only be
* used when continuing a log message with no newline ('\n') enclosed. Otherwise
* it defaults back to KERN_DEFAULT loglevel.
*/
#define pr_cont(fmt, ...) \
printk(KERN_CONT fmt, ##__VA_ARGS__)
So, remove the line break, so only one line is logged.
Fixes: 3928aa3f57 ("iommu/amd: Detect and enable guest vAPIC support")
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20200616220420.19466-1-pmenzel@molgen.mpg.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- Move the Intel and AMD IOMMU drivers into their own
subdirectory. Both drivers consist of several files by now and
giving them their own directory unclutters the IOMMU top-level
directory a bit.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl7jme4ACgkQK/BELZcB
GuNNMw//U7AL3Qq6J8DqU+Ay+gIblxKUhWtYLVHad1+agSWmcbfy4E6iV8FqXLbP
HnCSmA7ScgEMN+3GAve/WpWccMI3aeAgp4xI4MElz/6p4QeJXfNu9COrllif+OX7
4fDpxXyd0fhKev4lPGZFRY8yGgvgP5ZHvDG0juoxi3bKCqiC2bkAga3itC9RPCQb
8kBefKIb7/q+UUGGVppTvVIW0mrqWLQ1TcnfKf0hovU7yZs4i4RO+8br6Q5eNUcB
Vb64vCV3qkQ/zPdr4vK6rvuZTPRMKkCgY4+MJr/g2/JQWuZxF1O+q+TsTYI1ISAS
qNPRdxgNrZbSBDowg2QfQtPBHPpq3m4eNDeD+ewyQkrVt0/Eneg6Np0FG9j3tGAG
+IS64r2E25O0tGtBIQ9Mi2TC68S0C7VtMbzx55zVcTGF0JH9T2YW4sSdRcTjVdW6
WBFqu5fXEKk63ln3h/8JEP7zPWGp+Q3cuOChDvcmIMjCxQ84k5jOB5AIZppGIgJ9
0nGf45t8YCvIXMbNKufYqjesJZOC2bd+Swi1MZXVlO/gSVv19O40UW+F1X0e7YOp
MHOzsV44rE2posS/huHOLR4q0AQTdc9O1mywCCGDxNW8tlwIBHsLLJ8b9C9raIRn
mZkq94QZQXta+WYtoGvbk6nHQ89FtBOOdEH2TSlEbvvYowpjZZE=
=gX8z
-----END PGP SIGNATURE-----
Merge tag 'iommu-drivers-move-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu driver directory structure cleanup from Joerg Roedel:
"Move the Intel and AMD IOMMU drivers into their own subdirectory.
Both drivers consist of several files by now and giving them their own
directory unclutters the IOMMU top-level directory a bit"
* tag 'iommu-drivers-move-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Move Intel IOMMU driver into subdirectory
iommu/amd: Move AMD IOMMU driver into subdirectory
Move all files related to the AMD IOMMU driver into its own
subdirectory.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20200609130303.26974-2-joro@8bytes.org