Commit Graph

3456 Commits

Author SHA1 Message Date
Ilya Leoshkevich 6d9406f80c s390/uapi: cover statfs padding by growing f_spare
pahole says:

	struct compat_statfs64 {
		...
		u32			f_spare[4];		/*    68    16 */
		/* size: 88, cachelines: 1, members: 12 */
		/* padding: 4 */

	struct statfs {
		...
		unsigned int		f_spare[4];		/*    68    16 */
		/* size: 88, cachelines: 1, members: 12 */
		/* padding: 4 */

	struct statfs64 {
		...
		unsigned int		f_spare[4];		/*    68    16 */
		/* size: 88, cachelines: 1, members: 12 */
		/* padding: 4 */

One has to keep the existence of padding in mind when working with
these structs. Grow f_spare arrays to 5 in order to simplify things.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230504144021.808932-3-iii@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-17 15:20:17 +02:00
Linus Torvalds b115d85a95 Locking changes in v6.4:
- Introduce local{,64}_try_cmpxchg() - a slightly more optimal
    primitive, which will be used in perf events ring-buffer code.
 
  - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation.
 
  - Misc cleanups/fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRUvUoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hlIhAArP33rTKi+HAndQ3UHW3XtmHRxEEQTfiE
 wvIoN89h58QW4DGMeAV4ltafbIPQAkI233Aogwz903L0qbDV0Ro4OU3XJembRuWl
 LeOADKwYyypXdOa8XICuY9aIP7e1/h0DF3ySs7inLcwK9JCyAIxnsVHYej+hsRXA
 kZoXN98T3TR1C0V9UQy4SU3HI1lC3tsG3R9Ti9TnYUg3ygVXhRE9lOQ4kv9lFPVz
 BNuj2Blj7KNiVaY9kehrhO54THI7NmsCVZO44Rcl48I0KAcFulAmFcNlE7GnR8Nj
 thj38pU6XAFVHXG8MYjgE+Al+PnK48NtJxexCtHyGvGG4D2aLzRMnkolxAUCcVuK
 G+UBsQm3ybjYgHgt1zuN6ehcpT+5tULkDH8JA7vrgZYaVgxHzsUaHgYfCCWKnmUY
 mPR6aImEmYZwZVNLskhe0HT4mq244bp+VnWlnJ6LZK7t/itenvDhqnj7KTi4Bfej
 lTHplOTitV/8uCEW8V4pX+YTEenVsIQmTc/G3iIabXP/6HzLffA3q4vyW6vKIErE
 pqrpuFA0Z4GB+pU0mJXt7+I7zscDVthwI055jDyQBjA7IcdVGm2MjQ6xcNRW5FYN
 UynvaEMocue4ZO4WdFsd1ZBUd9VfoNzGQspBw46DhCL1MEQBYv36SKQNjej/9aRr
 ilVwqnOWI2s=
 =mM0A
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Introduce local{,64}_try_cmpxchg() - a slightly more optimal
   primitive, which will be used in perf events ring-buffer code

 - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation

 - Misc cleanups/fixes

* tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: Correct (cmp)xchg() instrumentation
  locking/x86: Define arch_try_cmpxchg_local()
  locking/arch: Wire up local_try_cmpxchg()
  locking/generic: Wire up local{,64}_try_cmpxchg()
  locking/atomic: Add generic try_cmpxchg{,64}_local() support
  locking/rwbase: Mitigate indefinite writer starvation
  locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-05 12:56:55 -07:00
Linus Torvalds 10de638d8e s390 updates for the 6.4 merge window
- Add support for stackleak feature. Also allow specifying
   architecture-specific stackleak poison function to enable faster
   implementation. On s390, the mvc-based implementation helps decrease
   typical overhead from a factor of 3 to just 25%
 
 - Convert all assembler files to use SYM* style macros, deprecating the
   ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS
 
 - Improve KASLR to also randomize module and special amode31 code
   base load addresses
 
 - Rework decompressor memory tracking to support memory holes and improve
   error handling
 
 - Add support for protected virtualization AP binding
 
 - Add support for set_direct_map() calls
 
 - Implement set_memory_rox() and noexec module_alloc()
 
 - Remove obsolete overriding of mem*() functions for KASAN
 
 - Rework kexec/kdump to avoid using nodat_stack to call purgatory
 
 - Convert the rest of the s390 code to use flexible-array member instead
   of a zero-length array
 
 - Clean up uaccess inline asm
 
 - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE
 
 - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable
   DEBUG_FORCE_FUNCTION_ALIGN_64B
 
 - Resolve last_break in userspace fault reports
 
 - Simplify one-level sysctl registration
 
 - Clean up branch prediction handling
 
 - Rework CPU counter facility to retrieve available counter sets just
   once
 
 - Other various small fixes and improvements all over the code
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmRM8pwACgkQjYWKoQLX
 FBjV1AgAlvAhu1XkwOdwqdT4GqE8pcN4XXzydog1MYihrSO2PdgWAxpEW7o2QURN
 W+3xa6RIqt7nX2YBiwTanMZ12TYaFY7noGl3eUpD/NhueprweVirVl7VZUEuRoW/
 j0mbx77xsVzLfuDFxkpVwE6/j+tTO78kLyjUHwcN9rFVUaL7/orJneDJf+V8fZG0
 sHLOv0aljF7Jr2IIkw82lCmW/vdk7k0dACWMXK2kj1H3dIK34B9X4AdKDDf/WKXk
 /OSElBeZ93tSGEfNDRIda6iR52xocROaRnQAaDtargKFl9VO0/dN9ADxO+SLNHjN
 pFE/9VD6xT/xo4IuZZh/Z3TcYfiLvA==
 =Geqx
 -----END PGP SIGNATURE-----

Merge tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add support for stackleak feature. Also allow specifying
   architecture-specific stackleak poison function to enable faster
   implementation. On s390, the mvc-based implementation helps decrease
   typical overhead from a factor of 3 to just 25%

 - Convert all assembler files to use SYM* style macros, deprecating the
   ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS

 - Improve KASLR to also randomize module and special amode31 code base
   load addresses

 - Rework decompressor memory tracking to support memory holes and
   improve error handling

 - Add support for protected virtualization AP binding

 - Add support for set_direct_map() calls

 - Implement set_memory_rox() and noexec module_alloc()

 - Remove obsolete overriding of mem*() functions for KASAN

 - Rework kexec/kdump to avoid using nodat_stack to call purgatory

 - Convert the rest of the s390 code to use flexible-array member
   instead of a zero-length array

 - Clean up uaccess inline asm

 - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE

 - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable
   DEBUG_FORCE_FUNCTION_ALIGN_64B

 - Resolve last_break in userspace fault reports

 - Simplify one-level sysctl registration

 - Clean up branch prediction handling

 - Rework CPU counter facility to retrieve available counter sets just
   once

 - Other various small fixes and improvements all over the code

* tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits)
  s390/stackleak: provide fast __stackleak_poison() implementation
  stackleak: allow to specify arch specific stackleak poison function
  s390: select ARCH_USE_SYM_ANNOTATIONS
  s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc()
  s390: wire up memfd_secret system call
  s390/mm: enable ARCH_HAS_SET_DIRECT_MAP
  s390/mm: use BIT macro to generate SET_MEMORY bit masks
  s390/relocate_kernel: adjust indentation
  s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc.
  s390/entry: use SYM* macros instead of ENTRY(), etc.
  s390/purgatory: use SYM* macros instead of ENTRY(), etc.
  s390/kprobes: use SYM* macros instead of ENTRY(), etc.
  s390/reipl: use SYM* macros instead of ENTRY(), etc.
  s390/head64: use SYM* macros instead of ENTRY(), etc.
  s390/earlypgm: use SYM* macros instead of ENTRY(), etc.
  s390/mcount: use SYM* macros instead of ENTRY(), etc.
  s390/crc32le: use SYM* macros instead of ENTRY(), etc.
  s390/crc32be: use SYM* macros instead of ENTRY(), etc.
  s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc.
  s390/amode31: use SYM* macros instead of ENTRY(), etc.
  ...
2023-04-30 11:43:31 -07:00
Andrzej Hajda 068550631f locking/arch: Rename all internal __xchg() names to __arch_xchg()
Decrease the probability of this internal facility to be used by
driver code.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [riscv]
Link: https://lore.kernel.org/r/20230118154450.73842-1-andrzej.hajda@intel.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:08:44 +02:00
Linus Torvalds 7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Heiko Carstens 2a405f6bb3 s390/stackleak: provide fast __stackleak_poison() implementation
Provide an s390 specific __stackleak_poison() implementation which is
faster than the generic variant.

For the original implementation with an enforced 4kb stackframe for the
getpid() system call the system call overhead increases by a factor of 3 if
the stackleak feature is enabled. Using the s390 mvc based variant this is
reduced to an increase of 25% instead.

This is within the expected area, since the mvc based implementation is
more or less a memset64() variant which comes with similar results. See
commit 0b77d6701c ("s390: implement memset16, memset32 & memset64").

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20230405130841.1350565-3-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-20 11:36:35 +02:00
Heiko Carstens 0490d6d7ba s390/mm: enable ARCH_HAS_SET_DIRECT_MAP
Implement the set_direct_map_*() API, which allows to invalidate and set
default permissions to pages within the direct mapping.

Note that kernel_page_present(), which is also supposed to be part of this
API, is intentionally not implemented. The reason for this is that
kernel_page_present() is only used (and currently only makes sense) for
suspend/resume, which isn't supported on s390.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-20 11:36:29 +02:00
Heiko Carstens 17c51b1ba9 s390/mm: use BIT macro to generate SET_MEMORY bit masks
Use BIT macro to generate SET_MEMORY bit masks, which is easier to
maintain if bits get added, or removed.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-20 11:36:29 +02:00
Heiko Carstens e48b6853d8 s390/kasan: remove override of mem*() functions
The kasan mem*() functions are not used anymore since s390 has switched
to GENERIC_ENTRY and commit 69d4c0d321 ("entry, kasan, x86: Disallow
overriding mem*() functions").

Therefore remove the now dead code, similar to x86.
While at it also use the SYM* macros in mem.S.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 17:24:16 +02:00
Alexander Gordeev 2d1b21ecea s390/kdump: remove nodat stack restriction for calling nodat functions
To allow calling of DAT-off code from kernel the stack needs
to be switched to nodat_stack (or other stack mapped as 1:1).

Before call_nodat() macro was introduced that was necessary
to provide the very same memory address for STNSM and STOSM
instructions. If the kernel would stay on a random stack
(e.g. a virtually mapped one) then a virtual address provided
for STNSM instruction could differ from the physical address
needed for the corresponding STOSM instruction.

After call_nodat() macro is introduced the kernel stack does
not need to be mapped 1:1 anymore, since the macro stores the
physical memory address of return PSW in a register before
entering DAT-off mode. This way the return LPSWE instruction
is able to pick the correct memory location and restore the
DAT-on mode. That however might fail in case the 16-byte return
PSW happened to cross page boundary: PSW mask and PSW address
could end up in two separate non-contiguous physical pages.

Align the return PSW on 16-byte boundary so it always fits
into a single physical page. As result any stack (including
the virtually mapped one) could be used for calling DAT-off
code and prior switching to nodat_stack becomes unnecessary.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 17:24:16 +02:00
Alexander Gordeev 82caf7aba1 s390/kdump: rework invocation of DAT-off code
Calling kdump kernel is a two-step process that involves
invocation of the purgatory code: first time - to verify
the new kernel checksum and second time - to call the new
kernel itself.

The purgatory code operates on real addresses and does not
expect any memory protection. Therefore, before the purgatory
code is entered the DAT mode is always turned off. However,
it is only restored upon return from the new kernel checksum
verification. In case the purgatory was called to start the
new kernel and failed the control is returned to the old
kernel, but the DAT mode continues staying off.

The new kernel start failure is unlikely and leads to the
disabled wait state anyway. Still that poses a risk, since
the kernel code in general is not DAT-off safe and even
calling the disabled_wait() function might crash.

Introduce call_nodat() macro that allows entering DAT-off
mode, calling an arbitrary function and restoring DAT mode
back on. Switch all invocations of DAT-off code to that
macro and avoid the above described scenario altogether.

Name the call_nodat() macro in small letters after the
already existing call_on_stack() and put it to the same
header file.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
[hca@linux.ibm.com: some small modifications to call_nodat() macro]
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 17:24:16 +02:00
Gustavo A. R. Silva 6ca87bc4c8 s390/fcx: replace zero-length array with flexible-array member
Zero-length arrays are deprecated [1] and have to be replaced by C99
flexible-array members.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
on memcpy() and help to make progress towards globally enabling
-fstrict-flex-arrays=3 [2]

Link: https://github.com/KSPP/linux/issues/78 [1]
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZC7XT5prvoE4Yunm@work
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Gustavo A. R. Silva 3071e9b391 s390/diag: replace zero-length array with flexible-array member
Zero-length arrays are deprecated [1] and have to be replaced by C99
flexible-array members.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
on memcpy() and help to make progress towards globally enabling
-fstrict-flex-arrays=3 [2]

Link: https://github.com/KSPP/linux/issues/78 [1]
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZC7XGpUtVhqlRLhH@work
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Heiko Carstens 81e8479649 s390/mm: fix direct map accounting
Commit bb1520d581 ("s390/mm: start kernel with DAT enabled") did not
implement direct map accounting in the early page table setup code. In
result the reported values are bogus now:

$cat /proc/meminfo
...
DirectMap4k:        5120 kB
DirectMap1M:    18446744073709546496 kB
DirectMap2G:           0 kB

Fix this by adding the missing accounting. The result looks sane again:

$cat /proc/meminfo
...
DirectMap4k:        6156 kB
DirectMap1M:     2091008 kB
DirectMap2G:     6291456 kB

Fixes: bb1520d581 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Heiko Carstens f0a2a7c527 s390/mm: implement set_memory_rwnx()
Given that set_memory_rox() is implemented, provide also set_memory_rwnx().
This allows to get rid of all open coded __set_memory() usages in s390
architecture code.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Heiko Carstens 22e99fa564 s390/mm: implement set_memory_rox()
Provide the s390 specific native set_memory_rox() implementation to avoid
frequent set_memory_ro(); set_memory_x() call pairs.

This is the s390 variant of commit 60463628c9 ("x86/mm: Implement native
set_memory_rox()").

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Heiko Carstens bb87190c9d s390/kaslr: provide kaslr_enabled() function
Just like other architectures provide a kaslr_enabled() function, instead
of directly accessing a global variable.

Also pass the renamed __kaslr_enabled variable from the decompressor to the
kernel, so that kalsr_enabled() is available there too. This will be used
by a subsequent patch which randomizes the module base load address.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:25 +02:00
Heiko Carstens 11018ef90c s390/checksum: remove not needed uaccess.h include
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:25 +02:00
Heiko Carstens e42ac7789d s390/checksum: always use cksm instruction
Commit dfe843dce7 ("s390/checksum: support GENERIC_CSUM, enable it for
KASAN") switched s390 to use the generic checksum functions, so that KASAN
instrumentation also works checksum functions by avoiding architecture
specific inline assemblies.

There is however the problem that the generic csum_partial() function
returns a 32 bit value with a 16 bit folded checksum, while the original
s390 variant does not fold to 16 bit. This in turn causes that the
ipib_checksum in lowcore contains different values depending on kernel
config options.

The ipib_checksum is used by system dumpers to verify if pointers in
lowcore point to valid data. Verification is done by comparing checksum
values. The system dumpers still use 32 bit checksum values which are not
folded, and therefore the checksum verification fails (incorrectly).

Symptom is that reboot after dump does not work anymore when a KASAN
instrumented kernel is dumped.

Fix this by not using the generic checksum implementation. Instead add an
explicit kasan_check_read() so that KASAN knows about the read access from
within the inline assembly.

Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Fixes: dfe843dce7 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN")
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:25 +02:00
Stefan Haberland 1cee2975bb s390/dasd: add autoquiesce feature
Add the internal logic to check for autoquiesce triggers and handle
them.

Quiesce and resume are functions that tell Linux to stop/resume
issuing I/Os to a specific DASD.
The DASD driver allows a manual quiesce/resume via ioctl.

Autoquiesce will define an amount of triggers that will lead to
an automatic quiesce if a certain event occurs.
There is no automatic resume.

All events will be reported via DASD Extended Error Reporting (EER)
if configured.

Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/20230405142017.2446986-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-11 19:53:08 -06:00
Heiko Carstens 22ca1e7738 s390: move on_thread_stack() to processor.h
As preparation for the stackleak feature move on_thread_stack() to
processor.h like x86.

Also make it __always_inline, and slightly optimize it by reading
current task's kernel stack pointer from lowcore.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:56 +02:00
Heiko Carstens 23be82f0de s390/stacktrace: remove call_on_stack_noreturn()
There is no user left of call_on_stack_noreturn() - remove it.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:56 +02:00
Heiko Carstens c2c3258fb5 s390/stack: use STACK_INIT_OFFSET where possible
Make STACK_INIT_OFFSET also available for assembler code, and
use it everywhere instead of open-coding it at several places.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:55 +02:00
Gerald Schaefer 99c2913363 mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()
s390 can do more fine-grained handling of spurious TLB protection faults,
when there also is the PTE pointer available.

Therefore, pass on the PTE pointer to flush_tlb_fix_spurious_fault() as an
additional parameter.

This will add no functional change to other architectures, but those with
private flush_tlb_fix_spurious_fault() implementations need to be made
aware of the new parameter.

Link: https://lkml.kernel.org/r/20230306161548.661740-1-gerald.schaefer@linux.ibm.com
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:12 -07:00
Thomas Richter af90d7b69c s390/cpum_sf: remove flag PERF_CPUM_SF_FULL_BLOCKS
This flag is used to process only fully populated sampling buffers
when an sampling event is stopped on a CPU. By default the last sampling
buffer is also scanned for samples even if the sampling block full
indicator is not set in the trailer entry of a sampling buffer page.

This flag can be set via perf_event_attr::config1 field. It was never
used and never documented. It is useless now.

With PERF_CPUM_SF_FULL_BLOCKS:
When a process is scheduled off the CPU, the sampling is stopped and
the samples are copied to the perf ring buffer and marked invalid.
When stopped at the last full sample buffer page (which is
achieved with the PERF_CPUM_SF_FULL_BLOCKS options), the hardware
sampling will resume at the first free sample entry in the current,
partially filled sample buffer.

Without PERF_CPUM_SF_FULL_BLOCKS (default behavior):
The partially filled last sample buffer is scanned and valid samples
are saved to the perf ring buffer. The valid samples are marked invalid.
The sampling is resumed when the process is scheduled on this CPU.
Again the hardware sampling will resume at the first free sample entry in
the current, partially filled sample buffer.

Now the next interrupt handler invocation scans the
full sample block and saves the valid samples to the ring buffer.
It omits the invalid samples at the top of the buffer.
The default behavior is fully sufficient, therefore remove this feature.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-27 17:19:52 +02:00
Harald Freudenberger 2d72eaf036 s390/ap: implement SE AP bind, unbind and associate
Implementation of the new functions for SE AP support:
bind, unbind and associate. There are two new sysfs
attributes for this:

/sys/devices/ap/cardxx/xx.yyyy/se_bind
/sys/devices/ap/cardxx/xx.yyyy/se_associate

Writing a 1 into the se_bind attribute triggers the
SE AP bind for this AP queue, writing a 0 into does
an unbind - that's a reset (RAPQ) with the F bit enabled.

The se_associate attribute needs an integer value in
range 0...2^16-1 written in. This is the index into a
secrets table feed into the ultravisor. For more details
please see the Architecture documents.

These both new ap queue attributes are only visible
inside a SE guest with SB (Secure Binding) available.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:49 +01:00
Harald Freudenberger c81cf436e4 s390/ap: new low level inline functions ap_bapq() and ap_aapq()
Introduce two new low level functions ap_bapq() (calls
PQAP(BAPQ)) and ap_aapq (calls PQAP(AAPQ)). Both functions
are only meant to be used in SE environment with the SE
AP binding facility available.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:49 +01:00
Harald Freudenberger 4bdf3c3956 s390/ap: provide F bit parameter for ap_rapq() and ap_zapq()
Extent the ap inline functions ap_rapq() (calls PQAP(RAPQ))
and ap_zapq() (calls PQAP(ZAPQ)) with a new parameter to
enable the new architectured F bit which forces an
unassociate and/or unbind on a secure execution associated
and/or bound queue.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:49 +01:00
Harald Freudenberger 211c06d845 s390/ap: make tapq gr2 response a struct
This patch introduces a new struct ap_tapq_gr2 which covers
the response in GR2 on TAPQ invocation. This makes it much
easier and less error-prone for the calling functions to
access the right field without shifting and masking.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:48 +01:00
Harald Freudenberger f604704021 s390/ap: exploit new B bit from QCI config info
This patch introduces an update to the ap_config_info
struct which is filled with the QCI subfunction. There
is a new bit apsb (short 'B') showing if the AP secure
bind facility is available. The patch also includes a
simple function ap_sb_available() wrapping this bit test.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:48 +01:00
Harald Freudenberger 8794c59613 s390/zcrypt: rework length information for dqap
The inline ap_dqap function does not return the number of
bytes actually written into the message buffer. The calling
code inspects the AP message header to figure out what kind
of AP message has been received and pulls the length
information from this header. This processing may not work
correctly in cases where only a fragment of the reply is
received.

With this patch the ap_dqap inline function now returns
the number of actually written bytes in the *length parameter.
So the calling function has a chance to compare the number of
received bytes against what the AP message header length
field states. This is especially useful in cases where a
message could only get partially received.

The low level reply processing functions needed some rework
to be able to catch this new length information and compare
it the right way. The rework also deals with some situations
where until now the reply length was not correctly calculated
and/or set.

All this has been heavily tested as the modifications on
the reply length information may affect crypto load.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:47 +01:00
Harald Freudenberger 003d248fee s390/zcrypt: make psmid unsigned long instead of long long
Since s390 kernel build does not support 32 bit build any
more there is no difference between long and long long.
So this patch reworks all occurrences of psmid (a 64 bit
value) to use unsigned long now.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:47 +01:00
Heiko Carstens 91a0117dce s390/expoline: use __ALIGN instead of open coded .align
Use __ALIGN instead of open coded .align statement to make sure that
external expoline thunks follow global function alignment rules.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:47 +01:00
Heiko Carstens 6ef55060a1 s390: make use of CONFIG_FUNCTION_ALIGNMENT
Make use of CONFIG_FUNCTION_ALIGNMENT which was introduced with commit
d49a062621 ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT").

Select FUNCTION_ALIGNMENT_8B for gcc in order to reflect gcc's default
function alignment. For all other compilers, which is only clang, select
a function alignment of 16 bytes which reflects the default function
alignment for clang.

Also change the __ALIGN define to follow whatever the value of
CONFIG_FUNCTION_ALIGNMENT is. This makes sure that the alignment of C and
assembler functions is the same.

In result everything still uses the default function alignment for both
compilers. However in addition this is now also true for all assembly
functions, so that all functions have a consistent alignment.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:46 +01:00
Heiko Carstens e5323477e6 Merge branch 'decompressor-memory-tracking' into features
Vasily Gorbik says:

===================
Combine and generalize all methods for finding unused memory in
decompressor, while decreasing complexity, add memory holes support,
while improving error handling (especially in low-memory conditions)
and debug-ability.
===================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:04:10 +01:00
Vasily Gorbik 557b19709d s390/kasan: move shadow mapping to decompressor
Since regular paging structs are initialized in decompressor already
move KASAN shadow mapping to decompressor as well. This helps to avoid
allocating KASAN required memory in 1 large chunk, de-duplicate paging
structs creation code and start the uncompressed kernel with KASAN
instrumentation right away. This also allows to avoid all pitfalls
accidentally calling KASAN instrumented code during KASAN initialization.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:51 +01:00
Vasily Gorbik f913a66004 s390/boot: rework decompressor reserved tracking
Currently several approaches for finding unused memory in decompressor
are utilized. While "safe_addr" grows towards higher addresses, vmem
code allocates paging structures top down. The former requires careful
ordering. In addition to that ipl report handling code verifies potential
intersections with secure boot certificates on its own. Neither of two
approaches are memory holes aware and consistent with each other in low
memory conditions.

To solve that, existing approaches are generalized and combined
together, as well as online memory ranges are now taken into
consideration.

physmem_info has been extended to contain reserved memory ranges. New
set of functions allow to handle reserves and find unused memory.
All reserves and memory allocations are "typed". In case of out of
memory condition decompressor fails with detailed info on current
reserved ranges and usable online memory.

Linux version 6.2.0 ...
Kernel command line: ... mem=100M
Our of memory allocating 100000 bytes 100000 aligned in range 0:5800000
Reserved memory ranges:
0000000000000000 0000000003e33000 DECOMPRESSOR
0000000003f00000 00000000057648a3 INITRD
00000000063e0000 00000000063e8000 VMEM
00000000063eb000 00000000063f4000 VMEM
00000000063f7800 0000000006400000 VMEM
0000000005800000 0000000006300000 KASAN
Usable online memory ranges (info source: sclp read info [3]):
0000000000000000 0000000006400000
Usable online memory total: 6400000 Reserved: 61b10a3 Free: 24ef5d
Call Trace:
(sp:000000000002bd58 [<0000000000012a70>] physmem_alloc_top_down+0x60/0x14c)
 sp:000000000002bdc8 [<0000000000013756>] _pa+0x56/0x6a
 sp:000000000002bdf0 [<0000000000013bcc>] pgtable_populate+0x45c/0x65e
 sp:000000000002be90 [<00000000000140aa>] setup_vmem+0x2da/0x424
 sp:000000000002bec8 [<0000000000011c20>] startup_kernel+0x428/0x8b4
 sp:000000000002bf60 [<00000000000100f4>] startup_normal+0xd4/0xd4

physmem_alloc_range allows to find free memory in specified range. It
should be used for one time allocations only like finding position for
amode31 and vmlinux.
physmem_alloc_top_down can be used just like physmem_alloc_range, but
it also allows multiple allocations per type and tries to merge sequential
allocations together. Which is useful for paging structures allocations.
If sequential allocations cannot be merged together they are "chained",
allowing easy per type reserved ranges enumeration and migration to
memblock later. Extra "struct reserved_range" allocated for chaining are
not tracked or reserved but rely on the fact that both
physmem_alloc_range and physmem_alloc_top_down search for free memory
only below current top down allocator position. All reserved ranges
should be transferred to memblock before memblock allocations are
enabled.

The startup code has been reordered to delay any memory allocations until
online memory ranges are detected and occupied memory ranges are marked as
reserved to be excluded from follow-up allocations.
Ipl report certificates are a special case, ipl report certificates list
is checked together with other memory reserves until certificates are
saved elsewhere.
KASAN required memory for shadow memory allocation and mapping is reserved
as 1 large chunk which is later passed to KASAN early initialization code.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:50 +01:00
Vasily Gorbik 8c37cb7d4f s390/boot: rename mem_detect to physmem_info
In preparation to extending mem_detect with additional information like
reserved ranges rename it to more generic physmem_info. This new naming
also help to avoid confusion by using more exact terms like "physmem
online ranges", etc.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:50 +01:00
Vasily Gorbik 53fcc7dbf1 s390/boot: remove non-functioning image bootable check
check_image_bootable() has been introduced with commit 627c9b6205
("s390/boot: block uncompressed vmlinux booting attempts") to make sure
that users don't try to boot uncompressed vmlinux ELF image in qemu. It
used to be possible quite some time ago. That commit prevented confusion
with uncompressed vmlinux image starting to boot and even printing
kernel messages until it crashed. Users might have tried to report the
problem without realizing they are doing something which was not intended.
Since commit f1d3c53237 ("s390/boot: move sclp early buffer from fixed
address in asm to C") check_image_bootable() doesn't function properly
anymore, as well as booting uncompressed vmlinux image in qemu doesn't
really produce any output and crashes. Moving forward it doesn't make
sense to fix check_image_bootable() anymore, so simply remove it.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:50 +01:00
Heiko Carstens 029a4f4b95 s390/setup: always inline gen_lpswe()
gen_lpswe() contains a BUILD_BUG_ON() statement which depends on a function
parameter. If the compiler decides to generate a not inlined function this
will lead to a build error, even if all call sites pass a valid parameter.

To avoid this always inline gen_lpswe().

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-13 09:16:43 +01:00
Heiko Carstens 69a407bf81 s390/bp: remove __bpon()
There is no point in changing branch prediction state of a cpu shortly
before it enters stop state. Therefore remove __bpon().

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-13 09:16:42 +01:00
Heiko Carstens 9b63fd2fc8 s390/bp: remove s390_isolate_bp_guest()
s390_isolate_bp_guest() is unused. Remove it.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-13 09:16:42 +01:00
Heiko Carstens f33f2d4c7c s390/bp: remove TIF_ISOLATE_BP
TIF_ISOLATE_BP is unused since it was introduced with commit 6b73044b2b
("s390: run user space and KVM guests with modified branch prediction").
Given that there is no use case remove it again.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-13 09:16:42 +01:00
Linus Torvalds 0bdf4a8bf0 s390 updates for 6.3 merge window part 2
- Add empty command line parameter handling stubs to kernel for all command
   line parameters which are handled in the decompressor. This avoids
   invalid "Unknown kernel command line parameters" messages from the
   kernel, and also avoids that these will be incorrectly passed to user
   space. This caused already confusion, therefore add the empty stubs
 
 - Add missing phys_to_virt() handling to machine check handler
 
 - Introduce and use a union to be used for zcrypt inline assemblies. This
   makes sure that only a register wide member of the union is passed as
   input and output parameter to inline assemblies, while usual C code uses
   other members of the union to access bit fields of it
 
 - Add and use a READ_ONCE_ALIGNED_128() macro, which can be used to
   atomically read a 128-bit value from memory. This replaces the (mis-)use
   of the 128-bit cmpxchg operation to do the same in cpum_sf code.
   Currently gcc does not generate the used lpq instruction if __READ_ONCE()
   is used for aligned 128-bit accesses, therefore use this s390 specific
   helper
 
 - Simplify machine check handler code if a task needs to be killed because
   of e.g. register corruption due to a machine malfunction
 
 - Perform CPU reset to clear pending interrupts and TLB entries on an
   already stopped target CPU before delegating work to it
 
 - Generate arch/s390/boot/vmlinux.map link map for the decompressor, when
   CONFIG_VMLINUX_MAP is enabled for debugging purposes
 
 - Fix segment type handling for dcssblk devices. It incorrectly always
   returned type "READ/WRITE" even for read-only segements, which can result
   in a kernel panic if somebody tries to write to a read-only device
 
 - Sort config S390 select list again
 
 - Fix two kprobe reenter bugs revealed by a recently added kprobe kunit
   test
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmQB5LgACgkQIg7DeRsp
 bsLvPRAAojiU7uZTDKL58Ta67D7gyWWZa6ClI7MHrtEwEM70cOZHmOYToFuyapbT
 Af+PBg8YcqDDHxGf+HNuxnYRI4y2hGJUyjURZmGuCzPPJ5IGvy+8/X0d0wsvclqC
 fWeTtVvZIxdu5NdBQTs1iMCCxPXg9OfGDVCU+P/pnrlKskHorA6a2ZWGdkpae3fj
 kd1pgLLcZq0Jf8+Q6bSLBZjm45R/Q+qEpd8eIOncP7xrQTpLPkLxRSrnZLPtYdLu
 7uUDtzEC63x7/0Weri71ZQqbc22CzHNiF6FZj5sm78KsWmparmUUsns/U5CrEO4J
 e85/kPXjgfvwMZdQCXSPHZJ8CGTRzN6Zl/Lym9W5+9X6cgb1WNVNCATGZIG3HueA
 MYnXmLkmNmaEgsckcYbCVPR9SjIwVibIL+/32hH63oUnc8WnrbVlai/G57j0dRUg
 +kxVvwbxaDyesRbF7XKe4PssYZJxpO+QFIhnvo6EEghUuc32o+WVNqjoU04/VBaa
 i2FtbARGDWWs5+EwS9td/xHiFPQHpFCJHrMbxacEQmtDbOTOhoLbxNlP17YfQ2oD
 ch0QCZ1w0VwE9ehMRLmZQCem51n7aPt11tKOEFKsck4Mb1dXVLW9LWwNpjIBXrIo
 2g4U4Ytj2I9HbxWMJmKmLsP4NHn2oG3A+uGUkSWf/wdDTfylB30=
 =zZId
 -----END PGP SIGNATURE-----

Merge tag 's390-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - Add empty command line parameter handling stubs to kernel for all
   command line parameters which are handled in the decompressor. This
   avoids invalid "Unknown kernel command line parameters" messages from
   the kernel, and also avoids that these will be incorrectly passed to
   user space. This caused already confusion, therefore add the empty
   stubs

 - Add missing phys_to_virt() handling to machine check handler

 - Introduce and use a union to be used for zcrypt inline assemblies.
   This makes sure that only a register wide member of the union is
   passed as input and output parameter to inline assemblies, while
   usual C code uses other members of the union to access bit fields of
   it

 - Add and use a READ_ONCE_ALIGNED_128() macro, which can be used to
   atomically read a 128-bit value from memory. This replaces the
   (mis-)use of the 128-bit cmpxchg operation to do the same in cpum_sf
   code. Currently gcc does not generate the used lpq instruction if
   __READ_ONCE() is used for aligned 128-bit accesses, therefore use
   this s390 specific helper

 - Simplify machine check handler code if a task needs to be killed
   because of e.g. register corruption due to a machine malfunction

 - Perform CPU reset to clear pending interrupts and TLB entries on an
   already stopped target CPU before delegating work to it

 - Generate arch/s390/boot/vmlinux.map link map for the decompressor,
   when CONFIG_VMLINUX_MAP is enabled for debugging purposes

 - Fix segment type handling for dcssblk devices. It incorrectly always
   returned type "READ/WRITE" even for read-only segements, which can
   result in a kernel panic if somebody tries to write to a read-only
   device

 - Sort config S390 select list again

 - Fix two kprobe reenter bugs revealed by a recently added kprobe kunit
   test

* tag 's390-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/kprobes: fix current_kprobe never cleared after kprobes reenter
  s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler
  s390/Kconfig: sort config S390 select list again
  s390/extmem: return correct segment type in __segment_load()
  s390/decompressor: add link map saving
  s390/smp: perform cpu reset before delegating work to target cpu
  s390/mcck: cleanup user process termination path
  s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg
  s390/rwonce: add READ_ONCE_ALIGNED_128() macro
  s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union
  s390/nmi: fix virtual-physical address confusion
  s390/setup: do not complain about parameters handled in decompressor
2023-03-03 09:38:01 -08:00
Alexander Gordeev e7ec1d2eac s390/mcck: cleanup user process termination path
If a machine check interrupt hits while user process is
running __s390_handle_mcck() helper function is called
directly from the interrupt handler and terminates the
current process by calling make_task_dead() routine.

The make_task_dead() is not allowed to be called from
interrupt context which forces the machine check handler
switch to the kernel stack and enable local interrupts
first.

The __s390_handle_mcck() could also be called to service
pending work, but this time from the external interrupts
handler. It is the machine check handler that establishes
the work and schedules the external interrupt, therefore
the machine check interrupt itself should be disabled
while reading out the corresponding variable:

	local_mcck_disable();
	mcck = *this_cpu_ptr(&cpu_mcck);
	memset(this_cpu_ptr(&cpu_mcck), 0, sizeof(mcck));
	local_mcck_enable();

However, local_mcck_disable() does not have effect when
__s390_handle_mcck() is called directly form the machine
check handler, since the machine check interrupt is still
disabled. Therefore, it is not the opening bracket to the
following local_mcck_enable() function.

Simplify the user process termination flow by scheduling
the external interrupt and killing the affected process
from the interrupt context.

Assume a kernel-generated signal is always delivered and
ignore a value returned by do_send_sig_info() funciton.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-28 13:19:05 +01:00
Heiko Carstens 434b26605f s390/rwonce: add READ_ONCE_ALIGNED_128() macro
Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
fast block concurrent (atomic) 128-bit accesses.

The used lpq instruction requires 128-bit alignment. This is also the
reason why the compiler doesn't emit this instruction if __READ_ONCE() is
used for 128-bit accesses.

Link: https://lore.kernel.org/r/20230224100237.3247871-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-28 13:19:05 +01:00
Harald Freudenberger ebf95e8846 s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union
Introduce a new ap queue status register wrapper union to access register
wide values. So the inline assembler only sees register wide values but the
surrounding code may use a more structured view of the same value and a
reader of the code (and the compiler) gets a clear understanding about the
mapping between fields and register values.

All the changes to access the ap queue status are local to the inline
functions within ap.h. However, the struct ap_qirq_ctrl has been replaces
by a union for same reason and this needed slight adaptions in the calling
code.

Suggested-by: Halil Pasic <pasic@linux.ibm.com>
Suggested-by: Andreas Arnez <arnez@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-27 15:29:36 +01:00
Linus Torvalds 49d5759268 ARM:
- Provide a virtual cache topology to the guest to avoid
   inconsistencies with migration on heterogenous systems. Non secure
   software has no practical need to traverse the caches by set/way in
   the first place.
 
 - Add support for taking stage-2 access faults in parallel. This was an
   accidental omission in the original parallel faults implementation,
   but should provide a marginal improvement to machines w/o FEAT_HAFDBS
   (such as hardware from the fruit company).
 
 - A preamble to adding support for nested virtualization to KVM,
   including vEL2 register state, rudimentary nested exception handling
   and masking unsupported features for nested guests.
 
 - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
   resuming a CPU when running pKVM.
 
 - VGIC maintenance interrupt support for the AIC
 
 - Improvements to the arch timer emulation, primarily aimed at reducing
   the trap overhead of running nested.
 
 - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
   interest of CI systems.
 
 - Avoid VM-wide stop-the-world operations when a vCPU accesses its own
   redistributor.
 
 - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
   in the host.
 
 - Aesthetic and comment/kerneldoc fixes
 
 - Drop the vestiges of the old Columbia mailing list and add [Oliver]
   as co-maintainer
 
 This also drags in arm64's 'for-next/sme2' branch, because both it and
 the PSCI relay changes touch the EL2 initialization code.
 
 RISC-V:
 
 - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
 
 - Correctly place the guest in S-mode after redirecting a trap to the guest
 
 - Redirect illegal instruction traps to guest
 
 - SBI PMU support for guest
 
 s390:
 
 - Two patches sorting out confusion between virtual and physical
   addresses, which currently are the same on s390.
 
 - A new ioctl that performs cmpxchg on guest memory
 
 - A few fixes
 
 x86:
 
 - Change tdp_mmu to a read-only parameter
 
 - Separate TDP and shadow MMU page fault paths
 
 - Enable Hyper-V invariant TSC control
 
 - Fix a variety of APICv and AVIC bugs, some of them real-world,
   some of them affecting architecurally legal but unlikely to
   happen in practice
 
 - Mark APIC timer as expired if its in one-shot mode and the count
   underflows while the vCPU task was being migrated
 
 - Advertise support for Intel's new fast REP string features
 
 - Fix a double-shootdown issue in the emergency reboot code
 
 - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
   similar treatment to VMX
 
 - Update Xen's TSC info CPUID sub-leaves as appropriate
 
 - Add support for Hyper-V's extended hypercalls, where "support" at this
   point is just forwarding the hypercalls to userspace
 
 - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
   MSR filters
 
 - One-off fixes and cleanups
 
 - Fix and cleanup the range-based TLB flushing code, used when KVM is
   running on Hyper-V
 
 - Add support for filtering PMU events using a mask.  If userspace
   wants to restrict heavily what events the guest can use, it can now
   do so without needing an absurd number of filter entries
 
 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled
 
 - Add PEBS support for Intel Sapphire Rapids
 
 - Fix a mostly benign overflow bug in SEV's send|receive_update_data()
 
 - Move several SVM-specific flags into vcpu_svm
 
 x86 Intel:
 
 - Handle NMI VM-Exits before leaving the noinstr region
 
 - A few trivial cleanups in the VM-Enter flows
 
 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1
 
 - Fix a crash when using eVMCS's enlighted MSR bitmaps
 
 Generic:
 
 - Clean up the hardware enable and initialization flow, which was
   scattered around multiple arch-specific hooks.  Instead, just
   let the arch code call into generic code.  Both x86 and ARM should
   benefit from not having to fight common KVM code's notion of how
   to do initialization.
 
 - Account allocations in generic kvm_arch_alloc_vm()
 
 - Fix a memory leak if coalesced MMIO unregistration fails
 
 selftests:
 
 - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
   the correct hypercall instruction instead of relying on KVM to patch
   in VMMCALL
 
 - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
 mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
 9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
 duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
 VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
 /WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
 =goe1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogenous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting architecurally legal but unlikely to happen in
     practice

   - Mark APIC timer as expired if its in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and cleanup the range-based TLB flushing code, used when KVM is
     running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlighted MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
2023-02-25 11:30:21 -08:00
Linus Torvalds 143c7bc649 iommufd for 6.3
Some polishing and small fixes for iommufd:
 
 - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem
 
 - Use GFP_KERNEL_ACCOUNT inside the iommu_domains
 
 - Support VFIO_NOIOMMU mode with iommufd
 
 - Various typos
 
 - A list corruption bug if HWPTs are used for attach
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/TgzQAKCRCFwuHvBreF
 Ya3AAP4/WxTJIbDvtTyH3Fae3NxTdO8j8gsUvU1vrRYG83zdnAEAxd1yii7GEO8D
 crkeq9D4FUiPAkFnJ64Exw2FHb060Qg=
 =RABK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Some polishing and small fixes for iommufd:

   - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt
     subsystem

   - Use GFP_KERNEL_ACCOUNT inside the iommu_domains

   - Support VFIO_NOIOMMU mode with iommufd

   - Various typos

   - A list corruption bug if HWPTs are used for attach"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
  iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
  vfio: Support VFIO_NOIOMMU with iommufd
  iommufd: Add three missing structures in ucmd_buffer
  selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
  iommu: Remove IOMMU_CAP_INTR_REMAP
  irq/s390: Add arch_is_isolated_msi() for s390
  iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
  genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
  genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code
  iommufd: Convert to msi_device_has_isolated_msi()
  vfio/type1: Convert to iommu_group_has_isolated_msi()
  iommu: Add iommu_group_has_isolated_msi()
  genirq/msi: Add msi_device_has_isolated_msi()
2023-02-24 14:34:12 -08:00
Linus Torvalds a13de74e47 IOMMU Updates for Linux v6.3:
Including:
 
 	- Consolidate iommu_map/unmap functions. There have been
 	  blocking and atomic variants so far, but that was problematic
 	  as this approach does not scale with required new variants
 	  which just differ in the GFP flags used.
 	  So Jason consolidated this back into single functions that
 	  take a GFP parameter. This has the potential to cause
 	  conflicts with other trees, as they introduce new call-sites
 	  for the changed functions. I offered them to pull in the
 	  branch containing these changes and resolve it, but I am not
 	  sure everyone did that. The conflicts this caused with
 	  upstream up to v6.2-rc8 are resolved in the final merge
 	  commit.
 
 	- Retire the detach_dev() call-back in iommu_ops
 
 	- Arm SMMU updates from Will:
 	  - Device-tree binding updates:
 	    * Cater for three power domains on SM6375
 	    * Document existing compatible strings for Qualcomm SoCs
 	    * Tighten up clocks description for platform-specific compatible strings
 	  - Enable Qualcomm workarounds for some additional platforms that need them
 
 	- Intel VT-d updates from Lu Baolu:
 	  - Add Intel IOMMU performance monitoring support
 	  - Set No Execute Enable bit in PASID table entry
 	  - Two performance optimizations
 	  - Fix PASID directory pointer coherency
 	  - Fix missed rollbacks in error path
 	  - Cleanups
 
 	- Apple t8110 DART support
 
 	- Exynos IOMMU:
 	  - Implement better fault handling
 	  - Error handling fixes
 
 	- Renesas IPMMU:
 	  - Add device tree bindings for r8a779g0
 
 	- AMD IOMMU:
 	  - Various fixes for handling on SNP-enabled systems and
 	    handling of faults with unknown request-ids
 	  - Cleanups and other small fixes
 
 	- Various other smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmP0hDwACgkQK/BELZcB
 GuM43RAA0YieShO+X0h6TFGfbK0zVoPd91giZehWBv9rHK7pP4iY8UEtBLBWGx/t
 CId4t98mmKmC212zz8QxrwAEzyTIRY+2t1yrpG2aVkoTYk8inMb07TU37wganh3O
 T0QccXN+9b2BS4k8yro5f3uX0d/C1JQVcMowwr53VMb/e73huqP1VTbz06/CIWMH
 DUhVRCzmNhSvoUOT5n7g6+ZDH+pot8WPZbtHV7FowEsmPCRc7Fj8kXyI9FEwKwrZ
 hIV5Y+6Lej8nQScgbO8MfblJym3VrBoSoM4GY2w0L0rjQw6m+Xtea5rT0W39YVWy
 YpiscLTL8TIMPP9zK1dXVygTaABK4J2iWmheHPkpKXIhK0iuH3Dke0Do5p6DNITj
 7J2YlaNEB480D5hvNBKsbbGHavgGPT8m529Sz0R7mSC7omRzqiG5Vsb46IXL+2bc
 92ojjYNfXb6OCtagIr2LMBLZRL2JCODqF1dUmyZfA8GKOHLP5kZXoMM+sZbQ2aUL
 1LOxRZVx+tlb9V4VaH1ZSs/6eM+HLDzjtHeu3PoWYf6mW4AEt4S/yl9SKAkGdBqt
 jCUErmYB1nU/eefqG1jhWRpQeJabcT3Oe30NZru1pfMoREThhjbAACw1JxWtoe1X
 ipGpV6lAP7tQUGuRk3/9O1lNqElJuNwC5lVTjS4FJ38vYQhQbao=
 =ZaZV
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Consolidate iommu_map/unmap functions.

   There have been blocking and atomic variants so far, but that was
   problematic as this approach does not scale with required new
   variants which just differ in the GFP flags used. So Jason
   consolidated this back into single functions that take a GFP
   parameter.

 - Retire the detach_dev() call-back in iommu_ops

 - Arm SMMU updates from Will:
     - Device-tree binding updates:
         - Cater for three power domains on SM6375
         - Document existing compatible strings for Qualcomm SoCs
         - Tighten up clocks description for platform-specific
           compatible strings
     - Enable Qualcomm workarounds for some additional platforms that
       need them

 - Intel VT-d updates from Lu Baolu:
     - Add Intel IOMMU performance monitoring support
     - Set No Execute Enable bit in PASID table entry
     - Two performance optimizations
     - Fix PASID directory pointer coherency
     - Fix missed rollbacks in error path
     - Cleanups

 - Apple t8110 DART support

 - Exynos IOMMU:
     - Implement better fault handling
     - Error handling fixes

 - Renesas IPMMU:
     - Add device tree bindings for r8a779g0

 - AMD IOMMU:
     - Various fixes for handling on SNP-enabled systems and
       handling of faults with unknown request-ids
     - Cleanups and other small fixes

 - Various other smaller fixes and cleanups

* tag 'iommu-updates-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (71 commits)
  iommu/amd: Skip attach device domain is same as new domain
  iommu: Attach device group to old domain in error path
  iommu/vt-d: Allow to use flush-queue when first level is default
  iommu/vt-d: Fix PASID directory pointer coherency
  iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  iommu/vt-d: Fix error handling in sva enable/disable paths
  iommu/amd: Improve page fault error reporting
  iommu/amd: Do not identity map v2 capable device when snp is enabled
  iommu: Fix error unwind in iommu_group_alloc()
  iommu/of: mark an unused function as __maybe_unused
  iommu: dart: DART_T8110_ERROR range should be 0 to 5
  iommu/vt-d: Enable IOMMU perfmon support
  iommu/vt-d: Add IOMMU perfmon overflow handler support
  iommu/vt-d: Support cpumask for IOMMU perfmon
  iommu/vt-d: Add IOMMU perfmon support
  iommu/vt-d: Support Enhanced Command Interface
  iommu/vt-d: Retrieve IOMMU perfmon capability information
  iommu/vt-d: Support size of the register set in DRHD
  iommu/vt-d: Set No Execute Enable bit in PASID table entry
  iommu/vt-d: Remove sva from intel_svm_dev
  ...
2023-02-24 13:40:13 -08:00