Commit Graph

18726 Commits

Author SHA1 Message Date
Ricardo Neri 046a5a95c3 x86/sched/itmt: Give all SMT siblings of a core the same priority
X86 does not have the SD_ASYM_PACKING flag in the SMT domain. The scheduler
knows how to handle SMT and non-SMT cores of different priority. There is
no reason for SMT siblings of a core to have different priorities.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406203148.19182-12-ricardo.neri-calderon@linux.intel.com
2023-05-08 10:58:38 +02:00
Ricardo Neri 995998ebde x86/sched: Remove SD_ASYM_PACKING from the SMT domain flags
There is no difference between any of the SMT siblings of a physical core.
Do not do asym_packing load balancing at this level.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406203148.19182-11-ricardo.neri-calderon@linux.intel.com
2023-05-08 10:58:37 +02:00
Linus Torvalds 58390c8ce1 IOMMU Updates for Linux 6.4
Including:
 
 	- Convert to platform remove callback returning void
 
 	- Extend changing default domain to normal group
 
 	- Intel VT-d updates:
 	    - Remove VT-d virtual command interface and IOASID
 	    - Allow the VT-d driver to support non-PRI IOPF
 	    - Remove PASID supervisor request support
 	    - Various small and misc cleanups
 
 	- ARM SMMU updates:
 	    - Device-tree binding updates:
 	        * Allow Qualcomm GPU SMMUs to accept relevant clock properties
 	        * Document Qualcomm 8550 SoC as implementing an MMU-500
 	        * Favour new "qcom,smmu-500" binding for Adreno SMMUs
 
 	    - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
 	      implementations
 
 	    - Acknowledge SMMUv3 PRI queue overflow when consuming events
 
 	    - Document (in a comment) why ATS is disabled for bypass streams
 
 	- AMD IOMMU updates:
 	    - 5-level page-table support
 	    - NUMA awareness for memory allocations
 
 	- Unisoc driver: Support for reattaching an existing domain
 
 	- Rockchip driver: Add missing set_platform_dma_ops callback
 
 	- Mediatek driver: Adjust the dma-ranges
 
 	- Various other small fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmRONeAACgkQK/BELZcB
 GuPmpw/8C9ruxQ0JU5rcDBXQGvos4gMmxlbELMrBpbbiTtdb35xchpKfdhnECGIF
 k2SrrcF40R/S82SyzNU/eZtGKirtcXvGFraUFgu/QdCcnnqpRHs+IJMXX2NJP+it
 +0wO1uiInt3CN1ERcR4F31cDKiWjDG8bvQVE5LIyiy4KrIU5ld2G91Fkaa0R13Au
 6H+/wKkcUC6OyaGE6wPx474xBkapT20vj5AIQuAWisXJJR0wbBon1sUTo/IRKsU+
 IkNxH0W+1PNImJ+crAdf/nkOlyqoChY4ww6cm07LrOsBLIsX5bCqXfL4HvKthElD
 MEgk2SN5kfjfR5Vf29W4hZVM1CT8VbhO41I7OzaZ6X6RU2PXoldPKlgKtZGeSKn1
 9bcMpSgB0BtbttvBevSkxTo5KHFozXS2DG3DFoMB3yFMme8Th0LrhBZ9oB7NIPNw
 ntMo4K75vviC6Vvzjy4Anj/+y+Zm3W6wDDP7F12O6WZLkK5s4hrSsHUm/MQnnKQP
 muJlG870RnSl73xUQZe3cuBxktXuJ3EHqqYIPE0npzvauu8hhWcis3opf2Y+U2s8
 aBCCIgp5kTKqjHLh2e4lNCKZf1/b/dhxRcRBQhpAIb8YsjMlIJyM+G8Jz6K6gBga
 5Ld+68UQ3oHJwoLV1HCFN8jbpQ9KZn1s9+h3yrYjRAcLNiFb3nU=
 =OvTo
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Convert to platform remove callback returning void

 - Extend changing default domain to normal group

 - Intel VT-d updates:
     - Remove VT-d virtual command interface and IOASID
     - Allow the VT-d driver to support non-PRI IOPF
     - Remove PASID supervisor request support
     - Various small and misc cleanups

 - ARM SMMU updates:
     - Device-tree binding updates:
         * Allow Qualcomm GPU SMMUs to accept relevant clock properties
         * Document Qualcomm 8550 SoC as implementing an MMU-500
         * Favour new "qcom,smmu-500" binding for Adreno SMMUs

     - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
       implementations

     - Acknowledge SMMUv3 PRI queue overflow when consuming events

     - Document (in a comment) why ATS is disabled for bypass streams

 - AMD IOMMU updates:
     - 5-level page-table support
     - NUMA awareness for memory allocations

 - Unisoc driver: Support for reattaching an existing domain

 - Rockchip driver: Add missing set_platform_dma_ops callback

 - Mediatek driver: Adjust the dma-ranges

 - Various other small fixes and cleanups

* tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits)
  iommu: Remove iommu_group_get_by_id()
  iommu: Make iommu_release_device() static
  iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
  iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
  iommu/vt-d: Remove BUG_ON in map/unmap()
  iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
  iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
  iommu/vt-d: Remove BUG_ON on checking valid pfn range
  iommu/vt-d: Make size of operands same in bitwise operations
  iommu/vt-d: Remove PASID supervisor request support
  iommu/vt-d: Use non-privileged mode for all PASIDs
  iommu/vt-d: Remove extern from function prototypes
  iommu/vt-d: Do not use GFP_ATOMIC when not needed
  iommu/vt-d: Remove unnecessary checks in iopf disabling path
  iommu/vt-d: Move PRI handling to IOPF feature path
  iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
  iommu/vt-d: Move iopf code from SVA to IOPF enabling path
  iommu/vt-d: Allow SVA with device-specific IOPF
  dmaengine: idxd: Add enable/disable device IOPF feature
  arm64: dts: mt8186: Add dma-ranges for the parent "soc" node
  ...
2023-04-30 13:00:38 -07:00
Linus Torvalds 2aff7c706c Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
    this inconsistently follow this new, common convention, and fix all the fallout
    that objtool can now detect statically.
 
  - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
    split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
 
  - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
 
  - Generate ORC data for __pfx code
 
  - Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
 
  - Misc improvements & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM
 pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul
 TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd
 LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP
 c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT
 /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a
 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1
 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ
 SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p
 vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo
 x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi
 fRho4CqzrtM=
 =NwEV
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Linus Torvalds 22b8cc3e78 Add support for new Linear Address Masking CPU feature. This is similar
to ARM's Top Byte Ignore and allows userspace to store metadata in some
 bits of pointers without masking it out before use.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRK/WIACgkQaDWVMHDJ
 krAL+RAAw33EhsWyYVkeAtYmYBKkGvlgeSDULtfJKe5bynJBTHkGKfM6RE9MSJIt
 5fHWaConGh8HNpy0Us1sDvd/aWcWRm5h7ZcCVD+R4qrgh/vc7ULzM+elXe5jzr4W
 cyuTckF2eW6SVrYg6fH5q+6Uy/moDtrdkLRvwRBf+AYeepB8gvSSH5XixKDNiVBE
 pjNy1xXVZQokqD4tjsFelmLttyacR5OabiE/aeVNoFYf9yTwfnN8N3T6kwuOoS4l
 Lp6NA+/0ux+oBlR+Is+JJG8Mxrjvz96yJGZYdR2YP5k3bMQtHAAjuq2w+GgqZm5i
 j3/E6KQepEGaCfC+bHl68xy/kKx8ik+jMCEcBalCC25J3uxbLz41g6K3aI890wJn
 +5ZtfcmoDUk9pnUyLxR8t+UjOSBFAcRSUE+FTjUH1qEGsMPK++9a4iLXz5vYVK1+
 +YCt1u5LNJbkDxE8xVX3F5jkXh0G01SJsuUVAOqHSNfqSNmohFK8/omqhVRrRqoK
 A7cYLtnOGiUXLnvjrwSxPNOzRrG+GAwqaw8gwOTaYogETWbTY8qsSCEVl204uYwd
 m8io9rk2ZXUdDuha56xpBbPE0JHL9hJ2eKCuPkfvRgJT9YFyTh+e0UdX20k+nDjc
 ang1S350o/Y0sus6rij1qS8AuxJIjHucG0GdgpZk3KUbcxoRLhI=
 =qitk
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 LAM (Linear Address Masking) support from Dave Hansen:
 "Add support for the new Linear Address Masking CPU feature.

  This is similar to ARM's Top Byte Ignore and allows userspace to store
  metadata in some bits of pointers without masking it out before use"

* tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
  x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
  selftests/x86/lam: Add test cases for LAM vs thread creation
  selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking
  selftests/x86/lam: Add inherit test cases for linear-address masking
  selftests/x86/lam: Add io_uring test cases for linear-address masking
  selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
  selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking
  x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
  iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
  mm: Expose untagging mask in /proc/$PID/status
  x86/mm: Provide arch_prctl() interface for LAM
  x86/mm: Reduce untagged_addr() overhead for systems without LAM
  x86/uaccess: Provide untagged_addr() and remove tags before address check
  mm: Introduce untagged_addr_remote()
  x86/mm: Handle LAM on context switch
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  x86: Allow atomic MM_CONTEXT flags setting
  x86/mm: Rework address range check in get_user() and put_user()
2023-04-28 09:43:49 -07:00
Linus Torvalds 4980c176a7 Reduce redundant counter reads with resctrl refactoring
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRKki4ACgkQaDWVMHDJ
 krDf7w//eQjKw0dny5YZGOaHUNKMzJWQtzAnDJS0HuedWIar5+iCRiLh3zbxyKU0
 8IG/xkDHQ5Dd1V7mOyl1g4WdZ/rFmmepl2VtvYnfMs5x+U2sf6LttzzkXetbP+oj
 x5uvfa9Vx8Ad8unhgYa9KIIkg/x02ImyupPLw32R2a/cMTRoi+LJEGiiUAWFTCx6
 4ZCtryAKHDTgrbuOWTz46cEgil3ZQLBI/uvF3IKd7BegfpbXQq/iyXJhhD/hWfVw
 lqswuGZN+yVLTkyJ4EHxUXAJI1AuH327KZI1SgSTe8AKFiygx0ZOrkmeI6cXbKJO
 os22OdT+cwAI8OkblH+9rMAd4dmAnLw9o/rGylC9rzwyXmmRII5FJ6LrbWFvsHmh
 QrUTcRzBtHmwLfqUf60b4bXDmI2MMrN5PAxmvRsHbzSfzMHVJDPXG0IoGBhUPrjS
 QmZuCNjsaVIOOxaSm+EtfFMeRxmfTEc6e3YxEeykfjGqPph9o0YK6H3o/4MgupJe
 uik0scEqBzq2MXkOYv5dysiTb57QAR/Y+CWvZHJ1YcwFvjAQahqFSwQ+gItoHTDL
 Rec9Tq9cm0AG1mZL1fVWFPK+ECiFti3YvZoIZEtAyg6hOMoZsZvu+VWcHFRdgIGk
 5riWJE8MiVyzQGcvWFBFbaeYLm1+obGhJHMpMW476K3jr75y+RU=
 =xO/z
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resctrl update from Dave Hansen:
 "Reduce redundant counter reads with resctrl refactoring"

* tag 'x86_cache_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Avoid redundant counter read in __mon_event_count()
2023-04-28 09:30:51 -07:00
Linus Torvalds 682f7bbad2 - Unify duplicated __pa() and __va() definitions
- Simplify sysctl tables registration
 
 - Remove unused symbols
 
 - Correct function name in comment
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRKjI4ACgkQEsHwGGHe
 VUq85Q/9FbGNdHD2uX2KcpeWkEjYVlrhDudsG8e1JAMCjfSD0CCNhG7yU+6Jabfs
 LszYLwkNeHVpLUUOOtAnObqXpdcv2vML7/j6Cgg5aqdMDv3RwIgTti5tSkHr7s1A
 ejH0Qo/oYYt2OsJYkl+KuGhcaBmdpqEOIeOtV98vBtqgkRDCwdJhhMZeF0qgZ1kN
 r3bFdwy0KIiyI+EBYDXEsew/nI9oEuzoNgaOVIZCeOtHjtbgdl/kc7JgfDd0838D
 nsoNk1R8PVSl6RY30my7TKbFl7epWibinnD9M8NcyYpbLlfZKI7L60ZtQZ5Q49pz
 z+LtXTgeS/fjaFuM8LKkekGprpNiDClgygNini3QsmSb3kfb4ymxJLKbVuXziOLZ
 eYAE+xexCNUYXhmeamvPWjRP9cUgQc3TQD0IQFv/FO8M0gXBA4jTauyRrs+NNmVI
 G7W7T90x1XUu4fZDM/QZ2cn5qtdcRMZm4NcV0WY5OU/ZrrMmMNyGvDfrwLhFOSXi
 nOqzlJ9GNRVjhHsQhCG16B2y3guWmPGXyCvn6Ruuv7RQcm7oK4Rmq6bHuuqcAyaI
 R5z2pRib3AzPNgHUfMgDWuCa7D9jBimVJI/dG0bXG8DCnzaBXfYJn2ruvwvQlVLC
 4WqwdyUxR7k+vf1l0kQ5voGCLbXOcLFBfGP+7RRnEzlyCut2t74=
 =I3Mj
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:

 - Unify duplicated __pa() and __va() definitions

 - Simplify sysctl tables registration

 - Remove unused symbols

 - Correct function name in comment

* tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Centralize __pa()/__va() definitions
  x86: Simplify one-level sysctl registration for itmt_kern_table
  x86: Simplify one-level sysctl registration for abi_table2
  x86/platform/intel-mid: Remove unused definitions from intel-mid.h
  x86/uaccess: Remove memcpy_page_flushcache()
  x86/entry: Change stale function name in comment to error_return()
2023-04-28 09:22:30 -07:00
Linus Torvalds 33afd4b763 Mainly singleton patches all over the place. Series of note are:
- updates to scripts/gdb from Glenn Washburn
 
 - kexec cleanups from Bjorn Helgaas
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr+6wAKCRDdBJ7gKXxA
 jn4NAP4u/hj/kR2dxYehcVLuQqJspCRZZBZlAReFJyHNQO6voAEAk0NN9rtG2+/E
 r0G29CJhK+YL0W6mOs8O1yo9J1rZnAM=
 =2CUV
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Mainly singleton patches all over the place.

  Series of note are:

   - updates to scripts/gdb from Glenn Washburn

   - kexec cleanups from Bjorn Helgaas"

* tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits)
  mailmap: add entries for Paul Mackerras
  libgcc: add forward declarations for generic library routines
  mailmap: add entry for Oleksandr
  ocfs2: reduce ioctl stack usage
  fs/proc: add Kthread flag to /proc/$pid/status
  ia64: fix an addr to taddr in huge_pte_offset()
  checkpatch: introduce proper bindings license check
  epoll: rename global epmutex
  scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry()
  scripts/gdb: create linux/vfs.py for VFS related GDB helpers
  uapi/linux/const.h: prefer ISO-friendly __typeof__
  delayacct: track delays from IRQ/SOFTIRQ
  scripts/gdb: timerlist: convert int chunks to str
  scripts/gdb: print interrupts
  scripts/gdb: raise error with reduced debugging information
  scripts/gdb: add a Radix Tree Parser
  lib/rbtree: use '+' instead of '|' for setting color.
  proc/stat: remove arch_idle_time()
  checkpatch: check for misuse of the link tags
  checkpatch: allow Closes tags with links
  ...
2023-04-27 19:57:00 -07:00
Linus Torvalds da46b58ff8 hyperv-next for v6.4
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmRHJSgTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXjSOCAClsmFmyP320yAB74vQer5cSzxbIpFW
 3qt/P3D8zABn0UxjjmD8+LTHuyB+72KANU6qQ9No6zdYs8yaA1vGX8j8UglWWHuj
 fmaAD4DuZl+V+fmqDgHukgaPlhakmW0m5tJkR+TW3kCgnyrtvSWpXPoxUAe6CLvj
 Kb/SPl6ylHRWlIAEZ51gy0Ipqxjvs5vR/h9CWpTmRMuZvxdWUro2Cm82wJgzXPqq
 3eLbAzB29kLFEIIUpba9a/rif1yrWgVFlfpuENFZ+HUYuR78wrPB9evhwuPvhXd2
 +f+Wk0IXORAJo8h7aaMMIr6bd4Lyn98GPgmS5YSe92HRIqjBvtYs3Dq8
 =F6+n
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - PCI passthrough for Hyper-V confidential VMs (Michael Kelley)

 - Hyper-V VTL mode support (Saurabh Sengar)

 - Move panic report initialization code earlier (Long Li)

 - Various improvements and bug fixes (Dexuan Cui and Michael Kelley)

* tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
  PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
  Drivers: hv: move panic report code from vmbus to hv early init code
  x86/hyperv: VTL support for Hyper-V
  Drivers: hv: Kconfig: Add HYPERV_VTL_MODE
  x86/hyperv: Make hv_get_nmi_reason public
  x86/hyperv: Add VTL specific structs and hypercalls
  x86/init: Make get/set_rtc_noop() public
  x86/hyperv: Exclude lazy TLB mode CPUs from enlightened TLB flushes
  x86/hyperv: Add callback filter to cpumask_to_vpset()
  Drivers: hv: vmbus: Remove the per-CPU post_msg_page
  clocksource: hyper-v: make sure Invariant-TSC is used if it is available
  PCI: hv: Enable PCI pass-thru devices in Confidential VMs
  Drivers: hv: Don't remap addresses that are above shared_gpa_boundary
  hv_netvsc: Remove second mapping of send and recv buffers
  Drivers: hv: vmbus: Remove second way of mapping ring buffers
  Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages
  swiotlb: Remove bounce buffer remapping for Hyper-V
  Driver: VMBus: Add Devicetree support
  dt-bindings: bus: Add Hyper-V VMBus
  Drivers: hv: vmbus: Convert acpi_device to more generic platform_device
  ...
2023-04-27 17:17:12 -07:00
Linus Torvalds b6a7828502 modules-6.4-rc1
The summary of the changes for this pull requests is:
 
  * Song Liu's new struct module_memory replacement
  * Nick Alcock's MODULE_LICENSE() removal for non-modules
  * My cleanups and enhancements to reduce the areas where we vmalloc
    module memory for duplicates, and the respective debug code which
    proves the remaining vmalloc pressure comes from userspace.
 
 Most of the changes have been in linux-next for quite some time except
 the minor fixes I made to check if a module was already loaded
 prior to allocating the final module memory with vmalloc and the
 respective debug code it introduces to help clarify the issue. Although
 the functional change is small it is rather safe as it can only *help*
 reduce vmalloc space for duplicates and is confirmed to fix a bootup
 issue with over 400 CPUs with KASAN enabled. I don't expect stable
 kernels to pick up that fix as the cleanups would have also had to have
 been picked up. Folks on larger CPU systems with modules will want to
 just upgrade if vmalloc space has been an issue on bootup.
 
 Given the size of this request, here's some more elaborate details
 on this pull request.
 
 The functional change change in this pull request is the very first
 patch from Song Liu which replaces the struct module_layout with a new
 struct module memory. The old data structure tried to put together all
 types of supported module memory types in one data structure, the new
 one abstracts the differences in memory types in a module to allow each
 one to provide their own set of details. This paves the way in the
 future so we can deal with them in a cleaner way. If you look at changes
 they also provide a nice cleanup of how we handle these different memory
 areas in a module. This change has been in linux-next since before the
 merge window opened for v6.3 so to provide more than a full kernel cycle
 of testing. It's a good thing as quite a bit of fixes have been found
 for it.
 
 Jason Baron then made dynamic debug a first class citizen module user by
 using module notifier callbacks to allocate / remove module specific
 dynamic debug information.
 
 Nick Alcock has done quite a bit of work cross-tree to remove module
 license tags from things which cannot possibly be module at my request
 so to:
 
   a) help him with his longer term tooling goals which require a
      deterministic evaluation if a piece a symbol code could ever be
      part of a module or not. But quite recently it is has been made
      clear that tooling is not the only one that would benefit.
      Disambiguating symbols also helps efforts such as live patching,
      kprobes and BPF, but for other reasons and R&D on this area
      is active with no clear solution in sight.
 
   b) help us inch closer to the now generally accepted long term goal
      of automating all the MODULE_LICENSE() tags from SPDX license tags
 
 In so far as a) is concerned, although module license tags are a no-op
 for non-modules, tools which would want create a mapping of possible
 modules can only rely on the module license tag after the commit
 8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin
 or tristate.conf").  Nick has been working on this *for years* and
 AFAICT I was the only one to suggest two alternatives to this approach
 for tooling. The complexity in one of my suggested approaches lies in
 that we'd need a possible-obj-m and a could-be-module which would check
 if the object being built is part of any kconfig build which could ever
 lead to it being part of a module, and if so define a new define
 -DPOSSIBLE_MODULE [0]. A more obvious yet theoretical approach I've
 suggested would be to have a tristate in kconfig imply the same new
 -DPOSSIBLE_MODULE as well but that means getting kconfig symbol names
 mapping to modules always, and I don't think that's the case today. I am
 not aware of Nick or anyone exploring either of these options. Quite
 recently Josh Poimboeuf has pointed out that live patching, kprobes and
 BPF would benefit from resolving some part of the disambiguation as
 well but for other reasons. The function granularity KASLR (fgkaslr)
 patches were mentioned but Joe Lawrence has clarified this effort has
 been dropped with no clear solution in sight [1].
 
 In the meantime removing module license tags from code which could never
 be modules is welcomed for both objectives mentioned above. Some
 developers have also welcomed these changes as it has helped clarify
 when a module was never possible and they forgot to clean this up,
 and so you'll see quite a bit of Nick's patches in other pull
 requests for this merge window. I just picked up the stragglers after
 rc3. LWN has good coverage on the motivation behind this work [2] and
 the typical cross-tree issues he ran into along the way. The only
 concrete blocker issue he ran into was that we should not remove the
 MODULE_LICENSE() tags from files which have no SPDX tags yet, even if
 they can never be modules. Nick ended up giving up on his efforts due
 to having to do this vetting and backlash he ran into from folks who
 really did *not understand* the core of the issue nor were providing
 any alternative / guidance. I've gone through his changes and dropped
 the patches which dropped the module license tags where an SPDX
 license tag was missing, it only consisted of 11 drivers.  To see
 if a pull request deals with a file which lacks SPDX tags you
 can just use:
 
   ./scripts/spdxcheck.py -f \
 	$(git diff --name-only commid-id | xargs echo)
 
 You'll see a core module file in this pull request for the above,
 but that's not related to his changes. WE just need to add the SPDX
 license tag for the kernel/module/kmod.c file in the future but
 it demonstrates the effectiveness of the script.
 
 Most of Nick's changes were spread out through different trees,
 and I just picked up the slack after rc3 for the last kernel was out.
 Those changes have been in linux-next for over two weeks.
 
 The cleanups, debug code I added and final fix I added for modules
 were motivated by David Hildenbrand's report of boot failing on
 a systems with over 400 CPUs when KASAN was enabled due to running
 out of virtual memory space. Although the functional change only
 consists of 3 lines in the patch "module: avoid allocation if module is
 already present and ready", proving that this was the best we can
 do on the modules side took quite a bit of effort and new debug code.
 
 The initial cleanups I did on the modules side of things has been
 in linux-next since around rc3 of the last kernel, the actual final
 fix for and debug code however have only been in linux-next for about a
 week or so but I think it is worth getting that code in for this merge
 window as it does help fix / prove / evaluate the issues reported
 with larger number of CPUs. Userspace is not yet fixed as it is taking
 a bit of time for folks to understand the crux of the issue and find a
 proper resolution. Worst come to worst, I have a kludge-of-concept [3]
 of how to make kernel_read*() calls for modules unique / converge them,
 but I'm currently inclined to just see if userspace can fix this
 instead.
 
 [0] https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/
 [1] https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com
 [2] https://lwn.net/Articles/927569/
 [3] https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRG4m0SHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinQ2oP/0xlvKwJg6Ey8fHZF0qv8VOskE80zoLF
 hMazU3xfqLA+1TQvouW1YBxt3jwS3t1Ehs+NrV+nY9Yzcm0MzRX/n3fASJVe7nRr
 oqWWQU+voYl5Pw1xsfdp6C8IXpBQorpYby3Vp0MAMoZyl2W2YrNo36NV488wM9KC
 jD4HF5Z6xpnPSZTRR7AgW9mo7FdAtxPeKJ76Bch7lH8U6omT7n36WqTw+5B1eAYU
 YTOvrjRs294oqmWE+LeebyiOOXhH/yEYx4JNQgCwPdxwnRiGJWKsk5va0hRApqF/
 WW8dIqdEnjsa84lCuxnmWgbcPK8cgmlO0rT0DyneACCldNlldCW1LJ0HOwLk9pea
 p3JFAsBL7TKue4Tos6I7/4rx1ufyBGGIigqw9/VX5g0Iif+3BhWnqKRfz+p9wiMa
 Fl7cU6u7yC68CHu1HBSisK16cYMCPeOnTSd89upHj8JU/t74O6k/ARvjrQ9qmNUt
 c5U+OY+WpNJ1nXQydhY/yIDhFdYg8SSpNuIO90r4L8/8jRQYXNG80FDd1UtvVDuy
 eq0r2yZ8C0XHSlOT9QHaua/tWV/aaKtyC/c0hDRrigfUrq8UOlGujMXbUnrmrWJI
 tLJLAc7ePWAAoZXGSHrt0U27l029GzLwRdKqJ6kkDANVnTeOdV+mmBg9zGh3/Mp6
 agiwdHUMVN7X
 =56WK
 -----END PGP SIGNATURE-----

Merge tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module updates from Luis Chamberlain:
 "The summary of the changes for this pull requests is:

   - Song Liu's new struct module_memory replacement

   - Nick Alcock's MODULE_LICENSE() removal for non-modules

   - My cleanups and enhancements to reduce the areas where we vmalloc
     module memory for duplicates, and the respective debug code which
     proves the remaining vmalloc pressure comes from userspace.

  Most of the changes have been in linux-next for quite some time except
  the minor fixes I made to check if a module was already loaded prior
  to allocating the final module memory with vmalloc and the respective
  debug code it introduces to help clarify the issue. Although the
  functional change is small it is rather safe as it can only *help*
  reduce vmalloc space for duplicates and is confirmed to fix a bootup
  issue with over 400 CPUs with KASAN enabled. I don't expect stable
  kernels to pick up that fix as the cleanups would have also had to
  have been picked up. Folks on larger CPU systems with modules will
  want to just upgrade if vmalloc space has been an issue on bootup.

  Given the size of this request, here's some more elaborate details:

  The functional change change in this pull request is the very first
  patch from Song Liu which replaces the 'struct module_layout' with a
  new 'struct module_memory'. The old data structure tried to put
  together all types of supported module memory types in one data
  structure, the new one abstracts the differences in memory types in a
  module to allow each one to provide their own set of details. This
  paves the way in the future so we can deal with them in a cleaner way.
  If you look at changes they also provide a nice cleanup of how we
  handle these different memory areas in a module. This change has been
  in linux-next since before the merge window opened for v6.3 so to
  provide more than a full kernel cycle of testing. It's a good thing as
  quite a bit of fixes have been found for it.

  Jason Baron then made dynamic debug a first class citizen module user
  by using module notifier callbacks to allocate / remove module
  specific dynamic debug information.

  Nick Alcock has done quite a bit of work cross-tree to remove module
  license tags from things which cannot possibly be module at my request
  so to:

   a) help him with his longer term tooling goals which require a
      deterministic evaluation if a piece a symbol code could ever be
      part of a module or not. But quite recently it is has been made
      clear that tooling is not the only one that would benefit.
      Disambiguating symbols also helps efforts such as live patching,
      kprobes and BPF, but for other reasons and R&D on this area is
      active with no clear solution in sight.

   b) help us inch closer to the now generally accepted long term goal
      of automating all the MODULE_LICENSE() tags from SPDX license tags

  In so far as a) is concerned, although module license tags are a no-op
  for non-modules, tools which would want create a mapping of possible
  modules can only rely on the module license tag after the commit
  8b41fc4454 ("kbuild: create modules.builtin without
  Makefile.modbuiltin or tristate.conf").

  Nick has been working on this *for years* and AFAICT I was the only
  one to suggest two alternatives to this approach for tooling. The
  complexity in one of my suggested approaches lies in that we'd need a
  possible-obj-m and a could-be-module which would check if the object
  being built is part of any kconfig build which could ever lead to it
  being part of a module, and if so define a new define
  -DPOSSIBLE_MODULE [0].

  A more obvious yet theoretical approach I've suggested would be to
  have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as
  well but that means getting kconfig symbol names mapping to modules
  always, and I don't think that's the case today. I am not aware of
  Nick or anyone exploring either of these options. Quite recently Josh
  Poimboeuf has pointed out that live patching, kprobes and BPF would
  benefit from resolving some part of the disambiguation as well but for
  other reasons. The function granularity KASLR (fgkaslr) patches were
  mentioned but Joe Lawrence has clarified this effort has been dropped
  with no clear solution in sight [1].

  In the meantime removing module license tags from code which could
  never be modules is welcomed for both objectives mentioned above. Some
  developers have also welcomed these changes as it has helped clarify
  when a module was never possible and they forgot to clean this up, and
  so you'll see quite a bit of Nick's patches in other pull requests for
  this merge window. I just picked up the stragglers after rc3. LWN has
  good coverage on the motivation behind this work [2] and the typical
  cross-tree issues he ran into along the way. The only concrete blocker
  issue he ran into was that we should not remove the MODULE_LICENSE()
  tags from files which have no SPDX tags yet, even if they can never be
  modules. Nick ended up giving up on his efforts due to having to do
  this vetting and backlash he ran into from folks who really did *not
  understand* the core of the issue nor were providing any alternative /
  guidance. I've gone through his changes and dropped the patches which
  dropped the module license tags where an SPDX license tag was missing,
  it only consisted of 11 drivers. To see if a pull request deals with a
  file which lacks SPDX tags you can just use:

    ./scripts/spdxcheck.py -f \
	$(git diff --name-only commid-id | xargs echo)

  You'll see a core module file in this pull request for the above, but
  that's not related to his changes. WE just need to add the SPDX
  license tag for the kernel/module/kmod.c file in the future but it
  demonstrates the effectiveness of the script.

  Most of Nick's changes were spread out through different trees, and I
  just picked up the slack after rc3 for the last kernel was out. Those
  changes have been in linux-next for over two weeks.

  The cleanups, debug code I added and final fix I added for modules
  were motivated by David Hildenbrand's report of boot failing on a
  systems with over 400 CPUs when KASAN was enabled due to running out
  of virtual memory space. Although the functional change only consists
  of 3 lines in the patch "module: avoid allocation if module is already
  present and ready", proving that this was the best we can do on the
  modules side took quite a bit of effort and new debug code.

  The initial cleanups I did on the modules side of things has been in
  linux-next since around rc3 of the last kernel, the actual final fix
  for and debug code however have only been in linux-next for about a
  week or so but I think it is worth getting that code in for this merge
  window as it does help fix / prove / evaluate the issues reported with
  larger number of CPUs. Userspace is not yet fixed as it is taking a
  bit of time for folks to understand the crux of the issue and find a
  proper resolution. Worst come to worst, I have a kludge-of-concept [3]
  of how to make kernel_read*() calls for modules unique / converge
  them, but I'm currently inclined to just see if userspace can fix this
  instead"

Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0]
Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1]
Link: https://lwn.net/Articles/927569/ [2]
Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3]

* tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits)
  module: add debugging auto-load duplicate module support
  module: stats: fix invalid_mod_bytes typo
  module: remove use of uninitialized variable len
  module: fix building stats for 32-bit targets
  module: stats: include uapi/linux/module.h
  module: avoid allocation if module is already present and ready
  module: add debug stats to help identify memory pressure
  module: extract patient module check into helper
  modules/kmod: replace implementation with a semaphore
  Change DEFINE_SEMAPHORE() to take a number argument
  module: fix kmemleak annotations for non init ELF sections
  module: Ignore L0 and rename is_arm_mapping_symbol()
  module: Move is_arm_mapping_symbol() to module_symbol.h
  module: Sync code of is_arm_mapping_symbol()
  scripts/gdb: use mem instead of core_layout to get the module address
  interconnect: remove module-related code
  interconnect: remove MODULE_LICENSE in non-modules
  zswap: remove MODULE_LICENSE in non-modules
  zpool: remove MODULE_LICENSE in non-modules
  x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules
  ...
2023-04-27 16:36:55 -07:00
Linus Torvalds 556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firwmare loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds df45da57cb arm64 updates for 6.4
ACPI:
 	* Improve error reporting when failing to manage SDEI on AGDI device
 	  removal
 
 Assembly routines:
 	* Improve register constraints so that the compiler can make use of
 	  the zero register instead of moving an immediate #0 into a GPR
 
 	* Allow the compiler to allocate the registers used for CAS
 	  instructions
 
 CPU features and system registers:
 	* Cleanups to the way in which CPU features are identified from the
 	  ID register fields
 
 	* Extend system register definition generation to handle Enum types
 	  when defining shared register fields
 
 	* Generate definitions for new _EL2 registers and add new fields
 	  for ID_AA64PFR1_EL1
 
 	* Allow SVE to be disabled separately from SME on the kernel
 	  command-line
 
 Tracing:
 	* Support for "direct calls" in ftrace, which enables BPF tracing
 	  for arm64
 
 Kdump:
 	* Don't bother unmapping the crashkernel from the linear mapping,
 	  which then allows us to use huge (block) mappings and reduce
 	  TLB pressure when a crashkernel is loaded.
 
 Memory management:
 	* Try again to remove data cache invalidation from the coherent DMA
 	  allocation path
 
 	* Simplify the fixmap code by mapping at page granularity
 
 	* Allow the kfence pool to be allocated early, preventing the rest
 	  of the linear mapping from being forced to page granularity
 
 Perf and PMU:
 	* Move CPU PMU code out to drivers/perf/ where it can be reused
 	  by the 32-bit ARM architecture when running on ARMv8 CPUs
 
 	* Fix race between CPU PMU probing and pKVM host de-privilege
 
 	* Add support for Apple M2 CPU PMU
 
 	* Adjust the generic PERF_COUNT_HW_BRANCH_INSTRUCTIONS event
 	  dynamically, depending on what the CPU actually supports
 
 	* Minor fixes and cleanups to system PMU drivers
 
 Stack tracing:
 	* Use the XPACLRI instruction to strip PAC from pointers, rather
 	  than rolling our own function in C
 
 	* Remove redundant PAC removal for toolchains that handle this in
 	  their builtins
 
 	* Make backtracing more resilient in the face of instrumentation
 
 Miscellaneous:
 	* Fix single-step with KGDB
 
 	* Remove harmless warning when 'nokaslr' is passed on the kernel
 	  command-line
 
 	* Minor fixes and cleanups across the board
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmRChcwQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNCgBCADFvkYY9ESztSnd3EpiMbbAzgRCQBiA5H7U
 F2Wc+hIWgeAeUEttSH22+F16r6Jb0gbaDvsuhtN2W/rwQhKNbCU0MaUME05MPmg2
 AOp+RZb2vdT5i5S5dC6ZM6G3T6u9O78LBWv2JWBdd6RIybamEn+RL00ep2WAduH7
 n1FgTbsKgnbScD2qd4K1ejZ1W/BQMwYulkNpyTsmCIijXM12lkzFlxWnMtky3uhR
 POpawcIZzXvWI02QAX+SIdynGChQV3VP+dh9GuFbt7ASigDEhgunvfUYhZNSaqf4
 +/q0O8toCtmQJBUhF0DEDSB5T8SOz5v9CKxKuwfaX6Trq0ixFQpZ
 =78L9
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "ACPI:

   - Improve error reporting when failing to manage SDEI on AGDI device
     removal

  Assembly routines:

   - Improve register constraints so that the compiler can make use of
     the zero register instead of moving an immediate #0 into a GPR

   - Allow the compiler to allocate the registers used for CAS
     instructions

  CPU features and system registers:

   - Cleanups to the way in which CPU features are identified from the
     ID register fields

   - Extend system register definition generation to handle Enum types
     when defining shared register fields

   - Generate definitions for new _EL2 registers and add new fields for
     ID_AA64PFR1_EL1

   - Allow SVE to be disabled separately from SME on the kernel
     command-line

  Tracing:

   - Support for "direct calls" in ftrace, which enables BPF tracing for
     arm64

  Kdump:

   - Don't bother unmapping the crashkernel from the linear mapping,
     which then allows us to use huge (block) mappings and reduce TLB
     pressure when a crashkernel is loaded.

  Memory management:

   - Try again to remove data cache invalidation from the coherent DMA
     allocation path

   - Simplify the fixmap code by mapping at page granularity

   - Allow the kfence pool to be allocated early, preventing the rest of
     the linear mapping from being forced to page granularity

  Perf and PMU:

   - Move CPU PMU code out to drivers/perf/ where it can be reused by
     the 32-bit ARM architecture when running on ARMv8 CPUs

   - Fix race between CPU PMU probing and pKVM host de-privilege

   - Add support for Apple M2 CPU PMU

   - Adjust the generic PERF_COUNT_HW_BRANCH_INSTRUCTIONS event
     dynamically, depending on what the CPU actually supports

   - Minor fixes and cleanups to system PMU drivers

  Stack tracing:

   - Use the XPACLRI instruction to strip PAC from pointers, rather than
     rolling our own function in C

   - Remove redundant PAC removal for toolchains that handle this in
     their builtins

   - Make backtracing more resilient in the face of instrumentation

  Miscellaneous:

   - Fix single-step with KGDB

   - Remove harmless warning when 'nokaslr' is passed on the kernel
     command-line

   - Minor fixes and cleanups across the board"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
  KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege
  arm64: kexec: include reboot.h
  arm64: delete dead code in this_cpu_set_vectors()
  arm64/cpufeature: Use helper macro to specify ID register for capabilites
  drivers/perf: hisi: add NULL check for name
  drivers/perf: hisi: Remove redundant initialized of pmu->name
  arm64/cpufeature: Consistently use symbolic constants for min_field_value
  arm64/cpufeature: Pull out helper for CPUID register definitions
  arm64/sysreg: Convert HFGITR_EL2 to automatic generation
  ACPI: AGDI: Improve error reporting for problems during .remove()
  arm64: kernel: Fix kernel warning when nokaslr is passed to commandline
  perf/arm-cmn: Fix port detection for CMN-700
  arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
  arm64: move PAC masks to <asm/pointer_auth.h>
  arm64: use XPACLRI to strip PAC
  arm64: avoid redundant PAC stripping in __builtin_return_address()
  arm64/sme: Fix some comments of ARM SME
  arm64/signal: Alloc tpidr2 sigframe after checking system_supports_tpidr2()
  arm64/signal: Use system_supports_tpidr2() to check TPIDR2
  arm64/idreg: Don't disable SME when disabling SVE
  ...
2023-04-25 12:39:01 -07:00
Linus Torvalds de10553fce x86 APIC updates:
- Fix the incorrect handling of atomic offset updates in
    reserve_eilvt_offset()
 
    The check for the return value of atomic_cmpxchg() is not compared
    against the old value, it is compared against the new value, which
    makes it two round on success.
 
    Convert it to atomic_try_cmpxchg() which does the right thing.
 
  - Handle IO/APIC less systems correctly
 
    When IO/APIC is not advertised by ACPI then the computation of the lower
    bound for dynamically allocated interrupts like MSI goes wrong.
 
    This lower bound is used to exclude the IO/APIC legacy GSI space as that
    must stay reserved for the legacy interrupts.
 
    In case that the system, e.g. VM, does not advertise an IO/APIC the
    lower bound stays at 0.
 
    0 is an invalid interrupt number except for the legacy timer interrupt
    on x86. The return value is unchecked in the core code, so it ends up
    to allocate interrupt number 0 which is subsequently considered to be
    invalid by the caller, e.g. the MSI allocation code.
 
    A similar problem was already cured for device tree based systems years
    ago, but that missed - or did not envision - the zero IO/APIC case.
 
    Consolidate the zero check and return the provided "from" argument to the
    core code call site, which is guaranteed to be greater than 0.
 
  - Simplify the X2APIC cluster CPU mask logic for CPU hotplug
 
    Per cluster CPU masks are required for X2APIC in cluster mode to
    determine the correct cluster for a target CPU when calculating the
    destination for IPIs
 
    These masks are established when CPUs are borught up. The first CPU in a
    cluster must allocate a new cluster CPU mask. As this happens during the
    early startup of a CPU, where memory allocations cannot be done, the
    mask has to be allocated by the control CPU.
 
    The current implementation allocates a clustermask just in case and if
    the to be brought up CPU is the first in a cluster the CPU takes over
    this allocation from a global pointer.
 
    This works nicely in the fully serialized CPU bringup scenario which is
    used today, but would fail completely for parallel bringup of CPUs.
 
    The cluster association of a CPU can be computed from the APIC ID which
    is enumerated by ACPI/MADT.
 
    So the cluster CPU masks can be preallocated and associated upfront and
    the upcoming CPUs just need to set their corresponding bit.
 
    Aside of preparing for parallel bringup this is a valuable
    simplification on its own.
 
  - Remove global variables which control the early startup of secondary
    CPUs on 64-bit
 
    The only information which is needed by a starting CPU is the Linux CPU
    number. The CPU number allows it to retrieve the rest of the required
    data from already existing per CPU storage.
 
    So instead of initial_stack, early_gdt_desciptor and initial_gs provide
    a new variable smpboot_control which contains the Linux CPU number for
    now. The starting CPU can retrieve and compute all required information
    for startup from there.
 
    Aside of being a cleanup, this is also preparing for parallel CPU
    bringup, where starting CPUs will look up their Linux CPU number via the
    APIC ID, when smpboot_control has the corresponding control bit set.
 
  - Make cc_vendor globally accesible
 
    Subsequent parallel bringup changes require access to cc_vendor because
    confidental computing platforms need special treatment in the early
    startup phase vs. CPUID and APCI ID readouts.
 
    The change makes cc_vendor global and provides stub accessors in case
    that CONFIG_ARCH_HAS_CC_PLATFORM is not set.
 
    This was merged from the x86/cc branch in anticipation of further
    parallel bringup commits which require access to cc_vendor. Due to late
    discoveries of fundamental issue with those patches these commits never
    happened.
 
    The merge commit is unfortunately in the middle of the APIC commits so
    unraveling it would have required a rebase or revert. As the parallel
    bringup seems to be well on its way for 6.5 this would be just pointless
    churn. As the commit does not contain any functional change it's not a
    risk to keep it.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGuAwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRzSEADEx1sVkd2yrLcTYdpjdKbbUaDJ6lR0
 DXxIP3+ApGHmV9l9yIh+/5C2oEJsiUfFf1vdh6ajv5iXpksCKzcUzkW5g3w7nM36
 CSpULpFjwvaq8TIo0o1PIhAbo/yIMMzJVDs8R0reCnWgGAWZoW/a9Ndcvcicd0an
 pQAlkw3FD5r92mcMlKPNWFoui1AkScGEV02zJ7884MAukmBZwD8Jd+gE6eQC9GKa
 9hyJiB77st1URl+a0cPsPYvv8RLVuVcljWsh2edyvxgovIO56+BoEjbrgRSF6cqQ
 Bhzo//3KgbUJ1y+YqH01aKZzY0hRpbAi2Rew4RBKcBKwCGd2qltUQG0LFNxAtV83
 RsC573wSCGSCGO5Xb1RVXih5is+9YqMqitJNWvEc15jjOA9nwoLc80axP11v42f9
 Xl4iGHQTWVGdxT4H22NH7UCuRlGg38vAx+In2HGpN/e57q2ighESjiGuqQAQpLel
 pbOeJtQ/D2xXVKcCap4T/P/2x5ls7bsc76MWJBMcYC3pRgJ5M7ZHw7wTw0IAty4x
 xCfR1bsRVEAhrE9r/odgNipXjBJu+CdGBAupNEIiRyq1QiwUKtMTayasRGUlbYO6
 vrieHKqoflzRVg2M9Bgm3oI28X27FzZHWAZJW2oJ2Wnn2jL5kuRJa1nEykqo8pEP
 j6rjnScRVvdpIw==
 =IQWG
 -----END PGP SIGNATURE-----

Merge tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 APIC updates from Thomas Gleixner:

 - Fix the incorrect handling of atomic offset updates in
   reserve_eilvt_offset()

   The check for the return value of atomic_cmpxchg() is not compared
   against the old value, it is compared against the new value, which
   makes it two round on success.

   Convert it to atomic_try_cmpxchg() which does the right thing.

 - Handle IO/APIC less systems correctly

   When IO/APIC is not advertised by ACPI then the computation of the
   lower bound for dynamically allocated interrupts like MSI goes wrong.

   This lower bound is used to exclude the IO/APIC legacy GSI space as
   that must stay reserved for the legacy interrupts.

   In case that the system, e.g. VM, does not advertise an IO/APIC the
   lower bound stays at 0.

   0 is an invalid interrupt number except for the legacy timer
   interrupt on x86. The return value is unchecked in the core code, so
   it ends up to allocate interrupt number 0 which is subsequently
   considered to be invalid by the caller, e.g. the MSI allocation code.

   A similar problem was already cured for device tree based systems
   years ago, but that missed - or did not envision - the zero IO/APIC
   case.

   Consolidate the zero check and return the provided "from" argument to
   the core code call site, which is guaranteed to be greater than 0.

 - Simplify the X2APIC cluster CPU mask logic for CPU hotplug

   Per cluster CPU masks are required for X2APIC in cluster mode to
   determine the correct cluster for a target CPU when calculating the
   destination for IPIs

   These masks are established when CPUs are borught up. The first CPU
   in a cluster must allocate a new cluster CPU mask. As this happens
   during the early startup of a CPU, where memory allocations cannot be
   done, the mask has to be allocated by the control CPU.

   The current implementation allocates a clustermask just in case and
   if the to be brought up CPU is the first in a cluster the CPU takes
   over this allocation from a global pointer.

   This works nicely in the fully serialized CPU bringup scenario which
   is used today, but would fail completely for parallel bringup of
   CPUs.

   The cluster association of a CPU can be computed from the APIC ID
   which is enumerated by ACPI/MADT.

   So the cluster CPU masks can be preallocated and associated upfront
   and the upcoming CPUs just need to set their corresponding bit.

   Aside of preparing for parallel bringup this is a valuable
   simplification on its own.

 - Remove global variables which control the early startup of secondary
   CPUs on 64-bit

   The only information which is needed by a starting CPU is the Linux
   CPU number. The CPU number allows it to retrieve the rest of the
   required data from already existing per CPU storage.

   So instead of initial_stack, early_gdt_desciptor and initial_gs
   provide a new variable smpboot_control which contains the Linux CPU
   number for now. The starting CPU can retrieve and compute all
   required information for startup from there.

   Aside of being a cleanup, this is also preparing for parallel CPU
   bringup, where starting CPUs will look up their Linux CPU number via
   the APIC ID, when smpboot_control has the corresponding control bit
   set.

 - Make cc_vendor globally accesible

   Subsequent parallel bringup changes require access to cc_vendor
   because confidental computing platforms need special treatment in the
   early startup phase vs. CPUID and APCI ID readouts.

   The change makes cc_vendor global and provides stub accessors in case
   that CONFIG_ARCH_HAS_CC_PLATFORM is not set.

   This was merged from the x86/cc branch in anticipation of further
   parallel bringup commits which require access to cc_vendor. Due to
   late discoveries of fundamental issue with those patches these
   commits never happened.

   The merge commit is unfortunately in the middle of the APIC commits
   so unraveling it would have required a rebase or revert. As the
   parallel bringup seems to be well on its way for 6.5 this would be
   just pointless churn. As the commit does not contain any functional
   change it's not a risk to keep it.

* tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
  x86/apic: Fix atomic update of offset in reserve_eilvt_offset()
  x86/coco: Export cc_vendor
  x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
  x86/smpboot: Remove initial_gs
  x86/smpboot: Remove early_gdt_descr on 64-bit
  x86/smpboot: Remove initial_stack on 64-bit
  x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
2023-04-25 11:39:45 -07:00
Linus Torvalds bc1bb2a49b - Add the necessary glue so that the kernel can run as a confidential
SEV-SNP vTOM guest on Hyper-V. A vTOM guest basically splits the
   address space in two parts: encrypted and unencrypted. The use case
   being running unmodified guests on the Hyper-V confidential computing
   hypervisor
 
 - Double-buffer messages between the guest and the hardware PSP device
   so that no partial buffers are copied back'n'forth and thus potential
   message integrity and leak attacks are possible
 
 - Name the return value the sev-guest driver returns when the hw PSP
   device hasn't been called, explicitly
 
 - Cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRGl8gACgkQEsHwGGHe
 VUoEDhAAiw4+2nZR7XUJ7pewlXG7AJJZsVIpzzcF6Gyymn0LFCyMnP7O3snmFqzz
 aik0q2LzWrmDQ3Nmmzul0wtdsuW7Nik6BP9oF3WnB911+gGbpXyNWZ8EhOPNzkUR
 9D8Sp6f0xmqNE3YuzEpanufiDswgUxi++DRdmIRAs1TTh4bfUFWZcib1pdwoqSmR
 oS3UfVwVZ4Ee2Qm1f3n3XQ0FUpsjWeARPExUkLEvd8XeonTP+6aGAdggg9MnPcsl
 3zpSmOpuZ6VQbDrHxo3BH9HFuIUOd6S9PO++b9F6WxNPGEMk7fHa7ahOA6HjhgVz
 5Da3BN16OS9j64cZsYHMPsBcd+ja1YmvvZGypsY0d6X4d3M1zTPW+XeLbyb+VFBy
 SvA7z+JuxtLKVpju65sNiJWw8ZDTSu+eEYNDeeGLvAj3bxtclJjcPdMEPdzxmC5K
 eAhmRmiFuVM4nXMAR6cspVTsxvlTHFtd5gdm6RlRnvd7aV77Zl1CLzTy8IHTVpvI
 t7XTbtjEjYc0pI6cXXptHEOnBLjXUMPcqgGFgJYEauH6EvrxoWszUZD0tS3Hw80A
 K+Rwnc70ubq/PsgZcF4Ayer1j49z1NPfk5D4EA7/ChN6iNhQA8OqHT1UBrHAgqls
 2UAwzE2sQZnjDvGZghlOtFIQUIhwue7m93DaRi19EOdKYxVjV6U=
 =ZAw9
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Add the necessary glue so that the kernel can run as a confidential
   SEV-SNP vTOM guest on Hyper-V. A vTOM guest basically splits the
   address space in two parts: encrypted and unencrypted. The use case
   being running unmodified guests on the Hyper-V confidential computing
   hypervisor

 - Double-buffer messages between the guest and the hardware PSP device
   so that no partial buffers are copied back'n'forth and thus potential
   message integrity and leak attacks are possible

 - Name the return value the sev-guest driver returns when the hw PSP
   device hasn't been called, explicitly

 - Cleanups

* tag 'x86_sev_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyperv: Change vTOM handling to use standard coco mechanisms
  init: Call mem_encrypt_init() after Hyper-V hypercall init is done
  x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
  Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
  x86/hyperv: Reorder code to facilitate future work
  x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
  x86/sev: Change snp_guest_issue_request()'s fw_err argument
  virt/coco/sev-guest: Double-buffer messages
  crypto: ccp: Get rid of __sev_platform_init_locked()'s local function pointer
  crypto: ccp - Name -1 return value as SEV_RET_NO_FW_CALL
2023-04-25 10:48:08 -07:00
Linus Torvalds c42b59bfaa - Convert a couple of paravirt callbacks to asm to prevent
-fzero-call-used-regs builds from zeroing live registers because
   paravirt hides the CALLs from the compiler so latter doesn't know
   there's a CALL in the first place
 
 - Merge two paravirt callbacks into one, as their functionality is
   identical
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRGjhwACgkQEsHwGGHe
 VUpRUA//QduAqsDoIJ1U/y7JxbLriFR9X9rCY4u5pq2ZSroQZ861eBeUCJnlc+MQ
 +zlAsmGkLpb9b5be85vMByz++rO4ZTfBVamqJORD9Zj99RY0F7ym/HYXK4CP6J+E
 IrgMTTLPd8kMH/5Pb/tiNXOuKnNw99MdKhE5CPKGnvtM7eSzLrIuN9sHAUb9SMuV
 l+TWYh4vkf8+XfzSpp5WYaCgyDBN8tMWD3cBeLTljT3OEOh9vIQYWjRliKQyxjWG
 FJ8BnL8Nx+3kDkRjHyK4/h0P0KQYB6hnRSOrZyaae2H3N7uSMQbcLuRC6aXz1amm
 9AKoubhzx/A5hwGx8jKtGuLCkEtSakdcbiF0l3gek3Auecxcg6x8W+cCNvpq8FGV
 DJ349RPqR7TlKJwyvPp7dHRozVrY2sdbWZILxLhKDvAoOR4F927dt9+A96glc5dP
 VTnrlptj1vX+dSkKgKRTmPUKbsXM2h003qTiAUVzjMP0PcKUKknpBhz7kLQ3gpFc
 7rxyjHWANQJpY39WHvuIv+pzVUodrUGioA1LcEisx8FCM/iAIoejLi+ybbRMyc/2
 NN3TMxoEl3RIQCOFgsM8NxAvOL9P6+82NiM+0v0TgzszMlso7RzbjBeaaWRtxX+O
 82p9mTLDQuxESkA0HEwoTQa/xfO51zCi+SeLfhFO6A4s93Sjjb0=
 =Eu5f
 -----END PGP SIGNATURE-----

Merge tag 'x86_paravirt_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 paravirt updates from Borislav Petkov:

 - Convert a couple of paravirt callbacks to asm to prevent
   '-fzero-call-used-regs' builds from zeroing live registers because
   paravirt hides the CALLs from the compiler so latter doesn't know
   there's a CALL in the first place

 - Merge two paravirt callbacks into one, as their functionality is
   identical

* tag 'x86_paravirt_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/paravirt: Convert simple paravirt functions to asm
  x86/paravirt: Merge activate_mm() and dup_mmap() callbacks
2023-04-25 10:32:51 -07:00
Linus Torvalds e3420f98f8 - Add Emerald Rapids to the list of Intel models supporting PPIN
- Finally use a CPUID bit for split lock detection instead of
   enumerating every model
 
 - Make sure automatic IBRS is set on AMD, even though the AP bringup
   code does that now by replicating the MSR which contains the switch
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRGiUkACgkQEsHwGGHe
 VUrjgw/7BnRvmgdSJg//TwlCnbnYCHbUzPbCfnMK8W6C5OvoRR+VYxeu3DoI/dsx
 xW2lMR/Svf30orB3EQTnpOBNa3PPbDlQvqInM+bQ/TYb5F6yIAnRkQhD9OaIQkeM
 CwX68pPcEPXCY+Ds2RmV6K2UvzIG5vVeYg6O36FVYUvON1tHFadEAT//lAMVspOs
 HBbhEOpu6/zHoKr53cduT2P9i7SAjCIjPRSMpuIfCd3RNcjwqWEXCyXxNad6LrTc
 Nd+xNjUpcRecl2bR41bIrpTfGGaU2XOJI2GiFfH/mBP8WNSP4Npp3LQVI35bDwLP
 VYr2IRGySxTerLSV2v8UwBSYw/hVltq5TkHyqjNaQB5JKbhxnH67GLV2LeOxawGz
 OogxcPF7RrVmr/c3ji4FE/QQlTbHczIRaSjNOYupHNNcQP5NrxVHWCNZRKX8ljh1
 Ah1G3s5vEVigzgqnMX8ey4xBpMtL4bilT2mMwh5hY2XMY3QjgrXLg+73VkvBkM6Y
 MjreNrUoGSC7Qw39rXtUfgRBMCB16CfFSsxPS4Isu6JwlNpJzOOifVdTdE4flOrd
 HR0ac776WKO9KJrPvnxYNf5mHRWkUWPS7t04BvkHuzOzxQQHz51A0xwh7td0kZA9
 vozSbxKE91sH0XD73x/H/EA/TGpWwYq7DQIYJOxCu1juq1ku7lM=
 =QquZ
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu model updates from Borislav Petkov:

 - Add Emerald Rapids to the list of Intel models supporting PPIN

 - Finally use a CPUID bit for split lock detection instead of
   enumerating every model

 - Make sure automatic IBRS is set on AMD, even though the AP bringup
   code does that now by replicating the MSR which contains the switch

* tag 'x86_cpu_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Xeon Emerald Rapids to list of CPUs that support PPIN
  x86/split_lock: Enumerate architectural split lock disable bit
  x86/CPU/AMD: Make sure EFER[AIBRSE] is set
2023-04-25 10:20:52 -07:00
Linus Torvalds 1699dbebf3 - Improve code generation in ACPI's global lock's acquisition function
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRGhkcACgkQEsHwGGHe
 VUq2yQ//W1tCGznJ+xSDBwOPillqwPB9asMrIjN9jSMet6mNZlApu4N44BTMHnD7
 Jex+MOfabSbVEgwJ+zUOla065/fWhO2kdPNLdOS9Pui3kftEnMkKwZMbx0v6erk0
 3rckg8UVkfMSrskvWkyy1RftJY9D5xiCTVzRmRpIR8GWn6eCFrGdLLINbLzSX5K/
 96cXF5iiTkWzj+nKAJhqsQZaXA+k2ro/bae7onX1gzJ08lR5x+XDDKv1HzJhnPJW
 Kw9lte4con7FRAEk4X8Q2UPTBvQjtFlfG7F1ho1jayWVwUO7z020tomnL0BO7Umd
 Whh6j66ZTD3cV+0oG5xu/JIBgHLlLP3q/+gZyePSDIXmoZm8MKmWldQ84Xbv0h8F
 dRxQUxxDAoRypK6ZN6aZgA/twxeIu6ezYr3paQiX9sjtQzFEa9DKTsOnzgrO0fsi
 t9nu4nzE4jBQ41DP6+GVlZ+ATujTixmirHNUj72H/sPVnqmniRLKMPGnsMPo9yi+
 +J+Yu382nuKl8O5koWC7zfUEZCQWZCPfvX7JCLir6YZl6qV36mPZGJknMX+ogquc
 CwDor5US71sExmEOSkQTog5YN7ldXKruWRsvuIl2yMH3xHWwCWw0VS0pHCAn3Fqe
 r/MdKXYjjDr1QOZWB9EyqrAUNbS2ftjI9/zrzq8wNsbT67gEHug=
 =ObsH
 -----END PGP SIGNATURE-----

Merge tag 'x86_acpi_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 ACPI update from Borislav Petkov:

 - Improve code generation in ACPI's global lock's acquisition function

* tag 'x86_acpi_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ACPI/boot: Improve __acpi_acquire_global_lock
2023-04-25 10:05:00 -07:00
Linus Torvalds d3464152e5 - Just cleanups and fixes this time around: make threshold_ktype const,
an objtool fix and use proper size for a bitmap
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRGhG8ACgkQEsHwGGHe
 VUr9+xAAs5D7hCYfKVsdNiiWh+pBRnrnOjfCcRb0cvIIyE4DKvLbGJoTjsRN7WuT
 1ExnnjGL9CJHyqnJQj8/M0AdNgh28fOpJInzE3k7cDckZHQQp4cYDLQ2x9uAWvVl
 lNmOKTmVXq97hZw6maSTm4iFWDTzbdLveIETjlVWvjWVomkm9KI6/3HZN2qjzxJP
 IeCZ20lpZAg94/rMf6FqhuY1gaSa2yeXqz+wO19A5LBNhRHm60bFS2h48GiACxsV
 JD5jDPVsTozAGxyNxKe1DerzH4NQCBax9bzjW7TvAGqNLPamLPg5npGdrAg9SdD8
 yQ6F9TiUSU1jJfh3NqA7TxOcCtSr36xUrDJaOiMnVr68qi6kBnFsQ+Hxx3NwvCsU
 6304wESm+j1rJ1DwtKOrguIVZ+nI+s6I/ubki4wjxa7zZZqZ7/daNM/3j9Wl7Vng
 pd/augpcPR5+FNmU2Zq47ZK3kgqxjpEFpByjOChYclHGWZ4Jk717K7kf7CD424WM
 VU590ffXLQCN/pcPkDo4Rxj5LVkaXqocWfOfr5uB0XIPjP+wjsjtJF+Mi/phO23O
 dWFzI00GJZqKMehV07eCKaGoxVko9P/FxG8WdxLfw4BfmaOQGBB0O4m5tRvyPdjB
 Ezm0ZUbuLl3zmHmfM1ZCPRkZ532I/IAF88VcIygmYRUr78w6mfw=
 =e2hz
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Just cleanups and fixes this time around: make threshold_ktype const,
   an objtool fix and use proper size for a bitmap

* tag 'ras_core_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE/AMD: Use an u64 for bank_map
  x86/mce: Always inline old MCA stubs
  x86/MCE/AMD: Make kobj_type structure constant
2023-04-25 09:56:33 -07:00
Linus Torvalds ef36b9afc2 fget() to fdget() conversions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZEYCQAAKCRBZ7Krx/gZQ
 64FdAQDZ2hTDyZEWPt486dWYPYpiKyaGFXSXDGo7wgP0fiwxXQEA/mROKb6JqYw6
 27mZ9A7qluT8r3AfTTQ0D+Yse/dr4AM=
 =GA9W
 -----END PGP SIGNATURE-----

Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fget updates from Al Viro:
 "fget() to fdget() conversions"

* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fuse_dev_ioctl(): switch to fdget()
  cgroup_get_from_fd(): switch to fdget_raw()
  bpf: switch to fdget_raw()
  build_mount_idmapped(): switch to fdget()
  kill the last remaining user of proc_ns_fget()
  SVM-SEV: convert the rest of fget() uses to fdget() in there
  convert sgx_set_attribute() to fdget()/fdput()
  convert setns(2) to fdget()/fdput()
2023-04-24 19:14:20 -07:00
Linus Torvalds c23f28975a Commit volume in documentation is relatively low this time, but there is
still a fair amount going on, including:
 
 - Reorganizing the architecture-specific documentation under
   Documentation/arch.  This makes the structure match the source directory
   and helps to clean up the mess that is the top-level Documentation
   directory a bit.  This work creates the new directory and moves x86 and
   most of the less-active architectures there.  The current plan is to move
   the rest of the architectures in 6.5, with the patches going through the
   appropriate subsystem trees.
 
 - Some more Spanish translations and maintenance of the Italian
   translation.
 
 - A new "Kernel contribution maturity model" document from Ted.
 
 - A new tutorial on quickly building a trimmed kernel from Thorsten.
 
 Plus the usual set of updates and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmRGze0PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Y/VsH/RyWqinorRVFZmHqRJMRhR0j7hE2pAgK5prE
 dGXYVtHHNQ+25thNaqhZTOLYFbSX6ii2NG7sLRXmyOTGIZrhUCFFXCHkuq4ZUypR
 gJpMUiKQVT4dhln3gIZ0k09NSr60gz8UTcq895N9UFpUdY1SCDhbCcLc4uXTRajq
 NrdgFaHWRkPb+gBRbXOExYm75DmCC6Ny5AyGo2rXfItV//ETjWIJVQpJhlxKrpMZ
 3LgpdYSLhEFFnFGnXJ+EAPJ7gXDi2Tg5DuPbkvJyFOTouF3j4h8lSS9l+refMljN
 xNRessv+boge/JAQidS6u8F2m2ESSqSxisv/0irgtKIMJwXaoX4=
 =1//8
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.4' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Commit volume in documentation is relatively low this time, but there
  is still a fair amount going on, including:

   - Reorganize the architecture-specific documentation under
     Documentation/arch

     This makes the structure match the source directory and helps to
     clean up the mess that is the top-level Documentation directory a
     bit. This work creates the new directory and moves x86 and most of
     the less-active architectures there.

     The current plan is to move the rest of the architectures in 6.5,
     with the patches going through the appropriate subsystem trees.

   - Some more Spanish translations and maintenance of the Italian
     translation

   - A new "Kernel contribution maturity model" document from Ted

   - A new tutorial on quickly building a trimmed kernel from Thorsten

  Plus the usual set of updates and fixes"

* tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits)
  media: Adjust column width for pdfdocs
  media: Fix building pdfdocs
  docs: clk: add documentation to log which clocks have been disabled
  docs: trace: Fix typo in ftrace.rst
  Documentation/process: always CC responsible lists
  docs: kmemleak: adjust to config renaming
  ELF: document some de-facto PT_* ABI quirks
  Documentation: arm: remove stih415/stih416 related entries
  docs: turn off "smart quotes" in the HTML build
  Documentation: firmware: Clarify firmware path usage
  docs/mm: Physical Memory: Fix grammar
  Documentation: Add document for false sharing
  dma-api-howto: typo fix
  docs: move m68k architecture documentation under Documentation/arch/
  docs: move parisc documentation under Documentation/arch/
  docs: move ia64 architecture docs under Documentation/arch/
  docs: Move arc architecture docs under Documentation/arch/
  docs: move nios2 documentation under Documentation/arch/
  docs: move openrisc documentation under Documentation/arch/
  docs: move superh documentation under Documentation/arch/
  ...
2023-04-24 12:35:49 -07:00
Al Viro e73d43760a convert sgx_set_attribute() to fdget()/fdput()
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-04-20 22:55:35 -04:00
Linus Torvalds e046fe5a36 x86: set FSRS automatically on AMD CPUs that have FSRM
So Intel introduced the FSRS ("Fast Short REP STOS") CPU capability bit,
because they seem to have done the (much simpler) REP STOS optimizations
separately and later than the REP MOVS one.

In contrast, when AMD introduced support for FSRM ("Fast Short REP
MOVS"), in the Zen 3 core, it appears to have improved the REP STOS case
at the same time, and since the FSRS bit was added by Intel later, it
doesn't show up on those AMD Zen 3 cores.

And now that we made use of FSRS for the "rep stos" conditional, that
made those AMD machines unnecessarily slower.  The Intel situation where
"rep movs" is fast, but "rep stos" isn't, is just odd.  The 'stos' case
is a lot simpler with no aliasing, no mutual alignment issues, no
complicated cases.

So this just sets FSRS automatically when FSRM is available on AMD
machines, to get back all the nice REP STOS goodness in Zen 3.

Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18 17:05:28 -07:00
Peter Zijlstra 48380368de Change DEFINE_SEMAPHORE() to take a number argument
Fundamentally semaphores are a counted primitive, but
DEFINE_SEMAPHORE() does not expose this and explicitly creates a
binary semaphore.

Change DEFINE_SEMAPHORE() to take a number argument and use that in the
few places that open-coded it using __SEMAPHORE_INITIALIZER().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[mcgrof: add some tribal knowledge about why some folks prefer
 binary sempahores over mutexes]
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-18 11:15:24 -07:00
Saurabh Sengar 3be1bc2fe9 x86/hyperv: VTL support for Hyper-V
Virtual Trust Levels (VTL) helps enable Hyper-V Virtual Secure Mode (VSM)
feature. VSM is a set of hypervisor capabilities and enlightenments
offered to host and guest partitions which enable the creation and
management of new security boundaries within operating system software.
VSM achieves and maintains isolation through VTLs.

Add early initialization for Virtual Trust Levels (VTL). This includes
initializing the x86 platform for VTL and enabling boot support for
secondary CPUs to start in targeted VTL context. For now, only enable
the code for targeted VTL level as 2.

When starting an AP at a VTL other than VTL0, the AP must start directly
in 64-bit mode, bypassing the usual 16-bit -> 32-bit -> 64-bit mode
transition sequence that occurs after waking up an AP with SIPI whose
vector points to the 16-bit AP startup trampoline code.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Link: https://lore.kernel.org/r/1681192532-15460-6-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-18 17:29:52 +00:00
Saurabh Sengar 0a7a00580a x86/hyperv: Make hv_get_nmi_reason public
Move hv_get_nmi_reason to .h file so it can be used in other
modules as well.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1681192532-15460-4-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-18 17:29:52 +00:00
Saurabh Sengar d21a19e1c2 x86/init: Make get/set_rtc_noop() public
Make get/set_rtc_noop() to be public so that they can be used
in other modules as well.

Co-developed-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1681192532-15460-2-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-18 17:29:51 +00:00
Michael Kelley 0459ff4873 swiotlb: Remove bounce buffer remapping for Hyper-V
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().

As such, remove swiotlb_unencrypted_base and the associated
code.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1679838727-87310-8-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17 19:19:04 +00:00
Wei Liu 21eb596fce Merge remote-tracking branch 'tip/x86/sev' into hyperv-next
Merge the following 6 patches from tip/x86/sev, which are taken from
Michael Kelley's series [0]. The rest of Michael's series depend on
them.

  x86/hyperv: Change vTOM handling to use standard coco mechanisms
  init: Call mem_encrypt_init() after Hyper-V hypercall init is done
  x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
  Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
  x86/hyperv: Reorder code to facilitate future work
  x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM

0: https://lore.kernel.org/linux-hyperv/1679838727-87310-1-git-send-email-mikelley@microsoft.com/
2023-04-17 19:18:13 +00:00
Josh Poimboeuf 52668badd3 x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
Fixes the following warning:

  vmlinux.o: warning: objtool: resume_play_dead+0x21: unreachable instruction

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ce1407c4bf88b1334fe40413126343792a77ca50.1681342859.git.jpoimboe@kernel.org
2023-04-14 17:31:27 +02:00
Josh Poimboeuf 27dea14c7f cpu: Mark nmi_panic_self_stop() __noreturn
In preparation for improving objtool's handling of weak noreturn
functions, mark nmi_panic_self_stop() __noreturn.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/316fc6dfab5a8c4e024c7185484a1ee5fb0afb79.1681342859.git.jpoimboe@kernel.org
2023-04-14 17:31:26 +02:00
Josh Poimboeuf 4208d2d798 x86/head: Mark *_start_kernel() __noreturn
Now that start_kernel() is __noreturn, mark its chain of callers
__noreturn.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/c2525f96b88be98ee027ee0291d58003036d4120.1681342859.git.jpoimboe@kernel.org
2023-04-14 17:31:24 +02:00
Joerg Roedel e51b419839 Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', 'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next 2023-04-14 13:45:50 +02:00
Matija Glavinic Pecotic 775d3c514c x86/rtc: Remove __init for runtime functions
set_rtc_noop(), get_rtc_noop() are after booting, therefore their __init
annotation is wrong.

A crash was observed on an x86 platform where CMOS RTC is unused and
disabled via device tree. set_rtc_noop() was invoked from ntp:
sync_hw_clock(), although CONFIG_RTC_SYSTOHC=n, however sync_cmos_clock()
doesn't honour that.

  Workqueue: events_power_efficient sync_hw_clock
  RIP: 0010:set_rtc_noop
  Call Trace:
   update_persistent_clock64
   sync_hw_clock

Fix this by dropping the __init annotation from set/get_rtc_noop().

Fixes: c311ed6183 ("x86/init: Allow DT configured systems to disable RTC at boot time")
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/59f7ceb1-446b-1d3d-0bc8-1f0ee94b1e18@nokia.com
2023-04-13 14:41:04 +02:00
Saurabh Sengar 5af507bef9 x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
arch_dynirq_lower_bound() is invoked by the core interrupt code to
retrieve the lowest possible Linux interrupt number for dynamically
allocated interrupts like MSI.

The x86 implementation uses this to exclude the IO/APIC GSI space.
This works correctly as long as there is an IO/APIC registered, but
returns 0 if not. This has been observed in VMs where the BIOS does
not advertise an IO/APIC.

0 is an invalid interrupt number except for the legacy timer interrupt
on x86. The return value is unchecked in the core code, so it ends up
to allocate interrupt number 0 which is subsequently considered to be
invalid by the caller, e.g. the MSI allocation code.

The function has already a check for 0 in the case that an IO/APIC is
registered, as ioapic_dynirq_base is 0 in case of device tree setups.

Consolidate this and zero check for both ioapic_dynirq_base and gsi_top,
which is used in the case that no IO/APIC is registered.

Fixes: 3e5bedc2c2 ("x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines")
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1679988604-20308-1-git-send-email-ssengar@linux.microsoft.com
2023-04-12 17:45:50 +02:00
Linus Torvalds 4ba115e269 - Add a new Intel Arrow Lake CPU model number
- Fix a confusion about how to check the version of the ACPI spec which
   supports a "online capable" bit in the MADT table which lead to
   a bunch of boot breakages with Zen1 systems and VMs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQymF8ACgkQEsHwGGHe
 VUr2cQ//YjJTZpeZ0/azfqz3IvUEW9etR8NXFfBOiATjk6BfZPtjZizdwZ2GtFtk
 /P3XPDxa34dW4ifLBCHdGpabYF5OWUj44hB2nyf3PxDSGLt6EjLtnxQm+WtobwWh
 uQ8BBA6QFw0kHPqH7TI5xA/Z4m6AvIBKZpAsG6GG/K3jVLYUgNoCSxplV7OvC0/M
 gobjMLdoofOXvuevgX6K311YiZY4hniVUSSNdOy4oDNwDFSguqVwr3wjVC7Hod8t
 rl5EzNcabhMmy4Hf5OmCcU1WQdM+XS5GbulPW0PMliuvFE+2ImqVyO9G41MG+Gw0
 2qwhTsCA7R/Rm08eJh1ad5DmHdXc7M1ChC5s5ZTU25/9DDvRKxI7cHmIZxOxXCgy
 WJbP6LRw/9M7pBHatDjUTV7DdZnOkdsUK33dKWhiNZCB08EOVhcOVDq7EslMageH
 YhRmjpa2qfDpGOQpzwOZLgOpN3cz713QI7MBGFu0CBtbUR6ZKagQ0R/7As/vxuUo
 t5S8zMc/m+ra6til3MrP9QQvA0FA6Jo6H1DAZmglShtveO0eAGJCWaVbV8xppAIG
 DNGxk96OLJnigH0s0nAMHRQkJYM0yyKO02YkGbjGZhtLP53vDIiBqPD2PN5UJLO3
 U4ucZmmet/iExApEvbTbQ7uWfCEHLLDc3Kidh0fdoTDIC5JKu2U=
 =Nk8H
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a new Intel Arrow Lake CPU model number

 - Fix a confusion about how to check the version of the ACPI spec which
   supports a "online capable" bit in the MADT table which lead to a
   bunch of boot breakages with Zen1 systems and VMs

* tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add model number for Intel Arrow Lake processor
  x86/acpi/boot: Correct acpi_is_processor_usable() check
  x86/ACPI/boot: Use FADT version to check support for online capable
2023-04-09 10:00:16 -07:00
Bjorn Helgaas 7982722ff7 x86/kexec: remove unnecessary arch_kexec_kernel_image_load()
Patch series "kexec: Remove unnecessary arch hook", v2.

There are no arch-specific things in arch_kexec_kernel_image_load(), so
remove it and just use the generic version.


This patch (of 2):

The x86 implementation of arch_kexec_kernel_image_load() is functionally
identical to the generic arch_kexec_kernel_image_load():

  arch_kexec_kernel_image_load                # x86
    if (!image->fops || !image->fops->load)
      return ERR_PTR(-ENOEXEC);
    return image->fops->load(image, image->kernel_buf, ...)

  arch_kexec_kernel_image_load                # generic
    kexec_image_load_default
      if (!image->fops || !image->fops->load)
	return ERR_PTR(-ENOEXEC);
      return image->fops->load(image, image->kernel_buf, ...)

Remove the x86-specific version and use the generic
arch_kexec_kernel_image_load().  No functional change intended.

Link: https://lkml.kernel.org/r/20230307224416.907040-1-helgaas@kernel.org
Link: https://lkml.kernel.org/r/20230307224416.907040-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-08 13:45:38 -07:00
Uros Bizjak f96fb2df3e x86/apic: Fix atomic update of offset in reserve_eilvt_offset()
The detection of atomic update failure in reserve_eilvt_offset() is
not correct. The value returned by atomic_cmpxchg() should be compared
to the old value from the location to be updated.

If these two are the same, then atomic update succeeded and
"eilvt_offsets[offset]" location is updated to "new" in an atomic way.

Otherwise, the atomic update failed and it should be retried with the
value from "eilvt_offsets[offset]" - exactly what atomic_try_cmpxchg()
does in a correct and more optimal way.

Fixes: a68c439b19 ("apic, x86: Check if EILVT APIC registers are available (AMD only)")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230227160917.107820-1-ubizjak@gmail.com
2023-04-07 14:34:24 +02:00
Kirill A. Shutemov 97740266de x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
arch_prctl(ARCH_FORCE_TAGGED_SVA) overrides the default and allows LAM
and SVA to co-exist in the process. It is expected by called by the
process when it knows what it is doing.

arch_prctl() operates on the current process, but the same code is
reachable from ptrace where it can be called on arbitrary task.

Make it strict and only allow to set MM_CONTEXT_FORCE_TAGGED_SVA for the
current process.

Fixes: 23e5d9ec2b ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive")
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/all/20230403111020.3136-3-kirill.shutemov%40linux.intel.com
2023-04-06 13:45:06 -07:00
Kirill A. Shutemov fca1fdd2b0 x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
Normally, LAM and SVA are mutually exclusive. LAM enabling will fail if
SVA is already in use.

Correct error code for the failure. EINTR is nonsensical there.

Fixes: 23e5d9ec2b ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/all/CACT4Y+YfqSMsZArhh25TESmG-U4jO5Hjphz87wKSnTiaw2Wrfw@mail.gmail.com
Link: https://lore.kernel.org/all/20230403111020.3136-2-kirill.shutemov%40linux.intel.com
2023-04-06 13:44:58 -07:00
Tony Luck 36168bc061 x86/cpu: Add Xeon Emerald Rapids to list of CPUs that support PPIN
This should be the last addition to this table. Future CPUs will
enumerate PPIN support using CPUID.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230404212124.428118-1-tony.luck@intel.com
2023-04-05 20:01:52 +02:00
Linus Torvalds 2d72ab2449 hyperv-fixes for 6.3-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmQqIpUTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXjtwCACaG8LkrLOa4EWwdVLOutxc/VSHhPzS
 FaCyzxaSNtFciSl/kOPsl2pmwy+c9QAri3wO9uyJ41R1oUfjy/+pX8TxYc1imOrh
 6vIMUntYW7t9ISoUbi7hDU1Nj3CX4KOXruOliLP3WM9mtGvaNL5INEDh9PV6bxIz
 xlP8JEoKTk0ecChOWZDWyDIE95MwgqRin8uEI0JUyE2mdegIrDC7SFvqT7XjV23O
 0gntPdoZCgBzWohaiRMKJHHNUbAC+1O2+1tzY0bONwHdpmRj5/V28e02iARF3bAE
 4TvTt3qrZU02epzMhkZPnTztyvp1vPzmpHaBHQD4pdNZP/D1b8ejm4mz
 =o+VM
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20230402' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Fix a bug in channel allocation for VMbus (Mohammed Gamal)

 - Do not allow root partition functionality in CVM (Michael Kelley)

* tag 'hyperv-fixes-signed-20230402' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Block root partition functionality in a Confidential VM
  Drivers: vmbus: Check for channel allocation before looking up relids
2023-04-03 09:34:08 -07:00
Greg Kroah-Hartman cd8fe5b6db Merge 6.3-rc5 into driver-core-next
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03 09:33:30 +02:00
Jacob Pan fffaed1e24 iommu/ioasid: Rename INVALID_IOASID
INVALID_IOASID and IOMMU_PASID_INVALID are duplicated. Rename
INVALID_IOASID and consolidate since we are moving away from IOASID
infrastructure.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:27 +02:00
Jonathan Corbet ff61f0791c docs: move x86 documentation into Documentation/arch/
Move the x86 documentation under Documentation/arch/ as a way of cleaning
up the top-level directory and making the structure of our docs more
closely match the structure of the source directories it describes.

All in-kernel references to the old paths have been updated.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-30 12:58:51 -06:00
Eric DeVolder fed8d8773b x86/acpi/boot: Correct acpi_is_processor_usable() check
The logic in acpi_is_processor_usable() requires the online capable
bit be set for hotpluggable CPUs.  The online capable bit has been
introduced in ACPI 6.3.

However, for ACPI revisions < 6.3 which do not support that bit, CPUs
should be reported as usable, not the other way around.

Reverse the check.

  [ bp: Rewrite commit message. ]

Fixes: e2869bd7af ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC")
Suggested-by: Miguel Luis <miguel.luis@oracle.com>
Suggested-by: Boris Ostrovsky <boris.ovstrosky@oracle.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: David R <david@unsolicited.net>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230327191026.3454-2-eric.devolder@oracle.com
2023-03-30 11:07:30 +02:00
Mario Limonciello a74fabfbd1 x86/ACPI/boot: Use FADT version to check support for online capable
ACPI 6.3 introduced the online capable bit, and also introduced MADT
version 5.

Latter was used to distinguish whether the offset storing online capable
could be used. However ACPI 6.2b has MADT version "45" which is for
an errata version of the ACPI 6.2 spec.  This means that the Linux code
for detecting availability of MADT will mistakenly flag ACPI 6.2b as
supporting online capable which is inaccurate as it's an ACPI 6.3 feature.

Instead use the FADT major and minor revision fields to distinguish this.

  [ bp: Massage. ]

Fixes: aa06e20f1b ("x86/ACPI: Don't add CPUs that are not online capable")
Reported-by: Eric DeVolder <eric.devolder@oracle.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/943d2445-84df-d939-f578-5d8240d342cc@unsolicited.net
2023-03-30 10:50:30 +02:00
Michael Kelley 812b0597fb x86/hyperv: Change vTOM handling to use standard coco mechanisms
Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves.

vTOM is the dividing line where the uppermost bit of the physical
address space is set; e.g., with 47 bits of guest physical address
space, vTOM is 0x400000000000 (bit 46 is set).  Guest physical memory is
accessible at two parallel physical addresses -- one below vTOM and one
above vTOM.  Accesses below vTOM are private (encrypted) while accesses
above vTOM are shared (decrypted). In this sense, vTOM is like the
GPA.SHARED bit in Intel TDX.

Support for Hyper-V guests using vTOM was added to the Linux kernel in
two patch sets[1][2]. This support treats the vTOM bit as part of
the physical address. For accessing shared (decrypted) memory, these
patch sets create a second kernel virtual mapping that maps to physical
addresses above vTOM.

A better approach is to treat the vTOM bit as a protection flag, not
as part of the physical address. This new approach is like the approach
for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel
virtual mapping, the existing mapping is updated using recently added
coco mechanisms.

When memory is changed between private and shared using
set_memory_decrypted() and set_memory_encrypted(), the PTEs for the
existing kernel mapping are changed to add or remove the vTOM bit in the
guest physical address, just as with TDX. The hypercalls to change the
memory status on the host side are made using the existing callback
mechanism. Everything just works, with a minor tweak to map the IO-APIC
to use private accesses.

To accomplish the switch in approach, the following must be done:

* Update Hyper-V initialization to set the cc_mask based on vTOM
  and do other coco initialization.

* Update physical_mask so the vTOM bit is no longer treated as part
  of the physical address

* Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality
  under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear
  the vTOM bit as a protection flag.

* Code already exists to make hypercalls to inform Hyper-V about pages
  changing between shared and private.  Update this code to run as a
  callback from __set_memory_enc_pgtable().

* Remove the Hyper-V special case from __set_memory_enc_dec()

* Remove the Hyper-V specific call to swiotlb_update_mem_attributes()
  since mem_encrypt_init() will now do it.

* Add a Hyper-V specific implementation of the is_private_mmio()
  callback that returns true for the IO-APIC and vTPM MMIO addresses

  [1] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/
  [2] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/

  [ bp: Touchups. ]

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1679838727-87310-7-git-send-email-mikelley@microsoft.com
2023-03-27 09:31:43 +02:00
Michael Kelley 88e378d400 x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
Current code always maps MMIO devices as shared (decrypted) in a
confidential computing VM. But Hyper-V guest VMs on AMD SEV-SNP with vTOM
use a paravisor running in VMPL0 to emulate some devices, such as the
IO-APIC and TPM. In such a case, the device must be accessed as private
(encrypted) because the paravisor emulates the device at an address below
vTOM, where all accesses are encrypted.

Add a new hypervisor callback to determine if an MMIO address should
be mapped private. The callback allows hypervisor-specific code to handle
any quirks, the use of a paravisor, etc. in determining whether a mapping
must be private. If the callback is not used by a hypervisor, default
to returning "false", which is consistent with normal coco VM behavior.

Use this callback as another special case to check for when doing
ioremap().  Just checking the starting address is sufficient as an
ioremap range must be all private or all shared.

Also make the callback in early boot IO-APIC mapping code that uses the
fixmap.

  [ bp: Touchups. ]

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1678329614-3482-2-git-send-email-mikelley@microsoft.com
2023-03-26 23:42:40 +02:00
Linus Torvalds 986c63741d - Add a AMX ptrace self test
- Prevent a false-positive warning when retrieving the (invalid) address of
   dynamic FPU features in their init state which are not saved in
   init_fpstate at all
 
 - Randomize per-CPU entry areas only when KASLR is enabled
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQgPFAACgkQEsHwGGHe
 VUrAfA//QyZE5JnH0Ber3upRlZ/dPSNKIaOX6DMLshGj7QDqs2utTnjc4pwaqGWD
 OWpPuAJvOo2+NsN4nfB12venasIzseXDBBhEw6a5kYx73QmFbZ4XswFBLl2Eh8we
 cFbqU4B8SQvFQaahZ4kRRHpsmNGEPYRvgh2lBjcKUJBUaCuu6KoqE9+I3t173Obc
 sPfkXmhintDjYIjKfllN78rsBq4uCCaOVu5u299ZFMdBakRtx0M7U3547+4hwoE3
 txP+VK+TPs8e64XJtCTem1br8HXNt/W5pC4IoQPnH8V+FLhUp1iIz6FpVHnJ7VMD
 9c8VL7e8BNXhKkQn8sSkSVUZV3xNP7n4MbKKbba3f6EWPZnI28WQ3w09LUte/1aa
 hHEHyjMVyJfUiAcfuE1gZflG1+TqT8GkQJ+hqG9+/iSCWftOMuhfsKCROCLGhltJ
 yYBoyR2ZC1ErSLIOvgYAEUIeZ9FkzreOU0Pit6P/5qaPu+EXw3uDzoZB0WQH40Z5
 PQwz04/s3idPwbfCZDOyNc7QZwxbGu1ESkdiTtCJmbBLW0MkWiBCnf/qZsK7PdD1
 Q2qmx86ewIo6QipJpGK9pqWuzwFYNEJJHn3P7T1CcYQnQb+61m+b6WeYozQCgyMF
 0dII6JulW98/WzjVgH6zUA0a0dicO7FM9H6iEGqlIcvxv0PuM7M=
 =eZTj
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a AMX ptrace self test

 - Prevent a false-positive warning when retrieving the (invalid)
   address of dynamic FPU features in their init state which are not
   saved in init_fpstate at all

 - Randomize per-CPU entry areas only when KASLR is enabled

* tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/x86/amx: Add a ptrace test
  x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
  x86/mm: Do not shuffle CPU entry areas without KASLR
2023-03-26 09:01:24 -07:00
Josh Poimboeuf fb799447ae x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as
reliable when the unwind terminates prematurely in the dark corners of
return_to_handler() due to lack of information about the next frame.

The problem is UNWIND_HINT_EMPTY is used in two different situations:

  1) The end of the kernel stack unwind before hitting user entry, boot
     code, or fork entry

  2) A blind spot in ORC coverage where the unwinder has to bail due to
     lack of information about the next frame

The ORC unwinder has no way to tell the difference between the two.
When it encounters an undefined stack state with 'end=1', it blindly
marks the stack reliable, which can break the livepatch consistency
model.

Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and
UNWIND_HINT_END_OF_STACK.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:58 +01:00
Josh Poimboeuf 4708ea14be x86,objtool: Separate unret validation from unwind hints
The ENTRY unwind hint type is serving double duty as both an empty
unwind hint and an unret validation annotation.

Unret validation is unrelated to unwinding. Separate it out into its own
annotation.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ff7448d492ea21b86d8a90264b105fbd0d751077.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:58 +01:00
Josh Poimboeuf f902cfdd46 x86,objtool: Introduce ORC_TYPE_*
Unwind hints and ORC entry types are two distinct things.  Separate them
out more explicitly.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/cc879d38fff8a43f8f7beb2fd56e35a5a384d7cd.1677683419.git.jpoimboe@kernel.org
2023-03-23 23:18:57 +01:00
Luis Chamberlain 89d7971eb2 x86: Simplify one-level sysctl registration for itmt_kern_table
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230310233248.3965389-3-mcgrof%40kernel.org
2023-03-22 11:47:21 -07:00
Uros Bizjak 22767544e9 x86/ACPI/boot: Improve __acpi_acquire_global_lock
Improve __acpi_acquire_global_lock by using a temporary variable.
This enables compiler to perform if-conversion and improves generated
code from:

 ...
 72a:	d1 ea                	shr    %edx
 72c:	83 e1 fc             	and    $0xfffffffc,%ecx
 72f:	83 e2 01             	and    $0x1,%edx
 732:	09 ca                	or     %ecx,%edx
 734:	83 c2 02             	add    $0x2,%edx
 737:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
 73b:	75 e9                	jne    726 <__acpi_acquire_global_lock+0x6>
 73d:	83 e2 03             	and    $0x3,%edx
 740:	31 c0                	xor    %eax,%eax
 742:	83 fa 03             	cmp    $0x3,%edx
 745:	0f 95 c0             	setne  %al
 748:	f7 d8                	neg    %eax

to:

 ...
 72a:	d1 e9                	shr    %ecx
 72c:	83 e2 fc             	and    $0xfffffffc,%edx
 72f:	83 e1 01             	and    $0x1,%ecx
 732:	09 ca                	or     %ecx,%edx
 734:	83 c2 02             	add    $0x2,%edx
 737:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
 73b:	75 e9                	jne    726 <__acpi_acquire_global_lock+0x6>
 73d:	8d 41 ff             	lea    -0x1(%rcx),%eax

BTW: the compiler could generate:

	lea 0x2(%rcx,%rdx,1),%edx

instead of:

	or     %ecx,%edx
	add    $0x2,%edx

but unwated conversion from add to or when bits are known to be zero
prevents this improvement. This is GCC PR108477.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/all/20230320212012.12704-1-ubizjak%40gmail.com
2023-03-22 11:32:39 -07:00
Chang S. Bae b158888402 x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
__copy_xstate_to_uabi_buf() copies either from the tasks XSAVE buffer
or from init_fpstate into the ptrace buffer. Dynamic features, like
XTILEDATA, have an all zeroes init state and are not saved in
init_fpstate, which means the corresponding bit is not set in the
xfeatures bitmap of the init_fpstate header.

But __copy_xstate_to_uabi_buf() retrieves addresses for both the tasks
xstate and init_fpstate unconditionally via __raw_xsave_addr().

So if the tasks XSAVE buffer has a dynamic feature set, then the
address retrieval for init_fpstate triggers the warning in
__raw_xsave_addr() which checks the feature bit in the init_fpstate
header.

Remove the address retrieval from init_fpstate for extended features.
They have an all zeroes init state so init_fpstate has zeros for them.
Then zeroing the user buffer for the init state is the same as copying
them from init_fpstate.

Fixes: 2308ee57d9 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/kvm/20230221163655.920289-2-mizhang@google.com/
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/all/20230227210504.18520-2-chang.seok.bae%40intel.com
Cc: stable@vger.kernel.org
2023-03-22 10:59:13 -07:00
Mark Rutland fee86a4ed5 ftrace: selftest: remove broken trace_direct_tramp
The ftrace selftest code has a trace_direct_tramp() function which it
uses as a direct call trampoline. This happens to work on x86, since the
direct call's return address is in the usual place, and can be returned
to via a RET, but in general the calling convention for direct calls is
different from regular function calls, and requires a trampoline written
in assembly.

On s390, regular function calls place the return address in %r14, and an
ftrace patch-site in an instrumented function places the trampoline's
return address (which is within the instrumented function) in %r0,
preserving the original %r14 value in-place. As a regular C function
will return to the address in %r14, using a C function as the trampoline
results in the trampoline returning to the caller of the instrumented
function, skipping the body of the instrumented function.

Note that the s390 issue is not detcted by the ftrace selftest code, as
the instrumented function is trivial, and returning back into the caller
happens to be equivalent.

On arm64, regular function calls place the return address in x30, and
an ftrace patch-site in an instrumented function saves this into r9
and places the trampoline's return address (within the instrumented
function) in x30. A regular C function will return to the address in
x30, but will not restore x9 into x30. Consequently, using a C function
as the trampoline results in returning to the trampoline's return
address having corrupted x30, such that when the instrumented function
returns, it will return back into itself.

To avoid future issues in this area, remove the trace_direct_tramp()
function, and require that each architecture with direct calls provides
a stub trampoline, named ftrace_stub_direct_tramp. This can be written
to handle the architecture's trampoline calling convention, and in
future could be used elsewhere (e.g. in the ftrace ops sample, to
measure the overhead of direct calls), so we may as well always build it
in.

Link: https://lkml.kernel.org/r/20230321140424.345218-8-revest@chromium.org

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:59:29 -04:00
Dionna Glaze 0144e3b85d x86/sev: Change snp_guest_issue_request()'s fw_err argument
The GHCB specification declares that the firmware error value for
a guest request will be stored in the lower 32 bits of EXIT_INFO_2.  The
upper 32 bits are for the VMM's own error code. The fw_err argument to
snp_guest_issue_request() is thus a misnomer, and callers will need
access to all 64 bits.

The type of unsigned long also causes problems, since sw_exit_info2 is
u64 (unsigned long long) vs the argument's unsigned long*. Change this
type for issuing the guest request. Pass the ioctl command struct's error
field directly instead of in a local variable, since an incomplete guest
request may not set the error code, and uninitialized stack memory would
be written back to user space.

The firmware might not even be called, so bookend the call with the no
firmware call error and clear the error.

Since the "fw_err" field is really exitinfo2 split into the upper bits'
vmm error code and lower bits' firmware error code, convert the 64 bit
value to a union.

  [ bp:
   - Massage commit message
   - adjust code
   - Fix a build issue as
   Reported-by: kernel test robot <lkp@intel.com>
   Link: https://lore.kernel.org/oe-kbuild-all/202303070609.vX6wp2Af-lkp@intel.com
   - print exitinfo2 in hex
   Tom:
    - Correct -EIO exit case. ]

Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230214164638.1189804-5-dionnaglaze@google.com
Link: https://lore.kernel.org/r/20230307192449.24732-12-bp@alien8.de
2023-03-21 15:43:19 +01:00
David Woodhouse 805ae9dc3b x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
When bringing up a secondary CPU from do_boot_cpu(), the warm reset flag
is set in CMOS and the starting IP for the trampoline written inside the
BDA at 0x467. Once the CPU is running, the CMOS flag is unset and the
value in the BDA cleared.

To allow for parallel bringup of CPUs, add a reference count to track the
number of CPUs currently bring brought up, and clear the state only when
the count reaches zero.

Since the RTC spinlock is required to write to the CMOS, it can be used
for mutual exclusion on the refcount too.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Link: https://lore.kernel.org/r/20230316222109.1940300-5-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
Brian Gerst 8f6be6d870 x86/smpboot: Remove initial_gs
Given its CPU#, each CPU can find its own per-cpu offset, and directly set
GSBASE accordingly. The global variable can be eliminated.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20230316222109.1940300-9-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
Brian Gerst c253b64020 x86/smpboot: Remove early_gdt_descr on 64-bit
Build the GDT descriptor on the stack instead.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20230316222109.1940300-8-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
Brian Gerst 3adee777ad x86/smpboot: Remove initial_stack on 64-bit
In order to facilitate parallel startup, start to eliminate some of the
global variables passing information to CPUs in the startup path.

However, start by introducing one more: smpboot_control. For now this
merely holds the CPU# of the CPU which is coming up. Each CPU can then
find its own per-cpu data, and everything else it needs can be found
from there, allowing the other global variables to be removed.

First to be removed is initial_stack. Each CPU can load %rsp from its
current_task->thread.sp instead. That is already set up with the correct
idle thread for APs. Set up the .sp field in INIT_THREAD on x86 so that
the BSP also finds a suitable stack pointer in the static per-cpu data
when coming up on first boot.

On resume from S3, the CPU needs a temporary stack because its idle task
is already active. Instead of setting initial_stack, the sleep code can
simply set its own current->thread.sp to point to the temporary stack.
Nobody else cares about ->thread.sp for a thread which is currently on
a CPU, because the true value is actually in the %rsp register. Which
is restored with the rest of the CPU context in do_suspend_lowlevel().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20230316222109.1940300-7-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
David Woodhouse cefad862f2 x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
Each of the sibling CPUs in a cluster uses the same clustermask. The first
CPU in a cluster will need a new clustermask allocated, while subsequent
siblings will use the same clustermask as the first.

However, the CPU being brought up cannot yet perform memory allocations
at the point that this occurs in init_x2apic_ldr().

So at present, the alloc_clustermask() function allocates a clustermask
just in case it's needed, storing it in the global cluster_hotplug_mask.
A CPU which is the first sibling of a cluster will "take" it from there
and set cluster_hotplug_mask to NULL, in order for alloc_clustermask()
to allocate a new one before bringing up the next CPU.

To facilitate parallel bringup of CPUs in future, switch to a model
where alloc_clustermask() prepopulates the clustermask in the per_cpu
data for each present CPU in the cluster in advance. All that the CPU
needs to do for itself in init_x2apic_ldr() is set its own bit in that
mask.

The 'node' and 'clusterid' members of struct cluster_mask are thus
redundant, and it can become a simple struct cpumask instead.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Link: https://lore.kernel.org/r/20230316222109.1940300-2-usama.arif@bytedance.com
2023-03-21 13:35:53 +01:00
Muralidhara M K 4c1cdec319 x86/MCE/AMD: Use an u64 for bank_map
Thee maximum number of MCA banks is 64 (MAX_NR_BANKS), see

  a0bc32b3ca ("x86/mce: Increase maximum number of banks to 64").

However, the bank_map which contains a bitfield of which banks to
initialize is of type unsigned int and that overflows when those bit
numbers are >= 32, leading to UBSAN complaining correctly:

  UBSAN: shift-out-of-bounds in arch/x86/kernel/cpu/mce/amd.c:1365:38
  shift exponent 32 is too large for 32-bit type 'int'

Change the bank_map to a u64 and use the proper BIT_ULL() macro when
modifying bits in there.

  [ bp: Rewrite commit message. ]

Fixes: a0bc32b3ca ("x86/mce: Increase maximum number of banks to 64")
Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127151601.1068324-1-muralimk@amd.com
2023-03-19 19:07:04 +01:00
Linus Torvalds c46a7d0473 - Flush out logged errors immediately after MCA banks configuration
changes over sysfs have been done instead of waiting until something
   else triggers the workqueue later - another error or the polling
   interval cycle is reached
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQXB9kACgkQEsHwGGHe
 VUqqjA//YbcRx2PFcZT5nnuQlb6bptsluCUrHOcJVT/1fe0ayrlvahuw/QtSXRH4
 Vwukc3+1cehp3CcSbHKAKOArTL7NV2tbk+EZQk+Ae+7QdRz/9TuEenL6ipCC1cr4
 Z3Bo3KZmHlBcoJaQDcQWWIL8TiYAPXdqXWksh8q+0pxDI2wuFguFBJI84j+AUZH+
 I4EDXLfzQn8RQZgiggEIez0aOIig74eaPfhHsNlqJJYG4x/EVgmRn9qJpYBGAeq6
 xQR6NvHUTjCCZAASI1QJ/IT5rXD17iey3J/gIw3QZEhotBCCDdk5vh8S8zqDStRF
 x3Za7qeC5m4HMfB/09v8HGeTitlaT0BYmM2CFOsru7I/qI+dJccDTwLmF8UY5Nj2
 G6454A7ZEQ13lhfAoDIeVFfoSkqyXNz+McTtOQ8/xDJ5hnuNJ4WtT7sWemWZlV5S
 l14xVFbojtGNmQygUGeL7cxl6h12Y9zFNwh1A5HzwH4EvywQJW7/35pxXEZIO3tl
 EioXKe1eSLcKoD9VAv8icmstpwJl1Gm5Xge1oyw8cyTW6d3hM8ZOEqdTAJvRkfG7
 LwPl3qC6Hrqhjc26WZ9pxmvR1hSYLWIidy6MlNeO9mf6wZR/ub+SmuHzy6n7TZl4
 pTsVver93ZgS1J8CJ0ohCK1jHs+2aLvh/6qiJvIw9lbgAZ2HKPo=
 =Q6Dq
 -----END PGP SIGNATURE-----

Merge tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fix from Borislav Petkov:

 - Flush out logged errors immediately after MCA banks configuration
   changes over sysfs have been done instead of waiting until something
   else triggers the workqueue later - another error or the polling
   interval cycle is reached

* tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Make sure logged MCEs are processed after sysfs update
2023-03-19 09:57:53 -07:00
Linus Torvalds 4ac39c5910 - Check cmdline_find_option()'s return value before further processing
- Clear temporary storage in the resctrl code to prevent access to an
   unexistent MSR
 
 - Add a simple throttling mechanism to protect the hypervisor from potentially
   malicious SEV guests issuing requests in rapid succession.
 
   In order to not jeopardize the sanity of everyone involved in
   maintaining this code, the request issuing side has received
   a cleanup, split in more or less trivial, small and digestible pieces.
   Otherwise, the code was threatening to become an unmaintainable mess.
 
   Therefore, that cleanup is marked indirectly also for stable so that
   there's no differences between the upstream code and the stable
   variant when it comes down to backporting more there.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQW/64ACgkQEsHwGGHe
 VUoWzBAAl1KD4RR5EhrppOCl5mWtZmKUf+COag7RiqggXJyhCTXO+5N24dHcgoJB
 h60gY7Nxg0CpZVbkMDSpJIuclmlMkiCLgUeuvN6E5ofgb/ZSv9nDuCXPUtLQ962d
 T6071/v48G+2PVGm+PD1xAwP3065i3itVV/k6Xn8fxeXf/fq8L5eU5tADuFICI0b
 dKbd7U+TEQAh5E6BUwms2G1P0glJqqL37H22fTcyxI6D2T/UJLlc4+or5JmTofDa
 XJE/UHn+ZaGZYjhdr/BrlcxnY1jUTQH2K3wciADmNolkuCpDQJs6GgN98lXdhT34
 vyWQVokHGEKE8Va6m5wZX90eKraSc27/0d5ZlHz/rIJgVBxp/VvCzqLUZRvkRwwk
 k7bVOeZHe6P+b0QQl7uL9U2ff0sV/4PX0NLr+jzQdlA2ZYuTV6YgBDl7nAe1Tw/J
 gJViAvDbm26mlTG1wQrvw9M2P4AQIYpEmD4KPs7j2aQafUgtGqfTBwyeKHXdtMLJ
 TrkEISZZ8BVVvYghctN4R21IryUSnfq2eXxPwxUMh78SrO8sC23QJ5PVqKM/enF8
 azf/ZBgANidzqJ44k2Ow2bnO0ZTYZblvl3NUCMNa5SjQmzAEUzupHEKUgV10MFMR
 J3lspGU47BVeirFPWlCYKr+3Buwzur5xo5wCmrezxbN0fAo9k5M=
 =6UGz
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "There's a little bit more 'movement' in there for my taste but it
  needs to happen and should make the code better after it.

   - Check cmdline_find_option()'s return value before further
     processing

   - Clear temporary storage in the resctrl code to prevent access to an
     unexistent MSR

   - Add a simple throttling mechanism to protect the hypervisor from
     potentially malicious SEV guests issuing requests in rapid
     succession.

     In order to not jeopardize the sanity of everyone involved in
     maintaining this code, the request issuing side has received a
     cleanup, split in more or less trivial, small and digestible
     pieces. Otherwise, the code was threatening to become an
     unmaintainable mess.

     Therefore, that cleanup is marked indirectly also for stable so
     that there's no differences between the upstream code and the
     stable variant when it comes down to backporting more there"

* tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix use of uninitialized buffer in sme_enable()
  x86/resctrl: Clear staged_config[] before and after it is used
  virt/coco/sev-guest: Add throttling awareness
  virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case
  virt/coco/sev-guest: Do some code style cleanups
  virt/coco/sev-guest: Carve out the request issuing logic into a helper
  virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()
  virt/coco/sev-guest: Simplify extended guest request handling
  virt/coco/sev-guest: Check SEV_SNP attribute at probe time
2023-03-19 09:43:41 -07:00
Greg Kroah-Hartman 60260272dc x86/umwait: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-10-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:29 +01:00
Greg Kroah-Hartman 216f58beb2 x86/microcode: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-9-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:26 +01:00
Greg Kroah-Hartman 1aaba11da9 driver core: class: remove module * from class_create()
The module pointer in class_create() never actually did anything, and it
shouldn't have been requred to be set as a parameter even if it did
something.  So just remove it and fix up all callers of the function in
the kernel tree at the same time.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:16:33 +01:00
Juergen Gross 11af36cb89 x86/paravirt: Convert simple paravirt functions to asm
All functions referenced via __PV_IS_CALLEE_SAVE() need to be assembler
functions, as those functions calls are hidden from the compiler.

In case the kernel is compiled with "-fzero-call-used-regs" the compiler
will clobber caller-saved registers at the end of C functions, which
will result in unexpectedly zeroed registers at the call site of the
related paravirt functions.

Replace the C functions with DEFINE_PARAVIRT_ASM() constructs using
the same instructions as the related paravirt calls in the
PVOP_ALT_[V]CALLEE*() macros. And since they're not C functions visible
to the compiler anymore, latter won't do the callee-clobbered zeroing
invoked by -fzero-call-used-regs and thus won't corrupt registers.

  [ bp: Extend commit message. ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230317063325.361-1-jgross@suse.com
2023-03-17 13:29:47 +01:00
Michael Kelley f8acb24aaf x86/hyperv: Block root partition functionality in a Confidential VM
Hyper-V should never specify a VM that is a Confidential VM and also
running in the root partition.  Nonetheless, explicitly block such a
combination to guard against a compromised Hyper-V maliciously trying to
exploit root partition functionality in a Confidential VM to expose
Confidential VM secrets. No known bug is being fixed, but the attack
surface for Confidential VMs on Hyper-V is reduced.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1678894453-95392-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-03-17 10:57:35 +00:00
Kirill A. Shutemov 23e5d9ec2b x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
IOMMU and SVA-capable devices know nothing about LAM and only expect
canonical addresses. An attempt to pass down tagged pointer will lead
to address translation failure.

By default do not allow to enable both LAM and use SVA in the same
process.

The new ARCH_FORCE_TAGGED_SVA arch_prctl() overrides the limitation.
By using the arch_prctl() userspace takes responsibility to never pass
tagged address to the device.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230312112612.31869-12-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:40 -07:00
Kirill A. Shutemov 400b9b9344 iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
Kernel has few users of pasid_valid() and all but one checks if the
process has PASID allocated. The helper takes ioasid_t as the input.

Replace the helper with mm_valid_pasid() that takes mm_struct as the
argument. The only call that checks PASID that is not tied to mm_struct
is open-codded now.

This is preparatory patch. It helps avoid ifdeffery: no need to
dereference mm->pasid in generic code to check if the process has PASID.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230312112612.31869-11-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:40 -07:00
Kirill A. Shutemov 2f8794bd08 x86/mm: Provide arch_prctl() interface for LAM
Add a few of arch_prctl() handles:

 - ARCH_ENABLE_TAGGED_ADDR enabled LAM. The argument is required number
   of tag bits. It is rounded up to the nearest LAM mode that can
   provide it. For now only LAM_U57 is supported, with 6 tag bits.

 - ARCH_GET_UNTAG_MASK returns untag mask. It can indicates where tag
   bits located in the address.

 - ARCH_GET_MAX_TAG_BITS returns the maximum tag bits user can request.
   Zero if LAM is not supported.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/all/20230312112612.31869-9-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:39 -07:00
Kirill A. Shutemov 74c228d20a x86/uaccess: Provide untagged_addr() and remove tags before address check
untagged_addr() is a helper used by the core-mm to strip tag bits and
get the address to the canonical shape based on rules of the current
thread. It only handles userspace addresses.

The untagging mask is stored in per-CPU variable and set on context
switching to the task.

The tags must not be included into check whether it's okay to access the
userspace address. Strip tags in access_ok().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/all/20230312112612.31869-7-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:39 -07:00
Kirill A. Shutemov 5ef495e55f x86: Allow atomic MM_CONTEXT flags setting
So far there's no need in atomic setting of MM context flags in
mm_context_t::flags. The flags set early in exec and never change
after that.

LAM enabling requires atomic flag setting. The upcoming flag
MM_CONTEXT_FORCE_TAGGED_SVA can be set much later in the process
lifetime where multiple threads exist.

Convert the field to unsigned long and do MM_CONTEXT_* accesses with
__set_bit() and test_bit().

No functional changes.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/all/20230312112612.31869-3-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:39 -07:00
Fenghua Yu d7ce15e1d4 x86/split_lock: Enumerate architectural split lock disable bit
The December 2022 edition of the Intel Instruction Set Extensions manual
defined that the split lock disable bit in the IA32_CORE_CAPABILITIES MSR
is (and retrospectively always has been) architectural.

Remove all the model specific checks except for Ice Lake variants which are
still needed because these CPU models do not enumerate presence of the
IA32_CORE_CAPABILITIES MSR.

Originally-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/lkml/20220701131958.687066-1-fenghua.yu@intel.com/t/#mada243bee0915532a6adef6a9e32d244d1a9aef4
2023-03-16 11:50:51 +01:00
Borislav Petkov (AMD) 8cc68c9c9e x86/CPU/AMD: Make sure EFER[AIBRSE] is set
The AutoIBRS bit gets set only on the BSP as part of determining which
mitigation to enable on AMD. Setting on the APs relies on the
circumstance that the APs get booted through the trampoline and EFER
- the MSR which contains that bit - gets replicated on every AP from the
BSP.

However, this can change in the future and considering the security
implications of this bit not being set on every CPU, make sure it is set
by verifying EFER later in the boot process and on every AP.

Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230224185257.o3mcmloei5zqu7wa@treble
2023-03-16 11:50:00 +01:00
Peter Newman 322b72e0fd x86/resctrl: Avoid redundant counter read in __mon_event_count()
__mon_event_count() does the per-RMID, per-domain work for
user-initiated event count reads and the initialization of new monitor
groups.

In the initialization case, after resctrl_arch_reset_rmid() calls
__rmid_read() to record an initial count for a new monitor group, it
immediately calls resctrl_arch_rmid_read(). This re-read of the hardware
counter is unnecessary and the following computations are ignored by the
caller during initialization.

Following return from resctrl_arch_reset_rmid(), just clear the
mbm_state and return. This involves moving the mbm_state lookup into the
rr->first case, as it's not needed for regular event count reads: the
QOS_L3_OCCUP_EVENT_ID case was redundant with the accumulating logic at
the end of the function.

Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/all/20221220164132.443083-2-peternewman%40google.com
2023-03-15 15:44:15 -07:00
Shawn Wang 0424a7dfe9 x86/resctrl: Clear staged_config[] before and after it is used
As a temporary storage, staged_config[] in rdt_domain should be cleared
before and after it is used. The stale value in staged_config[] could
cause an MSR access error.

Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3
Cache (MBA should be disabled if the number of CLOSIDs for MB is less than
16.) :
	mount -t resctrl resctrl -o cdp /sys/fs/resctrl
	mkdir /sys/fs/resctrl/p{1..7}
	umount /sys/fs/resctrl/
	mount -t resctrl resctrl /sys/fs/resctrl
	mkdir /sys/fs/resctrl/p{1..8}

An error occurs when creating resource group named p8:
    unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
    Call Trace:
     <IRQ>
     __flush_smp_call_function_queue+0x11d/0x170
     __sysvec_call_function+0x24/0xd0
     sysvec_call_function+0x89/0xc0
     </IRQ>
     <TASK>
     asm_sysvec_call_function+0x16/0x20

When creating a new resource control group, hardware will be configured
by the following process:
    rdtgroup_mkdir()
      rdtgroup_mkdir_ctrl_mon()
        rdtgroup_init_alloc()
          resctrl_arch_update_domains()

resctrl_arch_update_domains() iterates and updates all resctrl_conf_type
whose have_new_ctrl is true. Since staged_config[] holds the same values as
when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA
configurations. When group p8 is created, get_config_index() called in
resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for
CDP_CODE and CDP_DATA, which will be translated to an invalid register -
0xca0 in this scenario.

Fix it by clearing staged_config[] before and after it is used.

[reinette: re-order commit tags]

Fixes: 75408e4350 ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Suggested-by: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
2023-03-15 15:19:43 -07:00
Linus Torvalds 29db00c252 Tracing fixes for v6.3
- Do not allow histogram values to have modifies.
   Can cause a NULL pointer dereference if they do.
 
 - Warn if hist_field_name() is passed a NULL.
   Prevent the NULL pointer dereference mentioned above.
 
 - Fix invalid address look up race in lookup_rec()
 
 - Define ftrace_stub_graph conditionally to prevent linker errors
 
 - Always check if RCU is watching at all tracepoint locations
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZBDuTBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsboAP4yfrFYvIIKM5EkzkEiPI+V2hdlA12x
 bt839jO5AWCmhAEAiY8FmKatpBJQKsiGqSOab8aHOMnhGFZwltCHAPa9PAI=
 =vtA2
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Do not allow histogram values to have modifies. They can cause a NULL
   pointer dereference if they do.

 - Warn if hist_field_name() is passed a NULL. Prevent the NULL pointer
   dereference mentioned above.

 - Fix invalid address look up race in lookup_rec()

 - Define ftrace_stub_graph conditionally to prevent linker errors

 - Always check if RCU is watching at all tracepoint locations

* tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Make tracepoint lockdep check actually test something
  ftrace,kcfi: Define ftrace_stub_graph conditionally
  ftrace: Fix invalid address access in lookup_rec() when index is 0
  tracing: Check field value in hist_field_name()
  tracing: Do not let histogram values have some modifiers
2023-03-14 17:07:54 -07:00
Dionna Glaze 72f7754dcf virt/coco/sev-guest: Add throttling awareness
A potentially malicious SEV guest can constantly hammer the hypervisor
using this driver to send down requests and thus prevent or at least
considerably hinder other guests from issuing requests to the secure
processor which is a shared platform resource.

Therefore, the host is permitted and encouraged to throttle such guest
requests.

Add the capability to handle the case when the hypervisor throttles
excessive numbers of requests issued by the guest. Otherwise, the VM
platform communication key will be disabled, preventing the guest from
attesting itself.

Realistically speaking, a well-behaved guest should not even care about
throttling. During its lifetime, it would end up issuing a handful of
requests which the hardware can easily handle.

This is more to address the case of a malicious guest. Such guest should
get throttled and if its VMPCK gets disabled, then that's its own
wrongdoing and perhaps that guest even deserves it.

To the implementation: the hypervisor signals with SNP_GUEST_REQ_ERR_BUSY
that the guest requests should be throttled. That error code is returned
in the upper 32-bit half of exitinfo2 and this is part of the GHCB spec
v2.

So the guest is given a throttling period of 1 minute in which it
retries the request every 2 seconds. This is a good default but if it
turns out to not pan out in practice, it can be tweaked later.

For safety, since the encryption algorithm in GHCBv2 is AES_GCM, control
must remain in the kernel to complete the request with the current
sequence number. Returning without finishing the request allows the
guest to make another request but with different message contents. This
is IV reuse, and breaks cryptographic protections.

  [ bp:
    - Rewrite commit message and do a simplified version.
    - The stable tags are supposed to denote that a cleanup should go
      upfront before backporting this so that any future fixes to this
      can preserve the sanity of the backporter(s). ]

Fixes: d5af44dde5 ("x86/sev: Provide support for SNP guest request NAEs")
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org> # d6fd48eff7 ("virt/coco/sev-guest: Check SEV_SNP attribute at probe time")
Cc: <stable@kernel.org> # 970ab82374 (" virt/coco/sev-guest: Simplify extended guest request handling")
Cc: <stable@kernel.org> # c5a338274b ("virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()")
Cc: <stable@kernel.org> # 0fdb6cc7c8 ("virt/coco/sev-guest: Carve out the request issuing logic into a helper")
Cc: <stable@kernel.org> # d25bae7dc7 ("virt/coco/sev-guest: Do some code style cleanups")
Cc: <stable@kernel.org> # fa4ae42cc6 ("virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case")
Link: https://lore.kernel.org/r/20230214164638.1189804-2-dionnaglaze@google.com
2023-03-13 13:29:27 +01:00
Borislav Petkov (AMD) fa4ae42cc6 virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case
snp_issue_guest_request() checks the value returned by the hypervisor in
sw_exit_info_2 and returns a different error depending on it.

Convert those checks into a switch-case to make it more readable when
more error values are going to be checked in the future.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230307192449.24732-8-bp@alien8.de
2023-03-13 12:55:34 +01:00
Borislav Petkov (AMD) 970ab82374 virt/coco/sev-guest: Simplify extended guest request handling
Return a specific error code - -ENOSPC - to signal the too small cert
data buffer instead of checking exit code and exitinfo2.

While at it, hoist the *fw_err assignment in snp_issue_guest_request()
so that a proper error value is returned to the callers.

  [ Tom: check override_err instead of err. ]

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230307192449.24732-4-bp@alien8.de
2023-03-13 11:27:10 +01:00
Borislav Petkov (AMD) d6fd48eff7 virt/coco/sev-guest: Check SEV_SNP attribute at probe time
No need to check it on every ioctl. And yes, this is a common SEV driver
but it does only SNP-specific operations currently. This can be
revisited later, when more use cases appear.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230307192449.24732-3-bp@alien8.de
2023-03-13 11:20:20 +01:00
Yazen Ghannam 4783b9cb37 x86/mce: Make sure logged MCEs are processed after sysfs update
A recent change introduced a flag to queue up errors found during
boot-time polling. These errors will be processed during late init once
the MCE subsystem is fully set up.

A number of sysfs updates call mce_restart() which goes through a subset
of the CPU init flow. This includes polling MCA banks and logging any
errors found. Since the same function is used as boot-time polling,
errors will be queued. However, the system is now past late init, so the
errors will remain queued until another error is found and the workqueue
is triggered.

Call mce_schedule_work() at the end of mce_restart() so that queued
errors are processed.

Fixes: 3bff147b18 ("x86/mce: Defer processing of early errors")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230301221420.2203184-1-yazen.ghannam@amd.com
2023-03-12 21:12:21 +01:00
Linus Torvalds d3d0cac69f - Disable XSAVES on AMD Zen1 and Zen2 machines due to an erratum. No
impact to anything as those machines will fallback to XSAVEC which is
   equivalent there.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmQNtvAACgkQEsHwGGHe
 VUpGiRAAjlYpvaQK24s8MiQr3LBC0pKsgKstf1Jx5C+HspmS5JAdF83646kMOUKm
 MUGPfQwK1nN5kO0/fOlo4O6vhSIF2Ft/Xfrd/APZm6qJhR3pli9675NeF8fH2D5t
 Ypgtl6psRudkB3RUmE1cmHWbr9dMnHZZLnL6iA/qHYXCY3kaw96ncM6HjdnrjXRd
 OV2+N4dyhTet3MdUdw7dSr1uz75O5PQH/1FwR1V2zroF1sjImaIwQ7JN51hIITxw
 DzfTbfuJzdAqwfztBFG/yZ5K+DEoU5BemHHIuhq+X9/7GeLMd059DdnZuXSX8mcH
 jjzOa/E5r/PjYze0XRWT3RbI5fbSc1qhNbmj3kLNP3KE/F3S74n6FR58oLNqosVk
 zw1TYP8oocdjG1VxJdm5qndIzwHMSj3qkd+BSNZZ1fwINVLXtSDubtThkN/i+81+
 nqnMA8HFrcwy1bhwq4jd5dmP7tjlODATfeL4ZV6/6J1RX8Vwu+bjdy8PM+vJYJ0d
 pnFLT20cf6Or0MQHUssO+uh6oC3aQ6AxPWJcuUfbdSLYzjr2EObgCHXGZOhCjvhC
 CsALcmwnLh5XzwglzWoXyyv+tsJar63XYcPSEIt+gIfXpLf7ZbzcOSDLDkri6B3Z
 fCABGASFnoXr7ZYnGxH4L5WKWOk1W+pgpxyC4mnzD9oHtXIzUPU=
 =u6kj
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.3_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
 "A single erratum fix for AMD machines:

   - Disable XSAVES on AMD Zen1 and Zen2 machines due to an erratum. No
     impact to anything as those machines will fallback to XSAVEC which
     is equivalent there"

* tag 'x86_urgent_for_v6.3_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Disable XSAVES on AMD family 0x17
2023-03-12 09:12:03 -07:00
Arnd Bergmann aa69f81492 ftrace,kcfi: Define ftrace_stub_graph conditionally
When CONFIG_FUNCTION_GRAPH_TRACER is disabled, __kcfi_typeid_ftrace_stub_graph
is missing, causing a link failure:

 ld.lld: error: undefined symbol: __kcfi_typeid_ftrace_stub_graph
 referenced by arch/x86/kernel/ftrace_64.o:(__cfi_ftrace_stub_graph) in archive vmlinux.a

Mark the reference to it as conditional on the same symbol, as
is done on arm64.

Link: https://lore.kernel.org/linux-trace-kernel/20230131093643.3850272-1-arnd@kernel.org

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Fixes: 883bbbffa5 ("ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()")
See-also: 2598ac6ec4 ("arm64: ftrace: Define ftrace_stub_graph only with FUNCTION_GRAPH_TRACER")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-09 22:17:06 -05:00
Song Liu ac3b432839 module: replace module_layout with module_memory
module_layout manages different types of memory (text, data, rodata, etc.)
in one allocation, which is problematic for some reasons:

1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
2. It is hard to use huge pages in modules (and not break strict rwx).
3. Many archs uses module_layout for arch-specific data, but it is not
   obvious how these data are used (are they RO, RX, or RW?)

Improve the scenario by replacing 2 (or 3) module_layout per module with
up to 7 module_memory per module:

        MOD_TEXT,
        MOD_DATA,
        MOD_RODATA,
        MOD_RO_AFTER_INIT,
        MOD_INIT_TEXT,
        MOD_INIT_DATA,
        MOD_INIT_RODATA,

and allocating them separately. This adds slightly more entries to
mod_tree (from up to 3 entries per module, to up to 7 entries per
module). However, this at most adds a small constant overhead to
__module_address(), which is expected to be fast.

Various archs use module_layout for different data. These data are put
into different module_memory based on their location in module_layout.
IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.

module_memory simplifies quite some of the module code. For example,
ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a
different allocator for the data. kernel/module/strict_rwx.c is also
much cleaner with module_memory.

Signed-off-by: Song Liu <song@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-03-09 12:55:15 -08:00
Linus Torvalds 7fef099702 x86/resctl: fix scheduler confusion with 'current'
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.

And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned.  Even if when a thread
is preempted, or moved to another CPU, or even across an explicit call
'schedule()' that thread will still have the same value for 'current'.

It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.

So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.

However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.

So the scheduler code paths are special, and do not have a 'current'
thread at all.  Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.

So this is all actually quite straightforward and simple, and not all
that complicated.

Except for when you then have special code that is run in scheduler
context, that code then has to be aware that 'current' isn't really a
valid thing.  Did you mean 'prev'? Did you mean 'next'?

In fact, even if then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable.  So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.

Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state).  And clang would end up just using the old thread pointer
value at least in some configurations.

This could have happened with gcc too, and purely depends on random
compiler details.  Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.

The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.

That same code is then also used outside of the scheduler context (when
a thread resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course.  There is no ambiguity in that
case.

The fix may be trivial, but noticing and figuring out what went wrong
was not.  The credit for that goes to Stephane Eranian.

Reported-by: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Stephane Eranian <eranian@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-08 11:48:11 -08:00
Philippe Mathieu-Daudé b4c108d7da x86/cpu: Expose arch_cpu_idle_dead()'s prototype definition
Include <linux/cpu.h> to make sure arch_cpu_idle_dead() matches its
prototype going forward.

Inspired-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20230214083857.50163-1-philmd@linaro.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:30 -08:00
Josh Poimboeuf 071c44e427 sched/idle: Mark arch_cpu_idle_dead() __noreturn
Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead()
return"), in Xen, when a previously offlined CPU was brought back
online, it unexpectedly resumed execution where it left off in the
middle of the idle loop.

There were some hacks to make that work, but the behavior was surprising
as do_idle() doesn't expect an offlined CPU to return from the dead (in
arch_cpu_idle_dead()).

Now that Xen has been fixed, and the arch-specific implementations of
arch_cpu_idle_dead() also don't return, give it a __noreturn attribute.

This will cause the compiler to complain if an arch-specific
implementation might return.  It also improves code generation for both
caller and callee.

Also fixes the following warning:

  vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:28 -08:00
Josh Poimboeuf eab89405b6 x86/cpu: Mark play_dead() __noreturn
play_dead() doesn't return.  Annotate it as such.  By extension this
also makes arch_cpu_idle_dead() noreturn.

Link: https://lore.kernel.org/r/f3a069e6869c51ccfdda656b76882363bc9fcfa4.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:26 -08:00
Andrew Cooper b0563468ee x86/CPU/AMD: Disable XSAVES on AMD family 0x17
AMD Erratum 1386 is summarised as:

  XSAVES Instruction May Fail to Save XMM Registers to the Provided
  State Save Area

This piece of accidental chronomancy causes the %xmm registers to
occasionally reset back to an older value.

Ignore the XSAVES feature on all AMD Zen1/2 hardware.  The XSAVEC
instruction (which works fine) is equivalent on affected parts.

  [ bp: Typos, move it into the F17h-specific function. ]

Reported-by: Tavis Ormandy <taviso@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230307174643.1240184-1-andrew.cooper3@citrix.com
2023-03-08 16:56:08 +01:00
Borislav Petkov (AMD) 554eec0b4a x86/mce: Always inline old MCA stubs
The stubs for the ancient MCA support (CONFIG_X86_ANCIENT_MCE) are
normally optimized away on 64-bit builds. However, an allmodconfig one
causes the compiler to add sanitizer calls gunk into them and they exist
as constprop calls. Which objtool then complains about:

  vmlinux.o: warning: objtool: do_machine_check+0xad8: call to \
    pentium_machine_check.constprop.0() leaves .noinstr.text section

due to them missing noinstr. One could tag them "noinstr" but what
should really happen is, they should be forcefully inlined so that all
that gunk gets optimized away and the warning doesn't even have a chance
to fire.

Do so.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230222191054.4701-1-bp@alien8.de
2023-03-08 13:50:07 +01:00
Thomas Weißschuh 7214b32b6f x86/MCE/AMD: Make kobj_type structure constant
Since

  ee6d3dd4ed ("driver core: make kobj_type constant.")

the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definition to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230217-kobj_type-mce-amd-v1-1-40ef94816444@weissschuh.net
2023-03-06 09:57:27 +01:00
Juergen Gross c9ae1b10d9 x86/paravirt: Merge activate_mm() and dup_mmap() callbacks
The two paravirt callbacks .mmu.activate_mm() and .mmu.dup_mmap() are
sharing the same implementations in all cases: for Xen PV guests they
are pinning the PGD of the new mm_struct, and for all other cases they
are a NOP.

In the end, both callbacks are meant to register an address space with
the underlying hypervisor, so there needs to be only a single callback
for that purpose.

So merge them to a common callback .mmu.enter_mmap() (in contrast to the
corresponding already existing .mmu.exit_mmap()).

As the first parameter of the old callbacks isn't used, drop it from the
replacement one.

  [ bp: Remove last occurrence of paravirt_activate_mm() in
    asm/mmu_context.h ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Link: https://lore.kernel.org/r/20230207075902.7539-1-jgross@suse.com
2023-03-06 09:41:37 +01:00
Linus Torvalds 7f9ec7d816 A small set of updates for x86:
- Return -EIO instead of success when the certificate buffer for SEV
    guests is not large enough.
 
  - Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared on
    return to userspace for performance reasons, but the leaves user space
    vulnerable to cross-thread attacks which STIBP prevents. Update the
    documentation accordingly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmQEVnETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoegJEACbn+CQKFxB4kXJ1xBamYsqQfxY1mM1
 yFziEVH3VCXSshfvKePH7fnoAUHTzhy+SjN6c1ERvl82WVXm/BoF2B81KpN9Yd18
 R6wTpIS227Pn+Ll1yfVQJMHrb0mnSczo5vCGyOzMOxkqIbNCkHRMoeSBspfNLLGM
 3D2+IQqBaqBgNzPQ3JHrwRqQAy/3ZJT4IrHSFe0LwgYQ/EeAGydY8UN0wB1y5YN0
 SoFhPd7B7UWxUD7PrfriBc3B2HN44QkMpe/fQJ4y0GVF+1Uqp6Ti7ouCEVg60A3g
 8kiS+98FBIzHySk+xfX/vlhiQD/J2c6/+p28gw+iGf6YmUsQbeu64tV5TAUGGBN+
 kErLvJmJnC/dwWiEMXzv/e6sNKoZi0Yz/JVq6atuoT/521cjDEDapZRxBSmaW33M
 Zn6YF8FIsUTHGdt9Equ+HPjZZTyk34W8f0d0N+lws0QNWtk5d0KU5XP2PDp+Mj6O
 dGVaGv88qmMIr0o/s9CgvpefSM8L7fC0WQwRpRr905gu8k6YxuEWQofuh365ZcKT
 sEDeRqZYi+ue4+gW1GRje6M5ODftTWoLPlX2f+iZui1gwwpuczvj0sRR10kKfKRD
 qxpHcxyIzS2MW4aT1JnVgeWStt0x5wWeq1qzO1bwBJCAlS63vln/mUnBq7+uV0ca
 KiEah5vP4dcenA==
 =1RwH
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 updates from Thomas Gleixner:
 "A small set of updates for x86:

   - Return -EIO instead of success when the certificate buffer for SEV
     guests is not large enough

   - Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared
     on return to userspace for performance reasons, but the leaves user
     space vulnerable to cross-thread attacks which STIBP prevents.
     Update the documentation accordingly"

* tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt/sev-guest: Return -EIO if certificate buffer is not large enough
  Documentation/hw-vuln: Document the interaction between IBRS and STIBP
  x86/speculation: Allow enabling STIBP with legacy IBRS
2023-03-05 11:27:48 -08:00
Linus Torvalds 857f1268a5 Changes in this cycle were:
- Shrink 'struct instruction', to improve objtool performance & memory
    footprint.
 
  - Other maximum memory usage reductions - this makes the build both faster,
    and fixes kernel build OOM failures on allyesconfig and similar configs
    when they try to build the final (large) vmlinux.o.
 
  - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying
    single-byte instruction (PUSH/POP or LEAVE). This requires the
    extension of the ORC metadata structure with a 'signal' field.
 
  - Misc fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmQAVp8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gV6A//YbWb4nNxYbRFBd1O3FnFfy4efrDQ4btI
 hwkL6f7jka9RnIpIEatJvaLdNvyN5tuPCC/+B5eVnvFdd1JcBUmj5D+zYFt6H6qt
 BG4M6TNHFkP1kOJVfFGn8UPRfoMz2oMiEqilpsc1Yuf7b3ldMJtGUoHaeZC9pyqe
 RUisKNw4WHZp2G/gTBUWxW17xpWY3Awgch/w4HCu8wMnR+uEC44i0UCBfnAadl36
 ar66PfhMJcQIv0XkK9wu43g7+HFnjpxHOx35JW3lRot0xRnwl/JcsmaX5iPkh0gt
 HV8eLH80J0homeMZDY7vWIKJxGeLkIdfjO5gxwTdnFc9rQw3GwHp1B7WTS6J3Vwe
 gM00kyaGly3CvkKMiz5QQBfViWCjE25nYS8X0i9Oz6Gk58IkRPGByaDTKRjNrDJB
 BwH9DE9xb3dPVZRv/PejkTdggQWo+FDTrL8ulHIjUFK11M7VubwkskecNHkfpAOE
 TRy5iLjMocF8u7hdyec6Mma2K6qEndC2Rw9ZMPQ7TeieMsBcl63cSRgSJLFfdRhr
 /5c6Hr2SNQKU8xu+3j49GyBwFvp4CwCa+GPs9/o+l0uCvuKNIn9B788cm4TjxLJ9
 C3PRzE6B/CaLhYvlC5k5cNM+I4YpoMU/mvSvY6HcC0Duj2nSAWS2VV60MVMDpqVX
 8nK4xnla2tM=
 =bpPY
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Shrink 'struct instruction', to improve objtool performance & memory
   footprint

 - Other maximum memory usage reductions - this makes the build both
   faster, and fixes kernel build OOM failures on allyesconfig and
   similar configs when they try to build the final (large) vmlinux.o

 - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying
   single-byte instruction (PUSH/POP or LEAVE). This requires the
   extension of the ORC metadata structure with a 'signal' field

 - Misc fixes & cleanups

* tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  objtool: Fix ORC 'signal' propagation
  objtool: Remove instruction::list
  x86: Fix FILL_RETURN_BUFFER
  objtool: Fix overlapping alternatives
  objtool: Union instruction::{call_dest,jump_table}
  objtool: Remove instruction::reloc
  objtool: Shrink instruction::{type,visited}
  objtool: Make instruction::alts a single-linked list
  objtool: Make instruction::stack_ops a single-linked list
  objtool: Change arch_decode_instruction() signature
  x86/entry: Fix unwinding from kprobe on PUSH/POP instruction
  x86/unwind/orc: Add 'signal' field to ORC metadata
  objtool: Optimize layout of struct special_alt
  objtool: Optimize layout of struct symbol
  objtool: Allocate multiple structures with calloc()
  objtool: Make struct check_options static
  objtool: Make struct entries[] static and const
  objtool: Fix HOSTCC flag usage
  objtool: Properly support make V=1
  objtool: Install libsubcmd in build
  ...
2023-03-02 09:45:34 -08:00
KP Singh 6921ed9049 x86/speculation: Allow enabling STIBP with legacy IBRS
When plain IBRS is enabled (not enhanced IBRS), the logic in
spectre_v2_user_select_mitigation() determines that STIBP is not needed.

The IBRS bit implicitly protects against cross-thread branch target
injection. However, with legacy IBRS, the IBRS bit is cleared on
returning to userspace for performance reasons which leaves userspace
threads vulnerable to cross-thread branch target injection against which
STIBP protects.

Exclude IBRS from the spectre_v2_in_ibrs_mode() check to allow for
enabling STIBP (through seccomp/prctl() by default or always-on, if
selected by spectre_v2_user kernel cmdline parameter).

  [ bp: Massage. ]

Fixes: 7c693f54c8 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
Reported-by: José Oliveira <joseloliveira11@gmail.com>
Reported-by: Rodrigo Branco <rodrigo@kernelhacking.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230220120127.1975241-1-kpsingh@kernel.org
Link: https://lore.kernel.org/r/20230221184908.2349578-1-kpsingh@kernel.org
2023-02-27 18:57:09 +01:00
Linus Torvalds 49d5759268 ARM:
- Provide a virtual cache topology to the guest to avoid
   inconsistencies with migration on heterogenous systems. Non secure
   software has no practical need to traverse the caches by set/way in
   the first place.
 
 - Add support for taking stage-2 access faults in parallel. This was an
   accidental omission in the original parallel faults implementation,
   but should provide a marginal improvement to machines w/o FEAT_HAFDBS
   (such as hardware from the fruit company).
 
 - A preamble to adding support for nested virtualization to KVM,
   including vEL2 register state, rudimentary nested exception handling
   and masking unsupported features for nested guests.
 
 - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
   resuming a CPU when running pKVM.
 
 - VGIC maintenance interrupt support for the AIC
 
 - Improvements to the arch timer emulation, primarily aimed at reducing
   the trap overhead of running nested.
 
 - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
   interest of CI systems.
 
 - Avoid VM-wide stop-the-world operations when a vCPU accesses its own
   redistributor.
 
 - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
   in the host.
 
 - Aesthetic and comment/kerneldoc fixes
 
 - Drop the vestiges of the old Columbia mailing list and add [Oliver]
   as co-maintainer
 
 This also drags in arm64's 'for-next/sme2' branch, because both it and
 the PSCI relay changes touch the EL2 initialization code.
 
 RISC-V:
 
 - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
 
 - Correctly place the guest in S-mode after redirecting a trap to the guest
 
 - Redirect illegal instruction traps to guest
 
 - SBI PMU support for guest
 
 s390:
 
 - Two patches sorting out confusion between virtual and physical
   addresses, which currently are the same on s390.
 
 - A new ioctl that performs cmpxchg on guest memory
 
 - A few fixes
 
 x86:
 
 - Change tdp_mmu to a read-only parameter
 
 - Separate TDP and shadow MMU page fault paths
 
 - Enable Hyper-V invariant TSC control
 
 - Fix a variety of APICv and AVIC bugs, some of them real-world,
   some of them affecting architecurally legal but unlikely to
   happen in practice
 
 - Mark APIC timer as expired if its in one-shot mode and the count
   underflows while the vCPU task was being migrated
 
 - Advertise support for Intel's new fast REP string features
 
 - Fix a double-shootdown issue in the emergency reboot code
 
 - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
   similar treatment to VMX
 
 - Update Xen's TSC info CPUID sub-leaves as appropriate
 
 - Add support for Hyper-V's extended hypercalls, where "support" at this
   point is just forwarding the hypercalls to userspace
 
 - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
   MSR filters
 
 - One-off fixes and cleanups
 
 - Fix and cleanup the range-based TLB flushing code, used when KVM is
   running on Hyper-V
 
 - Add support for filtering PMU events using a mask.  If userspace
   wants to restrict heavily what events the guest can use, it can now
   do so without needing an absurd number of filter entries
 
 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled
 
 - Add PEBS support for Intel Sapphire Rapids
 
 - Fix a mostly benign overflow bug in SEV's send|receive_update_data()
 
 - Move several SVM-specific flags into vcpu_svm
 
 x86 Intel:
 
 - Handle NMI VM-Exits before leaving the noinstr region
 
 - A few trivial cleanups in the VM-Enter flows
 
 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1
 
 - Fix a crash when using eVMCS's enlighted MSR bitmaps
 
 Generic:
 
 - Clean up the hardware enable and initialization flow, which was
   scattered around multiple arch-specific hooks.  Instead, just
   let the arch code call into generic code.  Both x86 and ARM should
   benefit from not having to fight common KVM code's notion of how
   to do initialization.
 
 - Account allocations in generic kvm_arch_alloc_vm()
 
 - Fix a memory leak if coalesced MMIO unregistration fails
 
 selftests:
 
 - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
   the correct hypercall instruction instead of relying on KVM to patch
   in VMMCALL
 
 - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
 mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
 9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
 duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
 VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
 /WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
 =goe1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogenous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting architecurally legal but unlikely to happen in
     practice

   - Mark APIC timer as expired if its in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and cleanup the range-based TLB flushing code, used when KVM is
     running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlighted MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
2023-02-25 11:30:21 -08:00
Linus Torvalds d8e473182a - Fixup comment typo
- Prevent unexpected #VE's from:
   - Hosts removing perfectly good guest mappings (SEPT_VE_DISABLE
   - Excessive #VE notifications (NOTIFY_ENABLES) which are
     delivered via a #VE.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmP1WzMACgkQaDWVMHDJ
 krAqig/+MzIYmUIkuYbluektxPdzI6zhY/Z+eD5DDH9OFZX5e0WQrmHpQbJ3i4Q6
 LT5JQ+yAI2ox/mhPfyCeDXqdRiatJJExUDUepc0qsOEW9gTsJ+edYwUsJg8HII61
 +TLz/BiMSF6xCUk46b4CqzhoeEk1dupFAG204uc4vGwSfXdysN3buAcciJc1rOTS
 7G9hI9fdLSjEJ8yyFebSDMPxSmdnjJPrDK3LF/leGJEpAQ/eMU0entG4ZH3Uyh2s
 3EnDpOdRjX56LAEixB4e5igXyS7wesCun4ytOnwndzW8p4gPIsypcJUEbVt84BfA
 HQaSWP35BFAn0JshJnFPmj4r4jV2EB8l630dVTOKdNSiIa3YjyB5nbzy+mMPFl4f
 8vcrHEZ6boEcRhgz0zFG0RfnDsjdbqKgFBXdRt0vYB/CG+EfmYaPoDXsb/8A7dtc
 8IQ9wLk2AqG0L8blZVS2kjFxNa/9lkDcMsAbfZmlORTQTF2WN2Jlbxri87vuBpRy
 8sqMUhgvHoffd/SIiDzJJIBjOH5/RhXLKhGzXQHI1vpZdU6ps9KIvohiycgx1mUQ
 lXXQwN5OWSHdUXZ7TFBIGXy9n32Ak/k5GCzCJSqvsMJDDdbycGVB+YCaKX6QK30+
 HAHrPy/FQ3FFvZWdsDMD5Pn4RkF4LYH/k4QZwqBFMs9+/Sdzwxc=
 =UpyL
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull Intel Trust Domain Extensions (TDX) updates from Dave Hansen:
 "Other than a minor fixup, the content here is to ensure that TDX
  guests never see virtualization exceptions (#VE's) that might be
  induced by the untrusted VMM.

  This is a highly desirable property. Without it, #VE exception
  handling would fall somewhere between NMIs, machine checks and total
  insanity. With it, #VE handling remains pretty mundane.

  Summary:

   - Fixup comment typo

   - Prevent unexpected #VE's from:
      - Hosts removing perfectly good guest mappings (SEPT_VE_DISABLE)
      - Excessive #VE notifications (NOTIFY_ENABLES) which are delivered
        via a #VE"

* tag 'x86_tdx_for_6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Do not corrupt frame-pointer in __tdx_hypercall()
  x86/tdx: Disable NOTIFY_ENABLES
  x86/tdx: Relax SEPT_VE_DISABLE check for debug TD
  x86/tdx: Use ReportFatalError to report missing SEPT_VE_DISABLE
  x86/tdx: Expand __tdx_hypercall() to handle more arguments
  x86/tdx: Refactor __tdx_hypercall() to allow pass down more arguments
  x86/tdx: Add more registers to struct tdx_hypercall_args
  x86/tdx: Fix typo in comment in __tdx_hypercall()
2023-02-25 09:11:30 -08:00
Linus Torvalds 3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds 06e1a81c48 A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon by Andy
 
 - Preparatory refactoring and cleanup work in the efivar layer by Johan,
   which is needed to accommodate the Snapdragon arm64 laptops that
   expose their EFI variable store via a TEE secure world API.
 
 - Enhancements to the EFI memory map handling so that Xen dom0 can
   safely access EFI configuration tables (Demi Marie)
 
 - Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
   table, so that firmware that is generated with ENDBR/BTI landing pads
   will be mapped with enforcement enabled.
 
 - Clean up how we check and print the EFI revision exposed by the
   firmware.
 
 - Incorporate EFI memory attributes protocol definition contributed by
   Evgeniy and wire it up in the EFI zboot code. This ensures that these
   images can execute under new and stricter rules regarding the default
   memory permissions for EFI page allocations. (More work is in progress
   here)
 
 - CPER header cleanup by Dan Williams
 
 - Use a raw spinlock to protect the EFI runtime services stack on arm64
   to ensure the correct semantics under -rt. (Pierre)
 
 - EFI framebuffer quirk for Lenovo Ideapad by Darrell.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
 jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
 pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
 QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
 zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
 3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
 YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
 CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
 zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
 f/tLec86
 =+pOz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "A healthy mix of EFI contributions this time:

   - Performance tweaks for efifb earlycon (Andy)

   - Preparatory refactoring and cleanup work in the efivar layer, which
     is needed to accommodate the Snapdragon arm64 laptops that expose
     their EFI variable store via a TEE secure world API (Johan)

   - Enhancements to the EFI memory map handling so that Xen dom0 can
     safely access EFI configuration tables (Demi Marie)

   - Wire up the newly introduced IBT/BTI flag in the EFI memory
     attributes table, so that firmware that is generated with ENDBR/BTI
     landing pads will be mapped with enforcement enabled

   - Clean up how we check and print the EFI revision exposed by the
     firmware

   - Incorporate EFI memory attributes protocol definition and wire it
     up in the EFI zboot code (Evgeniy)

     This ensures that these images can execute under new and stricter
     rules regarding the default memory permissions for EFI page
     allocations (More work is in progress here)

   - CPER header cleanup (Dan Williams)

   - Use a raw spinlock to protect the EFI runtime services stack on
     arm64 to ensure the correct semantics under -rt (Pierre)

   - EFI framebuffer quirk for Lenovo Ideapad (Darrell)"

* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
  arm64: efi: Make efi_rt_lock a raw_spinlock
  efi: Add mixed-mode thunk recipe for GetMemoryAttributes
  efi: x86: Wire up IBT annotation in memory attributes table
  efi: arm64: Wire up BTI annotation in memory attributes table
  efi: Discover BTI support in runtime services regions
  efi/cper, cxl: Remove cxl_err.h
  efi: Use standard format for printing the EFI revision
  efi: Drop minimum EFI version check at boot
  efi: zboot: Use EFI protocol to remap code/data with the right attributes
  efi/libstub: Add memory attribute protocol definitions
  efi: efivars: prevent double registration
  efi: verify that variable services are supported
  efivarfs: always register filesystem
  efi: efivars: add efivars printk prefix
  efi: Warn if trying to reserve memory under Xen
  efi: Actually enable the ESRT under Xen
  efi: Apply allowlist to EFI configuration tables when running under Xen
  efi: xen: Implement memory descriptor lookup based on hypercall
  efi: memmap: Disregard bogus entries instead of returning them
  ...
2023-02-23 14:41:48 -08:00
Linus Torvalds 7dd86cf801 Livepatching changes for 6.3
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmP01jEACgkQUqAMR0iA
 lPIH1g/9G3CLt9+by3d0FiS5AsbK6vohZRzKxqCTyX9b2p2sLQuiTu8TodJ9IOur
 axpaOuPOIjQ253yfqrYL2YZHdfr/632nSTGsT18p7k8a9m6ghQ6cW+uq23Ro9W43
 uubjyvFMTeLBnkDTSJquciENdMtyPwyiTN0+ZvDOsAOKoBt7i3Rt7lcorjDuALwb
 2SQ80qCjYLy1r6mkQyy3IhLQVSeRiaqkR6IAdxR5EeXmbSi0YYh0Jxz5AzPMeDw6
 F1w7Whe6Vf8vf4JHVDqNazB3f+JH4JrmRvk6LxlGU33z9uJac4NIbboGDXzRP21h
 A0UGqwJimeoi8dji9Y3cXRIYRc+HqyqL0IadSHHLCou746/CJDBQN1EjNNBgZgu6
 cTELnRL9TsV9tWt9boqbMK0KfCFnOsGPMDAXXDH5G/ZUsvyDweGMNelgNTXURHFX
 f1cTfdDj2T/iG3XDZHf35/rSI56BDQJcJ1G1fQc3sl+G9jDGxX3PBJYNAUpvC6cy
 iewDeROcAAXmwUqaC8epJJfN2m9zJJpDgR1t4mZTnKkmRF090ZApyDA4M769kn0J
 ioHvE8Pe9+GjY8syGzq3EJRyQnfXgTbAyKfiTpGWefLPrgkQ9LfFQRQ1S21sy483
 Bz/KrNl1tMWFsSmngNYaiOZifjxVj8pYoIz1W2s0ore9sYu78YM=
 =xdQJ
 -----END PGP SIGNATURE-----

Merge tag 'livepatching-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching updates from Petr Mladek:

 - Allow reloading a livepatched module by clearing livepatch-specific
   relocations in the livepatch module.

   Otherwise, the repeated load would fail on consistency checks.

* tag 'livepatching-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  livepatch,x86: Clear relocation targets on a module removal
  x86/module: remove unused code in __apply_relocate_add
2023-02-23 14:00:10 -08:00
Linus Torvalds 2b79eb73e2 probes updates for 6.3:
- Skip negative return code check for snprintf in eprobe.
 
 - Add recursive call test cases for kprobe unit test
 
 - Add 'char' type to probe events to show it as the character instead of value.
 
 - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols.
 
 - Fix kselftest to check filter on eprobe event correctly.
 
 - Add filter on eprobe to the README file in tracefs.
 
 - Fix optprobes to check whether there is 'under unoptimizing' optprobe when optimizing another kprobe correctly.
 
 - Fix optprobe to check whether there is 'under unoptimizing' optprobe when fetching the original instruction correctly.
 
 - Fix optprobe to free 'forcibly unoptimized' optprobe correctly.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmP0JdYACgkQ2/sHvwUr
 Pxt6sQf/TD9Kwqx3XG1tnLPev6yt2nuggUippHwWUFHlJtMyUaLV8aKFqByyEe+j
 tCQvrFIIJq242xg0Jac/MAf2exlWG9jsmVZPmvC1YzepOAbjXu2eBkIS7LsbeHjF
 JJypNnEceffWCpNoD6nlvR0xWXenqRbZJwdsGqo3u+fXnzTurEMY2GU2xOyv39tv
 S1uNLPANJxdMb/2iUsUE3hMbe82dqr8zPcApqWFtTBB6QPHI3B2SjuQHpQxwbTPl
 bzAl0yQkLSQXprVzT7xJ4xLnzbl1ljgJBci5aX8BFF+VD9oYkypdfYVczBH5VsP9
 E3eT9T9lRf4Q99EqxNy5uw7NqQXGQg==
 =CMPb
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull kprobes updates from Masami Hiramatsu:

 - Skip negative return code check for snprintf in eprobe

 - Add recursive call test cases for kprobe unit test

 - Add 'char' type to probe events to show it as the character instead
   of value

 - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols

 - Fix kselftest to check filter on eprobe event correctly

 - Add filter on eprobe to the README file in tracefs

 - Fix optprobes to check whether there is 'under unoptimizing' optprobe
   when optimizing another kprobe correctly

 - Fix optprobe to check whether there is 'under unoptimizing' optprobe
   when fetching the original instruction correctly

 - Fix optprobe to free 'forcibly unoptimized' optprobe correctly

* tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/eprobe: no need to check for negative ret value for snprintf
  test_kprobes: Add recursed kprobe test case
  tracing/probe: add a char type to show the character value of traced arguments
  selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols
  selftests/ftrace: Fix eprobe syntax test case to check filter support
  tracing/eprobe: Fix to add filter on eprobe description in README file
  x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
  x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
  kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
2023-02-23 13:03:08 -08:00
Linus Torvalds 525445efac NMI diagnostics for v6.3
Add diagnostics to the x86 NMI handler to help detect NMI-handler bugs
 on the one hand and failing hardware on the other.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmPq3x0THHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jOwED/41rLVFORQqNpK5mitA2acqVzRmUG9J
 sUHkJCPmHVr4sEDwqi2u+iBqHMqm8COaQOKA65tPsHJKI1PPcIjBG371QPPYsdRl
 +qNq6oLCrD37Dgs7CmJPjIO+0P2Xb765GUmcNhR9aH9QnYGz2a3s7QfXV2WlFjq3
 1LJ6Z8euEQBb5IE1syp3HHYf3IP4Z88gQxcU4kgV16uADnW0IKSw8F7p9B/EjSnB
 IjIh8gkAbfqNh0VXpex/wzPkrXRbjcOr1s43YkoYS1t3ggIZc6MEGs1kTmXAjxo2
 4S4CAPKfh4Btlez9VVIMwCDb56fHG6I5wyP+jH51dhNNKiuLqnSyHU3kWU3GFiYn
 5Ix7BKtAtp/AzASrL1xildOYjN6gB2QdQijs+bvqzH8Rm8Nl1Yy1z6p6iqcJGe/q
 cerzulajs+/UG2XKwRTWw6I3km4WkueEHYjEzmer+olK/Akx4COYiqMivdY5HIwK
 M7cFVQz2EiHLP3fu2LWrOkNi/Dy96Vsuya0n3E8Ch7Xtdjez7QPAdWU7bLXY/OTd
 jCRdd1MDPz87XQLSmbCV42nJzP5nNryBfijS7swqq9qL/D152ycctxOpIHa6/fJH
 pze5nqRBxjwlhOatJds5mVbhtKwF01YV4wzcaHypCmBXXTz1zVj/hFSbzuUuFwEI
 06c2sVNqzez4tw==
 =MPMw
 -----END PGP SIGNATURE-----

Merge tag 'nmi.2023.02.14a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull x86 NMI diagnostics from Paul McKenney:
 "Add diagnostics to the x86 NMI handler to help detect NMI-handler bugs
  on the one hand and failing hardware on the other"

* tag 'nmi.2023.02.14a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  x86/nmi: Print reasons why backtrace NMIs are ignored
  x86/nmi: Accumulate NMI-progress evidence in exc_nmi()
2023-02-23 09:28:37 -08:00
Ingo Molnar 585a78c1f7 Merge branch 'linus' into objtool/core, to pick up Xen dependencies
Pick up dependencies - freshly merged upstream via xen-next - before applying
dependent objtool changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-02-23 09:16:39 +01:00
Linus Torvalds 36289a03bc This update includes the following changes:
API:
 
 - Use kmap_local instead of kmap_atomic.
 - Change request callback to take void pointer.
 - Print FIPS status in /proc/crypto (when enabled).
 
 Algorithms:
 
 - Add rfc4106/gcm support on arm64.
 - Add ARIA AVX2/512 support on x86.
 
 Drivers:
 
 - Add TRNG driver for StarFive SoC.
 - Delete ux500/hash driver (subsumed by stm32/hash).
 - Add zlib support in qat.
 - Add RSA support in aspeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmPzAiwACgkQxycdCkmx
 i6et8xAAoO3w5MZFGXMzWsYhfSZFdceXBEQfDR7JOCdHxpMIQhw0FLlb0uttFk6m
 SeWrdP9wiifBDoCmw7qffFJml8ZftPL/XeXjob2d9v7jKbPyw3lDSIdsNfN/5EEL
 oIc9915zwrgawvahPAa+PQ4Ue03qRjUyOcV42dpd1W3NYhzDVHoK5OUU+mEFYDvx
 Sgw/YUugKf0VXkVDFzG5049+CPcheyRZqclAo9jyl2eZiXujgUyV33nxRCtqIA+t
 7jlHKwi+6QzFHY0CX5BvShR8xyEuH5MLoU3H/jYGXnRb3nEpRYAEO4VZchIHqF0F
 Y6pKIKc6Q8OyIVY8RsjQY3hioCqYnQFZ5Xtc1zGtOYEitVLbkmItMG0mVn0XOfyt
 gJDi6gkEw5uPUbEQdI4R1xEgJ8eCckMsOJ+uRxqTm+uLqNDxPbsB9bohKniMogXV
 lDlVXjU23AA9VeKtqU8FvWjfgqsN47X4aoq1j4/4aI7X9F7P9FOP21TZloP7+ssj
 PFrzNaRXUrMEsvyS1wqPegIh987lj6WkH4hyU0wjzaIq4IQELidHsSXFS12iWIPH
 kTEoC/trAVoYSr0zXKWUCs4h/x0FztVNbjs4KiDP2FLXX1RzeVZ0WlaXZhryHr+n
 1+8yCuS6tVofAbSX0wNkZdf0x5+3CIBw4kqSIvjKDPYYEfIDaT0=
 =dMYe
 -----END PGP SIGNATURE-----

Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
 "API:
   - Use kmap_local instead of kmap_atomic
   - Change request callback to take void pointer
   - Print FIPS status in /proc/crypto (when enabled)

  Algorithms:
   - Add rfc4106/gcm support on arm64
   - Add ARIA AVX2/512 support on x86

  Drivers:
   - Add TRNG driver for StarFive SoC
   - Delete ux500/hash driver (subsumed by stm32/hash)
   - Add zlib support in qat
   - Add RSA support in aspeed"

* tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits)
  crypto: x86/aria-avx - Do not use avx2 instructions
  crypto: aspeed - Fix modular aspeed-acry
  crypto: hisilicon/qm - fix coding style issues
  crypto: hisilicon/qm - update comments to match function
  crypto: hisilicon/qm - change function names
  crypto: hisilicon/qm - use min() instead of min_t()
  crypto: hisilicon/qm - remove some unused defines
  crypto: proc - Print fips status
  crypto: crypto4xx - Call dma_unmap_page when done
  crypto: octeontx2 - Fix objects shared between several modules
  crypto: nx - Fix sparse warnings
  crypto: ecc - Silence sparse warning
  tls: Pass rec instead of aead_req into tls_encrypt_done
  crypto: api - Remove completion function scaffolding
  tls: Remove completion function scaffolding
  tipc: Remove completion function scaffolding
  net: ipv6: Remove completion function scaffolding
  net: ipv4: Remove completion function scaffolding
  net: macsec: Remove completion function scaffolding
  dm: Remove completion function scaffolding
  ...
2023-02-21 18:10:50 -08:00
Linus Torvalds b8878e5a5c hyperv-next for v6.3.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmPzgDgTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXrc7CACfG4SSd8KkWU/y8Q66Irxdau0a3ETD
 KL4UNRKGIyKujufgFsme79O6xVSSsCNSay449wk20hqn8lnwbSRi9pUwmLn29hfd
 CMFleWIqgwGFfC1do5DRF1vrt1siuG/jVE07mWsEwuY2iHx/es+H7LiQKidhkndZ
 DhXRqoi7VYiJv5fRSumpkUJrMZiI96o9Mk09HUksdMwCn3+7RQEqHnlTH5KOozKF
 iMroDB72iNw5Na/USZwWL2EDRptENam3lFkPBeDPqNw0SbG4g65JGPR9DSa0Lkbq
 AGCJQkdU33mcYQG5MY7R4K1evufpOl/apqLW7h92j45Znr9ok6Vr2c1R
 =J1VT
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20230220' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - allow Linux to run as the nested root partition for Microsoft
   Hypervisor (Jinank Jain and Nuno Das Neves)

 - clean up the return type of callback functions (Dawei Li)

* tag 'hyperv-next-signed-20230220' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Fix hv_get/set_register for nested bringup
  Drivers: hv: Make remove callback of hyperv driver void returned
  Drivers: hv: Enable vmbus driver for nested root partition
  x86/hyperv: Add an interface to do nested hypercalls
  Drivers: hv: Setup synic registers in case of nested root partition
  x86/hyperv: Add support for detecting nested hypervisor
2023-02-21 16:59:23 -08:00
Linus Torvalds 877934769e - Cache the AMD debug registers in per-CPU variables to avoid MSR writes
where possible, when supporting a debug registers swap feature for
   SEV-ES guests
 
 - Add support for AMD's version of eIBRS called Automatic IBRS which is
   a set-and-forget control of indirect branch restriction speculation
   resources on privilege change
 
 - Add support for a new x86 instruction - LKGS - Load kernel GS which is
   part of the FRED infrastructure
 
 - Reset SPEC_CTRL upon init to accomodate use cases like kexec which
   rediscover
 
 - Other smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmP1RDIACgkQEsHwGGHe
 VUohBw//ZB9ZRqsrKdm6D9YaP2x4Zb+kqKqo6rjYeWaYqyPyCwDujPwh+pb3Oq1t
 aj62muDv1t/wEJc8mKNkfXkjEEtBVAOcpb5YIpKreoEvNKyevol83Ih0u5iJcTRE
 E5qf8HDS8b/JZrcazJJLl6WQmQNH5RiKSu5bbCpRhoeOcyo5pRYR5MztK9vNmAQk
 GMdwHsUSU+jN8uiE4HnpaOb/luhgFindRwZVTpdjJegQWLABS8cl3CKeTv4+PW45
 isvv37XnQP248wsptIEVRHeG6g3g/HtvwRx7DikUw06QwUyUK7H9hJssOoSP8TL9
 u4psRwfWnJ1OxU6klL+s0Ii+pjQ97wXmK/oqK7QkdUwhWqR/mQAW2e9kWHAngyDn
 A6mKbzSM6HFAeSXQpB9cMb6uvYRD44SngDFe3WXtEK8jiiQ70ikUm4E28I5KJOPg
 s+RyioHk0NFRHYSOOBqNG1NKz6ED7L3GbgbbzxkgMh21AAyI3X351t+PtGoLV5ew
 eqOsM7lbg9Scg1LvPk1JcoALS8USWqgar397rz9qGUs+OkPWBtEBCmTdMz/Eb+2t
 g/WHdLS5/ajSs5gNhT99W3DeqZMPDEkgBRSeyBBmY3CUD3gBL2wXEktRXv504zBR
 RC4oyUPX3c9E2ib6GATLE3kBLbcz9hTWbMxF+X3lLJvTVd/Qc2o=
 =v/ZC
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Cache the AMD debug registers in per-CPU variables to avoid MSR
   writes where possible, when supporting a debug registers swap feature
   for SEV-ES guests

 - Add support for AMD's version of eIBRS called Automatic IBRS which is
   a set-and-forget control of indirect branch restriction speculation
   resources on privilege change

 - Add support for a new x86 instruction - LKGS - Load kernel GS which
   is part of the FRED infrastructure

 - Reset SPEC_CTRL upon init to accomodate use cases like kexec which
   rediscover

 - Other smaller fixes and cleanups

* tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd: Cache debug register values in percpu variables
  KVM: x86: Propagate the AMD Automatic IBRS feature to the guest
  x86/cpu: Support AMD Automatic IBRS
  x86/cpu, kvm: Add the SMM_CTL MSR not present feature
  x86/cpu, kvm: Add the Null Selector Clears Base feature
  x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf
  x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature
  KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code
  x86/cpu, kvm: Add support for CPUID_80000021_EAX
  x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h>
  x86/gsseg: Use the LKGS instruction if available for load_gs_index()
  x86/gsseg: Move load_gs_index() to its own new header file
  x86/gsseg: Make asm_load_gs_index() take an u16
  x86/opcode: Add the LKGS instruction to x86-opcode-map
  x86/cpufeature: Add the CPU feature bit for LKGS
  x86/bugs: Reset speculation control settings on init
  x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
2023-02-21 14:51:40 -08:00
Linus Torvalds 2504ba8b01 Power management updates for 6.3-rc1
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
    Karny, Arnd Bergmann, Bagas Sanjaya).
 
  - Drop the custom cpufreq driver for loongson1 that is not necessary
    any more and the corresponding cpufreq platform device (Keguang
    Zhang).
 
  - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
    entries (Paul E. McKenney).
 
  - Enable thermal cooling for Tegra194 (Yi-Wei Wang).
 
  - Register module device table and add missing compatibles for
    cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss).
 
  - Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu
    (Christian Marangi).
 
  - Make kobj_type structure in the cpufreq core constant (Thomas
    Weißschuh).
 
  - Make cpufreq_unregister_driver() return void (Uwe Kleine-König).
 
  - Make the TEO cpuidle governor check CPU utilization in order to refine
   idle state selection (Kajetan Puchalski).
 
  - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
    cpuidle driver is selected and replace a default_idle() call in that
    driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing).
 
  - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
    Bityutskiy).
 
  - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
    avoid randconfig build failures (Arnd Bergmann).
 
  - Make kobj_type structures used in the cpuidle sysfs interface
    constant (Thomas Weißschuh).
 
  - Make the cpuidle driver registration code update microsecond values
    of idle state parameters in accordance with their nanosecond values
    if they are provided (Rafael Wysocki).
 
  - Make the PSCI cpuidle driver prevent topology CPUs from being
    suspended on PREEMPT_RT (Krzysztof Kozlowski).
 
  - Document that pm_runtime_force_suspend() cannot be used with
    DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald).
 
  - Add EXPORT macros for exporting PM functions from drivers (Richard
    Fitzgerald).
 
  - Remove /** from non-kernel-doc comments in hibernation code (Randy
    Dunlap).
 
  - Fix possible name leak in powercap_register_zone() (Yang Yingliang).
 
  - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
    capping driver (Zhang Rui).
 
  - Modify the idle_inject power capping facility to support 100% idle
    injection (Srinivas Pandruvada).
 
  - Fix large time windows handling in the intel_rapl power capping
    driver (Zhang Rui).
 
  - Fix memory leaks with using debugfs_lookup() in the generic PM
    domains and Energy Model code (Greg Kroah-Hartman).
 
  - Add missing 'cache-unified' property in the example for kryo OPP
    bindings (Rob Herring).
 
  - Fix error checking in opp_migrate_dentry() (Qi Zheng).
 
  - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
    Dybcio).
 
  - Modify some power management utilities to use the canonical ftrace
    path (Ross Zwisler).
 
  - Correct spelling problems for Documentation/power/ as reported by
    codespell (Randy Dunlap).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPuJfMSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx/5kQAJNOVImLEPLerLP8xufw30//LuDU5Gi0
 STsyDOMql/I2MpkeqeCcgrSbpy6NlEglOvg16gfpQ3qqTCLF9ypENxs9E5BGGvW0
 aEdCzvaoqmvi9PCr/jmj0EPP70/U+rIX5m/k0QdjLh9x0aLoAEe3uRJTfR9QVqXf
 I7JX0N9kjKi7YxpA5DlkHrS7J7GPPiWlesJ3p4wXuHMo3jf+6fgkoPFt8yRrGWeh
 AHzGT2BLrsy7aAUjGZB65Qx9q3fnSXMmXOjmn0Xh2njQah+zRZDwrNzwoY2HTLL/
 KQ6/Ww16USYRZtCS1fmGwAj9I+ddq6AOvhPCMn0vLXXmKVAMUrVVWnQS/0+vpm9y
 suUMK9Tndkgxd1vjby2246ThJn27uDd/ERFan4ouQo2j22uICY+SDo3osj2hMXka
 wq4zthXkY8KgjZ+MuXnZxPhcOvo8KRvfxAU0fy5efQnSkbtwY9UlMvjPBMBHm/RA
 21/6kjQNtq5vMmI37oC8DH+oPrRQ7sUKuY7HNqwO9P3QNKWVmNe7cF5UtXXxME7Q
 ULvP1d+u+TNNdHFLryPwCSzBO34wQEccdRZBjalZ8tBe6JiDWUFHC3giSURZSuzZ
 GDvzVaNX6PkgToyv4inBTB8lTp6pAuUjaWNvNJzVvUXiEKHB0ihzg5vpJW5NdwlH
 15Tn8cjH7pp0
 =lZLx
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add EPP support to the AMD P-state cpufreq driver, add support
  for new platforms to the Intel RAPL power capping driver, intel_idle
  and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
  drop the custom cpufreq driver for loongson1 that is not necessary any
  more (and the corresponding cpufreq platform device), fix assorted
  issues and clean up code.

  Specifics:

   - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
     Karny, Arnd Bergmann, Bagas Sanjaya)

   - Drop the custom cpufreq driver for loongson1 that is not necessary
     any more and the corresponding cpufreq platform device (Keguang
     Zhang)

   - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
     entries (Paul E. McKenney)

   - Enable thermal cooling for Tegra194 (Yi-Wei Wang)

   - Register module device table and add missing compatibles for
     cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)

   - Various dt binding updates for qcom-cpufreq-nvmem and
     opp-v2-kryo-cpu (Christian Marangi)

   - Make kobj_type structure in the cpufreq core constant (Thomas
     Weißschuh)

   - Make cpufreq_unregister_driver() return void (Uwe Kleine-König)

   - Make the TEO cpuidle governor check CPU utilization in order to
     refine idle state selection (Kajetan Puchalski)

   - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
     cpuidle driver is selected and replace a default_idle() call in
     that driver with arch_cpu_idle() to allow MWAIT to be used (Li
     RongQing)

   - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy)

   - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
     avoid randconfig build failures (Arnd Bergmann)

   - Make kobj_type structures used in the cpuidle sysfs interface
     constant (Thomas Weißschuh)

   - Make the cpuidle driver registration code update microsecond values
     of idle state parameters in accordance with their nanosecond values
     if they are provided (Rafael Wysocki)

   - Make the PSCI cpuidle driver prevent topology CPUs from being
     suspended on PREEMPT_RT (Krzysztof Kozlowski)

   - Document that pm_runtime_force_suspend() cannot be used with
     DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)

   - Add EXPORT macros for exporting PM functions from drivers (Richard
     Fitzgerald)

   - Remove /** from non-kernel-doc comments in hibernation code (Randy
     Dunlap)

   - Fix possible name leak in powercap_register_zone() (Yang Yingliang)

   - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
     capping driver (Zhang Rui)

   - Modify the idle_inject power capping facility to support 100% idle
     injection (Srinivas Pandruvada)

   - Fix large time windows handling in the intel_rapl power capping
     driver (Zhang Rui)

   - Fix memory leaks with using debugfs_lookup() in the generic PM
     domains and Energy Model code (Greg Kroah-Hartman)

   - Add missing 'cache-unified' property in the example for kryo OPP
     bindings (Rob Herring)

   - Fix error checking in opp_migrate_dentry() (Qi Zheng)

   - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
     Dybcio)

   - Modify some power management utilities to use the canonical ftrace
     path (Ross Zwisler)

   - Correct spelling problems for Documentation/power/ as reported by
     codespell (Randy Dunlap)"

* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
  Documentation: amd-pstate: disambiguate user space sections
  cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
  dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
  PM: Add EXPORT macros for exporting PM functions
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  MIPS: loongson32: Drop obsolete cpufreq platform device
  powercap: intel_rapl: Fix handling for large time window
  cpuidle: driver: Update microsecond values of state parameters as needed
  cpuidle: sysfs: make kobj_type structures constant
  cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
  PM: EM: fix memory leak with using debugfs_lookup()
  PM: domains: fix memory leak with using debugfs_lookup()
  cpufreq: Make kobj_type structure constant
  cpufreq: davinci: Fix clk use after free
  cpufreq: amd-pstate: avoid uninitialized variable use
  cpufreq: Make cpufreq_unregister_driver() return void
  OPP: fix error checking in opp_migrate_dentry()
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
  ...
2023-02-21 12:13:58 -08:00
Linus Torvalds 9e58df973d Updates for the interrupt subsystem:
Core:
 
     - Move the interrupt affinity spreading mechanism into lib/group_cpus
       so it can be used for similar spreading requirements, e.g. in the
       block multi-queue code.
 
       This also contains a first usecase in the block multi-queue code which
       Jens asked to take along with the librarization.
 
     - Improve irqdomain locking to close a number race conditions which
       can be observed with massive parallel device driver probing.
 
     - Enforce and document the semantics of disable_irq() which cannot be
       invoked safely from non-sleepable context.
 
     - Move the IPI multiplexing code from the Apple AIC driver into the
       core. so it can be reused by RISCV.
 
   Drivers:
 
     - Plug OF node refcounting leaks in various drivers.
 
     - Correctly mark level triggered interrupts in the Broadcom L2 drivers.
 
     - The usual small fixes and improvements.
 
     - No new drivers for the record!
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzUSkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoY3DEAC9E4yLO7VxxTrs/KrAVCgL3SnHVXQU
 nE42uFbQwpCILuNmnqP3uvTHLCsXZkbuBaZEbxLBxC2iyU6+31N1Is+e6cClGMjK
 kX6U9g9EqiRCdX3fgJiEU16fCgE8D1AEg+7XKLjeasQhCfKQGGtCtE9/Gmg/Ji92
 gcEY/bjvm1hcoNo9dh/vR4k0k63fb13716RLScozUkS/XYVlu+LrrG349gD2WEA9
 lh1twDkXvZTWkiYKWAkLorxcNyKhcnJxJw8zEIGVF5b6pCCudK8gXjBbMD5abC7W
 xano6B8F455eSKNsi2TWyW47ZHUkC60sqCNDgI2MBTsI7D72UpAJoDfe0VjbMoaH
 RQJnrGsUQbviBUen+LEet7nWZBQJRKZHOVtYEjA8ndB3PJUXKKcLeODdw11odyjR
 bgZk+0wnowMArIaoLfeItF2oSpfSzLVxh2i8Aeus5tBesvhVCOi4LABRBKGCWvMj
 cpSlMhZ4znMnr5j5lOGpcAjKFlWVh1HmF70Y2deGZi5xC8EXFL/VsB7rH5LEEEuF
 7I8CO8M1mXeOTJoCchCbuAYgZyuk1DIhKUyOiYQZblaPNGcVGvCIN31SFBRT9h/8
 e0VwSvVL756GhotUp/LjgTdG7MoKspWqRG00+q84SsDalsKGXMW7zmHc+1NgGN/C
 Yxio1Jlly9Rwyw==
 =+pu3
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core:

   - Move the interrupt affinity spreading mechanism into lib/group_cpus
     so it can be used for similar spreading requirements, e.g. in the
     block multi-queue code

     This also contains a first usecase in the block multi-queue code
     which Jens asked to take along with the librarization

   - Improve irqdomain locking to close a number race conditions which
     can be observed with massive parallel device driver probing

   - Enforce and document the semantics of disable_irq() which cannot be
     invoked safely from non-sleepable context

   - Move the IPI multiplexing code from the Apple AIC driver into the
     core, so it can be reused by RISCV

  Drivers:

   - Plug OF node refcounting leaks in various drivers

   - Correctly mark level triggered interrupts in the Broadcom L2
     drivers

   - The usual small fixes and improvements

   - No new drivers for the record!"

* tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
  irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
  irqdomain: Switch to per-domain locking
  irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
  irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
  irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
  irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
  irqchip/gic-v2m: Use irq_domain_create_hierarchy()
  irqchip/alpine-msi: Use irq_domain_add_hierarchy()
  x86/uv: Use irq_domain_create_hierarchy()
  x86/ioapic: Use irq_domain_create_hierarchy()
  irqdomain: Clean up irq_domain_push/pop_irq()
  irqdomain: Drop leftover brackets
  irqdomain: Drop dead domain-name assignment
  irqdomain: Drop revmap mutex
  irqdomain: Fix domain registration race
  irqdomain: Fix mapping-creation race
  irqdomain: Refactor __irq_domain_alloc_irqs()
  irqdomain: Look for existing mapping only once
  irqdomain: Drop bogus fwspec-mapping error handling
  ...
2023-02-21 10:03:48 -08:00
Linus Torvalds 560b803067 Updates for timekeeping, timers and clockevent/source drivers:
Core:
 
     - Yet another round of improvements to make the clocksource watchdog
       more robust:
 
       	 - Relax the clocksource-watchdog skew criteria to match the NTP
            criteria.
 
 	 - Temporarily skip the watchdog when high memory latencies are
 	   detected which can lead to false-positives.
 
 	 - Provide an option to enable TSC skew detection even on systems
            where TSC is marked as reliable.
 
       Sigh!
 
     - Initialize the restart block in the nanosleep syscalls to be directed
       to the no restart function instead of doing a partial setup on entry.
 
       This prevents an erroneous restart_syscall() invocation from
       corrupting user space data. While such a situation is clearly a user
       space bug, preventing this is a correctness issue and caters to the
       least suprise principle.
 
     - Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
       to align it with the nanosleep semantics.
 
   Drivers:
 
     - The obligatory new driver bindings for Mediatek, Rockchip and RISC-V
       variants.
 
     - Add support for the C3STOP misfeature to the RISC-V timer to handle
       the case where the timer stops in deeper idle state.
 
     - Set up a static key in the RISC-V timer correctly before first use.
 
     - The usual small improvements and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzV+cTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYlDEACMrjN2F6qeiOW94t4nQ3qP1M9AMSgO
 OihC04XuM14/3tEviu/cUOd60wYcUQ/kfI5C+IL35ezeP2w9lnuKqeFpG7aDOa33
 5F3isDPamJdXZEZs44CW15brR6dqDlEi5acKee/TtFV9mN6xNhzxM64IaFqecPmW
 P+BTwunB8xwquY8RzsHXor/GOGb6mqWQIPoHEPnywTDe/xQYWt0Exzi7ch6HQr5Z
 ZzHG6X4h6UTNimjay6L4qsRQWILmPIg4Z5IlycWMQ8qDFM0lbnIJqkG4JwceolI6
 aRQyLe3NQFcPYgq3ue+SNm4RckYn4NbAa1zFm0d5VDgKp4xW1sxvtkxOJuxjaOw2
 /rLkHkmyuVvCeTMAySfxrwnszAoM505CHC6CEYc1xELbeCkROFUaymtVyNFnnTru
 V/Jt/T2Gyx6tOrafX7u+djUjv9figddRpNbskVZvEi3Ztq4MQ069nK3oSUqtP5vO
 INApNg4lq6s8aGqVE+Kp9+CKwGqZqI4MdxQMNMAmCRLPon6apActVawbj18qO/wS
 qblQ0cbF8a16itlQ3V68qmhcPh6EZOuq8II4etNq6U0ulV9712WfMbat3z53LG94
 QNkAmZ3/wui93I+Q2NPxhf5ybJFQZhR0SOtVO6xIdTgOntkODwzzGu9UapfD8mLb
 k5BpWnH8CoUgiw==
 =I67j
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for timekeeping, timers and clockevent/source drivers:

  Core:

   - Yet another round of improvements to make the clocksource watchdog
     more robust:

       - Relax the clocksource-watchdog skew criteria to match the NTP
         criteria.

       - Temporarily skip the watchdog when high memory latencies are
         detected which can lead to false-positives.

       - Provide an option to enable TSC skew detection even on systems
         where TSC is marked as reliable.

     Sigh!

   - Initialize the restart block in the nanosleep syscalls to be
     directed to the no restart function instead of doing a partial
     setup on entry.

     This prevents an erroneous restart_syscall() invocation from
     corrupting user space data. While such a situation is clearly a
     user space bug, preventing this is a correctness issue and caters
     to the least suprise principle.

   - Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
     to align it with the nanosleep semantics.

  Drivers:

   - The obligatory new driver bindings for Mediatek, Rockchip and
     RISC-V variants.

   - Add support for the C3STOP misfeature to the RISC-V timer to handle
     the case where the timer stops in deeper idle state.

   - Set up a static key in the RISC-V timer correctly before first use.

   - The usual small improvements and fixes all over the place"

* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
  clocksource/drivers/em_sti: Mark driver as non-removable
  clocksource/drivers/sh_tmu: Mark driver as non-removable
  clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
  clocksource/drivers/timer-microchip-pit64b: Add delay timer
  clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
  dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx
  dt-bindings: timer: mediatek,mtk-timer: add MT8365
  clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
  clocksource/drivers/sh_cmt: Mark driver as non-removable
  clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
  clocksource/drivers/riscv: Increase the clock source rating
  clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
  dt-bindings: timer: Add bindings for the RISC-V timer device
  RISC-V: time: initialize hrtimer based broadcast clock event device
  dt-bindings: timer: rk-timer: Add rktimer for rv1126
  time/debug: Fix memory leak with using debugfs_lookup()
  clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
  posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
  clocksource: Verify HPET and PMTMR when TSC unverified
  ...
2023-02-21 09:45:13 -08:00
Linus Torvalds 056612fd41 Miscellaneous cleanups in X86:
- Correct the common copy and pasted mishandling of kstrtobool() in the
     strict_sas_size() setup function.
 
   - Make recalibrate_cpu_khz() an GPL only export.
 
   - Check TSC feature before doing anything else which avoids pointless
     code execution if TSC is not available.
 
   - Remove or fixup stale and misleading comments.
 
   - Remove unused or pointelessly duplicated variables.
 
   - Spelling and typo fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzWVkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodbEEAC7XjF7BkZ9nhmAMWgwThKbHhNb3QLk
 oO0pcbbff2o7bhcP55Mb6R52G1a/kEvpFg/iF6+4/GcsbxHtLhILtG0PGOgmg28p
 UcdXt8EvkMv+bICr3gYtnwqB50stc/1s8JhHVItaDIXbRjNOrkBHQzgcPx0qfC8w
 INPhlqShSehGtzmaoP4AWMfVtBlqKXlCADpQGd8hcTojlNRAJwzBF9mZbWGdgopW
 qa3yoa+s6kL3M2lXvwREuz/1JnmtKx7cav9ldWlSno2dgDbw1ioDZg9tJhARJo//
 toF9Y9h12ASDBaqVoyVJgKmDQddsdxkBTrMCKQX8yRH21pEX9eeHM/re9lNtUbhl
 4/0juvAKFyviatWAHHCPYGyuPGrSsrsj5sea2fNURnkc6TZ4pHHArDytpAOhYqh2
 8CPpT2Qn/C6CqUsc9Z2fbDZBAOTKR/IF93NzE+HcjRjDyjm30ImeKEbwMHfEa7lX
 V3/wvXH9+WIzvVC3EqbvVqkArG1YQTqQHBZIl9+Za2iEeLz8DGEWCH0b7w8/m2Cg
 0mzUOzjJviy6ShO0B8fZK8LuCoDbPAmL4etfjp1t3q+EsuG5pYOrYtrnZ76XWYD7
 TWxlBHhrYuqUBERpN7SCJgixqXgWVUe2/hZwstQqbmvH/jOe9TGgxrIu2MmvB1kK
 5+ul2d2uwbd4cA==
 =zlRy
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull miscellaneous x86 cleanups from Thomas Gleixner:

 - Correct the common copy and pasted mishandling of kstrtobool() in the
   strict_sas_size() setup function

 - Make recalibrate_cpu_khz() an GPL only export

 - Check TSC feature before doing anything else which avoids pointless
   code execution if TSC is not available

 - Remove or fixup stale and misleading comments

 - Remove unused or pointelessly duplicated variables

 - Spelling and typo fixes

* tag 'x86-cleanups-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hotplug: Remove incorrect comment about mwait_play_dead()
  x86/tsc: Do feature check as the very first thing
  x86/tsc: Make recalibrate_cpu_khz() export GPL only
  x86/cacheinfo: Remove unused trace variable
  x86/Kconfig: Fix spellos & punctuation
  x86/signal: Fix the value returned by strict_sas_size()
  x86/cpu: Remove misleading comment
  x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery
  x86/boot/e820: Fix typo in e820.c comment
2023-02-21 09:24:08 -08:00
Linus Torvalds 3f0b0903fd - Add getcpu support for the 32-bit version of the vDSO
- Some smaller fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzusMACgkQEsHwGGHe
 VUojfQ/7BOqXI0XsHTIwilF12w2bLQl1PeI4bSk6VY+iAN2YmQkq2qvNUgwt62e5
 5Z95cDuCZ8sx6L3mDIoOgWBN9zdLbxNhezLFDykb+6as67PMaww9l9R6n3JoC2qm
 ELso5JZnWvIZ7Cu7RRm9IzbSj93JAlN3Aypexe61NywMyge9CAvCiOEhvW+lkYSD
 lhZqgbm5WAB14F1CeqFyC8kVvUez1GH9Dunbe7ozk7LqRfTRlf5YPH88iE4UKzdg
 JXmbcHB2K4aQzfIW66OFPnl/4Cl+XxS/i5CR2NtWlB4/ANZBPoUr7QAS239OpC6u
 3uwv/qPmMe7p/lYMaGXSUpzD/MOCHP1HPN8/CWgdyK+Mdmctpqr0FYh1qXXm1Nuu
 v0SE3btHVIB5UfvImoOlV/RfCx3+TqxzqUU2erc0iD5VxlRfrqJEwJdJHOgRGxFU
 vflRxMQOshhyI7+Q7et0S0QlgK4HvGEHmBUwBsUbfyptIxbqpOLK8INC6N8qwGKZ
 gTuBxLNZ5yRE/NeOVe0cL2ooelfOlg7GKUI+gZbfzzQw8M5WZW9qEDS9y2wIuGey
 wBFJNzjKXSkrTxc6Hd136N7DX7PlMjiJhXP42s+7rXJguPvgk1oVyEuaX540+xX4
 HphXRC2QW0o0hCeFgP11Ai4oq/vRW1RFvdDimJjveJAv19bQNv0=
 =Wg/8
 -----END PGP SIGNATURE-----

Merge tag 'x86_vdso_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 vdso updates from Borislav Petkov:

 - Add getcpu support for the 32-bit version of the vDSO

 - Some smaller fixes

* tag 'x86_vdso_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix -Wmissing-prototypes warnings
  x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpu
  selftests: Emit a warning if getcpu() is missing on 32bit
  x86/vdso: Provide getcpu for x86-32.
  x86/cpu: Provide the full setup for getcpu() on x86-32
  x86/vdso: Move VDSO image init to vdso2c generated code
2023-02-21 08:54:41 -08:00
Linus Torvalds efebca0ba9 - Fix mixed steppings support on AMD which got broken somewhere along
the way
 
 - Improve revision reporting
 
 - Properly check CPUID capabilities after late microcode upgrade to
   avoid false positives
 
 - A garden variety of other small fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzs/sACgkQEsHwGGHe
 VUonGw//RgIVCZIkuytiZesFsAXD3sn4Mmji7WoRZvu3XooA0idOo+7ujBeNcJGw
 aFGjf0K5b7eAfiREqTXPlFSymPid7aN+7cPJD7iURJ5UEoDXca1vVh9Jeq7lhvRL
 M5CErroStya17vFqU5pz50EcUwGcao/N3wY+0rERk8Rkqu864PgI+KahS2V2D2PU
 XolD4CH/+JZMAJPaTG5dSkSf3gJevW/owZ+F2oqKKYNlFsQ6aYd/JZYwIQ2X7W9T
 HdVYzeASZs0tfBEPOsZUSobmIlqUU/MziefDyUuTYbO1DPJ525787RLpRyubhG9k
 b/7DWUNymR56B8AUq/RV6YE/Dw2YpcrP3Eu0pSbD5xUfEy8eFCcIr+cUL5M9+I4W
 iCZtYYGypNbDQf5NRkubtQu8xIwEbjOZNv444kMMBimZGzt/WDEGMHqgRbKpJ2MQ
 F2HoBnNVC5O2BddS0ErTpQDWK8B/c0+S4L1ZTkbh/y9CNhzITZ10sLAEGQawvBEk
 PBYeCQ98m72ijLcecz0vvVO81UHGicqyY86OqeqRx0FbGO9cZJg+8BqyTLxsRTSW
 OgxtB/moURdanWAAOdxZ91yUw40CYWn7qXhYxilZDtGgkFT6sUdA126uMxLJ8u2v
 WiOHmj/ymszHhkJiahcSMaD8gRFnLQ59jNatHNa/5Jyw0mi330g=
 =z8rd
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode loader updates from Borislav Petkov:

 - Fix mixed steppings support on AMD which got broken somewhere along
   the way

 - Improve revision reporting

 - Properly check CPUID capabilities after late microcode upgrade to
   avoid false positives

 - A garden variety of other small fixes

* tag 'x86_microcode_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/core: Return an error only when necessary
  x86/microcode/AMD: Fix mixed steppings support
  x86/microcode/AMD: Add a @cpu parameter to the reloading functions
  x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter
  x86/microcode: Allow only "1" as a late reload trigger value
  x86/microcode/intel: Print old and new revision during early boot
  x86/microcode/intel: Pass the microcode revision to print_ucode_info() directly
  x86/microcode: Adjust late loading result reporting message
  x86/microcode: Check CPU capabilities after late microcode update correctly
  x86/microcode: Add a parameter to microcode_check() to store CPU capabilities
  x86/microcode: Use the DEVICE_ATTR_RO() macro
  x86/microcode/AMD: Handle multiple glued containers properly
  x86/microcode/AMD: Rename a couple of functions
2023-02-21 08:47:36 -08:00
Linus Torvalds aa8c3db40a - Add support for a new AMD feature called slow memory bandwidth
allocation.  Its goal is to control resource allocation in external slow
 memory which is connected to the machine like for example through CXL devices,
 accelerators etc
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzmf4ACgkQEsHwGGHe
 VUppKg//Tq+lHaMYO8aTvk4jgqbR9RVXJwPbtEOp2C0kSLs5QxBms/o21IXnxJ07
 tdbIGOrfJGlbzSWP8ywysRRQwpKlwltWUVAjMOFqEfzEURLL042qtHZ8nxGKSGrc
 IZFJLNTMyx1Zyjc7e9A/hANCOoQFoPHT8zHf1CNNo1LtzgHzNZG6kggLHh5tRKSz
 Xi7wFbYBtmttsyIA/iAQjYAU0O9MnmdnktUb7XdPSFtTIZ3Nyw90We4gwYueEPzD
 S/rQHKr8V7ROZMHXQ/BWpVWdcxGoHD8acUSVq8j20KW3W9/H8KL9TRVakvnf0aRW
 g0efxKXdTjTRO49GgD7FUL8x1JdAOXeZwQYDzKPqW/GRESRdpOvsaMwcLDCEpIXK
 PmEOVReklokJF0btFqaVYkY6wGE2KLKmp97g/RffuHdIeIomwI9lTpy9kyQsKakc
 yJ+VsE85BlBEVkHNt49qFClO1L98G3IgZTTt6//EGv0EJl8pELfsddsbjG5uXun+
 xFhr2i7gllQcV4B4HSFFdYRBLvZYnTfKlNR7Hs9pRJT7V28Jv2GURiCHBm4sRv9O
 k3FX7sxytH2syBBwU1NNrMRMo+KgjVZurJwiHpTRbb39K6uCgLk/wbXfWh2SovW1
 BRItz2T6LFu4bo6WIhakx31pNmH94P8vC0acO8LHECVji7qvXFM=
 =8hmj
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

 - Add support for a new AMD feature called slow memory bandwidth
   allocation. Its goal is to control resource allocation in external
   slow memory which is connected to the machine like for example
   through CXL devices, accelerators etc

* tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix a silly -Wunused-but-set-variable warning
  Documentation/x86: Update resctrl.rst for new features
  x86/resctrl: Add interface to write mbm_local_bytes_config
  x86/resctrl: Add interface to write mbm_total_bytes_config
  x86/resctrl: Add interface to read mbm_local_bytes_config
  x86/resctrl: Add interface to read mbm_total_bytes_config
  x86/resctrl: Support monitor configuration
  x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
  x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  x86/resctrl: Include new features in command line options
  x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
2023-02-21 08:38:45 -08:00
Linus Torvalds 1adce1b944 - Teach the static_call patching infrastructure to handle conditional
tall calls properly which can be static calls too
 
 - Add proper struct alt_instr.flags which controls different aspects of
   insn patching behavior
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzkAcACgkQEsHwGGHe
 VUpA3A//bALDnLosUQe/m8CTcj1AU12Y59fGInoLl5xArM3liOhRYWj9yu8+2r5N
 j+89yjWoiaogu/9B18pV0+VnBrFUbALZmHxec0+4VAWyMqYuTbqN28Nj/2cZiHdP
 I/9mwGu40I/Ira021D132EcdoZI7O/6bFlh+kEoAqxc7rsqhD5KKRMlrTTdEPVjH
 aRbWIuzqDWNhbi7IwfgEBIPLiQZQKmIdH5hsFMD6yOMIdMRL6CwKmXVg2M1Zp8ta
 5v2Aqgvu2nZYCIteP4GQck2AlUBlGR4ClGQQRII+U1o8c9dM0hfcIDgsbSYKvgrY
 ANm9MQJaF7MRomk9y4E0EHPZAJEMLKUgiQXMxWpER3O1GOKgZPlyzNSe0gRCiL6O
 NZWZ2cGtdhQMrko4EapE3GNryM1HoCY/QCuD1fCYwoc/pRBDhCxsSqjWUd8G/6wn
 s3S/mD0v3nmTrxHg8sWvqhKshsd7B9V0LSkTpHktz3soFIJGXTxbrtty0CIS61pM
 4iUMYB9SjunoEmdwC7+gCN3sCiRpRqfmIybqXdsW3d37QI+FM5aSBPw51xULubfY
 Wsxo8SkH+IMYxXmfbQuUppsGZ+1QHzU08+MrlvNxGHUjS1aMnsrFF/fbfbbCnWvX
 7hcyBPT0jxc9RPMNeKDm4ItapMMGxGdv6XiRmM8LiUtVG2fMaW4=
 =XUqC
 -----END PGP SIGNATURE-----

Merge tag 'x86_alternatives_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm alternatives updates from Borislav Petkov:

 - Teach the static_call patching infrastructure to handle conditional
   tall calls properly which can be static calls too

 - Add proper struct alt_instr.flags which controls different aspects of
   insn patching behavior

* tag 'x86_alternatives_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/static_call: Add support for Jcc tail-calls
  x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
  x86/alternatives: Introduce int3_emulate_jcc()
  x86/alternatives: Add alt_instr.flags
2023-02-21 08:27:47 -08:00
Linus Torvalds 0246725d73 - Add support for reporting more bits of the physical address on error,
on newer AMD CPUs
 
 - Mask out bits which don't belong to the address of the error being
   reported
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzUAEACgkQEsHwGGHe
 VUr22RAAh7fi3s8sDP4B2WBe1LPKZystZamxlLObBG2eLT7g0YmSKV12+bHCGf/B
 nGqz9iy+e/T1Khxv0gdEyuENwzuitXgiEOYgB4u70HimWy5422ZCzn1EiOMFtyST
 g0ehOR+tU84YwMVR40ui3spI1DHgeVPqVBLHBARZ1OAaA58N8eVREC6MqJAeAzIU
 +VYiBbn69quECTuU1P7yaT8NDnbm5G6pA1dhKLc7vLl9QWzoW1yWLLcp+oGFN6B8
 rcGDKEDK1OYtdHScRCfhFrznkeYP6SVnSt4wlAgX+HVGPoMpvq8nJygxCWdE0yjd
 aQGhdcVJkQlSqm1iJUv0MK9nkolqXVVSVTurpHunAq7ctul6Qm/X+fsfwBgSIXXn
 Gdj3in374MLWCz/xGqeBS8IiiPxGxJA9s350jyk02LK6Np6sXeuc4PpR66+6FAKQ
 Ypen+uWJ6oBof04bW7DBK0R14atA8EpOOLUrrGIsSkNSEIjLaCipMZOpRCbOw76N
 bXcdnKKsaEDjKtHClvx/vZXklfzWk0OgF8qtY0nGF+khvDAi3pQaIIlCehf0Qemh
 6j00TqIYBCXa0kuKktdPzVJSM7A7TZ5ftboa1IPhE+GYrFFee/VJ3yfgqz102FWI
 RJsY8JXt+EP3VMSOQYqQ5KzcLBJ2uDiRYtgUo4P1CITNpRfZEMc=
 =e9v9
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Add support for reporting more bits of the physical address on error,
   on newer AMD CPUs

 - Mask out bits which don't belong to the address of the error being
   reported

* tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Mask out non-address bits from machine check bank
  x86/mce: Add support for Extended Physical Address MCA changes
  x86/mce: Define a function to extract ErrorAddr from MCA_ADDR
2023-02-21 08:04:51 -08:00
Linus Torvalds 89f5349e06 Changes in this cycle:
- Simplify add_rtc_cmos()
 
  - Use strscpy() in the mcelog code
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzdU8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gS6w/8DFsYUIK4CGEtG0MYH+DUVsz/zyo4pejM
 yukMpKxXMJKi7pZ6k+0he9LOawa2WeK/hzzJh0zP3EzJtF1RR831XSsGRNPu3OTQ
 Q6mAuFdlLC1EAgJs6muqhIF5/bxfti6pFpZBN8Dwi/9VPpUQwayOgaJXysiRA2aN
 r/czEgp5fGgExC4QLE6HIPIzhsyjUlngH2F4xNeO13cS6S0T3Ns1xcSJs9jJfwMW
 2Vx0FCSo/cj8Hdr10NGEJNrqCzN9yvUuZuQ4utp6yf03zWyP1L4c2MB0E+aw6d1c
 ygpYWm400vlEFHfT8x9UVnybR5wABG4GP8JNtSBHASk7rNPKl5cKfOIPHzdsCnFO
 bHh1Xc4gduFVB8nUilUlcvoseum8GaYOqhi6ov2hwq+n47uam9H5QlCWn7OyLBbW
 88Ajg+wqNxG/R3mhyPslXDMr/dccQ9mcZSxbPDX14LpG8bWAjvM3yP43N8w6myVs
 1Br8Lsbf8lm5jiJ5hv2GBGQa9eDA0qLBFnkvBTe9zx7AV/K/KnTUXlK2DdeZVfO8
 eqgyTrXANyyJqTC/s2GAOLFwySRZFx9EHw6Kg8cmjoG9o8VCpXljQ8qj71ZOtFlo
 xBDlmg4Y6czRZC2kQEFC1kA30nLw+2UAnOEwr11JgGod9K+DqFtLKnVVEsrtxmGH
 E7ccV3QRc/Q=
 =Nuzs
 -----END PGP SIGNATURE-----

Merge tag 'x86-platform-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 platform update from Ingo Molnar:

 - Simplify add_rtc_cmos()

 - Use strscpy() in the mcelog code

* tag 'x86-platform-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/dev-mcelog: use strscpy() to instead of strncpy()
  x86/rtc: Simplify PNP ids check
2023-02-20 19:04:54 -08:00
Linus Torvalds 2e0ddb34e5 Updates in this cycle:
- Replace zero-length array in struct xregs_state with flexible-array member,
    to help the enabling of stricter compiler checks.
 
  - Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzc+cRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gJ2RAAni5xUIObYPeEH4gEL0hXaLDXXYM+iyvK
 Fp4kI/9HLdgEOYjIFZG1H6nOlSDOSwFHg2mhpxXRujByAuohGT8cU2ax6Zh3ya1g
 qJy5UepKa+Twvoifts4mmClJrK+tVCWfKthoqCc5C6D2lGroCplvp5hE2/KIs/9d
 QcnpKaFAapYNat1rzLtBLQUovxVdd8yow5+pTEJhUnhF6EYMIn4M5cPncINPdx98
 AnQi1v6pWjhgK5gP7G1maz7J0khr0nNzd17Dtx+iknLX4cVxa/Hx+wQtsom80Xwt
 aeIwPYiAYTFs0ZjVrEtwwtV0ub29viHeS2x1v8nkLSA0LJAVrIX/1yuIMbUuFqS8
 PW/mSKkA0phhtVbe6SYtUyXv8zLysWenFre6nD7PNpWFmjOyNUSJcw/clUYwBYVX
 LbtwaOwI+shbd/BEGwAPC4235XOl1wq/g9gvV4bt3vTi0pE8tke0oHVkIG6bCAK1
 +CB5pnAHfZRNC8H1CkyurbmHPUOMaS/PK4LpvPK1/G03Fe4s42SUNUd7DWCevxq3
 agvfOmiH/XvoLdJejYmqeLAz700gBntYkNvBg2P1f30EVj/wGvVDEen8MBGMaIDe
 5bef3pIUfP5Ye3DdzFRcjI5OjbijNHxQ7MnIqqW3Uuz2ujdq8ME88l8eUxJE7JZN
 KZv7r+asPEM=
 =az43
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Ingo Molnar:

 - Replace zero-length array in struct xregs_state with flexible-array
   member, to help the enabling of stricter compiler checks.

 - Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads.

* tag 'x86-fpu-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads
  x86/fpu: Replace zero-length array in struct xregs_state with flexible-array member
2023-02-20 18:50:02 -08:00
Linus Torvalds 8a68bd3e9f Changes in this cycle:
- Clean up the signal frame layout tests
 
  - Suppress KMSAN false positive reports in arch_within_stack_frames()
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzctERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gYdxAArL7aQy1GW51Wo+HjQXm6ONalHpb9hgjz
 rTUp+eFPhlPq6dtPhcNQoT3eyEzlgoIHw5b2gDByLzU+kKROaGW6T/ao3jxKtR2+
 YVj+P/9jCx/Kak+3Fw2rAZjpRIJn/iWEaPVIXVGj9wAml8aeJWy14tOTUSaS0fIB
 EGAtpXaPCg8/DtU2q4mSwcW0n2bwqFs0jYJRaSxuZo16woQffkMpAF97P/q1TKNs
 t5Bvx40Hio9kijYuZmSrcygAtOxlL+rcHXsJofPmlOlt/N5yopszrwfKUGzwN1b8
 BTOPac5gGbhfY/pKC+rB8nf2bDdYEdqBW9hYhy7AGiAAHOy4xoULxQXxicANRU6+
 2m1q/jmYah282AOOUAy8kfx9DmEZbXmZcefkAJfjhobJIwYXxa4tKayDowUph2LP
 c3ronMZoLq1tCZ/rCVeOfijOu+cDMJy2gI6lLBxFVxsIHQigWfCEWcCp0z+10FAJ
 puUGIZJnA+Gchf1zrIEcoAYVw3bkFZ/Mx6cE8nogLO9XT76S5R94Z/sE70gzX439
 gbsE1GnffyFwF2o2ClzoCobHs50JSMQGPbI7hJy+MyaRqoiyMaegNE1+unUAHR4O
 dJifzhr8LyBr4maO8AbeNtreIlW67CpSEUM/TY+KithkRtkZOCUJO9iDsggKH96n
 E3+WC9zqUgg=
 =T5pv
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Ingo Molnar:

 - Clean up the signal frame layout tests

 - Suppress KMSAN false positive reports in arch_within_stack_frames()

* tag 'x86-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Suppress KMSAN reports in arch_within_stack_frames()
  x86/signal/compat: Move sigaction_compat_abi() to signal_64.c
  x86/signal: Move siginfo field tests
2023-02-20 18:40:45 -08:00
Linus Torvalds 35011c67c8 Changes in this cycle:
- Robustify/fix calling startup_{32,64}() from the decompressor code,
    and removing x86 quirk from scripts/head-object-list.txt as
    a result.
 
  - Do not register processors that cannot be onlined for x2APIC
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzcNsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gFjhAAqxnVl1X413IK9sd4C56wWdoLlRo9uGvO
 HtAYK1SRzibTOrFn+ByBhugYzYPyxIx634rM6hyp4nkVEnbgCXQ+Qc1xOwjyW8fh
 gxR3FCxsQiqajg7/1DOOSoMc/rc3adU73RHCWTjcHV/Zo7KEtvVa6AFvMTd1xzt9
 eMPqi7wsPflbdUV9wvf6cKkFPe3Nm3P1hOlUDHGmYZkDw30N8UlZmxvegwrBFDdV
 SpiJ0ZLV90NGJ6k6O3XSd4pVDxMn9DlYd0v/0r+YAT56hiRhefSKR2/jQntutZqp
 YlyZYjvwUjwEgOdUWPPRbndWWEfFsE2XQQclr4L+ZLQ/Gm8jTsT2b/IvXBmF4FzV
 0kzjNdhkPObx3X6UQZ47r6J3x8SWA9qZ6JH+uqCd6w/UW1KIiMBZ2kuIXvJn6eSr
 xFLabjPPeOeRXFpiQJjIZ31m7i3JlQbIsfb8IIxI1D55nEkNywjk9VqlLEVw23qD
 p93l0+ehpnZ2YCjV0kts/EXMikSmVZorA5wkTzEmG5ER+2BuIDin+wuGPawXrKew
 QCa2X7GoVmxf81Rcz7f/E+JnYcSTQ6AQzFkOxe3zb97bnRsckM/87buC0GktcPjW
 C8iy3yZzEIhRj2ilKEZLl7jIK59B4jReUKJx+vsxk2k2p5fuRdMkMtPfIZDBwVHQ
 PzRZGSDY4FI=
 =p3z1
 -----END PGP SIGNATURE-----

Merge tag 'x86-boot-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:

 - Robustify/fix calling startup_{32,64}() from the decompressor code,
   and removing x86 quirk from scripts/head-object-list.txt as a result.

 - Do not register processors that cannot be onlined for x2APIC

* tag 'x86-boot-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC
  scripts/head-object-list: Remove x86 from the list
  x86/boot: Robustify calling startup_{32,64}() from the decompressor code
2023-02-20 18:32:55 -08:00
Linus Torvalds 1f2d9ffc7a Scheduler updates in this cycle are:
- Improve the scalability of the CFS bandwidth unthrottling logic
    with large number of CPUs.
 
  - Fix & rework various cpuidle routines, simplify interaction with
    the generic scheduler code. Add __cpuidle methods as noinstr to
    objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks.
 
  - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS,
    to query previously issued registrations.
 
  - Limit scheduler slice duration to the sysctl_sched_latency period,
    to improve scheduling granularity with a large number of SCHED_IDLE
    tasks.
 
  - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
    but also enable them to prevent a cascade of followup problems and
    repeat warnings.
 
  - Fix the rescheduling logic in prio_changed_dl().
 
  - Micro-optimize cpufreq and sched-util methods.
 
  - Micro-optimize ttwu_runnable()
 
  - Micro-optimize the idle-scanning in update_numa_stats(),
    select_idle_capacity() and steal_cookie_task().
 
  - Update the RSEQ code & self-tests
 
  - Constify various scheduler methods
 
  - Remove unused methods
 
  - Refine __init tags
 
  - Documentation updates
 
  - ... Misc other cleanups, fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzbJwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIvA//ZcEaB8Z6ChLRQjM+bsaudKJu3pdLQbPK
 iYbP8Da+LsAfxbEfYuGV3m+jIp0LlBOtsI/EezxQrXV+V7FvNyAX9Y00eEu/zlj8
 7Jn3LMy/DBYTwH7LwVdcU0MyIVI8ZPc6WNnkx0LOtGZn8n+qfHPSDzcP3CW+a5AV
 UvllPYpYyEmsX0Eby7CF4Ue8mSmbViw/xR3rNr8ZSve0c25XzKabw8O9kE3jiHxP
 d/zERJoAYeDyYUEuZqhfn5dTlB4an4IjNEkAfRE5SQ09RA8Gkxsa5Ar8gob9e9M1
 eQsdd4/bdhnrkM8L5qDZczqmgCTZ2bukQrxkBXhRDhLgoFxwAn77b+2ZjmIW3Lae
 AyGqRcDSg1q2oxaYm5ZiuO/t26aDOZu9vPHyHRDGt95EGbZlrp+GgeePyfCigJYz
 UmPdZAAcHdSymnnnlcvdG37WVvaVkpgWZzd8LbtBi23QR+Zc4WQ2IlgnUS5WKNNf
 VOBcAcP6E1IslDotZDQCc2dPFFQoQQEssVooyUc5oMytm7BsvxXLOeHG+Ncu/8uc
 H+U8Qn8jnqTxJbC5hkWQIJlhVKCq2FJrHxxySYTKROfUNcDgCmxboFeAcXTCIU1K
 T0S+sdoTS/CvtLklRkG0j6B8N4N98mOd9cFwUV3tX+/gMLMep3hCQs5L76JagvC5
 skkQXoONNaM=
 =l1nN
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Improve the scalability of the CFS bandwidth unthrottling logic with
   large number of CPUs.

 - Fix & rework various cpuidle routines, simplify interaction with the
   generic scheduler code. Add __cpuidle methods as noinstr to objtool's
   noinstr detection and fix boatloads of cpuidle bugs & quirks.

 - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
   previously issued registrations.

 - Limit scheduler slice duration to the sysctl_sched_latency period, to
   improve scheduling granularity with a large number of SCHED_IDLE
   tasks.

 - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
   but also enable them to prevent a cascade of followup problems and
   repeat warnings.

 - Fix the rescheduling logic in prio_changed_dl().

 - Micro-optimize cpufreq and sched-util methods.

 - Micro-optimize ttwu_runnable()

 - Micro-optimize the idle-scanning in update_numa_stats(),
   select_idle_capacity() and steal_cookie_task().

 - Update the RSEQ code & self-tests

 - Constify various scheduler methods

 - Remove unused methods

 - Refine __init tags

 - Documentation updates

 - Misc other cleanups, fixes

* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  sched/rt: pick_next_rt_entity(): check list_entry
  sched/deadline: Add more reschedule cases to prio_changed_dl()
  sched/fair: sanitize vruntime of entity being placed
  sched/fair: Remove capacity inversion detection
  sched/fair: unlink misfit task from cpu overutilized
  objtool: mem*() are not uaccess safe
  cpuidle: Fix poll_idle() noinstr annotation
  sched/clock: Make local_clock() noinstr
  sched/clock/x86: Mark sched_clock() noinstr
  x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
  x86/atomics: Always inline arch_atomic64*()
  cpuidle: tracing, preempt: Squash _rcuidle tracing
  cpuidle: tracing: Warn about !rcu_is_watching()
  cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
  cpuidle: drivers: firmware: psci: Dont instrument suspend code
  KVM: selftests: Fix build of rseq test
  exit: Detect and fix irq disabled state in oops
  cpuidle, arm64: Fix the ARM64 cpuidle logic
  cpuidle: mvebu: Fix duplicate flags assignment
  sched/fair: Limit sched slice duration
  ...
2023-02-20 17:41:08 -08:00
Linus Torvalds a2f0e7eee1 The latest perf updates in this cycle are:
- Optimize perf_sample_data layout
  - Prepare sample data handling for BPF integration
  - Update the x86 PMU driver for Intel Meteor Lake
  - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
    discovery breakage
  - Fix the x86 Zhaoxin PMU driver
  - Cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzaHgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jYQg/+KRfobCevMQlZVnz09T3SsJ4ahJ587BL6
 g2C6kobyUNfeChpFVroBkTR+yCb6Mq4xGr2nda9+2E978BYu9eanpx/u/bXNQ6NU
 6YhLwgRrlFXonYn07kFfUJeELZ0W+zpPvymEN1KhTQWcrgXDfXRt2VfMwNsVxGRF
 ZRyCWK+UOzSMU22FtW3I/xVLBB0vio9Y6wRC5QOpDVW5YtGwQGust7GJ53JPK43J
 m2soJvWORauT+v0aqc7ggOtKd6pahVoXrDrbktxtq9N0ZGI+PubVCGevex++cXm/
 B3QSf6VcMMuU6pfzxiEwRa8Whrc3XFeSDEfvMjC5v3becGNkdNBnGOJzYprwgRZJ
 irb6/dSrv5P2lj6WphsO1Wzcm7EoWh8M7DVOMh/13Y/oODRdOrv48112Don9UURC
 EPyvzAzizqdwdDopUmfiqUwuAXqb8uPZqCgmlz/NJkVz1/ijlfrmLgeDuf0vI7Aq
 HznzzRwjFHzyCH7D+rtonFh3JDaqgaouY76tpC5yTtzKbZPlFT8kzeCvqkTMnGgH
 czZnSNc/kBup0HDkNSlthK+TyrMXWKeVa8KQSY1E0NJHO4IBBCMzZywSoAaeofQK
 hqfQyofX9XHmuHhCA4yIfv1XkZGlBTxpPAyDdHjgs9iJTsodSYMs8ESY08eW8DXn
 Ld/35O6SylM=
 =ztUT
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:

 - Optimize perf_sample_data layout

 - Prepare sample data handling for BPF integration

 - Update the x86 PMU driver for Intel Meteor Lake

 - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids)
   discovery breakage

 - Fix the x86 Zhaoxin PMU driver

 - Cleanups

* tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  perf/x86/intel/uncore: Add Meteor Lake support
  x86/perf/zhaoxin: Add stepping check for ZXC
  perf/x86/intel/ds: Fix the conversion from TSC to perf time
  perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
  perf/x86/uncore: Add a quirk for UPI on SPR
  perf/x86/uncore: Ignore broken units in discovery table
  perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
  perf/x86/uncore: Factor out uncore_device_to_die()
  perf/core: Call perf_prepare_sample() before running BPF
  perf/core: Introduce perf_prepare_header()
  perf/core: Do not pass header for sample ID init
  perf/core: Set data->sample_flags in perf_prepare_sample()
  perf/core: Add perf_sample_save_brstack() helper
  perf/core: Add perf_sample_save_raw_data() helper
  perf/core: Add perf_sample_save_callchain() helper
  perf/core: Save the dynamic parts of sample data size
  x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation
  perf/core: Change the layout of perf_sample_data
  perf/x86/msr: Add Meteor Lake support
  perf/x86/cstate: Add Meteor Lake support
  ...
2023-02-20 17:29:55 -08:00
Linus Torvalds 6e649d0856 Updates for this cycle were:
- rwsem micro-optimizations
  - spinlock micro-optimizations
  - cleanups, simplifications
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzZkURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gBvA/6A5RMmSdOIVojmiwUYvZInA/Kpm2wuW0q
 bQWW9maLb96JMpj3FB5Xs5U993WiF0Gt9aGHoND9V2wOYbnv01ElCKKgsw7zLnXb
 c++txpmD+HoUGp94H8T2nA3szPLR7OpPpLmfjTWHKeWQRTStJobTTqi5jVTUZT37
 92MZ2tVzapQJq5VESk0C+0FBFDobh0gTX8hwkEj83ubXK4rC071/gJD4JHZt4nWN
 Up9YGNoNvw+ns7upo2C1XJ4H4ucFoCXT2smH4Oh0gk8Cfs6oP1k5H8J5aJQ+23fT
 EOWzkk0vJdpukNXI1+4G4KMwCO6zv+xVxXpEBizEeTgKWwbJpgBeGrisheAUyMHT
 bwfztsn+NQET11NsccmtRzspscUT42Nc+FUW0KeR2LiBKZhuD6l1Tac3w2HolycA
 2YjMQx3ATOEnMFgv4jGlldlasIAnYj0qitw6wCGqkJSvrC3au/LfcBHn45SxkBWc
 KZV1Oj26aH1hDxYSLyZRmFEvf/46D9CHmv8ReuFbmM6FIYwL+go+Odw/MHMFAbZR
 aP9YR4e94p6WmaNwMqzozP+wN67E4TME2vG6+T1n/szKDogoBlcn/wl053pkcHa/
 CsjELY82/CRRrDgWSnSKUZEFvnnBujyEiSz7pTZCzdBTMc/EcxK5CHOSyN23x+LI
 TvvxFn7KM/o=
 =yTly
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - rwsem micro-optimizations

 - spinlock micro-optimizations

 - cleanups, simplifications

* tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vduse: Remove include of rwlock.h
  locking/lockdep: Remove lockdep_init_map_crosslock.
  x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()
  x86/PAT: Use try_cmpxchg() in set_page_memtype()
  locking/rwsem: Disable preemption in all down_write*() and up_write() code paths
  locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
  locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  locking/qspinlock: Micro-optimize pending state waiting for unlock
2023-02-20 17:18:23 -08:00
Yang Jihong f1c97a1b4e x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
When arch_prepare_optimized_kprobe calculating jump destination address,
it copies original instructions from jmp-optimized kprobe (see
__recover_optprobed_insn), and calculated based on length of original
instruction.

arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when
checking whether jmp-optimized kprobe exists.
As a result, setup_detour_execution may jump to a range that has been
overwritten by jump destination address, resulting in an inval opcode error.

For example, assume that register two kprobes whose addresses are
<func+9> and <func+11> in "func" function.
The original code of "func" function is as follows:

   0xffffffff816cb5e9 <+9>:     push   %r12
   0xffffffff816cb5eb <+11>:    xor    %r12d,%r12d
   0xffffffff816cb5ee <+14>:    test   %rdi,%rdi
   0xffffffff816cb5f1 <+17>:    setne  %r12b
   0xffffffff816cb5f5 <+21>:    push   %rbp

1.Register the kprobe for <func+11>, assume that is kp1, corresponding optimized_kprobe is op1.
  After the optimization, "func" code changes to:

   0xffffffff816cc079 <+9>:     push   %r12
   0xffffffff816cc07b <+11>:    jmp    0xffffffffa0210000
   0xffffffff816cc080 <+16>:    incl   0xf(%rcx)
   0xffffffff816cc083 <+19>:    xchg   %eax,%ebp
   0xffffffff816cc084 <+20>:    (bad)
   0xffffffff816cc085 <+21>:    push   %rbp

Now op1->flags == KPROBE_FLAG_OPTIMATED;

2. Register the kprobe for <func+9>, assume that is kp2, corresponding optimized_kprobe is op2.

register_kprobe(kp2)
  register_aggr_kprobe
    alloc_aggr_kprobe
      __prepare_optimized_kprobe
        arch_prepare_optimized_kprobe
          __recover_optprobed_insn    // copy original bytes from kp1->optinsn.copied_insn,
                                      // jump address = <func+14>

3. disable kp1:

disable_kprobe(kp1)
  __disable_kprobe
    ...
    if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
      ret = disarm_kprobe(orig_p, true)       // add op1 in unoptimizing_list, not unoptimized
      orig_p->flags |= KPROBE_FLAG_DISABLED;  // op1->flags ==  KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED
    ...

4. unregister kp2
__unregister_kprobe_top
  ...
  if (!kprobe_disabled(ap) && !kprobes_all_disarmed) {
    optimize_kprobe(op)
      ...
      if (arch_check_optimized_kprobe(op) < 0) // because op1 has KPROBE_FLAG_DISABLED, here not return
        return;
      p->kp.flags |= KPROBE_FLAG_OPTIMIZED;   //  now op2 has KPROBE_FLAG_OPTIMIZED
  }

"func" code now is:

   0xffffffff816cc079 <+9>:     int3
   0xffffffff816cc07a <+10>:    push   %rsp
   0xffffffff816cc07b <+11>:    jmp    0xffffffffa0210000
   0xffffffff816cc080 <+16>:    incl   0xf(%rcx)
   0xffffffff816cc083 <+19>:    xchg   %eax,%ebp
   0xffffffff816cc084 <+20>:    (bad)
   0xffffffff816cc085 <+21>:    push   %rbp

5. if call "func", int3 handler call setup_detour_execution:

  if (p->flags & KPROBE_FLAG_OPTIMIZED) {
    ...
    regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
    ...
  }

The code for the destination address is

   0xffffffffa021072c:  push   %r12
   0xffffffffa021072e:  xor    %r12d,%r12d
   0xffffffffa0210731:  jmp    0xffffffff816cb5ee <func+14>

However, <func+14> is not a valid start instruction address. As a result, an error occurs.

Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/

Fixes: f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:49:16 +09:00
Yang Jihong 868a6fc0ca x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
Since the following commit:

  commit f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")

modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe
may be in the optimizing or unoptimizing state when op.kp->flags
has KPROBE_FLAG_OPTIMIZED and op->list is not empty.

The __recover_optprobed_insn check logic is incorrect, a kprobe in the
unoptimizing state may be incorrectly determined as unoptimizing.
As a result, incorrect instructions are copied.

The optprobe_queued_unopt function needs to be exported for invoking in
arch directory.

Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/

Fixes: f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:49:16 +09:00
Nuno Das Neves b14033a3e6 x86/hyperv: Fix hv_get/set_register for nested bringup
hv_get_nested_reg only translates SINT0, resulting in the wrong sint
being registered by nested vmbus.

Fix the issue with new utility function hv_is_sint_reg.

While at it, improve clarity of hv_set_non_nested_register and hv_is_synic_reg.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Jinank Jain <jinankjain@linux.microsoft.com>
Link: https://lore.kernel.org/r/1675980172-6851-1-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-02-16 14:32:37 +00:00
Paolo Bonzini 33436335e9 KVM/riscv changes for 6.3
- Fix wrong usage of PGDIR_SIZE to check page sizes
 - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
 - Redirect illegal instruction traps to guest
 - SBI PMU support for guest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmPifFIACgkQrUjsVaLH
 LAcEyxAAinMBaBhiPmwWZQvcCzh/UFmJo8BQCwAPuwoc/a4ZGAR7ylzd0oJilP8M
 wSgX6Ad8XF+CEW2VpxW9nwyi41N25ep1Lrf8vOaWy9L9QNUo0t15WrCIbXT2p399
 HrK9fz7HHKKIMsJy+rYb9EepdmMf55xtr1Y/EjyvhoDQbrEMlKsAODYz/SUoriQG
 Tn3cCYBzLdvzDzu0xXM9v+nsetWXdajK/v4je+mE3NQceXhePAO4oVWP4IpnoROd
 ZQm3evvVdf0WtKG9curxwMB7jjBqDBFrcLYl0qHGa7pi2o5PzVM7esgaV47KwetH
 IgA/Mrf1IfzpgM7VYDDax5wUHlKj63KisqU0J8rU3PUloQXaWqv7+ho51t9GzZ/i
 9x4uyO/evVntgyTw6HCbqmQJDgEtJiG1ydrR/ydBMYHLnh7LPY2UpKgcqmirtbkK
 1/DYDp84vikQ5VW1hc8IACdoBShh9Moh4xsEStzkTrIeHcZCjtORXUh8UIPZ0Mu2
 7Mnkktu9I55SLwA3rwH/EYT1ISrOV1G+q3wfqgeLpn8YUWwCIiqWQ5Ur0/WSMJse
 uJ3HedZDzj9T4n4khX+mKEYh6joAafQZag+4TID2lRSwd0S/mpeC22hYrViMdDmq
 yhE+JNin/sz4AVaHNzGwfqk2NC2RFl9aRn2X0xTwyBubif9pKMQ=
 =spUL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.3

- Fix wrong usage of PGDIR_SIZE to check page sizes
- Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
2023-02-15 12:33:28 -05:00
Paolo Bonzini 27b025ebb0 KVM VMX changes for 6.3:
- Handle NMI VM-Exits before leaving the noinstr region
 
  - A few trivial cleanups in the VM-Enter flows
 
  - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
    EPTP switching (or any other VM function) for L1
 
  - Fix a crash when using eVMCS's enlighted MSR bitmaps
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmPsL4USHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5fvQP/jOhCos/8VrNPJ5vtVkELSgbefbLFPbR
 ThCefVekbnbqkJ4PdxLFOw/nzK+NREnoet02lvLXIeBt243qMeDWbNuGncZ+rqqX
 ohJ6ALIS7ag2egG2ZllM4KTV1jWnm7bG6C8QsJwrFuG7wHe69lPYNnv34OD4l8QO
 PR7reXBl7iyqQehF0rCqkYvtpv1QI+VlVd3ODCg/0vaZX4AOPZI6X1hX2+myUfTy
 LI1oQ88SQ0ptoH5o7xBtNQw7Zknm/SP82Djzpg/1Wpr3cZv4g/1vblNcxpU/D+3I
 Cp8xM42c82GIX1Zk6uYfsoCyQZ7Bd38llbgcLM5Fi0F2IuKJKoN1iuH77RJ1Juqf
 v+SkRmFAr5Uf3UYVp4ck4FxTRH58ooDPeIv2jdaguyE9NgRfbVsghn6QDC+gkZMx
 z4xdwCBXgsAAkhJwOLqDy98ZrQZoy1kxlmG2jLIe5RvOzI4WQDUJkJAd+u0dBD/W
 U7PRqbqfJFtvvKN784BTa4CO8m+KwaNz4Xrwcg4YtBnKBgDU/cxFUZnnSEHdi9Pr
 tVf462dfReK3slK3rkPw5HlacFdQsgfX6+VX1eeFKY+gE88TXoZks/X7ma24MXEw
 hOu6VZMat1NjkfcgUcMffBVZlejK7C6oXTeVmqVKsx6hUs+zzqxpQ05AT4TMbAUI
 gPTHJyy9ByHO
 =WXGG
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-vmx-6.3' of https://github.com/kvm-x86/linux into HEAD

KVM VMX changes for 6.3:

 - Handle NMI VM-Exits before leaving the noinstr region

 - A few trivial cleanups in the VM-Enter flows

 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1

 - Fix a crash when using eVMCS's enlighted MSR bitmaps
2023-02-15 12:23:19 -05:00
Rafael J. Wysocki 7e71a13353 Merge branches 'pm-cpuidle', 'pm-core' and 'pm-sleep'
Merge cpuidle updates, PM core updates and changes related to system
sleep handling for 6.3-rc1:

 - Make the TEO cpuidle governor check CPU utilization in order to refine
   idle state selection (Kajetan Puchalski).

 - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
   cpuidle driver is selected and replace a default_idle() call in that
   driver with arch_cpu_idle() which allows MWAIT to be used (Li
   RongQing).

 - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
   Bityutskiy).

 - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
   avoid randconfig build failures (Arnd Bergmann).

 - Make kobj_type structures used in the cpuidle sysfs interface
   constant (Thomas Weißschuh).

 - Make the cpuidle driver registration code update microsecond values
   of idle state parameters in accordance with their nanosecond values
   if they are provided (Rafael Wysocki).

 - Make the PSCI cpuidle driver prevent topology CPUs from being
   suspended on PREEMPT_RT (Krzysztof Kozlowski).

 - Document that pm_runtime_force_suspend() cannot be used with
   DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald).

 - Add EXPORT macros for exporting PM functions from drivers (Richard
   Fitzgerald).

 - Drop "select SRCU" from system sleep Kconfig (Paul E. McKenney).

 - Remove /** from non-kernel-doc comments in hibernation code (Randy
   Dunlap).

* pm-cpuidle:
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  cpuidle: driver: Update microsecond values of state parameters as needed
  cpuidle: sysfs: make kobj_type structures constant
  cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
  intel_idle: add Emerald Rapids Xeon support
  cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle()
  cpuidle-haltpoll: select haltpoll governor
  cpuidle: teo: Introduce util-awareness
  cpuidle: teo: Optionally skip polling states in teo_find_shallower_state()

* pm-core:
  PM: Add EXPORT macros for exporting PM functions
  PM: runtime: Document that force_suspend() is incompatible with SMART_SUSPEND

* pm-sleep:
  PM: sleep: Remove "select SRCU"
  PM: hibernate: swap: don't use /** for non-kernel-doc comments
2023-02-15 15:59:48 +01:00
Srivatsa S. Bhat (VMware) fcb3a81d22 x86/hotplug: Remove incorrect comment about mwait_play_dead()
The comment that says mwait_play_dead() returns only on failure is a bit
misleading because mwait_play_dead() could actually return for valid
reasons (such as mwait not being supported by the platform) that do not
indicate a failure of the CPU offline operation. So, remove the comment.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230128003751.141317-1-srivatsa@csail.mit.edu
2023-02-14 23:44:34 +01:00
Linus Torvalds 82eac0c830 Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
 transitions out of C0 state, the other thread gets access to twice as many
 entries in the RSB, but unfortunately the predictions of the now-halted
 logical processor are not purged.  Therefore, the executing processor
 could speculatively execute from locations that the now-halted processor
 had trained the RSB on.
 
 The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
 when context switching to the idle thread. However, KVM allows a VMM to
 prevent exiting guest mode when transitioning out of C0 using the
 KVM_CAP_X86_DISABLE_EXITS capability can be used by a VMM to change this
 behavior. To mitigate the cross-thread return address predictions bug,
 a VMM must not be allowed to override the default behavior to intercept
 C0 transitions.
 
 These patches introduce a KVM module parameter that, if set, will prevent
 the user from disabling the HLT, MWAIT and CSTATE exits.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmPrvAAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMQAgf8CirL+yrngzQ+39UTFQgXj3IS1UyR
 o2mF39w3bVlQhLNf5MBHF965FUeOV/8A16x73hJGjhiiuisphtQWS/6xKR6uYbq7
 0Qi821skqN6XpRsWTWqFHMsdY+n0skr8QeXG4k/GJu7Ghb3tqs4eTGgnf2WBfI8/
 K1UgTmjd9+ikM5gKZoVLpcqZnti0gx3lM+cvZGdfrIUaXB+i+hNd2NfRTiGsTOiK
 fX7vZtLvOeje2TPoKLhzekTbh8kTU07HRWID9aVXT8bLy6Zd6tg2CHlv11noKpwv
 DFVV+RsJ1SiAQYSwT+4IvWfIG4oq4onBQ972g2a27pP2cxF+38GXzt4NQw==
 =xscg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Certain AMD processors are vulnerable to a cross-thread return address
  predictions bug. When running in SMT mode and one of the sibling
  threads transitions out of C0 state, the other thread gets access to
  twice as many entries in the RSB, but unfortunately the predictions of
  the now-halted logical processor are not purged. Therefore, the
  executing processor could speculatively execute from locations that
  the now-halted processor had trained the RSB on.

  The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
  when context switching to the idle thread. However, KVM allows a VMM
  to prevent exiting guest mode when transitioning out of C0 using the
  KVM_CAP_X86_DISABLE_EXITS capability can be used by a VMM to change
  this behavior. To mitigate the cross-thread return address predictions
  bug, a VMM must not be allowed to override the default behavior to
  intercept C0 transitions.

  These patches introduce a KVM module parameter that, if set, will
  prevent the user from disabling the HLT, MWAIT and CSTATE exits"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions
  KVM: x86: Mitigate the cross-thread return address predictions bug
  x86/speculation: Identify processors vulnerable to SMT RSB predictions
2023-02-14 09:17:01 -08:00
Johan Hovold bc1bc1b309 x86/ioapic: Use irq_domain_create_hierarchy()
Use the irq_domain_create_hierarchy() helper to create the hierarchical
domain, which both serves as documentation and avoids poking at
irqdomain internals.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-13-johan+linaro@kernel.org
2023-02-13 19:31:24 +00:00
Thomas Gleixner ab407a1919 Clocksource watchdog commits for v6.3
This pull request contains the following:
 
 o	Improvements to clocksource-watchdog console messages.
 
 o	Loosening of the clocksource-watchdog skew criteria to match
 	those of NTP (500 parts per million, relaxed from 400 parts
 	per million).  If it is good enough for NTP, it is good enough
 	for the clocksource watchdog.
 
 o	Suspend clocksource-watchdog checking temporarily when high
 	memory latencies are detected.	This avoids the false-positive
 	clock-skew events that have been seen on production systems
 	running memory-intensive workloads.
 
 o	On systems where the TSC is deemed trustworthy, use it as the
 	watchdog timesource, but only when specifically requested using
 	the tsc=watchdog kernel boot parameter.  This permits clock-skew
 	events to be detected, but avoids forcing workloads to use the
 	slow HPET and ACPI PM timers.  These last two timers are slow
 	enough to cause systems to be needlessly marked bad on the one
 	hand, and real skew does sometimes happen on production systems
 	running production workloads on the other.  And sometimes it is
 	the fault of the TSC, or at least of the firmware that told the
 	kernel to program the TSC with the wrong frequency.
 
 o	Add a tsc=revalidate kernel boot parameter to allow the kernel
 	to diagnose cases where the TSC hardware works fine, but was told
 	by firmware to tick at the wrong frequency.  Such cases are rare,
 	but they really have happened on production systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmPhnhkTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jClDD/9gTo62MakVQz2wzBRBcWunzX4BAfy2
 2ORqZYqq8cJ4ccFVWtSq7gZ+0bxiT+J4jaVyJpmUPzaiCSfNUT+GLjWyLGzF9Xq+
 xLWpFJOhFhKYjYN2m1ottuQ81V7aTlorC8AJt/o+oCJFGUCb/heg/UrmoZ6DweHw
 H7uXS9yenKdKgYoMENW+8IVsy16sT4D5Fe8XAD/2J6vBBUbgBzKWhi8XSgSHB/Xw
 GCP4UfXVGl5QRG9Xu4ZgrFV1t4azxtmdBghFm7/Kep/j6ttSY78yoS43AbI57bhD
 fWB5mfAQvO+Zo5/9rLjcDzeZCp/PSdARD41aycPMiei08K278tIN9T/fmfSoG6rV
 lVRdFxTHrQcqc9d+g+mGASQBezCF8pxonm9HYLBpNjyfYHnKV70SPXywO4oqAJ1I
 7dCm+uv3Y8KaJdVnPUWOHJjvQLx9NWK5/pXBYjsYnLR+69EVmGDgPZ+/ulQxkWBj
 DtrQgs+sHQ8gngNpAilxuu/lrUXzrC8N4mtxXKBFQoCPYQMFBkr9S+aAEHIgZT9H
 1dWwR1QxeR5uxt7U+3DmTyJ1XKfYjDyyScesILlLMLbdKgZtTS5wGaK4QdJ3QW2z
 z4zqPDccWDDZKZy9W4QBnFBx6Rn49C8xThy7f6Loc+2cKAT10hrEmRJsn79AOCDc
 6hV0S2U9a6ypQg==
 =OWY2
 -----END PGP SIGNATURE-----

Merge tag 'clocksource.2023.02.06b' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into timers/core

Pull clocksource watchdog changes from Paul McKenney:

     o	Improvements to clocksource-watchdog console messages.

     o	Loosening of the clocksource-watchdog skew criteria to match
     	those of NTP (500 parts per million, relaxed from 400 parts
     	per million).  If it is good enough for NTP, it is good enough
     	for the clocksource watchdog.

     o	Suspend clocksource-watchdog checking temporarily when high
     	memory latencies are detected.	This avoids the false-positive
     	clock-skew events that have been seen on production systems
     	running memory-intensive workloads.

     o	On systems where the TSC is deemed trustworthy, use it as the
     	watchdog timesource, but only when specifically requested using
     	the tsc=watchdog kernel boot parameter.  This permits clock-skew
     	events to be detected, but avoids forcing workloads to use the
     	slow HPET and ACPI PM timers.  These last two timers are slow
     	enough to cause systems to be needlessly marked bad on the one
     	hand, and real skew does sometimes happen on production systems
     	running production workloads on the other.  And sometimes it is
     	the fault of the TSC, or at least of the firmware that told the
     	kernel to program the TSC with the wrong frequency.

     o	Add a tsc=revalidate kernel boot parameter to allow the kernel
     	to diagnose cases where the TSC hardware works fine, but was told
     	by firmware to tick at the wrong frequency.  Such cases are rare,
     	but they really have happened on production systems.

Link: https://lore.kernel.org/r/20230210193640.GA3325193@paulmck-ThinkPad-P17-Gen-1
2023-02-13 19:28:48 +01:00
Josh Poimboeuf ffb1b4a410 x86/unwind/orc: Add 'signal' field to ORC metadata
Add a 'signal' field which allows unwind hints to specify whether the
instruction pointer should be taken literally (like for most interrupts
and exceptions) rather than decremented (like for call stack return
addresses) when used to find the next ORC entry.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org
2023-02-11 12:37:51 +01:00
Borislav Petkov (AMD) 6b8d5dde5b x86/tsc: Do feature check as the very first thing
Do the feature check as the very first thing in the function. Everything
else comes after that and is meaningless work if the TSC CPUID bit is
not even set. Switch to cpu_feature_enabled() too, while at it.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y5990CUCuWd5jfBH@zn.tnic
2023-02-11 10:44:07 +01:00
Borislav Petkov (AMD) 8fe6d84947 x86/tsc: Make recalibrate_cpu_khz() export GPL only
A quick search doesn't reveal any use outside of the kernel - which
would be questionable to begin with anyway - so make the export GPL
only.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y599miBzWRAuOwhg@zn.tnic
2023-02-11 10:44:07 +01:00
Borislav Petkov (AMD) 851026a2bf x86/cacheinfo: Remove unused trace variable
15cd8812ab ("x86: Remove the CPU cache size printk's") removed the
last use of the trace local var. Remove it too and the useless trace
cache case.

No functional changes.

Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230210234541.9694-1-bp@alien8.de
Link: http://lore.kernel.org/r/20220705073349.1512-1-jiapeng.chong@linux.alibaba.com
2023-02-11 10:43:31 +01:00
Tom Lendacky be8de49bea x86/speculation: Identify processors vulnerable to SMT RSB predictions
Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
transitions out of C0 state, the other sibling thread could use return
target predictions from the sibling thread that transitioned out of C0.

The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM to
prevent exiting guest mode when transitioning out of C0. A guest could
act maliciously in this situation, so create a new x86 BUG that can be
used to detect if the processor is vulnerable.

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <91cec885656ca1fcd4f0185ce403a53dd9edecb7.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-10 06:43:03 -05:00
Suren Baghdasaryan 1c71222e5f mm: replace vma->vm_flags direct modifications with modifier calls
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09 16:51:39 -08:00
Ard Biesheuvel 93be2859e2 efi: x86: Wire up IBT annotation in memory attributes table
UEFI v2.10 extends the EFI memory attributes table with a flag that
indicates whether or not all RuntimeServicesCode regions were
constructed with ENDBR landing pads, permitting the OS to map these
regions with IBT restrictions enabled.

So let's take this into account on x86 as well.

Suggested-by: Peter Zijlstra <peterz@infradead.org> # ibt_save() changes
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2023-02-09 19:30:54 +01:00
Nadav Amit ae052e3ae0 x86/kprobes: Fix 1 byte conditional jump target
Commit 3bc753c06d ("kbuild: treat char as always unsigned") broke
kprobes.  Setting a probe-point on 1 byte conditional jump can cause the
kernel to crash when the (signed) relative jump offset gets treated as
unsigned.

Fix by replacing the unsigned 'immediate.bytes' (plus a cast) with the
signed 'immediate.value' when assigning to the relative jump offset.

[ dhansen: clarified changelog ]

Fixes: 3bc753c06d ("kbuild: treat char as always unsigned")
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230208071708.4048-1-namit%40vmware.com
2023-02-08 12:03:27 -08:00
Paul E. McKenney 0051293c53 clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
Unconditionally enabling TSC watchdog checking of the HPET and PMTMR
clocksources can degrade latency and performance.  Therefore, provide
a new "watchdog" option to the tsc= boot parameter that opts into such
checking.  Note that tsc=watchdog is overridden by a tsc=nowatchdog
regardless of their relative positions in the list of boot parameters.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
2023-02-06 16:38:30 -08:00
Sebastian Andrzej Siewior 717cce3bdc x86/cpu: Provide the full setup for getcpu() on x86-32
setup_getcpu() configures two things:

  - it writes the current CPU & node information into MSR_TSC_AUX
  - it writes the same information as a GDT entry.

By using the "full" setup_getcpu() on i386 it is possible to read the CPU
information in userland via RDTSCP() or via LSL from the GDT.

Provide an GDT_ENTRY_CPUNODE for x86-32 and make the setup function
unconditionally available.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Link: https://lore.kernel.org/r/20221125094216.3663444-2-bigeasy@linutronix.de
2023-02-06 15:48:54 +01:00
Borislav Petkov (AMD) f33e0c893b x86/microcode/core: Return an error only when necessary
Return an error from the late loading function which is run on each CPU
only when an error has actually been encountered during the update.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130161709.11615-5-bp@alien8.de
2023-02-06 13:41:31 +01:00
Borislav Petkov (AMD) 7ff6edf4fe x86/microcode/AMD: Fix mixed steppings support
The AMD side of the loader has always claimed to support mixed
steppings. But somewhere along the way, it broke that by assuming that
the cached patch blob is a single one instead of it being one per
*node*.

So turn it into a per-node one so that each node can stash the blob
relevant for it.

  [ NB: Fixes tag is not really the exactly correct one but it is good
    enough. ]

Fixes: fe055896c0 ("x86/microcode: Merge the early microcode loader")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org> # 2355370cd9 ("x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter")
Cc: <stable@kernel.org> # a5ad92134b ("x86/microcode/AMD: Add a @cpu parameter to the reloading functions")
Link: https://lore.kernel.org/r/20230130161709.11615-4-bp@alien8.de
2023-02-06 13:40:16 +01:00
Borislav Petkov (AMD) a5ad92134b x86/microcode/AMD: Add a @cpu parameter to the reloading functions
Will be used in a subsequent change.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130161709.11615-3-bp@alien8.de
2023-02-06 12:14:20 +01:00
Borislav Petkov (AMD) 2355370cd9 x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter
It is always the BSP.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130161709.11615-2-bp@alien8.de
2023-02-06 11:13:04 +01:00
Song Liu 0c05e7bd2d livepatch,x86: Clear relocation targets on a module removal
Josh reported a bug:

  When the object to be patched is a module, and that module is
  rmmod'ed and reloaded, it fails to load with:

  module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
  livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
  livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

  The livepatch module has a relocation which references a symbol
  in the _previous_ loading of nfsd. When apply_relocate_add()
  tries to replace the old relocation with a new one, it sees that
  the previous one is nonzero and it errors out.

He also proposed three different solutions. We could remove the error
check in apply_relocate_add() introduced by commit eda9cec4c9
("x86/module: Detect and skip invalid relocations"). However the check
is useful for detecting corrupted modules.

We could also deny the patched modules to be removed. If it proved to be
a major drawback for users, we could still implement a different
approach. The solution would also complicate the existing code a lot.

We thus decided to reverse the relocation patching (clear all relocation
targets on x86_64). The solution is not
universal and is too much arch-specific, but it may prove to be simpler
in the end.

Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Originally-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230125185401.279042-2-song@kernel.org
2023-02-03 11:28:22 +01:00
Song Liu bbb93362a4 x86/module: remove unused code in __apply_relocate_add
This "#if 0" block has been untouched for many years. Remove it to clean
up the code.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230125185401.279042-1-song@kernel.org
2023-02-03 11:27:23 +01:00
Paul E. McKenney efc8b329c7 clocksource: Verify HPET and PMTMR when TSC unverified
On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC,
NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the
TSC is disabled.  This works well much of the time, but there is the
occasional production-level system that meets all of these criteria, but
which still has a TSC that skews significantly from atomic-clock time.
This is usually attributed to a firmware or hardware fault.  Yes, the
various NTP daemons do express their opinions of userspace-to-atomic-clock
time skew, but they put them in various places, depending on the daemon
and distro in question.  It would therefore be good for the kernel to
have some clue that there is a problem.

The old behavior of marking the TSC unstable is a non-starter because a
great many workloads simply cannot tolerate the overheads and latencies
of the various non-TSC clocksources.  In addition, NTP-corrected systems
sometimes can tolerate significant kernel-space time skew as long as
the userspace time sources are within epsilon of atomic-clock time.

Therefore, when watchdog verification of TSC is disabled, enable it for
HPET and PMTMR (AKA ACPI PM timer).  This provides the needed in-kernel
time-skew diagnostic without degrading the system's performance.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Waiman Long <longman@redhat.com>
Cc: <x86@kernel.org>
Tested-by: Feng Tang <feng.tang@intel.com>
2023-02-02 14:23:02 -08:00
Feng Tang a7ec817d55 x86/tsc: Add option to force frequency recalibration with HW timer
The kernel assumes that the TSC frequency which is provided by the
hardware / firmware via MSRs or CPUID(0x15) is correct after applying
a few basic consistency checks. This disables the TSC recalibration
against HPET or PM timer.

As a result there is no mechanism to validate that frequency in cases
where a firmware or hardware defect is suspected. And there was case
that some user used atomic clock to measure the TSC frequency and
reported an inaccuracy issue, which was later fixed in firmware.

Add an option 'recalibrate' for 'tsc' kernel parameter to force the
tsc freq recalibration with HPET or PM timer, and warn if the
deviation from previous value is more than about 500 PPM, which
provides a way to verify the data from hardware / firmware.

There is no functional change to existing work flow.

Recently there was a real-world case: "The 40ms/s divergence between
TSC and HPET was observed on hardware that is quite recent" [1], on
that platform the TSC frequence 1896 MHz was got from CPUID(0x15),
and the force-reclibration with HPET/PMTIMER both calibrated out
value of 1975 MHz, which also matched with check from software
'chronyd', indicating it's a problem of BIOS or firmware.

[Thanks tglx for helping improving the commit log]
[ paulmck: Wordsmith Kconfig help text. ]

[1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: <x86@kernel.org>
Cc: <linux-doc@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-02-02 14:22:52 -08:00
Alexey Kardashevskiy 7914695743 x86/amd: Cache debug register values in percpu variables
Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to
be noticeable with the AMD KVM SEV-ES DebugSwap feature enabled.  KVM is
going to store host's DR[0-3] and DR[0-3]_ADDR_MASK before switching to
a guest; the hardware is going to swap these on VMRUN and VMEXIT.

Store MSR values passed to set_dr_addr_mask() in percpu variables
(when changed) and return them via new amd_get_dr_addr_mask().
The gain here is about 10x.

As set_dr_addr_mask() uses the array too, change the @dr type to
unsigned to avoid checking for <0. And give it the amd_ prefix to match
the new helper as the whole DR_ADDR_MASK feature is AMD-specific anyway.

While at it, replace deprecated boot_cpu_has() with cpu_feature_enabled()
in set_dr_addr_mask().

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230120031047.628097-2-aik@amd.com
2023-01-31 20:09:26 +01:00
Ashok Raj 25d0dc4b95 x86/microcode: Allow only "1" as a late reload trigger value
Microcode gets reloaded late only if "1" is written to the reload file.
However, the code silently treats any other unsigned integer as a
successful write even though no actions are performed to load microcode.

Make the loader more strict to accept only "1" as a trigger value and
return an error otherwise.

  [ bp: Massage commit message. ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130213955.6046-3-ashok.raj@intel.com
2023-01-31 16:47:03 +01:00
Peter Zijlstra 923510c88d x86/static_call: Add support for Jcc tail-calls
Clang likes to create conditional tail calls like:

  0000000000000350 <amd_pmu_add_event>:
  350:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1) 351: R_X86_64_NONE      __fentry__-0x4
  355:       48 83 bf 20 01 00 00 00         cmpq   $0x0,0x120(%rdi)
  35d:       0f 85 00 00 00 00       jne    363 <amd_pmu_add_event+0x13>     35f: R_X86_64_PLT32     __SCT__amd_pmu_branch_add-0x4
  363:       e9 00 00 00 00          jmp    368 <amd_pmu_add_event+0x18>     364: R_X86_64_PLT32     __x86_return_thunk-0x4

Where 0x35d is a static call site that's turned into a conditional
tail-call using the Jcc class of instructions.

Teach the in-line static call text patching about this.

Notably, since there is no conditional-ret, in that case patch the Jcc
to point at an empty stub function that does the ret -- or the return
thunk when needed.

Reported-by: "Erhard F." <erhard_f@mailbox.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/Y9Kdg9QjHkr9G5b5@hirez.programming.kicks-ass.net
2023-01-31 15:05:31 +01:00
Peter Zijlstra ac0ee0a956 x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
In order to re-write Jcc.d32 instructions text_poke_bp() needs to be
taught about them.

The biggest hurdle is that the whole machinery is currently made for 5
byte instructions and extending this would grow struct text_poke_loc
which is currently a nice 16 bytes and used in an array.

However, since text_poke_loc contains a full copy of the (s32)
displacement, it is possible to map the Jcc.d32 2 byte opcodes to
Jcc.d8 1 byte opcode for the int3 emulation.

This then leaves the replacement bytes; fudge that by only storing the
last 5 bytes and adding the rule that 'length == 6' instruction will
be prefixed with a 0x0f byte.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.115718513@infradead.org
2023-01-31 15:05:31 +01:00
Peter Zijlstra db7adcfd1c x86/alternatives: Introduce int3_emulate_jcc()
Move the kprobe Jcc emulation into int3_emulate_jcc() so it can be
used by more code -- specifically static_call() will need this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.057678245@infradead.org
2023-01-31 15:05:30 +01:00
Peter Zijlstra 8739c68115 sched/clock/x86: Mark sched_clock() noinstr
In order to use sched_clock() from noinstr code, mark it and all it's
implenentations noinstr.

The whole pvclock thing (used by KVM/Xen) is a bit of a pain,
since it calls out to watchdogs, create a
pvclock_clocksource_read_nowd() variant doesn't do that and can be
noinstr.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230126151323.702003578@infradead.org
2023-01-31 15:01:47 +01:00
Uros Bizjak 5c9da9fe82 x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
Improve atomic update of last_value in pvclock_clocksource_read:

- Atomic update can be skipped if the "last_value" is already
  equal to "ret".

- The detection of atomic update failure is not correct. The value,
  returned by atomic64_cmpxchg should be compared to the old value
  from the location to be updated. If these two are the same, then
  atomic update succeeded and "last_value" location is updated to
  "ret" in an atomic way. Otherwise, the atomic update failed and
  it should be retried with the value from "last_value" - exactly
  what atomic64_try_cmpxchg does in a correct and more optimal way.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20230118202330.3740-1-ubizjak@gmail.com
Link: https://lore.kernel.org/r/20230126151323.643408110@infradead.org
2023-01-31 15:01:46 +01:00
Ingo Molnar 57a30218fa Linux 6.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
 E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
 nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
 TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
 s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
 ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
 SDc/Y/c=
 =zpaD
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc6' into sched/core, to pick up fixes

Pick up fixes before merging another batch of cpuidle updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-31 15:01:20 +01:00
Linus Torvalds bc6bc34b10 - Start checking for -mindirect-branch-cs-prefix clang support too now that LLVM
16 will support it
 
 - Fix a NULL ptr deref when suspending with Xen PV
 
 - Have a SEV-SNP guest check explicitly for features enabled by the hypervisor
   and fail gracefully if some are unsupported by the guest instead of failing in
   a non-obvious and hard-to-debug way
 
 - Fix a MSI descriptor leakage under Xen
 
 - Mark Xen's MSI domain as supporting MSI-X
 
 - Prevent legacy PIC interrupts from being resent in software by marking them
   level triggered, as they should be, which lead to a NULL ptr deref
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPWdaAACgkQEsHwGGHe
 VUrHUBAAvRh7cDVKvr2wPNJ+9RFjPFubcLq/+4yTe4Bu14wnYn8+gb2S7P2bPJWl
 TxbZhJGV2IeYyB+aJybjGwX96AzE5lWD6epQLRLI/BOVmqLA8Eyw9te0NEQDd9Wq
 hgY1/hhUmLdpXubq09YjIht5nlxVZ+JFfppouTR7LVtZl7MFvQbAOU8rWzpcbm7l
 UlA4GLakFeOYpbI5g+zzsZ1a1M0cSz8wQ43plD5aAtNy2wKbWiA4QDXJo9J7+ZQg
 1FmOHZ3ChPjEhf9k/N2uAZ6RUHVEDIJ7hDfsM3j/qsKGzCV4cho2Ify+0PFWo4pt
 FO2bRh1yDFqdz4m6ulhnmuMnoRnEuswwrTzrG+HYu1ntpUt72dULmmRBpJ0s6C7W
 BHpCThNWIlcfHbLNY+VAOJ2hiOqfU/8ld+8R0sn7xmomjgCRYHh9kDGOeuvh1hJl
 jfITDuL2gjcj3Ph6+xh6KvOga41ff3EcxkfvqolZ/emRllPiDWsKYXVgUIl22FHt
 23xH7gutPCp27MdoXM1EaWs5s/PQfctvm4LrW6XS8IjWURLo4RrkU6y7YD5VKVy8
 KCKcpF+JMdE1Ao1WpMFfjDG4ObyMYyqnD790nQVM1e5kIhOpeWaa7WzkGFRyIo6Q
 5BU3GGDyMT8SHy7NFhQL62YJe0P2ZctNIDjiTQlYDnWWhjmvk8I=
 =4QMN
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Start checking for -mindirect-branch-cs-prefix clang support too now
   that LLVM 16 will support it

 - Fix a NULL ptr deref when suspending with Xen PV

 - Have a SEV-SNP guest check explicitly for features enabled by the
   hypervisor and fail gracefully if some are unsupported by the guest
   instead of failing in a non-obvious and hard-to-debug way

 - Fix a MSI descriptor leakage under Xen

 - Mark Xen's MSI domain as supporting MSI-X

 - Prevent legacy PIC interrupts from being resent in software by
   marking them level triggered, as they should be, which lead to a NULL
   ptr deref

* tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
  acpi: Fix suspend with Xen PV
  x86/sev: Add SEV-SNP guest feature negotiation support
  x86/pci/xen: Fixup fallout from the PCI/MSI overhaul
  x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain
  x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
2023-01-29 11:17:34 -08:00
Kirill A. Shutemov 0da908c291 x86/tdx: Add more registers to struct tdx_hypercall_args
struct tdx_hypercall_args is used to pass down hypercall arguments to
__tdx_hypercall() assembly routine.

Currently __tdx_hypercall() handles up to 6 arguments. In preparation to
changes in __tdx_hypercall(), expand the structure to 6 more registers
and generate asm offsets for them.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-3-kirill.shutemov%40linux.intel.com
2023-01-27 09:42:09 -08:00
Uros Bizjak 890a0794b3 x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock()
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
__acpi_{acquire,release}_global_lock().  x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after CMPXCHG
(and related MOV instruction in front of CMPXCHG).

Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when CMPXCHG
fails. There is no need to re-read the value in the loop.

Note that the value from *ptr should be read using READ_ONCE() to prevent
the compiler from merging, refetching or reordering the read.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230116162522.4072-1-ubizjak@gmail.com
2023-01-26 11:49:40 +01:00
Borislav Petkov (AMD) 793207bad7 x86/resctrl: Fix a silly -Wunused-but-set-variable warning
clang correctly complains

  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1456:6: warning: variable \
     'h' set but not used [-Wunused-but-set-variable]
          u32 h;
              ^

but it can't know whether this use is innocuous or really a problem.
There's a reason why those warning switches are behind a W=1 and not
enabled by default - yes, one needs to do:

  make W=1 CC=clang HOSTCC=clang arch/x86/kernel/cpu/resctrl/

with clang 14 in order to trigger it.

I would normally not take a silly fix like that but this one is simple
and doesn't make the code uglier so...

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/202301242015.kbzkVteJ-lkp@intel.com
2023-01-26 11:15:20 +01:00
Kim Phillips e7862eda30 x86/cpu: Support AMD Automatic IBRS
The AMD Zen4 core supports a new feature called Automatic IBRS.

It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS,
h/w manages its IBRS mitigation resources automatically across CPL transitions.

The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by
setting MSR C000_0080 (EFER) bit 21.

Enable Automatic IBRS by default if the CPU feature is present.  It typically
provides greater performance over the incumbent generic retpolines mitigation.

Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum.  AMD Automatic IBRS and
Intel Enhanced IBRS have similar enablement.  Add NO_EIBRS_PBRSB to
cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS.

The kernel command line option spectre_v2=eibrs is used to select AMD Automatic
IBRS, if available.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
2023-01-25 17:16:01 +01:00
Kim Phillips 5b909d4ae5 x86/cpu, kvm: Add the Null Selector Clears Base feature
The Null Selector Clears Base feature was being open-coded for KVM.
Add it to its newly added native CPUID leaf 0x80000021 EAX proper.

Also drop the bit description comments now it's more self-describing.

  [ bp: Convert test in check_null_seg_clears_base() too. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-6-kim.phillips@amd.com
2023-01-25 16:25:46 +01:00
Kim Phillips 84168ae786 x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf
The LFENCE always serializing feature bit was defined as scattered
LFENCE_RDTSC and its native leaf bit position open-coded for KVM.  Add
it to its newly added CPUID leaf 0x80000021 EAX proper.  With
LFENCE_RDTSC in its proper place, the kernel's set_cpu_cap() will
effectively synthesize the feature for KVM going forward.

Also, DE_CFG[1] doesn't need to be set on such CPUs anymore.

  [ bp: Massage and merge diff from Sean. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-5-kim.phillips@amd.com
2023-01-25 13:06:13 +01:00
Jens Axboe cb3ea4b767 x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads
We don't set it on PF_KTHREAD threads as they never return to userspace,
and PF_IO_WORKER threads are identical in that regard. As they keep
running in the kernel until they die, skip setting the FPU flag on them.

More of a cosmetic thing that was found while debugging and
issue and pondering why the FPU flag is set on these threads.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/560c844c-f128-555b-40c6-31baff27537f@kernel.dk
2023-01-25 12:35:15 +01:00
Brian Gerst 4c382d723e x86/vdso: Move VDSO image init to vdso2c generated code
Generate an init function for each VDSO image, replacing init_vdso() and
sysenter_setup().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230124184019.26850-1-brgerst@gmail.com
2023-01-25 12:33:40 +01:00
Kim Phillips 8415a74852 x86/cpu, kvm: Add support for CPUID_80000021_EAX
Add support for CPUID leaf 80000021, EAX. The majority of the features will be
used in the kernel and thus a separate leaf is appropriate.

Include KVM's reverse_cpuid entry because features are used by VM guests, too.

  [ bp: Massage commit message. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-2-kim.phillips@amd.com
2023-01-25 12:33:06 +01:00
Sean Christopherson 54a3b70a75 x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too
Use a dedicated entry for invoking the NMI handler from KVM VMX's VM-Exit
path for 32-bit even though using a dedicated entry for 32-bit isn't
strictly necessary.  Exposing a single symbol will allow KVM to reference
the entry point in assembly code without having to resort to more #ifdefs
(or #defines).  identry.h is intended to be included from asm files only
once, and so simply including idtentry.h in KVM assembly isn't an option.

Bypassing the ESP fixup and CR3 switching in the standard NMI entry code
is safe as KVM always handles NMIs that occur in the guest on a kernel
stack, with a kernel CR3.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:36:40 -08:00
Sean Christopherson a2b07fa7b9 x86/reboot: Disable SVM, not just VMX, when stopping CPUs
Disable SVM and more importantly force GIF=1 when halting a CPU or
rebooting the machine.  Similar to VMX, SVM allows software to block
INITs via CLGI, and thus can be problematic for a crash/reboot.  The
window for failure is smaller with SVM as INIT is only blocked while
GIF=0, i.e. between CLGI and STGI, but the window does exist.

Fixes: fba4f472b3 ("x86/reboot: Turn off KVM when halting a CPU")
Cc: stable@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:22 -08:00
Sean Christopherson d81f952aa6 x86/reboot: Disable virtualization in an emergency if SVM is supported
Disable SVM on all CPUs via NMI shootdown during an emergency reboot.
Like VMX, SVM can block INIT, e.g. if the emergency reboot is triggered
between CLGI and STGI, and thus can prevent bringing up other CPUs via
INIT-SIPI-SIPI.

Cc: stable@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:22 -08:00
Sean Christopherson 26044aff37 x86/crash: Disable virt in core NMI crash handler to avoid double shootdown
Disable virtualization in crash_nmi_callback() and rework the
emergency_vmx_disable_all() path to do an NMI shootdown if and only if a
shootdown has not already occurred.   NMI crash shootdown fundamentally
can't support multiple invocations as responding CPUs are deliberately
put into halt state without unblocking NMIs.  But, the emergency reboot
path doesn't have any work of its own, it simply cares about disabling
virtualization, i.e. so long as a shootdown occurred, emergency reboot
doesn't care who initiated the shootdown, or when.

If "crash_kexec_post_notifiers" is specified on the kernel command line,
panic() will invoke crash_smp_send_stop() and result in a second call to
nmi_shootdown_cpus() during native_machine_emergency_restart().

Invoke the callback _before_ disabling virtualization, as the current
VMCS needs to be cleared before doing VMXOFF.  Note, this results in a
subtle change in ordering between disabling virtualization and stopping
Intel PT on the responding CPUs.  While VMX and Intel PT do interact,
VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one
another, which is all that matters when panicking.

Harden nmi_shootdown_cpus() against multiple invocations to try and
capture any such kernel bugs via a WARN instead of hanging the system
during a crash/dump, e.g. prior to the recent hardening of
register_nmi_handler(), re-registering the NMI handler would trigger a
double list_add() and hang the system if CONFIG_BUG_ON_DATA_CORRUPTION=y.

 list_add double add: new=ffffffff82220800, prev=ffffffff8221cfe8, next=ffffffff82220800.
 WARNING: CPU: 2 PID: 1319 at lib/list_debug.c:29 __list_add_valid+0x67/0x70
 Call Trace:
  __register_nmi_handler+0xcf/0x130
  nmi_shootdown_cpus+0x39/0x90
  native_machine_emergency_restart+0x1c9/0x1d0
  panic+0x237/0x29b

Extract the disabling logic to a common helper to deduplicate code, and
to prepare for doing the shootdown in the emergency reboot path if SVM
is supported.

Note, prior to commit ed72736183 ("x86/reboot: Force all cpus to exit
VMX root if VMX is supported"), nmi_shootdown_cpus() was subtly protected
against a second invocation by a cpu_vmx_enabled() check as the kdump
handler would disable VMX if it ran first.

Fixes: ed72736183 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported")
Cc: stable@vger.kernel.org
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:21 -08:00
Paolo Bonzini dc7c31e922 Merge branch 'kvm-v6.2-rc4-fixes' into HEAD
ARM:

* Fix the PMCR_EL0 reset value after the PMU rework

* Correctly handle S2 fault triggered by a S1 page table walk
  by not always classifying it as a write, as this breaks on
  R/O memslots

* Document why we cannot exit with KVM_EXIT_MMIO when taking
  a write fault from a S1 PTW on a R/O memslot

* Put the Apple M2 on the naughty list for not being able to
  correctly implement the vgic SEIS feature, just like the M1
  before it

* Reviewer updates: Alex is stepping down, replaced by Zenghui

x86:

* Fix various rare locking issues in Xen emulation and teach lockdep
  to detect them

* Documentation improvements

* Do not return host topology information from KVM_GET_SUPPORTED_CPUID
2023-01-24 06:05:23 -05:00
Babu Moger 4fe61bff5a x86/resctrl: Add interface to write mbm_local_bytes_config
The event configuration for mbm_local_bytes can be changed by the
user by writing to the configuration file
/sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.

The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

For example, to change the mbm_local_bytes_config to count all the non-temporal
writes on domain 0, the bits 2 and 3 needs to be set which is 1100b (in hex
0xc).
Run the command:

  $echo  0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

To change the mbm_local_bytes to count only reads to local NUMA domain 1,
the bit 0 needs to be set which 1b (in hex 0x1). Run the command:

  $echo  1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-13-babu.moger@amd.com
2023-01-23 17:40:32 +01:00
Babu Moger 92bd5a1390 x86/resctrl: Add interface to write mbm_total_bytes_config
The event configuration for mbm_total_bytes can be changed by the user by
writing to the file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.

The event configuration settings are domain specific and affect all the
CPUs in the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

For example:

To change the mbm_total_bytes to count only reads on domain 0, the bits
0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
Run the command:

  $echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

To change the mbm_total_bytes to count all the slow memory reads on domain 1,
the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
Run the command:

  $echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-12-babu.moger@amd.com
2023-01-23 17:40:30 +01:00
Babu Moger 73afb2d3ce x86/resctrl: Add interface to read mbm_local_bytes_config
The event configuration can be viewed by the user by reading the configuration
file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.  The event
configuration settings are domain specific and will affect all the CPUs in the
domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

By default, the mbm_local_bytes_config is set to 0x15 to count all the local
event types.

For example:

  $cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
  0=0x15;1=0x15;2=0x15;3=0x15

In this case, the event mbm_local_bytes is configured with 0x15 on
domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-11-babu.moger@amd.com
2023-01-23 17:40:27 +01:00
Babu Moger dc2a3e8579 x86/resctrl: Add interface to read mbm_total_bytes_config
The event configuration can be viewed by the user by reading the
configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.  The
event configuration settings are domain specific and will affect all the CPUs in
the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

By default, the mbm_total_bytes_config is set to 0x7f to count all the
event types.

For example:

  $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
  0=0x7f;1=0x7f;2=0x7f;3=0x7f

In this case, the event mbm_total_bytes is configured with 0x7f on
domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-10-babu.moger@amd.com
2023-01-23 17:40:24 +01:00
Babu Moger d507f83ced x86/resctrl: Support monitor configuration
Add a new field in struct mon_evt to support Bandwidth Monitoring Event
Configuration (BMEC) and also update the "mon_features" display.

The resctrl file "mon_features" will display the supported events
and files that can be used to configure those events if monitor
configuration is supported.

Before the change:

  $ cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_local_bytes

After the change when BMEC is supported:

  $ cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_total_bytes_config
  mbm_local_bytes
  mbm_local_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-9-babu.moger@amd.com
2023-01-23 17:40:21 +01:00
Babu Moger bd334c86b5 x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
In an upcoming change, rdt_get_mon_l3_config() needs to call rdt_cpu_has() to
query the monitor related features. It cannot be called right now because
rdt_cpu_has() has the __init attribute but rdt_get_mon_l3_config() doesn't.

Add the __init attribute to rdt_get_mon_l3_config() that is only called by
get_rdt_mon_resources() that already has the __init attribute. Also make
rdt_cpu_has() available to by rdt_get_mon_l3_config() via the internal header
file.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-8-babu.moger@amd.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-01-23 17:40:11 +01:00
Babu Moger 5b6fac3fa4 x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-7-babu.moger@amd.com
2023-01-23 17:38:44 +01:00
Babu Moger a76f65c89f x86/resctrl: Include new features in command line options
Add the command line options to enable or disable the new resctrl features:

smba: Slow Memory Bandwidth Allocation
bmec: Bandwidth Monitor Event Configuration.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-6-babu.moger@amd.com
2023-01-23 17:38:38 +01:00
Babu Moger 78335aac61 x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
Newer AMD processors support the new feature Bandwidth Monitoring Event
Configuration (BMEC).

The feature support is identified via CPUID Fn8000_0020_EBX_x0[3]: EVT_CFG -
Bandwidth Monitoring Event Configuration (BMEC)

The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes are set to
count all the total and local reads/writes, respectively. With the introduction
of slow memory, the two counters are not enough to count all the different types
of memory events. Therefore, BMEC provides the option to configure
mbm_total_bytes and mbm_local_bytes to count the specific type of events.

Each BMEC event has a configuration MSR which contains one field for each
bandwidth type that can be used to configure the bandwidth event to track any
combination of supported bandwidth types. The event will count requests from
every bandwidth type bit that is set in the corresponding configuration
register.

Following are the types of events supported:

  ====    ========================================================
  Bits    Description
  ====    ========================================================
  6       Dirty Victims from the QOS domain to all types of memory
  5       Reads to slow memory in the non-local NUMA domain
  4       Reads to slow memory in the local NUMA domain
  3       Non-temporal writes to non-local NUMA domain
  2       Non-temporal writes to local NUMA domain
  1       Reads to memory in the non-local NUMA domain
  0       Reads to memory in the local NUMA domain
  ====    ========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to 0x15 to
count all the local memory events.

Feature description is available in the specification, "AMD64 Technology
Platform Quality of Service Extensions, Revision: 1.03 Publication" at
https://bugzilla.kernel.org/attachment.cgi?id=301365

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-5-babu.moger@amd.com
2023-01-23 17:38:31 +01:00
Babu Moger a5b6996655 x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
Add a new resource type RDT_RESOURCE_SMBA to handle the QoS enforcement
policies on the external slow memory.

Mostly initialization of the essentials. Setting fflags to RFTYPE_RES_MB
configures the SMBA resource to have the same resctrl files as the
existing MBA resource. The SMBA resource has identical properties to
the existing MBA resource. These properties will be enumerated in an
upcoming change and exposed via resctrl because of this flag.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-4-babu.moger@amd.com
2023-01-23 17:38:22 +01:00
Babu Moger f334f723a6 x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
Add the new AMD feature X86_FEATURE_SMBA. With it, the QOS enforcement policies
can be applied to external slow memory connected to the host. QOS enforcement is
accomplished by assigning a Class Of Service (COS) to a processor and specifying
allocations or limits for that COS for each resource to be allocated.

This feature is identified by the CPUID function 0x8000_0020_EBX_x0[2]:
L3SBE - L3 external slow memory bandwidth enforcement.

CXL.memory is the only supported "slow" memory device. With SMBA, the hardware
enables bandwidth allocation on the slow memory devices.  If there are multiple
slow memory devices in the system, then the throttling logic groups all the slow
sources together and applies the limit on them as a whole.

The presence of the SMBA feature (with CXL.memory) is independent of whether
slow memory device is actually present in the system. If there is no slow memory
in the system, then setting a SMBA limit will have no impact on the performance
of the system.

Presence of CXL memory can be identified by the numactl command:

  $numactl -H
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  node 0 size: 63678 MB node 0 free: 59542 MB
  node 1 cpus:
  node 1 size: 16122 MB
  node 1 free: 15627 MB
  node distances:
  node   0   1
     0:  10  50
     1:  50  10

CPU list for CXL memory will be empty. The cpu-cxl node distance is greater than
cpu-to-cpu distances. Node 1 has the CXL memory in this case. CXL memory can
also be identified using ACPI SRAT table and memory maps.

Feature description is available in the specification, "AMD64 Technology
Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
Revision: 1.03 Issue Date: February 2022" at
https://bugzilla.kernel.org/attachment.cgi?id=301365

See also https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-3-babu.moger@amd.com
2023-01-23 17:38:17 +01:00
Babu Moger fc3b618c87 x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
on_each_cpu_mask() runs the function on each CPU specified by cpumask,
which may include the local processor.

Replace smp_call_function_many() with on_each_cpu_mask() to simplify
the code.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-2-babu.moger@amd.com
2023-01-23 17:38:04 +01:00
Linus Torvalds 2475bf0250 - Make sure the scheduler doesn't use stale frequency scaling values when latter
get disabled due to a value error
 
 - Fix a NULL pointer access on UP configs
 
 - Use the proper locking when updating CPU capacity
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPNKRsACgkQEsHwGGHe
 VUr8NA/9GTqGUpOq0iRQkEOE1oAdT4ZxJ9dQpaWjwWSNv40HT7XJBzjtvzAyiOBF
 LTUVClBct5i1/j0tVcNq1zXEj3im2e24Ki6A1TWukejgGAMT/7siSkuChEDAMg2M
 b79nKCMpuIZMzkJND3qTkW/aMPpAyU82G8BeLCjw7vgPBsbVkjgbxGFKYdHgpLZa
 kdX/GhOufu40jcGeUxA4zWTpXfuXT3OG7JYLrlHeJ/HEdzy9kLCkWH4jHHllzPQw
 c4JwG7UnNkDKD6zkG0Guzi1zQy39egh/kaj7FQmVap9sneq+x69T4ta0BfXdBXnQ
 Vsqc/nnOybUzr9Gjg5W6KRZWk0k6hK9n3+cye88BfRvzMY0KAgxeCiZEn7cuKqZp
 15Agzz77vcwt32QsxjSp+hKQxJtePcBCurDhkfuAZyPELNIBeCW6inrXArprfTIg
 IfEF068GKsvDGu3I/z49VXFZ9YBoerQREHbN/xL1VNeB8VoAxAE227OMUvhMcIu1
 jxVvkESwc9BIybTcJbUjgg2i9t+Fv3gtZBQ6GFfDboHneia4U5a9aTyN6QGjX+5P
 SIXPhePbFCNrhW7JbSGqKBd96MczbFNQinRhgX1lBU0241cAnchY4nH9fUvv7Zrt
 b/QDg58tb2bJkm1Z08L256ZETELOU9nLJVxUbtBNSSO9dfkvoBs=
 =aNSK
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Make sure the scheduler doesn't use stale frequency scaling values
   when latter get disabled due to a value error

 - Fix a NULL pointer access on UP configs

 - Use the proper locking when updating CPU capacity

* tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
  sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
  sched/fair: Fixes for capacity inversion detection
  sched/uclamp: Fix a uninitialized variable warnings
2023-01-22 12:14:58 -08:00
Ashok Raj a9a5cac225 x86/microcode/intel: Print old and new revision during early boot
Make early loading message match late loading message and print both old
and new revisions.

This is helpful to know what the BIOS loaded revision is before an early
update.

Cache the early BIOS revision before the microcode update and have
print_ucode_info() print both the old and new revision in the same
format as microcode_reload_late().

  [ bp: Massage, remove useless comment. ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230120161923.118882-6-ashok.raj@intel.com
2023-01-21 14:55:21 +01:00
Ashok Raj 174f1b909a x86/microcode/intel: Pass the microcode revision to print_ucode_info() directly
print_ucode_info() takes a struct ucode_cpu_info pointer as parameter.
Its sole purpose is to print the microcode revision.

The only available ucode_cpu_info always describes the currently loaded
microcode revision. After a microcode update is successful, this is the
new revision, or on failure it is the original revision.

In preparation for future changes, replace the struct ucode_cpu_info
pointer parameter with a plain integer which contains the revision
number and adjust the call sites accordingly.

No functional change.

  [ bp:
    - Fix + cleanup commit message.
    - Revert arbitrary, unrelated change.
  ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230120161923.118882-5-ashok.raj@intel.com
2023-01-21 14:55:20 +01:00
Ashok Raj 6eab3abac7 x86/microcode: Adjust late loading result reporting message
During late microcode loading, the "Reload completed" message is issued
unconditionally, regardless of success or failure.

Adjust the message to report the result of the update.

  [ bp: Massage. ]

Fixes: 9bd681251b ("x86/microcode: Announce reload operation's completion")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/lkml/874judpqqd.ffs@tglx/
2023-01-21 14:55:20 +01:00
Ashok Raj c0dd9245aa x86/microcode: Check CPU capabilities after late microcode update correctly
The kernel caches each CPU's feature bits at boot in an x86_capability[]
structure. However, the capabilities in the BSP's copy can be turned off
as a result of certain command line parameters or configuration
restrictions, for example the SGX bit. This can cause a mismatch when
comparing the values before and after the microcode update.

Another example is X86_FEATURE_SRBDS_CTRL which gets added only after
microcode update:

  --- cpuid.before	2023-01-21 14:54:15.652000747 +0100
  +++ cpuid.after	2023-01-21 14:54:26.632001024 +0100
  @@ -10,7 +10,7 @@ CPU:
      0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
      0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x11142120
      0x00000006 0x00: eax=0x000027f7 ebx=0x00000002 ecx=0x00000001 edx=0x00000000
  -   0x00000007 0x00: eax=0x00000000 ebx=0x029c6fbf ecx=0x40000000 edx=0xbc002400
  +   0x00000007 0x00: eax=0x00000000 ebx=0x029c6fbf ecx=0x40000000 edx=0xbc002e00
  									     ^^^

and which proves for a gazillionth time that late loading is a bad bad
idea.

microcode_check() is called after an update to report any previously
cached CPUID bits which might have changed due to the update.

Therefore, store the cached CPU caps before the update and compare them
with the CPU caps after the microcode update has succeeded.

Thus, the comparison is done between the CPUID *hardware* bits before
and after the upgrade instead of using the cached, possibly runtime
modified values in BSP's boot_cpu_data copy.

As a result, false warnings about CPUID bits changes are avoided.

  [ bp:
  	- Massage.
	- Add SRBDS_CTRL example.
	- Add kernel-doc.
	- Incorporate forgotten review feedback from dhansen.
	]

Fixes: 1008c52c09 ("x86/CPU: Add a microcode loader callback")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230109153555.4986-3-ashok.raj@intel.com
2023-01-21 14:53:20 +01:00
Ashok Raj ab31c74455 x86/microcode: Add a parameter to microcode_check() to store CPU capabilities
Add a parameter to store CPU capabilities before performing a microcode
update so that CPU capabilities can be compared before and after update.

  [ bp: Massage. ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230109153555.4986-2-ashok.raj@intel.com
2023-01-20 21:45:13 +01:00
Li RongQing 716ff71ae2 cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle()
When a KVM guest has MWAIT, mwait_idle() is used as the default idle
function.

However, the cpuidle-haltpoll driver calls default_idle() from
default_enter_idle() directly and that one uses HLT instead of MWAIT,
which may affect performance adversely, because MWAIT is preferred to
HLT as explained by the changelog of commit aebef63cf7 ("x86: Remove
vendor checks from prefer_mwait_c1_over_halt").

Make default_enter_idle() call arch_cpu_idle(), which can use MWAIT,
instead of default_idle() to address this issue.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
[ rjw: Changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20 17:33:52 +01:00
Paul E. McKenney 344da544f1 x86/nmi: Print reasons why backtrace NMIs are ignored
Instrument nmi_trigger_cpumask_backtrace() to dump out diagnostics based
on evidence accumulated by exc_nmi().  These diagnostics are dumped for
CPUs that ignored an NMI backtrace request for more than 10 seconds.

[ paulmck: Apply Ingo Molnar feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2023-01-19 15:55:12 -08:00
Paul E. McKenney 1a3ea611fc x86/nmi: Accumulate NMI-progress evidence in exc_nmi()
CPUs ignoring NMIs is often a sign of those CPUs going bad, but there
are quite a few other reasons why a CPU might ignore NMIs.  Therefore,
accumulate evidence within exc_nmi() as to what might be preventing a
given CPU from responding to an NMI.

[ paulmck: Apply Peter Zijlstra feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2023-01-19 15:54:41 -08:00
Guangju Wang[baidu] 59047d942b x86/microcode: Use the DEVICE_ATTR_RO() macro
Use DEVICE_ATTR_RO() helper instead of open-coded DEVICE_ATTR(),
which makes the code a bit shorter and easier to read.

No change in functionality.

Signed-off-by: Guangju Wang[baidu] <wgj900@163.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230118023554.1898-1-wgj900@163.com
2023-01-18 12:02:20 +01:00
Ingo Molnar 65adf3a57c Linux 6.2-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPEGkMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGE4YH/1sxdve3nlWT7eyf
 VDanaCZFc2u6Sk1td23HyhvOVbFjj1E39UokHK9YfzD4hhn+u9SFvDgoXHGhHzYq
 IhDvNM3mLsKpEtjwNbbi/lt6tryJchhZ96O5leRiUxZaw/GBVnHrKGR56vCjC/DB
 zq8td4I+TmTPMpsL3e3WZqJoX3bjTqgZFNShA1mZFYtuQRpAVkJkafMRwuAySIMQ
 +FyK/zZ+MpGJ7y/UmWfaBmS5GtjBz0fva7fTbEkUqk8bDtNUWMBhsTKtgU1LBpBe
 Yrkhs3sdQdpKSm9yNaNrlLhSjFwkusV5kR09p6/5w+wthdPmjnoUZuEnAZpYyydE
 ZY7+Vqk=
 =q9Ac
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc4' into perf/core, to pick up fixes

Move from the -rc1 base to the fresher -rc4 kernel that
has various fixes included, before applying a larger
patchset.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-18 11:56:57 +01:00