Commit Graph

11540 Commits

Author SHA1 Message Date
Daniel Sneddon 2b12993220 x86/speculation: Add RSB VM Exit protections
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.

== Background ==

Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.

To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced.  eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.

== Problem ==

Here's a simplification of how guests are run on Linux' KVM:

void run_kvm_guest(void)
{
	// Prepare to run guest
	VMRESUME();
	// Clean up after guest runs
}

The execution flow for that would look something like this to the
processor:

1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()

Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:

* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.

* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".

IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.

However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.

Balanced CALL/RET instruction pairs such as in step #5 are not affected.

== Solution ==

The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.

However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.

Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.

The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.

In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.

There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.

  [ bp: Massage, incorporate review comments from Andy Cooper. ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-08-03 11:23:52 +02:00
Linus Torvalds a0b09f2d6f Random number generator updates for Linux 6.0-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmLnDOwACgkQSfxwEqXe
 A65Fiw//Z0YaPejSslQIGitQ1b0XzdWBhyJArYDieaaiQRXMqlaSKlIUqHz38xb7
 +FykUY51/SJLjHV2riPxq1OK3/MPmk6VlTd0HHihcHVmg77oZcFcv2tPnDpZoqND
 TsBOujLbXKwxP8tNFedRY/4+K7w+ue9BTfDjuH7aCtz7uWd+4cNJmPg3x9FCfkMA
 +hbcRluwE9W3Pg4OCKwv+qxL0JF3qQtNKEOp1wpnjGAZZW/I9gFNgFBEkykvcAsj
 TkIRDc3agPFj6QgDeRIgLdnf9KCsLubKAg5oJneeCvQztJJUCSkn8nQXxpx+4sLo
 GsRgvCdfL/GyJqfSAzQJVYDHKtKMkJiCiWCC/oOALR8dzHJfSlULDAjbY1m/DAr9
 at+vi4678Or7TNx2ZSaUlCXXKZ+UT7yWMlQWax9JuxGk1hGYP5/eT1AH5SGjqUwF
 w1q8oyzxt1vUcnOzEddFXPFirnqqhAk4dQFtu83+xKM4ZssMVyeB4NZdEhAdW0ng
 MX+RjrVj4l5gWWuoS0Cx3LUxDCgV6WT0dN+Vl9axAZkoJJbcXLEmXwQ6NbzTLPWg
 1/MT7qFTxNcTCeAArMdZvvFbeh7pOBXO42pafrK/7vDRnTMUIw9tqXNLQUfvdFQp
 F5flPgiVRHDU2vSzKIFtnPTyXU0RBBGvNb4n0ss2ehH2DSsCxYE=
 =Zy3d
 -----END PGP SIGNATURE-----

Merge tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "Though there's been a decent amount of RNG-related development during
  this last cycle, not all of it is coming through this tree, as this
  cycle saw a shift toward tackling early boot time seeding issues,
  which took place in other trees as well.

  Here's a summary of the various patches:

   - The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
     option have been removed, as they overlapped with the more widely
     supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
     "random.trust_cpu". This change allowed simplifying a bit of arch
     code.

   - x86's RDRAND boot time test has been made a bit more robust, with
     RDRAND disabled if it's clearly producing bogus results. This would
     be a tip.git commit, technically, but I took it through random.git
     to avoid a large merge conflict.

   - The RNG has long since mixed in a timestamp very early in boot, on
     the premise that a computer that does the same things, but does so
     starting at different points in wall time, could be made to still
     produce a different RNG state. Unfortunately, the clock isn't set
     early in boot on all systems, so now we mix in that timestamp when
     the time is actually set.

   - User Mode Linux now uses the host OS's getrandom() syscall to
     generate a bootloader RNG seed and later on treats getrandom() as
     the platform's RDRAND-like faculty.

   - The arch_get_random_{seed_,}_long() family of functions is now
     arch_get_random_{seed_,}_longs(), which enables certain platforms,
     such as s390, to exploit considerable performance advantages from
     requesting multiple CPU random numbers at once, while at the same
     time compiling down to the same code as before on platforms like
     x86.

   - A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
     Uros.

   - A comment spelling fix"

More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:

  https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/

* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: correct spelling of "overwrites"
  random: handle archrandom with multiple longs
  um: seed rng using host OS rng
  random: use try_cmpxchg in _credit_init_bits
  timekeeping: contribute wall clock to rng on time change
  x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
  random: remove CONFIG_ARCH_RANDOM
2022-08-02 17:31:35 -07:00
Linus Torvalds 043402495d integrity-v6.0
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCYulqTBQcem9oYXJAbGlu
 dXguaWJtLmNvbQAKCRDLwZzRsCrn5SBBAP9nbAW1SPa/hDqbrclHdDrS59VkSVwv
 6ZO2yAmxJAptHwD+JzyJpJiZsqVN/Tu85V1PqeAt9c8az8f3CfDBp2+w7AA=
 =Ad+c
 -----END PGP SIGNATURE-----

Merge tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
 "Aside from the one EVM cleanup patch, all the other changes are kexec
  related.

  On different architectures different keyrings are used to verify the
  kexec'ed kernel image signature. Here are a number of preparatory
  cleanup patches and the patches themselves for making the keyrings -
  builtin_trusted_keyring, .machine, .secondary_trusted_keyring, and
  .platform - consistent across the different architectures"

* tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification
  arm64: kexec_file: use more system keyrings to verify kernel image signature
  kexec, KEYS: make the code in bzImage64_verify_sig generic
  kexec: clean up arch_kexec_kernel_verify_sig
  kexec: drop weak attribute from functions
  kexec_file: drop weak attribute from functions
  evm: Use IS_ENABLED to initialize .enabled
2022-08-02 15:21:18 -07:00
Linus Torvalds efb2883060 Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
 "Only updating the turbostat tool here, no kernel changes"

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2022.07.28
  tools/power turbostat: do not decode ACC for ICX and SPR
  tools/power turbostat: fix SPR PC6 limits
  tools/power turbostat: cleanup 'automatic_cstate_conversion_probe()'
  tools/power turbostat: separate SPR from ICX
  tools/power turbosstat: fix comment
  tools/power turbostat: Support RAPTORLAKE P
  tools/power turbostat: add support for ALDERLAKE_N
  tools/power turbostat: dump secondary Turbo-Ratio-Limit
  tools/power turbostat: simplify dump_turbo_ratio_limits()
  tools/power turbostat: dump CPUID.7.EDX.Hybrid
  tools/power turbostat: update turbostat.8
  tools/power turbostat: Show uncore frequency
  tools/power turbostat: Fix file pointer leak
  tools/power turbostat: replace strncmp with single character compare
  tools/power turbostat: print the kernel boot commandline
  tools/power turbostat: Introduce support for RaptorLake
2022-08-02 12:47:31 -07:00
Linus Torvalds 63e6053add Perf events updates for this cycle are:
- Fix Intel Alder Lake PEBS memory access latency & data source profiling info bugs.
 
 - Use Intel large-PEBS hardware feature in more circumstances, to reduce
   PMI overhead & reduce sampling data.
 
 - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI variant,
   which tells tooling the exact number of samples lost.
 
 - Add new IBS register bits definitions.
 
 - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmLn5MARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jAWA/+N48UX35dD0u3k5S2zdYJRHzQkdbivVGc
 dOuCB3XTJYneaSI5byQkI4Xo8LUuMbF4q2Zi3/+XhTaqn2zYPP65D6ACL5hU9Shh
 F95TnLWbedIaxSJmjMCsWDlwBob8WgtLhokWvyq+ks66BqaDoBKHRtn+2fi0rwZb
 MbuN0199Gx/EicWEOeUGBSxoeKbjSix0BApqy+CuXC0DC3+3iwIPk4dbNfHXpHYs
 nqxjQKhJnoxdlgjiOY3UuYhdCZl1cuQFIu2Ce1N2nXCAgR2FeQD7ZqtcaA2TnsAO
 9BwRfLljavzHhOoz0zALR42kF+eOcnH5K9pIBx7py9Hjdmdsx88fUCovWK34MdG5
 KTuqiMWNLIUvoP9WBjl7wUtl2+vcjr9XwgCdneOO+zoNsk44qSRyer1RpEP6D9UM
 e9HvdXBVRzhnIhK9NYugeLJ+3nxvFL+OLvc3ZfUrtm04UzeetCBxMlvMv3y021V7
 0fInZjhzh4Dz2tJgNlG7AKXkXlsHlyj6/BH9uKc9wVokK+94g5mbspxW8R4gKPr2
 l06pYB7ttSpp26sq9vl5ASHO0rniiYAPsQcr7Ko3y72mmp6kfIe/HzYNhCEvgYe2
 6JJ8F9kPgRuKr0CwGvUzxFwBC7PJR80zUtZkRCIpV+rgxQcNmK5YXp/KQFIjQqkI
 rJfEaDOshl0=
 =DqaA
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:

 - Fix Intel Alder Lake PEBS memory access latency & data source
   profiling info bugs.

 - Use Intel large-PEBS hardware feature in more circumstances, to
   reduce PMI overhead & reduce sampling data.

 - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI
   variant, which tells tooling the exact number of samples lost.

 - Add new IBS register bits definitions.

 - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements.

* tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/ibs: Add new IBS register bits into header
  perf/x86/intel: Fix PEBS data source encoding for ADL
  perf/x86/intel: Fix PEBS memory access info encoding for ADL
  perf/core: Add a new read format to get a number of lost samples
  perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments
  perf/x86/amd/uncore: Add PerfMonV2 DF event format
  perf/x86/amd/uncore: Detect available DF counters
  perf/x86/amd/uncore: Use attr_update for format attributes
  perf/x86/amd/uncore: Use dynamic events array
  x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE
2022-08-01 12:24:30 -07:00
Linus Torvalds ecf9b7bfea - Have invalid MSR accesses warnings appear only once after a
pr_warn_once() change broke that
 
 - Simplify {JMP,CALL}_NOSPEC and let the objtool retpoline patching
 infra take care of them instead of having unreadable alternative macros
 there
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLnvG8ACgkQEsHwGGHe
 VUpbPw//WUoBMnR9B9xpwuk6EBA0rVbQqCt0QYy/h27QuD9aIC+lYduL8CId1AbS
 J+bqDvpqFQRqc9idAvgkjspJNMnbOSqzAx1AbT4gBCH33hPYrB/6F7XasXSDZn0M
 OVJDyvhOhror2I6YuFc1uwjMPBZj8+fgv823io/RKqvJfj/WUoIBK1oxUlzm4rNs
 LkrxgKpvs3QznJ0RNnZUP1kKnezzC1RtxIdz8QD3rirpuZITF8HD04jIUyK7gohh
 XP5Vgt+9BzRCHyn43XT7UEobz93WASDn0YOlDCdOuMxhJONYpnvk8dYJkss2BReG
 oCGZNiUHKNALd/FWUr5EeWoa84XQwRwm2Y+jjdH3al7LIaIbERGXbTIHsWH6JtVk
 +EiXqj+B6d2muZpG/ka0PSgPvbeTeipeugNDzXGCLuWHkjLRwJRV/afUmzLuT7lJ
 zzrejZeXACtWmTLXXt5EPAPWsibapiuA6/4ucyr+jJj7CCY8pmrOGcxqHKlq7MHU
 9Lk50F6Y9jbEe7mRPmR67dj/P9QAFYewjVDLEgOdyganrNNGfRkddJVbckYYZTVF
 YSUsoZBAc5E3Fzo/yQkG3+6K0d13Syuf3QRhGxe2JvYlWt2mktaFsRtcTGSCd9KM
 AJw8tV0FDezxmdKQ8DNkeLG+tAt7m22dRJkhbSUilYQsNR2Cwxg=
 =70Gc
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Borislav Petkov:

 - Have invalid MSR accesses warnings appear only once after a
   pr_warn_once() change broke that

 - Simplify {JMP,CALL}_NOSPEC and let the objtool retpoline patching
   infra take care of them instead of having unreadable alternative
   macros there

* tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/extable: Fix ex_handler_msr() print condition
  x86,nospec: Simplify {JMP,CALL}_NOSPEC
2022-08-01 10:04:00 -07:00
Linus Torvalds 42efa5e3a8 - Remove the vendor check when selecting MWAIT as the default idle state
- Respect idle=nomwait when supplied on the kernel cmdline
 
 - Two small cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLntx0ACgkQEsHwGGHe
 VUqlRxAAkULobsk6Dx3wrQcYlpA8Mt/ctttTQXWiIQwhK1j7uP0zlGWBqImr5Wsk
 T04g1s29azulnPs3PydCF2QlLqSyF4v2PyyUwnpKfTP6CPM+MLtz98Gm6Xcbkt+s
 f28ISYgNP+15tskWdNqB5XIVGkuyBdNne9TiFwtnVrJYF47FSwqEWRyqMH+bIOGT
 wSZUCfjcw7PtKwfIAmYq4beS2+wbY9bsfVyIz+H0ks2EVFQdjYWb/kH9PgUYEQFe
 VEOBsPvTHDOJt0QXEXSJjmoSRUS77Wduw56Y3L2T4jWdXXQFWJ79rqNYDBvXGAdh
 Y8BKM5IYFZpzrmfw2RB6jbDY/JWO5PPFvHTXogQf9+wttSerZEffVQdOeTwjT8VD
 wc9/ZnNkT7915033VI90V+hdFkwarq8FXuFH8TkzcxP9DQNYG8CRTZBceq0UWBl0
 5RpIDwNX9JxGrR+frJi0D24qxz//wLe56UqW9hLp73NP8QtEYEW1nb1q30Q2eM3N
 iQblgmh63qQ/dy6JV1GFb3aePiWMUNQwcTrj1pd8YDfNlp4IsFsSswnsdAZWtr1A
 l9qewHkBZbbzyTQkBjExUsaIdiaMywFwnUmcQNL+fHqznZIvMhJC/oCJeS0Pe/RH
 alTUrYsk6Y87HFpxoXpd85a9+20m8yrA64uY8cSQguGZ9i5Lm8g=
 =jkpj
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Remove the vendor check when selecting MWAIT as the default idle
   state

 - Respect idle=nomwait when supplied on the kernel cmdline

 - Two small cleanups

* tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Use MSR_IA32_MISC_ENABLE constants
  x86: Fix comment for X86_FEATURE_ZEN
  x86: Remove vendor checks from prefer_mwait_c1_over_halt
  x86: Handle idle=nomwait cmdline properly for x86_idle
2022-08-01 09:49:29 -07:00
Linus Torvalds 650ea1f626 - Add machinery to initialize AMX register state in order for CPUs to
be able to enter deeper low-power state
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLnsksACgkQEsHwGGHe
 VUpOOw//WAfkouWFd7kmACSiWtkgEQfXgImhhM7tw5Zzks+aEMtL2RrKqFYzkFg5
 hJK+lMI8QDkBFU/bgI/nAZfFiAS7iBMPY4T2Uw4+jZCPLr3TmUheJ2Pe1CxlIzQC
 MfjXQm/j5uTZcB2jEORjPT5dVE3p6k1KpSbvf5ZKCc9YTwdylv3VeYcfv5WEkihR
 61bWU+T7Yse4A3Bx32ewabLmk7lwOcdS1vbfsqdvkpI1vE1gI8CThgTuNAt8JWij
 27GIxiF2BQkyw3d/IPt3wGIPOgVowISXWdtMgpCr17Mw1m+44vXG9cjSuAKfqAUY
 wNXrBzirdqzJgN85WVJEFIoJasFJicrz/oNLYbcHQa8+AruRu6in22cSkPYPvVGc
 iNgSlQOZdoY9Vl6izEV4OawCccYnKjskEW7nEVIqfENrwRPYWB/IAnGxkla7q3Ch
 q+T8dyOAWToumuPK13c5VoX0nd02bfwSJACYRxN+M22zq8s7+Jv1fNtQeAGLnmD1
 jG3HR0wJWBOVVyira7AbFI7Mx667HayslIesftEGU33FfY0gZTcwZ7jsZ9GTSyOi
 AgHN3PvHyJYQ648T8JzbyuNJe3dyDKf81OLaPHP6+nV9Dy3aCrERTML0jo8xWv2N
 rDA61BV/q+hdQS3vzmLRVPzLLZksGRNCS2ZzIbkR4dGxLQAAB2M=
 =w/wH
 -----END PGP SIGNATURE-----

Merge tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu update from Borislav Petkov:

 - Add machinery to initialize AMX register state in order for
   AMX-capable CPUs to be able to enter deeper low-power state

* tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  intel_idle: Add a new flag to initialize the AMX state
  x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
2022-08-01 09:36:18 -07:00
Linus Torvalds 92598ae22f - Rename a PKRU macro to make more sense when reading the code
- Update pkeys documentation
 
 - Avoid reading contended mm's TLB generation var if not absolutely
 necessary along with fixing a case where arch_tlbbatch_flush() doesn't
 adhere to the generation scheme and thus violates the conditions for the
 above avoidance.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLnmpYACgkQEsHwGGHe
 VUrINQ/9FGnQya6mTJitM3Ohdzu1lOrHm5+XAxCO3SVzPPQlx0mRZmszzDOIZpG/
 9iCEDhSi+kLdkTwIXk8Nmm1imNT2MSqswjQYr8KDtl69/j12W8Y0Pb5C5tnQnUyi
 FXPiVVCAk0iegNg+QvarQa8Ou6tGWDqFMLzdrq9XNokdBmFq7FCDsOjdwd8So3IY
 95755wDtCxgBXc2TVr08qSpD0Q/VlHKqb5shtzuoBe9a0YLEaRmWne9UzTOx5U6c
 //qk8lmy9ohL8dmN7SgcRITzfpU8ue+/J4oZ+GV9mc/UTW5Ah2WNX+3BFnmCqZrK
 gr7G5pukuuJxFj8yGzGbGIM28OHKYIE+So2Q5pA6Vrqst/oyDJS+pcoxyhAYGYCQ
 hDjp4yu5AUnsPky6h6VHaR8Er5Nvo7YwhdSazcGD+HC7smwbnVEzI5H7MUgcJ05F
 1CkAQSy2TVZe0hhilOu8dcHN23+2ISF8BzxKbn4qtZOsJTN6/U4MYFWl6VPh8P80
 vjZcIJYZ4i6Gz03m7ITk2bHwfOD8f/7UkbZEggO/GYm1BgmxaMB0IogoIkSUG9vN
 CLGZomRMfBcVVS1DTWJsUzRLbNx3x3pL41NrlxPbC/rTmvts5eJAvcDcffPfRGzx
 tCqcASRdV7tQBgMT5MLjmIY8cM1aphdGSdlKVD7QHZ11bJVFZE4=
 =aD0S
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Borislav Petkov:

 - Rename a PKRU macro to make more sense when reading the code

 - Update pkeys documentation

 - Avoid reading contended mm's TLB generation var if not absolutely
   necessary along with fixing a case where arch_tlbbatch_flush()
   doesn't adhere to the generation scheme and thus violates the
   conditions for the above avoidance.

* tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/tlb: Ignore f->new_tlb_gen when zero
  x86/pkeys: Clarify PKRU_AD_KEY macro
  Documentation/protection-keys: Clean up documentation for User Space pkeys
  x86/mm/tlb: Avoid reading mm_tlb_gen when possible
2022-08-01 09:34:39 -07:00
Paolo Bonzini 63f4b21041 Merge remote-tracking branch 'kvm/next' into kvm-next-5.20
KVM/s390, KVM/x86 and common infrastructure changes for 5.20

x86:

* Permit guests to ignore single-bit ECC errors

* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit (detect microarchitectural hangs) for Intel

* Cleanups for MCE MSR emulation

s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to use TAP interface

* enable interpretive execution of zPCI instructions (for PCI passthrough)

* First part of deferred teardown

* CPU Topology

* PV attestation

* Minor fixes

Generic:

* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple

x86:

* Use try_cmpxchg64 instead of cmpxchg64

* Bugfixes

* Ignore benign host accesses to PMU MSRs when PMU is disabled

* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis

* Port eager page splitting to shadow MMU as well

* Enable CMCI capability by default and handle injected UCNA errors

* Expose pid of vcpu threads in debugfs

* x2AVIC support for AMD

* cleanup PIO emulation

* Fixes for LLDT/LTR emulation

* Don't require refcounted "struct page" to create huge SPTEs

x86 cleanups:

* Use separate namespaces for guest PTEs and shadow PTEs bitmasks

* PIO emulation

* Reorganize rmap API, mostly around rmap destruction

* Do not workaround very old KVM bugs for L0 that runs with nesting enabled

* new selftests API for CPUID
2022-08-01 03:21:00 -04:00
Ben Dooks 787dbea11a profile: setup_profiling_timer() is moslty not implemented
The setup_profiling_timer() is mostly un-implemented by many
architectures.  In many places it isn't guarded by CONFIG_PROFILE which is
needed for it to be used.  Make it a weak symbol in kernel/profile.c and
remove the 'return -EINVAL' implementations from the kenrel.

There are a couple of architectures which do return 0 from the
setup_profiling_timer() function but they don't seem to do anything else
with it.  To keep the /proc compatibility for now, leave these for a
future update or removal.

On ARM, this fixes the following sparse warning:
arch/arm/kernel/smp.c:793:5: warning: symbol 'setup_profiling_timer' was not declared. Should it be static?

Link: https://lkml.kernel.org/r/20220721195509.418205-1-ben-linux@fluff.org
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-29 18:12:36 -07:00
Len Brown 4af184ee8b tools/power turbostat: dump secondary Turbo-Ratio-Limit
Intel Performance Hybrid processors have a 2nd MSR
describing the turbo limits enforced on the Ecores.

Note, TRL and Secondary-TRL are usually R/O information,
but on overclock-capable parts, they can be written.

Signed-off-by: Len Brown <len.brown@intel.com>
2022-07-28 14:23:26 -04:00
Borislav Petkov 5bb6c1d112 Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"
This reverts commit 007faec014.

Now that hyperv does its own protocol negotiation:

  49d6a3c062 ("x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM")

revert this exposure of the sev_es_ghcb_hv_call() helper.

Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by:Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com
2022-07-27 18:09:13 +02:00
Ravi Bangoria 326ecc15c6 perf/x86/ibs: Add new IBS register bits into header
IBS support has been enhanced with two new features in upcoming uarch:

  1. DataSrc extension and
  2. L3 miss filtering.

Additional set of bits has been introduced in IBS registers to use these
features. Define these new bits into arch/x86/ header.

  [ bp: Massage commit message. ]

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220604044519.594-7-ravi.bangoria@amd.com
2022-07-27 13:54:38 +02:00
Jason A. Donenfeld d349ab99ee random: handle archrandom with multiple longs
The archrandom interface was originally designed for x86, which supplies
RDRAND/RDSEED for receiving random words into registers, resulting in
one function to generate an int and another to generate a long. However,
other architectures don't follow this.

On arm64, the SMCCC TRNG interface can return between one and three
longs. On s390, the CPACF TRNG interface can return arbitrary amounts,
with four longs having the same cost as one. On UML, the os_getrandom()
interface can return arbitrary amounts.

So change the api signature to take a "max_longs" parameter designating
the maximum number of longs requested, and then return the number of
longs generated.

Since callers need to check this return value and loop anyway, each arch
implementation does not bother implementing its own loop to try again to
fill the maximum number of longs. Additionally, all existing callers
pass in a constant max_longs parameter. Taken together, these two things
mean that the codegen doesn't really change much for one-word-at-a-time
platforms, while performance is greatly improved on platforms such as
s390.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-25 13:26:14 +02:00
Linus Torvalds 05017fed92 - Make retbleed mitigations 64-bit only (32-bit will need a bit more
work if even needed, at all).
 
 - Prevent return thunks patching of the LKDTM modules as it is not needed there
 
 - Avoid writing the SPEC_CTRL MSR on every kernel entry on eIBRS parts
 
 - Enhance error output of apply_returns() when it fails to patch a return thunk
 
 - A sparse fix to the sev-guest module
 
 - Protect EFI fw calls by issuing an IBPB on AMD
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLdCZIACgkQEsHwGGHe
 VUpZHw//frU5tCHiP+4UM/kGdrcDxVy+FutkoprrNzhYszkvVGWOU4GOyExjnbXj
 cS47AlIAv5o2T+UKN6pwtH1n6Jb+Qf2zsaMlyIcxI7t1PiyddkT9wvrgP52GPNyl
 vrxtDVoo1aYF8z7RxTEtjWHzUlfxQTSlo+olQVtW/JYWnqQOPwvPm+bOA+0MBzwA
 gbDV2nM1Z0D9ifo536J9P93/HtYwW/fsrutTj0hICxweRGnEN7Gw11Ci+6cT+Rgq
 TYO5feaFnnbv0hnfDbWQXAuDVdgi0aJ06CDuoC6U/WmKNpQo2rpjAASVXN/EEQyh
 wwy/bqauj8daOpSU/8/hNx1Rwx3gP6FWEzYpylsn3xP7p9y6dWQnBh4i2ZmMhxUt
 zbIlwNqSFWluFJaSpOnaPJJVrQbRj03Yb1ynTPCcuTl1K1ydGr03J351K8SfeXqP
 bhkecr+BiddFs2Y3udaq/NYJEFRHtbZi3CdDcUulbmEfKOZIQRN0JMSAvR7SH1sZ
 4lljsoSCfJgxKmgP9tVhRc19LD6knxTAzVbtjtyuEmsf1dN36rN5r5Y4lc1AN68s
 12ymbV0qc/Qh6Cn8x/Cgv3S6kXjYrKdbSQHVnJXsJE0kz/mqyIn3+LChU+PoIvmu
 7uGeEouIlDroBgT5WWFWnTgxujFPZD6O2rPsJwOXvTkp9uDof4Q=
 =aSDG
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "A couple more retbleed fallout fixes.

  It looks like their urgency is decreasing so it seems like we've
  managed to catch whatever snafus the limited -rc testing has exposed.
  Maybe we're getting ready... :)

   - Make retbleed mitigations 64-bit only (32-bit will need a bit more
     work if even needed, at all).

   - Prevent return thunks patching of the LKDTM modules as it is not
     needed there

   - Avoid writing the SPEC_CTRL MSR on every kernel entry on eIBRS
     parts

   - Enhance error output of apply_returns() when it fails to patch a
     return thunk

   - A sparse fix to the sev-guest module

   - Protect EFI fw calls by issuing an IBPB on AMD"

* tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Make all RETbleed mitigations 64-bit only
  lkdtm: Disable return thunks in rodata.c
  x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
  x86/alternative: Report missing return thunk details
  virt: sev-guest: Pass the appropriate argument type to iounmap()
  x86/amd: Use IBPB for firmware calls
2022-07-24 09:40:17 -07:00
Stafford Horne abb4970ac3 PCI: Move isa_dma_bridge_buggy out of asm/dma.h
The isa_dma_bridge_buggy symbol is only used for x86_32, and only x86_32
platforms or quirks ever set it.

Add a new linux/isa-dma.h header that #defines isa_dma_bridge_buggy to 0
except on x86_32, where we keep it as a variable, and remove all the arch-
specific definitions.

[bhelgaas: commit log]
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/r/20220722214944.831438-3-shorne@gmail.com
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2022-07-22 17:24:47 -05:00
Stafford Horne ae85b23c65 PCI: Remove pci_get_legacy_ide_irq() and asm-generic/pci.h
pci_get_legacy_ide_irq() is only used on platforms that support PNP, so
many architectures define it but never use it.  Replace uses of it with
ATA_PRIMARY_IRQ() and ATA_SECONDARY_IRQ(), which provide the same
functionality.

Since pci_get_legacy_ide_irq() is no longer used, remove all the
architecture-specific definitions of it as well as asm-generic/pci.h, which
only provides pci_get_legacy_ide_irq()

[bhelgaas: commit log]
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20220722214944.831438-2-shorne@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-22 17:23:45 -05:00
Peter Zijlstra 1e9fdf21a4 mmu_gather: Remove per arch tlb_{start,end}_vma()
Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.

 - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
   but does *NOT* want to call it for each VMA.

 - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
   invalidate across multiple VMAs if possible.

With these it is possible to capture the three forms:

  1) empty stubs;
     select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS

  2) start: flush_cache_range(), end: empty;
     select MMU_GATHER_MERGE_VMAS

  3) start: flush_cache_range(), end: flush_tlb_range();
     default

Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-21 10:50:13 -07:00
Peter Zijlstra 09d09531a5 x86,nospec: Simplify {JMP,CALL}_NOSPEC
Have {JMP,CALL}_NOSPEC generate the same code GCC does for indirect
calls and rely on the objtool retpoline patching infrastructure.

There's no reason these should be alternatives while the vast bulk of
compiler generated retpolines are not.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-07-21 10:39:42 +02:00
Chang S. Bae f17b168734 x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
When a CPU enters an idle state, a non-initialized AMX register state may
be the cause of preventing a deeper low-power state. Other extended
register states whether initialized or not do not impact the CPU idle
state.

The new helper can ensure the AMX state is initialized before the CPU is
idle, and it will be used by the intel idle driver.

Check the AMX_TILE feature bit before using XGETBV1 as a chain of
dependencies was established via cpuid_deps[]: AMX->XFD->XGETBV1.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20220608164748.11864-2-chang.seok.bae@intel.com
2022-07-19 18:46:15 +02:00
Nadav Amit 8f1d56f64f x86/mm/tlb: Ignore f->new_tlb_gen when zero
Commit aa44284960 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
possible") introduced an optimization to skip superfluous TLB
flushes based on the generation provided in flush_tlb_info.

However, arch_tlbbatch_flush() does not provide any generation in
flush_tlb_info and populates the flush_tlb_info generation with
0.  This 0 is causes the flush_tlb_info to be interpreted as a
superfluous, old flush.  As a result, try_to_unmap_one() would
not perform any TLB flushes.

Fix it by checking whether f->new_tlb_gen is nonzero. Zero value
is anyhow is an invalid generation value. To avoid future
confusion, introduce TLB_GENERATION_INVALID constant and use it
properly. Add warnings to ensure no partial flushes are done with
TLB_GENERATION_INVALID or when f->mm is NULL, since this does not
make any sense.

In addition, add the missing unlikely().

[ dhansen: change VM_BUG_ON() -> VM_WARN_ON(), clarify changelog ]

Fixes: aa44284960 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Hugh Dickins <hughd@google.com>
Link: https://lkml.kernel.org/r/20220710232837.3618-1-namit@vmware.com
2022-07-19 09:04:52 -07:00
Peter Zijlstra 28a99e95f5 x86/amd: Use IBPB for firmware calls
On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.

And because in order to do an EFI call, the kernel maps a whole lot of
the kernel page table into the EFI page table, do an IBPB just in case
in order to prevent the scenario of poisoning the BTB and causing an EFI
call using the unprotected RET there.

  [ bp: Massage. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
2022-07-18 15:38:09 +02:00
Jason A. Donenfeld 9592eef7c1 random: remove CONFIG_ARCH_RANDOM
When RDRAND was introduced, there was much discussion on whether it
should be trusted and how the kernel should handle that. Initially, two
mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and
"nordrand", a boot-time switch.

Later the thinking evolved. With a properly designed RNG, using RDRAND
values alone won't harm anything, even if the outputs are malicious.
Rather, the issue is whether those values are being *trusted* to be good
or not. And so a new set of options were introduced as the real
ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu".
With these options, RDRAND is used, but it's not always credited. So in
the worst case, it does nothing, and in the best case, maybe it helps.

Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the
center and became something certain platforms force-select.

The old options don't really help with much, and it's a bit odd to have
special handling for these instructions when the kernel can deal fine
with the existence or untrusted existence or broken existence or
non-existence of that CPU capability.

Simplify the situation by removing CONFIG_ARCH_RANDOM and using the
ordinary asm-generic fallback pattern instead, keeping the two options
that are actually used. For now it leaves "nordrand" for now, as the
removal of that will take a different route.

Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-18 15:03:37 +02:00
Anshuman Khandual 4867fbbdd6 x86/mm: move protection_map[] inside the platform
This moves protection_map[] inside the platform and makes it a static. 
This also defines a helper function add_encrypt_protection_map() that can
update the protection_map[] array with pgprot_encrypted().

Link: https://lkml.kernel.org/r/20220711070600.2378316-7-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:14:38 -07:00
Naveen N. Rao 0738eceb62 kexec: drop weak attribute from functions
Drop __weak attribute from functions in kexec_core.c:
- machine_kexec_post_load()
- arch_kexec_protect_crashkres()
- arch_kexec_unprotect_crashkres()
- crash_free_reserved_phys_range()

Link: https://lkml.kernel.org/r/c0f6219e03cb399d166d518ab505095218a902dd.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15 12:21:16 -04:00
Naveen N. Rao 65d9a9a60f kexec_file: drop weak attribute from functions
As requested
(http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org),
this series converts weak functions in kexec to use the #ifdef approach.

Quoting the 3e35142ef9 ("kexec_file: drop weak attribute from
arch_kexec_apply_relocations[_add]") changelog:

: Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols")
: [1], binutils (v2.36+) started dropping section symbols that it thought
: were unused.  This isn't an issue in general, but with kexec_file.c, gcc
: is placing kexec_arch_apply_relocations[_add] into a separate
: .text.unlikely section and the section symbol ".text.unlikely" is being
: dropped.  Due to this, recordmcount is unable to find a non-weak symbol in
: .text.unlikely to generate a relocation record against.

This patch (of 2);

Drop __weak attribute from functions in kexec_file.c:
- arch_kexec_kernel_image_probe()
- arch_kimage_file_post_load_cleanup()
- arch_kexec_kernel_image_load()
- arch_kexec_locate_mem_hole()
- arch_kexec_kernel_verify_sig()

arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so
drop the static attribute for the latter.

arch_kexec_kernel_verify_sig() is not overridden by any architecture, so
drop the __weak attribute.

Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15 12:21:16 -04:00
Nathan Chancellor db88697968 x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Clang warns:

  arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection]
  DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
                      ^
  arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here
  extern u64 x86_spec_ctrl_current;
             ^
  1 error generated.

The declaration should be using DECLARE_PER_CPU instead so all
attributes stay in sync.

Cc: stable@vger.kernel.org
Fixes: fc02735b14 ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-14 14:52:43 -07:00
Sean Christopherson ba28401bb9 KVM: x86: Restrict get_mt_mask() to a u8, use KVM_X86_OP_OPTIONAL_RET0
Restrict get_mt_mask() to a u8 and reintroduce using a RET0 static_call
for the SVM implementation.  EPT stores the memtype information in the
lower 8 bits (bits 6:3 to be precise), and even returns a shifted u8
without an explicit cast to a larger type; there's no need to return a
full u64.

Note, RET0 doesn't play nice with a u64 return on 32-bit kernels, see
commit bf07be36cd ("KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for
get_mt_mask").

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220714153707.3239119-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 11:43:12 -04:00
Sean Christopherson 43bb9e000e KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific
Add a "UD" clause to KVM_X86_QUIRK_MWAIT_NEVER_FAULTS to make it clear
that the quirk only controls the #UD behavior of MONITOR/MWAIT.  KVM
doesn't currently enforce fault checks when MONITOR/MWAIT are supported,
but that could change in the future.  SVM also has a virtualization hole
in that it checks all faults before intercepts, and so "never faults" is
already a lie when running on SVM.

Fixes: bfbcc81bb8 ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220711225753.1073989-4-seanjc@google.com
2022-07-13 18:14:05 -07:00
Hou Wenlong 6e1d2a3f25 KVM: x86/mmu: Replace UNMAPPED_GVA with INVALID_GPA for gva_to_gpa()
The result of gva_to_gpa() is physical address not virtual address,
it is odd that UNMAPPED_GVA macro is used as the result for physical
address. Replace UNMAPPED_GVA with INVALID_GPA and drop UNMAPPED_GVA
macro.

No functional change intended.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/6104978956449467d3c68f1ad7f2c2f6d771d0ee.1656667239.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-12 22:31:12 +00:00
Linus Torvalds ce114c8668 Just when you thought that all the speculation bugs were addressed and
solved and the nightmare is complete, here's the next one: speculating
 after RET instructions and leaking privileged information using the now
 pretty much classical covert channels.
 
 It is called RETBleed and the mitigation effort and controlling
 functionality has been modelled similar to what already existing
 mitigations provide.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLKqAgACgkQEsHwGGHe
 VUoM5w/8CSvwPZ3otkhmu8MrJPtWc7eLDPjYN4qQP+19e+bt094MoozxeeWG2wmp
 hkDJAYHT2Oik/qDuEdhFgNYwS7XGgbV3Py3B8syO4//5SD5dkOSG+QqFXvXMdFri
 YsVqqNkjJOWk/YL9Ql5RS/xQewsrr0OqEyWWocuI6XAvfWV4kKvlRSd+6oPqtZEO
 qYlAHTXElyIrA/gjmxChk1HTt5HZtK3uJLf4twNlUfzw7LYFf3+sw3bdNuiXlyMr
 WcLXMwGpS0idURwP3mJa7JRuiVBzb4+kt8mWwWqA02FkKV45FRRRFhFUsy667r00
 cdZBaWdy+b7dvXeliO3FN/x1bZwIEUxmaNy1iAClph4Ifh0ySPUkxAr8EIER7YBy
 bstDJEaIqgYg8NIaD4oF1UrG0ZbL0ImuxVaFdhG1hopQsh4IwLSTLgmZYDhfn/0i
 oSqU0Le+A7QW9s2A2j6qi7BoAbRW+gmBuCgg8f8ECYRkFX1ZF6mkUtnQxYrU7RTq
 rJWGW9nhwM9nRxwgntZiTjUUJ2HtyXEgYyCNjLFCbEBfeG5QTg7XSGFhqDbgoymH
 85vsmSXYxgTgQ/kTW7Fs26tOqnP2h1OtLJZDL8rg49KijLAnISClEgohYW01CWQf
 ZKMHtz3DM0WBiLvSAmfGifScgSrLB5AjtvFHT0hF+5/okEkinVk=
 =09fW
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 retbleed fixes from Borislav Petkov:
 "Just when you thought that all the speculation bugs were addressed and
  solved and the nightmare is complete, here's the next one: speculating
  after RET instructions and leaking privileged information using the
  now pretty much classical covert channels.

  It is called RETBleed and the mitigation effort and controlling
  functionality has been modelled similar to what already existing
  mitigations provide"

* tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  x86/speculation: Disable RRSBA behavior
  x86/kexec: Disable RET on kexec
  x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported
  x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry
  x86/bugs: Add Cannon lake to RETBleed affected CPU list
  x86/retbleed: Add fine grained Kconfig knobs
  x86/cpu/amd: Enumerate BTC_NO
  x86/common: Stamp out the stepping madness
  KVM: VMX: Prevent RSB underflow before vmenter
  x86/speculation: Fill RSB on vmexit for IBRS
  KVM: VMX: Fix IBRS handling after vmexit
  KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
  KVM: VMX: Convert launched argument to flags
  KVM: VMX: Flatten __vmx_vcpu_run()
  objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}
  x86/speculation: Remove x86_spec_ctrl_mask
  x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
  x86/speculation: Fix SPEC_CTRL write on SMT state change
  x86/speculation: Fix firmware entry SPEC_CTRL handling
  x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
  ...
2022-07-11 18:15:25 -07:00
Pawan Gupta 4ad3278df6 x86/speculation: Disable RRSBA behavior
Some Intel processors may use alternate predictors for RETs on
RSB-underflow. This condition may be vulnerable to Branch History
Injection (BHI) and intramode-BTI.

Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines,
eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against
such attacks. However, on RSB-underflow, RET target prediction may
fallback to alternate predictors. As a result, RET's predicted target
may get influenced by branch history.

A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback
behavior when in kernel mode. When set, RETs will not take predictions
from alternate predictors, hence mitigating RETs as well. Support for
this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2).

For spectre v2 mitigation, when a user selects a mitigation that
protects indirect CALLs and JMPs against BHI and intramode-BTI, set
RRSBA_DIS_S also to protect RETs for RSB-underflow case.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-09 13:12:45 +02:00
Reinette Chatre 8cb7b502f3 x86/sgx: Keep record of SGX page type
SGX2 functions are not allowed on all page types. For example,
ENCLS[EMODPR] is only allowed on regular SGX enclave pages and
ENCLS[EMODPT] is only allowed on TCS and regular pages. If these
functions are attempted on another type of page the hardware would
trigger a fault.

Keep a record of the SGX page type so that there is more
certainty whether an SGX2 instruction can succeed and faults
can be treated as real failures.

The page type is a property of struct sgx_encl_page
and thus does not cover the VA page type. VA pages are maintained
in separate structures and their type can be determined in
a different way. The SGX2 instructions needing the page type do not
operate on VA pages and this is thus not a scenario needing to
be covered at this time.

struct sgx_encl_page hosting this information is maintained for each
enclave page so the space consumed by the struct is important.
The existing sgx_encl_page->vm_max_prot_bits is already unsigned long
while only using three bits. Transition to a bitfield for the two
members to support the additional information without increasing
the space consumed by the struct.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/a0a6939eefe7ba26514f6c49723521cde372de64.1652137848.git.reinette.chatre@intel.com
2022-07-07 10:13:02 -07:00
Reinette Chatre 0fb2126db8 x86/sgx: Add wrapper for SGX2 EMODPR function
Add a wrapper for the EMODPR ENCLS leaf function used to
restrict enclave page permissions as maintained in the
SGX hardware's Enclave Page Cache Map (EPCM).

EMODPR:
1) Updates the EPCM permissions of an enclave page by treating
   the new permissions as a mask. Supplying a value that attempts
   to relax EPCM permissions has no effect on EPCM permissions
   (PR bit, see below, is changed).
2) Sets the PR bit in the EPCM entry of the enclave page to
   indicate that permission restriction is in progress. The bit
   is reset by the enclave by invoking ENCLU leaf function
   EACCEPT or EACCEPTCOPY.

The enclave may access the page throughout the entire process
if conforming to the EPCM permissions for the enclave page.

After performing the permission restriction by issuing EMODPR
the kernel needs to collaborate with the hardware to ensure that
all logical processors sees the new restricted permissions. This
is required for the enclave's EACCEPT/EACCEPTCOPY to succeed and
is accomplished with the ETRACK flow.

Expand enum sgx_return_code with the possible EMODPR return
values.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/d15e7a769e13e4ca671fa2d0a0d3e3aec5aedbd4.1652137848.git.reinette.chatre@intel.com
2022-07-07 10:13:01 -07:00
Juergen Gross 96e8fc5818 x86/xen: Use clear_bss() for Xen PV guests
Instead of clearing the bss area in assembly code, use the clear_bss()
function.

This requires to pass the start_info address as parameter to
xen_start_kernel() in order to avoid the xen_start_info being zeroed
again.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20220630071441.28576-2-jgross@suse.com
2022-07-01 10:57:52 +02:00
Alexander Lobakin 0e862838f2 bitops: unify non-atomic bitops prototypes across architectures
Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:

ret	bool, int, unsigned long
nr	int, long, unsigned int, unsigned long
addr	volatile unsigned long *, volatile void *

Thankfully, it doesn't provoke any bugs, but can sometimes make
the compiler angry when it's not handy at all.
Adjust all the prototypes to the following standard:

ret	bool				retval can be only 0 or 1
nr	unsigned long			native; signed makes no sense
addr	volatile unsigned long *	bitmaps are arrays of ulongs

Next, some architectures don't define 'arch_' versions as they don't
support instrumentation, others do. To make sure there is always the
same set of callables present and to ease any potential future
changes, make them all follow the rule:
 * architecture-specific files define only 'arch_' versions;
 * non-prefixed versions can be defined only in asm-generic files;
and place the non-prefixed definitions into a new file in
asm-generic to be included by non-instrumented architectures.

Finally, add some static assertions in order to prevent people from
making a mess in this room again.
I also used the %__always_inline attribute consistently, so that
they always get resolved to the actual operations.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-30 19:52:41 -07:00
Peter Zijlstra f43b9876e8 x86/retbleed: Add fine grained Kconfig knobs
Do fine-grained Kconfig for all the various retbleed parts.

NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB, you might not like this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-29 17:43:41 +02:00
Sudeep Holla 8add9a3a22 efi: Simplify arch_efi_call_virt() macro
Currently, the arch_efi_call_virt() assumes all users of it will have
defined a type 'efi_##f##_t' to make use of it.

Simplify the arch_efi_call_virt() macro by eliminating the explicit
need for efi_##f##_t type for every user of this macro.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[ardb: apply Sudeep's ARM fix to i686, Loongarch and RISC-V too]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-28 20:13:09 +02:00
Arnd Bergmann 4313a24985 arch/*/: remove CONFIG_VIRT_TO_BUS
All architecture-independent users of virt_to_bus() and bus_to_virt()
have been fixed to use the dma mapping interfaces or have been
removed now.  This means the definitions on most architectures, and the
CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.

The only exceptions to this are a few network and scsi drivers for m68k
Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
with the old interfaces and are probably not worth changing.

On alpha and parisc, virt_to_bus() were still used in asm/floppy.h.
alpha can use isa_virt_to_bus() like x86 does, and parisc can just
open-code the virt_to_phys() here, as this is architecture specific
code.

I tried updating the bus-virt-phys-mapping.rst documentation, which
started as an email from Linus to explain some details of the Linux-2.0
driver interfaces. The bits about virt_to_bus() were declared obsolete
backin 2000, and the rest is not all that relevant any more, so in the
end I just decided to remove the file completely.

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-06-28 13:20:21 +02:00
Andrew Cooper 26aae8ccbc x86/cpu/amd: Enumerate BTC_NO
BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.

Hypervisors are expected to synthesise BTC_NO when it is appropriate
given the migration pool, to prevent kernels using heuristics.

  [ bp: Massage. ]

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:01 +02:00
Josh Poimboeuf 9756bba284 x86/speculation: Fill RSB on vmexit for IBRS
Prevent RSB underflow/poisoning attacks with RSB.  While at it, add a
bunch of comments to attempt to document the current state of tribal
knowledge about RSB attacks and what exactly is being mitigated.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Josh Poimboeuf fc02735b14 KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
On eIBRS systems, the returns in the vmexit return path from
__vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks.

Fix that by moving the post-vmexit spec_ctrl handling to immediately
after the vmexit.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Josh Poimboeuf 8faea26e61 objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}
Commit

  c536ed2fff ("objtool: Remove SAVE/RESTORE hints")

removed the save/restore unwind hints because they were no longer
needed. Now they're going to be needed again so re-add them.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Josh Poimboeuf e6aa13622e x86/speculation: Fix firmware entry SPEC_CTRL handling
The firmware entry code may accidentally clear STIBP or SSBD. Fix that.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Josh Poimboeuf b2620facef x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
If a kernel is built with CONFIG_RETPOLINE=n, but the user still wants
to mitigate Spectre v2 using IBRS or eIBRS, the RSB filling will be
silently disabled.

There's nothing retpoline-specific about RSB buffer filling.  Remove the
CONFIG_RETPOLINE guards around it.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Peter Zijlstra d7caac991f x86/cpu/amd: Add Spectral Chicken
Zen2 uarchs have an undocumented, unnamed, MSR that contains a chicken
bit for some speculation behaviour. It needs setting.

Note: very belatedly AMD released naming; it's now officially called
      MSR_AMD64_DE_CFG2 and MSR_AMD64_DE_CFG2_SUPPRESS_NOBR_PRED_BIT
      but shall remain the SPECTRAL CHICKEN.

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Peter Zijlstra a09a6e2399 objtool: Add entry UNRET validation
Since entry asm is tricky, add a validation pass that ensures the
retbleed mitigation has been done before the first actual RET
instruction.

Entry points are those that either have UNWIND_HINT_ENTRY, which acts
as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or
those that have UWIND_HINT_IRET_REGS at +0.

This is basically a variant of validate_branch() that is
intra-function and it will simply follow all branches from marked
entry points and ensures that all paths lead to ANNOTATE_UNRET_END.

If a path hits RET or an indirection the path is a fail and will be
reported.

There are 3 ANNOTATE_UNRET_END instances:

 - UNTRAIN_RET itself
 - exception from-kernel; this path doesn't need UNTRAIN_RET
 - all early exceptions; these also don't need UNTRAIN_RET

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Peter Zijlstra 3ebc170068 x86/bugs: Add retbleed=ibpb
jmp2ret mitigates the easy-to-attack case at relatively low overhead.
It mitigates the long speculation windows after a mispredicted RET, but
it does not mitigate the short speculation window from arbitrary
instruction boundaries.

On Zen2, there is a chicken bit which needs setting, which mitigates
"arbitrary instruction boundaries" down to just "basic block boundaries".

But there is no fix for the short speculation window on basic block
boundaries, other than to flush the entire BTB to evict all attacker
predictions.

On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP
or no-SMT):

  1) Nothing		System wide open
  2) jmp2ret		May stop a script kiddy
  3) jmp2ret+chickenbit  Raises the bar rather further
  4) IBPB		Only thing which can count as "safe".

Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit
on Zen1 according to lmbench.

  [ bp: Fixup feature bit comments, document option, 32-bit build fix. ]

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Peter Zijlstra 9bb2ec608a objtool: Update Retpoline validation
Update retpoline validation with the new CONFIG_RETPOLINE requirement of
not having bare naked RET instructions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Peter Zijlstra bf5835bcdb intel_idle: Disable IBRS during long idle
Having IBRS enabled while the SMT sibling is idle unnecessarily slows
down the running sibling. OTOH, disabling IBRS around idle takes two
MSR writes, which will increase the idle latency.

Therefore, only disable IBRS around deeper idle states. Shallow idle
states are bounded by the tick in duration, since NOHZ is not allowed
for them by virtue of their short target residency.

Only do this for mwait-driven idle, since that keeps interrupts disabled
across idle, which makes disabling IBRS vs IRQ-entry a non-issue.

Note: C6 is a random threshold, most importantly C1 probably shouldn't
disable IBRS, benchmarking needed.

Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Peter Zijlstra 6ad0ad2bf8 x86/bugs: Report Intel retbleed vulnerability
Skylake suffers from RSB underflow speculation issues; report this
vulnerability and it's mitigation (spectre_v2=ibrs).

  [jpoimboe: cleanups, eibrs]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Pawan Gupta 7c693f54c8 x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS
Extend spectre_v2= boot option with Kernel IBRS.

  [jpoimboe: no STIBP with IBRS]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Peter Zijlstra c779bc1a90 x86/bugs: Optimize SPEC_CTRL MSR writes
When changing SPEC_CTRL for user control, the WRMSR can be delayed
until return-to-user when KERNEL_IBRS has been enabled.

This avoids an MSR write during context switch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Peter Zijlstra 2dbb887e87 x86/entry: Add kernel IBRS implementation
Implement Kernel IBRS - currently the only known option to mitigate RSB
underflow speculation issues on Skylake hardware.

Note: since IBRS_ENTER requires fuller context established than
UNTRAIN_RET, it must be placed after it. However, since UNTRAIN_RET
itself implies a RET, it must come after IBRS_ENTER. This means
IBRS_ENTER needs to also move UNTRAIN_RET.

Note 2: KERNEL_IBRS is sub-optimal for XenPV.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Peter Zijlstra caa0ff24d5 x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value
Due to TIF_SSBD and TIF_SPEC_IB the actual IA32_SPEC_CTRL value can
differ from x86_spec_ctrl_base. As such, keep a per-CPU value
reflecting the current task's MSR content.

  [jpoimboe: rename]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Alexandre Chartre 6b80b59b35 x86/bugs: Report AMD retbleed vulnerability
Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary
Speculative Code Execution with Return Instructions) attack.

  [peterz: add hygon]
  [kim: invert parity; fam15h]

Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Peter Zijlstra a149180fbc x86: Add magic AMD return-thunk
Note: needs to be in a section distinct from Retpolines such that the
Retpoline RET substitution cannot possibly use immediate jumps.

ORC unwinding for zen_untrain_ret() and __x86_return_thunk() is a
little tricky but works due to the fact that zen_untrain_ret() doesn't
have any stack ops and as such will emit a single ORC entry at the
start (+0x3f).

Meanwhile, unwinding an IP, including the __x86_return_thunk() one
(+0x40) will search for the largest ORC entry smaller or equal to the
IP, these will find the one ORC entry (+0x3f) and all works.

  [ Alexandre: SVM part. ]
  [ bp: Build fix, massages. ]

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Peter Zijlstra aa3d480315 x86: Use return-thunk in asm code
Use the return thunk in asm code. If the thunk isn't needed, it will
get patched into a RET instruction during boot by apply_returns().

Since alternatives can't handle relocations outside of the first
instruction, putting a 'jmp __x86_return_thunk' in one is not valid,
therefore carve out the memmove ERMS path into a separate label and jump
to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:58 +02:00
Peter Zijlstra ee88d363d1 x86,static_call: Use alternative RET encoding
In addition to teaching static_call about the new way to spell 'RET',
there is an added complication in that static_call() is allowed to
rewrite text before it is known which particular spelling is required.

In order to deal with this; have a static_call specific fixup in the
apply_return() 'alternative' patching routine that will rewrite the
static_call trampoline to match the definite sequence.

This in turn creates the problem of uniquely identifying static call
trampolines. Currently trampolines are 8 bytes, the first 5 being the
jmp.d32/ret sequence and the final 3 a byte sequence that spells out
'SCT'.

This sequence is used in __static_call_validate() to ensure it is
patching a trampoline and not a random other jmp.d32. That is,
false-positives shouldn't be plenty, but aren't a big concern.

OTOH the new __static_call_fixup() must not have false-positives, and
'SCT' decodes to the somewhat weird but semi plausible sequence:

  push %rbx
  rex.XB push %r12

Additionally, there are SLS concerns with immediate jumps. Combined it
seems like a good moment to change the signature to a single 3 byte
trap instruction that is unique to this usage and will not ever get
generated by accident.

As such, change the signature to: '0x0f, 0xb9, 0xcc', which decodes
to:

  ud1 %esp, %ecx

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:58 +02:00
Peter Zijlstra 15e67227c4 x86: Undo return-thunk damage
Introduce X86_FEATURE_RETHUNK for those afflicted with needing this.

  [ bp: Do only INT3 padding - simpler. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:58 +02:00
Peter Zijlstra 0b53c374b9 x86/retpoline: Use -mfunction-return
Utilize -mfunction-return=thunk-extern when available to have the
compiler replace RET instructions with direct JMPs to the symbol
__x86_return_thunk. This does not affect assembler (.S) sources, only C
sources.

-mfunction-return=thunk-extern has been available since gcc 7.3 and
clang 15.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:58 +02:00
Peter Zijlstra 369ae6ffc4 x86/retpoline: Cleanup some #ifdefery
On it's own not much of a cleanup but it prepares for more/similar
code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:58 +02:00
Peter Zijlstra a883d624ae x86/cpufeatures: Move RETPOLINE flags to word 11
In order to extend the RETPOLINE features to 4, move them to word 11
where there is still room. This mostly keeps DISABLE_RETPOLINE
simple.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:58 +02:00
Suravee Suthikulpanit 0e311d33bf KVM: SVM: Introduce hybrid-AVIC mode
Currently, AVIC is inhibited when booting a VM w/ x2APIC support.
because AVIC cannot virtualize x2APIC MSR register accesses.
However, the AVIC doorbell can be used to accelerate interrupt
injection into a running vCPU, while all guest accesses to x2APIC MSRs
will be intercepted and emulated by KVM.

With hybrid-AVIC support, the APICV_INHIBIT_REASON_X2APIC is
no longer enforced.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-14-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 12:51:00 -04:00
Suravee Suthikulpanit 4d1d7942e3 KVM: SVM: Introduce logic to (de)activate x2AVIC mode
Introduce logic to (de)activate AVIC, which also allows
switching between AVIC to x2AVIC mode at runtime.

When an AVIC-enabled guest switches from APIC to x2APIC mode,
the SVM driver needs to perform the following steps:

1. Set the x2APIC mode bit for AVIC in VMCB along with the maximum
APIC ID support for each mode accodingly.

2. Disable x2APIC MSRs interception in order to allow the hardware
to virtualize x2APIC MSRs accesses.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-12-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 12:50:35 -04:00
Suravee Suthikulpanit d2fe6bf5b8 KVM: SVM: Update max number of vCPUs supported for x2AVIC mode
xAVIC and x2AVIC modes can support diffferent number of vcpus.
Update existing logics to support each mode accordingly.

Also, modify the maximum physical APIC ID for AVIC to 255 to reflect
the actual value supported by the architecture.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-5-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 12:45:08 -04:00
Suravee Suthikulpanit 4bdec12aa8 KVM: SVM: Detect X2APIC virtualization (x2AVIC) support
Add CPUID check for the x2APIC virtualization (x2AVIC) feature.
If available, the SVM driver can support both AVIC and x2AVIC modes
when load the kvm_amd driver with avic=1. The operating mode will be
determined at runtime depending on the guest APIC mode.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-4-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 12:44:54 -04:00
Suravee Suthikulpanit bf348f667e KVM: x86: lapic: Rename [GET/SET]_APIC_DEST_FIELD to [GET/SET]_XAPIC_DEST_FIELD
To signify that the macros only support 8-bit xAPIC destination ID.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220519102709.24125-3-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 12:44:34 -04:00
Suravee Suthikulpanit aae99a7c9a x86/cpufeatures: Introduce x2AVIC CPUID bit
Introduce a new feature bit for virtualized x2APIC (x2AVIC) in
CPUID_Fn8000000A_EDX [SVM Revision and Feature Identification].

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220519102709.24125-2-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 12:44:34 -04:00
Jue Wang 281b52780b KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.
This patch adds the emulation of IA32_MCi_CTL2 registers to KVM. A
separate mci_ctl2_banks array is used to keep the existing mce_banks
register layout intact.

In Machine Check Architecture, in addition to MCG_CMCI_P, bit 30 of
the per-bank register IA32_MCi_CTL2 controls whether Corrected Machine
Check error reporting is enabled.

Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220610171134.772566-7-juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 04:52:03 -04:00
David Matlack ada51a9de7 KVM: x86/mmu: Extend Eager Page Splitting to nested MMUs
Add support for Eager Page Splitting pages that are mapped by nested
MMUs. Walk through the rmap first splitting all 1GiB pages to 2MiB
pages, and then splitting all 2MiB pages to 4KiB pages.

Note, Eager Page Splitting is limited to nested MMUs as a policy rather
than due to any technical reason (the sp->role.guest_mode check could
just be deleted and Eager Page Splitting would work correctly for all
shadow MMU pages). There is really no reason to support Eager Page
Splitting for tdp_mmu=N, since such support will eventually be phased
out, and there is no current use case supporting Eager Page Splitting on
hosts where TDP is either disabled or unavailable in hardware.
Furthermore, future improvements to nested MMU scalability may diverge
the code from the legacy shadow paging implementation. These
improvements will be simpler to make if Eager Page Splitting does not
have to worry about legacy shadow paging.

Splitting huge pages mapped by nested MMUs requires dealing with some
extra complexity beyond that of the TDP MMU:

(1) The shadow MMU has a limit on the number of shadow pages that are
    allowed to be allocated. So, as a policy, Eager Page Splitting
    refuses to split if there are KVM_MIN_FREE_MMU_PAGES or fewer
    pages available.

(2) Splitting a huge page may end up re-using an existing lower level
    shadow page tables. This is unlike the TDP MMU which always allocates
    new shadow page tables when splitting.

(3) When installing the lower level SPTEs, they must be added to the
    rmap which may require allocating additional pte_list_desc structs.

Case (2) is especially interesting since it may require a TLB flush,
unlike the TDP MMU which can fully split huge pages without any TLB
flushes. Specifically, an existing lower level page table may point to
even lower level page tables that are not fully populated, effectively
unmapping a portion of the huge page, which requires a flush.  As of
this commit, a flush is always done always after dropping the huge page
and before installing the lower level page table.

This TLB flush could instead be delayed until the MMU lock is about to be
dropped, which would batch flushes for multiple splits.  However these
flushes should be rare in practice (a huge page must be aliased in
multiple SPTEs and have been split for NX Huge Pages in only some of
them). Flushing immediately is simpler to plumb and also reduces the
chances of tripping over a CPU bug (e.g. see iTLB multihit).

[ This commit is based off of the original implementation of Eager Page
  Splitting from Peter in Google's kernel from 2016. ]

Suggested-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220516232138.1783324-23-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 04:52:00 -04:00
David Matlack 6a97575d5c KVM: x86/mmu: Cache the access bits of shadowed translations
Splitting huge pages requires allocating/finding shadow pages to replace
the huge page. Shadow pages are keyed, in part, off the guest access
permissions they are shadowing. For fully direct MMUs, there is no
shadowing so the access bits in the shadow page role are always ACC_ALL.
But during shadow paging, the guest can enforce whatever access
permissions it wants.

In particular, eager page splitting needs to know the permissions to use
for the subpages, but KVM cannot retrieve them from the guest page
tables because eager page splitting does not have a vCPU.  Fortunately,
the guest access permissions are easy to cache whenever page faults or
FNAME(sync_page) update the shadow page tables; this is an extension of
the existing cache of the shadowed GFNs in the gfns array of the shadow
page.  The access bits only take up 3 bits, which leaves 61 bits left
over for gfns, which is more than enough.

Now that the gfns array caches more information than just GFNs, rename
it to shadowed_translation.

While here, preemptively fix up the WARN_ON() that detects gfn
mismatches in direct SPs. The WARN_ON() was paired with a
pr_err_ratelimited(), which means that users could sometimes see the
WARN without the accompanying error message. Fix this by outputting the
error message as part of the WARN splat, and opportunistically make
them WARN_ONCE() because if these ever fire, they are all but guaranteed
to fire a lot and will bring down the kernel.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220516232138.1783324-18-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 04:51:58 -04:00
Ben Gardon 084cc29f8b KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis
In some cases, the NX hugepage mitigation for iTLB multihit is not
needed for all guests on a host. Allow disabling the mitigation on a
per-VM basis to avoid the performance hit of NX hugepages on trusted
workloads.

In order to disable NX hugepages on a VM, ensure that the userspace
actor has permission to reboot the system. Since disabling NX hugepages
would allow a guest to crash the system, it is similar to reboot
permissions.

Ideally, KVM would require userspace to prove it has access to KVM's
nx_huge_pages module param, e.g. so that userspace can opt out without
needing full reboot permissions.  But getting access to the module param
file info is difficult because it is buried in layers of sysfs and module
glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases.

Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 04:51:49 -04:00
Linus Torvalds ca1fdab7fd First batch of EFI fixes for v5.19
- remove pointless include of asm/efi.h, which does not exist on ia64
 - fix DXE service marshalling prototype for mixed mode
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmKx7iAACgkQw08iOZLZ
 jyTwxgv/dnQ+zEsHGSVmfQkucP+Oj49gzNgX4EaeRly9WvpUCj4Nar4aAjCRBaur
 Jcx2aX0z8fnGzh4CtqfI/J5BzU4fHHuLm1sRXkCCNMkMbVscdTghGiYIPCkXlHYS
 cgQmqzM/vM2lGu2seDfib0ZCbsBhFgSfq9mk9BAHzK+zxdn70fUm+gjL5oGUAXyC
 hJOvJJEom0nfhE0SL88EmRiKcborny98+bkaMb5U+309LkISY0o8RjMxgFK6V/q2
 jxUxw1MQgzdGEjCy46A4SJ5wIpqGIzB5FlU6JXSWsDP8brIADmimlSjp36FRDvti
 1Td1Vjy7a3FSrRHJJXM2Tm5JJJOdFJywvIcNZT3UzWvwbVFHriivwjbIxZwFQUIo
 4TRdrQkkdnAGK/e9nc0ZGUT9/D5ooVh+mkLCxRuems8l8J052+DSL6vqeM8OJh1c
 xZnmSC76ouNURPJdLgjfniVFcfygC9XhZl6lK+Od654DN0+eYzgm9y55BvevTA+P
 NIONRRTU
 =4yyi
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-for-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - remove pointless include of asm/efi.h, which does not exist on ia64

 - fix DXE service marshalling prototype for mixed mode

* tag 'efi-urgent-for-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/x86: libstub: Fix typo in __efi64_argmap* name
  efi: sysfb_efi: remove unnecessary <asm/efi.h> include
2022-06-21 12:20:11 -05:00
Evgeniy Baskov aa6d1ed107 efi/x86: libstub: Fix typo in __efi64_argmap* name
The actual name of the DXE services function used
is set_memory_space_attributes(), not set_memory_space_descriptor().

Change EFI mixed mode helper macro name to match the function name.

Fixes: 31f1a0edff ("efi/x86: libstub: Make DXE calls mixed mode safe")
Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-21 18:11:46 +02:00
Sean Christopherson bfbcc81bb8 KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior
Add a quirk for KVM's behavior of emulating intercepted MONITOR/MWAIT
instructions a NOPs regardless of whether or not they are supported in
guest CPUID.  KVM's current behavior was likely motiviated by a certain
fruity operating system that expects MONITOR/MWAIT to be supported
unconditionally and blindly executes MONITOR/MWAIT without first checking
CPUID.  And because KVM does NOT advertise MONITOR/MWAIT to userspace,
that's effectively the default setup for any VMM that regurgitates
KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID2.

Note, this quirk interacts with KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT.  The
behavior is actually desirable, as userspace VMMs that want to
unconditionally hide MONITOR/MWAIT from the guest can leave the
MISC_ENABLE quirk enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220608224516.3788274-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20 11:50:42 -04:00
Sean Christopherson ce0a58f475 KVM: x86: Move "apicv_active" into "struct kvm_lapic"
Move the per-vCPU apicv_active flag into KVM's local APIC instance.
APICv is fully dependent on an in-kernel local APIC, but that's not at
all clear when reading the current code due to the flag being stored in
the generic kvm_vcpu_arch struct.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20 06:21:24 -04:00
Sean Christopherson d39850f57d KVM: x86: Drop @vcpu parameter from kvm_x86_ops.hwapic_isr_update()
Drop the unused @vcpu parameter from hwapic_isr_update().  AMD/AVIC is
unlikely to implement the helper, and VMX/APICv doesn't need the vCPU as
it operates on the current VMCS.  The result is somewhat odd, but allows
for a decent amount of (future) cleanup in the APIC code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20 06:21:23 -04:00
Linus Torvalds 05c6ca8512 X86 updates:
- Make RESERVE_BRK() work again with older binutils. The recent
    'simplification' broke that.
 
  - Make early #VE handling increment RIP when successful.
 
  - Make the #VE code consistent vs. the RIP adjustments and add comments.
 
  - Handle load_unaligned_zeropad() across page boundaries correctly in #VE
    when the second page is shared.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKvIG4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaqCD/9NAUyHTjKDqdWuMD/ITU8ymDr+Ix8z
 vUlysdXbxJg6MvT12ZbhJUFTKAsXskGAAnXz/EtZ8zTQQVzTjis/HooJh4XLeuO4
 NLh9KV9FvH7w69e6Jg31MGkOUJU3BV+WYUx1f34zbQ8FHftxUwu+M47UYExPYKDR
 VIbNeQIpqoBfjTSPVGXlWl/panuZG6RV+PRcvxV3yeRRA8zyCB/WTmNkoDjbw4fl
 YCWwJF7/m4iT3LtoaFXWVGFzSRZoGHbhSdgEOZGIZ7sjvydoaQo402JuhW3WLI2m
 oXLVZ+2wOPGBKp3WQ1t3mpfScBvCiN3SW4pSPDQ+E8fT/RQiRMb29c9S6ANdm3nT
 27fYMJOq+xxex5gOYzdgLz7O99M08uOn2bxJwB+IBIr5jEFH9b4EffeEWsfdZBsi
 1AzkXCi+Ib0ZYAndxUP068m+4iW0LtuApm0fg6LhtdDmBGquj+88OZOUK7Z/kW/N
 IkjgCeqFgmdNb/+Z3XrdYobaAl6J4toIqA4A+O8yL6gJfn9PnaMGsYtA8c5yQchD
 kFoTu5pCALY2KjZkKFRMuEbMH2oj3sjjb7f6mYAHxec6jikIx2c5HswA4sLmzHAN
 GG2MDUH12bWoLfeA4IRwTRz/vh8IeZNq5ZzdCnS6KHUNk5OJRGLtRphKy8z+pOYx
 +i9ThZFBV8pBzg==
 =sRtG
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - Make RESERVE_BRK() work again with older binutils. The recent
   'simplification' broke that.

 - Make early #VE handling increment RIP when successful.

 - Make the #VE code consistent vs. the RIP adjustments and add
   comments.

 - Handle load_unaligned_zeropad() across page boundaries correctly in
   #VE when the second page is shared.

* tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  x86/tdx: Clarify RIP adjustments in #VE handler
  x86/tdx: Fix early #VE handling
  x86/mm: Fix RESERVE_BRK() for older binutils
2022-06-19 09:58:28 -05:00
Linus Torvalds 32efdbffff pci-v5.19-fixes-2
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKs17UUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzlVQ//Qygje8u1x0qQqGFykMf3+cev2ECV
 /sfOHtRoG9qZ4gKQOGShr9gyeV+9VFl0+CEJE4z9/YiAnoOr5JMAMixYRjHQU9MB
 k3TFkDqD0KBgmXdAubVP5HQGZgA+mryvtEhr5rm45PooJJSjuh1ds87YYO+Z/t7s
 c2AzvpHFLjECB6LRHqHieyp4CeWn5tw8im7uMUmfKkkXF5ckqw3e+7gwJzRrAukg
 GgbLb2JZLzXSl1HOx/2GvPW8RzyXRbbJmpvu9LsNKeoqP006F3chDVIncqG3Q9QQ
 LvrjudC/eY/2Fee3tpP6gjl9A5ALvXT/k4gTw4Lwm8OlaxrYZ3gwcBHZepp3F08s
 qVCncjFooeHAMiJDkGvtf6N8k8VnOy4zvg2qDpKOl2NO3jH95nGi0LGMf0/GXvfh
 P4bwqjGcWKSo9C2amagZ49rzaJBIQRms8ItM6WPvCYirjyYi3PeUMl/dylstbnuq
 DQuprZtgOEGlPGUg/CO4fCpCIkKzpvOV6157z39mZS5HOT5ugvF4k+hwSxZd0hsM
 rJI7Te48Z46Y/qoFesgOglJwwEZs4RqHAvMTGC+V5Ftj9gHe7wo2oOxlhCyFW1Qd
 jec3UhXEzTVxnjsu7peHyfwwmGWFUkG16P17hWpncpG6azW1dybQOCsRy21/F+q2
 i1nd61lXpuhKQKQ=
 =5y1d
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci fix from Bjorn Helgaas:
 "Revert clipping of PCI host bridge windows to avoid E820 regions,
  which broke several machines by forcing unnecessary BAR reassignments
  (Hans de Goede)"

* tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
2022-06-17 15:12:20 -05:00
Hans de Goede a2b36ffbf5 x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
This reverts commit 4c5e242d3e.

Prior to 4c5e242d3e ("x86/PCI: Clip only host bridge windows for E820
regions"), E820 regions did not affect PCI host bridge windows.  We only
looked at E820 regions and avoided them when allocating new MMIO space.
If firmware PCI bridge window and BAR assignments used E820 regions, we
left them alone.

After 4c5e242d3e, we removed E820 regions from the PCI host bridge
windows before looking at BARs, so firmware assignments in E820 regions
looked like errors, and we moved things around to fit in the space left
(if any) after removing the E820 regions.  This unnecessary BAR
reassignment broke several machines.

Guilherme reported that Steam Deck fails to boot after 4c5e242d3e.  We
clipped the window that contained most 32-bit BARs:

  BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved
  acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff]

which forced us to reassign all those BARs, for example, this NVMe BAR:

  pci 0000:00:01.2: PCI bridge to [bus 01]
  pci 0000:00:01.2:   bridge window [mem 0x80600000-0x806fffff]
  pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit]
  pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window
  pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window

  pci 0000:00:01.2: bridge window: assigned [mem 0xa0100000-0xa01fffff]
  pci 0000:01:00.0: BAR 0: assigned [mem 0xa0100000-0xa0103fff 64bit]

All the reassignments were successful, so the devices should have been
functional at the new addresses, but some were not.

Andy reported a similar failure on an Intel MID platform.  Benjamin
reported a similar failure on a VMWare Fusion VM.

Note: this is not a clean revert; this revert keeps the later change to
make the clipping dependent on a new pci_use_e820 bool, moving the checking
of this bool to arch_remove_reservations().

[bhelgaas: commit log, add more reporters and testers]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Reported-by: Jongman Heo <jongman.heo@gmail.com>
Fixes: 4c5e242d3e ("x86/PCI: Clip only host bridge windows for E820 regions")
Link: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-06-17 14:24:14 -05:00
Linus Torvalds 2d806a688f hyperv-fixes for 5.19-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmKsjCwTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXpAqB/0akBXX+rxp/MzKKiuzOa53rV5IqPe6
 dbDhJtsyCeX2xiNtiDYqCsCmKRt+rmQgzrKykRLCeQatrLmjbUzEZYo1QmpmdFjK
 icZtYdso1FEW6v4h1j+p7dl/l3nqXWcG9gjCDecma9ZxXnpzNw5O9kBCpUyWGzwS
 85RJy2X5n7gyqiAyrDD5MSiUAL2n2FG2lHwK4EYjXsfPJEesx/gh8DEzVkMJWSqa
 nPyvtnPxGSDr22KTymF/Z2BJSYMcxaU528Ztto+DE8NhVWKk99eiMpyXagH3LygD
 gOHmnRS23eJhw/Mq44IHGf7t94bFuUBQraVvaGnaXGLoLv946oNMfLhl
 =miu9
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Fix hv_init_clocksource annotation (Masahiro Yamada)

 - Two bug fixes for vmbus driver (Saurabh Sengar)

 - Fix SEV negotiation (Tianyu Lan)

 - Fix comments in code (Xiang Wang)

 - One minor fix to HID driver (Michael Kelley)

* tag 'hyperv-fixes-signed-20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM
  Drivers: hv: vmbus: Release cpu lock in error case
  HID: hyperv: Correctly access fields declared as __le16
  clocksource: hyper-v: unexport __init-annotated hv_init_clocksource()
  Drivers: hv: Fix syntax errors in comments
  Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
2022-06-17 13:39:12 -05:00
Tianyu Lan 49d6a3c062 x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM
Hyper-V Isolation VM current code uses sev_es_ghcb_hv_call()
to read/write MSR via GHCB page and depends on the sev code.
This may cause regression when sev code changes interface
design.

The latest SEV-ES code requires to negotiate GHCB version before
reading/writing MSR via GHCB page and sev_es_ghcb_hv_call() doesn't
work for Hyper-V Isolation VM. Add Hyper-V ghcb related implementation
to decouple SEV and Hyper-V code. Negotiate GHCB version in the
hyperv_init() and use the version to communicate with Hyper-V
in the ghcb hv call function.

Fixes: 2ea29c5abb ("x86/sev: Save the negotiated GHCB version")
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-06-15 18:27:40 +00:00
Ma Wupeng 6365a1935c efi: Make code to find mirrored memory ranges generic
Commit b05b9f5f9d ("x86, mirror: x86 enabling - find mirrored memory
ranges") introduce the efi_find_mirror() function on x86. In order to reuse
the API we make it public.

Arm64 can support mirrored memory too, so function efi_find_mirror() is added to
efi_init() to this support for arm64.

Since efi_init() is shared by ARM, arm64 and riscv, this patch will bring
mirror memory support for these architectures, but this support is only tested
in arm64.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Link: https://lore.kernel.org/r/20220614092156.1972846-2-mawupeng1@huawei.com
[ardb: fix subject to better reflect the payload]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-15 12:11:19 +02:00
Linus Torvalds 24625f7d91 ARM64:
* Properly reset the SVE/SME flags on vcpu load
 
 * Fix a vgic-v2 regression regarding accessing the pending
 state of a HW interrupt from userspace (and make the code
 common with vgic-v3)
 
 * Fix access to the idreg range for protected guests
 
 * Ignore 'kvm-arm.mode=protected' when using VHE
 
 * Return an error from kvm_arch_init_vm() on allocation failure
 
 * A bunch of small cleanups (comments, annotations, indentation)
 
 RISC-V:
 
 * Typo fix in arch/riscv/kvm/vmid.c
 
 * Remove broken reference pattern from MAINTAINERS entry
 
 x86-64:
 
 * Fix error in page tables with MKTME enabled
 
 * Dirty page tracking performance test extended to running a nested
   guest
 
 * Disable APICv/AVIC in cases that it cannot implement correctly
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKjTIAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNhPQgAiIVtp8aepujUM/NhkNyK3SIdLzlS
 oZCZiS6bvaecKXi/QvhBU0EBxAEyrovk3lmVuYNd41xI+PDjyaA4SDIl5DnToGUw
 bVPNFSYqjpF939vUUKjc0RCdZR4o5g3Od3tvWoHTHviS1a8aAe5o9pcpHpD0D6Mp
 Gc/o58nKAOPl3htcFKmjymqo3Y6yvkJU9NB7DCbL8T5mp5pJ959Mw1/LlmBaAzJC
 OofrynUm4NjMyAj/mAB1FhHKFyQfjBXLhiVlS0SLiiEA/tn9/OXyVFMKG+n5VkAZ
 Q337GMFe2RikEIuMEr3Rc4qbZK3PpxHhaj+6MPRuM0ho/P4yzl2Nyb/OhA==
 =h81Q
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "While last week's pull request contained miscellaneous fixes for x86,
  this one covers other architectures, selftests changes, and a bigger
  series for APIC virtualization bugs that were discovered during 5.20
  development. The idea is to base 5.20 development for KVM on top of
  this tag.

  ARM64:

   - Properly reset the SVE/SME flags on vcpu load

   - Fix a vgic-v2 regression regarding accessing the pending state of a
     HW interrupt from userspace (and make the code common with vgic-v3)

   - Fix access to the idreg range for protected guests

   - Ignore 'kvm-arm.mode=protected' when using VHE

   - Return an error from kvm_arch_init_vm() on allocation failure

   - A bunch of small cleanups (comments, annotations, indentation)

  RISC-V:

   - Typo fix in arch/riscv/kvm/vmid.c

   - Remove broken reference pattern from MAINTAINERS entry

  x86-64:

   - Fix error in page tables with MKTME enabled

   - Dirty page tracking performance test extended to running a nested
     guest

   - Disable APICv/AVIC in cases that it cannot implement correctly"

[ This merge also fixes a misplaced end parenthesis bug introduced in
  commit 3743c2f025 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
  ID or APIC base") pointed out by Sean Christopherson ]

Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
  KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
  KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  KVM: selftests: Clean up LIBKVM files in Makefile
  KVM: selftests: Link selftests directly with lib object files
  KVM: selftests: Drop unnecessary rule for STATIC_LIBS
  KVM: selftests: Add a helper to check EPT/VPID capabilities
  KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  KVM: selftests: Refactor nested_map() to specify target level
  KVM: selftests: Drop stale function parameter comment for nested_map()
  KVM: selftests: Add option to create 2M and 1G EPT mappings
  KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
  KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
  KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
  KVM: x86: disable preemption while updating apicv inhibition
  KVM: x86: SVM: fix avic_kick_target_vcpus_fast
  KVM: x86: SVM: remove avic's broken code that updated APIC ID
  KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
  KVM: x86: document AVIC/APICv inhibit reasons
  KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
  ...
2022-06-14 07:57:18 -07:00
Linus Torvalds 8e8afafb0b Yet another hw vulnerability with a software mitigation: Processor MMIO
Stale Data.
 
 They are a class of MMIO-related weaknesses which can expose stale data
 by propagating it into core fill buffers. Data which can then be leaked
 using the usual speculative execution methods.
 
 Mitigations include this set along with microcode updates and are
 similar to MDS and TAA vulnerabilities: VERW now clears those buffers
 too.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKXMkMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWGPD/idalLIhhV5F2+hZIKm0WSnsBxAOh9K
 7y8xBxpQQ5FUfW3vm7Pg3ro6VJp7w2CzKoD4lGXzGHriusn3qst3vkza9Ay8xu8g
 RDwKe6hI+p+Il9BV9op3f8FiRLP9bcPMMReW/mRyYsOnJe59hVNwRAL8OG40PY4k
 hZgg4Psfvfx8bwiye5efjMSe4fXV7BUCkr601+8kVJoiaoszkux9mqP+cnnB5P3H
 zW1d1jx7d6eV1Y063h7WgiNqQRYv0bROZP5BJkufIoOHUXDpd65IRF3bDnCIvSEz
 KkMYJNXb3qh7EQeHS53NL+gz2EBQt+Tq1VH256qn6i3mcHs85HvC68gVrAkfVHJE
 QLJE3MoXWOqw+mhwzCRrEXN9O1lT/PqDWw8I4M/5KtGG/KnJs+bygmfKBbKjIVg4
 2yQWfMmOgQsw3GWCRjgEli7aYbDJQjany0K/qZTq54I41gu+TV8YMccaWcXgDKrm
 cXFGUfOg4gBm4IRjJ/RJn+mUv6u+/3sLVqsaFTs9aiib1dpBSSUuMGBh548Ft7g2
 5VbFVSDaLjB2BdlcG7enlsmtzw0ltNssmqg7jTK/L7XNVnvxwUoXw+zP7RmCLEYt
 UV4FHXraMKNt2ZketlomC8ui2hg73ylUp4pPdMXCp7PIXp9sVamRTbpz12h689VJ
 /s55bWxHkR6S
 =LBxT
 -----END PGP SIGNATURE-----

Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MMIO stale data fixes from Thomas Gleixner:
 "Yet another hw vulnerability with a software mitigation: Processor
  MMIO Stale Data.

  They are a class of MMIO-related weaknesses which can expose stale
  data by propagating it into core fill buffers. Data which can then be
  leaked using the usual speculative execution methods.

  Mitigations include this set along with microcode updates and are
  similar to MDS and TAA vulnerabilities: VERW now clears those buffers
  too"

* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/mmio: Print SMT warning
  KVM: x86/speculation: Disable Fill buffer clear within guests
  x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
  x86/speculation/srbds: Update SRBDS mitigation selection
  x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
  x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
  x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
  x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
  x86/speculation: Add a common function for MD_CLEAR mitigation update
  x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
  Documentation: Add documentation for Processor MMIO Stale Data
2022-06-14 07:43:15 -07:00
Sandipan Das c390241a93 perf/x86/amd/uncore: Add PerfMonV2 DF event format
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use bits 0-7, 32-37 as EventSelect and bits
8-15, 24-27 as UnitMask for Data Fabric (DF) events.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ffc24d5a3375b1d6e457d88e83241114de5c1942.1652954372.git.sandipan.das@amd.com
2022-06-13 10:15:14 +02:00
Sandipan Das 16b48c3f5e perf/x86/amd/uncore: Detect available DF counters
If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use CPUID leaf 0x80000022 EBX to detect the
number of Data Fabric (DF) PMCs. This offers more
flexibility if the counts change in later processor
families.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/bac7b2806561e03f2acc7fdc9db94f102df80e1d.1652954372.git.sandipan.das@amd.com
2022-06-13 10:15:13 +02:00
Josh Poimboeuf e32683c6f7 x86/mm: Fix RESERVE_BRK() for older binutils
With binutils 2.26, RESERVE_BRK() causes a build failure:

  /tmp/ccnGOKZ5.s: Assembler messages:
  /tmp/ccnGOKZ5.s:98: Error: missing ')'
  /tmp/ccnGOKZ5.s:98: Error: missing ')'
  /tmp/ccnGOKZ5.s:98: Error: missing ')'
  /tmp/ccnGOKZ5.s:98: Error: junk at end of line, first unrecognized
  character is `U'

The problem is this line:

  RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE)

Specifically, the INIT_PGT_BUF_SIZE macro which (via PAGE_SIZE's use
_AC()) has a "1UL", which makes older versions of the assembler unhappy.
Unfortunately the _AC() macro doesn't work for inline asm.

Inline asm was only needed here to convince the toolchain to add the
STT_NOBITS flag.  However, if a C variable is placed in a section whose
name is prefixed with ".bss", GCC and Clang automatically set
STT_NOBITS.  In fact, ".bss..page_aligned" already relies on this trick.

So fix the build failure (and simplify the macro) by allocating the
variable in C.

Also, add NOLOAD to the ".brk" output section clause in the linker
script.  This is a failsafe in case the ".bss" prefix magic trick ever
stops working somehow.  If there's a section type mismatch, the GNU
linker will force the ".brk" output section to be STT_NOBITS.  The LLVM
linker will fail with a "section type mismatch" error.

Note this also changes the name of the variable from .brk.##name to
__brk_##name.  The variable names aren't actually used anywhere, so it's
harmless.

Fixes: a1e2c031ec ("x86/mm: Simplify RESERVE_BRK()")
Reported-by: Joe Damato <jdamato@fastly.com>
Reported-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/r/22d07a44c80d8e8e1e82b9a806ddc8c6bbb2606e.1654759036.git.jpoimboe@kernel.org
2022-06-13 10:15:04 +02:00
Paolo Bonzini e15f5e6fa6 Merge branch 'kvm-5.20-early'
s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to show tests

x86:

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Rewrite gfn-pfn cache refresh

* Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit
2022-06-09 11:38:12 -04:00
Maxim Levitsky 3743c2f025 KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
Neither of these settings should be changed by the guest and it is
a burden to support it in the acceleration code, so just inhibit
this code instead.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:18 -04:00
Maxim Levitsky a9603ae0e4 KVM: x86: document AVIC/APICv inhibit reasons
These days there are too many AVIC/APICv inhibit
reasons, and it doesn't hurt to have some documentation
for them.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:17 -04:00
Paolo Bonzini 66da65005a KVM/riscv fixes for 5.19, take #1
- Typo fix in arch/riscv/kvm/vmid.c
 
 - Remove broken reference pattern from MAINTAINERS entry
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmKhgGkACgkQrUjsVaLH
 LAfCCw/+LEz6af/lm4PXr5CGkJ91xq2xMpU9o39jMBm0RltzsG7zt/90SaUdK/Oz
 MpX7CLFgb1Cm2ZZ/+l5cBlLc7NUaMMxHH9dpScyrYC8xAb75QYimpe/jfjuMyXjO
 IaYJB2WCs2gfTYXA58c4sB2WR5rLahLnQGJrwW2CfMSvpv/nAyEZyWYtgXw8tSxH
 oM06Z/cLWU53uWuX0hbKAVQMdAIrQK5H+z46bhbpFC6gk/XSvaBBEngoOiiE6lC6
 uM8i8ZIeUgqSeWWreczd6H25eYwyLuVxXHWSIgbdvEcvBUn0VDO+Ox4UA2ab3g3d
 uHubqdRY5GnrkbRK0ue6tXfON8NxGlKwlcc6kp9Vqxb3Jxjr2qwToTubHYAUVXUi
 XzrvSxoZRRikwstb1+PNXECCNYUHkNdj4FBA4WoF0Y3Br1IfSwZLUX+EKkY/DHv+
 L4MhFFNqsQPzVly2wNiyxuWwRQyxupHekeMQlp13P9vZnGcptxxEyuQlM1Hf40ST
 iiOC8L+TCQzc5dN156/KjQIUFPud4huJO+0xHQtang628yVzQazzcxD+ialPkcqt
 JnpMmNbvvNzFYLoB3dQ/36flmYRA6SbK4Tt4bdhls+UcnLnfHDZow7OLmX5yj8+A
 QiKx6IOS6KI10LXhVZguAmZuKjXajyLVaCWpBl0tV6XpV9Y5t98=
 =w6dT
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv fixes for 5.19, take #1

- Typo fix in arch/riscv/kvm/vmid.c

- Remove broken reference pattern from MAINTAINERS entry
2022-06-09 09:45:00 -04:00
Wyes Karny 6f33a9daff x86: Fix comment for X86_FEATURE_ZEN
The feature X86_FEATURE_ZEN implies that the CPU based on Zen
microarchitecture. Call this out explicitly in the comment.

Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/9931b01a85120a0d1faf0f244e8de3f2190e774c.1654538381.git-series.wyes.karny@amd.com
2022-06-08 13:01:58 -07:00
Wyes Karny aebef63cf7 x86: Remove vendor checks from prefer_mwait_c1_over_halt
Remove vendor checks from prefer_mwait_c1_over_halt function. Restore
the decision tree to support MWAIT C1 as the default idle state based on
CPUID checks as done by Thomas Gleixner in
commit 09fd4b4ef5 ("x86: use cpuid to check MWAIT support for C1")

The decision tree is removed in
commit 69fb3676df ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")

Prefer MWAIT when the following conditions are satisfied:
    1. CPUID_Fn00000001_ECX [Monitor] should be set
    2. CPUID_Fn00000005 should be supported
    3. If CPUID_Fn00000005_ECX [EMX] is set then there should be
       at least one C1 substate available, indicated by
       CPUID_Fn00000005_EDX [MWaitC1SubStates] bits.

Otherwise use HLT for default_idle function.

HPC customers who want to optimize for lower latency are known to
disable Global C-States in the BIOS. In fact, some vendors allow
choosing a BIOS 'performance' profile which explicitly disables
C-States.  In this scenario, the cpuidle driver will not be loaded and
the kernel will continue with the default idle state chosen at boot
time. On AMD systems currently the default idle state is HLT which has
a higher exit latency compared to MWAIT.

The reason for the choice of HLT over MWAIT on AMD systems is:

1. Families prior to 10h didn't support MWAIT
2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
   preferable to use HLT as the default state on these systems.

However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1. And
it is preferable to use MWAIT as the default idle state on these
systems, as it has lower exit latencies.

The below table represents the exit latency for HLT and MWAIT on AMD
Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to
other CPU and measuring how many clock cycles it took to wakeup.  Each
iteration measures 10K wakeups by pinning source and destination.

HLT:

25.0000th percentile  :      1900 ns
50.0000th percentile  :      2000 ns
75.0000th percentile  :      2300 ns
90.0000th percentile  :      2500 ns
95.0000th percentile  :      2600 ns
99.0000th percentile  :      2800 ns
99.5000th percentile  :      3000 ns
99.9000th percentile  :      3400 ns
99.9500th percentile  :      3600 ns
99.9900th percentile  :      5900 ns
  Min latency         :      1700 ns
  Max latency         :      5900 ns
Total Samples      9999

MWAIT:

25.0000th percentile  :      1400 ns
50.0000th percentile  :      1500 ns
75.0000th percentile  :      1700 ns
90.0000th percentile  :      1800 ns
95.0000th percentile  :      1900 ns
99.0000th percentile  :      2300 ns
99.5000th percentile  :      2500 ns
99.9000th percentile  :      3200 ns
99.9500th percentile  :      3500 ns
99.9900th percentile  :      4600 ns
  Min latency         :      1200 ns
  Max latency         :      4600 ns
Total Samples      9997

Improvement (99th percentile): 21.74%

Below is another result for context_switch2 micro-benchmark, which
brings out the impact of improved wakeup latency through increased
context-switches per second.

with HLT:
-------------------------------
50.0000th percentile  :  190184
75.0000th percentile  :  191032
90.0000th percentile  :  192314
95.0000th percentile  :  192520
99.0000th percentile  :  192844
MIN  :  190148
MAX  :  192852

with MWAIT:
-------------------------------
50.0000th percentile  :  277444
75.0000th percentile  :  278268
90.0000th percentile  :  278888
95.0000th percentile  :  279164
99.0000th percentile  :  280504
MIN  :  273278
MAX  :  281410

Improvement(99th percentile): ~ 45.46%

Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://ozlabs.org/~anton/junkcode/context_switch2.c
Link: https://lkml.kernel.org/r/0cc675d8fd1f55e41b510e10abf2e21b6e9803d5.1654538381.git-series.wyes.karny@amd.com
2022-06-08 13:00:19 -07:00
Linus Torvalds 34f4335c16 * Fix syzkaller NULL pointer dereference
* Fix TDP MMU performance issue with disabling dirty logging
 * Fix 5.14 regression with SVM TSC scaling
 * Fix indefinite stall on applying live patches
 * Fix unstable selftest
 * Fix memory leak from wrong copy-and-paste
 * Fix missed PV TLB flush when racing with emulation
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKglysUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOJDAgArpPcAnJbeT2VQTQcp94e4tp9k1Sf
 gmUewajco4zFVB/sldE0fIporETkaX+FYYPiaNDdNgJ2lUw/HUJBN7KoFEYTZ37N
 Xx/qXiIXQYFw1bmxTnacLzIQtD3luMCzOs/6/Q7CAFZIBpUtUEjkMlQOBuxoKeG0
 B0iLCTJSw0taWcN170aN8G6T+5+bdR3AJW1k2wkgfESfYF9NfJoTUHQj9WTMzM2R
 aBRuXvUI/rWKvQY3DfoRmgg9Ig/SirSC+abbKIs4H08vZIEUlPk3WOZSKpsN/Wzh
 3XDnVRxgnaRLx6NI/ouI2UYJCmjPKbNcueGCf5IfUcHvngHjAEG/xxe4Qw==
 =zQ9u
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:

 - syzkaller NULL pointer dereference

 - TDP MMU performance issue with disabling dirty logging

 - 5.14 regression with SVM TSC scaling

 - indefinite stall on applying live patches

 - unstable selftest

 - memory leak from wrong copy-and-paste

 - missed PV TLB flush when racing with emulation

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: do not report a vCPU as preempted outside instruction boundaries
  KVM: x86: do not set st->preempted when going back to user space
  KVM: SVM: fix tsc scaling cache logic
  KVM: selftests: Make hyperv_clock selftest more stable
  KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
  x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
  KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()
  entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set
  KVM: Don't null dereference ops->destroy
2022-06-08 09:16:31 -07:00
Tao Xu 2f4073e08f KVM: VMX: Enable Notify VM exit
There are cases that malicious virtual machines can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs.

VMM can enable notify VM exit that a VM exit generated if no event
window occurs in VM non-root mode for a specified amount of time (notify
window).

Feature enabling:
- The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to
  enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust
  the expected notify window.
- Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space
  can query and enable this feature in per-VM scope. The argument is a
  64bit value: bits 63:32 are used for notify window, and bits 31:0 are
  for flags. Current supported flags:
  - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify
    window provided.
  - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen.
- It's safe to even set notify window to zero since an internal hardware
  threshold is added to vmcs.notify_window.

VM exit handling:
- Introduce a vcpu state notify_window_exits to records the count of
  notify VM exits and expose it through the debugfs.
- Notify VM exit can happen incident to delivery of a vector event.
  Allow it in KVM.
- Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID
  bit is set.

Nested handling
- Nested notify VM exits are not supported yet. Keep the same notify
  window control in vmcs02 as vmcs01, so that L1 can't escape the
  restriction of notify VM exits through launching L2 VM.

Notify VM exit is defined in latest Intel Architecture Instruction Set
Extensions Programming Reference, chapter 9.2.

Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 05:56:24 -04:00
Sean Christopherson 938c8745bc KVM: x86: Introduce "struct kvm_caps" to track misc caps/settings
Add kvm_caps to hold a variety of capabilites and defaults that aren't
handled by kvm_cpu_caps because they aren't CPUID bits in order to reduce
the amount of boilerplate code required to add a new feature.  The vast
majority (all?) of the caps interact with vendor code and are written
only during initialization, i.e. should be tagged __read_mostly, declared
extern in x86.h, and exported.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220524135624.22988-4-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 05:21:16 -04:00
Chenyi Qiang ed2351174e KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault
For the triple fault sythesized by KVM, e.g. the RSM path or
nested_vmx_abort(), if KVM exits to userspace before the request is
serviced, userspace could migrate the VM and lose the triple fault.

Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a
new event KVM_VCPUEVENT_VALID_FAULT_FAULT so that userspace can save and
restore the triple fault event. This extension is guarded by a new KVM
capability KVM_CAP_TRIPLE_FAULT_EVENT.

Note that in the set_vcpu_events path, userspace is able to set/clear
the triple fault request through triple_fault.pending field.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 05:20:53 -04:00
Like Xu 7aadaa988c KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context
All gp or fixed counters have been reprogrammed using PERF_TYPE_RAW,
which means that the table that maps perf_hw_id to event select values is
no longer useful, at least for AMD.

For Intel, the logic to check if the pmu event reported by Intel cpuid is
not available is still required, in which case pmc_perf_hw_id() could be
renamed to hw_event_is_unavail() and a bool value is returned to replace
the semantics of "PERF_COUNT_HW_MAX+1".

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-12-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:49:06 -04:00
Like Xu dc852ff5bb perf: x86/core: Add interface to query perfmon_event_map[] directly
Currently, we have [intel|knc|p4|p6]_perfmon_event_map on the Intel
platforms and amd_[f17h]_perfmon_event_map on the AMD platforms.

Early clumsy KVM code or other potential perf_event users may have
hard-coded these perfmon_maps (e.g., arch/x86/kvm/svm/pmu.c), so
it would not make sense to program a common hardware event based
on the generic "enum perf_hw_id" once the two tables do not match.

Let's provide an interface for callers outside the perf subsystem to get
the counter config based on the perfmon_event_map currently in use,
and it also helps to save bytes.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220518132512.37864-10-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:49:01 -04:00
Like Xu 854250329c KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
The guest PEBS will be disabled when some users try to perf KVM and
its user-space through the same PEBS facility OR when the host perf
doesn't schedule the guest PEBS counter in a one-to-one mapping manner
(neither of these are typical scenarios).

The PEBS records in the guest DS buffer are still accurate and the
above two restrictions will be checked before each vm-entry only if
guest PEBS is deemed to be enabled.

Suggested-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-15-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:48:14 -04:00
Like Xu 902caeb684 KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the adaptive
PEBS is supported. The PEBS_DATA_CFG MSR and adaptive record enable
bits (IA32_PERFEVTSELx.Adaptive_Record and IA32_FIXED_CTR_CTRL.
FCx_Adaptive_Record) are also supported.

Adaptive PEBS provides software the capability to configure the PEBS
records to capture only the data of interest, keeping the record size
compact. An overflow of PMCx results in generation of an adaptive PEBS
record with state information based on the selections specified in
MSR_PEBS_DATA_CFG.By default, the record only contain the Basic group.

When guest adaptive PEBS is enabled, the IA32_PEBS_ENABLE MSR will
be added to the perf_guest_switch_msr() and switched during the VMX
transitions just like CORE_PERF_GLOBAL_CTRL MSR.

According to Intel SDM, software is recommended to  PEBS Baseline
when the following is true. IA32_PERF_CAPABILITIES.PEBS_BASELINE[14]
&& IA32_PERF_CAPABILITIES.PEBS_FMT[11:8] ≥ 4.

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-12-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:48:06 -04:00
Like Xu 8183a538cd KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
When CPUID.01H:EDX.DS[21] is set, the IA32_DS_AREA MSR exists and points
to the linear address of the first byte of the DS buffer management area,
which is used to manage the PEBS records.

When guest PEBS is enabled, the MSR_IA32_DS_AREA MSR will be added to the
perf_guest_switch_msr() and switched during the VMX transitions just like
CORE_PERF_GLOBAL_CTRL MSR. The WRMSR to IA32_DS_AREA MSR brings a #GP(0)
if the source register contains a non-canonical address.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-11-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:48:03 -04:00
Like Xu c59a1f106f KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the
IA32_PEBS_ENABLE MSR exists and all architecturally enumerated fixed
and general-purpose counters have corresponding bits in IA32_PEBS_ENABLE
that enable generation of PEBS records. The general-purpose counter bits
start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at
bit IA32_PEBS_ENABLE[32].

When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR will be
added to the perf_guest_switch_msr() and atomically switched during
the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR.

Based on whether the platform supports x86_pmu.pebs_ept, it has also
refactored the way to add more msrs to arr[] in intel_guest_get_msrs()
for extensibility.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-8-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:55 -04:00
Like Xu 2c985527dd KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
The mask value of fixed counter control register should be dynamic
adjusted with the number of fixed counters. This patch introduces a
variable that includes the reserved bits of fixed counter control
registers. This is a generic code refactoring.

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-6-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:50 -04:00
Like Xu 39a4d77954 perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
Splitting the logic for determining the guest values is unnecessarily
confusing, and potentially fragile. Perf should have full knowledge and
control of what values are loaded for the guest.

If we change .guest_get_msrs() to take a struct kvm_pmu pointer, then it
can generate the full set of guest values by grabbing guest ds_area and
pebs_data_cfg. Alternatively, .guest_get_msrs() could take the desired
guest MSR values directly (ds_area and pebs_data_cfg), but kvm_pmu is
vendor agnostic, so we don't see any reason to not just pass the pointer.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-4-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:45 -04:00
Like Xu fb358e0b81 perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
Add support for EPT-Friendly PEBS, a new CPU feature that enlightens PEBS
to translate guest linear address through EPT, and facilitates handling
VM-Exits that occur when accessing PEBS records.  More information can
be found in the December 2021 release of Intel's SDM, Volume 3,
18.9.5 "EPT-Friendly PEBS". This new hardware facility makes sure the
guest PEBS records will not be lost, which is available on Intel Ice Lake
Server platforms (and later).

KVM will check this field through perf_get_x86_pmu_capability() instead
of hard coding the CPU models in the KVM code. If it is supported, the
guest PEBS capability will be exposed to the guest. Guest PEBS can be
enabled when and only when "EPT-Friendly PEBS" is supported and
EPT is enabled.

Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:39 -04:00
Chao Gao d588bb9be1 KVM: VMX: enable IPI virtualization
With IPI virtualization enabled, the processor emulates writes to
APIC registers that would send IPIs. The processor sets the bit
corresponding to the vector in target vCPU's PIR and may send a
notification (IPI) specified by NDST and NV fields in target vCPU's
Posted-Interrupt Descriptor (PID). It is similar to what IOMMU
engine does when dealing with posted interrupt from devices.

A PID-pointer table is used by the processor to locate the PID of a
vCPU with the vCPU's APIC ID. The table size depends on maximum APIC
ID assigned for current VM session from userspace. Allocating memory
for PID-pointer table is deferred to vCPU creation, because irqchip
mode and VM-scope maximum APIC ID is settled at that point. KVM can
skip PID-pointer table allocation if !irqchip_in_kernel().

Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its
notification vector to wakeup vector. This can ensure that when an IPI
for blocked vCPUs arrives, VMM can get control and wake up blocked
vCPUs. And if a VCPU is preempted, its posted interrupt notification
is suppressed.

Note that IPI virtualization can only virualize physical-addressing,
flat mode, unicast IPIs. Sending other IPIs would still cause a
trap-like APIC-write VM-exit and need to be handled by VMM.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154510.11938-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:37 -04:00
Zeng Guang 3587531638 KVM: x86: Allow userspace to set maximum VCPU id for VM
Introduce new max_vcpu_ids in KVM for x86 architecture. Userspace
can assign maximum possible vcpu id for current VM session using
KVM_CAP_MAX_VCPU_ID of KVM_ENABLE_CAP ioctl().

This is done for x86 only because the sole use case is to guide
memory allocation for PID-pointer table, a structure needed to
enable VMX IPI.

By default, max_vcpu_ids set as KVM_MAX_VCPU_IDS.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154444.11888-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:31 -04:00
Robert Hoo 1ad4e5438c KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config
Check VMX features on tertiary execution control in VMCS config setup.
Sub-features in tertiary execution control to be enabled are adjusted
according to hardware capabilities although no sub-feature is enabled
in this patch.

EVMCSv1 doesn't support tertiary VM-execution control, so disable it
when EVMCSv1 is in use. And define the auxiliary functions for Tertiary
control field here, using the new BUILD_CONTROLS_SHADOW().

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153400.11642-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:18 -04:00
Robert Hoo 465932db25 x86/cpu: Add new VMX feature, Tertiary VM-Execution control
A new 64-bit control field "tertiary processor-based VM-execution
controls", is defined [1]. It's controlled by bit 17 of the primary
processor-based VM-execution controls.

Different from its brother VM-execution fields, this tertiary VM-
execution controls field is 64 bit. So it occupies 2 vmx_feature_leafs,
TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.

Its companion VMX capability reporting MSR,MSR_IA32_VMX_PROCBASED_CTLS3
(0x492), is also semantically different from its brothers, whose 64 bits
consist of all allow-1, rather than 32-bit allow-0 and 32-bit allow-1 [1][2].
Therefore, its init_vmx_capabilities() is a little different from others.

[1] ISE 6.2 "VMCS Changes"
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

[2] SDM Vol3. Appendix A.3

Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153240.11549-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:13 -04:00
Sean Christopherson 2d61391270 KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint
In the IRQ injection tracepoint, differentiate between Hard IRQs and Soft
"IRQs", i.e. interrupts that are reinjected after incomplete delivery of
a software interrupt from an INTn instruction.  Tag reinjected interrupts
as such, even though the information is usually redundant since soft
interrupts are only ever reinjected by KVM.  Though rare in practice, a
hard IRQ can be reinjected.

Signed-off-by: Sean Christopherson <seanjc@google.com>
[MSS: change "kvm_inj_virq" event "reinjected" field type to bool]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <9664d49b3bd21e227caa501cff77b0569bebffe2.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:01 -04:00
Paolo Bonzini 6cd88243c7 KVM: x86: do not report a vCPU as preempted outside instruction boundaries
If a vCPU is outside guest mode and is scheduled out, it might be in the
process of making a memory access.  A problem occurs if another vCPU uses
the PV TLB flush feature during the period when the vCPU is scheduled
out, and a virtual address has already been translated but has not yet
been accessed, because this is equivalent to using a stale TLB entry.

To avoid this, only report a vCPU as preempted if sure that the guest
is at an instruction boundary.  A rescheduling request will be delivered
to the host physical CPU as an external interrupt, so for simplicity
consider any vmexit *not* instruction boundary except for external
interrupts.

It would in principle be okay to report the vCPU as preempted also
if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
vmentry/vmexit overhead unnecessarily, and optimistic spinning is
also unlikely to succeed.  However, leave it for later because right
now kvm_vcpu_check_block() is doing memory accesses.  Even
though the TLB flush issue only applies to virtual memory address,
it's very much preferrable to be conservative.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:21:07 -04:00
Paolo Bonzini a280e35846 Merge branch 'kvm-5.19-early-fixes' into HEAD 2022-06-07 12:06:02 -04:00
Jan Beulich 1df931d95f x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs
and clobbering of "cc" don't work well together. The compiler appears to
mean to reject such, but doesn't - in its upstream form - quite manage
to yet for "cc". Furthermore two similar macros don't clobber "cc", and
clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler
always assumes status flags to be clobbered there.

Fixes: 989b5db215 ("x86/uaccess: Implement macros for CMPXCHG on user addresses")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07 11:28:49 -04:00
Linus Torvalds 44688ffd11 A set of objtool fixes:
- Handle __ubsan_handle_builtin_unreachable() correctly and treat it as
     noreturn.
 
   - Allow architectures to select uaccess validation
 
   - Use the non-instrumented bit test for test_cpu_has() to prevent escape
     from non-instrumentable regions.
 
   - Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape
     from non-instrumentable regions.
 
   - Mark a few tiny inline as __always_inline to prevent GCC from bringing
     them out of line and instrumenting them.
 
   - Mark the empty stub context_tracking_enabled() as always inline as GCC
     brings them out of line and instruments the empty shell.
 
   - Annotate ex_handler_msr_mce() as dead end
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKccvMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoW39EAC1w/mSwn3b1lYkzOUcoe4EOMXzao2U
 my0ThnPNpa5k14Xfp6tOIpWRGTuW6mcVi4g+x4+LJo9V5tt5BxmMO1VFrTCQKn7H
 iJ1sWZNGq503aXldIT+pC0Zz67CIVnbGiz0D67aEYQ7w4ACdkubx8kcx5Of7BNbm
 KyQllP8XFXy7b+wgc8MrX1h/wPXNV9PBJwRAFrBw52c4s5euYui7iUNUm4RtKRem
 OpI3RFholAITLzvV8j+Xs9EmfUDjvmU3e1NEEas2n3MHm7tkYo5aSOSYX/Z7C5YD
 MvpMS3UAgwRGdaXvRVJK7eWcwayjODGGYrW9x9w9RMKM492uB4vAzfr4PE3Lru5G
 mnOxDjEP4QRK7Jl8bC0Idc5G6bxmw4DnQl7vkoaNYn3EyxKaEvREUokFKy5eWp3U
 klFQZXgQreUGSEkVA8VW7yT6knzVNsBk2WSFDUPdQZ0PV7JAVLyGZX8gEbhDyyim
 czkmI21A3hmGR97FKxyQ0I1N6q8eKSodZWbquPdOW52Jdt6pkpzUqPok9r74PK/p
 83ip/bNthbaR8FccNCHbnCLd8kvp6lsjqLqnMQHhMtUju6uRPRTzW1rxKik3Cbfh
 8VmqP6ltNGD7MkQW/jW+Vq7GIM+9onnEHbA/aEntH/ZKDHEefYtE66T0BjSrS6YK
 5dMr/vz4Jx1bwg==
 =eNVp
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Thomas Gleixner:

 - Handle __ubsan_handle_builtin_unreachable() correctly and treat it as
   noreturn

 - Allow architectures to select uaccess validation

 - Use the non-instrumented bit test for test_cpu_has() to prevent
   escape from non-instrumentable regions

 - Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape
   from non-instrumentable regions

 - Mark a few tiny inline as __always_inline to prevent GCC from
   bringing them out of line and instrumenting them

 - Mark the empty stub context_tracking_enabled() as always inline as
   GCC brings them out of line and instruments the empty shell

 - Annotate ex_handler_msr_mce() as dead end

* tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/extable: Annotate ex_handler_msr_mce() as a dead end
  context_tracking: Always inline empty stubs
  x86: Always inline on_thread_stack() and current_top_of_stack()
  jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds
  x86/cpu: Elide KCSAN for cpu_has() and friends
  objtool: Mark __ubsan_handle_builtin_unreachable() as noreturn
  objtool: Add CONFIG_HAVE_UACCESS_VALIDATION
2022-06-05 09:45:27 -07:00
Linus Torvalds 4ccbe91de9 xen: branch for v5.19-rc1b
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYprzPAAKCRCAXGG7T9hj
 vuTzAQC4GiDXcD/cfLVcEqdyw1diCWZjuOfuznUqy5ZUBAZjvAD/draFHTeO96+k
 qyZyzFggPIziaAOIUZ2DkJ/NqSAmbA8=
 =dl1E
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.19-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull more xen updates from Juergen Gross:
 "Two cleanup patches for Xen related code and (more important) an
  update of MAINTAINERS for Xen, as Boris Ostrovsky decided to step
  down"

* tag 'for-linus-5.19-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: replace xen_remap() with memremap()
  MAINTAINERS: Update Xen maintainership
  xen: switch gnttab_end_foreign_access() to take a struct page pointer
2022-06-04 13:42:53 -07:00
Linus Torvalds 1ec6574a3c This set of changes updates init and user mode helper tasks to be
ordinary user mode tasks.
 
 In commit 40966e316f ("kthread: Ensure struct kthread is present for
 all kthreads") caused init and the user mode helper threads that call
 kernel_execve to have struct kthread allocated for them.  This struct
 kthread going away during execve in turned made a use after free of
 struct kthread possible.
 
 The commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for
 init and umh") is enough to fix the use after free and is simple enough
 to be backportable.
 
 The rest of the changes pass struct kernel_clone_args to clean things
 up and cause the code to make sense.
 
 In making init and the user mode helpers tasks purely user mode tasks
 I ran into two complications.  The function task_tick_numa was
 detecting tasks without an mm by testing for the presence of
 PF_KTHREAD.  The initramfs code in populate_initrd_image was using
 flush_delayed_fput to ensuere the closing of all it's file descriptors
 was complete, and flush_delayed_fput does not work in a userspace thread.
 
 I have looked and looked and more complications and in my code review
 I have not found any, and neither has anyone else with the code sitting
 in linux-next.
 
 Link: https://lkml.kernel.org/r/87mtfu4up3.fsf@email.froward.int.ebiederm.org
 
 Eric W. Biederman (8):
       kthread: Don't allocate kthread_struct for init and umh
       fork: Pass struct kernel_clone_args into copy_thread
       fork: Explicity test for idle tasks in copy_thread
       fork: Generalize PF_IO_WORKER handling
       init: Deal with the init process being a user mode process
       fork: Explicitly set PF_KTHREAD
       fork: Stop allowing kthreads to call execve
       sched: Update task_tick_numa to ignore tasks without an mm
 
  arch/alpha/kernel/process.c      | 13 ++++++------
  arch/arc/kernel/process.c        | 13 ++++++------
  arch/arm/kernel/process.c        | 12 ++++++-----
  arch/arm64/kernel/process.c      | 12 ++++++-----
  arch/csky/kernel/process.c       | 15 ++++++-------
  arch/h8300/kernel/process.c      | 10 ++++-----
  arch/hexagon/kernel/process.c    | 12 ++++++-----
  arch/ia64/kernel/process.c       | 15 +++++++------
  arch/m68k/kernel/process.c       | 12 ++++++-----
  arch/microblaze/kernel/process.c | 12 ++++++-----
  arch/mips/kernel/process.c       | 13 ++++++------
  arch/nios2/kernel/process.c      | 12 ++++++-----
  arch/openrisc/kernel/process.c   | 12 ++++++-----
  arch/parisc/kernel/process.c     | 18 +++++++++-------
  arch/powerpc/kernel/process.c    | 15 +++++++------
  arch/riscv/kernel/process.c      | 12 ++++++-----
  arch/s390/kernel/process.c       | 12 ++++++-----
  arch/sh/kernel/process_32.c      | 12 ++++++-----
  arch/sparc/kernel/process_32.c   | 12 ++++++-----
  arch/sparc/kernel/process_64.c   | 12 ++++++-----
  arch/um/kernel/process.c         | 15 +++++++------
  arch/x86/include/asm/fpu/sched.h |  2 +-
  arch/x86/include/asm/switch_to.h |  8 +++----
  arch/x86/kernel/fpu/core.c       |  4 ++--
  arch/x86/kernel/process.c        | 18 +++++++++-------
  arch/xtensa/kernel/process.c     | 17 ++++++++-------
  fs/exec.c                        |  8 ++++---
  include/linux/sched/task.h       |  8 +++++--
  init/initramfs.c                 |  2 ++
  init/main.c                      |  2 +-
  kernel/fork.c                    | 46 +++++++++++++++++++++++++++++++++-------
  kernel/sched/fair.c              |  2 +-
  kernel/umh.c                     |  6 +++---
  33 files changed, 234 insertions(+), 160 deletions(-)
 
 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaR/MACgkQC/v6Eiaj
 j0Aayg/7Bx66872d9c6igkJ+MPCTuh+v9QKCGwiYEmiU4Q5sVAFB0HPJO27qC14u
 630X0RFNZTkPzNNEJNIW4kw6Dj8s8YRKf+FgQAVt4SzdRwT7eIPDjk1nGraopPJ3
 O04pjvuTmUyidyViRyFcf2ptx/pnkrwP8jUSc+bGTgfASAKAgAokqKE5ecjewbBc
 Y/EAkQ6QW7KxPjeSmpAHwI+t3BpBev9WEC4PbhRhsBCQFO2+PJiklvqdhVNBnIjv
 qUezll/1xv9UYgniB15Q4Nb722SmnWSU3r8as1eFPugzTHizKhufrrpyP+KMK1A0
 tdtEJNs5t2DZF7ZbGTFSPqJWmyTYLrghZdO+lOmnaSjHxK4Nda1d4NzbefJ0u+FE
 tutewowvHtBX6AFIbx+H3O+DOJM2IgNMf+ReQDU/TyNyVf3wBrTbsr9cLxypIJIp
 zze8npoLMlB7B4yxVo5ES5e63EXfi3iHl0L3/1EhoGwriRz1kWgVLUX/VZOUpscL
 RkJHsW6bT8sqxPWAA5kyWjEN+wNR2PxbXi8OE4arT0uJrEBMUgDCzydzOv5tJB00
 mSQdytxH9LVdsmxBKAOBp5X6WOLGA4yb1cZ6E/mEhlqXMpBDF1DaMfwbWqxSYi4q
 sp5zU3SBAW0qceiZSsWZXInfbjrcQXNV/DkDRDO9OmzEZP4m1j0=
 =x6fy
 -----END PGP SIGNATURE-----

Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull kthread updates from Eric Biederman:
 "This updates init and user mode helper tasks to be ordinary user mode
  tasks.

  Commit 40966e316f ("kthread: Ensure struct kthread is present for
  all kthreads") caused init and the user mode helper threads that call
  kernel_execve to have struct kthread allocated for them. This struct
  kthread going away during execve in turned made a use after free of
  struct kthread possible.

  Here, commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for
  init and umh") is enough to fix the use after free and is simple
  enough to be backportable.

  The rest of the changes pass struct kernel_clone_args to clean things
  up and cause the code to make sense.

  In making init and the user mode helpers tasks purely user mode tasks
  I ran into two complications. The function task_tick_numa was
  detecting tasks without an mm by testing for the presence of
  PF_KTHREAD. The initramfs code in populate_initrd_image was using
  flush_delayed_fput to ensuere the closing of all it's file descriptors
  was complete, and flush_delayed_fput does not work in a userspace
  thread.

  I have looked and looked and more complications and in my code review
  I have not found any, and neither has anyone else with the code
  sitting in linux-next"

* tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  sched: Update task_tick_numa to ignore tasks without an mm
  fork: Stop allowing kthreads to call execve
  fork: Explicitly set PF_KTHREAD
  init: Deal with the init process being a user mode process
  fork: Generalize PF_IO_WORKER handling
  fork: Explicity test for idle tasks in copy_thread
  fork: Pass struct kernel_clone_args into copy_thread
  kthread: Don't allocate kthread_struct for init and umh
2022-06-03 16:03:05 -07:00
Linus Torvalds 93ce7948e3 Follow-up tweaks for the EFI changes in v5.19
- fix new DXE service invocations for mixed mode
 - use correct Kconfig symbol when setting PE header flag
 - clean up the drivers/firmware/efi Kconfig dependencies so that
   features that depend on CONFIG_EFI are hidden from the UI when the
   symbol is not enabled.
 
 Also included is a RISC-V bugfix from Heinrich to avoid read-write
 mappings of read-only firmware regions in the EFI page tables.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmKXk5YACgkQw08iOZLZ
 jyQQ0wv/cB9Z9kJur3wJqj75HFEly7bwSk5oxJ+txRytApSaRYnqm7l4WeP3QQ8c
 o9GzZZNwoRQSx1mCBJefaO4s8fA24QkIeD8Oy4MeucKaPX1UbNc6Z84srOynjpSj
 mOyIYB+kurxsCBKmzQQBy8txIWld4EkrMhEoc1h2L4d2+OVRvIlsu1PMv03eCiww
 4Sop0yO5CydEpjxJDCfwol0L/PBiXc2PfRs2FdHFwOSQaisQLxhNruCnovyS9Zwk
 zLkhYC5dS+OZctknl6XMzOAi3x7sNYzVwNf4+yhFeU2cTuj3kJWnEAqs3CU/tiPO
 DOobLg/r/j7H44Nsc/8aJGT4GPNrbUrb6aOcfrBAkxvsu1Sp/k/UfSMZLS9fU1gC
 XUUl46NXG1yFpCntruQm5SMytVKdtlyUu7pPa+Ijmr+vc6UWl1XJq26J3UpiiFYT
 mjrer5gvzrnhuvUjIb4ulKoNMdoOQQMtofLxUGuc1u/53jWHxbiKt7/QvyFepJVe
 zi/ikD/v
 =7wiT
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull more EFI updates from Ard Biesheuvel:
 "Follow-up tweaks for EFI changes - they mostly address issues
  introduced this merge window, except for Heinrich's patch:

   - fix new DXE service invocations for mixed mode

   - use correct Kconfig symbol when setting PE header flag

   - clean up the drivers/firmware/efi Kconfig dependencies so that
     features that depend on CONFIG_EFI are hidden from the UI when the
     symbol is not enabled.

  Also included is a RISC-V bugfix from Heinrich to avoid read-write
  mappings of read-only firmware regions in the EFI page tables"

* tag 'efi-next-for-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: clean up Kconfig dependencies on CONFIG_EFI
  efi/x86: libstub: Make DXE calls mixed mode safe
  efi: x86: Fix config name for setting the NX-compatibility flag in the PE header
  riscv: read-only pages should not be writable
2022-06-03 13:39:30 -07:00
Linus Torvalds 7c9e960c63 Livepatching changes for 5.19
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmKYlREACgkQUqAMR0iA
 lPLHUQ/9EdD1kQsxuvZRTBAr79WmAI9vS41R3owXRUhmpFf0pSjIkD23gRni1Y+a
 f2NVxQY8y9mLUEp/YXeDUX5lqdCDD2iJtin+ZlUGv8+F7rRlMyIEDoudSUKsTRlF
 ufkzKE00tqUkz7J/U1KEbD2u/nWFS5q8DyHc81pC0bHpkiBT8y+wgEtIQy1oGVqV
 OilcHPXpQUV/sw1RsaRGdwJAgJSJcoHk57JelWWlV5fo0ogK1gjIl/kUadAChNzT
 2o8zoWu6fQlqCAI1AJkKLngdcybbXPwWKclcGUze1sfR9+fGGRzqorwAupm5bmkp
 1oZsODqxDdVfxz2/VMtdjjacm3ECJmznqBumdMrdM3WVjh944xkb7xVOp/1xJWfc
 wrmQy4dshXa4OTnvAivBYlgbaUzld5HPQD/v89KBLP3SJkd6p9PVT+qa87hjJ2uH
 sRDjtZxTkcKKoYU8CnASFBkaOHYHOWbRHbEjCkIgo2nHXE4W3fOnaQKBF3jUiZJr
 OimQMWjAqtXh8Gb0B7IIPxuGFjAjKepTEdN+Jwk8/eqXNTFqgGPbu1hOf+zpq/oj
 ekMIy3CUIsempqZfdrJlMOAGIEsx5JGBrdC5KwAEsHymW1XNJVrkpLtTFrtsF1tW
 X6qWzhkNMkOv9/0yoR2GmXiBR10pcXGMsd+EvNewpTIDCcjkYxI=
 =X4cs
 -----END PGP SIGNATURE-----

Merge tag 'livepatching-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching cleanup from Petr Mladek:

 - Remove duplicated livepatch code [Christophe]

* tag 'livepatching-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  livepatch: Remove klp_arch_set_pc() and asm/livepatch.h
2022-06-02 08:55:01 -07:00
Ard Biesheuvel 31f1a0edff efi/x86: libstub: Make DXE calls mixed mode safe
The newly added DXE calls use 64-bit quantities, which means we need to
marshall them explicitly when running in mixed mode. Currently, we get
away without it because we just bail when GetMemorySpaceDescriptor()
fails, which is guaranteed to happen due to the function argument mixup.

Let's fix this properly, though, by defining the macros that describe
how to marshall the arguments. While at it, drop an incorrect cast on a
status variable.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-01 14:08:12 +02:00
Juergen Gross 41925b105e xen: replace xen_remap() with memremap()
xen_remap() is used to establish mappings for frames not under direct
control of the kernel: for Xenstore and console ring pages, and for
grant pages of non-PV guests.

Today xen_remap() is defined to use ioremap() on x86 (doing uncached
mappings), and ioremap_cache() on Arm (doing cached mappings).

Uncached mappings for those use cases are bad for performance, so they
should be avoided if possible. As all use cases of xen_remap() don't
require uncached mappings (the mapped area is always physical RAM),
a mapping using the standard WB cache mode is fine.

As sparse is flagging some of the xen_remap() use cases to be not
appropriate for iomem(), as the result is not annotated with the
__iomem modifier, eliminate xen_remap() completely and replace all
use cases with memremap() specifying the MEMREMAP_WB caching mode.

xen_unmap() can be replaced with memunmap().

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/20220530082634.6339-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-01 07:24:17 +02:00
Linus Torvalds 35b51afd23 RISC-V Patches for the 5.19 Merge Window, Part 1
* Support for the Svpbmt extension, which allows memory attributes to be
   encoded in pages.
 * Support for the Allwinner D1's implementation of page-based memory
   attributes.
 * Support for running rv32 binaries on rv64 systems, via the compat
   subsystem.
 * Support for kexec_file().
 * Support for the new generic ticket-based spinlocks, which allows us to
   also move to qrwlock.  These should have already gone in through the
   asm-geneic tree as well.
 * A handful of cleanups and fixes, include some larger ones around
   atomics and XIP.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmKWOx8THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYieAiEADAUdP7ctoaSQwk5skd/fdA3b4KJuKn
 1Zjl+Br32WP0DlbirYBYWRUQZnCCsvABbTiwSJMcG7NBpU5pyQ5XDtB3OA5kJswO
 Fdp8Nd53//+GK1M5zdEM9OdgvT9fbfTZ3qTu8bKsROOQhGwnYL+Csc9KjFRqEmzN
 oQii0jlb3n5PM4FL3GsbV4uMn9zzkP9mnVAPQktcock2EKFEK/Fy3uNYMQiO2KPi
 n8O6bIDaeRdQ6SurzWOuOkt0cro0tEF85ilzT04mynQsOU0el5oGqCxnOhNH3VWg
 ndqPT6Yafw12hZOtbKJeP+nF8IIR6aJLP3jOtRwEVgcfbXYAw4QwbAV8kQZISefN
 ipn8JGY7GX9Y9TYU692OUGkcmAb3/dxb6c0WihBdvJ0M6YyLD5X+YKHNuG2onLgK
 ss43C5Mxsu629rsjdu/PV91B1+pve3rG9siVmF+g4eo0x9rjMq6/JB0Kal/8SLI1
 Je5T55d5ujV1a2XxhZLQOSD5owrK7J1M9owb0bloTnr9nVwFTWDrfEQEU82o3kP+
 Xm+FfXktnz9ai55NjkMbbEur5D++dKJhBavwCTnBcTrJmMtEH0R45GTK9ZehP+WC
 rNVrRXjIsS18wsTfJxnkZeFQA38as6VBKTzvwHvOgzTrrZU1/xk3lpkouYtAO6BG
 gKacHshVilmUuA==
 =Loi6
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the Svpbmt extension, which allows memory attributes to
   be encoded in pages

 - Support for the Allwinner D1's implementation of page-based memory
   attributes

 - Support for running rv32 binaries on rv64 systems, via the compat
   subsystem

 - Support for kexec_file()

 - Support for the new generic ticket-based spinlocks, which allows us
   to also move to qrwlock. These should have already gone in through
   the asm-geneic tree as well

 - A handful of cleanups and fixes, include some larger ones around
   atomics and XIP

* tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
  RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add]
  riscv: compat: Using seperated vdso_maps for compat_vdso_info
  RISC-V: Fix the XIP build
  RISC-V: Split out the XIP fixups into their own file
  RISC-V: ignore xipImage
  RISC-V: Avoid empty create_*_mapping definitions
  riscv: Don't output a bogus mmu-type on a no MMU kernel
  riscv: atomic: Add custom conditional atomic operation implementation
  riscv: atomic: Optimize dec_if_positive functions
  riscv: atomic: Cleanup unnecessary definition
  RISC-V: Load purgatory in kexec_file
  RISC-V: Add purgatory
  RISC-V: Support for kexec_file on panic
  RISC-V: Add kexec_file support
  RISC-V: use memcpy for kexec_file mode
  kexec_file: Fix kexec_file.c build error for riscv platform
  riscv: compat: Add COMPAT Kbuild skeletal support
  riscv: compat: ptrace: Add compat_arch_ptrace implement
  riscv: compat: signal: Add rt_frame implementation
  riscv: add memory-type errata for T-Head
  ...
2022-05-31 14:10:54 -07:00
Linus Torvalds 76bfd3de34 tracing updates for 5.19:
- The majority of the changes are for fixes and clean ups.
 
 Noticeable changes:
 
 - Rework trace event triggers code to be easier to interact with.
 
 - Support for embedding bootconfig with the kernel (as suppose to having it
   embedded in initram). This is useful for embedded boards without initram
   disks.
 
 - Speed up boot by parallelizing the creation of tracefs files.
 
 - Allow absolute ring buffer timestamps handle timestamps that use more than
   59 bits.
 
 - Added new tracing clock "TAI" (International Atomic Time)
 
 - Have weak functions show up in available_filter_function list as:
    __ftrace_invalid_address___<invalid-offset>
   instead of using the name of the function before it.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYpOgXRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qjkKAQDbpemxvpFyJlZqT8KgEIXubu+ag2/q
 p0XDHaPS0zF9OQEAjTxg6GMEbnFYl6fzxZtOoEbiaQ7ppfdhRI8t6sSMVA8=
 =+nDD
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The majority of the changes are for fixes and clean ups.

  Notable changes:

   - Rework trace event triggers code to be easier to interact with.

   - Support for embedding bootconfig with the kernel (as suppose to
     having it embedded in initram). This is useful for embedded boards
     without initram disks.

   - Speed up boot by parallelizing the creation of tracefs files.

   - Allow absolute ring buffer timestamps handle timestamps that use
     more than 59 bits.

   - Added new tracing clock "TAI" (International Atomic Time)

   - Have weak functions show up in available_filter_function list as:
     __ftrace_invalid_address___<invalid-offset> instead of using the
     name of the function before it"

* tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (52 commits)
  ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
  tracing: Fix comments for event_trigger_separate_filter()
  x86/traceponit: Fix comment about irq vector tracepoints
  x86,tracing: Remove unused headers
  ftrace: Clean up hash direct_functions on register failures
  tracing: Fix comments of create_filter()
  tracing: Disable kcov on trace_preemptirq.c
  tracing: Initialize integer variable to prevent garbage return value
  ftrace: Fix typo in comment
  ftrace: Remove return value of ftrace_arch_modify_*()
  tracing: Cleanup code by removing init "char *name"
  tracing: Change "char *" string form to "char []"
  tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ
  tracing/timerlat: Print stacktrace in the IRQ handler if needed
  tracing/timerlat: Notify IRQ new max latency only if stop tracing is set
  kprobes: Fix build errors with CONFIG_KRETPROBES=n
  tracing: Fix return value of trace_pid_write()
  tracing: Fix potential double free in create_var_ref()
  tracing: Use strim() to remove whitespace instead of doing it manually
  ftrace: Deal with error return code of the ftrace_process_locs() function
  ...
2022-05-29 10:31:36 -07:00
Steven Rostedt (Google) b39181f7c6 ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
If an unused weak function was traced, it's call to fentry will still
exist, which gets added into the __mcount_loc table. Ftrace will use
kallsyms to retrieve the name for each location in __mcount_loc to display
it in the available_filter_functions and used to enable functions via the
name matching in set_ftrace_filter/notrace. Enabling these functions do
nothing but enable an unused call to ftrace_caller. If a traced weak
function is overridden, the symbol of the function would be used for it,
which will either created duplicate names, or if the previous function was
not traced, it would be incorrectly be listed in available_filter_functions
as a function that can be traced.

This became an issue with BPF[1] as there are tooling that enables the
direct callers via ftrace but then checks to see if the functions were
actually enabled. The case of one function that was marked notrace, but
was followed by an unused weak function that was traced. The unused
function's call to fentry was added to the __mcount_loc section, and
kallsyms retrieved the untraced function's symbol as the weak function was
overridden. Since the untraced function would not get traced, the BPF
check would detect this and fail.

The real fix would be to fix kallsyms to not show addresses of weak
functions as the function before it. But that would require adding code in
the build to add function size to kallsyms so that it can know when the
function ends instead of just using the start of the next known symbol.

In the mean time, this is a work around. Add a FTRACE_MCOUNT_MAX_OFFSET
macro that if defined, ftrace will ignore any function that has its call
to fentry/mcount that has an offset from the symbol that is greater than
FTRACE_MCOUNT_MAX_OFFSET.

If CONFIG_HAVE_FENTRY is defined for x86, define FTRACE_MCOUNT_MAX_OFFSET
to zero (unless IBT is enabled), which will have ftrace ignore all locations
that are not at the start of the function (or one after the ENDBR
instruction).

A worker thread is added at boot up to scan all the ftrace record entries,
and will mark any that fail the FTRACE_MCOUNT_MAX_OFFSET test as disabled.
They will still appear in the available_filter_functions file as:

  __ftrace_invalid_address___<invalid-offset>

(showing the offset that caused it to be invalid).

This is required for tools that use libtracefs (like trace-cmd does) that
scan the available_filter_functions and enable set_ftrace_filter and
set_ftrace_notrace using indexes of the function listed in the file (this
is a speedup, as enabling thousands of files via names is an O(n^2)
operation and can take minutes to complete, where the indexing takes less
than a second).

The invalid functions cannot be removed from available_filter_functions as
the names there correspond to the ftrace records in the array that manages
them (and the indexing depends on this).

[1] https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/

Link: https://lkml.kernel.org/r/20220526141912.794c2786@gandalf.local.home

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-28 09:31:19 -04:00
Linus Torvalds 35cdd8656e libnvdimm for 5.19
- Add support for clearing memory error via pwrite(2) on DAX
 
 - Fix 'security overwrite' support in the presence of media errors
 
 - Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYpFPcQAKCRDfioYZHlFs
 Z9A3AQCdfoT5sY3OK+I/3oTvJ//6lw2MtXrnXFM046ICKPi9sgD8CzR9mRAHA+vj
 kxOtJEU2bA9naninXGORsDUndiNkwQo=
 =gVIn
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm and DAX updates from Dan Williams:
 "New support for clearing memory errors when a file is in DAX mode,
  alongside with some other fixes and cleanups.

  Previously it was only possible to clear these errors using a truncate
  or hole-punch operation to trigger the filesystem to reallocate the
  block, now, any page aligned write can opportunistically clear errors
  as well.

  This change spans x86/mm, nvdimm, and fs/dax, and has received the
  appropriate sign-offs. Thanks to Jane for her work on this.

  Summary:

   - Add support for clearing memory error via pwrite(2) on DAX

   - Fix 'security overwrite' support in the presence of media errors

   - Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)"

* tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  pmem: implement pmem_recovery_write()
  pmem: refactor pmem_clear_poison()
  dax: add .recovery_write dax_operation
  dax: introduce DAX_RECOVERY_WRITE dax access mode
  mce: fix set_mce_nospec to always unmap the whole page
  x86/mce: relocate set{clear}_mce_nospec() functions
  acpi/nfit: rely on mce->misc to determine poison granularity
  testing: nvdimm: asm/mce.h is not needed in nfit.c
  testing: nvdimm: iomap: make __nfit_test_ioremap a macro
  nvdimm: Allow overwrite in the presence of disabled dimms
  tools/testing/nvdimm: remove unneeded flush_workqueue
2022-05-27 15:49:30 -07:00
Linus Torvalds 3cc30140db pci-v5.19-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKP/tQUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzH0xAAojQowrSWzZ5FKTqI+L/L9ZXoAb+e
 9IvQljKc9taJldmXp+EB9wkS/5B+VtQcC2qUQuWEQXUoECF8qHlcB4l+XQyd1tWO
 O0vZxETH22xjLLrjG2F3l5rrfkJZAf2nEugwbDk97YEgiimeOiRcv3bx6AUCtj6I
 rPJ13Fop3Jke7sQMcXYJe3gQLT1o1AKiQGghiCFNi/gzx2lXI6mmHBgLxFoiqcby
 WpfXbvbJti95HRaahUR3HaDFfHj4HVkQNLlTtIykJ3Tl2/rOhWEJjI8JOIQpAA+M
 WBrWw9rfgbScTiGV+dZ3h7hKiPnHKl9YETIX7L0oA2sj0jZcIs0d6mSBZx0kYuI9
 eAlx+qSK9xpbQQr/fdYaUdF1q4QdtU0BYOvOWOzWsqYCECMRJ1PUHFSMbmR/+PNB
 P5lHnAbggRSoxdAtwFYv1HTr+VpGH9S+5oxHCz3ohpMjYy6mkCZwHpZn3doaU3ci
 KG6yIoVKftm3fZdtFvL03qHl/I8+X24ZhT/T/278PRGjkhSyr56hZo8hg0gqqTct
 ngip8qNABmSbqpr73/W6Vl42zAbYtNk1BykYahbKupgW8FbT7hqaZTB05V87pVu+
 Ko1aJM6VoOP9rMlKHI9ba8eYCzDrZbLZUFn7ljNPDpzutf0tAwtgwzvZBXN3za6+
 Z9+D5dxmvrZEIbA=
 =hEti
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Resource management:

   - Restrict E820 clipping to PCI host bridge windows (Bjorn Helgaas)

   - Log E820 clipping better (Bjorn Helgaas)

   - Add kernel cmdline options to enable/disable E820 clipping (Hans de
     Goede)

   - Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga
     Slip, Acer Spin 5, Clevo Barebone systems where clipping leaves no
     usable address space for touchpads, Thunderbolt devices, etc (Hans
     de Goede)

   - Disable E820 clipping by default starting in 2023 (Hans de Goede)

  PCI device hotplug:

   - Include files to remove implicit dependencies (Christophe Leroy)

   - Only put Root Ports in D3 if they can signal and wake from D3 so
     AMD Yellow Carp doesn't miss hotplug events (Mario Limonciello)

  Power management:

   - Define pci_restore_standard_config() only for CONFIG_PM_SLEEP since
     it's unused otherwise (Krzysztof Kozlowski)

   - Power up devices completely, including anything platform firmware
     needs to do, during runtime resume (Rafael J. Wysocki)

   - Move pci_resume_bus() to PM callbacks so we observe the required
     bridge power-up delays (Rafael J. Wysocki)

   - Drop unneeded runtime_d3cold device flag (Rafael J. Wysocki)

   - Split pci_raw_set_power_state() between pci_power_up() and a new
     pci_set_low_power_state() (Rafael J. Wysocki)

   - Set current_state to D3cold if config read returns ~0, indicating
     the device is not accessible (Rafael J. Wysocki)

   - Do not call pci_update_current_state() from pci_power_up() so BARs
     and ASPM config are restored correctly (Rafael J. Wysocki)

   - Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki)

   - Split pci_power_up() to pci_set_full_power_state() to avoid some
     redundant operations (Rafael J. Wysocki)

   - Skip restoring BARs if device is not in D0 (Rafael J. Wysocki)

   - Rearrange and clarify pci_set_power_state() (Rafael J. Wysocki)

   - Remove redundant BAR restores from pci_pm_thaw_noirq() (Rafael J.
     Wysocki)

  Virtualization:

   - Acquire device lock before config space access lock to avoid AB/BA
     deadlock with sriov_numvfs_store() (Yicong Yang)

  Error handling:

   - Clear MULTI_ERR_COR/UNCOR_RCV bits, which a race could previously
     leave permanently set (Kuppuswamy Sathyanarayanan)

  Peer-to-peer DMA:

   - Whitelist Intel Skylake-E Root Ports regardless of which devfn they
     are (Shlomo Pongratz)

  ASPM:

   - Override L1 acceptable latency advertised by Intel DG2 so ASPM L1
     can be enabled (Mika Westerberg)

  Cadence PCIe controller driver:

   - Set up device-specific register to allow PTM Responder to be
     enabled by the normal architected bit (Christian Gmeiner)

   - Override advertised FLR support since the controller doesn't
     implement FLR correctly (Parshuram Thombare)

  Cadence PCIe endpoint driver:

   - Correct bitmap size for the ob_region_map of outbound window usage
     (Dan Carpenter)

  Freescale i.MX6 PCIe controller driver:

   - Fix PERST# assertion/deassertion so we observe the required delays
     before accessing device (Francesco Dolcini)

  Freescale Layerscape PCIe controller driver:

   - Add "big-endian" DT property (Hou Zhiqiang)

   - Update SCFG DT property (Hou Zhiqiang)

   - Add "aer", "pme", "intr" DT properties (Li Yang)

   - Add DT compatible strings for ls1028a (Xiaowei Bao)

  Intel VMD host bridge driver:

   - Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt
     remapping errors when MSI-X remapping is disabled (Nirmal Patel)

   - Revert VMD workaround that kept MSI-X remapping enabled when IOMMU
     remapping was enabled (Nirmal Patel)

  Marvell MVEBU PCIe controller driver:

   - Add of_pci_get_slot_power_limit() to parse the
     'slot-power-limit-milliwatt' DT property (Pali Rohár)

   - Add mvebu support for sending Set_Slot_Power_Limit message (Pali
     Rohár)

  MediaTek PCIe controller driver:

   - Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin)

  MediaTek PCIe Gen3 controller driver:

   - Reset PHY and MAC at probe time (AngeloGioacchino Del Regno)

  Microchip PolarFlare PCIe controller driver:

   - Add chained_irq_enter()/chained_irq_exit() calls to mc_handle_msi()
     and mc_handle_intx() to avoid lost interrupts (Conor Dooley)

   - Fix interrupt handling race (Daire McNamara)

  NVIDIA Tegra194 PCIe controller driver:

   - Drop tegra194 MSI register save/restore, which is unnecessary since
     the DWC core does it (Jisheng Zhang)

  Qualcomm PCIe controller driver:

   - Add SM8150 SoC DT binding and support (Bhupesh Sharma)

   - Fix pipe clock imbalance (Johan Hovold)

   - Fix runtime PM imbalance on probe errors (Johan Hovold)

   - Fix PHY init imbalance on probe errors (Johan Hovold)

   - Convert DT binding to YAML (Dmitry Baryshkov)

   - Update DT binding to show that resets aren't required for
     MSM8996/APQ8096 platforms (Dmitry Baryshkov)

   - Add explicit register names per chipset in DT binding (Dmitry
     Baryshkov)

   - Add sc7280-specific clock and reset definitions to DT binding
     (Dmitry Baryshkov)

  Rockchip PCIe controller driver:

   - Fix bitmap size when searching for free outbound region (Dan
     Carpenter)

  Rockchip DesignWare PCIe controller driver:

   - Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property
     because it's not fully compatible with rockchip (Peter Geis)

   - Reset rockchip-dwc controller at probe (Peter Geis)

   - Add rockchip-dwc INTx support (Peter Geis)

  Synopsys DesignWare PCIe controller driver:

   - Return error instead of success if DMA mapping of MSI area fails
     (Jiantao Zhang)

  Miscellaneous:

   - Change pci_set_dma_mask() documentation references to
     dma_set_mask() (Alex Williamson)"

* tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (64 commits)
  dt-bindings: PCI: qcom: Add schema for sc7280 chipset
  dt-bindings: PCI: qcom: Specify reg-names explicitly
  dt-bindings: PCI: qcom: Do not require resets on msm8996 platforms
  dt-bindings: PCI: qcom: Convert to YAML
  PCI: qcom: Fix unbalanced PHY init on probe errors
  PCI: qcom: Fix runtime PM imbalance on probe errors
  PCI: qcom: Fix pipe clock imbalance
  PCI: qcom: Add SM8150 SoC support
  dt-bindings: pci: qcom: Document PCIe bindings for SM8150 SoC
  x86/PCI: Disable E820 reserved region clipping starting in 2023
  x86/PCI: Disable E820 reserved region clipping via quirks
  x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
  PCI: microchip: Fix potential race in interrupt handling
  PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
  PCI: cadence: Clear FLR in device capabilities register
  PCI: cadence: Allow PTM Responder to be enabled
  PCI: vmd: Revert 2565e5b69c ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
  PCI: vmd: Assign VMD IRQ domain before enumeration
  PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
  PCI: rockchip-dwc: Add legacy interrupt support
  ...
2022-05-27 15:25:10 -07:00
Linus Torvalds 77fb622de1 Six hotfixes. One from Miaohe Lin is considered a minor thing so it isn't
for -stable.  The remainder address pre-5.19 issues and are cc:stable.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYpEC8gAKCRDdBJ7gKXxA
 jlukAQDCaXF7YTBjpoaAl0zhSu+5h7CawiB6cnRlq87/uJ2S4QD/eLVX3zfxI2DX
 YcOhc5H8BOgZ8ppD80Nv9qjmyvEWzAA=
 =ZFFG
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "Six hotfixes.

  The page_table_check one from Miaohe Lin is considered a minor thing
  so it isn't marked for -stable. The remainder address pre-5.19 issues
  and are cc:stable"

* tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/page_table_check: fix accessing unmapped ptep
  kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]
  mm/page_alloc: always attempt to allocate at least one page during bulk allocation
  hugetlb: fix huge_pmd_unshare address update
  zsmalloc: fix races between asynchronous zspage free and page migration
  Revert "mm/cma.c: remove redundant cma_mutex lock"
2022-05-27 11:29:35 -07:00
Naveen N. Rao 3e35142ef9 kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [1], binutils (v2.36+) started dropping section symbols that
it thought were unused.  This isn't an issue in general, but with
kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a
separate .text.unlikely section and the section symbol ".text.unlikely"
is being dropped. Due to this, recordmcount is unable to find a non-weak
symbol in .text.unlikely to generate a relocation record against.

Address this by dropping the weak attribute from these functions.
Instead, follow the existing pattern of having architectures #define the
name of the function they want to override in their headers.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

[akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h]
Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 08:55:18 -07:00
Borislav Petkov 2028a255f4 x86/extable: Annotate ex_handler_msr_mce() as a dead end
Fix

  vmlinux.o: warning: objtool: fixup_exception+0x2d6: unreachable instruction

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220520192729.23969-1-bp@alien8.de
2022-05-27 12:34:45 +02:00
Peter Zijlstra 1894a40305 x86: Always inline on_thread_stack() and current_top_of_stack()
Becaues GCC clearly lost it's marbles again...

vmlinux.o: warning: objtool: enter_from_user_mode+0x4e: call to on_thread_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0x53: call to on_thread_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0x4e: call to on_thread_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0x4e: call to on_thread_stack() leaves .noinstr.text section

vmlinux.o: warning: objtool: enter_from_user_mode+0x4e: call to current_top_of_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0x53: call to current_top_of_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0x4e: call to current_top_of_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0x4e: call to current_top_of_stack() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220526105958.071435483@infradead.org
2022-05-27 12:34:44 +02:00
Peter Zijlstra a6a5eb269f x86/cpu: Elide KCSAN for cpu_has() and friends
As x86 uses the <asm-generic/bitops/instrumented-*.h> headers, the
regular forms of all bitops are instrumented with explicit calls to
KASAN and KCSAN checks. As these are explicit calls, these are not
suppressed by the noinstr function attribute.

This can result in calls to those check functions in noinstr code, which
objtool warns about:

vmlinux.o: warning: objtool: enter_from_user_mode+0x24: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0x28: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0x24: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0x24: call to __kcsan_check_access() leaves .noinstr.text section

Prevent this by using the arch_*() bitops, which are the underlying
bitops without explciit instrumentation.

[null: Changelog]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220502111216.290518605@infradead.org
2022-05-27 12:34:43 +02:00
Linus Torvalds bf9095424d S390:
* ultravisor communication device driver
 
 * fix TEID on terminating storage key ops
 
 RISC-V:
 
 * Added Sv57x4 support for G-stage page table
 
 * Added range based local HFENCE functions
 
 * Added remote HFENCE functions based on VCPU requests
 
 * Added ISA extension registers in ONE_REG interface
 
 * Updated KVM RISC-V maintainers entry to cover selftests support
 
 ARM:
 
 * Add support for the ARMv8.6 WFxT extension
 
 * Guard pages for the EL2 stacks
 
 * Trap and emulate AArch32 ID registers to hide unsupported features
 
 * Ability to select and save/restore the set of hypercalls exposed
   to the guest
 
 * Support for PSCI-initiated suspend in collaboration with userspace
 
 * GICv3 register-based LPI invalidation support
 
 * Move host PMU event merging into the vcpu data structure
 
 * GICv3 ITS save/restore fixes
 
 * The usual set of small-scale cleanups and fixes
 
 x86:
 
 * New ioctls to get/set TSC frequency for a whole VM
 
 * Allow userspace to opt out of hypercall patching
 
 * Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
 
 AMD SEV improvements:
 
 * Add KVM_EXIT_SHUTDOWN metadata for SEV-ES
 
 * V_TSC_AUX support
 
 Nested virtualization improvements for AMD:
 
 * Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
   nested vGIF)
 
 * Allow AVIC to co-exist with a nested guest running
 
 * Fixes for LBR virtualizations when a nested guest is running,
   and nested LBR virtualization support
 
 * PAUSE filtering for nested hypervisors
 
 Guest support:
 
 * Decoupling of vcpu_is_preempted from PV spinlocks
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKN9M4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNLeAf+KizAlQwxEehHHeNyTkZuKyMawrD6
 zsqAENR6i1TxiXe7fDfPFbO2NR0ZulQopHbD9mwnHJ+nNw0J4UT7g3ii1IAVcXPu
 rQNRGMVWiu54jt+lep8/gDg0JvPGKVVKLhxUaU1kdWT9PhIOC6lwpP3vmeWkUfRi
 PFL/TMT0M8Nfryi0zHB0tXeqg41BiXfqO8wMySfBAHUbpv8D53D2eXQL6YlMM0pL
 2quB1HxHnpueE5vj3WEPQ3PCdy1M2MTfCDBJAbZGG78Ljx45FxSGoQcmiBpPnhJr
 C6UGP4ZDWpml5YULUoA70k5ylCbP+vI61U4vUtzEiOjHugpPV5wFKtx5nw==
 =ozWx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - ultravisor communication device driver

   - fix TEID on terminating storage key ops

  RISC-V:

   - Added Sv57x4 support for G-stage page table

   - Added range based local HFENCE functions

   - Added remote HFENCE functions based on VCPU requests

   - Added ISA extension registers in ONE_REG interface

   - Updated KVM RISC-V maintainers entry to cover selftests support

  ARM:

   - Add support for the ARMv8.6 WFxT extension

   - Guard pages for the EL2 stacks

   - Trap and emulate AArch32 ID registers to hide unsupported features

   - Ability to select and save/restore the set of hypercalls exposed to
     the guest

   - Support for PSCI-initiated suspend in collaboration with userspace

   - GICv3 register-based LPI invalidation support

   - Move host PMU event merging into the vcpu data structure

   - GICv3 ITS save/restore fixes

   - The usual set of small-scale cleanups and fixes

  x86:

   - New ioctls to get/set TSC frequency for a whole VM

   - Allow userspace to opt out of hypercall patching

   - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

  AMD SEV improvements:

   - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES

   - V_TSC_AUX support

  Nested virtualization improvements for AMD:

   - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
     nested vGIF)

   - Allow AVIC to co-exist with a nested guest running

   - Fixes for LBR virtualizations when a nested guest is running, and
     nested LBR virtualization support

   - PAUSE filtering for nested hypervisors

  Guest support:

   - Decoupling of vcpu_is_preempted from PV spinlocks"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits)
  KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest
  KVM: selftests: x86: Sync the new name of the test case to .gitignore
  Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND
  x86, kvm: use correct GFP flags for preemption disabled
  KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer
  x86/kvm: Alloc dummy async #PF token outside of raw spinlock
  KVM: x86: avoid calling x86 emulator without a decoded instruction
  KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
  x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)
  s390/uv_uapi: depend on CONFIG_S390
  KVM: selftests: x86: Fix test failure on arch lbr capable platforms
  KVM: LAPIC: Trace LAPIC timer expiration on every vmentry
  KVM: s390: selftest: Test suppression indication on key prot exception
  KVM: s390: Don't indicate suppression on dirtying, failing memop
  selftests: drivers/s390x: Add uvdevice tests
  drivers/s390/char: Add Ultravisor io device
  MAINTAINERS: Update KVM RISC-V entry to cover selftests support
  RISC-V: KVM: Introduce ISA extension register
  RISC-V: KVM: Cleanup stale TLB entries when host CPU changes
  RISC-V: KVM: Add remote HFENCE functions based on VCPU requests
  ...
2022-05-26 14:20:14 -07:00
Linus Torvalds 98931dd95f Yang Shi has improved the behaviour of khugepaged collapsing of readonly
file-backed transparent hugepages.
 
 Johannes Weiner has arranged for zswap memory use to be tracked and
 managed on a per-cgroup basis.
 
 Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime
 enablement of the recent huge page vmemmap optimization feature.
 
 Baolin Wang contributes a series to fix some issues around hugetlb
 pagetable invalidation.
 
 Zhenwei Pi has fixed some interactions between hwpoisoned pages and
 virtualization.
 
 Tong Tiangen has enabled the use of the presently x86-only
 page_table_check debugging feature on arm64 and riscv.
 
 David Vernet has done some fixup work on the memcg selftests.
 
 Peter Xu has taught userfaultfd to handle write protection faults against
 shmem- and hugetlbfs-backed files.
 
 More DAMON development from SeongJae Park - adding online tuning of the
 feature and support for monitoring of fixed virtual address ranges.  Also
 easier discovery of which monitoring operations are available.
 
 Nadav Amit has done some optimization of TLB flushing during mprotect().
 
 Neil Brown continues to labor away at improving our swap-over-NFS support.
 
 David Hildenbrand has some fixes to anon page COWing versus
 get_user_pages().
 
 Peng Liu fixed some errors in the core hugetlb code.
 
 Joao Martins has reduced the amount of memory consumed by device-dax's
 compound devmaps.
 
 Some cleanups of the arch-specific pagemap code from Anshuman Khandual.
 
 Muchun Song has found and fixed some errors in the TLB flushing of
 transparent hugepages.
 
 Roman Gushchin has done more work on the memcg selftests.
 
 And, of course, many smaller fixes and cleanups.  Notably, the customary
 million cleanup serieses from Miaohe Lin.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo52xQAKCRDdBJ7gKXxA
 jtJFAQD238KoeI9z5SkPMaeBRYSRQmNll85mxs25KapcEgWgGQD9FAb7DJkqsIVk
 PzE+d9hEfirUGdL6cujatwJ6ejYR8Q8=
 =nFe6
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
2022-05-26 12:32:41 -07:00
Linus Torvalds 3f306ea2e1 dma-mapping updates for Linux 5.19
- don't over-decrypt memory (Robin Murphy)
  - takes min align mask into account for the swiotlb max mapping size
    (Tianyu Lan)
  - use GFP_ATOMIC in dma-debug (Mikulas Patocka)
  - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)
  - don't fail on highmem CMA pages in dma_direct_alloc_pages (me)
  - cleanup swiotlb initialization and share more code with swiotlb-xen
    (me, Stefano Stabellini)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmKObTQLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYObmA//dIcDB/q4iFGD+WJh4MhM+asx0ZsdF2OJz42WEhgT
 Z9duOrgcneEQundCamqJP9rNTs980LHDA8uWQC5rZEc9vxuRVOdS7bSgYRUwWh6B
 r0ZjOsvQCn+ChoZML8uyk4rfmEINq+EvJuec3G5fgecZOhPuJS2i2uzzv5cHwqgP
 ChC0fwyZlkfdECXgvZXbEoCJLfTgGNlziN6Ai8dirSoqgEQUoCsY89/M7OiEBvV2
 R4XUWD7OvQERfB4t6xLuUHyzf9PAuWB+OiblRVNeAmK3lMjxVrc3k4kIowgklnzD
 8hfmphAa9Zou3zdfi6Gd4fiQRHRVOwKVp1rtqUmJ+lPSiwyMzu64z9ld2+2qac0h
 V4sSr/yJkhxnBT4/0MkTChvhnRobisackpUzNRpiM4ck7cNVb7eAvkISsbH+pWI9
 aEexPhbyskjlV+GOyM4QL4ygG0dpXY0HSyoh6uaSVsaXMycnWIsJCPidXxV1HGV0
 q2/RLHuHwYxia8cYCF01/DQvwOKSjwbU0zModxtRezGD5GYh2C0a+SrA1aX+qiTu
 yGJCs2UHtSQstAt78tTVp499YeDeL/oGSQkPAu8zyRkSczzF+CncGTuXyoJbAWyK
 otcgERWljgZ4scxjfu1uacfoVhKQ7nOu7hiJokL0U80FESAennLC3ZlocvB9h/ff
 HNA=
 =n2rk
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - don't over-decrypt memory (Robin Murphy)

 - takes min align mask into account for the swiotlb max mapping size
   (Tianyu Lan)

 - use GFP_ATOMIC in dma-debug (Mikulas Patocka)

 - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)

 - don't fail on highmem CMA pages in dma_direct_alloc_pages (me)

 - cleanup swiotlb initialization and share more code with swiotlb-xen
   (me, Stefano Stabellini)

* tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits)
  dma-direct: don't over-decrypt memory
  swiotlb: max mapping size takes min align mask into account
  swiotlb: use the right nslabs-derived sizes in swiotlb_init_late
  swiotlb: use the right nslabs value in swiotlb_init_remap
  swiotlb: don't panic when the swiotlb buffer can't be allocated
  dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
  dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages
  swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm
  x86: remove cruft from <asm/dma-mapping.h>
  swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl
  swiotlb: merge swiotlb-xen initialization into swiotlb
  swiotlb: provide swiotlb_init variants that remap the buffer
  swiotlb: pass a gfp_mask argument to swiotlb_init_late
  swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction
  swiotlb: make the swiotlb_init interface more useful
  x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled
  x86: remove the IOMMU table infrastructure
  MIPS/octeon: use swiotlb_init instead of open coding it
  arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region
  swiotlb: rename swiotlb_late_init_with_default_size
  ...
2022-05-25 19:18:36 -07:00
Linus Torvalds 7e062cda7d Networking changes for 5.19.
Core
 ----
 
  - Support TCPv6 segmentation offload with super-segments larger than
    64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
 
  - Generalize skb freeing deferral to per-cpu lists, instead of
    per-socket lists.
 
  - Add a netdev statistic for packets dropped due to L2 address
    mismatch (rx_otherhost_dropped).
 
  - Continue work annotating skb drop reasons.
 
  - Accept alternative netdev names (ALT_IFNAME) in more netlink
    requests.
 
  - Add VLAN support for AF_PACKET SOCK_RAW GSO.
 
  - Allow receiving skb mark from the socket as a cmsg.
 
  - Enable memcg accounting for veth queues, sysctl tables and IPv6.
 
 BPF
 ---
 
  - Add libbpf support for User Statically-Defined Tracing (USDTs).
 
  - Speed up symbol resolution for kprobes multi-link attachments.
 
  - Support storing typed pointers to referenced and unreferenced
    objects in BPF maps.
 
  - Add support for BPF link iterator.
 
  - Introduce access to remote CPU map elements in BPF per-cpu map.
 
  - Allow middle-of-the-road settings for the
    kernel.unprivileged_bpf_disabled sysctl.
 
  - Implement basic types of dynamic pointers e.g. to allow for
    dynamically sized ringbuf reservations without extra memory copies.
 
 Protocols
 ---------
 
  - Retire port only listening_hash table, add a second bind table
    hashed by port and address. Avoid linear list walk when binding
    to very popular ports (e.g. 443).
 
  - Add bridge FDB bulk flush filtering support allowing user space
    to remove all FDB entries matching a condition.
 
  - Introduce accept_unsolicited_na sysctl for IPv6 to implement
    router-side changes for RFC9131.
 
  - Support for MPTCP path manager in user space.
 
  - Add MPTCP support for fallback to regular TCP for connections
    that have never connected additional subflows or transmitted
    out-of-sequence data (partial support for RFC8684 fallback).
 
  - Avoid races in MPTCP-level window tracking, stabilize and improve
    throughput.
 
  - Support lockless operation of GRE tunnels with seq numbers enabled.
 
  - WiFi support for host based BSS color collision detection.
 
  - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
 
  - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
 
  - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
 
  - Allow matching on the number of VLAN tags via tc-flower.
 
  - Add tracepoint for tcp_set_ca_state().
 
 Driver API
 ----------
 
  - Improve error reporting from classifier and action offload.
 
  - Add support for listing line cards in switches (devlink).
 
  - Add helpers for reporting page pool statistics with ethtool -S.
 
  - Add support for reading clock cycles when using PTP virtual clocks,
    instead of having the driver convert to time before reporting.
    This makes it possible to report time from different vclocks.
 
  - Support configuring low-latency Tx descriptor push via ethtool.
 
  - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
    - Sunplus SP7021 SoC (sp7021_emac)
    - Add support for Renesas RZ/V2M (in ravb)
    - Add support for MediaTek mt7986 switches (in mtk_eth_soc)
 
  - Ethernet PHYs:
    - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
    - TI DP83TD510 PHY
    - Microchip LAN8742/LAN88xx PHYs
 
  - WiFi:
    - Driver for pureLiFi X, XL, XC devices (plfxlc)
    - Driver for Silicon Labs devices (wfx)
    - Support for WCN6750 (in ath11k)
    - Support Realtek 8852ce devices (in rtw89)
 
  - Mobile:
    - MediaTek T700 modems (Intel 5G 5000 M.2 cards)
 
  - CAN:
   - ctucanfd: add support for CTU CAN FD open-source IP core
     from Czech Technical University in Prague
 
 Drivers
 -------
 
  - Delete a number of old drivers still using virt_to_bus().
 
  - Ethernet NICs:
    - intel: support TSO on tunnels MPLS
    - broadcom: support multi-buffer XDP
    - nfp: support VF rate limiting
    - sfc: use hardware tx timestamps for more than PTP
    - mlx5: multi-port eswitch support
    - hyper-v: add support for XDP_REDIRECT
    - atlantic: XDP support (including multi-buffer)
    - macb: improve real-time perf by deferring Tx processing to NAPI
 
  - High-speed Ethernet switches:
    - mlxsw: implement basic line card information querying
    - prestera: add support for traffic policing on ingress and egress
 
  - Embedded Ethernet switches:
    - lan966x: add support for packet DMA (FDMA)
    - lan966x: add support for PTP programmable pins
    - ti: cpsw_new: enable bc/mc storm prevention
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - Wake-on-WLAN support for QCA6390 and WCN6855
    - device recovery (firmware restart) support
    - support setting Specific Absorption Rate (SAR) for WCN6855
    - read country code from SMBIOS for WCN6855/QCA6390
    - enable keep-alive during WoWLAN suspend
    - implement remain-on-channel support
 
  - MediaTek WiFi (mt76):
    - support Wireless Ethernet Dispatch offloading packet movement
      between the Ethernet switch and WiFi interfaces
    - non-standard VHT MCS10-11 support
    - mt7921 AP mode support
    - mt7921 IPv6 NS offload support
 
  - Ethernet PHYs:
    - micrel: ksz9031/ksz9131: cabletest support
    - lan87xx: SQI support for T1 PHYs
    - lan937x: add interrupt support for link detection
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKNMPQACgkQMUZtbf5S
 IrsRARAAuDyYs6jFYB3p+xazZdOnbF4iAgVv71+DQGvmsCl6CB9OrsNZMlvE85OL
 Q3gjcRbgjrkN4lhgI8DmiGYbsUJnAvVjFdNjccz1Z/vTLYvuIM0ol54MUp5S+9WY
 StncOJkOGJxxR/Gi5gzVmejPDsysU3Jik+hm/fpIcz8pybXxAsFKU5waY5qfl+/T
 TZepfV0VCfqRDjqcF1qA5+jJZNU8pdodQlZ1+mh8bwu6Jk1ZkWkj6Ov8MWdwQldr
 LnPeK/9hIGzkdJYHZfajxA3t8D0K5CHzSuih2bJ9ry8ZXgVBkXEThew778/R5izW
 uB0YZs9COFlrIP7XHjtRTy/2xHOdYIPlj2nWhVdfuQDX8Crvt4VRN6EZ1rjko1ZJ
 WanfG6WHF8NH5pXBRQbh3kIMKBnYn6OIzuCfCQSqd+niHcxFIM4vRiggeXI5C5TW
 vJgEWfK6X+NfDiFVa3xyCrEmp5ieA/pNecpwd8rVkql+MtFAAw4vfsotLKOJEAru
 J/XL6UE+YuLqIJV9ACZ9x1AFXXAo661jOxBunOo4VXhXVzWS9lYYz5r5ryIkgT/8
 /Fr0zjANJWgfIuNdIBtYfQ4qG+LozGq038VA06RhFUAZ5tF9DzhqJs2Q2AFuWWBC
 ewCePJVqo1j2Ceq2mGonXRt47OEnlePoOxTk9W+cKZb7ZWE+zEo=
 =Wjii
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Support TCPv6 segmentation offload with super-segments larger than
     64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).

   - Generalize skb freeing deferral to per-cpu lists, instead of
     per-socket lists.

   - Add a netdev statistic for packets dropped due to L2 address
     mismatch (rx_otherhost_dropped).

   - Continue work annotating skb drop reasons.

   - Accept alternative netdev names (ALT_IFNAME) in more netlink
     requests.

   - Add VLAN support for AF_PACKET SOCK_RAW GSO.

   - Allow receiving skb mark from the socket as a cmsg.

   - Enable memcg accounting for veth queues, sysctl tables and IPv6.

  BPF
  ---

   - Add libbpf support for User Statically-Defined Tracing (USDTs).

   - Speed up symbol resolution for kprobes multi-link attachments.

   - Support storing typed pointers to referenced and unreferenced
     objects in BPF maps.

   - Add support for BPF link iterator.

   - Introduce access to remote CPU map elements in BPF per-cpu map.

   - Allow middle-of-the-road settings for the
     kernel.unprivileged_bpf_disabled sysctl.

   - Implement basic types of dynamic pointers e.g. to allow for
     dynamically sized ringbuf reservations without extra memory copies.

  Protocols
  ---------

   - Retire port only listening_hash table, add a second bind table
     hashed by port and address. Avoid linear list walk when binding to
     very popular ports (e.g. 443).

   - Add bridge FDB bulk flush filtering support allowing user space to
     remove all FDB entries matching a condition.

   - Introduce accept_unsolicited_na sysctl for IPv6 to implement
     router-side changes for RFC9131.

   - Support for MPTCP path manager in user space.

   - Add MPTCP support for fallback to regular TCP for connections that
     have never connected additional subflows or transmitted
     out-of-sequence data (partial support for RFC8684 fallback).

   - Avoid races in MPTCP-level window tracking, stabilize and improve
     throughput.

   - Support lockless operation of GRE tunnels with seq numbers enabled.

   - WiFi support for host based BSS color collision detection.

   - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.

   - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).

   - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).

   - Allow matching on the number of VLAN tags via tc-flower.

   - Add tracepoint for tcp_set_ca_state().

  Driver API
  ----------

   - Improve error reporting from classifier and action offload.

   - Add support for listing line cards in switches (devlink).

   - Add helpers for reporting page pool statistics with ethtool -S.

   - Add support for reading clock cycles when using PTP virtual clocks,
     instead of having the driver convert to time before reporting. This
     makes it possible to report time from different vclocks.

   - Support configuring low-latency Tx descriptor push via ethtool.

   - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.

  New hardware / drivers
  ----------------------

   - Ethernet:
      - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
      - Sunplus SP7021 SoC (sp7021_emac)
      - Add support for Renesas RZ/V2M (in ravb)
      - Add support for MediaTek mt7986 switches (in mtk_eth_soc)

   - Ethernet PHYs:
      - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
      - TI DP83TD510 PHY
      - Microchip LAN8742/LAN88xx PHYs

   - WiFi:
      - Driver for pureLiFi X, XL, XC devices (plfxlc)
      - Driver for Silicon Labs devices (wfx)
      - Support for WCN6750 (in ath11k)
      - Support Realtek 8852ce devices (in rtw89)

   - Mobile:
      - MediaTek T700 modems (Intel 5G 5000 M.2 cards)

   - CAN:
      - ctucanfd: add support for CTU CAN FD open-source IP core from
        Czech Technical University in Prague

  Drivers
  -------

   - Delete a number of old drivers still using virt_to_bus().

   - Ethernet NICs:
      - intel: support TSO on tunnels MPLS
      - broadcom: support multi-buffer XDP
      - nfp: support VF rate limiting
      - sfc: use hardware tx timestamps for more than PTP
      - mlx5: multi-port eswitch support
      - hyper-v: add support for XDP_REDIRECT
      - atlantic: XDP support (including multi-buffer)
      - macb: improve real-time perf by deferring Tx processing to NAPI

   - High-speed Ethernet switches:
      - mlxsw: implement basic line card information querying
      - prestera: add support for traffic policing on ingress and egress

   - Embedded Ethernet switches:
      - lan966x: add support for packet DMA (FDMA)
      - lan966x: add support for PTP programmable pins
      - ti: cpsw_new: enable bc/mc storm prevention

   - Qualcomm 802.11ax WiFi (ath11k):
      - Wake-on-WLAN support for QCA6390 and WCN6855
      - device recovery (firmware restart) support
      - support setting Specific Absorption Rate (SAR) for WCN6855
      - read country code from SMBIOS for WCN6855/QCA6390
      - enable keep-alive during WoWLAN suspend
      - implement remain-on-channel support

   - MediaTek WiFi (mt76):
      - support Wireless Ethernet Dispatch offloading packet movement
        between the Ethernet switch and WiFi interfaces
      - non-standard VHT MCS10-11 support
      - mt7921 AP mode support
      - mt7921 IPv6 NS offload support

   - Ethernet PHYs:
      - micrel: ksz9031/ksz9131: cabletest support
      - lan87xx: SQI support for T1 PHYs
      - lan937x: add interrupt support for link detection"

* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
  ptp: ocp: Add firmware header checks
  ptp: ocp: fix PPS source selector debugfs reporting
  ptp: ocp: add .init function for sma_op vector
  ptp: ocp: vectorize the sma accessor functions
  ptp: ocp: constify selectors
  ptp: ocp: parameterize input/output sma selectors
  ptp: ocp: revise firmware display
  ptp: ocp: add Celestica timecard PCI ids
  ptp: ocp: Remove #ifdefs around PCI IDs
  ptp: ocp: 32-bit fixups for pci start address
  Revert "net/smc: fix listen processing for SMC-Rv2"
  ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
  selftests/bpf: Dynptr tests
  bpf: Add dynptr data slices
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Dynptr support for ring buffers
  bpf: Add bpf_dynptr_from_mem for local dynptrs
  bpf: Add verifier support for dynptrs
  bpf: Suppress 'passing zero to PTR_ERR' warning
  bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
  ...
2022-05-25 12:22:58 -07:00
Paolo Bonzini 47e8eec832 KVM/arm64 updates for 5.19
- Add support for the ARMv8.6 WFxT extension
 
 - Guard pages for the EL2 stacks
 
 - Trap and emulate AArch32 ID registers to hide unsupported features
 
 - Ability to select and save/restore the set of hypercalls exposed
   to the guest
 
 - Support for PSCI-initiated suspend in collaboration with userspace
 
 - GICv3 register-based LPI invalidation support
 
 - Move host PMU event merging into the vcpu data structure
 
 - GICv3 ITS save/restore fixes
 
 - The usual set of small-scale cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmKGAGsPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDB/gQAMhyZ+wCG0OMEZhwFF6iDfxVEX2Kw8L41NtD
 a/e6LDWuIOGihItpRkYROc5myG74D7XckF2Bz3G7HJoU4vhwHOV/XulE26GFizoC
 O1GVRekeSUY81wgS1yfo0jojLupBkTjiq3SjTHoDP7GmCM0qDPBtA0QlMRzd2bMs
 Kx0+UUXZUHFSTXc7Lp4vqNH+tMp7se+yRx7hxm6PCM5zG+XYJjLxnsZ0qpchObgU
 7f6YFojsLUs1SexgiUqJ1RChVQ+FkgICh5HyzORvGtHNNzK6D2sIbsW6nqMGAMql
 Kr3A5O/VOkCztSYnLxaa76/HqD21mvUrXvr3grhabNc7rOmuzWV0dDgr6c6wHKHb
 uNCtH4d7Ra06gUrEOrfsgLOLn0Zqik89y6aIlMsnTudMg9gMNgFHy1jz4LM7vMkY
 FS5AVj059heg2uJcfgTvzzcqneyuBLBmF3dS4coowO6oaj8SycpaEmP5e89zkPMI
 1kk8d0e6RmXuCh/2AJ8GxxnKvBPgqp2mMKXOCJ8j4AmHEDX/CKpEBBqIWLKkplUU
 8DGiOWJUtRZJg398dUeIpiVLoXJthMODjAnkKkuhiFcQbXomlwgg7YSnNAz6TRED
 Z7KR2leC247kapHnnagf02q2wED8pBeyrxbQPNdrHtSJ9Usm4nTkY443HgVTJW3s
 aTwPZAQ7
 =mh7W
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 5.19

- Add support for the ARMv8.6 WFxT extension

- Guard pages for the EL2 stacks

- Trap and emulate AArch32 ID registers to hide unsupported features

- Ability to select and save/restore the set of hypercalls exposed
  to the guest

- Support for PSCI-initiated suspend in collaboration with userspace

- GICv3 register-based LPI invalidation support

- Move host PMU event merging into the vcpu data structure

- GICv3 ITS save/restore fixes

- The usual set of small-scale cleanups and fixes

[Due to the conflict, KVM_SYSTEM_EVENT_SEV_TERM is relocated
 from 4 to 6. - Paolo]
2022-05-25 05:09:23 -04:00
Linus Torvalds 09583dfed2 Power management updates for 5.19-rc1
- Update the Energy Model support code to allow the Energy Model to be
    artificial, which means that the power values may not be on a uniform
    scale with other devices providing power information, and update the
    cpufreq_cooling and devfreq_cooling thermal drivers to support
    artificial Energy Models (Lukasz Luba).
 
  - Make DTPM check the Energy Model type (Lukasz Luba).
 
  - Fix policy counter decrementation in cpufreq if Energy Model is in
    use (Pierre Gondois).
 
  - Add CPU-based scaling support to passive devfreq governor (Saravana
    Kannan, Chanwoo Choi).
 
  - Update the rk3399_dmc devfreq driver (Brian Norris).
 
  - Export dev_pm_ops instead of suspend() and resume() in the IIO
    chemical scd30 driver (Jonathan Cameron).
 
  - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
    PM-runtime counterparts (Jonathan Cameron).
 
  - Move symbol exports in the IIO chemical scd30 driver into the
    IIO_SCD30 namespace (Jonathan Cameron).
 
  - Avoid device PM-runtime usage count underflows (Rafael Wysocki).
 
  - Allow dynamic debug to control printing of PM messages  (David
    Cohen).
 
  - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
    Bai).
 
  - Preserve ACPI-table override during hibernation (Amadeusz Sławiński).
 
  - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).
 
  - Make Intel RAPL power capping driver support the RaptorLake and
    AlderLake N processors (Zhang Rui, Sumeet Pawnikar).
 
  - Remove redundant store to value after multiply in the RAPL power
    capping driver (Colin Ian King).
 
  - Add AlderLake processor support to the intel_idle driver (Zhang Rui).
 
  - Fix regression leading to no genpd governor in the PSCI cpuidle
    driver and fix the riscv-sbi cpuidle driver to allow a genpd
    governor to be used (Ulf Hansson).
 
  - Fix cpufreq governor clean up code to avoid using kfree() directly
    to free kobject-based items (Kevin Hao).
 
  - Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe Leroy).
 
  - Make intel_pstate notify frequency invariance code when no_turbo is
    turned on and off (Chen Yu).
 
  - Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
    Pandruvada).
 
  - Make cpufreq avoid unnecessary frequency updates due to mismatch
    between hardware and the frequency table (Viresh Kumar).
 
  - Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
    code (Viresh Kumar).
 
  - Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
    calling convention for some driver callbacks consistent (Rafael
    Wysocki).
 
  - Avoid accessing half-initialized cpufreq policies from the show()
    and store() sysfs functions (Schspa Shi).
 
  - Rearrange cpufreq_offline() to make the calling convention for some
    driver callbacks consistent (Schspa Shi).
 
  - Update CPPC handling in cpufreq (Pierre Gondois).
 
  - Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).
 
  - Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
    Hansson).
 
  - Improve the way genpd deals with its governors (Ulf Hansson).
 
  - Update the turbostat utility to version 2022.04.16 (Len Brown,
    Dan Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen
    Yu).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmKL3hsSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxW4oP/RzMh6dclWXs3J/gUCKTqRepq6cb80tq
 Q2r9xRRHwy6ZH/PVddGDHmhQ7d3NAv13s4srA9kznZognF3hzuxnGau226ilDqHh
 qxVSBRjWY9ijxRBvkcCaa6HZm4Chb91pUX0CLpdYSl9BTgIdk66HZYaMsKhHU/di
 j7KKHPdKyyQkssWnMjGEyuaF+UebiEgISCF3+X0eb6c1m7GHXpgLJVxNy0pKkUdK
 j+n6+ms12OlVLtg1eIl0J5824w/rkK3ZdqfEXJSq++mNMqSj/KCI3yWpzsLKp9AB
 xxhox/tPgJVyON8Vtbb2IkWkiQUKeSrAGIUYXWmnwIZYLPSGD7BPzr82Cxr7S/ez
 imMB+1Qd3SsOQ9EdI9rGYgNsEF2vOs1xjMehSdUdmTz148IzBOBt4YyQeb/mfXqH
 nh9eVuFCzqH1lAayYt6iP1+V5gQn9as/+rR91k4k4A6OKXomuQUGORLeHfuKMfNH
 eBZ72tdXqiq6z+ag3lY3pBAMSm11epCOa3VR6QNaC7hrlY3AZP+o3tIUL6W813b+
 V3l1gWApGHZE1hiDM95dll/dIt9IZpTRd3dlqF/YnFW7fPDrz71EGvhrZpO7vdO0
 /G6eJcCDjqJVcbCE8Y77I6/AXjpVQ7PRPeNx6aW7jPcQhpVIgcsF2BGjk9anjXDs
 3yHJs9R/HMmA
 =Hewm
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add support for 'artificial' Energy Models in which power
  numbers for different entities may be in different scales, add support
  for some new hardware, fix bugs and clean up code in multiple places.

  Specifics:

   - Update the Energy Model support code to allow the Energy Model to
     be artificial, which means that the power values may not be on a
     uniform scale with other devices providing power information, and
     update the cpufreq_cooling and devfreq_cooling thermal drivers to
     support artificial Energy Models (Lukasz Luba).

   - Make DTPM check the Energy Model type (Lukasz Luba).

   - Fix policy counter decrementation in cpufreq if Energy Model is in
     use (Pierre Gondois).

   - Add CPU-based scaling support to passive devfreq governor (Saravana
     Kannan, Chanwoo Choi).

   - Update the rk3399_dmc devfreq driver (Brian Norris).

   - Export dev_pm_ops instead of suspend() and resume() in the IIO
     chemical scd30 driver (Jonathan Cameron).

   - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
     PM-runtime counterparts (Jonathan Cameron).

   - Move symbol exports in the IIO chemical scd30 driver into the
     IIO_SCD30 namespace (Jonathan Cameron).

   - Avoid device PM-runtime usage count underflows (Rafael Wysocki).

   - Allow dynamic debug to control printing of PM messages (David
     Cohen).

   - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
     Bai).

   - Preserve ACPI-table override during hibernation (Amadeusz
     Sławiński).

   - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).

   - Make Intel RAPL power capping driver support the RaptorLake and
     AlderLake N processors (Zhang Rui, Sumeet Pawnikar).

   - Remove redundant store to value after multiply in the RAPL power
     capping driver (Colin Ian King).

   - Add AlderLake processor support to the intel_idle driver (Zhang
     Rui).

   - Fix regression leading to no genpd governor in the PSCI cpuidle
     driver and fix the riscv-sbi cpuidle driver to allow a genpd
     governor to be used (Ulf Hansson).

   - Fix cpufreq governor clean up code to avoid using kfree() directly
     to free kobject-based items (Kevin Hao).

   - Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe
     Leroy).

   - Make intel_pstate notify frequency invariance code when no_turbo is
     turned on and off (Chen Yu).

   - Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
     Pandruvada).

   - Make cpufreq avoid unnecessary frequency updates due to mismatch
     between hardware and the frequency table (Viresh Kumar).

   - Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
     code (Viresh Kumar).

   - Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
     calling convention for some driver callbacks consistent (Rafael
     Wysocki).

   - Avoid accessing half-initialized cpufreq policies from the show()
     and store() sysfs functions (Schspa Shi).

   - Rearrange cpufreq_offline() to make the calling convention for some
     driver callbacks consistent (Schspa Shi).

   - Update CPPC handling in cpufreq (Pierre Gondois).

   - Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).

   - Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
     Hansson).

   - Improve the way genpd deals with its governors (Ulf Hansson).

   - Update the turbostat utility to version 2022.04.16 (Len Brown, Dan
     Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen Yu)"

* tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (94 commits)
  PM: domains: Trust domain-idle-states from DT to be correct by genpd
  PM: domains: Measure power-on/off latencies in genpd based on a governor
  PM: domains: Allocate governor data dynamically based on a genpd governor
  PM: domains: Clean up some code in pm_genpd_init() and genpd_remove()
  PM: domains: Fix initialization of genpd's next_wakeup
  PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd
  PM: domains: Measure suspend/resume latencies in genpd based on governor
  PM: domains: Move the next_wakeup variable into the struct gpd_timing_data
  PM: domains: Allocate gpd_timing_data dynamically based on governor
  PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain()
  PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd
  PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd
  PM: domains: Drop redundant code for genpd always-on governor
  PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor
  powercap: intel_rapl: remove redundant store to value after multiply
  cpufreq: CPPC: Enable dvfs_possible_from_any_cpu
  cpufreq: CPPC: Enable fast_switch
  ACPI: CPPC: Assume no transition latency if no PCCT
  ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
  ACPI: CPPC: Check _OSC for flexible address space
  ...
2022-05-24 16:04:25 -07:00
Linus Torvalds 0bf13a8436 kernel-hardening updates for v5.19-rc1
- usercopy hardening expanded to check other allocation types
   (Matthew Wilcox, Yuanzheng Song)
 
 - arm64 stackleak behavioral improvements (Mark Rutland)
 
 - arm64 CFI code gen improvement (Sami Tolvanen)
 
 - LoadPin LSM block dev API adjustment (Christoph Hellwig)
 
 - Clang randstruct support (Bill Wendling, Kees Cook)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmKL1kMWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlz6D/9lYEwDQYwKVK6fsXdgcs/eUkqc
 P06KGm7jDiYiua34LMpgu35wkRcxVDzB92kzQmt7yaVqhlIGjO9wnP+uZrq8q/LS
 X9FSb457fREg0XLPX5XC60abHYyikvgJMf06dSLaBcRq1Wzqwp5JZPpLZJUAM2ab
 rM1Vq0brfF1+lPAPECx1sYYNksP9XTw0dtzUu8D9tlTQDFAhKYhV6Io5yRFkA4JH
 ELSHjJHlNgLYeZE5IfWHRQBb+yofjnt61IwoVkqa5lSfoyvKpBPF5G+3gOgtdkyv
 A8So2aG/bMNUUY80Th5ojiZ6V7z5SYjUmHRil6I/swAdkc825n2wM+AQqsxv6U4I
 VvGz3cxaKklERw5N+EJw4amivcgm1jEppZ7qCx9ysLwVg/LI050qhv/T10TYPmOX
 0sQEpZvbKuqGb6nzWo6DME8OpZ27yIa/oRzBHdkIkfkEefYlKWS+dfvWb/73cltj
 jx066Znk1hHZWGT48EsRmxdGAHn4kfIMcMgIs1ki1OO2II6LoXyaFJ0wSAYItxpz
 5gCmDMjkGFRrtXXPEhi6kfKKpOuQux+BmpbVfEzox7Gnrf45sp92cYLncmpAsFB3
 91nPa4/utqb/9ijFCIinazLdcUBPO8I1C8FOHDWSFCnNt4d3j2ozpLbrKWyQsm7+
 RCGdcy+NU/FH1FwZlg==
 =nxsC
 -----END PGP SIGNATURE-----

Merge tag 'kernel-hardening-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kernel hardening updates from Kees Cook:

 - usercopy hardening expanded to check other allocation types (Matthew
   Wilcox, Yuanzheng Song)

 - arm64 stackleak behavioral improvements (Mark Rutland)

 - arm64 CFI code gen improvement (Sami Tolvanen)

 - LoadPin LSM block dev API adjustment (Christoph Hellwig)

 - Clang randstruct support (Bill Wendling, Kees Cook)

* tag 'kernel-hardening-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (34 commits)
  loadpin: stop using bdevname
  mm: usercopy: move the virt_addr_valid() below the is_vmalloc_addr()
  gcc-plugins: randstruct: Remove cast exception handling
  af_unix: Silence randstruct GCC plugin warning
  niu: Silence randstruct warnings
  big_keys: Use struct for internal payload
  gcc-plugins: Change all version strings match kernel
  randomize_kstack: Improve docs on requirements/rationale
  lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n
  arm64: entry: use stackleak_erase_on_task_stack()
  stackleak: add on/off stack variants
  lkdtm/stackleak: check stack boundaries
  lkdtm/stackleak: prevent unexpected stack usage
  lkdtm/stackleak: rework boundary management
  lkdtm/stackleak: avoid spurious failure
  stackleak: rework poison scanning
  stackleak: rework stack high bound handling
  stackleak: clarify variable names
  stackleak: rework stack low bound handling
  stackleak: remove redundant check
  ...
2022-05-24 12:27:09 -07:00
Linus Torvalds ac2ab99072 Random number generator updates for Linux 5.19-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmKKpM8ACgkQSfxwEqXe
 A6726w/+OJimGd4arvpSmdn+vxepSyDLgKfwM0x5zprRVd16xg8CjJr4eMonTesq
 YvtJRqpetb53MB+sMhutlvQqQzrjtf2MBkgPwF4I2gUrk7vLD45Q+AGdGhi/rUwz
 wHGA7xg1FHLHia2M/9idSqi8QlZmUP4u4l5ZnMyTUHiwvRD6XOrWKfqvUSawNzyh
 hCWlTUxDrjizsW5YpsJX/MkRadSC8loJEk5ByZebow6nRPfurJvqfrcOMgHyNrbY
 pOZ/CGPxcetMqotL2TuuJt5wKmenqYhIWGAp3YM2SWWgU2ueBZekW8AYeMfgUcvh
 LWV93RpSuAnE5wsdjIULvjFnEDJBf8ihfMnMrd9G5QjQu44tuKWfY2MghLSpYzaR
 V6UFbRmhrqhqiStHQXOvk1oqxtpbHlc9zzJLmvPmDJcbvzXQ9Opk5GVXAmdtnHnj
 M/ty3wGWxucY6mHqT8MkCShSSslbgEtc1pEIWHdrUgnaiSVoCVBEO+9LqLbjvOTm
 XA/6YtoiCE5FasK51pir1zVb2GORQn0v8HnuAOsusD/iPAlRQ/G5jZkaXbwRQI6j
 atYL1svqvSKn5POnzqAlMUXfMUr19K5xqJdp7i6qmlO1Vq6Z+tWbCQgD1JV+Wjkb
 CMyvXomFCFu4aYKGRE2SBRnWLRghG3kYHqEQ15yTPMQerxbUDNg=
 =SUr3
 -----END PGP SIGNATURE-----

Merge tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "These updates continue to refine the work began in 5.17 and 5.18 of
  modernizing the RNG's crypto and streamlining and documenting its
  code.

  New for 5.19, the updates aim to improve entropy collection methods
  and make some initial decisions regarding the "premature next" problem
  and our threat model. The cloc utility now reports that random.c is
  931 lines of code and 466 lines of comments, not that basic metrics
  like that mean all that much, but at the very least it tells you that
  this is very much a manageable driver now.

  Here's a summary of the various updates:

   - The random_get_entropy() function now always returns something at
     least minimally useful. This is the primary entropy source in most
     collectors, which in the best case expands to something like RDTSC,
     but prior to this change, in the worst case it would just return 0,
     contributing nothing. For 5.19, additional architectures are wired
     up, and architectures that are entirely missing a cycle counter now
     have a generic fallback path, which uses the highest resolution
     clock available from the timekeeping subsystem.

     Some of those clocks can actually be quite good, despite the CPU
     not having a cycle counter of its own, and going off-core for a
     stamp is generally thought to increase jitter, something positive
     from the perspective of entropy gathering. Done very early on in
     the development cycle, this has been sitting in next getting some
     testing for a while now and has relevant acks from the archs, so it
     should be pretty well tested and fine, but is nonetheless the thing
     I'll be keeping my eye on most closely.

   - Of particular note with the random_get_entropy() improvements is
     MIPS, which, on CPUs that lack the c0 count register, will now
     combine the high-speed but short-cycle c0 random register with the
     lower-speed but long-cycle generic fallback path.

   - With random_get_entropy() now always returning something useful,
     the interrupt handler now collects entropy in a consistent
     construction.

   - Rather than comparing two samples of random_get_entropy() for the
     jitter dance, the algorithm now tests many samples, and uses the
     amount of differing ones to determine whether or not jitter entropy
     is usable and how laborious it must be. The problem with comparing
     only two samples was that if the cycle counter was extremely slow,
     but just so happened to be on the cusp of a change, the slowness
     wouldn't be detected. Taking many samples fixes that to some
     degree.

     This, combined with the other improvements to random_get_entropy(),
     should make future unification of /dev/random and /dev/urandom
     maybe more possible. At the very least, were we to attempt it again
     today (we're not), it wouldn't break any of Guenter's test rigs
     that broke when we tried it with 5.18. So, not today, but perhaps
     down the road, that's something we can revisit.

   - We attempt to reseed the RNG immediately upon waking up from system
     suspend or hibernation, making use of the various timestamps about
     suspend time and such available, as well as the usual inputs such
     as RDRAND when available.

   - Batched randomness now falls back to ordinary randomness before the
     RNG is initialized. This provides more consistent guarantees to the
     types of random numbers being returned by the various accessors.

   - The "pre-init injection" code is now gone for good. I suspect you
     in particular will be happy to read that, as I recall you
     expressing your distaste for it a few months ago. Instead, to avoid
     a "premature first" issue, while still allowing for maximal amount
     of entropy availability during system boot, the first 128 bits of
     estimated entropy are used immediately as it arrives, with the next
     128 bits being buffered. And, as before, after the RNG has been
     fully initialized, it winds up reseeding anyway a few seconds later
     in most cases. This resulted in a pretty big simplification of the
     initialization code and let us remove various ad-hoc mechanisms
     like the ugly crng_pre_init_inject().

   - The RNG no longer pretends to handle the "premature next" security
     model, something that various academics and other RNG designs have
     tried to care about in the past. After an interesting mailing list
     thread, these issues are thought to be a) mainly academic and not
     practical at all, and b) actively harming the real security of the
     RNG by delaying new entropy additions after a potential compromise,
     making a potentially bad situation even worse. As well, in the
     first place, our RNG never even properly handled the premature next
     issue, so removing an incomplete solution to a fake problem was
     particularly nice.

     This allowed for numerous other simplifications in the code, which
     is a lot cleaner as a consequence. If you didn't see it before,
     https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ may be a
     thread worth skimming through.

   - While the interrupt handler received a separate code path years ago
     that avoids locks by using per-cpu data structures and a faster
     mixing algorithm, in order to reduce interrupt latency, input and
     disk events that are triggered in hardirq handlers were still
     hitting locks and more expensive algorithms. Those are now
     redirected to use the faster per-cpu data structures.

   - Rather than having the fake-crypto almost-siphash-based random32
     implementation be used right and left, and in many places where
     cryptographically secure randomness is desirable, the batched
     entropy code is now fast enough to replace that.

   - As usual, numerous code quality and documentation cleanups. For
     example, the initialization state machine now uses enum symbolic
     constants instead of just hard coding numbers everywhere.

   - Since the RNG initializes once, and then is always initialized
     thereafter, a pretty heavy amount of code used during that
     initialization is never used again. It is now completely cordoned
     off using static branches and it winds up in the .text.unlikely
     section so that it doesn't reduce cache compactness after the RNG
     is ready.

   - A variety of functions meant for waiting on the RNG to be
     initialized were only used by vsprintf, and in not a particularly
     optimal way. Replacing that usage with a more ordinary setup made
     it possible to remove those functions.

   - A cleanup of how we warn userspace about the use of uninitialized
     /dev/urandom and uninitialized get_random_bytes() usage.
     Interestingly, with the change you merged for 5.18 that attempts to
     use jitter (but does not block if it can't), the majority of users
     should never see those warnings for /dev/urandom at all now, and
     the one for in-kernel usage is mainly a debug thing.

   - The file_operations struct for /dev/[u]random now implements
     .read_iter and .write_iter instead of .read and .write, allowing it
     to also implement .splice_read and .splice_write, which makes
     splice(2) work again after it was broken here (and in many other
     places in the tree) during the set_fs() removal. This was a bit of
     a last minute arrival from Jens that hasn't had as much time to
     bake, so I'll be keeping my eye on this as well, but it seems
     fairly ordinary. Unfortunately, read_iter() is around 3% slower
     than read() in my tests, which I'm not thrilled about. But Jens and
     Al, spurred by this observation, seem to be making progress in
     removing the bottlenecks on the iter paths in the VFS layer in
     general, which should remove the performance gap for all drivers.

   - Assorted other bug fixes, cleanups, and optimizations.

   - A small SipHash cleanup"

* tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (49 commits)
  random: check for signals after page of pool writes
  random: wire up fops->splice_{read,write}_iter()
  random: convert to using fops->write_iter()
  random: convert to using fops->read_iter()
  random: unify batched entropy implementations
  random: move randomize_page() into mm where it belongs
  random: remove mostly unused async readiness notifier
  random: remove get_random_bytes_arch() and add rng_has_arch_random()
  random: move initialization functions out of hot pages
  random: make consistent use of buf and len
  random: use proper return types on get_random_{int,long}_wait()
  random: remove extern from functions in header
  random: use static branch for crng_ready()
  random: credit architectural init the exact amount
  random: handle latent entropy and command line from random_init()
  random: use proper jiffies comparison macro
  random: remove ratelimiting for in-kernel unseeded randomness
  random: move initialization out of reseeding hot path
  random: avoid initializing twice in credit race
  random: use symbolic constants for crng_init states
  ...
2022-05-24 11:58:10 -07:00
Linus Torvalds cfeb2522c3 Perf events changes for this cycle were:
Platform PMU changes:
 =====================
 
  - x86/intel:
     - Add new Intel Alder Lake and Raptor Lake support
 
  - x86/amd:
     - AMD Zen4 IBS extensions support
     - Add AMD PerfMonV2 support
     - Add AMD Fam19h Branch Sampling support
 
 Generic changes:
 ================
 
  - signal: Deliver SIGTRAP on perf event asynchronously if blocked
 
    Perf instrumentation can be driven via SIGTRAP, but this causes a problem
    when SIGTRAP is blocked by a task & terminate the task.
 
    Allow user-space to request these signals asynchronously (after they get
    unblocked) & also give the information to the signal handler when this
    happens:
 
      " To give user space the ability to clearly distinguish synchronous from
        asynchronous signals, introduce siginfo_t::si_perf_flags and
        TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is
        required in future).
 
        The resolution to the problem is then to (a) no longer force the signal
        (avoiding the terminations), but (b) tell user space via si_perf_flags
        if the signal was synchronous or not, so that such signals can be
        handled differently (e.g. let user space decide to ignore or consider
        the data imprecise). "
 
  - Unify/standardize the /sys/devices/cpu/events/* output format.
 
  - Misc fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLuiURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ioSRAAgM3PneFHn5MFiuV/8ZfP3xMHNUOYOCgN
 JhALRcUhDdL4N9pS0DSImfXvAlYPJ/TZK8qBRNDsRgygp5vjrbr9zH2HdZBW1gyV
 qi3bpuNS+METnfNyumAoBeOYbMIvpm3NDUX+w68Xvkd1g8ykyno8Zc2H2hj3IDsW
 cK3ErP0CZLsnBZsymy29/bxCYhfxsED6J06hOa8R3Tvl4XYg/27Z+tEuZ4GYeFS8
 VikulYB9RhRWUbhkzwjyRSbTWyvsuXP+xD28ymUIxXaNCDOwxK8uYtVepUFIBO8X
 cZgtwT2faV3y5ZAnz02M+/JZl+Jz5EPm037vNQp9aJsTuAbAGnxh/hL0cBVuDqhv
 Nh9wkqS8FqwAbtpvg/IeamzqN5z/Yn2Q/Jyk/4oWipmeddXWUL7sYVoSduTGJJkz
 cZz2ciNQbnOCzv0ZSjihrGMqPaT+/wI/iLW3ouLoZXpfTtVVRiiLuI1DDAZ1rd2r
 D6djV8JjHIs71V/6E9ahVATxq8yMdikd7u734rA5K3XSxIBTYrdshbOhddzgeE7d
 chQ7XvpQXDoFrZtxkHXP5iIeNF7fU9MWNWaEcsrZaWEB/8UpD6eL2if1Kl8mog+h
 J4+zR1LWRHh8TNRfos3yCP2PSbbS6LPVsYLJzP+bb+pxgqdJ+urxfmxoCtY5trNI
 zHT52xfdxSo=
 =UqYA
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:
 "Platform PMU changes:

   - x86/intel:
      - Add new Intel Alder Lake and Raptor Lake support

   - x86/amd:
      - AMD Zen4 IBS extensions support
      - Add AMD PerfMonV2 support
      - Add AMD Fam19h Branch Sampling support

  Generic changes:

   - signal: Deliver SIGTRAP on perf event asynchronously if blocked

     Perf instrumentation can be driven via SIGTRAP, but this causes a
     problem when SIGTRAP is blocked by a task & terminate the task.

     Allow user-space to request these signals asynchronously (after
     they get unblocked) & also give the information to the signal
     handler when this happens:

       "To give user space the ability to clearly distinguish
        synchronous from asynchronous signals, introduce
        siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for
        flags in case more binary information is required in future).

        The resolution to the problem is then to (a) no longer force the
        signal (avoiding the terminations), but (b) tell user space via
        si_perf_flags if the signal was synchronous or not, so that such
        signals can be handled differently (e.g. let user space decide
        to ignore or consider the data imprecise). "

   - Unify/standardize the /sys/devices/cpu/events/* output format.

   - Misc fixes & cleanups"

* tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  perf/x86/amd/core: Fix reloading events for SVM
  perf/x86/amd: Run AMD BRS code only on supported hw
  perf/x86/amd: Fix AMD BRS period adjustment
  perf/x86/amd: Remove unused variable 'hwc'
  perf/ibs: Fix comment
  perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute
  perf/amd/ibs: Add support for L3 miss filtering
  perf/amd/ibs: Use ->is_visible callback for dynamic attributes
  perf/amd/ibs: Cascade pmu init functions' return value
  perf/x86/uncore: Add new Alder Lake and Raptor Lake support
  perf/x86/uncore: Clean up uncore_pci_ids[]
  perf/x86/cstate: Add new Alder Lake and Raptor Lake support
  perf/x86/msr: Add new Alder Lake and Raptor Lake support
  perf/x86: Add new Alder Lake and Raptor Lake support
  perf/amd/ibs: Use interrupt regs ip for stack unwinding
  perf/x86/amd/core: Add PerfMonV2 overflow handling
  perf/x86/amd/core: Add PerfMonV2 counter control
  perf/x86/amd/core: Detect available counters
  perf/x86/amd/core: Detect PerfMonV2 support
  x86/msr: Add PerfCntrGlobal* registers
  ...
2022-05-24 10:59:38 -07:00
Linus Torvalds 22922deae1 Objtool changes for this cycle were:
- Comprehensive interface overhaul:
    =================================
 
    Objtool's interface has some issues:
 
      - Several features are done unconditionally, without any way to turn
        them off.  Some of them might be surprising.  This makes objtool
        tricky to use, and prevents porting individual features to other
        arches.
 
      - The config dependencies are too coarse-grained.  Objtool enablement is
        tied to CONFIG_STACK_VALIDATION, but it has several other features
        independent of that.
 
      - The objtool subcmds ("check" and "orc") are clumsy: "check" is really
        a subset of "orc", so it has all the same options.  The subcmd model
        has never really worked for objtool, as it only has a single purpose:
        "do some combination of things on an object file".
 
      - The '--lto' and '--vmlinux' options are nonsensical and have
        surprising behavior.
 
    Overhaul the interface:
 
       - get rid of subcmds
 
       - make all features individually selectable
 
       - remove and/or clarify confusing/obsolete options
 
       - update the documentation
 
       - fix some bugs found along the way
 
  - Fix x32 regression
 
  - Fix Kbuild cleanup bugs
 
  - Add scripts/objdump-func helper script to disassemble a single function from an object file.
 
  - Rewrite scripts/faddr2line to be section-aware, by basing it on 'readelf',
    moving it away from 'nm', which doesn't handle multiple sections well,
    which can result in decoding failure.
 
  - Rewrite & fix symbol handling - which had a number of bugs wrt. object files
    that don't have global symbols - which is rare but possible. Also fix a
    bunch of symbol handling bugs found along the way.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLtcURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jVQg//QM8nCNadJAVS9exVGX1DZI9pnf3OJaA9
 gOFML7Lv3MC+Lwdxt6Iv020rFVaeAnOcjPsis3dppFz62FZzzMWoemn5irg2BFiJ
 dp++UtJWTfKxgU2BHydU9uXD0kcJkD4AjBCIaFsgmTjAz8QvMGa9j0smuUm3cDSL
 0Bdid+LhkQqW3P2FiLWsSAzh4vqZmdwpXgERZRql8qD3NYk5hV4QDKs3gMguktat
 9gos4kGt0uwKfiEvmeNEXkoAwUsTvE/vqaOy9cVxxCqcWrrC+yQeBpwSoqhHK526
 dyHlwlYvBaPFqZnmquVUv21iv1MU6dUBJPhNIChke0NDTwVzSXdI75207FARyk5J
 3igSFEfJcU9zMvhAAsAjzD/uQP2ATowg5qa/V2xyWwtyaRgBleRffYiDsbhgDoNc
 R4/vI+vn/fQXouMhmmjPNYzu9uHQ+k89wQCJIY8Bswf7oNu6nKL3jJb/a/a7xhsH
 ZNqv+M0KEENTZcjBU2UHGyImApmkTlsp2mxUiiHs7QoV1hTfz+TcTXKPM1mIuJB8
 /HrVpv64CZ3S7p4JyGBUTNpci4mBjgBmwwAf16+dtaxyxxfoqReVWh3+bzsZbH+B
 kRjezWHh7/yCsoyDm7/LPgyPKEbozLLzMsTsjVJeWgeTgZ+xuqku3PTVctyzAI21
 DVL5oZe3iK4=
 =ARdm
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Comprehensive interface overhaul:
   =================================

   Objtool's interface has some issues:

     - Several features are done unconditionally, without any way to
       turn them off. Some of them might be surprising. This makes
       objtool tricky to use, and prevents porting individual features
       to other arches.

     - The config dependencies are too coarse-grained. Objtool
       enablement is tied to CONFIG_STACK_VALIDATION, but it has several
       other features independent of that.

     - The objtool subcmds ("check" and "orc") are clumsy: "check" is
       really a subset of "orc", so it has all the same options.

       The subcmd model has never really worked for objtool, as it only
       has a single purpose: "do some combination of things on an object
       file".

     - The '--lto' and '--vmlinux' options are nonsensical and have
       surprising behavior.

   Overhaul the interface:

      - get rid of subcmds

      - make all features individually selectable

      - remove and/or clarify confusing/obsolete options

      - update the documentation

      - fix some bugs found along the way

 - Fix x32 regression

 - Fix Kbuild cleanup bugs

 - Add scripts/objdump-func helper script to disassemble a single
   function from an object file.

 - Rewrite scripts/faddr2line to be section-aware, by basing it on
   'readelf', moving it away from 'nm', which doesn't handle multiple
   sections well, which can result in decoding failure.

 - Rewrite & fix symbol handling - which had a number of bugs wrt.
   object files that don't have global symbols - which is rare but
   possible. Also fix a bunch of symbol handling bugs found along the
   way.

* tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  objtool: Fix objtool regression on x32 systems
  objtool: Fix symbol creation
  scripts/faddr2line: Fix overlapping text section failures
  scripts: Create objdump-func helper script
  objtool: Remove libsubcmd.a when make clean
  objtool: Remove inat-tables.c when make clean
  objtool: Update documentation
  objtool: Remove --lto and --vmlinux in favor of --link
  objtool: Add HAVE_NOINSTR_VALIDATION
  objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"
  objtool: Make noinstr hacks optional
  objtool: Make jump label hack optional
  objtool: Make static call annotation optional
  objtool: Make stack validation frame-pointer-specific
  objtool: Add CONFIG_OBJTOOL
  objtool: Extricate sls from stack validation
  objtool: Rework ibt and extricate from stack validation
  objtool: Make stack validation optional
  objtool: Add option to print section addresses
  objtool: Don't print parentheses in function addresses
  ...
2022-05-24 10:36:38 -07:00
Linus Torvalds 2319be1356 Locking changes in this cycle were:
- rwsem cleanups & optimizations/fixes:
     - Conditionally wake waiters in reader/writer slowpaths
     - Always try to wake waiters in out_nolock path
 
  - Add try_cmpxchg64() implementation, with arch optimizations - and use it to
    micro-optimize sched_clock_{local,remote}()
 
  - Various force-inlining fixes to address objdump instrumentation-check warnings
 
  - Add lock contention tracepoints:
 
     lock:contention_begin
     lock:contention_end
 
  - Misc smaller fixes & cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLsrERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1js3g//cPR9PYlvZv87T2hI8VWKfNzapgSmwCsH
 1P+nk27Pef+jfxHr/N7YScvSD06+z2wIroLE3npPNETmNd1X8obBDThmeD4VI899
 J6h4sE0cFOpTG/mHeECFxqnDuzhdHiRHWS52RxOwTjZTpdbeKWZYueC0Mvqn+tIp
 UM2D2yTseIHs67ikxYtayU/iJgSZ+PYrMPv9nSVUjIFILmg7gMIz38OZYQzR84++
 auL3m8sAq/i2pjzDBbXMpfYeu177/tPHpPJr2rOErLEXWqK2K6op8+CbX4z3yv3z
 EBBhGiUNqDmFaFuIgg7Mx94SvPh8MBGexUnT0XA2aXPwyP9oAaenCk2CZ1j9u15m
 /Xp1A4KNvg1WY8jHu5ZM4VIEXQ7d6Gwtbej7IeovUxBD6y7Trb3+rxb7PVdZX941
 uVGjss1Lgk70wUQqBqBPmBm08O6NUF3vekHlona5CZTQgEF84zD7+7D++QPaAZo7
 kiuNUptdgfq6X0xqgP88GX1KU85gJYoF5Q13vb7lAcv19QhRG5JBJeWMYiXEmg12
 Ktl97Sru0zCpCY1NCvwsBll09SLVO9kX3Lq+QFD8bFMZ0obsGIBotHq1qH6U7cH8
 RY6esVBF/1/+qdrxOKs8qowlJ4EUp/3bX0R/MKYHJJbulj/aaE916HvMsoN/QR4Y
 oW7GcxMQGLE=
 =gaS5
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - rwsem cleanups & optimizations/fixes:
    - Conditionally wake waiters in reader/writer slowpaths
    - Always try to wake waiters in out_nolock path

 - Add try_cmpxchg64() implementation, with arch optimizations - and use
   it to micro-optimize sched_clock_{local,remote}()

 - Various force-inlining fixes to address objdump instrumentation-check
   warnings

 - Add lock contention tracepoints:

    lock:contention_begin
    lock:contention_end

 - Misc smaller fixes & cleanups

* tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}
  locking/atomic/x86: Introduce arch_try_cmpxchg64
  locking/atomic: Add generic try_cmpxchg64 support
  futex: Remove a PREEMPT_RT_FULL reference.
  locking/qrwlock: Change "queue rwlock" to "queued rwlock"
  lockdep: Delete local_irq_enable_in_hardirq()
  locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
  locking: Apply contention tracepoints in the slow path
  locking: Add lock contention tracepoints
  locking/rwsem: Always try to wake waiters in out_nolock path
  locking/rwsem: Conditionally wake waiters in reader/writer slowpaths
  locking/rwsem: No need to check for handoff bit if wait queue empty
  lockdep: Fix -Wunused-parameter for _THIS_IP_
  x86/mm: Force-inline __phys_addr_nodebug()
  x86/kvm/svm: Force-inline GHCB accessors
  task_stack, x86/cea: Force-inline stack helpers
2022-05-24 10:18:23 -07:00
Christophe Leroy 5d7c854593 livepatch: Remove klp_arch_set_pc() and asm/livepatch.h
All three versions of klp_arch_set_pc() do exactly the same: they
call ftrace_instruction_pointer_set().

Call ftrace_instruction_pointer_set() directly and remove
klp_arch_set_pc().

As klp_arch_set_pc() was the only thing remaining in asm/livepatch.h
on x86 and s390, remove asm/livepatch.h

livepatch.h remains on powerpc but its content is exclusively used
by powerpc specific code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2022-05-24 08:46:37 +02:00
Linus Torvalds 95fbef17e8 s390 updates for 5.19 merge window
- Make use of the IBM z16 processor activity instrumentation facility
   to count cryptography operations: add a new PMU device driver so
   that perf can make use of this.
 
 - Add new IBM z16 extended counter set to cpumf support.
 
 - Add vdso randomization support.
 
 - Add missing KCSAN instrumentation to barriers and spinlocks, which
   should make s390's KCSAN support complete.
 
 - Add support for IPL-complete-control facility: notify the hypervisor
   that kexec finished work and the kernel starts.
 
 - Improve error logging for PCI.
 
 - Various small changes to workaround llvm's integrated assembler
   limitations, and one bug, to make it finally possible to compile the
   kernel with llvm's integrated assembler. This also requires to raise
   the minimum clang version to 14.0.0.
 
 - Various other small enhancements, bug fixes, and cleanups all over
   the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmKLedYACgkQIg7DeRsp
 bsKDfA//TR/8jyyrNs75VDUPiS0UgMgHfjinQqLa8qwaQxCxA0J31I9nYiDxSfp/
 E8hTCLyARnPX0YpcLCEI0ChC6Ad+LElGr6kctdV0FTQopRVreVRKYe2bmrsvXNqs
 4OzFNGZ8mnvMMSi1IQ/A7Yq/DZjbEON5VfY3iJv8djyC7qVNDgngdiQxtIJ+3eq/
 77pw3VEgtuI2lVC3O9fEsdqRUyB5UHS3GSknmc8+KuRmOorir0JwMvxQ9xARZJYE
 6FbTnSDW1YGI6TBoa/zFberqsldU/qJzo40JmPr27a2qbEmysc8kw60r+cIFsxgC
 H432/aS9102CnsocaY7CtOvs+TLAK8dYeU31enxUGXnICMJ0MuuqnNnAfHrJziVs
 ZnK3iUfPmMMewYfSefn8Sk87kJR5ggGePF++44GEqd87lRwZUnC+hd19dNtzzgSx
 Br4dRYrdQl+w2nqBHGCGW2288svtiPHslnhaQqy343fS9q0o3Mebqx1e9be7t9/K
 IDFQ00Cd3FS2jhphCbCrq2vJTmByhTQqCiNoEJ6vZK2B3ksrJUotfdwI+5etE2Kj
 8sOPwOPyIAI9HnXFVknGIl/u5kaPuHazkZu6u3Or0miVZYw01pov1am0ArcFjeMX
 /4Js/lI4O/wXvRzVk0rILrAZFDirAHvqqx+aI20cegTQU2C8mHY=
 =W+1k
 -----END PGP SIGNATURE-----

Merge tag 's390-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Make use of the IBM z16 processor activity instrumentation facility
   to count cryptography operations: add a new PMU device driver so that
   perf can make use of this.

 - Add new IBM z16 extended counter set to cpumf support.

 - Add vdso randomization support.

 - Add missing KCSAN instrumentation to barriers and spinlocks, which
   should make s390's KCSAN support complete.

 - Add support for IPL-complete-control facility: notify the hypervisor
   that kexec finished work and the kernel starts.

 - Improve error logging for PCI.

 - Various small changes to workaround llvm's integrated assembler
   limitations, and one bug, to make it finally possible to compile the
   kernel with llvm's integrated assembler. This also requires to raise
   the minimum clang version to 14.0.0.

 - Various other small enhancements, bug fixes, and cleanups all over
   the place.

* tag 's390-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
  s390/head: get rid of 31 bit leftovers
  scripts/min-tool-version.sh: raise minimum clang version to 14.0.0 for s390
  s390/boot: do not emit debug info for assembly with llvm's IAS
  s390/boot: workaround llvm IAS bug
  s390/purgatory: workaround llvm's IAS limitations
  s390/entry: workaround llvm's IAS limitations
  s390/alternatives: remove padding generation code
  s390/alternatives: provide identical sized orginal/alternative sequences
  s390/cpumf: add new extended counter set for IBM z16
  s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES
  s390/stp: clock_delta should be signed
  s390/stp: fix todoff size
  s390/pai: add support for cryptography counters
  entry: Rename arch_check_user_regs() to arch_enter_from_user_mode()
  s390/compat: cleanup compat_linux.h header file
  s390/entry: remove broken and not needed code
  s390/boot: convert parmarea to C
  s390/boot: convert initial lowcore to C
  s390/ptrace: move short psw definitions to ptrace header file
  s390/head: initialize all new psws
  ...
2022-05-23 21:01:30 -07:00
Linus Torvalds 8443516da6 platform-drivers-x86 for v5.19-1
Highlights:
  -  New drivers:
     -  Intel "In Field Scan" (IFS) support
     -  Winmate FM07/FM07P buttons
     -  Mellanox SN2201 support
  -  AMD PMC driver enhancements
  -  Lots of various other small fixes and hardware-id additions
 
 The following is an automated git shortlog grouped by driver:
 
 Documentation:
  -  In-Field Scan
 
 Documentation/ABI:
  -  Add new attributes for mlxreg-io sysfs interfaces
  -  sysfs-class-firmware-attributes: Misc. cleanups
  -  sysfs-class-firmware-attributes: Fix Sphinx errors
  -  sysfs-driver-intel_sdsi: Fix sphinx warnings
 
 acerhdf:
  -  Cleanup str_starts_with()
 
 amd-pmc:
  -  Fix build error unused-function
  -  Shuffle location of amd_pmc_get_smu_version()
  -  Avoid reading SMU version at probe time
  -  Move FCH init to first use
  -  Move SMU logging setup out of init
  -  Fix compilation without CONFIG_SUSPEND
 
 amd_hsmp:
  -  Add HSMP protocol version 5 messages
 
 asus-nb-wmi:
  -  Add keymap for MyASUS key
 
 asus-wmi:
  -  Update unknown code message
  -  Use kobj_to_dev()
  -  Fix driver not binding when fan curve control probe fails
  -  Potential buffer overflow in asus_wmi_evaluate_method_buf()
 
 barco-p50-gpio:
  -  Fix duplicate included linux/io.h
 
 dell-laptop:
  -  Add quirk entry for Latitude 7520
 
 gigabyte-wmi:
  -  Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
  -  added support for B660 GAMING X DDR4 motherboard
 
 hp-wmi:
  -  Correct code style related issues
 
 intel-hid:
  -  fix _DSM function index handling
 
 intel-uncore-freq:
  -  Prevent driver loading in guests
 
 intel_cht_int33fe:
  -  Set driver data
 
 platform/mellanox:
  -  Add support for new SN2201 system
 
 platform/surface:
  -  aggregator: Fix initialization order when compiling as builtin module
  -  gpe: Add support for Surface Pro 8
 
 platform/x86/dell:
  -  add buffer allocation/free functions for SMI calls
 
 platform/x86/intel:
  -  Fix 'rmmod pmt_telemetry' panic
  -  pmc/core: Use kobj_to_dev()
  -  pmc/core: change pmc_lpm_modes to static
 
 platform/x86/intel/ifs:
  -  Add CPU_SUP_INTEL dependency
  -  add ABI documentation for IFS
  -  Add IFS sysfs interface
  -  Add scan test support
  -  Authenticate and copy to secured memory
  -  Check IFS Image sanity
  -  Read IFS firmware image
  -  Add stub driver for In-Field Scan
 
 platform/x86/intel/sdsi:
  -  Fix bug in multi packet reads
  -  Poll on ready bit for writes
  -  Handle leaky bucket
 
 platform_data/mlxreg:
  -  Add field for notification callback
 
 pmc_atom:
  -  dont export pmc_atom_read - no modular users
  -  remove unused pmc_atom_write()
 
 samsung-laptop:
  -  use kobj_to_dev()
  -  Fix an unsigned comparison which can never be negative
 
 stop_machine:
  -  Add stop_core_cpuslocked() for per-core operations
 
 think-lmi:
  -  certificate support clean ups
 
 thinkpad_acpi:
  -  Correct dual fan probe
  -  Add a s2idle resume quirk for a number of laptops
  -  Convert btusb DMI list to quirks
 
 tools/power/x86/intel-speed-select:
  -  Fix warning for perf_cap.cpu
  -  Display error on turbo mode disabled
  -  fix build failure when using -Wl,--as-needed
 
 toshiba_acpi:
  -  use kobj_to_dev()
 
 trace:
  -  platform/x86/intel/ifs: Add trace point to track Intel IFS operations
 
 winmate-fm07-keys:
  -  Winmate FM07/FM07P buttons
 
 wmi:
  -  replace usage of found with dedicated list iterator variable
 
 x86/microcode/intel:
  -  Expose collect_cpu_info_early() for IFS
 
 x86/msr-index:
  -  Define INTEGRITY_CAPABILITIES MSR
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmKKlA0UHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9w0Iwf+PYoq7qtU6j6N2f8gL2s65JpKiSPP
 CkgnCzTP+khvNnTWMQS8RW9VE6YrHXmN/+d3UAvRrHsOYm3nyZT5aPju9xJ6Xyfn
 5ZdMVvYxz7cm3lC6ay8AQt0Cmy6im/+lzP5vA5K68IYh0fPX/dvuOU57pNvXYFfk
 Yz5/Gm0t0C4CKVqkcdU/zkNawHP+2+SyQe+Ua2srz7S3DAqUci0lqLr/w9Xk2Yij
 nCgEWFB1Qjd2NoyRRe44ksLQ0dXpD4ADDzED+KPp6VTGnw61Eznf9319Z5ONNa/O
 VAaSCcDNKps8d3ZpfCpLb3Rs4ztBCkRnkLFczJBgPsBiuDmyTT2/yeEtNg==
 =HdEG
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver updates from Hans de Goede:
 "This includes some small changes to kernel/stop_machine.c and arch/x86
  which are deps of the new Intel IFS support.

  Highlights:

   - New drivers:
       - Intel "In Field Scan" (IFS) support
       - Winmate FM07/FM07P buttons
       - Mellanox SN2201 support

   -  AMD PMC driver enhancements

   -  Lots of various other small fixes and hardware-id additions"

* tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits)
  platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency
  platform/x86: intel_cht_int33fe: Set driver data
  platform/x86: intel-hid: fix _DSM function index handling
  platform/x86: toshiba_acpi: use kobj_to_dev()
  platform/x86: samsung-laptop: use kobj_to_dev()
  platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
  tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
  tools/power/x86/intel-speed-select: Display error on turbo mode disabled
  Documentation: In-Field Scan
  platform/x86/intel/ifs: add ABI documentation for IFS
  trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
  platform/x86/intel/ifs: Add IFS sysfs interface
  platform/x86/intel/ifs: Add scan test support
  platform/x86/intel/ifs: Authenticate and copy to secured memory
  platform/x86/intel/ifs: Check IFS Image sanity
  platform/x86/intel/ifs: Read IFS firmware image
  platform/x86/intel/ifs: Add stub driver for In-Field Scan
  stop_machine: Add stop_core_cpuslocked() for per-core operations
  x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
  x86/microcode/intel: Expose collect_cpu_info_early() for IFS
  ...
2022-05-23 20:38:39 -07:00
Linus Torvalds abc8babefb - A gargen variety of fixes which don't fit any other tip bucket:
- Remove function export
  - Correct asm constraint
  - Fix __setup handlers retval
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKL6VkACgkQEsHwGGHe
 VUqs6g/+Ikpd4Mrou4P5Ul8QNdN9mEzwUfW6i8VpoA3h1L6mKkZxbUsbSz9xInjw
 MAhrcevujW6GwdQdus2sUcSlX+jxl6c/IlMdf8RegNPY/JBPDX4dRA7rPetvZEDm
 ZiIYVTiEzJoOzPDJeO7a3v5EHPsY6CjsCFhGz7hjIcrwQjzCLkL5MqG+WDAtebe+
 QVdbllD2RlZNPDyHYE5Lqh1h+Y0e4n6kS7LCWxexfHlNOZ5KBRVyIJvz/xOZFZ1/
 9oX0UDD2gfH5chLs8GKsr7cZYERMtNlKBPoxGzl8iKF4iUeiksdj3P5y+mdcFaDG
 YbM7aXewmbyLyiCkh1zXU6Mw3lK1VfUtVXtEYj+qXf1jWp59ctNEJkc6/VAcaKh7
 oS7MNG7Y44B8XwdH7MiqDE7eVCnqEjIR+BIiwjyXNLFP1AXZMAXuBzXPF/vZ3Gyf
 3N5vzO4VNEN6Oa1TReSspKwYvq2uPtHMjLX2rT6Py2ru32mj2dCc5E7GD83RKL8V
 vDIz4VGOZyGfjp6gClMBsyK4mYwSwgXbnOci7DJn56mMf2qzBJITILXc31zz4gX2
 E9kiBu/4Mwjnrx9QRpCNXu7iddBA3YM2NMtNlwBcCgZOFaFz/yOx9TpnugF17WHQ
 VVtQi8wlcsS+F05Y11b7euusMQyk1EpWabIrw8UQd+61Dwpz58Q=
 =/WGB
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:
 "A variety of fixes which don't fit any other tip bucket:

   - Remove unnecessary function export

   - Correct asm constraint

   - Fix __setup handlers retval"

* tag 'x86_misc_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Cleanup the control_va_addr_alignment() __setup handler
  x86: Fix return value of __setup handlers
  x86/delay: Fix the wrong asm constraint in delay_loop()
  x86/amd_nb: Unexport amd_cache_northbridges()
2022-05-23 19:32:59 -07:00
Linus Torvalds 3e2cbc016b - Add Raptor Lake to the set of CPU models which support splitlock
- Make life miserable for apps using split locks by slowing them down
 considerably while the rest of the system remains responsive. The hope
 is it will hurt more and people will really fix their misaligned locks
 apps. As a result, free a TIF bit.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKL5PQACgkQEsHwGGHe
 VUrz1Q//QjAKyKsAwCzGSPergtnZp9drimSuNsZAz6/xL8wFnn2nfWJTxugNF5jg
 n0Hal2oUGC8lg13mliB7NuDNu4RUWpkFzTzcIbPT8K9h7CUBdQPzqS7E3/p4s/eG
 ZCHp8psBGNp8+/+/LFfu9yhzYsAH9ji5KWmOzTVx9UdP3ovgR8BuCI7FCVJSfRz7
 cY690XgvcuKoXKckVNaCcoQXPJxykfk4Y1yt1TpITqivFbs2I0vLgzEhoRcTAhPA
 nX3pR3uy6oaA6rZAapRt8lbLWOwIEWoI0Tt1v+r5p28+nFiCRfm1XdPYK6CDBlox
 UuMBK4WyvSKjKHLu3wEdLCvYbs1kw2l9pXvS3hrqsKhbdeXKrxrNZ3zshwFMAYap
 MY/nSTsKSWUUgMgUbWI084csapGFB+hxwY8OVr6JXbxE8YYD/yCbPGOe1cLI7MMt
 /H3F6vNqSzdp1N3mAaaKVxiiT21lHIn6oJuSZcDE5sOvBwvpXsOp/w3FxhJCOX49
 PXrZLZmSHkDQSbh1XnvT/a+rq3XX1TFXFz71HYZf1yDk+xTijECglNtGnGSdj2Za
 iOw6M8VduV5Wy3ED9ubonruuHEJn6njpx/MH1B9+mAZsuLBpmuYFBxOn6AHOkXSb
 MVJD4flHXj0ugYm4Q5Y3yi24iWLsRI9utTOU079VL6i6DmFXeZc=
 =svvI
 -----END PGP SIGNATURE-----

Merge tag 'x86_splitlock_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 splitlock updates from Borislav Petkov:

 - Add Raptor Lake to the set of CPU models which support splitlock

 - Make life miserable for apps using split locks by slowing them down
   considerably while the rest of the system remains responsive. The
   hope is it will hurt more and people will really fix their misaligned
   locks apps. As a result, free a TIF bit.

* tag 'x86_splitlock_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/split_lock: Enable the split lock feature on Raptor Lake
  x86/split-lock: Remove unused TIF_SLD bit
  x86/split_lock: Make life miserable for split lockers
2022-05-23 19:24:47 -07:00