OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Sean Christopherson	28360f8870	KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit Determine whether or not new events can be injected after checking nested events. If a VM-Exit occurred during nested event handling, any previous event that needed re-injection is gone from's KVM perspective; the event is captured in the vmc*12 VM-Exit information, but doesn't exist in terms of what needs to be done for entry to L1. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-19-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:09 -04:00
Sean Christopherson	6c593b5276	KVM: x86: Hoist nested event checks above event injection logic Perform nested event checks before re-injecting exceptions/events into L2. If a pending exception causes VM-Exit to L1, re-injecting events into vmcs02 is premature and wasted effort. Take care to ensure events that need to be re-injected are still re-injected if checking for nested events "fails", i.e. if KVM needs to force an immediate entry+exit to complete the to-be-re-injecteed event. Keep the "can_inject" logic the same for now; it too can be pushed below the nested checks, but is a slightly riskier change (see past bugs about events not being properly purged on nested VM-Exit). Add and/or modify comments to better document the various interactions. Of note is the comment regarding "blocking" previously injected NMIs and IRQs if an exception is pending. The old comment isn't wrong strictly speaking, but it failed to capture the reason why the logic even exists. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-18-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:09 -04:00
Sean Christopherson	81601495c5	KVM: x86: Use kvm_queue_exception_e() to queue #DF Queue #DF by recursing on kvm_multiple_exception() by way of kvm_queue_exception_e() instead of open coding the behavior. This will allow KVM to Just Work when a future commit moves exception interception checks (for L2 => L1) into kvm_multiple_exception(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-17-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:08 -04:00
Sean Christopherson	72c14e00bd	KVM: x86: Formalize blocking of nested pending exceptions Capture nested_run_pending as block_pending_exceptions so that the logic of why exceptions are blocked only needs to be documented once instead of at every place that employs the logic. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-16-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:08 -04:00
Sean Christopherson	d4963e319f	KVM: x86: Make kvm_queued_exception a properly named, visible struct Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch in anticipation of adding a second instance in kvm_vcpu_arch to handle exceptions that occur when vectoring an injected exception and are morphed to VM-Exit instead of leading to #DF. Opportunistically take advantage of the churn to rename "nr" to "vector". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-15-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:08 -04:00
Sean Christopherson	6ad75c5c99	KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception Rename the kvm_x86_ops hook for exception injection to better reflect reality, and to align with pretty much every other related function name in KVM. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-14-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:07 -04:00
Sean Christopherson	bfcb08a0b9	KVM: VMX: Inject #PF on ENCLS as "emulated" #PF Treat #PFs that occur during emulation of ENCLS as, wait for it, emulated page faults. Practically speaking, this is a glorified nop as the exception is never of the nested flavor, and it's extremely unlikely the guest is relying on the side effect of an implicit INVLPG on the faulting address. Fixes: `70210c044b` ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-13-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:07 -04:00
Sean Christopherson	593a5c2e3c	KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit Clear mtf_pending on nested VM-Exit instead of handling the clear on a case-by-case basis in vmx_check_nested_events(). The pending MTF should never survive nested VM-Exit, as it is a property of KVM's run of the current L2, i.e. should never affect the next L2 run by L1. In practice, this is likely a nop as getting to L1 with nested_run_pending is impossible, and KVM doesn't correctly handle morphing a pending exception that occurs on a prior injected exception (need for re-injected exception being the other case where MTF isn't cleared). However, KVM will hopefully soon correctly deal with a pending exception on top of an injected exception. Add a TODO to document that KVM has an inversion priority bug between SMIs and MTF (and trap-like #DBS), and that KVM also doesn't properly save/restore MTF across SMI/RSM. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-12-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:07 -04:00
Sean Christopherson	c2086eca86	KVM: nVMX: Ignore SIPI that arrives in L2 when vCPU is not in WFS Fall through to handling other pending exception/events for L2 if SIPI is pending while the CPU is not in Wait-for-SIPI. KVM correctly ignores the event, but incorrectly returns immediately, e.g. a SIPI coincident with another event could lead to KVM incorrectly routing the event to L1 instead of L2. Fixes: `bf0cd88ce3` ("KVM: x86: emulate wait-for-SIPI and SIPI-VMExit") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-11-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:06 -04:00
Sean Christopherson	0701ec903e	KVM: x86: Use DR7_GD macro instead of open coding check in emulator Use DR7_GD in the emulator instead of open coding the check, and drop a comically wrong comment. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-10-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:06 -04:00
Sean Christopherson	5623f751bd	KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1) Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or trap-like depending the sub-type of #DB, and effectively defer the decision of what to do with the #DB to the caller. For the emulator's two calls to exception_type(), treat the #DB as fault-like, as the emulator handles only code breakpoint and general detect #DBs, both of which are fault-like. For event injection, which uses exception_type() to determine whether to set EFLAGS.RF=1 on the stack, keep the current behavior of not setting RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs, so exempting by failing the "== EXCPT_FAULT" check is correct. The only other fault-like #DB is General Detect, and despite Intel and AMD both strongly implying (through omission) that General Detect #DBs should set RF=1, hardware (multiple generations of both Intel and AMD), in fact does not. Through insider knowledge, extreme foresight, sheer dumb luck, or some combination thereof, KVM correctly handled RF for General Detect #DBs. Fixes: `38827dbd3f` ("KVM: x86: Do not update EFLAGS on faulting emulation") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:06 -04:00
Sean Christopherson	b9d44f9091	KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag Service TSS T-flag #DBs prior to pending MTFs, as such #DBs are higher priority than MTF. KVM itself doesn't emulate TSS #DBs, and any such exceptions injected from L1 will be handled by hardware (or morphed to a fault-like exception if injection fails), but theoretically userspace could pend a TSS T-flag #DB in conjunction with a pending MTF. Note, there's no known use case this fixes, it's purely to be technically correct with respect to Intel's SDM. Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Fixes: `5ef8acbdd6` ("KVM: nVMX: Emulate MTF when performing instruction emulation") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-8-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:05 -04:00
Sean Christopherson	8d178f4607	KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like Exclude General Detect #DBs, which have fault-like behavior but also have a non-zero payload (DR6.BD=1), from nVMX's handling of pending debug traps. Opportunistically rewrite the comment to better document what is being checked, i.e. "has a non-zero payload" vs. "has a payload", and to call out the many caveats surrounding #DBs that KVM dodges one way or another. Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Fixes: `684c0422da` ("KVM: nVMX: Handle pending #DB when injecting INIT VM-exit") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-7-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:05 -04:00
Sean Christopherson	baf67ca8e5	KVM: x86: Suppress code #DBs on Intel if MOV/POP SS blocking is active Suppress code breakpoints if MOV/POP SS blocking is active and the guest CPU is Intel, i.e. if the guest thinks it's running on an Intel CPU. Intel CPUs inhibit code #DBs when MOV/POP SS blocking is active, whereas AMD (and its descendents) do not. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830231614.3580124-6-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:05 -04:00
Sean Christopherson	d500e1ed3d	KVM: x86: Allow clearing RFLAGS.RF on forced emulation to test code #DBs Extend force_emulation_prefix to an 'int' and use bit 1 as a flag to indicate that KVM should clear RFLAGS.RF before emulating, e.g. to allow tests to force emulation of code breakpoints in conjunction with MOV/POP SS blocking, which is impossible without KVM intervention as VMX unconditionally sets RFLAGS.RF on intercepted #UD. Make the behavior controllable so that tests can also test RFLAGS.RF=1 (again in conjunction with code #DBs). Note, clearing RFLAGS.RF won't create an infinite #DB loop as the guest's IRET from the #DB handler will return to the instruction and not the prefix, i.e. the restart won't force emulation. Opportunistically convert the permissions to the preferred octal format. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830231614.3580124-5-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:04 -04:00
Sean Christopherson	750f8fcb26	KVM: x86: Don't check for code breakpoints when emulating on exception Don't check for code breakpoints during instruction emulation if the emulation was triggered by exception interception. Code breakpoints are the highest priority fault-like exception, and KVM only emulates on exceptions that are fault-like. Thus, if hardware signaled a different exception, then the vCPU is already passed the stage of checking for hardware breakpoints. This is likely a glorified nop in terms of functionality, and is more for clarification and is technically an optimization. Intel's SDM explicitly states vmcs.GUEST_RFLAGS.RF on exception interception is the same as the value that would have been saved on the stack had the exception not been intercepted, i.e. will be '1' due to all fault-like exceptions setting RF to '1'. AMD says "guest state saved ... is the processor state as of the moment the intercept triggers", but that begs the question, "when does the intercept trigger?". Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-4-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:04 -04:00
Sean Christopherson	eba9799b5a	KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS Deliberately truncate the exception error code when shoving it into the VMCS (VM-Entry field for vmcs01 and vmcs02, VM-Exit field for vmcs12). Intel CPUs are incapable of handling 32-bit error codes and will never generate an error code with bits 31:16, but userspace can provide an arbitrary error code via KVM_SET_VCPU_EVENTS. Failure to drop the bits on exception injection results in failed VM-Entry, as VMX disallows setting bits 31:16. Setting the bits on VM-Exit would at best confuse L1, and at worse induce a nested VM-Entry failure, e.g. if L1 decided to reinject the exception back into L2. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-3-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:04 -04:00
Sean Christopherson	d953540430	KVM: nVMX: Unconditionally purge queued/injected events on nested "exit" Drop pending exceptions and events queued for re-injection when leaving nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced by host userspace. Failure to purge events could result in an event belonging to L2 being injected into L1. This _should_ never happen for VM-Fail as all events should be blocked by nested_run_pending, but it's possible if KVM, not the L1 hypervisor, is the source of VM-Fail when running vmcs02. SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry to SMM is blocked by pending exceptions and re-injected events. Forced exit is definitely buggy, but has likely gone unnoticed because userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or some other ioctl() that purges the queue). Fixes: `4f350c6dbc` ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-2-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:03 -04:00
Hou Wenlong	794663e13f	KVM: x86: Add missing trace points for RDMSR/WRMSR in emulator path Since the RDMSR/WRMSR emulation uses a sepearte emualtor interface, the trace points for RDMSR/WRMSR can be added in emulator path like normal path. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/39181a9f777a72d61a4d0bb9f6984ccbd1de2ea3.1661930557.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:03 -04:00
Hou Wenlong	36d546d59a	KVM: x86: Return emulator error if RDMSR/WRMSR emulation failed The return value of emulator_{get\|set}_mst_with_filter() is confused, since msr access error and emulator error are mixed. Although, KVM_MSR_RET_* doesn't conflict with X86EMUL_IO_NEEDED at present, it is better to convert msr access error to emulator error if error value is needed. So move "r < 0" handling for wrmsr emulation into the set helper function, then only X86EMUL_* is returned in the helper functions. Also add "r < 0" check in the get helper function, although KVM doesn't return -errno today, but assuming that will always hold true is unnecessarily risking. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/09b2847fc3bcb8937fb11738f0ccf7be7f61d9dd.1661930557.git.houwenlong.hwl@antgroup.com [sean: wrap changelog less aggressively] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:03 -04:00
Jilin Yuan	b85a97b851	KVM: x86/mmu: fix repeated words in comments Delete the redundant word 'to'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Link: https://lore.kernel.org/r/20220831125217.12313-1-yuanjilin@cdjrlc.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:02 -04:00
Vitaly Kuznetsov	37d145ef62	KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up nested MSR vmcs_config has cached host MSR_IA32_VMX_MISC value, use it for setting up nested MSR_IA32_VMX_MISC in nested_vmx_setup_ctls_msrs() and avoid the redundant rdmsr(). No (real) functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-34-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:02 -04:00
Vitaly Kuznetsov	0809d9b05a	KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config Like other host VMX control MSRs, MSR_IA32_VMX_MISC can be cached in vmcs_config to avoid the need to re-read it later, e.g. from cpu_has_vmx_intel_pt() or cpu_has_vmx_shadow_vmcs(). No (real) functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-33-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:01 -04:00
Vitaly Kuznetsov	bcdf201f8a	KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs Using raw host MSR values for setting up nested VMX control MSRs is incorrect as some features need to disabled, e.g. when KVM runs as a nested hypervisor on Hyper-V and uses Enlightened VMCS or when a workaround for IA32_PERF_GLOBAL_CTRL is applied. For non-nested VMX, this is done in setup_vmcs_config() and the result is stored in vmcs_config. Use it for setting up allowed-1 bits in nested VMX MSRs too. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-32-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:03:00 -04:00
Vitaly Kuznetsov	66a329be4b	KVM: nVMX: Always set required-1 bits of pinbased_ctls to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR Similar to exit_ctls_low, entry_ctls_low, and procbased_ctls_low, pinbased_ctls_low should be set to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR and not host's MSR_IA32_VMX_PINBASED_CTLS value \|= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR. The commit `eabeaaccfc` ("KVM: nVMX: Clean up and fix pin-based execution controls") which introduced '\|=' doesn't mention anything about why this is needed, the change seems rather accidental. Note: normally, required-1 portion of MSR_IA32_VMX_PINBASED_CTLS should be equal to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR so no behavioral change is expected, however, it is (in theory) possible to observe something different there when e.g. KVM is running as a nested hypervisor. Hope this doesn't happen in practice. Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-31-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:59 -04:00
Vitaly Kuznetsov	9d78d6fb18	KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config() As a preparation to reusing the result of setup_vmcs_config() for setting up nested VMX control MSRs, move LOAD_IA32_PERF_GLOBAL_CTRL errata handling to vmx_vmexit_ctrl()/vmx_vmentry_ctrl() and print the warning from hardware_setup(). While it seems reasonable to not expose LOAD_IA32_PERF_GLOBAL_CTRL controls to L1 hypervisor on buggy CPUs, such change would inevitably break live migration from older KVMs where the controls are exposed. Keep the status quo for now, L1 hypervisor itself is supposed to take care of the errata. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-30-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:58 -04:00
Jim Mattson	aef46a6476	KVM: x86: VMX: Replace some Intel model numbers with mnemonics Intel processor code names are more familiar to many readers than their decimal model numbers. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-29-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:57 -04:00
Sean Christopherson	64f80ea73b	KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not setup Clear the CR3 and INVLPG interception controls at runtime based on whether or not EPT is being _used_, as opposed to clearing the bits at setup if EPT is _supported_ in hardware, and then restoring them when EPT is not used. Not mucking with the base config will allow using the base config as the starting point for emulating the VMX capability MSRs. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-28-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:57 -04:00
Vitaly Kuznetsov	a83bea73fa	KVM: VMX: Add missing CPU based VM execution controls to vmcs_config As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, add the CPU based VM execution controls which KVM doesn't use but supports for nVMX to KVM_OPT_VMX_CPU_BASED_VM_EXEC_CONTROL and filter them out in vmx_exec_control(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-27-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:56 -04:00
Vitaly Kuznetsov	f16e47429e	KVM: VMX: Add missing VMEXIT controls to vmcs_config As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, add the VMEXIT controls which KVM doesn't use but supports for nVMX to KVM_OPT_VMX_VM_EXIT_CONTROLS and filter them out in vmx_vmexit_ctrl(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-26-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:55 -04:00
Vitaly Kuznetsov	e89e1e2302	KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of setup_vmcs_config() As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering to vmx_exec_control(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-25-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:55 -04:00
Vitaly Kuznetsov	ee087b4da0	KVM: VMX: Extend VMX controls macro shenanigans When VMX controls macros are used to set or clear a control bit, make sure that this bit was checked in setup_vmcs_config() and thus is properly reflected in vmcs_config. Opportunistically drop pointless "< 0" check for adjust_vmx_controls()'s return value. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-24-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:54 -04:00
Sean Christopherson	ebb3c8d409	KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM Don't toggle VM_ENTRY_IA32E_MODE in 32-bit kernels/KVM and instead bug the VM if KVM attempts to run the guest with EFER.LMA=1. KVM doesn't support running 64-bit guests with 32-bit hosts. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-23-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:53 -04:00
Vitaly Kuznetsov	1dae276569	KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING in setup_vmcs_config() SECONDARY_EXEC_ENCLS_EXITING is the only control which is conditionally added to the 'optional' checklist in setup_vmcs_config() but the special case can be avoided by always checking for its presence first and filtering out the result later. Note: the situation when SECONDARY_EXEC_ENCLS_EXITING is present but cpu_has_sgx() is false is possible when SGX is "soft-disabled", e.g. if software writes MCE control MSRs or there's an uncorrectable #MC. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-22-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:52 -04:00
Vitaly Kuznetsov	378c4c1850	KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING in setup_vmcs_config() CPU_BASED_{INTR,NMI}_WINDOW_EXITING controls are toggled dynamically by vmx_enable_{irq,nmi}_window, handle_interrupt_window(), handle_nmi_window() but setup_vmcs_config() doesn't check their existence. Add the check and filter the controls out in vmx_exec_control(). Note: KVM explicitly supports CPUs without VIRTUAL_NMIS and all these CPUs are supposedly lacking NMI_WINDOW_EXITING too. Adjust cpu_has_virtual_nmis() accordingly. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-21-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:51 -04:00
Vitaly Kuznetsov	ffaaf5913f	KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config() VM_ENTRY_IA32E_MODE control is toggled dynamically by vmx_set_efer() and setup_vmcs_config() doesn't check its existence. On the contrary, nested_vmx_setup_ctls_msrs() doesn set it on x86_64. Add the missing check and filter the bit out in vmx_vmentry_ctrl(). No (real) functional change intended as all existing CPUs supporting long mode and VMX are supposed to have it. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-20-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:51 -04:00
Sean Christopherson	f4c93d1a0e	KVM: nVMX: Always emulate PERF_GLOBAL_CTRL VM-Entry/VM-Exit controls Advertise VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL as being supported for nested VMs irrespective of hardware support. KVM fully emulates the controls, i.e. manually emulates MSR writes on entry/exit, and never propagates the guest settings directly to vmcs02. In addition to allowing L1 VMMs to use the controls on older hardware, unconditionally advertising the controls will also allow KVM to use its vmcs01 configuration as the basis for the nested VMX configuration without causing a regression (due the errata which causes KVM to "hide" the control from vmcs01 but not vmcs12). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-19-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:50 -04:00
Sean Christopherson	def9d705c0	KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02 Don't propagate vmcs12's VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL to vmcs02. KVM doesn't disallow L1 from using VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL even when KVM itself doesn't use the control, e.g. due to the various CPU errata that where the MSR can be corrupted on VM-Exit. Preserve KVM's (vmcs01) setting to hopefully avoid having to toggle the bit in vmcs02 at a later point. E.g. if KVM is loading PERF_GLOBAL_CTRL when running L1, then odds are good KVM will also load the MSR when running L2. Fixes: `8bf00a5299` ("KVM: VMX: add support for switching of PERF_GLOBAL_CTRL") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220830133737.1539624-18-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:49 -04:00
Vitaly Kuznetsov	9bcb90650e	KVM: VMX: Get rid of eVMCS specific VMX controls sanitization With the updated eVMCSv1 definition, there's no known 'problematic' controls which are exposed in VMX control MSRs but are not present in eVMCSv1: all known Hyper-V versions either don't expose the new fields by not setting bits in the VMX feature controls or support the new eVMCS revision. Get rid of VMX control MSRs filtering for KVM on Hyper-V. Note: VMX control MSRs filtering for Hyper-V on KVM (nested_evmcs_filter_control_msr()) stays as even the updated eVMCSv1 definition doesn't have all the features implemented by KVM and some fields are still missing. Moreover, nested_evmcs_filter_control_msr() has to support the original eVMCSv1 version when VMM wishes so. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-17-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:48 -04:00
Vitaly Kuznetsov	4da77090b0	KVM: nVMX: Support PERF_GLOBAL_CTRL with enlightened VMCS Enlightened VMCS v1 got updated and now includes the required fields for loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT features. For KVM on Hyper-V enablement, KVM can just observe VMX control MSRs and use the features (with or without eVMCS) when possible. Hyper-V on KVM is messier as Windows 11 guests fail to boot if the controls are advertised and a new PV feature flag, CPUID.0x4000000A.EBX BIT(0), is not set. Honor the Hyper-V CPUID feature flag to play nice with Windows guests. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220830133737.1539624-16-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:47 -04:00
Sean Christopherson	3ff8a13d41	KVM: nVMX: WARN once and fail VM-Enter if eVMCS sees VMFUNC[63:32] != 0 WARN and reject nested VM-Enter if KVM is using eVMCS and manages to allow a non-zero value in the upper 32 bits of VM-function controls. The eVMCS code assumes all inputs are 32-bit values and subtly drops the upper bits. WARN instead of adding proper "support", it's unlikely the upper bits will be defined/used in the next decade. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-15-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:47 -04:00
Vitaly Kuznetsov	dea6e140d9	KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf KVM has to check guest visible HYPERV_CPUID_NESTED_FEATURES.EBX CPUID leaf to know which Enlightened VMCS definition to use (original or 2022 update). Cache the leaf along with other Hyper-V CPUID feature leaves to make the check quick. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-12-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:44 -04:00
Vitaly Kuznetsov	c9d31986e8	KVM: nVMX: Support several new fields in eVMCSv1 Enlightened VMCS v1 definition was updated with new fields, add support for them for Hyper-V on KVM. Note: SSP, CET and Guest LBR features are not supported by KVM yet and 'struct vmcs12' has no corresponding fields. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-11-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:43 -04:00
Vitaly Kuznetsov	b19e4ff5e5	KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields Enlightened VMCS v1 definition was updated with new fields, support them in KVM by defining VMCS-to-EVMCS conversion. Note: SSP, CET and Guest LBR features are not supported by KVM yet and the corresponding fields are not defined in 'enum vmcs_field', leave them commented out for now. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-10-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:42 -04:00
Sean Christopherson	6cce93de28	KVM: nVMX: Use CC() macro to handle eVMCS unsupported controls checks Locally #define and use the nested virtualization Consistency Check (CC) macro to handle eVMCS unsupported controls checks. Using the macro loses the existing printing of the unsupported controls, but that's a feature and not a bug. The existing approach is flawed because the @err param to trace_kvm_nested_vmenter_failed() is the error code, not the error value. The eVMCS trickery mostly works as __print_symbolic() falls back to printing the raw hex value, but that subtly relies on not having a match between the unsupported value and VMX_VMENTER_INSTRUCTION_ERRORS. If it's really truly necessary to snapshot the bad value, then the tracepoint can be extended in the future. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-9-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:42 -04:00
Vitaly Kuznetsov	f4d361b4c2	KVM: nVMX: Refactor unsupported eVMCS controls logic to use 2-d array Refactor the handling of unsupported eVMCS to use a 2-d array to store the set of unsupported controls. KVM's handling of eVMCS is completely broken as there is no way for userspace to query which features are unsupported, nor does KVM prevent userspace from attempting to enable unsupported features. A future commit will remedy that by filtering and enforcing unsupported features when eVMCS, but that needs to be opt-in from userspace to avoid breakage, i.e. KVM needs to maintain its legacy behavior by snapshotting the exact set of controls that are currently (un)supported by eVMCS. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> [sean: split to standalone patch, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-8-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:41 -04:00
Sean Christopherson	85ab071af8	KVM: nVMX: Treat eVMCS as enabled for guest iff Hyper-V is also enabled When querying whether or not eVMCS is enabled on behalf of the guest, treat eVMCS as enable if and only if Hyper-V is enabled/exposed to the guest. Note, flows that come from the host, e.g. KVM_SET_NESTED_STATE, must NOT check for Hyper-V being enabled as KVM doesn't require guest CPUID to be set before most ioctls(). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-7-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:40 -04:00
Sean Christopherson	3be29eb7b5	KVM: x86: Report error when setting CPUID if Hyper-V allocation fails Return -ENOMEM back to userspace if allocating the Hyper-V vCPU struct fails when enabling Hyper-V in guest CPUID. Silently ignoring failure means that KVM will not have an up-to-date CPUID cache if allocating the struct succeeds later on, e.g. when activating SynIC. Rejecting the CPUID operation also guarantess that vcpu->arch.hyperv is non-NULL if hyperv_enabled is true, which will allow for additional cleanup, e.g. in the eVMCS code. Note, the initialization needs to be done before CPUID is set, and more subtly before kvm_check_cpuid(), which potentially enables dynamic XFEATURES. Sadly, there's no easy way to avoid exposing Hyper-V details to CPUID or vice versa. Expose kvm_hv_vcpu_init() and the Hyper-V CPUID signature to CPUID instead of exposing cpuid_entry2_find() outside of CPUID code. It's hard to envision kvm_hv_vcpu_init() being misused, whereas cpuid_entry2_find() absolutely shouldn't be used outside of core CPUID code. Fixes: `10d7bf1e46` ("KVM: x86: hyper-v: Cache guest CPUID leaves determining features availability") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-6-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:39 -04:00
Sean Christopherson	1cac8d9f6b	KVM: x86: Check for existing Hyper-V vCPU in kvm_hv_vcpu_init() When potentially allocating/initializing the Hyper-V vCPU struct, check for an existing instance in kvm_hv_vcpu_init() instead of requiring callers to perform the check. Relying on callers to do the check is risky as it's all too easy for KVM to overwrite vcpu->arch.hyperv and leak memory, and it adds additional burden on callers without much benefit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20220830133737.1539624-5-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:38 -04:00
Vitaly Kuznetsov	ce2196b831	KVM: x86: Zero out entire Hyper-V CPUID cache before processing entries Wipe the whole 'hv_vcpu->cpuid_cache' with memset() instead of having to zero each particular member when the corresponding CPUID entry was not found. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> [sean: split to separate patch] Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20220830133737.1539624-4-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:38 -04:00
Vitaly Kuznetsov	5ef384a60f	x86/hyperv: Update 'struct hv_enlightened_vmcs' definition Updated Hyper-V Enlightened VMCS specification lists several new fields for the following features: - PerfGlobalCtrl - EnclsExitingBitmap - Tsc Scaling - GuestLbrCtl - CET - SSP Update the definition. Note, the updated spec also provides an additional CPUID feature flag, CPUIDD.0x4000000A.EBX BIT(0), for PerfGlobalCtrl to workaround a Windows 11 quirk. Despite what the TLFS says: Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields in the enlightened VMCS. guests can safely use the fields if they are enumerated in the architectural VMX MSRs. I.e. KVM-on-HyperV doesn't need to check the CPUID bit, but KVM-as-HyperV must ensure the bit is set if PerfGlobalCtrl fields are exposed to L1. https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> [sean: tweak CPUID name to make it PerfGlobalCtrl only] Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20220830133737.1539624-3-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:37 -04:00
Vitaly Kuznetsov	ea9da788a6	x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Section 1.9 of TLFS v6.0b says: "All structures are padded in such a way that fields are aligned naturally (that is, an 8-byte field is aligned to an offset of 8 bytes and so on)". 'struct enlightened_vmcs' has a glitch: ... struct { u32 nested_flush_hypercall:1; /* 836: 0 4 / u32 msr_bitmap:1; / 836: 1 4 / u32 reserved:30; / 836: 2 4 / } hv_enlightenments_control; / 836 4 / u32 hv_vp_id; / 840 4 / u64 hv_vm_id; / 844 8 / u64 partition_assist_page; / 852 8 */ ... And the observed values in 'partition_assist_page' make no sense at all. Fix the layout by padding the structure properly. Fixes: `68d1eb72ee` ("x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-2-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:36 -04:00
Uros Bizjak	57abfa11ba	KVM: VMX: Do not declare vmread_error() asmlinkage There is no need to declare vmread_error() asmlinkage, its arguments can be passed via registers for both 32-bit and 64-bit targets. Function argument registers are considered call-clobbered registers, they are saved in the trampoline just before the function call and restored afterwards. Dropping "asmlinkage" patch unifies trampoline function argument handling between 32-bit and 64-bit targets and improves generated code for 32-bit targets. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20220817144045.3206-1-ubizjak@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:35 -04:00
Liam Ni	e390f4d69d	KVM:x86: Clean up ModR/M "reg" initialization in reg op decoding Refactor decode_register_operand() to get the ModR/M register if and only if the instruction uses a ModR/M encoding to make it more obvious how the register operand is retrieved. Signed-off-by: Liam Ni <zhiguangni01@gmail.com> Link: https://lore.kernel.org/r/20220908141210.1375828-1-zhiguangni01@zhaoxin.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:34 -04:00
Mingwei Zhang	02dfc44f20	KVM: x86: Print guest pgd in kvm_nested_vmenter() Print guest pgd in kvm_nested_vmenter() to enrich the information for tracing. When tdp is enabled, print the value of tdp page table (EPT/NPT); when tdp is disabled, print the value of non-nested CR3. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20220825225755.907001-4-mizhang@google.com [sean: print nested_cr3 vs. nested_eptp vs. guest_cr3] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:33 -04:00
David Matlack	37ef0be269	KVM: nVMX: Add tracepoint for nested VM-Enter Call trace_kvm_nested_vmenter() during nested VMLAUNCH/VMRESUME to bring parity with nSVM's usage of the tracepoint during nested VMRUN. Attempt to use analagous VMCS fields to the VMCB fields that are reported in the SVM case: "int_ctl": 32-bit field of the VMCB that the CPU uses to deliver virtual interrupts. The analagous VMCS field is the 16-bit "guest interrupt status". "event_inj": 32-bit field of VMCB that is used to inject events (exceptions and interrupts) into the guest. The analagous VMCS field is the "VM-entry interruption-information field". "npt_enabled": 1 when the VCPU has enabled nested paging. The analagous VMCS field is the enable-EPT execution control. "npt_addr": 64-bit field when the VCPU has enabled nested paging. The analagous VMCS field is the ept_pointer. Signed-off-by: David Matlack <dmatlack@google.com> [move the code into the nested_vmx_enter_non_root_mode().] Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20220825225755.907001-3-mizhang@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:32 -04:00
Mingwei Zhang	89e54ec592	KVM: x86: Update trace function for nested VM entry to support VMX Update trace function for nested VM entry to support VMX. Existing trace function only supports nested VMX and the information printed out is AMD specific. So, rename trace_kvm_nested_vmrun() to trace_kvm_nested_vmenter(), since 'vmenter' is generic. Add a new field 'isa' to recognize Intel and AMD; Update the output to print out VMX/SVM related naming respectively, eg., vmcb vs. vmcs; npt vs. ept. Opportunistically update the call site of trace_kvm_nested_vmenter() to make one line per parameter. Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20220825225755.907001-2-mizhang@google.com [sean: align indentation, s/update/rename in changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:32 -04:00
Sean Christopherson	bff0adc40c	KVM: x86: Use u64 for address and error code in page fault tracepoint Track the address and error code as 64-bit values in the page fault tracepoint. When TDP is enabled, the address is a GPA and thus can be a 64-bit value even on 32-bit hosts. And SVM's #NPF genereates 64-bit error codes. Opportunistically clean up the formatting. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:31 -04:00
Wonhyuk Yang	faa03b3972	KVM: Add extra information in kvm_page_fault trace point Currently, kvm_page_fault trace point provide fault_address and error code. However it is not enough to find which cpu and instruction cause kvm_page_faults. So add vcpu id and instruction pointer in kvm_page_fault trace point. Cc: Baik Song An <bsahn@etri.re.kr> Cc: Hong Yeon Kim <kimhy@etri.re.kr> Cc: Taeung Song <taeung@reallinux.co.kr> Cc: linuxgeek@linuxgeek.io Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com> Link: https://lore.kernel.org/r/20220510071001.87169-1-vvghjk1234@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:30 -04:00
Paolo Bonzini	db25eb87ad	KVM: SVM: remove unnecessary check on INIT intercept Since svm_check_nested_events() is now handling INIT signals, there is no need to latch it until the VMEXIT is injected. The only condition under which INIT signals are latched is GIF=0. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220819165643.83692-1-pbonzini@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:28 -04:00
Uros Bizjak	afe30b59d3	KVM/VMX: Avoid stack engine synchronization uop in __vmx_vcpu_run Avoid instructions with explicit uses of the stack pointer between instructions that implicitly refer to it. The sequence of POP %reg; ADD $x, %RSP; POP %reg forces emission of synchronization uop to synchronize the value of the stack pointer in the stack engine and the out-of-order core. Using POP with the dummy register instead of ADD $x, %RSP results in a smaller code size and faster code. The patch also fixes the reference to the wrong register in the nearby comment. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20220816211010.25693-1-ubizjak@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:02:27 -04:00
Luciano Leão	30ea703a38	x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype Include the header containing the prototype of init_ia32_feat_ctl(), solving the following warning: $ make W=1 arch/x86/kernel/cpu/feat_ctl.o arch/x86/kernel/cpu/feat_ctl.c:112:6: warning: no previous prototype for ‘init_ia32_feat_ctl’ [-Wmissing-prototypes] 112 \| void init_ia32_feat_ctl(struct cpuinfo_x86 *c) This warning appeared after commit `5d5103595e` ("x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup") had moved the function init_ia32_feat_ctl()'s prototype from arch/x86/kernel/cpu/cpu.h to arch/x86/include/asm/cpu.h. Note that, before the commit mentioned above, the header include "cpu.h" (arch/x86/kernel/cpu/cpu.h) was added by commit `0e79ad863d` ("x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()") solely to fix init_ia32_feat_ctl()'s missing prototype. So, the header include "cpu.h" is no longer necessary. [ bp: Massage commit message. ] Fixes: `5d5103595e` ("x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup") Signed-off-by: Luciano Leão <lucianorsleao@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nícolas F. R. A. Prado <n@nfraprado.net> Link: https://lore.kernel.org/r/20220922200053.1357470-1-lucianorsleao@gmail.com	2022-09-26 17:06:27 +02:00
Taehee Yoo	ba3579e6e4	crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher The implementation is based on the 32-bit implementation of the aria. Also, aria-avx process steps are the similar to the camellia-avx. 1. Byteslice(16way) 2. Add-round-key. 3. Sbox 4. Diffusion layer. Except for s-box, all steps are the same as the aria-generic implementation. s-box step is very similar to camellia and sm4 implementation. There are 2 implementations for s-box step. One is to use AES-NI and affine transformation, which is the same as Camellia, sm4, and others. Another is to use GFNI. GFNI implementation is faster than AES-NI implementation. So, it uses GFNI implementation if the running CPU supports GFNI. There are 4 s-boxes in the ARIA and the 2 s-boxes are the same as AES's s-boxes. To calculate the first sbox, it just uses the aesenclast and then inverts shift_row. No more process is needed for this job because the first s-box is the same as the AES encryption s-box. To calculate the second sbox(invert of s1), it just uses the aesdeclast and then inverts shift_row. No more process is needed for this job because the second s-box is the same as the AES decryption s-box. To calculate the third s-box, it uses the aesenclast, then affine transformation, which is combined AES inverse affine and ARIA S2. To calculate the last s-box, it uses the aesdeclast, then affine transformation, which is combined X2 and AES forward affine. The optimized third and last s-box logic and GFNI s-box logic are implemented by Jussi Kivilinna. The aria-generic implementation is based on a 32-bit implementation, not an 8-bit implementation. the aria-avx Diffusion Layer implementation is based on aria-generic implementation because 8-bit implementation is not fit for parallel implementation but 32-bit is enough to fit for this. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-09-24 16:14:44 +08:00
Linus Torvalds	317fab7ec5	ARM: * Fix for kmemleak with pKVM s390: * Fixes for VFIO with zPCI * smatch fix x86: * Ensure XSAVE-capable hosts always allow FP and SSE state to be saved and restored via KVM_{GET,SET}_XSAVE * Fix broken max_mmu_rmap_size stat * Fix compile error with old glibc that doesn't have gettid() -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmMtvg4UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNy2Af/VybWK2uAbaMkV6irQ1YTJIrRPco1 C+JQdiQklYbjzThfPWNF/MiH+VTObloR1KqztOeQbfcrgwzygO68D3bs0wkAukLA mtdcMjdsqNx8r9u533i6S8Dpo0RkHKl+I8+3mHdPHTzlrbCuYJFFFxFNLhE+xbrK DP2Gl/xXIGYwOv2nfHA/xxI7TRICv4IxmzQazxlmC27n6BLNSr8qp6jI9lXJQfJ8 XJh3SbmRux3/cs2oEqONg8DySJh631kI1jGGOmL3qk07ZR7A5KZ+lju0xM7vQIEq aR25YNYZux+BPIY/WxT1R0j6pwinBmFp8OoQYCs8DaQ65fdE5gegtJRpow== =VADm -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "As everyone back came back from conferences, here are the pending patches for Linux 6.0. ARM: - Fix for kmemleak with pKVM s390: - Fixes for VFIO with zPCI - smatch fix x86: - Ensure XSAVE-capable hosts always allow FP and SSE state to be saved and restored via KVM_{GET,SET}_XSAVE - Fix broken max_mmu_rmap_size stat - Fix compile error with old glibc that doesn't have gettid()" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Inject #UD on emulated XSETBV if XSAVES isn't enabled KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES KVM: x86: Reinstate kvm_vcpu_arch.guest_supported_xcr0 KVM: x86/mmu: add missing update to max_mmu_rmap_size selftests: kvm: Fix a compile error in selftests/kvm/rseq_test.c KVM: s390: pci: register pci hooks without interpretation KVM: s390: pci: fix GAIT physical vs virtual pointers usage KVM: s390: Pass initialized arg even if unused KVM: s390: pci: fix plain integer as NULL pointer warnings KVM: arm64: Use kmemleak_free_part_phys() to unregister hyp_mem_base	2022-09-23 08:42:30 -07:00
Paolo Bonzini	69604fe76e	More pci fixes Fix for a code analyser warning -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmMtmHcACgkQ41TmuOI4 ufhapA/+OGjJtUKEa/LkCArUMUgJ+ZYy9cwx3Qrkvm7BtEguN35uxO5T7X8pdMDU /dYSDwtPT1ob/QmjhwwE0dpgUF8yfkBNLKBK37f6jTznI4mS6UEq/IB1BcCk3ss5 45xG8rWiXS7oFm9bxtNsa5jkrSf7gIOuEa570tOVtOlBYvtFIH2SiHvcCUY1w48N A6docb+vOJnPIBHPR/1q+bBoghO5rplYaX0EiAt3EYThrhAjsXjIFzaQPwXUxgFh Nk/3ni1St72sbWDiC7YxjDghMVuW1GuQAlgkDWvy8bdA0IfsaqbyISuyIWZWkRfx V8+7OfGVsCB7qhT4C+65fQPqwZwERNwwvAFsx175/Mm9/AXn217+sN52O54olFcM dhKI7med1R/LqO2HH9kCGIUfJ3t+0JHpFDwcJyiAZsFbYffnwUQbIW4YcA3O+rje cV8/kRhD6ebzrScTOMVH+Rb4Q0N0kL34iHXbI8wcTBKmSFrdRm9YeLtQM2P/HtPA GVyg49bfXfA13goM9jaUW/IgT1aSHNRHfbfFKTUB7ph2Wi5ktMizpkn/CQ8L7bm1 I0JoZfA7ALDufg2Ajyqsqmv3RX2k6ydenmX/AxzQlo34kSXY6X+N+yiPlqhumBqq Gs2pmvQoyvAmuShhj+eSZsQYBObgjzN7B/SAbbUBT71GerGyWeQ= =Z7gl -----END PGP SIGNATURE----- Merge tag 'kvm-s390-master-6.0-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD More pci fixes Fix for a code analyser warning	2022-09-23 10:06:08 -04:00
James Morse	f7b1843eca	x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes resctrl_arch_rmid_read() returns a value in chunks, as read from the hardware. This needs scaling to bytes by mon_scale, as provided by the architecture code. Now that resctrl_arch_rmid_read() performs the overflow and corrections itself, it may as well return a value in bytes directly. This allows the accesses to the architecture specific 'hw' structure to be removed. Move the mon_scale conversion into resctrl_arch_rmid_read(). mbm_bw_count() is updated to calculate bandwidth from bytes. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-22-james.morse@arm.com	2022-09-23 14:25:05 +02:00
James Morse	d80975e264	x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data resctrl_rmid_realloc_threshold can be set by user-space. The maximum value is specified by the architecture. Currently max_threshold_occ_write() reads the maximum value from boot_cpu_data.x86_cache_size, which is not portable to another architecture. Add resctrl_rmid_realloc_limit to describe the maximum size in bytes that user-space can set the threshold to. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-21-james.morse@arm.com	2022-09-23 14:24:16 +02:00
James Morse	ae2328b529	x86/resctrl: Rename and change the units of resctrl_cqm_threshold resctrl_cqm_threshold is stored in a hardware specific chunk size, but exposed to user-space as bytes. This means the filesystem parts of resctrl need to know how the hardware counts, to convert the user provided byte value to chunks. The interface between the architecture's resctrl code and the filesystem ought to treat everything as bytes. Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read() still returns its value in chunks, so this needs converting to bytes. As all the users have been touched, rename the variable to resctrl_rmid_realloc_threshold, which describes what the value is for. Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power of 2, so the existing code introduces a rounding error from resctrl's theoretical fraction of the cache usage. This behaviour is kept as it ensures the user visible value matches the value read from hardware when the rmid will be reallocated. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-20-james.morse@arm.com	2022-09-23 14:23:41 +02:00
James Morse	38f72f50d6	x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read() resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. When reading a bandwidth counter, get_corrected_mbm_count() must be used to correct the value read. get_corrected_mbm_count() is architecture specific, this work should be done in resctrl_arch_rmid_read(). Move the function calls. This allows the resctrl filesystems's chunks value to be removed in favour of the architecture private version. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-19-james.morse@arm.com	2022-09-23 14:22:53 +02:00
James Morse	1d81d15db3	x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read() resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. When reading a bandwidth counter, mbm_overflow_count() must be used to correct for any possible overflow. mbm_overflow_count() is architecture specific, its behaviour should be part of resctrl_arch_rmid_read(). Move the mbm_overflow_count() calls into resctrl_arch_rmid_read(). This allows the resctrl filesystems's prev_msr to be removed in favour of the architecture private version. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-18-james.morse@arm.com	2022-09-23 14:22:20 +02:00
James Morse	8286618aca	x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a hardware register. Currently the function returns the MBM values in chunks directly from hardware. To convert this to bytes, some correction and overflow calculations are needed. These depend on the resource and domain structures. Overflow detection requires the old chunks value. None of this is available to resctrl_arch_rmid_read(). MPAM requires the resource and domain structures to find the MMIO device that holds the registers. Pass the resource and domain to resctrl_arch_rmid_read(). This makes rmid_dirty() too big. Instead merge it with its only caller, and the name is kept as a local variable. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-17-james.morse@arm.com	2022-09-23 14:21:25 +02:00
James Morse	4d044c521a	x86/resctrl: Abstract __rmid_read() __rmid_read() selects the specified eventid and returns the counter value from the MSR. The error handling is architecture specific, and handled by the callers, rdtgroup_mondata_show() and __mon_event_count(). Error handling should be handled by architecture specific code, as a different architecture may have different requirements. MPAM's counters can report that they are 'not ready', requiring a second read after a short delay. This should be hidden from resctrl. Make __rmid_read() the architecture specific function for reading a counter. Rename it resctrl_arch_rmid_read() and move the error handling into it. A read from a counter that hardware supports but resctrl does not now returns -EINVAL instead of -EIO from the default case in __mon_event_count(). It isn't possible for user-space to see this change as resctrl doesn't expose counters it doesn't support. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-16-james.morse@arm.com	2022-09-23 14:17:20 +02:00
Kees Cook	712f210a45	x86/microcode/AMD: Track patch allocation size explicitly In preparation for reducing the use of ksize(), record the actual allocation size for later memcpy(). This avoids copying extra (uninitialized!) bytes into the patch buffer when the requested allocation size isn't exactly the size of a kmalloc bucket. Additionally, fix potential future issues where runtime bounds checking will notice that the buffer was allocated to a smaller value than returned by ksize(). Fixes: `757885e94a` ("x86, microcode, amd: Early microcode patch loading support for AMD") Suggested-by: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/lkml/CA+DvKQ+bp7Y7gmaVhacjv9uF6Ar-o4tet872h4Q8RPYPJjcJQA@mail.gmail.com/	2022-09-23 13:46:26 +02:00
James Morse	fea62d370d	x86/resctrl: Allow per-rmid arch private storage to be reset To abstract the rmid counters into a helper that returns the number of bytes counted, architecture specific per-rmid state is needed. It needs to be possible to reset this hidden state, as the values may outlive the life of an rmid, or the mount time of the filesystem. mon_event_read() is called with first = true when an rmid is first allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from __mon_event_count()'s rr->first check. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-15-james.morse@arm.com	2022-09-23 12:49:04 +02:00
Sean Christopherson	50b2d49baf	KVM: x86: Inject #UD on emulated XSETBV if XSAVES isn't enabled Inject #UD when emulating XSETBV if CR4.OSXSAVE is not set. This also covers the "XSAVE not supported" check, as setting CR4.OSXSAVE=1 #GPs if XSAVE is not supported (and userspace gets to keep the pieces if it forces incoherent vCPU state). Add a comment to kvm_emulate_xsetbv() to call out that the CPU checks CR4.OSXSAVE before checking for intercepts. AMD'S APM implies that #UD has priority (says that intercepts are checked before #GP exceptions), while Intel's SDM says nothing about interception priority. However, testing on hardware shows that both AMD and Intel CPUs prioritize the #UD over interception. Fixes: `02d4160fbd` ("x86: KVM: add xsetbv to the emulator") Cc: stable@vger.kernel.org Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220824033057.3576315-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-22 17:04:20 -04:00
Dr. David Alan Gilbert	a1020a25e6	KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES Allow FP and SSE state to be saved and restored via KVM_{G,SET}_XSAVE on XSAVE-capable hosts even if their bits are not exposed to the guest via XCR0. Failing to allow FP+SSE first showed up as a QEMU live migration failure, where migrating a VM from a pre-XSAVE host, e.g. Nehalem, to an XSAVE host failed due to KVM rejecting KVM_SET_XSAVE. However, the bug also causes problems even when migrating between XSAVE-capable hosts as KVM_GET_SAVE won't set any bits in user_xfeatures if XSAVE isn't exposed to the guest, i.e. KVM will fail to actually migrate FP+SSE. Because KVM_{G,S}ET_XSAVE are designed to allowing migrating between hosts with and without XSAVE, KVM_GET_XSAVE on a non-XSAVE (by way of fpu_copy_guest_fpstate_to_uabi()) always sets the FP+SSE bits in the header so that KVM_SET_XSAVE will work even if the new host supports XSAVE. Fixes: `ad856280dd` ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") bz: https://bugzilla.redhat.com/show_bug.cgi?id=2079311 Cc: stable@vger.kernel.org Cc: Leonardo Bras <leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> [sean: add comment, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220824033057.3576315-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-22 17:04:19 -04:00
Sean Christopherson	ee519b3a2a	KVM: x86: Reinstate kvm_vcpu_arch.guest_supported_xcr0 Reinstate the per-vCPU guest_supported_xcr0 by partially reverting commit 988896bb6182; the implicit assessment that guest_supported_xcr0 is always the same as guest_fpu.fpstate->user_xfeatures was incorrect. kvm_vcpu_after_set_cpuid() isn't the only place that sets user_xfeatures, as user_xfeatures is set to fpu_user_cfg.default_features when guest_fpu is allocated via fpu_alloc_guest_fpstate() => __fpstate_reset(). guest_supported_xcr0 on the other hand is zero-allocated. If userspace never invokes KVM_SET_CPUID2, supported XCR0 will be '0', whereas the allowed user XFEATURES will be non-zero. Practically speaking, the edge case likely doesn't matter as no sane userspace will live migrate a VM without ever doing KVM_SET_CPUID2. The primary motivation is to prepare for KVM intentionally and explicitly setting bits in user_xfeatures that are not set in guest_supported_xcr0. Because KVM_{G,S}ET_XSAVE can be used to svae/restore FP+SSE state even if the host doesn't support XSAVE, KVM needs to set the FP+SSE bits in user_xfeatures even if they're not allowed in XCR0, e.g. because XCR0 isn't exposed to the guest. At that point, the simplest fix is to track the two things separately (allowed save/restore vs. allowed XCR0). Fixes: `988896bb61` ("x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0") Cc: stable@vger.kernel.org Cc: Leonardo Bras <leobras@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220824033057.3576315-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-22 17:04:19 -04:00
Miaohe Lin	604f533262	KVM: x86/mmu: add missing update to max_mmu_rmap_size The update to statistic max_mmu_rmap_size is unintentionally removed by commit `4293ddb788` ("KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte"). Add missing update to it or max_mmu_rmap_size will always be nonsensical 0. Fixes: `4293ddb788` ("KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Message-Id: <20220907080657.42898-1-linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-22 17:03:20 -04:00
James Morse	48dbe31a24	x86/resctrl: Add per-rmid arch private storage for overflow and chunks A renamed __rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. For bandwidth counters the resctrl filesystem uses this to calculate the number of bytes ever seen. MPAM's scaling of counters can be changed at runtime, reducing the resolution but increasing the range. When this is changed the prev_msr values need to be converted by the architecture code. Add an array for per-rmid private storage. The prev_msr and chunks values will move here to allow resctrl_arch_rmid_read() to always return the number of bytes read by this counter without assistance from the filesystem. The values are moved in later patches when the overflow and correction calls are moved into __rmid_read(). Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-14-james.morse@arm.com	2022-09-22 17:46:09 +02:00
James Morse	30442571ec	x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks mbm_bw_count() is only called by the mbm_handle_overflow() worker once a second. It reads the hardware register, calculates the bandwidth and updates m->prev_bw_msr which is used to hold the previous hardware register value. Operating directly on hardware register values makes it difficult to make this code architecture independent, so that it can be moved to /fs/, making the mba_sc feature something resctrl supports with no additional support from the architecture. Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware register using __mon_event_count(). Change mbm_bw_count() to use the current chunks value most recently saved by __mon_event_count(). This removes an extra call to __rmid_read(). Instead of using m->prev_msr to calculate the number of chunks seen, use the rr->val that was updated by __mon_event_count(). This removes an extra call to mbm_overflow_count() and get_corrected_mbm_count(). Calculating bandwidth like this means mbm_bw_count() no longer operates on hardware register values directly. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-13-james.morse@arm.com	2022-09-22 17:44:57 +02:00
James Morse	ff6357bb50	x86/resctrl: Allow update_mba_bw() to update controls directly update_mba_bw() calculates a new control value for the MBA resource based on the user provided mbps_val and the current measured bandwidth. Some control values need remapping by delay_bw_map(). It does this by calling wrmsrl() directly. This needs splitting up to be done by an architecture specific helper, so that the remainder can eventually be moved to /fs/. Add resctrl_arch_update_one() to apply one configuration value to the provided resource and domain. This avoids the staging and cross-calling that is only needed with changes made by user-space. delay_bw_map() moves to be part of the arch code, to maintain the 'percentage control' view of MBA resources in resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-12-james.morse@arm.com	2022-09-22 17:43:44 +02:00
James Morse	b58d4eb1f1	x86/resctrl: Remove architecture copy of mbps_val The resctrl arch code provides a second configuration array mbps_val[] for the MBA software controller. Since resctrl switched over to allocating and freeing its own array when needed, nothing uses the arch code version. Remove it. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-11-james.morse@arm.com	2022-09-22 17:37:16 +02:00
James Morse	6ce1560d35	x86/resctrl: Switch over to the resctrl mbps_val list Updates to resctrl's software controller follow the same path as other configuration updates, but they don't modify the hardware state. rdtgroup_schemata_write() uses parse_line() and the resource's parse_ctrlval() function to stage the configuration. resctrl_arch_update_domains() then updates the mbps_val[] array instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update() call that would update hardware. This complicates the interface between resctrl's filesystem parts and architecture specific code. It should be possible for mba_sc to be completely implemented by the filesystem parts of resctrl. This would allow it to work on a second architecture with no additional code. resctrl_arch_update_domains() using the mbps_val[] array prevents this. Change parse_bw() to write the configuration value directly to the mbps_val[] array in the domain structure. Change rdtgroup_schemata_write() to skip the call to resctrl_arch_update_domains(), meaning all the mba_sc specific code in resctrl_arch_update_domains() can be removed. On the read-side, show_doms() and update_mba_bw() are changed to read the mbps_val[] array from the domain structure. With this, resctrl_arch_get_config() no longer needs to consider mba_sc resources. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-10-james.morse@arm.com	2022-09-22 17:34:08 +02:00
James Morse	781096d971	x86/resctrl: Create mba_sc configuration in the rdt_domain To support resctrl's MBA software controller, the architecture must provide a second configuration array to hold the mbps_val[] from user-space. This complicates the interface between the architecture specific code and the filesystem portions of resctrl that will move to /fs/, to allow multiple architectures to support resctrl. Make the filesystem parts of resctrl create an array for the mba_sc values. The software controller can be changed to use this, allowing the architecture code to only consider the values configured in hardware. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-9-james.morse@arm.com	2022-09-22 17:17:59 +02:00
James Morse	b045c21586	x86/resctrl: Abstract and use supports_mba_mbps() To determine whether the mba_MBps option to resctrl should be supported, resctrl tests the boot CPUs' x86_vendor. This isn't portable, and needs abstracting behind a helper so this check can be part of the filesystem code that moves to /fs/. Re-use the tests set_mba_sc() does to determine if the mba_sc is supported on this system. An 'alloc_capable' test is added so that support for the controls isn't implied by the 'delay_linear' property, which is always true for MPAM. Because mbm_update() only update mba_sc if the mbm_local counters are enabled, supports_mba_mbps() checks is_mbm_local_enabled(). (instead of using is_mbm_enabled(), which checks both). Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-8-james.morse@arm.com	2022-09-22 16:10:11 +02:00
James Morse	1644dfe727	x86/resctrl: Remove set_mba_sc()s control array re-initialisation set_mba_sc() enables the 'software controller' to regulate the bandwidth based on the byte counters. This can be managed entirely in the parts of resctrl that move to /fs/, without any extra support from the architecture specific code. set_mba_sc() is called by rdt_enable_ctx() during mount and unmount. It currently resets the arch code's ctrl_val[] and mbps_val[] arrays. The ctrl_val[] was already reset when the domain was created, and by reset_all_ctrls() when the filesystem was last unmounted. Doing the work in set_mba_sc() is not necessary as the values are already at their defaults due to the creation of the domain, or were previously reset during umount(), or are about to reset during umount(). Add a reset of the mbps_val[] in reset_all_ctrls(), allowing the code in set_mba_sc() that reaches in to the architecture specific structures to be removed. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-7-james.morse@arm.com	2022-09-22 16:08:20 +02:00
James Morse	798fd4b9ac	x86/resctrl: Add domain offline callback for resctrl work Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to free the memory. Move the monitor subdir removal and the cancelling of the mbm/limbo works into a new resctrl_offline_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-6-james.morse@arm.com	2022-09-22 15:42:40 +02:00
James Morse	7add3af417	x86/resctrl: Group struct rdt_hw_domain cleanup domain_add_cpu() and domain_remove_cpu() need to kfree() the child arrays that were allocated by domain_setup_ctrlval(). As this memory is moved around, and new arrays are created, adjusting the error handling cleanup code becomes noisier. To simplify this, move all the kfree() calls into a domain_free() helper. This depends on struct rdt_hw_domain being kzalloc()d, allowing it to unconditionally kfree() all the child arrays. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-5-james.morse@arm.com	2022-09-22 15:27:15 +02:00
James Morse	3a7232cdf1	x86/resctrl: Add domain online callback for resctrl work Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to allocate the memory. Move domain_setup_mon_state(), the monitor subdir creation call and the mbm/limbo workers into a new resctrl_online_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-4-james.morse@arm.com	2022-09-22 15:13:27 +02:00
James Morse	bab6ee7368	x86/resctrl: Merge mon_capable and mon_enabled mon_enabled and mon_capable are always set as a pair by rdt_get_mon_l3_config(). There is no point having two values. Merge them together. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-3-james.morse@arm.com	2022-09-22 14:43:08 +02:00
James Morse	4d269ed485	x86/resctrl: Kill off alloc_enabled rdt_resources_all[] used to have extra entries for L2CODE/L2DATA. These were hidden from resctrl by the alloc_enabled value. Now that the L2/L2CODE/L2DATA resources have been merged together, alloc_enabled doesn't mean anything, it always has the same value as alloc_capable which indicates allocation is supported by this resource. Remove alloc_enabled and its helpers. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-2-james.morse@arm.com	2022-09-22 14:34:33 +02:00
Thomas Gleixner	f92ff8f5dc	x86/paravirt: Ensure proper alignment The entries in the .parainstructions sections are 8 byte aligned and the corresponding C struct paravirt_patch_site makes the array offset 16 bytes. Though the pushed entries are only using 12 bytes, __parainstructions_end is therefore 4 bytes short. That works by chance because it's only used in a loop: for (p = start; p < end; p++) But this falls flat when calculating the number of elements: n = end - start That's obviously off by one. Ensure that the gap is filled and the last entry is occupying 16 bytes. [ bp: Add the proper struct and section names. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220915111142.992398801@infradead.org	2022-09-21 12:30:16 +02:00
Dave Hansen	a3d3163fbe	x86/mm/32: Fix W^X detection when page tables do not support NX The x86 MM code now actively refuses to create writable+executable mappings, and warns when there is an attempt to create one. The 0day test robot ran across a warning triggered by module unloading on 32-bit kernels. This was only seen on CPUs with NX support, but where a 32-bit kernel was built without PAE support. On those systems, there is no room for the NX bit in the page tables and _PAGE_NX is #defined to 0, breaking some of the W^X detection logic in verify_rwx(). The X86_FEATURE_NX check in there does not do any good here because the CPU itself supports NX. Fix it by checking for _PAGE_NX support directly instead of checking CPU support for NX. Note that since _PAGE_NX is actually defined to be 0 at compile-time this fix should also end up letting the compiler optimize away most of verify_rwx() on non-PAE kernels. Fixes: `652c5bf380` ("x86/mm: Refuse W^X violations") Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/fcf89147-440b-e478-40c9-228c9fe56691@intel.com/	2022-09-21 10:02:55 +02:00
Ingo Molnar	74656d03ac	Linux 6.0-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmMngx4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGQ1AH/1p4oOT4iqaoTueO MaTQhyvFFcTSLL4y1qejtytNXe4ZEJHyf03jrwtYlfx8RROkZFMJh15G1uWu2deu 43XsUuSWpJ18/C7hRNl1LUazBbuQe30d09zLe7dvD64IAABU6/iQCIorxheTl4EU NXsda2egJUIbTwn2zdFSgMMJPNORxq8KHgvNY/psIEteC+lFln2l2ZXZ21JAIdBj lcTbvx6JpJC0AqX1UuO6NsN4nUnEEh110UtYF6lxQ7olkQKwRaUjQIVuWOFLz75n wDrJxPlVGbDR5zeitDaHkKqWn8LNcqHpDIuAKMxTjT0N/1/sUwHNkyGZXyy1EDJu e0+SX1c= =ITjo -----END PGP SIGNATURE----- Merge tag 'v6.0-rc6' into locking/core, to refresh the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2022-09-21 09:58:02 +02:00
David Gow	bd71558d58	arch: um: Mark the stack non-executable to fix a binutils warning Since binutils 2.39, ld will print a warning if any stack section is executable, which is the default for stack sections on files without a .note.GNU-stack section. This was fixed for x86 in commit `ffcf9c5700` ("x86: link vdso and boot with -z noexecstack --no-warn-rwx-segments"), but remained broken for UML, resulting in several warnings: /usr/bin/ld: warning: arch/x86/um/vdso/vdso.o: missing .note.GNU-stack section implies executable stack /usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker /usr/bin/ld: warning: .tmp_vmlinux.kallsyms1 has a LOAD segment with RWX permissions /usr/bin/ld: warning: .tmp_vmlinux.kallsyms1.o: missing .note.GNU-stack section implies executable stack /usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker /usr/bin/ld: warning: .tmp_vmlinux.kallsyms2 has a LOAD segment with RWX permissions /usr/bin/ld: warning: .tmp_vmlinux.kallsyms2.o: missing .note.GNU-stack section implies executable stack /usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker /usr/bin/ld: warning: vmlinux has a LOAD segment with RWX permissions Link both the VDSO and vmlinux with -z noexecstack, fixing the warnings about .note.GNU-stack sections. In addition, pass --no-warn-rwx-segments to dodge the remaining warnings about LOAD segments with RWX permissions in the kallsyms objects. (Note that this flag is apparently not available on lld, so hide it behind a test for BFD, which is what the x86 patch does.) Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ffcf9c5700e49c0aee42dcba9a12ba21338e8136 Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ba951afb99912da01a6e8434126b8fac7aa75107 Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Richard Weinberger <richard@nod.at>	2022-09-21 09:11:42 +02:00
Jiri Slaby	5258b80e60	x86/dumpstack: Don't mention RIP in "Code: " Commit `238c91115c` ("x86/dumpstack: Fix misleading instruction pointer error message") changed the "Code:" line in bug reports when RIP is an invalid pointer. In particular, the report currently says (for example): BUG: kernel NULL pointer dereference, address: 0000000000000000 ... RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. That Unable to access opcode bytes at RIP 0xffffffffffffffd6. is quite confusing as RIP value is 0, not -42. That -42 comes from "regs->ip - PROLOGUE_SIZE", because Code is dumped with some prologue (and epilogue). So do not mention "RIP" on this line in this context. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/b772c39f-c5ae-8f17-fe6e-6a2bc4d1f83b@kernel.org	2022-09-20 16:11:54 +02:00
Vincent Mailhol	fdb6649ab7	x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions If x is not 0, __ffs(x) is equivalent to: (unsigned long)__builtin_ctzl(x) And if x is not ~0UL, ffz(x) is equivalent to: (unsigned long)__builtin_ctzl(~x) Because __builting_ctzl() returns an int, a cast to (unsigned long) is necessary to avoid potential warnings on implicit casts. Concerning the edge cases, __builtin_ctzl(0) is always undefined, whereas __ffs(0) and ffz(~0UL) may or may not be defined, depending on the processor. Regardless, for both functions, developers are asked to check against 0 or ~0UL so replacing __ffs() or ffz() by __builting_ctzl() is safe. For x86_64, the current __ffs() and ffz() implementations do not produce optimized code when called with a constant expression. On the contrary, the __builtin_ctzl() folds into a single instruction. However, for non constant expressions, the __ffs() and ffz() asm versions of the kernel remains slightly better than the code produced by GCC (it produces a useless instruction to clear eax). Use __builtin_constant_p() to select between the kernel's __ffs()/ffz() and the __builtin_ctzl() depending on whether the argument is constant or not. Statistics On a allyesconfig, before...: $ objdump -d vmlinux.o \| grep tzcnt \| wc -l 3607 ...and after: $ objdump -d vmlinux.o \| grep tzcnt \| wc -l 2600 So, roughly 27.9% of the calls to either __ffs() or ffz() were using constant expressions and could be optimized out. (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1) Note: on x86_64, the BSF instruction produces TZCNT when used with the REP prefix (which explain the use of `grep tzcnt' instead of `grep bsf' in above benchmark). c.f. [1] [1] `e26a44a2d6` ("x86: Use REP BSF unconditionally") [ bp: Massage commit message. ] Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20220511160319.1045812-1-mailhol.vincent@wanadoo.fr	2022-09-20 15:35:37 +02:00
Vincent Mailhol	146034fed6	x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions For x86_64, the current ffs() implementation does not produce optimized code when called with a constant expression. On the contrary, the __builtin_ffs() functions of both GCC and clang are able to fold the expression into a single instruction. Example Consider two dummy functions foo() and bar() as below: #include <linux/bitops.h> #define CONST 0x01000000 unsigned int foo(void) { return ffs(CONST); } unsigned int bar(void) { return __builtin_ffs(CONST); } GCC would produce below assembly code: 0000000000000000 <foo>: 0: ba 00 00 00 01 mov $0x1000000,%edx 5: b8 ff ff ff ff mov $0xffffffff,%eax a: 0f bc c2 bsf %edx,%eax d: 83 c0 01 add $0x1,%eax 10: c3 ret <Instructions after ret and before next function were redacted> 0000000000000020 <bar>: 20: b8 19 00 00 00 mov $0x19,%eax 25: c3 ret And clang would produce: 0000000000000000 <foo>: 0: b8 ff ff ff ff mov $0xffffffff,%eax 5: 0f bc 05 00 00 00 00 bsf 0x0(%rip),%eax # c <foo+0xc> c: 83 c0 01 add $0x1,%eax f: c3 ret 0000000000000010 <bar>: 10: b8 19 00 00 00 mov $0x19,%eax 15: c3 ret Both examples clearly demonstrate the benefit of using __builtin_ffs() instead of the kernel's asm implementation for constant expressions. However, for non constant expressions, the kernel's ffs() asm version remains better for x86_64 because, contrary to GCC, it doesn't emit the CMOV assembly instruction, c.f. [1] (noticeably, clang is able optimize out the CMOV call). Use __builtin_constant_p() to select between the kernel's ffs() and the __builtin_ffs() depending on whether the argument is constant or not. As a side benefit, replacing the ffs() function declaration by a macro also removes below -Wshadow warning: ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow] 283 \| static __always_inline int ffs(int x) Statistics On a allyesconfig, before...: $ objdump -d vmlinux.o \| grep bsf \| wc -l 1081 ...and after: $ objdump -d vmlinux.o \| grep bsf \| wc -l 792 So, roughly 26.7% of the calls to ffs() were using constant expressions and could be optimized out. (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1) [1] commit `ca3d30cc02` ("x86_64, asm: Optimise fls(), ffs() and fls64()") [ bp: Massage commit message. ] Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20220511160319.1045812-1-mailhol.vincent@wanadoo.fr	2022-09-20 15:31:17 +02:00
Yury Norov	38bef8e57f	smp: add set_nr_cpu_ids() In preparation to support compile-time nr_cpu_ids, add a setter for the variable. This is a no-op for all arches. Signed-off-by: Yury Norov <yury.norov@gmail.com>	2022-09-19 17:51:53 -07:00
Lukas Straub	d27fff3499	um: Cleanup compiler warning in arch/x86/um/tls_32.c arch.tls_array is statically allocated so checking for NULL doesn't make sense. This causes the compiler warning below. Remove the checks to silence these warnings. ../arch/x86/um/tls_32.c: In function 'get_free_idx': ../arch/x86/um/tls_32.c:68:13: warning: the comparison will always evaluate as 'true' for the address of 'tls_array' will never be NULL [-Waddress] 68 \| if (!t->arch.tls_array) \| ^ In file included from ../arch/x86/um/asm/processor.h:10, from ../include/linux/rcupdate.h:30, from ../include/linux/rculist.h:11, from ../include/linux/pid.h:5, from ../include/linux/sched.h:14, from ../arch/x86/um/tls_32.c:7: ../arch/x86/um/asm/processor_32.h:22:31: note: 'tls_array' declared here 22 \| struct uml_tls_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; \| ^~~~~~~~~ ../arch/x86/um/tls_32.c: In function 'get_tls_entry': ../arch/x86/um/tls_32.c:243:13: warning: the comparison will always evaluate as 'true' for the address of 'tls_array' will never be NULL [-Waddress] 243 \| if (!t->arch.tls_array) \| ^ ../arch/x86/um/asm/processor_32.h:22:31: note: 'tls_array' declared here 22 \| struct uml_tls_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; \| ^~~~~~~~~ Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Richard Weinberger <richard@nod.at>	2022-09-19 21:59:47 +02:00
Lukas Straub	61670b4d27	um: Cleanup syscall_handler_t cast in syscalls_32.h Like in `f4f03f299a` "um: Cleanup syscall_handler_t definition/cast, fix warning", remove the cast to to fix the compiler warning. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Richard Weinberger <richard@nod.at>	2022-09-19 21:58:53 +02:00
Jiri Olsa	ceea991a01	bpf: Move bpf_dispatcher function out of ftrace locations The dispatcher function is attached/detached to trampoline by dispatcher update function. At the same time it's available as ftrace attachable function. After discussion [1] the proposed solution is to use compiler attributes to alter bpf_dispatcher_##name##_func function: - remove it from being instrumented with __no_instrument_function__ attribute, so ftrace has no track of it - but still generate 5 nop instructions with patchable_function_entry(5) attribute, which are expected by bpf_arch_text_poke used by dispatcher update function Enabling HAVE_DYNAMIC_FTRACE_NO_PATCHABLE option for x86, so __patchable_function_entries functions are not part of ftrace/mcount locations. Adding attributes to bpf_dispatcher_XXX function on x86_64 so it's kept out of ftrace locations and has 5 byte nop generated at entry. These attributes need to be arch specific as pointed out by Ilya Leoshkevic in here [2]. The dispatcher image is generated only for x86_64 arch, so the code can stay as is for other archs. [1] https://lore.kernel.org/bpf/20220722110811.124515-1-jolsa@kernel.org/ [2] https://lore.kernel.org/bpf/969a14281a7791c334d476825863ee449964dd0c.camel@linux.ibm.com/ Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/bpf/20220903131154.420467-3-jolsa@kernel.org	2022-09-16 22:23:20 +02:00
Peter Zijlstra	8c03af3e09	x86,retpoline: Be sure to emit INT3 after JMP *%\reg Both AMD and Intel recommend using INT3 after an indirect JMP. Make sure to emit one when rewriting the retpoline JMP irrespective of compiler SLS options or even CONFIG_SLS. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Link: https://lkml.kernel.org/r/Yxm+QkFPOhrVSH6q@hirez.programming.kicks-ass.net	2022-09-15 16:13:53 +02:00
Namhyung Kim	b4e12b2d70	perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY There's no in-tree user anymore. Let's get rid of it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220908214104.3851807-3-namhyung@kernel.org	2022-09-13 15:03:23 +02:00
Namhyung Kim	3749d33e51	perf: Use sample_flags for callchain So that it can call perf_callchain() only if needed. Historically it used __PERF_SAMPLE_CALLCHAIN_EARLY but we can do that with sample_flags in the struct perf_sample_data. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220908214104.3851807-1-namhyung@kernel.org	2022-09-13 15:03:22 +02:00
Kefeng Wang	2be9880dc8	kernel: exit: cleanup release_thread() Only x86 has own release_thread(), introduce a new weak release_thread() function to clean empty definitions in other ARCHs. Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Brian Cain <bcain@quicinc.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Stafford Horne <shorne@gmail.com> [openrisc] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Huacai Chen <chenhuacai@kernel.org> [LoongArch] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> [csky] Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-09-11 21:55:07 -07:00
Naohiro Aota	818c4fdaa9	x86/mm: disable instrumentations of mm/pgprot.c Commit `4867fbbdd6` ("x86/mm: move protection_map[] inside the platform") moved accesses to protection_map[] from mem_encrypt_amd.c to pgprot.c. As a result, the accesses are now targets of KASAN (and other instrumentations), leading to the crash during the boot process. Disable the instrumentations for pgprot.c like commit `67bb8e999e` ("x86/mm: Disable various instrumentations of mm/mem_encrypt.c and mm/tlb.c"). Before this patch, my AMD machine cannot boot since v6.0-rc1 with KASAN enabled, without anything printed. After the change, it successfully boots up. Fixes: `4867fbbdd6` ("x86/mm: move protection_map[] inside the platform") Link: https://lkml.kernel.org/r/20220824084726.2174758-1-naohiro.aota@wdc.com Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-09-11 16:22:30 -07:00
Linus Torvalds	f448dda895	asm-generic: SOFTIRQ_ON_OWN_STACK rework Just one fixup patch, reworking the softirq_on_own_stack logic for preempt-rt kernels as discussed in https://lore.kernel.org/all/CAHk-=wgZSD3W2y6yczad2Am=EfHYyiPzTn3CfXxrriJf9i5W5w@mail.gmail.com/ -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmMZ/FUACgkQmmx57+YA GNl4DBAAutecM/RGJnK//0+NEcCsuHoIWusifeEIhR8yNFxm1UOiwBNYV4M/73UA x2lLGQCu4ze1zSU3BXX1Ug3MfpmrBOhlsW2+x7j0b3tG9vXpvojitTsCPo6lgNDu nUx58rtN0wfZgwT7c4DaueEL1aIZ1yUOgCE+SCGw3subgpZeoaYUi3sbZxI2K+T8 MnzcVVpmccMaxLh0xwCn6u4csYrarj0XodY11a8+DAN9Afek9nEq4UNhIewjvQw0 2OTbGuxIBZoXNmWdjcDBO97eUtq0oJcRIgPkRi0xlas2hzRJMwO6liqsA34r4r7r Irxyyw4//jlCtJOKwrv+rSxR6ULvGw2+UBk/Q/nYeN9hL2qKnwJIa8hryj+cQ/os KWclMGEYtKvOzciMIvViXAOY4sa7jZO1Q9rntyUIVkNiqQqLDRxocW+GjsqG5KZI bl+akScQhX3a7bEZePO0EJqfitjaxk5JtxF3KcV5apKaWto+EwOpvRf6eUVT97/n Zpoy2dBygOtWiMJR7o143xeOQlhutDCOh6VeDfJiUA57ekABRY/8+R+ZltkmDP84 4iCCtE4oWuMXz390ybBHxKcNjiknLUMvDUNq5wfjsyJ8ftBRF+1+vKR+BdLqsx8O d5/FjRXC5MWaW4GLO89l+MZh+BrOkobdq333VxIStpS9Jd0pUa4= =0A1T -----END PGP SIGNATURE----- Merge tag 'asm-generic-fixes-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull SOFTIRQ_ON_OWN_STACK rework from Arnd Bergmann: "Just one fixup patch, reworking the softirq_on_own_stack logic for preempt-rt kernels as discussed in https://lore.kernel.org/all/CAHk-=wgZSD3W2y6yczad2Am=EfHYyiPzTn3CfXxrriJf9i5W5w@mail.gmail.com/" * tag 'asm-generic-fixes-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: Conditionally enable do_softirq_own_stack() via Kconfig.	2022-09-09 07:23:29 -04:00
Haitao Huang	81fa6fd13b	x86/sgx: Handle VA page allocation failure for EAUG on PF. VM_FAULT_NOPAGE is expected behaviour for -EBUSY failure path, when augmenting a page, as this means that the reclaimer thread has been triggered, and the intention is just to round-trip in ring-3, and retry with a new page fault. Fixes: `5a90d2c3f5` ("x86/sgx: Support adding of pages to an initialized enclave") Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220906000221.34286-3-jarkko@kernel.org	2022-09-08 13:28:31 -07:00
Jarkko Sakkinen	133e049a3f	x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd Unsanitized pages trigger WARN_ON() unconditionally, which can panic the whole computer, if /proc/sys/kernel/panic_on_warn is set. In sgx_init(), if misc_register() fails or misc_register() succeeds but neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be prematurely stopped. This may leave unsanitized pages, which will result a false warning. Refine __sgx_sanitize_pages() to return: 1. Zero when the sanitization process is complete or ksgxd has been requested to stop. 2. The number of unsanitized pages otherwise. Fixes: `51ab30eb2a` ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u Link: https://lkml.kernel.org/r/20220906000221.34286-2-jarkko@kernel.org	2022-09-08 13:27:44 -07:00
Youquan Song	2738c69a88	EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs Current i10nm_edac only supports firmware decoder (ACPI DSM methods). MCA bank registers of Ice Lake or Tremont CPUs contain the information to decode DDR memory errors. To get better decoding performance, add the driver decoder (decoding DDR memory errors via extracting error information from MCA bank registers) for Ice Lake and Tremont CPUs. Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20220901194310.115427-1-tony.luck@intel.com/	2022-09-08 11:40:01 -07:00
Kan Liang	fae9ebde96	perf/x86/intel: Optimize FIXED_CTR_CTRL access All the fixed counters share a fixed control register. The current perf reads and re-writes the fixed control register for each fixed counter disable/enable, which is unnecessary. When changing the fixed control register, the entire PMU must be disabled via the global control register. The changing cannot be taken effect until the entire PMU is re-enabled. Only updating the fixed control register once right before the entire PMU re-enabling is enough. The read of the fixed control register is not necessary either. The value can be cached in the per CPU cpu_hw_events. Test results: Counting all the fixed counters with the perf bench sched pipe as below on a SPR machine. $perf stat -e cycles,instructions,ref-cycles,slots --no-inherit -- taskset -c 1 perf bench sched pipe The Total elapsed time reduces from 5.36s (without the patch) to 4.99s (with the patch), which is ~6.9% improvement. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220804140729.2951259-1-kan.liang@linux.intel.com	2022-09-07 21:54:04 +02:00
Peter Zijlstra	dbf4e792be	perf/x86/p4: Remove perfctr_second_write quirk Now that we have a x86_pmu::set_period() method, use it to remove the perfctr_second_write quirk from the generic code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.839502514@infradead.org	2022-09-07 21:54:04 +02:00
Peter Zijlstra	1acab2e01c	perf/x86/intel: Remove x86_pmu::update_topdown_event Now that it is all internal to the intel driver, remove x86_pmu::update_topdown_event. Assumes that is_topdown_count(event) can only be true when the hardware has topdown stuff and the function is set. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.771635301@infradead.org	2022-09-07 21:54:03 +02:00
Peter Zijlstra	2368516731	perf/x86/intel: Remove x86_pmu::set_topdown_event_period Now that it is all internal to the intel driver, remove x86_pmu::set_topdown_event_period. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.706354189@infradead.org	2022-09-07 21:54:03 +02:00
Peter Zijlstra	08b3068fab	perf/x86: Add a x86_pmu::limit_period static_call Avoid a branch and indirect call. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.640658334@infradead.org	2022-09-07 21:54:03 +02:00
Peter Zijlstra	28f0f3c44b	perf/x86: Change x86_pmu::limit_period signature In preparation for making it a static_call, change the signature. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.573713839@infradead.org	2022-09-07 21:54:02 +02:00
Peter Zijlstra	e577bb17a1	perf/x86/intel: Move the topdown stuff into the intel driver Use the new x86_pmu::{set_period,update}() methods to push the topdown stuff into the Intel driver, where it belongs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.505933457@infradead.org	2022-09-07 21:54:02 +02:00
Peter Zijlstra	73759c3463	perf/x86: Add two more x86_pmu methods In order to clean up x86_perf_event_{set_period,update)() start by adding them as x86_pmu methods. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.440196408@infradead.org	2022-09-07 21:54:02 +02:00
Anshuman Khandual	88081cfb69	x86/perf: Assert all platform event flags are within PERF_EVENT_FLAG_ARCH Ensure all platform specific event flags are within PERF_EVENT_FLAG_ARCH. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: James Clark <james.clark@arm.com> Link: https://lkml.kernel.org/r/20220907091924.439193-5-anshuman.khandual@arm.com	2022-09-07 21:54:01 +02:00
Yonghong Song	a9c5ad31fb	bpf: x86: Support in-register struct arguments in trampoline programs In C, struct value can be passed as a function argument. For small structs, struct value may be passed in one or more registers. For trampoline based bpf programs, this would cause complication since one-to-one mapping between function argument and arch argument register is not valid any more. The latest llvm16 added bpf support to pass by values for struct up to 16 bytes ([1]). This is also true for x86_64 architecture where two registers will hold the struct value if the struct size is >8 and <= 16. This may not be true if one of struct member is 'double' type but in current linux source code we don't have such instance yet, so we assume all >8 && <= 16 struct holds two general purpose argument registers. Also change on-stack nr_args value to the number of registers holding the arguments. This will permit bpf_get_func_arg() helper to get all argument values. [1] https://reviews.llvm.org/D132144 Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220831152652.2078600-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-06 19:51:14 -07:00
Kan Liang	ee9db0e14b	perf: Use sample_flags for txn Use the new sample_flags to indicate whether the txn field is filled by the PMU driver. Remove the txn field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-7-kan.liang@linux.intel.com	2022-09-06 11:33:03 +02:00
Kan Liang	e16fd7f2cb	perf: Use sample_flags for data_src Use the new sample_flags to indicate whether the data_src field is filled by the PMU driver. Remove the data_src field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-6-kan.liang@linux.intel.com	2022-09-06 11:33:03 +02:00
Kan Liang	2abe681da0	perf: Use sample_flags for weight Use the new sample_flags to indicate whether the weight field is filled by the PMU driver. Remove the weight field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-5-kan.liang@linux.intel.com	2022-09-06 11:33:02 +02:00
Kan Liang	a9a931e266	perf: Use sample_flags for branch stack Use the new sample_flags to indicate whether the branch stack is filled by the PMU driver. Remove the br_stack from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-4-kan.liang@linux.intel.com	2022-09-06 11:33:02 +02:00
Kan Liang	47a3aeb39e	perf/x86/intel/pebs: Fix PEBS timestamps overwritten The PEBS TSC-based timestamps do not appear correctly in the final perf.data output file from perf record. The data->time field setup by PEBS in the setup_pebs_fixed_sample_data() is later overwritten by perf_events generic code in perf_prepare_sample(). There is an ordering problem. Set the sample flags when the data->time is updated by PEBS. The data->time field will not be overwritten anymore. Reported-by: Andreas Kogler <andreas.kogler.0x@gmail.com> Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-3-kan.liang@linux.intel.com	2022-09-06 11:33:01 +02:00
Sebastian Andrzej Siewior	8cbb2b50ee	asm-generic: Conditionally enable do_softirq_own_stack() via Kconfig. Remove the CONFIG_PREEMPT_RT symbol from the ifdef around do_softirq_own_stack() and move it to Kconfig instead. Enable softirq stacks based on SOFTIRQ_ON_OWN_STACK which depends on HAVE_SOFTIRQ_ON_OWN_STACK and its default value is set to !PREEMPT_RT. This ensures that softirq stacks are not used on PREEMPT_RT and avoids a 'select' statement on an option which has a 'depends' statement. Link: https://lore.kernel.org/YvN5E%2FPrHfUhggr7@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-09-05 17:20:55 +02:00
Ingo Molnar	c0d2e63d4c	x86/defconfig: Enable CONFIG_DEBUG_WX=y 7 years after it got introduced it's time to make this the default, at least in the x86 defconfigs. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2022-09-02 10:41:42 +02:00
Ingo Molnar	81290934ea	x86/defconfig: Refresh the defconfigs Just go through a 'make savedefconfig' cycle to pick up fresh Kconfig details, no change in settings, just reordering of some entries. ( This makes followup changes generated via 'make savedefconfig' contain less noise. ) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2022-09-02 10:37:53 +02:00
Ingo Molnar	1043897681	Merge branch 'linus' into x86/mm, to refresh the branch This branch is ~14k commits behind upstream, and has an old merge base from early into the merge window, refresh it to v6.0-rc3+fixes before queueing up new commits. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2022-09-02 10:33:33 +02:00
Ashok Raj	7fce8d6ecc	x86/microcode: Print previous version of microcode after reload Print both old and new versions of microcode after a reload is complete because knowing the previous microcode version is sometimes important from a debugging perspective. [ bp: Massage commit message. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20220829181030.722891-1-ashok.raj@intel.com	2022-09-02 08:01:58 +02:00
Paolo Bonzini	29250ba51b	PCI interpretation compile fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmMMpCYACgkQ41TmuOI4 ufirVA/+NpbDVgiGehgV9ngukDFWNNBOvPeKFIjnIWmhcTvRtQ/YWIALKJSinaVl jyPOTM0cEcrxhIPM2kHmnkZhwECRwM32Ox2N6ftmOPEJk8xrP6glfCuwJ6Yl/pNU EBLWzgZGRZyDGmCNo0jvaycPIh/WkeNKGwe/3FlI8gHx7iALvihE/0IFoPlCEWky dbetAAx5W0fRCoKscbMh0q2mDXuz4KBu3yij9wMGlhSzLcrSr9qejCmjEdERruYM c9hrLtIcy8cS9Epa5kOnz0PekK4EUlecB66BjnvsXabjTVNlF8MJwEG5EZfGxP+a jdthEjXaO5u7z4Y5NKxDwTFslmzX3PPbdBka9VGmhLkkV1/jGZ5RKKVY7+MljV74 ZhhVkwUB0D3c7Wbpo+S1mVY6EHi/RO4TkaPCORsmCvwYXFZyhNsteKnWdRbcu3Iu aIjKamGeEfe8Mqojmzaksql9ScNnW9/INUyObz5UruGXFg2uFd0Vae20ZBpn7DYh gBYKZAFArNzHUUpP5zMBPnwpTGt8MuurI7rnVEvVRqsy4SofbYZYuFdxpn2obzui zPQY/lg6eXUWZkbfqJNrA1JN3cTSxVQCO96z8yYp/RazYiKMGUgZ7Ps7WlZkxRNE OSMkssXz6cIL1QPbu75iu6FMQGcVAywH2EN8jJh6E0ceUcVhsB0= =ENc3 -----END PGP SIGNATURE----- Merge tag 'kvm-s390-master-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD PCI interpretation compile fixes	2022-09-01 19:21:27 -04:00
Paolo Bonzini	22c6a0ef6b	KVM: x86: check validity of argument to KVM_SET_MP_STATE An invalid argument to KVM_SET_MP_STATE has no effect other than making the vCPU fail to run at the next KVM_RUN. Since it is extremely unlikely that any userspace is relying on it, fail with -EINVAL just like for other architectures. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 19:20:59 -04:00
Like Xu	87693645ae	perf/x86/core: Completely disable guest PEBS via guest's global_ctrl When a guest PEBS counter is cross-mapped by a host counter, software will remove the corresponding bit in the arr[global_ctrl].guest and expect hardware to perform a change of state "from enable to disable" via the msr_slot[] switch during the vmx transaction. The real world is that if user adjust the counter overflow value small enough, it still opens a tiny race window for the previously PEBS-enabled counter to write cross-mapped PEBS records into the guest's PEBS buffer, when arr[global_ctrl].guest has been prioritised (switch_msr_special stuff) to switch into the enabled state, while the arr[pebs_enable].guest has not. Close this window by clearing invalid bits in the arr[global_ctrl].guest. Cc: linux-perf-users@vger.kernel.org Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Fixes: `854250329c` ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220831033524.58561-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 19:20:59 -04:00
Miaohe Lin	3c0ba05ce9	KVM: x86: fix memoryleak in kvm_arch_vcpu_create() When allocating memory for mci_ctl2_banks fails, KVM doesn't release mce_banks leading to memoryleak. Fix this issue by calling kfree() for it when kcalloc() fails. Fixes: `281b52780b` ("KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Message-Id: <20220901122300.22298-1-linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 19:20:58 -04:00
Jim Mattson	0204750bd4	KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES KVM should not claim to virtualize unknown IA32_ARCH_CAPABILITIES bits. When kvm_get_arch_capabilities() was originally written, there were only a few bits defined in this MSR, and KVM could virtualize all of them. However, over the years, several bits have been defined that KVM cannot just blindly pass through to the guest without additional work (such as virtualizing an MSR promised by the IA32_ARCH_CAPABILITES feature bit). Define a mask of supported IA32_ARCH_CAPABILITIES bits, and mask off any other bits that are set in the hardware MSR. Cc: Paolo Bonzini <pbonzini@redhat.com> Fixes: `5b76a3cff0` ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20220830174947.2182144-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 19:20:58 -04:00
Al Viro	235185b8ed	sgx: use ->f_mapping... Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-09-01 17:43:29 -04:00
Peter Zijlstra	652c5bf380	x86/mm: Refuse W^X violations x86 has STRICT_*_RWX, but not even a warning when someone violates it. Add this warning and fully refuse the transition. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/YwySW3ROc21hN7g9@hirez.programming.kicks-ass.net	2022-09-01 11:10:19 -07:00
Like Xu	f2aeea5750	perf/x86/core: Completely disable guest PEBS via guest's global_ctrl When a guest PEBS counter is cross-mapped by a host counter, software will remove the corresponding bit in the arr[global_ctrl].guest and expect hardware to perform a change of state "from enable to disable" via the msr_slot[] switch during the vmx transaction. The real world is that if user adjust the counter overflow value small enough, it still opens a tiny race window for the previously PEBS-enabled counter to write cross-mapped PEBS records into the guest's PEBS buffer, when arr[global_ctrl].guest has been prioritised (switch_msr_special stuff) to switch into the enabled state, while the arr[pebs_enable].guest has not. Close this window by clearing invalid bits in the arr[global_ctrl].guest. Fixes: `854250329c` ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220831033524.58561-1-likexu@tencent.com	2022-09-01 11:19:42 +02:00
Kan Liang	24919fdea6	perf/x86/intel: Fix unchecked MSR access error for Alder Lake N For some Alder Lake N machine, the below unchecked MSR access error may be triggered. [ 0.088017] rcu: Hierarchical SRCU implementation. [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30) [ 0.088017] Call Trace: [ 0.088017] <TASK> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0 The Alder Lake N only has e-cores. The X86_FEATURE_HYBRID_CPU flag is not set. The perf cannot retrieve the correct CPU type via get_this_hybrid_cpu_type(). The model specific get_hybrid_cpu_type() is hardcode to p-core. The wrong CPU type is given to the PMU of the Alder Lake N. Since Alder Lake N isn't in fact a hybrid CPU, remove ALDERLAKE_N from the rest of {ALDER,RAPTOP}LAKE and create a non-hybrid PMU setup. The differences between Gracemont and the previous Tremont are, - Number of GP counters - Load and store latency Events - PEBS event_constraints - Instruction Latency support - Data source encoding - Memory access latency encoding Fixes: `c2a960f7c5` ("perf/x86: Add new Alder Lake and Raptor Lake support") Reported-by: Jianfeng Gao <jianfeng.gao@intel.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220831142702.153110-1-kan.liang@linux.intel.com	2022-09-01 11:19:41 +02:00
Stephen Kitt	7987448ffc	x86/Kconfig: Specify idle=poll instead of no-hlt Commit `27be457000` ("x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag") removed no-hlt, but CONFIG_APM still refers to it. Suggest "idle=poll" instead, based on the commit message: > If a user wants to avoid HLT, then "idle=poll" > is much more useful, as it avoids invocation of HLT > in idle, while "no-hlt" failed to do so. Signed-off-by: Stephen Kitt <steve@sk2.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220713160840.1577569-1-steve@sk2.org	2022-08-31 14:43:36 -07:00
Daniel Sneddon	b8d1d16360	x86/apic: Don't disable x2APIC if locked The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC (or x2APIC). X2APIC mode is mostly compatible with legacy APIC, but it disables the memory-mapped APIC interface in favor of one that uses MSRs. The APIC mode is controlled by the EXT bit in the APIC MSR. The MMIO/xAPIC interface has some problems, most notably the APIC LEAK [1]. This bug allows an attacker to use the APIC MMIO interface to extract data from the SGX enclave. Introduce support for a new feature that will allow the BIOS to lock the APIC in x2APIC mode. If the APIC is locked in x2APIC mode and the kernel tries to disable the APIC or revert to legacy APIC mode a GP fault will occur. Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by preventing the kernel from trying to disable the x2APIC. On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS. If legacy APIC is required, then it SGX and TDX need to be disabled in the BIOS. [1]: https://aepicleak.com/aepicleak.pdf Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Link: https://lkml.kernel.org/r/20220816231943.1152579-1-daniel.sneddon@linux.intel.com	2022-08-31 14:34:11 -07:00
Tony Luck	5515d21c68	x86/cpu: Add CPU model numbers for Meteor Lake Add model numbers for client and mobile parts. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220824175718.232384-1-tony.luck@intel.com	2022-08-31 14:17:00 -07:00
Kohei Tarumizu	499c8bb469	x86/resctrl: Fix to restore to original value when re-enabling hardware prefetch register The current pseudo_lock.c code overwrites the value of the MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0. Therefore, modify it to save and restore the original values. Fixes: `018961ae55` ("x86/intel_rdt: Pseudo-lock region creation/removal core") Fixes: `443810fe61` ("x86/intel_rdt: Create debugfs files for pseudo-locking testing") Fixes: `8a2fc0e1bc` ("x86/intel_rdt: More precise L2 hit/miss measurements") Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/eb660f3c2010b79a792c573c02d01e8e841206ad.1661358182.git.reinette.chatre@intel.com	2022-08-31 11:42:17 -07:00
Yosry Ahmed	43a063cab3	KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats. Count the pages used by KVM mmu on x86 in memory stats under secondary pagetable stats (e.g. "SecPageTables" in /proc/meminfo) to give better visibility into the memory consumption of KVM mmu in a similar way to how normal user page tables are accounted. Add the inner helper in common KVM, ARM will also use it to count stats in a future commit. Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> # generic KVM changes Link: https://lore.kernel.org/r/20220823004639.2387269-3-yosryahmed@google.com Link: https://lore.kernel.org/r/20220823004639.2387269-4-yosryahmed@google.com [sean: squash x86 usage to workaround modpost issues] Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-08-30 07:41:12 -07:00
Marco Elver	be3f152568	perf/hw_breakpoint: Optimize constant number of breakpoint slots Optimize internal hw_breakpoint state if the architecture's number of breakpoint slots is constant. This avoids several kmalloc() calls and potentially unnecessary failures if the allocations fail, as well as subtly improves code generation and cache locality. The protocol is that if an architecture defines hw_breakpoint_slots via the preprocessor, it must be constant and the same for all types. Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20220829124719.675715-7-elver@google.com	2022-08-30 10:56:22 +02:00
Peter Zijlstra	bc12b70f7d	x86/earlyprintk: Clean up pciserial While working on a GRUB patch to support PCI-serial, a number of cleanups were suggested that apply to the code I took inspiration from. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lkml.kernel.org/r/YwdeyCEtW+wa+QhH@worktop.programming.kicks-ass.net	2022-08-29 12:19:25 +02:00
Anshuman Khandual	a724ec8296	perf: Add system error and not in transaction branch types This expands generic branch type classification by adding two more entries there in i.e system error and not in transaction. This also updates the x86 implementation to process X86_BR_NO_TX records as appropriate. This changes branch types reported to user space on x86 platform but it should not be a problem. The possible scenarios and impacts are enumerated here. -------------------------------------------------------------------------- \| kernel \| perf tool \| Impact \| -------------------------------------------------------------------------- \| old \| old \| Works as before \| -------------------------------------------------------------------------- \| old \| new \| PERF_BR_UNKNOWN is processed \| -------------------------------------------------------------------------- \| new \| old \| PERF_BR_NO_TX is blocked via old PERF_BR_MAX \| -------------------------------------------------------------------------- \| new \| new \| PERF_BR_NO_TX is recognized \| -------------------------------------------------------------------------- When PERF_BR_NO_TX is blocked via old PERF_BR_MAX (new kernel with old perf tool) the user space might throw up an warning complaining about an unrecognized branch types being reported, but it's expected. PERF_BR_SERROR & PERF_BR_NO_TX branch types will be used for BRBE implementation on arm64 platform. PERF_BR_NO_TX complements 'abort' and 'in_tx' elements in perf_branch_entry which represent other transaction states for a given branch record. Because this completes the transaction state classification. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: James Clark <james.clark@arm.com> Link: https://lkml.kernel.org/r/20220824044822.70230-2-anshuman.khandual@arm.com	2022-08-29 09:42:41 +02:00
Jane Chu	f9781bb18e	x86/mce: Retrieve poison range from hardware When memory poison consumption machine checks fire, MCE notifier handlers like nfit_handle_mce() record the impacted physical address range which is reported by the hardware in the MCi_MISC MSR. The error information includes data about blast radius, i.e. how many cachelines did the hardware determine are impacted. A recent change `7917f9cdb5` ("acpi/nfit: rely on mce->misc to determine poison granularity") updated nfit_handle_mce() to stop hard coding the blast radius value of 1 cacheline, and instead rely on the blast radius reported in 'struct mce' which can be up to 4K (64 cachelines). It turns out that apei_mce_report_mem_error() had a similar problem in that it hard coded a blast radius of 4K rather than reading the blast radius from the error information. Fix apei_mce_report_mem_error() to convey the proper poison granularity. Signed-off-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com Link: https://lore.kernel.org/r/20220826233851.1319100-1-jane.chu@oracle.com	2022-08-29 09:33:42 +02:00
Linus Torvalds	2f23a7c914	Misc fixes: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMLgUERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hU1hAAj9bKO+D07gROpkRhXpLvbqm+mUqk6It8 qEuyLkC/xvD9N//Pya0ZPPE+l23EHQxM/L0tXCYwsc8Zlx3fr678FQktHZuxULaP qlGmwub8PvsyMesXBGtgHJ4RT8L07FelBR+E/TpqCSbv/geM7mcc0ojyOKo+OmbD vbsUTDNqsMYndzePUBrv/vJRqVLGlbd1a22DKE/YiYBbZu5hughNUB0bHYhYdsml mutcRsVTKRbZOi4wiuY/pnveMf/z9wAwKOWd9EBaDWILygVSHkBJJQyhHigBh3F8 H1hFn1q4ybULdrAUT4YND9Tjb6U8laU0fxg8Z43bay7bFXXElqoj3W2qka9NjAim QvWSFvTYQRTIWt6sCshVsUBWj6fBZdHxcJ8Fh+ucJJp+JWl9/aD0A41vK2Jx0LIt p8YzBRKEqd1Q/7QD855BArN7HNtQGgShNt4oPVZ7nPnjuQt0+lngu8NR+6X3RKpX r8rordyPvzgPL6W+1uV+8hnz1w+YU2xplAbK+zTijwgJVgyf8khSlZQNpFKULrou zjtzo/2nB+4C4bvfetNnaOGhi1/AdCHZHyZE35rotpd73SLHvdOrH0Ll9oCVfVrC UWbC1E67cHQw97Ni/4CrCsJRBULK01uyszVCxlEkSYInf0UsKlnmy+TqxizwsCVy reYQ0ePyWg0= =ZpCJ -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen	2022-08-28 10:10:23 -07:00
Linus Torvalds	4459d800f7	Misc fixes: an Arch-LBR fix, a PEBS enumeration fix, an Intel DS fix, PEBS constraints fix on Alder Lake CPUs and an Intel uncore PMU fix. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMLfEERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iUdxAAqWLRHp1JQlANJxbdwmJu/PMwjlhXLn63 w71UPXou172jEJWk6PxEkllMLfJBAe1hL0CW2VE1DlGFTfzOTwBtylLz8frhF5am 4smCwAppGzK/r6gOABwhgPG/rbU5TJRhjmRMkPmfeOmFZSD/L4DHcDu8HbG4ruz9 lhgKnB+TUmNBLyYQ7oqfnNsGNI5uuyJhzA8/kddPgEkK5XeebCxqZCXDRSp9LbUg 4BkGCB2R+MfiHlttCGzOKkW+dQafA+pUQMfoZHSFJ30lB7UuvpsVl5FTiXQ5cu+f TGkjyBIzkNqNJHrRebQ3kkLYY6rlTgJTvrk7QdnWY2sb1J6B0ktxqBd+DG47Pc9S IVOe66ikoVnV/Bws6mFxN8Kj/U4L38M+373hdUvyQd8kwuvy4c6fXGFZqfF8VCMf zHZQJR8eeOKP1EAOIE7tDb2eY+pWnherZlm3VsHYZLtOfhDepZH6bvRWO2wXbl3I R+Wr/PYZih2AzcvHj+CtpS8jKFAvBG6rbqlElWZ9ain0TrV0uMuI8I3HQ9WqMa5H sPukVqtOczxXSMgTCilHK5S+ymq2xOHdRgo2FZUIP/5SllMfiWpYFRdaS8oh2K/j M6zyc5kXajSeHdSKc/2O8imkcurNN2vgdXVDsIzZGqw6phFepMVmsXYeRD9msQDT aZTvVfOmvvQ= =I/UD -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Ingo Molnar: "Misc fixes: an Arch-LBR fix, a PEBS enumeration fix, an Intel DS fix, PEBS constraints fix on Alder Lake CPUs and an Intel uncore PMU fix" * tag 'perf-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU perf/x86/intel: Fix pebs event constraints for ADL perf/x86/intel/ds: Fix precise store latency handling perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the Baseline perf/x86/lbr: Enable the branch type for the Arch LBR by default	2022-08-28 10:05:42 -07:00
Linus Torvalds	05519f2480	xen: branch for v6.0-rc3 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYwnVogAKCRCAXGG7T9hj vkvCAP97gXbhilg8GkmhSgwsn8XAnR3PFWonR+PA/NBQwsrx4AD/by18f7ICyOCl /r2PHDjYSiyMjwBSjrdm5Uo/kNBaqAM= =cIk8 -----END PGP SIGNATURE----- Merge tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - two minor cleanups - a fix of the xen/privcmd driver avoiding a possible NULL dereference in an error case * tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/privcmd: fix error exit of privcmd_ioctl_dm_op() xen: move from strlcpy with unused retval to strscpy xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY	2022-08-27 15:38:00 -07:00
Sandipan Das	0bc3be5b4b	perf/x86/amd/lbr: Add LbrExtV2 branch speculation info support Provide branch speculation information captured via AMD Last Branch Record Extension Version 2 (LbrExtV2) by setting the speculation info in branch records. The info is based on the "valid" and "spec" bits in the Branch To registers. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/ddc02f6320464cad0e3ff5bdb2314531568a91bc.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:45 +02:00
Sandipan Das	245268c19f	perf/x86/amd/lbr: Use fusion-aware branch classifier AMD Last Branch Record Extension Version 2 (LbrExtV2) can report a branch from address that points to an instruction preceding the actual branch by several bytes due to branch fusion and further optimizations in Zen4 processors. In such cases, software should move forward sequentially in the instruction stream from the reported address and the address of the first branch encountered should be used instead. Hence, use the fusion-aware branch classifier to determine the correct branch type and get the offset for adjusting the branch from address. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/c324d2d0a9c3976da30b9563d09e50bfee0f264d.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:45 +02:00
Sandipan Das	df3e9612f7	perf/x86: Make branch classifier fusion-aware With branch fusion and other optimizations, branch sampling hardware in some processors can report a branch from address that points to an instruction preceding the actual branch by several bytes. In such cases, the classifier cannot determine the branch type which leads to failures such as with the recently added test from commit `b55878c90a` ("perf test: Add test for branch stack sampling"). Branch information is also easier to consume and annotate if branch from addresses always point to branch instructions. Add a new variant of the branch classifier that can account for instruction fusion. If fusion is expected and the current branch from address does not point to a branch instruction, it attempts to find the first branch within the next (MAX_INSN_SIZE - 1) bytes and if found, additionally provides the offset between the reported branch from address and the address of the expected branch instruction. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/b6bb0abaa8a54c0b6d716344700ee11a1793d709.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:44 +02:00
Sandipan Das	f9c732249b	perf/x86/amd/lbr: Add LbrExtV2 software branch filter support With AMD Last Branch Record Extension Version 2 (LbrExtV2), it is necessary to process the branch records further as hardware filtering is not granular enough for identifying certain types of branches. E.g. to record system calls, one should record far branches. The filter captures both far calls and far returns but the irrelevant records are filtered out based on the branch type as seen by the branch classifier. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/e51de057517f77788abd393c832e8dea616d489c.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:44 +02:00
Sandipan Das	4462fbfe6e	perf/x86: Move branch classifier Commit `3e702ff6d1` ("perf/x86: Add LBR software filter support for Intel CPUs") introduces a software branch filter which complements the hardware branch filter and adds an x86 branch classifier. Move the branch classifier to arch/x86/events/ so that it can be utilized by other vendors for branch record filtering. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/bae5b95470d6bd49f40954bd379f414f5afcb965.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:44 +02:00
Sandipan Das	f4f925dae7	perf/x86/amd/lbr: Add LbrExtV2 hardware branch filter support If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected, convert the requested branch filter (PERF_SAMPLE_BRANCH_* flags) to the corresponding hardware filter value and stash it in the event data when a branch stack is requested. The hardware filter value is also saved in per-CPU areas for use during event scheduling. Hardware filtering is provided by the LBR Branch Select register. It has bits which when set, suppress recording of the following types of branches: * CPL = 0 (Kernel only) * CPL > 0 (Userspace only) * Conditional Branches * Near Relative Calls * Near Indirect Calls * Near Returns * Near Indirect Jumps (excluding Near Indirect Calls and Near Returns) * Near Relative Jumps (excluding Near Relative Calls) * Far Branches Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/9336af5c9785b8e14c62220fc0e6cfb10ab97de3.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:43 +02:00
Sandipan Das	ca5b7c0d96	perf/x86/amd/lbr: Add LbrExtV2 branch record support If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected, enable it alongside LBR Freeze on PMI when an event requests branch stack i.e. PERF_SAMPLE_BRANCH_STACK. Each branch record is represented by a pair of registers, LBR From and LBR To. The freeze feature prevents any updates to these registers once a PMC overflows. The contents remain unchanged until the freeze bit is cleared by the PMI handler. The branch records are read and copied to sample data before unfreezing. However, only valid entries are copied. There is no additional register to denote which of the register pairs represent the top of the stack (TOS) since internal register renaming always ensures that the first pair (i.e. index 0) is the one representing the most recent branch and so on. The LBR registers are per-thread resources and are cleared explicitly whenever a new task is scheduled in. There are no special implications on the contents of these registers when transitioning to deep C-states. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/d3b8500a3627a0d4d0259b005891ee248f248d91.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:43 +02:00
Sandipan Das	703fb765f4	perf/x86/amd/lbr: Detect LbrExtV2 support AMD Last Branch Record Extension Version 2 (LbrExtV2) is driven by Core PMC overflows. It records recently taken branches up to the moment when the PMC overflow occurs. Detect the feature during PMU initialization and set the branch stack depth using CPUID leaf 0x80000022 EBX. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/fc6e45378ada258f1bab79b0de6e05c393a8f1dd.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:43 +02:00
Sandipan Das	257449c6a5	x86/cpufeatures: Add LbrExtV2 feature bit CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance monitoring features for AMD processors. Bit 1 of EAX indicates support for Last Branch Record Extension Version 2 (LbrExtV2) features. If found to be set during PMU initialization, the EBX bits of the same leaf can be used to determine the number of available LBR entries. For better utilization of feature words, LbrExtV2 is added as a scattered feature bit. [peterz: Rename to AMD_LBR_V2] Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/172d2b0df39306ed77221c45ee1aa62e8ae0548d.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:42 +02:00
Sandipan Das	706460a96f	perf/x86/amd/core: Add generic branch record interfaces AMD processors that are capable of recording branches support either Branch Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last Branch Record Extension Version 2 (LbrExtV2) support, introduce new static calls which act as gateways to call into the feature-dependent functions based on what is available on the processor. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/b75dbc32663cb395f0d701167e952c6a6b0445a3.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:42 +02:00
Sandipan Das	9603aa79e8	perf/x86/amd/core: Refactor branch attributes AMD processors that are capable of recording branches support either Branch Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last Branch Record Extension Version 2 (LbrExtV2) support, reuse the "branches" capability to advertise information about both BRS and LBR but make the "branch-brs" event exclusive to Family 19h processors that support BRS. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/ba4a4cde6db79b1c65c49834027bbdb8a915546b.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:42 +02:00
Sandipan Das	b40d0156f5	perf/x86/amd/brs: Move feature-specific functions Move some of the Branch Sampling (BRS) specific functions out of the Core events sources and into the BRS sources in preparation for adding other mechanisms to record branches. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/b60283b57179475d18ee242d117c335c16733693.1660211399.git.sandipan.das@amd.com	2022-08-27 00:05:41 +02:00
Stephane Eranian	11745ecfe8	perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU Existing code was generating bogus counts for the SNB IMC bandwidth counters: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000327813 1,024.03 MiB uncore_imc/data_reads/ 1.000327813 20.73 MiB uncore_imc/data_writes/ 2.000580153 261,120.00 MiB uncore_imc/data_reads/ 2.000580153 23.28 MiB uncore_imc/data_writes/ The problem was introduced by commit: `07ce734dd8` ("perf/x86/intel/uncore: Clean up client IMC") Where the read_counter callback was replace to point to the generic uncore_mmio_read_counter() function. The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in MMIO. But uncore_mmio_read_counter() is using a readq() call to read from MMIO therefore reading 64-bit from MMIO. Although this is okay for the uncore_perf_event_update() function because it is shifting the value based on the actual counter width to compute a delta, it is not okay for the uncore_pmu_event_start() which is simply reading the counter and therefore priming the event->prev_count with a bogus value which is responsible for causing bogus deltas in the perf stat command above. The fix is to reintroduce the custom callback for read_counter for the SNB IMC PMU and use readl() instead of readq(). With the change the output of perf stat is back to normal: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000120987 296.94 MiB uncore_imc/data_reads/ 1.000120987 138.42 MiB uncore_imc/data_writes/ 2.000403144 175.91 MiB uncore_imc/data_reads/ 2.000403144 68.50 MiB uncore_imc/data_writes/ Fixes: `07ce734dd8` ("perf/x86/intel/uncore: Clean up client IMC") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com	2022-08-27 00:05:38 +02:00
Mikulas Patocka	8238b45798	wait_on_bit: add an acquire memory barrier There are several places in the kernel where wait_on_bit is not followed by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read). On architectures with weak memory ordering, it may happen that memory accesses that follow wait_on_bit are reordered before wait_on_bit and they may return invalid data. Fix this class of bugs by introducing a new function "test_bit_acquire" that works like test_bit, but has acquire memory ordering semantics. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-08-26 09:30:25 -07:00
Robert Elliott	cf514b2a59	crypto: Kconfig - simplify cipher entries Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-08-26 18:50:43 +08:00
Robert Elliott	3f342a2325	crypto: Kconfig - simplify hash entries Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-08-26 18:50:43 +08:00
Robert Elliott	e3d2eadd06	crypto: Kconfig - simplify aead entries Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-08-26 18:50:42 +08:00
Robert Elliott	ec84348da4	crypto: Kconfig - simplify CRC entries Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-08-26 18:50:42 +08:00
Robert Elliott	05b3746527	crypto: Kconfig - simplify public-key entries Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-08-26 18:50:42 +08:00
Robert Elliott	28a936ef44	crypto: Kconfig - move x86 entries to a submenu Move CPU-specific crypto/Kconfig entries to arch/xxx/crypto/Kconfig and create a submenu for them under the Crypto API menu. Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-08-26 18:50:41 +08:00
Borislav Petkov	8c61eafd22	x86/microcode: Remove ->request_microcode_user() `181b6f40e9` ("x86/microcode: Rip out the OLD_INTERFACE") removed the old microcode loading interface but forgot to remove the related ->request_microcode_user() functionality which it uses. Rip it out now too. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220825075445.28171-1-bp@alien8.de	2022-08-26 11:56:08 +02:00
Borislav Petkov	c93c296fff	x86/sev: Mark snp_abort() noreturn Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: Michael Matz <matz@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de	2022-08-25 15:54:03 +02:00
Lukas Bulwahn	ab0af755d4	xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY Commit `c70727a5bc` ("xen: allow more than 512 GB of RAM for 64 bit pv-domains") from July 2015 replaces the config XEN_MAX_DOMAIN_MEMORY with a new config XEN_512GB, but misses to adjust arch/x86/configs/xen.config. As XEN_512GB defaults to yes, there is no need to explicitly set any config in xen.config. Just remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220817044333.22310-1-lukas.bulwahn@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>	2022-08-25 13:38:03 +02:00
Miaohe Lin	d7c9bfb9ca	KVM: x86/mmu: fix memoryleak in kvm_mmu_vendor_module_init() When register_shrinker() fails, KVM doesn't release the percpu counter kvm_total_used_mmu_pages leading to memoryleak. Fix this issue by calling percpu_counter_destroy() when register_shrinker() fails. Fixes: `ab271bd4df` ("x86: kvm: propagate register_shrinker return code") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220823063237.47299-1-linmiaohe@huawei.com [sean: tweak shortlog and changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-08-24 13:47:49 -07:00
Michal Luczaj	6aa5c47c35	KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility The emulator checks the wrong variable while setting the CPU interruptibility state, the target segment is embedded in the instruction opcode, not the ModR/M register. Fix the condition. Signed-off-by: Michal Luczaj <mhal@rbox.co> Fixes: `a5457e7bcf` ("KVM: emulate: POP SS triggers a MOV SS shadow too") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20220821215900.1419215-1-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-08-24 13:45:40 -07:00
Junaid Shahid	b24ede2253	kvm: x86: Do proper cleanup if kvm_x86_ops->vm_init() fails If vm_init() fails [which can happen, for instance, if a memory allocation fails during avic_vm_init()], we need to cleanup some state in order to avoid resource leaks. Signed-off-by: Junaid Shahid <junaids@google.com> Link: https://lore.kernel.org/r/20220729224329.323378-1-junaids@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-08-24 13:41:59 -07:00
Rik van Riel	c926087eb3	x86/mm: Print likely CPU at segfault time In a large enough fleet of computers, it is common to have a few bad CPUs. Those can often be identified by seeing that some commonly run kernel code, which runs fine everywhere else, keeps crashing on the same CPU core on one particular bad system. However, the failure modes in CPUs that have gone bad over the years are often oddly specific, and the only bad behavior seen might be segfaults in programs like bash, python, or various system daemons that run fine everywhere else. Add a printk() to show_signal_msg() to print the CPU, core, and socket at segfault time. This is not perfect, since the task might get rescheduled on another CPU between when the fault hit, and when the message is printed, but in practice this has been good enough to help people identify several bad CPU cores. For example: segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in \ segfault[401000+1000] likely on CPU 0 (core 0, socket 0) This printk can be controlled through /proc/sys/debug/exception-trace. [ bp: Massage a bit, add "likely" to the printed line to denote that the CPU number is not always reliable. ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220805101644.2e674553@imladris.surriel.com	2022-08-24 12:48:05 +02:00
Tom Lendacky	cdaa0a407f	x86/sev: Don't use cc_platform_has() for early SEV-SNP calls When running identity-mapped and depending on the kernel configuration, it is possible that the compiler uses jump tables when generating code for cc_platform_has(). This causes a boot failure because the jump table uses un-mapped kernel virtual addresses, not identity-mapped addresses. This has been seen with CONFIG_RETPOLINE=n. Similar to sme_encrypt_kernel(), use an open-coded direct check for the status of SNP rather than trying to eliminate the jump table. This preserves any code optimization in cc_platform_has() that can be useful post boot. It also limits the changes to SEV-specific files so that future compiler features won't necessarily require possible build changes just because they are not compatible with running identity-mapped. [ bp: Massage commit message. ] Fixes: `5e5ccff60a` ("x86/sev: Add helper for validating pages in early enc attribute changes") Reported-by: Sean Christopherson <seanjc@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 5.19.x Link: https://lore.kernel.org/all/YqfabnTRxFSM+LoX@google.com/	2022-08-24 09:54:32 +02:00
Michael Roth	4b1c742407	x86/boot: Don't propagate uninitialized boot_params->cc_blob_address In some cases, bootloaders will leave boot_params->cc_blob_address uninitialized rather than zeroing it out. This field is only meant to be set by the boot/compressed kernel in order to pass information to the uncompressed kernel when SEV-SNP support is enabled. Therefore, there are no cases where the bootloader-provided values should be treated as anything other than garbage. Otherwise, the uncompressed kernel may attempt to access this bogus address, leading to a crash during early boot. Normally, sanitize_boot_params() would be used to clear out such fields but that happens too late: sev_enable() may have already initialized it to a valid value that should not be zeroed out. Instead, have sev_enable() zero it out unconditionally beforehand. Also ensure this happens for !CONFIG_AMD_MEM_ENCRYPT as well by also including this handling in the sev_enable() stub function. [ bp: Massage commit message and comments. ] Fixes: `b190a043c4` ("x86/sev: Add SEV-SNP feature detection/setup") Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: watnuss@gmx.de Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216387 Link: https://lore.kernel.org/r/20220823160734.89036-1-michael.roth@amd.com	2022-08-24 09:03:04 +02:00
Tony Luck	ea902bcc19	x86/cpu: Add new Raptor Lake CPU model number Note1: Model 0xB7 already claimed the "no suffix" #define for a regular client part, so add (yet another) suffix "S" to distinguish this new part from the earlier one. Note2: the RAPTORLAKE* and ALDERLAKE* processors are very similar from a software enabling point of view. There are no known features that have model-specific enabling and also differ between the two. In other words, every single place that list one or more RAPTORLAKE* or ALDERLAKE* processors should list all of them. Note3: This is being merged before there is an in-tree user. Merging this provides an "anchor" so that the different folks can update their subsystems (like perf) in parallel to use this define and test it. [ dhansen: add a note about why this has no in-tree users yet ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220823174819.223941-1-tony.luck@intel.com	2022-08-23 10:58:21 -07:00
Linus Torvalds	4f61f842d1	Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at such instructions, possibly resulting in incorrect execution (the wrong branch taken). Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMCmzARHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jt1RAAwd8PJDg9gkWMq+96B1RM5nAumTnbkngc 4ylQr7+8bhFj4O+uKuBYC0S4D2ox+LeEPAD7x4+pFSLahB78HTytAtixKtKe1l0B h+AZBm35PfkUlHw52B/aCUW6DWKZOa6KrLQa8L1MqIGK8oiMiL+Q6+xXFQ8a6Gtt ZYR2yi1esM2tHzw6CdMasRYswGO1KnZlCzGB5lHyDdU/y7xTfOgCTd00K07augXA mLQOBqTZ9dWDaT9mdc+Uh9llHOOhhfzMDYlWxSpAUsdk6vCWiUz6Y+ybsYRlebEh yEjRzUkvchraGj0L04m35ulqvoYTSCjc4aP0JvNB7mZoIyicajngfoKEc8DJ0BMf dgZj3CoKuTrKbbp//hfK5rt5J0m0ueiixOFI4EM0kJVUAOG1zzWAbxEQTrWnJlZh /8l/4n4zUY3Ss9ecG9PzNs++YH8I+gDbQd2kGr6YMffqUW9HOb5j2yk3mXs8gH0n O8HBGP7I94w45Q8L1CDSD/Sdn+IrhtaGzpYTV7rCEiM13S4FwrnzpVoi3SJiVODM kAxM4HScWXygTQTqvz4I5T9ibOVGp8iRQRHfLRbNiX+xB8k8DSQWAE1vxsw2v5o+ L8/N6mt71u/j5DyTzybTp1kepum+UjVH2RdH3v9ma3BQODkM+kbWq3cTBD6qOUh9 HH8KnagQcgc= =eXIr -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kprobes fix from Ingo Molnar: "Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at such instructions, possibly resulting in incorrect execution (the wrong branch taken)" * tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Fix JNG/JNLE emulation	2022-08-21 15:01:51 -07:00
Nick Desaulniers	a0a12c3ed0	asm goto: eradicate CC_HAS_ASM_GOTO GCC has supported asm goto since 4.5, and Clang has since version 9.0.0. The minimum supported versions of these tools for the build according to Documentation/process/changes.rst are 5.1 and 11.0.0 respectively. Remove the feature detection script, Kconfig option, and clean up some fallback code that is no longer supported. The removed script was also testing for a GCC specific bug that was fixed in the 4.7 release. Also remove workarounds for bpftrace using clang older than 9.0.0, since other BPF backend fixes are required at this point. Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/ Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637 Acked-by: Borislav Petkov <bp@suse.de> Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-08-21 10:06:28 -07:00
Chen Zhongjin	fc2e426b11	x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry When meeting ftrace trampolines in ORC unwinding, unwinder uses address of ftrace_{regs_}call address to find the ORC entry, which gets next frame at sp+176. If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be sp+8 instead of 176. It makes unwinder skip correct frame and throw warnings such as "wrong direction" or "can't access registers", etc, depending on the content of the incorrect frame address. By adding the base address ftrace_{regs_}caller with the offset ip - ops->trampoline, we can get the correct address to find the ORC entry. Also change "caller" to "tramp_addr" to make variable name conform to its content. [ mingo: Clarified the changelog a bit. ] Fixes: `6be7fa3c74` ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com	2022-08-21 12:19:32 +02:00
Linus Torvalds	ca052cfd6e	ARM: * Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK * Tidy-up handling of AArch32 on asymmetric systems x86: * Fix "missing ENDBR" BUG for fastop functions Generic: * Some cleanup and static analyzer patches * More fixes to KVM_CREATE_VM unwind paths -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmL/YoEUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNK7wf/f/CxUT2NW8+klMBSUTL6YNMwPp5A 9xpfASi4pGiID27EEAOOLWcOr+A5bfa7fLS70Dyc+Wq9h0/tlnhFEF1X9RdLNHc+ I2HgNB64TZI7aLiZSm3cH3nfoazkAMPbGjxSlDmhH58cR9EPIlYeDeVMR/velbDZ Z4kfwallR2Mizb7olvXy0lYfd6jZY+JkIQQtgml801aIpwkJggwqhnckbxCDEbSx oB17T99Q2UQasDFusjvZefHjPhwZ7rxeXNTKXJLZNWecd7lAoPYJtTiYw+cxHmSY JWsyvtcHons6uNoP1y60/OuVYcLFseeY3Yf9sqI8ivyF0HhS1MXQrcXX8g== =V4Ib -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK - Tidy-up handling of AArch32 on asymmetric systems x86: - Fix 'missing ENDBR' BUG for fastop functions Generic: - Some cleanup and static analyzer patches - More fixes to KVM_CREATE_VM unwind paths" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device() KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow() x86/kvm: Fix "missing ENDBR" BUG for fastop functions x86/kvm: Simplify FOP_SETCC() x86/ibt, objtool: Add IBT_NOSEAL() KVM: Rename mmu_notifier_* to mmu_invalidate_* KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS KVM: MIPS: remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS KVM: Move coalesced MMIO initialization (back) into kvm_create_vm() KVM: Unconditionally get a ref to /dev/kvm module when creating a VM KVM: Properly unwind VM creation if creating debugfs fails KVM: arm64: Reject 32bit user PSTATE on asymmetric systems KVM: arm64: Treat PMCR_EL1.LC as RES1 on asymmetric systems KVM: arm64: Fix compile error due to sign extension	2022-08-19 13:40:11 -07:00
Namhyung Kim	501f7f69bc	locking: Add __lockfunc to slow path functions So that we can skip the functions in the perf lock contention and other places like /proc/PID/wchan. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20220810220346.1919485-1-namhyung@kernel.org	2022-08-19 19:47:51 +02:00
Kan Liang	cde643ff75	perf/x86/intel: Fix pebs event constraints for ADL According to the latest event list, the LOAD_LATENCY PEBS event only works on the GP counter 0 and 1 for ADL and RPL. Update the pebs event constraints table. Fixes: `f83d2f91d2` ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220818184429.2355857-1-kan.liang@linux.intel.com	2022-08-19 19:47:31 +02:00
Stephane Eranian	d4bdb0bebc	perf/x86/intel/ds: Fix precise store latency handling With the existing code in store_latency_data(), the memory operation (mem_op) returned to the user is always OP_LOAD where in fact, it should be OP_STORE. This comes from the fact that the function is simply grabbing the information from a data source map which covers only load accesses. Intel 12th gen CPU offers precise store sampling that captures both the data source and latency. Therefore it can use the data source mapping table but must override the memory operation to reflect stores instead of loads. Fixes: `61b985e3e7` ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220818054613.1548130-1-eranian@google.com	2022-08-19 19:47:31 +02:00
Peter Zijlstra	7d3598868a	perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the Baseline The SDM explicitly states that PEBS Baseline implies Extended PEBS. For cpu model forward compatibility (e.g. on ICX, SPR, ADL), it's safe to stop doing FMS table thing such as setting pebs_capable and PMU_FL_PEBS_ALL since it's already set in the intel_ds_init(). The Goldmont Plus is the only platform which supports extended PEBS but doesn't have Baseline. Keep the status quo. Reported-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20220816114057.51307-1-likexu@tencent.com	2022-08-19 19:47:31 +02:00
Kan Liang	32ba156df1	perf/x86/lbr: Enable the branch type for the Arch LBR by default On the platform with Arch LBR, the HW raw branch type encoding may leak to the perf tool when the SAVE_TYPE option is not set. In the intel_pmu_store_lbr(), the HW raw branch type is stored in lbr_entries[].type. If the SAVE_TYPE option is set, the lbr_entries[].type will be converted into the generic PERF_BR_* type in the intel_pmu_lbr_filter() and exposed to the user tools. But if the SAVE_TYPE option is NOT set by the user, the current perf kernel doesn't clear the field. The HW raw branch type leaks. There are two solutions to fix the issue for the Arch LBR. One is to clear the field if the SAVE_TYPE option is NOT set. The other solution is to unconditionally convert the branch type and expose the generic type to the user tools. The latter is implemented here, because - The branch type is valuable information. I don't see a case where you would not benefit from the branch type. (Stephane Eranian) - Not having the branch type DOES NOT save any space in the branch record (Stephane Eranian) - The Arch LBR HW can retrieve the common branch types from the LBR_INFO. It doesn't require the high overhead SW disassemble. Fixes: `47125db27e` ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com	2022-08-19 19:47:30 +02:00
Aaron Lu	88e0a74902	x86/mm: Use proper mask when setting PUD mapping Commit c164fbb40c43f("x86/mm: thread pgprot_t through init_memory_mapping()") mistakenly used __pgprot() which doesn't respect __default_kernel_pte_mask when setting PUD mapping. Fix it by only setting the one bit we actually need (PSE) and leaving the other bits (that have been properly masked) alone. Fixes: `c164fbb40c` ("x86/mm: thread pgprot_t through init_memory_mapping()") Signed-off-by: Aaron Lu <aaron.lu@intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-08-19 08:17:43 -07:00
Jim Mattson	020dac4187	KVM: VMX: Heed the 'msr' argument in msr_write_intercepted() Regardless of the 'msr' argument passed to the VMX version of msr_write_intercepted(), the function always checks to see if a specific MSR (IA32_SPEC_CTRL) is intercepted for write. This behavior seems unintentional and unexpected. Modify the function so that it checks to see if the provided 'msr' index is intercepted for write. Fixes: `67f4b9969c` ("KVM: nVMX: Handle dynamic MSR intercept toggling") Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220810213050.2655000-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-19 07:38:04 -04:00
Junaid Shahid	b64d740ea7	kvm: x86: mmu: Always flush TLBs when enabling dirty logging When A/D bits are not available, KVM uses a software access tracking mechanism, which involves making the SPTEs inaccessible. However, the clear_young() MMU notifier does not flush TLBs. So it is possible that there may still be stale, potentially writable, TLB entries. This is usually fine, but can be problematic when enabling dirty logging, because it currently only does a TLB flush if any SPTEs were modified. But if all SPTEs are in access-tracked state, then there won't be a TLB flush, which means that the guest could still possibly write to memory and not have it reflected in the dirty bitmap. So just unconditionally flush the TLBs when enabling dirty logging. As an alternative, KVM could explicitly check the MMU-Writable bit when write-protecting SPTEs to decide if a flush is needed (instead of checking the Writable bit), but given that a flush almost always happens anyway, so just making it unconditional seems simpler. Signed-off-by: Junaid Shahid <junaids@google.com> Message-Id: <20220810224939.2611160-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-19 07:38:03 -04:00
Junaid Shahid	1441ca1494	kvm: x86: mmu: Drop the need_remote_flush() function This is only used by kvm_mmu_pte_write(), which no longer actually creates the new SPTE and instead just clears the old SPTE. So we just need to check if the old SPTE was shadow-present instead of calling need_remote_flush(). Hence we can drop this function. It was incomplete anyway as it didn't take access-tracking into account. This patch should not result in any functional change. Signed-off-by: Junaid Shahid <junaids@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220723024316.2725328-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-19 07:38:02 -04:00
Peter Zijlstra	3329249737	x86/nospec: Fix i386 RSB stuffing Turns out that i386 doesn't unconditionally have LFENCE, as such the loop in __FILL_RETURN_BUFFER isn't actually speculation safe on such chips. Fixes: `ba6e31af2b` ("x86/speculation: Add LFENCE to RSB fill sequence") Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Yv9tj9vbQ9nNlXoY@worktop.programming.kicks-ass.net	2022-08-19 13:24:33 +02:00
Peter Zijlstra	4e3aa92382	x86/nospec: Unwreck the RSB stuffing Commit `2b12993220` ("x86/speculation: Add RSB VM Exit protections") made a right mess of the RSB stuffing, rewrite the whole thing to not suck. Thanks to Andrew for the enlightening comment about Post-Barrier RSB things so we can make this code less magical. Cc: stable@vger.kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net	2022-08-19 13:24:32 +02:00
Robert Elliott	aa031b8f70	crypto: x86/sha512 - load based on CPU features x86 optimized crypto modules built as modules rather than built-in to the kernel end up as .ko files in the filesystem, e.g., in /usr/lib/modules. If the filesystem itself is a module, these might not be available when the crypto API is initialized, resulting in the generic implementation being used (e.g., sha512_transform rather than sha512_transform_avx2). In one test case, CPU utilization in the sha512 function dropped from 15.34% to 7.18% after forcing loading of the optimized module. Add module aliases for this x86 optimized crypto module based on CPU feature bits so udev gets a chance to load them later in the boot process when the filesystems are all running. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-08-19 18:39:39 +08:00
Josh Poimboeuf	3d9606b0e0	x86/kvm: Fix "missing ENDBR" BUG for fastop functions The following BUG was reported: traps: Missing ENDBR: andw_ax_dx+0x0/0x10 [kvm] ------------[ cut here ]------------ kernel BUG at arch/x86/kernel/traps.c:253! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI <TASK> asm_exc_control_protection+0x2b/0x30 RIP: 0010:andw_ax_dx+0x0/0x10 [kvm] Code: c3 cc cc cc cc 0f 1f 44 00 00 66 0f 1f 00 48 19 d0 c3 cc cc cc cc 0f 1f 40 00 f3 0f 1e fa 20 d0 c3 cc cc cc cc 0f 1f 44 00 00 <66> 0f 1f 00 66 21 d0 c3 cc cc cc cc 0f 1f 40 00 66 0f 1f 00 21 d0 ? andb_al_dl+0x10/0x10 [kvm] ? fastop+0x5d/0xa0 [kvm] x86_emulate_insn+0x822/0x1060 [kvm] x86_emulate_instruction+0x46f/0x750 [kvm] complete_emulated_mmio+0x216/0x2c0 [kvm] kvm_arch_vcpu_ioctl_run+0x604/0x650 [kvm] kvm_vcpu_ioctl+0x2f4/0x6b0 [kvm] ? wake_up_q+0xa0/0xa0 The BUG occurred because the ENDBR in the andw_ax_dx() fastop function had been incorrectly "sealed" (converted to a NOP) by apply_ibt_endbr(). Objtool marked it to be sealed because KVM has no compile-time references to the function. Instead KVM calculates its address at runtime. Prevent objtool from annotating fastop functions as sealable by creating throwaway dummy compile-time references to the functions. Fixes: `6649fa876d` ("x86/ibt,kvm: Add ENDBR to fastops") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Debugged-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Message-Id: <0d4116f90e9d0c1b754bb90c585e6f0415a1c508.1660837839.git.jpoimboe@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-19 04:05:42 -04:00
Josh Poimboeuf	22472d1260	x86/kvm: Simplify FOP_SETCC() SETCC_ALIGN and FOP_ALIGN are both 16. Remove the special casing for FOP_SETCC() and just make it a normal fastop. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Message-Id: <7c13d94d1a775156f7e36eed30509b274a229140.1660837839.git.jpoimboe@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-19 04:05:42 -04:00
Josh Poimboeuf	e27e5bea95	x86/ibt, objtool: Add IBT_NOSEAL() Add a macro which prevents a function from getting sealed if there are no compile-time references to it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Message-Id: <20220818213927.e44fmxkoq4yj6ybn@treble> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-19 04:05:42 -04:00
Chao Peng	20ec3ebd70	KVM: Rename mmu_notifier_* to mmu_invalidate_* The motivation of this renaming is to make these variables and related helper functions less mmu_notifier bound and can also be used for non mmu_notifier based page invalidation. mmu_invalidate_* was chosen to better describe the purpose of 'invalidating' a page that those variables are used for. - mmu_notifier_seq/range_start/range_end are renamed to mmu_invalidate_seq/range_start/range_end. - mmu_notifier_retry{_hva} helper functions are renamed to mmu_invalidate_retry{_hva}. - mmu_notifier_count is renamed to mmu_invalidate_in_progress to avoid confusion with mn_active_invalidate_count. - While here, also update kvm_inc/dec_notifier_count() to kvm_mmu_invalidate_begin/end() to match the change for mmu_notifier_count. No functional change intended. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-19 04:05:41 -04:00
Chao Peng	bdd1c37a31	KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS KVM_INTERNAL_MEM_SLOTS better reflects the fact those slots are KVM internally used (invisible to userspace) and avoids confusion to future private slots that can have different meaning. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-19 04:05:40 -04:00
Pawan Gupta	7df548840c	x86/bugs: Add "unknown" reporting for MMIO Stale Data Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown. Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities. Mitigation is not deployed when the status is unknown. [ bp: Massage, fixup. ] Fixes: `8d50cdf8b8` ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com	2022-08-18 15:35:22 +02:00
Borislav Petkov	0db7058e8e	x86/clear_user: Make it faster Based on a patch by Mark Hemment <markhemm@googlemail.com> and incorporating very sane suggestions from Linus. The point here is to have the default case with FSRM - which is supposed to be the majority of x86 hw out there - if not now then soon - be directly inlined into the instruction stream so that no function call overhead is taking place. Drop the early clobbers from the @size and @addr operands as those are not needed anymore since we have single instruction alternatives. The benchmarks I ran would show very small improvements and a PF benchmark would even show weird things like slowdowns with higher core counts. So for a ~6m running the git test suite, the function gets called under 700K times, all from padzero(): <...>-2536 [006] ..... 261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900 <...>-2536 [006] ..... 261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160 <...>-2537 [008] ..... 261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850 <...>-2537 [008] ..... 261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900 ... which is around 1%-ish of the total time and which is consistent with the benchmark numbers. So Mel gave me the idea to simply measure how fast the function becomes. I.e.: start = rdtsc_ordered(); ret = __clear_user(to, n); end = rdtsc_ordered(); Computing the mean average of all the samples collected during the test suite run then shows some improvement: clear_user_original: Amean: 9219.71 (Sum: 6340154910, samples: 687674) fsrm: Amean: 8030.63 (Sum: 5522277720, samples: 687652) That's on Zen3. The situation looks a lot more confusing on Intel: Icelake: clear_user_original: Amean: 19679.4 (Sum: 13652560764, samples: 693750) Amean: 19743.7 (Sum: 13693470604, samples: 693562) (I ran it twice just to be sure.) ERMS: Amean: 20374.3 (Sum: 13910601024, samples: 682752) Amean: 20453.7 (Sum: 14186223606, samples: 693576) FSRM: Amean: 20458.2 (Sum: 13918381386, sample s: 680331) The original microbenchmark which people were complaining about: for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=65536; done 2>&1 \| grep copied 32207011840 bytes (32 GB, 30 GiB) copied, 1 s, 32.2 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.93069 s, 35.6 GB/s 37597741056 bytes (38 GB, 35 GiB) copied, 1 s, 37.6 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.78017 s, 38.6 GB/s 62020124672 bytes (62 GB, 58 GiB) copied, 2 s, 31.0 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 2.13716 s, 32.2 GB/s 60010004480 bytes (60 GB, 56 GiB) copied, 1 s, 60.0 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.14129 s, 60.2 GB/s 53212086272 bytes (53 GB, 50 GiB) copied, 1 s, 53.2 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.28398 s, 53.5 GB/s 55698259968 bytes (56 GB, 52 GiB) copied, 1 s, 55.7 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.22507 s, 56.1 GB/s 55306092544 bytes (55 GB, 52 GiB) copied, 1 s, 55.3 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.23647 s, 55.6 GB/s 54387539968 bytes (54 GB, 51 GiB) copied, 1 s, 54.4 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.25693 s, 54.7 GB/s 50566529024 bytes (51 GB, 47 GiB) copied, 1 s, 50.6 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.35096 s, 50.9 GB/s 58308165632 bytes (58 GB, 54 GiB) copied, 1 s, 58.3 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.17394 s, 58.5 GB/s Now the same thing with smaller buffers: for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=8192; done 2>&1 \| grep copied 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28485 s, 30.2 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276112 s, 31.1 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.29136 s, 29.5 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.283803 s, 30.3 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.306503 s, 28.0 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.349169 s, 24.6 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276912 s, 31.0 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.265356 s, 32.4 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28464 s, 30.2 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.242998 s, 35.3 GB/s is also not conclusive because it all depends on the buffer sizes, their alignments and when the microcode detects that cachelines can be aggregated properly and copied in bigger sizes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/CAHk-=wh=Mu_EYhtOmPn6AxoQZyEh-4fo2Zx3G7rBv1g7vwoKiw@mail.gmail.com	2022-08-18 12:36:42 +02:00
Linus Torvalds	c4e34dd99f	x86: simplify load_unaligned_zeropad() implementation The exception for the "unaligned access at the end of the page, next page not mapped" never happens, but the fixup code ends up causing trouble for compilers to optimize well. clang in particular ends up seeing it being in the middle of a loop, and tries desperately to optimize the exception fixup code that is never really reached. The simple solution is to just move all the fixups into the exception handler itself, which moves it all out of the hot case code, and means that the compiler never sees it or needs to worry about it. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-08-16 11:03:38 -07:00
Juergen Gross	5b9f0c4df1	x86/entry: Fix entry_INT80_compat for Xen PV guests Commit `c89191ce67` ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS") missed one use case of SWAPGS in entry_INT80_compat(). Removing of the SWAPGS macro led to asm just using "swapgs", as it is accepting instructions in capital letters, too. This in turn leads to splats in Xen PV guests like: [ 36.145223] general protection fault, maybe for address 0x2d: 0000 [#1] PREEMPT SMP NOPTI [ 36.145794] CPU: 2 PID: 1847 Comm: ld-linux.so.2 Not tainted 5.19.1-1-default #1 \ openSUSE Tumbleweed f3b44bfb672cdb9f235aff53b57724eba8b9411b [ 36.146608] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 11/14/2013 [ 36.148126] RIP: e030:entry_INT80_compat+0x3/0xa3 Fix that by open coding this single instance of the SWAPGS macro. Fixes: `c89191ce67` ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> # 5.19 Link: https://lore.kernel.org/r/20220816071137.4893-1-jgross@suse.com	2022-08-16 10:02:52 +02:00
Ard Biesheuvel	6c3a9c9ae0	efi/x86-mixed: move unmitigated RET into .rodata Move the EFI mixed mode return trampoline RET into .rodata, so it is normally mapped without executable permissions. And given that this snippet of code is really the only kernel code that we ever execute via this 1:1 mapping, let's unmap the 1:1 mapping of the kernel .text, and only map the page that covers the return trampoline with executable permissions. Note that the remainder of .rodata needs to remain mapped into the 1:1 mapping with RO/NX permissions, as literal GUIDs and strings may be passed to the variable routines. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-08-16 09:28:05 +02:00
Uros Bizjak	4630535c64	x86/uaccess: Improve __try_cmpxchg64_user_asm() for x86_32 Improve __try_cmpxcgh64_user_asm() for !CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT by relaxing the output register constraint from "c" to "q" constraint, which allows the compiler to choose between %ecx or %ebx register. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220628161612.7993-1-ubizjak@gmail.com	2022-08-15 19:18:12 +02:00
Jason Wang	3163600cab	x86: Fix various duplicate-word comment typos [ mingo: Consolidated 4 very similar patches into one, it's silly to spread this out. ] Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220715044809.20572-1-wangborong@cdjrlc.com	2022-08-15 19:17:52 +02:00
Li kunyu	039f0e054a	x86/boot: Remove superfluous type casting from arch/x86/boot/bitops.h 'const void *' will auto-type-convert to just about any other const pointer type, no need to force it. [ mingo: Rewrote the changelog. ] Signed-off-by: Li kunyu <kunyu@nfschina.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220725042358.3377-1-kunyu@nfschina.com	2022-08-15 19:17:43 +02:00
Kristen Carlson Accardi	ee56a28398	x86/sgx: Improve comments for sgx_encl_lookup/alloc_backing() Modify the comments for sgx_encl_lookup_backing() and for sgx_encl_alloc_backing() to indicate that they take a reference which must be dropped with a call to sgx_encl_put_backing(). Make sgx_encl_lookup_backing() static for now, and change the name of sgx_encl_get_backing() to __sgx_encl_get_backing() to make it more clear that sgx_encl_get_backing() is an internal function. Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/all/YtUs3MKLzFg+rqEV@zn.tnic/	2022-08-15 11:51:49 +02:00
Jan Beulich	72cbc8f04f	x86/PAT: Have pat_enabled() properly reflect state when running on Xen After commit ID in the Fixes: tag, pat_enabled() returns false (because of PAT initialization being suppressed in the absence of MTRRs being announced to be available). This has become a problem: the i915 driver now fails to initialize when running PV on Xen (i915_gem_object_pin_map() is where I located the induced failure), and its error handling is flaky enough to (at least sometimes) result in a hung system. Yet even beyond that problem the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses would have been quite a bit less optimal than possible. Arrange for the function to return true in such environments, without undermining the rest of PAT MSR management logic considering PAT to be disabled: specifically, no writes to the PAT MSR should occur. For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before). [ bp: This is the "small fix" variant for stable. It'll get replaced with a proper PAT and MTRR detection split upstream but that is too involved for a stable backport. - additional touchups to commit msg. Use cpu_feature_enabled(). ] Fixes: `bdd8b6c982` ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/9385fa60-fa5d-f559-a137-6608408f88b0@suse.com	2022-08-15 10:51:23 +02:00
Linus Torvalds	5d6a0f4da9	xen: branch for v6.0-rc1b -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYvi0yQAKCRCAXGG7T9hj vmikAQDWSrcWuxDkGnzut0A1tBQRUCWDMyKPqigWAA5tH2sPgAEAtWfBvT1xyl7T gZ22I7o21WxxDGyvNUcA65pK7c2cpg8= =UMbq -----END PGP SIGNATURE----- Merge tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - fix the handling of the "persistent grants" feature negotiation between Xen blkfront and Xen blkback drivers - a cleanup of xen.config and adding xen.config to Xen section in MAINTAINERS - support HVMOP_set_evtchn_upcall_vector, which is more compliant to "normal" interrupt handling than the global callback used up to now - further small cleanups * tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections xen: remove XEN_SCRUB_PAGES in xen.config xen/pciback: Fix comment typo xen/xenbus: fix return type in xenbus_file_read() xen-blkfront: Apply 'feature_persistent' parameter when connect xen-blkback: Apply 'feature_persistent' parameter when connect xen-blkback: fix persistent grants negotiation x86/xen: Add support for HVMOP_set_evtchn_upcall_vector	2022-08-14 09:28:54 -07:00
Nadav Amit	8924779df8	x86/kprobes: Fix JNG/JNLE emulation When kprobes emulates JNG/JNLE instructions on x86 it uses the wrong condition. For JNG (opcode: 0F 8E), according to Intel SDM, the jump is performed if (ZF == 1 or SF != OF). However the kernel emulation currently uses 'and' instead of 'or'. As a result, setting a kprobe on JNG/JNLE might cause the kernel to behave incorrectly whenever the kprobe is hit. Fix by changing the 'and' to 'or'. Fixes: `6256e668b7` ("x86/kprobes: Use int3 instead of debug trap for single-step") Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220813225943.143767-1-namit@vmware.com	2022-08-14 11:27:17 +02:00
Mateusz Jończyk	e1a6bc7c69	x86/rtc: Rename mach_set_rtc_mmss() to mach_set_cmos_time() Once upon a time, before this commit in 2013: `3195ef59cb` ("x86: Do full rtc synchronization with ntp") ... the mach_set_rtc_mmss() function set only the minutes and seconds registers of the CMOS RTC - hence the '_mmss' postfix. This is no longer true, so rename the function to mach_set_cmos_time(). [ mingo: Expanded changelog a bit. ] Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220813131034.768527-2-mat.jonczyk@o2.pl	2022-08-14 11:24:29 +02:00
Mateusz Jończyk	fc04b2ccf0	x86/rtc: Rewrite & simplify mach_get_cmos_time() by deleting duplicated functionality There are functions in drivers/rtc/rtc-mc146818-lib.c that handle reading from / writing to the CMOS RTC clock. mach_get_cmos_time() in arch/x86/kernel/rtc.c did not use them and was mostly a duplicate of mc146818_get_time(). Modify mach_get_cmos_time() to use mc146818_get_time() and remove the duplicated functionality. mach_get_cmos_time() used a different algorithm than mc146818_get_time(), but these functions are equivalent. The major differences are: - mc146818_get_time() is better refined and handles various edge conditions, - when the UIP ("Update in progress") bit of the RTC is set, mach_get_cmos_time() was busy waiting with cpu_relax() while mc146818_get_time() is using mdelay(1) in every loop iteration. (However, there is my commit merged for Linux 5.20 / 6.0 to decrease this period to 100us: commit `d2a632a8a1` ("rtc: mc146818-lib: reduce RTC_UIP polling period") ), - mach_get_cmos_time() assumed that the RTC year is >= 2000, which may not be true on some old boxes with a dead battery, - mach_get_cmos_time() was holding the rtc_lock for a long time and could hang if the RTC is broken or not present. The RTC writing counterpart, mach_set_rtc_mmss() is already using mc146818_get_time() from drivers/rtc. This was done in commit `3195ef59cb` ("x86: Do full rtc synchronization with ntp") It appears that mach_get_cmos_time() was simply forgotten. mach_get_cmos_time() is really used only in read_persistent_clock64(), which is called only in a few places in kernel/time/timekeeping.c . [ mingo: These changes are not supposed to change behavior, but they are not identity transformations either, as mc146818_get_time() is a better but different implementation of the same logic - so regressions are possible in principle. ] Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220813131034.768527-1-mat.jonczyk@o2.pl	2022-08-14 11:24:08 +02:00
Linus Torvalds	c5f1e32e32	Fix the "IBPB mitigated RETBleed" mode of operation on AMD CPUs (not turned on by default), which also need STIBP enabled (if available) to be '100% safe' on even the shortest speculation windows. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmL3fqcRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gnuw/6AighFp+Gp4qXP1DIVU+acVnZsxbdt7GA WGs/JJfKYsKpWvDGFxnwtF2V1Imq8XVRPVPyFKvLQiBs2h8vNcVkgIvJsdeTFsqQ uUwUaYgDXuhLYaFpnMGouoeA3iw2zf/CY5ZJX79Nl/CwNwT7FxiLbu+JF/I2Yc0V yddiQ8xgT0VJhaBcUTsD2qFl8wjpxer7gNBFR4ujiYWXHag3qKyZuaySmqCz4xhd 4nyhJCp34548MsTVXDys2gnYpgLWweB9zOPvH4+GgtiFF3UJxRMhkB9NzfZq1l5W tCjgGupb3vVoXOVb/xnXyZlPbdFNqSAja7iOXYdmNUSURd7LC0PYHpVxN0rkbFcd V6noyU3JCCp86ceGTC0u3Iu6LLER6RBGB0gatVlzomWLjTEiC806eo23CVE22cnk poy7FO3RWa+q1AqWsEzc3wr14ZgSKCBZwwpn6ispT/kjx9fhAFyKtH2/Sznx26GH yKOF7pPCIXjCpcMnNoUu8cVyzfk0g3kOWQtKjaL9WfeyMtBaHhctngR0s1eCxZNJ rBlTs+YO7fO42unZEExgvYekBzI70aThIkvxahKEWW48owWph+i/sn5gzdVF+ynR R4PGeylfd8ZXr21cG2rG9250JLwqzhsxnAGvNjYg1p/hdyrzLTGWHIc9r9BU9000 mmOP9uY6Cjc= =Ac6x -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not turned on by default), which also need STIBP enabled (if available) to be '100% safe' on even the shortest speculation windows" * tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Enable STIBP for IBPB mitigated RETBleed	2022-08-13 14:24:12 -07:00
Jane Malalane	b1c3497e60	x86/xen: Add support for HVMOP_set_evtchn_upcall_vector Implement support for the HVMOP_set_evtchn_upcall_vector hypercall in order to set the per-vCPU event channel vector callback on Linux and use it in preference of HVM_PARAM_CALLBACK_IRQ. If the per-VCPU vector setup is successful on BSP, use this method for the APs. If not, fallback to the global vector-type callback. Also register callback_irq at per-vCPU event channel setup to trick toolstack to think the domain is enlightened. Suggested-by: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220729070416.23306-1-jane.malalane@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>	2022-08-12 11:28:21 +02:00
Linus Torvalds	e18a90427c	* Xen timer fixes * Documentation formatting fixes * Make rseq selftest compatible with glibc-2.35 * Fix handling of illegal LEA reg, reg * Cleanup creation of debugfs entries * Fix steal time cache handling bug * Fixes for MMIO caching * Optimize computation of number of LBRs * Fix uninitialized field in guest_maxphyaddr < host_maxphyaddr path -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmL0qwIUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroML1gf/SK6by+Gi0r7WSkrDjU94PKZ8D6Y3 fErMhratccc9IfL3p90IjCVhEngfdQf5UVHExA5TswgHHAJTpECzuHya9TweQZc5 2rrTvufup0MNALfzkSijrcI80CBvrJc6JyOCkv0BLp7yqXUrnrm0OOMV2XniS7y0 YNn2ZCy44tLqkNiQrLhJQg3EsXu9l7okGpHSVO6iZwC7KKHvYkbscVFa/AOlaAwK WOZBB+1Ee+/pWhxsngM1GwwM3ZNU/jXOSVjew5plnrD4U7NYXIDATszbZAuNyxqV 5gi+wvTF1x9dC6Tgd3qF7ouAqtT51BdRYaI9aYHOYgvzqdNFHWJu3XauDQ== =vI6Q -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more kvm updates from Paolo Bonzini: - Xen timer fixes - Documentation formatting fixes - Make rseq selftest compatible with glibc-2.35 - Fix handling of illegal LEA reg, reg - Cleanup creation of debugfs entries - Fix steal time cache handling bug - Fixes for MMIO caching - Optimize computation of number of LBRs - Fix uninitialized field in guest_maxphyaddr < host_maxphyaddr path * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table Documentation: KVM: extend KVM_CAP_VM_DISABLE_NX_HUGE_PAGES heading underline KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh KVM: VMX: Use proper type-safe functions for vCPU => LBRs helpers KVM: x86: Refresh PMU after writes to MSR_IA32_PERF_CAPABILITIES KVM: selftests: Test all possible "invalid" PERF_CAPABILITIES.LBR_FMT vals KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test KVM: selftests: Make rseq compatible with glibc-2.35 KVM: Actually create debugfs in kvm_create_vm() KVM: Pass the name of the VM fd to kvm_create_vm_debugfs() KVM: Get an fd before creating the VM KVM: Shove vcpu stats_id init into kvm_vcpu_init() KVM: Shove vm stats_id init into kvm_create_vm() KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen KVM: x86/mmu: rename trace function name for asynchronous page fault KVM: x86/xen: Stop Xen timer before changing IRQ KVM: x86/xen: Initialize Xen timer only once KVM: SVM: Disable SEV-ES support if MMIO caching is disable KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change KVM: x86: Tag kvm_mmu_x86_module_init() with __init ...	2022-08-11 12:10:08 -07:00
Nick Desaulniers	ffcf9c5700	x86: link vdso and boot with -z noexecstack --no-warn-rwx-segments Users of GNU ld (BFD) from binutils 2.39+ will observe multiple instances of a new warning when linking kernels in the form: ld: warning: arch/x86/boot/pmjump.o: missing .note.GNU-stack section implies executable stack ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker ld: warning: arch/x86/boot/compressed/vmlinux has a LOAD segment with RWX permissions Generally, we would like to avoid the stack being executable. Because there could be a need for the stack to be executable, assembler sources have to opt-in to this security feature via explicit creation of the .note.GNU-stack feature (which compilers create by default) or command line flag --noexecstack. Or we can simply tell the linker the production of such sections is irrelevant and to link the stack as --noexecstack. LLVM's LLD linker defaults to -z noexecstack, so this flag isn't strictly necessary when linking with LLD, only BFD, but it doesn't hurt to be explicit here for all linkers IMO. --no-warn-rwx-segments is currently BFD specific and only available in the current latest release, so it's wrapped in an ld-option check. While the kernel makes extensive usage of ELF sections, it doesn't use permissions from ELF segments. Link: https://lore.kernel.org/linux-block/3af4127a-f453-4cf7-f133-a181cce06f73@kernel.dk/ Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ba951afb99912da01a6e8434126b8fac7aa75107 Link: https://github.com/llvm/llvm-project/issues/57009 Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> Suggested-by: Fangrui Song <maskray@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-08-10 18:30:09 -07:00
Sean Christopherson	6348aafa8d	KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh Now that the PMU is refreshed when MSR_IA32_PERF_CAPABILITIES is written by host userspace, zero out the number of LBR records for a vCPU during PMU refresh if PMU_CAP_LBR_FMT is not set in PERF_CAPABILITIES instead of handling the check at run-time. guest_cpuid_has() is expensive due to the linear search of guest CPUID entries, intel_pmu_lbr_is_enabled() is checked on every VM-Enter, _and_ simply enumerating the same "Model" as the host causes KVM to set the number of LBR records to a non-zero value. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220727233424.2968356-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:30 -04:00
Sean Christopherson	7de8e5b6b1	KVM: VMX: Use proper type-safe functions for vCPU => LBRs helpers Turn vcpu_to_lbr_desc() and vcpu_to_lbr_records() into functions in order to provide type safety, to document exactly what they return, and to allow consuming the helpers in vmx.h. Move the definitions as necessary (the macros "reference" to_vmx() before its definition). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220727233424.2968356-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:30 -04:00
Sean Christopherson	17a024a8b9	KVM: x86: Refresh PMU after writes to MSR_IA32_PERF_CAPABILITIES Refresh the PMU if userspace modifies MSR_IA32_PERF_CAPABILITIES. KVM consumes the vCPU's PERF_CAPABILITIES when enumerating PEBS support, but relies on CPUID updates to refresh the PMU. I.e. KVM will do the wrong thing if userspace stuffs PERF_CAPABILITIES _after_ setting guest CPUID. Opportunistically fix a curly-brace indentation. Fixes: `c59a1f106f` ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Cc: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220727233424.2968356-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:30 -04:00
Sean Christopherson	8bad4606ac	KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen Add compile-time and init-time sanity checks to ensure that the MMIO SPTE mask doesn't overlap the MMIO SPTE generation or the MMU-present bit. The generation currently avoids using bit 63, but that's as much coincidence as it is strictly necessarly. That will change in the future, as TDX support will require setting bit 63 (SUPPRESS_VE) in the mask. Explicitly carve out the bits that are allowed in the mask so that any future shuffling of SPTE bits doesn't silently break MMIO caching (KVM has broken MMIO caching more than once due to overlapping the generation with other things). Suggested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20220805194133.86299-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:26 -04:00
Mingwei Zhang	1685c0f325	KVM: x86/mmu: rename trace function name for asynchronous page fault Rename the tracepoint function from trace_kvm_async_pf_doublefault() to trace_kvm_async_pf_repeated_fault() to make it clear, since double fault has nothing to do with this trace function. Asynchronous Page Fault (APF) is an artifact generated by KVM when it cannot find a physical page to satisfy an EPT violation. KVM uses APF to tell the guest OS to do something else such as scheduling other guest processes to make forward progress. However, when another guest process also touches a previously APFed page, KVM halts the vCPU instead of generating a repeated APF to avoid wasting cycles. Double fault (#DF) clearly has a different meaning and a different consequence when triggered. #DF requires two nested contributory exceptions instead of two page faults faulting at the same address. A prevous bug on APF indicates that it may trigger a double fault in the guest [1] and clearly this trace function has nothing to do with it. So rename this function should be a valid choice. No functional change intended. [1] https://www.spinics.net/lists/kvm/msg214957.html Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20220807052141.69186-1-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:26 -04:00
Coleman Dietsch	c036899136	KVM: x86/xen: Stop Xen timer before changing IRQ Stop Xen timer (if it's running) prior to changing the IRQ vector and potentially (re)starting the timer. Changing the IRQ vector while the timer is still running can result in KVM injecting a garbage event, e.g. vm_xen_inject_timer_irqs() could see a non-zero xen.timer_pending from a previous timer but inject the new xen.timer_virq. Fixes: `5363952605` ("KVM: x86/xen: handle PV timers oneshot mode") Cc: stable@vger.kernel.org Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42 Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com Signed-off-by: Coleman Dietsch <dietschc@csp.edu> Reviewed-by: Sean Christopherson <seanjc@google.com> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20220808190607.323899-3-dietschc@csp.edu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:25 -04:00
Coleman Dietsch	af735db312	KVM: x86/xen: Initialize Xen timer only once Add a check for existing xen timers before initializing a new one. Currently kvm_xen_init_timer() is called on every KVM_XEN_VCPU_ATTR_TYPE_TIMER, which is causing the following ODEBUG crash when vcpu->arch.xen.timer is already set. ODEBUG: init active (active state 0) object type: hrtimer hint: xen_timer_callbac0 RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:502 Call Trace: __debug_object_init debug_hrtimer_init debug_init hrtimer_init kvm_xen_init_timer kvm_xen_vcpu_set_attr kvm_arch_vcpu_ioctl kvm_vcpu_ioctl vfs_ioctl Fixes: `5363952605` ("KVM: x86/xen: handle PV timers oneshot mode") Cc: stable@vger.kernel.org Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42 Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com Signed-off-by: Coleman Dietsch <dietschc@csp.edu> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220808190607.323899-2-dietschc@csp.edu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:25 -04:00
Sean Christopherson	0c29397ac1	KVM: SVM: Disable SEV-ES support if MMIO caching is disable Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs generating #NPF(RSVD), which are reflected by the CPU into the guest as a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access to the guest instruction stream or register state and so can't directly emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT), and those flavors of #NPF cause automatic VM-Exits, not #VC. Adjust KVM's MMIO masks to account for the C-bit location prior to doing SEV(-ES) setup, and document that dependency between adjusting the MMIO SPTE mask and SEV(-ES) setup. Fixes: `b09763da4d` ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)") Reported-by: Michael Roth <michael.roth@amd.com> Tested-by: Michael Roth <michael.roth@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220803224957.1285926-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:25 -04:00
Sean Christopherson	c3e0c8c2e8	KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change Fully re-evaluate whether or not MMIO caching can be enabled when SPTE masks change; simply clearing enable_mmio_caching when a configuration isn't compatible with caching fails to handle the scenario where the masks are updated, e.g. by VMX for EPT or by SVM to account for the C-bit location, and toggle compatibility from false=>true. Snapshot the original module param so that re-evaluating MMIO caching preserves userspace's desire to allow caching. Use a snapshot approach so that enable_mmio_caching still reflects KVM's actual behavior. Fixes: `8b9e74bfbf` ("KVM: x86/mmu: Use enable_mmio_caching to track if MMIO caching is enabled") Reported-by: Michael Roth <michael.roth@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Tested-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20220803224957.1285926-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:24 -04:00
Sean Christopherson	982bae43f1	KVM: x86: Tag kvm_mmu_x86_module_init() with __init Mark kvm_mmu_x86_module_init() with __init, the entire reason it exists is to initialize variables when kvm.ko is loaded, i.e. it must never be called after module initialization. Fixes: `1d0e848060` ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded") Cc: stable@vger.kernel.org Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220803224957.1285926-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:24 -04:00
Michal Luczaj	4ac5b42377	KVM: x86: emulator: Fix illegal LEA handling The emulator mishandles LEA with register source operand. Even though such LEA is illegal, it can be encoded and fed to CPU. In which case real hardware throws #UD. The emulator, instead, returns address of x86_emulate_ctxt._regs. This info leak hurts host's kASLR. Tell the decoder that illegal LEA is not to be emulated. Signed-off-by: Michal Luczaj <mhal@rbox.co> Message-Id: <20220729134801.1120-1-mhal@rbox.co> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:23 -04:00
Yu Zhang	2bc685e633	KVM: X86: avoid uninitialized 'fault.async_page_fault' from fixed-up #PF kvm_fixup_and_inject_pf_error() was introduced to fixup the error code( e.g., to add RSVD flag) and inject the #PF to the guest, when guest MAXPHYADDR is smaller than the host one. When it comes to nested, L0 is expected to intercept and fix up the #PF and then inject to L2 directly if - L2.MAXPHYADDR < L0.MAXPHYADDR and - L1 has no intention to intercept L2's #PF (e.g., L2 and L1 have the same MAXPHYADDR value && L1 is using EPT for L2), instead of constructing a #PF VM Exit to L1. Currently, with PFEC_MASK and PFEC_MATCH both set to 0 in vmcs02, the interception and injection may happen on all L2 #PFs. However, failing to initialize 'fault' in kvm_fixup_and_inject_pf_error() may cause the fault.async_page_fault being NOT zeroed, and later the #PF being treated as a nested async page fault, and then being injected to L1. Instead of zeroing 'fault' at the beginning of this function, we mannually set the value of 'fault.async_page_fault', because false is the value we really expect. Fixes: `897861479c` ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216178 Reported-by: Yang Lixiao <lixiao.yang@intel.com> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220718074756.53788-1-yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:23 -04:00
Sean Christopherson	70c8327c11	KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg Bug the VM if retrieving the x2APIC MSR/register while processing an accelerated vAPIC trap VM-Exit fails. In theory it's impossible for the lookup to fail as hardware has already validated the register, but bugs happen, and not checking the result of kvm_lapic_msr_read() would result in consuming the uninitialized "val" if a KVM or hardware bug occurs. Fixes: `1bd9dfec9f` ("KVM: x86: Do not block APIC write for non ICR registers") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220804235028.1766253-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:23 -04:00
Paolo Bonzini	c3c28d24d9	KVM: x86: do not report preemption if the steal time cache is stale Commit `7e2175ebd6` ("KVM: x86: Fix recording of guest steal time / preempted status", 2021-11-11) open coded the previous call to kvm_map_gfn, but in doing so it dropped the comparison between the cached guest physical address and the one in the MSR. This cause an incorrect cache hit if the guest modifies the steal time address while the memslots remain the same. This can happen with kexec, in which case the preempted bit is written at the address used by the old kernel instead of the old one. Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Fixes: `7e2175ebd6` ("KVM: x86: Fix recording of guest steal time / preempted status") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:22 -04:00
Paolo Bonzini	901d3765fa	KVM: x86: revalidate steal time cache if MSR value changes Commit `7e2175ebd6` ("KVM: x86: Fix recording of guest steal time / preempted status", 2021-11-11) open coded the previous call to kvm_map_gfn, but in doing so it dropped the comparison between the cached guest physical address and the one in the MSR. This cause an incorrect cache hit if the guest modifies the steal time address while the memslots remain the same. This can happen with kexec, in which case the steal time data is written at the address used by the old kernel instead of the old one. While at it, rename the variable from gfn to gpa since it is a plain physical address and not a right-shifted one. Reported-by: Dave Young <ruyang@redhat.com> Reported-by: Xiaoying Yan <yiyan@redhat.com> Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Fixes: `7e2175ebd6` ("KVM: x86: Fix recording of guest steal time / preempted status") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-10 15:08:22 -04:00
Linus Torvalds	b1701d5e29	- hugetlb_vmemmap cleanups from Muchun Song - hardware poisoning support for 1GB hugepages, from Naoya Horiguchi - highmem documentation fixups from Fabio De Francesco -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYvMFaQAKCRDdBJ7gKXxA jrM7AQCsaVsrRJgVMNqjDvLAsWFEm48s6/smQT4UZ9n/rPQUsAEA0iRpOfbjcxwK npAml1mp5uP2AkimTVLB6Bokt6N+wAg= =9Wta -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull remaining MM updates from Andrew Morton: "Three patch series - two that perform cleanups and one feature: - hugetlb_vmemmap cleanups from Muchun Song - hardware poisoning support for 1GB hugepages, from Naoya Horiguchi - highmem documentation fixups from Fabio De Francesco" * tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) Documentation/mm: add details about kmap_local_page() and preemption highmem: delete a sentence from kmap_local_page() kdocs Documentation/mm: rrefer kmap_local_page() and avoid kmap() Documentation/mm: avoid invalid use of addresses from kmap_local_page() Documentation/mm: don't kmap*() pages which can't come from HIGHMEM highmem: specify that kmap_local_page() is callable from interrupts highmem: remove unneeded spaces in kmap_local_page() kdocs mm, hwpoison: enable memory error handling on 1GB hugepage mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage mm, hwpoison: make __page_handle_poison returns int mm, hwpoison: set PG_hwpoison for busy hugetlb pages mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage mm, hwpoison, hugetlb: support saving mechanism of raw error pages mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability mm: hugetlb_vmemmap: replace early_param() with core_param() mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c ...	2022-08-10 11:18:00 -07:00
Linus Torvalds	5318b987fe	More from the CPU vulnerability nightmares front: Intel eIBRS machines do not sufficiently mitigate against RET mispredictions when doing a VM Exit therefore an additional RSB, one-entry stuffing is needed. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLqsGsACgkQEsHwGGHe VUpXGg//ZEkxhf3Ri7X9PknAWNG6eIEqigKqWcdnOw+Oq/GMVb6q7JQsqowK7KBZ AKcY5c/KkljTJNohditnfSOePyCG5nDTPgfkjzIawnaVdyJWMRCz/L4X2cv6ykDl 2l2EvQm4Ro8XAogYhE7GzDg/osaVfx93OkLCQj278VrEMWgM/dN2RZLpn+qiIkNt DyFlQ7cr5UASh/svtKLko268oT4JwhQSbDHVFLMJ52VaLXX36yx4rValZHUKFdox ZDyj+kiszFHYGsI94KAD0dYx76p6mHnwRc4y/HkVcO8vTacQ2b9yFYBGTiQatITf 0Nk1RIm9m3rzoJ82r/U0xSIDwbIhZlOVNm2QtCPkXqJZZFhopYsZUnq2TXhSWk4x GQg/2dDY6gb/5MSdyLJmvrTUtzResVyb/hYL6SevOsIRnkwe35P6vDDyp15F3TYK YvidZSfEyjtdLISBknqYRQD964dgNZu9ewrj+WuJNJr+A2fUvBzUebXjxHREsugN jWp5GyuagEKTtneVCvjwnii+ptCm6yfzgZYLbHmmV+zhinyE9H1xiwVDvo5T7DDS ZJCBgoioqMhp5qR59pkWz/S5SNGui2rzEHbAh4grANy8R/X5ASRv7UHT9uAo6ve1 xpw6qnE37CLzuLhj8IOdrnzWwLiq7qZ/lYN7m+mCMVlwRWobbOo= =a8em -----END PGP SIGNATURE----- Merge tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 eIBRS fixes from Borislav Petkov: "More from the CPU vulnerability nightmares front: Intel eIBRS machines do not sufficiently mitigate against RET mispredictions when doing a VM Exit therefore an additional RSB, one-entry stuffing is needed" * tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Add LFENCE to RSB fill sequence x86/speculation: Add RSB VM Exit protections	2022-08-09 09:29:07 -07:00
Naoya Horiguchi	3a194f3f8a	mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry follow_pud_mask() does not support non-present pud entry now. As long as I tested on x86_64 server, follow_pud_mask() still simply returns no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe user-visible effect should happen. But generally we should call follow_huge_pud() for non-present pud entry for 1GB hugetlb page. Update pud_huge() and follow_huge_pud() to handle non-present pud entries. The changes are similar to previous works for pud entries commit `e66f17ff71` ("mm/hugetlb: take page table lock in follow_huge_pmd()") and commit `cbef8478be` ("mm/hugetlb: pmd_huge() returns true for non-present hugepage"). Link: https://lkml.kernel.org/r/20220714042420.1847125-3-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: kernel test robot <lkp@intel.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-08-08 18:06:43 -07:00
Kim Phillips	e6cfcdda8c	x86/bugs: Enable STIBP for IBPB mitigated RETBleed AMD's "Technical Guidance for Mitigating Branch Type Confusion, Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On Privileged Mode Entry / SMT Safety" says: Similar to the Jmp2Ret mitigation, if the code on the sibling thread cannot be trusted, software should set STIBP to 1 or disable SMT to ensure SMT safety when using this mitigation. So, like already being done for retbleed=unret, and now also for retbleed=ibpb, force STIBP on machines that have it, and report its SMT vulnerability status accordingly. [ bp: Remove the "we" and remove "[AMD]" applicability parameter which doesn't work here. ] Fixes: `3ebc170068` ("x86/bugs: Add retbleed=ibpb") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com	2022-08-08 19:12:17 +02:00
Linus Torvalds	4e23eeebb2	Bitmap patches for v6.0-rc1 This branch consists of: Qu Wenruo: lib: bitmap: fix the duplicated comments on bitmap_to_arr64() https://lore.kernel.org/lkml/0d85e1dbad52ad7fb5787c4432bdb36cbd24f632.1656063005.git.wqu@suse.com/ Alexander Lobakin: bitops: let optimize out non-atomic bitops on compile-time constants https://lore.kernel.org/lkml/20220624121313.2382500-1-alexandr.lobakin@intel.com/T/ Yury Norov: lib: cleanup bitmap-related headers https://lore.kernel.org/linux-arm-kernel/YtCVeOGLiQ4gNPSf@yury-laptop/T/#m305522194c4d38edfdaffa71fcaaf2e2ca00a961 Alexander Lobakin: x86/olpc: fix 'logical not is only applied to the left hand side' https://www.spinics.net/lists/kernel/msg4440064.html Yury Norov: lib/nodemask: inline wrappers around bitmap https://lore.kernel.org/all/20220723214537.2054208-1-yury.norov@gmail.com/ -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmLpVvwACgkQsUSA/Tof vsiAHgwAwS9pl8GJ+fKYnue2CYo9349d2oT6BBUs/Rv8uqYEa4QkpYsR7NS733TG pos0hhoRvSOzrUP4qppXUjfJ+NkzLgpnKFOeWfFoNAKlHuaaMRvF3Y0Q/P8g0/Kg HPWcCQLHyCH9Wjs3e2TTgRjxTrHuruD2VJ401/PX/lw0DicUhmev5mUFa10uwFkP ZJRprjoFn9HJ0Hk16pFZDi36d3YumhACOcWRiJdoBDrEPV3S6lm9EeOy/yHBNp5k 9bKj+RboeT2t70KaZcKv+M5j1nu0cAhl7kRkjcxcmGyimI0l82Vgq9yFxhGqvWg8 RnCrJ5EaO08FGCAKG9GEwzdiNa24Gdq5XZSpQA7JZHmhmchpnnlNenJicyv0gOQi abChZeWSEsyA+78l2+kk9nezfVKUOnKDEZQxBVTOyWsmZYxHZV94oam340VjQDaY 4/fETdOy/qqPIxnpxAeFGWxZjcVaYiYPLj7KLPMsB0aAAF7pZrem465vSfgbrE81 +gCdqrWd =4dTW -----END PGP SIGNATURE----- Merge tag 'bitmap-6.0-rc1' of https://github.com/norov/linux Pull bitmap updates from Yury Norov: - fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo) - optimize out non-atomic bitops on compile-time constants (Alexander Lobakin) - cleanup bitmap-related headers (Yury Norov) - x86/olpc: fix 'logical not is only applied to the left hand side' (Alexander Lobakin) - lib/nodemask: inline wrappers around bitmap (Yury Norov) * tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits) lib/nodemask: inline next_node_in() and node_random() powerpc: drop dependency on <asm/machdep.h> in archrandom.h x86/olpc: fix 'logical not is only applied to the left hand side' lib/cpumask: move some one-line wrappers to header file headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h> headers/deps: mm: Optimize <linux/gfp.h> header dependencies lib/cpumask: move trivial wrappers around find_bit to the header lib/cpumask: change return types to unsigned where appropriate cpumask: change return types to bool where appropriate lib/bitmap: change type of bitmap_weight to unsigned long lib/bitmap: change return types to bool where appropriate arm: align find_bit declarations with generic kernel iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE) lib/test_bitmap: test the tail after bitmap_to_arr64() lib/bitmap: fix off-by-one in bitmap_to_arr64() lib: test_bitmap: add compile-time optimization/evaluations assertions bitmap: don't assume compiler evaluates small mem*() builtins calls net/ice: fix initializing the bitmap in the switch code bitops: let optimize out non-atomic bitops on compile-time constants ...	2022-08-07 17:52:35 -07:00
Linus Torvalds	eb5699ba31	Updates to various subsystems which I help look after. lib, ocfs2, fatfs, autofs, squashfs, procfs, etc. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYu9BeQAKCRDdBJ7gKXxA jp1DAP4mjCSvAwYzXklrIt+Knv3CEY5oVVdS+pWOAOGiJpldTAD9E5/0NV+VmlD9 kwS/13j38guulSlXRzDLmitbg81zAAI= =Zfum -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc updates from Andrew Morton: "Updates to various subsystems which I help look after. lib, ocfs2, fatfs, autofs, squashfs, procfs, etc. A relatively small amount of material this time" * tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) scripts/gdb: ensure the absolute path is generated on initial source MAINTAINERS: kunit: add David Gow as a maintainer of KUnit mailmap: add linux.dev alias for Brendan Higgins mailmap: update Kirill's email profile: setup_profiling_timer() is moslty not implemented ocfs2: fix a typo in a comment ocfs2: use the bitmap API to simplify code ocfs2: remove some useless functions lib/mpi: fix typo 'the the' in comment proc: add some (hopefully) insightful comments bdi: remove enum wb_congested_state kernel/hung_task: fix address space of proc_dohung_task_timeout_secs lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t() squashfs: support reading fragments in readahead call squashfs: implement readahead squashfs: always build "file direct" version of page actor Revert "squashfs: provide backing_dev_info in order to disable read-ahead" fs/ocfs2: Fix spelling typo in comment ia64: old_rr4 added under CONFIG_HUGETLB_PAGE proc: fix test for "vsyscall=xonly" boot option ...	2022-08-07 10:03:24 -07:00
Linus Torvalds	1612c382ff	Misc fixes: - an old(er) binutils build fix, - a new-GCC build fix, - and a kexec boot environment fix. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmLuv4URHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1it3A//fGfrzHGtjHraiBy0H1Erlz0dUa4q/r6v xPQVFYteGwL/Ynv2rOJreiEXNhv9pRv0cXXNS5iWh8IcP8IUNw6rfYmgr1aDpXdq WkbJvwouX6JSo3g/CMekKd+Mf7NgA4O1OO65E80c4WJnxgd0AYvr6IxJRLR7X0C7 HwU6p6PmP/RHWT5T170z6sgun+6QdDEYSwFYOhxawL+BJaKEBYnQ0LLQgJazhe7z uVxONQA9OdWBwMzvZygbOuTzc990jCHRPYgvYQhSZ8CUPuVzaa7IB9KUXh6lu93d a7nqM3GlWTowBULY6Xq7gWJaJ7jsVWXjqo8SWVlb6YwoLR9dgGSW5bCGV0rOA6o3 yPjQhIQ9H4NOx126wPcCRBh3osGFjqlWUXVw7W51aNgd7hCvlbpWWmREeI/Pm1Ew WBjQqpf4l0S+0On5FEFaF7swAG3b6KSNSKw7WBmpmTNt5eWOot0EtnjGW75ATpxM +j2fj/1MIZ/Zp+wYaNK/+abM4sXHhYvU9gpPdJslRr+r2AVjy9gCZ/0zuUIVytwC gOdV9KhqzlXPJCTm+py7fBt2qM2P5rKT2HBQYiJwIquB2njI0kjUBOJWXsGQ/F/y hGd6WY8uDuwzzg5JtyfwE6fPGovxL5GCc4w9CYz0DbP0txPYuhMOdkHtAYLyraAj wtdalMt3cT8= =EM/G -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - build fix for old(er) binutils - build fix for new GCC - kexec boot environment fix * tag 'x86-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Build thunk_$(BITS) only if CONFIG_PREEMPTION=y x86/numa: Use cpumask_available instead of hardcoded NULL check x86/bus_lock: Don't assume the init value of DEBUGCTLMSR.BUS_LOCK_DETECT to be zero	2022-08-06 17:45:37 -07:00
Linus Torvalds	592d8362bc	Misc fixes to kprobes and the faddr2line script, plus a cleanup. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmLuu5MRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gFbA//ZppMR0/26/d+KqhdbVND6wtuTzGb5krZ m3QynlRQ+x7CZJNJeiNSTo/Dup/KwBUpJFT5sKLtpfOQILxlEt0hdYMiD+/oxgxd K0Vb0QrZhwFCju+OpcDAVlWNcQ5P8MMdoGUkOr5ekZ9FFalabW+bVUuM2Yf0Cok8 e20MGoZa2jcd+AZkp9jPUtCTURpW3Ew1WcVuJIgLH3EUMNrQNiPdia6xBzFyOPAw L0G14RDkd/POGF90dUGY1Ta4WeQCNYp2Rgu5DLo6l3eJJ/oeqoIUBUoNRT9AOJHH 0SVNHkrrNlRJe9HD/Jdc6RVBMM+FFNU4rw1uxOPU2OtG0MyMsj39Nzw+xmvB9QsG mwnMoeeDOJmFRnAyhETe4meR5mA8cPQDoNNlHL51I9JTJTUutIrfd+gQIgVgYrM2 oVfLW7Y0Eew8qYbAd2kfGnFNHDSH90RHG4beTz4zW3y4shembKhiPU7bgJ8lkke7 u4NgDOE+qTmtC1DznuV4Av8/27W6OMt/j1IWeR78IN7YBko99Ekog3zsWrAJgA/E Y08JVrUpUU47tMl4uC9Y0AUvm1Tb2ZyDqcdlEEzF9txtdNa6cAJtJkPaO6nUrr4+ qLCbhBBADP+oQNESi6vRHRmxmk5Z/m2ybfnAuYNNraWY01Imp4kNvLFvB01ARGaF Qin7dCjqz+E= =S41z -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes to kprobes and the faddr2line script, plus a cleanup" * tag 'perf-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix ';;' typo scripts/faddr2line: Add CONFIG_DEBUG_INFO check scripts/faddr2line: Fix vmlinux detection on arm64 x86/kprobes: Update kcb status flag after singlestepping kprobes: Forbid probing on trampoline and BPF code areas	2022-08-06 17:28:12 -07:00
Borislav Petkov	86af8230ce	x86/mm: Rename set_memory_present() to set_memory_p() Have it adhere to the naming convention for those helpers. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220805140702.31538-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2022-08-06 20:46:13 +02:00
Linus Torvalds	1d239c1eb8	IOMMU Updates for Linux v5.20/v6.0: Including: - Most intrusive patch is small and changes the default allocation policy for DMA addresses. Before the change the allocator tried its best to find an address in the first 4GB. But that lead to performance problems when that space gets exhaused, and since most devices are capable of 64-bit DMA these days, we changed it to search in the full DMA-mask range from the beginning. This change has the potential to uncover bugs elsewhere, in the kernel or the hardware. There is a Kconfig option and a command line option to restore the old behavior, but none of them is enabled by default. - Add Robin Murphy as reviewer of IOMMU code and maintainer for the dma-iommu and iova code - Chaning IOVA magazine size from 1032 to 1024 bytes to save memory - Some core code cleanups and dead-code removal - Support for ACPI IORT RMR node - Support for multiple PCI domains in the AMD-Vi driver - ARM SMMU changes from Will Deacon: - Add even more Qualcomm device-tree compatible strings - Support dumping of IMP DEF Qualcomm registers on TLB sync timeout - Fix reference count leak on device tree node in Qualcomm driver - Intel VT-d driver updates from Lu Baolu: - Make intel-iommu.h private - Optimize the use of two locks - Extend the driver to support large-scale platforms - Cleanup some dead code - MediaTek IOMMU refactoring and support for TTBR up to 35bit - Basic support for Exynos SysMMU v7 - VirtIO IOMMU driver gets a map/unmap_pages() implementation - Other smaller cleanups and fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmLs3DIACgkQK/BELZcB GuMizhAAguAnLLOkOLlR9/MhrTZfNXCUX+bfrEIevjFXMw4iPNfCCr4ydQ7EdVK6 ZA/3Z89huYl0d0x/FELolnQi+HOeqYrfTDe4rB7TgNgwZnWa+fdHcyYkgBGyfPaV ilgjNcx8o//9o4NasyB6kU395jVmFxb735gMTTb+tcO9fr+/qIB6hxrHuCklxrNr C7wK6kkoDPi5n0QuXCSjXEx2Hk245pAWKPLwqxsUYzHGlLfl7ULOxw65BUBGvn/H uCsTfJFu7u+ErwQYf0qPuOwRBnRdsx9g5EAnfab8p074SoKWvbNnftIxgIRp8ZEM YgCbhYa1GOFI4r+XzqRzEbc0/vPSttims4Jqz0KxYs7pr5EoVifrWLJFjJdCdc2h Tio1gTvOq8HbH63kwYNKJhg4iSC6zVd37ihEhvfFO6LcgFl4iCfd2o9zK7oY40J4 XoOxofVnJ2e3tzdhZ/n5quCXiudHixm6WuVa7QYKscF7Ud0tY1wWKuibdlMQTeNM 68MvtlteKcfs1BrWzZyrFMrFeAfIY8LI82y6jdJuoNMU5LE9+5yelXBdJhnVygZ+ Jglv1TIt6W/z1H5JgXtNVZ1wWgBm7rurOqNyfN8XCd8eP1z321CLfX8ujkhKrIWP ApG15cwvpnh1JX630+UFiEikTGU0fb2orMdPwYmwuu8DAsoLVHE= =hI2K -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v5.20-or-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - The most intrusive patch is small and changes the default allocation policy for DMA addresses. Before the change the allocator tried its best to find an address in the first 4GB. But that lead to performance problems when that space gets exhaused, and since most devices are capable of 64-bit DMA these days, we changed it to search in the full DMA-mask range from the beginning. This change has the potential to uncover bugs elsewhere, in the kernel or the hardware. There is a Kconfig option and a command line option to restore the old behavior, but none of them is enabled by default. - Add Robin Murphy as reviewer of IOMMU code and maintainer for the dma-iommu and iova code - Chaning IOVA magazine size from 1032 to 1024 bytes to save memory - Some core code cleanups and dead-code removal - Support for ACPI IORT RMR node - Support for multiple PCI domains in the AMD-Vi driver - ARM SMMU changes from Will Deacon: - Add even more Qualcomm device-tree compatible strings - Support dumping of IMP DEF Qualcomm registers on TLB sync timeout - Fix reference count leak on device tree node in Qualcomm driver - Intel VT-d driver updates from Lu Baolu: - Make intel-iommu.h private - Optimize the use of two locks - Extend the driver to support large-scale platforms - Cleanup some dead code - MediaTek IOMMU refactoring and support for TTBR up to 35bit - Basic support for Exynos SysMMU v7 - VirtIO IOMMU driver gets a map/unmap_pages() implementation - Other smaller cleanups and fixes * tag 'iommu-updates-v5.20-or-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (116 commits) iommu/amd: Fix compile warning in init code iommu/amd: Add support for AVIC when SNP is enabled iommu/amd: Simplify and Consolidate Virtual APIC (AVIC) Enablement ACPI/IORT: Fix build error implicit-function-declaration drivers: iommu: fix clang -wformat warning iommu/arm-smmu: qcom_iommu: Add of_node_put() when breaking out of loop iommu/arm-smmu-qcom: Add SM6375 SMMU compatible dt-bindings: arm-smmu: Add compatible for Qualcomm SM6375 MAINTAINERS: Add Robin Murphy as IOMMU SUBSYTEM reviewer iommu/amd: Do not support IOMMUv2 APIs when SNP is enabled iommu/amd: Do not support IOMMU_DOMAIN_IDENTITY after SNP is enabled iommu/amd: Set translation valid bit only when IO page tables are in use iommu/amd: Introduce function to check and enable SNP iommu/amd: Globally detect SNP support iommu/amd: Process all IVHDs before enabling IOMMU features iommu/amd: Introduce global variable for storing common EFR and EFR2 iommu/amd: Introduce Support for Extended Feature 2 Register iommu/amd: Change macro for IOMMU control register bit shift to decimal value iommu/exynos: Enable default VM instance on SysMMU v7 iommu/exynos: Add SysMMU v7 register set ...	2022-08-06 10:42:38 -07:00
Linus Torvalds	6614a3c316	- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/ SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE= =w/UH -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...	2022-08-05 16:32:45 -07:00
Linus Torvalds	79b7e67bb9	This pull request contains the following changes for UML: - KASAN support for x86_64 - noreboot command line option, just like qemu's -no-reboot - Various fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdgfidid8lnn52cLTZvlZhesYu8EFAmLteZYACgkQZvlZhesY u8F8bRAA2806QUzysg3Nj1AKPiTOj47TuluGu4SXytB0usQgYK/n3Fxr36ULJAOJ 3qZWf2fsAkBLvgX9Sw2QFGfulrpfKnLeTdBXSEbWYWhZ0ZoaEJztKmtfH02kRDOW POedQT5FXMDVjGQdLC7Ycp+WyjaUwrccZ+KRkGWmlr7vNFlxcTlEqBb13mgLdjkY ep8X+SgmAcdvWBd/os+nNn9Al6TbFd4XQCok82DtNrv0ggwXnVPov/ArvZvvn2Oj F028X77180rbrGV+ZnDkV1KSv/ccT5EFebJkfEEcYVjre8o0QoPQmh2tFqXN0d83 2WpIOb1+mQL0VClpC4hKbScpIB5tw8vIHsUT+ifloIgY/puhezx6aWm0TKSA+aTM WitJl1Nf4uNu1rqkBkn9o3VK8CYokTALQIRexHCzvZ70CSxmFbR7EVRSTf7Rr690 Oq7StHagfuTJpddh0wQwaMorIH4s0/bpPoA6m4OhwlppnCpY0Hfl3+AKluNRUtH6 lPeQwfxhd/LKqYW0COElEnReDLzer82kUx/keVyxVINqxpm6YTHVtOgtMCEuVNXg GbS8PFCW2mIP8Is6HJavZYCzG8vnz3wZ9GENujanwLemiIJfINDauybu+nNsE5pO 7v12vWeZ0x2HGM/cFxODrpp4xAkdq8BBLap8/aXB8uJFagmYyhs= =f3Bh -----END PGP SIGNATURE----- Merge tag 'for-linus-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - KASAN support for x86_64 - noreboot command line option, just like qemu's -no-reboot - Various fixes and cleanups * tag 'for-linus-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: include sys/types.h for size_t um: Replace to_phys() and to_virt() with less generic function names um: Add missing apply_returns() um: add "noreboot" command line option for PANIC_TIMEOUT=-1 setups um: include linux/stddef.h for __always_inline UML: add support for KASAN under x86_64 mm: Add PAGE_ALIGN_DOWN macro um: random: Don't initialise hwrng struct with zero um: remove unused mm_copy_segments um: remove unused variable um: Remove straying parenthesis um: x86: print RIP with symbol arch: um: Fix build for statically linked UML w/ constructors x86/um: Kconfig: Fix indentation um/drivers: Kconfig: Fix indentation um: Kconfig: Fix indentation	2022-08-05 14:03:11 -07:00
Linus Torvalds	9e2f402336	- SGX2 ISA support which makes enclave memory management much more dynamic. For instance, enclaves can now change enclave page permissions on the fly. - Removal of an unused structure member -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmLq2M8ACgkQaDWVMHDJ krCbAw/+J4nHXxZNMQQX1c8CYJ7XHIr+YtsqNFYwH58rJJstHO/YwQf+mesVOeeu 08BYn+T5cdAbShKcxdkowPB17S6w/WzACtUfVhaoRQC7Md40cBiyc45UiC2e1u9g W3Osk5+fTVcSYA9WiizPntIQkjVs9e7hcNKjTyVPnSw8W8mFCLg+ZiPb7YvKERTO o8Wi2+zzX1BGDNOyBEqvnstz9uXDbCbFUTYX6zToBUk+Y1ZPXHwuHgNTtrAqGYaL qyi0O2zoWnfOUmplzjJ/1aPmzPJDPgDNImC+gjTpYXGmg05Ryds+VZAc64IIjqYn K+/5674PZFdsp5/YfctubdsQm0l0xen94sccAacd7KfsVurcHs3E2bdQPDw0htxv svCX0Sai/qv52tPNzw+n9EJRcQsiwd9Pn0rWwx2i8hQcgMFiwCus6DBKhU7uh2Jp oTwlspqJy2NHu9bici78tmsOio9CORjrh1WOfWX+yHEux4dtQAl889Gw5qzId6V1 Bh1MgoAu/pQ78feo96f3h5yOultOtpbTGyXEC8t4MTSpIVgZ2NzfUxe4RhOCBnhA kdftVNfZLGOzwBbgFy0gYTe/ukt1DkP4BNHQilf2I+bUP/kZFlN8wfxBipWzr0bs Skrz4+brBIaTdGoFgzhgt3g5YH16DSasmy/HCkIeV7eaAHFRLoE= =Y7YA -----END PGP SIGNATURE----- Merge tag 'x86_sgx_for_v6.0-2022-08-03.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX updates from Dave Hansen: "A set of x86/sgx changes focused on implementing the "SGX2" features, plus a minor cleanup: - SGX2 ISA support which makes enclave memory management much more dynamic. For instance, enclaves can now change enclave page permissions on the fly. - Removal of an unused structure member" * tag 'x86_sgx_for_v6.0-2022-08-03.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) x86/sgx: Drop 'page_index' from sgx_backing selftests/sgx: Page removal stress test selftests/sgx: Test reclaiming of untouched page selftests/sgx: Test invalid access to removed enclave page selftests/sgx: Test faulty enclave behavior selftests/sgx: Test complete changing of page type flow selftests/sgx: Introduce TCS initialization enclave operation selftests/sgx: Introduce dynamic entry point selftests/sgx: Test two different SGX2 EAUG flows selftests/sgx: Add test for TCS page permission changes selftests/sgx: Add test for EPCM permission changes Documentation/x86: Introduce enclave runtime management section x86/sgx: Free up EPC pages directly to support large page ranges x86/sgx: Support complete page removal x86/sgx: Support modifying SGX page type x86/sgx: Tighten accessible memory range after enclave initialization x86/sgx: Support adding of pages to an initialized enclave x86/sgx: Support restricting of enclave page permissions x86/sgx: Support VA page allocation without reclaiming x86/sgx: Export sgx_encl_page_alloc() ...	2022-08-05 10:47:40 -07:00
Linus Torvalds	3bd6e5854b	asm-generic: updates for 6.0 There are three independent sets of changes: - Sai Prakash Ranjan adds tracing support to the asm-generic version of the MMIO accessors, which is intended to help understand problems with device drivers and has been part of Qualcomm's vendor kernels for many years. - A patch from Sebastian Siewior to rework the handling of IRQ stacks in softirqs across architectures, which is needed for enabling PREEMPT_RT. - The last patch to remove the CONFIG_VIRT_TO_BUS option and some of the code behind that, after the last users of this old interface made it in through the netdev, scsi, media and staging trees. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmLqPPEACgkQmmx57+YA GNlUbQ/+NpIsiA0JUrCGtySt8KrLHdA2dH9lJOR5/iuxfphscPFfWtpcPvcXQWmt a8u7wyI8SHW1ku4U0Y5sO0dBSldDnoIqJ5t4X5d7YNU9yVtEtucqQhZf+GkrPlVD 1HkRu05B7y0k2BMn7BLhSvkpafs3f1lNGXjs8oFBdOF1/zwp/GjcrfCK7KFzqjwU dYrX0SOFlKFd4BZC75VfK+XcKg4LtwIOmJraRRl7alz2Q5Oop2hgjgZxXDPf//vn SPOhXJN/97i1FUpY2TkfHVH1NxbPfjCV4pUnjmLG0Y4NSy9UQ/ZcXHcywIdeuhfa 0LySOIsAqBeccpYYYdg2ubiMDZOXkBfANu/sB9o/EhoHfB4svrbPRDhBIQZMFXJr MJYu+IYce2rvydA/nydo4q++pxR8v1ES1ZIo8bDux+q1CI/zbpQV+f98kPVRA0M7 ajc+5GTIqNIsvHzzadq7eYxcj5Bi8Li2JA9sVkAQ+6iq1TVyeYayMc9eYwONlmqw MD+PFYc651pKtXZCfkLXPIKSwS0uPqBndAibuVhpZ0hxWaCBBdKvY9mrWcPxt0kA tMR8lrosbbrV2K48BFdWTOHvCs2FhHQxPGVPZ/iWuxTA0hHZ9tUlaEkSX+VM57IU KCYQLdWzT8J9vrgqSbgYKlb6pSPz6FIjTfut6NZMmshIbavHV/Q= =aTR0 -----END PGP SIGNATURE----- Merge tag 'asm-generic-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "There are three independent sets of changes: - Sai Prakash Ranjan adds tracing support to the asm-generic version of the MMIO accessors, which is intended to help understand problems with device drivers and has been part of Qualcomm's vendor kernels for many years - A patch from Sebastian Siewior to rework the handling of IRQ stacks in softirqs across architectures, which is needed for enabling PREEMPT_RT - The last patch to remove the CONFIG_VIRT_TO_BUS option and some of the code behind that, after the last users of this old interface made it in through the netdev, scsi, media and staging trees" * tag 'asm-generic-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: uapi: asm-generic: fcntl: Fix typo 'the the' in comment arch//: remove CONFIG_VIRT_TO_BUS soc: qcom: geni: Disable MMIO tracing for GENI SE serial: qcom_geni_serial: Disable MMIO tracing for geni serial asm-generic/io: Add logging support for MMIO accessors KVM: arm64: Add a flag to disable MMIO trace for nVHE KVM lib: Add register read/write tracing support drm/meson: Fix overflow implicit truncation warnings irqchip/tegra: Fix overflow implicit truncation warnings coresight: etm4x: Use asm-generic IO memory barriers arm64: io: Use asm-generic high level MMIO accessors arch/: Disable softirq stacks on PREEMPT_RT.	2022-08-05 10:07:23 -07:00

... 3 4 5 6 7 ...

42510 Commits