OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Sean Christopherson	08a9d59c6a	KVM: x86: Use KBUILD_MODNAME to specify vendor module name Use KBUILD_MODNAME to specify the vendor module name instead of manually writing out the name to make it a bit more obvious that the name isn't completely arbitrary. A future patch will also use KBUILD_MODNAME to define pr_fmt, at which point using KBUILD_MODNAME for kvm_x86_ops.name further reinforces the intended usage of kvm_x86_ops.name. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-34-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:30 -05:00
Sean Christopherson	81a1cf9f89	KVM: Drop kvm_arch_check_processor_compat() hook Drop kvm_arch_check_processor_compat() and its support code now that all architecture implementations are nops. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Acked-by: Anup Patel <anup@brainfault.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20221130230934.1014142-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:28 -05:00
Sean Christopherson	3045c483ee	KVM: x86: Do CPU compatibility checks in x86 code Move the CPU compatibility checks to pure x86 code, i.e. drop x86's use of the common kvm_x86_check_cpu_compat() arch hook. x86 is the only architecture that "needs" to do per-CPU compatibility checks, moving the logic to x86 will allow dropping the common code, and will also give x86 more control over when/how the compatibility checks are performed, e.g. TDX will need to enable hardware (do VMXON) in order to perform compatibility checks. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20221130230934.1014142-32-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:26 -05:00
Sean Christopherson	58ca193031	KVM: VMX: Make VMCS configuration/capabilities structs read-only after init Tag vmcs_config and vmx_capability structs as __init, the canonical configuration is generated during hardware_setup() and must never be modified after that point. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-31-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:25 -05:00
Sean Christopherson	a578a0a9e3	KVM: Drop kvm_arch_{init,exit}() hooks Drop kvm_arch_init() and kvm_arch_exit() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-30-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:23 -05:00
Sean Christopherson	6c30cd2ef4	KVM: s390: Mark __kvm_s390_init() and its descendants as __init Tag __kvm_s390_init() and its unique helpers as __init. These functions are only ever called during module_init(), but could not be tagged accordingly while they were invoked from the common kvm_arch_init(), which is not __init because of x86. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Message-Id: <20221130230934.1014142-29-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:22 -05:00
Sean Christopherson	b844926591	KVM: s390: Do s390 specific init without bouncing through kvm_init() Move the guts of kvm_arch_init() into a new helper, __kvm_s390_init(), and invoke the new helper directly from kvm_s390_init() instead of bouncing through kvm_init(). Invoking kvm_arch_init() is the very first action performed by kvm_init(), i.e. this is a glorified nop. Moving setup to __kvm_s390_init() will allow tagging more functions as __init, and emptying kvm_arch_init() will allow dropping the hook entirely once all architecture implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221130230934.1014142-28-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:21 -05:00
Sean Christopherson	ae19b15d91	KVM: PPC: Move processor compatibility check to module init Move KVM PPC's compatibility checks to their respective module_init() hooks, there's no need to wait until KVM's common compat check, nor is there a need to perform the check on every CPU (provided by common KVM's hook), as the compatibility checks operate on global data. arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec; arch/powerpc/kvm/book3s.c: return 0 arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2") arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc") strcmp(cur_cpu_spec->cpu_name, "e5500") strcmp(cur_cpu_spec->cpu_name, "e6500") Cc: Fabiano Rosas <farosas@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Message-Id: <20221130230934.1014142-27-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:19 -05:00
Sean Christopherson	45b66dc139	KVM: RISC-V: Tag init functions and data with __init, __ro_after_init Now that KVM setup is handled directly in riscv_kvm_init(), tag functions and data that are used/set only during init with __init/__ro_after_init. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-26-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:18 -05:00
Sean Christopherson	20deee32f5	KVM: RISC-V: Do arch init directly in riscv_kvm_init() Fold the guts of kvm_arch_init() into riscv_kvm_init() instead of bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is a glorified nop as invoking kvm_arch_init() is the very first action performed by kvm_init(). Moving setup to riscv_kvm_init(), which is tagged __init, will allow tagging more functions and data with __init and __ro_after_init. And emptying kvm_arch_init() will allow dropping the hook entirely once all architecture implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-25-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:16 -05:00
Sean Christopherson	eed9fcdf57	KVM: MIPS: Register die notifier prior to kvm_init() Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes /dev/kvm to userspace and thus allows userspace to create VMs (and call other ioctls). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221130230934.1014142-24-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:15 -05:00
Sean Christopherson	3fb8e89aa2	KVM: MIPS: Setup VZ emulation? directly from kvm_mips_init() Invoke kvm_mips_emulation_init() directly from kvm_mips_init() instead of bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is a glorified nop as invoking kvm_arch_init() is the very first action performed by kvm_init(). Emptying kvm_arch_init() will allow dropping the hook entirely once all architecture implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221130230934.1014142-23-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:13 -05:00
Sean Christopherson	1cfc1c7bf5	KVM: MIPS: Hardcode callbacks to hardware virtualization extensions Now that KVM no longer supports trap-and-emulate (see commit `45c7e8af4a` "MIPS: Remove KVM_TE support"), hardcode the MIPS callbacks to the virtualization callbacks. Harcoding the callbacks eliminates the technically-unnecessary check on non-NULL kvm_mips_callbacks in kvm_arch_init(). MIPS has never supported multiple in-tree modules, i.e. barring an out-of-tree module, where copying and renaming kvm.ko counts as "out-of-tree", KVM could never encounter a non-NULL set of callbacks during module init. The callback check is also subtly broken, as it is not thread safe, i.e. if there were multiple modules, loading both concurrently would create a race between checking and setting kvm_mips_callbacks. Given that out-of-tree shenanigans are not the kernel's responsibility, hardcode the callbacks to simplify the code. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221130230934.1014142-22-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:12 -05:00
Sean Christopherson	53bf620a2c	KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init Tag kvm_arm_init() and its unique helper as __init, and tag data that is only ever modified under the kvm_arm_init() umbrella as read-only after init. Opportunistically name the boolean param in kvm_timer_hyp_init()'s prototype to match its definition. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-21-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:10 -05:00
Sean Christopherson	1dc0f02d53	KVM: arm64: Do arm/arch initialization without bouncing through kvm_init() Do arm/arch specific initialization directly in arm's module_init(), now called kvm_arm_init(), instead of bouncing through kvm_init() to reach kvm_arch_init(). Invoking kvm_arch_init() is the very first action performed by kvm_init(), so from a initialization perspective this is a glorified nop. Avoiding kvm_arch_init() also fixes a mostly benign bug as kvm_arch_exit() doesn't properly unwind if a later stage of kvm_init() fails. While the soon-to-be-deleted comment about compiling as a module being unsupported is correct, kvm_arch_exit() can still be called by kvm_init() if any step after the call to kvm_arch_init() succeeds. Add a FIXME to call out that pKVM initialization isn't unwound if kvm_init() fails, which is a pre-existing problem inherited from kvm_arch_exit(). Making kvm_arch_init() a nop will also allow dropping kvm_arch_init() and kvm_arch_exit() entirely once all other architectures follow suit. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:08 -05:00
Sean Christopherson	78b3bf485d	KVM: arm64: Unregister perf callbacks if hypervisor finalization fails Undo everything done by init_subsystems() if a later initialization step fails, i.e. unregister perf callbacks in addition to unregistering the power management notifier. Fixes: `bfa79a8054` ("KVM: arm64: Elevate hypervisor mappings creation at EL2") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:07 -05:00
Sean Christopherson	6baaeda878	KVM: arm64: Free hypervisor allocations if vector slot init fails Teardown hypervisor mode if vector slot setup fails in order to avoid leaking any allocations done by init_hyp_mode(). Fixes: `b881cdce77` ("KVM: arm64: Allocate hyp vectors statically") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:06 -05:00
Marc Zyngier	466d27e48d	KVM: arm64: Simplify the CPUHP logic For a number of historical reasons, the KVM/arm64 hotplug setup is pretty complicated, and we have two extra CPUHP notifiers for vGIC and timers. It looks pretty pointless, and gets in the way of further changes. So let's just expose some helpers that can be called from the core CPUHP callback, and get rid of everything else. This gives us the opportunity to drop a useless notifier entry, as well as tidy-up the timer enable/disable, which was a bit odd. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:04 -05:00
Sean Christopherson	3af4a9e61e	KVM: x86: Serialize vendor module initialization (hardware setup) Acquire a new mutex, vendor_module_lock, in kvm_x86_vendor_init() while doing hardware setup to ensure that concurrent calls are fully serialized. KVM rejects attempts to load vendor modules if a different module has already been loaded, but doesn't handle the case where multiple vendor modules are loaded at the same time, and module_init() doesn't run under the global module_mutex. Note, in practice, this is likely a benign bug as no platform exists that supports both SVM and VMX, i.e. barring a weird VM setup, one of the vendor modules is guaranteed to fail a support check before modifying common KVM state. Alternatively, KVM could perform an atomic CMPXCHG on .hardware_enable, but that comes with its own ugliness as it would require setting .hardware_enable before success is guaranteed, e.g. attempting to load the "wrong" could result in spurious failure to load the "right" module. Introduce a new mutex as using kvm_lock is extremely deadlock prone due to kvm_lock being taken under cpus_write_lock(), and in the future, under under cpus_read_lock(). Any operation that takes cpus_read_lock() while holding kvm_lock would potentially deadlock, e.g. kvm_timer_init() takes cpus_read_lock() to register a callback. In theory, KVM could avoid such problematic paths, i.e. do less setup under kvm_lock, but avoiding all calls to cpus_read_lock() is subtly difficult and thus fragile. E.g. updating static calls also acquires cpus_read_lock(). Inverting the lock ordering, i.e. always taking kvm_lock outside cpus_read_lock(), is not a viable option as kvm_lock is taken in various callbacks that may be invoked under cpus_read_lock(), e.g. x86's kvmclock_cpufreq_notifier(). The lockdep splat below is dependent on future patches to take cpus_read_lock() in hardware_enable_all(), but as above, deadlock is already is already possible. ====================================================== WARNING: possible circular locking dependency detected 6.0.0-smp--7ec93244f194-init2 #27 Tainted: G O ------------------------------------------------------ stable/251833 is trying to acquire lock: ffffffffc097ea28 (kvm_lock){+.+.}-{3:3}, at: hardware_enable_all+0x1f/0xc0 [kvm] but task is already holding lock: ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (cpu_hotplug_lock){++++}-{0:0}: cpus_read_lock+0x2a/0xa0 __cpuhp_setup_state+0x2b/0x60 __kvm_x86_vendor_init+0x16a/0x1870 [kvm] kvm_x86_vendor_init+0x23/0x40 [kvm] 0xffffffffc0a4d02b do_one_initcall+0x110/0x200 do_init_module+0x4f/0x250 load_module+0x1730/0x18f0 __se_sys_finit_module+0xca/0x100 __x64_sys_finit_module+0x1d/0x20 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (kvm_lock){+.+.}-{3:3}: __lock_acquire+0x16f4/0x30d0 lock_acquire+0xb2/0x190 __mutex_lock+0x98/0x6f0 mutex_lock_nested+0x1b/0x20 hardware_enable_all+0x1f/0xc0 [kvm] kvm_dev_ioctl+0x45e/0x930 [kvm] __se_sys_ioctl+0x77/0xc0 __x64_sys_ioctl+0x1d/0x20 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug_lock); lock(kvm_lock); lock(cpu_hotplug_lock); lock(kvm_lock); * DEADLOCK * 1 lock held by stable/251833: #0: ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:03 -05:00
Sean Christopherson	e32b120071	KVM: VMX: Do _all_ initialization before exposing /dev/kvm to userspace Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes /dev/kvm to userspace and thus allows userspace to create VMs (and call other ioctls). E.g. KVM will encounter a NULL pointer when attempting to add a vCPU to the per-CPU loaded_vmcss_on_cpu list if userspace is able to create a VM before vmx_init() configures said list. BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP CPU: 6 PID: 1143 Comm: stable Not tainted 6.0.0-rc7+ #988 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:vmx_vcpu_load_vmcs+0x68/0x230 [kvm_intel] <TASK> vmx_vcpu_load+0x16/0x60 [kvm_intel] kvm_arch_vcpu_load+0x32/0x1f0 [kvm] vcpu_load+0x2f/0x40 [kvm] kvm_arch_vcpu_create+0x231/0x310 [kvm] kvm_vm_ioctl+0x79f/0xe10 [kvm] ? handle_mm_fault+0xb1/0x220 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f5a6b05743b </TASK> Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel(+) kvm irqbypass Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:01 -05:00
Sean Christopherson	4f8396b96a	KVM: x86: Move guts of kvm_arch_init() to standalone helper Move the guts of kvm_arch_init() to a new helper, kvm_x86_vendor_init(), so that VMX can do _all_ arch and vendor initialization before calling kvm_init(). Calling kvm_init() must be the _very_ last step during init, as kvm_init() exposes /dev/kvm to userspace, i.e. allows creating VMs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:41:00 -05:00
Sean Christopherson	451d39e800	KVM: VMX: Move Hyper-V eVMCS initialization to helper Move Hyper-V's eVMCS initialization to a dedicated helper to clean up vmx_init(), and add a comment to call out that the Hyper-V init code doesn't need to be unwound if vmx_init() ultimately fails. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20221130230934.1014142-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:58 -05:00
Sean Christopherson	da66de44b0	KVM: VMX: Don't bother disabling eVMCS static key on module exit Don't disable the eVMCS static key on module exit, kvm_intel.ko owns the key so there can't possibly be users after the kvm_intel.ko is unloaded, at least not without much bigger issues. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:57 -05:00
Sean Christopherson	2916b70fc3	KVM: VMX: Reset eVMCS controls in VP assist page during hardware disabling Reset the eVMCS controls in the per-CPU VP assist page during hardware disabling instead of waiting until kvm-intel's module exit. The controls are activated if and only if KVM creates a VM, i.e. don't need to be reset if hardware is never enabled. Doing the reset during hardware disabling will naturally fix a potential NULL pointer deref bug once KVM disables CPU hotplug while enabling and disabling hardware (which is necessary to fix a variety of bugs). If the kernel is running as the root partition, the VP assist page is unmapped during CPU hot unplug, and so KVM's clearing of the eVMCS controls needs to occur with CPU hot(un)plug disabled, otherwise KVM could attempt to write to a CPU's VP assist page after it's unmapped. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20221130230934.1014142-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:55 -05:00
Sean Christopherson	63a1bd8ad1	KVM: Drop arch hardware (un)setup hooks Drop kvm_arch_hardware_setup() and kvm_arch_hardware_unsetup() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:54 -05:00
Sean Christopherson	b7483387e3	KVM: x86: Move hardware setup/unsetup to init/exit Now that kvm_arch_hardware_setup() is called immediately after kvm_arch_init(), fold the guts of kvm_arch_hardware_(un)setup() into kvm_arch_{init,exit}() as a step towards dropping one of the hooks. To avoid having to unwind various setup, e.g registration of several notifiers, slot in the vendor hardware setup before the registration of said notifiers and callbacks. Introducing a functional change while moving code is less than ideal, but the alternative is adding a pile of unwinding code, which is much more error prone, e.g. several attempts to move the setup code verbatim all introduced bugs. Add a comment to document that kvm_ops_update() is effectively the point of no return, e.g. it sets the kvm_x86_ops.hardware_enable canary and so needs to be unwound. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:52 -05:00
Sean Christopherson	1935542a04	KVM: x86: Do timer initialization after XCR0 configuration Move kvm_arch_init()'s call to kvm_timer_init() down a few lines below the XCR0 configuration code. A future patch will move hardware setup into kvm_arch_init() and slot in vendor hardware setup before the call to kvm_timer_init() so that timer initialization (among other stuff) doesn't need to be unwound if vendor setup fails. XCR0 setup on the other hand needs to happen before vendor hardware setup. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:51 -05:00
Sean Christopherson	e43f576225	KVM: s390: Move hardware setup/unsetup to init/exit Now that kvm_arch_hardware_setup() is called immediately after kvm_arch_init(), fold the guts of kvm_arch_hardware_(un)setup() into kvm_arch_{init,exit}() as a step towards dropping one of the hooks. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Message-Id: <20221130230934.1014142-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:50 -05:00
Sean Christopherson	b801ef4214	KVM: s390: Unwind kvm_arch_init() piece-by-piece() if a step fails In preparation for folding kvm_arch_hardware_setup() into kvm_arch_init(), unwind initialization one step at a time instead of simply calling kvm_arch_exit(). Using kvm_arch_exit() regardless of which initialization step failed relies on all affected state playing nice with being undone even if said state wasn't first setup. That holds true for state that is currently configured by kvm_arch_init(), but not for state that's handled by kvm_arch_hardware_setup(), e.g. calling gmap_unregister_pte_notifier() without first registering a notifier would result in list corruption due to attempting to delete an entry that was never added to the list. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Message-Id: <20221130230934.1014142-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:48 -05:00
Sean Christopherson	73b8dc0413	KVM: Teardown VFIO ops earlier in kvm_exit() Move the call to kvm_vfio_ops_exit() further up kvm_exit() to try and bring some amount of symmetry to the setup order in kvm_init(), and more importantly so that the arch hooks are invoked dead last by kvm_exit(). This will allow arch code to move away from the arch hooks without any change in ordering between arch code and common code in kvm_exit(). That kvm_vfio_ops_exit() is called last appears to be 100% arbitrary. It was bolted on after the fact by commit `571ee1b685` ("kvm: vfio: fix unregister kvm_device_ops of vfio"). The nullified kvm_device_ops_table is also local to kvm_main.c and is used only when there are active VMs, so unless arch code is doing something truly bizarre, nullifying the table earlier in kvm_exit() is little more than a nop. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Message-Id: <20221130230934.1014142-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:47 -05:00
Sean Christopherson	c9650228ef	KVM: Allocate cpus_hardware_enabled after arch hardware setup Allocate cpus_hardware_enabled after arch hardware setup so that arch "init" and "hardware setup" are called back-to-back and thus can be combined in a future patch. cpus_hardware_enabled is never used before kvm_create_vm(), i.e. doesn't have a dependency with hardware setup and only needs to be allocated before /dev/kvm is exposed to userspace. Free the object before the arch hooks are invoked to maintain symmetry, and so that arch code can move away from the hooks without having to worry about ordering changes. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> Message-Id: <20221130230934.1014142-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:45 -05:00
Sean Christopherson	5910ccf03d	KVM: Initialize IRQ FD after arch hardware setup Move initialization of KVM's IRQ FD workqueue below arch hardware setup as a step towards consolidating arch "init" and "hardware setup", and eventually towards dropping the hooks entirely. There is no dependency on the workqueue being created before hardware setup, the workqueue is used only when destroying VMs, i.e. only needs to be created before /dev/kvm is exposed to userspace. Move the destruction of the workqueue before the arch hooks to maintain symmetry, and so that arch code can move away from the hooks without having to worry about ordering changes. Reword the comment about kvm_irqfd_init() needing to come after kvm_arch_init() to call out that kvm_arch_init() must come before common KVM does _anything_, as x86 very subtly relies on that behavior to deal with multiple calls to kvm_init(), e.g. if userspace attempts to load kvm_amd.ko and kvm_intel.ko. Tag the code with a FIXME, as x86's subtle requirement is gross, and invoking an arch callback as the very first action in a helper that is called only from arch code is silly. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:44 -05:00
Sean Christopherson	2b01281273	KVM: Register /dev/kvm as the _very_ last thing during initialization Register /dev/kvm, i.e. expose KVM to userspace, only after all other setup has completed. Once /dev/kvm is exposed, userspace can start invoking KVM ioctls, creating VMs, etc... If userspace creates a VM before KVM is done with its configuration, bad things may happen, e.g. KVM will fail to properly migrate vCPU state if a VM is created before KVM has registered preemption notifiers. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:40:42 -05:00
Paolo Bonzini	fc471e8310	Merge branch 'kvm-late-6.1' into HEAD x86: * Change tdp_mmu to a read-only parameter * Separate TDP and shadow MMU page fault paths * Enable Hyper-V invariant TSC control selftests: * Use TAP interface for kvm_binary_stats_test and tsc_msrs_test Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:36:47 -05:00
Vitaly Kuznetsov	bd827bd775	KVM: selftests: Test Hyper-V invariant TSC control Add a test for the newly introduced Hyper-V invariant TSC control feature: - HV_X64_MSR_TSC_INVARIANT_CONTROL is not available without HV_ACCESS_TSC_INVARIANT CPUID bit set and available with it. - BIT(0) of HV_X64_MSR_TSC_INVARIANT_CONTROL controls the filtering of architectural invariant TSC (CPUID.80000007H:EDX[8]) bit. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:33 -05:00
Vitaly Kuznetsov	91a0b5478a	KVM: selftests: Test that values written to Hyper-V MSRs are preserved Enhance 'hyperv_features' selftest by adding a check that KVM preserves values written to PV MSRs. Two MSRs are, however, 'special': - HV_X64_MSR_EOI as it is a 'write-only' MSR, - HV_X64_MSR_RESET as it always reads as '0'. The later doesn't require any special handling right now because the test never writes anything besides '0' to the MSR, leave a TODO node about the fact. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:32 -05:00
Vitaly Kuznetsov	2f10428ace	KVM: selftests: Convert hyperv_features test to using KVM_X86_CPU_FEATURE() hyperv_features test needs to set certain CPUID bits in Hyper-V feature leaves but instead of open coding this, common KVM_X86_CPU_FEATURE() infrastructure can be used. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:31 -05:00
Vitaly Kuznetsov	67b16f1805	KVM: selftests: Rename 'msr->available' to 'msr->fault_exepected' in hyperv_features test It may not be clear what 'msr->available' means. The test actually checks that accessing the particular MSR doesn't cause #GP, rename the variable accordingly. While on it, use 'true'/'false' instead of '1'/'0' for 'write'/ 'fault_expected' as these are boolean. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:30 -05:00
Vitaly Kuznetsov	2be1bd3a70	KVM: x86: Hyper-V invariant TSC control Normally, genuine Hyper-V doesn't expose architectural invariant TSC (CPUID.80000007H:EDX[8]) to its guests by default. A special PV MSR (HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x40000118) and corresponding CPUID feature bit (CPUID.0x40000003.EAX[15]) were introduced. When bit 0 of the PV MSR is set, invariant TSC bit starts to show up in CPUID. When the feature is exposed to Hyper-V guests, reenlightenment becomes unneeded. Add the feature to KVM. Keep CPUID output intact when the feature wasn't exposed to L1 and implement the required logic for hiding invariant TSC when the feature was exposed and invariant TSC control MSR wasn't written to. Copy genuine Hyper-V behavior and forbid to disable the feature once it was enabled. For the reference, for linux guests, support for the feature was added in commit `dce7cd6275` ("x86/hyperv: Allow guests to enable InvariantTSC"). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:29 -05:00
Vitaly Kuznetsov	0fcf86f05a	KVM: x86: Add a KVM-only leaf for CPUID_8000_0007_EDX CPUID_8000_0007_EDX may come handy when X86_FEATURE_CONSTANT_TSC needs to be checked. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:28 -05:00
Vitaly Kuznetsov	b961aa757f	x86/hyperv: Add HV_EXPOSE_INVARIANT_TSC define Avoid open coding BIT(0) of HV_X64_MSR_TSC_INVARIANT_CONTROL by adding a dedicated define. While there's only one user at this moment, the upcoming KVM implementation of Hyper-V Invariant TSC feature will need to use it as well. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:27 -05:00
Sean Christopherson	dfe0ecc6f5	KVM: x86/mmu: Pivot on "TDP MMU enabled" when handling direct page faults When handling direct page faults, pivot on the TDP MMU being globally enabled instead of checking if the target MMU is a TDP MMU. Now that the TDP MMU is all-or-nothing, if the TDP MMU is enabled, KVM will reach direct_page_fault() if and only if the MMU is a TDP MMU. When TDP is enabled (obviously required for the TDP MMU), only non-nested TDP page faults reach direct_page_fault(), i.e. nonpaging MMUs are impossible, as NPT requires paging to be enabled and EPT faults use ept_page_fault(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221012181702.3663607-8-seanjc@google.com> [Use tdp_mmu_enabled variable. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:26 -05:00
Sean Christopherson	78fdd2f09f	KVM: x86/mmu: Pivot on "TDP MMU enabled" to check if active MMU is TDP MMU Simplify and optimize the logic for detecting if the current/active MMU is a TDP MMU. If the TDP MMU is globally enabled, then the active MMU is a TDP MMU if it is direct. When TDP is enabled, so called nonpaging MMUs are never used as the only form of shadow paging KVM uses is for nested TDP, and the active MMU can't be direct in that case. Rename the helper and take the vCPU instead of an arbitrary MMU, as nonpaging MMUs can show up in the walk_mmu if L1 is using nested TDP and L2 has paging disabled. Taking the vCPU has the added bonus of cleaning up the callers, all of which check the current MMU but wrap code that consumes the vCPU. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221012181702.3663607-9-seanjc@google.com> [Use tdp_mmu_enabled variable. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:25 -05:00
Sean Christopherson	de0322f575	KVM: x86/mmu: Replace open coded usage of tdp_mmu_page with is_tdp_mmu_page() Use is_tdp_mmu_page() instead of querying sp->tdp_mmu_page directly so that all users benefit if KVM ever finds a way to optimize the logic. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221012181702.3663607-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:25 -05:00
David Matlack	6c882ef4fc	KVM: x86/mmu: Rename __direct_map() to direct_map() Rename __direct_map() to direct_map() since the leading underscores are unnecessary. This also makes the page fault handler names more consistent: kvm_tdp_mmu_page_fault() calls kvm_tdp_mmu_map() and direct_page_fault() calls direct_map(). Opportunistically make some trivial cleanups to comments that had to be modified anyway since they mentioned __direct_map(). Specifically, use "()" when referring to functions, and include kvm_tdp_mmu_map() among the various callers of disallowed_hugepage_adjust(). No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-11-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:24 -05:00
David Matlack	9f33697ac7	KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults Stop calling make_mmu_pages_available() when handling TDP MMU faults. The TDP MMU does not participate in the "available MMU pages" tracking and limiting so calling this function is unnecessary work when handling TDP MMU faults. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-10-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:23 -05:00
David Matlack	9aa8ab43b3	KVM: x86/mmu: Split out TDP MMU page fault handling Split out the page fault handling for the TDP MMU to a separate function. This creates some duplicate code, but makes the TDP MMU fault handler simpler to read by eliminating branches and will enable future cleanups by allowing the TDP MMU and non-TDP MMU fault paths to diverge. Only compile in the TDP MMU fault handler for 64-bit builds since kvm_tdp_mmu_map() does not exist in 32-bit builds. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-9-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:22 -05:00
David Matlack	e5e6f8d254	KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs Move the initialization of fault.{gfn,slot} earlier in the page fault handling code for fully direct MMUs. This will enable a future commit to split out TDP MMU page fault handling without needing to duplicate the initialization of these 2 fields. Opportunistically take advantage of the fact that fault.gfn is initialized in kvm_tdp_page_fault() rather than recomputing it from fault->addr. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-8-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:21 -05:00
David Matlack	354c908c06	KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn() Handle faults on GFNs that do not have a backing memslot in kvm_faultin_pfn() and drop handle_abnormal_pfn(). This eliminates duplicate code in the various page fault handlers. Opportunistically tweak the comment about handling gfn > host.MAXPHYADDR to reflect that the effect of returning RET_PF_EMULATE at that point is to avoid creating an MMIO SPTE for such GFNs. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:21 -05:00
David Matlack	cd08d178ff	KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling Pass the kvm_page_fault struct down to kvm_handle_error_pfn() to avoid a memslot lookup when handling KVM_PFN_ERR_HWPOISON. Opportunistically move the gfn_to_hva_memslot() call and @current down into kvm_send_hwpoison_signal() to cut down on line lengths. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-12-29 15:33:20 -05:00

1 2 3 4 5 ...

1153491 Commits All Branches Search

1153491 Commits

All Branches