OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Paolo Bonzini	30811174f0	KVM: SVM: set IRR in svm_deliver_interrupt SVM has to set IRR for both the AVIC and the software-LAPIC case, so pull it up to the common function that handles both configurations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-11 12:53:02 -05:00
Maxim Levitsky	0a5f784273	KVM: SVM: extract avic_ring_doorbell The check on the current CPU adds an extra level of indentation to svm_deliver_avic_intr and conflates documentation on what happens if the vCPU exits (of interest to svm_deliver_avic_intr) and migrates (only of interest to avic_ring_doorbell, which calls get/put_cpu()). Extract the wrmsr to a separate function and rewrite the comment in svm_deliver_avic_intr(). Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-11 12:53:02 -05:00
Muhammad Usama Anjum	0316dbb9a0	selftests: kvm: Remove absent target file There is no vmx_pi_mmio_test file. Remove it to get rid of error while creation of selftest archive: rsync: [sender] link_stat "/kselftest/kvm/x86_64/vmx_pi_mmio_test" failed: No such file or directory (2) rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3] Fixes: `6a58150859` ("selftest: KVM: Add intra host migration tests") Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Message-Id: <20220210172352.1317554-1-usama.anjum@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-11 12:53:01 -05:00
Paolo Bonzini	ed343aa857	Merge tag 'kvmarm-fixes-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #3 - Fix pending state read of a HW interrupt	2022-02-11 12:10:57 -05:00
Marc Zyngier	5bfa685e62	KVM: arm64: vgic: Read HW interrupt pending state from the HW It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always result in the pending interrupts being accurately reported if they are mapped to a HW interrupt. This is particularly visible when acking the timer interrupt and reading the GICR_ISPENDR1 register immediately after, for example (the interrupt appears as not-pending while it really is...). This is because a HW interrupt has its 'active and pending state' kept in the physical distributor, and not in the virtual one, as mandated by the spec (this is what allows the direct deactivation). The virtual distributor only carries the pending and active states (note the plural, as these are two independent and non-overlapping states). Fix it by reading the HW state back, either from the timer itself or from the distributor if necessary. Reported-by: Ricardo Koller <ricarkol@google.com> Tested-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220208123726.3604198-1-maz@kernel.org	2022-02-11 11:01:12 +00:00
Christian Borntraeger	bfced9f963	KVM: s390: MAINTAINERS: promote Claudio Imbrenda Claudio has volunteered to be more involved in the maintainership of s390 KVM. Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220210085310.26388-1-borntraeger@linux.ibm.com	2022-02-11 09:55:53 +01:00
Oliver Upton	48ebd0cf23	KVM: VMX: Use local pointer to vcpu_vmx in vmx_vcpu_after_set_cpuid() There is a local that contains a pointer to vcpu_vmx already. Just use that instead to get at the structure directly instead of doing pointer arithmetic. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20220204204705.3538240-8-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:48 -05:00
Vitaly Kuznetsov	e67bd7df28	KVM: selftests: nSVM: Add enlightened MSR-Bitmap selftest Introduce a new test for Hyper-V nSVM extensions (Hyper-V on KVM) and add a test for enlightened MSR-Bitmap feature: - Intercept access to MSR_FS_BASE in L1 and check that this works with enlightened MSR-Bitmap disabled. - Enable enlightened MSR-Bitmap and check that the intercept still works as expected. - Intercept access to MSR_GS_BASE but don't clear the corresponding bit from clean fields mask, KVM is supposed to skip updating MSR-Bitmap02 and thus the consequent access to the MSR from L2 will not get intercepted. - Finally, clear the corresponding bit from clean fields mask and check that access to MSR_GS_BASE is now intercepted. The test works with the assumption that access to MSR_FS_BASE/MSR_GS_BASE is not intercepted for L1. If this ever becomes not true the test will fail as nested_svm_exit_handled_msr() always checks L1's MSR-Bitmap for L2 irrespective of clean fields. The behavior is correct as enlightened MSR-Bitmap feature is just an optimization, KVM is not obliged to ignore updates when the corresponding bit in clean fields stays clear. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:48 -05:00
Vitaly Kuznetsov	29f557d553	KVM: selftests: nSVM: Update 'struct vmcb_control_area' definition There's a copy of 'struct vmcb_control_area' definition in KVM selftests, update it to allow testing of the newly introduced features. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:47 -05:00
Vitaly Kuznetsov	0b815117da	KVM: selftests: nSVM: Set up MSR-Bitmap for SVM guests Similar to VMX, allocate memory for MSR-Bitmap and fill in 'msrpm_base_pa' in VMCB. To use it, tests will need to set INTERCEPT_MSR_PROT interception along with the required bits in the MSR-Bitmap. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:47 -05:00
Vitaly Kuznetsov	70e477d996	KVM: selftests: nVMX: Add enlightened MSR-Bitmap selftest Introduce a test for enlightened MSR-Bitmap feature (Hyper-V on KVM): - Intercept access to MSR_FS_BASE in L1 and check that this works with enlightened MSR-Bitmap disabled. - Enable enlightened MSR-Bitmap and check that the intercept still works as expected. - Intercept access to MSR_GS_BASE but don't clear the corresponding bit from 'hv_clean_fields', KVM is supposed to skip updating MSR-Bitmap02 and thus the consequent access to the MSR from L2 will not get intercepted. - Finally, clear the corresponding bit from 'hv_clean_fields' and check that access to MSR_GS_BASE is now intercepted. The test works with the assumption that access to MSR_FS_BASE/MSR_GS_BASE is not intercepted for L1. If this ever becomes not true the test will fail as nested_vmx_exit_handled_msr() always checks L1's MSR-Bitmap for L2 irrespective of 'hv_clean_fields'. The behavior is correct as enlightened MSR-Bitmap feature is just an optimization, KVM is not obliged to ignore updates when the corresponding bit in 'hv_clean_fields' stays clear. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:47 -05:00
Vitaly Kuznetsov	761b5ebaa1	KVM: selftests: nVMX: Properly deal with 'hv_clean_fields' Instead of just resetting 'hv_clean_fields' to 0 on every enlightened vmresume, do the expected cleaning of the corresponding bit on enlightened vmwrite. Avoid direct access to 'current_evmcs' from evmcs_test to support the change. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:46 -05:00
Vitaly Kuznetsov	6081f9c764	KVM: selftests: Adapt hyperv_cpuid test to the newly introduced Enlightened MSR-Bitmap CPUID 0x40000000.EAX is now always present as it has Enlightened MSR-Bitmap feature bit set. Adapt the test accordingly. Opportunistically add a check for the supported eVMCS version range. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:46 -05:00
Vitaly Kuznetsov	66c03a926f	KVM: nSVM: Implement Enlightened MSR-Bitmap feature Similar to nVMX commit `502d2bf5f2` ("KVM: nVMX: Implement Enlightened MSR Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM). Notable differences from nVMX implementation: - As the feature uses SW reserved fields in VMCB control, KVM needs to make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()). - 'msrpm_base_pa' always needs to be overwritten in nested_svm_vmrun_msrpm(), even when the update is skipped. As an optimization, nested_vmcb02_prepare_control() copies it from VMCB01 so when MSR-Bitmap feature for L2 is disabled nothing needs to be done. - 'struct vmcb_ctrl_area_cached' needs to be extended with clean fields/sw reserved data and __nested_copy_vmcb_control_to_cache() needs to copy it so nested_svm_vmrun_msrpm() can use it later. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:45 -05:00
Vitaly Kuznetsov	9e083ec7bb	KVM: nSVM: Split off common definitions for Hyper-V on KVM and KVM on Hyper-V In preparation to implementing Enlightened MSR-Bitmap feature for Hyper-V on KVM, split off the required definitions into common 'svm/hyperv.h' header. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:45 -05:00
Vitaly Kuznetsov	ce3859172c	KVM: x86: Make kvm_hv_hypercall_enabled() static inline In preparation for using kvm_hv_hypercall_enabled() from SVM code, make it static inline to avoid the need to export it. The function is a simple check with only two call sites currently. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:44 -05:00
Vitaly Kuznetsov	73c25546d4	KVM: nSVM: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt Similar to nVMX commit `ed2a4800ae` ("KVM: nVMX: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt"), introduce a flag to keep track of whether MSR bitmap for L2 needs to be rebuilt due to changes in MSR bitmap for L1 or switching to a different L2. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:44 -05:00
David Matlack	951cb0a3b5	KVM: selftests: Add an option to disable MANUAL_PROTECT_ENABLE and INITIALLY_SET Add an option to dirty_log_perf_test.c to disable KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE and KVM_DIRTY_LOG_INITIALLY_SET so the legacy dirty logging code path can be tested. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-19-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:44 -05:00
David Matlack	e0b728b1f1	KVM: x86/mmu: Add tracepoint for splitting huge pages Add a tracepoint that records whenever KVM eagerly splits a huge page and the error status of the split to indicate if it succeeded or failed and why. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-18-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:43 -05:00
David Matlack	cb00a70bd4	KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not write-protected when dirty logging is enabled on the memslot. Instead they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for the first time and only for the specific sub-region being cleared. Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to write-protecting to avoid causing write-protection faults on vCPU threads. This also allows userspace to smear the cost of huge page splitting across multiple ioctls, rather than splitting the entire memslot as is the case when initially-all-set is not used. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-17-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:43 -05:00
David Matlack	a3fe5dbda0	KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled When dirty logging is enabled without initially-all-set, try to split all huge pages in the memslot down to 4KB pages so that vCPUs do not have to take expensive write-protection faults to split huge pages. Eager page splitting is best-effort only. This commit only adds the support for the TDP MMU, and even there splitting may fail due to out of memory conditions. Failure to split a huge page is fine from a correctness standpoint because KVM will always follow up splitting by write-protecting any remaining huge pages. Eager page splitting moves the cost of splitting huge pages off of the vCPU threads and onto the thread enabling dirty logging on the memslot. This is useful because: 1. Splitting on the vCPU thread interrupts vCPUs execution and is disruptive to customers whereas splitting on VM ioctl threads can run in parallel with vCPU execution. 2. Splitting all huge pages at once is more efficient because it does not require performing VM-exit handling or walking the page table for every 4KiB page in the memslot, and greatly reduces the amount of contention on the mmu_lock. For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to all of their memory after dirty logging is enabled decreased by 95% from 2.94s to 0.14s. Eager Page Splitting is over 100x more efficient than the current implementation of splitting on fault under the read lock. For example, taking the same workload as above, Eager Page Splitting reduced the CPU required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s) * 96 vCPU threads) to only 1.55 CPU-seconds. Eager page splitting does increase the amount of time it takes to enable dirty logging since it has to split all huge pages. For example, the time it took to enable dirty logging in the 96GiB region of the aforementioned test increased from 0.001s to 1.55s. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-16-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:42 -05:00
David Matlack	a82070b6e7	KVM: x86/mmu: Separate TDP MMU shadow page allocation and initialization Separate the allocation of shadow pages from their initialization. This is in preparation for splitting huge pages outside of the vCPU fault context, which requires a different allocation mechanism. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-15-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:41 -05:00
David Matlack	a3aca4de0d	KVM: x86/mmu: Derive page role for TDP MMU shadow pages from parent Derive the page role from the parent shadow page, since the only thing that changes is the level. This is in preparation for splitting huge pages during VM-ioctls which do not have access to the vCPU MMU context. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-14-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:41 -05:00
David Matlack	a81399a573	KVM: x86/mmu: Remove redundant role overrides for TDP MMU shadow pages The vCPU's mmu_role already has the correct values for direct, has_4_byte_gpte, access, and ad_disabled. Remove the code that was redundantly overwriting these fields with the same values. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-13-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:41 -05:00
David Matlack	77aa60753a	KVM: x86/mmu: Refactor TDP MMU iterators to take kvm_mmu_page root Instead of passing a pointer to the root page table and the root level separately, pass in a pointer to the root kvm_mmu_page struct. This reduces the number of arguments by 1, cutting down on line lengths. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-12-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:40 -05:00
David Matlack	315d86da89	KVM: x86/mmu: Move restore_acc_track_spte() to spte.h restore_acc_track_spte() is pure SPTE bit manipulation, making it a good fit for spte.h. And now that the WARN_ON_ONCE() calls have been removed, there isn't any good reason to not inline it. This move also prepares for a follow-up commit that will need to call restore_acc_track_spte() from spte.c. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-11-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:40 -05:00
David Matlack	77c23c77f9	KVM: x86/mmu: Drop new_spte local variable from restore_acc_track_spte() The new_spte local variable is unnecessary. Deleting it can save a line of code and simplify the remaining lines a bit. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-10-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:39 -05:00
David Matlack	59940e76d1	KVM: x86/mmu: Remove unnecessary warnings from restore_acc_track_spte() The warnings in restore_acc_track_spte() can be removed because the only caller checks is_access_track_spte(), and is_access_track_spte() checks !spte_ad_enabled(). In other words, the warning can never be triggered. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-9-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:39 -05:00
David Matlack	7b7e1ab6fd	KVM: x86/mmu: Consolidate logic to atomically install a new TDP MMU page table Consolidate the logic to atomically replace an SPTE with an SPTE that points to a new page table into a single helper function. This will be used in a follow-up commit to split huge pages, which involves replacing each huge page SPTE with an SPTE that points to a page table. Opportunistically drop the call to trace_kvm_mmu_get_page() in kvm_tdp_mmu_map() since it is redundant with the identical tracepoint in tdp_mmu_alloc_sp(). No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-8-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:39 -05:00
David Matlack	0f53dfa34e	KVM: x86/mmu: Rename handle_removed_tdp_mmu_page() to handle_removed_pt() First remove tdp_mmu_ from the name since it is redundant given that it is a static function in tdp_mmu.c. There is a pattern of using tdp_mmu_ as a prefix in the names of static TDP MMU functions, but all of the other handle_*() variants do not include such a prefix. So drop it entirely. Then change "page" to "pt" to convey that this is operating on a page table rather than a struct page. Purposely use "pt" instead of "sp" since this function takes the raw RCU-protected page table pointer as an argument rather than a pointer to the struct kvm_mmu_page. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:38 -05:00
David Matlack	c298a30c28	KVM: x86/mmu: Rename TDP MMU functions that handle shadow pages Rename 3 functions in tdp_mmu.c that handle shadow pages: alloc_tdp_mmu_page() -> tdp_mmu_alloc_sp() tdp_mmu_link_page() -> tdp_mmu_link_sp() tdp_mmu_unlink_page() -> tdp_mmu_unlink_sp() These changes make tdp_mmu a consistent prefix before the verb in the function name, and make it more clear that these functions deal with kvm_mmu_page structs rather than struct pages. One could argue that "shadow page" is the wrong term for a page table in the TDP MMU since it never actually shadows a guest page table. However, "shadow page" (or "sp" for short) has evolved to become the standard term in KVM when referring to a kvm_mmu_page struct, and its associated page table and other metadata, regardless of whether the page table shadows a guest page table. So this commit just makes the TDP MMU more consistent with the rest of KVM. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:38 -05:00
David Matlack	3e72c791fd	KVM: x86/mmu: Change tdp_mmu_{set,zap}_spte_atomic() to return 0/-EBUSY tdp_mmu_set_spte_atomic() and tdp_mmu_zap_spte_atomic() return a bool with true indicating the SPTE modification was successful and false indicating failure. Change these functions to return an int instead since that is the common practice. Opportunistically fix up the kernel-doc style for the Return section above tdp_mmu_set_spte_atomic(). No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:37 -05:00
David Matlack	3255530ab1	KVM: x86/mmu: Automatically update iter->old_spte if cmpxchg fails Consolidate a bunch of code that was manually re-reading the spte if the cmpxchg failed. There is no extra cost of doing this because we already have the spte value as a result of the cmpxchg (and in fact this eliminates re-reading the spte), and none of the call sites depend on iter->old_spte retaining the stale spte value. Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:37 -05:00
David Matlack	1346bbb6b4	KVM: x86/mmu: Rename __rmap_write_protect() to rmap_write_protect() The function formerly known as rmap_write_protect() has been renamed to kvm_vcpu_write_protect_gfn(), so we can get rid of the double underscores in front of __rmap_write_protect(). No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:37 -05:00
David Matlack	cf48f9e286	KVM: x86/mmu: Rename rmap_write_protect() to kvm_vcpu_write_protect_gfn() rmap_write_protect() is a poor name because it also write-protects SPTEs in the TDP MMU, not just SPTEs in the rmap. It is also confusing that rmap_write_protect() is not a simple wrapper around __rmap_write_protect(), since that is the common pattern for functions with double-underscore names. Rename rmap_write_protect() to kvm_vcpu_write_protect_gfn() to convey that KVM is write-protecting a specific gfn in the context of a vCPU. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:36 -05:00
Sean Christopherson	413af6601f	KVM: x86: Add checks for reserved-to-zero Hyper-V hypercall fields Add checks for the three fields in Hyper-V's hypercall params that must be zero. Per the TLFS, HV_STATUS_INVALID_HYPERCALL_INPUT is returned if "A reserved bit in the specified hypercall input value is non-zero." Note, some versions of the TLFS have an off-by-one bug for the last reserved field, and define it as being bits 64:60. See https://github.com/MicrosoftDocs/Virtualization-Documentation/pull/1682. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:36 -05:00
Sean Christopherson	40421f38f6	KVM: x86: Reject fixed-size Hyper-V hypercalls with non-zero "var_cnt" Reject Hyper-V hypercalls if the guest specifies a non-zero variable size header (var_cnt in KVM) for a hypercall that has a fixed header size. Per the TLFS: It is illegal to specify a non-zero variable header size for a hypercall that is not explicitly documented as accepting variable sized input headers. In such a case the hypercall will result in a return code of HV_STATUS_INVALID_HYPERCALL_INPUT. Note, at least some of the various DEBUG commands likely aren't allowed to use variable size headers, but the TLFS documentation doesn't clearly state what is/isn't allowed. Omit them for now to avoid unnecessary breakage. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:35 -05:00
Sean Christopherson	9c52f6b3d8	KVM: x86: Shove vp_bitmap handling down into sparse_set_to_vcpu_mask() Move the vp_bitmap "allocation" that's needed to handle mismatched vp_index values down into sparse_set_to_vcpu_mask() and drop __always_inline from said helper. The need for an intermediate vp_bitmap is a detail that's specific to the sparse translation with mismatched VP<=>vCPU indexes and does not need to be exposed to the caller. Regarding the __always_inline, prior to commit `f21dd49450` ("KVM: x86: hyperv: optimize sparse VP set processing") the helper, then named hv_vcpu_in_sparse_set(), was a tiny bit of code that effectively boiled down to a handful of bit ops. The __always_inline was understandable, if not justifiable. Since the aforementioned change, sparse_set_to_vcpu_mask() is a chunky 350-450+ bytes of code without KASAN=y, and balloons to 1100+ with KASAN=y. In other words, it has no business being forcefully inlined. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:35 -05:00
Sean Christopherson	79661c3766	KVM: x86: Don't bother reading sparse banks that end up being ignored When handling "sparse" VP_SET requests, don't read sparse banks that can't possibly contain a legal VP index instead of ignoring such banks later on in sparse_set_to_vcpu_mask(). This allows KVM to cap the size of its sparse_banks arrays for VP_SET at KVM_HV_MAX_SPARSE_VCPU_SET_BITS. Add a compile time assert that KVM_HV_MAX_SPARSE_VCPU_SET_BITS<=64, i.e. that KVM_MAX_VCPUS<=4096, as the TLFS allows for at most 64 sparse banks, and KVM will need to do _something_ to play nice with Hyper-V. Reducing the size of sparse_banks fudges around a compilation warning (that becomes error with KVM_WERROR=y) when CONFIG_KASAN_STACK=y, which is selected (and can't be unselected) by CONFIG_KASAN=y when using gcc (clang/LLVM is a stack hog in some cases so it's opt-in for clang). KASAN_STACK adds a redzone around every stack variable, which pushes the Hyper-V functions over the default limit of 1024. Ideally, KVM would flat out reject such impossibilities, but the TLFS explicitly allows providing empty banks, even if a bank can't possibly contain a valid VP index due to its position exceeding KVM's max. Furthermore, for a bit 1 in ValidBankMask, it is valid state for the corresponding element in BanksContents can be all 0s, meaning no processors are specified in this bank. Arguably KVM should reject and not ignore the "extra" banks, but that can be done independently and without bloating sparse_banks, e.g. by reading each "extra" 8-byte chunk individually. Reported-by: Ajay Garg <ajaygargnsit@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:34 -05:00
Sean Christopherson	a0dd008fe9	KVM: x86: Add a helper to get the sparse VP_SET for IPIs and TLB flushes Add a helper, kvm_get_sparse_vp_set(), to handle sanity checks related to the VARHEAD field and reading the sparse banks of a VP_SET. A future commit to reduce the memory footprint of sparse_banks will introduce more common code to the sparse bank retrieval. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:34 -05:00
Sean Christopherson	25af908118	KVM: x86: Refactor kvm_hv_flush_tlb() to reduce indentation Refactor the "extended" path of kvm_hv_flush_tlb() to reduce the nesting depth for the non-fast sparse path, and to make the code more similar to the extended path in kvm_hv_send_ipi(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:34 -05:00
Sean Christopherson	bd1ba5732b	KVM: x86: Get the number of Hyper-V sparse banks from the VARHEAD field Get the number of sparse banks from the VARHEAD field, which the guest is required to provide as "The size of a variable header, in QWORDS.", where the variable header is: Variable Header Bytes = {Total Header Bytes - sizeof(Fixed Header)} rounded up to nearest multiple of 8 Variable HeaderSize = Variable Header Bytes / 8 In other words, the VARHEAD should match the number of sparse banks. Keep the manual count as a sanity check, but otherwise rely on the field so as to more closely align with the logic defined in the TLFS and to allow for future cleanups. Tweak the tracepoint output to use "rep_cnt" instead of simply "cnt" now that there is also "var_cnt". Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:33 -05:00
David Matlack	02844ac1eb	KVM: x86/mmu: Consolidate comments about {Host,MMU}-writable Consolidate the large comment above DEFAULT_SPTE_HOST_WRITABLE with the large comment above is_writable_pte() into one comment. This comment explains the different reasons why an SPTE may be non-writable and KVM keeps track of that with the {Host,MMU}-writable bits. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220125230723.1701061-1-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:33 -05:00
David Matlack	1ca87e015d	KVM: x86/mmu: Rename DEFAULT_SPTE_MMU_WRITEABLE to DEFAULT_SPTE_MMU_WRITABLE Both "writeable" and "writable" are valid, but we should be consistent about which we use. DEFAULT_SPTE_MMU_WRITEABLE was the odd one out in the SPTE code, so rename it to DEFAULT_SPTE_MMU_WRITABLE. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220125230713.1700406-1-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:33 -05:00
David Matlack	006100212d	KVM: x86/mmu: Move is_writable_pte() to spte.h Move is_writable_pte() close to the other functions that check writability information about SPTEs. While here opportunistically replace the open-coded bit arithmetic in check_spte_writable_invariants() with a call to is_writable_pte(). No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220125230518.1697048-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:32 -05:00
David Matlack	115111efd9	KVM: x86/mmu: Check SPTE writable invariants when setting leaf SPTEs Check SPTE writable invariants when setting SPTEs rather than in spte_can_locklessly_be_made_writable(). By the time KVM checks spte_can_locklessly_be_made_writable(), the SPTE has long since been corrupted. Note that these invariants only apply to shadow-present leaf SPTEs (i.e. not to MMIO SPTEs, non-leaf SPTEs, etc.). Add a comment explaining the restriction and only instrument the code paths that set shadow-present leaf SPTEs. To account for access tracking, also check the SPTE writable invariants when marking an SPTE as an access track SPTE. This also lets us remove a redundant WARN from mark_spte_for_access_track(). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220125230518.1697048-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:32 -05:00
David Matlack	932859a4e0	KVM: x86/mmu: Move SPTE writable invariant checks to a helper function Move the WARNs in spte_can_locklessly_be_made_writable() to a separate helper function. This is in preparation for moving these checks to the places where SPTEs are set. Opportunistically add warning error messages that include the SPTE to make future debugging of these warnings easier. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220125230518.1697048-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:31 -05:00
Wanpeng Li	1714a4eb6f	KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised As commit `0c5f81dad4` ("KVM: LAPIC: Inject timer interrupt via posted interrupt") mentioned, the host admin should tune the guest setup well, so that vCPUs are placed on isolated pCPUs, and with several pCPUs surplus for busy housekeeping. In this setup, it is preferable to disable mwait/hlt/pause vmexits to keep the vCPUs in non-root mode. However, if only some guests are isolated and others are not, the non-isolated guests would not have any benefit from posted timer interrupts, and at the same time lose VMX preemption timer fast paths because kvm_can_post_timer_interrupt() returns true and therefore forces kvm_can_use_hv_timer() to false. By guaranteeing that posted-interrupt timer is only used if MWAIT or HLT are done without vmexit, KVM can make a better choice and use the VMX preemption timer and the corresponding fast paths. Reported-by: Aili Yao <yaoaili@kingsoft.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1643112538-36743-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:31 -05:00
Wanpeng Li	9b44423bf4	KVM: VMX: Don't send posted IRQ if vCPU == this vCPU and vCPU is IN_GUEST_MODE When delivering a virtual interrupt, don't actually send a posted interrupt if the target vCPU is also the currently running vCPU and is IN_GUEST_MODE, in which case the interrupt is being sent from a VM-Exit fastpath and the core run loop in vcpu_enter_guest() will manually move the interrupt from the PIR to vmcs.GUEST_RVI. IRQs are disabled while IN_GUEST_MODE, thus there's no possibility of the virtual interrupt being sent from anything other than KVM, i.e. KVM won't suppress a wake event from an IRQ handler (see commit `fdba608f15`, "KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU"). Eliding the posted interrupt restores the performance provided by the combination of commits `379a3c8ee4` ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") and `26efe2fd92` ("KVM: VMX: Handle preemption timer fastpath"). Thanks Sean for better comments. Suggested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1643111979-36447-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:30 -05:00
Sean Christopherson	23e5092b6e	KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names Massage SVM's implementation names that still diverge from kvm_x86_ops to allow for wiring up all SVM-defined functions via kvm-x86-ops.h. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-22-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:30 -05:00
