OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	7c5c3a6177	ARM: * Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack * Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure * Disagregation of the vcpu flags in separate sets to better track their use model. * A fix for the GICv2-on-v3 selftest * A small set of cosmetic fixes RISC-V: * Track ISA extensions used by Guest using bitmap * Added system instruction emulation framework * Added CSR emulation framework * Added gfp_custom flag in struct kvm_mmu_memory_cache * Added G-stage ioremap() and iounmap() functions * Added support for Svpbmt inside Guest s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes x86: * Permit guests to ignore single-bit ECC errors * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Use try_cmpxchg64 instead of cmpxchg64 * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * Allow NX huge page mitigation to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs * Miscellaneous cleanups: MCE MSR emulation Use separate namespaces for guest PTEs and shadow PTEs bitmasks PIO emulation Reorganize rmap API, mostly around rmap destruction Do not workaround very old KVM bugs for L0 that runs with nesting enabled new selftests API for CPUID Generic: * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLnyo4UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtQQf/XjVWiRcWLPR9dqzRM/vvRXpiG+UL jU93R7m6ma99aqTtrxV/AE+kHgamBlma3Cwo+AcWk9uCVNbIhFjv2YKg6HptKU0e oJT3zRYp+XIjEo7Kfw+TwroZbTlG6gN83l1oBLFMqiFmHsMLnXSI2mm8MXyi3dNB vR2uIcTAl58KIprqNNsYJ2dNn74ogOMiXYx9XzoA9/5Xb6c0h4rreHJa5t+0s9RO Gz7Io3PxumgsbJngjyL1Ve5oxhlIAcZA8DU0PQmjxo3eS+k6BcmavGFd45gNL5zg iLpCh4k86spmzh8CWkAAwWPQE4dZknK6jTctJc0OFVad3Z7+X7n0E8TFrA== =PM8o -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Quite a large pull request due to a selftest API overhaul and some patches that had come in too late for 5.19. ARM: - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure - Disagregation of the vcpu flags in separate sets to better track their use model. - A fix for the GICv2-on-v3 selftest - A small set of cosmetic fixes RISC-V: - Track ISA extensions used by Guest using bitmap - Added system instruction emulation framework - Added CSR emulation framework - Added gfp_custom flag in struct kvm_mmu_memory_cache - Added G-stage ioremap() and iounmap() functions - Added support for Svpbmt inside Guest s390: - add an interface to provide a hypervisor dump for secure guests - improve selftests to use TAP interface - enable interpretive execution of zPCI instructions (for PCI passthrough) - First part of deferred teardown - CPU Topology - PV attestation - Minor fixes x86: - Permit guests to ignore single-bit ECC errors - Intel IPI virtualization - Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS - PEBS virtualization - Simplify PMU emulation by just using PERF_TYPE_RAW events - More accurate event reinjection on SVM (avoid retrying instructions) - Allow getting/setting the state of the speaker port data bit - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent - "Notify" VM exit (detect microarchitectural hangs) for Intel - Use try_cmpxchg64 instead of cmpxchg64 - Ignore benign host accesses to PMU MSRs when PMU is disabled - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior - Allow NX huge page mitigation to be disabled on a per-vm basis - Port eager page splitting to shadow MMU as well - Enable CMCI capability by default and handle injected UCNA errors - Expose pid of vcpu threads in debugfs - x2AVIC support for AMD - cleanup PIO emulation - Fixes for LLDT/LTR emulation - Don't require refcounted "struct page" to create huge SPTEs - Miscellaneous cleanups: - MCE MSR emulation - Use separate namespaces for guest PTEs and shadow PTEs bitmasks - PIO emulation - Reorganize rmap API, mostly around rmap destruction - Do not workaround very old KVM bugs for L0 that runs with nesting enabled - new selftests API for CPUID Generic: - Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache - new selftests API using struct kvm_vcpu instead of a (vm, id) tuple" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits) selftests: kvm: set rax before vmcall selftests: KVM: Add exponent check for boolean stats selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test selftests: KVM: Check stat name before other fields KVM: x86/mmu: remove unused variable RISC-V: KVM: Add support for Svpbmt inside Guest/VM RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap() RISC-V: KVM: Add G-stage ioremap() and iounmap() functions KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache RISC-V: KVM: Add extensible CSR emulation framework RISC-V: KVM: Add extensible system instruction emulation framework RISC-V: KVM: Factor-out instruction emulation into separate sources RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function RISC-V: KVM: Fix variable spelling mistake RISC-V: KVM: Improve ISA extension by using a bitmap KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs() KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT KVM: x86: Do not block APIC write for non ICR registers ...	2022-08-04 14:59:54 -07:00
Paolo Bonzini	63f4b21041	Merge remote-tracking branch 'kvm/next' into kvm-next-5.20 KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID	2022-08-01 03:21:00 -04:00
Samuel Holland	4d0b829881	genirq: Return a const cpumask from irq_data_get_affinity_mask Now that the irq_data_update_affinity helper exists, enforce its use by returning a a const cpumask from irq_data_get_affinity_mask. Since the previous commit already updated places that needed to call irq_data_update_affinity, this commit updates the remaining code that either did not modify the cpumask or immediately passed the modified mask to irq_set_affinity. Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220701200056.46555-8-samuel@sholland.org	2022-07-07 09:38:04 +01:00
Suravee Suthikulpanit	bf348f667e	KVM: x86: lapic: Rename [GET/SET]_APIC_DEST_FIELD to [GET/SET]_XAPIC_DEST_FIELD To signify that the macros only support 8-bit xAPIC destination ID. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220519102709.24125-3-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-24 12:44:34 -04:00
Tianyu Lan	49d6a3c062	x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM Hyper-V Isolation VM current code uses sev_es_ghcb_hv_call() to read/write MSR via GHCB page and depends on the sev code. This may cause regression when sev code changes interface design. The latest SEV-ES code requires to negotiate GHCB version before reading/writing MSR via GHCB page and sev_es_ghcb_hv_call() doesn't work for Hyper-V Isolation VM. Add Hyper-V ghcb related implementation to decouple SEV and Hyper-V code. Negotiate GHCB version in the hyperv_init() and use the version to communicate with Hyper-V in the ghcb hv call function. Fixes: `2ea29c5abb` ("x86/sev: Save the negotiated GHCB version") Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-06-15 18:27:40 +00:00
Linus Torvalds	cb3f09f9af	hyperv-next for 5.17 -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmHhw7oTHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXrjSB/979LV4Dn1PMcFYsSdlFEMeHcjzJdw/ kFnLPXMaPJyfg6QPuf83jxzw9uxw8fcePMdVq/FFBtmVV9fJMAv62B8jaGS1p58c WnAg+7zsTN+xEoJn+tskSSon8BNMWVrl41zP3K4Ged+5j8UEBk62GB8Orz1qkpwL fTh3/+xAvczJeD4zZb1dAm4WnmcQJ4vhg45p07jX6owvnwQAikMFl45aSW54I5o8 vAxGzFgdsZ2NtExnRNKh3b3DozA8JUE89KckBSZnDtq4rH8Fyy6Wij56Hc6v6Cml SUohiNbHX7hsNwit/lxL8wuF97IiA0pQSABobEg3rxfTghTUep51LlaN =/m4A -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - More patches for Hyper-V isolation VM support (Tianyu Lan) - Bug fixes and clean-up patches from various people * tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: scsi: storvsc: Fix storvsc_queuecommand() memory leak x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi() Drivers: hv: vmbus: Initialize request offers message for Isolation VM scsi: storvsc: Fix unsigned comparison to zero swiotlb: Add CONFIG_HAS_IOMEM check around swiotlb_mem_remap() x86/hyperv: Fix definition of hv_ghcb_pg variable Drivers: hv: Fix definition of hypercall input & output arg variables net: netvsc: Add Isolation VM support for netvsc driver scsi: storvsc: Add Isolation VM support for storvsc driver hyper-v: Enable swiotlb bounce buffer for Isolation VM x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has() swiotlb: Add swiotlb bounce buffer remap function for HV IVM	2022-01-16 15:53:00 +02:00
Vitaly Kuznetsov	51500b71d5	x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi() KASAN detected the following issue: BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_multi+0xf88/0x1060 Read of size 4 at addr ffff8880011ccbc0 by task kcompactd0/33 CPU: 1 PID: 33 Comm: kcompactd0 Not tainted 5.14.0-39.el9.x86_64+debug #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019 Call Trace: dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1f/0x140 ? hyperv_flush_tlb_multi+0xf88/0x1060 __kasan_report.cold+0x7f/0x11e ? hyperv_flush_tlb_multi+0xf88/0x1060 kasan_report+0x38/0x50 hyperv_flush_tlb_multi+0xf88/0x1060 flush_tlb_mm_range+0x1b1/0x200 ptep_clear_flush+0x10e/0x150 ... Allocated by task 0: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 hv_common_init+0xae/0x115 hyperv_init+0x97/0x501 apic_intr_mode_init+0xb3/0x1e0 x86_late_time_init+0x92/0xa2 start_kernel+0x338/0x3eb secondary_startup_64_no_verify+0xc2/0xcb The buggy address belongs to the object at ffff8880011cc800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 960 bytes inside of 1024-byte region [ffff8880011cc800, ffff8880011ccc00) 'hyperv_flush_tlb_multi+0xf88/0x1060' points to hv_cpu_number_to_vp_number() and '960 bytes' means we're trying to get VP_INDEX for CPU#240. 'nr_cpus' here is exactly 240 so we're trying to access past hv_vp_index's last element. This can (and will) happen when 'cpus' mask is empty and cpumask_last() will return '>=nr_cpus'. Commit `ad0a6bad44` ("x86/hyperv: check cpu mask after interrupt has been disabled") tried to deal with empty cpumask situation but apparently didn't fully fix the issue. 'cpus' cpumask which is passed to hyperv_flush_tlb_multi() is 'mm_cpumask(mm)' (which is '&mm->cpu_bitmap'). This mask changes every time the particular mm is scheduled/unscheduled on some CPU (see switch_mm_irqs_off()), disabling IRQs on the CPU which is performing remote TLB flush has zero influence on whether the particular process can get scheduled/unscheduled on _other_ CPUs so e.g. in the case where the mm was scheduled on one other CPU and got unscheduled during hyperv_flush_tlb_multi()'s execution will lead to cpumask becoming empty. It doesn't seem that there's a good way to protect 'mm_cpumask(mm)' from changing during hyperv_flush_tlb_multi()'s execution. It would be possible to copy it in the very beginning of the function but this is a waste. It seems we can deal with changing cpumask just fine. When 'cpus' cpumask changes during hyperv_flush_tlb_multi()'s execution, there are two possible issues: - 'Under-flushing': we will not flush TLB on a CPU which got added to the mask while hyperv_flush_tlb_multi() was already running. This is not a problem as this is equal to mm getting scheduled on that CPU right after TLB flush. - 'Over-flushing': we may flush TLB on a CPU which is already cleared from the mask. First, extra TLB flush preserves correctness. Second, Hyper-V's TLB flush hypercall takes 'mm->pgd' argument so Hyper-V may avoid the flush if CR3 doesn't match. Fix the immediate issue with cpumask_last()/hv_cpu_number_to_vp_number() and remove the pointless cpumask_empty() check from the beginning of the function as it really doesn't protect anything. Also, avoid the hypercall altogether when 'flush->processor_mask' ends up being empty. Fixes: `ad0a6bad44` ("x86/hyperv: check cpu mask after interrupt has been disabled") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220106094611.1404218-1-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-01-10 11:50:20 +00:00
Michael Kelley	e1878402ab	x86/hyperv: Fix definition of hv_ghcb_pg variable The percpu variable hv_ghcb_pg is incorrectly defined. The __percpu qualifier should be associated with the union hv_ghcb * (i.e., a pointer), not with the target of the pointer. This distinction makes no difference to gcc and the generated code, but sparse correctly complains. Fix the definition in the interest of general correctness in addition to making sparse happy. No functional change. Fixes: `0cc4f6d9f0` ("x86/hyperv: Initialize GHCB page in Isolation VM") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1640662315-22260-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-12-28 14:18:47 +00:00
Tianyu Lan	846da38de0	net: netvsc: Add Isolation VM support for netvsc driver In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ pagebuffer() stills need to be handled. Use DMA API to map/umap these memory during sending/receiving packet and Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been masked to be visible to host during boot up. rx/tx ring buffer is allocated via vzalloc() and they need to be mapped into unencrypted address space(above vTOM) before sharing with host and accessing. Add hv_map/unmap_memory() to map/umap rx /tx ring buffer. Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20211213071407.314309-6-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-12-20 18:01:09 +00:00
Tianyu Lan	062a5c4260	hyper-v: Enable swiotlb bounce buffer for Isolation VM hyperv Isolation VM requires bounce buffer support to copy data from/to encrypted memory and so enable swiotlb force mode to use swiotlb bounce buffer for DMA transaction. In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Swiotlb bounce buffer code calls set_memory_decrypted() to mark bounce buffer visible to host and map it in extra address space via memremap. Populate the shared_gpa_boundary (vTOM) via swiotlb_unencrypted_base variable. The map function memremap() can't work in the early place (e.g ms_hyperv_init_platform()) and so call swiotlb_update_mem_ attributes() in the hyperv_init(). Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20211213071407.314309-4-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-12-20 18:01:09 +00:00
Thomas Gleixner	1982afd6c0	x86/hyperv: Refactor hv_msi_domain_free_irqs() No point in looking up things over and over. Just look up the associated irq data and work from there. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20211206210224.429625690@linutronix.de	2021-12-09 11:52:21 +01:00
Sean Christopherson	f3e613e72f	x86/hyperv: Move required MSRs check to initial platform probing Explicitly check for MSR_HYPERCALL and MSR_VP_INDEX support when probing for running as a Hyper-V guest instead of waiting until hyperv_init() to detect the bogus configuration. Add messages to give the admin a heads up that they are likely running on a broken virtual machine setup. At best, silently disabling Hyper-V is confusing and difficult to debug, e.g. the kernel _says_ it's using all these fancy Hyper-V features, but always falls back to the native versions. At worst, the half baked setup will crash/hang the kernel. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20211104182239.1302956-3-seanjc@google.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-11-15 12:37:08 +00:00
Sean Christopherson	daf972118c	x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails Check for a valid hv_vp_index array prior to derefencing hv_vp_index when setting Hyper-V's TSC change callback. If Hyper-V setup failed in hyperv_init(), the kernel will still report that it's running under Hyper-V, but will have silently disabled nearly all functionality. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc2+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:set_hv_tscchange_cb+0x15/0xa0 Code: <8b> 04 82 8b 15 12 17 85 01 48 c1 e0 20 48 0d ee 00 01 00 f6 c6 08 ... Call Trace: kvm_arch_init+0x17c/0x280 kvm_init+0x31/0x330 vmx_init+0xba/0x13a do_one_initcall+0x41/0x1c0 kernel_init_freeable+0x1f2/0x23b kernel_init+0x16/0x120 ret_from_fork+0x22/0x30 Fixes: `93286261de` ("x86/hyperv: Reenlightenment notifications support") Cc: stable@vger.kernel.org Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20211104182239.1302956-2-seanjc@google.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-11-15 12:37:08 +00:00
Vitaly Kuznetsov	285f68afa8	x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted The following issue is observed with CONFIG_DEBUG_PREEMPT when KVM loads: KVM: vmx: using Hyper-V Enlightened VMCS BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/488 caller is set_hv_tscchange_cb+0x16/0x80 CPU: 1 PID: 488 Comm: systemd-udevd Not tainted 5.15.0-rc5+ #396 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019 Call Trace: dump_stack_lvl+0x6a/0x9a check_preemption_disabled+0xde/0xe0 ? kvm_gen_update_masterclock+0xd0/0xd0 [kvm] set_hv_tscchange_cb+0x16/0x80 kvm_arch_init+0x23f/0x290 [kvm] kvm_init+0x30/0x310 [kvm] vmx_init+0xaf/0x134 [kvm_intel] ... set_hv_tscchange_cb() can get preempted in between acquiring smp_processor_id() and writing to HV_X64_MSR_REENLIGHTENMENT_CONTROL. This is not an issue by itself: HV_X64_MSR_REENLIGHTENMENT_CONTROL is a partition-wide MSR and it doesn't matter which particular CPU will be used to receive reenlightenment notifications. The only real problem can (in theory) be observed if the CPU whose id was acquired with smp_processor_id() goes offline before we manage to write to the MSR, the logic in hv_cpu_die() won't be able to reassign it correctly. Reported-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20211012155005.1613352-1-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 11:59:13 +00:00
Jiapeng Chong	0b90608523	x86/hyperv: Remove duplicate include Clean up the following includecheck warning: ./arch/x86/hyperv/ivm.c: linux/bitfield.h is included more than once. ./arch/x86/hyperv/ivm.c: linux/types.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/1635325022-99889-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 11:36:34 +00:00
Wan Jiabing	c5989b92fd	x86/hyperv: Remove duplicated include in hv_init Fix following checkinclude.pl warning: ./arch/x86/hyperv/hv_init.c: linux/io.h is included more than once. The include is in line 13. Remove the duplicated here. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Link: https://lore.kernel.org/r/20211026113249.30481-1-wanjiabing@vivo.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 11:27:58 +00:00
Tianyu Lan	20c89a559e	x86/hyperv: Add ghcb hvcall support for SNP VM hyperv provides ghcb hvcall to handle VMBus HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE msg in SNP Isolation VM. Add such support. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-8-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 11:22:49 +00:00
Tianyu Lan	faff44069f	x86/hyperv: Add Write/Read MSR registers via ghcb page Hyperv provides GHCB protocol to write Synthetic Interrupt Controller MSR registers in Isolation VM with AMD SEV SNP and these registers are emulated by hypervisor directly. Hyperv requires to write SINTx MSR registers twice. First writes MSR via GHCB page to communicate with hypervisor and then writes wrmsr instruction to talk with paravisor which runs in VMPL0. Guest OS ID MSR also needs to be set via GHCB page. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-7-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 11:22:38 +00:00
Tianyu Lan	810a521265	x86/hyperv: Add new hvcall guest address host visibility support Add new hvcall guest address host visibility support to mark memory visible to host. Call it inside set_memory_decrypted /encrypted(). Add HYPERVISOR feature check in the hv_is_isolation_supported() to optimize in non-virtualization environment. Acked-by: Dave Hansen <dave.hansen@intel.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-4-ltykernel@gmail.com [ wei: fix conflicts with tip ] Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 11:21:33 +00:00
Tianyu Lan	0cc4f6d9f0	x86/hyperv: Initialize GHCB page in Isolation VM Hyperv exposes GHCB page via SEV ES GHCB MSR for SNP guest to communicate with hypervisor. Map GHCB page for all cpus to read/write MSR register and submit hvcall request via ghcb page. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-2-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 10:47:52 +00:00
Linus Torvalds	52bf8031c0	hyperv-fixes for 5.15 -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmFeykwTHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXhRLCADXOOSGKk4L1vWssRRhLmMXI45ElocY EbZ/mXcQhxKnlVhdMNnupGjz+lU5FQGkCCWlhmt9Ml2O6R+lDx+zIUS8BK3Nkom9 twWjueMtum6yFwDMGYALhptVLjDqVFG71QcW0incghpnAx4s2FVE8h38md5MuUFY Kqqf/dRkppSePldHFrRG/e4c6r0WyTsJ6Z9LTU0UYp5GqJcmUJlx7TxxqzGk5Fti GpQ5cFS7JX8xHAkRROk/dvwJte1RRnBAW6lIWxwAaDJ6Gbg7mNfOQe7n+/KRO7ZG gC5hbkP9tMv2nthLxaFbpu791U4lMZ2WiTLZvbgCseO3FCmToXWZ6TDd =1mdq -----END PGP SIGNATURE----- Merge tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Replace uuid.h with types.h in a header (Andy Shevchenko) - Avoid sleeping in atomic context in PCI driver (Long Li) - Avoid sending IPI to self when it shouldn't (Vitaly Kuznetsov) * tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Avoid erroneously sending IPI to 'self' hyper-v: Replace uuid.h with types.h PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus	2021-10-07 09:44:48 -07:00
Vitaly Kuznetsov	f5c20e4a5f	x86/hyperv: Avoid erroneously sending IPI to 'self' __send_ipi_mask_ex() uses an optimization: when the target CPU mask is equal to 'cpu_present_mask' it uses 'HV_GENERIC_SET_ALL' format to avoid converting the specified cpumask to VP_SET. This case was overlooked when 'exclude_self' parameter was added. As the result, a spurious IPI to 'self' can be send. Reported-by: Thomas Gleixner <tglx@linutronix.de> Fixes: `dfb5c1e12c` ("x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20211006125016.941616-1-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-06 15:56:45 +00:00
Linus Torvalds	ff1ffd71d5	hyperv-fixes for 5.15-rc2 -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmFB6pwTHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXoo5CAChbzKMMbqBHArnNCO+pKkUWmc7eYqJ U368ux75wWEy6ywCUxCHqhwnTrp5KJhyjTPi89V8Vwh+aNG6q86g2dT3I6qsoIby Dav9yw1NiExxNzAEiJVH/WgE+WGZUvWqzbKixdZWjDk9DWhVv7h96chik9dvh9SW /nm27o4sNmnFETQ+kh/hmX+8T6V8HeqZuL9WrGw4EW9At/WE16vjk47Wm5gJRl+j Z1KylALvOiarzzMH3Qx1IxvZ1789JtCIr2b5rHJH8tCPvPF0P2dihm/Wjf6xguyT tDMvquBdQnfugbZXQDy58Agp34Dw+fHCFaOmoruJePa78qqBYzujHvW9 =gBaz -----END PGP SIGNATURE----- Merge tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix kernel crash caused by uio driver (Vitaly Kuznetsov) - Remove on-stack cpumask from HV APIC code (Wei Liu) * tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself asm-generic/hyperv: provide cpumask_to_vpset_noself Drivers: hv: vmbus: Fix kernel crash upon unbinding a device from uio_hv_generic driver	2021-09-15 17:18:56 -07:00
Wei Liu	dfb5c1e12c	x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself It is not a good practice to allocate a cpumask on stack, given it may consume up to 1 kilobytes of stack space if the kernel is configured to have 8192 cpus. The internal helper functions __send_ipi_mask{,_ex} need to loop over the provided mask anyway, so it is not too difficult to skip `self' there. We can thus do away with the on-stack cpumask in hv_send_ipi_mask_allbutself. Adjust call sites of __send_ipi_mask as needed. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Michael Kelley <mikelley@microsoft.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: `68bb7bfb79` ("X86/Hyper-V: Enable IPI enlightenments") Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210910185714.299411-3-wei.liu@kernel.org	2021-09-11 15:41:00 +00:00
Praveen Kumar	e5d9b714fe	x86/hyperv: fix root partition faults when writing to VP assist page MSR For root partition the VP assist pages are pre-determined by the hypervisor. The root kernel is not allowed to change them to different locations. And thus, we are getting below stack as in current implementation root is trying to perform write to specific MSR. [ 2.778197] unchecked MSR access error: WRMSR to 0x40000073 (tried to write 0x0000000145ac5001) at rIP: 0xffffffff810c1084 (native_write_msr+0x4/0x30) [ 2.784867] Call Trace: [ 2.791507] hv_cpu_init+0xf1/0x1c0 [ 2.798144] ? hyperv_report_panic+0xd0/0xd0 [ 2.804806] cpuhp_invoke_callback+0x11a/0x440 [ 2.811465] ? hv_resume+0x90/0x90 [ 2.818137] cpuhp_issue_call+0x126/0x130 [ 2.824782] __cpuhp_setup_state_cpuslocked+0x102/0x2b0 [ 2.831427] ? hyperv_report_panic+0xd0/0xd0 [ 2.838075] ? hyperv_report_panic+0xd0/0xd0 [ 2.844723] ? hv_resume+0x90/0x90 [ 2.851375] __cpuhp_setup_state+0x3d/0x90 [ 2.858030] hyperv_init+0x14e/0x410 [ 2.864689] ? enable_IR_x2apic+0x190/0x1a0 [ 2.871349] apic_intr_mode_init+0x8b/0x100 [ 2.878017] x86_late_time_init+0x20/0x30 [ 2.884675] start_kernel+0x459/0x4fb [ 2.891329] secondary_startup_64_no_verify+0xb0/0xbb Since the hypervisor already provides the VP assist pages for root partition, we need to memremap the memory from hypervisor for root kernel to use. The mapping is done in hv_cpu_init during bringup and is unmapped in hv_cpu_die during teardown. Signed-off-by: Praveen Kumar <kumarpraveen@linux.microsoft.com> Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Link: https://lore.kernel.org/r/20210731120519.17154-1-kumarpraveen@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-08-04 11:56:53 +00:00
Michael Kelley	6dc77fa5ac	Drivers: hv: Move Hyper-V misc functionality to arch-neutral code The check for whether hibernation is possible, and the enabling of Hyper-V panic notification during kexec, are both architecture neutral. Move the code from under arch/x86 and into drivers/hv/hv_common.c where it can also be used for ARM64. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1626287687-2045-4-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-15 12:59:45 +00:00
Michael Kelley	9d7cf2c967	Drivers: hv: Add arch independent default functions for some Hyper-V handlers Architecture independent Hyper-V code calls various arch-specific handlers when needed. To aid in supporting multiple architectures, provide weak defaults that can be overridden by arch-specific implementations where appropriate. But when arch-specific overrides aren't needed or haven't been implemented yet for a particular architecture, these stubs reduce the amount of clutter under arch/. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1626287687-2045-3-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-15 12:59:45 +00:00
Michael Kelley	afca4d95dd	Drivers: hv: Make portions of Hyper-V init code be arch neutral The code to allocate and initialize the hv_vp_index array is architecture neutral. Similarly, the code to allocate and populate the hypercall input and output arg pages is architecture neutral. Move both sets of code out from arch/x86 and into utility functions in drivers/hv/hv_common.c that can be shared by Hyper-V initialization on ARM64. No functional changes. However, the allocation of the hypercall input and output arg pages is done differently so that the size is always the Hyper-V page size, even if not the same as the guest page size (such as with ARM64's 64K page size). Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1626287687-2045-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-15 12:59:45 +00:00
Michael Kelley	a4d7e8ae4a	Drivers: hv: Move Hyper-V extended capability check to arch neutral code The extended capability query code is currently under arch/x86, but it is architecture neutral, and is used by arch neutral code in the Hyper-V balloon driver. Hence the balloon driver fails to build on other architectures. Fix by moving the ext cap code out from arch/x86. Because it is also called from built-in architecture specific code, it can't be in a module, so the Makefile treats as built-in even when CONFIG_HYPERV is "m". Also drivers/Makefile is tweaked because this is the first occurrence of a Hyper-V file that is built-in even when CONFIG_HYPERV is "m". While here, update the hypercall status check to use the new helper function instead of open coding. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Link: https://lore.kernel.org/r/1622669804-2016-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-06-05 10:22:34 +00:00
Linus Torvalds	635de956a7	The x86 MM changes in this cycle were: - Implement concurrent TLB flushes, which overlaps the local TLB flush with the remote TLB flush. In testing this improved sysbench performance measurably by a couple of percentage points, especially if TLB-heavy security mitigations are active. - Further micro-optimizations to improve the performance of TLB flushes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCKbNcRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hjYBAAsyNUa/gOu0g6/Cx8R86w9HtHHmm5vso/ 6nJjWj2fd2qJ9JShlddxvXEMeXtPTYabVWQkiiriFMuofk6JeKnlHm1Jzl6keABX OQFwjIFeNASPRcdXvuuYPOVWAJJdr2oL9QUr6OOK1ccQJTz/Cd0zA+VQ5YqcsCon yaWbkxELwKXpgql+qt66eAZ6Q2Y1TKXyrTW7ZgxQi0yeeWqMaEOub0/oyS7Ax1Rg qEJMwm1prb76NPzeqR/G3e4KTrDZfQ/B/KnSsz36GTJpl4eye6XqWDUgm1nAGNIc 5dbc4Vx7JtZsUOuC0AmzWb3hsDyzVcN/lQvijdZ2RsYR3gvuYGaBhKqExqV0XH6P oqaWOKWCz+LqWbsgJmxCpqkt1LZl5+VUOcfJ97WkIS7DyIPtSHTzQXbBMZqKLeat mn5UcKYB2Gi7wsUPv6VC2ChKbDqN0VT8G86XbYylGo4BE46KoZKPUNY/QWKLUPd6 0UKcVeNM2HFyf1C73p/tO/z7hzu3qLuMMnsphP6/c2pKLpdgawEXgbnVKNId1B/c NrzyhTvVaMt+Um28bBRhHONIlzPJwWcnZbdY7NqMnu+LBKQ68cL/h4FOIV/RDLNb GJLgfAr8fIw/zIpqYuFHiiMNo9wWqVtZko1MvXhGceXUL69QuzTra2XR/6aDxkPf 6gQVesetTvo= =3Cyp -----END PGP SIGNATURE----- Merge tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tlb updates from Ingo Molnar: "The x86 MM changes in this cycle were: - Implement concurrent TLB flushes, which overlaps the local TLB flush with the remote TLB flush. In testing this improved sysbench performance measurably by a couple of percentage points, especially if TLB-heavy security mitigations are active. - Further micro-optimizations to improve the performance of TLB flushes" * tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Micro-optimize smp_call_function_many_cond() smp: Inline on_each_cpu_cond() and on_each_cpu() x86/mm/tlb: Remove unnecessary uses of the inline keyword cpumask: Mark functions as pure x86/mm/tlb: Do not make is_lazy dirty for no reason x86/mm/tlb: Privatize cpu_tlbstate x86/mm/tlb: Flush remote and local TLBs concurrently x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy() x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote() smp: Run functions concurrently in smp_call_function_many_cond()	2021-04-29 11:41:43 -07:00
Linus Torvalds	4d480dbf21	hyperv-next for 5.13 -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmCG9+oTHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXqo5CACQrfupoIeawVUMZQOGPOKW56zcmo+l kwgEYdukleYebJzES3zxdAod2k45WnAJ3aMQJaL2DxZ5SZdTJG1zIK08wlP87ui8 m80Htq/8c3fBM90gjUSjShxHw9SaWwwSQUVBKrm0doS7o0iUq0PPHHE6gvJHMX/w IcHug294c6ArCz0qNR5aiBxPNGixXBX7S7/5ubdjxszU2BVAzrfFLWYOWU4HzHyN g68BDY6F2K9+F3XOVO0zhcCdhzvIzb5Bh0V06VBKl9HRWnk28h0/Y7fBq9HVzCZu k7k5+o6lJUyyFkXR8MlcBKRlWnFXSHc5wIdJ/gcXTzEMsqrJlQ1vrGog =pGet -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull Hyper-V updates from Wei Liu: - VMBus enhancement - Free page reporting support for Hyper-V balloon driver - Some patches for running Linux as Arm64 Hyper-V guest - A few misc clean-up patches * tag 'hyperv-next-signed-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (30 commits) drivers: hv: Create a consistent pattern for checking Hyper-V hypercall status x86/hyperv: Move hv_do_rep_hypercall to asm-generic video: hyperv_fb: Add ratelimit on error message Drivers: hv: vmbus: Increase wait time for VMbus unload Drivers: hv: vmbus: Initialize unload_event statically Drivers: hv: vmbus: Check for pending channel interrupts before taking a CPU offline Drivers: hv: vmbus: Drivers: hv: vmbus: Introduce CHANNELMSG_MODIFYCHANNEL_RESPONSE Drivers: hv: vmbus: Introduce and negotiate VMBus protocol version 5.3 Drivers: hv: vmbus: Use after free in __vmbus_open() Drivers: hv: vmbus: remove unused function Drivers: hv: vmbus: Remove unused linux/version.h header x86/hyperv: remove unused linux/version.h header x86/Hyper-V: Support for free page reporting x86/hyperv: Fix unused variable 'hi' warning in hv_apic_read x86/hyperv: Fix unused variable 'msr_val' warning in hv_qlock_wait hv: hyperv.h: a few mundane typo fixes drivers: hv: Fix EXPORT_SYMBOL and tab spaces issue Drivers: hv: vmbus: Drop error message when 'No request id available' asm-generic/hyperv: Add missing function prototypes per -W1 warnings clocksource/drivers/hyper-v: Move handling of STIMER0 interrupts ...	2021-04-26 10:44:16 -07:00
Joseph Salisbury	753ed9c95c	drivers: hv: Create a consistent pattern for checking Hyper-V hypercall status There is not a consistent pattern for checking Hyper-V hypercall status. Existing code uses a number of variants. The variants work, but a consistent pattern would improve the readability of the code, and be more conformant to what the Hyper-V TLFS says about hypercall status. Implemented new helper functions hv_result(), hv_result_success(), and hv_repcomp(). Changed the places where hv_do_hypercall() and related variants are used to use the helper functions. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1618620183-9967-2-git-send-email-joseph.salisbury@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-04-21 09:49:19 +00:00
Zheng Yongjun	90b9bfa470	x86/hyperv: remove unused linux/version.h header That header is not needed in hv_proc.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yongjun Zheng <zhengyongjun3@huawei.com> Link: https://lore.kernel.org/r/20210326064942.3263776-1-zhengyongjun3@huawei.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-04-02 22:07:26 +00:00
Sunil Muthuswamy	6dc2a774cb	x86/Hyper-V: Support for free page reporting Linux has support for free page reporting now (`36e66c554b`) for virtualized environment. On Hyper-V when virtually backed VMs are configured, Hyper-V will advertise cold memory discard capability, when supported. This patch adds the support to hook into the free page reporting infrastructure and leverage the Hyper-V cold memory discard hint hypercall to report/free these pages back to the host. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Tested-by: Matheus Castello <matheus@castello.eng.br> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/SN4PR2101MB0880121FA4E2FEC67F35C1DCC0649@SN4PR2101MB0880.namprd21.prod.outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-03-24 11:35:24 +00:00
Xu Yihang	1b60280834	x86/hyperv: Fix unused variable 'hi' warning in hv_apic_read Fixes the following W=1 kernel build warning(s): arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used [-Wunused-but-set-variable] Compiled with CONFIG_HYPERV enabled: make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- HV_X64_MSR_EOI occupies bit 31:0 and HV_X64_MSR_TPR occupies bit 7:0, which means the higher 32 bits are not really used. Cast the variable hi to void to silence this warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Yihang <xuyihang@huawei.com> Link: https://lore.kernel.org/r/20210323025013.191533-1-xuyihang@huawei.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-03-24 11:32:56 +00:00
Xu Yihang	13c4d4626a	x86/hyperv: Fix unused variable 'msr_val' warning in hv_qlock_wait Fixes the following W=1 kernel build warning(s): arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable ‘msr_val’ set but not used [-Wunused-but-set-variable] unsigned long msr_val; As Hypervisor Top-Level Functional Specification states in chapter 7.5 Virtual Processor Idle Sleep State, "A partition which possesses the AccessGuestIdleMsr privilege (refer to section 4.2.2) may trigger entry into the virtual processor idle sleep state through a read to the hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read of the MSR is necessary. The returned value msr_val is not used. Cast it to void to silence this warning. Reference: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Yihang <xuyihang@huawei.com> Link: https://lore.kernel.org/r/20210323024302.174434-1-xuyihang@huawei.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-03-24 11:31:04 +00:00
Ingo Molnar	d9f6e12fb0	x86: Fix various typos in comments Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org	2021-03-18 15:31:53 +01:00
Michael Kelley	ec866be6ec	clocksource/drivers/hyper-v: Move handling of STIMER0 interrupts STIMER0 interrupts are most naturally modeled as per-cpu IRQs. But because x86/x64 doesn't have per-cpu IRQs, the core STIMER0 interrupt handling machinery is done in code under arch/x86 and Linux IRQs are not used. Adding support for ARM64 means adding equivalent code using per-cpu IRQs under arch/arm64. A better model is to treat per-cpu IRQs as the normal path (which it is for modern architectures), and the x86/x64 path as the exception. Do this by incorporating standard Linux per-cpu IRQ allocation into the main SITMER0 driver code, and bypass it in the x86/x64 exception case. For x86/x64, special case code is retained under arch/x86, but no STIMER0 interrupt handling code is needed under arch/arm64. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1614721102-2241-11-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-03-08 17:33:00 +00:00
Michael Kelley	b548a77427	Drivers: hv: vmbus: Move hyperv_report_panic_msg to arch neutral code With the new Hyper-V MSR set function, hyperv_report_panic_msg() can be architecture neutral, so move it out from under arch/x86 and merge into hv_kmsg_dump(). This move also avoids needing a separate implementation under arch/arm64. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/1614721102-2241-5-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-03-08 17:32:59 +00:00
Michael Kelley	f3c5e63c36	Drivers: hv: Redo Hyper-V synthetic MSR get/set functions Current code defines a separate get and set macro for each Hyper-V synthetic MSR used by the VMbus driver. Furthermore, the get macro can't be converted to a standard function because the second argument is modified in place, which is somewhat bad form. Redo this by providing a single get and a single set function that take a parameter specifying the MSR to be operated on. Fixup usage of the get function. Calling locations are no more complex than before, but the code under arch/x86 and the upcoming code under arch/arm64 is significantly simplified. Also standardize the names of Hyper-V synthetic MSRs that are architecture neutral. But keep the old x86-specific names as aliases that can be removed later when all references (particularly in KVM code) have been cleaned up in a separate patch series. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/1614721102-2241-4-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-03-08 17:32:59 +00:00
Michael Kelley	ca48739e59	Drivers: hv: vmbus: Move Hyper-V page allocator to arch neutral code The Hyper-V page allocator functions are implemented in an architecture neutral way. Move them into the architecture neutral VMbus module so a separate implementation for ARM64 is not needed. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/1614721102-2241-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-03-08 17:32:59 +00:00
Nadav Amit	4ce94eabac	x86/mm/tlb: Flush remote and local TLBs concurrently To improve TLB shootdown performance, flush the remote and local TLBs concurrently. Introduce flush_tlb_multi() that does so. Introduce paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen and hyper-v are only compile-tested). While the updated smp infrastructure is capable of running a function on a single local core, it is not optimized for this case. The multiple function calls and the indirect branch introduce some overhead, and might make local TLB flushes slower than they were before the recent changes. Before calling the SMP infrastructure, check if only a local TLB flush is needed to restore the lost performance in this common case. This requires to check mm_cpumask() one more time, but unless this mask is updated very frequently, this should impact performance negatively. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> # Hyper-v parts Reviewed-by: Juergen Gross <jgross@suse.com> # Xen and paravirt parts Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20210220231712.2475218-5-namit@vmware.com	2021-03-06 12:59:10 +01:00
Wei Liu	fb5ef35165	iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft Hypervisor when Linux runs as the root partition. Implement an IRQ domain to handle mapping and unmapping of IO-APIC interrupts. Signed-off-by: Wei Liu <wei.liu@kernel.org> Acked-by: Joerg Roedel <joro@8bytes.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-17-wei.liu@kernel.org	2021-02-11 08:47:07 +00:00
Wei Liu	e39397d1fd	x86/hyperv: implement an MSI domain for root partition When Linux runs as the root partition on Microsoft Hypervisor, its interrupts are remapped. Linux will need to explicitly map and unmap interrupts for hardware. Implement an MSI domain to issue the correct hypercalls. And initialize this irq domain as the default MSI irq domain. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-16-wei.liu@kernel.org	2021-02-11 08:47:07 +00:00
Wei Liu	86b5ec3552	x86/hyperv: provide a bunch of helper functions They are used to deposit pages into Microsoft Hypervisor and bring up logical and virtual processors. Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-10-wei.liu@kernel.org	2021-02-11 08:47:06 +00:00
Wei Liu	80f73c9f74	x86/hyperv: handling hypercall page setup for root When Linux is running as the root partition, the hypercall page will have already been setup by Hyper-V. Copy the content over to the allocated page. Add checks to hv_suspend & co to bail early because they are not supported in this setup yet. Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Co-Developed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-8-wei.liu@kernel.org	2021-02-11 08:47:06 +00:00
Wei Liu	99a0f46af6	x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary We will need the partition ID for executing some hypercalls later. Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Co-Developed-by: Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-7-wei.liu@kernel.org	2021-02-11 08:47:06 +00:00
Wei Liu	5d0f077e0f	x86/hyperv: allocate output arg pages if required When Linux runs as the root partition, it will need to make hypercalls which return data from the hypervisor. Allocate pages for storing results when Linux runs as the root partition. Signed-off-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Co-Developed-by: Lillian Grassin-Drake <ligrassi@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210203150435.27941-6-wei.liu@kernel.org	2021-02-11 08:47:06 +00:00
Andrea Parri (Microsoft)	a6c76bb08d	x86/hyperv: Load/save the Isolation Configuration leaf If bit 22 of Group B Features is set, the guest has access to the Isolation Configuration CPUID leaf. On x86, the first four bits of EAX in this leaf provide the isolation type of the partition; we entail three isolation types: 'SNP' (hardware-based isolation), 'VBS' (software-based isolation), and 'NONE' (no isolation). Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: x86@kernel.org Cc: linux-arch@vger.kernel.org Link: https://lore.kernel.org/r/20210201144814.2701-2-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-02-11 08:47:05 +00:00
Dexuan Cui	fff7b5e6ee	x86/hyperv: Initialize clockevents after LAPIC is initialized With commit `4df4cb9e99`, the Hyper-V direct-mode STIMER is actually initialized before LAPIC is initialized: see apic_intr_mode_init() x86_platform.apic_post_init() hyperv_init() hv_stimer_alloc() apic_bsp_setup() setup_local_APIC() setup_local_APIC() temporarily disables LAPIC, initializes it and re-eanble it. The direct-mode STIMER depends on LAPIC, and when it's registered, it can be programmed immediately and the timer can fire very soon: hv_stimer_init clockevents_config_and_register clockevents_register_device tick_check_new_device tick_setup_device tick_setup_periodic(), tick_setup_oneshot() clockevents_program_event When the timer fires in the hypervisor, if the LAPIC is in the disabled state, new versions of Hyper-V ignore the event and don't inject the timer interrupt into the VM, and hence the VM hangs when it boots. Note: when the VM starts/reboots, the LAPIC is pre-enabled by the firmware, so the window of LAPIC being temporarily disabled is pretty small, and the issue can only happen once out of 100~200 reboots for a 40-vCPU VM on one dev host, and on another host the issue doesn't reproduce after 2000 reboots. The issue is more noticeable for kdump/kexec, because the LAPIC is disabled by the first kernel, and stays disabled until the kdump/kexec kernel enables it. This is especially an issue to a Generation-2 VM (for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000 (rather than CONFIG_HZ=250) is used. Fix the issue by moving hv_stimer_alloc() to a later place where the LAPIC timer is initialized. Fixes: `4df4cb9e99` ("x86/hyperv: Initialize clockevents earlier in CPU onlining") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210116223136.13892-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-01-17 15:20:50 +00:00

1 2 3 4

155 Commits