OpenCloudOS-Kernel/drivers/hv/hv.c

443 lines
12 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (c) 2009, Microsoft Corporation.
*
* Authors:
* Haiyang Zhang <haiyangz@microsoft.com>
* Hank Janssen <hjanssen@microsoft.com>
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/mm.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/hyperv.h>
#include <linux/random.h>
#include <linux/clockchips.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
clocksource/drivers: Make Hyper-V clocksource ISA agnostic Hyper-V clock/timer code and data structures are currently mixed in with other code in the ISA independent drivers/hv directory as well as the ISA dependent Hyper-V code under arch/x86. Consolidate this code and data structures into a Hyper-V clocksource driver to better follow the Linux model. In doing so, separate out the ISA dependent portions so the new clocksource driver works for x86 and for the in-process Hyper-V on ARM64 code. To start, move the existing clockevents code to create the new clocksource driver. Update the VMbus driver to call initialization and cleanup routines since the Hyper-V synthetic timers are not independently enumerated in ACPI. No behavior is changed and no new functionality is added. Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "bp@alien8.de" <bp@alien8.de> Cc: "will.deacon@arm.com" <will.deacon@arm.com> Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com> Cc: "mark.rutland@arm.com" <mark.rutland@arm.com> Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: Sunil Muthuswamy <sunilmut@microsoft.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: "sashal@kernel.org" <sashal@kernel.org> Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com> Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org> Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org> Cc: "arnd@arndb.de" <arnd@arndb.de> Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk> Cc: "ralf@linux-mips.org" <ralf@linux-mips.org> Cc: "paul.burton@mips.com" <paul.burton@mips.com> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "salyzyn@android.com" <salyzyn@android.com> Cc: "pcc@google.com" <pcc@google.com> Cc: "shuah@kernel.org" <shuah@kernel.org> Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com> Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk> Cc: "huw@codeweavers.com" <huw@codeweavers.com> Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au> Cc: "pbonzini@redhat.com" <pbonzini@redhat.com> Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com> Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org> Link: https://lkml.kernel.org/r/1561955054-1838-2-git-send-email-mikelley@microsoft.com
2019-07-01 12:25:56 +08:00
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
#include "hyperv_vmbus.h"
/* The one and only */
struct hv_context hv_context;
/*
* hv_init - Main initialization routine.
*
* This routine must be called before any other routines in here are called
*/
int hv_init(void)
{
hv_context.cpu_context = alloc_percpu(struct hv_per_cpu_context);
if (!hv_context.cpu_context)
return -ENOMEM;
return 0;
}
/*
* Functions for allocating and freeing memory with size and
* alignment HV_HYP_PAGE_SIZE. These functions are needed because
* the guest page size may not be the same as the Hyper-V page
* size. We depend upon kmalloc() aligning power-of-two size
* allocations to the allocation size boundary, so that the
* allocated memory appears to Hyper-V as a page of the size
* it expects.
*/
void *hv_alloc_hyperv_page(void)
{
BUILD_BUG_ON(PAGE_SIZE < HV_HYP_PAGE_SIZE);
if (PAGE_SIZE == HV_HYP_PAGE_SIZE)
return (void *)__get_free_page(GFP_KERNEL);
else
return kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
}
void *hv_alloc_hyperv_zeroed_page(void)
{
if (PAGE_SIZE == HV_HYP_PAGE_SIZE)
return (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
else
return kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
}
void hv_free_hyperv_page(unsigned long addr)
{
if (PAGE_SIZE == HV_HYP_PAGE_SIZE)
free_page(addr);
else
kfree((void *)addr);
}
/*
* hv_post_message - Post a message using the hypervisor message IPC.
*
* This involves a hypercall.
*/
int hv_post_message(union hv_connection_id connection_id,
enum hv_message_type message_type,
void *payload, size_t payload_size)
{
struct hv_input_post_message *aligned_msg;
struct hv_per_cpu_context *hv_cpu;
u64 status;
if (payload_size > HV_MESSAGE_PAYLOAD_BYTE_COUNT)
return -EMSGSIZE;
hv_cpu = get_cpu_ptr(hv_context.cpu_context);
aligned_msg = hv_cpu->post_msg_page;
aligned_msg->connectionid = connection_id;
aligned_msg->reserved = 0;
aligned_msg->message_type = message_type;
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
if (hv_isolation_type_snp())
status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
(void *)aligned_msg, NULL,
sizeof(*aligned_msg));
else
status = hv_do_hypercall(HVCALL_POST_MESSAGE,
aligned_msg, NULL);
/* Preemption must remain disabled until after the hypercall
* so some other thread can't get scheduled onto this cpu and
* corrupt the per-cpu post_msg_page
*/
put_cpu_ptr(hv_cpu);
return hv_result(status);
}
int hv_synic_alloc(void)
{
int cpu;
struct hv_per_cpu_context *hv_cpu;
/*
* First, zero all per-cpu memory areas so hv_synic_free() can
* detect what memory has been allocated and cleanup properly
* after any failures.
*/
for_each_present_cpu(cpu) {
hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
memset(hv_cpu, 0, sizeof(*hv_cpu));
}
treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:03:40 +08:00
hv_context.hv_numa_map = kcalloc(nr_node_ids, sizeof(struct cpumask),
GFP_KERNEL);
if (hv_context.hv_numa_map == NULL) {
pr_err("Unable to allocate NUMA map\n");
goto err;
}
for_each_present_cpu(cpu) {
hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
tasklet_init(&hv_cpu->msg_dpc,
vmbus_on_msg_dpc, (unsigned long) hv_cpu);
/*
* Synic message and event pages are allocated by paravisor.
* Skip these pages allocation here.
*/
if (!hv_isolation_type_snp()) {
hv_cpu->synic_message_page =
(void *)get_zeroed_page(GFP_ATOMIC);
if (hv_cpu->synic_message_page == NULL) {
pr_err("Unable to allocate SYNIC message page\n");
goto err;
}
hv_cpu->synic_event_page =
(void *)get_zeroed_page(GFP_ATOMIC);
if (hv_cpu->synic_event_page == NULL) {
pr_err("Unable to allocate SYNIC event page\n");
goto err;
}
}
hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
if (hv_cpu->post_msg_page == NULL) {
pr_err("Unable to allocate post msg page\n");
goto err;
}
}
return 0;
err:
/*
* Any memory allocations that succeeded will be freed when
* the caller cleans up by calling hv_synic_free()
*/
return -ENOMEM;
}
void hv_synic_free(void)
{
int cpu;
for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
free_page((unsigned long)hv_cpu->post_msg_page);
}
kfree(hv_context.hv_numa_map);
}
/*
* hv_synic_init - Initialize the Synthetic Interrupt Controller.
*
* If it is already initialized by another entity (ie x2v shim), we need to
* retrieve the initialized message and event pages. Otherwise, we create and
* initialize the message and event pages.
*/
void hv_synic_enable_regs(unsigned int cpu)
{
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
union hv_synic_simp simp;
union hv_synic_siefp siefp;
union hv_synic_sint shared_sint;
union hv_synic_scontrol sctrl;
/* Setup the Synic's message page */
simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
simp.simp_enabled = 1;
if (hv_isolation_type_snp()) {
hv_cpu->synic_message_page
= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
HV_HYP_PAGE_SIZE, MEMREMAP_WB);
if (!hv_cpu->synic_message_page)
pr_err("Fail to map syinc message page.\n");
} else {
simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
>> HV_HYP_PAGE_SHIFT;
}
hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
/* Setup the Synic's event page */
siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
siefp.siefp_enabled = 1;
if (hv_isolation_type_snp()) {
hv_cpu->synic_event_page =
memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
HV_HYP_PAGE_SIZE, MEMREMAP_WB);
if (!hv_cpu->synic_event_page)
pr_err("Fail to map syinc event page.\n");
} else {
siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
>> HV_HYP_PAGE_SHIFT;
}
hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
/* Setup the shared SINT. */
if (vmbus_irq != -1)
enable_percpu_irq(vmbus_irq, 0);
shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
VMBUS_MESSAGE_SINT);
shared_sint.vector = vmbus_interrupt;
shared_sint.masked = false;
/*
* On architectures where Hyper-V doesn't support AEOI (e.g., ARM64),
* it doesn't provide a recommendation flag and AEOI must be disabled.
*/
#ifdef HV_DEPRECATING_AEOI_RECOMMENDED
shared_sint.auto_eoi =
!(ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED);
#else
shared_sint.auto_eoi = 0;
#endif
hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
shared_sint.as_uint64);
/* Enable the global synic bit */
sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
sctrl.enable = 1;
hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
}
int hv_synic_init(unsigned int cpu)
{
hv_synic_enable_regs(cpu);
x86/hyperv: Initialize clockevents earlier in CPU onlining Hyper-V has historically initialized stimer-based clockevents late in the process of onlining a CPU because clockevents depend on stimer interrupts. In the original Hyper-V design, stimer interrupts generate a VMbus message, so the VMbus machinery must be running first, and VMbus can't be initialized until relatively late. On x86/64, LAPIC timer based clockevents are used during early initialization before VMbus and stimer-based clockevents are ready, and again during CPU offlining after the stimer clockevents have been shut down. Unfortunately, this design creates problems when offlining CPUs for hibernation or other purposes. stimer-based clockevents are shut down relatively early in the offlining process, so clockevents_unbind_device() must be used to fallback to the LAPIC-based clockevents for the remainder of the offlining process. Furthermore, the late initialization and early shutdown of stimer-based clockevents doesn't work well on ARM64 since there is no other timer like the LAPIC to fallback to. So CPU onlining and offlining doesn't work properly. Fix this by recognizing that stimer Direct Mode is the normal path for newer versions of Hyper-V on x86/64, and the only path on other architectures. With stimer Direct Mode, stimer interrupts don't require any VMbus machinery. stimer clockevents can be initialized and shut down consistent with how it is done for other clockevent devices. While the old VMbus-based stimer interrupts must still be supported for backward compatibility on x86, that mode of operation can be treated as legacy. So add a new Hyper-V stimer entry in the CPU hotplug state list, and use that new state when in Direct Mode. Update the Hyper-V clocksource driver to allocate and initialize stimer clockevents earlier during boot. Update Hyper-V initialization and the VMbus driver to use this new design. As a result, the LAPIC timer is no longer used during boot or CPU onlining/offlining and clockevents_unbind_device() is not called. But retain the old design as a legacy implementation for older versions of Hyper-V that don't support Direct Mode. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
2019-11-13 09:11:49 +08:00
hv_stimer_legacy_init(cpu, VMBUS_MESSAGE_SINT);
clocksource/drivers: Make Hyper-V clocksource ISA agnostic Hyper-V clock/timer code and data structures are currently mixed in with other code in the ISA independent drivers/hv directory as well as the ISA dependent Hyper-V code under arch/x86. Consolidate this code and data structures into a Hyper-V clocksource driver to better follow the Linux model. In doing so, separate out the ISA dependent portions so the new clocksource driver works for x86 and for the in-process Hyper-V on ARM64 code. To start, move the existing clockevents code to create the new clocksource driver. Update the VMbus driver to call initialization and cleanup routines since the Hyper-V synthetic timers are not independently enumerated in ACPI. No behavior is changed and no new functionality is added. Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "bp@alien8.de" <bp@alien8.de> Cc: "will.deacon@arm.com" <will.deacon@arm.com> Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com> Cc: "mark.rutland@arm.com" <mark.rutland@arm.com> Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: Sunil Muthuswamy <sunilmut@microsoft.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: "sashal@kernel.org" <sashal@kernel.org> Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com> Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org> Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org> Cc: "arnd@arndb.de" <arnd@arndb.de> Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk> Cc: "ralf@linux-mips.org" <ralf@linux-mips.org> Cc: "paul.burton@mips.com" <paul.burton@mips.com> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "salyzyn@android.com" <salyzyn@android.com> Cc: "pcc@google.com" <pcc@google.com> Cc: "shuah@kernel.org" <shuah@kernel.org> Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com> Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk> Cc: "huw@codeweavers.com" <huw@codeweavers.com> Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au> Cc: "pbonzini@redhat.com" <pbonzini@redhat.com> Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com> Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org> Link: https://lkml.kernel.org/r/1561955054-1838-2-git-send-email-mikelley@microsoft.com
2019-07-01 12:25:56 +08:00
return 0;
}
/*
* hv_synic_cleanup - Cleanup routine for hv_synic_init().
*/
void hv_synic_disable_regs(unsigned int cpu)
{
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
union hv_synic_sint shared_sint;
union hv_synic_simp simp;
union hv_synic_siefp siefp;
drivers: hv: vmbus: Teardown synthetic interrupt controllers on module unload SynIC has to be switched off when we unload the module, otherwise registered memory pages can get corrupted after (as Hyper-V host still writes there) and we see the following crashes for random processes: [ 89.116774] BUG: Bad page map in process sh pte:4989c716 pmd:36f81067 [ 89.159454] addr:0000000000437000 vm_flags:00000875 anon_vma: (null) mapping:ffff88007bba55a0 index:37 [ 89.226146] vma->vm_ops->fault: filemap_fault+0x0/0x410 [ 89.257776] vma->vm_file->f_op->mmap: generic_file_mmap+0x0/0x60 [ 89.297570] CPU: 0 PID: 215 Comm: sh Tainted: G B 3.19.0-rc5_bug923184+ #488 [ 89.353738] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012 [ 89.409138] 0000000000000000 000000004e083d7b ffff880036e9fa18 ffffffff81a68d31 [ 89.468724] 0000000000000000 0000000000437000 ffff880036e9fa68 ffffffff811a1e3a [ 89.519233] 000000004989c716 0000000000000037 ffffea0001edc340 0000000000437000 [ 89.575751] Call Trace: [ 89.591060] [<ffffffff81a68d31>] dump_stack+0x45/0x57 [ 89.625164] [<ffffffff811a1e3a>] print_bad_pte+0x1aa/0x250 [ 89.667234] [<ffffffff811a2c95>] vm_normal_page+0x55/0xa0 [ 89.703818] [<ffffffff811a3105>] unmap_page_range+0x425/0x8a0 [ 89.737982] [<ffffffff811a3601>] unmap_single_vma+0x81/0xf0 [ 89.780385] [<ffffffff81184320>] ? lru_deactivate_fn+0x190/0x190 [ 89.820130] [<ffffffff811a4131>] unmap_vmas+0x51/0xa0 [ 89.860168] [<ffffffff811ad12c>] exit_mmap+0xac/0x1a0 [ 89.890588] [<ffffffff810763c3>] mmput+0x63/0x100 [ 89.919205] [<ffffffff811eba48>] flush_old_exec+0x3f8/0x8b0 [ 89.962135] [<ffffffff8123b5bb>] load_elf_binary+0x32b/0x1260 [ 89.998581] [<ffffffff811a14f2>] ? get_user_pages+0x52/0x60 hv_synic_cleanup() function exists but noone calls it now. Do the following: - call hv_synic_cleanup() on each cpu from vmbus_exit(); - write global disable bit through MSR; - use hv_synic_free_cpu() to avoid memory leask and code duplication. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-28 03:25:55 +08:00
union hv_synic_scontrol sctrl;
shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
VMBUS_MESSAGE_SINT);
shared_sint.masked = 1;
/* Need to correctly cleanup in the case of SMP!!! */
/* Disable the interrupt */
hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
shared_sint.as_uint64);
simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
/*
* In Isolation VM, sim and sief pages are allocated by
* paravisor. These pages also will be used by kdump
* kernel. So just reset enable bit here and keep page
* addresses.
*/
simp.simp_enabled = 0;
if (hv_isolation_type_snp())
memunmap(hv_cpu->synic_message_page);
else
simp.base_simp_gpa = 0;
hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
siefp.siefp_enabled = 0;
if (hv_isolation_type_snp())
memunmap(hv_cpu->synic_event_page);
else
siefp.base_siefp_gpa = 0;
hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
/* Disable the global synic bit */
sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
sctrl.enable = 0;
hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
if (vmbus_irq != -1)
disable_percpu_irq(vmbus_irq);
}
#define HV_MAX_TRIES 3
/*
* Scan the event flags page of 'this' CPU looking for any bit that is set. If we find one
* bit set, then wait for a few milliseconds. Repeat these steps for a maximum of 3 times.
* Return 'true', if there is still any set bit after this operation; 'false', otherwise.
*
* If a bit is set, that means there is a pending channel interrupt. The expectation is
* that the normal interrupt handling mechanism will find and process the channel interrupt
* "very soon", and in the process clear the bit.
*/
static bool hv_synic_event_pending(void)
{
struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
union hv_synic_event_flags *event =
(union hv_synic_event_flags *)hv_cpu->synic_event_page + VMBUS_MESSAGE_SINT;
unsigned long *recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
bool pending;
u32 relid;
int tries = 0;
retry:
pending = false;
for_each_set_bit(relid, recv_int_page, HV_EVENT_FLAGS_COUNT) {
/* Special case - VMBus channel protocol messages */
if (relid == 0)
continue;
pending = true;
break;
}
if (pending && tries++ < HV_MAX_TRIES) {
usleep_range(10000, 20000);
goto retry;
}
return pending;
}
int hv_synic_cleanup(unsigned int cpu)
{
struct vmbus_channel *channel, *sc;
bool channel_found = false;
if (vmbus_connection.conn_state != CONNECTED)
goto always_cleanup;
/*
* Hyper-V does not provide a way to change the connect CPU once
* it is set; we must prevent the connect CPU from going offline
* while the VM is running normally. But in the panic or kexec()
* path where the vmbus is already disconnected, the CPU must be
* allowed to shut down.
*/
if (cpu == VMBUS_CONNECT_CPU)
return -EBUSY;
/*
* Search for channels which are bound to the CPU we're about to
* cleanup. In case we find one and vmbus is still connected, we
* fail; this will effectively prevent CPU offlining.
*
* TODO: Re-bind the channels to different CPUs.
*/
mutex_lock(&vmbus_connection.channel_mutex);
list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
if (channel->target_cpu == cpu) {
channel_found = true;
break;
}
list_for_each_entry(sc, &channel->sc_list, sc_list) {
if (sc->target_cpu == cpu) {
channel_found = true;
break;
}
}
if (channel_found)
break;
}
mutex_unlock(&vmbus_connection.channel_mutex);
if (channel_found)
return -EBUSY;
/*
* channel_found == false means that any channels that were previously
* assigned to the CPU have been reassigned elsewhere with a call of
* vmbus_send_modifychannel(). Scan the event flags page looking for
* bits that are set and waiting with a timeout for vmbus_chan_sched()
* to process such bits. If bits are still set after this operation
* and VMBus is connected, fail the CPU offlining operation.
*/
if (vmbus_proto_version >= VERSION_WIN10_V4_1 && hv_synic_event_pending())
return -EBUSY;
always_cleanup:
x86/hyperv: Initialize clockevents earlier in CPU onlining Hyper-V has historically initialized stimer-based clockevents late in the process of onlining a CPU because clockevents depend on stimer interrupts. In the original Hyper-V design, stimer interrupts generate a VMbus message, so the VMbus machinery must be running first, and VMbus can't be initialized until relatively late. On x86/64, LAPIC timer based clockevents are used during early initialization before VMbus and stimer-based clockevents are ready, and again during CPU offlining after the stimer clockevents have been shut down. Unfortunately, this design creates problems when offlining CPUs for hibernation or other purposes. stimer-based clockevents are shut down relatively early in the offlining process, so clockevents_unbind_device() must be used to fallback to the LAPIC-based clockevents for the remainder of the offlining process. Furthermore, the late initialization and early shutdown of stimer-based clockevents doesn't work well on ARM64 since there is no other timer like the LAPIC to fallback to. So CPU onlining and offlining doesn't work properly. Fix this by recognizing that stimer Direct Mode is the normal path for newer versions of Hyper-V on x86/64, and the only path on other architectures. With stimer Direct Mode, stimer interrupts don't require any VMbus machinery. stimer clockevents can be initialized and shut down consistent with how it is done for other clockevent devices. While the old VMbus-based stimer interrupts must still be supported for backward compatibility on x86, that mode of operation can be treated as legacy. So add a new Hyper-V stimer entry in the CPU hotplug state list, and use that new state when in Direct Mode. Update the Hyper-V clocksource driver to allocate and initialize stimer clockevents earlier during boot. Update Hyper-V initialization and the VMbus driver to use this new design. As a result, the LAPIC timer is no longer used during boot or CPU onlining/offlining and clockevents_unbind_device() is not called. But retain the old design as a legacy implementation for older versions of Hyper-V that don't support Direct Mode. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
2019-11-13 09:11:49 +08:00
hv_stimer_legacy_cleanup(cpu);
hv_synic_disable_regs(cpu);
return 0;
}