kvm/arm64: rework guest entry logic

In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state
(EQS) by calling guest_enter_irqoff(), and unmasked IRQs prior to
exiting the EQS by calling guest_exit(). As the IRQ entry code will not
wake RCU in this case, we may run the core IRQ code and IRQ handler
without RCU watching, leading to various potential problems.

Additionally, we do not inform lockdep or tracing that interrupts will
be enabled during guest execution, which caan lead to misleading traces
and warnings that interrupts have been enabled for overly-long periods.

This patch fixes these issues by using the new timing and context
entry/exit helpers to ensure that interrupts are handled during guest
vtime but with RCU watching, with a sequence:

	guest_timing_enter_irqoff();

	guest_state_enter_irqoff();
	< run the vcpu >
	guest_state_exit_irqoff();

	< take any pending IRQs >

	guest_timing_exit_irqoff();

Since instrumentation may make use of RCU, we must also ensure that no
instrumented code is run during the EQS. I've split out the critical
section into a new kvm_arm_enter_exit_vcpu() helper which is marked
noinstr.

Fixes: 1b3d546daf ("arm/arm64: KVM: Properly account for guest CPU time")
Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Message-Id: <20220201132926.3301912-3-mark.rutland@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit is contained in:
Mark Rutland 2022-02-01 13:29:23 +00:00 committed by Paolo Bonzini
parent b2d2af7e5d
commit 8cfe148a71
1 changed files with 33 additions and 18 deletions

View File

@ -790,6 +790,24 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
xfer_to_guest_mode_work_pending(); xfer_to_guest_mode_work_pending();
} }
/*
* Actually run the vCPU, entering an RCU extended quiescent state (EQS) while
* the vCPU is running.
*
* This must be noinstr as instrumentation may make use of RCU, and this is not
* safe during the EQS.
*/
static int noinstr kvm_arm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
{
int ret;
guest_state_enter_irqoff();
ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
guest_state_exit_irqoff();
return ret;
}
/** /**
* kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
* @vcpu: The VCPU pointer * @vcpu: The VCPU pointer
@ -874,9 +892,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
* Enter the guest * Enter the guest
*/ */
trace_kvm_entry(*vcpu_pc(vcpu)); trace_kvm_entry(*vcpu_pc(vcpu));
guest_enter_irqoff(); guest_timing_enter_irqoff();
ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu); ret = kvm_arm_vcpu_enter_exit(vcpu);
vcpu->mode = OUTSIDE_GUEST_MODE; vcpu->mode = OUTSIDE_GUEST_MODE;
vcpu->stat.exits++; vcpu->stat.exits++;
@ -911,26 +929,23 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
kvm_arch_vcpu_ctxsync_fp(vcpu); kvm_arch_vcpu_ctxsync_fp(vcpu);
/* /*
* We may have taken a host interrupt in HYP mode (ie * We must ensure that any pending interrupts are taken before
* while executing the guest). This interrupt is still * we exit guest timing so that timer ticks are accounted as
* pending, as we haven't serviced it yet! * guest time. Transiently unmask interrupts so that any
* pending interrupts are taken.
* *
* We're now back in SVC mode, with interrupts * Per ARM DDI 0487G.b section D1.13.4, an ISB (or other
* disabled. Enabling the interrupts now will have * context synchronization event) is necessary to ensure that
* the effect of taking the interrupt again, in SVC * pending interrupts are taken.
* mode this time.
*/ */
local_irq_enable(); local_irq_enable();
isb();
local_irq_disable();
guest_timing_exit_irqoff();
local_irq_enable();
/*
* We do local_irq_enable() before calling guest_exit() so
* that if a timer interrupt hits while running the guest we
* account that tick as being spent in the guest. We enable
* preemption after calling guest_exit() so that if we get
* preempted we make sure ticks after that is not counted as
* guest time.
*/
guest_exit();
trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu)); trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
/* Exit types that need handling before we can be preempted */ /* Exit types that need handling before we can be preempted */