// SPDX-License-Identifier: GPL-2.0-only

#include <linux/irqchip/arm-gic-v3.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <kvm/arm_vgic.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_asm.h>

#include "vgic.h"
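
/*
 * Trapping knobs for the guest's GICv3 sysreg accesses. They are set by
 * the kvm-arm.* early parameters at the bottom of this file (and, judging
 * by the broken_seis[] table, presumably by errata handling at probe
 * time), and are consumed in vgic_v3_enable().
 */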
static bool group0_trap;
static bool group1_trap;
static bool common_trap;
static bool dir_trap;
static bool gicv4_enable;
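
/*
 * Ask the GIC for a maintenance interrupt once the List Registers run
 * dry (ICH_HCR_EL2.UIE), so that the remaining pending interrupts can be
 * injected on a subsequent entry.
 */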
void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
{
	struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3;

	cpuif->vgic_hcr |= ICH_HCR_UIE;
}
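
/*
 * An LR that is now invalid, had EOI notification enabled and is not
 * HW-backed means the guest EOI'ed the interrupt: this is the condition
 * that raises the EOI maintenance interrupt we react to below.
 */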
static bool lr_signals_eoi_mi(u64 lr_val)
{
	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
	       !(lr_val & ICH_LR_HW);
}
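
/*
 * On guest exit, fold the state left in the List Registers back into the
 * software view of each interrupt (struct vgic_irq), so that the rest of
 * the vgic code operates on up-to-date state.
 */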
void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
{
	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
	struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
	u32 model = vcpu->kvm->arch.vgic.vgic_model;
	int lr;

	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());

	cpuif->vgic_hcr &= ~ICH_HCR_UIE;

	for (lr = 0; lr < cpuif->used_lrs; lr++) {
		u64 val = cpuif->vgic_lr[lr];
		u32 intid, cpuid;
		struct vgic_irq *irq;
		bool is_v2_sgi = false;
		bool deactivated;

		cpuid = val & GICH_LR_PHYSID_CPUID;
		cpuid >>= GICH_LR_PHYSID_CPUID_SHIFT;

		if (model == KVM_DEV_TYPE_ARM_VGIC_V3) {
			intid = val & ICH_LR_VIRTUAL_ID_MASK;
		} else {
			intid = val & GICH_LR_VIRTUALID;
			is_v2_sgi = vgic_irq_is_sgi(intid);
		}

		/* Notify fds when the guest EOI'ed a level-triggered IRQ */
		if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid))
			kvm_notify_acked_irq(vcpu->kvm, 0,
					     intid - VGIC_NR_PRIVATE_IRQS);

		irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
		if (!irq)	/* An LPI could have been unmapped. */
			continue;

		raw_spin_lock(&irq->irq_lock);

		/* Always preserve the active bit, note deactivation */
		deactivated = irq->active && !(val & ICH_LR_ACTIVE_BIT);
		irq->active = !!(val & ICH_LR_ACTIVE_BIT);

		if (irq->active && is_v2_sgi)
			irq->active_source = cpuid;

		/* Edge is the only case where we preserve the pending bit */
		if (irq->config == VGIC_CONFIG_EDGE &&
		    (val & ICH_LR_PENDING_BIT)) {
			irq->pending_latch = true;

			if (is_v2_sgi)
				irq->source |= (1 << cpuid);
		}

		/*
		 * Clear soft pending state when level irqs have been acked.
		 */
		if (irq->config == VGIC_CONFIG_LEVEL && !(val & ICH_LR_STATE))
			irq->pending_latch = false;

		/* Handle resampling for mapped interrupts if required */
		vgic_irq_handle_resampling(irq, deactivated, val & ICH_LR_PENDING_BIT);

		raw_spin_unlock(&irq->irq_lock);
		vgic_put_irq(vcpu->kvm, irq);
	}

	cpuif->used_lrs = 0;
}
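
/*
 * Translate the software state of an interrupt (struct vgic_irq) into an
 * ICH_LR_EL2 value and stash it in the shadow LR array for the next
 * guest entry.
 */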
/* Requires the irq to be locked already */
void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
{
	u32 model = vcpu->kvm->arch.vgic.vgic_model;
	u64 val = irq->intid;
	bool allow_pending = true, is_v2_sgi;

	is_v2_sgi = (vgic_irq_is_sgi(irq->intid) &&
		     model == KVM_DEV_TYPE_ARM_VGIC_V2);

	if (irq->active) {
		val |= ICH_LR_ACTIVE_BIT;
		if (is_v2_sgi)
			val |= irq->active_source << GICH_LR_PHYSID_CPUID_SHIFT;
		if (vgic_irq_is_multi_sgi(irq)) {
			allow_pending = false;
			val |= ICH_LR_EOI;
		}
	}

	if (irq->hw && !vgic_irq_needs_resampling(irq)) {
		val |= ICH_LR_HW;
		val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
		/*
		 * Never set pending+active on a HW interrupt, as the
		 * pending state is kept at the physical distributor
		 * level.
		 */
		if (irq->active)
			allow_pending = false;
	} else {
		if (irq->config == VGIC_CONFIG_LEVEL) {
			val |= ICH_LR_EOI;

			/*
			 * Software resampling doesn't work very well
			 * if we allow P+A, so let's not do that.
			 */
			if (irq->active)
				allow_pending = false;
		}
	}

	if (allow_pending && irq_is_pending(irq)) {
		val |= ICH_LR_PENDING_BIT;

		if (irq->config == VGIC_CONFIG_EDGE)
			irq->pending_latch = false;

		if (vgic_irq_is_sgi(irq->intid) &&
		    model == KVM_DEV_TYPE_ARM_VGIC_V2) {
			u32 src = ffs(irq->source);

			if (WARN_RATELIMIT(!src, "No SGI source for INTID %d\n",
					   irq->intid))
				return;

			val |= (src - 1) << GICH_LR_PHYSID_CPUID_SHIFT;
			irq->source &= ~(1 << (src - 1));

			if (irq->source) {
				irq->pending_latch = true;
				val |= ICH_LR_EOI;
			}
		}
	}

	/*
	 * Level-triggered mapped IRQs are special because we only observe
	 * rising edges as input to the VGIC. We therefore lower the line
	 * level here, so that we can take new virtual IRQs. See
	 * vgic_v3_fold_lr_state for more info.
	 */
	if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
		irq->line_level = false;

	if (irq->group)
		val |= ICH_LR_GROUP;

	val |= (u64)irq->priority << ICH_LR_PRIORITY_SHIFT;

	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr] = val;
}

void vgic_v3_clear_lr(struct kvm_vcpu *vcpu, int lr)
{
	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr] = 0;
}
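
/*
 * Pack/unpack the architecture-independent struct vgic_vmcr to/from the
 * ICH_VMCR_EL2 layout cached in the vgic_v3 CPU interface state.
 */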
void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
{
	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
	u32 model = vcpu->kvm->arch.vgic.vgic_model;
	u32 vmcr;

	if (model == KVM_DEV_TYPE_ARM_VGIC_V2) {
		vmcr = (vmcrp->ackctl << ICH_VMCR_ACK_CTL_SHIFT) &
			ICH_VMCR_ACK_CTL_MASK;
		vmcr |= (vmcrp->fiqen << ICH_VMCR_FIQ_EN_SHIFT) &
			ICH_VMCR_FIQ_EN_MASK;
	} else {
		/*
		 * When emulating GICv3 on GICv3 with SRE==1, the VFIQEn
		 * bit is RES1 and the VAckCtl bit is RES0.
		 */
		vmcr = ICH_VMCR_FIQ_EN_MASK;
	}

	vmcr |= (vmcrp->cbpr << ICH_VMCR_CBPR_SHIFT) & ICH_VMCR_CBPR_MASK;
	vmcr |= (vmcrp->eoim << ICH_VMCR_EOIM_SHIFT) & ICH_VMCR_EOIM_MASK;
	vmcr |= (vmcrp->abpr << ICH_VMCR_BPR1_SHIFT) & ICH_VMCR_BPR1_MASK;
	vmcr |= (vmcrp->bpr << ICH_VMCR_BPR0_SHIFT) & ICH_VMCR_BPR0_MASK;
	vmcr |= (vmcrp->pmr << ICH_VMCR_PMR_SHIFT) & ICH_VMCR_PMR_MASK;
	vmcr |= (vmcrp->grpen0 << ICH_VMCR_ENG0_SHIFT) & ICH_VMCR_ENG0_MASK;
	vmcr |= (vmcrp->grpen1 << ICH_VMCR_ENG1_SHIFT) & ICH_VMCR_ENG1_MASK;

	cpu_if->vgic_vmcr = vmcr;
}

void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
{
	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
	u32 model = vcpu->kvm->arch.vgic.vgic_model;
	u32 vmcr;

	vmcr = cpu_if->vgic_vmcr;

	if (model == KVM_DEV_TYPE_ARM_VGIC_V2) {
		vmcrp->ackctl = (vmcr & ICH_VMCR_ACK_CTL_MASK) >>
			ICH_VMCR_ACK_CTL_SHIFT;
		vmcrp->fiqen = (vmcr & ICH_VMCR_FIQ_EN_MASK) >>
			ICH_VMCR_FIQ_EN_SHIFT;
	} else {
		/*
		 * When emulating GICv3 on GICv3 with SRE==1, the VFIQEn
		 * bit is RES1 and the VAckCtl bit is RES0.
		 */
		vmcrp->fiqen = 1;
		vmcrp->ackctl = 0;
	}

	vmcrp->cbpr = (vmcr & ICH_VMCR_CBPR_MASK) >> ICH_VMCR_CBPR_SHIFT;
	vmcrp->eoim = (vmcr & ICH_VMCR_EOIM_MASK) >> ICH_VMCR_EOIM_SHIFT;
	vmcrp->abpr = (vmcr & ICH_VMCR_BPR1_MASK) >> ICH_VMCR_BPR1_SHIFT;
	vmcrp->bpr = (vmcr & ICH_VMCR_BPR0_MASK) >> ICH_VMCR_BPR0_SHIFT;
	vmcrp->pmr = (vmcr & ICH_VMCR_PMR_MASK) >> ICH_VMCR_PMR_SHIFT;
	vmcrp->grpen0 = (vmcr & ICH_VMCR_ENG0_MASK) >> ICH_VMCR_ENG0_SHIFT;
	vmcrp->grpen1 = (vmcr & ICH_VMCR_ENG1_MASK) >> ICH_VMCR_ENG1_SHIFT;
}

#define INITIAL_PENDBASER_VALUE						  \
	(GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWb)		| \
	GIC_BASER_CACHEABILITY(GICR_PENDBASER, OUTER, SameAsInner)	| \
	GIC_BASER_SHAREABILITY(GICR_PENDBASER, InnerShareable))
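
/*
 * Reset the per-vcpu GICv3 CPU interface state to its boot values
 * (assumed to run as part of vcpu/vgic initialisation, before the vcpu
 * runs for the first time).
 */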
void vgic_v3_enable(struct kvm_vcpu *vcpu)
{
	struct vgic_v3_cpu_if *vgic_v3 = &vcpu->arch.vgic_cpu.vgic_v3;

	/*
	 * By forcing VMCR to zero, the GIC will restore the binary
	 * points to their reset values. Anything else resets to zero
	 * anyway.
	 */
	vgic_v3->vgic_vmcr = 0;

	/*
	 * If we are emulating a GICv3, we do it in a non-GICv2-compatible
	 * way, so we force SRE to 1 to demonstrate this to the guest.
	 * Also, we don't support any form of IRQ/FIQ bypass.
	 * This goes with the spec allowing the value to be RAO/WI.
	 */
	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
		vgic_v3->vgic_sre = (ICC_SRE_EL1_DIB |
				     ICC_SRE_EL1_DFB |
				     ICC_SRE_EL1_SRE);
		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
	} else {
		vgic_v3->vgic_sre = 0;
	}

	vcpu->arch.vgic_cpu.num_id_bits = (kvm_vgic_global_state.ich_vtr_el2 &
					   ICH_VTR_ID_BITS_MASK) >>
					   ICH_VTR_ID_BITS_SHIFT;
	vcpu->arch.vgic_cpu.num_pri_bits = ((kvm_vgic_global_state.ich_vtr_el2 &
					    ICH_VTR_PRI_BITS_MASK) >>
					    ICH_VTR_PRI_BITS_SHIFT) + 1;

	/* Get the show on the road... */
	vgic_v3->vgic_hcr = ICH_HCR_EN;
	if (group0_trap)
		vgic_v3->vgic_hcr |= ICH_HCR_TALL0;
	if (group1_trap)
		vgic_v3->vgic_hcr |= ICH_HCR_TALL1;
	if (common_trap)
		vgic_v3->vgic_hcr |= ICH_HCR_TC;
	if (dir_trap)
		vgic_v3->vgic_hcr |= ICH_HCR_TDIR;
}
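
/*
 * Synchronise an LPI's pending state with the guest's pending table in
 * memory, queue the interrupt if it turns out to be pending, and clear
 * the consumed bit in the table.
 */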
int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
{
	struct kvm_vcpu *vcpu;
	int byte_offset, bit_nr;
	gpa_t pendbase, ptr;
	bool status;
	u8 val;
	int ret;
	unsigned long flags;

retry:
	vcpu = irq->target_vcpu;
	if (!vcpu)
		return 0;

	pendbase = GICR_PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);

	byte_offset = irq->intid / BITS_PER_BYTE;
	bit_nr = irq->intid % BITS_PER_BYTE;
	ptr = pendbase + byte_offset;

	ret = kvm_read_guest_lock(kvm, ptr, &val, 1);
	if (ret)
		return ret;

	status = val & (1 << bit_nr);

	raw_spin_lock_irqsave(&irq->irq_lock, flags);
	if (irq->target_vcpu != vcpu) {
		raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
		goto retry;
	}
	irq->pending_latch = status;
	vgic_queue_irq_unlock(vcpu->kvm, irq, flags);

	if (status) {
		/* clear consumed data */
		val &= ~(1 << bit_nr);
		ret = kvm_write_guest_lock(kvm, ptr, &val, 1);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * The deactivation of the doorbell interrupt will trigger the
 * unmapping of the associated vPE.
 */
static void unmap_all_vpes(struct vgic_dist *dist)
{
	struct irq_desc *desc;
	int i;

	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
		desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
		irq_domain_deactivate_irq(irq_desc_get_irq_data(desc));
	}
}
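
/*
 * Conversely, re-activating the doorbell interrupt maps the vPE back,
 * once the pending tables have been saved.
 */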
static void map_all_vpes(struct vgic_dist *dist)
{
	struct irq_desc *desc;
	int i;

	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
		desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
		irq_domain_activate_irq(irq_desc_get_irq_data(desc), false);
	}
}

/**
 * vgic_v3_save_pending_tables - Save the pending tables into guest RAM
 * @kvm: the KVM instance
 *
 * The kvm lock and all vcpu locks must be held.
 */
int vgic_v3_save_pending_tables(struct kvm *kvm)
{
	struct vgic_dist *dist = &kvm->arch.vgic;
	struct vgic_irq *irq;
	gpa_t last_ptr = ~(gpa_t)0;
	bool vlpi_avail = false;
	int ret = 0;
	u8 val;

	if (unlikely(!vgic_initialized(kvm)))
		return -ENXIO;

	/*
	 * A preparation for getting any VLPI states.
	 * The above vgic initialized check also ensures that the allocation
	 * and enabling of the doorbells have already been done.
	 */
	if (kvm_vgic_global_state.has_gicv4_1) {
		unmap_all_vpes(dist);
		vlpi_avail = true;
	}

	list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
		int byte_offset, bit_nr;
		struct kvm_vcpu *vcpu;
		gpa_t pendbase, ptr;
		bool is_pending;
		bool stored;

		vcpu = irq->target_vcpu;
		if (!vcpu)
			continue;

		pendbase = GICR_PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);

		byte_offset = irq->intid / BITS_PER_BYTE;
		bit_nr = irq->intid % BITS_PER_BYTE;
		ptr = pendbase + byte_offset;

		if (ptr != last_ptr) {
			ret = kvm_read_guest_lock(kvm, ptr, &val, 1);
			if (ret)
				goto out;
			last_ptr = ptr;
		}

		stored = val & (1U << bit_nr);

		is_pending = irq->pending_latch;

		if (irq->hw && vlpi_avail)
			vgic_v4_get_vlpi_state(irq, &is_pending);

		if (stored == is_pending)
			continue;

		if (is_pending)
			val |= 1 << bit_nr;
		else
			val &= ~(1 << bit_nr);

		ret = kvm_write_guest_lock(kvm, ptr, &val, 1);
		if (ret)
			goto out;
	}

out:
	if (vlpi_avail)
		map_all_vpes(dist);

	return ret;
}

/**
 * vgic_v3_rdist_overlap - check if a region overlaps with any
 * existing redistributor region
 *
 * @kvm: kvm handle
 * @base: base of the region
 * @size: size of region
 *
 * Return: true if there is an overlap
 */
bool vgic_v3_rdist_overlap(struct kvm *kvm, gpa_t base, size_t size)
{
	struct vgic_dist *d = &kvm->arch.vgic;
	struct vgic_redist_region *rdreg;

	list_for_each_entry(rdreg, &d->rd_regions, list) {
		if ((base + size > rdreg->base) &&
			(base < rdreg->base + vgic_v3_rd_region_size(kvm, rdreg)))
			return true;
	}
	return false;
}

/*
 * Check for overlapping regions and for regions crossing the end of memory
 * for base addresses which have already been set.
 */
bool vgic_v3_check_base(struct kvm *kvm)
{
	struct vgic_dist *d = &kvm->arch.vgic;
	struct vgic_redist_region *rdreg;

	if (!IS_VGIC_ADDR_UNDEF(d->vgic_dist_base) &&
	    d->vgic_dist_base + KVM_VGIC_V3_DIST_SIZE < d->vgic_dist_base)
		return false;

	list_for_each_entry(rdreg, &d->rd_regions, list) {
		size_t sz = vgic_v3_rd_region_size(kvm, rdreg);

		if (vgic_check_iorange(kvm, VGIC_ADDR_UNDEF,
				       rdreg->base, SZ_64K, sz))
			return false;
	}

	if (IS_VGIC_ADDR_UNDEF(d->vgic_dist_base))
		return true;

	return !vgic_v3_rdist_overlap(kvm, d->vgic_dist_base,
				      KVM_VGIC_V3_DIST_SIZE);
}

/**
 * vgic_v3_rdist_free_slot - Look up registered rdist regions and identify one
 * which has free space to put a new rdist region.
 *
 * @rd_regions: redistributor region list head
 *
 * A redistributor region maps n redistributors, n = region size / (2 x 64kB).
 * Stride between redistributors is 0 and regions are filled in the index order.
 *
 * Return: the redist region handle, if any, that has space to map a new rdist
 * region.
 */
struct vgic_redist_region *vgic_v3_rdist_free_slot(struct list_head *rd_regions)
{
	struct vgic_redist_region *rdreg;

	list_for_each_entry(rdreg, rd_regions, list) {
		if (!vgic_v3_redist_region_full(rdreg))
			return rdreg;
	}
	return NULL;
}

struct vgic_redist_region *vgic_v3_rdist_region_from_index(struct kvm *kvm,
							   u32 index)
{
	struct list_head *rd_regions = &kvm->arch.vgic.rd_regions;
	struct vgic_redist_region *rdreg;

	list_for_each_entry(rdreg, rd_regions, list) {
		if (rdreg->index == index)
			return rdreg;
	}
	return NULL;
}
|
|
|
|
|
|
|
|
|
2015-12-21 22:04:42 +08:00
|
|
|
int vgic_v3_map_resources(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
struct vgic_dist *dist = &kvm->arch.vgic;
|
2018-05-22 15:55:15 +08:00
|
|
|
struct kvm_vcpu *vcpu;
|
|
|
|
int ret = 0;
|
2021-11-17 00:04:02 +08:00
|
|
|
unsigned long c;
|
2015-12-21 22:04:42 +08:00
|
|
|
|
2018-05-22 15:55:15 +08:00
|
|
|
kvm_for_each_vcpu(c, vcpu, kvm) {
|
|
|
|
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
|
|
|
|
|
|
|
|
if (IS_VGIC_ADDR_UNDEF(vgic_cpu->rd_iodev.base_addr)) {
|
2021-11-17 00:04:02 +08:00
|
|
|
kvm_debug("vcpu %ld redistributor base not set\n", c);
|
2020-12-27 22:28:34 +08:00
|
|
|
return -ENXIO;
|
2018-05-22 15:55:15 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base)) {
|
2021-12-16 18:45:07 +08:00
|
|
|
kvm_debug("Need to set vgic distributor addresses first\n");
|
2020-12-27 22:28:34 +08:00
|
|
|
return -ENXIO;
|
2015-12-21 22:04:42 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (!vgic_v3_check_base(kvm)) {
|
2021-12-16 18:45:07 +08:00
|
|
|
kvm_debug("VGIC redist and dist frames overlap\n");
|
2020-12-27 22:28:34 +08:00
|
|
|
return -EINVAL;
|
2015-12-21 22:04:42 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For a VGICv3 we require the userland to explicitly initialize
|
|
|
|
* the VGIC before we need to use it.
|
|
|
|
*/
|
|
|
|
if (!vgic_initialized(kvm)) {
|
2020-12-27 22:28:34 +08:00
|
|
|
return -EBUSY;
|
2015-12-21 22:04:42 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
ret = vgic_register_dist_iodev(kvm, dist->vgic_dist_base, VGIC_V3);
|
|
|
|
if (ret) {
|
|
|
|
kvm_err("Unable to register VGICv3 dist MMIO regions\n");
|
2020-12-27 22:28:34 +08:00
|
|
|
return ret;
|
2015-12-21 22:04:42 +08:00
|
|
|
}
|
|
|
|
|
2020-03-05 04:33:27 +08:00
|
|
|
if (kvm_vgic_global_state.has_gicv4_1)
|
|
|
|
vgic_v4_configure_vsgis(kvm);
|
2015-12-21 22:04:42 +08:00
|
|
|
|
2020-12-27 22:28:34 +08:00
|
|
|
return 0;
|
2015-12-21 22:04:42 +08:00
|
|
|
}
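/*
 * Illustrative ordering sketch (assumption about the surrounding API, not
 * part of this file): for the checks above to pass, userspace is expected
 * to have set the distributor and redistributor bases through
 * KVM_DEV_ARM_VGIC_GRP_ADDR and to have initialised the device through
 * KVM_DEV_ARM_VGIC_GRP_CTRL / KVM_DEV_ARM_VGIC_CTRL_INIT before the first
 * vcpu run reaches this function; otherwise it fails with -ENXIO or
 * -EBUSY as seen above.
 */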
|
|
|
|
|
2017-06-09 19:49:33 +08:00
|
|
|
DEFINE_STATIC_KEY_FALSE(vgic_v3_cpuif_trap);
|
|
|
|
|
2017-06-09 19:49:46 +08:00
|
|
|
static int __init early_group0_trap_cfg(char *buf)
|
|
|
|
{
|
|
|
|
return strtobool(buf, &group0_trap);
|
|
|
|
}
|
|
|
|
early_param("kvm-arm.vgic_v3_group0_trap", early_group0_trap_cfg);
|
|
|
|
|
2017-06-09 19:49:41 +08:00
|
|
|
static int __init early_group1_trap_cfg(char *buf)
|
|
|
|
{
|
|
|
|
return strtobool(buf, &group1_trap);
|
|
|
|
}
|
|
|
|
early_param("kvm-arm.vgic_v3_group1_trap", early_group1_trap_cfg);
|
|
|
|
|
2017-06-09 19:49:53 +08:00
|
|
|
static int __init early_common_trap_cfg(char *buf)
|
|
|
|
{
|
|
|
|
return strtobool(buf, &common_trap);
|
|
|
|
}
|
|
|
|
early_param("kvm-arm.vgic_v3_common_trap", early_common_trap_cfg);
|
|
|
|
|
2017-10-27 22:28:54 +08:00
|
|
|
static int __init early_gicv4_enable(char *buf)
|
|
|
|
{
|
|
|
|
return strtobool(buf, &gicv4_enable);
|
|
|
|
}
|
|
|
|
early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
|
|
|
|
|
2022-01-22 05:07:47 +08:00
|
|
|
static const struct midr_range broken_seis[] = {
|
|
|
|
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM),
|
|
|
|
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM),
|
2022-05-14 18:25:24 +08:00
|
|
|
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_PRO),
|
|
|
|
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_PRO),
|
|
|
|
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_MAX),
|
|
|
|
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_MAX),
|
2022-01-22 05:07:47 +08:00
|
|
|
{},
|
|
|
|
};
|
|
|
|
|
|
|
|
static bool vgic_v3_broken_seis(void)
|
|
|
|
{
|
|
|
|
return ((kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK) &&
|
|
|
|
is_midr_in_range_list(read_cpuid_id(), broken_seis));
|
|
|
|
}
|
|
|
|
|
2015-12-01 22:02:35 +08:00
|
|
|
/**
|
2019-08-15 17:56:22 +08:00
|
|
|
* vgic_v3_probe - probe for a VGICv3 compatible interrupt controller
|
|
|
|
* @info: pointer to the GIC description
|
2015-12-01 22:02:35 +08:00
|
|
|
*
|
2019-08-15 17:56:22 +08:00
|
|
|
* Returns 0 if the VGICv3 has been probed successfully, returns an error code
|
|
|
|
* otherwise
|
2015-12-01 22:02:35 +08:00
|
|
|
*/
|
|
|
|
int vgic_v3_probe(const struct gic_kvm_info *info)
|
|
|
|
{
|
2021-03-06 02:52:52 +08:00
|
|
|
u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config);
|
2021-03-06 02:52:53 +08:00
|
|
|
bool has_v2;
|
2016-07-15 19:43:23 +08:00
|
|
|
int ret;
|
2015-12-01 22:02:35 +08:00
|
|
|
|
2021-03-06 02:52:53 +08:00
|
|
|
has_v2 = ich_vtr_el2 >> 63;
|
2021-03-06 02:52:52 +08:00
|
|
|
ich_vtr_el2 = (u32)ich_vtr_el2;
|
|
|
|
|
2015-12-01 22:02:35 +08:00
|
|
|
/*
|
2020-04-01 22:03:10 +08:00
|
|
|
* The ListRegs field is 5 bits, but there is an architectural
|
2015-12-01 22:02:35 +08:00
|
|
|
* maximum of 16 list registers. Just ignore bit 4...
|
|
|
|
*/
|
|
|
|
kvm_vgic_global_state.nr_lr = (ich_vtr_el2 & 0xf) + 1;
|
|
|
|
kvm_vgic_global_state.can_emulate_gicv2 = false;
|
2017-01-26 22:20:51 +08:00
|
|
|
kvm_vgic_global_state.ich_vtr_el2 = ich_vtr_el2;
|
2015-12-01 22:02:35 +08:00
|
|
|
|
2017-10-27 22:28:54 +08:00
|
|
|
/* GICv4 support? */
|
|
|
|
if (info->has_v4) {
|
|
|
|
kvm_vgic_global_state.has_gicv4 = gicv4_enable;
|
2020-03-05 04:33:20 +08:00
|
|
|
kvm_vgic_global_state.has_gicv4_1 = info->has_v4_1 && gicv4_enable;
|
|
|
|
kvm_info("GICv4%s support %sabled\n",
|
|
|
|
kvm_vgic_global_state.has_gicv4_1 ? ".1" : "",
|
2017-10-27 22:28:54 +08:00
|
|
|
gicv4_enable ? "en" : "dis");
|
|
|
|
}
|
|
|
|
|
2021-03-06 02:52:53 +08:00
|
|
|
kvm_vgic_global_state.vcpu_base = 0;
|
|
|
|
|
2015-12-01 22:02:35 +08:00
|
|
|
if (!info->vcpu.start) {
|
|
|
|
kvm_info("GICv3: no GICV resource entry\n");
|
2021-03-06 02:52:53 +08:00
|
|
|
} else if (!has_v2) {
|
|
|
|
pr_warn(FW_BUG "CPU interface incapable of MMIO access\n");
|
2015-12-01 22:02:35 +08:00
|
|
|
} else if (!PAGE_ALIGNED(info->vcpu.start)) {
|
|
|
|
pr_warn("GICV physical address 0x%llx not page aligned\n",
|
|
|
|
(unsigned long long)info->vcpu.start);
|
2021-12-08 23:22:55 +08:00
|
|
|
} else if (kvm_get_mode() != KVM_MODE_PROTECTED) {
|
2015-12-01 22:02:35 +08:00
|
|
|
kvm_vgic_global_state.vcpu_base = info->vcpu.start;
|
|
|
|
kvm_vgic_global_state.can_emulate_gicv2 = true;
|
2016-07-15 19:43:23 +08:00
|
|
|
ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V2);
|
|
|
|
if (ret) {
|
|
|
|
kvm_err("Cannot register GICv2 KVM device.\n");
|
|
|
|
return ret;
|
|
|
|
}
|
2015-12-01 22:02:35 +08:00
|
|
|
kvm_info("vgic-v2@%llx\n", info->vcpu.start);
|
|
|
|
}
|
2016-07-15 19:43:23 +08:00
|
|
|
ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V3);
|
|
|
|
if (ret) {
|
|
|
|
kvm_err("Cannot register GICv3 KVM device.\n");
|
|
|
|
kvm_unregister_device_ops(KVM_DEV_TYPE_ARM_VGIC_V2);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2015-12-01 22:02:35 +08:00
|
|
|
if (kvm_vgic_global_state.vcpu_base == 0)
|
|
|
|
kvm_info("disabling GICv2 emulation\n");
|
|
|
|
|
2017-06-09 19:49:48 +08:00
|
|
|
if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_30115)) {
|
|
|
|
group0_trap = true;
|
|
|
|
group1_trap = true;
|
|
|
|
}
|
|
|
|
|
2022-01-22 05:07:47 +08:00
|
|
|
if (vgic_v3_broken_seis()) {
|
|
|
|
kvm_info("GICv3 with broken locally generated SEI\n");
|
KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors
The infamous M1 has a feature nobody else ever implemented,
in the form of the "GIC locally generated SError interrupts",
also known as SEIS for short.
These SErrors are generated when a guest does something that violates
the GIC state machine. It would have been simpler to just *ignore*
the damned thing, but that's not what this HW does. Oh well.
This part of the architecture is also amazingly under-specified.
There are a whole 10 lines that describe the feature in a spec that
is 930 pages long, and some of these lines are factually wrong.
Oh, and it is deprecated, so the incentive to clarify it is low.
Now, the spec says that this should be a *virtual* SError when
HCR_EL2.AMO is set. As it turns out, that's not always the case
on this CPU, and the SError sometimes fires on the host as a
physical SError. Goodbye, cruel world. This clearly is a HW bug,
and it means that a guest can easily take the host down, on demand.
Thankfully, we have seen systems that were just as broken in the
past, and we have the perfect vaccine for it.
Apple M1, please meet the Cavium ThunderX workaround. All your
GIC accesses will be trapped, sanitised, and emulated. Only the
signalling aspect of the HW will be used. It won't be super speedy,
but it will at least be safe. You're most welcome.
Given that this has only ever been seen on this single implementation,
that the spec is unclear at best and that we cannot trust it to ever
be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
being set.
Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org
2021-10-10 23:09:07 +08:00
|
|
|
|
2022-01-22 05:07:47 +08:00
|
|
|
kvm_vgic_global_state.ich_vtr_el2 &= ~ICH_VTR_SEIS_MASK;
|
KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors
2021-10-10 23:09:07 +08:00
|
|
|
group0_trap = true;
|
|
|
|
group1_trap = true;
|
KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible
2021-10-10 23:09:08 +08:00
|
|
|
if (ich_vtr_el2 & ICH_VTR_TDS_MASK)
|
|
|
|
dir_trap = true;
|
|
|
|
else
|
|
|
|
common_trap = true;
|
KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors
2021-10-10 23:09:07 +08:00
|
|
|
}
|
|
|
|
|
KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible
2021-10-10 23:09:08 +08:00
|
|
|
if (group0_trap || group1_trap || common_trap || dir_trap) {
|
|
|
|
kvm_info("GICv3 sysreg trapping enabled ([%s%s%s%s], reduced performance)\n",
|
2017-06-09 19:49:54 +08:00
|
|
|
group0_trap ? "G0" : "",
|
|
|
|
group1_trap ? "G1" : "",
|
KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible
2021-10-10 23:09:08 +08:00
|
|
|
common_trap ? "C" : "",
|
|
|
|
dir_trap ? "D" : "");
|
2017-06-09 19:49:41 +08:00
|
|
|
static_branch_enable(&vgic_v3_cpuif_trap);
|
|
|
|
}
|
|
|
|
|
2015-12-01 22:02:35 +08:00
|
|
|
kvm_vgic_global_state.vctrl_base = NULL;
|
|
|
|
kvm_vgic_global_state.type = VGIC_V3;
|
|
|
|
kvm_vgic_global_state.max_gic_vcpus = VGIC_V3_MAX_CPUS;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
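/*
 * Worked example (illustrative, not from the original source): the value
 * returned by __vgic_v3_get_gic_config() carries ICH_VTR_EL2 in its low
 * 32 bits and, in bit 63, whether the CPU interface can also be driven in
 * GICv2-compatible (MMIO) mode. A return value of 0x8000000000000003
 * would therefore give has_v2 = true and
 * nr_lr = (0x3 & 0xf) + 1 = 4 list registers.
 */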
|
2016-03-24 18:21:04 +08:00
|
|
|
|
|
|
|
void vgic_v3_load(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
|
|
|
|
|
2017-04-19 19:15:26 +08:00
|
|
|
/*
|
|
|
|
* If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
|
|
|
|
* is dependent on ICC_SRE_EL1.SRE, and we have to perform the
|
|
|
|
* VMCR_EL2 save/restore in the world switch.
|
|
|
|
*/
|
|
|
|
if (likely(cpu_if->vgic_sre))
|
|
|
|
kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
|
2017-10-05 06:18:07 +08:00
|
|
|
|
2020-09-15 18:46:43 +08:00
|
|
|
kvm_call_hyp(__vgic_v3_restore_aprs, cpu_if);
|
2017-10-05 23:19:19 +08:00
|
|
|
|
|
|
|
if (has_vhe())
|
2018-12-02 00:41:28 +08:00
|
|
|
__vgic_v3_activate_traps(cpu_if);
|
KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put
When the VHE code was reworked, a lot of the vgic stuff was moved around,
but the GICv4 residency code did stay untouched, meaning that we come
in and out of residency on each flush/sync, which is obviously suboptimal.
To address this, let's move things around a bit:
- Residency entry (flush) moves to vcpu_load
- Residency exit (sync) moves to vcpu_put
- On blocking (entry to WFI), we "put"
- On unblocking (exit from WFI), we "load"
Because these can nest (load/block/put/load/unblock/put, for example),
we now have per-VPE tracking of the residency state.
Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
gets set to true when blocking because of a WFI. This allows a finer
control of the doorbell, which now also gets disabled as soon as
it gets signaled.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
2019-10-27 22:41:59 +08:00
|
|
|
|
|
|
|
WARN_ON(vgic_v4_load(vcpu));
|
2016-03-24 18:21:04 +08:00
|
|
|
}
|
|
|
|
|
KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
Since commit 328e56647944 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.
There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point where we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
changes to PMR are not visible in memory until we do a vcpu_put().
Things go really south if the guest does the following:
mov x0, #0 // or any small value masking interrupts
msr ICC_PMR_EL1, x0
[vcpu preempted, then rescheduled, VMCR sampled]
mov x0, #ff // allow all interrupts
msr ICC_PMR_EL1, x0
wfi // traps to EL2, so sampling of VMCR
[interrupt arrives just after WFI]
Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.
To avoid this unfortunate situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.
This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.
Cc: stable@vger.kernel.org # 4.12
Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-02 17:28:32 +08:00
|
|
|
void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
|
2016-03-24 18:21:04 +08:00
|
|
|
{
|
|
|
|
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
|
|
|
|
|
2017-04-19 19:15:26 +08:00
|
|
|
if (likely(cpu_if->vgic_sre))
|
2019-01-05 23:49:50 +08:00
|
|
|
cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
|
KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
2019-08-02 17:28:32 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void vgic_v3_put(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2018-12-02 00:41:28 +08:00
|
|
|
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
|
|
|
|
|
KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put
2019-10-27 22:41:59 +08:00
|
|
|
WARN_ON(vgic_v4_put(vcpu, false));
|
|
|
|
|
KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
2019-08-02 17:28:32 +08:00
|
|
|
vgic_v3_vmcr_sync(vcpu);
|
2017-10-05 06:18:07 +08:00
|
|
|
|
2020-09-15 18:46:43 +08:00
|
|
|
kvm_call_hyp(__vgic_v3_save_aprs, cpu_if);
|
2017-10-05 23:19:19 +08:00
|
|
|
|
|
|
|
if (has_vhe())
|
2018-12-02 00:41:28 +08:00
|
|
|
__vgic_v3_deactivate_traps(cpu_if);
|
2016-03-24 18:21:04 +08:00
|
|
|
}
|