Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending

  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisoning
   - the usual crop of fixes and cleanups

  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes

  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations

  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"

There is a small conflict in arch/s390 due to an arch-wide field rename.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  Update my email address
  kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: make spte mmio mask more explicit
  x86: kvm: mmu: dead code thanks to access tracking
  KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
  KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
  KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
  KVM: x86: remove ignored type attribute
  KVM: LAPIC: Fix lapic timer injection delay
  KVM: lapic: reorganize restart_apic_timer
  KVM: lapic: reorganize start_hv_timer
  kvm: nVMX: Check memory operand to INVVPID
  KVM: s390: Inject machine check into the nested guest
  KVM: s390: Inject machine check into the guest
  tools/kvm_stat: add new interactive command 'b'
  tools/kvm_stat: add new command line switch '-i'
  tools/kvm_stat: fix error on interactive command 'g'
  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
  ...
commit c136b84393
@@ -1862,6 +1862,18 @@
 			for all guests.
 			Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
 
+	kvm-arm.vgic_v3_group0_trap=
+			[KVM,ARM] Trap guest accesses to GICv3 group-0
+			system registers
+
+	kvm-arm.vgic_v3_group1_trap=
+			[KVM,ARM] Trap guest accesses to GICv3 group-1
+			system registers
+
+	kvm-arm.vgic_v3_common_trap=
+			[KVM,ARM] Trap guest accesses to GICv3 common
+			system registers
+
 	kvm-intel.ept=	[KVM,Intel] Disable extended page tables
 			(virtualized MMU) support on capable Intel chips.
 			Default is 1 (enabled)
@@ -62,6 +62,7 @@ stable kernels.
 | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
 | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
 | Cavium         | ThunderX SMMUv2 | #27704          | N/A                     |
+| Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115    |
 |                |                 |                 |                         |
 | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
 |                |                 |                 |                         |
@@ -3255,6 +3255,141 @@ Otherwise, if the MCE is a corrected error, KVM will just
 store it in the corresponding bank (provided this bank is
 not holding a previously reported uncorrected error).
 
+4.107 KVM_S390_GET_CMMA_BITS
+
+Capability: KVM_CAP_S390_CMMA_MIGRATION
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_s390_cmma_log (in, out)
+Returns: 0 on success, a negative value on error
+
+This ioctl is used to get the values of the CMMA bits on the s390
+architecture. It is meant to be used in two scenarios:
+- During live migration to save the CMMA values. Live migration needs
+  to be enabled via the KVM_REQ_START_MIGRATION VM property.
+- To non-destructively peek at the CMMA values, with the flag
+  KVM_S390_CMMA_PEEK set.
+
+The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
+values are written to a buffer whose location is indicated via the "values"
+member in the kvm_s390_cmma_log struct. The values in the input struct are
+also updated as needed.
+Each CMMA value takes up one byte.
+
+struct kvm_s390_cmma_log {
+	__u64 start_gfn;
+	__u32 count;
+	__u32 flags;
+	union {
+		__u64 remaining;
+		__u64 mask;
+	};
+	__u64 values;
+};
+
+start_gfn is the number of the first guest frame whose CMMA values are
+to be retrieved,
+
+count is the length of the buffer in bytes,
+
+values points to the buffer where the result will be written to.
+
+If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
+KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
+other ioctls.
+
+The result is written in the buffer pointed to by the field values, and
+the values of the input parameter are updated as follows.
+
+Depending on the flags, different actions are performed. The only
+supported flag so far is KVM_S390_CMMA_PEEK.
+
+The default behaviour if KVM_S390_CMMA_PEEK is not set is:
+start_gfn will indicate the first page frame whose CMMA bits were dirty.
+It is not necessarily the same as the one passed as input, as clean pages
+are skipped.
+
+count will indicate the number of bytes actually written in the buffer.
+It can (and very often will) be smaller than the input value, since the
+buffer is only filled until 16 bytes of clean values are found (which
+are then not copied in the buffer). Since a CMMA migration block needs
+the base address and the length, for a total of 16 bytes, we will send
+back some clean data if there is some dirty data afterwards, as long as
+the size of the clean data does not exceed the size of the header. This
+allows to minimize the amount of data to be saved or transferred over
+the network at the expense of more roundtrips to userspace. The next
+invocation of the ioctl will skip over all the clean values, saving
+potentially more than just the 16 bytes we found.
+
+If KVM_S390_CMMA_PEEK is set:
+the existing storage attributes are read even when not in migration
+mode, and no other action is performed;
+
+the output start_gfn will be equal to the input start_gfn,
+
+the output count will be equal to the input count, except if the end of
+memory has been reached.
+
+In both cases:
+the field "remaining" will indicate the total number of dirty CMMA values
+still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
+not enabled.
+
+mask is unused.
+
+values points to the userspace buffer where the result will be stored.
+
+This ioctl can fail with -ENOMEM if not enough memory can be allocated to
+complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
+KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
+-EFAULT if the userspace address is invalid or if no page table is
+present for the addresses (e.g. when using hugepages).
+
+4.108 KVM_S390_SET_CMMA_BITS
+
+Capability: KVM_CAP_S390_CMMA_MIGRATION
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_s390_cmma_log (in)
+Returns: 0 on success, a negative value on error
+
+This ioctl is used to set the values of the CMMA bits on the s390
+architecture. It is meant to be used during live migration to restore
+the CMMA values, but there are no restrictions on its use.
+The ioctl takes parameters via the kvm_s390_cmma_values struct.
+Each CMMA value takes up one byte.
+
+struct kvm_s390_cmma_log {
+	__u64 start_gfn;
+	__u32 count;
+	__u32 flags;
+	union {
+		__u64 remaining;
+		__u64 mask;
+	};
+	__u64 values;
+};
+
+start_gfn indicates the starting guest frame number,
+
+count indicates how many values are to be considered in the buffer,
+
+flags is not used and must be 0.
+
+mask indicates which PGSTE bits are to be considered.
+
+remaining is not used.
+
+values points to the buffer in userspace where to store the values.
+
+This ioctl can fail with -ENOMEM if not enough memory can be allocated to
+complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
+the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
+if the flags field was not 0, with -EFAULT if the userspace address is
+invalid, if invalid pages are written to (e.g. after the end of memory)
+or if no page table is present for the addresses (e.g. when using
+hugepages).
+
 5. The kvm_run structure
 ------------------------
 
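As an illustration of how the ioctl described above is driven from userspace, a minimal peek-style query might look like the sketch below (an assumption-laden example, not taken from the patch: vm_fd is assumed to be an open VM file descriptor, and the headers are assumed to be recent enough to carry struct kvm_s390_cmma_log and KVM_S390_CMMA_PEEK):

    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>

    /* Non-destructively peek at the CMMA values of the first 4096 guest frames. */
    static int peek_cmma(int vm_fd)
    {
            struct kvm_s390_cmma_log log = { 0 };
            uint8_t *buf = calloc(4096, 1);          /* one byte per CMMA value */

            if (!buf)
                    return -1;

            log.start_gfn = 0;                       /* first guest frame to query */
            log.count     = 4096;                    /* buffer length in bytes     */
            log.flags     = KVM_S390_CMMA_PEEK;      /* read-only, works outside migration mode */
            log.values    = (uint64_t)(unsigned long)buf;

            if (ioctl(vm_fd, KVM_S390_GET_CMMA_BITS, &log) < 0) {
                    perror("KVM_S390_GET_CMMA_BITS");
                    free(buf);
                    return -1;
            }

            /* With PEEK, count is echoed back unless the end of memory was hit. */
            printf("%u CMMA values starting at gfn %llu\n",
                   log.count, (unsigned long long)log.start_gfn);
            free(buf);
            return 0;
    }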
@@ -3996,6 +4131,34 @@ Parameters: none
 Allow use of adapter-interruption suppression.
 Returns: 0 on success; -EBUSY if a VCPU has already been created.
 
+7.11 KVM_CAP_PPC_SMT
+
+Architectures: ppc
+Parameters: vsmt_mode, flags
+
+Enabling this capability on a VM provides userspace with a way to set
+the desired virtual SMT mode (i.e. the number of virtual CPUs per
+virtual core).  The virtual SMT mode, vsmt_mode, must be a power of 2
+between 1 and 8.  On POWER8, vsmt_mode must also be no greater than
+the number of threads per subcore for the host.  Currently flags must
+be 0.  A successful call to enable this capability will result in
+vsmt_mode being returned when the KVM_CAP_PPC_SMT capability is
+subsequently queried for the VM.  This capability is only supported by
+HV KVM, and can only be set before any VCPUs have been created.
+The KVM_CAP_PPC_SMT_POSSIBLE capability indicates which virtual SMT
+modes are available.
+
+7.12 KVM_CAP_PPC_FWNMI
+
+Architectures: ppc
+Parameters: none
+
+With this capability a machine check exception in the guest address
+space will cause KVM to exit the guest with NMI exit reason. This
+enables QEMU to build error log and branch to guest kernel registered
+machine check handling routine. Without this capability KVM will
+branch to guests' 0x200 interrupt vector.
+
 8. Other capabilities.
 ----------------------
 
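To make the KVM_CAP_PPC_SMT flow above concrete, a hedged userspace sketch of enabling the capability could look like this (vm_fd is assumed to be an open VM file descriptor, and placing vsmt_mode and flags in args[0] and args[1] follows the usual KVM_ENABLE_CAP convention rather than anything stated in the patch):

    #include <linux/kvm.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Request 4 virtual threads per virtual core; must run before any VCPU is created. */
    static int set_vsmt_mode(int vm_fd, unsigned long vsmt_mode)
    {
            struct kvm_enable_cap cap;

            memset(&cap, 0, sizeof(cap));
            cap.cap     = KVM_CAP_PPC_SMT;
            cap.args[0] = vsmt_mode;        /* power of 2 between 1 and 8 */
            cap.args[1] = 0;                /* flags: currently must be 0 */

            return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }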
@@ -4157,3 +4320,12 @@ Currently the following bits are defined for the device_irq_level bitmap:
 Future versions of kvm may implement additional events. These will get
 indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
 listed above.
+
+8.10 KVM_CAP_PPC_SMT_POSSIBLE
+
+Architectures: ppc
+
+Querying this capability returns a bitmap indicating the possible
+virtual SMT modes that can be set using KVM_CAP_PPC_SMT.  If bit N
+(counting from the right) is set, then a virtual SMT mode of 2^N is
+available.
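A small sketch of decoding that bitmap (illustrative only; kvm_fd is assumed to be an open /dev/kvm descriptor):

    #include <linux/kvm.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    /* Print every virtual SMT mode that KVM_CAP_PPC_SMT could accept. */
    static void print_possible_vsmt_modes(int kvm_fd)
    {
            int bitmap = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SMT_POSSIBLE);
            int n;

            for (n = 0; bitmap > 0 && n < 4; n++)   /* bit N => vsmt_mode 2^N */
                    if (bitmap & (1 << n))
                            printf("vsmt_mode %d is available\n", 1 << n);
    }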
@@ -16,6 +16,7 @@ FLIC provides support to
 - register and modify adapter interrupt sources (KVM_DEV_FLIC_ADAPTER_*)
 - modify AIS (adapter-interruption-suppression) mode state (KVM_DEV_FLIC_AISM)
 - inject adapter interrupts on a specified adapter (KVM_DEV_FLIC_AIRQ_INJECT)
+- get/set all AIS mode states (KVM_DEV_FLIC_AISM_ALL)
 
 Groups:
   KVM_DEV_FLIC_ENQUEUE
@@ -136,6 +137,20 @@ struct kvm_s390_ais_req {
 an isc according to the adapter-interruption-suppression mode on condition
 that the AIS capability is enabled.
 
+KVM_DEV_FLIC_AISM_ALL
+  Gets or sets the adapter-interruption-suppression mode for all ISCs.  Takes
+  a kvm_s390_ais_all describing:
+
+  struct kvm_s390_ais_all {
+       __u8 simm; /* Single-Interruption-Mode mask */
+       __u8 nimm; /* No-Interruption-Mode mask */
+  };
+
+  simm contains Single-Interruption-Mode mask for all ISCs, nimm contains
+  No-Interruption-Mode mask for all ISCs.  Each bit in simm and nimm corresponds
+  to an ISC (MSB0 bit 0 to ISC 0 and so on).  The combination of simm bit and
+  nimm bit presents AIS mode for a ISC.
+
 Note: The KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR device ioctls executed on
 FLIC with an unknown group or attribute gives the error code EINVAL (instead of
 ENXIO, as specified in the API documentation). It is not possible to conclude
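For orientation, reading the AIS state for all ISCs through the new group could be sketched as follows (an assumption-heavy example, not from the patch: flic_fd is assumed to be an already-created FLIC device fd, and the attribute is assumed to be addressed like the other FLIC groups, i.e. via the group field with addr pointing at the payload):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Fetch the Single/No-Interruption-Mode masks for all ISCs. */
    static int get_ais_modes(int flic_fd, struct kvm_s390_ais_all *ais)
    {
            struct kvm_device_attr attr = {
                    .group = KVM_DEV_FLIC_AISM_ALL,
                    .addr  = (__u64)(unsigned long)ais,
            };

            return ioctl(flic_fd, KVM_GET_DEVICE_ATTR, &attr);
    }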
@@ -16,7 +16,9 @@ Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
 Returns: -EBUSY: The PMU overflow interrupt is already set
          -ENXIO: The overflow interrupt not set when attempting to get it
          -ENODEV: PMUv3 not supported
-         -EINVAL: Invalid PMU overflow interrupt number supplied
+         -EINVAL: Invalid PMU overflow interrupt number supplied or
+                  trying to set the IRQ number without using an in-kernel
+                  irqchip.
 
 A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
 number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
@@ -25,11 +27,36 @@ all vcpus, while as an SPI it must be a separate number per vcpu.
 
 1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
 Parameters: no additional parameter in kvm_device_attr.addr
-Returns: -ENODEV: PMUv3 not supported
-         -ENXIO: PMUv3 not properly configured as required prior to calling this
-                 attribute
+Returns: -ENODEV: PMUv3 not supported or GIC not initialized
+         -ENXIO: PMUv3 not properly configured or in-kernel irqchip not
+                 configured as required prior to calling this attribute
          -EBUSY: PMUv3 already initialized
 
-Request the initialization of the PMUv3. This must be done after creating the
-in-kernel irqchip. Creating a PMU with a userspace irqchip is currently not
-supported.
+Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel
+virtual GIC implementation, this must be done after initializing the in-kernel
+irqchip.
+
+
+2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
+Architectures: ARM,ARM64
+
+2.1. ATTRIBUTE: KVM_ARM_VCPU_TIMER_IRQ_VTIMER
+2.2. ATTRIBUTE: KVM_ARM_VCPU_TIMER_IRQ_PTIMER
+Parameters: in kvm_device_attr.addr the address for the timer interrupt is a
+            pointer to an int
+Returns: -EINVAL: Invalid timer interrupt number
+         -EBUSY:  One or more VCPUs has already run
+
+A value describing the architected timer interrupt number when connected to an
+in-kernel virtual GIC.  These must be a PPI (16 <= intid < 32).  Setting the
+attribute overrides the default values (see below).
+
+KVM_ARM_VCPU_TIMER_IRQ_VTIMER: The EL1 virtual timer intid (default: 27)
+KVM_ARM_VCPU_TIMER_IRQ_PTIMER: The EL1 physical timer intid (default: 30)
+
+Setting the same PPI for different timers will prevent the VCPUs from running.
+Setting the interrupt number on a VCPU configures all VCPUs created at that
+time to use the number provided for a given timer, overwriting any previously
+configured values on other VCPUs.  Userspace should configure the interrupt
+numbers on at least one VCPU after creating all VCPUs and before running any
+VCPUs.
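As a usage illustration of the timer attributes documented above (a sketch only; vcpu_fd is assumed to be an open VCPU file descriptor), configuring the virtual timer interrupt goes through the per-VCPU device control API:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Route the EL1 virtual timer of this VCPU to the given PPI (16 <= intid < 32). */
    static int set_vtimer_irq(int vcpu_fd, int intid)
    {
            struct kvm_device_attr attr = {
                    .group = KVM_ARM_VCPU_TIMER_CTRL,
                    .attr  = KVM_ARM_VCPU_TIMER_IRQ_VTIMER,
                    .addr  = (__u64)(unsigned long)&intid,
            };

            return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
    }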
@@ -222,3 +222,36 @@ Allows user space to disable dea key wrapping, clearing the wrapping key.
 
 Parameters: none
 Returns: 0
+
+5. GROUP: KVM_S390_VM_MIGRATION
+Architectures: s390
+
+5.1. ATTRIBUTE: KVM_S390_VM_MIGRATION_STOP (w/o)
+
+Allows userspace to stop migration mode, needed for PGSTE migration.
+Setting this attribute when migration mode is not active will have no
+effects.
+
+Parameters: none
+Returns: 0
+
+5.2. ATTRIBUTE: KVM_S390_VM_MIGRATION_START (w/o)
+
+Allows userspace to start migration mode, needed for PGSTE migration.
+Setting this attribute when migration mode is already active will have
+no effects.
+
+Parameters: none
+Returns: -ENOMEM if there is not enough free memory to start migration mode
+         -EINVAL if the state of the VM is invalid (e.g. no memory defined)
+         0 in case of success.
+
+5.3. ATTRIBUTE: KVM_S390_VM_MIGRATION_STATUS (r/o)
+
+Allows userspace to query the status of migration mode.
+
+Parameters: address of a buffer in user space to store the data (u64) to;
+            the data itself is either 0 if migration mode is disabled or 1
+            if it is enabled
+Returns: -EFAULT if the given address is not accessible from kernel space
+         0 in case of success.
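A minimal sketch of flipping migration mode on from userspace (illustrative; vm_fd is assumed to be an open VM file descriptor), which is what makes the KVM_S390_GET_CMMA_BITS ioctl report dirty CMMA values:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Start migration mode so that PGSTE/CMMA changes are tracked. */
    static int start_migration_mode(int vm_fd)
    {
            struct kvm_device_attr attr = {
                    .group = KVM_S390_VM_MIGRATION,
                    .attr  = KVM_S390_VM_MIGRATION_START,
            };

            return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
    }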
@@ -179,6 +179,10 @@ Shadow pages contain the following information:
     shadow page; it is also used to go back from a struct kvm_mmu_page
     to a memslot, through the kvm_memslots_for_spte_role macro and
     __gfn_to_memslot.
+  role.ad_disabled:
+    Is 1 if the MMU instance cannot use A/D bits.  EPT did not have A/D
+    bits before Haswell; shadow EPT page tables also cannot use A/D bits
+    if the L1 hypervisor does not enable them.
   gfn:
     Either the guest page table containing the translations shadowed by this
     page, or the base page frame for linear translations. See role.direct.
@@ -0,0 +1,307 @@
+=================
+KVM VCPU Requests
+=================
+
+Overview
+========
+
+KVM supports an internal API enabling threads to request a VCPU thread to
+perform some activity.  For example, a thread may request a VCPU to flush
+its TLB with a VCPU request.  The API consists of the following functions::
+
+  /* Check if any requests are pending for VCPU @vcpu. */
+  bool kvm_request_pending(struct kvm_vcpu *vcpu);
+
+  /* Check if VCPU @vcpu has request @req pending. */
+  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);
+
+  /* Clear request @req for VCPU @vcpu. */
+  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);
+
+  /*
+   * Check if VCPU @vcpu has request @req pending. When the request is
+   * pending it will be cleared and a memory barrier, which pairs with
+   * another in kvm_make_request(), will be issued.
+   */
+  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
+
+  /*
+   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
+   * with another in kvm_check_request(), prior to setting the request.
+   */
+  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
+
+  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
+  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
+
+Typically a requester wants the VCPU to perform the activity as soon
+as possible after making the request.  This means most requests
+(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
+and kvm_make_all_cpus_request() has the kicking of all VCPUs built
+into it.
+
+VCPU Kicks
+----------
+
+The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
+order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
+a guest mode exit.  However, a VCPU thread may not be in guest mode at the
+time of the kick.  Therefore, depending on the mode and state of the VCPU
+thread, there are two other actions a kick may take.  All three actions
+are listed below:
+
+1) Send an IPI.  This forces a guest mode exit.
+2) Waking a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
+   mode that wait on waitqueues.  Waking them removes the threads from
+   the waitqueues, allowing the threads to run again.  This behavior
+   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
+3) Nothing.  When the VCPU is not in guest mode and the VCPU thread is not
+   sleeping, then there is nothing to do.
+
+VCPU Mode
+---------
+
+VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
+guest is running in guest mode or not, as well as some specific
+outside guest mode states.  The architecture may use ``vcpu->mode`` to
+ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
+as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
+even to ensure IPI acknowledgements are waited upon (see "Waiting for
+Acknowledgements").  The following modes are defined:
+
+OUTSIDE_GUEST_MODE
+
+  The VCPU thread is outside guest mode.
+
+IN_GUEST_MODE
+
+  The VCPU thread is in guest mode.
+
+EXITING_GUEST_MODE
+
+  The VCPU thread is transitioning from IN_GUEST_MODE to
+  OUTSIDE_GUEST_MODE.
+
+READING_SHADOW_PAGE_TABLES
+
+  The VCPU thread is outside guest mode, but it wants the sender of
+  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
+  thread is done reading the page tables.
+
+VCPU Request Internals
+======================
+
+VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
+This means general bitops, like those documented in [atomic-ops]_ could
+also be used, e.g. ::
+
+  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
+
+However, VCPU request users should refrain from doing so, as it would
+break the abstraction.  The first 8 bits are reserved for architecture
+independent requests, all additional bits are available for architecture
+dependent requests.
+
+Architecture Independent Requests
+---------------------------------
+
+KVM_REQ_TLB_FLUSH
+
+  KVM's common MMU notifier may need to flush all of a guest's TLB
+  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
+  choose to use the common kvm_flush_remote_tlbs() implementation will
+  need to handle this VCPU request.
+
+KVM_REQ_MMU_RELOAD
+
+  When shadow page tables are used and memory slots are removed it's
+  necessary to inform each VCPU to completely refresh the tables.  This
+  request is used for that.
+
+KVM_REQ_PENDING_TIMER
+
+  This request may be made from a timer handler run on the host on behalf
+  of a VCPU.  It informs the VCPU thread to inject a timer interrupt.
+
+KVM_REQ_UNHALT
+
+  This request may be made from the KVM common function kvm_vcpu_block(),
+  which is used to emulate an instruction that causes a CPU to halt until
+  one of an architectural specific set of events and/or interrupts is
+  received (determined by checking kvm_arch_vcpu_runnable()).  When that
+  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
+  in contrast to when kvm_vcpu_block() returns due to any other reason,
+  such as a pending signal, which does not indicate the VCPU's halt
+  emulation should stop, and therefore does not make the request.
+
+KVM_REQUEST_MASK
+----------------
+
+VCPU requests should be masked by KVM_REQUEST_MASK before using them with
+bitops.  This is because only the lower 8 bits are used to represent the
+request's number.  The upper bits are used as flags.  Currently only two
+flags are defined.
+
+VCPU Request Flags
+------------------
+
+KVM_REQUEST_NO_WAKEUP
+
+  This flag is applied to requests that only need immediate attention
+  from VCPUs running in guest mode.  That is, sleeping VCPUs do not need
+  to be awaken for these requests.  Sleeping VCPUs will handle the
+  requests when they are awaken later for some other reason.
+
+KVM_REQUEST_WAIT
+
+  When requests with this flag are made with kvm_make_all_cpus_request(),
+  then the caller will wait for each VCPU to acknowledge its IPI before
+  proceeding.  This flag only applies to VCPUs that would receive IPIs.
+  If, for example, the VCPU is sleeping, so no IPI is necessary, then
+  the requesting thread does not wait.  This means that this flag may be
+  safely combined with KVM_REQUEST_NO_WAKEUP.  See "Waiting for
+  Acknowledgements" for more information about requests with
+  KVM_REQUEST_WAIT.
+
+VCPU Requests with Associated State
+===================================
+
+Requesters that want the receiving VCPU to handle new state need to ensure
+the newly written state is observable to the receiving VCPU thread's CPU
+by the time it observes the request.  This means a write memory barrier
+must be inserted after writing the new state and before setting the VCPU
+request bit.  Additionally, on the receiving VCPU thread's side, a
+corresponding read barrier must be inserted after reading the request bit
+and before proceeding to read the new state associated with it.  See
+scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
+[memory-barriers]_.
+
+The pair of functions, kvm_check_request() and kvm_make_request(), provide
+the memory barriers, allowing this requirement to be handled internally by
+the API.
+
+Ensuring Requests Are Seen
+==========================
+
+When making requests to VCPUs, we want to avoid the receiving VCPU
+executing in guest mode for an arbitrary long time without handling the
+request.  We can be sure this won't happen as long as we ensure the VCPU
+thread checks kvm_request_pending() before entering guest mode and that a
+kick will send an IPI to force an exit from guest mode when necessary.
+Extra care must be taken to cover the period after the VCPU thread's last
+kvm_request_pending() check and before it has entered guest mode, as kick
+IPIs will only trigger guest mode exits for VCPU threads that are in guest
+mode or at least have already disabled interrupts in order to prepare to
+enter guest mode.  This means that an optimized implementation (see "IPI
+Reduction") must be certain when it's safe to not send the IPI.  One
+solution, which all architectures except s390 apply, is to:
+
+- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
+  the last kvm_request_pending() check;
+- enable interrupts atomically when entering the guest.
+
+This solution also requires memory barriers to be placed carefully in both
+the requesting thread and the receiving VCPU.  With the memory barriers we
+can exclude the possibility of a VCPU thread observing
+!kvm_request_pending() on its last check and then not receiving an IPI for
+the next request made of it, even if the request is made immediately after
+the check.  This is done by way of the Dekker memory barrier pattern
+(scenario 10 of [lwn-mb]_).  As the Dekker pattern requires two variables,
+this solution pairs ``vcpu->mode`` with ``vcpu->requests``.  Substituting
+them into the pattern gives::
+
+  CPU1                                    CPU2
+  =================                       =================
+  local_irq_disable();
+  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
+  smp_mb();                               smp_mb();
+  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
+                                              IN_GUEST_MODE) {
+      ...abort guest entry...                 ...send IPI...
+  }                                       }
+
+As stated above, the IPI is only useful for VCPU threads in guest mode or
+that have already disabled interrupts.  This is why this specific case of
+the Dekker pattern has been extended to disable interrupts before setting
+``vcpu->mode`` to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
+pedantically implement the memory barrier pattern, guaranteeing the
+compiler doesn't interfere with ``vcpu->mode``'s carefully planned
+accesses.
+
+IPI Reduction
+-------------
+
+As only one IPI is needed to get a VCPU to check for any/all requests,
+then they may be coalesced.  This is easily done by having the first IPI
+sending kick also change the VCPU mode to something !IN_GUEST_MODE.  The
+transitional state, EXITING_GUEST_MODE, is used for this purpose.
+
+Waiting for Acknowledgements
+----------------------------
+
+Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
+be sent, and the acknowledgements to be waited upon, even when the target
+VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
+is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
+is set after disabling interrupts.  To support these cases, the
+KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
+checking that the VCPU is IN_GUEST_MODE to checking that it is not
+OUTSIDE_GUEST_MODE.
+
+Request-less VCPU Kicks
+-----------------------
+
+As the determination of whether or not to send an IPI depends on the
+two-variable Dekker memory barrier pattern, then it's clear that
+request-less VCPU kicks are almost never correct.  Without the assurance
+that a non-IPI generating kick will still result in an action by the
+receiving VCPU, as the final kvm_request_pending() check does for
+request-accompanying kicks, then the kick may not do anything useful at
+all.  If, for instance, a request-less kick was made to a VCPU that was
+just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
+the VCPU thread may continue its entry without actually having done
+whatever it was the kick was meant to initiate.
+
+One exception is x86's posted interrupt mechanism.  In this case, however,
+even the request-less VCPU kick is coupled with the same
+local_irq_disable() + smp_mb() pattern described above; the ON bit
+(Outstanding Notification) in the posted interrupt descriptor takes the
+role of ``vcpu->requests``.  When sending a posted interrupt, PIR.ON is
+set before reading ``vcpu->mode``; dually, in the VCPU thread,
+vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
+IN_GUEST_MODE.
+
+Additional Considerations
+=========================
+
+Sleeping VCPUs
+--------------
+
+VCPU threads may need to consider requests before and/or after calling
+functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
+do or not, and, if they do, which requests need consideration, is
+architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
+to check if it should awaken.  One reason to do so is to provide
+architectures a function where requests may be checked if necessary.
+
+Clearing Requests
+-----------------
+
+Generally it only makes sense for the receiving VCPU thread to clear a
+request.  However, in some circumstances, such as when the requesting
+thread and the receiving VCPU thread are executed serially, such as when
+they are the same thread, or when they are using some form of concurrency
+control to temporarily execute synchronously, then it's possible to know
+that the request may be cleared immediately, rather than waiting for the
+receiving VCPU thread to handle the request in VCPU RUN.  The only current
+examples of this are kvm_vcpu_block() calls made by VCPUs to block
+themselves.  A possible side-effect of that call is to make the
+KVM_REQ_UNHALT request, which may then be cleared immediately when the
+VCPU returns from the call.
+
+References
+==========
+
+.. [atomic-ops] Documentation/core-api/atomic_ops.rst
+.. [memory-barriers] Documentation/memory-barriers.txt
+.. [lwn-mb] https://lwn.net/Articles/573436/
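To tie the request API and the kick together, here is a kernel-style sketch of both sides of the pattern the new document describes (illustrative only: KVM_REQ_EXAMPLE, the example_data field and do_example_work() are made-up names, not symbols from the tree):

    /* Hypothetical architecture-specific request number. */
    #define KVM_REQ_EXAMPLE	KVM_ARCH_REQ(2)

    /* Requester side: publish the new state, then make the request and kick. */
    static void post_example_work(struct kvm_vcpu *vcpu, u64 data)
    {
            vcpu->arch.example_data = data;           /* hypothetical per-arch field */
            kvm_make_request(KVM_REQ_EXAMPLE, vcpu);  /* barrier pairs with the check side */
            kvm_vcpu_kick(vcpu);                      /* force a guest exit if needed */
    }

    /* VCPU-run side: consume pending requests before (re)entering guest mode. */
    static void check_example_requests(struct kvm_vcpu *vcpu)
    {
            if (!kvm_request_pending(vcpu))
                    return;

            if (kvm_check_request(KVM_REQ_EXAMPLE, vcpu))
                    do_example_work(vcpu, vcpu->arch.example_data);  /* hypothetical handler */
    }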
@@ -7350,7 +7350,7 @@ F:	arch/powerpc/kvm/
 
 KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
 M:	Christian Borntraeger <borntraeger@de.ibm.com>
-M:	Cornelia Huck <cornelia.huck@de.ibm.com>
+M:	Cornelia Huck <cohuck@redhat.com>
 L:	linux-s390@vger.kernel.org
 W:	http://www.ibm.com/developerworks/linux/linux390/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
@@ -11268,7 +11268,7 @@ S:	Supported
 F:	drivers/iommu/s390-iommu.c
 
 S390 VFIO-CCW DRIVER
-M:	Cornelia Huck <cornelia.huck@de.ibm.com>
+M:	Cornelia Huck <cohuck@redhat.com>
 M:	Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
 L:	linux-s390@vger.kernel.org
 L:	kvm@vger.kernel.org
@@ -13814,7 +13814,7 @@ F:	include/uapi/linux/virtio_*.h
 F:	drivers/crypto/virtio/
 
 VIRTIO DRIVERS FOR S390
-M:	Cornelia Huck <cornelia.huck@de.ibm.com>
+M:	Cornelia Huck <cohuck@redhat.com>
 M:	Halil Pasic <pasic@linux.vnet.ibm.com>
 L:	linux-s390@vger.kernel.org
 L:	virtualization@lists.linux-foundation.org
@@ -44,7 +44,9 @@
 #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
 #endif
 
-#define KVM_REQ_VCPU_EXIT	(8 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_SLEEP \
+	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
 
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 int __attribute_const__ kvm_target_cpu(void);
@@ -233,8 +235,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
-void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
-void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
 
 int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
 unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
@@ -291,20 +291,12 @@ static inline void kvm_arm_init_debug(void) {}
 static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
-static inline int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
-					     struct kvm_device_attr *attr)
-{
-	return -ENXIO;
-}
-static inline int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
-					     struct kvm_device_attr *attr)
-{
-	return -ENXIO;
-}
-static inline int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
-					     struct kvm_device_attr *attr)
-{
-	return -ENXIO;
-}
+int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr);
+int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr);
+int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr);
 
 #endif /* __ARM_KVM_HOST_H__ */
@@ -203,6 +203,14 @@ struct kvm_arch_memory_slot {
 #define   KVM_DEV_ARM_VGIC_LINE_LEVEL_INTID_MASK 0x3ff
 #define   VGIC_LEVEL_INFO_LINE_LEVEL	0
 
+/* Device Control API on vcpu fd */
+#define KVM_ARM_VCPU_PMU_V3_CTRL	0
+#define   KVM_ARM_VCPU_PMU_V3_IRQ	0
+#define   KVM_ARM_VCPU_PMU_V3_INIT	1
+#define KVM_ARM_VCPU_TIMER_CTRL		1
+#define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER	0
+#define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER	1
+
 #define KVM_DEV_ARM_VGIC_CTRL_INIT	0
 #define KVM_DEV_ARM_ITS_SAVE_TABLES	1
 #define KVM_DEV_ARM_ITS_RESTORE_TABLES	2
@@ -301,3 +301,54 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 {
 	return -EINVAL;
 }
+
+int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr)
+{
+	int ret;
+
+	switch (attr->group) {
+	case KVM_ARM_VCPU_TIMER_CTRL:
+		ret = kvm_arm_timer_set_attr(vcpu, attr);
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+
+	return ret;
+}
+
+int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr)
+{
+	int ret;
+
+	switch (attr->group) {
+	case KVM_ARM_VCPU_TIMER_CTRL:
+		ret = kvm_arm_timer_get_attr(vcpu, attr);
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+
+	return ret;
+}
+
+int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr)
+{
+	int ret;
+
+	switch (attr->group) {
+	case KVM_ARM_VCPU_TIMER_CTRL:
+		ret = kvm_arm_timer_has_attr(vcpu, attr);
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+
+	return ret;
+}
@@ -72,6 +72,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		trace_kvm_wfx(*vcpu_pc(vcpu), false);
 		vcpu->stat.wfi_exit_stat++;
 		kvm_vcpu_block(vcpu);
+		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
@@ -237,8 +237,10 @@ void __hyp_text __noreturn __hyp_panic(int cause)
 
 		vcpu = (struct kvm_vcpu *)read_sysreg(HTPIDR);
 		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+		__timer_save_state(vcpu);
 		__deactivate_traps(vcpu);
 		__deactivate_vm(vcpu);
+		__banked_restore_state(host_ctxt);
 		__sysreg_restore_state(host_ctxt);
 	}
 
@@ -37,16 +37,6 @@ static struct kvm_regs cortexa_regs_reset = {
 	.usr_regs.ARM_cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
 };
 
-static const struct kvm_irq_level cortexa_ptimer_irq = {
-	{ .irq = 30 },
-	.level = 1,
-};
-
-static const struct kvm_irq_level cortexa_vtimer_irq = {
-	{ .irq = 27 },
-	.level = 1,
-};
-
 
 /*******************************************************************************
  * Exported reset function
@@ -62,16 +52,12 @@ static const struct kvm_irq_level cortexa_vtimer_irq = {
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct kvm_regs *reset_regs;
-	const struct kvm_irq_level *cpu_vtimer_irq;
-	const struct kvm_irq_level *cpu_ptimer_irq;
 
 	switch (vcpu->arch.target) {
 	case KVM_ARM_TARGET_CORTEX_A7:
 	case KVM_ARM_TARGET_CORTEX_A15:
 		reset_regs = &cortexa_regs_reset;
 		vcpu->arch.midr = read_cpuid_id();
-		cpu_vtimer_irq = &cortexa_vtimer_irq;
-		cpu_ptimer_irq = &cortexa_ptimer_irq;
 		break;
 	default:
 		return -ENODEV;
@@ -84,5 +70,5 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	kvm_reset_coprocs(vcpu);
 
 	/* Reset arch_timer context */
-	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq, cpu_ptimer_irq);
+	return kvm_timer_vcpu_reset(vcpu);
 }
@@ -488,6 +488,17 @@ config CAVIUM_ERRATUM_27456
 
 	  If unsure, say Y.
 
+config CAVIUM_ERRATUM_30115
+	bool "Cavium erratum 30115: Guest may disable interrupts in host"
+	default y
+	help
+	  On ThunderX T88 pass 1.x through 2.2, T81 pass 1.0 through
+	  1.2, and T83 Pass 1.0, KVM guest execution may disable
+	  interrupts in host. Trapping both GICv3 group-0 and group-1
+	  accesses sidesteps the issue.
+
+	  If unsure, say Y.
+
 config QCOM_FALKOR_ERRATUM_1003
 	bool "Falkor E1003: Incorrect translation due to ASID change"
 	default y
@@ -89,7 +89,7 @@ static inline void gic_write_ctlr(u32 val)
 
 static inline void gic_write_grpen1(u32 val)
 {
-	write_sysreg_s(val, SYS_ICC_GRPEN1_EL1);
+	write_sysreg_s(val, SYS_ICC_IGRPEN1_EL1);
 	isb();
 }
 
@@ -38,7 +38,8 @@
 #define ARM64_WORKAROUND_REPEAT_TLBI		17
 #define ARM64_WORKAROUND_QCOM_FALKOR_E1003	18
 #define ARM64_WORKAROUND_858921			19
+#define ARM64_WORKAROUND_CAVIUM_30115		20
 
-#define ARM64_NCAPS				20
+#define ARM64_NCAPS				21
 
 #endif /* __ASM_CPUCAPS_H */
@@ -86,6 +86,7 @@
 
 #define CAVIUM_CPU_PART_THUNDERX	0x0A1
 #define CAVIUM_CPU_PART_THUNDERX_81XX	0x0A2
+#define CAVIUM_CPU_PART_THUNDERX_83XX	0x0A3
 
 #define BRCM_CPU_PART_VULCAN		0x516
 
@@ -96,6 +97,7 @@
 #define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73)
 #define MIDR_THUNDERX	MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
+#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
 #define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
 
 #ifndef __ASSEMBLY__
@@ -19,6 +19,7 @@
 #define __ASM_ESR_H
 
 #include <asm/memory.h>
+#include <asm/sysreg.h>
 
 #define ESR_ELx_EC_UNKNOWN	(0x00)
 #define ESR_ELx_EC_WFx		(0x01)
@@ -182,6 +183,29 @@
 #define ESR_ELx_SYS64_ISS_SYS_CNTFRQ	(ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 0, 14, 0) | \
 					 ESR_ELx_SYS64_ISS_DIR_READ)
 
+#define esr_sys64_to_sysreg(e)					\
+	sys_reg((((e) & ESR_ELx_SYS64_ISS_OP0_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_OP0_SHIFT),			\
+		(((e) & ESR_ELx_SYS64_ISS_OP1_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_OP1_SHIFT),			\
+		(((e) & ESR_ELx_SYS64_ISS_CRN_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_CRN_SHIFT),			\
+		(((e) & ESR_ELx_SYS64_ISS_CRM_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_CRM_SHIFT),			\
+		(((e) & ESR_ELx_SYS64_ISS_OP2_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_OP2_SHIFT))
+
+#define esr_cp15_to_sysreg(e)					\
+	sys_reg(3,						\
+		(((e) & ESR_ELx_SYS64_ISS_OP1_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_OP1_SHIFT),			\
+		(((e) & ESR_ELx_SYS64_ISS_CRN_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_CRN_SHIFT),			\
+		(((e) & ESR_ELx_SYS64_ISS_CRM_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_CRM_SHIFT),			\
+		(((e) & ESR_ELx_SYS64_ISS_OP2_MASK) >>		\
+		 ESR_ELx_SYS64_ISS_OP2_SHIFT))
+
 #ifndef __ASSEMBLY__
 #include <asm/types.h>
 
@@ -42,7 +42,9 @@
 
 #define KVM_VCPU_MAX_FEATURES 4
 
-#define KVM_REQ_VCPU_EXIT	(8 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_SLEEP \
+	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
@@ -334,8 +336,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
-void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
-void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
 
 u64 __kvm_call_hyp(void *hypfn, ...);
 #define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
@@ -127,6 +127,7 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __timer_save_state(struct kvm_vcpu *vcpu);
 void __timer_restore_state(struct kvm_vcpu *vcpu);
@@ -180,14 +180,31 @@
 
 #define SYS_VBAR_EL1			sys_reg(3, 0, 12, 0, 0)
 
+#define SYS_ICC_IAR0_EL1		sys_reg(3, 0, 12, 8, 0)
+#define SYS_ICC_EOIR0_EL1		sys_reg(3, 0, 12, 8, 1)
+#define SYS_ICC_HPPIR0_EL1		sys_reg(3, 0, 12, 8, 2)
+#define SYS_ICC_BPR0_EL1		sys_reg(3, 0, 12, 8, 3)
+#define SYS_ICC_AP0Rn_EL1(n)		sys_reg(3, 0, 12, 8, 4 | n)
+#define SYS_ICC_AP0R0_EL1		SYS_ICC_AP0Rn_EL1(0)
+#define SYS_ICC_AP0R1_EL1		SYS_ICC_AP0Rn_EL1(1)
+#define SYS_ICC_AP0R2_EL1		SYS_ICC_AP0Rn_EL1(2)
+#define SYS_ICC_AP0R3_EL1		SYS_ICC_AP0Rn_EL1(3)
+#define SYS_ICC_AP1Rn_EL1(n)		sys_reg(3, 0, 12, 9, n)
+#define SYS_ICC_AP1R0_EL1		SYS_ICC_AP1Rn_EL1(0)
+#define SYS_ICC_AP1R1_EL1		SYS_ICC_AP1Rn_EL1(1)
+#define SYS_ICC_AP1R2_EL1		SYS_ICC_AP1Rn_EL1(2)
+#define SYS_ICC_AP1R3_EL1		SYS_ICC_AP1Rn_EL1(3)
 #define SYS_ICC_DIR_EL1			sys_reg(3, 0, 12, 11, 1)
+#define SYS_ICC_RPR_EL1			sys_reg(3, 0, 12, 11, 3)
 #define SYS_ICC_SGI1R_EL1		sys_reg(3, 0, 12, 11, 5)
 #define SYS_ICC_IAR1_EL1		sys_reg(3, 0, 12, 12, 0)
 #define SYS_ICC_EOIR1_EL1		sys_reg(3, 0, 12, 12, 1)
+#define SYS_ICC_HPPIR1_EL1		sys_reg(3, 0, 12, 12, 2)
 #define SYS_ICC_BPR1_EL1		sys_reg(3, 0, 12, 12, 3)
 #define SYS_ICC_CTLR_EL1		sys_reg(3, 0, 12, 12, 4)
 #define SYS_ICC_SRE_EL1			sys_reg(3, 0, 12, 12, 5)
-#define SYS_ICC_GRPEN1_EL1		sys_reg(3, 0, 12, 12, 7)
+#define SYS_ICC_IGRPEN0_EL1		sys_reg(3, 0, 12, 12, 6)
+#define SYS_ICC_IGRPEN1_EL1		sys_reg(3, 0, 12, 12, 7)
 
 #define SYS_CONTEXTIDR_EL1		sys_reg(3, 0, 13, 0, 1)
 #define SYS_TPIDR_EL1			sys_reg(3, 0, 13, 0, 4)
@@ -287,8 +304,8 @@
 #define SCTLR_ELx_M	1
 
 #define SCTLR_EL2_RES1	((1 << 4)  | (1 << 5)  | (1 << 11) | (1 << 16) | \
-			 (1 << 16) | (1 << 18) | (1 << 22) | (1 << 23) | \
-			 (1 << 28) | (1 << 29))
+			 (1 << 18) | (1 << 22) | (1 << 23) | (1 << 28) | \
+			 (1 << 29))
 
 #define SCTLR_ELx_FLAGS	(SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | \
			 SCTLR_ELx_SA | SCTLR_ELx_I)
@@ -232,6 +232,9 @@ struct kvm_arch_memory_slot {
 #define KVM_ARM_VCPU_PMU_V3_CTRL	0
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
+#define KVM_ARM_VCPU_TIMER_CTRL		1
+#define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER	0
+#define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER	1
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT		24
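The KVM_ARM_VCPU_TIMER_CTRL attribute group added above is what lets userspace pick the guest timer interrupt numbers instead of relying on the built-in defaults. The following is a minimal sketch (not taken from the patch) of how a VMM might drive it through the vcpu-fd device-attribute ioctls; the fd name and the chosen PPI number are illustrative assumptions.

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	/* Sketch: select the virtual timer PPI for one vcpu (hypothetical vcpu_fd). */
	static int set_vtimer_irq(int vcpu_fd, int irq /* a PPI, e.g. 27 */)
	{
		struct kvm_device_attr attr = {
			.group = KVM_ARM_VCPU_TIMER_CTRL,
			.attr  = KVM_ARM_VCPU_TIMER_IRQ_VTIMER,
			.addr  = (__u64)(unsigned long)&irq,
		};

		/* Probe first so old kernels can be detected gracefully. */
		if (ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, &attr))
			return -1;
		return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
	}

The same pattern with KVM_ARM_VCPU_TIMER_IRQ_PTIMER selects the physical timer interrupt; both have to be programmed before the vcpu first runs.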
@@ -132,6 +132,27 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.capability = ARM64_WORKAROUND_CAVIUM_27456,
 		MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x00),
 	},
+#endif
+#ifdef CONFIG_CAVIUM_ERRATUM_30115
+	{
+		/* Cavium ThunderX, T88 pass 1.x - 2.2 */
+		.desc = "Cavium erratum 30115",
+		.capability = ARM64_WORKAROUND_CAVIUM_30115,
+		MIDR_RANGE(MIDR_THUNDERX, 0x00,
+			   (1 << MIDR_VARIANT_SHIFT) | 2),
+	},
+	{
+		/* Cavium ThunderX, T81 pass 1.0 - 1.2 */
+		.desc = "Cavium erratum 30115",
+		.capability = ARM64_WORKAROUND_CAVIUM_30115,
+		MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x02),
+	},
+	{
+		/* Cavium ThunderX, T83 pass 1.0 */
+		.desc = "Cavium erratum 30115",
+		.capability = ARM64_WORKAROUND_CAVIUM_30115,
+		MIDR_RANGE(MIDR_THUNDERX_83XX, 0x00, 0x00),
+	},
 #endif
 	{
 		.desc = "Mismatched cache line size",
@@ -390,6 +390,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PMU_V3_CTRL:
 		ret = kvm_arm_pmu_v3_set_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_TIMER_CTRL:
+		ret = kvm_arm_timer_set_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -407,6 +410,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PMU_V3_CTRL:
 		ret = kvm_arm_pmu_v3_get_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_TIMER_CTRL:
+		ret = kvm_arm_timer_get_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -424,6 +430,9 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PMU_V3_CTRL:
 		ret = kvm_arm_pmu_v3_has_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_TIMER_CTRL:
+		ret = kvm_arm_timer_has_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -89,6 +89,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
 		vcpu->stat.wfi_exit_stat++;
 		kvm_vcpu_block(vcpu);
+		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
@@ -350,6 +350,20 @@ again:
 		}
 	}
 
+	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
+	    exit_code == ARM_EXCEPTION_TRAP &&
+	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
+	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
+		int ret = __vgic_v3_perform_cpuif_access(vcpu);
+
+		if (ret == 1) {
+			__skip_instr(vcpu);
+			goto again;
+		}
+
+		/* 0 falls through to be handled out of EL2 */
+	}
+
 	fp_enabled = __fpsimd_enabled();
 
 	__sysreg_save_guest_state(guest_ctxt);
@@ -422,6 +436,7 @@ void __hyp_text __noreturn __hyp_panic(void)
 
 		vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2);
 		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+		__timer_save_state(vcpu);
 		__deactivate_traps(vcpu);
 		__deactivate_vm(vcpu);
 		__sysreg_restore_host_state(host_ctxt);
@@ -46,16 +46,6 @@ static const struct kvm_regs default_regs_reset32 = {
 			COMPAT_PSR_I_BIT | COMPAT_PSR_F_BIT),
 };
 
-static const struct kvm_irq_level default_ptimer_irq = {
-	.irq	= 30,
-	.level	= 1,
-};
-
-static const struct kvm_irq_level default_vtimer_irq = {
-	.irq	= 27,
-	.level	= 1,
-};
-
 static bool cpu_has_32bit_el1(void)
 {
 	u64 pfr0;
@@ -108,8 +98,6 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
  */
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
-	const struct kvm_irq_level *cpu_vtimer_irq;
-	const struct kvm_irq_level *cpu_ptimer_irq;
 	const struct kvm_regs *cpu_reset;
 
 	switch (vcpu->arch.target) {
@@ -122,8 +110,6 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 			cpu_reset = &default_regs_reset;
 		}
 
-		cpu_vtimer_irq = &default_vtimer_irq;
-		cpu_ptimer_irq = &default_ptimer_irq;
 		break;
 	}
 
@@ -137,5 +123,5 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	kvm_pmu_vcpu_reset(vcpu);
 
 	/* Reset timer */
-	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq, cpu_ptimer_irq);
+	return kvm_timer_vcpu_reset(vcpu);
 }
@@ -56,7 +56,8 @@
  */
 
 static bool read_from_write_only(struct kvm_vcpu *vcpu,
-				 const struct sys_reg_params *params)
+				 struct sys_reg_params *params,
+				 const struct sys_reg_desc *r)
 {
 	WARN_ONCE(1, "Unexpected sys_reg read to write-only register\n");
 	print_sys_reg_instr(params);
@@ -64,6 +65,16 @@ static bool read_from_write_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static bool write_to_read_only(struct kvm_vcpu *vcpu,
+			       struct sys_reg_params *params,
+			       const struct sys_reg_desc *r)
+{
+	WARN_ONCE(1, "Unexpected sys_reg write to read-only register\n");
+	print_sys_reg_instr(params);
+	kvm_inject_undefined(vcpu);
+	return false;
+}
+
 /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
 static u32 cache_levels;
 
@@ -93,7 +104,7 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
 			const struct sys_reg_desc *r)
 {
 	if (!p->is_write)
-		return read_from_write_only(vcpu, p);
+		return read_from_write_only(vcpu, p, r);
 
 	kvm_set_way_flush(vcpu);
 	return true;
@@ -135,7 +146,7 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
 			   const struct sys_reg_desc *r)
 {
 	if (!p->is_write)
-		return read_from_write_only(vcpu, p);
+		return read_from_write_only(vcpu, p, r);
 
 	vgic_v3_dispatch_sgi(vcpu, p->regval);
 
@@ -773,7 +784,7 @@ static bool access_pmswinc(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 		return trap_raz_wi(vcpu, p, r);
 
 	if (!p->is_write)
-		return read_from_write_only(vcpu, p);
+		return read_from_write_only(vcpu, p, r);
 
 	if (pmu_write_swinc_el0_disabled(vcpu))
 		return false;
@@ -953,7 +964,15 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	{ SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
 
+	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
+	{ SYS_DESC(SYS_ICC_EOIR0_EL1), read_from_write_only },
+	{ SYS_DESC(SYS_ICC_HPPIR0_EL1), write_to_read_only },
+	{ SYS_DESC(SYS_ICC_DIR_EL1), read_from_write_only },
+	{ SYS_DESC(SYS_ICC_RPR_EL1), write_to_read_only },
 	{ SYS_DESC(SYS_ICC_SGI1R_EL1), access_gic_sgi },
+	{ SYS_DESC(SYS_ICC_IAR1_EL1), write_to_read_only },
+	{ SYS_DESC(SYS_ICC_EOIR1_EL1), read_from_write_only },
+	{ SYS_DESC(SYS_ICC_HPPIR1_EL1), write_to_read_only },
 	{ SYS_DESC(SYS_ICC_SRE_EL1), access_gic_sre },
 
 	{ SYS_DESC(SYS_CONTEXTIDR_EL1), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
@@ -268,36 +268,21 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return true;
 }
 static const struct sys_reg_desc gic_v3_icc_reg_descs[] = {
-	/* ICC_PMR_EL1 */
-	{ Op0(3), Op1(0), CRn(4), CRm(6), Op2(0), access_gic_pmr },
-	/* ICC_BPR0_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(3), access_gic_bpr0 },
-	/* ICC_AP0R0_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(4), access_gic_ap0r },
-	/* ICC_AP0R1_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(5), access_gic_ap0r },
-	/* ICC_AP0R2_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(6), access_gic_ap0r },
-	/* ICC_AP0R3_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(8), Op2(7), access_gic_ap0r },
-	/* ICC_AP1R0_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(9), Op2(0), access_gic_ap1r },
-	/* ICC_AP1R1_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(9), Op2(1), access_gic_ap1r },
-	/* ICC_AP1R2_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(9), Op2(2), access_gic_ap1r },
-	/* ICC_AP1R3_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(9), Op2(3), access_gic_ap1r },
-	/* ICC_BPR1_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(3), access_gic_bpr1 },
-	/* ICC_CTLR_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(4), access_gic_ctlr },
-	/* ICC_SRE_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(5), access_gic_sre },
-	/* ICC_IGRPEN0_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(6), access_gic_grpen0 },
-	/* ICC_GRPEN1_EL1 */
-	{ Op0(3), Op1(0), CRn(12), CRm(12), Op2(7), access_gic_grpen1 },
+	{ SYS_DESC(SYS_ICC_PMR_EL1), access_gic_pmr },
+	{ SYS_DESC(SYS_ICC_BPR0_EL1), access_gic_bpr0 },
+	{ SYS_DESC(SYS_ICC_AP0R0_EL1), access_gic_ap0r },
+	{ SYS_DESC(SYS_ICC_AP0R1_EL1), access_gic_ap0r },
+	{ SYS_DESC(SYS_ICC_AP0R2_EL1), access_gic_ap0r },
+	{ SYS_DESC(SYS_ICC_AP0R3_EL1), access_gic_ap0r },
+	{ SYS_DESC(SYS_ICC_AP1R0_EL1), access_gic_ap1r },
+	{ SYS_DESC(SYS_ICC_AP1R1_EL1), access_gic_ap1r },
+	{ SYS_DESC(SYS_ICC_AP1R2_EL1), access_gic_ap1r },
+	{ SYS_DESC(SYS_ICC_AP1R3_EL1), access_gic_ap1r },
+	{ SYS_DESC(SYS_ICC_BPR1_EL1), access_gic_bpr1 },
+	{ SYS_DESC(SYS_ICC_CTLR_EL1), access_gic_ctlr },
+	{ SYS_DESC(SYS_ICC_SRE_EL1), access_gic_sre },
+	{ SYS_DESC(SYS_ICC_IGRPEN0_EL1), access_gic_grpen0 },
+	{ SYS_DESC(SYS_ICC_IGRPEN1_EL1), access_gic_grpen1 },
 };
 
 int vgic_v3_has_cpu_sysregs_attr(struct kvm_vcpu *vcpu, bool is_write, u64 id,
@@ -1094,7 +1094,7 @@ static void kvm_trap_emul_check_requests(struct kvm_vcpu *vcpu, int cpu,
 	struct mm_struct *mm;
 	int i;
 
-	if (likely(!vcpu->requests))
+	if (likely(!kvm_request_pending(vcpu)))
 		return;
 
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
@@ -2337,7 +2337,7 @@ static int kvm_vz_check_requests(struct kvm_vcpu *vcpu, int cpu)
 	int ret = 0;
 	int i;
 
-	if (!vcpu->requests)
+	if (!kvm_request_pending(vcpu))
 		return 0;
 
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
@@ -86,7 +86,6 @@ struct kvmppc_vcore {
 	u16 last_cpu;
 	u8 vcore_state;
 	u8 in_guest;
-	struct kvmppc_vcore *master_vcore;
 	struct kvm_vcpu *runnable_threads[MAX_SMT_THREADS];
 	struct list_head preempt_list;
 	spinlock_t lock;
@@ -81,7 +81,7 @@ struct kvm_split_mode {
 	u8 subcore_size;
 	u8 do_nap;
 	u8 napped[MAX_SMT_THREADS];
-	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+	struct kvmppc_vcore *vc[MAX_SUBCORES];
 };
 
 /*
@@ -35,6 +35,7 @@
 #include <asm/page.h>
 #include <asm/cacheflush.h>
 #include <asm/hvcall.h>
+#include <asm/mce.h>
 
 #define KVM_MAX_VCPUS		NR_CPUS
 #define KVM_MAX_VCORES		NR_CPUS
@@ -52,8 +53,8 @@
 #define KVM_IRQCHIP_NUM_PINS     256
 
 /* PPC-specific vcpu->requests bit members */
-#define KVM_REQ_WATCHDOG           8
-#define KVM_REQ_EPR_EXIT           9
+#define KVM_REQ_WATCHDOG	KVM_ARCH_REQ(0)
+#define KVM_REQ_EPR_EXIT	KVM_ARCH_REQ(1)
 
 #include <linux/mmu_notifier.h>
 
@@ -267,6 +268,8 @@ struct kvm_resize_hpt;
 
 struct kvm_arch {
 	unsigned int lpid;
+	unsigned int smt_mode;		/* # vcpus per virtual core */
+	unsigned int emul_smt_mode;	/* emualted SMT mode, on P9 */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	unsigned int tlb_sets;
 	struct kvm_hpt_info hpt;
@@ -285,6 +288,7 @@ struct kvm_arch {
 	cpumask_t need_tlb_flush;
 	cpumask_t cpu_in_guest;
 	u8 radix;
+	u8 fwnmi_enabled;
 	pgd_t *pgtable;
 	u64 process_table;
 	struct dentry *debugfs_dir;
@@ -566,6 +570,7 @@ struct kvm_vcpu_arch {
 	ulong wort;
 	ulong tid;
 	ulong psscr;
+	ulong hfscr;
 	ulong shadow_srr1;
 #endif
 	u32 vrsave; /* also USPRG0 */
@@ -579,7 +584,7 @@ struct kvm_vcpu_arch {
 	ulong mcsrr0;
 	ulong mcsrr1;
 	ulong mcsr;
-	u32 dec;
+	ulong dec;
 #ifdef CONFIG_BOOKE
 	u32 decar;
 #endif
@@ -710,6 +715,7 @@ struct kvm_vcpu_arch {
 	unsigned long pending_exceptions;
 	u8 ceded;
 	u8 prodded;
+	u8 doorbell_request;
 	u32 last_inst;
 
 	struct swait_queue_head *wqp;
@@ -722,6 +728,7 @@ struct kvm_vcpu_arch {
 	int prev_cpu;
 	bool timer_running;
 	wait_queue_head_t cpu_run;
+	struct machine_check_event mce_evt; /* Valid if trap == 0x200 */
 
 	struct kvm_vcpu_arch_shared *shared;
 #if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
@@ -315,6 +315,8 @@ struct kvmppc_ops {
 			      struct irq_bypass_producer *);
 	int (*configure_mmu)(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg);
 	int (*get_rmmu_info)(struct kvm *kvm, struct kvm_ppc_rmmu_info *info);
+	int (*set_smt_mode)(struct kvm *kvm, unsigned long mode,
+			    unsigned long flags);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
@@ -103,6 +103,8 @@
 #define OP_31_XOP_STBUX     247
 #define OP_31_XOP_LHZX      279
 #define OP_31_XOP_LHZUX     311
+#define OP_31_XOP_MSGSNDP   142
+#define OP_31_XOP_MSGCLRP   174
 #define OP_31_XOP_MFSPR     339
 #define OP_31_XOP_LWAX      341
 #define OP_31_XOP_LHAX      343
@@ -60,6 +60,12 @@ struct kvm_regs {
 
 #define KVM_SREGS_E_FSL_PIDn   (1 << 0) /* PID1/PID2 */
 
+/* flags for kvm_run.flags */
+#define KVM_RUN_PPC_NMI_DISP_MASK		(3 << 0)
+#define   KVM_RUN_PPC_NMI_DISP_FULLY_RECOV	(1 << 0)
+#define   KVM_RUN_PPC_NMI_DISP_LIMITED_RECOV	(2 << 0)
+#define   KVM_RUN_PPC_NMI_DISP_NOT_RECOV	(3 << 0)
+
 /*
  * Feature bits indicate which sections of the sregs struct are valid,
  * both in KVM_GET_SREGS and KVM_SET_SREGS.  On KVM_SET_SREGS, registers
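With FWNMI enabled, a guest machine check on HV KVM is now reported to userspace as a KVM_EXIT_NMI exit and the new kvm_run flag bits above carry the disposition. A rough sketch (assumed VMM code, not from the patch) of how the run loop could consume it:

	/* Sketch: "run" is the mmap'ed struct kvm_run for the vcpu. */
	static void handle_exit_nmi(struct kvm_run *run)
	{
		switch (run->flags & KVM_RUN_PPC_NMI_DISP_MASK) {
		case KVM_RUN_PPC_NMI_DISP_FULLY_RECOV:
			/* error fully recovered by host/firmware; guest may continue */
			break;
		case KVM_RUN_PPC_NMI_DISP_LIMITED_RECOV:
		case KVM_RUN_PPC_NMI_DISP_NOT_RECOV:
		default:
			/* unrecovered machine check: log it and stop or reset the guest */
			break;
		}
	}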
@@ -485,6 +485,7 @@ int main(void)
 	OFFSET(KVM_ENABLED_HCALLS, kvm, arch.enabled_hcalls);
 	OFFSET(KVM_VRMA_SLB_V, kvm, arch.vrma_slb_v);
 	OFFSET(KVM_RADIX, kvm, arch.radix);
+	OFFSET(KVM_FWNMI, kvm, arch.fwnmi_enabled);
 	OFFSET(VCPU_DSISR, kvm_vcpu, arch.shregs.dsisr);
 	OFFSET(VCPU_DAR, kvm_vcpu, arch.shregs.dar);
 	OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr);
@@ -513,6 +514,7 @@ int main(void)
 	OFFSET(VCPU_PENDING_EXC, kvm_vcpu, arch.pending_exceptions);
 	OFFSET(VCPU_CEDED, kvm_vcpu, arch.ceded);
 	OFFSET(VCPU_PRODDED, kvm_vcpu, arch.prodded);
+	OFFSET(VCPU_DBELL_REQ, kvm_vcpu, arch.doorbell_request);
 	OFFSET(VCPU_MMCR, kvm_vcpu, arch.mmcr);
 	OFFSET(VCPU_PMC, kvm_vcpu, arch.pmc);
 	OFFSET(VCPU_SPMC, kvm_vcpu, arch.spmc);
@@ -542,6 +544,7 @@ int main(void)
 	OFFSET(VCPU_WORT, kvm_vcpu, arch.wort);
 	OFFSET(VCPU_TID, kvm_vcpu, arch.tid);
 	OFFSET(VCPU_PSSCR, kvm_vcpu, arch.psscr);
+	OFFSET(VCPU_HFSCR, kvm_vcpu, arch.hfscr);
 	OFFSET(VCORE_ENTRY_EXIT, kvmppc_vcore, entry_exit_map);
 	OFFSET(VCORE_IN_GUEST, kvmppc_vcore, in_guest);
 	OFFSET(VCORE_NAPPING_THREADS, kvmppc_vcore, napping_threads);
@@ -405,6 +405,7 @@ void machine_check_print_event_info(struct machine_check_event *evt,
 		break;
 	}
 }
+EXPORT_SYMBOL_GPL(machine_check_print_event_info);
 
 uint64_t get_mce_fault_addr(struct machine_check_event *evt)
 {
@@ -46,6 +46,8 @@
 #include <linux/of.h>
 
 #include <asm/reg.h>
+#include <asm/ppc-opcode.h>
+#include <asm/disassemble.h>
 #include <asm/cputable.h>
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
@@ -645,6 +647,7 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
 	unsigned long stolen;
 	unsigned long core_stolen;
 	u64 now;
+	unsigned long flags;
 
 	dt = vcpu->arch.dtl_ptr;
 	vpa = vcpu->arch.vpa.pinned_addr;
@@ -652,10 +655,10 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
 	core_stolen = vcore_stolen_time(vc, now);
 	stolen = core_stolen - vcpu->arch.stolen_logged;
 	vcpu->arch.stolen_logged = core_stolen;
-	spin_lock_irq(&vcpu->arch.tbacct_lock);
+	spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags);
 	stolen += vcpu->arch.busy_stolen;
 	vcpu->arch.busy_stolen = 0;
-	spin_unlock_irq(&vcpu->arch.tbacct_lock);
+	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 	if (!dt || !vpa)
 		return;
 	memset(dt, 0, sizeof(struct dtl_entry));
@@ -675,6 +678,26 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
 	vcpu->arch.dtl.dirty = true;
 }
 
+/* See if there is a doorbell interrupt pending for a vcpu */
+static bool kvmppc_doorbell_pending(struct kvm_vcpu *vcpu)
+{
+	int thr;
+	struct kvmppc_vcore *vc;
+
+	if (vcpu->arch.doorbell_request)
+		return true;
+	/*
+	 * Ensure that the read of vcore->dpdes comes after the read
+	 * of vcpu->doorbell_request.  This barrier matches the
+	 * lwsync in book3s_hv_rmhandlers.S just before the
+	 * fast_guest_return label.
+	 */
+	smp_rmb();
+	vc = vcpu->arch.vcore;
+	thr = vcpu->vcpu_id - vc->first_vcpuid;
+	return !!(vc->dpdes & (1 << thr));
+}
+
 static bool kvmppc_power8_compatible(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.vcore->arch_compat >= PVR_ARCH_207)
@@ -926,6 +949,101 @@ static int kvmppc_emulate_debug_inst(struct kvm_run *run,
 	}
 }
 
+static void do_nothing(void *x)
+{
+}
+
+static unsigned long kvmppc_read_dpdes(struct kvm_vcpu *vcpu)
+{
+	int thr, cpu, pcpu, nthreads;
+	struct kvm_vcpu *v;
+	unsigned long dpdes;
+
+	nthreads = vcpu->kvm->arch.emul_smt_mode;
+	dpdes = 0;
+	cpu = vcpu->vcpu_id & ~(nthreads - 1);
+	for (thr = 0; thr < nthreads; ++thr, ++cpu) {
+		v = kvmppc_find_vcpu(vcpu->kvm, cpu);
+		if (!v)
+			continue;
+		/*
+		 * If the vcpu is currently running on a physical cpu thread,
+		 * interrupt it in order to pull it out of the guest briefly,
+		 * which will update its vcore->dpdes value.
+		 */
+		pcpu = READ_ONCE(v->cpu);
+		if (pcpu >= 0)
+			smp_call_function_single(pcpu, do_nothing, NULL, 1);
+		if (kvmppc_doorbell_pending(v))
+			dpdes |= 1 << thr;
+	}
+	return dpdes;
+}
+
+/*
+ * On POWER9, emulate doorbell-related instructions in order to
+ * give the guest the illusion of running on a multi-threaded core.
+ * The instructions emulated are msgsndp, msgclrp, mfspr TIR,
+ * and mfspr DPDES.
+ */
+static int kvmppc_emulate_doorbell_instr(struct kvm_vcpu *vcpu)
+{
+	u32 inst, rb, thr;
+	unsigned long arg;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_vcpu *tvcpu;
+
+	if (!cpu_has_feature(CPU_FTR_ARCH_300))
+		return EMULATE_FAIL;
+	if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &inst) != EMULATE_DONE)
+		return RESUME_GUEST;
+	if (get_op(inst) != 31)
+		return EMULATE_FAIL;
+	rb = get_rb(inst);
+	thr = vcpu->vcpu_id & (kvm->arch.emul_smt_mode - 1);
+	switch (get_xop(inst)) {
+	case OP_31_XOP_MSGSNDP:
+		arg = kvmppc_get_gpr(vcpu, rb);
+		if (((arg >> 27) & 0xf) != PPC_DBELL_SERVER)
+			break;
+		arg &= 0x3f;
+		if (arg >= kvm->arch.emul_smt_mode)
+			break;
+		tvcpu = kvmppc_find_vcpu(kvm, vcpu->vcpu_id - thr + arg);
+		if (!tvcpu)
+			break;
+		if (!tvcpu->arch.doorbell_request) {
+			tvcpu->arch.doorbell_request = 1;
+			kvmppc_fast_vcpu_kick_hv(tvcpu);
+		}
+		break;
+	case OP_31_XOP_MSGCLRP:
+		arg = kvmppc_get_gpr(vcpu, rb);
+		if (((arg >> 27) & 0xf) != PPC_DBELL_SERVER)
+			break;
+		vcpu->arch.vcore->dpdes = 0;
+		vcpu->arch.doorbell_request = 0;
+		break;
+	case OP_31_XOP_MFSPR:
+		switch (get_sprn(inst)) {
+		case SPRN_TIR:
+			arg = thr;
+			break;
+		case SPRN_DPDES:
+			arg = kvmppc_read_dpdes(vcpu);
+			break;
+		default:
+			return EMULATE_FAIL;
+		}
+		kvmppc_set_gpr(vcpu, get_rt(inst), arg);
+		break;
+	default:
+		return EMULATE_FAIL;
+	}
+	kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
+	return RESUME_GUEST;
+}
+
 static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 				 struct task_struct *tsk)
 {
|
||||||
r = RESUME_GUEST;
|
r = RESUME_GUEST;
|
||||||
break;
|
break;
|
||||||
case BOOK3S_INTERRUPT_MACHINE_CHECK:
|
case BOOK3S_INTERRUPT_MACHINE_CHECK:
|
||||||
/*
|
/* Exit to guest with KVM_EXIT_NMI as exit reason */
|
||||||
* Deliver a machine check interrupt to the guest.
|
run->exit_reason = KVM_EXIT_NMI;
|
||||||
* We have to do this, even if the host has handled the
|
run->hw.hardware_exit_reason = vcpu->arch.trap;
|
||||||
* machine check, because machine checks use SRR0/1 and
|
/* Clear out the old NMI status from run->flags */
|
||||||
* the interrupt might have trashed guest state in them.
|
run->flags &= ~KVM_RUN_PPC_NMI_DISP_MASK;
|
||||||
*/
|
/* Now set the NMI status */
|
||||||
kvmppc_book3s_queue_irqprio(vcpu,
|
if (vcpu->arch.mce_evt.disposition == MCE_DISPOSITION_RECOVERED)
|
||||||
BOOK3S_INTERRUPT_MACHINE_CHECK);
|
run->flags |= KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
|
||||||
r = RESUME_GUEST;
|
else
|
||||||
|
run->flags |= KVM_RUN_PPC_NMI_DISP_NOT_RECOV;
|
||||||
|
|
||||||
|
r = RESUME_HOST;
|
||||||
|
/* Print the MCE event to host console. */
|
||||||
|
machine_check_print_event_info(&vcpu->arch.mce_evt, false);
|
||||||
break;
|
break;
|
||||||
case BOOK3S_INTERRUPT_PROGRAM:
|
case BOOK3S_INTERRUPT_PROGRAM:
|
||||||
{
|
{
|
||||||
|
@ -1048,12 +1171,19 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||||
break;
|
break;
|
||||||
/*
|
/*
|
||||||
* This occurs if the guest (kernel or userspace), does something that
|
* This occurs if the guest (kernel or userspace), does something that
|
||||||
* is prohibited by HFSCR. We just generate a program interrupt to
|
* is prohibited by HFSCR.
|
||||||
* the guest.
|
* On POWER9, this could be a doorbell instruction that we need
|
||||||
|
* to emulate.
|
||||||
|
* Otherwise, we just generate a program interrupt to the guest.
|
||||||
*/
|
*/
|
||||||
case BOOK3S_INTERRUPT_H_FAC_UNAVAIL:
|
case BOOK3S_INTERRUPT_H_FAC_UNAVAIL:
|
||||||
kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
|
r = EMULATE_FAIL;
|
||||||
r = RESUME_GUEST;
|
if ((vcpu->arch.hfscr >> 56) == FSCR_MSGP_LG)
|
||||||
|
r = kvmppc_emulate_doorbell_instr(vcpu);
|
||||||
|
if (r == EMULATE_FAIL) {
|
||||||
|
kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
|
||||||
|
r = RESUME_GUEST;
|
||||||
|
}
|
||||||
break;
|
break;
|
||||||
case BOOK3S_INTERRUPT_HV_RM_HARD:
|
case BOOK3S_INTERRUPT_HV_RM_HARD:
|
||||||
r = RESUME_PASSTHROUGH;
|
r = RESUME_PASSTHROUGH;
|
||||||
|
@ -1143,6 +1273,12 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
|
||||||
mask = LPCR_DPFD | LPCR_ILE | LPCR_TC;
|
mask = LPCR_DPFD | LPCR_ILE | LPCR_TC;
|
||||||
if (cpu_has_feature(CPU_FTR_ARCH_207S))
|
if (cpu_has_feature(CPU_FTR_ARCH_207S))
|
||||||
mask |= LPCR_AIL;
|
mask |= LPCR_AIL;
|
||||||
|
/*
|
||||||
|
* On POWER9, allow userspace to enable large decrementer for the
|
||||||
|
* guest, whether or not the host has it enabled.
|
||||||
|
*/
|
||||||
|
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||||
|
mask |= LPCR_LD;
|
||||||
|
|
||||||
/* Broken 32-bit version of LPCR must not clear top bits */
|
/* Broken 32-bit version of LPCR must not clear top bits */
|
||||||
if (preserve_top32)
|
if (preserve_top32)
|
||||||
|
@ -1611,7 +1747,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
|
||||||
init_swait_queue_head(&vcore->wq);
|
init_swait_queue_head(&vcore->wq);
|
||||||
vcore->preempt_tb = TB_NIL;
|
vcore->preempt_tb = TB_NIL;
|
||||||
vcore->lpcr = kvm->arch.lpcr;
|
vcore->lpcr = kvm->arch.lpcr;
|
||||||
vcore->first_vcpuid = core * threads_per_vcore();
|
vcore->first_vcpuid = core * kvm->arch.smt_mode;
|
||||||
vcore->kvm = kvm;
|
vcore->kvm = kvm;
|
||||||
INIT_LIST_HEAD(&vcore->preempt_list);
|
INIT_LIST_HEAD(&vcore->preempt_list);
|
||||||
|
|
||||||
|
@ -1770,14 +1906,10 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
|
||||||
unsigned int id)
|
unsigned int id)
|
||||||
{
|
{
|
||||||
struct kvm_vcpu *vcpu;
|
struct kvm_vcpu *vcpu;
|
||||||
int err = -EINVAL;
|
int err;
|
||||||
int core;
|
int core;
|
||||||
struct kvmppc_vcore *vcore;
|
struct kvmppc_vcore *vcore;
|
||||||
|
|
||||||
core = id / threads_per_vcore();
|
|
||||||
if (core >= KVM_MAX_VCORES)
|
|
||||||
goto out;
|
|
||||||
|
|
||||||
err = -ENOMEM;
|
err = -ENOMEM;
|
||||||
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
|
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
|
||||||
if (!vcpu)
|
if (!vcpu)
|
||||||
|
@ -1808,6 +1940,20 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
|
||||||
vcpu->arch.busy_preempt = TB_NIL;
|
vcpu->arch.busy_preempt = TB_NIL;
|
||||||
vcpu->arch.intr_msr = MSR_SF | MSR_ME;
|
vcpu->arch.intr_msr = MSR_SF | MSR_ME;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Set the default HFSCR for the guest from the host value.
|
||||||
|
* This value is only used on POWER9.
|
||||||
|
* On POWER9 DD1, TM doesn't work, so we make sure to
|
||||||
|
* prevent the guest from using it.
|
||||||
|
* On POWER9, we want to virtualize the doorbell facility, so we
|
||||||
|
* turn off the HFSCR bit, which causes those instructions to trap.
|
||||||
|
*/
|
||||||
|
vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
|
||||||
|
if (!cpu_has_feature(CPU_FTR_TM))
|
||||||
|
vcpu->arch.hfscr &= ~HFSCR_TM;
|
||||||
|
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||||
|
vcpu->arch.hfscr &= ~HFSCR_MSGP;
|
||||||
|
|
||||||
kvmppc_mmu_book3s_hv_init(vcpu);
|
kvmppc_mmu_book3s_hv_init(vcpu);
|
||||||
|
|
||||||
vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
|
vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
|
||||||
|
@ -1815,11 +1961,17 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
|
||||||
init_waitqueue_head(&vcpu->arch.cpu_run);
|
init_waitqueue_head(&vcpu->arch.cpu_run);
|
||||||
|
|
||||||
mutex_lock(&kvm->lock);
|
mutex_lock(&kvm->lock);
|
||||||
vcore = kvm->arch.vcores[core];
|
vcore = NULL;
|
||||||
if (!vcore) {
|
err = -EINVAL;
|
||||||
vcore = kvmppc_vcore_create(kvm, core);
|
core = id / kvm->arch.smt_mode;
|
||||||
kvm->arch.vcores[core] = vcore;
|
if (core < KVM_MAX_VCORES) {
|
||||||
kvm->arch.online_vcores++;
|
vcore = kvm->arch.vcores[core];
|
||||||
|
if (!vcore) {
|
||||||
|
err = -ENOMEM;
|
||||||
|
vcore = kvmppc_vcore_create(kvm, core);
|
||||||
|
kvm->arch.vcores[core] = vcore;
|
||||||
|
kvm->arch.online_vcores++;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
mutex_unlock(&kvm->lock);
|
mutex_unlock(&kvm->lock);
|
||||||
|
|
||||||
|
@@ -1847,6 +1999,43 @@ out:
 	return ERR_PTR(err);
 }
 
+static int kvmhv_set_smt_mode(struct kvm *kvm, unsigned long smt_mode,
+			      unsigned long flags)
+{
+	int err;
+	int esmt = 0;
+
+	if (flags)
+		return -EINVAL;
+	if (smt_mode > MAX_SMT_THREADS || !is_power_of_2(smt_mode))
+		return -EINVAL;
+	if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+		/*
+		 * On POWER8 (or POWER7), the threading mode is "strict",
+		 * so we pack smt_mode vcpus per vcore.
+		 */
+		if (smt_mode > threads_per_subcore)
+			return -EINVAL;
+	} else {
+		/*
+		 * On POWER9, the threading mode is "loose",
+		 * so each vcpu gets its own vcore.
+		 */
+		esmt = smt_mode;
+		smt_mode = 1;
+	}
+	mutex_lock(&kvm->lock);
+	err = -EBUSY;
+	if (!kvm->arch.online_vcores) {
+		kvm->arch.smt_mode = smt_mode;
+		kvm->arch.emul_smt_mode = esmt;
+		err = 0;
+	}
+	mutex_unlock(&kvm->lock);
+
+	return err;
+}
+
 static void unpin_vpa(struct kvm *kvm, struct kvmppc_vpa *vpa)
 {
 	if (vpa->pinned_addr)
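The set_smt_mode hook above is what gives POWER9 guests the threads=2/4/8 support called out in the summary. If I recall the userspace wiring correctly, it is reached by enabling the KVM_CAP_PPC_SMT capability on the VM before any vcpus go online; the sketch below is an assumed usage, with the mode value and fd name as examples only.

	/* Sketch: ask for an emulated SMT4 core layout on a POWER9 HV guest. */
	static int enable_smt4(int vm_fd)
	{
		struct kvm_enable_cap cap = {
			.cap = KVM_CAP_PPC_SMT,
			.args = { 4 /* threads per (virtual) core, power of 2 */,
				  0 /* flags, must be 0 */ },
		};

		/* Must be done before any vcpu has run; -EBUSY otherwise. */
		return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}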
@@ -1897,7 +2086,7 @@ static void kvmppc_end_cede(struct kvm_vcpu *vcpu)
 	}
 }
 
-extern void __kvmppc_vcore_entry(void);
+extern int __kvmppc_vcore_entry(void);
 
 static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 				   struct kvm_vcpu *vcpu)
@@ -1962,10 +2151,6 @@ static void kvmppc_release_hwthread(int cpu)
 	tpaca->kvm_hstate.kvm_split_mode = NULL;
 }
 
-static void do_nothing(void *x)
-{
-}
-
 static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -1983,11 +2168,35 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 		smp_call_function_single(cpu + i, do_nothing, NULL, 1);
 }
 
+static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+
+	/*
+	 * With radix, the guest can do TLB invalidations itself,
+	 * and it could choose to use the local form (tlbiel) if
+	 * it is invalidating a translation that has only ever been
+	 * used on one vcpu.  However, that doesn't mean it has
+	 * only ever been used on one physical cpu, since vcpus
+	 * can move around between pcpus.  To cope with this, when
+	 * a vcpu moves from one pcpu to another, we need to tell
+	 * any vcpus running on the same core as this vcpu previously
+	 * ran to flush the TLB.  The TLB is shared between threads,
+	 * so we use a single bit in .need_tlb_flush for all 4 threads.
+	 */
+	if (vcpu->arch.prev_cpu != pcpu) {
+		if (vcpu->arch.prev_cpu >= 0 &&
+		    cpu_first_thread_sibling(vcpu->arch.prev_cpu) !=
+		    cpu_first_thread_sibling(pcpu))
+			radix_flush_cpu(kvm, vcpu->arch.prev_cpu, vcpu);
+		vcpu->arch.prev_cpu = pcpu;
+	}
+}
+
 static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc)
 {
 	int cpu;
 	struct paca_struct *tpaca;
-	struct kvmppc_vcore *mvc = vc->master_vcore;
 	struct kvm *kvm = vc->kvm;
 
 	cpu = vc->pcpu;
@@ -1997,36 +2206,16 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc)
 			vcpu->arch.timer_running = 0;
 		}
 		cpu += vcpu->arch.ptid;
-		vcpu->cpu = mvc->pcpu;
+		vcpu->cpu = vc->pcpu;
 		vcpu->arch.thread_cpu = cpu;
-
-		/*
-		 * With radix, the guest can do TLB invalidations itself,
-		 * and it could choose to use the local form (tlbiel) if
-		 * it is invalidating a translation that has only ever been
-		 * used on one vcpu.  However, that doesn't mean it has
-		 * only ever been used on one physical cpu, since vcpus
-		 * can move around between pcpus.  To cope with this, when
-		 * a vcpu moves from one pcpu to another, we need to tell
-		 * any vcpus running on the same core as this vcpu previously
-		 * ran to flush the TLB.  The TLB is shared between threads,
-		 * so we use a single bit in .need_tlb_flush for all 4 threads.
-		 */
-		if (kvm_is_radix(kvm) && vcpu->arch.prev_cpu != cpu) {
-			if (vcpu->arch.prev_cpu >= 0 &&
-			    cpu_first_thread_sibling(vcpu->arch.prev_cpu) !=
-			    cpu_first_thread_sibling(cpu))
-				radix_flush_cpu(kvm, vcpu->arch.prev_cpu, vcpu);
-			vcpu->arch.prev_cpu = cpu;
-		}
 		cpumask_set_cpu(cpu, &kvm->arch.cpu_in_guest);
 	}
 	tpaca = &paca[cpu];
 	tpaca->kvm_hstate.kvm_vcpu = vcpu;
-	tpaca->kvm_hstate.ptid = cpu - mvc->pcpu;
+	tpaca->kvm_hstate.ptid = cpu - vc->pcpu;
 	/* Order stores to hstate.kvm_vcpu etc. before store to kvm_vcore */
 	smp_wmb();
-	tpaca->kvm_hstate.kvm_vcore = mvc;
+	tpaca->kvm_hstate.kvm_vcore = vc;
 	if (cpu != smp_processor_id())
 		kvmppc_ipi_thread(cpu);
 }
@@ -2155,8 +2344,7 @@ struct core_info {
 	int		max_subcore_threads;
 	int		total_threads;
 	int		subcore_threads[MAX_SUBCORES];
-	struct kvm	*subcore_vm[MAX_SUBCORES];
+	struct kvmppc_vcore *vc[MAX_SUBCORES];
-	struct list_head vcs[MAX_SUBCORES];
 };
 
 /*
@@ -2167,17 +2355,12 @@ static int subcore_thread_map[MAX_SUBCORES] = { 0, 4, 2, 6 };
 
 static void init_core_info(struct core_info *cip, struct kvmppc_vcore *vc)
 {
-	int sub;
-
 	memset(cip, 0, sizeof(*cip));
 	cip->n_subcores = 1;
 	cip->max_subcore_threads = vc->num_threads;
 	cip->total_threads = vc->num_threads;
 	cip->subcore_threads[0] = vc->num_threads;
-	cip->subcore_vm[0] = vc->kvm;
-	for (sub = 0; sub < MAX_SUBCORES; ++sub)
-		INIT_LIST_HEAD(&cip->vcs[sub]);
-	list_add_tail(&vc->preempt_list, &cip->vcs[0]);
+	cip->vc[0] = vc;
 }
 
 static bool subcore_config_ok(int n_subcores, int n_threads)
@@ -2197,9 +2380,8 @@ static bool subcore_config_ok(int n_subcores, int n_threads)
 	return n_subcores * roundup_pow_of_two(n_threads) <= MAX_SMT_THREADS;
 }
 
-static void init_master_vcore(struct kvmppc_vcore *vc)
+static void init_vcore_to_run(struct kvmppc_vcore *vc)
 {
-	vc->master_vcore = vc;
 	vc->entry_exit_map = 0;
 	vc->in_guest = 0;
 	vc->napping_threads = 0;
@@ -2224,9 +2406,9 @@ static bool can_dynamic_split(struct kvmppc_vcore *vc, struct core_info *cip)
 	++cip->n_subcores;
 	cip->total_threads += vc->num_threads;
 	cip->subcore_threads[sub] = vc->num_threads;
-	cip->subcore_vm[sub] = vc->kvm;
-	init_master_vcore(vc);
-	list_move_tail(&vc->preempt_list, &cip->vcs[sub]);
+	cip->vc[sub] = vc;
+	init_vcore_to_run(vc);
+	list_del_init(&vc->preempt_list);
 
 	return true;
 }
@@ -2294,6 +2476,18 @@ static void collect_piggybacks(struct core_info *cip, int target_threads)
 		spin_unlock(&lp->lock);
 }
 
+static bool recheck_signals(struct core_info *cip)
+{
+	int sub, i;
+	struct kvm_vcpu *vcpu;
+
+	for (sub = 0; sub < cip->n_subcores; ++sub)
+		for_each_runnable_thread(i, vcpu, cip->vc[sub])
+			if (signal_pending(vcpu->arch.run_task))
+				return true;
+	return false;
+}
+
 static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 {
 	int still_running = 0, i;
@@ -2331,7 +2525,6 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 			wake_up(&vcpu->arch.cpu_run);
 		}
 	}
-	list_del_init(&vc->preempt_list);
 	if (!is_master) {
 		if (still_running > 0) {
 			kvmppc_vcore_preempt(vc);
@@ -2393,6 +2586,21 @@ static inline int kvmppc_set_host_core(unsigned int cpu)
 	return 0;
 }
 
+static void set_irq_happened(int trap)
+{
+	switch (trap) {
+	case BOOK3S_INTERRUPT_EXTERNAL:
+		local_paca->irq_happened |= PACA_IRQ_EE;
+		break;
+	case BOOK3S_INTERRUPT_H_DOORBELL:
+		local_paca->irq_happened |= PACA_IRQ_DBELL;
+		break;
+	case BOOK3S_INTERRUPT_HMI:
+		local_paca->irq_happened |= PACA_IRQ_HMI;
+		break;
+	}
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -2403,7 +2611,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	int i;
 	int srcu_idx;
 	struct core_info core_info;
-	struct kvmppc_vcore *pvc, *vcnext;
+	struct kvmppc_vcore *pvc;
 	struct kvm_split_mode split_info, *sip;
 	int split, subcore_size, active;
 	int sub;
@@ -2412,6 +2620,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	int pcpu, thr;
 	int target_threads;
 	int controlled_threads;
+	int trap;
 
 	/*
 	 * Remove from the list any threads that have a signal pending
@@ -2426,7 +2635,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	/*
 	 * Initialize *vc.
 	 */
-	init_master_vcore(vc);
+	init_vcore_to_run(vc);
 	vc->preempt_tb = TB_NIL;
 
 	/*
@@ -2463,6 +2672,43 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	if (vc->num_threads < target_threads)
 		collect_piggybacks(&core_info, target_threads);
 
+	/*
+	 * On radix, arrange for TLB flushing if necessary.
+	 * This has to be done before disabling interrupts since
+	 * it uses smp_call_function().
+	 */
+	pcpu = smp_processor_id();
+	if (kvm_is_radix(vc->kvm)) {
+		for (sub = 0; sub < core_info.n_subcores; ++sub)
+			for_each_runnable_thread(i, vcpu, core_info.vc[sub])
+				kvmppc_prepare_radix_vcpu(vcpu, pcpu);
+	}
+
+	/*
+	 * Hard-disable interrupts, and check resched flag and signals.
+	 * If we need to reschedule or deliver a signal, clean up
+	 * and return without going into the guest(s).
+	 */
+	local_irq_disable();
+	hard_irq_disable();
+	if (lazy_irq_pending() || need_resched() ||
+	    recheck_signals(&core_info)) {
+		local_irq_enable();
+		vc->vcore_state = VCORE_INACTIVE;
+		/* Unlock all except the primary vcore */
+		for (sub = 1; sub < core_info.n_subcores; ++sub) {
+			pvc = core_info.vc[sub];
+			/* Put back on to the preempted vcores list */
+			kvmppc_vcore_preempt(pvc);
+			spin_unlock(&pvc->lock);
+		}
+		for (i = 0; i < controlled_threads; ++i)
+			kvmppc_release_hwthread(pcpu + i);
+		return;
+	}
+
+	kvmppc_clear_host_core(pcpu);
+
 	/* Decide on micro-threading (split-core) mode */
 	subcore_size = threads_per_subcore;
 	cmd_bit = stat_bit = 0;
@@ -2486,13 +2732,10 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 		split_info.ldbar = mfspr(SPRN_LDBAR);
 		split_info.subcore_size = subcore_size;
 		for (sub = 0; sub < core_info.n_subcores; ++sub)
-			split_info.master_vcs[sub] =
-				list_first_entry(&core_info.vcs[sub],
-					struct kvmppc_vcore, preempt_list);
+			split_info.vc[sub] = core_info.vc[sub];
 		/* order writes to split_info before kvm_split_mode pointer */
 		smp_wmb();
 	}
-	pcpu = smp_processor_id();
 	for (thr = 0; thr < controlled_threads; ++thr)
 		paca[pcpu + thr].kvm_hstate.kvm_split_mode = sip;
 
@@ -2512,32 +2755,29 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 		}
 	}
 
-	kvmppc_clear_host_core(pcpu);
-
 	/* Start all the threads */
 	active = 0;
 	for (sub = 0; sub < core_info.n_subcores; ++sub) {
 		thr = subcore_thread_map[sub];
 		thr0_done = false;
 		active |= 1 << thr;
-		list_for_each_entry(pvc, &core_info.vcs[sub], preempt_list) {
-			pvc->pcpu = pcpu + thr;
-			for_each_runnable_thread(i, vcpu, pvc) {
-				kvmppc_start_thread(vcpu, pvc);
-				kvmppc_create_dtl_entry(vcpu, pvc);
-				trace_kvm_guest_enter(vcpu);
-				if (!vcpu->arch.ptid)
-					thr0_done = true;
-				active |= 1 << (thr + vcpu->arch.ptid);
-			}
-			/*
-			 * We need to start the first thread of each subcore
-			 * even if it doesn't have a vcpu.
-			 */
-			if (pvc->master_vcore == pvc && !thr0_done)
-				kvmppc_start_thread(NULL, pvc);
-			thr += pvc->num_threads;
+		pvc = core_info.vc[sub];
+		pvc->pcpu = pcpu + thr;
+		for_each_runnable_thread(i, vcpu, pvc) {
+			kvmppc_start_thread(vcpu, pvc);
+			kvmppc_create_dtl_entry(vcpu, pvc);
+			trace_kvm_guest_enter(vcpu);
+			if (!vcpu->arch.ptid)
+				thr0_done = true;
+			active |= 1 << (thr + vcpu->arch.ptid);
 		}
+		/*
+		 * We need to start the first thread of each subcore
+		 * even if it doesn't have a vcpu.
+		 */
+		if (!thr0_done)
+			kvmppc_start_thread(NULL, pvc);
+		thr += pvc->num_threads;
 	}
 
 	/*
@@ -2564,17 +2804,27 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	trace_kvmppc_run_core(vc, 0);
 
 	for (sub = 0; sub < core_info.n_subcores; ++sub)
-		list_for_each_entry(pvc, &core_info.vcs[sub], preempt_list)
-			spin_unlock(&pvc->lock);
+		spin_unlock(&core_info.vc[sub]->lock);
+
+	/*
+	 * Interrupts will be enabled once we get into the guest,
+	 * so tell lockdep that we're about to enable interrupts.
+	 */
+	trace_hardirqs_on();
 
 	guest_enter();
 
 	srcu_idx = srcu_read_lock(&vc->kvm->srcu);
 
-	__kvmppc_vcore_entry();
+	trap = __kvmppc_vcore_entry();
 
 	srcu_read_unlock(&vc->kvm->srcu, srcu_idx);
 
+	guest_exit();
+
+	trace_hardirqs_off();
+	set_irq_happened(trap);
+
 	spin_lock(&vc->lock);
 	/* prevent other vcpu threads from doing kvmppc_start_thread() now */
 	vc->vcore_state = VCORE_EXITING;
@@ -2602,6 +2852,10 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 		split_info.do_nap = 0;
 	}
 
+	kvmppc_set_host_core(pcpu);
+
+	local_irq_enable();
+
 	/* Let secondaries go back to the offline loop */
 	for (i = 0; i < controlled_threads; ++i) {
 		kvmppc_release_hwthread(pcpu + i);
|
@ -2610,18 +2864,15 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
|
||||||
cpumask_clear_cpu(pcpu + i, &vc->kvm->arch.cpu_in_guest);
|
cpumask_clear_cpu(pcpu + i, &vc->kvm->arch.cpu_in_guest);
|
||||||
}
|
}
|
||||||
|
|
||||||
kvmppc_set_host_core(pcpu);
|
|
||||||
|
|
||||||
spin_unlock(&vc->lock);
|
spin_unlock(&vc->lock);
|
||||||
|
|
||||||
/* make sure updates to secondary vcpu structs are visible now */
|
/* make sure updates to secondary vcpu structs are visible now */
|
||||||
smp_mb();
|
smp_mb();
|
||||||
guest_exit();
|
|
||||||
|
|
||||||
for (sub = 0; sub < core_info.n_subcores; ++sub)
|
for (sub = 0; sub < core_info.n_subcores; ++sub) {
|
||||||
list_for_each_entry_safe(pvc, vcnext, &core_info.vcs[sub],
|
pvc = core_info.vc[sub];
|
||||||
preempt_list)
|
post_guest_process(pvc, pvc == vc);
|
||||||
post_guest_process(pvc, pvc == vc);
|
}
|
||||||
|
|
||||||
spin_lock(&vc->lock);
|
spin_lock(&vc->lock);
|
||||||
preempt_enable();
|
preempt_enable();
|
||||||
|
@ -2666,6 +2917,30 @@ static void shrink_halt_poll_ns(struct kvmppc_vcore *vc)
|
||||||
vc->halt_poll_ns /= halt_poll_ns_shrink;
|
vc->halt_poll_ns /= halt_poll_ns_shrink;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#ifdef CONFIG_KVM_XICS
|
||||||
|
static inline bool xive_interrupt_pending(struct kvm_vcpu *vcpu)
|
||||||
|
{
|
||||||
|
if (!xive_enabled())
|
||||||
|
return false;
|
||||||
|
return vcpu->arch.xive_saved_state.pipr <
|
||||||
|
vcpu->arch.xive_saved_state.cppr;
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
static inline bool xive_interrupt_pending(struct kvm_vcpu *vcpu)
|
||||||
|
{
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
#endif /* CONFIG_KVM_XICS */
|
||||||
|
|
||||||
|
static bool kvmppc_vcpu_woken(struct kvm_vcpu *vcpu)
|
||||||
|
{
|
||||||
|
if (vcpu->arch.pending_exceptions || vcpu->arch.prodded ||
|
||||||
|
kvmppc_doorbell_pending(vcpu) || xive_interrupt_pending(vcpu))
|
||||||
|
return true;
|
||||||
|
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Check to see if any of the runnable vcpus on the vcore have pending
|
* Check to see if any of the runnable vcpus on the vcore have pending
|
||||||
* exceptions or are no longer ceded
|
* exceptions or are no longer ceded
|
||||||
|
@ -2676,8 +2951,7 @@ static int kvmppc_vcore_check_block(struct kvmppc_vcore *vc)
|
||||||
int i;
|
int i;
|
||||||
|
|
||||||
for_each_runnable_thread(i, vcpu, vc) {
|
for_each_runnable_thread(i, vcpu, vc) {
|
||||||
if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded ||
|
if (!vcpu->arch.ceded || kvmppc_vcpu_woken(vcpu))
|
||||||
vcpu->arch.prodded)
|
|
||||||
return 1;
|
return 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -2819,15 +3093,14 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
|
||||||
*/
|
*/
|
||||||
if (!signal_pending(current)) {
|
if (!signal_pending(current)) {
|
||||||
if (vc->vcore_state == VCORE_PIGGYBACK) {
|
if (vc->vcore_state == VCORE_PIGGYBACK) {
|
||||||
struct kvmppc_vcore *mvc = vc->master_vcore;
|
if (spin_trylock(&vc->lock)) {
|
||||||
if (spin_trylock(&mvc->lock)) {
|
if (vc->vcore_state == VCORE_RUNNING &&
|
||||||
if (mvc->vcore_state == VCORE_RUNNING &&
|
!VCORE_IS_EXITING(vc)) {
|
||||||
!VCORE_IS_EXITING(mvc)) {
|
|
||||||
kvmppc_create_dtl_entry(vcpu, vc);
|
kvmppc_create_dtl_entry(vcpu, vc);
|
||||||
kvmppc_start_thread(vcpu, vc);
|
kvmppc_start_thread(vcpu, vc);
|
||||||
trace_kvm_guest_enter(vcpu);
|
trace_kvm_guest_enter(vcpu);
|
||||||
}
|
}
|
||||||
spin_unlock(&mvc->lock);
|
spin_unlock(&vc->lock);
|
||||||
}
|
}
|
||||||
} else if (vc->vcore_state == VCORE_RUNNING &&
|
} else if (vc->vcore_state == VCORE_RUNNING &&
|
||||||
!VCORE_IS_EXITING(vc)) {
|
!VCORE_IS_EXITING(vc)) {
|
||||||
|
@ -2863,7 +3136,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
|
||||||
break;
|
break;
|
||||||
n_ceded = 0;
|
n_ceded = 0;
|
||||||
for_each_runnable_thread(i, v, vc) {
|
for_each_runnable_thread(i, v, vc) {
|
||||||
if (!v->arch.pending_exceptions && !v->arch.prodded)
|
if (!kvmppc_vcpu_woken(v))
|
||||||
n_ceded += v->arch.ceded;
|
n_ceded += v->arch.ceded;
|
||||||
else
|
else
|
||||||
v->arch.ceded = 0;
|
v->arch.ceded = 0;
|
||||||
|
@ -3518,6 +3791,19 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
|
||||||
if (!cpu_has_feature(CPU_FTR_ARCH_300))
|
if (!cpu_has_feature(CPU_FTR_ARCH_300))
|
||||||
kvm_hv_vm_activated();
|
kvm_hv_vm_activated();
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Initialize smt_mode depending on processor.
|
||||||
|
* POWER8 and earlier have to use "strict" threading, where
|
||||||
|
* all vCPUs in a vcore have to run on the same (sub)core,
|
||||||
|
* whereas on POWER9 the threads can each run a different
|
||||||
|
* guest.
|
||||||
|
*/
|
||||||
|
if (!cpu_has_feature(CPU_FTR_ARCH_300))
|
||||||
|
kvm->arch.smt_mode = threads_per_subcore;
|
||||||
|
else
|
||||||
|
kvm->arch.smt_mode = 1;
|
||||||
|
kvm->arch.emul_smt_mode = 1;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Create a debugfs directory for the VM
|
* Create a debugfs directory for the VM
|
||||||
*/
|
*/
|
||||||
|
@ -3947,6 +4233,7 @@ static struct kvmppc_ops kvm_ops_hv = {
|
||||||
#endif
|
#endif
|
||||||
.configure_mmu = kvmhv_configure_mmu,
|
.configure_mmu = kvmhv_configure_mmu,
|
||||||
.get_rmmu_info = kvmhv_get_rmmu_info,
|
.get_rmmu_info = kvmhv_get_rmmu_info,
|
||||||
|
.set_smt_mode = kvmhv_set_smt_mode,
|
||||||
};
|
};
|
||||||
|
|
||||||
static int kvm_init_subcore_bitmap(void)
|
static int kvm_init_subcore_bitmap(void)
|
||||||
|
|
|
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -307,7 +307,7 @@ void kvmhv_commence_exit(int trap)
 		return;

 	for (i = 0; i < MAX_SUBCORES; ++i) {
-		vc = sip->master_vcs[i];
+		vc = sip->vc[i];
 		if (!vc)
 			break;
 		do {
diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S
@@ -61,13 +61,6 @@ BEGIN_FTR_SECTION
 	std	r3, HSTATE_DABR(r13)
 END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)

-	/* Hard-disable interrupts */
-	mfmsr   r10
-	std	r10, HSTATE_HOST_MSR(r13)
-	rldicl  r10,r10,48,1
-	rotldi  r10,r10,16
-	mtmsrd  r10,1
-
 	/* Save host PMU registers */
 BEGIN_FTR_SECTION
 	/* Work around P8 PMAE bug */
@@ -153,6 +146,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
  *
  * R1       = host R1
  * R2       = host R2
+ * R3       = trap number on this thread
  * R12      = exit handler id
  * R13      = PACA
  */
diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c
@@ -130,12 +130,28 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)

out:
 	/*
+	 * For guest that supports FWNMI capability, hook the MCE event into
+	 * vcpu structure. We are going to exit the guest with KVM_EXIT_NMI
+	 * exit reason. On our way to exit we will pull this event from vcpu
+	 * structure and print it from thread 0 of the core/subcore.
+	 *
+	 * For guest that does not support FWNMI capability (old QEMU):
 	 * We are now going enter guest either through machine check
 	 * interrupt (for unhandled errors) or will continue from
 	 * current HSRR0 (for handled errors) in guest. Hence
 	 * queue up the event so that we can log it from host console later.
 	 */
-	machine_check_queue_event();
+	if (vcpu->kvm->arch.fwnmi_enabled) {
+		/*
+		 * Hook up the mce event on to vcpu structure.
+		 * First clear the old event.
+		 */
+		memset(&vcpu->arch.mce_evt, 0, sizeof(vcpu->arch.mce_evt));
+		if (get_mce_event(&mce_evt, MCE_EVENT_RELEASE)) {
+			vcpu->arch.mce_evt = mce_evt;
+		}
+	} else
+		machine_check_queue_event();

 	return handled;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -45,7 +45,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define NAPPING_NOVCPU	2

/* Stack frame offsets for kvmppc_hv_entry */
-#define SFS			144
+#define SFS			160
#define STACK_SLOT_TRAP		(SFS-4)
#define STACK_SLOT_TID		(SFS-16)
#define STACK_SLOT_PSSCR	(SFS-24)
@@ -54,6 +54,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_CIABR	(SFS-48)
#define STACK_SLOT_DAWR		(SFS-56)
#define STACK_SLOT_DAWRX	(SFS-64)
+#define STACK_SLOT_HFSCR	(SFS-72)

/*
 * Call kvmppc_hv_entry in real mode.
@@ -68,6 +69,7 @@ _GLOBAL_TOC(kvmppc_hv_entry_trampoline)
 	std	r0, PPC_LR_STKOFF(r1)
 	stdu	r1, -112(r1)
 	mfmsr	r10
+	std	r10, HSTATE_HOST_MSR(r13)
 	LOAD_REG_ADDR(r5, kvmppc_call_hv_entry)
 	li	r0,MSR_RI
 	andc	r0,r10,r0
@@ -152,20 +154,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	stb	r0, HSTATE_HWTHREAD_REQ(r13)

 	/*
-	 * For external and machine check interrupts, we need
-	 * to call the Linux handler to process the interrupt.
-	 * We do that by jumping to absolute address 0x500 for
-	 * external interrupts, or the machine_check_fwnmi label
-	 * for machine checks (since firmware might have patched
-	 * the vector area at 0x200). The [h]rfid at the end of the
-	 * handler will return to the book3s_hv_interrupts.S code.
-	 * For other interrupts we do the rfid to get back
-	 * to the book3s_hv_interrupts.S code here.
+	 * For external interrupts we need to call the Linux
+	 * handler to process the interrupt. We do that by jumping
+	 * to absolute address 0x500 for external interrupts.
+	 * The [h]rfid at the end of the handler will return to
+	 * the book3s_hv_interrupts.S code. For other interrupts
+	 * we do the rfid to get back to the book3s_hv_interrupts.S
+	 * code here.
 	 */
 	ld	r8, 112+PPC_LR_STKOFF(r1)
 	addi	r1, r1, 112
 	ld	r7, HSTATE_HOST_MSR(r13)

+	/* Return the trap number on this thread as the return value */
+	mr	r3, r12
+
 	/*
 	 * If we came back from the guest via a relocation-on interrupt,
 	 * we will be in virtual mode at this point, which makes it a
@@ -175,59 +178,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	andi.	r0, r0, MSR_IR		/* in real mode? */
 	bne	.Lvirt_return

-	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
-	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
-	beq	11f
-	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
-	beq	15f	/* Invoke the H_DOORBELL handler */
-	cmpwi	cr2, r12, BOOK3S_INTERRUPT_HMI
-	beq	cr2, 14f			/* HMI check */
-
-	/* RFI into the highmem handler, or branch to interrupt handler */
+	/* RFI into the highmem handler */
 	mfmsr	r6
 	li	r0, MSR_RI
 	andc	r6, r6, r0
 	mtmsrd	r6, 1			/* Clear RI in MSR */
 	mtsrr0	r8
 	mtsrr1	r7
-	beq	cr1, 13f		/* machine check */
 	RFI

-	/* On POWER7, we have external interrupts set to use HSRR0/1 */
-11:	mtspr	SPRN_HSRR0, r8
-	mtspr	SPRN_HSRR1, r7
-	ba	0x500
-
-13:	b	machine_check_fwnmi
-
-14:	mtspr	SPRN_HSRR0, r8
-	mtspr	SPRN_HSRR1, r7
-	b	hmi_exception_after_realmode
-
-15:	mtspr	SPRN_HSRR0, r8
-	mtspr	SPRN_HSRR1, r7
-	ba	0xe80
-
-	/* Virtual-mode return - can't get here for HMI or machine check */
+	/* Virtual-mode return */
.Lvirt_return:
-	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
-	beq	16f
-	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
-	beq	17f
-	andi.	r0, r7, MSR_EE		/* were interrupts hard-enabled? */
-	beq	18f
-	mtmsrd	r7, 1			/* if so then re-enable them */
-18:	mtlr	r8
+	mtlr	r8
 	blr
-
-16:	mtspr	SPRN_HSRR0, r8	/* jump to reloc-on external vector */
-	mtspr	SPRN_HSRR1, r7
-	b	exc_virt_0x4500_hardware_interrupt
-
-17:	mtspr	SPRN_HSRR0, r8
-	mtspr	SPRN_HSRR1, r7
-	b	exc_virt_0x4e80_h_doorbell

kvmppc_primary_no_guest:
 	/* We handle this much like a ceded vcpu */
 	/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
@@ -769,6 +733,8 @@ BEGIN_FTR_SECTION
 	std	r6, STACK_SLOT_PSSCR(r1)
 	std	r7, STACK_SLOT_PID(r1)
 	std	r8, STACK_SLOT_IAMR(r1)
+	mfspr	r5, SPRN_HFSCR
+	std	r5, STACK_SLOT_HFSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
BEGIN_FTR_SECTION
 	mfspr	r5, SPRN_CIABR
@@ -920,8 +886,10 @@ FTR_SECTION_ELSE
 	ld	r5, VCPU_TID(r4)
 	ld	r6, VCPU_PSSCR(r4)
 	oris	r6, r6, PSSCR_EC@h	/* This makes stop trap to HV */
+	ld	r7, VCPU_HFSCR(r4)
 	mtspr	SPRN_TIDR, r5
 	mtspr	SPRN_PSSCR, r6
+	mtspr	SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
8:

@@ -936,7 +904,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	mftb	r7
 	subf	r3,r7,r8
 	mtspr	SPRN_DEC,r3
-	stw	r3,VCPU_DEC(r4)
+	std	r3,VCPU_DEC(r4)

 	ld	r5, VCPU_SPRG0(r4)
 	ld	r6, VCPU_SPRG1(r4)
@@ -1048,7 +1016,13 @@ kvmppc_cede_reentry:		/* r4 = vcpu, r13 = paca */
 	li	r0, BOOK3S_INTERRUPT_EXTERNAL
 	bne	cr1, 12f
 	mfspr	r0, SPRN_DEC
-	cmpwi	r0, 0
+BEGIN_FTR_SECTION
+	/* On POWER9 check whether the guest has large decrementer enabled */
+	andis.	r8, r8, LPCR_LD@h
+	bne	15f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+	extsw	r0, r0
+15:	cmpdi	r0, 0
 	li	r0, BOOK3S_INTERRUPT_DECREMENTER
 	bge	5f

@@ -1058,6 +1032,23 @@ kvmppc_cede_reentry:		/* r4 = vcpu, r13 = paca */
 	mr	r9, r4
 	bl	kvmppc_msr_interrupt
5:
+BEGIN_FTR_SECTION
+	b	fast_guest_return
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
+	/* On POWER9, check for pending doorbell requests */
+	lbz	r0, VCPU_DBELL_REQ(r4)
+	cmpwi	r0, 0
+	beq	fast_guest_return
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	/* Set DPDES register so the CPU will take a doorbell interrupt */
+	li	r0, 1
+	mtspr	SPRN_DPDES, r0
+	std	r0, VCORE_DPDES(r5)
+	/* Make sure other cpus see vcore->dpdes set before dbell req clear */
+	lwsync
+	/* Clear the pending doorbell request */
+	li	r0, 0
+	stb	r0, VCPU_DBELL_REQ(r4)

 	/*
 	 * Required state:
@@ -1232,6 +1223,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)

 	stw	r12,VCPU_TRAP(r9)

+	/*
+	 * Now that we have saved away SRR0/1 and HSRR0/1,
+	 * interrupts are recoverable in principle, so set MSR_RI.
+	 * This becomes important for relocation-on interrupts from
+	 * the guest, which we can get in radix mode on POWER9.
+	 */
+	li	r0, MSR_RI
+	mtmsrd	r0, 1
+
#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
 	addi	r3, r9, VCPU_TB_RMINTR
 	mr	r4, r9
@@ -1288,6 +1288,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	beq	4f
 	b	guest_exit_cont
3:
+	/* If it's a hypervisor facility unavailable interrupt, save HFSCR */
+	cmpwi	r12, BOOK3S_INTERRUPT_H_FAC_UNAVAIL
+	bne	14f
+	mfspr	r3, SPRN_HFSCR
+	std	r3, VCPU_HFSCR(r9)
+	b	guest_exit_cont
+14:
 	/* External interrupt ? */
 	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
 	bne+	guest_exit_cont
@@ -1475,12 +1482,18 @@ mc_cont:
 	mtspr	SPRN_SPURR,r4

 	/* Save DEC */
+	ld	r3, HSTATE_KVM_VCORE(r13)
 	mfspr	r5,SPRN_DEC
 	mftb	r6
+	/* On P9, if the guest has large decr enabled, don't sign extend */
+BEGIN_FTR_SECTION
+	ld	r4, VCORE_LPCR(r3)
+	andis.	r4, r4, LPCR_LD@h
+	bne	16f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	extsw	r5,r5
-	add	r5,r5,r6
+16:	add	r5,r5,r6
 	/* r5 is a guest timebase value here, convert to host TB */
-	ld	r3,HSTATE_KVM_VCORE(r13)
 	ld	r4,VCORE_TB_OFFSET(r3)
 	subf	r5,r4,r5
 	std	r5,VCPU_DEC_EXPIRES(r9)
@@ -1525,6 +1538,9 @@ FTR_SECTION_ELSE
 	rldicl	r6, r6, 4, 50		/* r6 &= PSSCR_GUEST_VIS */
 	rotldi	r6, r6, 60
 	std	r6, VCPU_PSSCR(r9)
+	/* Restore host HFSCR value */
+	ld	r7, STACK_SLOT_HFSCR(r1)
+	mtspr	SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	/*
 	 * Restore various registers to 0, where non-zero values
@@ -2402,8 +2418,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 	mfspr	r3, SPRN_DEC
 	mfspr	r4, SPRN_HDEC
 	mftb	r5
+BEGIN_FTR_SECTION
+	/* On P9 check whether the guest has large decrementer mode enabled */
+	ld	r6, HSTATE_KVM_VCORE(r13)
+	ld	r6, VCORE_LPCR(r6)
+	andis.	r6, r6, LPCR_LD@h
+	bne	68f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	extsw	r3, r3
-	EXTEND_HDEC(r4)
+68:	EXTEND_HDEC(r4)
 	cmpd	r3, r4
 	ble	67f
 	mtspr	SPRN_DEC, r4
@@ -2589,22 +2612,32 @@ machine_check_realmode:
 	ld	r9, HSTATE_KVM_VCPU(r13)
 	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	/*
-	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
-	 * machine check interrupt (set HSRR0 to 0x200). And for handled
-	 * errors (no-fatal), just go back to guest execution with current
-	 * HSRR0 instead of exiting guest. This new approach will inject
-	 * machine check to guest for fatal error causing guest to crash.
-	 *
-	 * The old code used to return to host for unhandled errors which
-	 * was causing guest to hang with soft lockups inside guest and
-	 * makes it difficult to recover guest instance.
+	 * For the guest that is FWNMI capable, deliver all the MCE errors
+	 * (handled/unhandled) by exiting the guest with KVM_EXIT_NMI exit
+	 * reason. This new approach injects machine check errors in guest
+	 * address space to guest with additional information in the form
+	 * of RTAS event, thus enabling guest kernel to suitably handle
+	 * such errors.
 	 *
+	 * For the guest that is not FWNMI capable (old QEMU) fallback
+	 * to old behaviour for backward compatibility:
+	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest either
+	 * through machine check interrupt (set HSRR0 to 0x200).
+	 * For handled errors (no-fatal), just go back to guest execution
+	 * with current HSRR0.
 	 * if we receive machine check with MSR(RI=0) then deliver it to
 	 * guest as machine check causing guest to crash.
 	 */
 	ld	r11, VCPU_MSR(r9)
 	rldicl.	r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */
 	bne	mc_cont			/* if so, exit to host */
+	/* Check if guest is capable of handling NMI exit */
+	ld	r10, VCPU_KVM(r9)
+	lbz	r10, KVM_FWNMI(r10)
+	cmpdi	r10, 1			/* FWNMI capable? */
+	beq	mc_cont			/* if so, exit with KVM_EXIT_NMI. */
+
+	/* if not, fall through for backward compatibility. */
 	andi.	r10, r11, MSR_RI	/* check for unrecoverable exception */
 	beq	1f			/* Deliver a machine check to guest */
 	ld	r10, VCPU_PC(r9)
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
@@ -1257,8 +1257,8 @@ static void xive_pre_save_scan(struct kvmppc_xive *xive)
 		if (!xc)
 			continue;
 		for (j = 0; j < KVMPPC_XIVE_Q_COUNT; j++) {
-			if (xc->queues[i].qpage)
-				xive_pre_save_queue(xive, &xc->queues[i]);
+			if (xc->queues[j].qpage)
+				xive_pre_save_queue(xive, &xc->queues[j]);
 		}
 	}

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
@@ -687,7 +687,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)

 	kvmppc_core_check_exceptions(vcpu);

-	if (vcpu->requests) {
+	if (kvm_request_pending(vcpu)) {
 		/* Exception delivery raised request; start over */
 		return 1;
 	}
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
@@ -39,7 +39,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 	unsigned long dec_nsec;
 	unsigned long long dec_time;

-	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
+	pr_debug("mtDEC: %lx\n", vcpu->arch.dec);
 	hrtimer_try_to_cancel(&vcpu->arch.dec_timer);

#ifdef CONFIG_PPC_BOOK3S
@@ -109,7 +109,7 @@ static int kvmppc_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
 	case SPRN_TBWU: break;

 	case SPRN_DEC:
-		vcpu->arch.dec = spr_val;
+		vcpu->arch.dec = (u32) spr_val;
 		kvmppc_emulate_dec(vcpu);
 		break;

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
@@ -55,8 +55,7 @@ EXPORT_SYMBOL_GPL(kvmppc_pr_ops);

int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
{
-	return !!(v->arch.pending_exceptions) ||
-	       v->requests;
+	return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
}

int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
@@ -108,7 +107,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 	 */
 	smp_mb();

-	if (vcpu->requests) {
+	if (kvm_request_pending(vcpu)) {
 		/* Make sure we process requests preemptable */
 		local_irq_enable();
 		trace_kvm_check_requests(vcpu);
@@ -554,13 +553,28 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	case KVM_CAP_PPC_SMT:
 		r = 0;
-		if (hv_enabled) {
+		if (kvm) {
+			if (kvm->arch.emul_smt_mode > 1)
+				r = kvm->arch.emul_smt_mode;
+			else
+				r = kvm->arch.smt_mode;
+		} else if (hv_enabled) {
 			if (cpu_has_feature(CPU_FTR_ARCH_300))
 				r = 1;
 			else
 				r = threads_per_subcore;
 		}
 		break;
+	case KVM_CAP_PPC_SMT_POSSIBLE:
+		r = 1;
+		if (hv_enabled) {
+			if (!cpu_has_feature(CPU_FTR_ARCH_300))
+				r = ((threads_per_subcore << 1) - 1);
+			else
+				/* P9 can emulate dbells, so allow any mode */
+				r = 8 | 4 | 2 | 1;
+		}
+		break;
 	case KVM_CAP_PPC_RMA:
 		r = 0;
 		break;
@@ -618,6 +632,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		/* Disable this on POWER9 until code handles new HPTE format */
 		r = !!hv_enabled && !cpu_has_feature(CPU_FTR_ARCH_300);
 		break;
+#endif
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	case KVM_CAP_PPC_FWNMI:
+		r = hv_enabled;
+		break;
#endif
 	case KVM_CAP_PPC_HTM:
 		r = cpu_has_feature(CPU_FTR_TM_COMP) &&
@@ -1538,6 +1557,15 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		break;
 	}
#endif /* CONFIG_KVM_XICS */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	case KVM_CAP_PPC_FWNMI:
+		r = -EINVAL;
+		if (!is_kvmppc_hv_enabled(vcpu->kvm))
+			break;
+		r = 0;
+		vcpu->kvm->arch.fwnmi_enabled = true;
+		break;
+#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 	default:
 		r = -EINVAL;
 		break;
@@ -1712,6 +1740,15 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		break;
 	}
+	case KVM_CAP_PPC_SMT: {
+		unsigned long mode = cap->args[0];
+		unsigned long flags = cap->args[1];
+
+		r = -EINVAL;
+		if (kvm->arch.kvm_ops->set_smt_mode)
+			r = kvm->arch.kvm_ops->set_smt_mode(kvm, mode, flags);
+		break;
+	}
#endif
 	default:
 		r = -EINVAL;
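The two PPC capabilities added above are driven from userspace through the generic KVM_CHECK_EXTENSION and KVM_ENABLE_CAP ioctls: KVM_CAP_PPC_SMT is enabled per VM with the requested mode in args[0], KVM_CAP_PPC_FWNMI per vcpu. The following is an illustrative sketch only, not part of the patch; vm_fd and vcpu_fd are assumed to come from KVM_CREATE_VM / KVM_CREATE_VCPU and <linux/kvm.h> is assumed to already carry the new capability numbers.

/*
 * Illustrative userspace sketch (not part of the patch).
 */
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

static int enable_new_ppc_caps(int vm_fd, int vcpu_fd, unsigned long smt_mode)
{
	struct kvm_enable_cap cap;
	int modes;

	/* Bitmask of SMT modes the host can provide or emulate (e.g. 1|2|4|8 on POWER9) */
	modes = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SMT_POSSIBLE);
	if (modes < 0 || !(modes & smt_mode))
		return -1;

	/* VM-wide capability: args[0] = requested SMT mode, args[1] = flags */
	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_PPC_SMT;
	cap.args[0] = smt_mode;
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -1;

	/* Per-vcpu capability: request machine check delivery via KVM_EXIT_NMI */
	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_PPC_FWNMI;
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}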
diff --git a/arch/s390/include/asm/ctl_reg.h b/arch/s390/include/asm/ctl_reg.h
@@ -59,7 +59,9 @@ union ctlreg0 {
 		unsigned long lap  : 1; /* Low-address-protection control */
 		unsigned long	   : 4;
 		unsigned long edat : 1; /* Enhanced-DAT-enablement control */
-		unsigned long	   : 4;
+		unsigned long	   : 2;
+		unsigned long iep  : 1; /* Instruction-Execution-Protection */
+		unsigned long	   : 1;
 		unsigned long afp  : 1; /* AFP-register control */
 		unsigned long vx   : 1; /* Vector enablement control */
 		unsigned long	   : 7;
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
@@ -42,9 +42,11 @@
#define KVM_HALT_POLL_NS_DEFAULT 80000

/* s390-specific vcpu->requests bit members */
-#define KVM_REQ_ENABLE_IBS         8
-#define KVM_REQ_DISABLE_IBS        9
-#define KVM_REQ_ICPT_OPEREXC       10
+#define KVM_REQ_ENABLE_IBS	KVM_ARCH_REQ(0)
+#define KVM_REQ_DISABLE_IBS	KVM_ARCH_REQ(1)
+#define KVM_REQ_ICPT_OPEREXC	KVM_ARCH_REQ(2)
+#define KVM_REQ_START_MIGRATION	KVM_ARCH_REQ(3)
+#define KVM_REQ_STOP_MIGRATION	KVM_ARCH_REQ(4)

#define SIGP_CTRL_C		0x80
#define SIGP_CTRL_SCN_MASK	0x3f
@@ -56,7 +58,7 @@ union bsca_sigp_ctrl {
 	__u8 r : 1;
 	__u8 scn : 6;
 	};
-} __packed;
+};

union esca_sigp_ctrl {
 	__u16 value;
@@ -65,14 +67,14 @@ union esca_sigp_ctrl {
 		__u8 reserved: 7;
 		__u8 scn;
 	};
-} __packed;
+};

struct esca_entry {
 	union esca_sigp_ctrl sigp_ctrl;
 	__u16   reserved1[3];
 	__u64   sda;
 	__u64   reserved2[6];
-} __packed;
+};

struct bsca_entry {
 	__u8	reserved0;
@@ -80,7 +82,7 @@ struct bsca_entry {
 	__u16	reserved[3];
 	__u64	sda;
 	__u64	reserved2[2];
-} __attribute__((packed));
+};

union ipte_control {
 	unsigned long val;
@@ -97,7 +99,7 @@ struct bsca_block {
 	__u64	mcn;
 	__u64	reserved2;
 	struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
-} __attribute__((packed));
+};

struct esca_block {
 	union ipte_control ipte_control;
@@ -105,7 +107,7 @@ struct esca_block {
 	__u64   mcn[4];
 	__u64   reserved2[20];
 	struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
-} __packed;
+};

/*
 * This struct is used to store some machine check info from lowcore
@@ -274,7 +276,7 @@ struct kvm_s390_sie_block {

struct kvm_s390_itdb {
 	__u8	data[256];
-} __packed;
+};

struct sie_page {
 	struct kvm_s390_sie_block sie_block;
@@ -282,7 +284,7 @@ struct sie_page {
 	__u8 reserved218[1000];		/* 0x0218 */
 	struct kvm_s390_itdb itdb;	/* 0x0600 */
 	__u8 reserved700[2304];		/* 0x0700 */
-} __packed;
+};

struct kvm_vcpu_stat {
 	u64 exit_userspace;
@@ -695,7 +697,7 @@ struct sie_page2 {
 	__u64 fac_list[S390_ARCH_FAC_LIST_SIZE_U64];	/* 0x0000 */
 	struct kvm_s390_crypto_cb crycb;		/* 0x0800 */
 	u8 reserved900[0x1000 - 0x900];			/* 0x0900 */
-} __packed;
+};

struct kvm_s390_vsie {
 	struct mutex mutex;
@@ -705,6 +707,12 @@ struct kvm_s390_vsie {
 	struct page *pages[KVM_MAX_VCPUS];
};

+struct kvm_s390_migration_state {
+	unsigned long bitmap_size;	/* in bits (number of guest pages) */
+	atomic64_t dirty_pages;		/* number of dirty pages */
+	unsigned long *pgste_bitmap;
+};
+
struct kvm_arch{
 	void *sca;
 	int use_esca;
@@ -732,6 +740,7 @@ struct kvm_arch{
 	struct kvm_s390_crypto crypto;
 	struct kvm_s390_vsie vsie;
 	u64 epoch;
+	struct kvm_s390_migration_state *migration_state;
 	/* subset of available cpu features enabled by user space */
 	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
};
diff --git a/arch/s390/include/asm/nmi.h b/arch/s390/include/asm/nmi.h
@@ -26,6 +26,12 @@
#define MCCK_CODE_PSW_MWP_VALID		_BITUL(63 - 20)
#define MCCK_CODE_PSW_IA_VALID		_BITUL(63 - 23)

+#define MCCK_CR14_CR_PENDING_SUB_MASK	(1 << 28)
+#define MCCK_CR14_RECOVERY_SUB_MASK	(1 << 27)
+#define MCCK_CR14_DEGRAD_SUB_MASK	(1 << 26)
+#define MCCK_CR14_EXT_DAMAGE_SUB_MASK	(1 << 25)
+#define MCCK_CR14_WARN_SUB_MASK		(1 << 24)
+
#ifndef __ASSEMBLY__

union mci {
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
@@ -28,6 +28,7 @@
#define KVM_DEV_FLIC_CLEAR_IO_IRQ	8
#define KVM_DEV_FLIC_AISM		9
#define KVM_DEV_FLIC_AIRQ_INJECT	10
+#define KVM_DEV_FLIC_AISM_ALL		11
/*
 * We can have up to 4*64k pending subchannels + 8 adapter interrupts,
 * as well as up to ASYNC_PF_PER_VCPU*KVM_MAX_VCPUS pfault done interrupts.
@@ -53,6 +54,11 @@ struct kvm_s390_ais_req {
 	__u16 mode;
};

+struct kvm_s390_ais_all {
+	__u8 simm;
+	__u8 nimm;
+};
+
#define KVM_S390_IO_ADAPTER_MASK 1
#define KVM_S390_IO_ADAPTER_MAP 2
#define KVM_S390_IO_ADAPTER_UNMAP 3
@@ -70,6 +76,7 @@ struct kvm_s390_io_adapter_req {
#define KVM_S390_VM_TOD			1
#define KVM_S390_VM_CRYPTO		2
#define KVM_S390_VM_CPU_MODEL		3
+#define KVM_S390_VM_MIGRATION		4

/* kvm attributes for mem_ctrl */
#define KVM_S390_VM_MEM_ENABLE_CMMA	0
@@ -151,6 +158,11 @@ struct kvm_s390_vm_cpu_subfunc {
#define KVM_S390_VM_CRYPTO_DISABLE_AES_KW	2
#define KVM_S390_VM_CRYPTO_DISABLE_DEA_KW	3

+/* kvm attributes for migration mode */
+#define KVM_S390_VM_MIGRATION_STOP	0
+#define KVM_S390_VM_MIGRATION_START	1
+#define KVM_S390_VM_MIGRATION_STATUS	2
+
/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
 	/* general purpose regs for s390 */
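For reference, KVM_S390_VM_MIGRATION defined above is a VM device attribute group, so userspace is expected to drive it with KVM_SET_DEVICE_ATTR / KVM_GET_DEVICE_ATTR on the VM file descriptor. A minimal sketch follows; it is illustrative only and not part of the patch, and vm_fd is assumed to come from KVM_CREATE_VM.

/*
 * Illustrative only: toggle and query the CMMA migration mode.
 */
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

static int s390_set_migration_mode(int vm_fd, int start)
{
	struct kvm_device_attr attr = {
		.group = KVM_S390_VM_MIGRATION,
		.attr  = start ? KVM_S390_VM_MIGRATION_START
			       : KVM_S390_VM_MIGRATION_STOP,
	};

	return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
}

static int s390_migration_mode_active(int vm_fd)
{
	uint64_t status = 0;
	struct kvm_device_attr attr = {
		.group = KVM_S390_VM_MIGRATION,
		.attr  = KVM_S390_VM_MIGRATION_STATUS,
		.addr  = (uint64_t)(unsigned long)&status,
	};

	if (ioctl(vm_fd, KVM_GET_DEVICE_ATTR, &attr) < 0)
		return -1;
	return status != 0;
}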
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
@@ -89,7 +89,7 @@ struct region3_table_entry_fc1 {
 	unsigned long f  : 1; /* Fetch-Protection Bit */
 	unsigned long fc : 1; /* Format-Control */
 	unsigned long p  : 1; /* DAT-Protection Bit */
-	unsigned long co : 1; /* Change-Recording Override */
+	unsigned long iep: 1; /* Instruction-Execution-Protection */
 	unsigned long	 : 2;
 	unsigned long i  : 1; /* Region-Invalid Bit */
 	unsigned long cr : 1; /* Common-Region Bit */
@@ -131,7 +131,7 @@ struct segment_entry_fc1 {
 	unsigned long f  : 1; /* Fetch-Protection Bit */
 	unsigned long fc : 1; /* Format-Control */
 	unsigned long p  : 1; /* DAT-Protection Bit */
-	unsigned long co : 1; /* Change-Recording Override */
+	unsigned long iep: 1; /* Instruction-Execution-Protection */
 	unsigned long	 : 2;
 	unsigned long i  : 1; /* Segment-Invalid Bit */
 	unsigned long cs : 1; /* Common-Segment Bit */
@@ -168,7 +168,8 @@ union page_table_entry {
 		unsigned long z  : 1; /* Zero Bit */
 		unsigned long i  : 1; /* Page-Invalid Bit */
 		unsigned long p  : 1; /* DAT-Protection Bit */
-		unsigned long	 : 9;
+		unsigned long iep: 1; /* Instruction-Execution-Protection */
+		unsigned long	 : 8;
 	};
};

@@ -241,7 +242,7 @@ struct ale {
 	unsigned long asteo  : 25; /* ASN-Second-Table-Entry Origin */
 	unsigned long        : 6;
 	unsigned long astesn : 32; /* ASTE Sequence Number */
-} __packed;
+};

struct aste {
 	unsigned long i      : 1; /* ASX-Invalid Bit */
@@ -257,7 +258,7 @@ struct aste {
 	unsigned long ald    : 32;
 	unsigned long astesn : 32;
 	/* .. more fields there */
-} __packed;
+};

int ipte_lock_held(struct kvm_vcpu *vcpu)
{
@@ -485,6 +486,7 @@ enum prot_type {
 	PROT_TYPE_KEYC = 1,
 	PROT_TYPE_ALC  = 2,
 	PROT_TYPE_DAT  = 3,
+	PROT_TYPE_IEP  = 4,
};

static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
@@ -500,6 +502,9 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
 	switch (code) {
 	case PGM_PROTECTION:
 		switch (prot) {
+		case PROT_TYPE_IEP:
+			tec->b61 = 1;
+			/* FALL THROUGH */
 		case PROT_TYPE_LA:
 			tec->b56 = 1;
 			break;
@@ -591,6 +596,7 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
 * @gpa: points to where guest physical (absolute) address should be stored
 * @asce: effective asce
 * @mode: indicates the access mode to be used
+ * @prot: returns the type for protection exceptions
 *
 * Translate a guest virtual address into a guest absolute address by means
 * of dynamic address translation as specified by the architecture.
@@ -606,19 +612,21 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
 */
static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
				     unsigned long *gpa, const union asce asce,
-				     enum gacc_mode mode)
+				     enum gacc_mode mode, enum prot_type *prot)
{
 	union vaddress vaddr = {.addr = gva};
 	union raddress raddr = {.addr = gva};
 	union page_table_entry pte;
 	int dat_protection = 0;
+	int iep_protection = 0;
 	union ctlreg0 ctlreg0;
 	unsigned long ptr;
-	int edat1, edat2;
+	int edat1, edat2, iep;

 	ctlreg0.val = vcpu->arch.sie_block->gcr[0];
 	edat1 = ctlreg0.edat && test_kvm_facility(vcpu->kvm, 8);
 	edat2 = edat1 && test_kvm_facility(vcpu->kvm, 78);
+	iep = ctlreg0.iep && test_kvm_facility(vcpu->kvm, 130);
 	if (asce.r)
 		goto real_address;
 	ptr = asce.origin * 4096;
@@ -702,6 +710,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 			return PGM_TRANSLATION_SPEC;
 		if (rtte.fc && edat2) {
 			dat_protection |= rtte.fc1.p;
+			iep_protection = rtte.fc1.iep;
 			raddr.rfaa = rtte.fc1.rfaa;
 			goto absolute_address;
 		}
@@ -729,6 +738,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 			return PGM_TRANSLATION_SPEC;
 		if (ste.fc && edat1) {
 			dat_protection |= ste.fc1.p;
+			iep_protection = ste.fc1.iep;
 			raddr.sfaa = ste.fc1.sfaa;
 			goto absolute_address;
 		}
@@ -745,12 +755,19 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 	if (pte.z)
 		return PGM_TRANSLATION_SPEC;
 	dat_protection |= pte.p;
+	iep_protection = pte.iep;
 	raddr.pfra = pte.pfra;
real_address:
 	raddr.addr = kvm_s390_real_to_abs(vcpu, raddr.addr);
absolute_address:
-	if (mode == GACC_STORE && dat_protection)
+	if (mode == GACC_STORE && dat_protection) {
+		*prot = PROT_TYPE_DAT;
 		return PGM_PROTECTION;
+	}
+	if (mode == GACC_IFETCH && iep_protection && iep) {
+		*prot = PROT_TYPE_IEP;
+		return PGM_PROTECTION;
+	}
 	if (kvm_is_error_gpa(vcpu->kvm, raddr.addr))
 		return PGM_ADDRESSING;
 	*gpa = raddr.addr;
@@ -782,6 +799,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
{
 	psw_t *psw = &vcpu->arch.sie_block->gpsw;
 	int lap_enabled, rc = 0;
+	enum prot_type prot;

 	lap_enabled = low_address_protection_enabled(vcpu, asce);
 	while (nr_pages) {
@@ -791,7 +809,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
					 PROT_TYPE_LA);
 		ga &= PAGE_MASK;
 		if (psw_bits(*psw).dat) {
-			rc = guest_translate(vcpu, ga, pages, asce, mode);
+			rc = guest_translate(vcpu, ga, pages, asce, mode, &prot);
 			if (rc < 0)
 				return rc;
 		} else {
@@ -800,7 +818,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 				rc = PGM_ADDRESSING;
 		}
 		if (rc)
-			return trans_exc(vcpu, rc, ga, ar, mode, PROT_TYPE_DAT);
+			return trans_exc(vcpu, rc, ga, ar, mode, prot);
 		ga += PAGE_SIZE;
 		pages++;
 		nr_pages--;
@@ -886,6 +904,7 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
			    unsigned long *gpa, enum gacc_mode mode)
{
 	psw_t *psw = &vcpu->arch.sie_block->gpsw;
+	enum prot_type prot;
 	union asce asce;
 	int rc;

@@ -900,9 +919,9 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, u8 ar,
 	}

 	if (psw_bits(*psw).dat && !asce.r) {	/* Use DAT? */
-		rc = guest_translate(vcpu, gva, gpa, asce, mode);
+		rc = guest_translate(vcpu, gva, gpa, asce, mode, &prot);
 		if (rc > 0)
-			return trans_exc(vcpu, rc, gva, 0, mode, PROT_TYPE_DAT);
+			return trans_exc(vcpu, rc, gva, 0, mode, prot);
 	} else {
 		*gpa = kvm_s390_real_to_abs(vcpu, gva);
 		if (kvm_is_error_gpa(vcpu->kvm, *gpa))
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
@@ -251,8 +251,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
 		__clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask);
 	if (psw_mchk_disabled(vcpu))
 		active_mask &= ~IRQ_PEND_MCHK_MASK;
+	/*
+	 * Check both floating and local interrupt's cr14 because
+	 * bit IRQ_PEND_MCHK_REP could be set in both cases.
+	 */
 	if (!(vcpu->arch.sie_block->gcr[14] &
-	      vcpu->kvm->arch.float_int.mchk.cr14))
+	      (vcpu->kvm->arch.float_int.mchk.cr14 |
+	       vcpu->arch.local_int.irq.mchk.cr14)))
 		__clear_bit(IRQ_PEND_MCHK_REP, &active_mask);

 	/*
@@ -1876,6 +1881,28 @@ out:
 	return ret < 0 ? ret : n;
}

+static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+	struct kvm_s390_ais_all ais;
+
+	if (attr->attr < sizeof(ais))
+		return -EINVAL;
+
+	if (!test_kvm_facility(kvm, 72))
+		return -ENOTSUPP;
+
+	mutex_lock(&fi->ais_lock);
+	ais.simm = fi->simm;
+	ais.nimm = fi->nimm;
+	mutex_unlock(&fi->ais_lock);
+
+	if (copy_to_user((void __user *)attr->addr, &ais, sizeof(ais)))
+		return -EFAULT;
+
+	return 0;
+}
+
static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
{
 	int r;
@@ -1885,6 +1912,9 @@ static int flic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 		r = get_all_floating_irqs(dev->kvm, (u8 __user *) attr->addr,
					  attr->attr);
 		break;
+	case KVM_DEV_FLIC_AISM_ALL:
+		r = flic_ais_mode_get_all(dev->kvm, attr);
+		break;
 	default:
 		r = -EINVAL;
 	}
@@ -2235,6 +2265,25 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr)
 	return kvm_s390_inject_airq(kvm, adapter);
}

+static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+	struct kvm_s390_ais_all ais;
+
+	if (!test_kvm_facility(kvm, 72))
+		return -ENOTSUPP;
+
+	if (copy_from_user(&ais, (void __user *)attr->addr, sizeof(ais)))
+		return -EFAULT;
+
+	mutex_lock(&fi->ais_lock);
+	fi->simm = ais.simm;
+	fi->nimm = ais.nimm;
+	mutex_unlock(&fi->ais_lock);
+
+	return 0;
+}
+
static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
{
 	int r = 0;
@@ -2277,6 +2326,9 @@ static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	case KVM_DEV_FLIC_AIRQ_INJECT:
 		r = flic_inject_airq(dev->kvm, attr);
 		break;
+	case KVM_DEV_FLIC_AISM_ALL:
+		r = flic_ais_mode_set_all(dev->kvm, attr);
+		break;
 	default:
 		r = -EINVAL;
 	}
@@ -2298,6 +2350,7 @@ static int flic_has_attr(struct kvm_device *dev,
 	case KVM_DEV_FLIC_CLEAR_IO_IRQ:
 	case KVM_DEV_FLIC_AISM:
 	case KVM_DEV_FLIC_AIRQ_INJECT:
+	case KVM_DEV_FLIC_AISM_ALL:
 		return 0;
 	}
 	return -ENXIO;
@@ -2415,6 +2468,42 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
 	return ret;
}

+/*
+ * Inject the machine check to the guest.
+ */
+void kvm_s390_reinject_machine_check(struct kvm_vcpu *vcpu,
+				     struct mcck_volatile_info *mcck_info)
+{
+	struct kvm_s390_interrupt_info inti;
+	struct kvm_s390_irq irq;
+	struct kvm_s390_mchk_info *mchk;
+	union mci mci;
+	__u64 cr14 = 0;		/* upper bits are not used */
+
+	mci.val = mcck_info->mcic;
+	if (mci.sr)
+		cr14 |= MCCK_CR14_RECOVERY_SUB_MASK;
+	if (mci.dg)
+		cr14 |= MCCK_CR14_DEGRAD_SUB_MASK;
+	if (mci.w)
+		cr14 |= MCCK_CR14_WARN_SUB_MASK;
+
+	mchk = mci.ck ? &inti.mchk : &irq.u.mchk;
+	mchk->cr14 = cr14;
+	mchk->mcic = mcck_info->mcic;
+	mchk->ext_damage_code = mcck_info->ext_damage_code;
+	mchk->failing_storage_address = mcck_info->failing_storage_address;
+	if (mci.ck) {
+		/* Inject the floating machine check */
+		inti.type = KVM_S390_MCHK;
+		WARN_ON_ONCE(__inject_vm(vcpu->kvm, &inti));
+	} else {
+		/* Inject the machine check to specified vcpu */
+		irq.type = KVM_S390_MCHK;
+		WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &irq));
+	}
+}
+
int kvm_set_routing_entry(struct kvm *kvm,
			  struct kvm_kernel_irq_routing_entry *e,
			  const struct kvm_irq_routing_entry *ue)
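The new KVM_DEV_FLIC_AISM_ALL attribute above lets userspace save and restore the adapter-interruption-suppression masks in a single call. The sketch below shows the expected usage and is illustrative only, not part of the patch: flic_fd is assumed to be the fd returned by KVM_CREATE_DEVICE for a KVM_DEV_TYPE_FLIC device, and struct kvm_s390_ais_all comes from the uapi header change earlier in this series.

/*
 * Illustrative only: bulk get/set of the AIS modes via the FLIC device.
 */
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

static int flic_get_ais_all(int flic_fd, struct kvm_s390_ais_all *ais)
{
	struct kvm_device_attr attr = {
		.group = KVM_DEV_FLIC_AISM_ALL,
		.attr  = sizeof(*ais),	/* kernel rejects buffers smaller than this */
		.addr  = (uint64_t)(unsigned long)ais,
	};

	return ioctl(flic_fd, KVM_GET_DEVICE_ATTR, &attr);
}

static int flic_set_ais_all(int flic_fd, struct kvm_s390_ais_all *ais)
{
	struct kvm_device_attr attr = {
		.group = KVM_DEV_FLIC_AISM_ALL,
		.addr  = (uint64_t)(unsigned long)ais,
	};

	return ioctl(flic_fd, KVM_SET_DEVICE_ATTR, &attr);
}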
@ -30,6 +30,7 @@
|
||||||
#include <linux/vmalloc.h>
|
#include <linux/vmalloc.h>
|
||||||
#include <linux/bitmap.h>
|
#include <linux/bitmap.h>
|
||||||
#include <linux/sched/signal.h>
|
#include <linux/sched/signal.h>
|
||||||
|
#include <linux/string.h>
|
||||||
|
|
||||||
#include <asm/asm-offsets.h>
|
#include <asm/asm-offsets.h>
|
||||||
#include <asm/lowcore.h>
|
#include <asm/lowcore.h>
|
||||||
|
@ -386,6 +387,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||||
case KVM_CAP_S390_SKEYS:
|
case KVM_CAP_S390_SKEYS:
|
||||||
case KVM_CAP_S390_IRQ_STATE:
|
case KVM_CAP_S390_IRQ_STATE:
|
||||||
case KVM_CAP_S390_USER_INSTR0:
|
case KVM_CAP_S390_USER_INSTR0:
|
||||||
|
case KVM_CAP_S390_CMMA_MIGRATION:
|
||||||
case KVM_CAP_S390_AIS:
|
case KVM_CAP_S390_AIS:
|
||||||
r = 1;
|
r = 1;
|
||||||
break;
|
break;
|
||||||
|
@@ -749,6 +751,129 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr)
 	return 0;
 }
 
+static void kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
+{
+	int cx;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(cx, vcpu, kvm)
+		kvm_s390_sync_request(req, vcpu);
+}
+
+/*
+ * Must be called with kvm->srcu held to avoid races on memslots, and with
+ * kvm->lock to avoid races with ourselves and kvm_s390_vm_stop_migration.
+ */
+static int kvm_s390_vm_start_migration(struct kvm *kvm)
+{
+	struct kvm_s390_migration_state *mgs;
+	struct kvm_memory_slot *ms;
+	/* should be the only one */
+	struct kvm_memslots *slots;
+	unsigned long ram_pages;
+	int slotnr;
+
+	/* migration mode already enabled */
+	if (kvm->arch.migration_state)
+		return 0;
+
+	slots = kvm_memslots(kvm);
+	if (!slots || !slots->used_slots)
+		return -EINVAL;
+
+	mgs = kzalloc(sizeof(*mgs), GFP_KERNEL);
+	if (!mgs)
+		return -ENOMEM;
+	kvm->arch.migration_state = mgs;
+
+	if (kvm->arch.use_cmma) {
+		/*
+		 * Get the last slot. They should be sorted by base_gfn, so the
+		 * last slot is also the one at the end of the address space.
+		 * We have verified above that at least one slot is present.
+		 */
+		ms = slots->memslots + slots->used_slots - 1;
+		/* round up so we only use full longs */
+		ram_pages = roundup(ms->base_gfn + ms->npages, BITS_PER_LONG);
+		/* allocate enough bytes to store all the bits */
+		mgs->pgste_bitmap = vmalloc(ram_pages / 8);
+		if (!mgs->pgste_bitmap) {
+			kfree(mgs);
+			kvm->arch.migration_state = NULL;
+			return -ENOMEM;
+		}
+
+		mgs->bitmap_size = ram_pages;
+		atomic64_set(&mgs->dirty_pages, ram_pages);
+		/* mark all the pages in active slots as dirty */
+		for (slotnr = 0; slotnr < slots->used_slots; slotnr++) {
+			ms = slots->memslots + slotnr;
+			bitmap_set(mgs->pgste_bitmap, ms->base_gfn, ms->npages);
+		}
+
+		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
+	}
+	return 0;
+}
+
+/*
+ * Must be called with kvm->lock to avoid races with ourselves and
+ * kvm_s390_vm_start_migration.
+ */
+static int kvm_s390_vm_stop_migration(struct kvm *kvm)
+{
+	struct kvm_s390_migration_state *mgs;
+
+	/* migration mode already disabled */
+	if (!kvm->arch.migration_state)
+		return 0;
+	mgs = kvm->arch.migration_state;
+	kvm->arch.migration_state = NULL;
+
+	if (kvm->arch.use_cmma) {
+		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
+		vfree(mgs->pgste_bitmap);
+	}
+	kfree(mgs);
+	return 0;
+}
+
+static int kvm_s390_vm_set_migration(struct kvm *kvm,
+				     struct kvm_device_attr *attr)
+{
+	int idx, res = -ENXIO;
+
+	mutex_lock(&kvm->lock);
+	switch (attr->attr) {
+	case KVM_S390_VM_MIGRATION_START:
+		idx = srcu_read_lock(&kvm->srcu);
+		res = kvm_s390_vm_start_migration(kvm);
+		srcu_read_unlock(&kvm->srcu, idx);
+		break;
+	case KVM_S390_VM_MIGRATION_STOP:
+		res = kvm_s390_vm_stop_migration(kvm);
+		break;
+	default:
+		break;
+	}
+	mutex_unlock(&kvm->lock);
+
+	return res;
+}
+
+static int kvm_s390_vm_get_migration(struct kvm *kvm,
+				     struct kvm_device_attr *attr)
+{
+	u64 mig = (kvm->arch.migration_state != NULL);
+
+	if (attr->attr != KVM_S390_VM_MIGRATION_STATUS)
+		return -ENXIO;
+
+	if (copy_to_user((void __user *)attr->addr, &mig, sizeof(mig)))
+		return -EFAULT;
+	return 0;
+}
+
 static int kvm_s390_set_tod_high(struct kvm *kvm, struct kvm_device_attr *attr)
 {
 	u8 gtod_high;
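For a sense of scale of the bitmap allocated above, here is a small, self-contained sketch, illustrative only and not part of the patch, of the same sizing arithmetic: one bit per guest page, rounded up to whole longs, so covering 4 GiB of guest memory costs a 128 KiB allocation.

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(long))

/* Round up to a multiple of BITS_PER_LONG, as roundup() does in the patch. */
static unsigned long cmma_bitmap_pages(unsigned long last_gfn_plus_1)
{
	return (last_gfn_plus_1 + BITS_PER_LONG - 1) / BITS_PER_LONG * BITS_PER_LONG;
}

int main(void)
{
	/* hypothetical guest: last memslot ends at 4 GiB, i.e. gfn 0x100000 */
	unsigned long ram_pages = cmma_bitmap_pages(0x100000);

	/* one bit per page -> ram_pages / 8 bytes; 131072 bytes (128 KiB) here */
	printf("bitmap: %lu pages tracked, %lu bytes\n", ram_pages, ram_pages / 8);
	return 0;
}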
@@ -1089,6 +1214,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 	case KVM_S390_VM_CRYPTO:
 		ret = kvm_s390_vm_set_crypto(kvm, attr);
 		break;
+	case KVM_S390_VM_MIGRATION:
+		ret = kvm_s390_vm_set_migration(kvm, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -1111,6 +1239,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 	case KVM_S390_VM_CPU_MODEL:
 		ret = kvm_s390_get_cpu_model(kvm, attr);
 		break;
+	case KVM_S390_VM_MIGRATION:
+		ret = kvm_s390_vm_get_migration(kvm, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -1178,6 +1309,9 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
 			break;
 		}
 		break;
+	case KVM_S390_VM_MIGRATION:
+		ret = 0;
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -1285,6 +1419,182 @@ out:
 	return r;
 }
 
+/*
+ * Base address and length must be sent at the start of each block, therefore
+ * it's cheaper to send some clean data, as long as it's less than the size of
+ * two longs.
+ */
+#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *))
+/* for consistency */
+#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
+
+/*
+ * This function searches for the next page with dirty CMMA attributes, and
+ * saves the attributes in the buffer up to either the end of the buffer or
+ * until a block of at least KVM_S390_MAX_BIT_DISTANCE clean bits is found;
+ * no trailing clean bytes are saved.
+ * In case no dirty bits were found, or if CMMA was not enabled or used, the
+ * output buffer will indicate 0 as length.
+ */
+static int kvm_s390_get_cmma_bits(struct kvm *kvm,
+				  struct kvm_s390_cmma_log *args)
+{
+	struct kvm_s390_migration_state *s = kvm->arch.migration_state;
+	unsigned long bufsize, hva, pgstev, i, next, cur;
+	int srcu_idx, peek, r = 0, rr;
+	u8 *res;
+
+	cur = args->start_gfn;
+	i = next = pgstev = 0;
+
+	if (unlikely(!kvm->arch.use_cmma))
+		return -ENXIO;
+	/* Invalid/unsupported flags were specified */
+	if (args->flags & ~KVM_S390_CMMA_PEEK)
+		return -EINVAL;
+	/* Migration mode query, and we are not doing a migration */
+	peek = !!(args->flags & KVM_S390_CMMA_PEEK);
+	if (!peek && !s)
+		return -EINVAL;
+	/* CMMA is disabled or was not used, or the buffer has length zero */
+	bufsize = min(args->count, KVM_S390_CMMA_SIZE_MAX);
+	if (!bufsize || !kvm->mm->context.use_cmma) {
+		memset(args, 0, sizeof(*args));
+		return 0;
+	}
+
+	if (!peek) {
+		/* We are not peeking, and there are no dirty pages */
+		if (!atomic64_read(&s->dirty_pages)) {
+			memset(args, 0, sizeof(*args));
+			return 0;
+		}
+		cur = find_next_bit(s->pgste_bitmap, s->bitmap_size,
+				    args->start_gfn);
+		if (cur >= s->bitmap_size)	/* nothing found, loop back */
+			cur = find_next_bit(s->pgste_bitmap, s->bitmap_size, 0);
+		if (cur >= s->bitmap_size) {	/* again! (very unlikely) */
+			memset(args, 0, sizeof(*args));
+			return 0;
+		}
+		next = find_next_bit(s->pgste_bitmap, s->bitmap_size, cur + 1);
+	}
+
+	res = vmalloc(bufsize);
+	if (!res)
+		return -ENOMEM;
+
+	args->start_gfn = cur;
+
+	down_read(&kvm->mm->mmap_sem);
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	while (i < bufsize) {
+		hva = gfn_to_hva(kvm, cur);
+		if (kvm_is_error_hva(hva)) {
+			r = -EFAULT;
+			break;
+		}
+		/* decrement only if we actually flipped the bit to 0 */
+		if (!peek && test_and_clear_bit(cur, s->pgste_bitmap))
+			atomic64_dec(&s->dirty_pages);
+		r = get_pgste(kvm->mm, hva, &pgstev);
+		if (r < 0)
+			pgstev = 0;
+		/* save the value */
+		res[i++] = (pgstev >> 24) & 0x3;
+		/*
+		 * if the next bit is too far away, stop.
+		 * if we reached the previous "next", find the next one
+		 */
+		if (!peek) {
+			if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
+				break;
+			if (cur == next)
+				next = find_next_bit(s->pgste_bitmap,
+						     s->bitmap_size, cur + 1);
+			/* reached the end of the bitmap or of the buffer, stop */
+			if ((next >= s->bitmap_size) ||
+			    (next >= args->start_gfn + bufsize))
+				break;
+		}
+		cur++;
+	}
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	up_read(&kvm->mm->mmap_sem);
+	args->count = i;
+	args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;
+
+	rr = copy_to_user((void __user *)args->values, res, args->count);
+	if (rr)
+		r = -EFAULT;
+
+	vfree(res);
+	return r;
+}
+
+/*
+ * This function sets the CMMA attributes for the given pages. If the input
+ * buffer has zero length, no action is taken, otherwise the attributes are
+ * set and the mm->context.use_cmma flag is set.
+ */
+static int kvm_s390_set_cmma_bits(struct kvm *kvm,
+				  const struct kvm_s390_cmma_log *args)
+{
+	unsigned long hva, mask, pgstev, i;
+	uint8_t *bits;
+	int srcu_idx, r = 0;
+
+	mask = args->mask;
+
+	if (!kvm->arch.use_cmma)
+		return -ENXIO;
+	/* invalid/unsupported flags */
+	if (args->flags != 0)
+		return -EINVAL;
+	/* Enforce sane limit on memory allocation */
+	if (args->count > KVM_S390_CMMA_SIZE_MAX)
+		return -EINVAL;
+	/* Nothing to do */
+	if (args->count == 0)
+		return 0;
+
+	bits = vmalloc(sizeof(*bits) * args->count);
+	if (!bits)
+		return -ENOMEM;
+
+	r = copy_from_user(bits, (void __user *)args->values, args->count);
+	if (r) {
+		r = -EFAULT;
+		goto out;
+	}
+
+	down_read(&kvm->mm->mmap_sem);
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	for (i = 0; i < args->count; i++) {
+		hva = gfn_to_hva(kvm, args->start_gfn + i);
+		if (kvm_is_error_hva(hva)) {
+			r = -EFAULT;
+			break;
+		}
+
+		pgstev = bits[i];
+		pgstev = pgstev << 24;
+		mask &= _PGSTE_GPS_USAGE_MASK;
+		set_pgste_bits(kvm->mm, hva, mask, pgstev);
+	}
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	up_read(&kvm->mm->mmap_sem);
+
+	if (!kvm->mm->context.use_cmma) {
+		down_write(&kvm->mm->mmap_sem);
+		kvm->mm->context.use_cmma = 1;
+		up_write(&kvm->mm->mmap_sem);
+	}
+out:
+	vfree(bits);
+	return r;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
@@ -1363,6 +1673,29 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_s390_set_skeys(kvm, &args);
 		break;
 	}
+	case KVM_S390_GET_CMMA_BITS: {
+		struct kvm_s390_cmma_log args;
+
+		r = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+		r = kvm_s390_get_cmma_bits(kvm, &args);
+		if (!r) {
+			r = copy_to_user(argp, &args, sizeof(args));
+			if (r)
+				r = -EFAULT;
+		}
+		break;
+	}
+	case KVM_S390_SET_CMMA_BITS: {
+		struct kvm_s390_cmma_log args;
+
+		r = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+		r = kvm_s390_set_cmma_bits(kvm, &args);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
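For illustration only, and not part of this series, a userspace loop that drains the CMMA attributes through the new ioctl might look roughly like the sketch below. The kvm_s390_cmma_log fields mirror their use in kvm_s390_get_cmma_bits() above; the chunk size, the way the VM fd was obtained, and the advancing strategy are assumptions.

#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

#define CHUNK 4096	/* arbitrary buffer size for this sketch */

/* Drain dirty CMMA attributes from a KVM VM fd until none remain. */
static int drain_cmma(int vm_fd)
{
	uint8_t values[CHUNK];
	struct kvm_s390_cmma_log log;

	memset(&log, 0, sizeof(log));
	do {
		log.count = CHUNK;
		log.flags = 0;	/* 0 = consume; KVM_S390_CMMA_PEEK would not clear bits */
		log.values = (uint64_t)(uintptr_t)values;
		if (ioctl(vm_fd, KVM_S390_GET_CMMA_BITS, &log) < 0)
			return -1;
		/* log.start_gfn/log.count now describe the block just returned */
		printf("gfn %llu: %u attribute bytes, %llu pages still dirty\n",
		       (unsigned long long)log.start_gfn, log.count,
		       (unsigned long long)log.remaining);
		log.start_gfn += log.count;
	} while (log.remaining);
	return 0;
}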
@@ -1631,6 +1964,10 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_s390_destroy_adapters(kvm);
 	kvm_s390_clear_float_irqs(kvm);
 	kvm_s390_vsie_destroy(kvm);
+	if (kvm->arch.migration_state) {
+		vfree(kvm->arch.migration_state->pgste_bitmap);
+		kfree(kvm->arch.migration_state);
+	}
 	KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
 }
@@ -1975,7 +2312,6 @@ int kvm_s390_vcpu_setup_cmma(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.sie_block->cbrlo)
 		return -ENOMEM;
 
-	vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
 	vcpu->arch.sie_block->ecb2 &= ~ECB2_PFMFI;
 	return 0;
 }
@@ -2439,7 +2775,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 {
 retry:
 	kvm_s390_vcpu_request_handled(vcpu);
-	if (!vcpu->requests)
+	if (!kvm_request_pending(vcpu))
 		return 0;
 	/*
 	 * We use MMU_RELOAD just to re-arm the ipte notifier for the
@@ -2488,6 +2824,27 @@ retry:
 		goto retry;
 	}
 
+	if (kvm_check_request(KVM_REQ_START_MIGRATION, vcpu)) {
+		/*
+		 * Disable CMMA virtualization; we will emulate the ESSA
+		 * instruction manually, in order to provide additional
+		 * functionalities needed for live migration.
+		 */
+		vcpu->arch.sie_block->ecb2 &= ~ECB2_CMMA;
+		goto retry;
+	}
+
+	if (kvm_check_request(KVM_REQ_STOP_MIGRATION, vcpu)) {
+		/*
+		 * Re-enable CMMA virtualization if CMMA is available and
+		 * was used.
+		 */
+		if ((vcpu->kvm->arch.use_cmma) &&
+		    (vcpu->kvm->mm->context.use_cmma))
+			vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
+		goto retry;
+	}
+
 	/* nothing to do, just clear the request */
 	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
@@ -2682,6 +3039,9 @@ static int vcpu_post_run_fault_in_sie(struct kvm_vcpu *vcpu)
 
 static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 {
+	struct mcck_volatile_info *mcck_info;
+	struct sie_page *sie_page;
+
 	VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
 		   vcpu->arch.sie_block->icptcode);
 	trace_kvm_s390_sie_exit(vcpu, vcpu->arch.sie_block->icptcode);
@@ -2692,6 +3052,15 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 	vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
 	vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;
 
+	if (exit_reason == -EINTR) {
+		VCPU_EVENT(vcpu, 3, "%s", "machine check");
+		sie_page = container_of(vcpu->arch.sie_block,
+					struct sie_page, sie_block);
+		mcck_info = &sie_page->mcck_info;
+		kvm_s390_reinject_machine_check(vcpu, mcck_info);
+		return 0;
+	}
+
 	if (vcpu->arch.sie_block->icptcode > 0) {
 		int rc = kvm_handle_sie_intercept(vcpu);
@@ -397,4 +397,6 @@ static inline int kvm_s390_use_sca_entries(void)
 	 */
 	return sclp.has_sigpif;
 }
+void kvm_s390_reinject_machine_check(struct kvm_vcpu *vcpu,
+				     struct mcck_volatile_info *mcck_info);
 #endif
@@ -24,6 +24,7 @@
 #include <asm/ebcdic.h>
 #include <asm/sysinfo.h>
 #include <asm/pgtable.h>
+#include <asm/page-states.h>
 #include <asm/pgalloc.h>
 #include <asm/gmap.h>
 #include <asm/io.h>
@@ -949,13 +950,72 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
+{
+	struct kvm_s390_migration_state *ms = vcpu->kvm->arch.migration_state;
+	int r1, r2, nappended, entries;
+	unsigned long gfn, hva, res, pgstev, ptev;
+	unsigned long *cbrlo;
+
+	/*
+	 * We don't need to set SD.FPF.SK to 1 here, because if we have a
+	 * machine check here we either handle it or crash
+	 */
+
+	kvm_s390_get_regs_rre(vcpu, &r1, &r2);
+	gfn = vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT;
+	hva = gfn_to_hva(vcpu->kvm, gfn);
+	entries = (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3;
+
+	if (kvm_is_error_hva(hva))
+		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+
+	nappended = pgste_perform_essa(vcpu->kvm->mm, hva, orc, &ptev, &pgstev);
+	if (nappended < 0) {
+		res = orc ? 0x10 : 0;
+		vcpu->run->s.regs.gprs[r1] = res; /* Exception Indication */
+		return 0;
+	}
+	res = (pgstev & _PGSTE_GPS_USAGE_MASK) >> 22;
+	/*
+	 * Set the block-content state part of the result. 0 means resident, so
+	 * nothing to do if the page is valid. 2 is for preserved pages
+	 * (non-present and non-zero), and 3 for zero pages (non-present and
+	 * zero).
+	 */
+	if (ptev & _PAGE_INVALID) {
+		res |= 2;
+		if (pgstev & _PGSTE_GPS_ZERO)
+			res |= 1;
+	}
+	vcpu->run->s.regs.gprs[r1] = res;
+	/*
+	 * It is possible that all the normal 511 slots were full, in which case
+	 * we will now write in the 512th slot, which is reserved for host use.
+	 * In both cases we let the normal essa handling code process all the
+	 * slots, including the reserved one, if needed.
+	 */
+	if (nappended > 0) {
+		cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo & PAGE_MASK);
+		cbrlo[entries] = gfn << PAGE_SHIFT;
+	}
+
+	if (orc) {
+		/* increment only if we are really flipping the bit to 1 */
+		if (!test_and_set_bit(gfn, ms->pgste_bitmap))
+			atomic64_inc(&ms->dirty_pages);
+	}
+
+	return nappended;
+}
+
 static int handle_essa(struct kvm_vcpu *vcpu)
 {
 	/* entries expected to be 1FF */
 	int entries = (vcpu->arch.sie_block->cbrlo & ~PAGE_MASK) >> 3;
 	unsigned long *cbrlo;
 	struct gmap *gmap;
-	int i;
+	int i, orc;
 
 	VCPU_EVENT(vcpu, 4, "ESSA: release %d pages", entries);
 	gmap = vcpu->arch.gmap;
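The test_and_set_bit()/atomic64_inc() pairing in do_essa() is the usual trick for keeping a counter exactly in sync with a bitmap: only adjust the count when the bit actually changes state. A stand-alone, userspace-flavoured sketch of the same pattern, illustrative only and with made-up names:

#include <stdatomic.h>

static atomic_long dirty_pages;

/* Mark one page dirty; bump the counter only when the bit really flips to 1. */
static void sketch_mark_dirty(unsigned long *bitmap, unsigned long gfn)
{
	unsigned long bits_per_word = 8 * sizeof(unsigned long);
	unsigned long bit = 1UL << (gfn % bits_per_word);
	unsigned long *word = &bitmap[gfn / bits_per_word];

	/* increment only if we are really flipping the bit to 1 */
	if (!(__atomic_fetch_or(word, bit, __ATOMIC_RELAXED) & bit))
		atomic_fetch_add(&dirty_pages, 1);
}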
@@ -965,12 +1025,45 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
-	if (((vcpu->arch.sie_block->ipb & 0xf0000000) >> 28) > 6)
+	/* Check for invalid operation request code */
+	orc = (vcpu->arch.sie_block->ipb & 0xf0000000) >> 28;
+	if (orc > ESSA_MAX)
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-	/* Retry the ESSA instruction */
-	kvm_s390_retry_instr(vcpu);
+	if (likely(!vcpu->kvm->arch.migration_state)) {
+		/*
+		 * CMMA is enabled in the KVM settings, but is disabled in
+		 * the SIE block and in the mm_context, and we are not doing
+		 * a migration. Enable CMMA in the mm_context.
+		 * Since we need to take a write lock to write to the context
+		 * to avoid races with storage keys handling, we check if the
+		 * value really needs to be written to; if the value is
+		 * already correct, we do nothing and avoid the lock.
+		 */
+		if (vcpu->kvm->mm->context.use_cmma == 0) {
+			down_write(&vcpu->kvm->mm->mmap_sem);
+			vcpu->kvm->mm->context.use_cmma = 1;
+			up_write(&vcpu->kvm->mm->mmap_sem);
+		}
+		/*
+		 * If we are here, we are supposed to have CMMA enabled in
+		 * the SIE block. Enabling CMMA works on a per-CPU basis,
+		 * while the context use_cmma flag is per process.
+		 * It's possible that the context flag is enabled and the
+		 * SIE flag is not, so we set the flag always; if it was
+		 * already set, nothing changes, otherwise we enable it
+		 * on this CPU too.
+		 */
+		vcpu->arch.sie_block->ecb2 |= ECB2_CMMA;
+		/* Retry the ESSA instruction */
+		kvm_s390_retry_instr(vcpu);
+	} else {
+		/* Account for the possible extra cbrl entry */
+		i = do_essa(vcpu, orc);
+		if (i < 0)
+			return i;
+		entries += i;
+	}
 	vcpu->arch.sie_block->cbrlo &= PAGE_MASK;	/* reset nceo */
 	cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo);
 	down_read(&gmap->mm->mmap_sem);
@@ -26,16 +26,21 @@
 
 struct vsie_page {
 	struct kvm_s390_sie_block scb_s;	/* 0x0000 */
+	/*
+	 * the backup info for machine check. ensure it's at
+	 * the same offset as that in struct sie_page!
+	 */
+	struct mcck_volatile_info mcck_info;	/* 0x0200 */
 	/* the pinned originial scb */
-	struct kvm_s390_sie_block *scb_o;	/* 0x0200 */
+	struct kvm_s390_sie_block *scb_o;	/* 0x0218 */
 	/* the shadow gmap in use by the vsie_page */
-	struct gmap *gmap;			/* 0x0208 */
+	struct gmap *gmap;			/* 0x0220 */
 	/* address of the last reported fault to guest2 */
-	unsigned long fault_addr;		/* 0x0210 */
-	__u8 reserved[0x0700 - 0x0218];		/* 0x0218 */
+	unsigned long fault_addr;		/* 0x0228 */
+	__u8 reserved[0x0700 - 0x0230];		/* 0x0230 */
 	struct kvm_s390_crypto_cb crycb;	/* 0x0700 */
 	__u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE];	/* 0x0800 */
-} __packed;
+};
 
 /* trigger a validity icpt for the given scb */
 static int set_validity_icpt(struct kvm_s390_sie_block *scb,
@@ -801,6 +806,8 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 {
 	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
 	struct kvm_s390_sie_block *scb_o = vsie_page->scb_o;
+	struct mcck_volatile_info *mcck_info;
+	struct sie_page *sie_page;
 	int rc;
 
 	handle_last_fault(vcpu, vsie_page);
@@ -822,6 +829,14 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	local_irq_enable();
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 
+	if (rc == -EINTR) {
+		VCPU_EVENT(vcpu, 3, "%s", "machine check");
+		sie_page = container_of(scb_s, struct sie_page, sie_block);
+		mcck_info = &sie_page->mcck_info;
+		kvm_s390_reinject_machine_check(vcpu, mcck_info);
+		return 0;
+	}
+
 	if (rc > 0)
 		rc = 0; /* we could still have an icpt */
 	else if (rc == -EFAULT)
@@ -48,28 +48,31 @@
 #define KVM_IRQCHIP_NUM_PINS  KVM_IOAPIC_NUM_PINS
 
 /* x86-specific vcpu->requests bit members */
-#define KVM_REQ_MIGRATE_TIMER		8
-#define KVM_REQ_REPORT_TPR_ACCESS	9
-#define KVM_REQ_TRIPLE_FAULT		10
-#define KVM_REQ_MMU_SYNC		11
-#define KVM_REQ_CLOCK_UPDATE		12
-#define KVM_REQ_EVENT			14
-#define KVM_REQ_APF_HALT		15
-#define KVM_REQ_STEAL_UPDATE		16
-#define KVM_REQ_NMI			17
-#define KVM_REQ_PMU			18
-#define KVM_REQ_PMI			19
-#define KVM_REQ_SMI			20
-#define KVM_REQ_MASTERCLOCK_UPDATE	21
-#define KVM_REQ_MCLOCK_INPROGRESS	(22 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_SCAN_IOAPIC		(23 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_GLOBAL_CLOCK_UPDATE	24
-#define KVM_REQ_APIC_PAGE_RELOAD	(25 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_HV_CRASH		26
-#define KVM_REQ_IOAPIC_EOI_EXIT		27
-#define KVM_REQ_HV_RESET		28
-#define KVM_REQ_HV_EXIT			29
-#define KVM_REQ_HV_STIMER		30
+#define KVM_REQ_MIGRATE_TIMER		KVM_ARCH_REQ(0)
+#define KVM_REQ_REPORT_TPR_ACCESS	KVM_ARCH_REQ(1)
+#define KVM_REQ_TRIPLE_FAULT		KVM_ARCH_REQ(2)
+#define KVM_REQ_MMU_SYNC		KVM_ARCH_REQ(3)
+#define KVM_REQ_CLOCK_UPDATE		KVM_ARCH_REQ(4)
+#define KVM_REQ_EVENT			KVM_ARCH_REQ(6)
+#define KVM_REQ_APF_HALT		KVM_ARCH_REQ(7)
+#define KVM_REQ_STEAL_UPDATE		KVM_ARCH_REQ(8)
+#define KVM_REQ_NMI			KVM_ARCH_REQ(9)
+#define KVM_REQ_PMU			KVM_ARCH_REQ(10)
+#define KVM_REQ_PMI			KVM_ARCH_REQ(11)
+#define KVM_REQ_SMI			KVM_ARCH_REQ(12)
+#define KVM_REQ_MASTERCLOCK_UPDATE	KVM_ARCH_REQ(13)
+#define KVM_REQ_MCLOCK_INPROGRESS \
+	KVM_ARCH_REQ_FLAGS(14, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_SCAN_IOAPIC \
+	KVM_ARCH_REQ_FLAGS(15, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_GLOBAL_CLOCK_UPDATE	KVM_ARCH_REQ(16)
+#define KVM_REQ_APIC_PAGE_RELOAD \
+	KVM_ARCH_REQ_FLAGS(17, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_HV_CRASH		KVM_ARCH_REQ(18)
+#define KVM_REQ_IOAPIC_EOI_EXIT		KVM_ARCH_REQ(19)
+#define KVM_REQ_HV_RESET		KVM_ARCH_REQ(20)
+#define KVM_REQ_HV_EXIT			KVM_ARCH_REQ(21)
+#define KVM_REQ_HV_STIMER		KVM_ARCH_REQ(22)
 
 #define CR0_RESERVED_BITS \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
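The KVM_ARCH_REQ()/KVM_ARCH_REQ_FLAGS() helpers used above come from the generic VCPU request rework in this pull. Roughly, an arch request is just a request number offset past the generic request bits, optionally OR'ed with behaviour flags; the sketch below is simplified, the flag values are illustrative, and the exact definitions live in include/linux/kvm_host.h.

/* Sketch only; simplified from the generic kvm_host.h definitions. */
#define KVM_REQUEST_ARCH_BASE	8		/* first arch-specific request number */
#define KVM_REQUEST_NO_WAKEUP	(1UL << 8)	/* example flag bits, values illustrative */
#define KVM_REQUEST_WAIT	(1UL << 9)

#define KVM_ARCH_REQ_FLAGS(nr, flags)	(((nr) + KVM_REQUEST_ARCH_BASE) | (flags))
#define KVM_ARCH_REQ(nr)		KVM_ARCH_REQ_FLAGS(nr, 0)

/* e.g. KVM_REQ_MIGRATE_TIMER == KVM_ARCH_REQ(0) still resolves to bit 8, as before */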
@@ -254,7 +257,8 @@ union kvm_mmu_page_role {
 		unsigned cr0_wp:1;
 		unsigned smep_andnot_wp:1;
 		unsigned smap_andnot_wp:1;
-		unsigned :8;
+		unsigned ad_disabled:1;
+		unsigned :7;
 
 		/*
 		 * This is left at the top of the word so that
@@ -426,6 +426,8 @@
 #define MSR_IA32_TSC_ADJUST		0x0000003b
 #define MSR_IA32_BNDCFGS		0x00000d90
 
+#define MSR_IA32_BNDCFGS_RSVD		0x00000ffc
+
 #define MSR_IA32_XSS			0x00000da0
 
 #define FEATURE_CONTROL_LOCKED		(1<<0)
@@ -144,6 +144,14 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
 	return best && (best->ebx & bit(X86_FEATURE_RTM));
 }
 
+static inline bool guest_cpuid_has_mpx(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	return best && (best->ebx & bit(X86_FEATURE_MPX));
+}
+
 static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
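guest_cpuid_has_mpx() pairs with the "allow host to access guest MSR_IA32_BNDCFGS" change in this pull. A typical gate in an MSR read path would only expose BNDCFGS when both the host and the guest CPUID advertise MPX; the function below is a simplified sketch with an invented name, not the actual vmx.c hunk from this series.

/* Illustrative sketch, not the actual vmx.c change. */
static int sketch_get_bndcfgs(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
	if (!kvm_mpx_supported() ||
	    (!msr_info->host_initiated && !guest_cpuid_has_mpx(vcpu)))
		return 1;			/* caller injects #GP into the guest */
	msr_info->data = vmcs_read64(GUEST_BNDCFGS);
	return 0;
}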
@@ -900,7 +900,7 @@ static __always_inline int do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt,
 	if (rc != X86EMUL_CONTINUE)					\
 		goto done;						\
 	ctxt->_eip += sizeof(_type);					\
-	_x = *(_type __aligned(1) *) ctxt->fetch.ptr;			\
+	memcpy(&_x, ctxt->fetch.ptr, sizeof(_type));			\
 	ctxt->fetch.ptr += sizeof(_type);				\
 	_x;								\
 })
@@ -3941,6 +3941,25 @@ static int check_fxsr(struct x86_emulate_ctxt *ctxt)
 	return X86EMUL_CONTINUE;
 }
 
+/*
+ * Hardware doesn't save and restore XMM 0-7 without CR4.OSFXSR, but does save
+ * and restore MXCSR.
+ */
+static size_t __fxstate_size(int nregs)
+{
+	return offsetof(struct fxregs_state, xmm_space[0]) + nregs * 16;
+}
+
+static inline size_t fxstate_size(struct x86_emulate_ctxt *ctxt)
+{
+	bool cr4_osfxsr;
+	if (ctxt->mode == X86EMUL_MODE_PROT64)
+		return __fxstate_size(16);
+
+	cr4_osfxsr = ctxt->ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR;
+	return __fxstate_size(cr4_osfxsr ? 8 : 0);
+}
+
 /*
  * FXSAVE and FXRSTOR have 4 different formats depending on execution mode,
  * 1) 16 bit mode
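As a rough sanity check of the sizes produced by fxstate_size(): assuming the standard FXSAVE layout, where the XMM area begins 160 bytes into struct fxregs_state (32-byte header plus 128 bytes of x87 register image), the three cases work out as below. Illustrative arithmetic only, not from the patch.

/* Sketch: sizes implied by __fxstate_size() under the assumed 160-byte offset. */
static const size_t fx_full   = 160 + 16 * 16;	/* 416 bytes: 64-bit mode, XMM0-15   */
static const size_t fx_compat = 160 +  8 * 16;	/* 288 bytes: 32-bit with CR4.OSFXSR */
static const size_t fx_legacy = 160;		/* 160 bytes: CR4.OSFXSR clear       */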
@@ -3962,7 +3981,6 @@ static int check_fxsr(struct x86_emulate_ctxt *ctxt)
 static int em_fxsave(struct x86_emulate_ctxt *ctxt)
 {
 	struct fxregs_state fx_state;
-	size_t size;
 	int rc;
 
 	rc = check_fxsr(ctxt);
@@ -3978,68 +3996,42 @@ static int em_fxsave(struct x86_emulate_ctxt *ctxt)
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR)
-		size = offsetof(struct fxregs_state, xmm_space[8 * 16/4]);
-	else
-		size = offsetof(struct fxregs_state, xmm_space[0]);
-
-	return segmented_write_std(ctxt, ctxt->memop.addr.mem, &fx_state, size);
-}
-
-static int fxrstor_fixup(struct x86_emulate_ctxt *ctxt,
-			 struct fxregs_state *new)
-{
-	int rc = X86EMUL_CONTINUE;
-	struct fxregs_state old;
-
-	rc = asm_safe("fxsave %[fx]", , [fx] "+m"(old));
-	if (rc != X86EMUL_CONTINUE)
-		return rc;
-
-	/*
-	 * 64 bit host will restore XMM 8-15, which is not correct on non-64
-	 * bit guests. Load the current values in order to preserve 64 bit
-	 * XMMs after fxrstor.
-	 */
-#ifdef CONFIG_X86_64
-	/* XXX: accessing XMM 8-15 very awkwardly */
-	memcpy(&new->xmm_space[8 * 16/4], &old.xmm_space[8 * 16/4], 8 * 16);
-#endif
-
-	/*
-	 * Hardware doesn't save and restore XMM 0-7 without CR4.OSFXSR, but
-	 * does save and restore MXCSR.
-	 */
-	if (!(ctxt->ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))
-		memcpy(new->xmm_space, old.xmm_space, 8 * 16);
-
-	return rc;
+	return segmented_write_std(ctxt, ctxt->memop.addr.mem, &fx_state,
+				   fxstate_size(ctxt));
 }
 
 static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
 {
 	struct fxregs_state fx_state;
 	int rc;
+	size_t size;
 
 	rc = check_fxsr(ctxt);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, &fx_state, 512);
-	if (rc != X86EMUL_CONTINUE)
-		return rc;
-
-	if (fx_state.mxcsr >> 16)
-		return emulate_gp(ctxt, 0);
-
 	ctxt->ops->get_fpu(ctxt);
 
-	if (ctxt->mode < X86EMUL_MODE_PROT64)
-		rc = fxrstor_fixup(ctxt, &fx_state);
+	size = fxstate_size(ctxt);
+	if (size < __fxstate_size(16)) {
+		rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state));
+		if (rc != X86EMUL_CONTINUE)
+			goto out;
+	}
+
+	rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, &fx_state, size);
+	if (rc != X86EMUL_CONTINUE)
+		goto out;
+
+	if (fx_state.mxcsr >> 16) {
+		rc = emulate_gp(ctxt, 0);
+		goto out;
+	}
 
 	if (rc == X86EMUL_CONTINUE)
 		rc = asm_safe("fxrstor %[fx]", : [fx] "m"(fx_state));
 
+out:
 	ctxt->ops->put_fpu(ctxt);
 
 	return rc;
@@ -1495,6 +1495,7 @@ EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
 
 static void cancel_hv_timer(struct kvm_lapic *apic)
 {
+	WARN_ON(!apic->lapic_timer.hv_timer_in_use);
 	preempt_disable();
 	kvm_x86_ops->cancel_hv_timer(apic->vcpu);
 	apic->lapic_timer.hv_timer_in_use = false;
@@ -1503,25 +1504,56 @@ static void cancel_hv_timer(struct kvm_lapic *apic)
 
 static bool start_hv_timer(struct kvm_lapic *apic)
 {
-	u64 tscdeadline = apic->lapic_timer.tscdeadline;
-
-	if ((atomic_read(&apic->lapic_timer.pending) &&
-		!apic_lvtt_period(apic)) ||
-		kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
-		if (apic->lapic_timer.hv_timer_in_use)
-			cancel_hv_timer(apic);
-	} else {
-		apic->lapic_timer.hv_timer_in_use = true;
-		hrtimer_cancel(&apic->lapic_timer.timer);
-
-		/* In case the sw timer triggered in the window */
-		if (atomic_read(&apic->lapic_timer.pending) &&
-			!apic_lvtt_period(apic))
-			cancel_hv_timer(apic);
+	struct kvm_timer *ktimer = &apic->lapic_timer;
+	int r;
+
+	if (!kvm_x86_ops->set_hv_timer)
+		return false;
+
+	if (!apic_lvtt_period(apic) && atomic_read(&ktimer->pending))
+		return false;
+
+	r = kvm_x86_ops->set_hv_timer(apic->vcpu, ktimer->tscdeadline);
+	if (r < 0)
+		return false;
+
+	ktimer->hv_timer_in_use = true;
+	hrtimer_cancel(&ktimer->timer);
+
+	/*
+	 * Also recheck ktimer->pending, in case the sw timer triggered in
+	 * the window. For periodic timer, leave the hv timer running for
+	 * simplicity, and the deadline will be recomputed on the next vmexit.
+	 */
+	if (!apic_lvtt_period(apic) && (r || atomic_read(&ktimer->pending))) {
+		if (r)
+			apic_timer_expired(apic);
+		return false;
 	}
-	trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
-			apic->lapic_timer.hv_timer_in_use);
-	return apic->lapic_timer.hv_timer_in_use;
+
+	trace_kvm_hv_timer_state(apic->vcpu->vcpu_id, true);
+	return true;
+}
+
+static void start_sw_timer(struct kvm_lapic *apic)
+{
+	struct kvm_timer *ktimer = &apic->lapic_timer;
+	if (apic->lapic_timer.hv_timer_in_use)
+		cancel_hv_timer(apic);
+	if (!apic_lvtt_period(apic) && atomic_read(&ktimer->pending))
+		return;
+
+	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
+		start_sw_period(apic);
+	else if (apic_lvtt_tscdeadline(apic))
+		start_sw_tscdeadline(apic);
+	trace_kvm_hv_timer_state(apic->vcpu->vcpu_id, false);
+}
+
+static void restart_apic_timer(struct kvm_lapic *apic)
+{
+	if (!start_hv_timer(apic))
+		start_sw_timer(apic);
 }
 
 void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
@@ -1535,19 +1567,14 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
 
 	if (apic_lvtt_period(apic) && apic->lapic_timer.period) {
 		advance_periodic_target_expiration(apic);
-		if (!start_hv_timer(apic))
-			start_sw_period(apic);
+		restart_apic_timer(apic);
 	}
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
 
 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
 {
-	struct kvm_lapic *apic = vcpu->arch.apic;
-
-	WARN_ON(apic->lapic_timer.hv_timer_in_use);
-
-	start_hv_timer(apic);
+	restart_apic_timer(vcpu->arch.apic);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
@@ -1556,33 +1583,28 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
 	/* Possibly the TSC deadline timer is not enabled yet */
-	if (!apic->lapic_timer.hv_timer_in_use)
-		return;
-
-	cancel_hv_timer(apic);
-
-	if (atomic_read(&apic->lapic_timer.pending))
-		return;
-
-	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-		start_sw_period(apic);
-	else if (apic_lvtt_tscdeadline(apic))
-		start_sw_tscdeadline(apic);
+	if (apic->lapic_timer.hv_timer_in_use)
+		start_sw_timer(apic);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_sw_timer);
 
+void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	WARN_ON(!apic->lapic_timer.hv_timer_in_use);
+	restart_apic_timer(apic);
+}
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
 	atomic_set(&apic->lapic_timer.pending, 0);
 
-	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
-		if (set_target_expiration(apic) &&
-		    !(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
-			start_sw_period(apic);
-	} else if (apic_lvtt_tscdeadline(apic)) {
-		if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
-			start_sw_tscdeadline(apic);
-	}
+	if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
+	    && !set_target_expiration(apic))
+		return;
+
+	restart_apic_timer(apic);
 }
 
 static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
@@ -1813,16 +1835,6 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu)
  * LAPIC interface
  *----------------------------------------------------------------------
  */
-u64 kvm_get_lapic_target_expiration_tsc(struct kvm_vcpu *vcpu)
-{
-	struct kvm_lapic *apic = vcpu->arch.apic;
-
-	if (!lapic_in_kernel(vcpu))
-		return 0;
-
-	return apic->lapic_timer.tscdeadline;
-}
-
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -87,7 +87,6 @@ int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
 int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 
-u64 kvm_get_lapic_target_expiration_tsc(struct kvm_vcpu *vcpu);
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
@@ -216,4 +215,5 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu);
 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu);
 void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu);
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu);
+void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu);
 #endif
@@ -183,13 +183,13 @@ static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_mask;
+static u64 __read_mostly shadow_mmio_value;
 static u64 __read_mostly shadow_present_mask;
 
 /*
- * The mask/value to distinguish a PTE that has been marked not-present for
- * access tracking purposes.
- * The mask would be either 0 if access tracking is disabled, or
- * SPTE_SPECIAL_MASK|VMX_EPT_RWX_MASK if access tracking is enabled.
+ * SPTEs used by MMUs without A/D bits are marked with shadow_acc_track_value.
+ * Non-present SPTEs with shadow_acc_track_value set are in place for access
+ * tracking.
  */
 static u64 __read_mostly shadow_acc_track_mask;
 static const u64 shadow_acc_track_value = SPTE_SPECIAL_MASK;

@@ -207,16 +207,40 @@ static const u64 shadow_acc_track_saved_bits_shift = PT64_SECOND_AVAIL_BITS_SHIF
 static void mmu_spte_set(u64 *sptep, u64 spte);
 static void mmu_free_roots(struct kvm_vcpu *vcpu);
 
-void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
+void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
 {
+	BUG_ON((mmio_mask & mmio_value) != mmio_value);
+	shadow_mmio_value = mmio_value | SPTE_SPECIAL_MASK;
 	shadow_mmio_mask = mmio_mask | SPTE_SPECIAL_MASK;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
+static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
+{
+	return sp->role.ad_disabled;
+}
+
+static inline bool spte_ad_enabled(u64 spte)
+{
+	MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
+	return !(spte & shadow_acc_track_value);
+}
+
+static inline u64 spte_shadow_accessed_mask(u64 spte)
+{
+	MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
+	return spte_ad_enabled(spte) ? shadow_accessed_mask : 0;
+}
+
+static inline u64 spte_shadow_dirty_mask(u64 spte)
+{
+	MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
+	return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
+}
+
 static inline bool is_access_track_spte(u64 spte)
 {
-	/* Always false if shadow_acc_track_mask is zero. */
-	return (spte & shadow_acc_track_mask) == shadow_acc_track_value;
+	return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
 }
 
 /*

@@ -270,7 +294,7 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
 	u64 mask = generation_mmio_spte_mask(gen);
 
 	access &= ACC_WRITE_MASK | ACC_USER_MASK;
-	mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
+	mask |= shadow_mmio_value | access | gfn << PAGE_SHIFT;
 
 	trace_mark_mmio_spte(sptep, gfn, access, gen);
 	mmu_spte_set(sptep, mask);

@@ -278,7 +302,7 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
 
 static bool is_mmio_spte(u64 spte)
 {
-	return (spte & shadow_mmio_mask) == shadow_mmio_mask;
+	return (spte & shadow_mmio_mask) == shadow_mmio_value;
 }
 
 static gfn_t get_mmio_spte_gfn(u64 spte)

@@ -315,12 +339,20 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 	return likely(kvm_gen == spte_gen);
 }
 
+/*
+ * Sets the shadow PTE masks used by the MMU.
+ *
+ * Assumptions:
+ *  - Setting either @accessed_mask or @dirty_mask requires setting both
+ *  - At least one of @accessed_mask or @acc_track_mask must be set
+ */
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
 		u64 acc_track_mask)
 {
-	if (acc_track_mask != 0)
-		acc_track_mask |= SPTE_SPECIAL_MASK;
+	BUG_ON(!dirty_mask != !accessed_mask);
+	BUG_ON(!accessed_mask && !acc_track_mask);
+	BUG_ON(acc_track_mask & shadow_acc_track_value);
 
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;

@@ -329,7 +361,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 	shadow_x_mask = x_mask;
 	shadow_present_mask = p_mask;
 	shadow_acc_track_mask = acc_track_mask;
-	WARN_ON(shadow_accessed_mask != 0 && shadow_acc_track_mask != 0);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);

@@ -549,7 +580,7 @@ static bool spte_has_volatile_bits(u64 spte)
 	    is_access_track_spte(spte))
 		return true;
 
-	if (shadow_accessed_mask) {
+	if (spte_ad_enabled(spte)) {
 		if ((spte & shadow_accessed_mask) == 0 ||
 		    (is_writable_pte(spte) && (spte & shadow_dirty_mask) == 0))
 			return true;

@@ -560,14 +591,17 @@ static bool spte_has_volatile_bits(u64 spte)
 
 static bool is_accessed_spte(u64 spte)
 {
-	return shadow_accessed_mask ? spte & shadow_accessed_mask
-				    : !is_access_track_spte(spte);
+	u64 accessed_mask = spte_shadow_accessed_mask(spte);
+
+	return accessed_mask ? spte & accessed_mask
+			     : !is_access_track_spte(spte);
 }
 
 static bool is_dirty_spte(u64 spte)
 {
-	return shadow_dirty_mask ? spte & shadow_dirty_mask
-				 : spte & PT_WRITABLE_MASK;
+	u64 dirty_mask = spte_shadow_dirty_mask(spte);
+
+	return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
 }
 
 /* Rules for using mmu_spte_set:

@@ -707,10 +741,10 @@ static u64 mmu_spte_get_lockless(u64 *sptep)
 
 static u64 mark_spte_for_access_track(u64 spte)
 {
-	if (shadow_accessed_mask != 0)
+	if (spte_ad_enabled(spte))
 		return spte & ~shadow_accessed_mask;
 
-	if (shadow_acc_track_mask == 0 || is_access_track_spte(spte))
+	if (is_access_track_spte(spte))
 		return spte;
 
 	/*

@@ -729,7 +763,6 @@ static u64 mark_spte_for_access_track(u64 spte)
 	spte |= (spte & shadow_acc_track_saved_bits_mask) <<
 		shadow_acc_track_saved_bits_shift;
 	spte &= ~shadow_acc_track_mask;
-	spte |= shadow_acc_track_value;
 
 	return spte;
 }

@@ -741,6 +774,7 @@ static u64 restore_acc_track_spte(u64 spte)
 	u64 saved_bits = (spte >> shadow_acc_track_saved_bits_shift)
 			 & shadow_acc_track_saved_bits_mask;
 
+	WARN_ON_ONCE(spte_ad_enabled(spte));
 	WARN_ON_ONCE(!is_access_track_spte(spte));
 
 	new_spte &= ~shadow_acc_track_mask;

@@ -759,7 +793,7 @@ static bool mmu_spte_age(u64 *sptep)
 	if (!is_accessed_spte(spte))
 		return false;
 
-	if (shadow_accessed_mask) {
+	if (spte_ad_enabled(spte)) {
 		clear_bit((ffs(shadow_accessed_mask) - 1),
 			  (unsigned long *)sptep);
 	} else {

@@ -1390,6 +1424,22 @@ static bool spte_clear_dirty(u64 *sptep)
 	return mmu_spte_update(sptep, spte);
 }
 
+static bool wrprot_ad_disabled_spte(u64 *sptep)
+{
+	bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
+					       (unsigned long *)sptep);
+	if (was_writable)
+		kvm_set_pfn_dirty(spte_to_pfn(*sptep));
+
+	return was_writable;
+}
+
+/*
+ * Gets the GFN ready for another round of dirty logging by clearing the
+ *	- D bit on ad-enabled SPTEs, and
+ *	- W bit on ad-disabled SPTEs.
+ * Returns true iff any D or W bits were cleared.
+ */
 static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;

@@ -1397,7 +1447,10 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 	bool flush = false;
 
 	for_each_rmap_spte(rmap_head, &iter, sptep)
-		flush |= spte_clear_dirty(sptep);
+		if (spte_ad_enabled(*sptep))
+			flush |= spte_clear_dirty(sptep);
+		else
+			flush |= wrprot_ad_disabled_spte(sptep);
 
 	return flush;
 }

@@ -1420,7 +1473,8 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 	bool flush = false;
 
 	for_each_rmap_spte(rmap_head, &iter, sptep)
-		flush |= spte_set_dirty(sptep);
+		if (spte_ad_enabled(*sptep))
+			flush |= spte_set_dirty(sptep);
 
 	return flush;
 }

@@ -1452,7 +1506,8 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 }
 
 /**
- * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages
+ * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages, or write
+ *	protect the page if the D-bit isn't supported.
 * @kvm: kvm instance
 * @slot: slot to clear D-bit
 * @gfn_offset: start of the BITS_PER_LONG pages we care about

@@ -1766,18 +1821,9 @@ static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	u64 *sptep;
 	struct rmap_iterator iter;
 
-	/*
-	 * If there's no access bit in the secondary pte set by the hardware and
-	 * fast access tracking is also not enabled, it's up to gup-fast/gup to
-	 * set the access bit in the primary pte or in the page structure.
-	 */
-	if (!shadow_accessed_mask && !shadow_acc_track_mask)
-		goto out;
-
 	for_each_rmap_spte(rmap_head, &iter, sptep)
 		if (is_accessed_spte(*sptep))
 			return 1;
-out:
 	return 0;
 }

@@ -1798,18 +1844,6 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 {
-	/*
-	 * In case of absence of EPT Access and Dirty Bits supports,
-	 * emulate the accessed bit for EPT, by checking if this page has
-	 * an EPT mapping, and clearing it if it does. On the next access,
-	 * a new EPT mapping will be established.
-	 * This has some overhead, but not as much as the cost of swapping
-	 * out actively used pages or breaking up actively used hugepages.
-	 */
-	if (!shadow_accessed_mask && !shadow_acc_track_mask)
-		return kvm_handle_hva_range(kvm, start, end, 0,
-					    kvm_unmap_rmapp);
-
 	return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp);
 }

@@ -2398,7 +2432,12 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
 	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
 
 	spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
-	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+	       shadow_user_mask | shadow_x_mask;
+
+	if (sp_ad_disabled(sp))
+		spte |= shadow_acc_track_value;
+	else
+		spte |= shadow_accessed_mask;
 
 	mmu_spte_set(sptep, spte);

@@ -2666,10 +2705,15 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 {
 	u64 spte = 0;
 	int ret = 0;
+	struct kvm_mmu_page *sp;
 
 	if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
 		return 0;
 
+	sp = page_header(__pa(sptep));
+	if (sp_ad_disabled(sp))
+		spte |= shadow_acc_track_value;
+
 	/*
 	 * For the EPT case, shadow_present_mask is 0 if hardware
 	 * supports exec-only page table entries. In that case,

@@ -2678,7 +2722,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	 */
 	spte |= shadow_present_mask;
 	if (!speculative)
-		spte |= shadow_accessed_mask;
+		spte |= spte_shadow_accessed_mask(spte);
 
 	if (pte_access & ACC_EXEC_MASK)
if (pte_access & ACC_EXEC_MASK)
|
||||||
spte |= shadow_x_mask;
|
spte |= shadow_x_mask;
|
||||||
|
@ -2735,7 +2779,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
||||||
|
|
||||||
if (pte_access & ACC_WRITE_MASK) {
|
if (pte_access & ACC_WRITE_MASK) {
|
||||||
kvm_vcpu_mark_page_dirty(vcpu, gfn);
|
kvm_vcpu_mark_page_dirty(vcpu, gfn);
|
||||||
spte |= shadow_dirty_mask;
|
spte |= spte_shadow_dirty_mask(spte);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (speculative)
|
if (speculative)
|
||||||
|
@ -2877,16 +2921,16 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
|
||||||
{
|
{
|
||||||
struct kvm_mmu_page *sp;
|
struct kvm_mmu_page *sp;
|
||||||
|
|
||||||
|
sp = page_header(__pa(sptep));
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Since it's no accessed bit on EPT, it's no way to
|
* Without accessed bits, there's no way to distinguish between
|
||||||
* distinguish between actually accessed translations
|
* actually accessed translations and prefetched, so disable pte
|
||||||
* and prefetched, so disable pte prefetch if EPT is
|
* prefetch if accessed bits aren't available.
|
||||||
* enabled.
|
|
||||||
*/
|
*/
|
||||||
if (!shadow_accessed_mask)
|
if (sp_ad_disabled(sp))
|
||||||
return;
|
return;
|
||||||
|
|
||||||
sp = page_header(__pa(sptep));
|
|
||||||
if (sp->role.level > PT_PAGE_TABLE_LEVEL)
|
if (sp->role.level > PT_PAGE_TABLE_LEVEL)
|
||||||
return;
|
return;
|
||||||
|
|
||||||
|
@ -4290,6 +4334,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
|
||||||
|
|
||||||
context->base_role.word = 0;
|
context->base_role.word = 0;
|
||||||
context->base_role.smm = is_smm(vcpu);
|
context->base_role.smm = is_smm(vcpu);
|
||||||
|
context->base_role.ad_disabled = (shadow_accessed_mask == 0);
|
||||||
context->page_fault = tdp_page_fault;
|
context->page_fault = tdp_page_fault;
|
||||||
context->sync_page = nonpaging_sync_page;
|
context->sync_page = nonpaging_sync_page;
|
||||||
context->invlpg = nonpaging_invlpg;
|
context->invlpg = nonpaging_invlpg;
|
||||||
|
@ -4377,6 +4422,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
|
||||||
context->root_level = context->shadow_root_level;
|
context->root_level = context->shadow_root_level;
|
||||||
context->root_hpa = INVALID_PAGE;
|
context->root_hpa = INVALID_PAGE;
|
||||||
context->direct_map = false;
|
context->direct_map = false;
|
||||||
|
context->base_role.ad_disabled = !accessed_dirty;
|
||||||
|
|
||||||
update_permission_bitmask(vcpu, context, true);
|
update_permission_bitmask(vcpu, context, true);
|
||||||
update_pkru_bitmask(vcpu, context, true);
|
update_pkru_bitmask(vcpu, context, true);
|
||||||
|
@ -4636,6 +4682,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
|
||||||
mask.smep_andnot_wp = 1;
|
mask.smep_andnot_wp = 1;
|
||||||
mask.smap_andnot_wp = 1;
|
mask.smap_andnot_wp = 1;
|
||||||
mask.smm = 1;
|
mask.smm = 1;
|
||||||
|
mask.ad_disabled = 1;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* If we don't have indirect shadow pages, it means no page is
|
* If we don't have indirect shadow pages, it means no page is
|
||||||
|
|
|
@@ -51,7 +51,7 @@ static inline u64 rsvd_bits(int s, int e)
 	return ((1ULL << (e - s + 1)) - 1) << s;
 }
 
-void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
+void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value);
 
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
@@ -30,8 +30,9 @@
 									\
 	role.word = __entry->role;					\
 									\
-	trace_seq_printf(p, "sp gen %lx gfn %llx %u%s q%u%s %s%s"	\
-			 " %snxe root %u %s%c", __entry->mmu_valid_gen,	\
+	trace_seq_printf(p, "sp gen %lx gfn %llx l%u%s q%u%s %s%s"	\
+			 " %snxe %sad root %u %s%c",			\
+			 __entry->mmu_valid_gen,			\
 			 __entry->gfn, role.level,			\
 			 role.cr4_pae ? " pae" : "",			\
 			 role.quadrant,					\
@@ -39,6 +40,7 @@
 			 access_str[role.access],			\
 			 role.invalid ? " invalid" : "",		\
 			 role.nxe ? "" : "!",				\
+			 role.ad_disabled ? "!" : "",			\
 			 __entry->root_count,				\
 			 __entry->unsync ? "unsync" : "sync", 0);	\
 	saved_ptr;							\
@@ -190,6 +190,7 @@ struct vcpu_svm {
 	struct nested_state nested;
 
 	bool nmi_singlestep;
+	u64 nmi_singlestep_guest_rflags;
 
 	unsigned int3_injected;
 	unsigned long int3_rip;
@@ -964,6 +965,18 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
 	set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
 }
 
+static void disable_nmi_singlestep(struct vcpu_svm *svm)
+{
+	svm->nmi_singlestep = false;
+	if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
+		/* Clear our flags if they were not set by the guest */
+		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
+			svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
+		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
+			svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
+	}
+}
+
 /* Note:
  * This hash table is used to map VM_ID to a struct kvm_arch,
  * when handling AMD IOMMU GALOG notification to schedule in
@@ -1713,11 +1726,24 @@ static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
 
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
-	return to_svm(vcpu)->vmcb->save.rflags;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned long rflags = svm->vmcb->save.rflags;
+
+	if (svm->nmi_singlestep) {
+		/* Hide our flags if they were not set by the guest */
+		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
+			rflags &= ~X86_EFLAGS_TF;
+		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
+			rflags &= ~X86_EFLAGS_RF;
+	}
+	return rflags;
 }
 
 static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
+	if (to_svm(vcpu)->nmi_singlestep)
+		rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+
 	/*
 	 * Any change of EFLAGS.VM is accompanied by a reload of SS
 	 * (caused by either a task switch or an inter-privilege IRET),
@@ -2112,10 +2138,7 @@ static int db_interception(struct vcpu_svm *svm)
 	}
 
 	if (svm->nmi_singlestep) {
-		svm->nmi_singlestep = false;
-		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
-			svm->vmcb->save.rflags &=
-				~(X86_EFLAGS_TF | X86_EFLAGS_RF);
+		disable_nmi_singlestep(svm);
 	}
 
 	if (svm->vcpu.guest_debug &
@@ -2370,8 +2393,8 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
 
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
-	if (!(svm->vcpu.arch.efer & EFER_SVME)
-	    || !is_paging(&svm->vcpu)) {
+	if (!(svm->vcpu.arch.efer & EFER_SVME) ||
+	    !is_paging(&svm->vcpu)) {
 		kvm_queue_exception(&svm->vcpu, UD_VECTOR);
 		return 1;
 	}
@@ -2381,7 +2404,7 @@ static int nested_svm_check_permissions(struct vcpu_svm *svm)
 		return 1;
 	}
 
 	return 0;
 }
 
 static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
@@ -2534,6 +2557,31 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
 }
 
+/* DB exceptions for our internal use must not cause vmexit */
+static int nested_svm_intercept_db(struct vcpu_svm *svm)
+{
+	unsigned long dr6;
+
+	/* if we're not singlestepping, it's not ours */
+	if (!svm->nmi_singlestep)
+		return NESTED_EXIT_DONE;
+
+	/* if it's not a singlestep exception, it's not ours */
+	if (kvm_get_dr(&svm->vcpu, 6, &dr6))
+		return NESTED_EXIT_DONE;
+	if (!(dr6 & DR6_BS))
+		return NESTED_EXIT_DONE;
+
+	/* if the guest is singlestepping, it should get the vmexit */
+	if (svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF) {
+		disable_nmi_singlestep(svm);
+		return NESTED_EXIT_DONE;
+	}
+
+	/* it's ours, the nested hypervisor must not see this one */
+	return NESTED_EXIT_HOST;
+}
+
 static int nested_svm_exit_special(struct vcpu_svm *svm)
 {
 	u32 exit_code = svm->vmcb->control.exit_code;
@@ -2589,8 +2637,12 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 	}
 	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
 		u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
-		if (svm->nested.intercept_exceptions & excp_bits)
-			vmexit = NESTED_EXIT_DONE;
+		if (svm->nested.intercept_exceptions & excp_bits) {
+			if (exit_code == SVM_EXIT_EXCP_BASE + DB_VECTOR)
+				vmexit = nested_svm_intercept_db(svm);
+			else
+				vmexit = NESTED_EXIT_DONE;
+		}
 		/* async page fault always cause vmexit */
 		else if ((exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) &&
 			 svm->apf_reason != 0)
@@ -4627,10 +4679,17 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
 	    == HF_NMI_MASK)
 		return; /* IRET will cause a vm exit */
 
+	if ((svm->vcpu.arch.hflags & HF_GIF_MASK) == 0)
+		return; /* STGI will cause a vm exit */
+
+	if (svm->nested.exit_required)
+		return; /* we're not going to run the guest yet */
+
 	/*
 	 * Something prevents NMI from been injected. Single step over possible
 	 * problem (IRET or exception injection or interrupt shadow)
 	 */
+	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
 	svm->nmi_singlestep = true;
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
 }
@@ -4771,6 +4830,22 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	if (unlikely(svm->nested.exit_required))
 		return;
 
+	/*
+	 * Disable singlestep if we're injecting an interrupt/exception.
+	 * We don't want our modified rflags to be pushed on the stack where
+	 * we might not be able to easily reset them if we disabled NMI
+	 * singlestep later.
+	 */
+	if (svm->nmi_singlestep && svm->vmcb->control.event_inj) {
+		/*
+		 * Event injection happens before external interrupts cause a
+		 * vmexit and interrupts are disabled here, so smp_send_reschedule
+		 * is enough to force an immediate vmexit.
+		 */
+		disable_nmi_singlestep(svm);
+		smp_send_reschedule(vcpu->cpu);
+	}
+
 	pre_svm_run(svm);
 
 	sync_lapic_to_cr8(vcpu);
@@ -913,8 +913,9 @@ static void nested_release_page_clean(struct page *page)
 	kvm_release_page_clean(page);
 }
 
+static bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu);
 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu);
-static u64 construct_eptp(unsigned long root_hpa);
+static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa);
 static bool vmx_xsaves_supported(void);
 static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
 static void vmx_set_segment(struct kvm_vcpu *vcpu,
@@ -2772,7 +2773,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		if (enable_ept_ad_bits) {
 			vmx->nested.nested_vmx_secondary_ctls_high |=
 				SECONDARY_EXEC_ENABLE_PML;
 			vmx->nested.nested_vmx_ept_caps |= VMX_EPT_AD_BIT;
 		}
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
@@ -3198,7 +3199,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP);
 		break;
 	case MSR_IA32_BNDCFGS:
-		if (!kvm_mpx_supported())
+		if (!kvm_mpx_supported() ||
+		    (!msr_info->host_initiated && !guest_cpuid_has_mpx(vcpu)))
 			return 1;
 		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
 		break;
@@ -3280,7 +3282,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmcs_writel(GUEST_SYSENTER_ESP, data);
 		break;
 	case MSR_IA32_BNDCFGS:
-		if (!kvm_mpx_supported())
+		if (!kvm_mpx_supported() ||
+		    (!msr_info->host_initiated && !guest_cpuid_has_mpx(vcpu)))
+			return 1;
+		if (is_noncanonical_address(data & PAGE_MASK) ||
+		    (data & MSR_IA32_BNDCFGS_RSVD))
 			return 1;
 		vmcs_write64(GUEST_BNDCFGS, data);
 		break;
@@ -4013,7 +4019,7 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
 	if (enable_ept) {
 		if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 			return;
-		ept_sync_context(construct_eptp(vcpu->arch.mmu.root_hpa));
+		ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
 	} else {
 		vpid_sync_context(vpid);
 	}
@@ -4188,14 +4194,15 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	vmx->emulation_required = emulation_required(vcpu);
 }
 
-static u64 construct_eptp(unsigned long root_hpa)
+static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
 {
 	u64 eptp;
 
 	/* TODO write the value reading from MSR */
 	eptp = VMX_EPT_DEFAULT_MT |
 		VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT;
-	if (enable_ept_ad_bits)
+	if (enable_ept_ad_bits &&
+	    (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu)))
 		eptp |= VMX_EPT_AD_ENABLE_BIT;
 	eptp |= (root_hpa & PAGE_MASK);
 
@@ -4209,7 +4216,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 	guest_cr3 = cr3;
 	if (enable_ept) {
-		eptp = construct_eptp(cr3);
+		eptp = construct_eptp(vcpu, cr3);
 		vmcs_write64(EPT_POINTER, eptp);
 		if (is_paging(vcpu) || is_guest_mode(vcpu))
 			guest_cr3 = kvm_read_cr3(vcpu);
@@ -5170,7 +5177,8 @@ static void ept_set_mmio_spte_mask(void)
 	 * EPT Misconfigurations can be generated if the value of bits 2:0
 	 * of an EPT paging-structure entry is 110b (write/execute).
 	 */
-	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE);
+	kvm_mmu_set_mmio_spte_mask(VMX_EPT_RWX_MASK,
+				   VMX_EPT_MISCONFIG_WX_VALUE);
 }
 
 #define VMX_XSS_EXIT_BITMAP 0
@@ -6220,17 +6228,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
-	if (is_guest_mode(vcpu)
-	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
-		/*
-		 * Fix up exit_qualification according to whether guest
-		 * page table accesses are reads or writes.
-		 */
-		u64 eptp = nested_ept_get_cr3(vcpu);
-		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
-			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
-	}
-
 	/*
 	 * EPT violation happened while executing iret from NMI,
 	 * "blocked by NMI" bit has to be set before next VM entry.
@@ -6453,7 +6450,7 @@ void vmx_enable_tdp(void)
 		enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull,
 		0ull, VMX_EPT_EXECUTABLE_MASK,
 		cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK,
-		enable_ept_ad_bits ? 0ull : VMX_EPT_RWX_MASK);
+		VMX_EPT_RWX_MASK);
 
 	ept_set_mmio_spte_mask();
 	kvm_enable_tdp();
@@ -6557,7 +6554,6 @@ static __init int hardware_setup(void)
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
-	vmx_disable_intercept_for_msr(MSR_IA32_BNDCFGS, true);
 
 	memcpy(vmx_msr_bitmap_legacy_x2apic_apicv,
 			vmx_msr_bitmap_legacy, PAGE_SIZE);
@@ -7661,7 +7657,10 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	unsigned long type, types;
 	gva_t gva;
 	struct x86_exception e;
-	int vpid;
+	struct {
+		u64 vpid;
+		u64 gla;
+	} operand;
 
 	if (!(vmx->nested.nested_vmx_secondary_ctls_high &
 	      SECONDARY_EXEC_ENABLE_VPID) ||
@@ -7691,17 +7690,28 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
 			vmx_instruction_info, false, &gva))
 		return 1;
-	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vpid,
-				sizeof(u32), &e)) {
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
+				sizeof(operand), &e)) {
 		kvm_inject_page_fault(vcpu, &e);
 		return 1;
 	}
+	if (operand.vpid >> 16) {
+		nested_vmx_failValid(vcpu,
+			VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		return kvm_skip_emulated_instruction(vcpu);
+	}
 
 	switch (type) {
 	case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
+		if (is_noncanonical_address(operand.gla)) {
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+			return kvm_skip_emulated_instruction(vcpu);
+		}
+		/* fall through */
 	case VMX_VPID_EXTENT_SINGLE_CONTEXT:
 	case VMX_VPID_EXTENT_SINGLE_NON_GLOBAL:
-		if (!vpid) {
+		if (!operand.vpid) {
 			nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 			return kvm_skip_emulated_instruction(vcpu);
@@ -9394,6 +9404,11 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
 
+static bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
+{
+	return nested_ept_get_cr3(vcpu) & VMX_EPT_AD_ENABLE_BIT;
+}
+
 /* Callbacks for nested_ept_init_mmu_context: */
 
 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
@@ -9404,18 +9419,18 @@ static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
 
 static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
 {
-	u64 eptp;
+	bool wants_ad;
 
 	WARN_ON(mmu_is_nested(vcpu));
-	eptp = nested_ept_get_cr3(vcpu);
-	if ((eptp & VMX_EPT_AD_ENABLE_BIT) && !enable_ept_ad_bits)
+	wants_ad = nested_ept_ad_enabled(vcpu);
+	if (wants_ad && !enable_ept_ad_bits)
 		return 1;
 
 	kvm_mmu_unload(vcpu);
 	kvm_init_shadow_ept_mmu(vcpu,
 			to_vmx(vcpu)->nested.nested_vmx_ept_caps &
 			VMX_EPT_EXECUTE_ONLY_BIT,
-			eptp & VMX_EPT_AD_ENABLE_BIT);
+			wants_ad);
 	vcpu->arch.mmu.set_cr3 = vmx_set_cr3;
 	vcpu->arch.mmu.get_cr3 = nested_ept_get_cr3;
 	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
@@ -10728,8 +10743,7 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3);
 	}
 
-	if (nested_cpu_has_ept(vmcs12))
-		vmcs12->guest_linear_address = vmcs_readl(GUEST_LINEAR_ADDRESS);
+	vmcs12->guest_linear_address = vmcs_readl(GUEST_LINEAR_ADDRESS);
 
 	if (nested_cpu_has_vid(vmcs12))
 		vmcs12->guest_intr_status = vmcs_read16(GUEST_INTR_STATUS);
@@ -10754,8 +10768,6 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_sysenter_eip = vmcs_readl(GUEST_SYSENTER_EIP);
 	if (kvm_mpx_supported())
 		vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
-	if (nested_cpu_has_xsaves(vmcs12))
-		vmcs12->xss_exit_bitmap = vmcs_read64(XSS_EXIT_BITMAP);
 }
 
 /*
@@ -11152,7 +11164,8 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
 	vmx->hv_deadline_tsc = tscl + delta_tsc;
 	vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
 			PIN_BASED_VMX_PREEMPTION_TIMER);
-	return 0;
+
+	return delta_tsc == 0;
 }
 
 static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
@@ -2841,10 +2841,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		kvm_vcpu_write_tsc_offset(vcpu, offset);
 		vcpu->arch.tsc_catchup = 1;
 	}
-	if (kvm_lapic_hv_timer_in_use(vcpu) &&
-		kvm_x86_ops->set_hv_timer(vcpu,
-			kvm_get_lapic_target_expiration_tsc(vcpu)))
-		kvm_lapic_switch_to_sw_timer(vcpu);
+
+	if (kvm_lapic_hv_timer_in_use(vcpu))
+		kvm_lapic_restart_hv_timer(vcpu);
+
 	/*
 	 * On a host with synchronized TSC, there is no need to update
 	 * kvmclock on vcpu->cpu migration
@@ -6011,7 +6011,7 @@ static void kvm_set_mmio_spte_mask(void)
 		mask &= ~1ull;
 #endif
 
-	kvm_mmu_set_mmio_spte_mask(mask);
+	kvm_mmu_set_mmio_spte_mask(mask, mask);
 }
 
 #ifdef CONFIG_X86_64
@@ -6733,7 +6733,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	if (vcpu->requests) {
+	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
 			kvm_mmu_unload(vcpu);
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
@@ -6897,7 +6897,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 	}
 
-	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
+	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
 	    || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
@@ -57,9 +57,7 @@ struct arch_timer_cpu {
 
 int kvm_timer_hyp_init(void);
 int kvm_timer_enable(struct kvm_vcpu *vcpu);
-int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
-			 const struct kvm_irq_level *virt_irq,
-			 const struct kvm_irq_level *phys_irq);
+int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu);
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu);
@@ -70,6 +68,10 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu);
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
+int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+
 bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
 void kvm_timer_schedule(struct kvm_vcpu *vcpu);
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
@@ -35,6 +35,7 @@ struct kvm_pmu {
 	int irq_num;
 	struct kvm_pmc pmc[ARMV8_PMU_MAX_COUNTERS];
 	bool ready;
+	bool created;
 	bool irq_level;
 };
 
@@ -63,6 +64,7 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
 int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
+int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu);
 #else
 struct kvm_pmu {
 };
@@ -112,6 +114,10 @@ static inline int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu,
 {
 	return -ENXIO;
 }
+static inline int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
 #endif
 
 #endif
@@ -38,6 +38,10 @@
 #define VGIC_MIN_LPI		8192
 #define KVM_IRQCHIP_NUM_PINS	(1020 - 32)
 
+#define irq_is_ppi(irq) ((irq) >= VGIC_NR_SGIS && (irq) < VGIC_NR_PRIVATE_IRQS)
+#define irq_is_spi(irq) ((irq) >= VGIC_NR_PRIVATE_IRQS && \
+			 (irq) <= VGIC_MAX_SPI)
+
 enum vgic_type {
 	VGIC_V2,		/* Good ol' GICv2 */
 	VGIC_V3,		/* New fancy GICv3 */
@@ -119,6 +123,9 @@ struct vgic_irq {
 	u8 source;			/* GICv2 SGIs only */
 	u8 priority;
 	enum vgic_irq_config config;	/* Level or edge */
+
+	void *owner;			/* Opaque pointer to reserve an interrupt
+					   for in-kernel devices. */
 };
 
 struct vgic_register_region;
@@ -285,6 +292,7 @@ struct vgic_cpu {
 };
 
 extern struct static_key_false vgic_v2_cpuif_trap;
+extern struct static_key_false vgic_v3_cpuif_trap;
 
 int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
 void kvm_vgic_early_init(struct kvm *kvm);
@@ -298,9 +306,7 @@ int kvm_vgic_hyp_init(void);
 void kvm_vgic_init_cpu_hardware(void);
 
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
-			bool level);
-int kvm_vgic_inject_mapped_irq(struct kvm *kvm, int cpuid, unsigned int intid,
-			       bool level);
+			bool level, void *owner);
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq);
 bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq);
@@ -341,4 +347,6 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
 */
 int kvm_vgic_setup_default_irq_routing(struct kvm *kvm);
 
+int kvm_vgic_set_owner(struct kvm_vcpu *vcpu, unsigned int intid, void *owner);
+
 #endif /* __KVM_ARM_VGIC_H */
@@ -405,6 +405,7 @@
 #define ICH_LR_PHYS_ID_SHIFT		32
 #define ICH_LR_PHYS_ID_MASK		(0x3ffULL << ICH_LR_PHYS_ID_SHIFT)
 #define ICH_LR_PRIORITY_SHIFT		48
+#define ICH_LR_PRIORITY_MASK		(0xffULL << ICH_LR_PRIORITY_SHIFT)
 
 /* These are for GICv2 emulation only */
 #define GICH_LR_VIRTUALID		(0x3ffUL << 0)
@@ -416,6 +417,11 @@
 
 #define ICH_HCR_EN			(1 << 0)
 #define ICH_HCR_UIE			(1 << 1)
+#define ICH_HCR_TC			(1 << 10)
+#define ICH_HCR_TALL0			(1 << 11)
+#define ICH_HCR_TALL1			(1 << 12)
+#define ICH_HCR_EOIcount_SHIFT		27
+#define ICH_HCR_EOIcount_MASK		(0x1f << ICH_HCR_EOIcount_SHIFT)
 
 #define ICH_VMCR_ACK_CTL_SHIFT		2
 #define ICH_VMCR_ACK_CTL_MASK		(1 << ICH_VMCR_ACK_CTL_SHIFT)
@@ -126,6 +126,13 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_PENDING_TIMER     2
 #define KVM_REQ_UNHALT            3
+#define KVM_REQUEST_ARCH_BASE     8
+
+#define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
+	BUILD_BUG_ON((unsigned)(nr) >= 32 - KVM_REQUEST_ARCH_BASE); \
+	(unsigned)(((nr) + KVM_REQUEST_ARCH_BASE) | (flags)); \
+})
+#define KVM_ARCH_REQ(nr)           KVM_ARCH_REQ_FLAGS(nr, 0)
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID		0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID	1
@@ -1098,6 +1105,11 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
 	set_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
 }
 
+static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
+{
+	return READ_ONCE(vcpu->requests);
+}
+
 static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
 {
 	return test_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
@@ -155,6 +155,35 @@ struct kvm_s390_skeys {
 	__u32 reserved[9];
 };
 
+#define KVM_S390_CMMA_PEEK (1 << 0)
+
+/**
+ * kvm_s390_cmma_log - Used for CMMA migration.
+ *
+ * Used both for input and output.
+ *
+ * @start_gfn: Guest page number to start from.
+ * @count: Size of the result buffer.
+ * @flags: Control operation mode via KVM_S390_CMMA_* flags
+ * @remaining: Used with KVM_S390_GET_CMMA_BITS. Indicates how many dirty
+ *             pages are still remaining.
+ * @mask: Used with KVM_S390_SET_CMMA_BITS. Bitmap of bits to actually set
+ *        in the PGSTE.
+ * @values: Pointer to the values buffer.
+ *
+ * Used in KVM_S390_{G,S}ET_CMMA_BITS ioctls.
+ */
+struct kvm_s390_cmma_log {
+	__u64 start_gfn;
+	__u32 count;
+	__u32 flags;
+	union {
+		__u64 remaining;
+		__u64 mask;
+	};
+	__u64 values;
+};
+
 struct kvm_hyperv_exit {
 #define KVM_EXIT_HYPERV_SYNIC          1
 #define KVM_EXIT_HYPERV_HCALL          2
@@ -895,6 +924,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SPAPR_TCE_VFIO 142
 #define KVM_CAP_X86_GUEST_MWAIT 143
 #define KVM_CAP_ARM_USER_IRQ 144
+#define KVM_CAP_S390_CMMA_MIGRATION 145
+#define KVM_CAP_PPC_FWNMI 146
+#define KVM_CAP_PPC_SMT_POSSIBLE 147
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1318,6 +1350,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_S390_GET_IRQ_STATE	  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
 /* Available with KVM_CAP_X86_SMM */
 #define KVM_SMI                   _IO(KVMIO,   0xb7)
+/* Available with KVM_CAP_S390_CMMA_MIGRATION */
+#define KVM_S390_GET_CMMA_BITS      _IOW(KVMIO, 0xb8, struct kvm_s390_cmma_log)
+#define KVM_S390_SET_CMMA_BITS      _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
@ -295,114 +295,6 @@ class ArchS390(Arch):
|
||||||
ARCH = Arch.get_arch()
|
ARCH = Arch.get_arch()
|
||||||
|
|
||||||
|
|
||||||
def walkdir(path):
|
|
||||||
"""Returns os.walk() data for specified directory.
|
|
||||||
|
|
||||||
As it is only a wrapper it returns the same 3-tuple of (dirpath,
|
|
||||||
dirnames, filenames).
|
|
||||||
"""
|
|
||||||
return next(os.walk(path))
|
|
||||||
|
|
||||||
|
|
||||||
def parse_int_list(list_string):
|
|
||||||
"""Returns an int list from a string of comma separated integers and
|
|
||||||
integer ranges."""
|
|
||||||
integers = []
|
|
||||||
members = list_string.split(',')
|
|
||||||
|
|
||||||
for member in members:
|
|
||||||
if '-' not in member:
|
|
||||||
integers.append(int(member))
|
|
||||||
else:
|
|
||||||
int_range = member.split('-')
|
|
||||||
integers.extend(range(int(int_range[0]),
|
|
||||||
int(int_range[1]) + 1))
|
|
||||||
|
|
||||||
return integers
|
|
||||||
|
|
||||||
|
|
||||||
def get_pid_from_gname(gname):
|
|
||||||
"""Fuzzy function to convert guest name to QEMU process pid.
|
|
||||||
|
|
||||||
Returns a list of potential pids, can be empty if no match found.
|
|
||||||
Throws an exception on processing errors.
|
|
||||||
|
|
||||||
"""
|
|
||||||
pids = []
|
|
||||||
try:
|
|
||||||
child = subprocess.Popen(['ps', '-A', '--format', 'pid,args'],
|
|
||||||
stdout=subprocess.PIPE)
|
|
||||||
except:
|
|
||||||
raise Exception
|
|
||||||
for line in child.stdout:
|
|
||||||
line = line.lstrip().split(' ', 1)
|
|
||||||
# perform a sanity check before calling the more expensive
|
|
||||||
# function to possibly extract the guest name
|
|
||||||
if ' -name ' in line[1] and gname == get_gname_from_pid(line[0]):
|
|
||||||
pids.append(int(line[0]))
|
|
||||||
child.stdout.close()
|
|
||||||
|
|
||||||
return pids
|
|
||||||
|
|
||||||
|
|
||||||
def get_gname_from_pid(pid):
|
|
||||||
"""Returns the guest name for a QEMU process pid.
|
|
||||||
|
|
||||||
Extracts the guest name from the QEMU comma line by processing the '-name'
|
|
||||||
option. Will also handle names specified out of sequence.
|
|
||||||
|
|
||||||
"""
|
|
||||||
name = ''
|
|
||||||
try:
|
|
||||||
line = open('/proc/{}/cmdline'.format(pid), 'rb').read().split('\0')
|
|
||||||
parms = line[line.index('-name') + 1].split(',')
|
|
||||||
while '' in parms:
|
|
||||||
# commas are escaped (i.e. ',,'), hence e.g. 'foo,bar' results in
|
|
||||||
# ['foo', '', 'bar'], which we revert here
|
|
||||||
idx = parms.index('')
|
|
||||||
parms[idx - 1] += ',' + parms[idx + 1]
|
|
||||||
del parms[idx:idx+2]
|
|
||||||
# the '-name' switch allows for two ways to specify the guest name,
|
|
||||||
# where the plain name overrides the name specified via 'guest='
|
|
||||||
for arg in parms:
|
|
||||||
if '=' not in arg:
|
|
||||||
name = arg
|
|
||||||
break
|
|
||||||
if arg[:6] == 'guest=':
|
|
||||||
name = arg[6:]
|
|
||||||
except (ValueError, IOError, IndexError):
|
|
||||||
pass
|
|
||||||
|
|
||||||
return name
|
|
||||||
|
|
||||||
|
|
||||||
def get_online_cpus():
|
|
||||||
"""Returns a list of cpu id integers."""
|
|
||||||
with open('/sys/devices/system/cpu/online') as cpu_list:
|
|
||||||
cpu_string = cpu_list.readline()
|
|
||||||
return parse_int_list(cpu_string)
|
|
||||||
|
|
||||||
|
|
||||||
def get_filters():
|
|
||||||
"""Returns a dict of trace events, their filter ids and
|
|
||||||
the values that can be filtered.
|
|
||||||
|
|
||||||
Trace events can be filtered for special values by setting a
|
|
||||||
filter string via an ioctl. The string normally has the format
|
|
||||||
identifier==value. For each filter a new event will be created, to
|
|
||||||
be able to distinguish the events.
|
|
||||||
|
|
||||||
"""
|
|
||||||
filters = {}
|
|
||||||
filters['kvm_userspace_exit'] = ('reason', USERSPACE_EXIT_REASONS)
|
|
||||||
if ARCH.exit_reasons:
|
|
||||||
filters['kvm_exit'] = ('exit_reason', ARCH.exit_reasons)
|
|
||||||
return filters
|
|
||||||
|
|
||||||
libc = ctypes.CDLL('libc.so.6', use_errno=True)
|
|
||||||
syscall = libc.syscall
|
|
||||||
|
|
||||||
|
|
||||||
class perf_event_attr(ctypes.Structure):
|
class perf_event_attr(ctypes.Structure):
|
||||||
"""Struct that holds the necessary data to set up a trace event.
|
"""Struct that holds the necessary data to set up a trace event.
|
||||||
|
|
||||||
|
@ -432,25 +324,6 @@ class perf_event_attr(ctypes.Structure):
|
||||||
self.read_format = PERF_FORMAT_GROUP
|
self.read_format = PERF_FORMAT_GROUP
|
||||||
|
|
||||||
|
|
||||||
def perf_event_open(attr, pid, cpu, group_fd, flags):
|
|
||||||
"""Wrapper for the sys_perf_evt_open() syscall.
|
|
||||||
|
|
||||||
Used to set up performance events, returns a file descriptor or -1
|
|
||||||
on error.
|
|
||||||
|
|
||||||
Attributes are:
|
|
||||||
- syscall number
|
|
||||||
- struct perf_event_attr *
|
|
||||||
- pid or -1 to monitor all pids
|
|
||||||
- cpu number or -1 to monitor all cpus
|
|
||||||
- The file descriptor of the group leader or -1 to create a group.
|
|
||||||
- flags
|
|
||||||
|
|
||||||
"""
|
|
||||||
return syscall(ARCH.sc_perf_evt_open, ctypes.pointer(attr),
|
|
||||||
ctypes.c_int(pid), ctypes.c_int(cpu),
|
|
||||||
ctypes.c_int(group_fd), ctypes.c_long(flags))
|
|
||||||
|
|
||||||
PERF_TYPE_TRACEPOINT = 2
|
PERF_TYPE_TRACEPOINT = 2
|
||||||
PERF_FORMAT_GROUP = 1 << 3
|
PERF_FORMAT_GROUP = 1 << 3
|
||||||
|
|
||||||
|
@ -495,6 +368,8 @@ class Event(object):
|
||||||
"""Represents a performance event and manages its life cycle."""
|
"""Represents a performance event and manages its life cycle."""
|
||||||
def __init__(self, name, group, trace_cpu, trace_pid, trace_point,
|
def __init__(self, name, group, trace_cpu, trace_pid, trace_point,
|
||||||
trace_filter, trace_set='kvm'):
|
trace_filter, trace_set='kvm'):
|
||||||
|
self.libc = ctypes.CDLL('libc.so.6', use_errno=True)
|
||||||
|
self.syscall = self.libc.syscall
|
||||||
self.name = name
|
self.name = name
|
||||||
self.fd = None
|
self.fd = None
|
||||||
self.setup_event(group, trace_cpu, trace_pid, trace_point,
|
self.setup_event(group, trace_cpu, trace_pid, trace_point,
|
||||||
|
@ -511,6 +386,25 @@ class Event(object):
|
||||||
if self.fd:
|
if self.fd:
|
||||||
os.close(self.fd)
|
os.close(self.fd)
|
||||||
|
|
||||||
|
def perf_event_open(self, attr, pid, cpu, group_fd, flags):
|
||||||
|
"""Wrapper for the sys_perf_evt_open() syscall.
|
||||||
|
|
||||||
|
Used to set up performance events, returns a file descriptor or -1
|
||||||
|
on error.
|
||||||
|
|
||||||
|
Attributes are:
|
||||||
|
- syscall number
|
||||||
|
- struct perf_event_attr *
|
||||||
|
- pid or -1 to monitor all pids
|
||||||
|
- cpu number or -1 to monitor all cpus
|
||||||
|
- The file descriptor of the group leader or -1 to create a group.
|
||||||
|
- flags
|
||||||
|
|
||||||
|
"""
|
||||||
|
return self.syscall(ARCH.sc_perf_evt_open, ctypes.pointer(attr),
|
||||||
|
ctypes.c_int(pid), ctypes.c_int(cpu),
|
||||||
|
ctypes.c_int(group_fd), ctypes.c_long(flags))
|
||||||
|
|
||||||
def setup_event_attribute(self, trace_set, trace_point):
|
def setup_event_attribute(self, trace_set, trace_point):
|
||||||
"""Returns an initialized ctype perf_event_attr struct."""
|
"""Returns an initialized ctype perf_event_attr struct."""
|
||||||
|
|
||||||
|
@ -539,8 +433,8 @@ class Event(object):
|
||||||
if group.events:
|
if group.events:
|
||||||
group_leader = group.events[0].fd
|
group_leader = group.events[0].fd
|
||||||
|
|
||||||
fd = perf_event_open(event_attr, trace_pid,
|
fd = self.perf_event_open(event_attr, trace_pid,
|
||||||
trace_cpu, group_leader, 0)
|
trace_cpu, group_leader, 0)
|
||||||
if fd == -1:
|
if fd == -1:
|
||||||
err = ctypes.get_errno()
|
err = ctypes.get_errno()
|
||||||
raise OSError(err, os.strerror(err),
|
raise OSError(err, os.strerror(err),
|
||||||
|
@ -575,17 +469,53 @@ class Event(object):
|
||||||
fcntl.ioctl(self.fd, ARCH.ioctl_numbers['RESET'], 0)
|
fcntl.ioctl(self.fd, ARCH.ioctl_numbers['RESET'], 0)
|
||||||
|
|
||||||
|
|
||||||
class TracepointProvider(object):
|
class Provider(object):
|
||||||
|
"""Encapsulates functionalities used by all providers."""
|
||||||
|
@staticmethod
|
||||||
|
def is_field_wanted(fields_filter, field):
|
||||||
|
"""Indicate whether field is valid according to fields_filter."""
|
||||||
|
if not fields_filter:
|
||||||
|
return True
|
||||||
|
return re.match(fields_filter, field) is not None
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def walkdir(path):
|
||||||
|
"""Returns os.walk() data for specified directory.
|
||||||
|
|
||||||
|
As it is only a wrapper it returns the same 3-tuple of (dirpath,
|
||||||
|
dirnames, filenames).
|
||||||
|
"""
|
||||||
|
return next(os.walk(path))
|
||||||
|
|
||||||
|
|
||||||
|
class TracepointProvider(Provider):
|
||||||
"""Data provider for the stats class.
|
"""Data provider for the stats class.
|
||||||
|
|
||||||
Manages the events/groups from which it acquires its data.
|
Manages the events/groups from which it acquires its data.
|
||||||
|
|
||||||
"""
|
"""
|
||||||
def __init__(self):
|
def __init__(self, pid, fields_filter):
|
||||||
self.group_leaders = []
|
self.group_leaders = []
|
||||||
self.filters = get_filters()
|
self.filters = self.get_filters()
|
||||||
self._fields = self.get_available_fields()
|
self.update_fields(fields_filter)
|
||||||
self._pid = 0
|
self.pid = pid
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def get_filters():
|
||||||
|
"""Returns a dict of trace events, their filter ids and
|
||||||
|
the values that can be filtered.
|
||||||
|
|
||||||
|
Trace events can be filtered for special values by setting a
|
||||||
|
filter string via an ioctl. The string normally has the format
|
||||||
|
identifier==value. For each filter a new event will be created, to
|
||||||
|
be able to distinguish the events.
|
||||||
|
|
||||||
|
"""
|
||||||
|
filters = {}
|
||||||
|
filters['kvm_userspace_exit'] = ('reason', USERSPACE_EXIT_REASONS)
|
||||||
|
if ARCH.exit_reasons:
|
||||||
|
filters['kvm_exit'] = ('exit_reason', ARCH.exit_reasons)
|
||||||
|
return filters
|
||||||
|
|
||||||
def get_available_fields(self):
|
def get_available_fields(self):
|
||||||
"""Returns a list of available event's of format 'event name(filter
|
"""Returns a list of available event's of format 'event name(filter
|
||||||
|
@ -603,7 +533,7 @@ class TracepointProvider(object):
|
||||||
|
|
||||||
"""
|
"""
|
||||||
path = os.path.join(PATH_DEBUGFS_TRACING, 'events', 'kvm')
|
path = os.path.join(PATH_DEBUGFS_TRACING, 'events', 'kvm')
|
||||||
fields = walkdir(path)[1]
|
fields = self.walkdir(path)[1]
|
||||||
extra = []
|
extra = []
|
||||||
for field in fields:
|
for field in fields:
|
||||||
if field in self.filters:
|
if field in self.filters:
|
||||||
|
@ -613,6 +543,34 @@ class TracepointProvider(object):
|
||||||
fields += extra
|
fields += extra
|
||||||
return fields
|
return fields
|
||||||
|
|
||||||
|
def update_fields(self, fields_filter):
|
||||||
|
"""Refresh fields, applying fields_filter"""
|
||||||
|
self._fields = [field for field in self.get_available_fields()
|
||||||
|
if self.is_field_wanted(fields_filter, field)]
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def get_online_cpus():
|
||||||
|
"""Returns a list of cpu id integers."""
|
||||||
|
def parse_int_list(list_string):
|
||||||
|
"""Returns an int list from a string of comma separated integers and
|
||||||
|
integer ranges."""
|
||||||
|
integers = []
|
||||||
|
members = list_string.split(',')
|
||||||
|
|
||||||
|
for member in members:
|
||||||
|
if '-' not in member:
|
||||||
|
integers.append(int(member))
|
||||||
|
else:
|
||||||
|
int_range = member.split('-')
|
||||||
|
integers.extend(range(int(int_range[0]),
|
||||||
|
int(int_range[1]) + 1))
|
||||||
|
|
||||||
|
return integers
|
||||||
|
|
||||||
|
with open('/sys/devices/system/cpu/online') as cpu_list:
|
||||||
|
cpu_string = cpu_list.readline()
|
||||||
|
return parse_int_list(cpu_string)
|
||||||
|
|
||||||
def setup_traces(self):
|
def setup_traces(self):
|
||||||
"""Creates all event and group objects needed to be able to retrieve
|
"""Creates all event and group objects needed to be able to retrieve
|
||||||
data."""
|
data."""
|
||||||
|
@ -621,9 +579,9 @@ class TracepointProvider(object):
|
||||||
# Fetch list of all threads of the monitored pid, as qemu
|
# Fetch list of all threads of the monitored pid, as qemu
|
||||||
# starts a thread for each vcpu.
|
# starts a thread for each vcpu.
|
||||||
path = os.path.join('/proc', str(self._pid), 'task')
|
path = os.path.join('/proc', str(self._pid), 'task')
|
||||||
groupids = walkdir(path)[1]
|
groupids = self.walkdir(path)[1]
|
||||||
else:
|
else:
|
||||||
groupids = get_online_cpus()
|
groupids = self.get_online_cpus()
|
||||||
|
|
||||||
# The constant is needed as a buffer for python libs, std
|
# The constant is needed as a buffer for python libs, std
|
||||||
# streams and other files that the script opens.
|
# streams and other files that the script opens.
|
||||||
|
@ -671,9 +629,6 @@ class TracepointProvider(object):
|
||||||
|
|
||||||
self.group_leaders.append(group)
|
self.group_leaders.append(group)
|
||||||
|
|
||||||
def available_fields(self):
|
|
||||||
return self.get_available_fields()
|
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def fields(self):
|
def fields(self):
|
||||||
return self._fields
|
return self._fields
|
||||||
|
@ -707,7 +662,7 @@ class TracepointProvider(object):
|
||||||
self.setup_traces()
|
self.setup_traces()
|
||||||
self.fields = self._fields
|
self.fields = self._fields
|
||||||
|
|
||||||
def read(self):
|
def read(self, by_guest=0):
|
||||||
"""Returns 'event name: current value' for all enabled events."""
|
"""Returns 'event name: current value' for all enabled events."""
|
||||||
ret = defaultdict(int)
|
ret = defaultdict(int)
|
||||||
for group in self.group_leaders:
|
for group in self.group_leaders:
|
||||||
|
@ -723,16 +678,17 @@ class TracepointProvider(object):
|
||||||
event.reset()
|
event.reset()
|
||||||
|
|
||||||
|
|
||||||
class DebugfsProvider(object):
|
class DebugfsProvider(Provider):
|
||||||
"""Provides data from the files that KVM creates in the kvm debugfs
|
"""Provides data from the files that KVM creates in the kvm debugfs
|
||||||
folder."""
|
folder."""
|
||||||
def __init__(self):
|
def __init__(self, pid, fields_filter, include_past):
|
||||||
self._fields = self.get_available_fields()
|
self.update_fields(fields_filter)
|
||||||
self._baseline = {}
|
self._baseline = {}
|
||||||
self._pid = 0
|
|
||||||
self.do_read = True
|
self.do_read = True
|
||||||
self.paths = []
|
self.paths = []
|
||||||
self.reset()
|
self.pid = pid
|
||||||
|
if include_past:
|
||||||
|
self.restore()
|
||||||
|
|
||||||
def get_available_fields(self):
|
def get_available_fields(self):
|
||||||
""""Returns a list of available fields.
|
""""Returns a list of available fields.
|
||||||
|
@ -740,7 +696,12 @@ class DebugfsProvider(object):
|
||||||
The fields are all available KVM debugfs files
|
The fields are all available KVM debugfs files
|
||||||
|
|
||||||
"""
|
"""
|
||||||
return walkdir(PATH_DEBUGFS_KVM)[2]
|
return self.walkdir(PATH_DEBUGFS_KVM)[2]
|
||||||
|
|
||||||
|
def update_fields(self, fields_filter):
|
||||||
|
"""Refresh fields, applying fields_filter"""
|
||||||
|
self._fields = [field for field in self.get_available_fields()
|
||||||
|
if self.is_field_wanted(fields_filter, field)]
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def fields(self):
|
def fields(self):
|
||||||
|
@ -757,10 +718,9 @@ class DebugfsProvider(object):
|
||||||
|
|
||||||
@pid.setter
|
@pid.setter
|
||||||
def pid(self, pid):
|
def pid(self, pid):
|
||||||
|
self._pid = pid
|
||||||
if pid != 0:
|
if pid != 0:
|
||||||
self._pid = pid
|
vms = self.walkdir(PATH_DEBUGFS_KVM)[1]
|
||||||
|
|
||||||
vms = walkdir(PATH_DEBUGFS_KVM)[1]
|
|
||||||
if len(vms) == 0:
|
if len(vms) == 0:
|
||||||
self.do_read = False
|
self.do_read = False
|
||||||
|
|
||||||
|
@@ -771,8 +731,15 @@ class DebugfsProvider(object):
             self.do_read = True
         self.reset()
 
-    def read(self, reset=0):
-        """Returns a dict with format:'file name / field -> current value'."""
+    def read(self, reset=0, by_guest=0):
+        """Returns a dict with format:'file name / field -> current value'.
+
+        Parameter 'reset':
+          0   plain read
+          1   reset field counts to 0
+          2   restore the original field counts
+
+        """
         results = {}
 
         # If no debugfs filtering support is available, then don't read.
@@ -789,12 +756,22 @@ class DebugfsProvider(object):
             for field in self._fields:
                 value = self.read_field(field, path)
                 key = path + field
-                if reset:
+                if reset == 1:
                     self._baseline[key] = value
+                if reset == 2:
+                    self._baseline[key] = 0
                 if self._baseline.get(key, -1) == -1:
                     self._baseline[key] = value
-                results[field] = (results.get(field, 0) + value -
-                                  self._baseline.get(key, 0))
+                increment = (results.get(field, 0) + value -
+                             self._baseline.get(key, 0))
+                if by_guest:
+                    pid = key.split('-')[0]
+                    if pid in results:
+                        results[pid] += increment
+                    else:
+                        results[pid] = increment
+                else:
+                    results[field] = increment
 
         return results
 
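The by_guest branch above folds the per-field counts into a single counter per VM by reusing the leading pid component of the debugfs key. A small sketch of that aggregation, with invented keys (the exact key shape in kvm_stat is the debugfs path plus the field name):

    from collections import defaultdict

    # Invented sample increments keyed by '<pid>-<token>/<field>'.
    increments = {
        '1234-10/exits': 10,
        '1234-10/halt_exits': 3,
        '5678-12/exits': 7,
    }

    by_guest = defaultdict(int)
    for key, increment in increments.items():
        pid = key.split('-')[0]   # same grouping rule as key.split('-')[0] above
        by_guest[pid] += increment

    print(dict(by_guest))  # {'1234': 13, '5678': 7}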
|
@ -813,6 +790,11 @@ class DebugfsProvider(object):
|
||||||
self._baseline = {}
|
self._baseline = {}
|
||||||
self.read(1)
|
self.read(1)
|
||||||
|
|
||||||
|
def restore(self):
|
||||||
|
"""Reset field counters"""
|
||||||
|
self._baseline = {}
|
||||||
|
self.read(2)
|
||||||
|
|
||||||
|
|
||||||
class Stats(object):
|
class Stats(object):
|
||||||
"""Manages the data providers and the data they provide.
|
"""Manages the data providers and the data they provide.
|
||||||
|
@@ -821,33 +803,32 @@ class Stats(object):
     provider data.
 
     """
-    def __init__(self, providers, pid, fields=None):
-        self.providers = providers
-        self._pid_filter = pid
-        self._fields_filter = fields
+    def __init__(self, options):
+        self.providers = self.get_providers(options)
+        self._pid_filter = options.pid
+        self._fields_filter = options.fields
         self.values = {}
-        self.update_provider_pid()
-        self.update_provider_filters()
+
+    @staticmethod
+    def get_providers(options):
+        """Returns a list of data providers depending on the passed options."""
+        providers = []
+
+        if options.debugfs:
+            providers.append(DebugfsProvider(options.pid, options.fields,
+                                             options.dbgfs_include_past))
+        if options.tracepoints or not providers:
+            providers.append(TracepointProvider(options.pid, options.fields))
+
+        return providers
 
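The provider selection above boils down to "debugfs if requested, tracepoints if requested or if nothing else was picked". A toy model of that decision, using stub names rather than the real provider classes:

    from collections import namedtuple

    Options = namedtuple('Options', ['debugfs', 'tracepoints'])

    def pick_providers(options):
        """Mirror of the selection rule shown in get_providers() above."""
        providers = []
        if options.debugfs:
            providers.append('DebugfsProvider')
        if options.tracepoints or not providers:
            providers.append('TracepointProvider')
        return providers

    assert pick_providers(Options(False, False)) == ['TracepointProvider']
    assert pick_providers(Options(True, False)) == ['DebugfsProvider']
    assert pick_providers(Options(True, True)) == ['DebugfsProvider', 'TracepointProvider']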
def update_provider_filters(self):
|
def update_provider_filters(self):
|
||||||
"""Propagates fields filters to providers."""
|
"""Propagates fields filters to providers."""
|
||||||
def wanted(key):
|
|
||||||
if not self._fields_filter:
|
|
||||||
return True
|
|
||||||
return re.match(self._fields_filter, key) is not None
|
|
||||||
|
|
||||||
# As we reset the counters when updating the fields we can
|
# As we reset the counters when updating the fields we can
|
||||||
# also clear the cache of old values.
|
# also clear the cache of old values.
|
||||||
self.values = {}
|
self.values = {}
|
||||||
for provider in self.providers:
|
for provider in self.providers:
|
||||||
provider_fields = [key for key in provider.get_available_fields()
|
provider.update_fields(self._fields_filter)
|
||||||
if wanted(key)]
|
|
||||||
provider.fields = provider_fields
|
|
||||||
|
|
||||||
def update_provider_pid(self):
|
|
||||||
"""Propagates pid filters to providers."""
|
|
||||||
for provider in self.providers:
|
|
||||||
provider.pid = self._pid_filter
|
|
||||||
|
|
||||||
def reset(self):
|
def reset(self):
|
||||||
self.values = {}
|
self.values = {}
|
||||||
|
@ -873,27 +854,52 @@ class Stats(object):
|
||||||
if pid != self._pid_filter:
|
if pid != self._pid_filter:
|
||||||
self._pid_filter = pid
|
self._pid_filter = pid
|
||||||
self.values = {}
|
self.values = {}
|
||||||
self.update_provider_pid()
|
for provider in self.providers:
|
||||||
|
provider.pid = self._pid_filter
|
||||||
|
|
||||||
def get(self):
|
def get(self, by_guest=0):
|
||||||
"""Returns a dict with field -> (value, delta to last value) of all
|
"""Returns a dict with field -> (value, delta to last value) of all
|
||||||
provider data."""
|
provider data."""
|
||||||
for provider in self.providers:
|
for provider in self.providers:
|
||||||
new = provider.read()
|
new = provider.read(by_guest=by_guest)
|
||||||
for key in provider.fields:
|
for key in new if by_guest else provider.fields:
|
||||||
oldval = self.values.get(key, (0, 0))[0]
|
oldval = self.values.get(key, (0, 0))[0]
|
||||||
newval = new.get(key, 0)
|
newval = new.get(key, 0)
|
||||||
newdelta = newval - oldval
|
newdelta = newval - oldval
|
||||||
self.values[key] = (newval, newdelta)
|
self.values[key] = (newval, newdelta)
|
||||||
return self.values
|
return self.values
|
||||||
|
|
||||||
LABEL_WIDTH = 40
|
def toggle_display_guests(self, to_pid):
|
||||||
NUMBER_WIDTH = 10
|
"""Toggle between collection of stats by individual event and by
|
||||||
DELAY_INITIAL = 0.25
|
guest pid
|
||||||
DELAY_REGULAR = 3.0
|
|
||||||
|
Events reported by DebugfsProvider change when switching to/from
|
||||||
|
reading by guest values. Hence we have to remove the excess event
|
||||||
|
names from self.values.
|
||||||
|
|
||||||
|
"""
|
||||||
|
if any(isinstance(ins, TracepointProvider) for ins in self.providers):
|
||||||
|
return 1
|
||||||
|
if to_pid:
|
||||||
|
for provider in self.providers:
|
||||||
|
if isinstance(provider, DebugfsProvider):
|
||||||
|
for key in provider.fields:
|
||||||
|
if key in self.values.keys():
|
||||||
|
del self.values[key]
|
||||||
|
else:
|
||||||
|
oldvals = self.values.copy()
|
||||||
|
for key in oldvals:
|
||||||
|
if key.isdigit():
|
||||||
|
del self.values[key]
|
||||||
|
# Update oldval (see get())
|
||||||
|
self.get(to_pid)
|
||||||
|
return 0
|
||||||
|
|
||||||
|
DELAY_DEFAULT = 3.0
|
||||||
MAX_GUEST_NAME_LEN = 48
|
MAX_GUEST_NAME_LEN = 48
|
||||||
MAX_REGEX_LEN = 44
|
MAX_REGEX_LEN = 44
|
||||||
DEFAULT_REGEX = r'^[^\(]*$'
|
DEFAULT_REGEX = r'^[^\(]*$'
|
||||||
|
SORT_DEFAULT = 0
|
||||||
|
|
||||||
|
|
||||||
class Tui(object):
|
class Tui(object):
|
||||||
|
@ -901,7 +907,10 @@ class Tui(object):
|
||||||
def __init__(self, stats):
|
def __init__(self, stats):
|
||||||
self.stats = stats
|
self.stats = stats
|
||||||
self.screen = None
|
self.screen = None
|
||||||
self.update_drilldown()
|
self._delay_initial = 0.25
|
||||||
|
self._delay_regular = DELAY_DEFAULT
|
||||||
|
self._sorting = SORT_DEFAULT
|
||||||
|
self._display_guests = 0
|
||||||
|
|
||||||
def __enter__(self):
|
def __enter__(self):
|
||||||
"""Initialises curses for later use. Based on curses.wrapper
|
"""Initialises curses for later use. Based on curses.wrapper
|
||||||
|
@ -929,7 +938,7 @@ class Tui(object):
|
||||||
return self
|
return self
|
||||||
|
|
||||||
     def __exit__(self, *exception):
-        """Resets the terminal to its normal state. Based on curses.wrappre
+        """Resets the terminal to its normal state. Based on curses.wrapper
         implementation from the Python standard library."""
if self.screen:
|
if self.screen:
|
||||||
self.screen.keypad(0)
|
self.screen.keypad(0)
|
||||||
|
@ -937,6 +946,86 @@ class Tui(object):
|
||||||
curses.nocbreak()
|
curses.nocbreak()
|
||||||
curses.endwin()
|
curses.endwin()
|
||||||
|
|
||||||
|
def get_all_gnames(self):
|
||||||
|
"""Returns a list of (pid, gname) tuples of all running guests"""
|
||||||
|
res = []
|
||||||
|
try:
|
||||||
|
child = subprocess.Popen(['ps', '-A', '--format', 'pid,args'],
|
||||||
|
stdout=subprocess.PIPE)
|
||||||
|
except:
|
||||||
|
raise Exception
|
||||||
|
for line in child.stdout:
|
||||||
|
line = line.lstrip().split(' ', 1)
|
||||||
|
# perform a sanity check before calling the more expensive
|
||||||
|
# function to possibly extract the guest name
|
||||||
|
if ' -name ' in line[1]:
|
||||||
|
res.append((line[0], self.get_gname_from_pid(line[0])))
|
||||||
|
child.stdout.close()
|
||||||
|
|
||||||
|
return res
|
||||||
|
|
||||||
|
def print_all_gnames(self, row):
|
||||||
|
"""Print a list of all running guests along with their pids."""
|
||||||
|
self.screen.addstr(row, 2, '%8s %-60s' %
|
||||||
|
('Pid', 'Guest Name (fuzzy list, might be '
|
||||||
|
'inaccurate!)'),
|
||||||
|
curses.A_UNDERLINE)
|
||||||
|
row += 1
|
||||||
|
try:
|
||||||
|
for line in self.get_all_gnames():
|
||||||
|
self.screen.addstr(row, 2, '%8s %-60s' % (line[0], line[1]))
|
||||||
|
row += 1
|
||||||
|
if row >= self.screen.getmaxyx()[0]:
|
||||||
|
break
|
||||||
|
except Exception:
|
||||||
|
self.screen.addstr(row + 1, 2, 'Not available')
|
||||||
|
|
||||||
|
def get_pid_from_gname(self, gname):
|
||||||
|
"""Fuzzy function to convert guest name to QEMU process pid.
|
||||||
|
|
||||||
|
Returns a list of potential pids, can be empty if no match found.
|
||||||
|
Throws an exception on processing errors.
|
||||||
|
|
||||||
|
"""
|
||||||
|
pids = []
|
||||||
|
for line in self.get_all_gnames():
|
||||||
|
if gname == line[1]:
|
||||||
|
pids.append(int(line[0]))
|
||||||
|
|
||||||
|
return pids
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def get_gname_from_pid(pid):
|
||||||
|
"""Returns the guest name for a QEMU process pid.
|
||||||
|
|
||||||
|
Extracts the guest name from the QEMU command line by processing the
|
||||||
|
'-name' option. Will also handle names specified out of sequence.
|
||||||
|
|
||||||
|
"""
|
||||||
|
name = ''
|
||||||
|
try:
|
||||||
|
line = open('/proc/{}/cmdline'
|
||||||
|
.format(pid), 'rb').read().split('\0')
|
||||||
|
parms = line[line.index('-name') + 1].split(',')
|
||||||
|
while '' in parms:
|
||||||
|
# commas are escaped (i.e. ',,'), hence e.g. 'foo,bar' results
|
||||||
|
# in # ['foo', '', 'bar'], which we revert here
|
||||||
|
idx = parms.index('')
|
||||||
|
parms[idx - 1] += ',' + parms[idx + 1]
|
||||||
|
del parms[idx:idx+2]
|
||||||
|
# the '-name' switch allows for two ways to specify the guest name,
|
||||||
|
# where the plain name overrides the name specified via 'guest='
|
||||||
|
for arg in parms:
|
||||||
|
if '=' not in arg:
|
||||||
|
name = arg
|
||||||
|
break
|
||||||
|
if arg[:6] == 'guest=':
|
||||||
|
name = arg[6:]
|
||||||
|
except (ValueError, IOError, IndexError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
return name
|
||||||
|
|
||||||
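The rejoin loop above undoes QEMU's comma escaping in '-name' values (a literal ',' is written as ',,', so splitting on ',' leaves empty elements behind). A standalone sketch of the same step, with an invented example value:

    # Rejoin halves of a comma-escaped '-name' value, as in the loop above.
    def unescape_commas(parms):
        parms = list(parms)
        while '' in parms:
            idx = parms.index('')
            parms[idx - 1] += ',' + parms[idx + 1]
            del parms[idx:idx + 2]
        return parms

    # 'guest=foo,,bar,debug-threads=on' splits into the list below; rejoining
    # restores the literal comma inside the guest name.
    assert unescape_commas(['guest=foo', '', 'bar', 'debug-threads=on']) == \
        ['guest=foo,bar', 'debug-threads=on']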
def update_drilldown(self):
|
def update_drilldown(self):
|
||||||
"""Sets or removes a filter that only allows fields without braces."""
|
"""Sets or removes a filter that only allows fields without braces."""
|
||||||
if not self.stats.fields_filter:
|
if not self.stats.fields_filter:
|
||||||
|
@ -954,7 +1043,7 @@ class Tui(object):
|
||||||
if pid is None:
|
if pid is None:
|
||||||
pid = self.stats.pid_filter
|
pid = self.stats.pid_filter
|
||||||
self.screen.erase()
|
self.screen.erase()
|
||||||
gname = get_gname_from_pid(pid)
|
gname = self.get_gname_from_pid(pid)
|
||||||
if gname:
|
if gname:
|
||||||
gname = ('({})'.format(gname[:MAX_GUEST_NAME_LEN] + '...'
|
gname = ('({})'.format(gname[:MAX_GUEST_NAME_LEN] + '...'
|
||||||
if len(gname) > MAX_GUEST_NAME_LEN
|
if len(gname) > MAX_GUEST_NAME_LEN
|
||||||
|
@ -970,13 +1059,13 @@ class Tui(object):
|
||||||
if len(regex) > MAX_REGEX_LEN:
|
if len(regex) > MAX_REGEX_LEN:
|
||||||
regex = regex[:MAX_REGEX_LEN] + '...'
|
regex = regex[:MAX_REGEX_LEN] + '...'
|
||||||
self.screen.addstr(1, 17, 'regex filter: {0}'.format(regex))
|
self.screen.addstr(1, 17, 'regex filter: {0}'.format(regex))
|
||||||
self.screen.addstr(2, 1, 'Event')
|
if self._display_guests:
|
||||||
self.screen.addstr(2, 1 + LABEL_WIDTH + NUMBER_WIDTH -
|
col_name = 'Guest Name'
|
||||||
len('Total'), 'Total')
|
else:
|
||||||
self.screen.addstr(2, 1 + LABEL_WIDTH + NUMBER_WIDTH + 7 -
|
col_name = 'Event'
|
||||||
len('%Total'), '%Total')
|
self.screen.addstr(2, 1, '%-40s %10s%7s %8s' %
|
||||||
self.screen.addstr(2, 1 + LABEL_WIDTH + NUMBER_WIDTH + 7 + 8 -
|
(col_name, 'Total', '%Total', 'CurAvg/s'),
|
||||||
len('Current'), 'Current')
|
curses.A_STANDOUT)
|
||||||
self.screen.addstr(4, 1, 'Collecting data...')
|
self.screen.addstr(4, 1, 'Collecting data...')
|
||||||
self.screen.refresh()
|
self.screen.refresh()
|
||||||
|
|
||||||
|
@ -984,16 +1073,25 @@ class Tui(object):
|
||||||
row = 3
|
row = 3
|
||||||
self.screen.move(row, 0)
|
self.screen.move(row, 0)
|
||||||
self.screen.clrtobot()
|
self.screen.clrtobot()
|
||||||
stats = self.stats.get()
|
stats = self.stats.get(self._display_guests)
|
||||||
|
|
||||||
def sortkey(x):
|
def sortCurAvg(x):
|
||||||
|
# sort by current events if available
|
||||||
if stats[x][1]:
|
if stats[x][1]:
|
||||||
return (-stats[x][1], -stats[x][0])
|
return (-stats[x][1], -stats[x][0])
|
||||||
else:
|
else:
|
||||||
return (0, -stats[x][0])
|
return (0, -stats[x][0])
|
||||||
|
|
||||||
|
def sortTotal(x):
|
||||||
|
# sort by totals
|
||||||
|
return (0, -stats[x][0])
|
||||||
total = 0.
|
total = 0.
|
||||||
for val in stats.values():
|
for val in stats.values():
|
||||||
total += val[0]
|
total += val[0]
|
||||||
|
if self._sorting == SORT_DEFAULT:
|
||||||
|
sortkey = sortCurAvg
|
||||||
|
else:
|
||||||
|
sortkey = sortTotal
|
||||||
for key in sorted(stats.keys(), key=sortkey):
|
for key in sorted(stats.keys(), key=sortkey):
|
||||||
|
|
||||||
if row >= self.screen.getmaxyx()[0]:
|
if row >= self.screen.getmaxyx()[0]:
|
||||||
|
@ -1001,18 +1099,61 @@ class Tui(object):
|
||||||
values = stats[key]
|
values = stats[key]
|
||||||
if not values[0] and not values[1]:
|
if not values[0] and not values[1]:
|
||||||
break
|
break
|
||||||
col = 1
|
if values[0] is not None:
|
||||||
self.screen.addstr(row, col, key)
|
cur = int(round(values[1] / sleeptime)) if values[1] else ''
|
||||||
col += LABEL_WIDTH
|
if self._display_guests:
|
||||||
self.screen.addstr(row, col, '%10d' % (values[0],))
|
key = self.get_gname_from_pid(key)
|
||||||
col += NUMBER_WIDTH
|
self.screen.addstr(row, 1, '%-40s %10d%7.1f %8s' %
|
||||||
self.screen.addstr(row, col, '%7.1f' % (values[0] * 100 / total,))
|
(key, values[0], values[0] * 100 / total,
|
||||||
col += 7
|
cur))
|
||||||
if values[1] is not None:
|
|
||||||
self.screen.addstr(row, col, '%8d' % (values[1] / sleeptime,))
|
|
||||||
row += 1
|
row += 1
|
||||||
|
if row == 3:
|
||||||
|
self.screen.addstr(4, 1, 'No matching events reported yet')
|
||||||
self.screen.refresh()
|
self.screen.refresh()
|
||||||
|
|
||||||
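The header and each data row use matching fixed-width formats ('%-40s %10s%7s %8s' and '%-40s %10d%7.1f %8s') so the columns line up on screen. A quick illustration with made-up numbers:

    # Made-up values, only to show how the row format lines up with the header.
    total = 12000
    rows = [('kvm_exit', 9000, 310), ('kvm_entry', 3000, '')]

    print('%-40s %10s%7s %8s' % ('Event', 'Total', '%Total', 'CurAvg/s'))
    for name, count, cur in rows:
        print('%-40s %10d%7.1f %8s' % (name, count, count * 100 / total, cur))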
|
def show_msg(self, text):
|
||||||
|
"""Display message centered text and exit on key press"""
|
||||||
|
hint = 'Press any key to continue'
|
||||||
|
curses.cbreak()
|
||||||
|
self.screen.erase()
|
||||||
|
(x, term_width) = self.screen.getmaxyx()
|
||||||
|
row = 2
|
||||||
|
for line in text:
|
||||||
|
start = (term_width - len(line)) / 2
|
||||||
|
self.screen.addstr(row, start, line)
|
||||||
|
row += 1
|
||||||
|
self.screen.addstr(row + 1, (term_width - len(hint)) / 2, hint,
|
||||||
|
curses.A_STANDOUT)
|
||||||
|
self.screen.getkey()
|
||||||
|
|
||||||
|
def show_help_interactive(self):
|
||||||
|
"""Display help with list of interactive commands"""
|
||||||
|
msg = (' b toggle events by guests (debugfs only, honors'
|
||||||
|
' filters)',
|
||||||
|
' c clear filter',
|
||||||
|
' f filter by regular expression',
|
||||||
|
' g filter by guest name',
|
||||||
|
' h display interactive commands reference',
|
||||||
|
' o toggle sorting order (Total vs CurAvg/s)',
|
||||||
|
' p filter by PID',
|
||||||
|
' q quit',
|
||||||
|
' r reset stats',
|
||||||
|
' s set update interval',
|
||||||
|
' x toggle reporting of stats for individual child trace'
|
||||||
|
' events',
|
||||||
|
'Any other key refreshes statistics immediately')
|
||||||
|
curses.cbreak()
|
||||||
|
self.screen.erase()
|
||||||
|
self.screen.addstr(0, 0, "Interactive commands reference",
|
||||||
|
curses.A_BOLD)
|
||||||
|
self.screen.addstr(2, 0, "Press any key to exit", curses.A_STANDOUT)
|
||||||
|
row = 4
|
||||||
|
for line in msg:
|
||||||
|
self.screen.addstr(row, 0, line)
|
||||||
|
row += 1
|
||||||
|
self.screen.getkey()
|
||||||
|
self.refresh_header()
|
||||||
|
|
||||||
def show_filter_selection(self):
|
def show_filter_selection(self):
|
||||||
"""Draws filter selection mask.
|
"""Draws filter selection mask.
|
||||||
|
|
||||||
|
@ -1059,6 +1200,7 @@ class Tui(object):
|
||||||
'This might limit the shown data to the trace '
|
'This might limit the shown data to the trace '
|
||||||
'statistics.')
|
'statistics.')
|
||||||
self.screen.addstr(5, 0, msg)
|
self.screen.addstr(5, 0, msg)
|
||||||
|
self.print_all_gnames(7)
|
||||||
|
|
||||||
curses.echo()
|
curses.echo()
|
||||||
self.screen.addstr(3, 0, "Pid [0 or pid]: ")
|
self.screen.addstr(3, 0, "Pid [0 or pid]: ")
|
||||||
|
@ -1077,10 +1219,40 @@ class Tui(object):
|
||||||
self.refresh_header(pid)
|
self.refresh_header(pid)
|
||||||
self.update_pid(pid)
|
self.update_pid(pid)
|
||||||
break
|
break
|
||||||
|
|
||||||
except ValueError:
|
except ValueError:
|
||||||
msg = '"' + str(pid) + '": Not a valid pid'
|
msg = '"' + str(pid) + '": Not a valid pid'
|
||||||
continue
|
|
||||||
|
def show_set_update_interval(self):
|
||||||
|
"""Draws update interval selection mask."""
|
||||||
|
msg = ''
|
||||||
|
while True:
|
||||||
|
self.screen.erase()
|
||||||
|
self.screen.addstr(0, 0, 'Set update interval (defaults to %fs).' %
|
||||||
|
DELAY_DEFAULT, curses.A_BOLD)
|
||||||
|
self.screen.addstr(4, 0, msg)
|
||||||
|
self.screen.addstr(2, 0, 'Change delay from %.1fs to ' %
|
||||||
|
self._delay_regular)
|
||||||
|
curses.echo()
|
||||||
|
val = self.screen.getstr()
|
||||||
|
curses.noecho()
|
||||||
|
|
||||||
|
try:
|
||||||
|
if len(val) > 0:
|
||||||
|
delay = float(val)
|
||||||
|
if delay < 0.1:
|
||||||
|
msg = '"' + str(val) + '": Value must be >=0.1'
|
||||||
|
continue
|
||||||
|
if delay > 25.5:
|
||||||
|
msg = '"' + str(val) + '": Value must be <=25.5'
|
||||||
|
continue
|
||||||
|
else:
|
||||||
|
delay = DELAY_DEFAULT
|
||||||
|
self._delay_regular = delay
|
||||||
|
break
|
||||||
|
|
||||||
|
except ValueError:
|
||||||
|
msg = '"' + str(val) + '": Invalid value'
|
||||||
|
self.refresh_header()
|
||||||
|
|
||||||
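The interval prompt above accepts values between 0.1 and 25.5 seconds and falls back to the default on empty input; the same validation as a small helper (names are illustrative):

    DELAY_DEFAULT = 3.0

    def parse_delay(val, default=DELAY_DEFAULT):
        """Return a valid refresh delay, mirroring the bounds used above."""
        if not val:
            return default
        delay = float(val)          # raises ValueError on junk, like the original
        if delay < 0.1:
            raise ValueError('value must be >= 0.1')
        if delay > 25.5:
            raise ValueError('value must be <= 25.5')
        return delay

    assert parse_delay('') == 3.0
    assert parse_delay('0.5') == 0.5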
def show_vm_selection_by_guest_name(self):
|
def show_vm_selection_by_guest_name(self):
|
||||||
"""Draws guest selection mask.
|
"""Draws guest selection mask.
|
||||||
|
@ -1098,6 +1270,7 @@ class Tui(object):
|
||||||
'This might limit the shown data to the trace '
|
'This might limit the shown data to the trace '
|
||||||
'statistics.')
|
'statistics.')
|
||||||
self.screen.addstr(5, 0, msg)
|
self.screen.addstr(5, 0, msg)
|
||||||
|
self.print_all_gnames(7)
|
||||||
curses.echo()
|
curses.echo()
|
||||||
self.screen.addstr(3, 0, "Guest [ENTER or guest]: ")
|
self.screen.addstr(3, 0, "Guest [ENTER or guest]: ")
|
||||||
gname = self.screen.getstr()
|
gname = self.screen.getstr()
|
||||||
|
@ -1110,7 +1283,7 @@ class Tui(object):
|
||||||
else:
|
else:
|
||||||
pids = []
|
pids = []
|
||||||
try:
|
try:
|
||||||
pids = get_pid_from_gname(gname)
|
pids = self.get_pid_from_gname(gname)
|
||||||
except:
|
except:
|
||||||
msg = '"' + gname + '": Internal error while searching, ' \
|
msg = '"' + gname + '": Internal error while searching, ' \
|
||||||
'use pid filter instead'
|
'use pid filter instead'
|
||||||
|
@ -1128,38 +1301,60 @@ class Tui(object):
|
||||||
|
|
||||||
def show_stats(self):
|
def show_stats(self):
|
||||||
"""Refreshes the screen and processes user input."""
|
"""Refreshes the screen and processes user input."""
|
||||||
sleeptime = DELAY_INITIAL
|
sleeptime = self._delay_initial
|
||||||
self.refresh_header()
|
self.refresh_header()
|
||||||
|
start = 0.0 # result based on init value never appears on screen
|
||||||
while True:
|
while True:
|
||||||
self.refresh_body(sleeptime)
|
self.refresh_body(time.time() - start)
|
||||||
curses.halfdelay(int(sleeptime * 10))
|
curses.halfdelay(int(sleeptime * 10))
|
||||||
sleeptime = DELAY_REGULAR
|
start = time.time()
|
||||||
|
sleeptime = self._delay_regular
|
||||||
try:
|
try:
|
||||||
char = self.screen.getkey()
|
char = self.screen.getkey()
|
||||||
if char == 'x':
|
if char == 'b':
|
||||||
|
self._display_guests = not self._display_guests
|
||||||
|
if self.stats.toggle_display_guests(self._display_guests):
|
||||||
|
self.show_msg(['Command not available with tracepoints'
|
||||||
|
' enabled', 'Restart with debugfs only '
|
||||||
|
'(see option \'-d\') and try again!'])
|
||||||
|
self._display_guests = not self._display_guests
|
||||||
self.refresh_header()
|
self.refresh_header()
|
||||||
self.update_drilldown()
|
|
||||||
sleeptime = DELAY_INITIAL
|
|
||||||
if char == 'q':
|
|
||||||
break
|
|
||||||
if char == 'c':
|
if char == 'c':
|
||||||
self.stats.fields_filter = DEFAULT_REGEX
|
self.stats.fields_filter = DEFAULT_REGEX
|
||||||
self.refresh_header(0)
|
self.refresh_header(0)
|
||||||
self.update_pid(0)
|
self.update_pid(0)
|
||||||
sleeptime = DELAY_INITIAL
|
|
||||||
if char == 'f':
|
if char == 'f':
|
||||||
|
curses.curs_set(1)
|
||||||
self.show_filter_selection()
|
self.show_filter_selection()
|
||||||
sleeptime = DELAY_INITIAL
|
curses.curs_set(0)
|
||||||
|
sleeptime = self._delay_initial
|
||||||
if char == 'g':
|
if char == 'g':
|
||||||
|
curses.curs_set(1)
|
||||||
self.show_vm_selection_by_guest_name()
|
self.show_vm_selection_by_guest_name()
|
||||||
sleeptime = DELAY_INITIAL
|
curses.curs_set(0)
|
||||||
|
sleeptime = self._delay_initial
|
||||||
|
if char == 'h':
|
||||||
|
self.show_help_interactive()
|
||||||
|
if char == 'o':
|
||||||
|
self._sorting = not self._sorting
|
||||||
if char == 'p':
|
if char == 'p':
|
||||||
|
curses.curs_set(1)
|
||||||
self.show_vm_selection_by_pid()
|
self.show_vm_selection_by_pid()
|
||||||
sleeptime = DELAY_INITIAL
|
curses.curs_set(0)
|
||||||
|
sleeptime = self._delay_initial
|
||||||
|
if char == 'q':
|
||||||
|
break
|
||||||
if char == 'r':
|
if char == 'r':
|
||||||
self.refresh_header()
|
|
||||||
self.stats.reset()
|
self.stats.reset()
|
||||||
sleeptime = DELAY_INITIAL
|
if char == 's':
|
||||||
|
curses.curs_set(1)
|
||||||
|
self.show_set_update_interval()
|
||||||
|
curses.curs_set(0)
|
||||||
|
sleeptime = self._delay_initial
|
||||||
|
if char == 'x':
|
||||||
|
self.update_drilldown()
|
||||||
|
# prevents display of current values on next refresh
|
||||||
|
self.stats.get()
|
||||||
except KeyboardInterrupt:
|
except KeyboardInterrupt:
|
||||||
break
|
break
|
||||||
except curses.error:
|
except curses.error:
|
||||||
|
@@ -1227,13 +1422,17 @@ Requirements:
   the large number of files that are possibly opened.
 
 Interactive Commands:
+   b     toggle events by guests (debugfs only, honors filters)
    c     clear filter
    f     filter by regular expression
    g     filter by guest name
+   h     display interactive commands reference
+   o     toggle sorting order (Total vs CurAvg/s)
    p     filter by PID
    q     quit
-   x     toggle reporting of stats for individual child trace events
    r     reset stats
+   s     set update interval
+   x     toggle reporting of stats for individual child trace events
 Press any other key to refresh statistics immediately.
 """
|
|
||||||
|
@ -1246,7 +1445,7 @@ Press any other key to refresh statistics immediately.
|
||||||
|
|
||||||
def cb_guest_to_pid(option, opt, val, parser):
|
def cb_guest_to_pid(option, opt, val, parser):
|
||||||
try:
|
try:
|
||||||
pids = get_pid_from_gname(val)
|
pids = Tui.get_pid_from_gname(val)
|
||||||
except:
|
except:
|
||||||
raise optparse.OptionValueError('Error while searching for guest '
|
raise optparse.OptionValueError('Error while searching for guest '
|
||||||
'"{}", use "-p" to specify a pid '
|
'"{}", use "-p" to specify a pid '
|
||||||
|
@ -1268,6 +1467,13 @@ Press any other key to refresh statistics immediately.
|
||||||
dest='once',
|
dest='once',
|
||||||
help='run in batch mode for one second',
|
help='run in batch mode for one second',
|
||||||
)
|
)
|
||||||
|
optparser.add_option('-i', '--debugfs-include-past',
|
||||||
|
action='store_true',
|
||||||
|
default=False,
|
||||||
|
dest='dbgfs_include_past',
|
||||||
|
help='include all available data on past events for '
|
||||||
|
'debugfs',
|
||||||
|
)
|
||||||
optparser.add_option('-l', '--log',
|
optparser.add_option('-l', '--log',
|
||||||
action='store_true',
|
action='store_true',
|
||||||
default=False,
|
default=False,
|
||||||
|
@ -1288,7 +1494,7 @@ Press any other key to refresh statistics immediately.
|
||||||
)
|
)
|
||||||
optparser.add_option('-f', '--fields',
|
optparser.add_option('-f', '--fields',
|
||||||
action='store',
|
action='store',
|
||||||
default=None,
|
default=DEFAULT_REGEX,
|
||||||
dest='fields',
|
dest='fields',
|
||||||
help='fields to display (regex)',
|
help='fields to display (regex)',
|
||||||
)
|
)
|
||||||
|
@ -1311,20 +1517,6 @@ Press any other key to refresh statistics immediately.
|
||||||
return options
|
return options
|
||||||
|
|
||||||
|
|
||||||
def get_providers(options):
|
|
||||||
"""Returns a list of data providers depending on the passed options."""
|
|
||||||
providers = []
|
|
||||||
|
|
||||||
if options.tracepoints:
|
|
||||||
providers.append(TracepointProvider())
|
|
||||||
if options.debugfs:
|
|
||||||
providers.append(DebugfsProvider())
|
|
||||||
if len(providers) == 0:
|
|
||||||
providers.append(TracepointProvider())
|
|
||||||
|
|
||||||
return providers
|
|
||||||
|
|
||||||
|
|
||||||
def check_access(options):
|
def check_access(options):
|
||||||
"""Exits if the current user can't access all needed directories."""
|
"""Exits if the current user can't access all needed directories."""
|
||||||
if not os.path.exists('/sys/kernel/debug'):
|
if not os.path.exists('/sys/kernel/debug'):
|
||||||
|
@ -1365,8 +1557,7 @@ def main():
|
||||||
sys.stderr.write('Did you use a (unsupported) tid instead of a pid?\n')
|
sys.stderr.write('Did you use a (unsupported) tid instead of a pid?\n')
|
||||||
sys.exit('Specified pid does not exist.')
|
sys.exit('Specified pid does not exist.')
|
||||||
|
|
||||||
providers = get_providers(options)
|
stats = Stats(options)
|
||||||
stats = Stats(providers, options.pid, fields=options.fields)
|
|
||||||
|
|
||||||
if options.log:
|
if options.log:
|
||||||
log(stats)
|
log(stats)
|
||||||
|
|
|
@@ -29,18 +29,26 @@ meaning of events.
 INTERACTIVE COMMANDS
 --------------------
 [horizontal]
+*b*:: toggle events by guests (debugfs only, honors filters)
+
 *c*:: clear filter
 
 *f*:: filter by regular expression
 
 *g*:: filter by guest name
 
+*h*:: display interactive commands reference
+
+*o*:: toggle sorting order (Total vs CurAvg/s)
+
 *p*:: filter by PID
 
 *q*:: quit
 
 *r*:: reset stats
 
+*s*:: set update interval
+
 *x*:: toggle reporting of stats for child trace events
 
 Press any other key to refresh statistics immediately.
@@ -64,6 +72,10 @@ OPTIONS
 --debugfs::
 	retrieve statistics from debugfs
 
+-i::
+--debugfs-include-past::
+	include all available data on past events for debugfs
+
 -p<pid>::
 --pid=<pid>::
 	limit statistics to one virtual machine (pid)
@@ -60,7 +60,7 @@ static const unsigned short cc_map[16] = {
 /*
  * Check if a trapped instruction should have been executed or not.
  */
-bool kvm_condition_valid32(const struct kvm_vcpu *vcpu)
+bool __hyp_text kvm_condition_valid32(const struct kvm_vcpu *vcpu)
 {
 	unsigned long cpsr;
 	u32 cpsr_cond;
|
|
|
@ -21,6 +21,7 @@
|
||||||
#include <linux/kvm_host.h>
|
#include <linux/kvm_host.h>
|
||||||
#include <linux/interrupt.h>
|
#include <linux/interrupt.h>
|
||||||
#include <linux/irq.h>
|
#include <linux/irq.h>
|
||||||
|
#include <linux/uaccess.h>
|
||||||
|
|
||||||
#include <clocksource/arm_arch_timer.h>
|
#include <clocksource/arm_arch_timer.h>
|
||||||
#include <asm/arch_timer.h>
|
#include <asm/arch_timer.h>
|
||||||
|
@ -35,6 +36,16 @@ static struct timecounter *timecounter;
|
||||||
static unsigned int host_vtimer_irq;
|
static unsigned int host_vtimer_irq;
|
||||||
static u32 host_vtimer_irq_flags;
|
static u32 host_vtimer_irq_flags;
|
||||||
|
|
||||||
|
static const struct kvm_irq_level default_ptimer_irq = {
|
||||||
|
.irq = 30,
|
||||||
|
.level = 1,
|
||||||
|
};
|
||||||
|
|
||||||
|
static const struct kvm_irq_level default_vtimer_irq = {
|
||||||
|
.irq = 27,
|
||||||
|
.level = 1,
|
||||||
|
};
|
||||||
|
|
||||||
void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
|
void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
|
||||||
{
|
{
|
||||||
vcpu_vtimer(vcpu)->active_cleared_last = false;
|
vcpu_vtimer(vcpu)->active_cleared_last = false;
|
||||||
|
@@ -95,7 +106,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
 	 * If the vcpu is blocked we want to wake it up so that it will see
 	 * the timer has expired when entering the guest.
 	 */
-	kvm_vcpu_kick(vcpu);
+	kvm_vcpu_wake_up(vcpu);
 }
|
|
||||||
static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
|
static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
|
||||||
|
@ -215,7 +226,8 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
|
||||||
if (likely(irqchip_in_kernel(vcpu->kvm))) {
|
if (likely(irqchip_in_kernel(vcpu->kvm))) {
|
||||||
ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
|
ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
|
||||||
timer_ctx->irq.irq,
|
timer_ctx->irq.irq,
|
||||||
timer_ctx->irq.level);
|
timer_ctx->irq.level,
|
||||||
|
timer_ctx);
|
||||||
WARN_ON(ret);
|
WARN_ON(ret);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -445,22 +457,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
|
||||||
kvm_timer_update_state(vcpu);
|
kvm_timer_update_state(vcpu);
|
||||||
}
|
}
|
||||||
|
|
||||||
int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
|
int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
|
||||||
const struct kvm_irq_level *virt_irq,
|
|
||||||
const struct kvm_irq_level *phys_irq)
|
|
||||||
{
|
{
|
||||||
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
|
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
|
||||||
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
|
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
|
||||||
|
|
||||||
/*
|
|
||||||
* The vcpu timer irq number cannot be determined in
|
|
||||||
* kvm_timer_vcpu_init() because it is called much before
|
|
||||||
* kvm_vcpu_set_target(). To handle this, we determine
|
|
||||||
* vcpu timer irq number when the vcpu is reset.
|
|
||||||
*/
|
|
||||||
vtimer->irq.irq = virt_irq->irq;
|
|
||||||
ptimer->irq.irq = phys_irq->irq;
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
|
* The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
|
||||||
* and to 0 for ARMv7. We provide an implementation that always
|
* and to 0 for ARMv7. We provide an implementation that always
|
||||||
|
@ -496,6 +497,8 @@ static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
|
||||||
void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
|
void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
|
||||||
{
|
{
|
||||||
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
|
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
|
||||||
|
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
|
||||||
|
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
|
||||||
|
|
||||||
/* Synchronize cntvoff across all vtimers of a VM. */
|
/* Synchronize cntvoff across all vtimers of a VM. */
|
||||||
update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
|
update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
|
||||||
|
@ -504,6 +507,9 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
|
||||||
INIT_WORK(&timer->expired, kvm_timer_inject_irq_work);
|
INIT_WORK(&timer->expired, kvm_timer_inject_irq_work);
|
||||||
hrtimer_init(&timer->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
|
hrtimer_init(&timer->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
|
||||||
timer->timer.function = kvm_timer_expire;
|
timer->timer.function = kvm_timer_expire;
|
||||||
|
|
||||||
|
vtimer->irq.irq = default_vtimer_irq.irq;
|
||||||
|
ptimer->irq.irq = default_ptimer_irq.irq;
|
||||||
}
|
}
|
||||||
|
|
||||||
static void kvm_timer_init_interrupt(void *info)
|
static void kvm_timer_init_interrupt(void *info)
|
||||||
|
@ -613,6 +619,30 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
|
||||||
kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
|
kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
|
||||||
|
{
|
||||||
|
int vtimer_irq, ptimer_irq;
|
||||||
|
int i, ret;
|
||||||
|
|
||||||
|
vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
|
||||||
|
ret = kvm_vgic_set_owner(vcpu, vtimer_irq, vcpu_vtimer(vcpu));
|
||||||
|
if (ret)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
ptimer_irq = vcpu_ptimer(vcpu)->irq.irq;
|
||||||
|
ret = kvm_vgic_set_owner(vcpu, ptimer_irq, vcpu_ptimer(vcpu));
|
||||||
|
if (ret)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
|
||||||
|
if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
|
||||||
|
vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
int kvm_timer_enable(struct kvm_vcpu *vcpu)
|
int kvm_timer_enable(struct kvm_vcpu *vcpu)
|
||||||
{
|
{
|
||||||
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
|
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
|
||||||
|
@ -632,6 +662,11 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
|
||||||
if (!vgic_initialized(vcpu->kvm))
|
if (!vgic_initialized(vcpu->kvm))
|
||||||
return -ENODEV;
|
return -ENODEV;
|
||||||
|
|
||||||
|
if (!timer_irqs_are_valid(vcpu)) {
|
||||||
|
kvm_debug("incorrectly configured timer irqs\n");
|
||||||
|
return -EINVAL;
|
||||||
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Find the physical IRQ number corresponding to the host_vtimer_irq
|
* Find the physical IRQ number corresponding to the host_vtimer_irq
|
||||||
*/
|
*/
|
||||||
|
@ -681,3 +716,79 @@ void kvm_timer_init_vhe(void)
|
||||||
val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
|
val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
|
||||||
write_sysreg(val, cnthctl_el2);
|
write_sysreg(val, cnthctl_el2);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
|
||||||
|
{
|
||||||
|
struct kvm_vcpu *vcpu;
|
||||||
|
int i;
|
||||||
|
|
||||||
|
kvm_for_each_vcpu(i, vcpu, kvm) {
|
||||||
|
vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
|
||||||
|
vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
|
||||||
|
{
|
||||||
|
int __user *uaddr = (int __user *)(long)attr->addr;
|
||||||
|
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
|
||||||
|
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
|
||||||
|
int irq;
|
||||||
|
|
||||||
|
if (!irqchip_in_kernel(vcpu->kvm))
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
if (get_user(irq, uaddr))
|
||||||
|
return -EFAULT;
|
||||||
|
|
||||||
|
if (!(irq_is_ppi(irq)))
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
if (vcpu->arch.timer_cpu.enabled)
|
||||||
|
return -EBUSY;
|
||||||
|
|
||||||
|
switch (attr->attr) {
|
||||||
|
case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
|
||||||
|
set_timer_irqs(vcpu->kvm, irq, ptimer->irq.irq);
|
||||||
|
break;
|
||||||
|
case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
|
||||||
|
set_timer_irqs(vcpu->kvm, vtimer->irq.irq, irq);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
return -ENXIO;
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
|
||||||
|
{
|
||||||
|
int __user *uaddr = (int __user *)(long)attr->addr;
|
||||||
|
struct arch_timer_context *timer;
|
||||||
|
int irq;
|
||||||
|
|
||||||
|
switch (attr->attr) {
|
||||||
|
case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
|
||||||
|
timer = vcpu_vtimer(vcpu);
|
||||||
|
break;
|
||||||
|
case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
|
||||||
|
timer = vcpu_ptimer(vcpu);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
return -ENXIO;
|
||||||
|
}
|
||||||
|
|
||||||
|
irq = timer->irq.irq;
|
||||||
|
return put_user(irq, uaddr);
|
||||||
|
}
|
||||||
|
|
||||||
|
int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
|
||||||
|
{
|
||||||
|
switch (attr->attr) {
|
||||||
|
case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
|
||||||
|
case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
return -ENXIO;
|
||||||
|
}
|
||||||
|
|
|
@ -368,6 +368,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
|
||||||
kvm_timer_vcpu_put(vcpu);
|
kvm_timer_vcpu_put(vcpu);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void vcpu_power_off(struct kvm_vcpu *vcpu)
|
||||||
|
{
|
||||||
|
vcpu->arch.power_off = true;
|
||||||
|
kvm_make_request(KVM_REQ_SLEEP, vcpu);
|
||||||
|
kvm_vcpu_kick(vcpu);
|
||||||
|
}
|
||||||
|
|
||||||
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
|
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
|
||||||
struct kvm_mp_state *mp_state)
|
struct kvm_mp_state *mp_state)
|
||||||
{
|
{
|
||||||
|
@ -387,7 +394,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
|
||||||
vcpu->arch.power_off = false;
|
vcpu->arch.power_off = false;
|
||||||
break;
|
break;
|
||||||
case KVM_MP_STATE_STOPPED:
|
case KVM_MP_STATE_STOPPED:
|
||||||
vcpu->arch.power_off = true;
|
vcpu_power_off(vcpu);
|
||||||
break;
|
break;
|
||||||
default:
|
default:
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
@ -520,6 +527,10 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
|
||||||
}
|
}
|
||||||
|
|
||||||
ret = kvm_timer_enable(vcpu);
|
ret = kvm_timer_enable(vcpu);
|
||||||
|
if (ret)
|
||||||
|
return ret;
|
||||||
|
|
||||||
|
ret = kvm_arm_pmu_v3_enable(vcpu);
|
||||||
|
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
@ -536,21 +547,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
|
||||||
|
|
||||||
kvm_for_each_vcpu(i, vcpu, kvm)
|
kvm_for_each_vcpu(i, vcpu, kvm)
|
||||||
vcpu->arch.pause = true;
|
vcpu->arch.pause = true;
|
||||||
kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
|
kvm_make_all_cpus_request(kvm, KVM_REQ_SLEEP);
|
||||||
}
|
|
||||||
|
|
||||||
void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
|
|
||||||
{
|
|
||||||
vcpu->arch.pause = true;
|
|
||||||
kvm_vcpu_kick(vcpu);
|
|
||||||
}
|
|
||||||
|
|
||||||
void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
|
|
||||||
{
|
|
||||||
struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
|
|
||||||
|
|
||||||
vcpu->arch.pause = false;
|
|
||||||
swake_up(wq);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
void kvm_arm_resume_guest(struct kvm *kvm)
|
void kvm_arm_resume_guest(struct kvm *kvm)
|
||||||
|
@ -558,16 +555,23 @@ void kvm_arm_resume_guest(struct kvm *kvm)
|
||||||
int i;
|
int i;
|
||||||
struct kvm_vcpu *vcpu;
|
struct kvm_vcpu *vcpu;
|
||||||
|
|
||||||
kvm_for_each_vcpu(i, vcpu, kvm)
|
kvm_for_each_vcpu(i, vcpu, kvm) {
|
||||||
kvm_arm_resume_vcpu(vcpu);
|
vcpu->arch.pause = false;
|
||||||
|
swake_up(kvm_arch_vcpu_wq(vcpu));
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
static void vcpu_sleep(struct kvm_vcpu *vcpu)
|
static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
|
||||||
{
|
{
|
||||||
struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
|
struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
|
||||||
|
|
||||||
swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
|
swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
|
||||||
(!vcpu->arch.pause)));
|
(!vcpu->arch.pause)));
|
||||||
|
|
||||||
|
if (vcpu->arch.power_off || vcpu->arch.pause) {
|
||||||
|
/* Awaken to handle a signal, request we sleep again later. */
|
||||||
|
kvm_make_request(KVM_REQ_SLEEP, vcpu);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
|
static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
|
||||||
|
@ -575,6 +579,20 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
|
||||||
return vcpu->arch.target >= 0;
|
return vcpu->arch.target >= 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void check_vcpu_requests(struct kvm_vcpu *vcpu)
|
||||||
|
{
|
||||||
|
if (kvm_request_pending(vcpu)) {
|
||||||
|
if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
|
||||||
|
vcpu_req_sleep(vcpu);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Clear IRQ_PENDING requests that were made to guarantee
|
||||||
|
* that a VCPU sees new virtual interrupts.
|
||||||
|
*/
|
||||||
|
kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
|
* kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
|
||||||
* @vcpu: The VCPU pointer
|
* @vcpu: The VCPU pointer
|
||||||
|
@ -620,8 +638,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
|
||||||
|
|
||||||
update_vttbr(vcpu->kvm);
|
update_vttbr(vcpu->kvm);
|
||||||
|
|
||||||
if (vcpu->arch.power_off || vcpu->arch.pause)
|
check_vcpu_requests(vcpu);
|
||||||
vcpu_sleep(vcpu);
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Preparing the interrupts to be injected also
|
* Preparing the interrupts to be injected also
|
||||||
|
@ -650,8 +667,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
|
||||||
run->exit_reason = KVM_EXIT_INTR;
|
run->exit_reason = KVM_EXIT_INTR;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Ensure we set mode to IN_GUEST_MODE after we disable
|
||||||
|
* interrupts and before the final VCPU requests check.
|
||||||
|
* See the comment in kvm_vcpu_exiting_guest_mode() and
|
||||||
|
* Documentation/virtual/kvm/vcpu-requests.rst
|
||||||
|
*/
|
||||||
|
smp_store_mb(vcpu->mode, IN_GUEST_MODE);
|
||||||
|
|
||||||
if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
|
if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
|
||||||
vcpu->arch.power_off || vcpu->arch.pause) {
|
kvm_request_pending(vcpu)) {
|
||||||
|
vcpu->mode = OUTSIDE_GUEST_MODE;
|
||||||
local_irq_enable();
|
local_irq_enable();
|
||||||
kvm_pmu_sync_hwstate(vcpu);
|
kvm_pmu_sync_hwstate(vcpu);
|
||||||
kvm_timer_sync_hwstate(vcpu);
|
kvm_timer_sync_hwstate(vcpu);
|
||||||
|
@ -667,7 +693,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
|
||||||
*/
|
*/
|
||||||
trace_kvm_entry(*vcpu_pc(vcpu));
|
trace_kvm_entry(*vcpu_pc(vcpu));
|
||||||
guest_enter_irqoff();
|
guest_enter_irqoff();
|
||||||
vcpu->mode = IN_GUEST_MODE;
|
|
||||||
|
|
||||||
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
|
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
|
||||||
|
|
||||||
|
@ -756,6 +781,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
|
||||||
* trigger a world-switch round on the running physical CPU to set the
|
* trigger a world-switch round on the running physical CPU to set the
|
||||||
* virtual IRQ/FIQ fields in the HCR appropriately.
|
* virtual IRQ/FIQ fields in the HCR appropriately.
|
||||||
*/
|
*/
|
||||||
|
kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
|
||||||
kvm_vcpu_kick(vcpu);
|
kvm_vcpu_kick(vcpu);
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
@ -806,7 +832,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
|
||||||
if (irq_num < VGIC_NR_SGIS || irq_num >= VGIC_NR_PRIVATE_IRQS)
|
if (irq_num < VGIC_NR_SGIS || irq_num >= VGIC_NR_PRIVATE_IRQS)
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
|
||||||
return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level);
|
return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level, NULL);
|
||||||
case KVM_ARM_IRQ_TYPE_SPI:
|
case KVM_ARM_IRQ_TYPE_SPI:
|
||||||
if (!irqchip_in_kernel(kvm))
|
if (!irqchip_in_kernel(kvm))
|
||||||
return -ENXIO;
|
return -ENXIO;
|
||||||
|
@ -814,7 +840,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
|
||||||
if (irq_num < VGIC_NR_PRIVATE_IRQS)
|
if (irq_num < VGIC_NR_PRIVATE_IRQS)
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
|
||||||
return kvm_vgic_inject_irq(kvm, 0, irq_num, level);
|
return kvm_vgic_inject_irq(kvm, 0, irq_num, level, NULL);
|
||||||
}
|
}
|
||||||
|
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
@ -884,7 +910,7 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
|
||||||
* Handle the "start in power-off" case.
|
* Handle the "start in power-off" case.
|
||||||
*/
|
*/
|
||||||
if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
|
if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
|
||||||
vcpu->arch.power_off = true;
|
vcpu_power_off(vcpu);
|
||||||
else
|
else
|
||||||
vcpu->arch.power_off = false;
|
vcpu->arch.power_off = false;
|
||||||
|
|
||||||
|
@ -1115,9 +1141,6 @@ static void cpu_init_hyp_mode(void *dummy)
|
||||||
__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
|
__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
|
||||||
__cpu_init_stage2();
|
__cpu_init_stage2();
|
||||||
|
|
||||||
if (is_kernel_in_hyp_mode())
|
|
||||||
kvm_timer_init_vhe();
|
|
||||||
|
|
||||||
kvm_arm_init_debug();
|
kvm_arm_init_debug();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -1137,6 +1160,7 @@ static void cpu_hyp_reinit(void)
|
||||||
* event was cancelled before the CPU was reset.
|
* event was cancelled before the CPU was reset.
|
||||||
*/
|
*/
|
||||||
__cpu_init_stage2();
|
__cpu_init_stage2();
|
||||||
|
kvm_timer_init_vhe();
|
||||||
} else {
|
} else {
|
||||||
cpu_init_hyp_mode(NULL);
|
cpu_init_hyp_mode(NULL);
|
||||||
}
|
}
|
||||||
|
|
|
@ -19,10 +19,12 @@
|
||||||
#include <linux/irqchip/arm-gic-v3.h>
|
#include <linux/irqchip/arm-gic-v3.h>
|
||||||
#include <linux/kvm_host.h>
|
#include <linux/kvm_host.h>
|
||||||
|
|
||||||
|
#include <asm/kvm_emulate.h>
|
||||||
#include <asm/kvm_hyp.h>
|
#include <asm/kvm_hyp.h>
|
||||||
|
|
||||||
#define vtr_to_max_lr_idx(v) ((v) & 0xf)
|
#define vtr_to_max_lr_idx(v) ((v) & 0xf)
|
||||||
#define vtr_to_nr_pre_bits(v) ((((u32)(v) >> 26) & 7) + 1)
|
#define vtr_to_nr_pre_bits(v) ((((u32)(v) >> 26) & 7) + 1)
|
||||||
|
#define vtr_to_nr_apr_regs(v) (1 << (vtr_to_nr_pre_bits(v) - 5))
|
||||||
|
|
||||||
static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
|
static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
|
||||||
{
|
{
|
||||||
|
@ -118,6 +120,90 @@ static void __hyp_text __gic_v3_set_lr(u64 val, int lr)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_ap0rn(u32 val, int n)
|
||||||
|
{
|
||||||
|
switch (n) {
|
||||||
|
case 0:
|
||||||
|
write_gicreg(val, ICH_AP0R0_EL2);
|
||||||
|
break;
|
||||||
|
case 1:
|
||||||
|
write_gicreg(val, ICH_AP0R1_EL2);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
write_gicreg(val, ICH_AP0R2_EL2);
|
||||||
|
break;
|
||||||
|
case 3:
|
||||||
|
write_gicreg(val, ICH_AP0R3_EL2);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_ap1rn(u32 val, int n)
|
||||||
|
{
|
||||||
|
switch (n) {
|
||||||
|
case 0:
|
||||||
|
write_gicreg(val, ICH_AP1R0_EL2);
|
||||||
|
break;
|
||||||
|
case 1:
|
||||||
|
write_gicreg(val, ICH_AP1R1_EL2);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
write_gicreg(val, ICH_AP1R2_EL2);
|
||||||
|
break;
|
||||||
|
case 3:
|
||||||
|
write_gicreg(val, ICH_AP1R3_EL2);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static u32 __hyp_text __vgic_v3_read_ap0rn(int n)
|
||||||
|
{
|
||||||
|
u32 val;
|
||||||
|
|
||||||
|
switch (n) {
|
||||||
|
case 0:
|
||||||
|
val = read_gicreg(ICH_AP0R0_EL2);
|
||||||
|
break;
|
||||||
|
case 1:
|
||||||
|
val = read_gicreg(ICH_AP0R1_EL2);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
val = read_gicreg(ICH_AP0R2_EL2);
|
||||||
|
break;
|
||||||
|
case 3:
|
||||||
|
val = read_gicreg(ICH_AP0R3_EL2);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
unreachable();
|
||||||
|
}
|
||||||
|
|
||||||
|
return val;
|
||||||
|
}
|
||||||
|
|
||||||
|
static u32 __hyp_text __vgic_v3_read_ap1rn(int n)
|
||||||
|
{
|
||||||
|
u32 val;
|
||||||
|
|
||||||
|
switch (n) {
|
||||||
|
case 0:
|
||||||
|
val = read_gicreg(ICH_AP1R0_EL2);
|
||||||
|
break;
|
||||||
|
case 1:
|
||||||
|
val = read_gicreg(ICH_AP1R1_EL2);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
val = read_gicreg(ICH_AP1R2_EL2);
|
||||||
|
break;
|
||||||
|
case 3:
|
||||||
|
val = read_gicreg(ICH_AP1R3_EL2);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
unreachable();
|
||||||
|
}
|
||||||
|
|
||||||
|
return val;
|
||||||
|
}
|
||||||
|
|
||||||
void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
|
void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
|
||||||
{
|
{
|
||||||
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
|
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
|
||||||
|
@ -154,24 +240,27 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
|
||||||
|
|
||||||
switch (nr_pre_bits) {
|
switch (nr_pre_bits) {
|
||||||
case 7:
|
case 7:
|
||||||
cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2);
|
cpu_if->vgic_ap0r[3] = __vgic_v3_read_ap0rn(3);
|
||||||
cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2);
|
cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
|
||||||
case 6:
|
case 6:
|
||||||
cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2);
|
cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
|
||||||
default:
|
default:
|
||||||
cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2);
|
cpu_if->vgic_ap0r[0] = __vgic_v3_read_ap0rn(0);
|
||||||
}
|
}
|
||||||
|
|
||||||
switch (nr_pre_bits) {
|
switch (nr_pre_bits) {
|
||||||
case 7:
|
case 7:
|
||||||
cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2);
|
cpu_if->vgic_ap1r[3] = __vgic_v3_read_ap1rn(3);
|
||||||
cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2);
|
cpu_if->vgic_ap1r[2] = __vgic_v3_read_ap1rn(2);
|
||||||
case 6:
|
case 6:
|
||||||
cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2);
|
cpu_if->vgic_ap1r[1] = __vgic_v3_read_ap1rn(1);
|
||||||
default:
|
default:
|
||||||
cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2);
|
cpu_if->vgic_ap1r[0] = __vgic_v3_read_ap1rn(0);
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
|
if (static_branch_unlikely(&vgic_v3_cpuif_trap))
|
||||||
|
write_gicreg(0, ICH_HCR_EL2);
|
||||||
|
|
||||||
cpu_if->vgic_elrsr = 0xffff;
|
cpu_if->vgic_elrsr = 0xffff;
|
||||||
cpu_if->vgic_ap0r[0] = 0;
|
cpu_if->vgic_ap0r[0] = 0;
|
||||||
cpu_if->vgic_ap0r[1] = 0;
|
cpu_if->vgic_ap0r[1] = 0;
|
||||||
|
@@ -224,26 +313,34 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 
 		switch (nr_pre_bits) {
 		case 7:
-			write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2);
-			write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2);
+			__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[3], 3);
+			__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[2], 2);
 		case 6:
-			write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2);
+			__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[1], 1);
 		default:
-			write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2);
+			__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[0], 0);
 		}
 
 		switch (nr_pre_bits) {
 		case 7:
-			write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2);
-			write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2);
+			__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[3], 3);
+			__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[2], 2);
 		case 6:
-			write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2);
+			__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[1], 1);
 		default:
-			write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2);
+			__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[0], 0);
 		}
 
 		for (i = 0; i < used_lrs; i++)
 			__gic_v3_set_lr(cpu_if->vgic_lr[i], i);
+	} else {
+		/*
+		 * If we need to trap system registers, we must write
+		 * ICH_HCR_EL2 anyway, even if no interrupts are being
+		 * injected,
+		 */
+		if (static_branch_unlikely(&vgic_v3_cpuif_trap))
+			write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
 	}
 
 	/*
@@ -287,3 +384,697 @@ void __hyp_text __vgic_v3_write_vmcr(u32 vmcr)
 {
 	write_gicreg(vmcr, ICH_VMCR_EL2);
 }
||||||
|
|
||||||
|
#ifdef CONFIG_ARM64
|
||||||
|
|
||||||
|
static int __hyp_text __vgic_v3_bpr_min(void)
|
||||||
|
{
|
||||||
|
/* See Pseudocode for VPriorityGroup */
|
||||||
|
return 8 - vtr_to_nr_pre_bits(read_gicreg(ICH_VTR_EL2));
|
||||||
|
}
|
||||||
|
|
||||||
|
static int __hyp_text __vgic_v3_get_group(struct kvm_vcpu *vcpu)
|
||||||
|
{
|
||||||
|
u32 esr = kvm_vcpu_get_hsr(vcpu);
|
||||||
|
u8 crm = (esr & ESR_ELx_SYS64_ISS_CRM_MASK) >> ESR_ELx_SYS64_ISS_CRM_SHIFT;
|
||||||
|
|
||||||
|
return crm != 8;
|
||||||
|
}
|
||||||
|
|
||||||
|
#define GICv3_IDLE_PRIORITY 0xff
|
||||||
|
|
||||||
|
static int __hyp_text __vgic_v3_highest_priority_lr(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr,
|
||||||
|
u64 *lr_val)
|
||||||
|
{
|
||||||
|
unsigned int used_lrs = vcpu->arch.vgic_cpu.used_lrs;
|
||||||
|
u8 priority = GICv3_IDLE_PRIORITY;
|
||||||
|
int i, lr = -1;
|
||||||
|
|
||||||
|
for (i = 0; i < used_lrs; i++) {
|
||||||
|
u64 val = __gic_v3_get_lr(i);
|
||||||
|
u8 lr_prio = (val & ICH_LR_PRIORITY_MASK) >> ICH_LR_PRIORITY_SHIFT;
|
||||||
|
|
||||||
|
/* Not pending in the state? */
|
||||||
|
if ((val & ICH_LR_STATE) != ICH_LR_PENDING_BIT)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
/* Group-0 interrupt, but Group-0 disabled? */
|
||||||
|
if (!(val & ICH_LR_GROUP) && !(vmcr & ICH_VMCR_ENG0_MASK))
|
||||||
|
continue;
|
||||||
|
|
||||||
|
/* Group-1 interrupt, but Group-1 disabled? */
|
||||||
|
if ((val & ICH_LR_GROUP) && !(vmcr & ICH_VMCR_ENG1_MASK))
|
||||||
|
continue;
|
||||||
|
|
||||||
|
/* Not the highest priority? */
|
||||||
|
if (lr_prio >= priority)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
/* This is a candidate */
|
||||||
|
priority = lr_prio;
|
||||||
|
*lr_val = val;
|
||||||
|
lr = i;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (lr == -1)
|
||||||
|
*lr_val = ICC_IAR1_EL1_SPURIOUS;
|
||||||
|
|
||||||
|
return lr;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int __hyp_text __vgic_v3_find_active_lr(struct kvm_vcpu *vcpu,
|
||||||
|
int intid, u64 *lr_val)
|
||||||
|
{
|
||||||
|
unsigned int used_lrs = vcpu->arch.vgic_cpu.used_lrs;
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; i < used_lrs; i++) {
|
||||||
|
u64 val = __gic_v3_get_lr(i);
|
||||||
|
|
||||||
|
if ((val & ICH_LR_VIRTUAL_ID_MASK) == intid &&
|
||||||
|
(val & ICH_LR_ACTIVE_BIT)) {
|
||||||
|
*lr_val = val;
|
||||||
|
return i;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
*lr_val = ICC_IAR1_EL1_SPURIOUS;
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int __hyp_text __vgic_v3_get_highest_active_priority(void)
|
||||||
|
{
|
||||||
|
u8 nr_apr_regs = vtr_to_nr_apr_regs(read_gicreg(ICH_VTR_EL2));
|
||||||
|
u32 hap = 0;
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; i < nr_apr_regs; i++) {
|
||||||
|
u32 val;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* The ICH_AP0Rn_EL2 and ICH_AP1Rn_EL2 registers
|
||||||
|
* contain the active priority levels for this VCPU
|
||||||
|
* for the maximum number of supported priority
|
||||||
|
* levels, and we return the full priority level only
|
||||||
|
* if the BPR is programmed to its minimum, otherwise
|
||||||
|
* we return a combination of the priority level and
|
||||||
|
* subpriority, as determined by the setting of the
|
||||||
|
* BPR, but without the full subpriority.
|
||||||
|
*/
|
||||||
|
val = __vgic_v3_read_ap0rn(i);
|
||||||
|
val |= __vgic_v3_read_ap1rn(i);
|
||||||
|
if (!val) {
|
||||||
|
hap += 32;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
return (hap + __ffs(val)) << __vgic_v3_bpr_min();
|
||||||
|
}
|
||||||
|
|
||||||
|
return GICv3_IDLE_PRIORITY;
|
||||||
|
}
|
||||||
|
|
||||||
|
static unsigned int __hyp_text __vgic_v3_get_bpr0(u32 vmcr)
|
||||||
|
{
|
||||||
|
return (vmcr & ICH_VMCR_BPR0_MASK) >> ICH_VMCR_BPR0_SHIFT;
|
||||||
|
}
|
||||||
|
|
||||||
|
static unsigned int __hyp_text __vgic_v3_get_bpr1(u32 vmcr)
|
||||||
|
{
|
||||||
|
unsigned int bpr;
|
||||||
|
|
||||||
|
if (vmcr & ICH_VMCR_CBPR_MASK) {
|
||||||
|
bpr = __vgic_v3_get_bpr0(vmcr);
|
||||||
|
if (bpr < 7)
|
||||||
|
bpr++;
|
||||||
|
} else {
|
||||||
|
bpr = (vmcr & ICH_VMCR_BPR1_MASK) >> ICH_VMCR_BPR1_SHIFT;
|
||||||
|
}
|
||||||
|
|
||||||
|
return bpr;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Convert a priority to a preemption level, taking the relevant BPR
|
||||||
|
* into account by zeroing the sub-priority bits.
|
||||||
|
*/
|
||||||
|
static u8 __hyp_text __vgic_v3_pri_to_pre(u8 pri, u32 vmcr, int grp)
|
||||||
|
{
|
||||||
|
unsigned int bpr;
|
||||||
|
|
||||||
|
if (!grp)
|
||||||
|
bpr = __vgic_v3_get_bpr0(vmcr) + 1;
|
||||||
|
else
|
||||||
|
bpr = __vgic_v3_get_bpr1(vmcr);
|
||||||
|
|
||||||
|
return pri & (GENMASK(7, 0) << bpr);
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* The priority value is independent of any of the BPR values, so we
|
||||||
|
* normalize it using the minumal BPR value. This guarantees that no
|
||||||
|
* matter what the guest does with its BPR, we can always set/get the
|
||||||
|
* same value of a priority.
|
||||||
|
*/
|
||||||
|
static void __hyp_text __vgic_v3_set_active_priority(u8 pri, u32 vmcr, int grp)
|
||||||
|
{
|
||||||
|
u8 pre, ap;
|
||||||
|
u32 val;
|
||||||
|
int apr;
|
||||||
|
|
||||||
|
pre = __vgic_v3_pri_to_pre(pri, vmcr, grp);
|
||||||
|
ap = pre >> __vgic_v3_bpr_min();
|
||||||
|
apr = ap / 32;
|
||||||
|
|
||||||
|
if (!grp) {
|
||||||
|
val = __vgic_v3_read_ap0rn(apr);
|
||||||
|
__vgic_v3_write_ap0rn(val | BIT(ap % 32), apr);
|
||||||
|
} else {
|
||||||
|
val = __vgic_v3_read_ap1rn(apr);
|
||||||
|
__vgic_v3_write_ap1rn(val | BIT(ap % 32), apr);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static int __hyp_text __vgic_v3_clear_highest_active_priority(void)
|
||||||
|
{
|
||||||
|
u8 nr_apr_regs = vtr_to_nr_apr_regs(read_gicreg(ICH_VTR_EL2));
|
||||||
|
u32 hap = 0;
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; i < nr_apr_regs; i++) {
|
||||||
|
u32 ap0, ap1;
|
||||||
|
int c0, c1;
|
||||||
|
|
||||||
|
ap0 = __vgic_v3_read_ap0rn(i);
|
||||||
|
ap1 = __vgic_v3_read_ap1rn(i);
|
||||||
|
if (!ap0 && !ap1) {
|
||||||
|
hap += 32;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
c0 = ap0 ? __ffs(ap0) : 32;
|
||||||
|
c1 = ap1 ? __ffs(ap1) : 32;
|
||||||
|
|
||||||
|
/* Always clear the LSB, which is the highest priority */
|
||||||
|
if (c0 < c1) {
|
||||||
|
ap0 &= ~BIT(c0);
|
||||||
|
__vgic_v3_write_ap0rn(ap0, i);
|
||||||
|
hap += c0;
|
||||||
|
} else {
|
||||||
|
ap1 &= ~BIT(c1);
|
||||||
|
__vgic_v3_write_ap1rn(ap1, i);
|
||||||
|
hap += c1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Rescale to 8 bits of priority */
|
||||||
|
return hap << __vgic_v3_bpr_min();
|
||||||
|
}
|
||||||
|
|
||||||
|
return GICv3_IDLE_PRIORITY;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_iar(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u64 lr_val;
|
||||||
|
u8 lr_prio, pmr;
|
||||||
|
int lr, grp;
|
||||||
|
|
||||||
|
grp = __vgic_v3_get_group(vcpu);
|
||||||
|
|
||||||
|
lr = __vgic_v3_highest_priority_lr(vcpu, vmcr, &lr_val);
|
||||||
|
if (lr < 0)
|
||||||
|
goto spurious;
|
||||||
|
|
||||||
|
if (grp != !!(lr_val & ICH_LR_GROUP))
|
||||||
|
goto spurious;
|
||||||
|
|
||||||
|
pmr = (vmcr & ICH_VMCR_PMR_MASK) >> ICH_VMCR_PMR_SHIFT;
|
||||||
|
lr_prio = (lr_val & ICH_LR_PRIORITY_MASK) >> ICH_LR_PRIORITY_SHIFT;
|
||||||
|
if (pmr <= lr_prio)
|
||||||
|
goto spurious;
|
||||||
|
|
||||||
|
if (__vgic_v3_get_highest_active_priority() <= __vgic_v3_pri_to_pre(lr_prio, vmcr, grp))
|
||||||
|
goto spurious;
|
||||||
|
|
||||||
|
lr_val &= ~ICH_LR_STATE;
|
||||||
|
/* No active state for LPIs */
|
||||||
|
if ((lr_val & ICH_LR_VIRTUAL_ID_MASK) <= VGIC_MAX_SPI)
|
||||||
|
lr_val |= ICH_LR_ACTIVE_BIT;
|
||||||
|
__gic_v3_set_lr(lr_val, lr);
|
||||||
|
__vgic_v3_set_active_priority(lr_prio, vmcr, grp);
|
||||||
|
vcpu_set_reg(vcpu, rt, lr_val & ICH_LR_VIRTUAL_ID_MASK);
|
||||||
|
return;
|
||||||
|
|
||||||
|
spurious:
|
||||||
|
vcpu_set_reg(vcpu, rt, ICC_IAR1_EL1_SPURIOUS);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_clear_active_lr(int lr, u64 lr_val)
|
||||||
|
{
|
||||||
|
lr_val &= ~ICH_LR_ACTIVE_BIT;
|
||||||
|
if (lr_val & ICH_LR_HW) {
|
||||||
|
u32 pid;
|
||||||
|
|
||||||
|
pid = (lr_val & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
|
||||||
|
gic_write_dir(pid);
|
||||||
|
}
|
||||||
|
|
||||||
|
__gic_v3_set_lr(lr_val, lr);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_bump_eoicount(void)
|
||||||
|
{
|
||||||
|
u32 hcr;
|
||||||
|
|
||||||
|
hcr = read_gicreg(ICH_HCR_EL2);
|
||||||
|
hcr += 1 << ICH_HCR_EOIcount_SHIFT;
|
||||||
|
write_gicreg(hcr, ICH_HCR_EL2);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_dir(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u32 vid = vcpu_get_reg(vcpu, rt);
|
||||||
|
u64 lr_val;
|
||||||
|
int lr;
|
||||||
|
|
||||||
|
/* EOImode == 0, nothing to be done here */
|
||||||
|
if (!(vmcr & ICH_VMCR_EOIM_MASK))
|
||||||
|
return;
|
||||||
|
|
||||||
|
/* No deactivate to be performed on an LPI */
|
||||||
|
if (vid >= VGIC_MIN_LPI)
|
||||||
|
return;
|
||||||
|
|
||||||
|
lr = __vgic_v3_find_active_lr(vcpu, vid, &lr_val);
|
||||||
|
if (lr == -1) {
|
||||||
|
__vgic_v3_bump_eoicount();
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
__vgic_v3_clear_active_lr(lr, lr_val);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_eoir(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u32 vid = vcpu_get_reg(vcpu, rt);
|
||||||
|
u64 lr_val;
|
||||||
|
u8 lr_prio, act_prio;
|
||||||
|
int lr, grp;
|
||||||
|
|
||||||
|
grp = __vgic_v3_get_group(vcpu);
|
||||||
|
|
||||||
|
/* Drop priority in any case */
|
||||||
|
act_prio = __vgic_v3_clear_highest_active_priority();
|
||||||
|
|
||||||
|
/* If EOIing an LPI, no deactivate to be performed */
|
||||||
|
if (vid >= VGIC_MIN_LPI)
|
||||||
|
return;
|
||||||
|
|
||||||
|
/* EOImode == 1, nothing to be done here */
|
||||||
|
if (vmcr & ICH_VMCR_EOIM_MASK)
|
||||||
|
return;
|
||||||
|
|
||||||
|
lr = __vgic_v3_find_active_lr(vcpu, vid, &lr_val);
|
||||||
|
if (lr == -1) {
|
||||||
|
__vgic_v3_bump_eoicount();
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
lr_prio = (lr_val & ICH_LR_PRIORITY_MASK) >> ICH_LR_PRIORITY_SHIFT;
|
||||||
|
|
||||||
|
/* If priorities or group do not match, the guest has fscked-up. */
|
||||||
|
if (grp != !!(lr_val & ICH_LR_GROUP) ||
|
||||||
|
__vgic_v3_pri_to_pre(lr_prio, vmcr, grp) != act_prio)
|
||||||
|
return;
|
||||||
|
|
||||||
|
/* Let's now perform the deactivation */
|
||||||
|
__vgic_v3_clear_active_lr(lr, lr_val);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_igrpen0(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
vcpu_set_reg(vcpu, rt, !!(vmcr & ICH_VMCR_ENG0_MASK));
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_igrpen1(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
vcpu_set_reg(vcpu, rt, !!(vmcr & ICH_VMCR_ENG1_MASK));
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_igrpen0(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u64 val = vcpu_get_reg(vcpu, rt);
|
||||||
|
|
||||||
|
if (val & 1)
|
||||||
|
vmcr |= ICH_VMCR_ENG0_MASK;
|
||||||
|
else
|
||||||
|
vmcr &= ~ICH_VMCR_ENG0_MASK;
|
||||||
|
|
||||||
|
__vgic_v3_write_vmcr(vmcr);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_igrpen1(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u64 val = vcpu_get_reg(vcpu, rt);
|
||||||
|
|
||||||
|
if (val & 1)
|
||||||
|
vmcr |= ICH_VMCR_ENG1_MASK;
|
||||||
|
else
|
||||||
|
vmcr &= ~ICH_VMCR_ENG1_MASK;
|
||||||
|
|
||||||
|
__vgic_v3_write_vmcr(vmcr);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_bpr0(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
vcpu_set_reg(vcpu, rt, __vgic_v3_get_bpr0(vmcr));
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_bpr1(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
vcpu_set_reg(vcpu, rt, __vgic_v3_get_bpr1(vmcr));
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_bpr0(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u64 val = vcpu_get_reg(vcpu, rt);
|
||||||
|
u8 bpr_min = __vgic_v3_bpr_min() - 1;
|
||||||
|
|
||||||
|
/* Enforce BPR limiting */
|
||||||
|
if (val < bpr_min)
|
||||||
|
val = bpr_min;
|
||||||
|
|
||||||
|
val <<= ICH_VMCR_BPR0_SHIFT;
|
||||||
|
val &= ICH_VMCR_BPR0_MASK;
|
||||||
|
vmcr &= ~ICH_VMCR_BPR0_MASK;
|
||||||
|
vmcr |= val;
|
||||||
|
|
||||||
|
__vgic_v3_write_vmcr(vmcr);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_bpr1(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u64 val = vcpu_get_reg(vcpu, rt);
|
||||||
|
u8 bpr_min = __vgic_v3_bpr_min();
|
||||||
|
|
||||||
|
if (vmcr & ICH_VMCR_CBPR_MASK)
|
||||||
|
return;
|
||||||
|
|
||||||
|
/* Enforce BPR limiting */
|
||||||
|
if (val < bpr_min)
|
||||||
|
val = bpr_min;
|
||||||
|
|
||||||
|
val <<= ICH_VMCR_BPR1_SHIFT;
|
||||||
|
val &= ICH_VMCR_BPR1_MASK;
|
||||||
|
vmcr &= ~ICH_VMCR_BPR1_MASK;
|
||||||
|
vmcr |= val;
|
||||||
|
|
||||||
|
__vgic_v3_write_vmcr(vmcr);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_apxrn(struct kvm_vcpu *vcpu, int rt, int n)
|
||||||
|
{
|
||||||
|
u32 val;
|
||||||
|
|
||||||
|
if (!__vgic_v3_get_group(vcpu))
|
||||||
|
val = __vgic_v3_read_ap0rn(n);
|
||||||
|
else
|
||||||
|
val = __vgic_v3_read_ap1rn(n);
|
||||||
|
|
||||||
|
vcpu_set_reg(vcpu, rt, val);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_apxrn(struct kvm_vcpu *vcpu, int rt, int n)
|
||||||
|
{
|
||||||
|
u32 val = vcpu_get_reg(vcpu, rt);
|
||||||
|
|
||||||
|
if (!__vgic_v3_get_group(vcpu))
|
||||||
|
__vgic_v3_write_ap0rn(val, n);
|
||||||
|
else
|
||||||
|
__vgic_v3_write_ap1rn(val, n);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_apxr0(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
__vgic_v3_read_apxrn(vcpu, rt, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_apxr1(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
__vgic_v3_read_apxrn(vcpu, rt, 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_apxr2(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
__vgic_v3_read_apxrn(vcpu, rt, 2);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_apxr3(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
__vgic_v3_read_apxrn(vcpu, rt, 3);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_apxr0(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
__vgic_v3_write_apxrn(vcpu, rt, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_apxr1(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
__vgic_v3_write_apxrn(vcpu, rt, 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_apxr2(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
__vgic_v3_write_apxrn(vcpu, rt, 2);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_apxr3(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
__vgic_v3_write_apxrn(vcpu, rt, 3);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_hppir(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u64 lr_val;
|
||||||
|
int lr, lr_grp, grp;
|
||||||
|
|
||||||
|
grp = __vgic_v3_get_group(vcpu);
|
||||||
|
|
||||||
|
lr = __vgic_v3_highest_priority_lr(vcpu, vmcr, &lr_val);
|
||||||
|
if (lr == -1)
|
||||||
|
goto spurious;
|
||||||
|
|
||||||
|
lr_grp = !!(lr_val & ICH_LR_GROUP);
|
||||||
|
if (lr_grp != grp)
|
||||||
|
lr_val = ICC_IAR1_EL1_SPURIOUS;
|
||||||
|
|
||||||
|
spurious:
|
||||||
|
vcpu_set_reg(vcpu, rt, lr_val & ICH_LR_VIRTUAL_ID_MASK);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_pmr(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
vmcr &= ICH_VMCR_PMR_MASK;
|
||||||
|
vmcr >>= ICH_VMCR_PMR_SHIFT;
|
||||||
|
vcpu_set_reg(vcpu, rt, vmcr);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_pmr(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u32 val = vcpu_get_reg(vcpu, rt);
|
||||||
|
|
||||||
|
val <<= ICH_VMCR_PMR_SHIFT;
|
||||||
|
val &= ICH_VMCR_PMR_MASK;
|
||||||
|
vmcr &= ~ICH_VMCR_PMR_MASK;
|
||||||
|
vmcr |= val;
|
||||||
|
|
||||||
|
write_gicreg(vmcr, ICH_VMCR_EL2);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_rpr(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u32 val = __vgic_v3_get_highest_active_priority();
|
||||||
|
vcpu_set_reg(vcpu, rt, val);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_read_ctlr(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u32 vtr, val;
|
||||||
|
|
||||||
|
vtr = read_gicreg(ICH_VTR_EL2);
|
||||||
|
/* PRIbits */
|
||||||
|
val = ((vtr >> 29) & 7) << ICC_CTLR_EL1_PRI_BITS_SHIFT;
|
||||||
|
/* IDbits */
|
||||||
|
val |= ((vtr >> 23) & 7) << ICC_CTLR_EL1_ID_BITS_SHIFT;
|
||||||
|
/* SEIS */
|
||||||
|
val |= ((vtr >> 22) & 1) << ICC_CTLR_EL1_SEIS_SHIFT;
|
||||||
|
/* A3V */
|
||||||
|
val |= ((vtr >> 21) & 1) << ICC_CTLR_EL1_A3V_SHIFT;
|
||||||
|
/* EOImode */
|
||||||
|
val |= ((vmcr & ICH_VMCR_EOIM_MASK) >> ICH_VMCR_EOIM_SHIFT) << ICC_CTLR_EL1_EOImode_SHIFT;
|
||||||
|
/* CBPR */
|
||||||
|
val |= (vmcr & ICH_VMCR_CBPR_MASK) >> ICH_VMCR_CBPR_SHIFT;
|
||||||
|
|
||||||
|
vcpu_set_reg(vcpu, rt, val);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __hyp_text __vgic_v3_write_ctlr(struct kvm_vcpu *vcpu,
|
||||||
|
u32 vmcr, int rt)
|
||||||
|
{
|
||||||
|
u32 val = vcpu_get_reg(vcpu, rt);
|
||||||
|
|
||||||
|
if (val & ICC_CTLR_EL1_CBPR_MASK)
|
||||||
|
vmcr |= ICH_VMCR_CBPR_MASK;
|
||||||
|
else
|
||||||
|
vmcr &= ~ICH_VMCR_CBPR_MASK;
|
||||||
|
|
||||||
|
if (val & ICC_CTLR_EL1_EOImode_MASK)
|
||||||
|
vmcr |= ICH_VMCR_EOIM_MASK;
|
||||||
|
else
|
||||||
|
vmcr &= ~ICH_VMCR_EOIM_MASK;
|
||||||
|
|
||||||
|
write_gicreg(vmcr, ICH_VMCR_EL2);
|
||||||
|
}
|
||||||
|
|
||||||
|
int __hyp_text __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu)
|
||||||
|
{
|
||||||
|
int rt;
|
||||||
|
u32 esr;
|
||||||
|
u32 vmcr;
|
||||||
|
void (*fn)(struct kvm_vcpu *, u32, int);
|
||||||
|
bool is_read;
|
||||||
|
u32 sysreg;
|
||||||
|
|
||||||
|
esr = kvm_vcpu_get_hsr(vcpu);
|
||||||
|
if (vcpu_mode_is_32bit(vcpu)) {
|
||||||
|
if (!kvm_condition_valid(vcpu))
|
||||||
|
return 1;
|
||||||
|
|
||||||
|
sysreg = esr_cp15_to_sysreg(esr);
|
||||||
|
} else {
|
||||||
|
sysreg = esr_sys64_to_sysreg(esr);
|
||||||
|
}
|
||||||
|
|
||||||
|
is_read = (esr & ESR_ELx_SYS64_ISS_DIR_MASK) == ESR_ELx_SYS64_ISS_DIR_READ;
|
||||||
|
|
||||||
|
switch (sysreg) {
|
||||||
|
case SYS_ICC_IAR0_EL1:
|
||||||
|
case SYS_ICC_IAR1_EL1:
|
||||||
|
if (unlikely(!is_read))
|
||||||
|
return 0;
|
||||||
|
fn = __vgic_v3_read_iar;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_EOIR0_EL1:
|
||||||
|
case SYS_ICC_EOIR1_EL1:
|
||||||
|
if (unlikely(is_read))
|
||||||
|
return 0;
|
||||||
|
fn = __vgic_v3_write_eoir;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_IGRPEN1_EL1:
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_igrpen1;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_igrpen1;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_BPR1_EL1:
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_bpr1;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_bpr1;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_AP0Rn_EL1(0):
|
||||||
|
case SYS_ICC_AP1Rn_EL1(0):
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_apxr0;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_apxr0;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_AP0Rn_EL1(1):
|
||||||
|
case SYS_ICC_AP1Rn_EL1(1):
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_apxr1;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_apxr1;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_AP0Rn_EL1(2):
|
||||||
|
case SYS_ICC_AP1Rn_EL1(2):
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_apxr2;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_apxr2;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_AP0Rn_EL1(3):
|
||||||
|
case SYS_ICC_AP1Rn_EL1(3):
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_apxr3;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_apxr3;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_HPPIR0_EL1:
|
||||||
|
case SYS_ICC_HPPIR1_EL1:
|
||||||
|
if (unlikely(!is_read))
|
||||||
|
return 0;
|
||||||
|
fn = __vgic_v3_read_hppir;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_IGRPEN0_EL1:
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_igrpen0;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_igrpen0;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_BPR0_EL1:
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_bpr0;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_bpr0;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_DIR_EL1:
|
||||||
|
if (unlikely(is_read))
|
||||||
|
return 0;
|
||||||
|
fn = __vgic_v3_write_dir;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_RPR_EL1:
|
||||||
|
if (unlikely(!is_read))
|
||||||
|
return 0;
|
||||||
|
fn = __vgic_v3_read_rpr;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_CTLR_EL1:
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_ctlr;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_ctlr;
|
||||||
|
break;
|
||||||
|
case SYS_ICC_PMR_EL1:
|
||||||
|
if (is_read)
|
||||||
|
fn = __vgic_v3_read_pmr;
|
||||||
|
else
|
||||||
|
fn = __vgic_v3_write_pmr;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
vmcr = __vgic_v3_read_vmcr();
|
||||||
|
rt = kvm_vcpu_sys_get_rt(vcpu);
|
||||||
|
fn(vcpu, vmcr, rt);
|
||||||
|
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
|
@@ -20,6 +20,7 @@
 #include <linux/kvm_host.h>
 #include <linux/io.h>
 #include <linux/hugetlb.h>
+#include <linux/sched/signal.h>
 #include <trace/events/kvm.h>
 #include <asm/pgalloc.h>
 #include <asm/cacheflush.h>
@@ -1262,6 +1263,24 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
 	__coherent_cache_guest_page(vcpu, pfn, size);
 }
 
+static void kvm_send_hwpoison_signal(unsigned long address,
+				     struct vm_area_struct *vma)
+{
+	siginfo_t info;
+
+	info.si_signo = SIGBUS;
+	info.si_errno = 0;
+	info.si_code = BUS_MCEERR_AR;
+	info.si_addr = (void __user *)address;
+
+	if (is_vm_hugetlb_page(vma))
+		info.si_addr_lsb = huge_page_shift(hstate_vma(vma));
+	else
+		info.si_addr_lsb = PAGE_SHIFT;
+
+	send_sig_info(SIGBUS, &info, current);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -1331,6 +1350,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	smp_rmb();
 
 	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
+	if (pfn == KVM_PFN_ERR_HWPOISON) {
+		kvm_send_hwpoison_signal(hva, vma);
+		return 0;
+	}
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
@@ -203,6 +203,24 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
 	return reg;
 }
 
+static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = &vcpu->arch.pmu;
+	bool overflow = !!kvm_pmu_overflow_status(vcpu);
+
+	if (pmu->irq_level == overflow)
+		return;
+
+	pmu->irq_level = overflow;
+
+	if (likely(irqchip_in_kernel(vcpu->kvm))) {
+		int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+					      pmu->irq_num, overflow,
+					      &vcpu->arch.pmu);
+		WARN_ON(ret);
+	}
+}
+
 /**
  * kvm_pmu_overflow_set - set PMU overflow interrupt
  * @vcpu: The vcpu pointer
@@ -210,37 +228,18 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
  */
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
 {
-	u64 reg;
-
 	if (val == 0)
 		return;
 
 	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
-	reg = kvm_pmu_overflow_status(vcpu);
-	if (reg != 0)
-		kvm_vcpu_kick(vcpu);
+	kvm_pmu_check_overflow(vcpu);
 }
 
 static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
 {
-	struct kvm_pmu *pmu = &vcpu->arch.pmu;
-	bool overflow;
-
 	if (!kvm_arm_pmu_v3_ready(vcpu))
 		return;
 
-	overflow = !!kvm_pmu_overflow_status(vcpu);
-	if (pmu->irq_level == overflow)
-		return;
-
-	pmu->irq_level = overflow;
-
-	if (likely(irqchip_in_kernel(vcpu->kvm))) {
-		int ret;
-		ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
-					  pmu->irq_num, overflow);
-		WARN_ON(ret);
-	}
+	kvm_pmu_check_overflow(vcpu);
 }
 
 bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu)
@@ -451,25 +450,32 @@ bool kvm_arm_support_pmu_v3(void)
 	return (perf_num_counters() > 0);
 }
 
-static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
+int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
 {
-	if (!kvm_arm_support_pmu_v3())
-		return -ENODEV;
+	if (!vcpu->arch.pmu.created)
+		return 0;
 
 	/*
-	 * We currently require an in-kernel VGIC to use the PMU emulation,
-	 * because we do not support forwarding PMU overflow interrupts to
-	 * userspace yet.
+	 * A valid interrupt configuration for the PMU is either to have a
+	 * properly configured interrupt number and using an in-kernel
+	 * irqchip, or to not have an in-kernel GIC and not set an IRQ.
 	 */
-	if (!irqchip_in_kernel(vcpu->kvm) || !vgic_initialized(vcpu->kvm))
-		return -ENODEV;
+	if (irqchip_in_kernel(vcpu->kvm)) {
+		int irq = vcpu->arch.pmu.irq_num;
+		if (!kvm_arm_pmu_irq_initialized(vcpu))
+			return -EINVAL;
 
-	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features) ||
-	    !kvm_arm_pmu_irq_initialized(vcpu))
-		return -ENXIO;
-
-	if (kvm_arm_pmu_v3_ready(vcpu))
-		return -EBUSY;
+		/*
+		 * If we are using an in-kernel vgic, at this point we know
+		 * the vgic will be initialized, so we can check the PMU irq
+		 * number against the dimensions of the vgic and make sure
+		 * it's valid.
+		 */
+		if (!irq_is_ppi(irq) && !vgic_valid_spi(vcpu->kvm, irq))
+			return -EINVAL;
+	} else if (kvm_arm_pmu_irq_initialized(vcpu)) {
+		return -EINVAL;
+	}
 
 	kvm_pmu_vcpu_reset(vcpu);
 	vcpu->arch.pmu.ready = true;
@@ -477,7 +483,40 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-#define irq_is_ppi(irq) ((irq) >= VGIC_NR_SGIS && (irq) < VGIC_NR_PRIVATE_IRQS)
+static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_arm_support_pmu_v3())
+		return -ENODEV;
+
+	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
+		return -ENXIO;
+
+	if (vcpu->arch.pmu.created)
+		return -EBUSY;
+
+	if (irqchip_in_kernel(vcpu->kvm)) {
+		int ret;
+
+		/*
+		 * If using the PMU with an in-kernel virtual GIC
+		 * implementation, we require the GIC to be already
+		 * initialized when initializing the PMU.
+		 */
+		if (!vgic_initialized(vcpu->kvm))
+			return -ENODEV;
+
+		if (!kvm_arm_pmu_irq_initialized(vcpu))
+			return -ENXIO;
+
+		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.pmu.irq_num,
+					 &vcpu->arch.pmu);
+		if (ret)
+			return ret;
+	}
+
+	vcpu->arch.pmu.created = true;
+	return 0;
+}
 
 /*
  * For one VM the interrupt type must be same for each vcpu.
@@ -512,6 +551,9 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	int __user *uaddr = (int __user *)(long)attr->addr;
 	int irq;
 
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return -EINVAL;
+
 	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
 		return -ENODEV;
 
@@ -519,7 +561,7 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		return -EFAULT;
 
 	/* The PMU overflow interrupt can be a PPI or a valid SPI. */
-	if (!(irq_is_ppi(irq) || vgic_valid_spi(vcpu->kvm, irq)))
+	if (!(irq_is_ppi(irq) || irq_is_spi(irq)))
 		return -EINVAL;
 
 	if (!pmu_irq_is_valid(vcpu->kvm, irq))
@@ -546,6 +588,9 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	int __user *uaddr = (int __user *)(long)attr->addr;
 	int irq;
 
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return -EINVAL;
+
 	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
 		return -ENODEV;
@@ -57,6 +57,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 	 * for KVM will preserve the register state.
 	 */
 	kvm_vcpu_block(vcpu);
+	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 
 	return PSCI_RET_SUCCESS;
 }
@@ -64,6 +65,8 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.power_off = true;
+	kvm_make_request(KVM_REQ_SLEEP, vcpu);
+	kvm_vcpu_kick(vcpu);
 }
 
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
@@ -178,10 +181,9 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 	 * after this call is handled and before the VCPUs have been
 	 * re-initialized.
 	 */
-	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
+	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
 		tmp->arch.power_off = true;
-		kvm_vcpu_kick(tmp);
-	}
+	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
 
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
 	vcpu->run->system_event.type = type;
|
@ -34,7 +34,7 @@ static int vgic_irqfd_set_irq(struct kvm_kernel_irq_routing_entry *e,
|
||||||
|
|
||||||
if (!vgic_valid_spi(kvm, spi_id))
|
if (!vgic_valid_spi(kvm, spi_id))
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
return kvm_vgic_inject_irq(kvm, 0, spi_id, level);
|
return kvm_vgic_inject_irq(kvm, 0, spi_id, level, NULL);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
|
|
@ -308,34 +308,36 @@ static const struct vgic_register_region vgic_v2_dist_registers[] = {
|
||||||
vgic_mmio_read_v2_misc, vgic_mmio_write_v2_misc, 12,
|
vgic_mmio_read_v2_misc, vgic_mmio_write_v2_misc, 12,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_IGROUP,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_IGROUP,
|
||||||
vgic_mmio_read_rao, vgic_mmio_write_wi, 1,
|
vgic_mmio_read_rao, vgic_mmio_write_wi, NULL, NULL, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET,
|
||||||
vgic_mmio_read_enable, vgic_mmio_write_senable, 1,
|
vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_CLEAR,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_CLEAR,
|
||||||
vgic_mmio_read_enable, vgic_mmio_write_cenable, 1,
|
vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_SET,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_SET,
|
||||||
vgic_mmio_read_pending, vgic_mmio_write_spending, 1,
|
vgic_mmio_read_pending, vgic_mmio_write_spending, NULL, NULL, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_CLEAR,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_CLEAR,
|
||||||
vgic_mmio_read_pending, vgic_mmio_write_cpending, 1,
|
vgic_mmio_read_pending, vgic_mmio_write_cpending, NULL, NULL, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET,
|
||||||
vgic_mmio_read_active, vgic_mmio_write_sactive, 1,
|
vgic_mmio_read_active, vgic_mmio_write_sactive,
|
||||||
|
NULL, vgic_mmio_uaccess_write_sactive, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR,
|
||||||
vgic_mmio_read_active, vgic_mmio_write_cactive, 1,
|
vgic_mmio_read_active, vgic_mmio_write_cactive,
|
||||||
|
NULL, vgic_mmio_uaccess_write_cactive, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI,
|
||||||
vgic_mmio_read_priority, vgic_mmio_write_priority, 8,
|
vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL,
|
||||||
VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
|
8, VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_TARGET,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_TARGET,
|
||||||
vgic_mmio_read_target, vgic_mmio_write_target, 8,
|
vgic_mmio_read_target, vgic_mmio_write_target, NULL, NULL, 8,
|
||||||
VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
|
VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_CONFIG,
|
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_CONFIG,
|
||||||
vgic_mmio_read_config, vgic_mmio_write_config, 2,
|
vgic_mmio_read_config, vgic_mmio_write_config, NULL, NULL, 2,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_LENGTH(GIC_DIST_SOFTINT,
|
REGISTER_DESC_WITH_LENGTH(GIC_DIST_SOFTINT,
|
||||||
vgic_mmio_read_raz, vgic_mmio_write_sgir, 4,
|
vgic_mmio_read_raz, vgic_mmio_write_sgir, 4,
|
||||||
|
|
|
@ -456,11 +456,13 @@ static const struct vgic_register_region vgic_v3_dist_registers[] = {
|
||||||
vgic_mmio_read_raz, vgic_mmio_write_wi, 1,
|
vgic_mmio_read_raz, vgic_mmio_write_wi, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER,
|
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER,
|
||||||
vgic_mmio_read_active, vgic_mmio_write_sactive, NULL, NULL, 1,
|
vgic_mmio_read_active, vgic_mmio_write_sactive,
|
||||||
|
NULL, vgic_mmio_uaccess_write_sactive, 1,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER,
|
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER,
|
||||||
vgic_mmio_read_active, vgic_mmio_write_cactive, NULL, NULL, 1,
|
vgic_mmio_read_active, vgic_mmio_write_cactive,
|
||||||
VGIC_ACCESS_32bit),
|
NULL, vgic_mmio_uaccess_write_cactive,
|
||||||
|
1, VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR,
|
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR,
|
||||||
vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL,
|
vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL,
|
||||||
8, VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
|
8, VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
|
||||||
|
@ -526,12 +528,14 @@ static const struct vgic_register_region vgic_v3_sgibase_registers[] = {
|
||||||
vgic_mmio_read_pending, vgic_mmio_write_cpending,
|
vgic_mmio_read_pending, vgic_mmio_write_cpending,
|
||||||
vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
|
vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
|
||||||
VGIC_ACCESS_32bit),
|
VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_LENGTH(GICR_ISACTIVER0,
|
REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_ISACTIVER0,
|
||||||
vgic_mmio_read_active, vgic_mmio_write_sactive, 4,
|
vgic_mmio_read_active, vgic_mmio_write_sactive,
|
||||||
VGIC_ACCESS_32bit),
|
NULL, vgic_mmio_uaccess_write_sactive,
|
||||||
REGISTER_DESC_WITH_LENGTH(GICR_ICACTIVER0,
|
4, VGIC_ACCESS_32bit),
|
||||||
vgic_mmio_read_active, vgic_mmio_write_cactive, 4,
|
REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_ICACTIVER0,
|
||||||
VGIC_ACCESS_32bit),
|
vgic_mmio_read_active, vgic_mmio_write_cactive,
|
||||||
|
NULL, vgic_mmio_uaccess_write_cactive,
|
||||||
|
4, VGIC_ACCESS_32bit),
|
||||||
REGISTER_DESC_WITH_LENGTH(GICR_IPRIORITYR0,
|
REGISTER_DESC_WITH_LENGTH(GICR_IPRIORITYR0,
|
||||||
vgic_mmio_read_priority, vgic_mmio_write_priority, 32,
|
vgic_mmio_read_priority, vgic_mmio_write_priority, 32,
|
||||||
VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
|
VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
|
||||||
|
|
|
@ -231,40 +231,72 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||||
* be migrated while we don't hold the IRQ locks and we don't want to be
|
* be migrated while we don't hold the IRQ locks and we don't want to be
|
||||||
* chasing moving targets.
|
* chasing moving targets.
|
||||||
*
|
*
|
||||||
* For private interrupts, we only have to make sure the single and only VCPU
|
* For private interrupts we don't have to do anything because userspace
|
||||||
* that can potentially queue the IRQ is stopped.
|
* accesses to the VGIC state already require all VCPUs to be stopped, and
|
||||||
|
* only the VCPU itself can modify its private interrupts active state, which
|
||||||
|
* guarantees that the VCPU is not running.
|
||||||
*/
|
*/
|
||||||
static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
|
static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
|
||||||
{
|
{
|
||||||
if (intid < VGIC_NR_PRIVATE_IRQS)
|
if (intid > VGIC_NR_PRIVATE_IRQS)
|
||||||
kvm_arm_halt_vcpu(vcpu);
|
|
||||||
else
|
|
||||||
kvm_arm_halt_guest(vcpu->kvm);
|
kvm_arm_halt_guest(vcpu->kvm);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* See vgic_change_active_prepare */
|
/* See vgic_change_active_prepare */
|
||||||
static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid)
|
static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid)
|
||||||
{
|
{
|
||||||
if (intid < VGIC_NR_PRIVATE_IRQS)
|
if (intid > VGIC_NR_PRIVATE_IRQS)
|
||||||
kvm_arm_resume_vcpu(vcpu);
|
|
||||||
else
|
|
||||||
kvm_arm_resume_guest(vcpu->kvm);
|
kvm_arm_resume_guest(vcpu->kvm);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void __vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
|
||||||
|
gpa_t addr, unsigned int len,
|
||||||
|
unsigned long val)
|
||||||
|
{
|
||||||
|
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for_each_set_bit(i, &val, len * 8) {
|
||||||
|
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
|
||||||
|
vgic_mmio_change_active(vcpu, irq, false);
|
||||||
|
vgic_put_irq(vcpu->kvm, irq);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
|
void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
|
||||||
gpa_t addr, unsigned int len,
|
gpa_t addr, unsigned int len,
|
||||||
unsigned long val)
|
unsigned long val)
|
||||||
{
|
{
|
||||||
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
|
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
|
||||||
|
|
||||||
|
mutex_lock(&vcpu->kvm->lock);
|
||||||
|
vgic_change_active_prepare(vcpu, intid);
|
||||||
|
|
||||||
|
__vgic_mmio_write_cactive(vcpu, addr, len, val);
|
||||||
|
|
||||||
|
vgic_change_active_finish(vcpu, intid);
|
||||||
|
mutex_unlock(&vcpu->kvm->lock);
|
||||||
|
}
|
||||||
|
|
||||||
|
void vgic_mmio_uaccess_write_cactive(struct kvm_vcpu *vcpu,
|
||||||
|
gpa_t addr, unsigned int len,
|
||||||
|
unsigned long val)
|
||||||
|
{
|
||||||
|
__vgic_mmio_write_cactive(vcpu, addr, len, val);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
|
||||||
|
gpa_t addr, unsigned int len,
|
||||||
|
unsigned long val)
|
||||||
|
{
|
||||||
|
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
|
||||||
int i;
|
int i;
|
||||||
|
|
||||||
vgic_change_active_prepare(vcpu, intid);
|
|
||||||
for_each_set_bit(i, &val, len * 8) {
|
for_each_set_bit(i, &val, len * 8) {
|
||||||
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
|
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
|
||||||
vgic_mmio_change_active(vcpu, irq, false);
|
vgic_mmio_change_active(vcpu, irq, true);
|
||||||
vgic_put_irq(vcpu->kvm, irq);
|
vgic_put_irq(vcpu->kvm, irq);
|
||||||
}
|
}
|
||||||
vgic_change_active_finish(vcpu, intid);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
|
void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
|
||||||
|
@ -272,15 +304,21 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
|
||||||
unsigned long val)
|
unsigned long val)
|
||||||
{
|
{
|
||||||
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
|
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
|
||||||
int i;
|
|
||||||
|
|
||||||
|
mutex_lock(&vcpu->kvm->lock);
|
||||||
vgic_change_active_prepare(vcpu, intid);
|
vgic_change_active_prepare(vcpu, intid);
|
||||||
for_each_set_bit(i, &val, len * 8) {
|
|
||||||
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
|
__vgic_mmio_write_sactive(vcpu, addr, len, val);
|
||||||
vgic_mmio_change_active(vcpu, irq, true);
|
|
||||||
vgic_put_irq(vcpu->kvm, irq);
|
|
||||||
}
|
|
||||||
vgic_change_active_finish(vcpu, intid);
|
vgic_change_active_finish(vcpu, intid);
|
||||||
|
mutex_unlock(&vcpu->kvm->lock);
|
||||||
|
}
|
||||||
|
|
||||||
|
void vgic_mmio_uaccess_write_sactive(struct kvm_vcpu *vcpu,
|
||||||
|
gpa_t addr, unsigned int len,
|
||||||
|
unsigned long val)
|
||||||
|
{
|
||||||
|
__vgic_mmio_write_sactive(vcpu, addr, len, val);
|
||||||
}
|
}
|
||||||
|
|
||||||
unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
|
unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
|
||||||
|
|
|
@ -75,7 +75,7 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
|
||||||
* The _WITH_LENGTH version instantiates registers with a fixed length
|
* The _WITH_LENGTH version instantiates registers with a fixed length
|
||||||
* and is mutually exclusive with the _PER_IRQ version.
|
* and is mutually exclusive with the _PER_IRQ version.
|
||||||
*/
|
*/
|
||||||
#define REGISTER_DESC_WITH_BITS_PER_IRQ(off, rd, wr, bpi, acc) \
|
#define REGISTER_DESC_WITH_BITS_PER_IRQ(off, rd, wr, ur, uw, bpi, acc) \
|
||||||
{ \
|
{ \
|
||||||
.reg_offset = off, \
|
.reg_offset = off, \
|
||||||
.bits_per_irq = bpi, \
|
.bits_per_irq = bpi, \
|
||||||
|
@ -83,6 +83,8 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
|
||||||
.access_flags = acc, \
|
.access_flags = acc, \
|
||||||
.read = rd, \
|
.read = rd, \
|
||||||
.write = wr, \
|
.write = wr, \
|
||||||
|
.uaccess_read = ur, \
|
||||||
|
.uaccess_write = uw, \
|
||||||
}
|
}
|
||||||
|
|
||||||
#define REGISTER_DESC_WITH_LENGTH(off, rd, wr, length, acc) \
|
#define REGISTER_DESC_WITH_LENGTH(off, rd, wr, length, acc) \
|
||||||
|
@ -165,6 +167,14 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
|
||||||
gpa_t addr, unsigned int len,
|
gpa_t addr, unsigned int len,
|
||||||
unsigned long val);
|
unsigned long val);
|
||||||
|
|
||||||
|
void vgic_mmio_uaccess_write_cactive(struct kvm_vcpu *vcpu,
|
||||||
|
gpa_t addr, unsigned int len,
|
||||||
|
unsigned long val);
|
||||||
|
|
||||||
|
void vgic_mmio_uaccess_write_sactive(struct kvm_vcpu *vcpu,
|
||||||
|
gpa_t addr, unsigned int len,
|
||||||
|
unsigned long val);
|
||||||
|
|
||||||
unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
|
unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
|
||||||
gpa_t addr, unsigned int len);
|
gpa_t addr, unsigned int len);
|
||||||
|
|
||||||
|
|
|
@ -21,6 +21,10 @@
|
||||||
|
|
||||||
#include "vgic.h"
|
#include "vgic.h"
|
||||||
|
|
||||||
|
static bool group0_trap;
|
||||||
|
static bool group1_trap;
|
||||||
|
static bool common_trap;
|
||||||
|
|
||||||
void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
|
void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
|
||||||
{
|
{
|
||||||
struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3;
|
struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3;
|
||||||
|
@ -258,6 +262,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
|
||||||
|
|
||||||
/* Get the show on the road... */
|
/* Get the show on the road... */
|
||||||
vgic_v3->vgic_hcr = ICH_HCR_EN;
|
vgic_v3->vgic_hcr = ICH_HCR_EN;
|
||||||
|
if (group0_trap)
|
||||||
|
vgic_v3->vgic_hcr |= ICH_HCR_TALL0;
|
||||||
|
if (group1_trap)
|
||||||
|
vgic_v3->vgic_hcr |= ICH_HCR_TALL1;
|
||||||
|
if (common_trap)
|
||||||
|
vgic_v3->vgic_hcr |= ICH_HCR_TC;
|
||||||
}
|
}
|
||||||
|
|
||||||
int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
|
int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
|
||||||
|
@ -429,6 +439,26 @@ out:
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
DEFINE_STATIC_KEY_FALSE(vgic_v3_cpuif_trap);
|
||||||
|
|
||||||
|
static int __init early_group0_trap_cfg(char *buf)
|
||||||
|
{
|
||||||
|
return strtobool(buf, &group0_trap);
|
||||||
|
}
|
||||||
|
early_param("kvm-arm.vgic_v3_group0_trap", early_group0_trap_cfg);
|
||||||
|
|
||||||
|
static int __init early_group1_trap_cfg(char *buf)
|
||||||
|
{
|
||||||
|
return strtobool(buf, &group1_trap);
|
||||||
|
}
|
||||||
|
early_param("kvm-arm.vgic_v3_group1_trap", early_group1_trap_cfg);
|
||||||
|
|
||||||
|
static int __init early_common_trap_cfg(char *buf)
|
||||||
|
{
|
||||||
|
return strtobool(buf, &common_trap);
|
||||||
|
}
|
||||||
|
early_param("kvm-arm.vgic_v3_common_trap", early_common_trap_cfg);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* vgic_v3_probe - probe for a GICv3 compatible interrupt controller in DT
|
* vgic_v3_probe - probe for a GICv3 compatible interrupt controller in DT
|
||||||
* @node: pointer to the DT node
|
* @node: pointer to the DT node
|
||||||
|
@ -480,6 +510,21 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
|
||||||
if (kvm_vgic_global_state.vcpu_base == 0)
|
if (kvm_vgic_global_state.vcpu_base == 0)
|
||||||
kvm_info("disabling GICv2 emulation\n");
|
kvm_info("disabling GICv2 emulation\n");
|
||||||
|
|
||||||
|
#ifdef CONFIG_ARM64
|
||||||
|
if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_30115)) {
|
||||||
|
group0_trap = true;
|
||||||
|
group1_trap = true;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
if (group0_trap || group1_trap || common_trap) {
|
||||||
|
kvm_info("GICv3 sysreg trapping enabled ([%s%s%s], reduced performance)\n",
|
||||||
|
group0_trap ? "G0" : "",
|
||||||
|
group1_trap ? "G1" : "",
|
||||||
|
common_trap ? "C" : "");
|
||||||
|
static_branch_enable(&vgic_v3_cpuif_trap);
|
||||||
|
}
|
||||||
|
|
||||||
kvm_vgic_global_state.vctrl_base = NULL;
|
kvm_vgic_global_state.vctrl_base = NULL;
|
||||||
kvm_vgic_global_state.type = VGIC_V3;
|
kvm_vgic_global_state.type = VGIC_V3;
|
||||||
kvm_vgic_global_state.max_gic_vcpus = VGIC_V3_MAX_CPUS;
|
kvm_vgic_global_state.max_gic_vcpus = VGIC_V3_MAX_CPUS;
|
||||||
|
|
|
@@ -35,11 +35,12 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
 
 /*
  * Locking order is always:
- * its->cmd_lock (mutex)
- * its->its_lock (mutex)
- * vgic_cpu->ap_list_lock
- * kvm->lpi_list_lock
- * vgic_irq->irq_lock
+ * kvm->lock (mutex)
+ * its->cmd_lock (mutex)
+ * its->its_lock (mutex)
+ * vgic_cpu->ap_list_lock
+ * kvm->lpi_list_lock
+ * vgic_irq->irq_lock
  *
  * If you need to take multiple locks, always take the upper lock first,
  * then the lower ones, e.g. first take the its_lock, then the irq_lock.
@@ -234,10 +235,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
 
 /*
  * Only valid injection if changing level for level-triggered IRQs or for a
- * rising edge.
+ * rising edge, and in-kernel connected IRQ lines can only be controlled by
+ * their owner.
  */
-static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
+static bool vgic_validate_injection(struct vgic_irq *irq, bool level, void *owner)
 {
+	if (irq->owner != owner)
+		return false;
+
 	switch (irq->config) {
 	case VGIC_CONFIG_LEVEL:
 		return irq->line_level != level;
@@ -285,8 +290,10 @@ retry:
 		 * won't see this one until it exits for some other
 		 * reason.
 		 */
-		if (vcpu)
+		if (vcpu) {
+			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 			kvm_vcpu_kick(vcpu);
+		}
 		return false;
 	}
 
@@ -332,6 +339,7 @@ retry:
 	spin_unlock(&irq->irq_lock);
 	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
 
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 
 	return true;
@@ -346,13 +354,16 @@ retry:
  *		      false: to ignore the call
  * Level-sensitive    true: raise the input signal
  *		      false: lower the input signal
+ * @owner: The opaque pointer to the owner of the IRQ being raised to verify
+ *	   that the caller is allowed to inject this IRQ.  Userspace
+ *	   injections will have owner == NULL.
  *
  * The VGIC is not concerned with devices being active-LOW or active-HIGH for
  * level-sensitive interrupts.  You can think of the level parameter as 1
  * being HIGH and 0 being LOW and all devices being active-HIGH.
  */
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
-			bool level)
+			bool level, void *owner)
 {
 	struct kvm_vcpu *vcpu;
 	struct vgic_irq *irq;
@@ -374,7 +385,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 
 	spin_lock(&irq->irq_lock);
 
-	if (!vgic_validate_injection(irq, level)) {
+	if (!vgic_validate_injection(irq, level, owner)) {
 		/* Nothing to see here, move along... */
 		spin_unlock(&irq->irq_lock);
 		vgic_put_irq(kvm, irq);
@@ -430,6 +441,39 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 	return 0;
 }
 
+/**
+ * kvm_vgic_set_owner - Set the owner of an interrupt for a VM
+ *
+ * @vcpu: Pointer to the VCPU (used for PPIs)
+ * @intid: The virtual INTID identifying the interrupt (PPI or SPI)
+ * @owner: Opaque pointer to the owner
+ *
+ * Returns 0 if intid is not already used by another in-kernel device and the
+ * owner is set, otherwise returns an error code.
+ */
+int kvm_vgic_set_owner(struct kvm_vcpu *vcpu, unsigned int intid, void *owner)
+{
+	struct vgic_irq *irq;
+	int ret = 0;
+
+	if (!vgic_initialized(vcpu->kvm))
+		return -EAGAIN;
+
+	/* SGIs and LPIs cannot be wired up to any device */
+	if (!irq_is_ppi(intid) && !vgic_valid_spi(vcpu->kvm, intid))
+		return -EINVAL;
+
+	irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
+	spin_lock(&irq->irq_lock);
+	if (irq->owner && irq->owner != owner)
+		ret = -EEXIST;
+	else
+		irq->owner = owner;
+	spin_unlock(&irq->irq_lock);
+
+	return ret;
+}
+
 /**
  * vgic_prune_ap_list - Remove non-relevant interrupts from the list
  *
@@ -721,8 +765,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
 	 * a good kick...
 	 */
 	kvm_for_each_vcpu(c, vcpu, kvm) {
-		if (kvm_vgic_vcpu_pending_irq(vcpu))
+		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
+			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 			kvm_vcpu_kick(vcpu);
+		}
 	}
 }
@@ -73,17 +73,17 @@ MODULE_LICENSE("GPL");
 
 /* Architectures should define their poll value according to the halt latency */
 unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
-module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
+module_param(halt_poll_ns, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns);
 
 /* Default doubles per-vcpu halt_poll_ns. */
 unsigned int halt_poll_ns_grow = 2;
-module_param(halt_poll_ns_grow, uint, S_IRUGO | S_IWUSR);
+module_param(halt_poll_ns_grow, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns_grow);
 
 /* Default resets per-vcpu halt_poll_ns . */
 unsigned int halt_poll_ns_shrink;
-module_param(halt_poll_ns_shrink, uint, S_IRUGO | S_IWUSR);
+module_param(halt_poll_ns_shrink, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
 
 /*
@@ -3191,6 +3191,12 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 		return PTR_ERR(file);
 	}
 
+	/*
+	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
+	 * already set, with ->release() being kvm_vm_release(). In error
+	 * cases it will be called by the final fput(file) and will take
+	 * care of doing kvm_put_kvm(kvm).
+	 */
 	if (kvm_create_vm_debugfs(kvm, r) < 0) {
 		put_unused_fd(r);
 		fput(file);