This is safe because this function is called
on host controlled page table and non-present/non-MMIO
sptes never use bits 1..31. For the EPT case, this
ensures that cases where only the execute bit is set
is marked valid.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no reason to read the entry/exit control fields of the
VMCS and immediately write back the same value.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because the vmcs12 preemption timer is emulated through a separate hrtimer,
we can keep on using the preemption timer in the vmcs02 to emulare L1's
TSC deadline timer.
However, the corresponding bit in the pin-based execution control field
must be kept consistent between vmcs01 and vmcs02. On vmentry we copy
it into the vmcs02; on vmexit the preemption timer must be disabled in
the vmcs01 if a preemption timer vmexit happened while in guest mode.
The preemption timer value in the vmcs02 is set by vmx_vcpu_run, so it
need not be considered in prepare_vmcs02.
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Tested-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The preemption timer for nested VMX is emulated by hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
nested_vmx_exit_handled is always returning true for preemption timer vmexit. Then,
the L1 preemption timer vmexit is captured and be treated as a L2 preemption
timer vmexit, causing NULL pointer dereferences or worse in the L1 guest's
vmexit handler:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
PGD 0
Oops: 0010 [#1] SMP
Call Trace:
? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
handle_preemption_timer+0xe/0x20 [kvm_intel]
vmx_handle_exit+0x169/0x15a0 [kvm_intel]
? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
? vcpu_load+0x1c/0x60 [kvm]
? kvm_arch_vcpu_load+0x57/0x260 [kvm]
kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
SyS_ioctl+0x79/0x90
do_syscall_64+0x68/0x180
entry_SYSCALL64_slow_path+0x25/0x25
Code: Bad RIP value.
RIP [< (null)>] (null)
RSP <ffff8800b5263c48>
CR2: 0000000000000000
---[ end trace 9c70c48b1a2bc66e ]---
This can be reproduced readily by preemption timer enabled on L0 and disabled
on L1.
Return false since preemption timer vmexits must never be reflected to L2.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Simplify cpu_has_vmx_preemption_timer. This is consistent with the
rest of setup_vmcs_config and preparatory for the next patch.
Tested-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Default the guest PRId register to represent a generic QEMU machine
instead of a 24kc on MIPSr6. 24kc isn't supported by r6 Linux kernels.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When KVM emulates the RDHWR instruction, decode the instruction more
strictly. The rs field (bits 25:21) should be zero, as should bits 10:9.
Bits 8:6 is the register select field in MIPSr6, so we aren't strict
about those bits (no other operations should use that encoding space).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recognise the new MIPSr6 CACHE instruction encoding rather than the
pre-r6 one when an r6 kernel is being built. A SPECIAL3 opcode is used
and the immediate field is reduced to 9 bits wide since MIPSr6.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add support in KVM for emulation of instructions in the forbidden slot
of MIPSr6 compact branches. If we hit an exception on the forbidden
slot, then the branch must not have been taken, which makes calculation
of the resume PC trivial.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MIPSr6 doesn't have lo/hi registers, so don't bother saving or
restoring them, and don't expose them to userland with the KVM ioctl
interface either.
In fact the lo/hi registers aren't callee saved in the MIPS ABIs anyway,
so there is no need to preserve the host lo/hi values at all when
transitioning to and from the guest (which happens via a function call).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The atomic KVM register access macros in kvm_host.h (for the guest Cause
register with KVM in trap & emulate mode) use ll/sc instructions,
however they still .set mips3, which causes pre-MIPSr6 instruction
encodings to be emitted, even for a MIPSr6 build.
Fix it to use MIPS_ISA_ARCH_LEVEL as other parts of arch/mips already
do.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
__kvm_save_fpu and __kvm_restore_fpu use .set mips64r2 so that they can
access the odd FPU registers as well as the even, however this causes
misassembly of the return instruction on MIPSr6.
Fix by replacing .set mips64r2 with .set fp=64, which doesn't change the
architecture revision.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The opcodes currently defined in inst.h as cbcond0_op & cbcond1_op are
actually defined in the MIPS base instruction set manuals as pop10 &
pop30 respectively. Rename them as such, for consistency with the
documentation.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The opcodes currently defined in inst.h as beqzcjic_op & bnezcjialc_op
are actually defined in the MIPS base instruction set manuals as pop66 &
pop76 respectively. Rename them as such, for consistency with the
documentation.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently on a guest exception the guest's k0 register is saved to the
scratch temp register and the guest k1 saved to the exception base
address + 0x3000 using k0 to extract the Exception Base field of the
EBase register and as the base operand to the store. Both are then
copied into the VCPU structure after the other general purpose registers
have been saved there.
This bouncing to exception base + 0x3000 is not actually necessary as
the VCPU pointer can be determined and written through just as easily
with only a single spare register. The VCPU pointer is already needed in
k1 for saving the other GP registers, so lets save the guest k0 register
straight into the VCPU structure through k1, first saving k1 into the
scratch temp register instead of k0.
This could potentially pave the way for having a single exception base
area for use by all guests.
The ehb after saving the k register to the scratch temp register is also
delayed until just before it needs to be read back.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a relative branch to get from the individual exception vectors to
the common guest exit handler, rather than loading the address of the
exit handler and jumping to it.
This is made easier due to the fact we are now generating the entry code
dynamically. This will also allow the exception code to be further
reduced in future patches.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Scratch cop0 registers are needed by KVM to be able to save/restore all
the GPRs, including k0/k1, and for storing the VCPU pointer. However no
registers are universally suitable for these purposes, so the decision
should be made at runtime.
Until now, we've used DDATA_LO to store the VCPU pointer, and ErrorEPC
as a temporary. It could be argued that this is abuse of those
registers, and DDATA_LO is known not to be usable on certain
implementations (Cavium Octeon). If KScratch registers are present, use
them instead.
We save & restore the temporary register in addition to the VCPU pointer
register when using a KScratch register for it, as it may be used for
normal host TLB handling too.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On return from the exit handler to the host (without re-entering the
guest) we restore the saved value of the DDATA_LO register which we use
as a scratch register. However we've already restored it ready for
calling the exit handler so there is no need to do it again, so drop
that code.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check for presence of MSA at uasm assembly time rather than at runtime
in the generated KVM host entry code. This optimises the guest exit path
by eliminating the MSA code entirely if not present, and eliminating the
read of Config3.MSAP and conditional branch if MSA is present.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The FPU handling code on entry from guest is unnecessary if no FPU is
present, so allow it to be dropped at uasm assembly time.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that locore.S is converted to uasm, remove a bunch of the assembly
offset definitions created by asm-offsets.c, including the CPUINFO_ ones
for reading the variable asid mask, and the non FPU/MSA related VCPU_
definitions. KVM's fpu.S and msa.S still use the remaining definitions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dump the generated entry code with pr_debug(), similar to how it is done
in tlbex.c, so it can be more easily debugged.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert the whole of locore.S (assembly to enter guest and handle
exception entry) to be generated dynamically with uasm. This is done
with minimal changes to the resulting code.
The main changes are:
- Some constants are generated by uasm using LUI+ADDIU instead of
LUI+ORI.
- Loading of lo and hi are swapped around in vcpu_run but not when
resuming the guest after an exit. Both bits of logic are now generated
by the same code.
- Register MOVEs in uasm use different ADDU operand ordering to GNU as,
putting zero register into rs instead of rt.
- The JALR.HB to call the C exit handler is switched to JALR, since the
hazard barrier would appear to be unnecessary.
This will allow further optimisation in the future to dynamically handle
the capabilities of the CPU.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the R6 MUL instruction encoding for 3 operand signed multiply to
uasm so that KVM can use uasm for generating its entry point code at
runtime on R6.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add MTHI/MTLO instructions for writing to the hi & lo registers to uasm
so that KVM can use uasm for generating its entry point code at runtime.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add DI instruction for disabling interrupts to uasm so that KVM can use
uasm for generating its entry point code at runtime.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add CFCMSA/CTCMSA instructions for accessing MSA control registers to
uasm so that KVM can use uasm for generating its entry point code at
runtime.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add CFC1/CTC1 instructions for accessing FP control registers to uasm so
that KVM can use uasm for generating its entry point code at runtime.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This contains a fix for PER ifetch events. As we now have a handler
for a problem state instruction (sthyi) that could be stepped with a
debugger we should try to do the right thing regarding PER in our
instruction handlers. With this fix the handling for intercepted
instructions is fixed in general, thus fixing other oddball cases as
well (e.g. kprobes single stepping)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJXe52nAAoJEBF7vIC1phx8UZoP/36dTu17lQaRDF6svDCPm/kx
NimYPYa5ssConqUU9/eLv1JGQh8eaxuYJkOJPTFdPekPzDJnQhroOJn5VEdCnjtJ
AscfycLj6valUTyiECnQgEXKmdDMeKSh3rXwRPqvr7/Jmgx0rAVGBbW79dNrBqPl
b0VTDBrsmoqZUJN+dN8ccnsracwRRcLrqYILtg3FWSuE+cXRmqJMbkXe/+nKt1hV
0cMHQ2IsDgLzM/8WGASbmPij600aFc0K42l3dg2gnY7e4OfSrcpgotNN/8XlgfY4
FHOybJQ8gNGm0NfhtROZYeO/CQv8T8luEAXAKJscENS7i+9cIjs5XOxDR6YBiPFN
PKHAiMHYNkNvK4c6ZzvwzU7gJAeUtPezakAhyoluJHJ9+Kl8gNUrfiAgzHigzUo2
R+estafOR2incPr5XwjyRexdPlZvqt5RZ0iiTR/lQVKaPtdg7tXFfMebbi7wGEli
MTZsupqAwTBVpanzpaxxqsg8kr6kYK+t493mAsc4iSm4Y70Bxs6xLAWUlBKCYPFj
Y9vuyP/zMJgfBzwJmkv8MHzYXyKlR5ma+QLkqZttE+vwguXx3O0NW0HtfMmQtzdc
Ib/nYkJDSASgPm+RDvSDa25PG4evzNkfJEQsH6kU94L9PPa82eu14eRxoz3s8Xlj
U0gdbXkL4q2KG/qb8Jfm
=Cula
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fix for kvm/next (4.8) part 3
This contains a fix for PER ifetch events. As we now have a handler
for a problem state instruction (sthyi) that could be stepped with a
debugger we should try to do the right thing regarding PER in our
instruction handlers. With this fix the handling for intercepted
instructions is fixed in general, thus fixing other oddball cases as
well (e.g. kprobes single stepping)
Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device. The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.
This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
and fixup_user_fault together help supporting it. The patch also supports
VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
reference counting.
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Neo Jia <cjia@nvidia.com>
Reported-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Handle VM_IO like VM_PFNMAP, as is common in the rest of Linux; extract
the formula to convert hva->pfn into a new function, which will soon
gain more capabilities.
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In case we have to emuluate an instruction or part of it (instruction,
partial instruction, operation exception), we have to inject a PER
instruction-fetching event for that instruction, if hardware told us to do
so.
In case we retry an instruction, we must not inject the PER event.
Please note that we don't filter the events properly yet, so guest
debugging will be visible for the guest.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
INFO: rcu_sched detected stalls on CPUs/tasks:
1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
(detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R running task 0 3529 3525 0x00080808
ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
? kvm_write_guest_cached+0xb9/0x160 [kvm]
? __delay+0xf/0x20
? wait_lapic_expire+0x14a/0x200 [kvm]
? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0x5/0x210
? do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
? SyS_ioctl+0x79/0x90
? do_syscall_64+0x7c/0x1e0
? entry_SYSCALL64_slow_path+0x25/0x25
This can be reproduced readily by running a full dynticks guest(since hrtimer
in guest is heavily used) w/ lapic_timer_advance disabled.
If fail to program hardware preemption timer, we will fallback to hrtimer based
method, however, a previous programmed preemption timer miss to cancel in this
scenario which results in one hardware preemption timer and one hrtimer emulated
tsc deadline timer run simultaneously. So sometimes the target guest deadline
tsc is earlier than guest tsc, which leads to the computation in vmx_set_hv_timer
can underflow and cause delta_tsc to be set a huge value, then host soft lockup
as above.
This patch fix it by cancelling the previous programmed preemption timer if there
is once we failed to program the new preemption timer and fallback to hrtimer
based method.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the TSC deadline timer is programmed really close to the deadline or
even in the past, the computation in vmx_set_hv_timer can underflow and
cause delta_tsc to be set to a huge value. This generally results
in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
delta_tsc to be positive or zero.
Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This gains a few clock cycles per vmexit. On Intel there is no need
anymore to enable the interrupts in vmx_handle_external_intr, since
we are using the "acknowledge interrupt on exit" feature. AMD
needs to do that, and must be careful to avoid the interrupt shadow.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the functions from context_tracking.h directly.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make kvm_guest_{enter,exit} and __kvm_guest_{enter,exit} trivial wrappers
around the code in context_tracking.h. Name the context_tracking.h functions
consistently with what those for kernel<->user switch.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Combine the kvm_enter, kvm_reenter and kvm_out trace events into a
single kvm_transition event class to reduce duplication and bloat.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Fixes: 93258604ab ("MIPS: KVM: Add guest mode switch trace events")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM reads the current boottime value as a struct timespec in order to
calculate the guest wallclock time, resulting in an overflow in 2038
on 32-bit systems.
The data then gets passed as an unsigned 32-bit number to the guest,
and that in turn overflows in 2106.
We cannot do much about the second overflow, which affects both 32-bit
and 64-bit hosts, but we can ensure that they both behave the same
way and don't overflow until 2106, by using getboottime64() to read
a timespec64 value.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On Intel platforms, this patch adds LMCE to KVM MCE supported
capabilities and handles guest access to LMCE related MSRs.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
Only enable LMCE on Intel platform
Check MSR_IA32_FEATURE_CONTROL when handling guest
access to MSR_IA32_MCG_EXT_CTL]
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM currently does not check the value written to guest
MSR_IA32_FEATURE_CONTROL, though bits corresponding to disabled features
may be set. This patch makes KVM to validate individual bits written to
guest MSR_IA32_FEATURE_CONTROL according to enabled features.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
msr_ia32_feature_control will be used for LMCE and not depend only on
nested anymore, so move it from struct nested_vmx to struct vcpu_vmx.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With an updated QEMU this allows to create nested KVM guests
(KVM under KVM) on s390.
s390 memory management changes from Martin Schwidefsky or
acked by Martin. One common code memory management change (pageref)
acked by Andrew Morton.
The feature has to be enabled with the nested medule parameter.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJXaS0XAAoJEBF7vIC1phx8rqMP/3EUiQbD/seIbkc2xqdApbZX
xEDu6215f84N3KsLlZC/z/PhoxZ9z+GrHu+p2DqgnoMyDMj3C32s0OVecdxrE2+H
pdaEY2ju8rjnY2P4hHzAfdWhdZtwsgdMIX8qDesQjcz4Z1QdOBXymEUpV7d35E/O
WYqZw6EY6aOB338XKPaD03ITC+us/dW75ctPvqRzh5EKjjIMeOATonyGx0oHi14s
qXRK1tfq8OegJGzioW5hOIlUkawycNu5+uIRjyoxzLuEPjn68Bomfi6W7Mak0BIc
v8kEylFgMvSHKy0mWXCZsxj3hD6kNUzlJ+gRYFCiVWvdw6ywFzQiHP2nwLApg7Op
2a5S4wzQ3FIhA8u0k7JQBUncJggRZByitm4DerezHCZzKI9QimYrcKp2HWBQtI+0
12VC9/A0qyy2mKvTlgaDS4SQnPV2P6ztrLgORgK6F9TPq4w8mMYxG9Nz6neea6Po
cotjEtqZivKNno1MOUzAg3jsrTMy4O4ktS0e8JSog2pDRHIrMiqCOcq1B0cRGK6P
8UmcpkK6yiEgCCbtqEU3QhcIMY6srTC6xArCjuieTY9v6EtBWAioRXuHRkVhs8xR
HkEdlWlYTXeGpqcR2cCrUzEzUKdVmyTmnIVtoy/FfRfKEY02fHhiEIS3orqXoc4y
+/yPsoGVxt8lMFs3/E8Y
=ubb/
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: vSIE (nested virtualization) feature for 4.8 (kvm/next)
With an updated QEMU this allows to create nested KVM guests
(KVM under KVM) on s390.
s390 memory management changes from Martin Schwidefsky or
acked by Martin. One common code memory management change (pageref)
acked by Andrew Morton.
The feature has to be enabled with the nested medule parameter.
Let's be careful first and allow nested virtualization only if enabled
by the system administrator. In addition, user space still has to
explicitly enable it via SCLP features for it to work.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We have certain SIE features that we cannot support for now.
Let's add these features, so user space can directly prepare to enable
them, so we don't have to update yet another component.
In addition, add a comment block, telling why it is for now not possible to
forward/enable these features.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Guest 2 sets up the epoch of guest 3 from his point of view. Therefore,
we have to add the guest 2 epoch to the guest 3 epoch. We also have to take
care of guest 2 epoch changes on STP syncs. This will work just fine by
also updating the guest 3 epoch when a vsie_block has been set for a VCPU.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>