Commit Graph

2253 Commits

Author SHA1 Message Date
Xiao Guangrong 4484141a94 KVM: fix error paths for failed gfn_to_page() calls
This bug was triggered:
[ 4220.198458] BUG: unable to handle kernel paging request at fffffffffffffffe
[ 4220.203907] IP: [<ffffffff81104d85>] put_page+0xf/0x34
......
[ 4220.237326] Call Trace:
[ 4220.237361]  [<ffffffffa03830d0>] kvm_arch_destroy_vm+0xf9/0x101 [kvm]
[ 4220.237382]  [<ffffffffa036fe53>] kvm_put_kvm+0xcc/0x127 [kvm]
[ 4220.237401]  [<ffffffffa03702bc>] kvm_vcpu_release+0x18/0x1c [kvm]
[ 4220.237407]  [<ffffffff81145425>] __fput+0x111/0x1ed
[ 4220.237411]  [<ffffffff8114550f>] ____fput+0xe/0x10
[ 4220.237418]  [<ffffffff81063511>] task_work_run+0x5d/0x88
[ 4220.237424]  [<ffffffff8104c3f7>] do_exit+0x2bf/0x7ca

The test case:

	printf(fmt, ##args);		\
	exit(-1);} while (0)

static int create_vm(void)
{
	int sys_fd, vm_fd;

	sys_fd = open("/dev/kvm", O_RDWR);
	if (sys_fd < 0)
		die("open /dev/kvm fail.\n");

	vm_fd = ioctl(sys_fd, KVM_CREATE_VM, 0);
	if (vm_fd < 0)
		die("KVM_CREATE_VM fail.\n");

	return vm_fd;
}

static int create_vcpu(int vm_fd)
{
	int vcpu_fd;

	vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);
	if (vcpu_fd < 0)
		die("KVM_CREATE_VCPU ioctl.\n");
	printf("Create vcpu.\n");
	return vcpu_fd;
}

static void *vcpu_thread(void *arg)
{
	int vm_fd = (int)(long)arg;

	create_vcpu(vm_fd);
	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t thread;
	int vm_fd;

	(void)argc;
	(void)argv;

	vm_fd = create_vm();
	pthread_create(&thread, NULL, vcpu_thread, (void *)(long)vm_fd);
	printf("Exit.\n");
	return 0;
}

It caused by release kvm->arch.ept_identity_map_addr which is the
error page.

The parent thread can send KILL signal to the vcpu thread when it was
exiting which stops faulting pages and potentially allocating memory.
So gfn_to_pfn/gfn_to_page may fail at this time

Fixed by checking the page before it is used

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-10 11:34:11 +03:00
Ren, Yongjie 4f97704555 KVM: x86: Check INVPCID feature bit in EBX of leaf 7
Checks and operations on the INVPCID feature bit should use EBX
of CPUID leaf 7 instead of ECX.

Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Yongjie Ren <yongjien.ren@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-09 17:34:01 +03:00
Jamie Iles 749c59fd15 KVM: PIC: fix use of uninitialised variable.
Commit aea218f3cb (KVM: PIC: call ack notifiers for irqs that are
dropped form irr) used an uninitialised variable to track whether an
appropriate apic had been found.  This could result in calling the ack
notifier incorrectly.

Cc: Gleb Natapov <gleb@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-04 15:44:42 +03:00
Michael S. Tsirkin 1d92128fe9 KVM: x86: fix KVM_GET_MSR for PV EOI
KVM_GET_MSR was missing support for PV EOI,
which is needed for migration.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27 18:03:05 -03:00
Avi Kivity 5ad105e569 KVM: x86 emulator: use stack size attribute to mask rsp in stack ops
The sub-register used to access the stack (sp, esp, or rsp) is not
determined by the address size attribute like other memory references,
but by the stack segment's B bit (if not in x86_64 mode).

Fix by using the existing stack_mask() to figure out the correct mask.

This long-existing bug was exposed by a combination of a27685c33a
(emulate invalid guest state by default), which causes many more
instructions to be emulated, and a seabios change (possibly a bug) which
causes the high 16 bits of esp to become polluted across calls to real
mode software interrupts.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-22 18:54:26 -03:00
Takuya Yoshikawa 35f2d16bb9 KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
Although the possible race described in

  commit 85b7059169
  KVM: MMU: fix shrinking page from the empty mmu

was correct, the real cause of that issue was a more trivial bug of
mmu_shrink() introduced by

  commit 1952639665
  KVM: MMU: do not iterate over all VMs in mmu_shrink()

Here is the bug:

	if (kvm->arch.n_used_mmu_pages > 0) {
		if (!nr_to_scan--)
			break;
		continue;
	}

We skip VMs whose n_used_mmu_pages is not zero and try to shrink others:
in other words we try to shrink empty ones by mistake.

This patch reverses the logic so that mmu_shrink() can free pages from
the first VM whose n_used_mmu_pages is not zero.  Note that we also add
comments explaining the role of nr_to_scan which is not practically
important now, hoping this will be improved in the future.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-22 15:27:13 +03:00
Gleb Natapov 439793d4b3 KVM: x86: update KVM_SAVE_MSRS_BEGIN to correct value
When MSR_KVM_PV_EOI_EN was added to msrs_to_save array
KVM_SAVE_MSRS_BEGIN was not updated accordingly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-05 16:52:38 +03:00
Avi Kivity aa67f6096c KVM: VMX: Fix ds/es corruption on i386 with preemption
Commit b2da15ac26 ("KVM: VMX: Optimize %ds, %es reload") broke i386
in the following scenario:

  vcpu_load
  ...
  vmx_save_host_state
  vmx_vcpu_run
  (ds.rpl, es.rpl cleared by hardware)

  interrupt
    push ds, es  # pushes bad ds, es
    schedule
      vmx_vcpu_put
        vmx_load_host_state
          reload ds, es (with __USER_DS)
    pop ds, es  # of other thread's stack
    iret
  # other thread runs
  interrupt
    push ds, es
    schedule  # back in vcpu thread
    pop ds, es  # now with rpl=0
    iret
  ...
  vcpu_put
  resume_userspace
  iret  # clears ds, es due to mismatched rpl

(instead of resume_userspace, we might return with SYSEXIT and then
take an exception; when the exception IRETs we end up with cleared
ds, es)

Fix by avoiding the optimization on i386 and reloading ds, es on the
lightweight exit path.

Reported-by: Chris Clayron <chris2553@googlemail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-01 20:23:57 -03:00
Bruce Rogers 4b6486659a KVM: x86: apply kvmclock offset to guest wall clock time
When a guest migrates to a new host, the system time difference from the
previous host is used in the updates to the kvmclock system time visible
to the guest, resulting in a continuation of correct kvmclock based guest
timekeeping.

The wall clock component of the kvmclock provided time is currently not
updated with this same time offset. Since the Linux guest caches the
wall clock based time, this discrepency is not noticed until the guest is
rebooted. After reboot the guest's time calculations are off.

This patch adjusts the wall clock by the kvmclock_offset, resulting in
correct guest time after a reboot.

Cc: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-01 17:23:50 -03:00
Gleb Natapov aea218f3cb KVM: PIC: call ack notifiers for irqs that are dropped form irr
After commit 242ec97c35 PIT interrupts are no longer delivered after
PIC reset. It happens because PIT injects interrupt only if previous one
was acked, but since on PIC reset it is dropped from irr it will never
be delivered and hence acknowledged. Fix that by calling ack notifier on
PIC reset.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-26 12:19:06 +03:00
Linus Torvalds 5fecc9d8f5 KVM updates for the 3.6 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQDRDNAAoJEI7yEDeUysxlkl8P/3C2AHx2webOU8sVzhfU6ONZ
 ZoGevwBjyZIeJEmiWVpFTTEew1l0PXtpyOocXGNUXIddVnhXTQOKr/Scj4uFbmx8
 ROqgK8NSX9+xOGrBPCoN7SlJkmp+m6uYtwYkl2SGnsEVLWMKkc7J7oqmszCcTQvN
 UXMf7G47/Ul2NUSBdv4Yvizhl4kpvWxluiweDw3E/hIQKN0uyP7CY58qcAztw8nG
 csZBAnnuPFwIAWxHXW3eBBv4UP138HbNDqJ/dujjocM6GnOxmXJmcZ6b57gh+Y64
 3+w9IR4qrRWnsErb/I8inKLJ1Jdcf7yV2FmxYqR4pIXay2Yzo1BsvFd6EB+JavUv
 pJpixrFiDDFoQyXlh4tGpsjpqdXNMLqyG4YpqzSZ46C8naVv9gKE7SXqlXnjyDlb
 Llx3hb9Fop8O5ykYEGHi+gIISAK5eETiQl4yw9RUBDpxydH4qJtqGIbLiDy8y9wi
 Xyi8PBlNl+biJFsK805lxURqTp/SJTC3+Zb7A7CzYEQm5xZw3W/CKZx1ZYBfpaa/
 pWaP6tB7JwgLIVXi4HQayLWqMVwH0soZIn9yazpOEFv6qO8d5QH5RAxAW2VXE3n5
 JDlrajar/lGIdiBVWfwTJLb86gv3QDZtIWoR9mZuLKeKWE/6PRLe7HQpG1pJovsm
 2AsN5bS0BWq+aqPpZHa5
 =pECD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Avi Kivity:
 "Highlights include
  - full big real mode emulation on pre-Westmere Intel hosts (can be
    disabled with emulate_invalid_guest_state=0)
  - relatively small ppc and s390 updates
  - PCID/INVPCID support in guests
  - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on
    interrupt intensive workloads)
  - Lockless write faults during live migration
  - EPT accessed/dirty bits support for new Intel processors"

Fix up conflicts in:
 - Documentation/virtual/kvm/api.txt:

   Stupid subchapter numbering, added next to each other.

 - arch/powerpc/kvm/booke_interrupts.S:

   PPC asm changes clashing with the KVM fixes

 - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:

   Duplicated commits through the kvm tree and the s390 tree, with
   subsequent edits in the KVM tree.

* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  KVM: fix race with level interrupts
  x86, hyper: fix build with !CONFIG_KVM_GUEST
  Revert "apic: fix kvm build on UP without IOAPIC"
  KVM guest: switch to apic_set_eoi_write, apic_write
  apic: add apic_set_eoi_write for PV use
  KVM: VMX: Implement PCID/INVPCID for guests with EPT
  KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
  KVM: PPC: Critical interrupt emulation support
  KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
  KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
  KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
  KVM: PPC: bookehv64: Add support for std/ld emulation.
  booke: Added crit/mc exception handler for e500v2
  booke/bookehv: Add host crit-watchdog exception support
  KVM: MMU: document mmu-lock and fast page fault
  KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
  KVM: MMU: trace fast page fault
  KVM: MMU: fast path of handling guest page fault
  KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
  KVM: MMU: fold tlb flush judgement into mmu_spte_update
  ...
2012-07-24 12:01:20 -07:00
Michael S. Tsirkin 1a577b7247 KVM: fix race with level interrupts
When more than 1 source id is in use for the same GSI, we have the
following race related to handling irq_states race:

CPU 0 clears bit 0. CPU 0 read irq_state as 0. CPU 1 sets level to 1.
CPU 1 calls kvm_ioapic_set_irq(1). CPU 0 calls kvm_ioapic_set_irq(0).
Now ioapic thinks the level is 0 but irq_state is not 0.

Fix by performing all irq_states bitmap handling under pic/ioapic lock.
This also removes the need for atomics with irq_states handling.

Reported-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20 16:12:00 -03:00
Ingo Molnar a2fe194723 Merge branch 'linus' into perf/core
Pick up the latest ring-buffer fixes, before applying a new fix.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-18 11:17:17 +02:00
Mao, Junjie ad756a1603 KVM: VMX: Implement PCID/INVPCID for guests with EPT
This patch handles PCID/INVPCID for guests.

Process-context identifiers (PCIDs) are a facility by which a logical processor
may cache information for multiple linear-address spaces so that the processor
may retain cached information when software switches to a different linear
address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
Volume 3A for details.

For guests with EPT, the PCID feature is enabled and INVPCID behaves as running
natively.
For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.

Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-12 13:07:34 +03:00
Xiao Guangrong 6fbc277053 KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
The P bit of page fault error code is missed in this tracepoint, fix it by
passing the full error code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:22 +03:00
Xiao Guangrong a72faf2504 KVM: MMU: trace fast page fault
To see what happen on this path and help us to optimize it

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:21 +03:00
Xiao Guangrong c7ba5b48cc KVM: MMU: fast path of handling guest page fault
If the the present bit of page fault error code is set, it indicates
the shadow page is populated on all levels, it means what we do is
only modify the access bit which can be done out of mmu-lock

Currently, in order to simplify the code, we only fix the page fault
caused by write-protect on the fast path

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:20 +03:00
Xiao Guangrong 49fde3406f KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
This bit indicates whether the spte can be writable on MMU, that means
the corresponding gpte is writable and the corresponding gfn is not
protected by shadow page protection

In the later path, SPTE_MMU_WRITEABLE will indicates whether the spte
can be locklessly updated

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:19 +03:00
Xiao Guangrong 6e7d035407 KVM: MMU: fold tlb flush judgement into mmu_spte_update
mmu_spte_update() is the common function, we can easily audit the path

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:18 +03:00
Xiao Guangrong 4f5982a56a KVM: VMX: export PFEC.P bit on ept
Export the present bit of page fault error code, the later patch
will use it

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:17 +03:00
Xiao Guangrong 8e22f955fb KVM: MMU: cleanup spte_write_protect
Use __drop_large_spte to cleanup this function and comment spte_write_protect

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:16 +03:00
Xiao Guangrong d13bc5b5a1 KVM: MMU: abstract spte write-protect
Introduce a common function to abstract spte write-protect to
cleanup the code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:14 +03:00
Xiao Guangrong 2f84569f97 KVM: MMU: return bool in __rmap_write_protect
The reture value of __rmap_write_protect is either 1 or 0, use
true/false instead of these

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 16:51:13 +03:00
Avi Kivity a27685c33a KVM: VMX: Emulate invalid guest state by default
Our emulation should be complete enough that we can emulate guests
while they are in big real mode, or in a mode transition that is not
virtualizable without unrestricted guest support.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:05 +03:00
Avi Kivity 8089000616 KVM: x86 emulator: implement LTR
Opcode 0F 00 /3.  Encountered during Windows XP secondary processor bringup.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:05 +03:00
Avi Kivity 869be99c75 KVM: x86 emulator: make loading TR set the busy bit
Guest software doesn't actually depend on it, but vmx will refuse us
entry if we don't.  Set the bit in both the cached segment and memory,
just to be nice.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:05 +03:00
Avi Kivity e919464b53 KVM: x86 emulator: make read_segment_descriptor() return the address
Some operations want to modify the descriptor later on, so save the
address for future use.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:04 +03:00
Avi Kivity a14e579f22 KVM: x86 emulator: emulate LLDT
Opcode 0F 00 /2. Used by isolinux durign the protected mode transition.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:04 +03:00
Avi Kivity 9299836e63 KVM: x86 emulator: emulate BSWAP
Opcodes 0F C8 - 0F CF.

Used by the SeaBIOS cdrom code (though not in big real mode).

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:04 +03:00
Avi Kivity de5f70e0c6 KVM: VMX: Improve error reporting during invalid guest state emulation
If instruction emulation fails, report it properly to userspace.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:04 +03:00
Avi Kivity de87dcddc7 KVM: VMX: Stop invalid guest state emulation on pending event
Process the event, possibly injecting an interrupt, before continuing.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:04 +03:00
Avi Kivity 612e89f015 KVM: x86 emulator: implement ENTER
Opcode C8.

Only ENTER with lexical nesting depth 0 is implemented, since others are
very rare.  We'll fail emulation if nonzero lexical depth is used so data
is not corrupted.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:03 +03:00
Avi Kivity 51ddff50cb KVM: x86 emulator: split push logic from push opcode emulation
This allows us to reuse the code without populating ctxt->src and
overriding ctxt->op_bytes.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:03 +03:00
Avi Kivity 361cad2b50 KVM: x86 emulator: fix byte-sized MOVZX/MOVSX
Commit 2adb5ad9fe removed ByteOp from MOVZX/MOVSX, replacing them by
SrcMem8, but neglected to fix the dependency in the emulation code
on ByteOp.  This caused the instruction not to have any effect in
some circumstances.

Fix by replacing the check for ByteOp with the equivalent src.op_bytes == 1.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:03 +03:00
Avi Kivity 2dd7caa092 KVM: x86 emulator: emulate LAHF
Opcode 9F.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:03 +03:00
Avi Kivity 7c068e4558 KVM: VMX: Continue emulating after batch exhausted
If we return early from an invalid guest state emulation loop, make
sure we return to it later if the guest state is still invalid.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:03 +03:00
Avi Kivity bdea48e305 KVM: VMX: Fix interrupt exit condition during emulation
Checking EFLAGS.IF is incorrect as we might be in interrupt shadow.  If
that is the case, the main loop will notice that and not inject the interrupt,
causing an endless loop.

Fix by using vmx_interrupt_allowed() to check if we can inject an interrupt
instead.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:02 +03:00
Avi Kivity 96051572c8 KVM: x86 emulator: emulate SGDT/SIDT
Opcodes 0F 01 /0 and 0F 01 /1

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:02 +03:00
Avi Kivity a6e3407bb1 KVM: Fix SS default ESP/EBP based addressing
We correctly default to SS when BP is used as a base in 16-bit address mode,
but we don't do that for 32-bit mode.

Fix by adjusting the default to SS when either ESP or EBP is used as the base
register.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:02 +03:00
Avi Kivity f47cfa3174 KVM: x86 emulator: emulate LEAVE
Opcode c9; used by some variants of Windows during boot, in big real mode.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:01 +03:00
Avi Kivity b8405c184b KVM: VMX: Limit iterations with emulator_invalid_guest_state
Otherwise, if the guest ends up looping, we never exit the srcu critical
section, which causes synchronize_srcu() to hang.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:01 +03:00
Avi Kivity f0495f9b99 KVM: VMX: Relax check on unusable segment
Some userspace (e.g. QEMU 1.1) munge the d and g bits of segment
descriptors, causing us not to recognize them as unusable segments
with emulate_invalid_guest_state=1.  Relax the check by testing for
segment not present (a non-present segment cannot be usable).

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:01 +03:00
Avi Kivity 510425ff33 KVM: x86 emulator: fix LIDT/LGDT in long mode
The operand size for these instructions is 8 bytes in long mode, even without
a REX prefix.  Set it explicitly.

Triggered while booting Linux with emulate_invalid_guest_state=1.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:01 +03:00
Avi Kivity 79d5b4c3cd KVM: x86 emulator: allow loading null SS in long mode
Null SS is valid in long mode; allow loading it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:01 +03:00
Avi Kivity 6d6eede4a0 KVM: x86 emulator: emulate cpuid
Opcode 0F A2.

Used by Linux during the mode change trampoline while in a state that is
not virtualizable on vmx without unrestricted_guest, so we need to emulate
it is emulate_invalid_guest_state=1.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:00 +03:00
Avi Kivity 0017f93a27 KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semantics
Instead of getting an exact leaf, follow the spec and fall back to the last
main leaf instead.  This lets us easily emulate the cpuid instruction in the
emulator.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:00 +03:00
Avi Kivity 62046e5a86 KVM: Split cpuid register access from computation
Introduce kvm_cpuid() to perform the leaf limit check and calculate
register values, and let kvm_emulate_cpuid() just handle reading and
writing the registers from/to the vcpu.  This allows us to reuse
kvm_cpuid() in a context where directly reading and writing registers
is not desired.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:00 +03:00
Avi Kivity d881e6f6cf KVM: VMX: Return correct CPL during transition to protected mode
In protected mode, the CPL is defined as the lower two bits of CS, as set by
the last far jump.  But during the transition to protected mode, there is no
last far jump, so we need to return zero (the inherited real mode CPL).

Fix by reading CPL from the cache during the transition.  This isn't 100%
correct since we don't set the CPL cache on a far jump, but since protected
mode transition will always jump to a segment with RPL=0, it will always
work.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:19:00 +03:00
Avi Kivity e676505ac9 KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation
Currently the MMU's ->new_cr3() callback does nothing when guest paging
is disabled or when two-dimentional paging (e.g. EPT on Intel) is active.
This means that an emulated write to cr3 can be lost; kvm_set_cr3() will
write vcpu-arch.cr3, but the GUEST_CR3 field in the VMCS will retain its
old value and this is what the guest sees.

This bug did not have any effect until now because:
- with unrestricted guest, or with svm, we never emulate a mov cr3 instruction
- without unrestricted guest, and with paging enabled, we also never emulate a
  mov cr3 instruction
- without unrestricted guest, but with paging disabled, the guest's cr3 is
  ignored until the guest enables paging; at this point the value from arch.cr3
  is loaded correctly my the mov cr0 instruction which turns on paging

However, the patchset that enables big real mode causes us to emulate mov cr3
instructions in protected mode sometimes (when guest state is not virtualizable
by vmx); this mov cr3 is effectively ignored and will crash the guest.

The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3
reload.  This is awkward because now all the new_cr3 callbacks to the same
thing, and because mmu_free_roots() is somewhat of an overkill; but fixing
that is more complicated and will be done after this minimal fix.

Observed in the Window XP 32-bit installer while bringing up secondary vcpus.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09 14:18:59 +03:00
Ingo Molnar 35c2f48c66 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Pull tracing updates from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06 11:12:17 +02:00