[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
/*
 * Kernel-based Virtual Machine driver for Linux
 *
 * This module enables machines with Intel VT-x extensions to run virtual
 * machines without emulation or binary translation.
 *
 * MMU support
 *
 * Copyright (C) 2006 Qumranet, Inc.
2010-05-23 23:37:00 +08:00
 * Copyright 2010 Red Hat, Inc. and/or its affiliates.
2006-12-10 18:21:36 +08:00
 *
 * Authors:
 *   Yaniv Kamay <yaniv@qumranet.com>
 *   Avi Kivity <avi@qumranet.com>
 *
 * This work is licensed under the terms of the GNU GPL, version 2. See
 * the COPYING file in the top-level directory.
 *
 */
2007-06-29 02:15:57 +08:00

2007-12-14 09:35:10 +08:00
#include "mmu.h"
2010-01-21 21:31:49 +08:00
#include "x86.h"
2009-06-01 03:58:47 +08:00
#include "kvm_cache_regs.h"
2007-06-29 02:15:57 +08:00

2007-12-16 17:02:48 +08:00
#include <linux/kvm_host.h>
2006-12-10 18:21:36 +08:00
#include <linux/types.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/module.h>
2007-11-26 20:08:14 +08:00
#include <linux/swap.h>
2008-02-23 22:44:30 +08:00
#include <linux/hugetlb.h>
2008-02-23 01:21:37 +08:00
#include <linux/compiler.h>
2009-12-24 00:35:21 +08:00
#include <linux/srcu.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
#include <linux/slab.h>
2010-05-31 14:28:19 +08:00
#include <linux/uaccess.h>
2006-12-10 18:21:36 +08:00

2007-06-29 02:15:57 +08:00
#include <asm/page.h>
#include <asm/cmpxchg.h>
2007-11-21 20:08:40 +08:00
#include <asm/io.h>
2008-11-18 05:03:13 +08:00
#include <asm/vmx.h>
2006-12-10 18:21:36 +08:00

2008-02-07 20:47:41 +08:00
/*
 * When setting this variable to true it enables Two-Dimensional-Paging
 * where the hardware walks 2 page tables:
 * 1. the guest-virtual to guest-physical
 * 2. while doing 1. it walks guest-physical to host-physical
 * If the hardware supports that we don't need to do shadow paging.
 */
2008-02-23 01:21:37 +08:00
bool tdp_enabled = false;
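The two-table walk described in the comment above can be sketched in user space. This is a toy model under stated assumptions, not KVM code: `struct toy_machine`, every `toy_*` function, the single-level tables, and the 8-page address spaces are all hypothetical and exist only for illustration. The point it tries to show is that the guest page table itself lives in guest-physical memory, so even reading one guest entry (walk 1) needs a guest-physical to host-physical translation (walk 2).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)
#define TOY_NPAGES     8u		/* pages per toy address space */
#define TOY_INVALID    UINT32_MAX	/* "not present" marker */

/* Hypothetical toy machine for illustration; not a KVM data structure. */
struct toy_machine {
	uint32_t nested[TOY_NPAGES];	/* table 2: guest frame -> host frame */
	uint32_t host_mem[TOY_NPAGES * TOY_PAGE_SIZE / 4]; /* host memory, 32-bit words */
	uint32_t guest_pt_gpa;		/* guest-physical address of table 1 */
};

/* Walk table 2: guest-physical address -> host-physical address. */
uint32_t toy_gpa_to_hpa(const struct toy_machine *m, uint32_t gpa)
{
	uint32_t gfn = gpa >> TOY_PAGE_SHIFT;

	if (gfn >= TOY_NPAGES || m->nested[gfn] >= TOY_NPAGES)
		return TOY_INVALID;
	return (m->nested[gfn] << TOY_PAGE_SHIFT) | (gpa & (TOY_PAGE_SIZE - 1));
}

/*
 * Walk table 1 (guest-virtual -> guest-physical). The guest entry is
 * addressed guest-physically, so it is located through table 2 first,
 * and the resulting guest-physical address goes through table 2 again.
 */
uint32_t toy_gva_to_hpa(const struct toy_machine *m, uint32_t gva)
{
	uint32_t vpn = gva >> TOY_PAGE_SHIFT;
	uint32_t pte_hpa, gfn;

	if (vpn >= TOY_NPAGES)
		return TOY_INVALID;
	pte_hpa = toy_gpa_to_hpa(m, m->guest_pt_gpa + vpn * 4);
	if (pte_hpa == TOY_INVALID)
		return TOY_INVALID;
	gfn = m->host_mem[pte_hpa / 4];
	if (gfn >= TOY_NPAGES)
		return TOY_INVALID;
	return toy_gpa_to_hpa(m, (gfn << TOY_PAGE_SHIFT) |
				 (gva & (TOY_PAGE_SIZE - 1)));
}

/* Mark everything not present; the guest page table sits at guest frame 0. */
void toy_init(struct toy_machine *m)
{
	memset(m, 0xff, sizeof(*m));	/* every 32-bit word becomes TOY_INVALID */
	m->guest_pt_gpa = 0;
}

/* Host side: back guest frame gfn with host frame hfn. */
void toy_map_nested(struct toy_machine *m, uint32_t gfn, uint32_t hfn)
{
	assert(gfn < TOY_NPAGES && hfn < TOY_NPAGES);
	m->nested[gfn] = hfn;
}

/* Guest side: map guest-virtual page vpn to guest frame gfn. */
void toy_map_guest(struct toy_machine *m, uint32_t vpn, uint32_t gfn)
{
	uint32_t pte_hpa;

	assert(vpn < TOY_NPAGES && gfn < TOY_NPAGES);
	pte_hpa = toy_gpa_to_hpa(m, m->guest_pt_gpa + vpn * 4);
	assert(pte_hpa != TOY_INVALID);	/* the table's own frame must be backed */
	m->host_mem[pte_hpa / 4] = gfn;
}
```

A caller would run `toy_init()`, back guest frames 0 and 1 with `toy_map_nested()`, install one guest mapping with `toy_map_guest()`, and then translate with `toy_gva_to_hpa()`. With shadow paging, by contrast, software would have to maintain a combined guest-virtual to host-physical table itself.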
2008-02-07 20:47:41 +08:00

2007-01-06 08:36:56 +08:00
#undef MMU_DEBUG

#undef AUDIT

#ifdef AUDIT
static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg);
#else
static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) {}
#endif

#ifdef MMU_DEBUG

#define pgprintk(x...) do { if (dbg) printk(x); } while (0)
#define rmap_printk(x...) do { if (dbg) printk(x); } while (0)

#else

#define pgprintk(x...) do { } while (0)
#define rmap_printk(x...) do { } while (0)

#endif

#if defined(MMU_DEBUG) || defined(AUDIT)
2008-06-22 21:45:24 +08:00
static int dbg = 0;
module_param(dbg, bool, 0644);
2007-01-06 08:36:56 +08:00
#endif
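The `pgprintk`/`rmap_printk` pattern above (a debug print that compiles to an empty `do { } while (0)` statement unless a debug symbol is defined) can be tried outside the kernel. The following is a minimal user-space sketch under assumptions: `DEMO_DEBUG`, `demo_printk`, `demo_dbg`, and the other `demo_*` names are hypothetical, `fprintf` stands in for `printk`, and C99 `__VA_ARGS__` replaces the GNU `x...` form used in the source.

```c
#include <assert.h>
#include <stdio.h>

/* Left undefined on purpose, in the spirit of "#undef MMU_DEBUG". */
#undef DEMO_DEBUG

#ifdef DEMO_DEBUG
static int demo_dbg = 1;	/* runtime switch, only exists in debug builds */
#define demo_printk(...) \
	do { if (demo_dbg) fprintf(stderr, __VA_ARGS__); } while (0)
#else
#define demo_printk(...) do { } while (0)
#endif

static int demo_calls;		/* counts how often the argument expression ran */

int demo_count_call(void)
{
	return ++demo_calls;
}

/* Returns how many times the debug argument was evaluated, or -1. */
int demo_run(int verbose)
{
	demo_calls = 0;
	if (verbose)
		demo_printk("call number %d\n", demo_count_call());
	else
		demo_calls = -1;	/* still pairs with "if (verbose)" */
	return demo_calls;
}

/* With DEMO_DEBUG undefined, the argument expression is never evaluated. */
void demo_selftest(void)
{
	assert(demo_run(1) == 0);
	assert(demo_run(0) == -1);
}
```

Two properties are worth noting. In the disabled build the arguments vanish entirely, so they cost nothing and any side effects in them do not happen. And because both variants expand to exactly one statement, the macro can sit in an unbraced `if`/`else` without changing which `if` the `else` belongs to.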
2006-12-10 18:21:36 +08:00

2008-09-24 00:18:41 +08:00
static int oos_shadow = 1;
module_param(oos_shadow, bool, 0644);

2007-04-25 14:17:25 +08:00
#ifndef MMU_DEBUG
#define ASSERT(x) do { } while (0)
#else
2006-12-10 18:21:36 +08:00
#define ASSERT(x)							\
	if (!(x)) {							\
		printk(KERN_WARNING "assertion failed %s:%d: %s\n",	\
		       __FILE__, __LINE__, #x);				\
	}
2007-04-25 14:17:25 +08:00
#endif
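The debug-build `ASSERT` above logs a warning and lets execution continue rather than stopping. Below is a user-space sketch of that "warn and carry on" style, under assumptions: `DEMO_ASSERT`, `demo_assert_failures`, and `demo_check` are hypothetical names, `fprintf` stands in for `printk`, and the body is wrapped in `do { } while (0)`, which is a variation on the macro in the source, not a copy of it. The failure counter is added only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdio.h>

static int demo_assert_failures;	/* hypothetical counter, for illustration */

/* Log the failed expression with its location, then keep running. */
#define DEMO_ASSERT(x)							\
	do {								\
		if (!(x)) {						\
			demo_assert_failures++;				\
			fprintf(stderr, "assertion failed %s:%d: %s\n",	\
				__FILE__, __LINE__, #x);		\
		}							\
	} while (0)

/* Returns -1 for level <= 0, else the running count of failed checks. */
int demo_check(int level)
{
	if (level > 0)
		DEMO_ASSERT(level <= 4);
	else
		return -1;	/* pairs with "if (level > 0)" */
	return demo_assert_failures;
}
```

The `#x` operator turns the checked expression into the string that is printed. The `do { } while (0)` wrapper is what allows the unbraced `if`/`else` in `demo_check()`: a macro that expands to a bare `if (...) { ... }` followed by the caller's `;` would leave the `else` without a matching `if` and fail to compile.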
|
2006-12-10 18:21:36 +08:00

#define PT_FIRST_AVAIL_BITS_SHIFT 9
#define PT64_SECOND_AVAIL_BITS_SHIFT 52

#define VALID_PAGE(x) ((x) != INVALID_PAGE)

#define PT64_LEVEL_BITS 9

#define PT64_LEVEL_SHIFT(level) \
2007-10-08 21:02:08 +08:00
		(PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)
2006-12-10 18:21:36 +08:00

#define PT64_LEVEL_MASK(level) \
		(((1ULL << PT64_LEVEL_BITS) - 1) << PT64_LEVEL_SHIFT(level))

#define PT64_INDEX(address, level)\
	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))


#define PT32_LEVEL_BITS 10

#define PT32_LEVEL_SHIFT(level) \
2007-10-08 21:02:08 +08:00
		(PAGE_SHIFT + (level - 1) * PT32_LEVEL_BITS)
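As a worked illustration of the index macros above, the following sketch splits a 64-bit address into its four 9-bit table indices. It is a standalone user-space approximation, not kernel code: `DEMO_PAGE_SHIFT` (assumed to be 12, i.e. 4 KiB pages) and `demo_pt64_indices()` are hypothetical names introduced here, and the macro argument `level` is parenthesized for safety.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the page offset occupies the low 12 bits. */
#define DEMO_PAGE_SHIFT 12

/* Same shape as the macros above, restated so the sketch builds on its own. */
#define PT64_LEVEL_BITS 9
#define PT64_LEVEL_SHIFT(level) \
	(DEMO_PAGE_SHIFT + ((level) - 1) * PT64_LEVEL_BITS)
#define PT64_INDEX(address, level) \
	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))

/* Hypothetical helper: out[0] is the level-4 index, out[3] the level-1 index. */
static void demo_pt64_indices(uint64_t addr, unsigned out[4])
{
	int level;

	for (level = 4; level >= 1; level--)
		out[4 - level] = (unsigned)PT64_INDEX(addr, level);
}
```

With these assumptions the shifts for levels 4 through 1 are 39, 30, 21 and 12, so an address built as `(1ULL << 39) | (2ULL << 30) | (3ULL << 21) | (4ULL << 12)` yields the indices 1, 2, 3, 4.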
2006-12-10 18:21:36 +08:00

#define PT32_LEVEL_MASK(level) \
		(((1ULL << PT32_LEVEL_BITS) - 1) << PT32_LEVEL_SHIFT(level))
2009-07-27 22:30:45 +08:00
#define PT32_LVL_OFFSET_MASK(level) \
	(PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
					    * PT32_LEVEL_BITS))) - 1))
2006-12-10 18:21:36 +08:00

#define PT32_INDEX(address, level)\
	(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))


2007-03-09 19:04:31 +08:00
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
2006-12-10 18:21:36 +08:00
#define PT64_DIR_BASE_ADDR_MASK \
	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
2009-07-27 22:30:45 +08:00
#define PT64_LVL_ADDR_MASK(level) \
	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
						* PT64_LEVEL_BITS))) - 1))
#define PT64_LVL_OFFSET_MASK(level) \
	(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
						* PT64_LEVEL_BITS))) - 1))
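To make the per-level masks above concrete, this sketch evaluates them in user space. `DEMO_PAGE_SHIFT`, `DEMO_PAGE_SIZE` and the two `demo_*` wrappers are hypothetical names, and 4 KiB pages are assumed; the mask expressions themselves are copied from the definitions above.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

#define PT64_LEVEL_BITS 9
#define PT64_BASE_ADDR_MASK \
	(((1ULL << 52) - 1) & ~(uint64_t)(DEMO_PAGE_SIZE - 1))
#define PT64_LVL_ADDR_MASK(level) \
	(PT64_BASE_ADDR_MASK & ~((1ULL << (DEMO_PAGE_SHIFT + (((level) - 1) \
						* PT64_LEVEL_BITS))) - 1))
#define PT64_LVL_OFFSET_MASK(level) \
	(PT64_BASE_ADDR_MASK & ((1ULL << (DEMO_PAGE_SHIFT + (((level) - 1) \
						* PT64_LEVEL_BITS))) - 1))

/* Hypothetical wrappers so the masks can be inspected as plain values. */
static uint64_t demo_lvl_addr_mask(int level)
{
	return PT64_LVL_ADDR_MASK(level);
}

static uint64_t demo_lvl_offset_mask(int level)
{
	return PT64_LVL_OFFSET_MASK(level);
}
```

Under these assumptions, level 1 gives an address mask covering bits 12 to 51 and an empty offset mask, while level 2 (a 2 MiB mapping) gives an address mask covering bits 21 to 51 and an offset mask covering bits 12 to 20, i.e. the 4 KiB frame number inside the large page.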
2006-12-10 18:21:36 +08:00

#define PT32_BASE_ADDR_MASK PAGE_MASK
#define PT32_DIR_BASE_ADDR_MASK \
	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
2009-07-27 22:30:45 +08:00
#define PT32_LVL_ADDR_MASK(level) \
	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
					    * PT32_LEVEL_BITS))) - 1))
2006-12-10 18:21:36 +08:00

2007-11-21 08:06:21 +08:00
#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
			| PT64_NX_MASK)
2006-12-10 18:21:36 +08:00

2007-01-06 08:36:38 +08:00
#define RMAP_EXT 4

2007-12-09 22:15:46 +08:00
#define ACC_EXEC_MASK 1
#define ACC_WRITE_MASK PT_WRITABLE_MASK
#define ACC_USER_MASK PT_USER_MASK
#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)

2009-12-31 18:10:16 +08:00
#include <trace/events/kvm.h>

2009-07-06 17:21:32 +08:00
#define CREATE_TRACE_POINTS
#include "mmutrace.h"

2009-09-24 02:47:17 +08:00
#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)

2008-08-21 22:49:56 +08:00
#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)

2007-01-06 08:36:38 +08:00
struct kvm_rmap_desc {
2009-06-10 19:24:23 +08:00
	u64 *sptes[RMAP_EXT];
2007-01-06 08:36:38 +08:00
	struct kvm_rmap_desc *more;
};

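The descriptor above holds up to `RMAP_EXT` shadow pte pointers and chains to a further descriptor through `more`. The sketch below shows one plausible way to append to and count entries in such a chain. It is an assumption-laden user-space model: `demo_rmap_desc`, `demo_rmap_add()` and `demo_rmap_count()` are hypothetical, `calloc()` stands in for whatever allocator the kernel uses, and the kernel's own add and remove routines are not shown in this section and may differ.

```c
#include <stdint.h>
#include <stdlib.h>

#define RMAP_EXT 4

/* User-space stand-in for struct kvm_rmap_desc. */
struct demo_rmap_desc {
	uint64_t *sptes[RMAP_EXT];
	struct demo_rmap_desc *more;
};

/* Hypothetical: append spte to the chain; returns 0, or -1 if allocation fails. */
static int demo_rmap_add(struct demo_rmap_desc **head, uint64_t *spte)
{
	struct demo_rmap_desc *desc = *head;
	int i;

	if (!desc) {
		desc = calloc(1, sizeof(*desc));
		if (!desc)
			return -1;
		*head = desc;
	}
	/* Skip descriptors whose last slot is already taken. */
	while (desc->sptes[RMAP_EXT - 1] && desc->more)
		desc = desc->more;
	if (desc->sptes[RMAP_EXT - 1]) {
		/* The tail is full: chain a fresh, zeroed descriptor. */
		desc->more = calloc(1, sizeof(*desc));
		if (!desc->more)
			return -1;
		desc = desc->more;
	}
	/* The last slot is free here, so this scan stops inside the array. */
	for (i = 0; desc->sptes[i]; i++)
		;
	desc->sptes[i] = spte;
	return 0;
}

/* Hypothetical: count the entries stored across the whole chain. */
static int demo_rmap_count(const struct demo_rmap_desc *desc)
{
	int n = 0, i;

	for (; desc; desc = desc->more)
		for (i = 0; i < RMAP_EXT && desc->sptes[i]; i++)
			n++;
	return n;
}
```

In this model a fifth entry forces a second descriptor, so small reverse maps stay in one allocation and only heavily shared pages pay for a longer chain.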
2008-12-25 20:39:47 +08:00
struct kvm_shadow_walk_iterator {
	u64 addr;
	hpa_t shadow_addr;
	int level;
	u64 *sptep;
	unsigned index;
};

#define for_each_shadow_entry(_vcpu, _addr, _walker) \
	for (shadow_walk_init(&(_walker), _vcpu, _addr); \
	     shadow_walk_okay(&(_walker)); \
	     shadow_walk_next(&(_walker)))

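The `for_each_shadow_entry` macro above packages an init/okay/next triple into a `for` loop. The following sketch reproduces only that control-flow pattern with a toy iterator. All `demo_*` names are hypothetical, four-level paging with 4 KiB pages is assumed, and the real walker's table lookups (`shadow_addr`, `sptep`) and any early termination are deliberately left out.

```c
#include <stdint.h>

/* Toy iterator: tracks only the address, the current level and its index. */
struct demo_walk_iterator {
	uint64_t addr;
	int level;
	unsigned index;
};

static void demo_walk_init(struct demo_walk_iterator *it, uint64_t addr)
{
	it->addr = addr;
	it->level = 4;	/* assumption: four-level paging */
	it->index = 0;
}

/* Returns nonzero while a level remains, filling in that level's index. */
static int demo_walk_okay(struct demo_walk_iterator *it)
{
	if (it->level < 1)
		return 0;
	it->index = (unsigned)((it->addr >> (12 + (it->level - 1) * 9)) & 511);
	return 1;
}

static void demo_walk_next(struct demo_walk_iterator *it)
{
	--it->level;
}

#define demo_for_each_entry(_addr, _walker) \
	for (demo_walk_init(&(_walker), _addr); \
	     demo_walk_okay(&(_walker)); \
	     demo_walk_next(&(_walker)))

/* Collect the per-level indices, top level first; returns how many were visited. */
static int demo_collect(uint64_t addr, unsigned out[4])
{
	struct demo_walk_iterator it;
	int n = 0;

	demo_for_each_entry(addr, it)
		out[n++] = it.index;
	return n;
}
```

Because the macro expands to a plain `for` statement, the caller supplies the loop body and may `break` out of it, which is a common reason to prefer this shape over a callback.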
2010-04-16 21:29:17 +08:00
typedef int (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp);
2008-09-24 00:18:36 +08:00

2007-04-15 21:31:09 +08:00
static struct kmem_cache *pte_chain_cache;
static struct kmem_cache *rmap_desc_cache;
2007-05-30 17:34:53 +08:00
static struct kmem_cache *mmu_page_header_cache;
2007-04-15 21:31:09 +08:00

KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
static u64 __read_mostly shadow_trap_nonpresent_pte;
static u64 __read_mostly shadow_notrap_nonpresent_pte;
2008-04-25 21:13:50 +08:00
static u64 __read_mostly shadow_base_present_pte;
static u64 __read_mostly shadow_nx_mask;
static u64 __read_mostly shadow_x_mask;	/* mutual exclusive with nx_mask */
static u64 __read_mostly shadow_user_mask;
static u64 __read_mostly shadow_accessed_mask;
static u64 __read_mostly shadow_dirty_mask;
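The commit message for the not-present fault bypass describes three steps: pick a placeholder shadow pte according to the guest pte's present bit, let the hardware report that bit in the fault error code, and intercept only faults that report it set. A minimal model of that decision logic might look like the sketch below. Every `demo_*` and `DEMO_*` name is hypothetical, and bit 51 is merely an assumed stand-in for "one of the reserved bits"; the values the driver actually stores in `shadow_trap_nonpresent_pte` and `shadow_notrap_nonpresent_pte` are not shown in this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PT_PRESENT_MASK	(1ULL << 0)
#define DEMO_RSVD_BIT		(1ULL << 51)	/* assumption: treated as reserved */
#define DEMO_PFERR_PRESENT	(1u << 0)	/* present bit of the fault error code */

/* Placeholder shadow pte for a guest pte that has not been shadowed yet. */
static uint64_t demo_placeholder_spte(uint64_t guest_pte)
{
	if (!(guest_pte & DEMO_PT_PRESENT_MASK))
		return 0;	/* guest pte absent: leave the shadow pte not present */
	/* Guest pte present: mark present and set a reserved bit to force a fault. */
	return DEMO_PT_PRESENT_MASK | DEMO_RSVD_BIT;
}

/* Present bit of the error code a fault through this placeholder would carry. */
static uint32_t demo_fault_error_code(uint64_t spte)
{
	return (spte & DEMO_PT_PRESENT_MASK) ? DEMO_PFERR_PRESENT : 0;
}

/* Model of the filter: only faults with the present bit set reach the host. */
static bool demo_fault_exits_to_host(uint32_t error_code)
{
	return (error_code & DEMO_PFERR_PRESENT) != 0;
}
```

In this model a fault on a page the guest itself has not mapped is delivered straight to the guest, while a fault on a page the guest has mapped but the host has not yet shadowed still reaches the host so the real shadow pte can be installed.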
2007-09-17 00:58:32 +08:00

2009-03-30 16:21:08 +08:00
static inline u64 rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}
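A short usage sketch for `rsvd_bits()`: it returns a mask with bits `s` through `e` (inclusive) set. The helper `high_addr_rsvd` and the choice of bit 51 as the top physical-address bit are assumptions made for illustration, not something the listing shows.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Same expression as in the listing: a mask with bits s..e (inclusive) set. */
static inline u64 rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Hypothetical helper: physical-address bits from maxphyaddr up to bit 51,
 * which a pte must leave clear on a CPU with that address width.
 */
static inline u64 high_addr_rsvd(int maxphyaddr)
{
	return rsvd_bits(maxphyaddr, 51);
}
```

Note that the expression shifts by `e - s + 1`, so a full 64-bit span (`s = 0`, `e = 63`) would shift by 64, which is undefined in C; callers are expected to ask for narrower ranges.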
2007-09-17 00:58:32 +08:00
void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
{
	shadow_trap_nonpresent_pte = trap_pte;
	shadow_notrap_nonpresent_pte = notrap_pte;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);

2008-04-25 21:13:50 +08:00
void kvm_mmu_set_base_ptes(u64 base_pte)
{
	shadow_base_present_pte = base_pte;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);

void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
2009-04-27 20:35:42 +08:00
		u64 dirty_mask, u64 nx_mask, u64 x_mask)
2008-04-25 21:13:50 +08:00
{
	shadow_user_mask = user_mask;
	shadow_accessed_mask = accessed_mask;
	shadow_dirty_mask = dirty_mask;
	shadow_nx_mask = nx_mask;
	shadow_x_mask = x_mask;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);

2010-05-12 16:48:18 +08:00
static bool is_write_protection(struct kvm_vcpu *vcpu)
2006-12-10 18:21:36 +08:00
{
2009-12-30 00:07:30 +08:00
	return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
2006-12-10 18:21:36 +08:00
}

static int is_cpuid_PSE36(void)
{
	return 1;
}

2007-01-26 16:56:41 +08:00
static int is_nx(struct kvm_vcpu *vcpu)
{
2010-01-21 21:31:50 +08:00
	return vcpu->arch.efer & EFER_NX;
2007-01-26 16:56:41 +08:00
}

2007-09-17 00:58:32 +08:00
static int is_shadow_present_pte(u64 pte)
{
	return pte != shadow_trap_nonpresent_pte
		&& pte != shadow_notrap_nonpresent_pte;
}

2008-02-23 22:44:30 +08:00
static int is_large_pte(u64 pte)
{
	return pte & PT_PAGE_SIZE_MASK;
}

2010-01-18 17:45:10 +08:00
static int is_writable_pte(unsigned long pte)
2006-12-10 18:21:36 +08:00
{
	return pte & PT_WRITABLE_MASK;
}

2009-06-10 19:12:05 +08:00
static int is_dirty_gpte(unsigned long pte)
2007-10-11 18:32:30 +08:00
{
2009-06-10 17:56:54 +08:00
	return pte & PT_DIRTY_MASK;
2007-10-11 18:32:30 +08:00
}

2009-06-10 19:12:05 +08:00
static int is_rmap_spte(u64 pte)
2007-01-06 08:36:38 +08:00
{
2008-03-23 18:18:19 +08:00
	return is_shadow_present_pte(pte);
2007-01-06 08:36:38 +08:00
}

2009-06-10 23:27:03 +08:00
static int is_last_spte(u64 pte, int level)
{
	if (level == PT_PAGE_TABLE_LEVEL)
		return 1;
2009-07-27 22:30:44 +08:00
	if (is_large_pte(pte))
2009-06-10 23:27:03 +08:00
		return 1;
	return 0;
}

2008-04-03 03:46:56 +08:00
static pfn_t spte_to_pfn(u64 pte)
2008-03-23 21:06:23 +08:00
{
2008-04-03 03:46:56 +08:00
	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
2008-03-23 21:06:23 +08:00
}

2007-11-21 19:54:47 +08:00
static gfn_t pse36_gfn_delta(u32 gpte)
{
	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;

	return (gpte & PT32_DIR_PSE36_MASK) << shift;
}
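A worked example of the PSE-36 arithmetic may help here. The constant values below are assumptions (the listing uses the names without showing their definitions): a 4-bit field starting at bit 13 of the 32-bit page directory entry, holding physical address bits 32 to 35, and 4 KiB base pages.

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumed values; not shown in the listing. */
#define PAGE_SHIFT           12
#define PT32_DIR_PSE36_SIZE  4
#define PT32_DIR_PSE36_SHIFT 13
#define PT32_DIR_PSE36_MASK \
	(((1ULL << PT32_DIR_PSE36_SIZE) - 1) << PT32_DIR_PSE36_SHIFT)

/*
 * Move the PSE-36 field from bit 13 of the pde to bit 20 of the frame
 * number, which corresponds to bit 32 of the physical address.
 */
static gfn_t pse36_gfn_delta(uint32_t gpte)
{
	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;	/* 7 */

	return (gpte & PT32_DIR_PSE36_MASK) << shift;
}
```

Under these assumptions a pde with only bit 13 set contributes a frame-number delta of `1 << 20`, that is, 4 GiB once multiplied by the page size.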

2009-06-10 19:24:23 +08:00
static void __set_spte(u64 *sptep, u64 spte)
2007-05-31 20:46:04 +08:00
{
#ifdef CONFIG_X86_64
	set_64bit((unsigned long *)sptep, spte);
#else
	set_64bit((unsigned long long *)sptep, spte);
#endif
}

2007-01-06 08:36:54 +08:00
static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
2007-09-10 16:28:17 +08:00
				  struct kmem_cache *base_cache, int min)
2007-01-06 08:36:53 +08:00
{
	void *obj;

	if (cache->nobjs >= min)
2007-01-06 08:36:54 +08:00
		return 0;
2007-01-06 08:36:53 +08:00
	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
2007-09-10 16:28:17 +08:00
		obj = kmem_cache_zalloc(base_cache, GFP_KERNEL);
2007-01-06 08:36:53 +08:00
		if (!obj)
2007-01-06 08:36:54 +08:00
			return -ENOMEM;
2007-01-06 08:36:53 +08:00
		cache->objects[cache->nobjs++] = obj;
	}
2007-01-06 08:36:54 +08:00
	return 0;
2007-01-06 08:36:53 +08:00
}

2010-05-13 10:06:02 +08:00
static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
				  struct kmem_cache *cache)
2007-01-06 08:36:53 +08:00
{
	while (mc->nobjs)
2010-05-13 10:06:02 +08:00
		kmem_cache_free(cache, mc->objects[--mc->nobjs]);
2007-01-06 08:36:53 +08:00
}

2007-07-20 13:18:27 +08:00
static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
2007-09-10 16:28:17 +08:00
				       int min)
2007-07-20 13:18:27 +08:00
{
	struct page *page;

	if (cache->nobjs >= min)
		return 0;
	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
2007-09-10 16:28:17 +08:00
		page = alloc_page(GFP_KERNEL);
2007-07-20 13:18:27 +08:00
		if (!page)
			return -ENOMEM;
		cache->objects[cache->nobjs++] = page_address(page);
	}
	return 0;
}

static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc)
{
	while (mc->nobjs)
2007-07-21 14:06:46 +08:00
		free_page((unsigned long)mc->objects[--mc->nobjs]);
2007-07-20 13:18:27 +08:00
}

2007-09-10 16:28:17 +08:00
static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
2007-01-06 08:36:53 +08:00
{
2007-01-06 08:36:54 +08:00
	int r;

2007-12-13 23:50:52 +08:00
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_chain_cache,
2007-09-10 16:28:17 +08:00
				   pte_chain_cache, 4);
2007-01-06 08:36:54 +08:00
	if (r)
		goto out;
2007-12-13 23:50:52 +08:00
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_rmap_desc_cache,
2008-10-29 04:16:58 +08:00
				   rmap_desc_cache, 4);
2007-05-30 17:34:53 +08:00
	if (r)
		goto out;
2007-12-13 23:50:52 +08:00
	r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
2007-05-30 17:34:53 +08:00
	if (r)
		goto out;
2007-12-13 23:50:52 +08:00
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
2007-09-10 16:28:17 +08:00
				   mmu_page_header_cache, 4);
2007-01-06 08:36:54 +08:00
out:
	return r;
2007-01-06 08:36:53 +08:00
}

static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
2010-05-13 10:06:02 +08:00
	mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache, pte_chain_cache);
	mmu_free_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, rmap_desc_cache);
2007-12-13 23:50:52 +08:00
	mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
2010-05-13 10:06:02 +08:00
	mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache,
				mmu_page_header_cache);
2007-01-06 08:36:53 +08:00
}

static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
				    size_t size)
{
	void *p;

	BUG_ON(!mc->nobjs);
	p = mc->objects[--mc->nobjs];
	return p;
}
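The top-up and allocate functions above follow a "preallocate, then allocate without failure" pattern: the caches are filled where an allocation is allowed to fail, so that later allocations can simply pop an object. A minimal userspace sketch of that pattern follows, assuming `calloc`/`free` in place of the kernel allocators. The names `obj_cache`, `cache_topup`, `cache_alloc`, `cache_free_all` and the capacity of 8 are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define CACHE_CAPACITY 8	/* hypothetical; stands in for ARRAY_SIZE(cache->objects) */

struct obj_cache {
	int   nobjs;
	void *objects[CACHE_CAPACITY];
};

/* If fewer than min objects are cached, refill the cache to capacity. */
static int cache_topup(struct obj_cache *c, size_t size, int min)
{
	if (c->nobjs >= min)
		return 0;
	while (c->nobjs < CACHE_CAPACITY) {
		void *obj = calloc(1, size);

		if (!obj)
			return -ENOMEM;
		c->objects[c->nobjs++] = obj;
	}
	return 0;
}

/* Cannot fail: the caller must have topped up the cache beforehand. */
static void *cache_alloc(struct obj_cache *c)
{
	assert(c->nobjs > 0);
	return c->objects[--c->nobjs];
}

/* Release every object still held by the cache. */
static void cache_free_all(struct obj_cache *c)
{
	while (c->nobjs)
		free(c->objects[--c->nobjs]);
}
```

The `min` argument is a low-water mark rather than a target: once the cache drops below it, the refill goes all the way to capacity, which keeps the number of refill passes small.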

static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu)
{
2007-12-13 23:50:52 +08:00
	return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_chain_cache,
2007-01-06 08:36:53 +08:00
				      sizeof(struct kvm_pte_chain));
}

2007-07-17 18:04:56 +08:00
static void mmu_free_pte_chain(struct kvm_pte_chain *pc)
2007-01-06 08:36:53 +08:00
{
2010-05-13 10:06:02 +08:00
	kmem_cache_free(pte_chain_cache, pc);
2007-01-06 08:36:53 +08:00
}

static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu)
{
2007-12-13 23:50:52 +08:00
	return mmu_memory_cache_alloc(&vcpu->arch.mmu_rmap_desc_cache,
2007-01-06 08:36:53 +08:00
				      sizeof(struct kvm_rmap_desc));
}

2007-07-17 18:04:56 +08:00
static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
2007-01-06 08:36:53 +08:00
{
2010-05-13 10:06:02 +08:00
	kmem_cache_free(rmap_desc_cache, rd);
2007-01-06 08:36:53 +08:00
}

2010-05-26 16:49:59 +08:00
static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
{
	if (!sp->role.direct)
		return sp->gfns[index];

	return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
}

static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
{
	if (sp->role.direct)
		BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index));
	else
		sp->gfns[index] = gfn;
}
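For direct shadow pages the gfn of each entry is computed rather than stored, which is why the setter above only checks the value. A small sketch of that computation, assuming 9 index bits per level (512 entries per 64-bit page table) and level 1 meaning 4 KiB ptes; `direct_gfn` is a hypothetical name.

```c
#include <stdint.h>

typedef uint64_t gfn_t;

#define PT64_LEVEL_BITS 9	/* assumed: 512 entries per 64-bit page table */

/*
 * gfn mapped by entry `index` of a direct shadow page whose first entry
 * maps base_gfn. Each entry at `level` covers 512^(level - 1) frames.
 */
static gfn_t direct_gfn(gfn_t base_gfn, int level, int index)
{
	return base_gfn + ((gfn_t)index << ((level - 1) * PT64_LEVEL_BITS));
}
```

At level 1 consecutive entries map consecutive frames; at level 2 each entry advances the gfn by 512 frames (2 MiB with 4 KiB pages).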

2008-02-23 22:44:30 +08:00
/*
 * Return the pointer to the largepage write count for a given
 * gfn, handling slots that are not large page aligned.
 */
2009-07-27 22:30:43 +08:00
static int *slot_largepage_idx(gfn_t gfn,
			       struct kvm_memory_slot *slot,
			       int level)
2008-02-23 22:44:30 +08:00
{
	unsigned long idx;

2009-07-27 22:30:43 +08:00
	idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
	return &slot->lpage_info[level - 2][idx].write_count;
2008-02-23 22:44:30 +08:00
}
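The index arithmetic above divides both the gfn and the slot base by the large-page size before subtracting, which is what makes it work for slots that do not start on a large-page boundary. A sketch with concrete numbers, assuming each level multiplies the page size by 512 (so a level-2 page spans 512 small frames); `pages_per_hpage` and `largepage_idx` are hypothetical stand-ins for the macro and function in the listing.

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumed: level 1 is a 4 KiB page and each level up is 512 times larger. */
static gfn_t pages_per_hpage(int level)
{
	return (gfn_t)1 << ((level - 1) * 9);
}

/* Index of the large frame containing gfn, relative to the slot's first large frame. */
static unsigned long largepage_idx(gfn_t gfn, gfn_t base_gfn, int level)
{
	return (unsigned long)((gfn / pages_per_hpage(level)) -
			       (base_gfn / pages_per_hpage(level)));
}
```

For a slot starting at gfn 1000 (not a multiple of 512), gfn 1023 lies in the same 2 MiB frame as the slot base and gets index 0, while gfn 1030 lies in the next frame and gets index 1. Computing `(gfn - base_gfn) / 512` instead would put both in index 0.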

static void account_shadowed(struct kvm *kvm, gfn_t gfn)
{
2009-07-27 22:30:43 +08:00
	struct kvm_memory_slot *slot;
2008-02-23 22:44:30 +08:00
	int *write_count;
2009-07-27 22:30:43 +08:00
	int i;
2008-02-23 22:44:30 +08:00

2008-10-03 22:40:32 +08:00
	gfn = unalias_gfn(kvm, gfn);
2009-07-27 22:30:43 +08:00

	slot = gfn_to_memslot_unaliased(kvm, gfn);
	for (i = PT_DIRECTORY_LEVEL;
	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
		write_count = slot_largepage_idx(gfn, slot, i);
		*write_count += 1;
	}
2008-02-23 22:44:30 +08:00
}

static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
{
2009-07-27 22:30:43 +08:00
	struct kvm_memory_slot *slot;
2008-02-23 22:44:30 +08:00
	int *write_count;
2009-07-27 22:30:43 +08:00
	int i;
2008-02-23 22:44:30 +08:00

2008-10-03 22:40:32 +08:00
	gfn = unalias_gfn(kvm, gfn);
2010-04-16 16:21:42 +08:00
	slot = gfn_to_memslot_unaliased(kvm, gfn);
2009-07-27 22:30:43 +08:00
	for (i = PT_DIRECTORY_LEVEL;
	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
		write_count = slot_largepage_idx(gfn, slot, i);
		*write_count -= 1;
		WARN_ON(*write_count < 0);
	}
2008-02-23 22:44:30 +08:00
}

2009-07-27 22:30:43 +08:00
static int has_wrprotected_page(struct kvm *kvm,
				gfn_t gfn,
				int level)
2008-02-23 22:44:30 +08:00
{
2008-10-03 22:40:32 +08:00
	struct kvm_memory_slot *slot;
2008-02-23 22:44:30 +08:00
	int *largepage_idx;

2008-10-03 22:40:32 +08:00
	gfn = unalias_gfn(kvm, gfn);
	slot = gfn_to_memslot_unaliased(kvm, gfn);
2008-02-23 22:44:30 +08:00
	if (slot) {
2009-07-27 22:30:43 +08:00
		largepage_idx = slot_largepage_idx(gfn, slot, level);
2008-02-23 22:44:30 +08:00
		return *largepage_idx;
	}

	return 1;
}

2009-07-27 22:30:43 +08:00
static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
2008-02-23 22:44:30 +08:00
{
2010-01-28 19:37:56 +08:00
	unsigned long page_size;
2009-07-27 22:30:43 +08:00
	int i, ret = 0;
2008-02-23 22:44:30 +08:00

2010-01-28 19:37:56 +08:00
	page_size = kvm_host_page_size(kvm, gfn);
2008-02-23 22:44:30 +08:00

2009-07-27 22:30:43 +08:00
	for (i = PT_PAGE_TABLE_LEVEL;
	     i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
		if (page_size >= KVM_HPAGE_SIZE(i))
			ret = i;
		else
			break;
	}

2008-09-17 07:54:47 +08:00
	return ret;
2008-02-23 22:44:30 +08:00
}

2009-07-27 22:30:43 +08:00
static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
2008-02-23 22:44:30 +08:00
{
	struct kvm_memory_slot *slot;
2010-01-05 19:02:29 +08:00
	int host_level, level, max_level;
2008-02-23 22:44:30 +08:00

	slot = gfn_to_memslot(vcpu->kvm, large_gfn);
	if (slot && slot->dirty_bitmap)
2009-07-27 22:30:43 +08:00
		return PT_PAGE_TABLE_LEVEL;
2008-02-23 22:44:30 +08:00

2009-07-27 22:30:43 +08:00
	host_level = host_mapping_level(vcpu->kvm, large_gfn);

	if (host_level == PT_PAGE_TABLE_LEVEL)
		return host_level;

2010-01-05 19:02:29 +08:00
	max_level = kvm_x86_ops->get_lpage_level() < host_level ?
		kvm_x86_ops->get_lpage_level() : host_level;

	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
2009-07-27 22:30:43 +08:00
		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
			break;

	return level - 1;
2008-02-23 22:44:30 +08:00
}
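The level selection in `mapping_level()` can be isolated from the KVM data structures. The sketch below is a simplified model, assuming levels 1, 2 and 3 for 4 KiB, 2 MiB and 1 GiB pages, and replacing the memslot lookups with a plain array of per-level write-protect counts. It leaves out the dirty-logging check at the top of the real function. All names are hypothetical.

```c
enum { LEVEL_4K = 1, LEVEL_2M = 2, LEVEL_1G = 3 };

/*
 * host_level:         largest level the host backing page allows
 * hw_max_level:       largest level the hardware supports
 * write_protected[l]: nonzero if the large frame at level l contains a
 *                     shadowed (write-protected) page
 */
static int pick_mapping_level(int host_level, int hw_max_level,
			      const int write_protected[4])
{
	int level, max_level;

	if (host_level == LEVEL_4K)
		return host_level;

	max_level = hw_max_level < host_level ? hw_max_level : host_level;

	/* Stop at the first level that would cover a write-protected page. */
	for (level = LEVEL_2M; level <= max_level; ++level)
		if (write_protected[level])
			break;

	/* The last level that passed the check, or 4K if none did. */
	return level - 1;
}
```

The `level - 1` return works for both exits of the loop: after a `break` it names the level below the offending one, and after a normal exit `level` is `max_level + 1`, so the result is `max_level`.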

2007-09-27 20:11:22 +08:00
/*
 * Take gfn and return the reverse mapping to it.
 * Note: gfn must be unaliased before this function gets called
 */

2009-07-27 22:30:42 +08:00
static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
2007-09-27 20:11:22 +08:00
{
	struct kvm_memory_slot *slot;
2008-02-23 22:44:30 +08:00
	unsigned long idx;
2007-09-27 20:11:22 +08:00

	slot = gfn_to_memslot(kvm, gfn);
2009-07-27 22:30:42 +08:00
	if (likely(level == PT_PAGE_TABLE_LEVEL))
2008-02-23 22:44:30 +08:00
		return &slot->rmap[gfn - slot->base_gfn];

2009-07-27 22:30:42 +08:00
	idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
		(slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
2008-02-23 22:44:30 +08:00

2009-07-27 22:30:42 +08:00
	return &slot->lpage_info[level - 2][idx].rmap_pde;
2007-09-27 20:11:22 +08:00
}

2007-01-06 08:36:38 +08:00
/*
 * Reverse mapping data structures:
 *
2007-09-27 20:11:22 +08:00
 * If rmapp bit zero is zero, then rmapp points to the shadow page table entry
 * that points to page_address(page).
2007-01-06 08:36:38 +08:00
 *
2007-09-27 20:11:22 +08:00
 * If rmapp bit zero is one, then (rmapp & ~1) points to a struct kvm_rmap_desc
 * containing more mappings.
2009-08-06 02:43:58 +08:00
 *
 * Returns the number of rmap entries before the spte was added or zero if
 * the spte was not added.
 *
2007-01-06 08:36:38 +08:00
 */
2009-07-27 22:30:42 +08:00
static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
2007-01-06 08:36:38 +08:00
{
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp;
2007-01-06 08:36:38 +08:00
	struct kvm_rmap_desc *desc;
2007-09-27 20:11:22 +08:00
	unsigned long *rmapp;
2009-08-06 02:43:58 +08:00
	int i, count = 0;
2007-01-06 08:36:38 +08:00

2009-06-10 19:12:05 +08:00
	if (!is_rmap_spte(*spte))
2009-08-06 02:43:58 +08:00
		return count;
2007-09-27 20:11:22 +08:00
	gfn = unalias_gfn(vcpu->kvm, gfn);
2007-11-21 21:28:32 +08:00
	sp = page_header(__pa(spte));
2010-05-26 16:49:59 +08:00
	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
2009-07-27 22:30:42 +08:00
	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
2007-09-27 20:11:22 +08:00
	if (!*rmapp) {
2007-01-06 08:36:38 +08:00
		rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
2007-09-27 20:11:22 +08:00
		*rmapp = (unsigned long)spte;
	} else if (!(*rmapp & 1)) {
2007-01-06 08:36:38 +08:00
		rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte);
2007-01-06 08:36:53 +08:00
		desc = mmu_alloc_rmap_desc(vcpu);
2009-06-10 19:24:23 +08:00
		desc->sptes[0] = (u64 *)*rmapp;
		desc->sptes[1] = spte;
2007-09-27 20:11:22 +08:00
		*rmapp = (unsigned long)desc | 1;
2007-01-06 08:36:38 +08:00
	} else {
		rmap_printk("rmap_add: %p %llx many->many\n", spte, *spte);
2007-09-27 20:11:22 +08:00
		desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
2009-06-10 19:24:23 +08:00
		while (desc->sptes[RMAP_EXT-1] && desc->more) {
2007-01-06 08:36:38 +08:00
			desc = desc->more;
2009-08-06 02:43:58 +08:00
			count += RMAP_EXT;
		}
2009-06-10 19:24:23 +08:00
		if (desc->sptes[RMAP_EXT-1]) {
2007-01-06 08:36:53 +08:00
			desc->more = mmu_alloc_rmap_desc(vcpu);
2007-01-06 08:36:38 +08:00
			desc = desc->more;
		}
2009-06-10 19:24:23 +08:00
		for (i = 0; desc->sptes[i]; ++i)
2007-01-06 08:36:38 +08:00
			;
2009-06-10 19:24:23 +08:00
		desc->sptes[i] = spte;
2007-01-06 08:36:38 +08:00
	}
2009-08-06 02:43:58 +08:00
	return count;
2007-01-06 08:36:38 +08:00
}
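The reverse-map head is a tagged pointer: bit zero distinguishes "points directly at one spte" from "points at a descriptor chain". A self-contained userspace sketch of the three transitions (0->1, 1->many, many->many) follows. It is not the kernel code: it uses `calloc` instead of the preallocated cache, returns a status instead of the entry count, and assumes a descriptor capacity of 4. The names `rmap_desc`, `rmap_add_sketch` and `rmap_count` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define RMAP_EXT 4	/* assumed descriptor capacity */

struct rmap_desc {
	uint64_t *sptes[RMAP_EXT];
	struct rmap_desc *more;
};

/* Add spte to the chain headed by *rmapp. Returns 0, or -1 if out of memory. */
static int rmap_add_sketch(uintptr_t *rmapp, uint64_t *spte)
{
	struct rmap_desc *desc;
	int i;

	if (!*rmapp) {					/* 0 -> 1 */
		*rmapp = (uintptr_t)spte;
		return 0;
	}
	if (!(*rmapp & 1)) {				/* 1 -> many */
		desc = calloc(1, sizeof(*desc));
		if (!desc)
			return -1;
		desc->sptes[0] = (uint64_t *)*rmapp;
		desc->sptes[1] = spte;
		*rmapp = (uintptr_t)desc | 1;		/* tag bit marks a descriptor */
		return 0;
	}
	/* many -> many: walk to the last descriptor, extend it if full */
	desc = (struct rmap_desc *)(*rmapp & ~(uintptr_t)1);
	while (desc->sptes[RMAP_EXT - 1] && desc->more)
		desc = desc->more;
	if (desc->sptes[RMAP_EXT - 1]) {
		desc->more = calloc(1, sizeof(*desc));
		if (!desc->more)
			return -1;
		desc = desc->more;
	}
	for (i = 0; desc->sptes[i]; ++i)
		;
	desc->sptes[i] = spte;
	return 0;
}

/* Count the sptes reachable from an rmap head value. */
static int rmap_count(uintptr_t rmap)
{
	struct rmap_desc *desc;
	int i, n = 0;

	if (!rmap)
		return 0;
	if (!(rmap & 1))
		return 1;
	desc = (struct rmap_desc *)(rmap & ~(uintptr_t)1);
	for (; desc; desc = desc->more)
		for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i)
			n++;
	return n;
}
```

The tag bit is free because both sptes and descriptors are at least 2-byte aligned, so a valid pointer never has bit zero set. The common case of a single mapping therefore costs no extra allocation.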

2007-09-27 20:11:22 +08:00
static void rmap_desc_remove_entry(unsigned long *rmapp,
2007-01-06 08:36:38 +08:00
				   struct kvm_rmap_desc *desc,
				   int i,
				   struct kvm_rmap_desc *prev_desc)
{
	int j;

2009-06-10 19:24:23 +08:00
	for (j = RMAP_EXT - 1; !desc->sptes[j] && j > i; --j)
2007-01-06 08:36:38 +08:00
		;
2009-06-10 19:24:23 +08:00
	desc->sptes[i] = desc->sptes[j];
	desc->sptes[j] = NULL;
2007-01-06 08:36:38 +08:00
	if (j != 0)
		return;
	if (!prev_desc && !desc->more)
2009-06-10 19:24:23 +08:00
		*rmapp = (unsigned long)desc->sptes[0];
2007-01-06 08:36:38 +08:00
	else
		if (prev_desc)
			prev_desc->more = desc->more;
		else
2007-09-27 20:11:22 +08:00
			*rmapp = (unsigned long)desc->more | 1;
2007-07-17 18:04:56 +08:00
	mmu_free_rmap_desc(desc);
2007-01-06 08:36:38 +08:00
}

2007-09-27 20:11:22 +08:00
|
|
|
static void rmap_remove(struct kvm *kvm, u64 *spte)
|
2007-01-06 08:36:38 +08:00
|
|
|
{
|
|
|
|
struct kvm_rmap_desc *desc;
|
|
|
|
struct kvm_rmap_desc *prev_desc;
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
2008-04-03 03:46:56 +08:00
|
|
|
pfn_t pfn;
|
2010-05-26 16:49:59 +08:00
|
|
|
gfn_t gfn;
|
2007-09-27 20:11:22 +08:00
|
|
|
unsigned long *rmapp;
|
2007-01-06 08:36:38 +08:00
|
|
|
int i;
|
|
|
|
|
2009-06-10 19:12:05 +08:00
|
|
|
if (!is_rmap_spte(*spte))
|
2007-01-06 08:36:38 +08:00
|
|
|
return;
|
2007-11-21 21:28:32 +08:00
|
|
|
sp = page_header(__pa(spte));
|
2008-04-03 03:46:56 +08:00
|
|
|
pfn = spte_to_pfn(*spte);
|
2008-04-25 21:13:50 +08:00
|
|
|
if (*spte & shadow_accessed_mask)
|
2008-04-03 03:46:56 +08:00
|
|
|
kvm_set_pfn_accessed(pfn);
|
2010-01-18 17:45:10 +08:00
|
|
|
if (is_writable_pte(*spte))
|
2009-09-24 02:47:16 +08:00
|
|
|
kvm_set_pfn_dirty(pfn);
|
2010-05-26 16:49:59 +08:00
|
|
|
gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
|
|
|
|
rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
|
2007-09-27 20:11:22 +08:00
|
|
|
if (!*rmapp) {
|
2007-01-06 08:36:38 +08:00
|
|
|
printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
|
|
|
|
BUG();
|
2007-09-27 20:11:22 +08:00
|
|
|
} else if (!(*rmapp & 1)) {
|
2007-01-06 08:36:38 +08:00
|
|
|
rmap_printk("rmap_remove: %p %llx 1->0\n", spte, *spte);
|
2007-09-27 20:11:22 +08:00
|
|
|
if ((u64 *)*rmapp != spte) {
|
2007-01-06 08:36:38 +08:00
|
|
|
printk(KERN_ERR "rmap_remove: %p %llx 1->BUG\n",
|
|
|
|
spte, *spte);
|
|
|
|
BUG();
|
|
|
|
}
|
2007-09-27 20:11:22 +08:00
|
|
|
*rmapp = 0;
|
2007-01-06 08:36:38 +08:00
|
|
|
} else {
|
|
|
|
rmap_printk("rmap_remove: %p %llx many->many\n", spte, *spte);
|
2007-09-27 20:11:22 +08:00
|
|
|
desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
|
2007-01-06 08:36:38 +08:00
|
|
|
prev_desc = NULL;
|
|
|
|
while (desc) {
|
2009-06-10 19:24:23 +08:00
|
|
|
for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i)
|
|
|
|
if (desc->sptes[i] == spte) {
|
2007-09-27 20:11:22 +08:00
|
|
|
rmap_desc_remove_entry(rmapp,
|
2007-01-06 08:36:53 +08:00
|
|
|
desc, i,
|
2007-01-06 08:36:38 +08:00
|
|
|
prev_desc);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
prev_desc = desc;
|
|
|
|
desc = desc->more;
|
|
|
|
}
|
2009-12-02 21:17:00 +08:00
|
|
|
pr_err("rmap_remove: %p %llx many->many\n", spte, *spte);
|
2007-01-06 08:36:38 +08:00
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
}

static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
{
	struct kvm_rmap_desc *desc;
	u64 *prev_spte;
	int i;

	if (!*rmapp)
		return NULL;
	else if (!(*rmapp & 1)) {
		if (!spte)
			return (u64 *)*rmapp;
		return NULL;
	}
	desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
	prev_spte = NULL;
	while (desc) {
		for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) {
			if (prev_spte == spte)
				return desc->sptes[i];
			prev_spte = desc->sptes[i];
		}
		desc = desc->more;
	}
	return NULL;
}
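
The functions above rely on a tagged-pointer encoding of the rmap head word: 0 means no mappings, a value with bit 0 clear is a single spte pointer, and a value with bit 0 set points to a chain of descriptors. The sketch below models that encoding in user space under stated assumptions: `EXT`, `struct desc`, `chain_add`, and `chain_next` are hypothetical names, `calloc` replaces the kernel's descriptor cache, and allocation failure is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EXT 4	/* hypothetical, stands in for RMAP_EXT */

struct desc {
	uint64_t *sptes[EXT];
	struct desc *more;
};

/* Add spte to the chain headed by *head (simplified rmap_add analogue). */
static void chain_add(uintptr_t *head, uint64_t *spte)
{
	struct desc *d;
	int i;

	if (!*head) {			/* 0 -> 1: store the pointer itself */
		*head = (uintptr_t)spte;
		return;
	}
	if (!(*head & 1)) {		/* 1 -> many: move into a descriptor */
		d = calloc(1, sizeof(*d));	/* failure unhandled in this sketch */
		d->sptes[0] = (uint64_t *)*head;
		d->sptes[1] = spte;
		*head = (uintptr_t)d | 1;
		return;
	}
	d = (struct desc *)(*head & ~(uintptr_t)1);
	while (d->sptes[EXT - 1] && d->more)
		d = d->more;
	if (d->sptes[EXT - 1]) {	/* last descriptor full: extend chain */
		d->more = calloc(1, sizeof(*d));
		d = d->more;
	}
	for (i = 0; d->sptes[i]; ++i)
		;
	d->sptes[i] = spte;
}

/* Return the entry after prev, or the first entry when prev is NULL. */
static uint64_t *chain_next(uintptr_t head, uint64_t *prev)
{
	struct desc *d;
	uint64_t *last = NULL;
	int i;

	if (!head)
		return NULL;
	if (!(head & 1))
		return prev ? NULL : (uint64_t *)head;
	for (d = (struct desc *)(head & ~(uintptr_t)1); d; d = d->more)
		for (i = 0; i < EXT && d->sptes[i]; ++i) {
			if (last == prev)
				return d->sptes[i];
			last = d->sptes[i];
		}
	return NULL;
}
```

The encoding works because both spte pointers and descriptors are at least 2-byte aligned, so bit 0 is free to act as the tag. Iteration restarts from the head on every call, which makes a full walk quadratic in the chain length; that is the cost of the cursor-free interface.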

static int rmap_write_protect(struct kvm *kvm, u64 gfn)
{
	unsigned long *rmapp;
	u64 *spte;
	int i, write_protected = 0;

	gfn = unalias_gfn(kvm, gfn);
	rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL);

	spte = rmap_next(kvm, rmapp, NULL);
	while (spte) {
		BUG_ON(!spte);
		BUG_ON(!(*spte & PT_PRESENT_MASK));
		rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
		if (is_writable_pte(*spte)) {
			__set_spte(spte, *spte & ~PT_WRITABLE_MASK);
			write_protected = 1;
		}
		spte = rmap_next(kvm, rmapp, spte);
	}
	if (write_protected) {
		pfn_t pfn;

		spte = rmap_next(kvm, rmapp, NULL);
		pfn = spte_to_pfn(*spte);
		kvm_set_pfn_dirty(pfn);
	}

	/* check for huge page mappings */
	for (i = PT_DIRECTORY_LEVEL;
	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
		rmapp = gfn_to_rmap(kvm, gfn, i);
		spte = rmap_next(kvm, rmapp, NULL);
		while (spte) {
			BUG_ON(!spte);
			BUG_ON(!(*spte & PT_PRESENT_MASK));
			BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
			pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
			if (is_writable_pte(*spte)) {
				rmap_remove(kvm, spte);
				--kvm->stat.lpages;
				__set_spte(spte, shadow_trap_nonpresent_pte);
				spte = NULL;
				write_protected = 1;
			}
			spte = rmap_next(kvm, rmapp, spte);
		}
	}

	return write_protected;
}

static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
			   unsigned long data)
{
	u64 *spte;
	int need_tlb_flush = 0;

	while ((spte = rmap_next(kvm, rmapp, NULL))) {
		BUG_ON(!(*spte & PT_PRESENT_MASK));
		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
		rmap_remove(kvm, spte);
		__set_spte(spte, shadow_trap_nonpresent_pte);
		need_tlb_flush = 1;
	}
	return need_tlb_flush;
}

static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
			     unsigned long data)
{
	int need_flush = 0;
	u64 *spte, new_spte;
	pte_t *ptep = (pte_t *)data;
	pfn_t new_pfn;

	WARN_ON(pte_huge(*ptep));
	new_pfn = pte_pfn(*ptep);
	spte = rmap_next(kvm, rmapp, NULL);
	while (spte) {
		BUG_ON(!is_shadow_present_pte(*spte));
		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
		need_flush = 1;
		if (pte_write(*ptep)) {
			rmap_remove(kvm, spte);
			__set_spte(spte, shadow_trap_nonpresent_pte);
			spte = rmap_next(kvm, rmapp, NULL);
		} else {
			new_spte = *spte & ~PT64_BASE_ADDR_MASK;
			new_spte |= (u64)new_pfn << PAGE_SHIFT;

			new_spte &= ~PT_WRITABLE_MASK;
			new_spte &= ~SPTE_HOST_WRITEABLE;
			if (is_writable_pte(*spte))
				kvm_set_pfn_dirty(spte_to_pfn(*spte));
			__set_spte(spte, new_spte);
			spte = rmap_next(kvm, rmapp, spte);
		}
	}
	if (need_flush)
		kvm_flush_remote_tlbs(kvm);

	return 0;
}

static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
			  unsigned long data,
			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
					 unsigned long data))
{
	int i, j;
	int ret;
	int retval = 0;
	struct kvm_memslots *slots;

	slots = kvm_memslots(kvm);

	for (i = 0; i < slots->nmemslots; i++) {
		struct kvm_memory_slot *memslot = &slots->memslots[i];
		unsigned long start = memslot->userspace_addr;
		unsigned long end;

		end = start + (memslot->npages << PAGE_SHIFT);
		if (hva >= start && hva < end) {
			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;

			ret = handler(kvm, &memslot->rmap[gfn_offset], data);

			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
				int idx = gfn_offset;
				idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
				ret |= handler(kvm,
					&memslot->lpage_info[j][idx].rmap_pde,
					data);
			}
			trace_kvm_age_page(hva, memslot, ret);
			retval |= ret;
		}
	}

	return retval;
}
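
The index arithmetic in kvm_handle_hva can be checked in isolation. The sketch below assumes 4 KiB base pages and assumes that each paging level above the first covers 512 times as many pages as the one below it (the usual x86 long-mode layout); `pages_per_hpage`, `hva_to_gfn_offset`, and `lpage_index` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* 4 KiB pages, as on x86 */

/* Assumed x86 layout: each level above 1 covers 512x more 4 KiB pages. */
static unsigned long pages_per_hpage(int level)
{
	return 1UL << ((level - 1) * 9);
}

/* Offset, in 4 KiB pages, of hva within a slot that starts at start. */
static unsigned long hva_to_gfn_offset(unsigned long hva, unsigned long start)
{
	return (hva - start) >> PAGE_SHIFT;
}

/* Index of the large-page entry at the given level covering gfn_offset. */
static unsigned long lpage_index(unsigned long gfn_offset, int level)
{
	return gfn_offset / pages_per_hpage(level);
}
```

For example, an address 0x345678 bytes into a slot falls in 4 KiB page 0x345 of that slot, which is covered by entry 1 of the level-2 (2 MiB) array and entry 0 of the level-3 array.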

int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
{
	return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
}

void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
{
	kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
}

static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
			 unsigned long data)
{
	u64 *spte;
	int young = 0;

	/*
	 * Emulate the accessed bit for EPT, by checking if this page has
	 * an EPT mapping, and clearing it if it does. On the next access,
	 * a new EPT mapping will be established.
	 * This has some overhead, but not as much as the cost of swapping
	 * out actively used pages or breaking up actively used hugepages.
	 */
	if (!shadow_accessed_mask)
		return kvm_unmap_rmapp(kvm, rmapp, data);

	spte = rmap_next(kvm, rmapp, NULL);
	while (spte) {
		int _young;
		u64 _spte = *spte;
		BUG_ON(!(_spte & PT_PRESENT_MASK));
		_young = _spte & PT_ACCESSED_MASK;
		if (_young) {
			young = 1;
			clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
		}
		spte = rmap_next(kvm, rmapp, spte);
	}
	return young;
}
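
The aging loop in kvm_age_rmapp is a test-and-clear of the accessed bit over every spte that maps the page. A minimal sketch of that pattern over a plain array follows; `age_entries` and the `ACCESSED_*` macros are hypothetical names, the bit position is the x86 page-table accessed bit, and the clear here is a non-atomic read-modify-write, unlike the kernel's clear_bit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACCESSED_SHIFT 5	/* x86 PTE accessed bit; stands in for PT_ACCESSED_SHIFT */
#define ACCESSED_MASK  (1ULL << ACCESSED_SHIFT)

/*
 * Clear the accessed bit in every entry and report whether any entry
 * had it set. Non-atomic, unlike the kernel's clear_bit().
 */
static int age_entries(uint64_t *sptes, size_t n)
{
	int young = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (sptes[i] & ACCESSED_MASK) {
			young = 1;
			sptes[i] &= ~ACCESSED_MASK;
		}
	}
	return young;
}
```

A second call on the same entries returns 0 unless something set the bit again in between, which is what lets a caller distinguish recently used pages from idle ones.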

#define RMAP_RECYCLE_THRESHOLD 1000

static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
{
	unsigned long *rmapp;
	struct kvm_mmu_page *sp;

	sp = page_header(__pa(spte));

	gfn = unalias_gfn(vcpu->kvm, gfn);
	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);

	kvm_unmap_rmapp(vcpu->kvm, rmapp, 0);
	kvm_flush_remote_tlbs(vcpu->kvm);
}

int kvm_age_hva(struct kvm *kvm, unsigned long hva)
{
	return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp);
}

#ifdef MMU_DEBUG
static int is_empty_shadow_page(u64 *spt)
{
	u64 *pos;
	u64 *end;

	for (pos = spt, end = pos + PAGE_SIZE / sizeof(u64); pos != end; pos++)
		if (is_shadow_present_pte(*pos)) {
			printk(KERN_ERR "%s: %p %llx\n", __func__,
			       pos, *pos);
			return 0;
		}
	return 1;
}
#endif

static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	ASSERT(is_empty_shadow_page(sp->spt));
	hlist_del(&sp->hash_link);
	list_del(&sp->link);
	__free_page(virt_to_page(sp->spt));
	if (!sp->role.direct)
		__free_page(virt_to_page(sp->gfns));
	kmem_cache_free(mmu_page_header_cache, sp);
	++kvm->arch.n_free_mmu_pages;
}

[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00

static unsigned kvm_page_table_hashfn(gfn_t gfn)
{
	return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
}
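
The hash above simply keeps the low bits of the guest frame number. A small sketch of the same masking follows, assuming a shift of 10 (the real value of KVM_MMU_HASH_SHIFT is not shown in this excerpt); `HASH_SHIFT`, `HASH_BUCKETS`, and `page_table_hashfn` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_SHIFT 10	/* assumed value, stands in for KVM_MMU_HASH_SHIFT */
#define HASH_BUCKETS (1u << HASH_SHIFT)

/* Bucket index: the low HASH_SHIFT bits of the guest frame number. */
static unsigned page_table_hashfn(uint64_t gfn)
{
	return gfn & (HASH_BUCKETS - 1);
}
```

Frame numbers that differ only above the masked bits land in the same bucket, so a lookup still has to compare the full key (frame number plus role) against each entry in the bucket.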

static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
					       u64 *parent_pte, int direct)
{
	struct kvm_mmu_page *sp;
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
|
|
|
|
sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
|
2010-05-26 16:49:59 +08:00
|
|
|
if (!direct)
|
|
|
|
sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache,
|
|
|
|
PAGE_SIZE);
|
2007-11-21 21:28:32 +08:00
|
|
|
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
|
2007-12-14 10:01:48 +08:00
|
|
|
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
|
2008-10-16 17:30:57 +08:00
|
|
|
bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
|
2007-11-21 21:28:32 +08:00
|
|
|
sp->multimapped = 0;
|
|
|
|
sp->parent_pte = parent_pte;
|
2007-12-14 10:01:48 +08:00
|
|
|
--vcpu->kvm->arch.n_free_mmu_pages;
|
2007-11-21 21:28:32 +08:00
|
|
|
return sp;
|
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
2007-01-06 08:36:53 +08:00
|
|
|
static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp, u64 *parent_pte)
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00
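The cache key described in the commit message above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the names shadow_key, shadow_page, shadow_get, quadrant_of and NBUCKETS are hypothetical and are not the kernel's identifiers, and the bucket count and hash function are arbitrary. The sketch shows two things: a lookup that only hits when every key field matches, so one guest frame can own several shadow pages, and how a quadrant could be derived for a non-PAE 32-bit guest from the 4MB/2MB and 4GB/1GB spans quoted in the message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64	/* illustrative size, not the kernel's */

/* Hypothetical key: one field per item in the commit message's list. */
struct shadow_key {
	uint64_t gfn;		/* guest page table frame number */
	unsigned glevels;	/* guest paging levels: 0 (real mode), 2, 3 or 4 */
	unsigned level;		/* level at which the guest page is used */
	unsigned quadrant;	/* slice of a 32-bit guest page being shadowed */
	unsigned metaphysical;	/* 1 when no guest page table backs the shadow */
};

struct shadow_page {
	struct shadow_key key;
	struct shadow_page *next;	/* hash chain */
};

static struct shadow_page *buckets[NBUCKETS];

static int key_equal(const struct shadow_key *a, const struct shadow_key *b)
{
	return a->gfn == b->gfn && a->glevels == b->glevels &&
	       a->level == b->level && a->quadrant == b->quadrant &&
	       a->metaphysical == b->metaphysical;
}

/*
 * Quadrant for a non-PAE 32-bit guest address.
 * Level 1: a guest page table spans 4MB, a shadow page table 2MB,
 *          so bit 21 selects one of two halves.
 * Level 2: a guest page directory spans 4GB, a shadow page directory 1GB,
 *          so bits 31:30 select one of four quarters.
 */
static unsigned quadrant_of(uint32_t gaddr, unsigned level)
{
	return level == 1 ? (gaddr >> 21) & 1 : (gaddr >> 30) & 3;
}

/* Return the cached shadow page for key, allocating one on a miss. */
static struct shadow_page *shadow_get(const struct shadow_key *key, int *hit)
{
	unsigned b = (unsigned)(key->gfn % NBUCKETS);
	struct shadow_page *sp;

	for (sp = buckets[b]; sp; sp = sp->next)
		if (key_equal(&sp->key, key)) {
			*hit = 1;
			return sp;
		}

	sp = calloc(1, sizeof(*sp));
	assert(sp != NULL);
	sp->key = *key;
	sp->next = buckets[b];	/* push onto the bucket's chain */
	buckets[b] = sp;
	*hit = 0;
	return sp;
}
```

Hashing on the frame number alone and comparing the remaining fields during the chain walk keeps every shadow page of one guest frame in the same bucket. That is a design choice of this sketch, not a statement about how the kernel lays out its table.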
|
|
|
{
|
|
|
|
struct kvm_pte_chain *pte_chain;
|
|
|
|
struct hlist_node *node;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!parent_pte)
|
|
|
|
return;
|
2007-11-21 21:28:32 +08:00
|
|
|
if (!sp->multimapped) {
|
|
|
|
u64 *old = sp->parent_pte;
|
2007-01-06 08:36:43 +08:00
|
|
|
|
|
|
|
if (!old) {
|
2007-11-21 21:28:32 +08:00
|
|
|
sp->parent_pte = parent_pte;
|
2007-01-06 08:36:43 +08:00
|
|
|
return;
|
|
|
|
}
|
2007-11-21 21:28:32 +08:00
|
|
|
sp->multimapped = 1;
|
2007-01-06 08:36:53 +08:00
|
|
|
pte_chain = mmu_alloc_pte_chain(vcpu);
|
2007-11-21 21:28:32 +08:00
|
|
|
INIT_HLIST_HEAD(&sp->parent_ptes);
|
|
|
|
hlist_add_head(&pte_chain->link, &sp->parent_ptes);
|
2007-01-06 08:36:43 +08:00
|
|
|
pte_chain->parent_ptes[0] = old;
|
|
|
|
}
|
2007-11-21 21:28:32 +08:00
|
|
|
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) {
|
2007-01-06 08:36:43 +08:00
|
|
|
if (pte_chain->parent_ptes[NR_PTE_CHAIN_ENTRIES-1])
|
|
|
|
continue;
|
|
|
|
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i)
|
|
|
|
if (!pte_chain->parent_ptes[i]) {
|
|
|
|
pte_chain->parent_ptes[i] = parent_pte;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
2007-01-06 08:36:53 +08:00
|
|
|
pte_chain = mmu_alloc_pte_chain(vcpu);
|
2007-01-06 08:36:43 +08:00
|
|
|
BUG_ON(!pte_chain);
|
2007-11-21 21:28:32 +08:00
|
|
|
hlist_add_head(&pte_chain->link, &sp->parent_ptes);
|
2007-01-06 08:36:43 +08:00
|
|
|
pte_chain->parent_ptes[0] = parent_pte;
|
|
|
|
}
|
|
|
|
|
2007-11-21 21:28:32 +08:00
|
|
|
static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
|
2007-01-06 08:36:43 +08:00
|
|
|
u64 *parent_pte)
|
|
|
|
{
|
|
|
|
struct kvm_pte_chain *pte_chain;
|
|
|
|
struct hlist_node *node;
|
|
|
|
int i;
|
|
|
|
|
2007-11-21 21:28:32 +08:00
|
|
|
if (!sp->multimapped) {
|
|
|
|
BUG_ON(sp->parent_pte != parent_pte);
|
|
|
|
sp->parent_pte = NULL;
|
2007-01-06 08:36:43 +08:00
|
|
|
return;
|
|
|
|
}
|
2007-11-21 21:28:32 +08:00
|
|
|
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
|
2007-01-06 08:36:43 +08:00
|
|
|
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
|
|
|
|
if (!pte_chain->parent_ptes[i])
|
|
|
|
break;
|
|
|
|
if (pte_chain->parent_ptes[i] != parent_pte)
|
|
|
|
continue;
|
2007-01-06 08:36:46 +08:00
|
|
|
while (i + 1 < NR_PTE_CHAIN_ENTRIES
|
|
|
|
&& pte_chain->parent_ptes[i + 1]) {
|
2007-01-06 08:36:43 +08:00
|
|
|
pte_chain->parent_ptes[i]
|
|
|
|
= pte_chain->parent_ptes[i + 1];
|
|
|
|
++i;
|
|
|
|
}
|
|
|
|
pte_chain->parent_ptes[i] = NULL;
|
2007-01-06 08:36:46 +08:00
|
|
|
if (i == 0) {
|
|
|
|
hlist_del(&pte_chain->link);
|
2007-07-17 18:04:56 +08:00
|
|
|
mmu_free_pte_chain(pte_chain);
|
2007-11-21 21:28:32 +08:00
|
|
|
if (hlist_empty(&sp->parent_ptes)) {
|
|
|
|
sp->multimapped = 0;
|
|
|
|
sp->parent_pte = NULL;
|
2007-01-06 08:36:46 +08:00
|
|
|
}
|
|
|
|
}
|
2007-01-06 08:36:43 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
|
2008-09-24 00:18:36 +08:00
|
|
|
|
2010-04-16 21:29:17 +08:00
|
|
|
static void mmu_parent_walk(struct kvm_mmu_page *sp, mmu_parent_walk_fn fn)
|
2008-09-24 00:18:36 +08:00
|
|
|
{
|
|
|
|
struct kvm_pte_chain *pte_chain;
|
|
|
|
struct hlist_node *node;
|
|
|
|
struct kvm_mmu_page *parent_sp;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!sp->multimapped && sp->parent_pte) {
|
|
|
|
parent_sp = page_header(__pa(sp->parent_pte));
|
2010-04-16 21:29:17 +08:00
|
|
|
fn(parent_sp);
|
|
|
|
mmu_parent_walk(parent_sp, fn);
|
2008-09-24 00:18:36 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
|
|
|
|
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
|
|
|
|
if (!pte_chain->parent_ptes[i])
|
|
|
|
break;
|
|
|
|
parent_sp = page_header(__pa(pte_chain->parent_ptes[i]));
|
2010-04-16 21:29:17 +08:00
|
|
|
fn(parent_sp);
|
|
|
|
mmu_parent_walk(parent_sp, fn);
|
2008-09-24 00:18:36 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-09-24 00:18:40 +08:00
|
|
|
static void kvm_mmu_update_unsync_bitmap(u64 *spte)
|
|
|
|
{
|
|
|
|
unsigned int index;
|
|
|
|
struct kvm_mmu_page *sp = page_header(__pa(spte));
|
|
|
|
|
|
|
|
index = spte - sp->spt;
|
2008-12-02 08:32:02 +08:00
|
|
|
if (!__test_and_set_bit(index, sp->unsync_child_bitmap))
|
|
|
|
sp->unsync_children++;
|
|
|
|
WARN_ON(!sp->unsync_children);
|
2008-09-24 00:18:40 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_mmu_update_parents_unsync(struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
struct kvm_pte_chain *pte_chain;
|
|
|
|
struct hlist_node *node;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!sp->parent_pte)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!sp->multimapped) {
|
|
|
|
kvm_mmu_update_unsync_bitmap(sp->parent_pte);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
|
|
|
|
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
|
|
|
|
if (!pte_chain->parent_ptes[i])
|
|
|
|
break;
|
|
|
|
kvm_mmu_update_unsync_bitmap(pte_chain->parent_ptes[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-04-16 21:29:17 +08:00
|
|
|
static int unsync_walk_fn(struct kvm_mmu_page *sp)
|
2008-09-24 00:18:40 +08:00
|
|
|
{
|
|
|
|
kvm_mmu_update_parents_unsync(sp);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2010-04-16 21:29:17 +08:00
|
|
|
static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
|
2008-09-24 00:18:40 +08:00
|
|
|
{
|
2010-04-16 21:29:17 +08:00
|
|
|
mmu_parent_walk(sp, unsync_walk_fn);
|
2008-09-24 00:18:40 +08:00
|
|
|
kvm_mmu_update_parents_unsync(sp);
|
|
|
|
}
|
|
|
|
|
2008-05-29 19:55:03 +08:00
|
|
|
static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
|
|
|
|
sp->spt[i] = shadow_trap_nonpresent_pte;
|
|
|
|
}
|
|
|
|
|
2008-09-24 00:18:33 +08:00
|
|
|
static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2008-09-24 00:18:35 +08:00
|
|
|
static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
#define KVM_PAGE_ARRAY_NR 16
|
|
|
|
|
|
|
|
struct kvm_mmu_pages {
|
|
|
|
struct mmu_page_and_offset {
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
unsigned int idx;
|
|
|
|
} page[KVM_PAGE_ARRAY_NR];
|
|
|
|
unsigned int nr;
|
|
|
|
};
|
|
|
|
|
2008-09-24 00:18:40 +08:00
|
|
|
#define for_each_unsync_children(bitmap, idx) \
|
|
|
|
for (idx = find_first_bit(bitmap, 512); \
|
|
|
|
idx < 512; \
|
|
|
|
idx = find_next_bit(bitmap, 512, idx+1))
|
|
|
|
|
2009-02-21 09:19:13 +08:00
|
|
|
static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
|
|
|
|
int idx)
|
2008-09-24 00:18:39 +08:00
|
|
|
{
|
2008-12-02 08:32:02 +08:00
|
|
|
int i;
|
2008-09-24 00:18:39 +08:00
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
if (sp->unsync)
|
|
|
|
for (i=0; i < pvec->nr; i++)
|
|
|
|
if (pvec->page[i].sp == sp)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
pvec->page[pvec->nr].sp = sp;
|
|
|
|
pvec->page[pvec->nr].idx = idx;
|
|
|
|
pvec->nr++;
|
|
|
|
return (pvec->nr == KVM_PAGE_ARRAY_NR);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
|
|
|
|
struct kvm_mmu_pages *pvec)
|
|
|
|
{
|
|
|
|
int i, ret, nr_unsync_leaf = 0;
|
2008-09-24 00:18:39 +08:00
|
|
|
|
2008-09-24 00:18:40 +08:00
|
|
|
for_each_unsync_children(sp->unsync_child_bitmap, i) {
|
2008-09-24 00:18:39 +08:00
|
|
|
u64 ent = sp->spt[i];
|
|
|
|
|
2008-12-23 04:49:30 +08:00
|
|
|
if (is_shadow_present_pte(ent) && !is_large_pte(ent)) {
|
2008-09-24 00:18:39 +08:00
|
|
|
struct kvm_mmu_page *child;
|
|
|
|
child = page_header(ent & PT64_BASE_ADDR_MASK);
|
|
|
|
|
|
|
|
if (child->unsync_children) {
|
2008-12-02 08:32:02 +08:00
|
|
|
if (mmu_pages_add(pvec, child, i))
|
|
|
|
return -ENOSPC;
|
|
|
|
|
|
|
|
ret = __mmu_unsync_walk(child, pvec);
|
|
|
|
if (!ret)
|
|
|
|
__clear_bit(i, sp->unsync_child_bitmap);
|
|
|
|
else if (ret > 0)
|
|
|
|
nr_unsync_leaf += ret;
|
|
|
|
else
|
2008-09-24 00:18:39 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (child->unsync) {
|
2008-12-02 08:32:02 +08:00
|
|
|
nr_unsync_leaf++;
|
|
|
|
if (mmu_pages_add(pvec, child, i))
|
|
|
|
return -ENOSPC;
|
2008-09-24 00:18:39 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-09-24 00:18:40 +08:00
|
|
|
if (find_first_bit(sp->unsync_child_bitmap, 512) == 512)
|
2008-09-24 00:18:39 +08:00
|
|
|
sp->unsync_children = 0;
|
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
return nr_unsync_leaf;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int mmu_unsync_walk(struct kvm_mmu_page *sp,
|
|
|
|
struct kvm_mmu_pages *pvec)
|
|
|
|
{
|
|
|
|
if (!sp->unsync_children)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
mmu_pages_add(pvec, sp, 0);
|
|
|
|
return __mmu_unsync_walk(sp, pvec);
|
2008-09-24 00:18:39 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
WARN_ON(!sp->unsync);
|
2010-04-28 11:55:06 +08:00
|
|
|
trace_kvm_mmu_sync_page(sp);
|
2008-09-24 00:18:39 +08:00
|
|
|
sp->unsync = 0;
|
|
|
|
--kvm->stat.mmu_unsync;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
|
2010-06-04 21:53:54 +08:00
|
|
|
static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
|
|
|
|
struct list_head *invalid_list);
|
|
|
|
static void kvm_mmu_commit_zap_page(struct kvm *kvm,
|
|
|
|
struct list_head *invalid_list);
|
2008-09-24 00:18:39 +08:00
|
|
|
|
2010-06-04 21:53:07 +08:00
|
|
|
#define for_each_gfn_sp(kvm, sp, gfn, pos, n) \
|
|
|
|
hlist_for_each_entry_safe(sp, pos, n, \
|
|
|
|
&(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
|
|
|
|
if ((sp)->gfn != (gfn)) {} else
|
|
|
|
|
|
|
|
#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos, n) \
|
|
|
|
hlist_for_each_entry_safe(sp, pos, n, \
|
|
|
|
&(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
|
|
|
|
if ((sp)->gfn != (gfn) || (sp)->role.direct || \
|
|
|
|
(sp)->role.invalid) {} else
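The two macros above end in an `if (cond) {} else` tail: the filter lives in the macro, and the statement the caller writes after the macro becomes the `else` branch, so it runs only for matching entries. A minimal userspace sketch of the same idiom follows. `struct item`, `for_each_valid_item` and `count_valid` are hypothetical names introduced here, and a plain array stands in for the kernel hash list.

```c
#include <stddef.h>

/* Hypothetical stand-in for a shadow page: only the fields the filter reads. */
struct item {
	unsigned long gfn;
	int direct;
	int invalid;
};

/*
 * Visit every element of arr[0..n) whose gfn equals key and which is neither
 * direct nor invalid.  The trailing "if (...) {} else" turns the statement
 * that follows the macro into the else branch, so non-matches are skipped.
 */
#define for_each_valid_item(it, arr, n, i, key)				\
	for ((i) = 0; (i) < (n) && ((it) = &(arr)[(i)], 1); (i)++)	\
		if ((it)->gfn != (key) || (it)->direct ||		\
		    (it)->invalid) {} else

/* Count the entries that pass the filter. */
static int count_valid(struct item *arr, size_t n, unsigned long key)
{
	struct item *it;
	size_t i;
	int count = 0;

	for_each_valid_item(it, arr, n, i, key)
		count++;

	return count;
}
```

Because the macro expands to a single `for` statement, `continue` and `break` in the caller's body still act on the loop itself, which is how `kvm_sync_pages()` uses `continue` further down.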
|
|
|
|
|
2010-05-15 18:51:24 +08:00
|
|
|
static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
|
|
|
|
bool clear_unsync)
|
2008-09-24 00:18:39 +08:00
|
|
|
{
|
2010-04-15 00:20:03 +08:00
|
|
|
if (sp->role.cr4_pae != !!is_pae(vcpu)) {
|
2008-09-24 00:18:39 +08:00
|
|
|
kvm_mmu_zap_page(vcpu->kvm, sp);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2010-05-15 18:51:24 +08:00
|
|
|
if (clear_unsync) {
|
|
|
|
if (rmap_write_protect(vcpu->kvm, sp->gfn))
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
|
|
|
kvm_unlink_unsync_page(vcpu->kvm, sp);
|
|
|
|
}
|
|
|
|
|
2008-09-24 00:18:39 +08:00
|
|
|
if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
|
|
|
|
kvm_mmu_zap_page(vcpu->kvm, sp);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
kvm_mmu_flush_tlb(vcpu);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-05-15 18:51:24 +08:00
|
|
|
static void mmu_convert_notrap(struct kvm_mmu_page *sp);
|
|
|
|
static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = __kvm_sync_page(vcpu, sp, false);
|
|
|
|
if (!ret)
|
|
|
|
mmu_convert_notrap(sp);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
return __kvm_sync_page(vcpu, sp, true);
|
|
|
|
}
|
|
|
|
|
2010-05-24 15:41:33 +08:00
|
|
|
/* @gfn should be write-protected at the call site */
|
|
|
|
static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
|
|
|
|
{
|
|
|
|
struct kvm_mmu_page *s;
|
|
|
|
struct hlist_node *node, *n;
|
|
|
|
bool flush = false;
|
|
|
|
|
2010-06-04 21:53:07 +08:00
|
|
|
for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node, n) {
|
|
|
|
if (!s->unsync)
|
2010-05-24 15:41:33 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
|
|
|
|
if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
|
|
|
|
(vcpu->arch.mmu.sync_page(vcpu, s))) {
|
|
|
|
kvm_mmu_zap_page(vcpu->kvm, s);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
kvm_unlink_unsync_page(vcpu->kvm, s);
|
|
|
|
flush = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (flush)
|
|
|
|
kvm_mmu_flush_tlb(vcpu);
|
|
|
|
}
|
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
struct mmu_page_path {
|
|
|
|
struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
|
|
|
|
unsigned int idx[PT64_ROOT_LEVEL-1];
|
2008-09-24 00:18:39 +08:00
|
|
|
};
|
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
#define for_each_sp(pvec, sp, parents, i) \
|
|
|
|
for (i = mmu_pages_next(&pvec, &parents, -1), \
|
|
|
|
sp = pvec.page[i].sp; \
|
|
|
|
i < pvec.nr && ({ sp = pvec.page[i].sp; 1;}); \
|
|
|
|
i = mmu_pages_next(&pvec, &parents, i))
|
|
|
|
|
2009-02-21 09:19:13 +08:00
|
|
|
static int mmu_pages_next(struct kvm_mmu_pages *pvec,
|
|
|
|
struct mmu_page_path *parents,
|
|
|
|
int i)
|
2008-12-02 08:32:02 +08:00
|
|
|
{
|
|
|
|
int n;
|
|
|
|
|
|
|
|
for (n = i+1; n < pvec->nr; n++) {
|
|
|
|
struct kvm_mmu_page *sp = pvec->page[n].sp;
|
|
|
|
|
|
|
|
if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
|
|
|
|
parents->idx[0] = pvec->page[n].idx;
|
|
|
|
return n;
|
|
|
|
}
|
|
|
|
|
|
|
|
parents->parent[sp->role.level-2] = sp;
|
|
|
|
parents->idx[sp->role.level-1] = pvec->page[n].idx;
|
|
|
|
}
|
|
|
|
|
|
|
|
return n;
|
|
|
|
}
|
|
|
|
|
2009-02-21 09:19:13 +08:00
|
|
|
static void mmu_pages_clear_parents(struct mmu_page_path *parents)
|
2008-09-24 00:18:39 +08:00
|
|
|
{
|
2008-12-02 08:32:02 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
unsigned int level = 0;
|
|
|
|
|
|
|
|
do {
|
|
|
|
unsigned int idx = parents->idx[level];
|
2008-09-24 00:18:39 +08:00
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
sp = parents->parent[level];
|
|
|
|
if (!sp)
|
|
|
|
return;
|
|
|
|
|
|
|
|
--sp->unsync_children;
|
|
|
|
WARN_ON((int)sp->unsync_children < 0);
|
|
|
|
__clear_bit(idx, sp->unsync_child_bitmap);
|
|
|
|
level++;
|
|
|
|
} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
|
2008-09-24 00:18:39 +08:00
|
|
|
}
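The loop in `mmu_pages_clear_parents()` walks up the recorded path and stops at the first ancestor that still has other unsync children. The sketch below models only that propagation. `struct node`, `struct path`, `clear_parents` and `MAX_LEVELS` are hypothetical names, a 64-bit word replaces the 512-bit `unsync_child_bitmap`, and `MAX_LEVELS` is 3 on the assumption that `PT64_ROOT_LEVEL` is 4.

```c
#include <stddef.h>

#define MAX_LEVELS 3	/* assumed value of PT64_ROOT_LEVEL-1 */

/* Hypothetical, reduced shadow page: a counter plus one bit per child slot. */
struct node {
	unsigned int unsync_children;
	unsigned long long child_bitmap;
};

/* Path from the lowest parent (index 0) upward, with the slot used at each level. */
struct path {
	struct node *parent[MAX_LEVELS];
	unsigned int idx[MAX_LEVELS];
};

/*
 * Drop one unsync child from the lowest parent.  Keep climbing only while the
 * node just updated has no unsync children left, because only then does it
 * stop counting as an unsync child of its own parent.
 */
static void clear_parents(struct path *p)
{
	struct node *n;
	unsigned int level = 0;

	do {
		unsigned int idx = p->idx[level];

		n = p->parent[level];
		if (!n)
			return;

		--n->unsync_children;
		n->child_bitmap &= ~(1ULL << idx);
		level++;
	} while (level < MAX_LEVELS && !n->unsync_children);
}
```

With a parent that has one unsync child and a grandparent that has two, a single call empties the parent and decrements the grandparent to one, after which the climb stops.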
|
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
static void kvm_mmu_pages_init(struct kvm_mmu_page *parent,
|
|
|
|
struct mmu_page_path *parents,
|
|
|
|
struct kvm_mmu_pages *pvec)
|
2008-09-24 00:18:39 +08:00
|
|
|
{
|
2008-12-02 08:32:02 +08:00
|
|
|
parents->parent[parent->role.level-1] = NULL;
|
|
|
|
pvec->nr = 0;
|
|
|
|
}
|
2008-09-24 00:18:39 +08:00
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
static void mmu_sync_children(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_mmu_page *parent)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
struct mmu_page_path parents;
|
|
|
|
struct kvm_mmu_pages pages;
|
|
|
|
|
|
|
|
kvm_mmu_pages_init(parent, &parents, &pages);
|
|
|
|
while (mmu_unsync_walk(parent, &pages)) {
|
2008-12-02 08:32:03 +08:00
|
|
|
int protected = 0;
|
|
|
|
|
|
|
|
for_each_sp(pages, sp, parents, i)
|
|
|
|
protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
|
|
|
|
|
|
|
|
if (protected)
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
for_each_sp(pages, sp, parents, i) {
|
|
|
|
kvm_sync_page(vcpu, sp);
|
|
|
|
mmu_pages_clear_parents(&parents);
|
|
|
|
}
|
2008-09-24 00:18:39 +08:00
|
|
|
cond_resched_lock(&vcpu->kvm->mmu_lock);
|
2008-12-02 08:32:02 +08:00
|
|
|
kvm_mmu_pages_init(parent, &parents, &pages);
|
|
|
|
}
|
2008-09-24 00:18:39 +08:00
|
|
|
}
|
|
|
|
|
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
   * we can cache real mode, 32-bit mode, pae, and long mode page
     tables simultaneously. this is useful for smp bootup.
- the guest page table level
   * some kernels use a page as both a page table and a page directory. this
     allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
   * 32-bit mode page tables span 4MB, whereas a shadow page table spans
     2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
     page directory spans 1GB. the quadrant allows caching up to 4 shadow page
     tables for one guest page in one level.
- a "metaphysical" bit
   * for real mode, and for pse pages, there is no guest page table, so set
     the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00
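The "quadrant" part of the key in the commit message above can be worked through numerically. The sketch below reuses the two-line computation that appears in `kvm_mmu_get_page()`. The constant values (`PAGE_SHIFT` 12, `PT64_PT_BITS` 9, `PT32_PT_BITS` 10) are assumptions that match 4KB pages with 512-entry shadow tables and 1024-entry 32-bit guest tables, and `quadrant_of` is a hypothetical helper name.

```c
#include <stdint.h>

#define PAGE_SHIFT   12	/* assumed: 4KB pages */
#define PT64_PT_BITS  9	/* assumed: 512 entries per shadow table */
#define PT32_PT_BITS 10	/* assumed: 1024 entries per 32-bit guest table */

/*
 * Which slice of a 32-bit guest table does the shadow table for gaddr cover?
 * Level 1: a guest page table spans 4MB, a shadow one 2MB, so 2 quadrants.
 * Level 2: a guest page directory spans 4GB, a shadow one 1GB, so 4 quadrants.
 */
static unsigned quadrant_of(uint64_t gaddr, unsigned level)
{
	unsigned quadrant;

	quadrant = (unsigned)(gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level)));
	quadrant &= (1u << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
	return quadrant;
}
```

For example, at level 1 the addresses 0x00000000 and 0x00200000 fall in quadrants 0 and 1 of the same guest page table, while at level 2 the address 0xC0000000 falls in quadrant 3 of the guest page directory.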
|
|
|
static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
|
|
|
|
gfn_t gfn,
|
|
|
|
gva_t gaddr,
|
|
|
|
unsigned level,
|
2009-01-11 19:02:10 +08:00
|
|
|
int direct,
|
2007-12-09 23:00:02 +08:00
|
|
|
unsigned access,
|
2008-02-27 04:12:10 +08:00
|
|
|
u64 *parent_pte)
|
2007-01-06 08:36:43 +08:00
|
|
|
{
|
|
|
|
union kvm_mmu_page_role role;
|
|
|
|
unsigned quadrant;
|
2010-05-24 15:41:33 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
2008-09-24 00:18:39 +08:00
|
|
|
struct hlist_node *node, *tmp;
|
2010-05-24 15:41:33 +08:00
|
|
|
bool need_sync = false;
|
2007-01-06 08:36:43 +08:00
|
|
|
|
2008-12-22 01:20:09 +08:00
|
|
|
role = vcpu->arch.mmu.base_role;
|
2007-01-06 08:36:43 +08:00
|
|
|
role.level = level;
|
2009-01-11 19:02:10 +08:00
|
|
|
role.direct = direct;
|
2010-03-14 16:16:40 +08:00
|
|
|
if (role.direct)
|
2010-04-15 00:20:03 +08:00
|
|
|
role.cr4_pae = 0;
|
2007-12-09 23:00:02 +08:00
|
|
|
role.access = access;
|
2010-05-31 17:11:39 +08:00
|
|
|
if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
|
2007-01-06 08:36:43 +08:00
|
|
|
quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
|
|
|
|
quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
|
|
|
|
role.quadrant = quadrant;
|
|
|
|
}
|
2010-06-04 21:53:07 +08:00
|
|
|
for_each_gfn_sp(vcpu->kvm, sp, gfn, node, tmp) {
|
|
|
|
if (!need_sync && sp->unsync)
|
|
|
|
need_sync = true;
|
2008-09-24 00:18:39 +08:00
|
|
|
|
2010-06-04 21:53:07 +08:00
|
|
|
if (sp->role.word != role.word)
|
|
|
|
continue;
|
2008-09-24 00:18:39 +08:00
|
|
|
|
2010-06-04 21:53:07 +08:00
|
|
|
if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
|
|
|
|
break;
|
KVM: MMU: don't write-protect if have new mapping to unsync page
Two cases can occur in kvm_mmu_get_page():
- the target sp is already in the cache. If the sp is unsync, we only
  need to update it to ensure the mapping is valid; we neither mark it
  sync nor write-protect sp->gfn, since this does not break the unsync
  rule (one shadow page per gfn).
- the target sp does not exist, so we must create a new sp for gfn.
  Since gfn may already have another shadow page, to keep the unsync rule
  we sync (mark sync and write-protect) gfn's unsync shadow page.
After enabling multiple unsync shadows, we sync those shadow pages
only when the new sp is not allowed to become unsync (the new unsync
rule is: allow all pte pages to become unsync).
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-15 18:52:34 +08:00
|
|
|
|
2010-06-04 21:53:07 +08:00
|
|
|
mmu_page_add_parent_pte(vcpu, sp, parent_pte);
|
|
|
|
if (sp->unsync_children) {
|
|
|
|
set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
|
|
|
|
kvm_mmu_mark_parents_unsync(sp);
|
|
|
|
} else if (sp->unsync)
|
|
|
|
kvm_mmu_mark_parents_unsync(sp);
|
2010-05-15 18:52:34 +08:00
|
|
|
|
2010-06-04 21:53:07 +08:00
|
|
|
trace_kvm_mmu_get_page(sp, false);
|
|
|
|
return sp;
|
|
|
|
}
|
2007-12-19 01:47:18 +08:00
|
|
|
++vcpu->kvm->stat.mmu_cache_miss;
|
2010-05-26 16:49:59 +08:00
|
|
|
sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
|
2007-11-21 21:28:32 +08:00
|
|
|
if (!sp)
|
|
|
|
return sp;
|
|
|
|
sp->gfn = gfn;
|
|
|
|
sp->role = role;
|
2010-06-04 21:53:07 +08:00
|
|
|
hlist_add_head(&sp->hash_link,
|
|
|
|
&vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
|
2009-01-11 19:02:10 +08:00
|
|
|
if (!direct) {
|
2008-12-02 08:32:03 +08:00
|
|
|
if (rmap_write_protect(vcpu->kvm, gfn))
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
2010-05-24 15:41:33 +08:00
|
|
|
if (level > PT_PAGE_TABLE_LEVEL && need_sync)
|
|
|
|
kvm_sync_pages(vcpu, gfn);
|
|
|
|
|
2008-09-24 00:18:39 +08:00
|
|
|
account_shadowed(vcpu->kvm, gfn);
|
|
|
|
}
|
2008-05-29 19:56:28 +08:00
|
|
|
if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte)
|
|
|
|
vcpu->arch.mmu.prefetch_page(vcpu, sp);
|
|
|
|
else
|
|
|
|
nonpaging_prefetch_page(vcpu, sp);
|
2009-07-06 20:58:14 +08:00
|
|
|
trace_kvm_mmu_get_page(sp, true);
|
2007-11-21 21:28:32 +08:00
|
|
|
return sp;
|
2007-01-06 08:36:43 +08:00
|
|
|
}
|
|
|
|
|
2008-12-25 20:39:47 +08:00
|
|
|
static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
|
|
|
|
struct kvm_vcpu *vcpu, u64 addr)
|
|
|
|
{
|
|
|
|
iterator->addr = addr;
|
|
|
|
iterator->shadow_addr = vcpu->arch.mmu.root_hpa;
|
|
|
|
iterator->level = vcpu->arch.mmu.shadow_root_level;
|
|
|
|
if (iterator->level == PT32E_ROOT_LEVEL) {
|
|
|
|
iterator->shadow_addr
|
|
|
|
= vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
|
|
|
|
iterator->shadow_addr &= PT64_BASE_ADDR_MASK;
|
|
|
|
--iterator->level;
|
|
|
|
if (!iterator->shadow_addr)
|
|
|
|
iterator->level = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
|
|
|
|
{
|
|
|
|
if (iterator->level < PT_PAGE_TABLE_LEVEL)
|
|
|
|
return false;
|
2009-06-11 23:07:41 +08:00
|
|
|
|
|
|
|
if (iterator->level == PT_PAGE_TABLE_LEVEL)
|
|
|
|
if (is_large_pte(*iterator->sptep))
|
|
|
|
return false;
|
|
|
|
|
2008-12-25 20:39:47 +08:00
|
|
|
iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
|
|
|
|
iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
|
|
|
|
{
|
|
|
|
iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK;
|
|
|
|
--iterator->level;
|
|
|
|
}
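The three iterator helpers above descend one level per step and pick a table slot from the address at each level. The sketch below shows only the index arithmetic. The body of `shadow_pt_index` is an assumed definition of `SHADOW_PT_INDEX` (9 index bits per level above a 12-bit page offset), since the macro itself is not shown here; `pae_root_index` and `walk_indices` are hypothetical helper names.

```c
#include <stdint.h>

#define PAGE_SHIFT      12	/* assumed: 4KB pages */
#define PT64_LEVEL_BITS  9	/* assumed: 512 entries per table */

/* Assumed equivalent of SHADOW_PT_INDEX(addr, level); level 1 is the leaf. */
static unsigned shadow_pt_index(uint64_t addr, int level)
{
	return (unsigned)((addr >> (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)) &
			  ((1u << PT64_LEVEL_BITS) - 1));
}

/* Slot in the 4-entry PAE root, the same expression as in shadow_walk_init(). */
static unsigned pae_root_index(uint64_t addr)
{
	return (unsigned)((addr >> 30) & 3);
}

/*
 * Record the slot chosen at each level, from top_level down to level 1,
 * in the order the okay/next pair would visit them.  Returns the count.
 */
static int walk_indices(uint64_t addr, int top_level, unsigned *out)
{
	int level, n = 0;

	for (level = top_level; level >= 1; --level)
		out[n++] = shadow_pt_index(addr, level);

	return n;
}
```

In the real iterator the loop also ends early when `shadow_walk_okay()` meets a large page at the leaf level or, for a PAE root, an empty root slot; this sketch leaves both cases out.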
|
|
|
|
|
2007-07-17 18:04:56 +08:00
|
|
|
static void kvm_mmu_page_unlink_children(struct kvm *kvm,
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp)
|
2007-01-06 08:36:45 +08:00
|
|
|
{
|
2007-01-06 08:36:46 +08:00
|
|
|
unsigned i;
|
|
|
|
u64 *pt;
|
|
|
|
u64 ent;
|
|
|
|
|
2007-11-21 21:28:32 +08:00
|
|
|
pt = sp->spt;
|
2007-01-06 08:36:46 +08:00
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
|
|
|
|
ent = pt[i];
|
|
|
|
|
2008-02-23 22:44:30 +08:00
|
|
|
if (is_shadow_present_pte(ent)) {
|
2009-06-10 23:27:03 +08:00
|
|
|
if (!is_last_spte(ent, sp->role.level)) {
|
2008-02-23 22:44:30 +08:00
|
|
|
ent &= PT64_BASE_ADDR_MASK;
|
|
|
|
mmu_page_remove_parent_pte(page_header(ent),
|
|
|
|
&pt[i]);
|
|
|
|
} else {
|
2009-06-10 23:27:03 +08:00
|
|
|
if (is_large_pte(ent))
|
|
|
|
--kvm->stat.lpages;
|
2008-02-23 22:44:30 +08:00
|
|
|
rmap_remove(kvm, &pt[i]);
|
|
|
|
}
|
|
|
|
}
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
|
|
|
pt[i] = shadow_trap_nonpresent_pte;
|
2007-01-06 08:36:46 +08:00
|
|
|
}
|
2007-01-06 08:36:45 +08:00
|
|
|
}
|
|
|
|
|
2007-11-21 21:28:32 +08:00
|
|
|
static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
|
2007-01-06 08:36:43 +08:00
|
|
|
{
|
2007-11-21 21:28:32 +08:00
|
|
|
mmu_page_remove_parent_pte(sp, parent_pte);
|
2007-01-06 08:36:45 +08:00
|
|
|
}
|
|
|
|
|
2007-09-23 20:10:49 +08:00
|
|
|
static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
int i;
|
2009-06-09 20:56:29 +08:00
|
|
|
struct kvm_vcpu *vcpu;
|
2007-09-23 20:10:49 +08:00
|
|
|
|
2009-06-09 20:56:29 +08:00
|
|
|
kvm_for_each_vcpu(i, vcpu, kvm)
|
|
|
|
vcpu->arch.last_pte_updated = NULL;
|
2007-09-23 20:10:49 +08:00
|
|
|
}
|
|
|
|
|
2008-07-11 22:59:46 +08:00
|
|
|
static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
|
2007-01-06 08:36:45 +08:00
|
|
|
{
|
|
|
|
u64 *parent_pte;
|
|
|
|
|
2007-11-21 21:28:32 +08:00
|
|
|
while (sp->multimapped || sp->parent_pte) {
|
|
|
|
if (!sp->multimapped)
|
|
|
|
parent_pte = sp->parent_pte;
|
2007-01-06 08:36:45 +08:00
|
|
|
else {
|
|
|
|
struct kvm_pte_chain *chain;
|
|
|
|
|
2007-11-21 21:28:32 +08:00
|
|
|
chain = container_of(sp->parent_ptes.first,
|
2007-01-06 08:36:45 +08:00
|
|
|
struct kvm_pte_chain, link);
|
|
|
|
parent_pte = chain->parent_ptes[0];
|
|
|
|
}
|
2007-01-06 08:36:46 +08:00
|
|
|
BUG_ON(!parent_pte);
|
2007-11-21 21:28:32 +08:00
|
|
|
kvm_mmu_put_page(sp, parent_pte);
|
2009-06-10 19:24:23 +08:00
|
|
|
__set_spte(parent_pte, shadow_trap_nonpresent_pte);
|
2007-01-06 08:36:45 +08:00
|
|
|
}
|
2008-07-11 22:59:46 +08:00
|
|
|
}
|
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
static int mmu_zap_unsync_children(struct kvm *kvm,
|
2010-06-04 21:53:54 +08:00
|
|
|
struct kvm_mmu_page *parent,
|
|
|
|
struct list_head *invalid_list)
|
2008-09-24 00:18:39 +08:00
|
|
|
{
|
2008-12-02 08:32:02 +08:00
|
|
|
int i, zapped = 0;
|
|
|
|
struct mmu_page_path parents;
|
|
|
|
struct kvm_mmu_pages pages;
|
2008-09-24 00:18:39 +08:00
|
|
|
|
2008-12-02 08:32:02 +08:00
|
|
|
if (parent->role.level == PT_PAGE_TABLE_LEVEL)
|
2008-09-24 00:18:39 +08:00
|
|
|
return 0;
|
2008-12-02 08:32:02 +08:00
|
|
|
|
|
|
|
kvm_mmu_pages_init(parent, &parents, &pages);
|
|
|
|
while (mmu_unsync_walk(parent, &pages)) {
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
|
|
|
|
for_each_sp(pages, sp, parents, i) {
|
2010-06-04 21:53:54 +08:00
|
|
|
kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
|
2008-12-02 08:32:02 +08:00
|
|
|
mmu_pages_clear_parents(&parents);
|
2010-04-16 16:34:42 +08:00
|
|
|
zapped++;
|
2008-12-02 08:32:02 +08:00
|
|
|
}
|
|
|
|
kvm_mmu_pages_init(parent, &parents, &pages);
|
|
|
|
}
|
|
|
|
|
|
|
|
return zapped;
|
2008-09-24 00:18:39 +08:00
|
|
|
}
|
|
|
|
|
2010-06-04 21:53:54 +08:00
|
|
|
static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
|
|
|
|
struct list_head *invalid_list)
|
2008-07-11 22:59:46 +08:00
|
|
|
{
|
2008-09-24 00:18:39 +08:00
|
|
|
int ret;
|
2009-07-06 20:58:14 +08:00
|
|
|
|
2010-06-04 21:53:54 +08:00
|
|
|
trace_kvm_mmu_prepare_zap_page(sp);
|
2008-07-11 22:59:46 +08:00
|
|
|
++kvm->stat.mmu_shadow_zapped;
|
2010-06-04 21:53:54 +08:00
|
|
|
ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
|
2007-11-21 21:28:32 +08:00
|
|
|
kvm_mmu_page_unlink_children(kvm, sp);
|
2008-07-11 22:59:46 +08:00
|
|
|
kvm_mmu_unlink_parents(kvm, sp);
|
2009-01-11 19:02:10 +08:00
|
|
|
if (!sp->role.invalid && !sp->role.direct)
|
2008-07-11 23:07:26 +08:00
|
|
|
unaccount_shadowed(kvm, sp->gfn);
|
2008-09-24 00:18:39 +08:00
|
|
|
if (sp->unsync)
|
|
|
|
kvm_unlink_unsync_page(kvm, sp);
|
2007-11-21 21:28:32 +08:00
|
|
|
if (!sp->root_count) {
|
2010-05-05 09:03:49 +08:00
|
|
|
/* Count self */
|
|
|
|
ret++;
|
2010-06-04 21:53:54 +08:00
|
|
|
list_move(&sp->link, invalid_list);
|
2008-02-21 03:47:24 +08:00
|
|
|
} else {
|
2008-07-11 23:07:26 +08:00
|
|
|
list_move(&sp->link, &kvm->arch.active_mmu_pages);
|
2008-02-21 03:47:24 +08:00
|
|
|
kvm_reload_remote_mmus(kvm);
|
|
|
|
}
|
2010-06-04 21:53:54 +08:00
|
|
|
|
|
|
|
sp->role.invalid = 1;
|
2007-09-23 20:10:49 +08:00
|
|
|
kvm_mmu_reset_last_pte_updated(kvm);
|
2008-09-24 00:18:39 +08:00
|
|
|
return ret;
|
2007-01-06 08:36:45 +08:00
|
|
|
}
|
|
|
|
|
2010-06-04 21:53:54 +08:00
|
|
|
static void kvm_mmu_commit_zap_page(struct kvm *kvm,
|
|
|
|
struct list_head *invalid_list)
|
|
|
|
{
|
|
|
|
struct kvm_mmu_page *sp;
|
|
|
|
|
|
|
|
if (list_empty(invalid_list))
|
|
|
|
return;
|
|
|
|
|
|
|
|
kvm_flush_remote_tlbs(kvm);
|
|
|
|
|
|
|
|
do {
|
|
|
|
sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
|
|
|
|
WARN_ON(!sp->role.invalid || sp->root_count);
|
|
|
|
kvm_mmu_free_page(kvm, sp);
|
|
|
|
} while (!list_empty(invalid_list));
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
LIST_HEAD(invalid_list);
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
|
|
|
|
kvm_mmu_commit_zap_page(kvm, &invalid_list);
|
|
|
|
return ret;
|
|
|
|
}
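The prepare/commit split above lets a caller queue several pages on one `invalid_list` and pay for a single remote TLB flush before any of them is freed. The sketch below models that batching in userspace under simplifying assumptions: a singly linked list replaces `list_head`, a counter replaces `kvm_flush_remote_tlbs()`, and a page with a nonzero `root_count` just stays on the active list. All names (`struct page`, `struct vm`, `add_page`, `prepare_zap`, `commit_zap`) are hypothetical.

```c
#include <stdlib.h>

struct page {
	struct page *next;
	int root_count;		/* nonzero while some vcpu uses it as a root */
	int invalid;
};

struct vm {
	struct page *active;	/* pages still reachable */
	int tlb_flushes;	/* stands in for kvm_flush_remote_tlbs() */
	int freed;
};

/* Allocate a page and put it at the head of the active list. */
static struct page *add_page(struct vm *vm, int root_count)
{
	struct page *pg = calloc(1, sizeof(*pg));

	if (!pg)
		return NULL;
	pg->root_count = root_count;
	pg->next = vm->active;
	vm->active = pg;
	return pg;
}

/* Phase 1: mark invalid; queue for freeing unless still used as a root. */
static int prepare_zap(struct vm *vm, struct page *pg, struct page **invalid_list)
{
	struct page **pp;

	pg->invalid = 1;
	if (pg->root_count)
		return 0;

	for (pp = &vm->active; *pp; pp = &(*pp)->next) {
		if (*pp == pg) {
			*pp = pg->next;		/* unlink from active */
			pg->next = *invalid_list;
			*invalid_list = pg;
			return 1;
		}
	}
	return 0;
}

/* Phase 2: one flush for the whole batch, then free every queued page. */
static void commit_zap(struct vm *vm, struct page **invalid_list)
{
	if (!*invalid_list)
		return;

	vm->tlb_flushes++;

	while (*invalid_list) {
		struct page *pg = *invalid_list;

		*invalid_list = pg->next;
		free(pg);
		vm->freed++;
	}
}
```

Queuing two ordinary pages and one root page, then committing once, frees two pages with one flush and leaves the root page on the active list marked invalid. A second commit on the now empty list does nothing, matching the `list_empty()` early return in the source.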
|
|
|
|
|
2007-10-03 00:52:55 +08:00
|
|
|
/*
|
|
|
|
* Change the number of mmu pages allocated to the vm.
|
|
|
|
* Note: if kvm_nr_mmu_pages is too small, you will get a deadlock.
|
|
|
|
*/
|
|
|
|
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages)
|
|
|
|
{
|
2009-07-23 00:05:49 +08:00
|
|
|
int used_pages;
|
|
|
|
|
|
|
|
used_pages = kvm->arch.n_alloc_mmu_pages - kvm->arch.n_free_mmu_pages;
|
|
|
|
used_pages = max(0, used_pages);
|
|
|
|
|
2007-10-03 00:52:55 +08:00
|
|
|
/*
|
|
|
|
* If we set the number of mmu pages to be smaller than the
|
|
|
|
* number of active pages, we must free some mmu pages before we
|
|
|
|
* change the value
|
|
|
|
*/
|
|
|
|
|
2009-07-23 00:05:49 +08:00
|
|
|
if (used_pages > kvm_nr_mmu_pages) {
|
2010-04-16 16:34:42 +08:00
|
|
|
while (used_pages > kvm_nr_mmu_pages &&
|
|
|
|
!list_empty(&kvm->arch.active_mmu_pages)) {
|
2007-10-03 00:52:55 +08:00
|
|
|
struct kvm_mmu_page *page;
|
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
page = container_of(kvm->arch.active_mmu_pages.prev,
|
2007-10-03 00:52:55 +08:00
|
|
|
struct kvm_mmu_page, link);
|
2010-04-16 16:34:42 +08:00
|
|
|
used_pages -= kvm_mmu_zap_page(kvm, page);
|
2007-10-03 00:52:55 +08:00
|
|
|
}
|
2010-04-16 16:34:42 +08:00
|
|
|
kvm_nr_mmu_pages = used_pages;
|
2007-12-14 10:01:48 +08:00
|
|
|
kvm->arch.n_free_mmu_pages = 0;
|
2007-10-03 00:52:55 +08:00
|
|
|
}
|
|
|
|
else
|
2007-12-14 10:01:48 +08:00
|
|
|
kvm->arch.n_free_mmu_pages += kvm_nr_mmu_pages
|
|
|
|
- kvm->arch.n_alloc_mmu_pages;
|
2007-10-03 00:52:55 +08:00
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
kvm->arch.n_alloc_mmu_pages = kvm_nr_mmu_pages;
|
2007-10-03 00:52:55 +08:00
|
|
|
}
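The bookkeeping in `kvm_mmu_change_mmu_pages()` can be followed with small numbers. The sketch below keeps only the arithmetic on the two counters. `struct mmu_budget` and `change_budget` are hypothetical names, and the `zappable` argument together with the one-page-per-step loop stands in for repeated `kvm_mmu_zap_page()` calls on the tail of the active list, which in the source may free more than one page per call.

```c
/* Hypothetical mirror of n_alloc_mmu_pages / n_free_mmu_pages. */
struct mmu_budget {
	unsigned int n_alloc;	/* current limit */
	unsigned int n_free;	/* part of the limit not in use */
};

/*
 * Apply a new limit.  If more pages are in use than the new limit allows,
 * free pages until the limit is met or nothing zappable is left, then settle
 * on whatever is still in use.  Otherwise adjust the free count by the
 * difference between the new and the old limit.  Returns the limit applied.
 */
static unsigned int change_budget(struct mmu_budget *b, unsigned int new_limit,
				  unsigned int zappable)
{
	int used = (int)b->n_alloc - (int)b->n_free;

	if (used < 0)
		used = 0;

	if ((unsigned int)used > new_limit) {
		while ((unsigned int)used > new_limit && zappable > 0) {
			used--;		/* one page freed per step (assumption) */
			zappable--;
		}
		new_limit = (unsigned int)used;
		b->n_free = 0;
	} else {
		/* unsigned wraparound makes this correct for a shrinking limit too */
		b->n_free += new_limit - b->n_alloc;
	}

	b->n_alloc = new_limit;
	return new_limit;
}
```

Starting from a limit of 40 with 10 free (30 in use), raising the limit to 50 leaves 20 free, lowering it to 35 leaves 5 free, and lowering it to 20 frees 10 pages and leaves none free. If only 4 pages can be zapped, the limit settles at 26 instead of 20.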
|
|
|
|
|
2007-10-11 08:25:50 +08:00
|
|
|
static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
|
2007-01-06 08:36:45 +08:00
|
|
|
{
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-06 08:36:45 +08:00
|
|
|
struct hlist_node *node, *n;
|
|
|
|
int r;
|
|
|
|
|
2008-03-04 04:59:56 +08:00
|
|
|
pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
|
2007-01-06 08:36:45 +08:00
|
|
|
r = 0;
|
2010-04-16 16:35:54 +08:00
|
|
|
restart:
|
2010-06-04 21:53:07 +08:00
|
|
|
for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node, n) {
|
|
|
|
pgprintk("%s: gfn %lx role %x\n", __func__, gfn,
|
|
|
|
sp->role.word);
|
|
|
|
r = 1;
|
|
|
|
if (kvm_mmu_zap_page(kvm, sp))
|
|
|
|
goto restart;
|
|
|
|
}
|
2007-01-06 08:36:45 +08:00
|
|
|
return r;
|
2007-01-06 08:36:43 +08:00
|
|
|
}
|
|
|
|
|
2007-10-11 08:25:50 +08:00
|
|
|
static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
|
2007-05-31 20:08:29 +08:00
|
|
|
{
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
2009-01-06 19:00:27 +08:00
|
|
|
struct hlist_node *node, *nn;
|
2007-05-31 20:08:29 +08:00
|
|
|
|
2010-04-16 16:35:54 +08:00
|
|
|
restart:
|
2010-06-04 21:53:07 +08:00
|
|
|
for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node, nn) {
|
|
|
|
pgprintk("%s: zap %lx %x\n",
|
|
|
|
__func__, gfn, sp->role.word);
|
|
|
|
if (kvm_mmu_zap_page(kvm, sp))
|
|
|
|
goto restart;
|
2007-05-31 20:08:29 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-11-21 20:20:22 +08:00
|
|
|
static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
|
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2009-12-24 00:35:21 +08:00
|
|
|
int slot = memslot_id(kvm, gfn);
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp = page_header(__pa(pte));
|
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
|
|
|
|
2008-10-16 17:30:57 +08:00
|
|
|
__set_bit(slot, sp->slot_bitmap);
|
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
2008-09-24 00:18:38 +08:00
|
|
|
static void mmu_convert_notrap(struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
u64 *pt = sp->spt;
|
|
|
|
|
|
|
|
if (shadow_trap_nonpresent_pte == shadow_notrap_nonpresent_pte)
|
|
|
|
return;
|
|
|
|
|
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
|
|
|
|
if (pt[i] == shadow_notrap_nonpresent_pte)
|
2009-06-10 19:24:23 +08:00
|
|
|
__set_spte(&pt[i], shadow_trap_nonpresent_pte);
|
2008-09-24 00:18:38 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-10-09 16:01:56 +08:00
|
|
|
/*
|
|
|
|
* The function is based on mtrr_type_lookup() in
|
|
|
|
* arch/x86/kernel/cpu/mtrr/generic.c
|
|
|
|
*/
|
|
|
|
static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
|
|
|
|
u64 start, u64 end)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
u64 base, mask;
|
|
|
|
u8 prev_match, curr_match;
|
|
|
|
int num_var_ranges = KVM_NR_VAR_MTRR;
|
|
|
|
|
|
|
|
if (!mtrr_state->enabled)
|
|
|
|
return 0xFF;
|
|
|
|
|
|
|
|
/* Make end inclusive instead of exclusive */
|
|
|
|
end--;
|
|
|
|
|
|
|
|
/* Look in fixed ranges. Just return the type as per start */
|
|
|
|
if (mtrr_state->have_fixed && (start < 0x100000)) {
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
if (start < 0x80000) {
|
|
|
|
idx = 0;
|
|
|
|
idx += (start >> 16);
|
|
|
|
return mtrr_state->fixed_ranges[idx];
|
|
|
|
} else if (start < 0xC0000) {
|
|
|
|
idx = 1 * 8;
|
|
|
|
idx += ((start - 0x80000) >> 14);
|
|
|
|
return mtrr_state->fixed_ranges[idx];
|
|
|
|
} else if (start < 0x100000) {
|
|
|
|
idx = 3 * 8;
|
|
|
|
idx += ((start - 0xC0000) >> 12);
|
|
|
|
return mtrr_state->fixed_ranges[idx];
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Look in variable ranges
|
|
|
|
* Look for multiple ranges matching this address and pick type
|
|
|
|
* as per MTRR precedence
|
|
|
|
*/
|
|
|
|
if (!(mtrr_state->enabled & 2))
|
|
|
|
return mtrr_state->def_type;
|
|
|
|
|
|
|
|
prev_match = 0xFF;
|
|
|
|
for (i = 0; i < num_var_ranges; ++i) {
|
|
|
|
unsigned short start_state, end_state;
|
|
|
|
|
|
|
|
if (!(mtrr_state->var_ranges[i].mask_lo & (1 << 11)))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
base = (((u64)mtrr_state->var_ranges[i].base_hi) << 32) +
|
|
|
|
(mtrr_state->var_ranges[i].base_lo & PAGE_MASK);
|
|
|
|
mask = (((u64)mtrr_state->var_ranges[i].mask_hi) << 32) +
|
|
|
|
(mtrr_state->var_ranges[i].mask_lo & PAGE_MASK);
|
|
|
|
|
|
|
|
start_state = ((start & mask) == (base & mask));
|
|
|
|
end_state = ((end & mask) == (base & mask));
|
|
|
|
if (start_state != end_state)
|
|
|
|
return 0xFE;
|
|
|
|
|
|
|
|
if ((start & mask) != (base & mask))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
curr_match = mtrr_state->var_ranges[i].base_lo & 0xff;
|
|
|
|
if (prev_match == 0xFF) {
|
|
|
|
prev_match = curr_match;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prev_match == MTRR_TYPE_UNCACHABLE ||
|
|
|
|
curr_match == MTRR_TYPE_UNCACHABLE)
|
|
|
|
return MTRR_TYPE_UNCACHABLE;
|
|
|
|
|
|
|
|
if ((prev_match == MTRR_TYPE_WRBACK &&
|
|
|
|
curr_match == MTRR_TYPE_WRTHROUGH) ||
|
|
|
|
(prev_match == MTRR_TYPE_WRTHROUGH &&
|
|
|
|
curr_match == MTRR_TYPE_WRBACK)) {
|
|
|
|
prev_match = MTRR_TYPE_WRTHROUGH;
|
|
|
|
curr_match = MTRR_TYPE_WRTHROUGH;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prev_match != curr_match)
|
|
|
|
return MTRR_TYPE_UNCACHABLE;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prev_match != 0xFF)
|
|
|
|
return prev_match;
|
|
|
|
|
|
|
|
return mtrr_state->def_type;
|
|
|
|
}
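The fixed-range branch of get_mtrr_type() above maps an address below 1MB to a slot in the fixed_ranges array. A stand-alone sketch of that index arithmetic follows; the helper name fixed_mtrr_index and the -1 return for out-of-range addresses are assumptions made for illustration, and the sketch assumes the usual layout of 8 ranges of 64KB, 16 ranges of 16KB and 64 ranges of 4KB.

```c
#include <stdint.h>

/*
 * Hypothetical helper: index into an 88-entry fixed-range type array.
 *   0x00000-0x7FFFF:  8 ranges of 64KB -> indices  0..7
 *   0x80000-0xBFFFF: 16 ranges of 16KB -> indices  8..23
 *   0xC0000-0xFFFFF: 64 ranges of 4KB  -> indices 24..87
 * Returns -1 for addresses at or above 1MB, which the fixed ranges
 * do not cover.
 */
static int fixed_mtrr_index(uint64_t addr)
{
	if (addr < 0x80000)
		return (int)(addr >> 16);
	if (addr < 0xC0000)
		return 1 * 8 + (int)((addr - 0x80000) >> 14);
	if (addr < 0x100000)
		return 3 * 8 + (int)((addr - 0xC0000) >> 12);
	return -1;
}
```

For example, 0xA0000 (the start of the legacy VGA window) lies in the 16KB group: (0xA0000 - 0x80000) >> 14 is 8, so it lands at index 16.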
|
|
|
|
|
2009-04-27 20:35:42 +08:00
|
|
|
u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
|
2008-10-09 16:01:56 +08:00
|
|
|
{
|
|
|
|
u8 mtrr;
|
|
|
|
|
|
|
|
mtrr = get_mtrr_type(&vcpu->arch.mtrr_state, gfn << PAGE_SHIFT,
|
|
|
|
(gfn << PAGE_SHIFT) + PAGE_SIZE);
|
|
|
|
if (mtrr == 0xfe || mtrr == 0xff)
|
|
|
|
mtrr = MTRR_TYPE_WRBACK;
|
|
|
|
return mtrr;
|
|
|
|
}
|
2009-04-27 20:35:42 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
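The overlap rules applied in the variable-range loop of get_mtrr_type() can be read as a pairwise fold over the matching ranges. The sketch below restates them in isolation. The function name combine_mtrr_types and the MT_* names are hypothetical; the numeric type values (0 for uncachable, 4 for write-through, 6 for write-back) follow the architectural MTRR encoding, and 0xFF is the "no match yet" marker used in the code above.

```c
#include <stdint.h>

enum {
	MT_UC   = 0,	/* uncachable */
	MT_WT   = 4,	/* write-through */
	MT_WB   = 6,	/* write-back */
	MT_NONE = 0xFF	/* no matching range seen yet */
};

/*
 * Hypothetical helper: combine the type accumulated so far (prev) with
 * the type of the next matching variable range (curr).
 */
static uint8_t combine_mtrr_types(uint8_t prev, uint8_t curr)
{
	if (prev == MT_NONE)
		return curr;		/* first match is taken as-is */
	if (prev == MT_UC || curr == MT_UC)
		return MT_UC;		/* uncachable always wins */
	if ((prev == MT_WB && curr == MT_WT) ||
	    (prev == MT_WT && curr == MT_WB))
		return MT_WT;		/* WB overlapping WT yields WT */
	if (prev != curr)
		return MT_UC;		/* any other mismatch is treated as UC */
	return prev;
}
```

Because uncachable absorbs every later combination, folding the matching ranges through this helper gives the same result as the loop's early return.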
|
2008-10-09 16:01:56 +08:00
|
|
|
|
2010-05-24 15:40:07 +08:00
|
|
|
static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
|
|
|
|
{
|
|
|
|
trace_kvm_mmu_unsync_page(sp);
|
|
|
|
++vcpu->kvm->stat.mmu_unsync;
|
|
|
|
sp->unsync = 1;
|
|
|
|
|
|
|
|
kvm_mmu_mark_parents_unsync(sp);
|
|
|
|
mmu_convert_notrap(sp);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
|
2008-09-24 00:18:39 +08:00
|
|
|
{
|
|
|
|
struct kvm_mmu_page *s;
|
|
|
|
struct hlist_node *node, *n;
|
2010-05-24 15:40:07 +08:00
|
|
|
|
2010-06-04 21:53:07 +08:00
|
|
|
for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node, n) {
|
|
|
|
if (s->unsync)
|
2008-09-24 00:18:39 +08:00
|
|
|
continue;
|
2010-05-24 15:40:07 +08:00
|
|
|
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
|
|
|
|
__kvm_unsync_page(vcpu, s);
|
2008-09-24 00:18:39 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
|
|
|
|
bool can_unsync)
|
|
|
|
{
|
2010-05-24 15:40:07 +08:00
|
|
|
struct kvm_mmu_page *s;
|
|
|
|
struct hlist_node *node, *n;
|
|
|
|
bool need_unsync = false;
|
|
|
|
|
2010-06-04 21:53:07 +08:00
|
|
|
for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node, n) {
|
2010-05-24 15:40:07 +08:00
|
|
|
if (s->role.level != PT_PAGE_TABLE_LEVEL)
|
2008-09-24 00:18:39 +08:00
|
|
|
return 1;
|
2010-05-24 15:40:07 +08:00
|
|
|
|
|
|
|
if (!need_unsync && !s->unsync) {
|
|
|
|
if (!can_unsync || !oos_shadow)
|
|
|
|
return 1;
|
|
|
|
need_unsync = true;
|
|
|
|
}
|
2008-09-24 00:18:39 +08:00
|
|
|
}
|
2010-05-24 15:40:07 +08:00
|
|
|
if (need_unsync)
|
|
|
|
kvm_unsync_pages(vcpu, gfn);
|
2008-09-24 00:18:39 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2009-06-10 19:24:23 +08:00
|
|
|
static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
2008-09-24 00:18:30 +08:00
|
|
|
unsigned pte_access, int user_fault,
|
2009-07-27 22:30:44 +08:00
|
|
|
int write_fault, int dirty, int level,
|
2009-04-06 01:54:47 +08:00
|
|
|
gfn_t gfn, pfn_t pfn, bool speculative,
|
2009-09-24 02:47:17 +08:00
|
|
|
bool can_unsync, bool reset_host_protection)
|
2007-12-09 23:40:31 +08:00
|
|
|
{
|
|
|
|
u64 spte;
|
2008-09-24 00:18:30 +08:00
|
|
|
int ret = 0;
|
2008-10-09 16:01:57 +08:00
|
|
|
|
2007-12-09 23:40:31 +08:00
|
|
|
/*
|
|
|
|
* We don't set the accessed bit, since we sometimes want to see
|
|
|
|
* whether the guest actually used the pte (in order to detect
|
|
|
|
* demand paging).
|
|
|
|
*/
|
2008-04-25 21:13:50 +08:00
|
|
|
spte = shadow_base_present_pte | shadow_dirty_mask;
|
2008-03-18 17:05:52 +08:00
|
|
|
if (!speculative)
|
2008-08-28 01:01:04 +08:00
|
|
|
spte |= shadow_accessed_mask;
|
2007-12-09 23:40:31 +08:00
|
|
|
if (!dirty)
|
|
|
|
pte_access &= ~ACC_WRITE_MASK;
|
2008-04-25 21:13:50 +08:00
|
|
|
if (pte_access & ACC_EXEC_MASK)
|
|
|
|
spte |= shadow_x_mask;
|
|
|
|
else
|
|
|
|
spte |= shadow_nx_mask;
|
2007-12-09 23:40:31 +08:00
|
|
|
if (pte_access & ACC_USER_MASK)
|
2008-04-25 21:13:50 +08:00
|
|
|
spte |= shadow_user_mask;
|
2009-07-27 22:30:44 +08:00
|
|
|
if (level > PT_PAGE_TABLE_LEVEL)
|
2008-02-23 22:44:30 +08:00
|
|
|
spte |= PT_PAGE_SIZE_MASK;
|
2009-04-27 20:35:42 +08:00
|
|
|
if (tdp_enabled)
|
|
|
|
spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
|
|
|
|
kvm_is_mmio_pfn(pfn));
|
2007-12-09 23:40:31 +08:00
|
|
|
|
2009-09-24 02:47:17 +08:00
|
|
|
if (reset_host_protection)
|
|
|
|
spte |= SPTE_HOST_WRITEABLE;
|
|
|
|
|
2008-04-03 03:46:56 +08:00
|
|
|
spte |= (u64)pfn << PAGE_SHIFT;
|
2007-12-09 23:40:31 +08:00
|
|
|
|
|
|
|
if ((pte_access & ACC_WRITE_MASK)
|
2010-05-27 19:22:51 +08:00
|
|
|
|| (!tdp_enabled && write_fault && !is_write_protection(vcpu)
|
|
|
|
&& !user_fault)) {
|
2007-12-09 23:40:31 +08:00
|
|
|
|
2009-07-27 22:30:44 +08:00
|
|
|
if (level > PT_PAGE_TABLE_LEVEL &&
|
|
|
|
has_wrprotected_page(vcpu->kvm, gfn, level)) {
|
2008-09-24 00:18:32 +08:00
|
|
|
ret = 1;
|
2010-05-13 10:07:00 +08:00
|
|
|
rmap_remove(vcpu->kvm, sptep);
|
2008-09-24 00:18:32 +08:00
|
|
|
spte = shadow_trap_nonpresent_pte;
|
|
|
|
goto set_pte;
|
|
|
|
}
|
|
|
|
|
2007-12-09 23:40:31 +08:00
|
|
|
spte |= PT_WRITABLE_MASK;
|
|
|
|
|
2010-05-27 19:35:58 +08:00
|
|
|
if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
|
|
|
|
spte &= ~PT_USER_MASK;
|
|
|
|
|
2008-11-25 22:58:07 +08:00
|
|
|
/*
|
|
|
|
* Optimization: for pte sync, if spte was writable the hash
|
|
|
|
* lookup is unnecessary (and expensive). Write protection
|
|
|
|
* is the responsibility of mmu_get_page / kvm_sync_page.
|
|
|
|
* Same reasoning can be applied to dirty page accounting.
|
|
|
|
*/
|
2010-01-18 17:45:10 +08:00
|
|
|
if (!can_unsync && is_writable_pte(*sptep))
|
2008-11-25 22:58:07 +08:00
|
|
|
goto set_pte;
|
|
|
|
|
2008-09-24 00:18:39 +08:00
|
|
|
if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
|
2007-12-09 23:40:31 +08:00
|
|
|
pgprintk("%s: found shadow page for %lx, marking ro\n",
|
2008-03-04 04:59:56 +08:00
|
|
|
__func__, gfn);
|
2008-09-24 00:18:30 +08:00
|
|
|
ret = 1;
|
2007-12-09 23:40:31 +08:00
|
|
|
pte_access &= ~ACC_WRITE_MASK;
|
2010-01-18 17:45:10 +08:00
|
|
|
if (is_writable_pte(spte))
|
2007-12-09 23:40:31 +08:00
|
|
|
spte &= ~PT_WRITABLE_MASK;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (pte_access & ACC_WRITE_MASK)
|
|
|
|
mark_page_dirty(vcpu->kvm, gfn);
|
|
|
|
|
2008-09-24 00:18:32 +08:00
|
|
|
set_pte:
|
2009-06-10 19:24:23 +08:00
|
|
|
__set_spte(sptep, spte);
|
2008-09-24 00:18:30 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2009-06-10 19:24:23 +08:00
|
|
|
static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
2008-09-24 00:18:30 +08:00
|
|
|
unsigned pt_access, unsigned pte_access,
|
|
|
|
int user_fault, int write_fault, int dirty,
|
2009-07-27 22:30:44 +08:00
|
|
|
int *ptwrite, int level, gfn_t gfn,
|
2009-09-24 02:47:17 +08:00
|
|
|
pfn_t pfn, bool speculative,
|
|
|
|
bool reset_host_protection)
|
2008-09-24 00:18:30 +08:00
|
|
|
{
|
|
|
|
int was_rmapped = 0;
|
2010-01-18 17:45:10 +08:00
|
|
|
int was_writable = is_writable_pte(*sptep);
|
2009-08-06 02:43:58 +08:00
|
|
|
int rmap_count;
|
2008-09-24 00:18:30 +08:00
|
|
|
|
|
|
|
pgprintk("%s: spte %llx access %x write_fault %d"
|
|
|
|
" user_fault %d gfn %lx\n",
|
2009-06-10 19:24:23 +08:00
|
|
|
__func__, *sptep, pt_access,
|
2008-09-24 00:18:30 +08:00
|
|
|
write_fault, user_fault, gfn);
|
|
|
|
|
2009-06-10 19:24:23 +08:00
|
|
|
if (is_rmap_spte(*sptep)) {
|
2008-09-24 00:18:30 +08:00
|
|
|
/*
|
|
|
|
* If we overwrite a PTE page pointer with a 2MB PMD, unlink
|
|
|
|
* the parent of the now unreachable PTE.
|
|
|
|
*/
|
2009-07-27 22:30:44 +08:00
|
|
|
if (level > PT_PAGE_TABLE_LEVEL &&
|
|
|
|
!is_large_pte(*sptep)) {
|
2008-09-24 00:18:30 +08:00
|
|
|
struct kvm_mmu_page *child;
|
2009-06-10 19:24:23 +08:00
|
|
|
u64 pte = *sptep;
|
2008-09-24 00:18:30 +08:00
|
|
|
|
|
|
|
child = page_header(pte & PT64_BASE_ADDR_MASK);
|
2009-06-10 19:24:23 +08:00
|
|
|
mmu_page_remove_parent_pte(child, sptep);
|
2010-05-28 20:44:59 +08:00
|
|
|
__set_spte(sptep, shadow_trap_nonpresent_pte);
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
2009-06-10 19:24:23 +08:00
|
|
|
} else if (pfn != spte_to_pfn(*sptep)) {
|
2008-09-24 00:18:30 +08:00
|
|
|
pgprintk("hfn old %lx new %lx\n",
|
2009-06-10 19:24:23 +08:00
|
|
|
spte_to_pfn(*sptep), pfn);
|
|
|
|
rmap_remove(vcpu->kvm, sptep);
|
2010-06-30 16:04:06 +08:00
|
|
|
__set_spte(sptep, shadow_trap_nonpresent_pte);
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
2009-02-18 21:08:59 +08:00
|
|
|
} else
|
|
|
|
was_rmapped = 1;
|
2008-09-24 00:18:30 +08:00
|
|
|
}
|
2009-07-27 22:30:44 +08:00
|
|
|
|
2009-06-10 19:24:23 +08:00
|
|
|
if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
|
2009-09-24 02:47:17 +08:00
|
|
|
dirty, level, gfn, pfn, speculative, true,
|
|
|
|
reset_host_protection)) {
|
2008-09-24 00:18:30 +08:00
|
|
|
if (write_fault)
|
|
|
|
*ptwrite = 1;
|
2008-09-24 00:18:31 +08:00
|
|
|
kvm_x86_ops->tlb_flush(vcpu);
|
|
|
|
}
|
2008-09-24 00:18:30 +08:00
|
|
|
|
2009-06-10 19:24:23 +08:00
|
|
|
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
|
2008-09-24 00:18:30 +08:00
|
|
|
pgprintk("instantiating %s PTE (%s) at %ld (%llx) addr %p\n",
|
2009-06-10 19:24:23 +08:00
|
|
|
is_large_pte(*sptep)? "2MB" : "4kB",
|
2009-07-09 22:36:01 +08:00
|
|
|
*sptep & PT_WRITABLE_MASK ? "RW" : "R", gfn,
|
|
|
|
*sptep, sptep);
|
2009-06-10 19:24:23 +08:00
|
|
|
if (!was_rmapped && is_large_pte(*sptep))
|
2008-02-23 22:44:30 +08:00
|
|
|
++vcpu->kvm->stat.lpages;
|
|
|
|
|
2009-06-10 19:24:23 +08:00
|
|
|
page_header_update_slot(vcpu->kvm, sptep, gfn);
|
2007-12-09 23:40:31 +08:00
|
|
|
if (!was_rmapped) {
|
2009-07-27 22:30:42 +08:00
|
|
|
rmap_count = rmap_add(vcpu, sptep, gfn);
|
2009-09-24 02:47:16 +08:00
|
|
|
kvm_release_pfn_clean(pfn);
|
2009-08-06 02:43:58 +08:00
|
|
|
if (rmap_count > RMAP_RECYCLE_THRESHOLD)
|
2009-07-27 22:30:44 +08:00
|
|
|
rmap_recycle(vcpu, sptep, gfn);
|
2008-01-13 05:49:09 +08:00
|
|
|
} else {
|
2010-01-18 17:45:10 +08:00
|
|
|
if (was_writable)
|
2008-04-03 03:46:56 +08:00
|
|
|
kvm_release_pfn_dirty(pfn);
|
2008-01-13 05:49:09 +08:00
|
|
|
else
|
2008-04-03 03:46:56 +08:00
|
|
|
kvm_release_pfn_clean(pfn);
|
2007-12-09 23:40:31 +08:00
|
|
|
}
|
2008-05-15 18:51:35 +08:00
|
|
|
if (speculative) {
|
2009-06-10 19:24:23 +08:00
|
|
|
vcpu->arch.last_pte_updated = sptep;
|
2008-05-15 18:51:35 +08:00
|
|
|
vcpu->arch.last_pte_gfn = gfn;
|
|
|
|
}
|
2007-12-09 23:40:31 +08:00
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
|
|
|
static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2008-12-25 20:54:25 +08:00
|
|
|
static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
|
2009-07-27 22:30:44 +08:00
|
|
|
int level, gfn_t gfn, pfn_t pfn)
|
2008-08-23 00:28:04 +08:00
|
|
|
{
|
2008-12-25 20:54:25 +08:00
|
|
|
struct kvm_shadow_walk_iterator iterator;
|
2008-08-23 00:28:04 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
2008-12-25 20:54:25 +08:00
|
|
|
int pt_write = 0;
|
2008-08-23 00:28:04 +08:00
|
|
|
gfn_t pseudo_gfn;
|
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
|
|
|
|
2008-12-25 20:54:25 +08:00
|
|
|
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
|
2009-07-27 22:30:44 +08:00
|
|
|
if (iterator.level == level) {
|
2008-12-25 20:54:25 +08:00
|
|
|
mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
|
|
|
|
0, write, 1, &pt_write,
|
2009-09-24 02:47:17 +08:00
|
|
|
level, gfn, pfn, false, true);
|
2008-12-25 20:54:25 +08:00
|
|
|
++vcpu->stat.pf_fixed;
|
|
|
|
break;
|
2006-12-10 18:21:36 +08:00
		}

2008-12-25 20:54:25 +08:00
		if (*iterator.sptep == shadow_trap_nonpresent_pte) {
2010-05-26 16:48:25 +08:00
			u64 base_addr = iterator.addr;

			base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
			pseudo_gfn = base_addr >> PAGE_SHIFT;
2008-12-25 20:54:25 +08:00
			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
					      iterator.level - 1,
					      1, ACC_ALL, iterator.sptep);
			if (!sp) {
				pgprintk("nonpaging_map: ENOMEM\n");
				kvm_release_pfn_clean(pfn);
				return -ENOMEM;
			}
2008-08-23 00:28:04 +08:00

2009-06-10 19:24:23 +08:00
			__set_spte(iterator.sptep,
				   __pa(sp->spt)
				   | PT_PRESENT_MASK | PT_WRITABLE_MASK
				   | shadow_user_mask | shadow_x_mask);
2008-12-25 20:54:25 +08:00
		}
	}
	return pt_write;
2006-12-10 18:21:36 +08:00
}
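
As a standalone illustration of the pseudo_gfn computation in __direct_map() above, the sketch below rounds an address down to the start of the region covered by one entry at a given level and converts the result to a frame number. The constants and the helper names lvl_addr_mask() and pseudo_gfn_for() are assumptions made for this example (4 KiB pages, 9 index bits per level). They stand in for PT64_LVL_ADDR_MASK() and PAGE_SHIFT and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for this sketch: 4 KiB pages, 9 index bits per level. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_LEVEL_BITS 9

/* Mask that clears the address bits below the region mapped by one entry at `level`. */
static uint64_t lvl_addr_mask(int level)
{
	assert(level >= 1 && level <= 4);
	return ~((1ULL << (SKETCH_PAGE_SHIFT + (level - 1) * SKETCH_LEVEL_BITS)) - 1);
}

/* Frame number of the aligned base address for a fault at `addr`. */
static uint64_t pseudo_gfn_for(uint64_t addr, int level)
{
	uint64_t base_addr = addr & lvl_addr_mask(level);

	return base_addr >> SKETCH_PAGE_SHIFT;
}
```

Under these assumptions, an address of 0x12345678 at level 2 yields frame number 0x12200, the first 4 KiB frame of the enclosing 2 MiB region.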

2010-05-31 14:28:19 +08:00
static void kvm_send_hwpoison_signal(struct kvm *kvm, gfn_t gfn)
{
	char buf[1];
	void __user *hva;
	int r;

	/* Touch the page, so send SIGBUS */
	hva = (void __user *)gfn_to_hva(kvm, gfn);
	r = copy_from_user(buf, hva, 1);
}

static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn)
{
	kvm_release_pfn_clean(pfn);
	if (is_hwpoison_pfn(pfn)) {
		kvm_send_hwpoison_signal(kvm, gfn);
		return 0;
	}
	return 1;
}

2007-12-21 08:18:22 +08:00
static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
{
	int r;
2009-07-27 22:30:44 +08:00
	int level;
2008-04-03 03:46:56 +08:00
	pfn_t pfn;
2008-07-25 22:24:52 +08:00
	unsigned long mmu_seq;
2007-12-21 08:18:26 +08:00

2009-07-27 22:30:44 +08:00
	level = mapping_level(vcpu, gfn);

	/*
	 * This path builds a PAE pagetable - so we can map 2mb pages at
	 * maximum. Therefore check if the level is larger than that.
	 */
	if (level > PT_DIRECTORY_LEVEL)
		level = PT_DIRECTORY_LEVEL;

	gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
2008-02-23 22:44:30 +08:00

2008-07-25 22:24:52 +08:00
	mmu_seq = vcpu->kvm->mmu_notifier_seq;
2008-09-17 07:54:47 +08:00
	smp_rmb();
2008-04-03 03:46:56 +08:00
	pfn = gfn_to_pfn(vcpu->kvm, gfn);
2007-12-21 08:18:26 +08:00

2008-01-24 17:44:11 +08:00
	/* mmio */
2010-05-31 14:28:19 +08:00
	if (is_error_pfn(pfn))
		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
2008-01-24 17:44:11 +08:00

2007-12-21 08:18:26 +08:00
	spin_lock(&vcpu->kvm->mmu_lock);
2008-07-25 22:24:52 +08:00
	if (mmu_notifier_retry(vcpu, mmu_seq))
		goto out_unlock;
2007-12-31 21:27:49 +08:00
	kvm_mmu_free_some_pages(vcpu);
2009-07-27 22:30:44 +08:00
	r = __direct_map(vcpu, v, write, level, gfn, pfn);
2007-12-21 08:18:26 +08:00
	spin_unlock(&vcpu->kvm->mmu_lock);


2007-12-21 08:18:22 +08:00
	return r;
2008-07-25 22:24:52 +08:00

out_unlock:
	spin_unlock(&vcpu->kvm->mmu_lock);
	kvm_release_pfn_clean(pfn);
	return 0;
2007-12-21 08:18:22 +08:00
}
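
nonpaging_map() above samples mmu_notifier_seq before the page lookup and calls mmu_notifier_retry() once it holds mmu_lock. The following single-threaded sketch models that "snapshot, do the slow work, re-check under the lock" pattern. The struct and function names are hypothetical, memory barriers and locking are left out, and the in-progress counter is an assumption of this sketch: the code above only shows the sequence number being sampled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the state an invalidation would update. */
struct notifier_model {
	unsigned long seq;  /* bumped when an invalidation completes */
	long count;         /* non-zero while an invalidation is in progress (assumed) */
};

/* Snapshot taken before the possibly slow page lookup. */
static unsigned long snapshot_seq(const struct notifier_model *m)
{
	return m->seq;
}

/* Checked later under the lock: true means the looked-up page may be stale. */
static bool must_retry(const struct notifier_model *m, unsigned long snap)
{
	if (m->count)
		return true;
	return m->seq != snap;
}
```

When must_retry() reports true, the caller drops what it looked up and lets the fault happen again, which is what the out_unlock path above does by releasing the pfn and returning 0.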


2007-01-06 08:36:40 +08:00
static void mmu_free_roots(struct kvm_vcpu *vcpu)
{
	int i;
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp;
2007-01-06 08:36:40 +08:00

2007-12-13 23:50:52 +08:00
	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
2007-06-05 17:17:03 +08:00
		return;
2007-12-21 08:18:26 +08:00
	spin_lock(&vcpu->kvm->mmu_lock);
2007-12-13 23:50:52 +08:00
	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
		hpa_t root = vcpu->arch.mmu.root_hpa;
2007-01-06 08:36:40 +08:00

2007-11-21 21:28:32 +08:00
		sp = page_header(root);
		--sp->root_count;
2008-02-21 03:47:24 +08:00
		if (!sp->root_count && sp->role.invalid)
			kvm_mmu_zap_page(vcpu->kvm, sp);
2007-12-13 23:50:52 +08:00
		vcpu->arch.mmu.root_hpa = INVALID_PAGE;
2007-12-21 08:18:26 +08:00
		spin_unlock(&vcpu->kvm->mmu_lock);
2007-01-06 08:36:40 +08:00
		return;
	}
	for (i = 0; i < 4; ++i) {
2007-12-13 23:50:52 +08:00
		hpa_t root = vcpu->arch.mmu.pae_root[i];
2007-01-06 08:36:40 +08:00

2007-04-12 22:35:58 +08:00
		if (root) {
			root &= PT64_BASE_ADDR_MASK;
2007-11-21 21:28:32 +08:00
			sp = page_header(root);
			--sp->root_count;
2008-02-21 03:47:24 +08:00
			if (!sp->root_count && sp->role.invalid)
				kvm_mmu_zap_page(vcpu->kvm, sp);
2007-04-12 22:35:58 +08:00
		}
2007-12-13 23:50:52 +08:00
		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
2007-01-06 08:36:40 +08:00
	}
2007-12-21 08:18:26 +08:00
	spin_unlock(&vcpu->kvm->mmu_lock);
2007-12-13 23:50:52 +08:00
	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
2007-01-06 08:36:40 +08:00
}
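
mmu_free_roots() above drops a root reference and zaps the page only when the count reaches zero and the page has already been marked invalid. A minimal userspace sketch of that deferred-teardown rule follows. struct sketch_page and put_root() are hypothetical stand-ins for struct kvm_mmu_page and the inline logic, and the zapped flag merely records where kvm_mmu_zap_page() would be called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shadow page; only the fields the sketch needs. */
struct sketch_page {
	int root_count;  /* number of root pointers still referencing the page */
	bool invalid;    /* page was marked invalid while still in use as a root */
	bool zapped;     /* set where the real code would zap the page */
};

/* Drop one root reference; tear down only if the page is both unused and invalid. */
static void put_root(struct sketch_page *sp)
{
	assert(sp->root_count > 0);
	--sp->root_count;
	if (!sp->root_count && sp->invalid)
		sp->zapped = true;
}
```

A page that is still valid survives its last put_root() call, so it can be found again later; only a page that was invalidated while in use is torn down at that point.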

2009-05-13 05:55:45 +08:00
static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
{
	int ret = 0;

	if (!kvm_is_visible_gfn(vcpu->kvm, root_gfn)) {
		set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
		ret = 1;
	}

	return ret;
}

static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
2007-01-06 08:36:40 +08:00
{
	int i;
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00
	gfn_t root_gfn;
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp;
2009-01-11 19:02:10 +08:00
	int direct = 0;
2009-06-01 03:58:47 +08:00
	u64 pdptr;
2007-01-06 08:36:51 +08:00

2007-12-13 23:50:52 +08:00
	root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
2007-01-06 08:36:40 +08:00

2007-12-13 23:50:52 +08:00
	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
		hpa_t root = vcpu->arch.mmu.root_hpa;
2007-01-06 08:36:40 +08:00

		ASSERT(!VALID_PAGE(root));
2009-05-13 05:55:45 +08:00
		if (mmu_check_root(vcpu, root_gfn))
			return 1;
2010-04-27 08:00:05 +08:00
		if (tdp_enabled) {
			direct = 1;
			root_gfn = 0;
		}
2010-05-04 17:58:32 +08:00
		spin_lock(&vcpu->kvm->mmu_lock);
2010-05-13 08:00:35 +08:00
		kvm_mmu_free_some_pages(vcpu);
2007-11-21 21:28:32 +08:00
		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
2009-01-11 19:02:10 +08:00
				      PT64_ROOT_LEVEL, direct,
2008-02-07 20:47:44 +08:00
				      ACC_ALL, NULL);
2007-11-21 21:28:32 +08:00
		root = __pa(sp->spt);
		++sp->root_count;
2010-05-04 17:58:32 +08:00
		spin_unlock(&vcpu->kvm->mmu_lock);
2007-12-13 23:50:52 +08:00
		vcpu->arch.mmu.root_hpa = root;
2009-05-13 05:55:45 +08:00
		return 0;
2007-01-06 08:36:40 +08:00
	}
2009-01-11 19:02:10 +08:00
	direct = !is_paging(vcpu);
2007-01-06 08:36:40 +08:00
	for (i = 0; i < 4; ++i) {
2007-12-13 23:50:52 +08:00
		hpa_t root = vcpu->arch.mmu.pae_root[i];
2007-01-06 08:36:40 +08:00

		ASSERT(!VALID_PAGE(root));
2007-12-13 23:50:52 +08:00
		if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
2009-06-01 03:58:47 +08:00
			pdptr = kvm_pdptr_read(vcpu, i);
2009-06-10 19:12:05 +08:00
			if (!is_present_gpte(pdptr)) {
2007-12-13 23:50:52 +08:00
				vcpu->arch.mmu.pae_root[i] = 0;
2007-04-12 22:35:58 +08:00
				continue;
			}
2009-06-01 03:58:47 +08:00
			root_gfn = pdptr >> PAGE_SHIFT;
2007-12-13 23:50:52 +08:00
		} else if (vcpu->arch.mmu.root_level == 0)
2007-01-06 08:36:43 +08:00
			root_gfn = 0;
2009-05-13 05:55:45 +08:00
		if (mmu_check_root(vcpu, root_gfn))
			return 1;
2010-04-27 08:00:05 +08:00
		if (tdp_enabled) {
			direct = 1;
			root_gfn = i << 30;
		}
2010-05-04 17:58:32 +08:00
		spin_lock(&vcpu->kvm->mmu_lock);
2010-05-13 08:00:35 +08:00
		kvm_mmu_free_some_pages(vcpu);
2007-11-21 21:28:32 +08:00
		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
2009-01-11 19:02:10 +08:00
				      PT32_ROOT_LEVEL, direct,
2008-02-27 04:12:10 +08:00
				      ACC_ALL, NULL);
2007-11-21 21:28:32 +08:00
		root = __pa(sp->spt);
		++sp->root_count;
2010-05-04 17:58:32 +08:00
		spin_unlock(&vcpu->kvm->mmu_lock);

2007-12-13 23:50:52 +08:00
		vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
2007-01-06 08:36:40 +08:00
	}
2007-12-13 23:50:52 +08:00
	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
2009-05-13 05:55:45 +08:00
	return 0;
2007-01-06 08:36:40 +08:00
}
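
mmu_alloc_roots() above passes i << 30 as the guest address for each of the four PAE roots, and the "Shadow page table caching" commit message quoted earlier describes a "quadrant" that lets up to four shadow pages exist for one 32-bit guest page. The sketch below shows one way such a quadrant could be derived from the address. It is a guess built only from the sizes given in that message (a 4MB guest table against a 2MB shadow table, a 4GB guest directory against a 1GB shadow directory); quadrant_for() and the SKETCH_ constants are hypothetical and are not taken from kvm_mmu_get_page().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12
#define SKETCH_SHADOW_BITS   9  /* index bits per level in a shadow table (assumed) */
#define SKETCH_GUEST32_BITS 10  /* index bits per level in a 32-bit guest table (assumed) */

/* Which shadow-sized piece of a 32-bit guest table at `level` contains `gaddr`. */
static unsigned quadrant_for(uint64_t gaddr, int level)
{
	unsigned extra_bits;

	assert(level == 1 || level == 2);
	/* One extra guest index bit per level must be absorbed by the quadrant. */
	extra_bits = (SKETCH_GUEST32_BITS - SKETCH_SHADOW_BITS) * level;
	return (unsigned)(gaddr >> (SKETCH_PAGE_SHIFT + SKETCH_SHADOW_BITS * level))
		& ((1u << extra_bits) - 1);
}
```

With these assumptions, the address 3 << 30 at level 2 falls in quadrant 3, which matches the four-iteration root loop above, while at level 1 only bit 21 of the address selects between the two halves of a guest page table.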

2008-09-24 00:18:34 +08:00
static void mmu_sync_roots(struct kvm_vcpu *vcpu)
{
	int i;
	struct kvm_mmu_page *sp;

	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
		return;
	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
		hpa_t root = vcpu->arch.mmu.root_hpa;
		sp = page_header(root);
		mmu_sync_children(vcpu, sp);
		return;
	}
	for (i = 0; i < 4; ++i) {
		hpa_t root = vcpu->arch.mmu.pae_root[i];

2009-05-13 05:55:45 +08:00
		if (root && VALID_PAGE(root)) {
2008-09-24 00:18:34 +08:00
			root &= PT64_BASE_ADDR_MASK;
			sp = page_header(root);
			mmu_sync_children(vcpu, sp);
		}
	}
}

void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
{
	spin_lock(&vcpu->kvm->mmu_lock);
	mmu_sync_roots(vcpu);
2008-12-02 08:32:04 +08:00
	spin_unlock(&vcpu->kvm->mmu_lock);
2008-09-24 00:18:34 +08:00
}

2010-02-10 20:21:32 +08:00
static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
				  u32 access, u32 *error)
2006-12-10 18:21:36 +08:00
{
2010-02-10 20:21:32 +08:00
	if (error)
		*error = 0;
2006-12-10 18:21:36 +08:00
	return vaddr;
}

static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
2007-11-21 20:54:16 +08:00
				u32 error_code)
2006-12-10 18:21:36 +08:00
{
2007-12-10 00:43:00 +08:00
	gfn_t gfn;
2007-01-06 08:36:54 +08:00
	int r;
2006-12-10 18:21:36 +08:00

2008-03-04 04:59:56 +08:00
	pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
2007-01-06 08:36:54 +08:00
	r = mmu_topup_memory_caches(vcpu);
	if (r)
		return r;
2007-01-06 08:36:53 +08:00

2006-12-10 18:21:36 +08:00
	ASSERT(vcpu);
2007-12-13 23:50:52 +08:00
	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
2006-12-10 18:21:36 +08:00

2007-12-10 00:43:00 +08:00
	gfn = gva >> PAGE_SHIFT;
2006-12-10 18:21:36 +08:00

2007-12-10 00:43:00 +08:00
	return nonpaging_map(vcpu, gva & PAGE_MASK,
			     error_code & PFERR_WRITE_MASK, gfn);
2006-12-10 18:21:36 +08:00
}

2008-02-07 20:47:44 +08:00
static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
				u32 error_code)
{
2008-04-03 03:46:56 +08:00
	pfn_t pfn;
2008-02-07 20:47:44 +08:00
	int r;
2009-07-27 22:30:44 +08:00
	int level;
2008-02-23 22:44:30 +08:00
	gfn_t gfn = gpa >> PAGE_SHIFT;
2008-07-25 22:24:52 +08:00
	unsigned long mmu_seq;
2008-02-07 20:47:44 +08:00

	ASSERT(vcpu);
	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));

	r = mmu_topup_memory_caches(vcpu);
	if (r)
		return r;

2009-07-27 22:30:44 +08:00
	level = mapping_level(vcpu, gfn);

	gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);

2008-07-25 22:24:52 +08:00
	mmu_seq = vcpu->kvm->mmu_notifier_seq;
2008-09-17 07:54:47 +08:00
	smp_rmb();
2008-04-03 03:46:56 +08:00
	pfn = gfn_to_pfn(vcpu->kvm, gfn);
2010-05-31 14:28:19 +08:00
	if (is_error_pfn(pfn))
		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
2008-02-07 20:47:44 +08:00
	spin_lock(&vcpu->kvm->mmu_lock);
2008-07-25 22:24:52 +08:00
	if (mmu_notifier_retry(vcpu, mmu_seq))
		goto out_unlock;
2008-02-07 20:47:44 +08:00
	kvm_mmu_free_some_pages(vcpu);
	r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK,
2009-07-27 22:30:44 +08:00
			 level, gfn, pfn);
2008-02-07 20:47:44 +08:00
	spin_unlock(&vcpu->kvm->mmu_lock);

	return r;
2008-07-25 22:24:52 +08:00

out_unlock:
	spin_unlock(&vcpu->kvm->mmu_lock);
	kvm_release_pfn_clean(pfn);
	return 0;
2008-02-07 20:47:44 +08:00
}
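The masking step in tdp_page_fault() rounds the faulting frame number down to the first frame of the large page selected by mapping_level(). The following is only a standalone sketch of that arithmetic, assuming 4 KiB base pages and 9 index bits per paging level; `pages_per_hpage` and `align_gfn` are hypothetical stand-ins, not the kernel's KVM_PAGES_PER_HPAGE() definition.

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical stand-in for KVM_PAGES_PER_HPAGE(): assumes 9 index bits per
 * level, so level 1 covers 1 frame (4 KiB) and level 2 covers 512 frames
 * (2 MiB). */
static uint64_t pages_per_hpage(int level)
{
	return 1ULL << ((level - 1) * 9);
}

/* Round gfn down to the first frame of the page that contains it at the
 * given mapping level; same mask expression as in tdp_page_fault(). */
static gfn_t align_gfn(gfn_t gfn, int level)
{
	return gfn & ~(pages_per_hpage(level) - 1);
}
```

Under these assumptions, frame 0x12345 stays 0x12345 at level 1 and becomes 0x12200 at level 2, since the low 9 bits (0x145) are cleared.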
2006-12-10 18:21:36 +08:00
static void nonpaging_free(struct kvm_vcpu *vcpu)
{
2007-01-06 08:36:40 +08:00
	mmu_free_roots(vcpu);
2006-12-10 18:21:36 +08:00
}

static int nonpaging_init_context(struct kvm_vcpu *vcpu)
{
2007-12-13 23:50:52 +08:00
	struct kvm_mmu *context = &vcpu->arch.mmu;
2006-12-10 18:21:36 +08:00

	context->new_cr3 = nonpaging_new_cr3;
	context->page_fault = nonpaging_page_fault;
	context->gva_to_gpa = nonpaging_gva_to_gpa;
	context->free = nonpaging_free;
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
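As a rough illustration of the trick described in this commit message, the sketch below models the two initial shadow pte states and the hardware filter on the page-fault error code. It is not the kernel's code: `SHADOW_RSVD_BIT` (bit 51), `initial_shadow_pte`, `fault_error_code`, and `fault_traps_to_kvm` are hypothetical names chosen for the example, and the message does not say which reserved bit is actually used.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT     (1ULL << 0)
#define PFERR_PRESENT   (1u << 0)   /* fault hit a present entry */
#define PFERR_RSVD      (1u << 3)   /* fault caused by a reserved bit */

/* Assumption: bit 51 stands in for "one of the reserved bits". */
#define SHADOW_RSVD_BIT (1ULL << 51)

/* Shadow pte installed before kvm has built a real translation. */
static uint64_t initial_shadow_pte(uint64_t guest_pte)
{
	if (!(guest_pte & PTE_PRESENT))
		return 0;                       /* not present: guest's own fault */
	return PTE_PRESENT | SHADOW_RSVD_BIT;   /* present + reserved: kvm's fault */
}

/* Error code reported when an access walks into one of the two initial
 * states above (other pte states are not modelled here). */
static uint32_t fault_error_code(uint64_t shadow_pte)
{
	if (!(shadow_pte & PTE_PRESENT))
		return 0;
	return PFERR_PRESENT | PFERR_RSVD;
}

/* Model of the filter: only faults with the present bit set trap to kvm. */
static bool fault_traps_to_kvm(uint32_t error_code)
{
	return (error_code & PFERR_PRESENT) != 0;
}
```

With this encoding, a fault on a page the guest itself has not mapped carries a clear present bit and is delivered straight to the guest, while a fault on a page the guest has mapped but kvm has not yet shadowed carries a set present bit and still reaches kvm.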
2007-09-17 00:58:32 +08:00
	context->prefetch_page = nonpaging_prefetch_page;
2008-09-24 00:18:33 +08:00
	context->sync_page = nonpaging_sync_page;
2008-09-24 00:18:35 +08:00
	context->invlpg = nonpaging_invlpg;
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
  * we can cache real mode, 32-bit mode, pae, and long mode page
    tables simultaneously. this is useful for smp bootup.
- the guest page table level
  * some kernels use a page as both a page table and a page directory. this
    allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
  * 32-bit mode page tables span 4MB, whereas a shadow page table spans
    2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
    page directory spans 1GB. the quadrant allows caching up to 4 shadow page
    tables for one guest page in one level.
- a "metaphysical" bit
  * for real mode, and for pse pages, there is no guest page table, so set
    the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
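The cache key listed in this commit message can be pictured as a small record that is compared field by field. The sketch below is an assumption-laden illustration, not the kernel's data structure: `struct shadow_key`, `shadow_key_equal`, `shadow_key_hash`, the bucket count, and hashing on the frame number alone are all hypothetical. The quadrant computation follows from the sizes quoted in the message (a 32-bit guest table covers twice the range of a shadow table at level 1, and four times at level 2).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key for one cached shadow page. */
struct shadow_key {
	uint64_t gfn;       /* guest page table frame number */
	unsigned glevels;   /* paging levels in the guest (0 for real mode) */
	unsigned level;     /* level at which the guest page is used */
	unsigned quadrant;  /* which part of a 32-bit guest table is shadowed */
	bool metaphysical;  /* no guest page table behind this shadow page */
};

static bool shadow_key_equal(const struct shadow_key *a,
			     const struct shadow_key *b)
{
	return a->gfn == b->gfn && a->glevels == b->glevels &&
	       a->level == b->level && a->quadrant == b->quadrant &&
	       a->metaphysical == b->metaphysical;
}

/* Assumption: bucket by frame number only; the other fields are checked
 * by shadow_key_equal() while walking the bucket. */
#define SHADOW_HASH_BUCKETS 256u

static unsigned shadow_key_hash(const struct shadow_key *k)
{
	return (unsigned)(k->gfn % SHADOW_HASH_BUCKETS);
}

/* Quadrant for a 32-bit (non-pae) guest address, valid for level 1 or 2.
 * Level 1: shadow table spans 2MB of a 4MB guest table -> 2 quadrants.
 * Level 2: shadow directory spans 1GB of a 4GB guest directory -> 4. */
static unsigned quadrant_of(uint32_t gaddr, unsigned level)
{
	unsigned shift = 12 + 9 * level;   /* range covered by one shadow page */
	unsigned mask = (1u << level) - 1; /* one extra guest index bit per level */
	return (gaddr >> shift) & mask;
}
```

Two keys that differ only in `level` land in the same bucket here but compare unequal, which is how a guest page used as both a page table and a page directory can have one shadow page per level.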
2007-01-06 08:36:43 +08:00
	context->root_level = 0;
2006-12-10 18:21:36 +08:00
	context->shadow_root_level = PT32E_ROOT_LEVEL;
2007-06-04 20:58:30 +08:00
	context->root_hpa = INVALID_PAGE;
2006-12-10 18:21:36 +08:00
	return 0;
}

2007-11-21 08:57:59 +08:00
void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
2006-12-10 18:21:36 +08:00
{
2007-04-19 22:27:43 +08:00
	++vcpu->stat.tlb_flush;
2007-09-09 20:41:59 +08:00
	kvm_x86_ops->tlb_flush(vcpu);
2006-12-10 18:21:36 +08:00
}

static void paging_new_cr3(struct kvm_vcpu *vcpu)
{
2008-03-04 04:59:56 +08:00
	pgprintk("%s: cr3 %lx\n", __func__, vcpu->arch.cr3);
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
  * we can cache real mode, 32-bit mode, pae, and long mode page
    tables simultaneously. This is useful for smp bootup.
- the guest page table level
  * some kernels use a page as both a page table and a page directory. This
    allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
  * a 32-bit mode page table spans 4MB, whereas a shadow page table spans
    2MB. Similarly, a 32-bit page directory spans 4GB, while a shadow
    page directory spans 1GB. The quadrant allows caching up to 4 shadow page
    tables for one guest page in one level.
- a "metaphysical" bit
  * for real mode, and for pse pages, there is no guest page table, so set
    the bit to avoid write protecting the page.
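The key described above can be sketched roughly as follows. This is an illustration only, not the structure the patch defines: `struct shadow_key`, its field names, `shadow_key_equal()` and `shadow_quadrant()` are all hypothetical. The quadrant arithmetic follows directly from the 4MB/2MB and 4GB/1GB spans quoted in the list and assumes a 32-bit non-pae guest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache key; field names are illustrative only. */
struct shadow_key {
	uint64_t gfn;          /* guest page table frame number */
	uint8_t  glevels;      /* paging levels in the guest (0 = real mode) */
	uint8_t  level;        /* level at which the guest page is used */
	uint8_t  quadrant;     /* slice of the guest page this shadow covers */
	uint8_t  metaphysical; /* 1 when no guest page table backs the shadow */
};

/* Two shadow pages are interchangeable only if every key field matches. */
static int shadow_key_equal(const struct shadow_key *a,
			    const struct shadow_key *b)
{
	return a->gfn == b->gfn && a->glevels == b->glevels &&
	       a->level == b->level && a->quadrant == b->quadrant &&
	       a->metaphysical == b->metaphysical;
}

/*
 * Quadrant for a 32-bit (non-pae) guest.
 * Level 1: guest table covers 4MB, shadow covers 2MB -> address bit 21.
 * Level 2: guest directory covers 4GB, shadow covers 1GB -> bits 31:30.
 */
static unsigned int shadow_quadrant(uint32_t gva, int level)
{
	assert(level == 1 || level == 2);
	if (level == 1)
		return (gva >> 21) & 1;
	return (gva >> 30) & 3;
}
```

Because the level is part of the key, a guest page used both as a page table and as a page directory yields two distinct keys, so both shadow pages can coexist in the hashtable.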
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00
	mmu_free_roots(vcpu);
2006-12-10 18:21:36 +08:00
}

static void inject_page_fault(struct kvm_vcpu *vcpu,
			      u64 addr,
			      u32 err_code)
{
2007-11-25 20:04:58 +08:00
	kvm_inject_page_fault(vcpu, addr, err_code);
2006-12-10 18:21:36 +08:00
}

static void paging_free(struct kvm_vcpu *vcpu)
{
	nonpaging_free(vcpu);
}

2009-03-30 16:21:08 +08:00
static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
{
	int bit7;

	bit7 = (gpte >> 7) & 1;
	return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
}
2006-12-10 18:21:36 +08:00
#define PTTYPE 64
#include "paging_tmpl.h"
#undef PTTYPE

#define PTTYPE 32
#include "paging_tmpl.h"
#undef PTTYPE
2009-03-30 16:21:08 +08:00
static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
{
	struct kvm_mmu *context = &vcpu->arch.mmu;
	int maxphyaddr = cpuid_maxphyaddr(vcpu);
	u64 exb_bit_rsvd = 0;

	if (!is_nx(vcpu))
		exb_bit_rsvd = rsvd_bits(63, 63);
	switch (level) {
	case PT32_ROOT_LEVEL:
		/* no rsvd bits for 2 level 4K page table entries */
		context->rsvd_bits_mask[0][1] = 0;
		context->rsvd_bits_mask[0][0] = 0;
2010-03-19 17:58:53 +08:00
		context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];

		if (!is_pse(vcpu)) {
			context->rsvd_bits_mask[1][1] = 0;
			break;
		}

2009-03-30 16:21:08 +08:00
		if (is_cpuid_PSE36())
			/* 36bits PSE 4MB page */
			context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
		else
			/* 32 bits PSE 4MB page */
			context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
		break;
	case PT32E_ROOT_LEVEL:
2009-03-31 23:03:45 +08:00
		context->rsvd_bits_mask[0][2] =
			rsvd_bits(maxphyaddr, 63) |
			rsvd_bits(7, 8) | rsvd_bits(1, 2);	/* PDPTE */
2009-03-30 16:21:08 +08:00
		context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
2009-04-02 10:28:37 +08:00
			rsvd_bits(maxphyaddr, 62);	/* PDE */
2009-03-30 16:21:08 +08:00
		context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 62);	/* PTE */
		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 62) |
			rsvd_bits(13, 20);		/* large page */
2010-03-19 17:58:53 +08:00
		context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
2009-03-30 16:21:08 +08:00
		break;
	case PT64_ROOT_LEVEL:
		context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
		context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
		context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
2009-04-02 10:28:37 +08:00
			rsvd_bits(maxphyaddr, 51);
2009-03-30 16:21:08 +08:00
		context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 51);
		context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
2009-07-27 22:30:45 +08:00
		context->rsvd_bits_mask[1][2] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 51) |
			rsvd_bits(13, 29);
2009-03-30 16:21:08 +08:00
		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
2009-04-02 10:28:37 +08:00
			rsvd_bits(maxphyaddr, 51) |
			rsvd_bits(13, 20);		/* large page */
2010-03-19 17:58:53 +08:00
		context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
2009-03-30 16:21:08 +08:00
		break;
	}
}
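
The mask table filled in above is indexed by bit 7 of the guest pte and by level. A minimal standalone sketch of how such a table can be built and consulted follows. It assumes rsvd_bits(s, e) means "bits s through e inclusive", which is how the calls above read; the definition of rsvd_bits() is not part of this listing, so the body here is a guess at equivalent behaviour, and gpte_has_rsvd_bits() is a hypothetical stand-in for is_rsvd_bits_set() that takes the table directly instead of a vcpu.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask with bits s..e (inclusive) set. An empty range (s > e), as
 * happens for rsvd_bits(maxphyaddr, 51) when maxphyaddr is 52,
 * yields 0. Written to avoid shifting a 64-bit value by 64.
 */
static uint64_t rsvd_bits(int s, int e)
{
	if (s > e)
		return 0;
	assert(s >= 0 && e <= 63);
	return (~0ULL >> (63 - (e - s))) << s;
}

/*
 * Hypothetical stand-in for is_rsvd_bits_set(): row 0 of the table
 * holds masks for entries with bit 7 clear, row 1 for entries with
 * bit 7 set; the column is level - 1.
 */
static int gpte_has_rsvd_bits(const uint64_t mask[2][4], uint64_t gpte,
			      int level)
{
	int bit7 = (gpte >> 7) & 1;

	assert(level >= 1 && level <= 4);
	return (gpte & mask[bit7][level - 1]) != 0;
}
```

With mask[1][1] set to rsvd_bits(13, 20), a level-2 entry that has bit 7 and bit 13 set is reported as having reserved bits, while the same entry with bit 7 clear is checked against mask[0][1] instead.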

2007-01-06 08:36:40 +08:00
static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
2006-12-10 18:21:36 +08:00
{
2007-12-13 23:50:52 +08:00
	struct kvm_mmu *context = &vcpu->arch.mmu;
2006-12-10 18:21:36 +08:00
	ASSERT(is_pae(vcpu));
	context->new_cr3 = paging_new_cr3;
	context->page_fault = paging64_page_fault;
	context->gva_to_gpa = paging64_gva_to_gpa;
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
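The trick described in this commit message can be sketched as a small, self-contained model. This is only an illustration under stated assumptions, not the kvm implementation: the function names, the `SHADOW_RSVD_TRAP` bit position, and the simplified error codes are hypothetical. The filter rule itself (with page faults enabled in the exception bitmap, a fault exits to the host when `(error_code & mask) == match`) is how the VMX page-fault error-code mask/match fields behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits. */
#define PFERR_PRESENT (1u << 0)
#define PFERR_RSVD    (1u << 3)

/* Hypothetical pte encodings used only by this sketch. */
#define PTE_PRESENT      (1ull << 0)
#define SHADOW_RSVD_TRAP (1ull << 51) /* stand-in for "one of the reserved bits" */

/* Pick the initial shadow pte for a guest pte that kvm has not mapped yet. */
uint64_t initial_shadow_pte(uint64_t guest_pte)
{
	if (!(guest_pte & PTE_PRESENT))
		return 0; /* guest pte not present: shadow pte not present */
	return PTE_PRESENT | SHADOW_RSVD_TRAP; /* present, but poisoned */
}

/* Simplified error code an access through this shadow pte would raise. */
uint32_t fault_error_code(uint64_t shadow_pte)
{
	if (!(shadow_pte & PTE_PRESENT))
		return 0; /* P bit clear: ordinary not-present fault */
	return PFERR_PRESENT | PFERR_RSVD; /* reserved-bit violation */
}

/* VMX-style filter: the fault exits to the host only on a mask/match hit. */
bool fault_exits_to_host(uint32_t error_code, uint32_t mask, uint32_t match)
{
	return (error_code & mask) == match;
}

/* Usage: with mask == match == PFERR_PRESENT, only "present" faults trap. */
void bypass_demo(void)
{
	uint32_t not_present = fault_error_code(initial_shadow_pte(0));
	uint32_t poisoned = fault_error_code(initial_shadow_pte(PTE_PRESENT));

	assert(!fault_exits_to_host(not_present, PFERR_PRESENT, PFERR_PRESENT));
	assert(fault_exits_to_host(poisoned, PFERR_PRESENT, PFERR_PRESENT));
}
```

In this model a not-present guest pte produces a fault that never leaves the guest, while a present guest pte still traps on first access so the host can install the real shadow pte.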
	context->prefetch_page = paging64_prefetch_page;
2008-09-24 00:18:33 +08:00
	context->sync_page = paging64_sync_page;
2008-09-24 00:18:35 +08:00
	context->invlpg = paging64_invlpg;
2006-12-10 18:21:36 +08:00
	context->free = paging_free;
2007-01-06 08:36:40 +08:00
	context->root_level = level;
	context->shadow_root_level = level;
2007-06-04 20:58:30 +08:00
	context->root_hpa = INVALID_PAGE;
2006-12-10 18:21:36 +08:00
	return 0;
}

2007-01-06 08:36:40 +08:00
static int paging64_init_context(struct kvm_vcpu *vcpu)
{
2009-03-30 16:21:08 +08:00
	reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
2007-01-06 08:36:40 +08:00
	return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
}
2006-12-10 18:21:36 +08:00
static int paging32_init_context(struct kvm_vcpu *vcpu)
{
2007-12-13 23:50:52 +08:00
	struct kvm_mmu *context = &vcpu->arch.mmu;
2006-12-10 18:21:36 +08:00
2009-03-30 16:21:08 +08:00
	reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
2006-12-10 18:21:36 +08:00
	context->new_cr3 = paging_new_cr3;
	context->page_fault = paging32_page_fault;
	context->gva_to_gpa = paging32_gva_to_gpa;
	context->free = paging_free;
2007-09-17 00:58:32 +08:00
	context->prefetch_page = paging32_prefetch_page;
2008-09-24 00:18:33 +08:00
	context->sync_page = paging32_sync_page;
2008-09-24 00:18:35 +08:00
	context->invlpg = paging32_invlpg;
2006-12-10 18:21:36 +08:00
	context->root_level = PT32_ROOT_LEVEL;
	context->shadow_root_level = PT32E_ROOT_LEVEL;
2007-06-04 20:58:30 +08:00
	context->root_hpa = INVALID_PAGE;
2006-12-10 18:21:36 +08:00
	return 0;
}

static int paging32E_init_context(struct kvm_vcpu *vcpu)
{
2009-03-30 16:21:08 +08:00
	reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
2007-01-06 08:36:40 +08:00
	return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL);
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
2008-02-07 20:47:44 +08:00
|
|
|
static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
struct kvm_mmu *context = &vcpu->arch.mmu;
|
|
|
|
|
|
|
|
context->new_cr3 = nonpaging_new_cr3;
|
|
|
|
context->page_fault = tdp_page_fault;
|
|
|
|
context->free = nonpaging_free;
|
|
|
|
context->prefetch_page = nonpaging_prefetch_page;
|
2008-09-24 00:18:33 +08:00
|
|
|
context->sync_page = nonpaging_sync_page;
|
2008-09-24 00:18:35 +08:00
|
|
|
context->invlpg = nonpaging_invlpg;
|
2008-04-25 10:20:22 +08:00
|
|
|
context->shadow_root_level = kvm_x86_ops->get_tdp_level();
|
2008-02-07 20:47:44 +08:00
|
|
|
context->root_hpa = INVALID_PAGE;
|
|
|
|
|
|
|
|
if (!is_paging(vcpu)) {
|
|
|
|
context->gva_to_gpa = nonpaging_gva_to_gpa;
|
|
|
|
context->root_level = 0;
|
|
|
|
} else if (is_long_mode(vcpu)) {
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
|
2008-02-07 20:47:44 +08:00
|
|
|
context->gva_to_gpa = paging64_gva_to_gpa;
|
|
|
|
context->root_level = PT64_ROOT_LEVEL;
|
|
|
|
} else if (is_pae(vcpu)) {
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
|
2008-02-07 20:47:44 +08:00
|
|
|
context->gva_to_gpa = paging64_gva_to_gpa;
|
|
|
|
context->root_level = PT32E_ROOT_LEVEL;
|
|
|
|
} else {
|
2009-03-30 16:21:08 +08:00
|
|
|
reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
|
2008-02-07 20:47:44 +08:00
|
|
|
context->gva_to_gpa = paging32_gva_to_gpa;
|
|
|
|
context->root_level = PT32_ROOT_LEVEL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
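The mode dispatch in init_kvm_tdp_mmu() above can be sketched in isolation. This is a minimal standalone illustration, not the kernel's code: struct guest_mode and guest_root_level() are hypothetical names, and the three flags stand in for what is_paging(), is_long_mode() and is_pae() report about the vcpu. The root-level constants carry the values the kernel's MMU code defines for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Root levels, with the same values the kernel's MMU code uses. */
#define PT64_ROOT_LEVEL  4
#define PT32E_ROOT_LEVEL 3
#define PT32_ROOT_LEVEL  2

/* Hypothetical stand-in for the guest state the real helpers inspect. */
struct guest_mode {
    bool paging;     /* guest has paging enabled */
    bool long_mode;  /* guest is in 64-bit long mode */
    bool pae;        /* guest has PAE enabled */
};

/*
 * Pick the guest page-table root level in the same order as the
 * if/else chain above: no paging, then long mode, then PAE, then
 * plain 32-bit paging. Returns 0 when paging is off.
 */
static int guest_root_level(const struct guest_mode *m)
{
    assert(m != NULL);
    if (!m->paging)
        return 0;
    if (m->long_mode)
        return PT64_ROOT_LEVEL;
    if (m->pae)
        return PT32E_ROOT_LEVEL;
    return PT32_ROOT_LEVEL;
}
```

The order of the checks matters: a long-mode guest also has PAE enabled, so testing for long mode first is what keeps it from being classified as a 3-level PAE guest.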
|
|
|
|
|
|
|
|
static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
|
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2008-12-22 01:20:09 +08:00
|
|
|
int r;
|
|
|
|
|
2006-12-10 18:21:36 +08:00
|
|
|
ASSERT(vcpu);
|
2007-12-13 23:50:52 +08:00
|
|
|
ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
|
2006-12-10 18:21:36 +08:00
|
|
|
|
|
|
|
if (!is_paging(vcpu))
|
2008-12-22 01:20:09 +08:00
|
|
|
r = nonpaging_init_context(vcpu);
|
2006-12-30 08:49:37 +08:00
|
|
|
else if (is_long_mode(vcpu))
|
2008-12-22 01:20:09 +08:00
|
|
|
r = paging64_init_context(vcpu);
|
2006-12-10 18:21:36 +08:00
|
|
|
else if (is_pae(vcpu))
|
2008-12-22 01:20:09 +08:00
|
|
|
r = paging32E_init_context(vcpu);
|
2006-12-10 18:21:36 +08:00
|
|
|
else
|
2008-12-22 01:20:09 +08:00
|
|
|
r = paging32_init_context(vcpu);
|
|
|
|
|
2010-04-15 00:20:03 +08:00
|
|
|
vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
|
2010-05-12 16:48:18 +08:00
|
|
|
vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
|
2008-12-22 01:20:09 +08:00
|
|
|
|
|
|
|
return r;
|
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
2008-02-07 20:47:44 +08:00
|
|
|
static int init_kvm_mmu(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2008-04-03 03:46:56 +08:00
|
|
|
vcpu->arch.update_pte.pfn = bad_pfn;
|
|
|
|
|
2008-02-07 20:47:44 +08:00
|
|
|
if (tdp_enabled)
|
|
|
|
return init_kvm_tdp_mmu(vcpu);
|
|
|
|
else
|
|
|
|
return init_kvm_softmmu(vcpu);
|
|
|
|
}
|
|
|
|
|
2006-12-10 18:21:36 +08:00
|
|
|
static void destroy_kvm_mmu(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
ASSERT(vcpu);
|
2010-05-12 16:40:41 +08:00
|
|
|
if (VALID_PAGE(vcpu->arch.mmu.root_hpa))
|
|
|
|
/* mmu.free() should set root_hpa = INVALID_PAGE */
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.free(vcpu);
|
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
|
2007-06-04 20:58:30 +08:00
|
|
|
{
|
|
|
|
destroy_kvm_mmu(vcpu);
|
|
|
|
return init_kvm_mmu(vcpu);
|
|
|
|
}
|
2007-10-10 14:26:45 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
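The destroy-then-init shape of kvm_mmu_reset_context() can be sketched outside the kernel as follows. This is an illustrative sketch under stated assumptions: struct mmu_ctx, INVALID_ROOT and the ctx_* functions are hypothetical stand-ins for struct kvm_mmu, INVALID_PAGE and the functions above, and init_count exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for INVALID_PAGE. */
#define INVALID_ROOT UINT64_MAX

/* Hypothetical stand-in for struct kvm_mmu: a root plus a free hook. */
struct mmu_ctx {
    uint64_t root;
    void (*free)(struct mmu_ctx *ctx);
    int init_count;  /* illustration only: counts initialisations */
};

/* The free hook is expected to leave the root invalid. */
static void ctx_free(struct mmu_ctx *ctx)
{
    ctx->root = INVALID_ROOT;
}

/* Initialisation requires that no root is currently loaded. */
static int ctx_init(struct mmu_ctx *ctx)
{
    assert(ctx->root == INVALID_ROOT);
    ctx->free = ctx_free;
    ctx->init_count++;
    return 0;
}

/* Tear down only if a root is actually loaded. */
static void ctx_destroy(struct mmu_ctx *ctx)
{
    if (ctx->root != INVALID_ROOT)
        ctx->free(ctx);
}

/* Reset = destroy, then re-initialise for the current guest mode. */
static int ctx_reset(struct mmu_ctx *ctx)
{
    ctx_destroy(ctx);
    return ctx_init(ctx);
}
```

Because the destroy step checks the root before calling the free hook, a reset is safe to repeat: the second call finds the root already invalid, skips the teardown, and simply re-initialises.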
|
2007-06-04 20:58:30 +08:00
|
|
|
|
|
|
|
int kvm_mmu_load(struct kvm_vcpu *vcpu)
|
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2007-01-06 08:36:53 +08:00
|
|
|
int r;
|
|
|
|
|
2007-01-06 08:36:54 +08:00
|
|
|
r = mmu_topup_memory_caches(vcpu);
|
2007-06-04 20:58:30 +08:00
|
|
|
if (r)
|
|
|
|
goto out;
|
2009-05-13 05:55:45 +08:00
|
|
|
r = mmu_alloc_roots(vcpu);
|
2010-05-04 17:58:32 +08:00
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2008-09-24 00:18:34 +08:00
|
|
|
mmu_sync_roots(vcpu);
|
2007-12-21 08:18:26 +08:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2009-05-13 05:55:45 +08:00
|
|
|
if (r)
|
|
|
|
goto out;
|
2009-07-09 17:00:42 +08:00
|
|
|
/* set_cr3() should ensure TLB has been flushed */
|
2007-12-13 23:50:52 +08:00
|
|
|
kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
|
2007-01-06 08:36:53 +08:00
|
|
|
out:
|
|
|
|
return r;
|
2006-12-10 18:21:36 +08:00
|
|
|
}
|
2007-06-04 20:58:30 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_load);
|
|
|
|
|
|
|
|
void kvm_mmu_unload(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
mmu_free_roots(vcpu);
|
|
|
|
}
|
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-05-01 19:16:52 +08:00
|
|
|
static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp,
|
2007-03-08 23:13:32 +08:00
|
|
|
u64 *spte)
|
|
|
|
{
|
|
|
|
u64 pte;
|
|
|
|
struct kvm_mmu_page *child;
|
|
|
|
|
|
|
|
pte = *spte;
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
|
|
|
if (is_shadow_present_pte(pte)) {
|
2009-06-10 23:27:03 +08:00
|
|
|
if (is_last_spte(pte, sp->role.level))
|
2007-09-27 20:11:22 +08:00
|
|
|
rmap_remove(vcpu->kvm, spte);
|
2007-03-08 23:13:32 +08:00
|
|
|
else {
|
|
|
|
child = page_header(pte & PT64_BASE_ADDR_MASK);
|
2007-07-17 18:04:56 +08:00
|
|
|
mmu_page_remove_parent_pte(child, spte);
|
2007-03-08 23:13:32 +08:00
|
|
|
}
|
|
|
|
}
|
2009-06-10 19:24:23 +08:00
|
|
|
__set_spte(spte, shadow_trap_nonpresent_pte);
|
2008-02-23 22:44:30 +08:00
|
|
|
if (is_large_pte(pte))
|
|
|
|
--vcpu->kvm->stat.lpages;
|
2007-03-08 23:13:32 +08:00
|
|
|
}
|
|
|
|
|
2007-05-01 21:53:31 +08:00
|
|
|
static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp,
|
2007-05-01 21:53:31 +08:00
|
|
|
u64 *spte,
|
2008-01-07 17:14:20 +08:00
|
|
|
const void *new)
|
2007-05-01 21:53:31 +08:00
|
|
|
{
|
2008-06-12 07:32:40 +08:00
|
|
|
if (sp->role.level != PT_PAGE_TABLE_LEVEL) {
|
2009-07-27 22:30:46 +08:00
|
|
|
++vcpu->kvm->stat.mmu_pde_zapped;
|
|
|
|
return;
|
2008-06-12 07:32:40 +08:00
|
|
|
}
|
2007-05-01 21:53:31 +08:00
|
|
|
|
2007-11-18 22:37:07 +08:00
|
|
|
++vcpu->kvm->stat.mmu_pte_updated;
|
2010-04-15 00:20:03 +08:00
|
|
|
if (!sp->role.cr4_pae)
|
2008-01-07 17:14:20 +08:00
|
|
|
paging32_update_pte(vcpu, sp, spte, new);
|
2007-05-01 21:53:31 +08:00
|
|
|
else
|
2008-01-07 17:14:20 +08:00
|
|
|
paging64_update_pte(vcpu, sp, spte, new);
|
2007-05-01 21:53:31 +08:00
|
|
|
}
|
|
|
|
|
2007-11-21 08:06:21 +08:00
|
|
|
static bool need_remote_flush(u64 old, u64 new)
|
|
|
|
{
|
|
|
|
if (!is_shadow_present_pte(old))
|
|
|
|
return false;
|
|
|
|
if (!is_shadow_present_pte(new))
|
|
|
|
return true;
|
|
|
|
if ((old ^ new) & PT64_BASE_ADDR_MASK)
|
|
|
|
return true;
|
|
|
|
old ^= PT64_NX_MASK;
|
|
|
|
new ^= PT64_NX_MASK;
|
|
|
|
return (old & ~new & PT64_PERM_MASK) != 0;
|
|
|
|
}
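The decision made by need_remote_flush() can be tried out in isolation. The following is a minimal sketch, not the kernel code: the bit positions are the usual x86-64 page-table layout written out as assumed constants, the names prefixed with PTE_ and sketch_ are hypothetical, and "shadow present" is simplified to the hardware present bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 PTE layout; illustrative constants, not the KVM headers. */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITABLE  (1ULL << 1)
#define PTE_USER      (1ULL << 2)
#define PTE_NX        (1ULL << 63)
#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* physical address, bits 12..51 */
#define PTE_PERM_MASK (PTE_PRESENT | PTE_WRITABLE | PTE_USER | PTE_NX)

/*
 * Return true when replacing 'old' with 'new_pte' could leave another CPU
 * holding a stale translation that grants more than the new entry does.
 */
static bool sketch_need_remote_flush(uint64_t old, uint64_t new_pte)
{
	if (!(old & PTE_PRESENT))
		return false;           /* nothing could have been cached */
	if (!(new_pte & PTE_PRESENT))
		return true;            /* mapping went away */
	if ((old ^ new_pte) & PTE_ADDR_MASK)
		return true;            /* mapping now points elsewhere */

	/* Flip NX so that a set bit means "permission granted" for every bit. */
	old ^= PTE_NX;
	new_pte ^= PTE_NX;

	/* Any permission present before and absent now requires a flush. */
	return (old & ~new_pte & PTE_PERM_MASK) != 0;
}
```

Flipping the NX bit first lets a single mask expression treat execute permission the same way as the write and user bits: a remote flush is needed only when a right is taken away, never when one is added.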
|
|
|
|
|
|
|
|
static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, u64 old, u64 new)
|
|
|
|
{
|
|
|
|
if (need_remote_flush(old, new))
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
|
|
|
else
|
|
|
|
kvm_mmu_flush_tlb(vcpu);
|
|
|
|
}
|
|
|
|
|
2007-09-23 20:10:49 +08:00
|
|
|
static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
u64 *spte = vcpu->arch.last_pte_updated;
|
2007-09-23 20:10:49 +08:00
|
|
|
|
2008-04-25 21:13:50 +08:00
|
|
|
return !!(spte && (*spte & shadow_accessed_mask));
|
2007-09-23 20:10:49 +08:00
|
|
|
}
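last_updated_pte_accessed() feeds the write-flood heuristic in kvm_mmu_pte_write(): repeated guest writes to the same page-table frame, with the most recently updated shadow pte never accessed in between, suggest the page is no longer used as a page table. A minimal standalone sketch of that counter follows; struct pt_write_tracker and pt_write_is_flood() are hypothetical names, and the boolean field stands in for the accessed-bit test.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOOD_THRESHOLD 3 /* mirrors the ">= 3" test in kvm_mmu_pte_write() */

/* Hypothetical per-vcpu state; the kernel keeps these fields in vcpu->arch. */
struct pt_write_tracker {
	uint64_t last_gfn;       /* frame hit by the previous page-table write */
	int count;               /* consecutive writes to last_gfn */
	bool last_pte_accessed;  /* was the last updated shadow pte used since? */
};

/* Record one guest page-table write; return true once it looks like a flood. */
static bool pt_write_is_flood(struct pt_write_tracker *t, uint64_t gfn)
{
	if (gfn == t->last_gfn && !t->last_pte_accessed) {
		++t->count;
		return t->count >= FLOOD_THRESHOLD;
	}

	/* Different frame, or the previous update was used: start over. */
	t->last_gfn = gfn;
	t->count = 1;
	t->last_pte_accessed = false;
	return false;
}
```

With this sketch the third consecutive unused write to one frame reports a flood, while a write to another frame, or an access to the last updated entry, resets the count to one.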
|
|
|
|
|
2007-12-30 18:29:05 +08:00
|
|
|
static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
|
2010-03-15 19:59:53 +08:00
|
|
|
u64 gpte)
|
2007-12-30 18:29:05 +08:00
|
|
|
{
|
|
|
|
gfn_t gfn;
|
2008-04-03 03:46:56 +08:00
|
|
|
pfn_t pfn;
|
2007-12-30 18:29:05 +08:00
|
|
|
|
2009-06-10 19:12:05 +08:00
|
|
|
if (!is_present_gpte(gpte))
|
2007-12-30 18:29:05 +08:00
|
|
|
return;
|
|
|
|
gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
|
2008-02-11 00:04:15 +08:00
|
|
|
|
2008-07-25 22:24:52 +08:00
|
|
|
vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
|
2008-09-17 07:54:47 +08:00
|
|
|
smp_rmb();
|
2008-04-03 03:46:56 +08:00
|
|
|
pfn = gfn_to_pfn(vcpu->kvm, gfn);
|
2008-02-11 00:04:15 +08:00
|
|
|
|
2008-04-03 03:46:56 +08:00
|
|
|
if (is_error_pfn(pfn)) {
|
|
|
|
kvm_release_pfn_clean(pfn);
|
2008-01-24 17:44:11 +08:00
|
|
|
return;
|
|
|
|
}
|
2007-12-30 18:29:05 +08:00
|
|
|
vcpu->arch.update_pte.gfn = gfn;
|
2008-04-03 03:46:56 +08:00
|
|
|
vcpu->arch.update_pte.pfn = pfn;
|
2007-12-30 18:29:05 +08:00
|
|
|
}
|
|
|
|
|
2008-05-15 18:51:35 +08:00
|
|
|
static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
|
|
|
|
{
|
|
|
|
u64 *spte = vcpu->arch.last_pte_updated;
|
|
|
|
|
|
|
|
if (spte
|
|
|
|
&& vcpu->arch.last_pte_gfn == gfn
|
|
|
|
&& shadow_accessed_mask
|
|
|
|
&& !(*spte & shadow_accessed_mask)
|
|
|
|
&& is_shadow_present_pte(*spte))
|
|
|
|
set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
|
|
|
|
}
|
|
|
|
|
2007-05-01 19:16:52 +08:00
|
|
|
void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
|
2008-12-02 08:32:05 +08:00
|
|
|
const u8 *new, int bytes,
|
|
|
|
bool guest_initiated)
|
2007-01-06 08:36:44 +08:00
|
|
|
{
|
2007-01-06 08:36:45 +08:00
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-06 08:36:48 +08:00
|
|
|
struct hlist_node *node, *n;
|
2008-01-07 17:14:20 +08:00
|
|
|
u64 entry, gentry;
|
2007-01-06 08:36:45 +08:00
|
|
|
u64 *spte;
|
|
|
|
unsigned offset = offset_in_page(gpa);
|
2007-01-06 08:36:48 +08:00
|
|
|
unsigned pte_size;
|
2007-01-06 08:36:45 +08:00
|
|
|
unsigned page_offset;
|
2007-01-06 08:36:48 +08:00
|
|
|
unsigned misaligned;
|
KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.
Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so when we look up the page we find up to three additional
aliases for the page. Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.
Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header. All we need is
to ignore shadow pages from the wrong quadrants.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-01 21:44:05 +08:00
|
|
|
unsigned quadrant;
|
2007-01-06 08:36:45 +08:00
|
|
|
int level;
|
2007-01-06 08:36:50 +08:00
|
|
|
int flooded = 0;
|
2007-03-08 23:13:32 +08:00
|
|
|
int npte;
|
2008-01-07 17:14:20 +08:00
|
|
|
int r;
|
2010-03-15 19:59:57 +08:00
|
|
|
int invlpg_counter;
|
2007-01-06 08:36:45 +08:00
|
|
|
|
2008-03-04 04:59:56 +08:00
|
|
|
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
|
2010-03-15 19:59:53 +08:00
|
|
|
|
2010-03-15 19:59:57 +08:00
|
|
|
invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
|
2010-03-15 19:59:53 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Assume that the pte write is on a page table of the same type
|
|
|
|
* as the current vcpu paging mode. This is nearly always true
|
|
|
|
* (might be false while changing modes). Note it is verified later
|
|
|
|
* by update_pte().
|
|
|
|
*/
|
2010-03-15 19:59:57 +08:00
|
|
|
if ((is_pae(vcpu) && bytes == 4) || !new) {
|
2010-03-15 19:59:53 +08:00
|
|
|
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
|
2010-03-15 19:59:57 +08:00
|
|
|
if (is_pae(vcpu)) {
|
|
|
|
gpa &= ~(gpa_t)7;
|
|
|
|
bytes = 8;
|
|
|
|
}
|
|
|
|
r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
|
2010-03-15 19:59:53 +08:00
|
|
|
if (r)
|
|
|
|
gentry = 0;
|
2010-03-15 19:59:57 +08:00
|
|
|
new = (const u8 *)&gentry;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (bytes) {
|
|
|
|
case 4:
|
|
|
|
gentry = *(const u32 *)new;
|
|
|
|
break;
|
|
|
|
case 8:
|
|
|
|
gentry = *(const u64 *)new;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
gentry = 0;
|
|
|
|
break;
|
2010-03-15 19:59:53 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
|
2007-12-21 08:18:26 +08:00
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2010-03-15 19:59:57 +08:00
|
|
|
if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
|
|
|
|
gentry = 0;
|
2008-05-15 18:51:35 +08:00
|
|
|
kvm_mmu_access_page(vcpu, gfn);
|
2007-12-31 21:27:49 +08:00
|
|
|
kvm_mmu_free_some_pages(vcpu);
|
2007-11-18 22:37:07 +08:00
|
|
|
++vcpu->kvm->stat.mmu_pte_write;
|
2007-09-17 00:58:32 +08:00
|
|
|
kvm_mmu_audit(vcpu, "pre pte write");
|
2008-12-02 08:32:05 +08:00
|
|
|
if (guest_initiated) {
|
|
|
|
if (gfn == vcpu->arch.last_pt_write_gfn
|
|
|
|
&& !last_updated_pte_accessed(vcpu)) {
|
|
|
|
++vcpu->arch.last_pt_write_count;
|
|
|
|
if (vcpu->arch.last_pt_write_count >= 3)
|
|
|
|
flooded = 1;
|
|
|
|
} else {
|
|
|
|
vcpu->arch.last_pt_write_gfn = gfn;
|
|
|
|
vcpu->arch.last_pt_write_count = 1;
|
|
|
|
vcpu->arch.last_pte_updated = NULL;
|
|
|
|
}
|
2007-01-06 08:36:50 +08:00
|
|
|
}
|
2010-04-16 16:35:54 +08:00
|
|
|
|
|
|
|
restart:
|
2010-06-04 21:53:07 +08:00
|
|
|
for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node, n) {
|
2010-04-15 00:20:03 +08:00
|
|
|
pte_size = sp->role.cr4_pae ? 8 : 4;
|
2007-01-06 08:36:48 +08:00
|
|
|
misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
|
2007-04-30 19:47:02 +08:00
|
|
|
misaligned |= bytes < 4;
|
2007-01-06 08:36:50 +08:00
|
|
|
if (misaligned || flooded) {
|
2007-01-06 08:36:48 +08:00
|
|
|
/*
|
|
|
|
* Misaligned accesses are too much trouble to fix
|
|
|
|
* up; also, they usually indicate a page is not used
|
|
|
|
* as a page table.
|
2007-01-06 08:36:50 +08:00
|
|
|
*
|
|
|
|
* If we're seeing too many writes to a page,
|
|
|
|
* it may no longer be a page table, or we may be
|
|
|
|
* forking, in which case it is better to unmap the
|
|
|
|
* page.
|
2007-01-06 08:36:48 +08:00
|
|
|
*/
|
|
|
|
pgprintk("misaligned: gpa %llx bytes %d role %x\n",
|
2007-11-21 21:28:32 +08:00
|
|
|
gpa, bytes, sp->role.word);
|
2008-09-24 00:18:37 +08:00
|
|
|
if (kvm_mmu_zap_page(vcpu->kvm, sp))
|
2010-04-16 16:35:54 +08:00
|
|
|
goto restart;
|
2007-11-18 22:37:07 +08:00
|
|
|
++vcpu->kvm->stat.mmu_flooded;
|
2007-01-06 08:36:48 +08:00
|
|
|
continue;
|
|
|
|
}
|
2007-01-06 08:36:45 +08:00
|
|
|
page_offset = offset;
|
2007-11-21 21:28:32 +08:00
|
|
|
level = sp->role.level;
|
2007-03-08 23:13:32 +08:00
|
|
|
npte = 1;
|
2010-04-15 00:20:03 +08:00
|
|
|
if (!sp->role.cr4_pae) {
|
2007-03-08 23:13:32 +08:00
|
|
|
page_offset <<= 1; /* 32->64 */
|
|
|
|
/*
|
|
|
|
* A 32-bit pde maps 4MB while the shadow pdes map
|
|
|
|
* only 2MB. So we need to double the offset again
|
|
|
|
* and zap two pdes instead of one.
|
|
|
|
*/
|
|
|
|
if (level == PT32_ROOT_LEVEL) {
|
2007-04-18 16:18:18 +08:00
|
|
|
page_offset &= ~7; /* kill rounding error */
|
2007-03-08 23:13:32 +08:00
|
|
|
page_offset <<= 1;
|
|
|
|
npte = 2;
|
|
|
|
}
|
2007-05-01 21:44:05 +08:00
|
|
|
quadrant = page_offset >> PAGE_SHIFT;
|
2007-01-06 08:36:45 +08:00
|
|
|
page_offset &= ~PAGE_MASK;
|
2007-11-21 21:28:32 +08:00
|
|
|
if (quadrant != sp->role.quadrant)
|
2007-05-01 21:44:05 +08:00
|
|
|
continue;
|
2007-01-06 08:36:45 +08:00
|
|
|
}
|
2007-11-21 21:28:32 +08:00
|
|
|
spte = &sp->spt[page_offset / sizeof(*spte)];
|
2007-03-08 23:13:32 +08:00
|
|
|
while (npte--) {
|
2007-11-21 08:06:21 +08:00
|
|
|
entry = *spte;
|
2007-11-21 21:28:32 +08:00
|
|
|
mmu_pte_write_zap_pte(vcpu, sp, spte);
|
2010-03-15 19:59:53 +08:00
|
|
|
if (gentry)
|
|
|
|
mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
|
2007-11-21 08:06:21 +08:00
|
|
|
mmu_pte_write_flush_tlb(vcpu, entry, *spte);
|
2007-03-08 23:13:32 +08:00
|
|
|
++spte;
|
2007-01-06 08:36:45 +08:00
|
|
|
}
|
|
|
|
}
|
2007-09-17 00:58:32 +08:00
|
|
|
kvm_mmu_audit(vcpu, "post pte write");
|
2007-12-21 08:18:26 +08:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2008-04-03 03:46:56 +08:00
|
|
|
if (!is_error_pfn(vcpu->arch.update_pte.pfn)) {
|
|
|
|
kvm_release_pfn_clean(vcpu->arch.update_pte.pfn);
|
|
|
|
vcpu->arch.update_pte.pfn = bad_pfn;
|
2007-12-30 18:29:05 +08:00
|
|
|
}
|
2007-01-06 08:36:44 +08:00
|
|
|
}
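The misalignment test used in kvm_mmu_pte_write() is compact enough to check by hand. The sketch below restates it as a standalone function; pte_write_misaligned() is a hypothetical name, and the caller is assumed to pass bytes >= 1 and a pte_size of 4 or 8.

```c
#include <stdbool.h>

/*
 * A write of 'bytes' bytes at 'offset' within a guest page table is
 * misaligned if its first and last byte fall into different ptes, or if it
 * is narrower than 4 bytes. Assumes bytes >= 1 and pte_size is 4 or 8.
 */
static bool pte_write_misaligned(unsigned offset, unsigned bytes,
				 unsigned pte_size)
{
	unsigned first = offset;
	unsigned last = offset + bytes - 1;

	/* Bits above the pte-size boundary differ only if a pte is crossed. */
	unsigned misaligned = (first ^ last) & ~(pte_size - 1);

	misaligned |= bytes < 4;
	return misaligned != 0;
}
```

For example, a 4-byte write at offset 0 into an 8-byte pte stays inside one entry and is accepted (this is the half-pte write of a 32-bit PAE guest), whereas an 8-byte write at offset 4 straddles two entries and is rejected.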
|
|
|
|
|
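The quadrant arithmetic in kvm_mmu_pte_write() maps a byte offset in a non-PAE guest page table to a shadow page and an index inside it. The following sketch reproduces that arithmetic under stated assumptions: 4KB pages, 8-byte shadow ptes, a root level of 2 for 32-bit paging, and hypothetical names (struct shadow_slot, locate_shadow_slot, the SK_ constants).

```c
/* Assumed constants for illustration: 4KB pages, 32-bit root level 2. */
#define SK_PAGE_SHIFT      12
#define SK_PAGE_SIZE       (1u << SK_PAGE_SHIFT)
#define SK_PT32_ROOT_LEVEL 2

struct shadow_slot {
	unsigned quadrant; /* which shadow page backs this part of the guest page */
	unsigned index;    /* first shadow pte to touch inside that page */
	unsigned npte;     /* number of consecutive shadow ptes to touch */
};

static struct shadow_slot locate_shadow_slot(unsigned offset, int guest_is_pae,
					     int level)
{
	struct shadow_slot s = { 0, 0, 1 };
	unsigned page_offset = offset;

	if (!guest_is_pae) {
		page_offset <<= 1;              /* 4-byte gpte -> 8-byte spte */
		if (level == SK_PT32_ROOT_LEVEL) {
			/* A 4MB guest pde is shadowed by two 2MB pdes. */
			page_offset &= ~7u;     /* drop sub-entry bits */
			page_offset <<= 1;
			s.npte = 2;
		}
		s.quadrant = page_offset >> SK_PAGE_SHIFT;
		page_offset &= SK_PAGE_SIZE - 1;
	}
	s.index = page_offset / 8;              /* sizeof(u64) shadow pte */
	return s;
}
```

As a worked case, guest pde 257 (offset 0x404) at the root level lands in quadrant 1 at shadow index 2 and covers two shadow pdes, because each shadow page holds the translations of only 256 guest pdes.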
2007-01-06 08:36:45 +08:00
|
|
|
int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
|
|
|
|
{
|
2007-12-21 08:18:22 +08:00
|
|
|
gpa_t gpa;
|
|
|
|
int r;
|
2007-01-06 08:36:45 +08:00
|
|
|
|
2009-08-27 18:37:06 +08:00
|
|
|
if (tdp_enabled)
|
|
|
|
return 0;
|
|
|
|
|
2010-02-10 20:21:32 +08:00
|
|
|
gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
|
2007-12-21 08:18:22 +08:00
|
|
|
|
2007-12-21 08:18:26 +08:00
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
2007-12-21 08:18:22 +08:00
|
|
|
r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
|
2007-12-21 08:18:26 +08:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2007-12-21 08:18:22 +08:00
|
|
|
return r;
|
2007-01-06 08:36:45 +08:00
|
|
|
}
|
2008-07-19 13:57:05 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
|
2007-01-06 08:36:45 +08:00
|
|
|
|
2007-09-15 01:26:06 +08:00
|
|
|
void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
|
2007-01-06 08:36:47 +08:00
|
|
|
{
|
2010-06-04 21:54:38 +08:00
|
|
|
int free_pages;
|
|
|
|
|
|
|
|
free_pages = vcpu->kvm->arch.n_free_mmu_pages;
|
|
|
|
while (free_pages < KVM_REFILL_PAGES &&
|
2009-07-29 02:26:58 +08:00
|
|
|
!list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
2007-01-06 08:36:47 +08:00
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page, link);
|
2010-06-04 21:54:38 +08:00
|
|
|
free_pages += kvm_mmu_zap_page(vcpu->kvm, sp);
|
2007-11-18 22:37:07 +08:00
|
|
|
++vcpu->kvm->stat.mmu_recycled;
|
2007-01-06 08:36:47 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-10-29 00:48:59 +08:00
|
|
|
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
|
|
|
|
{
|
|
|
|
int r;
|
|
|
|
enum emulation_result er;
|
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
|
2007-10-29 00:48:59 +08:00
|
|
|
if (r < 0)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (!r) {
|
|
|
|
r = 1;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2007-10-29 00:52:05 +08:00
|
|
|
r = mmu_topup_memory_caches(vcpu);
|
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
|
2009-08-24 16:10:17 +08:00
|
|
|
er = emulate_instruction(vcpu, cr2, error_code, 0);
|
2007-10-29 00:48:59 +08:00
|
|
|
|
|
|
|
switch (er) {
|
|
|
|
case EMULATE_DONE:
|
|
|
|
return 1;
|
|
|
|
case EMULATE_DO_MMIO:
|
|
|
|
++vcpu->stat.mmio_exits;
|
2010-05-10 16:16:56 +08:00
|
|
|
/* fall through */
|
2007-10-29 00:48:59 +08:00
|
|
|
case EMULATE_FAIL:
|
2009-06-11 20:43:28 +08:00
|
|
|
return 0;
|
2007-10-29 00:48:59 +08:00
|
|
|
default:
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
out:
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
|
|
|
|
|
2008-09-24 00:18:35 +08:00
|
|
|
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
|
|
|
|
{
|
|
|
|
vcpu->arch.mmu.invlpg(vcpu, gva);
|
|
|
|
kvm_mmu_flush_tlb(vcpu);
|
|
|
|
++vcpu->stat.invlpg;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
|
|
|
|
|
2008-02-07 20:47:41 +08:00
|
|
|
void kvm_enable_tdp(void)
|
|
|
|
{
|
|
|
|
tdp_enabled = true;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_enable_tdp);
|
|
|
|
|
2008-07-15 02:36:36 +08:00
|
|
|
void kvm_disable_tdp(void)
|
|
|
|
{
|
|
|
|
tdp_enabled = false;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_disable_tdp);
|
|
|
|
|
2006-12-10 18:21:36 +08:00
|
|
|
static void free_mmu_pages(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-12-13 23:50:52 +08:00
|
|
|
free_page((unsigned long)vcpu->arch.mmu.pae_root);
|
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2007-01-06 08:36:40 +08:00
|
|
|
struct page *page;
|
2006-12-10 18:21:36 +08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
ASSERT(vcpu);
|
|
|
|
|
2007-01-06 08:36:40 +08:00
|
|
|
/*
|
|
|
|
* When emulating 32-bit mode, cr3 is only 32 bits even on x86_64.
|
|
|
|
* Therefore we need to allocate shadow page tables in the first
|
|
|
|
* 4GB of memory, which happens to fit the DMA32 zone.
|
|
|
|
*/
|
|
|
|
page = alloc_page(GFP_KERNEL | __GFP_DMA32);
|
|
|
|
if (!page)
|
2010-01-22 16:55:05 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.pae_root = page_address(page);
|
2007-01-06 08:36:40 +08:00
|
|
|
for (i = 0; i < 4; ++i)
|
2007-12-13 23:50:52 +08:00
|
|
|
vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
|
2007-01-06 08:36:40 +08:00
|
|
|
|
2006-12-10 18:21:36 +08:00
|
|
|
return 0;
|
|
|
|
}
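The comment in alloc_mmu_pages() above gives the reason for the __GFP_DMA32 flag: a 32-bit guest's cr3 can only hold a 32-bit physical address, so the root table has to sit below 4GB. A minimal user-space sketch of that constraint follows. The names fits_in_32bit_cr3, sketch_reset_pae_root, SKETCH_INVALID_PAGE and SKETCH_PAE_ROOT_ENTRIES are hypothetical and are not part of the kernel source; the all-ones invalid marker is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's INVALID_PAGE marker (assumed all-ones). */
#define SKETCH_INVALID_PAGE (~(uint64_t)0)

/* A PAE root table has four entries, matching the loop bound in alloc_mmu_pages(). */
#define SKETCH_PAE_ROOT_ENTRIES 4

/* True when a host physical address lies below 4GB and so fits a 32-bit cr3. */
static bool fits_in_32bit_cr3(uint64_t phys_addr)
{
	return phys_addr <= UINT32_MAX;
}

/* Mark every root entry as unused, mirroring the initialisation loop above. */
static void sketch_reset_pae_root(uint64_t root[SKETCH_PAE_ROOT_ENTRIES])
{
	for (int i = 0; i < SKETCH_PAE_ROOT_ENTRIES; ++i)
		root[i] = SKETCH_INVALID_PAGE;
}
```

A page at physical address 0xFFFFF000 passes the check, while one at 0x100000000 does not, which is the case the DMA32 allocation is meant to rule out.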
|
|
|
|
|
2006-12-30 08:50:01 +08:00
|
|
|
int kvm_mmu_create(struct kvm_vcpu *vcpu)
|
2006-12-10 18:21:36 +08:00
|
|
|
{
|
|
|
|
ASSERT(vcpu);
|
2007-12-13 23:50:52 +08:00
|
|
|
ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
|
2006-12-10 18:21:36 +08:00
|
|
|
|
2006-12-30 08:50:01 +08:00
|
|
|
return alloc_mmu_pages(vcpu);
|
|
|
|
}
|
2006-12-10 18:21:36 +08:00
|
|
|
|
2006-12-30 08:50:01 +08:00
|
|
|
int kvm_mmu_setup(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
ASSERT(vcpu);
|
2007-12-13 23:50:52 +08:00
|
|
|
ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
|
2006-12-22 17:05:28 +08:00
|
|
|
|
2006-12-30 08:50:01 +08:00
|
|
|
return init_kvm_mmu(vcpu);
|
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
ASSERT(vcpu);
|
|
|
|
|
|
|
|
destroy_kvm_mmu(vcpu);
|
|
|
|
free_mmu_pages(vcpu);
|
2007-01-06 08:36:53 +08:00
|
|
|
mmu_free_memory_caches(vcpu);
|
2006-12-10 18:21:36 +08:00
|
|
|
}
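kvm_mmu_create() and kvm_mmu_destroy() above form an allocate/release pair around the per-vcpu root table. The following is a simplified user-space model of that pairing, not the kernel code. struct sketch_mmu, sketch_mmu_create, sketch_mmu_destroy and MODEL_INVALID_PAGE are hypothetical names, malloc() and free() stand in for the kernel page allocator, and resetting the pointer to NULL on teardown is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's INVALID_PAGE marker (assumed all-ones). */
#define MODEL_INVALID_PAGE (~(uint64_t)0)

/* Hypothetical, much-reduced model of the per-vcpu MMU state. */
struct sketch_mmu {
	uint64_t root_hpa;   /* MODEL_INVALID_PAGE until a root is loaded */
	uint64_t *pae_root;  /* four-entry root table, NULL when not allocated */
};

/* Allocation step: checks that no root is loaded, then allocates the table. */
static int sketch_mmu_create(struct sketch_mmu *mmu)
{
	assert(mmu->root_hpa == MODEL_INVALID_PAGE);
	mmu->pae_root = malloc(4 * sizeof(*mmu->pae_root));
	if (!mmu->pae_root)
		return -1;
	for (int i = 0; i < 4; ++i)
		mmu->pae_root[i] = MODEL_INVALID_PAGE;
	return 0;
}

/* Teardown step: releases what the create step allocated. */
static void sketch_mmu_destroy(struct sketch_mmu *mmu)
{
	mmu->root_hpa = MODEL_INVALID_PAGE;
	free(mmu->pae_root);
	mmu->pae_root = NULL;
}
```

A caller would initialise root_hpa to MODEL_INVALID_PAGE, call sketch_mmu_create() once, and call sketch_mmu_destroy() once when done.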
|
|
|
|
|
2007-07-17 18:04:56 +08:00
|
|
|
void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
|
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp;
|
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-12-14 10:01:48 +08:00
|
|
|
list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
|
2006-12-10 18:21:36 +08:00
|
|
|
int i;
|
|
|
|
u64 *pt;
|
|
|
|
|
2008-10-16 17:30:57 +08:00
|
|
|
if (!test_bit(slot, sp->slot_bitmap))
|
2006-12-10 18:21:36 +08:00
|
|
|
continue;
|
|
|
|
|
2007-11-21 21:28:32 +08:00
|
|
|
pt = sp->spt;
|
2006-12-10 18:21:36 +08:00
|
|
|
for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
|
|
|
|
/* avoid RMW */
|
2010-05-27 16:09:48 +08:00
|
|
|
if (is_writable_pte(pt[i]))
|
2006-12-10 18:21:36 +08:00
|
|
|
pt[i] &= ~PT_WRITABLE_MASK;
|
|
|
|
}
|
2008-08-27 21:40:51 +08:00
|
|
|
kvm_flush_remote_tlbs(kvm);
|
2006-12-10 18:21:36 +08:00
|
|
|
}
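The write-protect loop in kvm_mmu_slot_remove_write_access() can be tried outside the kernel with a minimal user-space sketch. It assumes a plain array in place of sp->spt, and it leaves out the slot_bitmap test, the walk over active_mmu_pages, and the TLB flush. The names sketch_remove_write_access and SKETCH_PT_WRITABLE_MASK are hypothetical; the mask value relies only on the x86 fact that bit 1 of a page table entry is the R/W bit.

```c
#include <stddef.h>
#include <stdint.h>

/* On x86, bit 1 of a page table entry is the R/W (writable) bit. */
#define SKETCH_PT_WRITABLE_MASK (1ULL << 1)

/*
 * Clear the writable bit in every entry of one table.
 * Returns how many entries were changed (hypothetical addition;
 * the kernel function returns void).
 */
static size_t sketch_remove_write_access(uint64_t *pt, size_t nr_entries)
{
	size_t i, changed = 0;

	for (i = 0; i < nr_entries; ++i) {
		/* Test first so read-only entries are never written back. */
		if (pt[i] & SKETCH_PT_WRITABLE_MASK) {
			pt[i] &= ~SKETCH_PT_WRITABLE_MASK;
			++changed;
		}
	}
	return changed;
}
```

Testing the bit before clearing it mirrors the "avoid RMW" comment in the loop above: entries that are already read-only are only read, never stored to.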
|
2007-01-06 08:36:56 +08:00
|
|
|
|
2007-07-17 18:04:56 +08:00
|
|
|
void kvm_mmu_zap_all(struct kvm *kvm)
|
2007-03-30 18:06:33 +08:00
|
|
|
{
|
2007-11-21 21:28:32 +08:00
|
|
|
struct kvm_mmu_page *sp, *node;
|
2007-03-30 18:06:33 +08:00
|
|
|
|
2007-12-21 08:18:26 +08:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
2010-04-16 16:35:54 +08:00
|
|
|
restart:
|
2007-12-14 10:01:48 +08:00
|
|
|
list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
|
2008-09-24 00:18:37 +08:00
|
|
|
if (kvm_mmu_zap_page(kvm, sp))
|
2010-04-16 16:35:54 +08:00
|
|
|
goto restart;
|
|
|
|
|
2007-12-21 08:18:26 +08:00
|
|
|
spin_unlock(&kvm->mmu_lock);
|
2007-03-30 18:06:33 +08:00
|
|
|
|
2007-07-17 18:04:56 +08:00
|
|
|
kvm_flush_remote_tlbs(kvm);
|
2007-03-30 18:06:33 +08:00
|
|
|
}
|
|
|
|
|
2010-04-27 10:39:49 +08:00
|
|
|
static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm)
|
2008-03-30 20:17:21 +08:00
|
|
|
{
|
|
|
|
struct kvm_mmu_page *page;
|
|
|
|
|
|
|
|
page = container_of(kvm->arch.active_mmu_pages.prev,
|
|
|
|
struct kvm_mmu_page, link);
|
2010-05-05 09:03:49 +08:00
|
|
|
return kvm_mmu_zap_page(kvm, page);
|
2008-03-30 20:17:21 +08:00
|
|
|
}
|
|
|
|
|
2010-07-19 12:56:17 +08:00
|
|
|
static int mmu_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
|
2008-03-30 20:17:21 +08:00
|
|
|
{
|
|
|
|
struct kvm *kvm;
|
|
|
|
struct kvm *kvm_freed = NULL;
|
|
|
|
int cache_count = 0;
|
|
|
|
|
|
|
|
spin_lock(&kvm_lock);
|
|
|
|
|
|
|
|
list_for_each_entry(kvm, &vm_list, vm_list) {
|
2010-04-27 10:39:49 +08:00
|
|
|
int npages, idx, freed_pages;
|
2008-03-30 20:17:21 +08:00
|
|
|
|
2009-12-24 00:35:25 +08:00
|
|
|
idx = srcu_read_lock(&kvm->srcu);
|
2008-03-30 20:17:21 +08:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
npages = kvm->arch.n_alloc_mmu_pages -
|
|
|
|
kvm->arch.n_free_mmu_pages;
|
|
|
|
cache_count += npages;
|
|
|
|
if (!kvm_freed && nr_to_scan > 0 && npages > 0) {
|
2010-04-27 10:39:49 +08:00
|
|
|
freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm);
|
|
|
|
cache_count -= freed_pages;
|
2008-03-30 20:17:21 +08:00
|
|
|
kvm_freed = kvm;
|
|
|
|
}
|
|
|
|
nr_to_scan--;
|
|
|
|
|
|
|
|
spin_unlock(&kvm->mmu_lock);
|
2009-12-24 00:35:25 +08:00
|
|
|
srcu_read_unlock(&kvm->srcu, idx);
|
2008-03-30 20:17:21 +08:00
|
|
|
}
|
|
|
|
if (kvm_freed)
|
|
|
|
list_move_tail(&kvm_freed->vm_list, &vm_list);
|
|
|
|
|
|
|
|
spin_unlock(&kvm_lock);
|
|
|
|
|
|
|
|
return cache_count;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct shrinker mmu_shrinker = {
|
|
|
|
.shrink = mmu_shrink,
|
|
|
|
.seeks = DEFAULT_SEEKS * 10,
|
|
|
|
};
|
|
|
|
|
2008-05-22 16:37:48 +08:00
|
|
|
static void mmu_destroy_caches(void)
|
2007-04-15 21:31:09 +08:00
|
|
|
{
|
|
|
|
if (pte_chain_cache)
|
|
|
|
kmem_cache_destroy(pte_chain_cache);
|
|
|
|
if (rmap_desc_cache)
|
|
|
|
kmem_cache_destroy(rmap_desc_cache);
|
2007-05-30 17:34:53 +08:00
|
|
|
if (mmu_page_header_cache)
|
|
|
|
kmem_cache_destroy(mmu_page_header_cache);
|
2007-04-15 21:31:09 +08:00
|
|
|
}
|
|
|
|
|
2008-03-30 20:17:21 +08:00
|
|
|
void kvm_mmu_module_exit(void)
|
|
|
|
{
|
|
|
|
mmu_destroy_caches();
|
|
|
|
unregister_shrinker(&mmu_shrinker);
|
|
|
|
}
|
|
|
|
|
2007-04-15 21:31:09 +08:00
|
|
|
int kvm_mmu_module_init(void)
|
|
|
|
{
|
|
|
|
pte_chain_cache = kmem_cache_create("kvm_pte_chain",
|
|
|
|
sizeof(struct kvm_pte_chain),
|
2007-07-20 09:11:58 +08:00
|
|
|
0, 0, NULL);
|
2007-04-15 21:31:09 +08:00
|
|
|
if (!pte_chain_cache)
|
|
|
|
goto nomem;
|
|
|
|
rmap_desc_cache = kmem_cache_create("kvm_rmap_desc",
|
|
|
|
sizeof(struct kvm_rmap_desc),
|
2007-07-20 09:11:58 +08:00
|
|
|
0, 0, NULL);
|
2007-04-15 21:31:09 +08:00
|
|
|
if (!rmap_desc_cache)
|
|
|
|
goto nomem;
|
|
|
|
|
2007-05-30 17:34:53 +08:00
|
|
|
mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header",
|
|
|
|
sizeof(struct kvm_mmu_page),
|
2007-07-20 09:11:58 +08:00
|
|
|
0, 0, NULL);
|
2007-05-30 17:34:53 +08:00
|
|
|
if (!mmu_page_header_cache)
|
|
|
|
goto nomem;
|
|
|
|
|
2008-03-30 20:17:21 +08:00
|
|
|
register_shrinker(&mmu_shrinker);
|
|
|
|
|
2007-04-15 21:31:09 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
nomem:
|
2008-03-30 20:17:21 +08:00
|
|
|
mmu_destroy_caches();
|
2007-04-15 21:31:09 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
2007-11-20 13:11:38 +08:00
|
|
|
/*
|
|
|
|
* Calculate mmu pages needed for kvm.
|
|
|
|
*/
|
|
|
|
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
unsigned int nr_mmu_pages;
|
|
|
|
unsigned int nr_pages = 0;
|
2009-12-24 00:35:21 +08:00
|
|
|
struct kvm_memslots *slots;
|
2007-11-20 13:11:38 +08:00
|
|
|
|
2010-04-19 17:41:23 +08:00
|
|
|
slots = kvm_memslots(kvm);
|
|
|
|
|
2009-12-24 00:35:21 +08:00
|
|
|
for (i = 0; i < slots->nmemslots; i++)
|
|
|
|
nr_pages += slots->memslots[i].npages;
|
2007-11-20 13:11:38 +08:00
|
|
|
|
|
|
|
nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
|
|
|
|
nr_mmu_pages = max(nr_mmu_pages,
|
|
|
|
(unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
|
|
|
|
|
|
|
|
return nr_mmu_pages;
|
|
|
|
}
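The sizing rule in kvm_mmu_calculate_mmu_pages() is a per-mille fraction of guest memory with a lower bound. A minimal user-space sketch follows, assuming the memslot sizes arrive as a plain array. The two SKETCH_ constants are illustrative values chosen for this example, not taken from this file; the real KVM_PERMILLE_MMU_PAGES and KVM_MIN_ALLOC_MMU_PAGES are defined in the KVM headers.

```c
#include <stddef.h>

/* Illustrative values only; the real constants live in the KVM headers. */
#define SKETCH_PERMILLE_MMU_PAGES  20u
#define SKETCH_MIN_ALLOC_MMU_PAGES 64u

/*
 * Sum the page counts of all slots, take a per-mille share of the
 * total, and never return less than the minimum allocation.
 */
static unsigned int sketch_calculate_mmu_pages(const unsigned int *slot_npages,
					       size_t nslots)
{
	unsigned int nr_pages = 0;
	unsigned int nr_mmu_pages;
	size_t i;

	for (i = 0; i < nslots; i++)
		nr_pages += slot_npages[i];

	nr_mmu_pages = nr_pages * SKETCH_PERMILLE_MMU_PAGES / 1000;
	if (nr_mmu_pages < SKETCH_MIN_ALLOC_MMU_PAGES)
		nr_mmu_pages = SKETCH_MIN_ALLOC_MMU_PAGES;

	return nr_mmu_pages;
}
```

With these assumed constants, a guest with 10000 pages gets 200 shadow pages, while a guest with 100 pages is raised to the floor of 64.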
|
|
|
|
|
2008-02-23 01:21:37 +08:00
|
|
|
static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer,
|
|
|
|
unsigned len)
|
|
|
|
{
|
|
|
|
if (len > buffer->len)
|
|
|
|
return NULL;
|
|
|
|
return buffer->ptr;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer,
|
|
|
|
unsigned len)
|
|
|
|
{
|
|
|
|
void *ret;
|
|
|
|
|
|
|
|
ret = pv_mmu_peek_buffer(buffer, len);
|
|
|
|
if (!ret)
|
|
|
|
return ret;
|
|
|
|
buffer->ptr += len;
|
|
|
|
buffer->len -= len;
|
|
|
|
buffer->processed += len;
|
|
|
|
return ret;
|
|
|
|
}
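pv_mmu_peek_buffer() and pv_mmu_read_buffer() together form a small cursor over a byte buffer: peek checks that enough bytes remain without consuming them, and read advances the cursor. The sketch below reproduces that pattern in user space. struct sketch_buffer and the sketch_ functions are hypothetical stand-ins for struct kvm_pv_mmu_op_buffer and the two helpers above; only the three fields the helpers touch are modelled.

```c
#include <stddef.h>

/* Stand-in for the ptr/len/processed fields of the kernel buffer. */
struct sketch_buffer {
	unsigned char *ptr;   /* next unread byte */
	size_t len;           /* bytes still available */
	size_t processed;     /* bytes consumed so far */
};

/* Return the current position if len bytes are available, else NULL. */
static void *sketch_peek(struct sketch_buffer *b, size_t len)
{
	if (len > b->len)
		return NULL;
	return b->ptr;
}

/* Like sketch_peek, but also consume len bytes on success. */
static void *sketch_read(struct sketch_buffer *b, size_t len)
{
	void *ret = sketch_peek(b, len);

	if (!ret)
		return ret;
	b->ptr += len;
	b->len -= len;
	b->processed += len;
	return ret;
}
```

A failed read leaves the cursor untouched, which is what lets kvm_pv_mmu_op() report through processed how many bytes were handled before a short or malformed operation stopped the loop.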
|
|
|
|
|
|
|
|
static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu,
|
|
|
|
gpa_t addr, gpa_t value)
|
|
|
|
{
|
|
|
|
int bytes = 8;
|
|
|
|
int r;
|
|
|
|
|
|
|
|
if (!is_long_mode(vcpu) && !is_pae(vcpu))
|
|
|
|
bytes = 4;
|
|
|
|
|
|
|
|
r = mmu_topup_memory_caches(vcpu);
|
|
|
|
if (r)
|
|
|
|
return r;
|
|
|
|
|
2008-03-30 07:17:59 +08:00
|
|
|
if (!emulator_write_phys(vcpu, addr, &value, bytes))
|
2008-02-23 01:21:37 +08:00
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2009-05-25 03:15:25 +08:00
|
|
|
kvm_set_cr3(vcpu, vcpu->arch.cr3);
|
2008-02-23 01:21:37 +08:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr)
|
|
|
|
{
|
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
|
|
|
mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT);
|
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_pv_mmu_op_one(struct kvm_vcpu *vcpu,
|
|
|
|
struct kvm_pv_mmu_op_buffer *buffer)
|
|
|
|
{
|
|
|
|
struct kvm_mmu_op_header *header;
|
|
|
|
|
|
|
|
header = pv_mmu_peek_buffer(buffer, sizeof *header);
|
|
|
|
if (!header)
|
|
|
|
return 0;
|
|
|
|
switch (header->op) {
|
|
|
|
case KVM_MMU_OP_WRITE_PTE: {
|
|
|
|
struct kvm_mmu_op_write_pte *wpte;
|
|
|
|
|
|
|
|
wpte = pv_mmu_read_buffer(buffer, sizeof *wpte);
|
|
|
|
if (!wpte)
|
|
|
|
return 0;
|
|
|
|
return kvm_pv_mmu_write(vcpu, wpte->pte_phys,
|
|
|
|
wpte->pte_val);
|
|
|
|
}
|
|
|
|
case KVM_MMU_OP_FLUSH_TLB: {
|
|
|
|
struct kvm_mmu_op_flush_tlb *ftlb;
|
|
|
|
|
|
|
|
ftlb = pv_mmu_read_buffer(buffer, sizeof *ftlb);
|
|
|
|
if (!ftlb)
|
|
|
|
return 0;
|
|
|
|
return kvm_pv_mmu_flush_tlb(vcpu);
|
|
|
|
}
|
|
|
|
case KVM_MMU_OP_RELEASE_PT: {
|
|
|
|
struct kvm_mmu_op_release_pt *rpt;
|
|
|
|
|
|
|
|
rpt = pv_mmu_read_buffer(buffer, sizeof *rpt);
|
|
|
|
if (!rpt)
|
|
|
|
return 0;
|
|
|
|
return kvm_pv_mmu_release_pt(vcpu, rpt->pt_phys);
|
|
|
|
}
|
|
|
|
default: return 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
|
|
|
|
gpa_t addr, unsigned long *ret)
|
|
|
|
{
|
|
|
|
int r;
|
2008-08-12 01:01:49 +08:00
|
|
|
struct kvm_pv_mmu_op_buffer *buffer = &vcpu->arch.mmu_op_buffer;
|
2008-02-23 01:21:37 +08:00
|
|
|
|
2008-08-12 01:01:49 +08:00
|
|
|
buffer->ptr = buffer->buf;
|
|
|
|
buffer->len = min_t(unsigned long, bytes, sizeof buffer->buf);
|
|
|
|
buffer->processed = 0;
|
2008-02-23 01:21:37 +08:00
|
|
|
|
2008-08-12 01:01:49 +08:00
|
|
|
r = kvm_read_guest(vcpu->kvm, addr, buffer->buf, buffer->len);
|
2008-02-23 01:21:37 +08:00
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
|
2008-08-12 01:01:49 +08:00
|
|
|
while (buffer->len) {
|
|
|
|
r = kvm_pv_mmu_op_one(vcpu, buffer);
|
2008-02-23 01:21:37 +08:00
|
|
|
if (r < 0)
|
|
|
|
goto out;
|
|
|
|
if (r == 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
r = 1;
|
|
|
|
out:
|
2008-08-12 01:01:49 +08:00
|
|
|
*ret = buffer->processed;
|
2008-02-23 01:21:37 +08:00
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2009-06-11 23:07:42 +08:00
|
|
|
int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
|
|
|
|
{
|
|
|
|
struct kvm_shadow_walk_iterator iterator;
|
|
|
|
int nr_sptes = 0;
|
|
|
|
|
|
|
|
spin_lock(&vcpu->kvm->mmu_lock);
|
|
|
|
for_each_shadow_entry(vcpu, addr, iterator) {
|
|
|
|
sptes[iterator.level-1] = *iterator.sptep;
|
|
|
|
nr_sptes++;
|
|
|
|
if (!is_shadow_present_pte(*iterator.sptep))
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
|
|
|
|
|
|
|
return nr_sptes;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);

#ifdef AUDIT

static const char *audit_msg;

/* Sign-extend bit 47 into bits 48..63 so the address is canonical. */
static gva_t canonicalize(gva_t gva)
{
#ifdef CONFIG_X86_64
	gva = (long long)(gva << 16) >> 16;
#endif
	return gva;
}
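
canonicalize() relies on the compiler performing an arithmetic right shift on
a signed value. A portable equivalent can be sketched as follows, assuming
48-bit virtual addresses; canonicalize48() is a hypothetical name used only
for this illustration.

```c
#include <stdint.h>

/*
 * Copy bit 47 into bits 48..63, without depending on how the compiler
 * shifts negative values.
 */
static uint64_t canonicalize48(uint64_t va)
{
	if (va & (1ull << 47))
		return va | 0xffff000000000000ull;
	return va & 0x0000ffffffffffffull;
}
```

The first address of the upper half, 0x0000800000000000, becomes
0xffff800000000000, while lower-half addresses are returned unchanged.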

typedef void (*inspect_spte_fn) (struct kvm *kvm, u64 *sptep);

static void __mmu_spte_walk(struct kvm *kvm, struct kvm_mmu_page *sp,
			    inspect_spte_fn fn)
{
	int i;

	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
		u64 ent = sp->spt[i];

		if (is_shadow_present_pte(ent)) {
			if (!is_last_spte(ent, sp->role.level)) {
				struct kvm_mmu_page *child;
				child = page_header(ent & PT64_BASE_ADDR_MASK);
				__mmu_spte_walk(kvm, child, fn);
			} else
				fn(kvm, &sp->spt[i]);
		}
	}
}

static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
{
	int i;
	struct kvm_mmu_page *sp;

	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
		return;
	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
		hpa_t root = vcpu->arch.mmu.root_hpa;
		sp = page_header(root);
		__mmu_spte_walk(vcpu->kvm, sp, fn);
		return;
	}
	for (i = 0; i < 4; ++i) {
		hpa_t root = vcpu->arch.mmu.pae_root[i];

		if (root && VALID_PAGE(root)) {
			root &= PT64_BASE_ADDR_MASK;
			sp = page_header(root);
			__mmu_spte_walk(vcpu->kvm, sp, fn);
		}
	}
	return;
}

static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
				gva_t va, int level)
{
	u64 *pt = __va(page_pte & PT64_BASE_ADDR_MASK);
	int i;
	gva_t va_delta = 1ul << (PAGE_SHIFT + 9 * (level - 1));

	for (i = 0; i < PT64_ENT_PER_PAGE; ++i, va += va_delta) {
		u64 ent = pt[i];

		/*
		 * KVM: Allow not-present guest page faults to bypass kvm
		 *
		 * There are two classes of page faults trapped by kvm:
		 *
		 *  - host page faults, where the fault is needed to allow kvm
		 *    to install the shadow pte or update the guest accessed
		 *    and dirty bits
		 *  - guest page faults, where the guest has faulted and kvm
		 *    simply injects the fault back into the guest to handle
		 *
		 * The second class, guest page faults, is pure overhead. We
		 * can eliminate some of it on vmx using the following evil
		 * trick:
		 *
		 *  - when we set up a shadow page table entry, if the
		 *    corresponding guest pte is not present, set up the
		 *    shadow pte as not present
		 *  - if the guest pte _is_ present, mark the shadow pte as
		 *    present but also set one of the reserved bits in the
		 *    shadow pte
		 *  - tell the vmx hardware not to trap faults which have the
		 *    present bit clear
		 *
		 * With this, normal page-not-present faults go directly to
		 * the guest, bypassing kvm entirely.
		 *
		 * Unfortunately, this trick only works on Intel hardware, as
		 * AMD lacks a way to discriminate among page faults based on
		 * error code. It is also a little risky since it uses
		 * reserved bits which might become unreserved in the future,
		 * so a module parameter is provided to disable it.
		 *
		 * Signed-off-by: Avi Kivity <avi@qumranet.com>
		 * (2007-09-17 00:58:32 +08:00)
		 */
		if (ent == shadow_trap_nonpresent_pte)
			continue;

		va = canonicalize(va);
		if (is_shadow_present_pte(ent) && !is_last_spte(ent, level))
			audit_mappings_page(vcpu, ent, va, level - 1);
		else {
			gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, va, NULL);
			gfn_t gfn = gpa >> PAGE_SHIFT;
			pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn);
			hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT;

			if (is_error_pfn(pfn)) {
				kvm_release_pfn_clean(pfn);
				continue;
			}

			if (is_shadow_present_pte(ent)
			    && (ent & PT64_BASE_ADDR_MASK) != hpa)
				printk(KERN_ERR "xx audit error: (%s) levels %d"
				       " gva %lx gpa %llx hpa %llx ent %llx %d\n",
				       audit_msg, vcpu->arch.mmu.root_level,
				       va, gpa, hpa, ent,
				       is_shadow_present_pte(ent));
			else if (ent == shadow_notrap_nonpresent_pte
				 && !is_error_hpa(hpa))
				printk(KERN_ERR "audit: (%s) notrap shadow,"
				       " valid guest gva %lx\n", audit_msg, va);
			kvm_release_pfn_clean(pfn);

		}
	}
}
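
The trap and notrap placeholders checked in audit_mappings_page() follow the
trick described in the commit message quoted inside that function. A minimal
sketch of the idea, under stated assumptions: PTE_RESERVED is a hypothetical
choice of reserved bit (the value the driver actually uses is not shown
here), and nonpresent_shadow_pte() and fault_exits_to_host() are
illustrative names. The only hardware fact relied on is that bit 0 of the
x86 page-fault error code (P) is clear for a not-present fault and set for a
reserved-bit violation.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT	(1ull << 0)
#define PTE_RESERVED	(1ull << 51)	/* hypothetical reserved bit */
#define PFERR_PRESENT	(1u << 0)	/* P bit of the page-fault error code */
#define PFERR_RSVD	(1u << 3)	/* reserved-bit violation */

/* Placeholder stored in a shadow slot that has no real mapping yet. */
static uint64_t nonpresent_shadow_pte(uint64_t guest_pte)
{
	if (!(guest_pte & PTE_PRESENT))
		return 0;	/* "notrap": the guest handles the fault itself */
	/* "trap": present plus reserved bit, so the fault carries P=1. */
	return PTE_PRESENT | PTE_RESERVED;
}

/* With the hardware told not to trap faults whose P bit is clear. */
static bool fault_exits_to_host(uint32_t error_code)
{
	return (error_code & PFERR_PRESENT) != 0;
}
```

Under these assumptions a guest access through a notrap slot faults with P
clear and never leaves the guest, while an access through a trap slot raises
a reserved-bit fault that the host sees and can use to install the real
shadow pte.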

static void audit_mappings(struct kvm_vcpu *vcpu)
{
	unsigned i;

	if (vcpu->arch.mmu.root_level == 4)
		audit_mappings_page(vcpu, vcpu->arch.mmu.root_hpa, 0, 4);
	else
		for (i = 0; i < 4; ++i)
			if (vcpu->arch.mmu.pae_root[i] & PT_PRESENT_MASK)
				audit_mappings_page(vcpu,
						    vcpu->arch.mmu.pae_root[i],
						    i << 30,
						    2);
}
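
The address arithmetic used by the audit walk (va_delta per entry, and
i << 30 as the base of PAE root i) can be checked with a small worked
example. This is a sketch assuming 4 KiB pages and 512 entries per table;
entry_span() and pae_root_base() are hypothetical helper names.

```c
#include <stdint.h>

#define PAGE_SHIFT	12	/* 4 KiB pages */
#define ENT_PER_PAGE	512	/* 9 address bits per level */

/* Bytes of virtual address space covered by one entry at `level` (1 = leaf). */
static uint64_t entry_span(int level)
{
	return 1ull << (PAGE_SHIFT + 9 * (level - 1));
}

/* First virtual address covered by PAE root entry i (0..3). */
static uint64_t pae_root_base(unsigned i)
{
	return (uint64_t)i << 30;
}
```

A level-1 entry covers 4 KiB, a level-2 entry 2 MiB and a level-3 entry
1 GiB. Each PAE root points at a level-2 table, whose 512 entries of 2 MiB
together cover 1 GiB, which is why consecutive roots start 1 << 30 bytes
apart.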

static int count_rmaps(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;
	struct kvm_memslots *slots;
	int nmaps = 0;
	int i, j, k, idx;

	idx = srcu_read_lock(&kvm->srcu);
	slots = kvm_memslots(kvm);
	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
		struct kvm_memory_slot *m = &slots->memslots[i];
		struct kvm_rmap_desc *d;

		for (j = 0; j < m->npages; ++j) {
			unsigned long *rmapp = &m->rmap[j];

			if (!*rmapp)
				continue;
			/* Low bit clear: the word is a single spte pointer. */
			if (!(*rmapp & 1)) {
				++nmaps;
				continue;
			}
			/* Low bit set: the word points to a descriptor chain. */
			d = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
			while (d) {
				for (k = 0; k < RMAP_EXT; ++k)
					if (d->sptes[k])
						++nmaps;
					else
						break;
				d = d->more;
			}
		}
	}
	srcu_read_unlock(&kvm->srcu, idx);
	return nmaps;
}
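
The rmap word decoded in count_rmaps() is a tagged pointer: zero means no
mapping, a clear low bit means the word is itself a pointer to one spte, and
a set low bit means the rest of the word points to a chain of descriptors.
A user-space sketch of the same decoding, with assumptions: struct rmap_desc
and count_one_rmap() are hypothetical names, and the value of RMAP_EXT is
assumed for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define RMAP_EXT 4	/* slots per descriptor; value assumed for this sketch */

struct rmap_desc {
	uint64_t *sptes[RMAP_EXT];	/* filled from slot 0, NULL-terminated */
	struct rmap_desc *more;		/* next descriptor, or NULL */
};

/* Count the sptes reachable from one rmap head word. */
static int count_one_rmap(uintptr_t head)
{
	struct rmap_desc *d;
	int k, n = 0;

	if (!head)
		return 0;
	if (!(head & 1))
		return 1;	/* the word is a single spte pointer */

	/* Strip the tag bit to recover the descriptor pointer. */
	for (d = (struct rmap_desc *)(head & ~(uintptr_t)1); d; d = d->more)
		for (k = 0; k < RMAP_EXT && d->sptes[k]; ++k)
			++n;
	return n;
}
```

The tag costs nothing in space because both sptes and descriptors are at
least 2-byte aligned, so the low bit of a real pointer is always zero.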

void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
{
	unsigned long *rmapp;
	struct kvm_mmu_page *rev_sp;
	gfn_t gfn;

	if (is_writable_pte(*sptep)) {
		rev_sp = page_header(__pa(sptep));
		gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt);

		if (!gfn_to_memslot(kvm, gfn)) {
			if (!printk_ratelimit())
				return;
			printk(KERN_ERR "%s: no memslot for gfn %ld\n",
			       audit_msg, gfn);
			printk(KERN_ERR "%s: index %ld of sp (gfn=%lx)\n",
			       audit_msg, (long int)(sptep - rev_sp->spt),
			       rev_sp->gfn);
			dump_stack();
			return;
		}

		rmapp = gfn_to_rmap(kvm, gfn, rev_sp->role.level);
		if (!*rmapp) {
			if (!printk_ratelimit())
				return;
			printk(KERN_ERR "%s: no rmap for writable spte %llx\n",
			       audit_msg, *sptep);
			dump_stack();
		}
	}

}

void audit_writable_sptes_have_rmaps(struct kvm_vcpu *vcpu)
{
	mmu_spte_walk(vcpu, inspect_spte_has_rmap);
}

static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu_page *sp;
	int i;

	list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) {
		u64 *pt = sp->spt;

		if (sp->role.level != PT_PAGE_TABLE_LEVEL)
			continue;

		for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
			u64 ent = pt[i];

			if (!(ent & PT_PRESENT_MASK))
				continue;
			if (!is_writable_pte(ent))
				continue;
			inspect_spte_has_rmap(vcpu->kvm, &pt[i]);
		}
	}
	return;
}

static void audit_rmap(struct kvm_vcpu *vcpu)
{
	check_writable_mappings_rmap(vcpu);
	count_rmaps(vcpu);
}

static void audit_write_protection(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu_page *sp;
	struct kvm_memory_slot *slot;
	unsigned long *rmapp;
	u64 *spte;
	gfn_t gfn;

	list_for_each_entry(sp, &vcpu->kvm->arch.active_mmu_pages, link) {
		if (sp->role.direct)
			continue;
		if (sp->unsync)
			continue;

		gfn = unalias_gfn(vcpu->kvm, sp->gfn);
		slot = gfn_to_memslot_unaliased(vcpu->kvm, sp->gfn);
		rmapp = &slot->rmap[gfn - slot->base_gfn];

		spte = rmap_next(vcpu->kvm, rmapp, NULL);
		while (spte) {
			if (is_writable_pte(*spte))
				printk(KERN_ERR "%s: (%s) shadow page has "
				       "writable mappings: gfn %lx role %x\n",
				       __func__, audit_msg, sp->gfn,
				       sp->role.word);
			spte = rmap_next(vcpu->kvm, rmapp, spte);
		}
	}
}

static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg)
{
	int olddbg = dbg;

	dbg = 0;
	audit_msg = msg;
	audit_rmap(vcpu);
	audit_write_protection(vcpu);
	if (strcmp("pre pte write", audit_msg) != 0)
		audit_mappings(vcpu);
	audit_writable_sptes_have_rmaps(vcpu);
	dbg = olddbg;
}

#endif